
Tags: ilumiere/llama.cpp

gguf-v0.10.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
llama : use F32 precision in GLM4 attention and no FA (ggerganov#9130)

b3616

[SYCL] Add a space to suppress a cmake warning (ggerganov#9133)

b3615

[SYCL] Add oneDNN primitive support (ggerganov#9091)

* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc

b3614

llama : simplify Mamba with advanced batch splits (ggerganov#8526)

* llama : advanced batch splits

This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.

* llama : apply suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

This will make it easier to more cleanly support RWKV-v6 and Mamba-2.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
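The equal-sequence-length splits described in the commit above can be illustrated with a standalone sketch. The types and the `split_equal` helper here are hypothetical stand-ins, not llama.cpp's actual `llama_batch`/`llama_ubatch` structures: each ubatch is cut at the shortest remaining sequence, so every sequence present in a ubatch contributes the same number of tokens, which keeps recurrent-state updates uniform within the ubatch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical token record: (token id, sequence id).
struct tok { int32_t id; int32_t seq; };

// Split a batch into ubatches in which every participating sequence
// contributes the same number of tokens ("equal_seqs"). Each ubatch is
// cut at the shortest remaining sequence; sequences that run out simply
// drop out of later ubatches.
static std::vector<std::vector<tok>> split_equal(const std::vector<tok> & batch) {
    std::map<int32_t, std::vector<tok>> per_seq;
    for (const tok & t : batch) {
        per_seq[t.seq].push_back(t);
    }

    std::vector<std::vector<tok>> ubatches;
    std::size_t taken = 0; // tokens already consumed from each sequence
    for (;;) {
        // find the shortest remaining length among non-exhausted sequences
        std::size_t next = SIZE_MAX;
        for (const auto & kv : per_seq) {
            if (kv.second.size() > taken) {
                next = std::min(next, kv.second.size());
            }
        }
        if (next == SIZE_MAX) {
            break; // all sequences exhausted
        }
        std::vector<tok> ub;
        for (const auto & kv : per_seq) {
            if (kv.second.size() > taken) {
                ub.insert(ub.end(), kv.second.begin() + taken, kv.second.begin() + next);
            }
        }
        ubatches.push_back(std::move(ub));
        taken = next;
    }
    return ubatches;
}
```

For example, a batch with three tokens in sequence 0 and one token in sequence 1 splits into two ubatches: one token from each sequence, then the remaining two tokens of sequence 0 alone.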

b3613

server : support reading arguments from environment variables (ggerganov#9105)

* server : support reading arguments from environment variables

* add -fa and -dt

* readme : specify non-arg env var
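The general pattern behind reading server arguments from environment variables can be sketched as a CLI-first fallback chain. The helper and variable names below are hypothetical; the actual argument-to-environment-variable mapping is defined by the llama.cpp server and documented in its README.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Illustrative precedence: explicit CLI value > environment variable >
// built-in default. (Names here are made up for the sketch; llama.cpp
// defines its own mapping between flags and env vars.)
static std::string arg_or_env(bool cli_set, const std::string & cli_value,
                              const char * env_name, const std::string & def) {
    if (cli_set) {
        return cli_value; // an explicit flag always wins
    }
    if (const char * v = std::getenv(env_name)) {
        return v;         // otherwise honor the environment
    }
    return def;           // otherwise fall back to the default
}
```

This ordering lets a deployment set defaults in the environment while still allowing per-invocation overrides on the command line.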

b3612

llama : support for `falcon-mamba` architecture (ggerganov#9074)

* feat: initial support for llama.cpp

* fix: lint

* refactor: better refactor

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* fix: address comments

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* fix: add more cleanup and harmonization

* fix: lint

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* fix: change name

* Apply suggestions from code review

Co-authored-by: compilade <git@compilade.net>

* add in operator

* fix: add `dt_b_c_rms` in `llm_load_print_meta`

* fix: correct printf format for bool

* fix: correct print format

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* llama : quantize more Mamba tensors

* llama : use f16 as the fallback of fallback quant types

---------

Co-authored-by: compilade <git@compilade.net>

b3611

llava : zero-initialize clip_ctx structure fields with aggregate initialization (…908)

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
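The fix above relies on C++ aggregate (value) initialization to zero every field of the structure. A minimal illustration of the hazard and the fix, using a stand-in struct rather than the real clip_ctx:

```cpp
#include <cassert>

// Stand-in for the hazard fixed above (ctx_t is illustrative, not the
// real clip_ctx): members without in-class default initializers are
// left indeterminate when the object is default-initialized, but
// `= {}` value-initializes all of them to zero.
struct ctx_t {
    bool  has_vision = false; // has an in-class default
    int   n_layers;           // indeterminate unless value-initialized
    float scale;              // indeterminate unless value-initialized
};

static ctx_t make_ctx() {
    ctx_t c = {}; // aggregate initialization: every field zeroed
    return c;
}
```

Without the `= {}`, reading `n_layers` or `scale` before assignment would be undefined behavior.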

b3610

llama : std::move llm_bigram_bpe from work_queue (ggerganov#9062)

* llama : std::move llm_bigram_bpe from work_queue

This commit updates the retrieval of llm_bigram_bpe objects from
work_queue.top() by using std::move.

The motivation for this is to avoid the copying of the std::string
`text` member of the llm_bigram_bpe struct.

* squash! llama : std::move llm_bigram_bpe from work_queue

Introduced a MovablePriorityQueue class to allow moving elements
out of the priority queue for llm_bigram_bpe.

* squash! llama : std::move llm_bigram_bpe from work_queue

Rename MovablePriorityQueue to lama_priority_queue.

* squash! llama : std::move llm_bigram_bpe from work_queue

Rename lama_priority_queue -> llama_priority_queue.
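`std::priority_queue::top()` returns a `const` reference, so `T x = q.top(); q.pop();` necessarily copies the element. One way a queue like the `llama_priority_queue` mentioned above can be built (a sketch of the technique, not llama.cpp's actual implementation) is to derive from `std::priority_queue`, which exposes the protected container `c` and comparator `comp`, and move the top element out before removing it:

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Deriving from std::priority_queue gives access to its protected
// members `c` (the underlying container) and `comp` (the comparator),
// so the top element can be moved out instead of copied.
template <typename T, typename Container = std::vector<T>,
          typename Compare = std::less<typename Container::value_type>>
struct movable_priority_queue : std::priority_queue<T, Container, Compare> {
    T pop_move() {
        // move the heap's top element to the back, then move it out
        std::pop_heap(this->c.begin(), this->c.end(), this->comp);
        T top = std::move(this->c.back());
        this->c.pop_back();
        return top;
    }
};
```

For element types with an expensive-to-copy member, such as the `std::string text` in `llm_bigram_bpe`, this avoids one copy per pop.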

b3609

llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (ggerganov#8984)

* llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model.

- The CLIP model now prioritizes the Vulkan backend over the CPU when Vulkan is available.
- A GGML_OP_ACC shader has been added.
- The encoding performance of the CLIP model improved from 4.2s on the CPU to 0.9s on the GPU.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* fix-up coding style.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* Fix-up the missing initial parameter to resolve the compilation warning.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* [fix] Add missing parameters.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* [fix] Use nb1 and nb2 for dst.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* Fix check results ggml_acc call

---------

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
Co-authored-by: 0cc4m <picard12@live.de>
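GGML_OP_ACC accumulates one tensor into a strided view of another, which is why the fixes above care about the `nb1`/`nb2` strides of `dst`. A plain-C++ sketch of the 2-D case (illustrative only: the real `ggml_acc` takes byte strides including `nb3` and an offset, and the Vulkan shader performs this on GPU buffers):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dst starts as a copy of src0; src1 (ne0 x ne1) is added element-wise
// into the region of dst selected by a row stride (nb1, in elements
// here, in bytes in ggml) and a starting offset.
static std::vector<float> acc_2d(const std::vector<float> & src0,
                                 const std::vector<float> & src1,
                                 std::size_t ne0, std::size_t ne1, // src1 dims
                                 std::size_t nb1,                  // dst row stride
                                 std::size_t offset) {
    std::vector<float> dst = src0;
    for (std::size_t i1 = 0; i1 < ne1; ++i1) {
        for (std::size_t i0 = 0; i0 < ne0; ++i0) {
            dst[offset + i1 * nb1 + i0] += src1[i1 * ne0 + i0];
        }
    }
    return dst;
}
```

Accumulating a 2x2 block into a 2x3 destination at offset 0, for instance, touches only the first two elements of each destination row.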

b3608

[SYCL] fallback mmvq (ggerganov#9088)

* fallback mmvq to mul_mat

* mmvq in cuda path

* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>

---------

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>