
Ollama version doesn't properly truncate tokens to 512 max for official snowflake-arctic-embed-l model #8376

Open
shuaiscott opened this issue Jan 10, 2025 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@shuaiscott

What is the issue?

When using the official Ollama build of snowflake-arctic-embed-l (latest/335m - 21ab8b9b0545), input longer than 512 tokens causes the model to crash instead of being truncated.

On a previous version (0.3.9), passing more than 512 tokens returned all-zero ([0,0,0...]) embeddings.
In 0.5.4, Ollama returns a 500 error and the logs show "Process xxxxxx (ollama_llama_se) of user xxx dumped core".

Logs:

llama_model_load: vocab only - skipping tensors
ggml-cpu.c:8400: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
ggml-cpu.c:8400: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
SIGSEGV: segmentation violation
PC=0x7fcc733ecc57 m=5 sigcode=1 addr=0x207203fe0
signal arrived during cgo execution
goroutine 8 gp=0xc0000f21c0 m=5 mp=0xc000100008 [syscall]:
runtime.cgocall(0x562b649d47d0, 0xc000073b90)
        runtime/cgocall.go:167
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7fcbf115bfa0, {0x2, 0x7fcbf0b80590, 0x0, 0x0, 0x7fcbf0b80da0, 0x7fcbf0b815b, 0x7fcbf0b81dc0, 0x7fcbf1144dc0})
...

I've checked my Ollama parameters, and this occurs when "truncate": true. Other embedding models truncate the input properly, and Ollama's INFO log says "input truncated"; I don't see this message with snowflake-arctic-embed-l.

When "truncate" is set to false, I get the expected "input length exceeds maximum context length" error.
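For reference, the failing call can be reproduced with a request body like the following (a minimal sketch against Ollama's /api/embed endpoint; the input text is a placeholder, just long enough to exceed 512 tokens):

```python
import json

# Hypothetical reproduction payload for Ollama's /api/embed endpoint.
# With "truncate": true, inputs longer than 512 tokens should be cut
# down to the model's limit; on 0.5.4 this instead crashes the runner.
payload = {
    "model": "snowflake-arctic-embed:l",
    "input": "word " * 1000,   # well over the model's 512-token limit
    "truncate": True,          # should truncate, but triggers the crash
}

body = json.dumps(payload)
# POST this body to http://localhost:11434/api/embed (e.g. via curl or
# urllib.request) to reproduce the 500 error described above.
print(body[:80])
```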

https://ollama.com/library/snowflake-arctic-embed

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.5.4

@shuaiscott shuaiscott added the bug Something isn't working label Jan 10, 2025
@rick-github
Collaborator

#7288

The problem can be worked around by setting num_ctx for the model to its actual context length (512), rather than the default value of 2048 that Ollama uses. You can do that either by setting num_ctx in the API call ("options":{"num_ctx":512}) or by creating a copy of the model with the parameter baked in:

$ ollama show --modelfile snowflake-arctic-embed:l > Modelfile
$ echo PARAMETER num_ctx 512 >> Modelfile
$ ollama create snowflake-arctic-embed:l-c512 -f Modelfile

and then adjust the client to use snowflake-arctic-embed:l-c512 instead of snowflake-arctic-embed:l.
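The per-request variant of the workaround can be sketched as follows (the input text is a placeholder; the payload shape follows the /api/embed API):

```python
import json

# Workaround sketch: cap num_ctx at the model's real context length (512)
# in the API call itself, instead of relying on Ollama's 2048 default.
payload = {
    "model": "snowflake-arctic-embed:l",
    "input": "some long document text",
    "truncate": True,
    "options": {"num_ctx": 512},  # match the model's maximum sequence length
}

# POST this to http://localhost:11434/api/embed; with num_ctx == 512 the
# truncation path cuts the input at the model's actual limit.
print(json.dumps(payload))
```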

@jmorganca jmorganca self-assigned this Jan 10, 2025