
Ollama version doesn't properly truncate tokens to 512 max for official snowflake-arctic-embed-l model #8376

Open
shuaiscott opened this issue Jan 10, 2025 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@shuaiscott

What is the issue?

When using the official Ollama build of snowflake-arctic-embed-l (latest/335m - 21ab8b9b0545), input longer than 512 tokens causes the model to crash instead of being truncated.

On a previous version (0.3.9), passing more than 512 tokens returned all-zero ([0,0,0...]) embeddings.
In 0.5.4, Ollama returns a 500 error and the logs show "Process xxxxxx (ollama_llama_se) of user xxx dumped core".

Logs:

llama_model_load: vocab only - skipping tensors
ggml-cpu.c:8400: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
ggml-cpu.c:8400: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
SIGSEGV: segmentation violation
PC=0x7fcc733ecc57 m=5 sigcode=1 addr=0x207203fe0
signal arrived during cgo execution
goroutine 8 gp=0xc0000f21c0 m=5 mp=0xc000100008 [syscall]:
runtime.cgocall(0x562b649d47d0, 0xc000073b90)
        runtime/cgocall.go:167
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x7fcbf115bfa0, {0x2, 0x7fcbf0b80590, 0x0, 0x0, 0x7fcbf0b80da0, 0x7fcbf0b815b, 0x7fcbf0b81dc0, 0x7fcbf1144dc0})
...

I've checked my Ollama parameters, and this occurs when "truncate": true. Other embedding models truncate the input properly, and Ollama's INFO log says "input truncated"; I don't see this message with snowflake-arctic-embed-l.

When "truncate" is set to false, I get the expected "input length exceeds maximum context length" error.
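For reference, the failing call can be reproduced with a request body like the following (a minimal sketch against Ollama's /api/embed endpoint; the input text is a placeholder, just long enough to exceed 512 tokens):

```python
import json

# Hypothetical reproduction payload for Ollama's /api/embed endpoint.
# With "truncate": true, inputs longer than 512 tokens should be cut
# down to the model's limit; on 0.5.4 this instead crashes the runner.
payload = {
    "model": "snowflake-arctic-embed:l",
    "input": "word " * 1000,   # well over the model's 512-token limit
    "truncate": True,          # should truncate, but triggers the crash
}

body = json.dumps(payload)
# POST this body to http://localhost:11434/api/embed (e.g. via curl or
# urllib.request) to reproduce the 500 error described above.
print(body[:80])
```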

https://ollama.com/library/snowflake-arctic-embed

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.5.4

@shuaiscott shuaiscott added the bug Something isn't working label Jan 10, 2025
@rick-github
Collaborator

#7288

The problem can be worked around by setting num_ctx for the model to its actual context length (512), rather than the default value of 2048 that Ollama uses. You can do that either by setting num_ctx in the API call ("options":{"num_ctx":512}) or by creating a copy of the model with the parameter baked in:

$ ollama show --modelfile snowflake-arctic-embed:l > Modelfile
$ echo PARAMETER num_ctx 512 >> Modelfile
$ ollama create snowflake-arctic-embed:l-c512 -f Modelfile

and then adjust the client to use snowflake-arctic-embed:l-c512 instead of snowflake-arctic-embed:l.
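The per-request variant of the workaround can be sketched as follows (the input text is a placeholder; the payload shape follows the /api/embed API):

```python
import json

# Workaround sketch: cap num_ctx at the model's real context length (512)
# in the API call itself, instead of relying on Ollama's 2048 default.
payload = {
    "model": "snowflake-arctic-embed:l",
    "input": "some long document text",
    "truncate": True,
    "options": {"num_ctx": 512},  # match the model's maximum sequence length
}

# POST this to http://localhost:11434/api/embed; with num_ctx == 512 the
# truncation path cuts the input at the model's actual limit.
print(json.dumps(payload))
```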

@jmorganca jmorganca self-assigned this Jan 10, 2025