Fix embeddings memory corruption #6467

Merged
dhiltgen merged 2 commits into ollama:main on Aug 22, 2024

Conversation

@dhiltgen (Collaborator) commented Aug 22, 2024

The patch was leading to a buffer overrun corruption. Once it was removed, though, parallelism in server.cpp led to hitting an assert because slot/seq IDs could be >= the token count. To work around this, only slot 0 is used for embeddings.

Fixes #6435
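For illustration, here is a minimal Go sketch of the workaround's idea; the names (`pendingReq`, `pickSlot`) are hypothetical and the actual change in this PR lives in the scheduler/runner code and differs in detail:

```go
// Hypothetical sketch of the slot-0 workaround described above; the real
// change lives in ollama's scheduler/runner code and differs in detail.
package sched

type pendingReq struct {
	embedding bool // true for /api/embed-style requests
}

// pickSlot chooses which server slot handles a request. Completion
// requests may use any free slot; embedding requests are pinned to
// slot 0 to avoid tripping the slot/seq ID >= token count assert in
// server.cpp.
func pickSlot(req pendingReq, freeSlots []int) int {
	if req.embedding {
		return 0
	}
	if len(freeSlots) > 0 {
		return freeSlots[0]
	}
	return 0
}
```

Pinning embeddings to a single slot sidesteps the assert at the cost of not processing embedding requests in parallel.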

@dhiltgen marked this pull request as draft August 22, 2024 19:40
@dhiltgen (Collaborator, Author)

The embeddings before and after this change are the same, but the prompt_eval_count is off for some reason.

@dhiltgen (Collaborator, Author)

It turns out the prompt_eval_count has changed since 0.3.5, so this is unrelated to my change; the test's assumption no longer holds. I'll update the test.
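As a rough illustration of what the test update could look like (the type and field names below are assumptions for the sketch, not the actual integration test code):

```go
// Hypothetical Go test helper; the real integration test and the
// response type's field names may differ.
package integration

import "testing"

// embedResponse is a stand-in for the API embedding response; only the
// field relevant here is shown.
type embedResponse struct {
	PromptEvalCount int
}

func checkEmbedResponse(t *testing.T, resp embedResponse) {
	// Before: the test assumed a fixed prompt_eval_count, which broke once
	// the llama.cpp bump (0.3.5+) changed how prompt tokens are counted.
	// After: only require that some prompt tokens were evaluated.
	if resp.PromptEvalCount <= 0 {
		t.Fatalf("expected prompt_eval_count > 0, got %d", resp.PromptEvalCount)
	}
}
```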

@dhiltgen marked this pull request as ready for review August 22, 2024 20:08
Review comment on server/sched.go (outdated, resolved)
The patch was leading to a buffer overrun corruption. Once it was removed, though, parallelism
in server.cpp led to hitting an assert because slot/seq IDs could be >= the token count. To
work around this, only slot 0 is used for embeddings.

The token eval count has changed with recent llama.cpp bumps (0.3.5+).
@dhiltgen merged commit 90ca841 into ollama:main Aug 22, 2024
15 checks passed
@dhiltgen deleted the embeddings branch August 22, 2024 21:51
deep93333 pushed a commit to deep93333/ollama that referenced this pull request Sep 9, 2024
* Fix embeddings memory corruption
* Fix embed integration test assumption
Successfully merging this pull request may close these issues.

0.3.6 /api/embed return 500 if more items are provided in input
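For context, the linked issue concerns multi-item embedding requests. The sketch below shows the kind of /api/embed call with several inputs that the issue describes; it is a hedged reproduction example, and the model name is just an arbitrary placeholder:

```go
// Sketch of a multi-item /api/embed request of the kind the linked issue
// describes; the model name is just an example.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model": "all-minilm", // any embedding model
		"input": []string{"first sentence", "second sentence", "third sentence"},
	})
	resp, err := http.Post("http://localhost:11434/api/embed", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	// Before the fix this could come back as HTTP 500; afterwards it should be 200.
	fmt.Println("status:", resp.Status)
}
```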