Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedding Normalization Options #8145

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

gabe-l-hart
Copy link
Contributor

Description

This PR introduces the new API parameter normalize for the /api/embed and /api/embeddings endpoints that allow the user to explicitly enable/disable normalization. The default behavior of both endpoints is preserved (normalization for embed, no normalization for embeddings), but this allows them to produce equivalent output to avoid future confusion.

Issues

Discussion

The /api/embeddings endpoint is deprecated according to the docs. An argument could be made that we should not add this functionality to this endpoint and simply document around the discrepancy. I'd be totally open to this!

Testing

I updated the embed_test.go to include the various combinations against embed and embedding. I ran them as follows:

OLLAMA_TEST_EXISTING=true go test -tags=integration ./integration/ -test.run TestAllMiniLM.*

Branch: LegacyEmbeddingsNormalization

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
…Handler

The default normalization behavior remains the same (true for embed, false
for embeddings), but this allows users to specify the behavior they want.

Branch: LegacyEmbeddingsNormalization

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Also include notes about the discrepency between embed and embeddings with
default behavior.

Branch: LegacyEmbeddingsNormalization

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Branch: LegacyEmbeddingsNormalization

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
@jmorganca jmorganca self-assigned this Dec 23, 2024
@jmorganca jmorganca self-requested a review December 23, 2024 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can you make the normalize optional for embeddings? Add option to /api/embed to not normalize embeddings
2 participants