Commit
Merge branch 'main' into features/openaimodellist
aarnphm authored Oct 14, 2023
2 parents 4861331 + c1ca7cc commit fd6c65e
Showing 19 changed files with 338 additions and 735 deletions.
239 changes: 0 additions & 239 deletions .github/workflows/build-embedding.yml

This file was deleted.

3 changes: 1 addition & 2 deletions .pre-commit-config.yaml
@@ -14,8 +14,7 @@ repos:
       verbose: true
       exclude: |
         (?x)^(
-          openllm-client/src/openllm_client/pb.*|
-          openllm-python/src/openllm/cli/entrypoint.py
+          openllm-client/src/openllm_client/pb.*
         )$
 - repo: https://github.com/astral-sh/ruff-pre-commit
   rev: 'v0.0.292'
42 changes: 0 additions & 42 deletions README.md
@@ -107,7 +107,6 @@ Options:

 Commands:
   build     Package a given model into a Bento.
-  embed     Get embeddings interactively, from a terminal.
   import    Setup LLM interactively.
   instruct  Instruct agents interactively for given tasks, from a...
   models    List all supported models.

@@ -867,47 +866,6 @@ openllm build opt --adapter-id ./path/to/adapter_id --build-ctx .
 > We will gradually roll out support for fine-tuning all models.
 > Currently, the models supporting fine-tuning with OpenLLM include: OPT, Falcon, and LLaMA.
-
-## 🧮 Embeddings
-
-OpenLLM provides an embeddings endpoint for embedding calculation, accessible via `/v1/embeddings`.
-
-To use it via the CLI, simply call `openllm embed`:
-
-```bash
-openllm embed --endpoint http://localhost:3000 "I like to eat apples" -o json
-{
-  "embeddings": [
-    0.006569798570126295,
-    -0.031249752268195152,
-    -0.008072729222476482,
-    0.00847396720200777,
-    -0.005293501541018486,
-    ...<many embeddings>...
-    -0.002078012563288212,
-    -0.00676426338031888,
-    -0.002022686880081892
-  ],
-  "num_tokens": 9
-}
-```
-
-To invoke this endpoint programmatically, use `client.embed` from the Python SDK:
-
-```python
-import openllm
-
-client = openllm.client.HTTPClient("http://localhost:3000")
-client.embed("I like to eat apples")
-```
-
-> [!NOTE]
-> Currently, the following model families support embedding calculation: Llama, T5 (Flan-T5, FastChat, etc.), and ChatGLM.
-> For the remaining LLMs without a specific embedding implementation, a generic
-> [BertModel](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) is used for
-> embedding generation. The implementation is largely based on [`bentoml/sentence-embedding-bento`](https://github.com/bentoml/sentence-embedding-bento).
-
 ## 🥅 Playground and Chat UI

 The following UIs are currently available for OpenLLM:
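Although this commit removes the endpoint, the README excerpt above shows how it behaved. For anyone pinned to a pre-removal release, the endpoint could also be called over plain HTTP without the SDK. The following is a minimal sketch, assuming `/v1/embeddings` accepts a JSON array of input strings and returns the `embeddings`/`num_tokens` shape shown in the CLI output above; the exact request schema is not part of this diff.

```python
# Hypothetical sketch: POST to the (now removed) /v1/embeddings endpoint.
# Assumes the server accepts a JSON array of input strings and responds
# with the {"embeddings": [...], "num_tokens": ...} shape shown above.
import requests

resp = requests.post(
    "http://localhost:3000/v1/embeddings",
    json=["I like to eat apples"],  # request schema is an assumption
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()
print(len(payload["embeddings"]), payload["num_tokens"])
```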
5 changes: 5 additions & 0 deletions changelog.d/500.breaking.md
@@ -0,0 +1,5 @@
+Remove the embeddings endpoints from the provided API, as they are probably not a good fit to have here yet.
+
+This means that `openllm embed` is also removed.
+
+The client implementation is also updated to fix 0.3.7 breaking changes with models other than Llama.
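Since the vectors returned by the removed `client.embed` were plain lists of floats (as in the JSON output above), downstream use on older releases was straightforward. A minimal sketch, assuming two such vectors of equal length, of comparing them by cosine similarity:

```python
# Sketch: cosine similarity between two embedding vectors, e.g. as returned
# by the removed embeddings endpoint (plain lists of floats).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(cosine_similarity([0.1, 0.2, 0.3], [0.1, 0.25, 0.28]))
```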