Commit
Merge branch 'main' into features/openaimodellist
aarnphm authored Oct 14, 2023
2 parents 4861331 + c1ca7cc commit fd6c65e
Showing 19 changed files with 338 additions and 735 deletions.
239 changes: 0 additions & 239 deletions .github/workflows/build-embedding.yml

This file was deleted.

3 changes: 1 addition & 2 deletions .pre-commit-config.yaml
@@ -14,8 +14,7 @@ repos:
       verbose: true
       exclude: |
         (?x)^(
-          openllm-client/src/openllm_client/pb.*|
-          openllm-python/src/openllm/cli/entrypoint.py
+          openllm-client/src/openllm_client/pb.*
         )$
 - repo: https://github.com/astral-sh/ruff-pre-commit
   rev: 'v0.0.292'
42 changes: 0 additions & 42 deletions README.md
@@ -107,7 +107,6 @@ Options:

 Commands:
   build     Package a given model into a Bento.
-  embed     Get embeddings interactively, from a terminal.
   import    Setup LLM interactively.
   instruct  Instruct agents interactively for given tasks, from a...
   models    List all supported models.

@@ -867,47 +866,6 @@ openllm build opt --adapter-id ./path/to/adapter_id --build-ctx .
 > We will gradually roll out support for fine-tuning all models.
 > Currently, the models supporting fine-tuning with OpenLLM include: OPT, Falcon, and LLaMA.
-
-## 🧮 Embeddings
-
-OpenLLM provides an embeddings endpoint for embedding calculation, accessible via `/v1/embeddings`.
-
-To use it via the CLI, simply call `openllm embed`:
-
-```bash
-openllm embed --endpoint http://localhost:3000 "I like to eat apples" -o json
-{
-  "embeddings": [
-    0.006569798570126295,
-    -0.031249752268195152,
-    -0.008072729222476482,
-    0.00847396720200777,
-    -0.005293501541018486,
-    ...<many embeddings>...
-    -0.002078012563288212,
-    -0.00676426338031888,
-    -0.002022686880081892
-  ],
-  "num_tokens": 9
-}
-```
-
-To invoke this endpoint programmatically, use `client.embed` from the Python SDK:
-
-```python
-import openllm
-
-client = openllm.client.HTTPClient("http://localhost:3000")
-client.embed("I like to eat apples")
-```
-
-> [!NOTE]
-> Currently, the following model families support embedding calculation: Llama, T5 (Flan-T5, FastChat, etc.), and ChatGLM.
-> For the remaining LLMs without a specific embedding implementation, a generic
-> [BertModel](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) is used for
-> embedding generation. The implementation is largely based on [`bentoml/sentence-embedding-bento`](https://github.com/bentoml/sentence-embedding-bento).
-
 ## 🥅 Playground and Chat UI

 The following UIs are currently available for OpenLLM:
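Although this commit removes the endpoint, the README excerpt above shows how it behaved. For anyone pinned to a pre-removal release, the endpoint could also be called over plain HTTP without the SDK. The following is a minimal sketch, assuming `/v1/embeddings` accepts a JSON array of input strings and returns the `embeddings`/`num_tokens` shape shown in the CLI output above; the exact request schema is not part of this diff.

```python
# Hypothetical sketch: POST to the (now removed) /v1/embeddings endpoint.
# Assumes the server accepts a JSON array of input strings and responds
# with the {"embeddings": [...], "num_tokens": ...} shape shown above.
import requests

resp = requests.post(
    "http://localhost:3000/v1/embeddings",
    json=["I like to eat apples"],  # request schema is an assumption
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()
print(len(payload["embeddings"]), payload["num_tokens"])
```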
5 changes: 5 additions & 0 deletions changelog.d/500.breaking.md
@@ -0,0 +1,5 @@
+Remove the embeddings endpoints from the provided API, as they are probably not a good fit to have here yet.
+
+This means that `openllm embed` is also removed.
+
+The client implementation is also updated to fix 0.3.7 breaking changes with models other than Llama.
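Since the vectors returned by the removed `client.embed` were plain lists of floats (as in the JSON output above), downstream use on older releases was straightforward. A minimal sketch, assuming two such vectors of equal length, of comparing them by cosine similarity:

```python
# Sketch: cosine similarity between two embedding vectors, e.g. as returned
# by the removed embeddings endpoint (plain lists of floats).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(cosine_similarity([0.1, 0.2, 0.3], [0.1, 0.25, 0.28]))
```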