You can find the following examples for interacting with OpenLLM. They demonstrate general OpenLLM features and show how to start running any open-source model in production.
`openai_completion_client.py` demonstrates how to use the OpenAI-compatible `/v1/completions` endpoint to generate text.
```bash
export OPENLLM_ENDPOINT=https://api.openllm.com
python openai_completion_client.py

# For streaming, set STREAM=True
STREAM=True python openai_completion_client.py
```
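For orientation, here is a minimal sketch of what such a completion client can look like, assuming the `openai` Python package (v1+). The prompt and `max_tokens` value are illustrative, and the model id is discovered from the server rather than hard-coded:

```python
import os

from openai import OpenAI

# Point the OpenAI client at the OpenLLM server. OpenLLM does not validate
# the API key, but the client requires a non-empty value.
client = OpenAI(
    base_url=os.environ.get("OPENLLM_ENDPOINT", "http://localhost:3000") + "/v1",
    api_key="na",
)

# Ask the server which model it serves instead of hard-coding an id.
model = client.models.list().data[0].id
stream = os.environ.get("STREAM", "False").lower() in ("true", "1")

response = client.completions.create(
    model=model,
    prompt="Write a tagline for an ice cream shop.",  # illustrative prompt
    max_tokens=64,
    stream=stream,
)

if stream:
    # In streaming mode the response is an iterator of partial completions.
    for chunk in response:
        print(chunk.choices[0].text, end="", flush=True)
    print()
else:
    print(response.choices[0].text)
```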
`openai_chat_completion_client.py` demonstrates how to use the OpenAI-compatible `/v1/chat/completions` endpoint to chat with a model.
```bash
export OPENLLM_ENDPOINT=https://api.openllm.com
python openai_chat_completion_client.py

# For streaming, set STREAM=True
STREAM=True python openai_chat_completion_client.py
```
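A corresponding sketch for the chat endpoint, under the same assumptions (`openai` v1+, illustrative message and `max_tokens`):

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("OPENLLM_ENDPOINT", "http://localhost:3000") + "/v1",
    api_key="na",  # OpenLLM does not validate the key
)

model = client.models.list().data[0].id
stream = os.environ.get("STREAM", "False").lower() in ("true", "1")

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "What is a large language model?"}],
    max_tokens=128,
    stream=stream,
)

if stream:
    for chunk in response:
        # Stream deltas may carry no content (e.g. the final chunk).
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()
else:
    print(response.choices[0].message.content)
```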
`api_server.py` demonstrates how to write a production-ready BentoML service with OpenLLM and vLLM.
Install the requirements:

```bash
pip install -U "openllm[vllm]"
```
To serve the Bento (given you have access to a GPU):

```bash
bentoml serve api_server:svc
```
To build the Bento:

```bash
bentoml build -f bentofile.yaml .
```
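For orientation, here is a minimal sketch of what such a service might look like. The exact `openllm.Runner` signature and the shape of its return value vary across OpenLLM 0.x releases, so treat the model name, the `backend` argument, and the response handling below as illustrative assumptions:

```python
# api_server.py -- sketch of a BentoML service backed by an OpenLLM runner.
import bentoml
import openllm

# Create a runner for an open-source model on the vLLM backend.
# The model name and backend kwarg are illustrative; check your OpenLLM
# version for the exact Runner signature.
llm_runner = openllm.Runner("mistral", backend="vllm")

svc = bentoml.Service(name="llm-service", runners=[llm_runner])


@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
async def prompt(input_text: str) -> str:
    generated = await llm_runner.generate.async_run(input_text)
    # The return shape differs between releases; coerce to text for the sketch.
    return str(generated)
```

With a module laid out like this, `bentoml serve api_server:svc` resolves the service by module and attribute name, which is why the serve command above references `api_server:svc`.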