This directory contains examples of how to interact with OpenLLM features.
`openai_completion_client.py` demonstrates how to use the OpenAI-compatible `/v1/completions` endpoint to generate text.
```bash
export OPENLLM_ENDPOINT=https://api.openllm.com
python openai_completion_client.py

# For streaming responses, set STREAM=True
STREAM=True python openai_completion_client.py
```
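A minimal sketch of what such a client can look like, using the `openai` Python SDK (v1+) pointed at the OpenLLM endpoint; the prompt and `max_tokens` value are illustrative assumptions, not the script's actual contents:

```python
import os

from openai import OpenAI

# Point the OpenAI SDK at the OpenLLM server. OpenLLM does not check
# the API key, but the SDK requires a non-empty value.
client = OpenAI(base_url=os.environ["OPENLLM_ENDPOINT"] + "/v1", api_key="na")

# Ask the server which model it is serving rather than hard-coding an ID.
model = client.models.list().data[0].id

stream = os.environ.get("STREAM", "False").lower() in ("true", "1")

if stream:
    # Streamed responses arrive as incremental chunks of text.
    for chunk in client.completions.create(
        model=model, prompt="Write a tagline for an ice cream shop.", stream=True
    ):
        print(chunk.choices[0].text, end="", flush=True)
    print()
else:
    response = client.completions.create(
        model=model, prompt="Write a tagline for an ice cream shop.", max_tokens=128
    )
    print(response.choices[0].text)
```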
`openai_chat_completion_client.py` demonstrates how to use the OpenAI-compatible `/v1/chat/completions` endpoint to chat with a model.
```bash
export OPENLLM_ENDPOINT=https://api.openllm.com
python openai_chat_completion_client.py

# For streaming responses, set STREAM=True
STREAM=True python openai_chat_completion_client.py
```
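Likewise, a sketch of the chat variant under the same assumptions (the message content is illustrative):

```python
import os

from openai import OpenAI

client = OpenAI(base_url=os.environ["OPENLLM_ENDPOINT"] + "/v1", api_key="na")
model = client.models.list().data[0].id

messages = [{"role": "user", "content": "Explain superconductors to a five-year-old."}]

if os.environ.get("STREAM", "False").lower() in ("true", "1"):
    # Streamed chat chunks carry a delta with the newly generated tokens.
    for chunk in client.chat.completions.create(model=model, messages=messages, stream=True):
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()
else:
    response = client.chat.completions.create(model=model, messages=messages)
    print(response.choices[0].message.content)
```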
`api_server.py` demonstrates how to write a production-ready BentoML service with OpenLLM and vLLM; a rough sketch of such a service follows.
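The exact OpenLLM API differs across releases, so this is only a sketch of the shape such a service can take; `openllm.Runner`, the `generate` runner method, and the model ID are assumptions based on older OpenLLM releases, not the file's actual contents:

```python
import bentoml
import openllm

# Create an LLM runner backed by vLLM. The model name and ID are
# assumptions; substitute any model OpenLLM supports.
llm_runner = openllm.Runner("opt", model_id="facebook/opt-1.3b", backend="vllm")

# `svc` is the object referenced by `bentoml serve api_server:svc` below.
svc = bentoml.Service(name="llm-service", runners=[llm_runner])


@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
async def generate(prompt: str) -> str:
    # Delegate generation to the runner; the return shape depends on
    # the OpenLLM version, so coerce it to text here.
    result = await llm_runner.generate.async_run(prompt)
    return str(result)
```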
Install the requirements:

```bash
pip install -U "openllm[vllm]"
```
To serve the Bento (assuming you have access to a GPU):

```bash
bentoml serve api_server:svc
```
To build the Bento:

```bash
bentoml build -f bentofile.yaml .
```