Superduper allows users to work with self-hosted LLM models via vLLM.
```bash
pip install superduper_vllm
```
| Class | Description |
|---|---|
| `superduper_vllm.model.VllmChat` | vLLM model for chatting. |
| `superduper_vllm.model.VllmCompletion` | vLLM model for generating completions. |
```python
from superduper_vllm import VllmChat

# Engine arguments forwarded to vLLM.
vllm_params = dict(
    model="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    quantization="awq",      # the checkpoint is AWQ-quantized
    dtype="auto",
    max_model_len=1024,      # maximum context length
    tensor_parallel_size=1,  # number of GPUs to shard across
)
model = VllmChat(identifier="model", vllm_params=vllm_params)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
]
```
Chat with chat-format messages:

```python
model.predict(messages)
```

Chat with a plain-text message:

```python
model.predict("hello")
```
`VllmCompletion` is configured in the same way, but generates raw completions from a text prompt:

```python
from superduper_vllm import VllmCompletion

vllm_params = dict(
    model="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    quantization="awq",
    dtype="auto",
    max_model_len=1024,
    tensor_parallel_size=1,
)
model = VllmCompletion(identifier="model", vllm_params=vllm_params)

model.predict("hello")
```