# superduper_vllm
Superduper allows users to work with self-hosted LLM models via vLLM.
## Installation

```bash
pip install superduper_vllm
```
## API

| Class | Description |
|---|---|
| `superduper_vllm.model.VllmChat` | vLLM model for chatting. |
| `superduper_vllm.model.VllmCompletion` | vLLM model for generating completions. |
## Examples
### VllmChat

```python
from superduper_vllm import VllmChat

vllm_params = dict(
    model="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    quantization="awq",
    dtype="auto",
    max_model_len=1024,
    tensor_parallel_size=1,
)
model = VllmChat(identifier="model", vllm_params=vllm_params)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
]
```
Chat with chat-format messages:

```python
model.predict(messages)
```
Chat with text-format messages:

```python
model.predict("hello")
```
### VllmCompletion

```python
from superduper_vllm import VllmCompletion

vllm_params = dict(
    model="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    quantization="awq",
    dtype="auto",
    max_model_len=1024,
    tensor_parallel_size=1,
)
model = VllmCompletion(identifier="model", vllm_params=vllm_params)
model.predict("hello")
```