
superduper_vllm

Superduper allows users to work with self-hosted LLMs via vLLM.

Installation

pip install superduper_vllm

API

Class                                  Description
superduper_vllm.model.VllmChat         vLLM model for chatting.
superduper_vllm.model.VllmCompletion   vLLM model for generating completions.

Examples

VllmChat

from superduper_vllm import VllmChat

# Parameters forwarded to the underlying vLLM engine
vllm_params = dict(
    model="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    quantization="awq",        # the checkpoint above is AWQ-quantized
    dtype="auto",
    max_model_len=1024,        # maximum context length in tokens
    tensor_parallel_size=1,    # number of GPUs to shard the model across
)
model = VllmChat(identifier="model", vllm_params=vllm_params)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
]

Chat using a list of chat-format messages:

model.predict(messages)

Chat using a plain-text prompt:

model.predict("hello")
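Because predict accepts the standard role/content message format, conversation history can be passed directly. A minimal multi-turn sketch using only the predict call shown above (the assistant turn is an illustrative prior reply, not model output):

# Multi-turn chat: include earlier assistant replies in the history
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! How can I help you today?"},
    {"role": "user", "content": "Summarize what vLLM is in one sentence."},
]
reply = model.predict(conversation)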

VllmCompletion

from superduper_vllm import VllmCompletion

# Same engine parameters as in the chat example above
vllm_params = dict(
    model="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    quantization="awq",
    dtype="auto",
    max_model_len=1024,
    tensor_parallel_size=1,
)
model = VllmCompletion(identifier="model", vllm_params=vllm_params)
model.predict("hello")
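If these models follow the usual Superduper component pattern, they can also be registered with a datalayer for reuse across a deployment. A hedged sketch, assuming the superduper() connection helper and db.apply from core superduper, with a hypothetical MongoDB URI:

from superduper import superduper

# Hypothetical connection string; replace with your own deployment
db = superduper("mongodb://localhost:27017/documents")

# Register the model as a component of the deployment
db.apply(model)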