
superduper_vllm

Superduper allows users to work with self-hosted LLMs via vLLM.

Installation

pip install superduper_vllm

API

Class                                  Description
superduper_vllm.model.VllmChat         vLLM model for chatting.
superduper_vllm.model.VllmCompletion   vLLM model for generating completions.

Examples

VllmChat

from superduper_vllm import VllmChat

# Parameters forwarded to the underlying vLLM engine
vllm_params = dict(
    model="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    quantization="awq",        # the checkpoint above is AWQ-quantized
    dtype="auto",
    max_model_len=1024,        # maximum context length in tokens
    tensor_parallel_size=1,    # number of GPUs to shard the model across
)
model = VllmChat(identifier="model", vllm_params=vllm_params)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
]

Chat using a list of chat-format messages:

model.predict(messages)

Chat using a plain-text prompt:

model.predict("hello")
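Because predict accepts the standard role/content message format, conversation history can be passed directly. A minimal multi-turn sketch using only the predict call shown above (the assistant turn is an illustrative prior reply, not model output):

# Multi-turn chat: include earlier assistant replies in the history
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! How can I help you today?"},
    {"role": "user", "content": "Summarize what vLLM is in one sentence."},
]
reply = model.predict(conversation)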

VllmCompletion

from superduper_vllm import VllmCompletion

# Same engine parameters as in the chat example above
vllm_params = dict(
    model="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    quantization="awq",
    dtype="auto",
    max_model_len=1024,
    tensor_parallel_size=1,
)
model = VllmCompletion(identifier="model", vllm_params=vllm_params)
model.predict("hello")
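If these models follow the usual Superduper component pattern, they can also be registered with a datalayer for reuse across a deployment. A hedged sketch, assuming the superduper() connection helper and db.apply from core superduper, with a hypothetical MongoDB URI:

from superduper import superduper

# Hypothetical connection string; replace with your own deployment
db = superduper("mongodb://localhost:27017/documents")

# Register the model as a component of the deployment
db.apply(model)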