model
superduper.ext.vllm.model
VllmAPI
VllmAPI(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
api_url: str = '',
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: str = 'singleton',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None,
prompt: str = '{input}',
prompt_func: Optional[Callable] = None,
max_batch_size: Optional[int] = 4) -> None
Parameter | Description |
---|---|
identifier | Identifier of the leaf. |
db | Datalayer instance. |
uuid | UUID of the leaf. |
artifacts | A dictionary of artifact paths and DataType objects. |
signature | Model signature. |
datatype | DataType instance. |
output_schema | Output schema (mapping of encoders). |
flatten | Flatten the model outputs. |
model_update_kwargs | The kwargs to use for model update. |
predict_kwargs | Additional arguments to use at prediction time. |
compute_kwargs | Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...). |
validation | The validation Dataset instances to use. |
metric_values | The metrics to evaluate on. |
prompt | The template to use for the prompt. |
prompt_func | The function to use for the prompt. |
max_batch_size | The maximum batch size to use for batch generation. |
api_url | The URL for the API. |
Wrapper for requesting the vLLM API service. Expects the API server format served by vllm.entrypoints.api_server.
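The following is a minimal sketch of the kind of request such a wrapper might send, not the library's actual implementation. It assumes the prompt template is filled with `str.format` using the `input` key, and that the server accepts a JSON body with a `prompt` field plus sampling options. The helper name `build_generate_request`, the URL `http://localhost:8000/generate`, and the `max_tokens` option are illustrative assumptions. The request is built but never sent, so the example runs without a server.

```python
import json
import urllib.request


def build_generate_request(api_url, prompt_template, text, **sampling_kwargs):
    """Build (but do not send) a JSON POST request for a vLLM-style API server.

    Hypothetical helper: the payload layout is an assumption, not the
    documented behaviour of VllmAPI.
    """
    # Fill the template the same way the default prompt '{input}' suggests.
    prompt = prompt_template.format(input=text)
    payload = {"prompt": prompt, **sampling_kwargs}
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local endpoint; replace with the value you would pass as api_url.
req = build_generate_request(
    "http://localhost:8000/generate",
    "Answer briefly: {input}",
    "What is vLLM?",
    max_tokens=64,
)
body = json.loads(req.data)
print(req.get_method(), req.full_url)
print(body)
```

In this sketch, anything passed as extra keyword arguments plays the role that predict_kwargs plays in the table above: options forwarded at prediction time.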
VllmModel
VllmModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: str = 'singleton',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None,
prompt: str = '{input}',
prompt_func: Optional[Callable] = None,
max_batch_size: Optional[int] = 4,
model_name: str = '',
tensor_parallel_size: int = 1,
trust_remote_code: bool = True,
vllm_kwargs: dict = None,
on_ray: bool = False,
ray_address: Optional[str] = None,
ray_config: dict = None) -> None
Parameter | Description |
---|---|
identifier | Identifier of the leaf. |
db | Datalayer instance. |
uuid | UUID of the leaf. |
artifacts | A dictionary of artifact paths and DataType objects. |
signature | Model signature. |
datatype | DataType instance. |
output_schema | Output schema (mapping of encoders). |
flatten | Flatten the model outputs. |
model_update_kwargs | The kwargs to use for model update. |
predict_kwargs | Additional arguments to use at prediction time. |
compute_kwargs | Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...). |
validation | The validation Dataset instances to use. |
metric_values | The metrics to evaluate on. |
prompt | The template to use for the prompt. |
prompt_func | The function to use for the prompt. |
max_batch_size | The maximum batch size to use for batch generation. |
model_name | The name of the model to use. |
tensor_parallel_size | The number of GPUs to use for tensor parallelism. |
trust_remote_code | Whether to trust remote code. |
vllm_kwargs | Additional arguments to pass to vLLM. |
on_ray | Whether to use Ray for parallelism. |
ray_address | The address of the Ray cluster. |
ray_config | The configuration for Ray. |
Load a large language model using vLLM.
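The prompt, prompt_func, and max_batch_size parameters shared by both classes can be illustrated with a small standalone sketch. It assumes that prompt_func, when given, takes precedence over the prompt template, that the template is filled via the `input` key, and that a max_batch_size of None means a single batch. The helpers `format_prompt` and `iter_batches` are hypothetical names for illustration, not part of the library.

```python
from typing import Callable, Iterator, List, Optional


def format_prompt(
    x: str,
    prompt: str = "{input}",
    prompt_func: Optional[Callable[[str], str]] = None,
) -> str:
    """Turn one input into a prompt string (assumed precedence: function first)."""
    if prompt_func is not None:
        return prompt_func(x)
    return prompt.format(input=x)


def iter_batches(items: List[str], max_batch_size: Optional[int] = 4) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most max_batch_size items."""
    # Assumption: None means "no limit", so everything goes in one batch.
    size = max_batch_size or len(items) or 1
    for start in range(0, len(items), size):
        yield items[start:start + size]


questions = [f"question {i}" for i in range(10)]
prompts = [format_prompt(q, prompt="Q: {input}\nA:") for q in questions]
batches = list(iter_batches(prompts, max_batch_size=4))
upper = format_prompt("hello", prompt_func=str.upper)

print([len(b) for b in batches])  # [4, 4, 2]
print(upper)  # HELLO
```

With the default max_batch_size of 4, ten inputs are split into three generation calls under these assumptions; a larger value trades memory for fewer calls.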