superduper.ext.vllm.model

VllmAPI

VllmAPI(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
api_url: str = '',
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: str = 'singleton',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None,
prompt: str = '{input}',
prompt_func: Optional[Callable] = None,
max_batch_size: Optional[int] = 4) -> None
| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Keyword arguments used when submitting jobs to the compute backend. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| prompt | The template to use for the prompt. |
| prompt_func | The function to use for the prompt. |
| max_batch_size | The maximum batch size to use for batch generation. |
| api_url | The URL for the API. |
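
The interplay of prompt and prompt_func can be sketched as follows. This is a minimal illustration, not the library's implementation: render_prompt is a hypothetical helper, and it assumes that the template is filled with str.format using the {input} placeholder and that prompt_func, when given, takes precedence over the template.

```python
from typing import Callable, Optional


def render_prompt(
    user_input: str,
    prompt: str = "{input}",
    prompt_func: Optional[Callable[[str], str]] = None,
) -> str:
    """Hypothetical helper: turn raw input into the text sent to the model.

    Assumption: a callable, if provided, overrides the string template.
    """
    if prompt_func is not None:
        return prompt_func(user_input)
    # Fill the "{input}" placeholder of the template.
    return prompt.format(input=user_input)


# Template-based prompt.
print(render_prompt("What is vLLM?", prompt="Answer briefly: {input}"))

# Function-based prompt, for logic a plain template cannot express.
print(render_prompt("What is vLLM?", prompt_func=lambda text: text.strip().upper()))
```

With the default template '{input}', the input is passed through unchanged.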

Wrapper for sending requests to a vLLM API service.

The service is expected to use the API server format, i.e. a server started with vllm.entrypoints.api_server.
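
To make the request format concrete, here is a hedged sketch of what a call to such a server could look like using only the standard library. It assumes the demo server exposes a POST /generate endpoint that accepts a JSON body with a "prompt" field plus sampling parameters and answers with a JSON object containing a "text" list. The helper build_generate_request, the URL, and the sampling values are illustrative and are not part of VllmAPI.

```python
import json
import urllib.request


def build_generate_request(
    api_url: str, prompt: str, **sampling_params
) -> urllib.request.Request:
    """Hypothetical helper: build a POST request for a vLLM API server."""
    # Assumption: sampling parameters sit next to "prompt" in the JSON body.
    payload = {"prompt": prompt, **sampling_params}
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "http://localhost:8000/generate",  # assumed local server address
    "Hello, my name is",
    max_tokens=32,
    temperature=0.0,
)

# Sending the request needs a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     texts = json.loads(response.read())["text"]
```

In this sketch, the api_url passed to VllmAPI would correspond to the full endpoint URL used above.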

VllmModel

VllmModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: str = 'singleton',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None,
prompt: str = '{input}',
prompt_func: Optional[Callable] = None,
max_batch_size: Optional[int] = 4,
model_name: str = '',
tensor_parallel_size: int = 1,
trust_remote_code: bool = True,
vllm_kwargs: dict = None,
on_ray: bool = False,
ray_address: Optional[str] = None,
ray_config: dict = None) -> None
| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Keyword arguments used when submitting jobs to the compute backend. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| prompt | The template to use for the prompt. |
| prompt_func | The function to use for the prompt. |
| max_batch_size | The maximum batch size to use for batch generation. |
| model_name | The name of the model to use. |
| tensor_parallel_size | The tensor parallel size, i.e. the number of GPUs to shard the model across. |
| trust_remote_code | Whether to trust remote code. |
| vllm_kwargs | Additional arguments to pass to vLLM. |
| on_ray | Whether to use Ray for parallelism. |
| ray_address | The address of the Ray cluster. |
| ray_config | The configuration for Ray. |

Load a large language model with vLLM.
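
As a rough illustration of how these parameters might fit together, the sketch below assembles keyword arguments for a vLLM engine and splits prompts into batches of at most max_batch_size. Both helpers (engine_kwargs and batched) are hypothetical. The merge order (explicit fields override vllm_kwargs) and the treatment of max_batch_size=None as "one single batch" are assumptions, not documented behaviour of VllmModel.

```python
from typing import Any, Dict, Iterator, List, Optional, Sequence


def engine_kwargs(
    model_name: str,
    tensor_parallel_size: int = 1,
    trust_remote_code: bool = True,
    vllm_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: merge explicit fields with extra vLLM arguments."""
    kwargs = dict(vllm_kwargs or {})  # copy so the caller's dict is not mutated
    # Assumption: the explicit fields take precedence over vllm_kwargs.
    kwargs.update(
        model=model_name,
        tensor_parallel_size=tensor_parallel_size,
        trust_remote_code=trust_remote_code,
    )
    return kwargs


def batched(
    prompts: Sequence[str], max_batch_size: Optional[int] = 4
) -> Iterator[List[str]]:
    """Yield consecutive batches holding at most max_batch_size prompts."""
    if max_batch_size is None:
        # Assumption: no limit means everything goes into one batch.
        yield list(prompts)
        return
    for start in range(0, len(prompts), max_batch_size):
        yield list(prompts[start:start + max_batch_size])


kwargs = engine_kwargs("facebook/opt-125m", vllm_kwargs={"max_model_len": 512})
batches = list(batched([f"prompt {i}" for i in range(10)], max_batch_size=4))
print([len(batch) for batch in batches])

# With vLLM installed and a GPU available, the pieces could be used like this:
# from vllm import LLM
# llm = LLM(**kwargs)
# for batch in batches:
#     outputs = llm.generate(batch)
```

Copying vllm_kwargs before merging keeps the helper free of side effects, which matters if the same dictionary is reused for several models.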