
model

superduper.components.model


init_decorator

init_decorator(func)
| Parameter | Description |
|-----------|-------------|
| func | Init function. |

Decorator that sets _is_initialized to True after the init method is called.
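The behaviour described above can be sketched in plain Python. This is a minimal stand-in written for illustration, not the library's source: the `LazyThing` class and its `weights` attribute are hypothetical, and the only assumption taken from the text is that the flag `_is_initialized` becomes True once the wrapped init method has run.

```python
import functools


def init_decorator(func):
    """Illustrative re-implementation: flag the instance once ``func`` returns."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        result = func(self, *args, **kwargs)
        # The flag is set only after the wrapped init method finished without raising.
        self._is_initialized = True
        return result

    return wrapper


class LazyThing:
    """Hypothetical class (not part of superduper) with a deferred ``init`` step."""

    def __init__(self):
        self._is_initialized = False
        self.weights = None

    @init_decorator
    def init(self):
        self.weights = [0.1, 0.2]


thing = LazyThing()
before = thing._is_initialized  # flag state before init() runs
thing.init()
after = thing._is_initialized  # flag state after init() runs
```

Setting the flag after the call, rather than before it, means a failing init leaves the object marked as uninitialized.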

model

model(item: 't.Optional[t.Callable]' = None,
identifier: 't.Optional[str]' = None,
datatype=None,
model_update_kwargs: 't.Optional[t.Dict]' = None,
output_schema: 't.Optional[Schema]' = None,
num_workers: 'int' = 0,
example: 't.Any | None' = None,
signature: 'Signature' = '*args, **kwargs')
| Parameter | Description |
|-----------|-------------|
| item | Callable to wrap with ObjectModel. |
| identifier | Identifier for the ObjectModel. |
| datatype | Datatype for the model outputs. |
| model_update_kwargs | Dictionary to define update kwargs. |
| output_schema | Schema for the model outputs. |
| num_workers | Number of workers to use for parallel processing. |
| example | Example to auto-determine the schema/datatype. |
| signature | Signature for the model. |

Decorator to wrap a function with ObjectModel.

When a function is wrapped with this decorator, the decorated name refers to an ObjectModel rather than to the plain function.
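Because `item` defaults to None in the signature above, the decorator can plausibly be applied both bare and with arguments. The sketch below shows that general pattern in plain Python. `WrappedModel` is a hypothetical stand-in for ObjectModel, and falling back to the function name as identifier is an assumption made for the example, not documented behaviour.

```python
import typing as t


class WrappedModel:
    """Hypothetical stand-in for ObjectModel, for illustration only."""

    def __init__(self, identifier: str, object: t.Callable, **kwargs):
        self.identifier = identifier
        self.object = object
        self.options = kwargs

    def predict(self, *args, **kwargs):
        return self.object(*args, **kwargs)


def model(item: t.Optional[t.Callable] = None,
          identifier: t.Optional[str] = None,
          **kwargs):
    """Wrap a callable; usable as ``@model`` and as ``@model(identifier=...)``."""

    def wrap(fn: t.Callable) -> WrappedModel:
        # Assumption: use the function name when no identifier is supplied.
        return WrappedModel(identifier or fn.__name__, fn, **kwargs)

    if item is not None:
        # Bare usage: the decorated function arrives directly as ``item``.
        return wrap(item)
    # Parameterized usage: hand back the actual decorator.
    return wrap


@model
def add_two(x):
    return x + 2


@model(identifier='doubler', num_workers=2)
def double(x):
    return x * 2
```

After decoration, `add_two` and `double` are model objects, so they are invoked through `predict` (for example `add_two.predict(2)`) instead of being called directly.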

Model

Model(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
signature: 'Signature' = '*args, **kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
model_update_kwargs: None = <factory>,
predict_kwargs: None = <factory>,
compute_kwargs: None = <factory>,
validation: 't.Optional[Validation]' = None,
metric_values: None = <factory>,
num_workers: 'int' = 0,
serve: 'bool' = False,
trainer: 't.Optional[Trainer]' = None,
example: 'dc.InitVar[t.Any | None]' = None) -> None
| Parameter | Description |
|-----------|-------------|
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during its primary job; that is, on a distributed cluster the component is reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| num_workers | Number of workers to use for parallel prediction. |
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| example | An example to auto-determine the schema/datatype. |

Base class for components which can predict.

ObjectModel

ObjectModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
signature: 'Signature' = '*args, **kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
model_update_kwargs: None = <factory>,
predict_kwargs: None = <factory>,
compute_kwargs: None = <factory>,
validation: 't.Optional[Validation]' = None,
metric_values: None = <factory>,
num_workers: 'int' = 0,
serve: 'bool' = False,
trainer: 't.Optional[Trainer]' = None,
example: 'dc.InitVar[t.Any | None]' = None,
object: 't.Callable',
method: 't.Optional[str]' = None) -> None
| Parameter | Description |
|-----------|-------------|
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during its primary job; that is, on a distributed cluster the component is reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| num_workers | Number of workers to use for parallel processing. |
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| example | An example to auto-determine the schema/datatype. |
| object | Model/computation object. |
| method | Method to call on the object. |

Model component which wraps a Model to become serializable.

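# Example:
# -------
from superduper.components.model import ObjectModel

# `object` is keyword-only in the signature above, so it is passed by name.
m = ObjectModel('test', object=lambda x: x + 2)
m.predict(2)
# 4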
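The `method` parameter ("Method to call on the object") suggests that prediction can be delegated to a named method instead of calling the object itself. The following plain-Python sketch shows one way such dispatch might work. `MethodDispatcher` and `Scaler` are hypothetical names invented for this example, and the fallback to calling the object directly when `method` is None is an assumption.

```python
import typing as t


class MethodDispatcher:
    """Hypothetical stand-in showing ``object`` plus an optional ``method`` name."""

    def __init__(self, object: t.Any, method: t.Optional[str] = None):
        self.object = object
        self.method = method

    def predict(self, *args, **kwargs):
        # Without a method name, call the object itself;
        # otherwise look the bound method up by name and call that.
        if self.method is None:
            target = self.object
        else:
            target = getattr(self.object, self.method)
        return target(*args, **kwargs)


class Scaler:
    """Toy computation object whose work happens in ``transform``."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, x):
        return x * self.factor


plain = MethodDispatcher(lambda x: x + 2)
scaled = MethodDispatcher(Scaler(10), method='transform')
```

With this shape, objects that are not directly callable (such as fitted estimators exposing `transform` or `predict`) can still be wrapped.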

QueryModel

QueryModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
signature: 'Signature' = '**kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
model_update_kwargs: None = <factory>,
predict_kwargs: None = <factory>,
compute_kwargs: None = <factory>,
validation: 't.Optional[Validation]' = None,
metric_values: None = <factory>,
num_workers: 'int' = 0,
serve: 'bool' = False,
trainer: 't.Optional[Trainer]' = None,
example: 'dc.InitVar[t.Any | None]' = None,
preprocess: 't.Optional[t.Callable]' = None,
postprocess: 't.Optional[t.Union[t.Callable]]' = None,
select: 'Query') -> None
| Parameter | Description |
|-----------|-------------|
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during its primary job; that is, on a distributed cluster the component is reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| num_workers | Number of workers to use for parallel prediction. |
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| example | An example to auto-determine the schema/datatype. |
| preprocess | Preprocess callable. |
| postprocess | Postprocess callable. |
| select | Query used to find data (can include like). |

QueryModel component.

Model which can be used to query data and return those precomputed queries as Results.

Validation

Validation(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
metrics: 't.Sequence[Metric]' = (),
key: 'ModelInputType',
datasets: 't.Sequence[Dataset]' = ()) -> None
| Parameter | Description |
|-----------|-------------|
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during its primary job; that is, on a distributed cluster the component is reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| metrics | List of metrics for validation. |
| key | Model input type key. |
| datasets | Sequence of datasets. |

Component which represents a validation definition.

Mapping

Mapping(self,
mapping: 'ModelInputType',
signature: 'Signature')
| Parameter | Description |
|-----------|-------------|
| mapping | Mapping that represents a collection or table map. |
| signature | Signature for the model. |

Class to represent model inputs for mapping database collections or tables.

APIBaseModel

APIBaseModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
signature: 'Signature' = '*args, **kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
model_update_kwargs: None = <factory>,
predict_kwargs: None = <factory>,
compute_kwargs: None = <factory>,
validation: 't.Optional[Validation]' = None,
metric_values: None = <factory>,
num_workers: 'int' = 0,
serve: 'bool' = False,
trainer: 't.Optional[Trainer]' = None,
example: 'dc.InitVar[t.Any | None]' = None,
model: 't.Optional[str]' = None,
max_batch_size: 'int' = 8) -> None
| Parameter | Description |
|-----------|-------------|
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during its primary job; that is, on a distributed cluster the component is reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| num_workers | Number of workers to use for parallel prediction. |
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| example | An example to auto-determine the schema/datatype. |
| model | The Model to use, e.g. 'text-embedding-ada-002'. |
| max_batch_size | Maximum batch size. |

APIBaseModel component which is used to make the type of API request.

APIModel

APIModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
signature: 'Signature' = '*args, **kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
model_update_kwargs: None = <factory>,
predict_kwargs: None = <factory>,
compute_kwargs: None = <factory>,
validation: 't.Optional[Validation]' = None,
metric_values: None = <factory>,
num_workers: 'int' = 0,
serve: 'bool' = False,
trainer: 't.Optional[Trainer]' = None,
example: 'dc.InitVar[t.Any | None]' = None,
model: 't.Optional[str]' = None,
max_batch_size: 'int' = 8,
url: 'str',
postprocess: 't.Optional[t.Callable]' = None) -> None
| Parameter | Description |
|-----------|-------------|
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during its primary job; that is, on a distributed cluster the component is reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| num_workers | Number of workers to use for parallel prediction. |
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| example | An example to auto-determine the schema/datatype. |
| model | The Model to use, e.g. 'text-embedding-ada-002'. |
| max_batch_size | Maximum batch size. |
| url | The URL to use for the API request. |
| postprocess | Postprocess function to use on the output of the API request. |

APIModel component which is used to make the type of API request.
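The `max_batch_size` parameter (default 8) is described only as "Maximum batch size". One common reading, assumed here, is that inputs are grouped into chunks of at most that size before each API request. The sketch below illustrates that chunking in plain Python; `batched`, `predict_batches` and `fake_api` are hypothetical helpers, and no real HTTP call is made.

```python
import typing as t


def batched(items: t.Sequence[t.Any],
            max_batch_size: int = 8) -> t.Iterator[t.List[t.Any]]:
    """Yield consecutive chunks holding at most ``max_batch_size`` items."""
    if max_batch_size < 1:
        raise ValueError('max_batch_size must be at least 1')
    for start in range(0, len(items), max_batch_size):
        yield list(items[start:start + max_batch_size])


def predict_batches(call_api: t.Callable[[t.List[t.Any]], t.List[t.Any]],
                    inputs: t.Sequence[t.Any],
                    max_batch_size: int = 8) -> t.List[t.Any]:
    """Send each chunk to ``call_api`` and concatenate the outputs in order."""
    outputs: t.List[t.Any] = []
    for batch in batched(inputs, max_batch_size):
        outputs.extend(call_api(batch))
    return outputs


calls: t.List[int] = []  # records the size of every simulated request


def fake_api(batch):
    """Stand-in for a remote endpoint: upper-cases every input string."""
    calls.append(len(batch))
    return [text.upper() for text in batch]


results = predict_batches(fake_api, [f'doc{i}' for i in range(20)], max_batch_size=8)
```

With 20 inputs and a batch size of 8, the simulated endpoint is called three times, with 8, 8 and 4 items.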

CallableInputs

CallableInputs(self,
fn,
predict_kwargs: 't.Dict' = {})
| Parameter | Description |
|-----------|-------------|
| fn | Callable function. |
| predict_kwargs | (Optional) predict_kwargs, if provided at Model initialization. |

Class representing the model callable's args and kwargs.

IndexableNode

IndexableNode(self,
types: 't.Sequence[t.Type]') -> None
| Parameter | Description |
|-----------|-------------|
| types | Sequence of types. |

Base indexable node for ObjectModel.

Inputs

Inputs(self,
params)
| Parameter | Description |
|-----------|-------------|
| params | List of parameters of the Model object. |

Base class to represent the model args and kwargs.

ModelRouter

ModelRouter(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
signature: 'Signature' = '*args, **kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
model_update_kwargs: None = <factory>,
predict_kwargs: None = <factory>,
compute_kwargs: None = <factory>,
validation: 't.Optional[Validation]' = None,
metric_values: None = <factory>,
num_workers: 'int' = 0,
serve: 'bool' = False,
trainer: 't.Optional[Trainer]' = None,
example: 'dc.InitVar[t.Any | None]' = None,
models: 't.Dict[str, Model]',
model: 'str') -> None
| Parameter | Description |
|-----------|-------------|
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during its primary job; that is, on a distributed cluster the component is reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| num_workers | Number of workers to use for parallel prediction. |
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| example | An example to auto-determine the schema/datatype. |
| models | A dictionary of models to use. |
| model | The model to use. |

ModelRouter component which routes calls to the correct model.
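Given a `models` dictionary and a `model` key, routing can be pictured as a dictionary lookup followed by delegation. The sketch below is a plain-Python illustration under that assumption. `Router` is a hypothetical class, the models are ordinary callables rather than Model components, and validating the key at construction time is a design choice of the example.

```python
import typing as t


class Router:
    """Hypothetical stand-in: pick one callable out of ``models`` by name."""

    def __init__(self, models: t.Dict[str, t.Callable[..., t.Any]], model: str):
        if model not in models:
            # Fail early so a bad key is reported at construction, not at predict time.
            raise KeyError(f'unknown model {model!r}; choose one of {sorted(models)}')
        self.models = models
        self.model = model

    def predict(self, *args, **kwargs):
        # Delegate the call to the currently selected model.
        return self.models[self.model](*args, **kwargs)


candidates = {'upper': str.upper, 'title': str.title}
router = Router(candidates, model='title')
routed = router.predict('hello world')
```

Switching backends then only requires changing the `model` key, while callers keep using the same `predict` entry point.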

RAGModel

RAGModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
signature: 'str' = 'singleton',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
model_update_kwargs: None = <factory>,
predict_kwargs: None = <factory>,
compute_kwargs: None = <factory>,
validation: 't.Optional[Validation]' = None,
metric_values: None = <factory>,
num_workers: 'int' = 0,
serve: 'bool' = False,
trainer: 't.Optional[Trainer]' = None,
example: 'dc.InitVar[t.Any | None]' = None,
prompt_template: 'str',
select: 'Query',
key: 'str',
llm: 'Model') -> None
| Parameter | Description |
|-----------|-------------|
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during its primary job; that is, on a distributed cluster the component is reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| num_workers | Number of workers to use for parallel prediction. |
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| example | An example to auto-determine the schema/datatype. |
| prompt_template | Prompt template. |
| select | Query to retrieve data. |
| key | Key used to get text out of documents. |
| llm | Language model to use. |

Model to use for RAG.
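The four RAG-specific parameters fit the usual retrieve-then-generate flow: run `select` to fetch documents, pull text out of each with `key`, fill `prompt_template`, and pass the prompt to `llm`. The sketch below walks through that flow in plain Python. The `{context}` and `{query}` placeholder names, the blank-line separator between chunks, and the helpers `fake_retrieve` and `echo_llm` are all assumptions made for illustration, not confirmed details of the component.

```python
import typing as t


def build_prompt(prompt_template: str,
                 query: str,
                 documents: t.Sequence[t.Dict[str, t.Any]],
                 key: str) -> str:
    """Extract ``key`` from each document and fill the template."""
    chunks = [str(doc[key]) for doc in documents]
    context = '\n\n'.join(chunks)
    # Assumed placeholder names: {context} and {query}.
    return prompt_template.format(context=context, query=query)


def rag_predict(query, retrieve, llm, prompt_template, key):
    """Retrieve documents, build the prompt, then ask the language model."""
    documents = retrieve(query)
    return llm(build_prompt(prompt_template, query, documents, key))


TEMPLATE = 'Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}'
DOCS = [
    {'txt': 'Paris is the capital of France.'},
    {'txt': 'Berlin is the capital of Germany.'},
]


def fake_retrieve(query):
    """Stands in for executing the ``select`` query."""
    return DOCS


def echo_llm(prompt):
    """Stands in for the ``llm`` model; returns the prompt unchanged."""
    return prompt


answer = rag_predict('What is the capital of France?',
                     fake_retrieve, echo_llm, TEMPLATE, key='txt')
```

Because the stand-in language model echoes its input, `answer` holds the fully rendered prompt, which makes the template filling easy to inspect.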

SequentialModel

SequentialModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
signature: 'Signature' = '*args, **kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
model_update_kwargs: None = <factory>,
predict_kwargs: None = <factory>,
compute_kwargs: None = <factory>,
validation: 't.Optional[Validation]' = None,
metric_values: None = <factory>,
num_workers: 'int' = 0,
serve: 'bool' = False,
trainer: 't.Optional[Trainer]' = None,
example: 'dc.InitVar[t.Any | None]' = None,
models: 't.List[Model]') -> None
| Parameter | Description |
|-----------|-------------|
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during its primary job; that is, on a distributed cluster the component is reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| num_workers | Number of workers to use for parallel prediction. |
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| example | An example to auto-determine the schema/datatype. |
| models | A list of models to use. |

Sequential model component which wraps a model to become serializable.
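The description does not say how the entries of `models` are combined. The name suggests a pipeline in which each model's output becomes the next model's input, and the sketch below assumes exactly that. `predict_sequential` is a hypothetical helper and the models are plain callables, not Model components.

```python
import typing as t


def predict_sequential(models: t.Sequence[t.Callable[[t.Any], t.Any]],
                       x: t.Any) -> t.Any:
    """Feed ``x`` through every model in order, chaining outputs to inputs."""
    for model in models:
        x = model(x)
    return x


# A toy text pipeline: strip whitespace, lower-case, then split into tokens.
pipeline = [str.strip, str.lower, str.split]
tokens = predict_sequential(pipeline, '  Hello Sequential World  ')

# Order matters: (3 + 1) * 2 differs from (3 * 2) + 1.
add_then_double = predict_sequential([lambda v: v + 1, lambda v: v * 2], 3)
double_then_add = predict_sequential([lambda v: v * 2, lambda v: v + 1], 3)
```

An empty list returns the input unchanged, which can serve as a neutral default when a pipeline is assembled conditionally.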

Trainer

Trainer(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
key: 'ModelInputType',
select: 'Query',
transform: 't.Optional[t.Callable]' = None,
metric_values: None = <factory>,
signature: 'Signature' = '*args',
data_prefetch: 'bool' = False,
prefetch_size: 'int' = 1000,
prefetch_factor: 'int' = 100,
in_memory: 'bool' = True,
compute_kwargs: None = <factory>,
validation: 't.Optional[Validation]' = None) -> None
| Parameter | Description |
|-----------|-------------|
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during its primary job; that is, on a distributed cluster the component is reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| key | Model input type key. |
| select | Model select query for training. |
| transform | (Optional) Transform callable. |
| metric_values | Dictionary for metric defaults. |
| signature | Model signature. |
| data_prefetch | Boolean for prefetching data before forward pass. |
| prefetch_size | Prefetch batch size. |
| prefetch_factor | Prefetch factor for data prefetching. |
| in_memory | Whether to train in memory. |
| compute_kwargs | Kwargs for compute backend. |
| validation | Validation object to measure training performance. |

Trainer component to train a model.

Training configuration object, containing all settings necessary for a particular learning task use-case to be serialized and initiated. The object is callable and returns a class which may be invoked to apply training.