Version: 0.4

model

superduper.components.model

`init_decorator`

init_decorator(func)

Parameter	Description
func	init function.

Decorator to set _is_initialized to True after init method is called.
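The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the `LazyThing` class and its `init` method are hypothetical, and only the `_is_initialized` flag name comes from the description.

```python
import functools


def init_decorator(func):
    """Wrap an init-style method so the instance is flagged once it has run."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        result = func(self, *args, **kwargs)
        # Set the flag only after the wrapped method returns without raising.
        self._is_initialized = True
        return result

    return wrapper


class LazyThing:
    """Hypothetical class used only for this illustration."""

    _is_initialized = False

    @init_decorator
    def init(self):
        self.weights = [1, 2, 3]  # stand-in for expensive setup


thing = LazyThing()
before = thing._is_initialized
thing.init()
after = thing._is_initialized
```

Setting the flag after the call, rather than before, means a failed initialization leaves the object marked as uninitialized.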

`model`

model(item: 't.Optional[t.Callable]' = None,
     identifier: 't.Optional[str]' = None,
     datatype=None,
     model_update_kwargs: 't.Optional[t.Dict]' = None,
     output_schema: 't.Optional[Schema]' = None,
     num_workers: 'int' = 0,
     example: 't.Any | None' = None,
     signature: 'Signature' = '*args,**kwargs')

Parameter	Description
item	Callable to wrap with `ObjectModel`.
identifier	Identifier for the `ObjectModel`.
datatype	Datatype for the model outputs.
model_update_kwargs	Dictionary to define update kwargs.
output_schema	Schema for the model outputs.
num_workers	Number of workers to use for parallel processing.
example	Example to auto-determine the schema/datatype.
signature	Signature for the model.

Decorator to wrap a function with ObjectModel.

When a function is wrapped with this decorator, the function comes out as an ObjectModel.
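To make the decorator pattern concrete, here is a rough sketch that uses stand-ins rather than the library itself. `ToyObjectModel` and `toy_model` are hypothetical names, and falling back to the function name as identifier is an assumption, not documented behaviour.

```python
import typing as t
from dataclasses import dataclass


@dataclass
class ToyObjectModel:
    """Stand-in for ObjectModel: an identifier plus a wrapped callable."""

    identifier: str
    object: t.Callable

    def predict(self, *args, **kwargs):
        return self.object(*args, **kwargs)


def toy_model(item: t.Optional[t.Callable] = None,
              identifier: t.Optional[str] = None):
    """Usable both as @toy_model and as @toy_model(identifier=...)."""

    def wrap(func: t.Callable) -> ToyObjectModel:
        # Assumption: use the function name when no identifier is given.
        return ToyObjectModel(identifier=identifier or func.__name__, object=func)

    # Bare usage passes the function directly; parameterized usage
    # returns the wrapper to be applied to the function next.
    return wrap(item) if item is not None else wrap


@toy_model
def add_two(x):
    return x + 2


@toy_model(identifier="doubler")
def double(x):
    return x * 2
```

After decoration, `add_two` and `double` are no longer plain functions but model objects exposing `predict`, which mirrors the "function comes out as an ObjectModel" statement above.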

`Model`

Model(self,
     identifier: str,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     uuid: None = <factory>,
     *,
     upstream: "t.Optional[t.List['Component']]" = None,
     plugins: "t.Optional[t.List['Plugin']]" = None,
     artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
     cache: 't.Optional[bool]' = True,
     status: 't.Optional[Status]' = None,
     signature: 'Signature' = '*args,**kwargs',
     datatype: 'EncoderArg' = None,
     output_schema: 't.Optional[Schema]' = None,
     model_update_kwargs: None = <factory>,
     predict_kwargs: None = <factory>,
     compute_kwargs: None = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: None = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     example: 'dc.InitVar[t.Any | None]' = None) -> None

Parameter	Description
identifier	Identifier of the leaf.
db	Datalayer instance.
uuid	UUID of the leaf.
artifacts	A dictionary of artifact paths and `DataType` objects.
upstream	A list of upstream components.
plugins	A list of plugins to be used in the component.
cache	(Optional) If set to `true`, the component will not be cached during the primary job of the component, i.e. on a distributed cluster the component will be reloaded on every component task, e.g. model prediction.
status	What part of the lifecycle the component is in.
signature	Model signature.
datatype	DataType instance.
output_schema	Output schema (mapping of encoders).
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
example	An example to auto-determine the schema/datatype.

Base class for components which can predict.

`ObjectModel`

ObjectModel(self,
     identifier: str,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     uuid: None = <factory>,
     *,
     upstream: "t.Optional[t.List['Component']]" = None,
     plugins: "t.Optional[t.List['Plugin']]" = None,
     artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
     cache: 't.Optional[bool]' = True,
     status: 't.Optional[Status]' = None,
     signature: 'Signature' = '*args,**kwargs',
     datatype: 'EncoderArg' = None,
     output_schema: 't.Optional[Schema]' = None,
     model_update_kwargs: None = <factory>,
     predict_kwargs: None = <factory>,
     compute_kwargs: None = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: None = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     example: 'dc.InitVar[t.Any | None]' = None,
     object: 't.Callable',
     method: 't.Optional[str]' = None) -> None

Parameter	Description
identifier	Identifier of the leaf.
db	Datalayer instance.
uuid	UUID of the leaf.
artifacts	A dictionary of artifact paths and `DataType` objects.
upstream	A list of upstream components.
plugins	A list of plugins to be used in the component.
cache	(Optional) If set to `true`, the component will not be cached during the primary job of the component, i.e. on a distributed cluster the component will be reloaded on every component task, e.g. model prediction.
status	What part of the lifecycle the component is in.
signature	Model signature.
datatype	DataType instance.
output_schema	Output schema (mapping of encoders).
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel processing.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
example	An example to auto-determine the schema/datatype.
object	Model/computation object.
method	Method to call on the object.

Model component which wraps a Model to become serializable.

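# Example:
# -------
from superduper.components.model import ObjectModel

# `object` is keyword-only in the signature above, so pass it by name.
m = ObjectModel('test', object=lambda x: x + 2)
m.predict(2)
# 4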
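The `method` parameter is described only as the "method to call on the object". One plausible reading, sketched below without the library, is that the wrapped object is called directly when `method` is `None` and the named attribute is called otherwise. The helper `resolve_callable` and the `Scaler` class are hypothetical.

```python
import typing as t


def resolve_callable(obj: t.Any, method: t.Optional[str] = None) -> t.Callable:
    """Return obj itself when method is None, else the named bound method."""
    return obj if method is None else getattr(obj, method)


class Scaler:
    """Hypothetical computation object with a named method."""

    def transform(self, x):
        return x * 10


# A plain callable is used as-is.
direct = resolve_callable(lambda x: x + 2)
# An object without __call__ is used through one of its methods.
named = resolve_callable(Scaler(), method="transform")
```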

`QueryModel`

QueryModel(self,
     identifier: str,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     uuid: None = <factory>,
     *,
     upstream: "t.Optional[t.List['Component']]" = None,
     plugins: "t.Optional[t.List['Plugin']]" = None,
     artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
     cache: 't.Optional[bool]' = True,
     status: 't.Optional[Status]' = None,
     signature: 'Signature' = '**kwargs',
     datatype: 'EncoderArg' = None,
     output_schema: 't.Optional[Schema]' = None,
     model_update_kwargs: None = <factory>,
     predict_kwargs: None = <factory>,
     compute_kwargs: None = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: None = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     example: 'dc.InitVar[t.Any | None]' = None,
     preprocess: 't.Optional[t.Callable]' = None,
     postprocess: 't.Optional[t.Union[t.Callable]]' = None,
     select: 'Query') -> None

Parameter	Description
identifier	Identifier of the leaf.
db	Datalayer instance.
uuid	UUID of the leaf.
artifacts	A dictionary of artifact paths and `DataType` objects.
upstream	A list of upstream components.
plugins	A list of plugins to be used in the component.
cache	(Optional) If set to `true`, the component will not be cached during the primary job of the component, i.e. on a distributed cluster the component will be reloaded on every component task, e.g. model prediction.
status	What part of the lifecycle the component is in.
signature	Model signature.
datatype	DataType instance.
output_schema	Output schema (mapping of encoders).
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
example	An example to auto-determine the schema/datatype.
preprocess	Preprocess callable.
postprocess	Postprocess callable.
select	Query used to find data (can include `like`).

QueryModel component.

Model which can be used to query data and return those precomputed queries as Results.

`Validation`

Validation(self,
     identifier: str,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     uuid: None = <factory>,
     *,
     upstream: "t.Optional[t.List['Component']]" = None,
     plugins: "t.Optional[t.List['Plugin']]" = None,
     artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
     cache: 't.Optional[bool]' = True,
     status: 't.Optional[Status]' = None,
     metrics: 't.Sequence[Metric]' = (),
     key: 'ModelInputType',
     datasets: 't.Sequence[Dataset]' = ()) -> None

Parameter	Description
identifier	Identifier of the leaf.
db	Datalayer instance.
uuid	UUID of the leaf.
artifacts	A dictionary of artifact paths and `DataType` objects.
upstream	A list of upstream components.
plugins	A list of plugins to be used in the component.
cache	(Optional) If set to `true`, the component will not be cached during the primary job of the component, i.e. on a distributed cluster the component will be reloaded on every component task, e.g. model prediction.
status	What part of the lifecycle the component is in.
metrics	List of metrics for validation.
key	Model input type key.
datasets	Sequence of datasets.

Component which represents a validation definition.

`Mapping`

Mapping(self,
     mapping: 'ModelInputType',
     signature: 'Signature')

Parameter	Description
mapping	Mapping that represents a collection or table map.
signature	Signature for the model.

Class to represent model inputs for mapping database collections or tables.
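As a rough illustration of mapping a database row onto model inputs, the sketch below converts one row into positional and keyword arguments. The three accepted forms of `mapping` (a field name, a list of field names, or a list/dict pair) are an assumption about `ModelInputType`, and `build_inputs` is a hypothetical helper, not part of the library.

```python
import typing as t

# Assumed shape of ModelInputType for this sketch only.
ModelInputType = t.Union[str, t.List[str], t.Tuple[t.List[str], t.Dict[str, str]]]


def build_inputs(mapping: ModelInputType, row: t.Dict[str, t.Any]):
    """Turn one row into (args, kwargs) for a model call."""
    if isinstance(mapping, str):
        # Single field: one positional argument.
        return (row[mapping],), {}
    if isinstance(mapping, list):
        # List of fields: positional arguments in the listed order.
        return tuple(row[k] for k in mapping), {}
    # (list, dict) pair: positional fields plus keyword -> field lookups.
    positional, keyword = mapping
    args = tuple(row[k] for k in positional)
    kwargs = {name: row[field] for name, field in keyword.items()}
    return args, kwargs


row = {"txt": "hello", "lang": "en", "n": 3}
single = build_inputs("txt", row)
listed = build_inputs(["txt", "n"], row)
mixed = build_inputs((["txt"], {"language": "lang"}), row)
```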

`APIBaseModel`

APIBaseModel(self,
     identifier: str,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     uuid: None = <factory>,
     *,
     upstream: "t.Optional[t.List['Component']]" = None,
     plugins: "t.Optional[t.List['Plugin']]" = None,
     artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
     cache: 't.Optional[bool]' = True,
     status: 't.Optional[Status]' = None,
     signature: 'Signature' = '*args,**kwargs',
     datatype: 'EncoderArg' = None,
     output_schema: 't.Optional[Schema]' = None,
     model_update_kwargs: None = <factory>,
     predict_kwargs: None = <factory>,
     compute_kwargs: None = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: None = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     example: 'dc.InitVar[t.Any | None]' = None,
     model: 't.Optional[str]' = None,
     max_batch_size: 'int' = 8) -> None

Parameter	Description
identifier	Identifier of the leaf.
db	Datalayer instance.
uuid	UUID of the leaf.
artifacts	A dictionary of artifact paths and `DataType` objects.
upstream	A list of upstream components.
plugins	A list of plugins to be used in the component.
cache	(Optional) If set to `true`, the component will not be cached during the primary job of the component, i.e. on a distributed cluster the component will be reloaded on every component task, e.g. model prediction.
status	What part of the lifecycle the component is in.
signature	Model signature.
datatype	DataType instance.
output_schema	Output schema (mapping of encoders).
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
example	An example to auto-determine the schema/datatype.
model	The Model to use, e.g. `'text-embedding-ada-002'`
max_batch_size	Maximum batch size.

APIBaseModel component, the base for components that make API requests.
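The `max_batch_size` parameter suggests that inputs are sent to the API in chunks. A minimal sketch of that idea follows, with no network access: `call_api` is a hypothetical stand-in for whatever function performs one batched request, and the chunking logic is an assumption about how such a limit is typically applied.

```python
import typing as t


def batched(items: t.Sequence, max_batch_size: int = 8) -> t.Iterator[t.Sequence]:
    """Yield consecutive slices of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield items[start:start + max_batch_size]


def predict_batches(call_api: t.Callable, inputs: t.Sequence,
                    max_batch_size: int = 8) -> list:
    """Call the API once per batch and concatenate the outputs in order."""
    outputs = []
    for batch in batched(inputs, max_batch_size):
        outputs.extend(call_api(batch))
    return outputs


sizes = []


def fake_api(batch):
    # Record the batch size so the chunking can be inspected afterwards.
    sizes.append(len(batch))
    return [x * 2 for x in batch]


outputs = predict_batches(fake_api, list(range(20)), max_batch_size=8)
```

With 20 inputs and a limit of 8, the fake API is called three times, and the final call carries the remaining 4 items.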

`APIModel`

APIModel(self,
     identifier: str,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     uuid: None = <factory>,
     *,
     upstream: "t.Optional[t.List['Component']]" = None,
     plugins: "t.Optional[t.List['Plugin']]" = None,
     artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
     cache: 't.Optional[bool]' = True,
     status: 't.Optional[Status]' = None,
     signature: 'Signature' = '*args,**kwargs',
     datatype: 'EncoderArg' = None,
     output_schema: 't.Optional[Schema]' = None,
     model_update_kwargs: None = <factory>,
     predict_kwargs: None = <factory>,
     compute_kwargs: None = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: None = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     example: 'dc.InitVar[t.Any | None]' = None,
     model: 't.Optional[str]' = None,
     max_batch_size: 'int' = 8,
     url: 'str',
     postprocess: 't.Optional[t.Callable]' = None) -> None

Parameter	Description
identifier	Identifier of the leaf.
db	Datalayer instance.
uuid	UUID of the leaf.
artifacts	A dictionary of artifact paths and `DataType` objects.
upstream	A list of upstream components.
plugins	A list of plugins to be used in the component.
cache	(Optional) If set to `true`, the component will not be cached during the primary job of the component, i.e. on a distributed cluster the component will be reloaded on every component task, e.g. model prediction.
status	What part of the lifecycle the component is in.
signature	Model signature.
datatype	DataType instance.
output_schema	Output schema (mapping of encoders).
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
example	An example to auto-determine the schema/datatype.
model	The Model to use, e.g. `'text-embedding-ada-002'`
max_batch_size	Maximum batch size.
url	The URL to use for the API request.
postprocess	Postprocess function to use on the output of the API request.

APIModel component which makes an API request to the given `url`.

`CallableInputs`

CallableInputs(self,
     fn,
     predict_kwargs: 't.Dict' = {})

Parameter	Description
fn	Callable function.
predict_kwargs	(Optional) `predict_kwargs`, if provided at Model initialization.

Class representing the args and kwargs of the model callable.
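To illustrate how the parameters of a callable can be inspected, here is a small sketch built on the standard `inspect` module. The helper `input_params` and the `embed` function are hypothetical, and skipping keyword-only parameters and anything pinned by `predict_kwargs` is an assumption about the intended behaviour.

```python
import inspect
import typing as t


def input_params(fn: t.Callable,
                 predict_kwargs: t.Optional[t.Dict] = None) -> t.List[str]:
    """List the parameter names a caller still has to supply for fn."""
    fixed = predict_kwargs or {}
    names = []
    for name, param in inspect.signature(fn).parameters.items():
        # Skip anything already pinned by predict_kwargs, and keyword-only
        # parameters (treating those as options is an assumption).
        if name in fixed or param.kind is inspect.Parameter.KEYWORD_ONLY:
            continue
        names.append(name)
    return names


def embed(text, model_name, *, normalize=True):
    """Hypothetical model callable."""
    return [len(text)]


remaining = input_params(embed, predict_kwargs={"model_name": "small"})
all_inputs = input_params(embed)
```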

`IndexableNode`

IndexableNode(self,
     types: 't.Sequence[t.Type]') -> None

Parameter	Description
types	Sequence of types

Base indexable node for ObjectModel.

`Inputs`

Inputs(self,
     params)

Parameter	Description
params	List of parameters of the Model object

Base class to represent the model args and kwargs.

`ModelRouter`

ModelRouter(self,
     identifier: str,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     uuid: None = <factory>,
     *,
     upstream: "t.Optional[t.List['Component']]" = None,
     plugins: "t.Optional[t.List['Plugin']]" = None,
     artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
     cache: 't.Optional[bool]' = True,
     status: 't.Optional[Status]' = None,
     signature: 'Signature' = '*args,**kwargs',
     datatype: 'EncoderArg' = None,
     output_schema: 't.Optional[Schema]' = None,
     model_update_kwargs: None = <factory>,
     predict_kwargs: None = <factory>,
     compute_kwargs: None = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: None = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     example: 'dc.InitVar[t.Any | None]' = None,
     models: 't.Dict[str, Model]',
     model: 'str') -> None

Parameter	Description
identifier	Identifier of the leaf.
db	Datalayer instance.
uuid	UUID of the leaf.
artifacts	A dictionary of artifact paths and `DataType` objects.
upstream	A list of upstream components.
plugins	A list of plugins to be used in the component.
cache	(Optional) If set to `true`, the component will not be cached during the primary job of the component, i.e. on a distributed cluster the component will be reloaded on every component task, e.g. model prediction.
status	What part of the lifecycle the component is in.
signature	Model signature.
datatype	DataType instance.
output_schema	Output schema (mapping of encoders).
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
example	An example to auto-determine the schema/datatype.
models	A dictionary of models to use.
model	The model to use.

ModelRouter component which routes predictions to the selected model.
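The routing idea can be sketched without the library: keep a dictionary of named models and delegate each call to the one selected by `model`. `ToyRouter` is a hypothetical stand-in, and plain callables play the role of `Model` instances.

```python
import typing as t


class ToyRouter:
    """Stand-in router: delegates predict() to the currently selected model."""

    def __init__(self, models: t.Dict[str, t.Callable], model: str):
        if model not in models:
            raise KeyError(f"unknown model {model!r}")
        self.models = models
        self.model = model

    def predict(self, *args, **kwargs):
        # Look the model up at call time so the selection can change later.
        return self.models[self.model](*args, **kwargs)


router = ToyRouter(models={"upper": str.upper, "lower": str.lower}, model="upper")
first = router.predict("Hi")

router.model = "lower"
second = router.predict("Hi")
```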

`RAGModel`

RAGModel(self,
     identifier: str,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     uuid: None = <factory>,
     *,
     upstream: "t.Optional[t.List['Component']]" = None,
     plugins: "t.Optional[t.List['Plugin']]" = None,
     artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
     cache: 't.Optional[bool]' = True,
     status: 't.Optional[Status]' = None,
     signature: 'str' = 'singleton',
     datatype: 'EncoderArg' = None,
     output_schema: 't.Optional[Schema]' = None,
     model_update_kwargs: None = <factory>,
     predict_kwargs: None = <factory>,
     compute_kwargs: None = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: None = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     example: 'dc.InitVar[t.Any | None]' = None,
     prompt_template: 'str',
     select: 'Query',
     key: 'str',
     llm: 'Model') -> None

Parameter	Description
identifier	Identifier of the leaf.
db	Datalayer instance.
uuid	UUID of the leaf.
artifacts	A dictionary of artifact paths and `DataType` objects.
upstream	A list of upstream components.
plugins	A list of plugins to be used in the component.
cache	(Optional) If set to `true`, the component will not be cached during the primary job of the component, i.e. on a distributed cluster the component will be reloaded on every component task, e.g. model prediction.
status	What part of the lifecycle the component is in.
signature	Model signature.
datatype	DataType instance.
output_schema	Output schema (mapping of encoders).
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
example	An example to auto-determine the schema/datatype.
prompt_template	Prompt template.
select	Query to retrieve data.
key	Key to use to get text out of documents.
llm	Language model to use.

Model to use for RAG.
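The four RAG-specific parameters fit together roughly as in the sketch below, which runs without a database or a real language model. The `{context}` and `{query}` placeholder names, the word-overlap `retrieve` function, and the echoing `echo_llm` are all assumptions made for illustration.

```python
import typing as t


def build_prompt(prompt_template: str, query: str,
                 docs: t.List[dict], key: str) -> str:
    """Join the text found under `key` in each document and fill the template."""
    context = "\n\n".join(doc[key] for doc in docs)
    return prompt_template.format(context=context, query=query)


def rag_predict(query: str, retrieve: t.Callable, llm: t.Callable,
                prompt_template: str, key: str):
    docs = retrieve(query)  # plays the role of executing `select`
    return llm(build_prompt(prompt_template, query, docs, key))


corpus = [{"txt": "Cats purr."}, {"txt": "Dogs bark."}]


def retrieve(query: str) -> t.List[dict]:
    # Hypothetical retrieval: keep documents sharing any word with the query.
    words = set(query.lower().rstrip("?").split())
    return [d for d in corpus if words & set(d["txt"].lower().rstrip(".").split())]


def echo_llm(prompt: str) -> str:
    # Stand-in for a language model: returns the prompt unchanged.
    return prompt


template = "Context:\n{context}\n\nQuestion: {query}"
answer = rag_predict("What do cats do?", retrieve, echo_llm, template, key="txt")
```

Because the stand-in model echoes its input, `answer` shows exactly which retrieved text reached the prompt.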

`SequentialModel`

SequentialModel(self,
     identifier: str,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     uuid: None = <factory>,
     *,
     upstream: "t.Optional[t.List['Component']]" = None,
     plugins: "t.Optional[t.List['Plugin']]" = None,
     artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
     cache: 't.Optional[bool]' = True,
     status: 't.Optional[Status]' = None,
     signature: 'Signature' = '*args,**kwargs',
     datatype: 'EncoderArg' = None,
     output_schema: 't.Optional[Schema]' = None,
     model_update_kwargs: None = <factory>,
     predict_kwargs: None = <factory>,
     compute_kwargs: None = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: None = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     example: 'dc.InitVar[t.Any | None]' = None,
     models: 't.List[Model]') -> None

Parameter	Description
identifier	Identifier of the leaf.
db	Datalayer instance.
uuid	UUID of the leaf.
artifacts	A dictionary of artifact paths and `DataType` objects.
upstream	A list of upstream components.
plugins	A list of plugins to be used in the component.
cache	(Optional) If set to `true`, the component will not be cached during the primary job of the component, i.e. on a distributed cluster the component will be reloaded on every component task, e.g. model prediction.
status	What part of the lifecycle the component is in.
signature	Model signature.
datatype	DataType instance.
output_schema	Output schema (mapping of encoders).
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
example	An example to auto-determine the schema/datatype.
models	A list of models to use.

Sequential model component which wraps a model to become serializable.
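Chaining a list of models is commonly done by feeding each output into the next model. The sketch below shows that pattern with plain callables in place of `Model` instances. `predict_sequentially` is a hypothetical helper, and the exact hand-off used by the library may differ.

```python
import typing as t


def predict_sequentially(models: t.Sequence[t.Callable], x: t.Any) -> t.Any:
    """Apply each model in order, passing the previous output forward."""
    out = x
    for m in models:
        out = m(out)
    return out


# Three toy stages: trim whitespace, upper-case, then measure the length.
stages = [str.strip, str.upper, len]
result = predict_sequentially(stages, "  hi ")
```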

`Trainer`

Trainer(self,
     identifier: str,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     uuid: None = <factory>,
     *,
     upstream: "t.Optional[t.List['Component']]" = None,
     plugins: "t.Optional[t.List['Plugin']]" = None,
     artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
     cache: 't.Optional[bool]' = True,
     status: 't.Optional[Status]' = None,
     key: 'ModelInputType',
     select: 'Query',
     transform: 't.Optional[t.Callable]' = None,
     metric_values: None = <factory>,
     signature: 'Signature' = '*args',
     data_prefetch: 'bool' = False,
     prefetch_size: 'int' = 1000,
     prefetch_factor: 'int' = 100,
     in_memory: 'bool' = True,
     compute_kwargs: None = <factory>,
     validation: 't.Optional[Validation]' = None) -> None

Parameter	Description
identifier	Identifier of the leaf.
db	Datalayer instance.
uuid	UUID of the leaf.
artifacts	A dictionary of artifact paths and `DataType` objects.
upstream	A list of upstream components.
plugins	A list of plugins to be used in the component.
cache	(Optional) If set to `true`, the component will not be cached during the primary job of the component, i.e. on a distributed cluster the component will be reloaded on every component task, e.g. model prediction.
status	What part of the lifecycle the component is in.
key	Model input type key.
select	Model select query for training.
transform	(optional) transform callable.
metric_values	Dictionary for metric defaults.
signature	Model signature.
data_prefetch	Boolean for prefetching data before forward pass.
prefetch_size	Prefetch batch size.
prefetch_factor	Prefetch factor for data prefetching.
in_memory	If training in memory.
compute_kwargs	Kwargs for compute backend.
validation	Validation object to measure training performance.

Trainer component to train a model.

Training configuration object, containing all settings necessary for a particular learning task use-case to be serialized and initiated. The object is callable and returns a class which may be invoked to apply training.
