Version: Main branch

model

superduper.components.model

`init_decorator`

init_decorator(func)

Parameter	Description
func	init function.

Decorator to set _is_setup to True after init method is called.
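As a rough illustration of the pattern described here, and not superduper's actual implementation, a decorator of this kind could look like the sketch below. The names `mark_setup_after_init` and `Example` are hypothetical and exist only for this example.

```python
import functools


def mark_setup_after_init(func):
    """Hypothetical stand-in for a decorator like `init_decorator`."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        result = func(self, *args, **kwargs)
        # Flag the instance as set up once the wrapped init has returned.
        self._is_setup = True
        return result

    return wrapper


class Example:
    """Hypothetical component with an explicit `init` step."""

    _is_setup = False

    @mark_setup_after_init
    def init(self):
        self.weights = [1, 2, 3]


example = Example()
before = example._is_setup  # flag before init is called
example.init()
after = example._is_setup   # flag after init has run
```

Setting the flag after the wrapped call returns means a failing init leaves `_is_setup` unchanged.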

`method_wrapper`

method_wrapper(method,
     item,
     signature: 'str')

Parameter	Description
method	Method to execute.
item	Item to wrap.
signature	Signature of the method.

Wrap the item with the model.

`serve`

serve(f)

Parameter	Description
f	Method to serve.

Decorator to serve the model on the associated cluster.

`APIBaseModel`

APIBaseModel(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     compute_kwargs: 't.Dict' = <factory>,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     model: 't.Optional[str]' = None,
     max_batch_size: 'int' = 8,
     postprocess: 't.Optional[t.Callable]' = None) -> None

Parameter	Description
identifier	Identifier of the instance.
upstream	A list of upstream components.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
db	Datalayer instance.
datatype	DataType instance.
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
model	The model to use, e.g. `'text-embedding-ada-002'`.
max_batch_size	Maximum batch size.
postprocess	Postprocess function to apply to the output of the API request.

APIBaseModel component, the base class for models that make predictions through API requests.
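To make `max_batch_size` and `postprocess` more concrete, here is a minimal sketch of how a batch limit and a postprocessing step could interact. It is an assumption about the general pattern, not superduper's code: `predict_in_batches` and `fake_request` are hypothetical names, and no real API is called.

```python
from typing import Callable, List, Optional, Sequence


def predict_in_batches(
    inputs: Sequence[str],
    request: Callable[[List[str]], List[str]],
    max_batch_size: int = 8,
    postprocess: Optional[Callable] = None,
) -> List:
    """Send `inputs` to `request` in chunks of at most `max_batch_size`."""
    outputs: List = []
    for start in range(0, len(inputs), max_batch_size):
        batch = list(inputs[start:start + max_batch_size])
        outputs.extend(request(batch))
    if postprocess is not None:
        # Apply the postprocess function to each individual output.
        outputs = [postprocess(output) for output in outputs]
    return outputs


batch_sizes = []


def fake_request(batch):
    # Stand-in for a real API call: records the batch size and upper-cases text.
    batch_sizes.append(len(batch))
    return [text.upper() for text in batch]


results = predict_in_batches(
    [f"item-{i}" for i in range(10)],
    fake_request,
    max_batch_size=4,
    postprocess=lambda s: s.lower(),
)
```

With ten inputs and `max_batch_size=4`, the stand-in request is called three times, with batches of 4, 4, and 2 items.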

`Model`

Model(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     compute_kwargs: 't.Dict' = <factory>,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None) -> None

Parameter	Description
identifier	Identifier of the instance.
upstream	A list of upstream components.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
db	Datalayer instance.
datatype	DataType instance.
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.

Base class for components which can predict.

`ObjectModel`

ObjectModel(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     compute_kwargs: 't.Dict' = <factory>,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     object: 't.Callable',
     method: 't.Optional[str]' = None) -> None

Parameter	Description
identifier	Identifier of the instance.
upstream	A list of upstream components.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
db	Datalayer instance.
datatype	DataType instance.
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel processing.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
object	Model or computation object.
method	Method to call on the object.

Model component which wraps a model or other callable object to become serializable.

# Example:
# -------
# `object` is keyword-only in the signature above, so pass it by name.
m = ObjectModel('test', object=lambda x: x + 2)
m.predict(2)
# 4
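The `method` parameter is only described as the "method to call on the object". The sketch below shows one plausible reading, assuming that the wrapped object is called directly when `method` is `None` and that the named attribute is called otherwise. `TinyObjectModel` and `Scaler` are hypothetical stand-ins, not superduper classes.

```python
from typing import Any, Optional


class TinyObjectModel:
    """Hypothetical, simplified stand-in; not superduper's ObjectModel."""

    def __init__(self, identifier: str, *, object: Any, method: Optional[str] = None):
        self.identifier = identifier
        self.object = object
        self.method = method

    def predict(self, *args, **kwargs):
        # Call the object directly, or the named method if one was given.
        if self.method is None:
            target = self.object
        else:
            target = getattr(self.object, self.method)
        return target(*args, **kwargs)


class Scaler:
    """Hypothetical object that exposes its computation as a method."""

    def transform(self, x):
        return x * 10


direct = TinyObjectModel('add_two', object=lambda x: x + 2).predict(2)
via_method = TinyObjectModel('scale', object=Scaler(), method='transform').predict(3)
```

Here `direct` is computed by calling the lambda itself, while `via_method` goes through `Scaler.transform`.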

`QueryModel`

QueryModel(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     compute_kwargs: 't.Dict' = <factory>,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     preprocess: 't.Optional[t.Callable]' = None,
     postprocess: 't.Optional[t.Callable]' = None,
     select: 'Query',
     signature: 'Signature' = '**kwargs') -> None

Parameter	Description
identifier	Identifier of the instance.
upstream	A list of upstream components.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
db	Datalayer instance.
datatype	DataType instance.
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
preprocess	Preprocess callable.
postprocess	Postprocess callable.
select	Query used to find data (can include `like`).
signature	Signature to use.

QueryModel component.

Model which can be used to query data and return those precomputed queries as Results.

`Trainer`

Trainer(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     compute_kwargs: 't.Dict' = <factory>,
     key: 'st.JSON',
     select: 'st.BaseType',
     transform: 't.Optional[t.Callable]' = None,
     metric_values: 't.Dict' = <factory>,
     in_memory: 'bool' = True,
     validation: 't.Optional[Validation]' = None) -> None

Parameter	Description
identifier	Identifier of the instance.
upstream	A list of upstream components.
compute_kwargs	Kwargs for compute backend.
db	Datalayer instance.
key	Model input type key.
select	Model select query for training.
transform	Optional transform callable.
metric_values	Dictionary for metric defaults.
in_memory	Whether to train in memory.
validation	Validation object to measure training performance.

Trainer component to train a model.

Training configuration object, containing all settings necessary for a particular learning task use-case to be serialized and initiated. The object is callable and returns a class which may be invoked to apply training.

`Validation`

Validation(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     compute_kwargs: Dict = <factory>,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     metrics: 't.List[Metric]' = <factory>,
     key: 'st.JSON',
     datasets: 't.List[Dataset]' = <factory>) -> None

Parameter	Description
identifier	Identifier of the instance.
upstream	A list of upstream components.
compute_kwargs	Keyword arguments to manage the compute environment.
db	Datalayer instance.
metrics	List of metrics for validation.
key	Model input type key.
datasets	Sequence of datasets.

Component which represents a validation definition.

`APIModel`

APIModel(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     compute_kwargs: 't.Dict' = <factory>,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     model: 't.Optional[str]' = None,
     max_batch_size: 'int' = 8,
     postprocess: 't.Optional[t.Callable]' = None,
     url: 'str') -> None

Parameter	Description
identifier	Identifier of the instance.
upstream	A list of upstream components.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
db	Datalayer instance.
datatype	DataType instance.
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
model	The model to use, e.g. `'text-embedding-ada-002'`.
max_batch_size	Maximum batch size.
postprocess	Postprocess function to apply to the output of the API request.
url	The URL to use for the API request.

APIModel component which makes predictions by sending API requests to the given `url`.

`SequentialModel`

SequentialModel(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     compute_kwargs: 't.Dict' = <factory>,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     models: 't.List[Model]') -> None

Parameter	Description
identifier	Identifier of the instance.
upstream	A list of upstream components.
compute_kwargs	Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...).
db	Datalayer instance.
datatype	DataType instance.
model_update_kwargs	The kwargs to use for model update.
predict_kwargs	Additional arguments to use at prediction time.
validation	The validation `Dataset` instances to use.
metric_values	The metrics to evaluate on.
num_workers	Number of workers to use for parallel prediction.
serve	Creates an HTTP endpoint and serves the model with `compute_kwargs` on a distributed cluster.
trainer	`Trainer` instance to use for training.
models	A list of models to use.

Sequential model component which wraps a list of models to become serializable.
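The name suggests that the models in `models` are applied one after another. Assuming that reading, with each model's output passed as the next model's input, the idea can be sketched in plain Python as follows. `predict_sequentially` is a hypothetical helper, not part of superduper, and plain callables stand in for `Model` instances.

```python
from typing import Any, Callable, List


def predict_sequentially(models: List[Callable[[Any], Any]], x: Any) -> Any:
    """Feed `x` through each callable in order and return the final output."""
    for model in models:
        x = model(x)
    return x


# Two stand-in "models": add 2, then multiply by 3.
chained = predict_sequentially([lambda x: x + 2, lambda x: x * 3], 1)
```

For the input `1`, the first stand-in yields `3` and the second yields `9`, so order matters: reversing the list would give `5` instead.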
