model
superduper.components.model
init_decorator
init_decorator(func)
| Parameter | Description | 
|---|---|
| func | Init function. |
Decorator to set _is_setup to True after the init method is called.
method_wrapper
method_wrapper(method,
     item,
     signature: 'str')
| Parameter | Description | 
|---|---|
| method | Method to execute. | 
| item | Item to wrap. | 
| signature | Signature of the method. | 
Wrap the item with the model.
serve
serve(f)
| Parameter | Description | 
|---|---|
| f | Method to serve. | 
Decorator to serve the model on the associated cluster.
APIBaseModel
APIBaseModel(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     compute_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     model: 't.Optional[str]' = None,
     max_batch_size: 'int' = 8,
     postprocess: 't.Optional[t.Callable]' = None) -> None
| Parameter | Description | 
|---|---|
| identifier | Identifier of the instance. | 
| upstream | A list of upstream components. | 
| db | Datalayer instance. |
| datatype | DataType instance. | 
| model_update_kwargs | The kwargs to use for model update. | 
| predict_kwargs | Additional arguments to use at prediction time. | 
| compute_kwargs | Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...). | 
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. | 
| num_workers | Number of workers to use for parallel prediction. | 
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| model | The model to use, e.g. 'text-embedding-ada-002'. |
| max_batch_size | Maximum batch size. | 
| postprocess | Postprocess function to apply to the output of the API request. |
Base component for models which make requests to an external API.
Model
Model(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     compute_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None) -> None
| Parameter | Description | 
|---|---|
| identifier | Identifier of the instance. | 
| upstream | A list of upstream components. | 
| db | Datalayer instance. |
| datatype | DataType instance. | 
| model_update_kwargs | The kwargs to use for model update. | 
| predict_kwargs | Additional arguments to use at prediction time. | 
| compute_kwargs | Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...). | 
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. | 
| num_workers | Number of workers to use for parallel prediction. | 
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
Base class for components which can predict.
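A minimal sketch of how a concrete subclass might look, assuming (as in the standard usage pattern) that the subclass supplies its prediction logic by overriding predict; the Doubler class and its body are illustrative, not part of the library:

```python
from superduper.components.model import Model


class Doubler(Model):
    """Toy subclass which doubles a numeric input (illustrative only)."""

    def predict(self, x):
        # Single-item prediction logic; batching, serving and job
        # submission are handled by the surrounding Model machinery.
        return 2 * x


m = Doubler('doubler')
m.predict(3)  # expected: 6
```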
ObjectModel
ObjectModel(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     compute_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     object: 't.Callable',
     method: 't.Optional[str]' = None) -> None
| Parameter | Description | 
|---|---|
| identifier | Identifier of the instance. | 
| upstream | A list of upstream components. | 
| db | Datalayer instance. |
| datatype | DataType instance. | 
| model_update_kwargs | The kwargs to use for model update. | 
| predict_kwargs | Additional arguments to use at prediction time. | 
| compute_kwargs | Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...). | 
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. | 
| num_workers | Number of workers to use for parallel processing. |
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| object | Model or computation object. |
| method | Method to call on the object. |
Model component which wraps a Python object or callable to make it serializable.
Example:

```python
m = ObjectModel('test', lambda x: x + 2)
m.predict(2)
# 4
```
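If the wrapped object is not directly callable, the method field names the attribute to invoke instead. A hedged sketch, where Scaler is a made-up class used only for illustration:

```python
from superduper.components.model import ObjectModel


class Scaler:
    """Plain Python object whose work happens in a named method."""

    def scale(self, x):
        return x * 10


# `method` selects which attribute of `object` is called at prediction time.
m = ObjectModel('scaler', object=Scaler(), method='scale')
m.predict(3)  # expected: 30
```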
QueryModel
QueryModel(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     compute_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     preprocess: 't.Optional[t.Callable]' = None,
     postprocess: 't.Optional[t.Callable]' = None,
     select: 'Query',
     signature: 'Signature' = '**kwargs') -> None
| Parameter | Description | 
|---|---|
| identifier | Identifier of the instance. | 
| upstream | A list of upstream components. | 
| db | Datalayer instance. |
| datatype | DataType instance. | 
| model_update_kwargs | The kwargs to use for model update. | 
| predict_kwargs | Additional arguments to use at prediction time. | 
| compute_kwargs | Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...). | 
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. | 
| num_workers | Number of workers to use for parallel prediction. | 
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| preprocess | Preprocess callable. |
| postprocess | Postprocess callable. |
| select | Query used to find data (can include like). |
| signature | Signature to use. |
QueryModel component.
A model which can be used to query data and return the results of those precomputed queries.
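A hedged sketch of a QueryModel driven by a select query; the 'documents' table, the query-builder call and the way the Datalayer is attached are assumptions for illustration:

```python
from superduper.components.model import QueryModel

# Assumes a connected Datalayer `db` with a 'documents' table.
qm = QueryModel(
    'fetch_documents',
    select=db['documents'].select(),            # query executed at predict time
    postprocess=lambda results: list(results),  # optional shaping of the output
)

qm.db = db    # attach the Datalayer (normally done when the component is applied)
qm.predict()  # returns the post-processed query results
```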
Trainer
Trainer(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     key: 'st.JSON',
     select: 'st.BaseType',
     transform: 't.Optional[t.Callable]' = None,
     metric_values: 't.Dict' = <factory>,
     in_memory: 'bool' = True,
     compute_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None) -> None
| Parameter | Description | 
|---|---|
| identifier | Identifier of the instance. | 
| upstream | A list of upstream components. | 
| db | Datalayer instance. |
| key | Model input type key. | 
| select | Model select query for training. | 
| transform | (optional) transform callable. | 
| metric_values | Dictionary for metric defaults. | 
| in_memory | If training in memory. | 
| compute_kwargs | Kwargs for compute backend. | 
| validation | Validation object to measure training performance. |
Trainer component to train a model.
A training configuration object containing all settings necessary for a particular
learning task or use-case to be serialized and initiated. The object is callable
and returns a class which may be invoked to apply training.
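Trainer is an abstract configuration object; concrete training loops come from framework plugins (for example the torch or sklearn integrations). A hedged sketch of the subclassing and attachment pattern, assuming fit is the hook a subclass implements; MyTrainer and the 'documents' table are illustrative:

```python
from superduper.components.model import Trainer


class MyTrainer(Trainer):
    """Hypothetical Trainer subclass; real loops live in framework plugins."""

    def fit(self, model, db, train_dataset, valid_dataset):
        # Framework-specific training loop, omitted in this sketch.
        ...


trainer = MyTrainer(
    'my_trainer',
    key='x',                          # model input type key
    select=db['documents'].select(),  # training data query (assumed table)
)

model.trainer = trainer  # db.apply(model) would then trigger training
```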
Validation
Validation(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     metrics: 't.List[Metric]' = <factory>,
     key: 'st.JSON',
     datasets: 't.List[Dataset]' = <factory>) -> None
| Parameter | Description | 
|---|---|
| identifier | Identifier of the instance. | 
| upstream | A list of upstream components. | 
| db | Datalayer instance. |
| metrics | List of metrics for validation. |
| key | Model input type key. |
| datasets | Sequence of datasets. |
Component which represents a validation definition.
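A hedged sketch of declaring a Validation and attaching it to a model; the metric callable, the key tuple convention and the 'documents' table are illustrative assumptions:

```python
from superduper.components.dataset import Dataset
from superduper.components.metric import Metric
from superduper.components.model import Validation


def accuracy_fn(preds, targets):
    # Fraction of exact matches between predictions and targets.
    return sum(p == t for p, t in zip(preds, targets)) / len(preds)


validation = Validation(
    'my_validation',
    key=('x', 'y'),   # (input key, target key) -- convention assumed here
    datasets=[Dataset('valid_set', select=db['documents'].select())],
    metrics=[Metric('accuracy', object=accuracy_fn)],
)

model.validation = validation  # evaluated when the model is trained/applied
```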
APIModel
APIModel(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     compute_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     model: 't.Optional[str]' = None,
     max_batch_size: 'int' = 8,
     postprocess: 't.Optional[t.Callable]' = None,
     url: 'str') -> None
| Parameter | Description | 
|---|---|
| identifier | Identifier of the instance. | 
| upstream | A list of upstream components. | 
| db | Datalayer instance. |
| datatype | DataType instance. | 
| model_update_kwargs | The kwargs to use for model update. | 
| predict_kwargs | Additional arguments to use at prediction time. | 
| compute_kwargs | Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...). | 
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. | 
| num_workers | Number of workers to use for parallel prediction. | 
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| model | The model to use, e.g. 'text-embedding-ada-002'. |
| max_batch_size | Maximum batch size. | 
| postprocess | Postprocess function to apply to the output of the API request. |
| url | The URL to use for the API request. |
APIModel component which is used to make requests to an external API.
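A hedged sketch of an APIModel pointing at an external endpoint; the URL is a placeholder, and how request parameters are templated into it depends on the implementation:

```python
from superduper.components.model import APIModel

m = APIModel(
    'my_api_model',
    model='text-embedding-ada-002',          # name of the remote model
    url='https://api.example.com/v1/embed',  # placeholder endpoint
    max_batch_size=8,                        # cap on items per request
    postprocess=lambda response: response,   # shape the raw API output
)
```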
SequentialModel
SequentialModel(self,
     identifier: str,
     upstream: Optional[List[ForwardRef('Component')]] = None,
     db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
     *,
     datatype: 'str | None' = None,
     model_update_kwargs: 't.Dict' = <factory>,
     predict_kwargs: 't.Dict' = <factory>,
     compute_kwargs: 't.Dict' = <factory>,
     validation: 't.Optional[Validation]' = None,
     metric_values: 't.Dict' = <factory>,
     num_workers: 'int' = 0,
     serve: 'bool' = False,
     trainer: 't.Optional[Trainer]' = None,
     models: 't.List[Model]') -> None
| Parameter | Description | 
|---|---|
| identifier | Identifier of the instance. | 
| upstream | A list of upstream components. | 
| db | Datalayer instance. |
| datatype | DataType instance. | 
| model_update_kwargs | The kwargs to use for model update. | 
| predict_kwargs | Additional arguments to use at prediction time. | 
| compute_kwargs | Kwargs used for compute backend job submit. Example (Ray backend): compute_kwargs = dict(resources=...). | 
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. | 
| num_workers | Number of workers to use for parallel prediction. | 
| serve | Creates an HTTP endpoint and serves the model with compute_kwargs on a distributed cluster. |
| trainer | Trainer instance to use for training. |
| models | A list of models to apply in sequence. |
Sequential model component which chains a list of models and makes them serializable.
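A hedged sketch composing two ObjectModels into a SequentialModel, assuming each model's output is fed to the next; the toy lambdas are illustrative:

```python
from superduper.components.model import ObjectModel, SequentialModel

m = SequentialModel(
    'add_then_square',
    models=[
        ObjectModel('add_two', lambda x: x + 2),
        ObjectModel('square', lambda x: x * x),
    ],
)

m.predict(3)  # expected: (3 + 2) ** 2 == 25
```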