
model

superduper.components.model


codemodel

codemodel(item: 't.Optional[t.Callable]' = None,
identifier: 't.Optional[str]' = None,
datatype=None,
model_update_kwargs: 't.Optional[t.Dict]' = None,
flatten: 'bool' = False,
output_schema: 't.Optional[Schema]' = None)

| Parameter | Description |
| --- | --- |
| item | Callable to wrap with CodeModel. |
| identifier | Identifier for the CodeModel. |
| datatype | Datatype for the model outputs. |
| model_update_kwargs | Dictionary to define update kwargs. |
| flatten | If True, flatten the outputs and save. |
| output_schema | Schema for the model outputs. |

Decorator to wrap a function with CodeModel.

A function wrapped with this decorator is returned as a CodeModel instance.
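The optional `item` argument is what allows a decorator like this to be used both bare (`@codemodel`) and with arguments (`@codemodel(identifier=...)`). The sketch below shows that pattern with a hypothetical stand-in, `ToyCodeModel`, and does not import superduper; the fallback of `identifier` to the function name is an assumption.

```python
import dataclasses
import typing as t


@dataclasses.dataclass
class ToyCodeModel:
    """Hypothetical stand-in for CodeModel: an identifier plus the wrapped callable."""

    identifier: str
    object: t.Callable
    flatten: bool = False

    def predict(self, *args, **kwargs):
        # Delegate prediction to the wrapped function.
        return self.object(*args, **kwargs)


def toy_codemodel(
    item: t.Optional[t.Callable] = None,
    identifier: t.Optional[str] = None,
    flatten: bool = False,
):
    """Hypothetical decorator usable as @toy_codemodel or @toy_codemodel(...)."""

    def wrap(fn: t.Callable) -> ToyCodeModel:
        # Assumption: without an explicit identifier, the function name is used.
        return ToyCodeModel(identifier=identifier or fn.__name__, object=fn, flatten=flatten)

    if item is not None:
        # Bare form: Python passes the decorated function itself as `item`.
        return wrap(item)
    # Parametrized form: return the decorator that will receive the function.
    return wrap


@toy_codemodel
def add_one(x):
    return x + 1


@toy_codemodel(identifier="doubler")
def double(x):
    return 2 * x


print(add_one.identifier, add_one.predict(1))  # add_one 2
print(double.identifier, double.predict(4))    # doubler 8
```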

model

model(item: 't.Optional[t.Callable]' = None,
identifier: 't.Optional[str]' = None,
datatype=None,
model_update_kwargs: 't.Optional[t.Dict]' = None,
flatten: 'bool' = False,
output_schema: 't.Optional[Schema]' = None)

| Parameter | Description |
| --- | --- |
| item | Callable to wrap with ObjectModel. |
| identifier | Identifier for the ObjectModel. |
| datatype | Datatype for the model outputs. |
| model_update_kwargs | Dictionary to define update kwargs. |
| flatten | If True, flatten the outputs and save. |
| output_schema | Schema for the model outputs. |

Decorator to wrap a function with ObjectModel.

A function wrapped with this decorator is returned as an ObjectModel instance.
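The `flatten` parameter matters for functions that return a list per input, such as a text chunker. As an illustrative assumption (this is not superduper's storage code), flattening can be pictured as saving each list element as its own output row instead of saving the whole list as one value; the `_source` field name and both helper functions are hypothetical.

```python
import typing as t


def chunker(text: str, size: int = 5) -> t.List[str]:
    """Example model function: split a string into fixed-size chunks."""
    return [text[i : i + size] for i in range(0, len(text), size)]


def store_outputs(
    source_ids: t.List[str], outputs: t.List[t.Any], flatten: bool
) -> t.List[t.Dict[str, t.Any]]:
    """Hypothetical sketch of output rows with and without flattening."""
    rows = []
    for source_id, output in zip(source_ids, outputs):
        if flatten:
            # One row per list element, each pointing back to its source input.
            rows.extend({"_source": source_id, "output": item} for item in output)
        else:
            # One row per input, holding the whole list.
            rows.append({"_source": source_id, "output": output})
    return rows


texts = {"doc1": "hello world", "doc2": "abc"}
outputs = [chunker(text) for text in texts.values()]
print(len(store_outputs(list(texts), outputs, flatten=False)))  # 2
print(len(store_outputs(list(texts), outputs, flatten=True)))   # 4
```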

CodeModel

CodeModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: 'Signature' = '*args,**kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None,
num_workers: 'int' = 0,
object: 'Code') -> None

| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): `compute_kwargs = dict(resources=...)`. |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| num_workers | Number of workers to use for parallel processing. |
| object | Code object. |

Model component which stores a code object.
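Storing a code object, rather than a pickled function, means the model is persisted as source text and rebuilt by executing that text when it is loaded. The toy class below sketches that round trip with `exec`; `ToyCode`, its `unpack` method and the explicit function name passed to it are hypothetical and are not superduper's `Code` API. Executing stored source runs arbitrary code, so it should only be done with trusted sources.

```python
import dataclasses
import typing as t


@dataclasses.dataclass
class ToyCode:
    """Hypothetical stand-in for a code object: the source text is what gets saved."""

    code: str

    def unpack(self, name: str) -> t.Callable:
        # Rebuild the callable by executing the stored source in a fresh namespace.
        namespace: t.Dict[str, t.Any] = {}
        exec(self.code, namespace)
        return namespace[name]


SOURCE = """
def shout(text):
    return text.upper() + "!"
"""

stored = ToyCode(code=SOURCE)
# The saved form is plain text, so it can go anywhere a string can.
saved = dataclasses.asdict(stored)
restored = ToyCode(**saved).unpack("shout")
print(restored("hi"))  # HI!
```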

Model

Model(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: 'Signature' = '*args,**kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None) -> None

| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): `compute_kwargs = dict(resources=...)`. |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |

Base class for components which can predict.
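As a base class, `Model` leaves the actual prediction to its subclasses. The sketch below shows that shape with a hypothetical abstract `ToyModel` and assumes, for illustration only, that `predict_kwargs` are forwarded as extra keyword arguments on every call; `predict_batch` is a hypothetical helper name, not a documented method.

```python
import abc
import dataclasses
import typing as t


@dataclasses.dataclass
class ToyModel(abc.ABC):
    """Hypothetical base class: subclasses implement predict()."""

    identifier: str
    predict_kwargs: t.Dict[str, t.Any] = dataclasses.field(default_factory=dict)

    @abc.abstractmethod
    def predict(self, *args, **kwargs):
        """Compute an output for a single input."""

    def predict_batch(self, inputs: t.Sequence[t.Any]) -> t.List[t.Any]:
        # Assumption: predict_kwargs are passed to every single-input prediction.
        return [self.predict(x, **self.predict_kwargs) for x in inputs]


@dataclasses.dataclass
class Scale(ToyModel):
    def predict(self, x, factor=1):
        return x * factor


m = Scale("scale", predict_kwargs={"factor": 3})
print(m.predict_batch([1, 2, 3]))  # [3, 6, 9]
```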

ObjectModel

ObjectModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: 'Signature' = '*args,**kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None,
num_workers: 'int' = 0,
object: 't.Any') -> None

| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): `compute_kwargs = dict(resources=...)`. |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| num_workers | Number of workers to use for parallel processing. |
| object | Model or computation object. |

Model component which wraps a model or other computation object to make it serializable.

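# Example:
# -------
from superduper.components.model import ObjectModel

# `object` is keyword-only in the signature above, so it must be passed by name.
m = ObjectModel('test', object=lambda x: x + 2)
m.predict(2)
# 4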

QueryModel

QueryModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: 'Signature' = '**kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None,
preprocess: 't.Optional[t.Callable]' = None,
postprocess: 't.Optional[t.Union[t.Callable]]' = None,
select: 'Query') -> None

| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): `compute_kwargs = dict(resources=...)`. |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| preprocess | Preprocess callable. |
| postprocess | Postprocess callable. |
| select | Query used to find data (can include a `like` clause). |

QueryModel component.

Model which can be used to query data and return the query results as its outputs.
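A query model's pipeline can be pictured as: preprocess the inputs, run the query, postprocess the results. In the toy below, a Python function over an in-memory list stands in for `select`. The idea that keyword arguments to `predict` parametrize the query is read from the `'**kwargs'` default signature; the class itself and the way `preprocess` returns a new kwargs dict are hypothetical.

```python
import dataclasses
import typing as t

DOCS = [
    {"title": "Intro", "year": 2021},
    {"title": "Update", "year": 2023},
    {"title": "Recap", "year": 2023},
]


@dataclasses.dataclass
class ToyQueryModel:
    """Hypothetical stand-in for a query model over in-memory records."""

    identifier: str
    select: t.Callable[..., t.List[t.Dict[str, t.Any]]]
    preprocess: t.Optional[t.Callable[..., t.Dict[str, t.Any]]] = None
    postprocess: t.Optional[t.Callable[[t.List[t.Any]], t.Any]] = None

    def predict(self, **kwargs):
        # 1. Optionally transform the keyword arguments before querying.
        if self.preprocess is not None:
            kwargs = self.preprocess(**kwargs)
        # 2. Run the parametrized query.
        results = self.select(**kwargs)
        # 3. Optionally reshape the raw results.
        if self.postprocess is not None:
            results = self.postprocess(results)
        return results


by_year = ToyQueryModel(
    identifier="by-year",
    select=lambda year: [d for d in DOCS if d["year"] == year],
    preprocess=lambda year: {"year": int(year)},
    postprocess=lambda rows: [r["title"] for r in rows],
)
print(by_year.predict(year="2023"))  # ['Update', 'Recap']
```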

Validation

Validation(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
metrics: 't.Sequence[Metric]' = (),
key: 't.Optional[ModelInputType]' = None,
datasets: 't.Sequence[Dataset]' = ()) -> None

| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| metrics | List of metrics for validation. |
| key | Key specifying the model inputs (a `ModelInputType`). |
| datasets | Sequence of datasets. |

Component which represents a validation definition.
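A validation definition pairs a set of metrics with a set of datasets. A minimal sketch of what evaluating one involves, assuming each dataset is a pair of inputs and targets and each metric is a function of predictions and targets; the `dataset/metric` result keys and all function names are hypothetical.

```python
import typing as t

Metric = t.Callable[[t.Sequence[t.Any], t.Sequence[t.Any]], float]


def accuracy(predictions, targets) -> float:
    """Fraction of predictions equal to their targets."""
    correct = sum(p == y for p, y in zip(predictions, targets))
    return correct / len(targets)


def run_validation(
    predict: t.Callable[[t.Any], t.Any],
    datasets: t.Dict[str, t.Tuple[t.Sequence[t.Any], t.Sequence[t.Any]]],
    metrics: t.Dict[str, Metric],
) -> t.Dict[str, float]:
    """Hypothetical sketch: score one model on every (dataset, metric) pair."""
    results = {}
    for ds_name, (inputs, targets) in datasets.items():
        predictions = [predict(x) for x in inputs]
        for metric_name, metric in metrics.items():
            results[f"{ds_name}/{metric_name}"] = metric(predictions, targets)
    return results


def is_even(n: int) -> bool:
    return n % 2 == 0


scores = run_validation(
    is_even,
    datasets={"small": ([1, 2, 3, 4], [False, True, False, False])},
    metrics={"accuracy": accuracy},
)
print(scores)  # {'small/accuracy': 0.75}
```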

Mapping

Mapping(self,
mapping: 'ModelInputType',
signature: 'Signature')

| Parameter | Description |
| --- | --- |
| mapping | Mapping that represents a collection or table map. |
| signature | Signature for the model. |

Class to represent model inputs for mapping database collections or tables.
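Roughly, a mapping turns one row from a table or collection into the positional and keyword arguments the model expects. The sketch below assumes three key shapes (a single field name, a list of field names, and a dict from field name to parameter name); these shapes and the name `toy_mapping` are illustrative assumptions, not the exact `ModelInputType` rules.

```python
import typing as t

Row = t.Dict[str, t.Any]
Key = t.Union[str, t.Sequence[str], t.Dict[str, str]]


def toy_mapping(key: Key, row: Row) -> t.Tuple[tuple, t.Dict[str, t.Any]]:
    """Hypothetical: map one row to (args, kwargs) for a model call."""
    if isinstance(key, str):
        # Single field -> one positional argument.
        return (row[key],), {}
    if isinstance(key, dict):
        # Assumption: field name -> parameter name, giving keyword arguments.
        return (), {param: row[field] for field, param in key.items()}
    # Sequence of fields -> positional arguments in that order.
    return tuple(row[k] for k in key), {}


row = {"title": "Hello", "body": "World", "id": 7}
print(toy_mapping(["title", "body"], row))  # (('Hello', 'World'), {})
print(toy_mapping({"body": "text"}, row))   # ((), {'text': 'World'})
```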

APIBaseModel

APIBaseModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: 'Signature' = '*args,**kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None,
model: 't.Optional[str]' = None,
max_batch_size: 'int' = 8) -> None

| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): `compute_kwargs = dict(resources=...)`. |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| model | The model to use, e.g. 'text-embedding-ada-002'. |
| max_batch_size | Maximum batch size. |

Base component for models which make API requests.
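`max_batch_size` bounds how many inputs are sent in a single API request. A sketch of splitting a list of inputs into such batches; `send` is a hypothetical callable standing in for the real HTTP client, so the example runs without network access.

```python
import typing as t

T = t.TypeVar("T")


def batches(items: t.Sequence[T], max_batch_size: int) -> t.Iterator[t.Sequence[T]]:
    """Yield consecutive slices of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield items[start : start + max_batch_size]


def predict_in_batches(
    items: t.Sequence[T],
    send: t.Callable[[t.Sequence[T]], t.List[t.Any]],
    max_batch_size: int = 8,
) -> t.List[t.Any]:
    """Hypothetical: one `send` call (one API request) per batch, results concatenated."""
    outputs: t.List[t.Any] = []
    for batch in batches(items, max_batch_size):
        outputs.extend(send(batch))
    return outputs


calls = []


def fake_send(batch):
    # Stand-in for an API endpoint: record the request size, return text lengths.
    calls.append(len(batch))
    return [len(text) for text in batch]


texts = [f"text {i}" for i in range(20)]
print(len(predict_in_batches(texts, fake_send)))  # 20
print(calls)  # [8, 8, 4]
```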

APIModel

APIModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: 'Signature' = '*args,**kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None,
model: 't.Optional[str]' = None,
max_batch_size: 'int' = 8,
url: 'str',
postprocess: 't.Optional[t.Callable]' = None) -> None

| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): `compute_kwargs = dict(resources=...)`. |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| model | The model to use, e.g. 'text-embedding-ada-002'. |
| max_batch_size | Maximum batch size. |
| url | The URL to use for the API request. |
| postprocess | Postprocess function to apply to the output of the API request. |

Model component which sends a request to the API at `url` and optionally postprocesses the response.
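The flow here is: build the request for `url`, fetch it, then apply `postprocess` to the response. The sketch assumes, hypothetically, that the URL may contain `str.format` placeholders filled from the prediction's keyword arguments. It also injects a fake `fetch` function in place of a real HTTP call so that it runs offline; `ToyAPIModel` and `fake_fetch` are illustrative names only.

```python
import dataclasses
import json
import typing as t


@dataclasses.dataclass
class ToyAPIModel:
    """Hypothetical stand-in: format the URL, fetch it, then postprocess the body."""

    identifier: str
    url: str
    fetch: t.Callable[[str], str]
    postprocess: t.Optional[t.Callable[[t.Any], t.Any]] = None

    def predict(self, **kwargs):
        # Assumption: placeholders in the URL are filled from the call's kwargs.
        request_url = self.url.format(**kwargs)
        payload = json.loads(self.fetch(request_url))
        if self.postprocess is not None:
            payload = self.postprocess(payload)
        return payload


def fake_fetch(url: str) -> str:
    # Offline stand-in for an HTTP GET that echoes the requested URL as JSON.
    return json.dumps({"requested": url, "status": "ok"})


weather = ToyAPIModel(
    identifier="weather",
    url="https://api.example.com/weather?city={city}",
    fetch=fake_fetch,
    postprocess=lambda body: body["requested"],
)
print(weather.predict(city="Berlin"))
# https://api.example.com/weather?city=Berlin
```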

CallableInputs

CallableInputs(self,
fn,
predict_kwargs: 't.Dict' = {})

| Parameter | Description |
| --- | --- |
| fn | Callable function. |
| predict_kwargs | (Optional) predict_kwargs, if provided when the Model is initialized. |

Class which represents the args and kwargs of a model callable.
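A callable's inputs can be read from its parameter list with `inspect.signature`. The sketch below assumes that parameters already fixed through `predict_kwargs` are not treated as model inputs; that exclusion rule and the name `callable_input_names` are illustrative assumptions.

```python
import inspect
import typing as t


def callable_input_names(
    fn: t.Callable, predict_kwargs: t.Optional[t.Dict[str, t.Any]] = None
) -> t.List[str]:
    """Hypothetical: names of fn's parameters that must come from the data."""
    fixed = predict_kwargs or {}
    return [name for name in inspect.signature(fn).parameters if name not in fixed]


def generate(prompt, context, temperature=0.7):
    return f"{prompt} | {context} | {temperature}"


print(callable_input_names(generate))                        # ['prompt', 'context', 'temperature']
print(callable_input_names(generate, {"temperature": 0.2}))  # ['prompt', 'context']
```

Using `None` as the sketch's default, instead of `{}`, avoids sharing one mutable dict between calls.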

IndexableNode

IndexableNode(self,
types: 't.Sequence[t.Type]') -> None

| Parameter | Description |
| --- | --- |
| types | Sequence of types. |

Base indexable node for ObjectModel.

Inputs

Inputs(self,
params)

| Parameter | Description |
| --- | --- |
| params | List of parameters of the Model object. |

Base class to represent the model args and kwargs.

SequentialModel

SequentialModel(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
signature: 'Signature' = '*args,**kwargs',
datatype: 'EncoderArg' = None,
output_schema: 't.Optional[Schema]' = None,
flatten: 'bool' = False,
model_update_kwargs: 't.Dict' = None,
predict_kwargs: 't.Dict' = None,
compute_kwargs: 't.Dict' = None,
validation: 't.Optional[Validation]' = None,
metric_values: 't.Dict' = None,
models: 't.List[Model]') -> None

| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute backend job submission. Example (Ray backend): `compute_kwargs = dict(resources=...)`. |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| models | A list of models to use. |

Sequential model component which applies a list of models in order, feeding each model's output to the next.
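Chaining models means the output of one model becomes the input of the next. A minimal sketch of that idea, with plain functions standing in for the entries of `models`; the class name `ToySequential` and the single-argument hand-off between stages are assumptions.

```python
import dataclasses
import typing as t


@dataclasses.dataclass
class ToySequential:
    """Hypothetical: apply a list of single-input models one after another."""

    identifier: str
    models: t.List[t.Callable[[t.Any], t.Any]]

    def predict(self, x):
        for model in self.models:
            # Each stage receives the previous stage's output.
            x = model(x)
        return x


pipeline = ToySequential("clean-and-count", models=[str.strip, str.lower, str.split, len])
print(pipeline.predict("  The Quick Brown Fox  "))  # 4
```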

Trainer

Trainer(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
key: 'ModelInputType',
select: 'Query',
transform: 't.Optional[t.Callable]' = None,
metric_values: 't.Dict' = None,
signature: 'Signature' = '*args',
data_prefetch: 'bool' = False,
prefetch_size: 'int' = 1000,
prefetch_factor: 'int' = 100,
in_memory: 'bool' = True,
compute_kwargs: 't.Dict' = None) -> None

| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| key | Key specifying the model inputs (a `ModelInputType`). |
| select | Select query which provides the training data. |
| transform | (Optional) transform callable. |
| metric_values | Dictionary for metric defaults. |
| signature | Model signature. |
| data_prefetch | Whether to prefetch data before the forward pass. |
| prefetch_size | Prefetch batch size. |
| prefetch_factor | Prefetch factor for data prefetching. |
| in_memory | Whether to train in memory. |
| compute_kwargs | Kwargs for the compute backend. |

Trainer component to train a model.

Training configuration object containing all the settings needed to serialize and initiate a particular learning task. The object is callable and returns a class which may be invoked to apply training.
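The prefetch settings govern how training data is pulled from the database. As a hedged sketch, assuming `prefetch_size` means "records fetched per round trip", prefetching can be pictured as a generator that loads fixed-size chunks on demand. `fetch_page` is a hypothetical paginated loader, and `prefetch_factor` is not modelled here.

```python
import typing as t

Record = t.Dict[str, t.Any]


def prefetched(
    fetch_page: t.Callable[[int, int], t.List[Record]], prefetch_size: int = 1000
) -> t.Iterator[Record]:
    """Hypothetical: stream records, loading prefetch_size of them per fetch."""
    offset = 0
    while True:
        page = fetch_page(offset, prefetch_size)
        if not page:
            return
        yield from page
        offset += len(page)


TABLE = [{"x": i, "y": 2 * i} for i in range(25)]
fetch_calls = []


def fetch_page(offset: int, limit: int) -> t.List[Record]:
    # Stand-in for a paginated database query.
    fetch_calls.append((offset, limit))
    return TABLE[offset : offset + limit]


records = list(prefetched(fetch_page, prefetch_size=10))
print(len(records), fetch_calls)
# 25 [(0, 10), (10, 10), (20, 10), (25, 10)]
```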