config

superduper.base.config

BaseConfig

BaseConfig(self) -> None

A base class for configuration dataclasses.

This class allows for easy updating of configuration dataclasses with a dictionary of parameters.
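
As a rough illustration of that update pattern, the sketch below uses a stand-in dataclass and the standard library's `dataclasses.replace`. `DownloadsSketch` and `updated` are hypothetical names invented for this example; they are not part of superduper, and the real `BaseConfig` may implement updates differently.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class DownloadsSketch:
    """Hypothetical stand-in for a configuration dataclass (not superduper's)."""

    folder: Optional[str] = None
    n_workers: int = 0
    timeout: Optional[int] = None


def updated(cfg: DownloadsSketch, params: Dict[str, Any]) -> DownloadsSketch:
    """Return a copy of cfg with the fields named in params overridden."""
    return replace(cfg, **params)


base = DownloadsSketch()
tuned = updated(base, {"n_workers": 4, "timeout": 30})
print(tuned)
```

Returning a new object rather than mutating the old one keeps the original configuration available for comparison; whether superduper does the same is not stated here.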

CDCConfig

CDCConfig(self,
uri: Optional[str] = None,
strategy: Union[superduper.base.config.PollingStrategy, superduper.base.config.LogBasedStrategy, NoneType] = None) -> None

| Parameter | Description |
| --- | --- |
| uri | The URI for the CDC service |
| strategy | The strategy to use for CDC |

Describes the configuration for change data capture.

CDCStrategy

CDCStrategy(self,
type: str) -> None

| Parameter | Description |
| --- | --- |
| type | The type of CDC strategy |

Base CDC strategy dataclass.

Cluster

Cluster(self,
compute: superduper.base.config.Compute = None,
vector_search: superduper.base.config.VectorSearch = None,
rest: superduper.base.config.Rest = None,
cdc: superduper.base.config.CDCConfig = None) -> None

| Parameter | Description |
| --- | --- |
| compute | The settings for compute. A URI of None runs all jobs in local mode, i.e. as simple function calls; "ray://host:port" runs all jobs on a remote Ray cluster |
| vector_search | The settings for the vector search service. A URI of None runs vector search locally; f"http://{host}:{port}" connects to a remote vector search service |
| rest | The settings for the REST service. f"http://{host}:{port}" connects to a remote REST service |
| cdc | The settings for the change data capture service. A URI of None assumes no remote CDC service and runs CDC locally as a thread; f"http://{host}:{port}" connects to a remote CDC service |

Describes a connection to distributed work via Ray.
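
To make the URI conventions above concrete, here is a minimal sketch of how a `None` versus `"ray://host:port"` value could be told apart using only the standard library. `describe_compute_uri` is a hypothetical helper written for this example, and `head-node:10001` is a made-up address; superduper's own dispatch logic is not shown in this page and may differ.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def describe_compute_uri(uri: Optional[str]) -> Tuple[str, Optional[str], Optional[int]]:
    """Classify a compute URI as local or remote (hypothetical helper)."""
    if uri is None:
        # No URI: jobs would run locally as plain function calls.
        return ("local", None, None)
    parts = urlsplit(uri)
    # For "ray://host:port" this yields the scheme, host name, and port.
    return (parts.scheme, parts.hostname, parts.port)


print(describe_compute_uri(None))
print(describe_compute_uri("ray://head-node:10001"))
```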

Compute

Compute(self,
uri: Optional[str] = None,
compute_kwargs: Dict = None) -> None

| Parameter | Description |
| --- | --- |
| uri | The URI for the compute service |
| compute_kwargs | The keyword arguments to pass to the compute service |

Describes the configuration for distributed computing.

Config

Config(self,
envs: dataclasses.InitVar[typing.Optional[typing.Dict[str, str]]] = None,
data_backend: str = 'mongodb://localhost:27017/test_db',
lance_home: str = '.superduper/vector_indices',
artifact_store: Optional[str] = None,
metadata_store: Optional[str] = None,
cluster: superduper.base.config.Cluster = None,
retries: superduper.base.config.Retry = None,
downloads: superduper.base.config.Downloads = None,
fold_probability: float = 0.05,
log_level: superduper.base.config.LogLevel = <LogLevel.INFO: 'INFO'>,
logging_type: superduper.base.config.LogType = <LogType.SYSTEM: 'SYSTEM'>,
bytes_encoding: superduper.base.config.BytesEncoding = <BytesEncoding.BYTES: 'Bytes'>,
auto_schema: bool = True) -> None

| Parameter | Description |
| --- | --- |
| envs | The environment variables, as a dictionary of string keys and values |
| data_backend | The URI for the data backend |
| lance_home | The home directory for the Lance vector indices. Default: .superduper/vector_indices |
| artifact_store | The URI for the artifact store |
| metadata_store | The URI for the metadata store |
| cluster | Settings for distributed computing and change data capture |
| retries | Settings for retrying failed operations |
| downloads | Settings for downloading files |
| fold_probability | The probability of assigning a data point to the validation fold |
| log_level | The severity level of the logs |
| logging_type | The type of logging to use |
| bytes_encoding | The encoding of bytes in the data backend |
| auto_schema | Whether to automatically create the schema. If True, the schema will be created if it does not exist. |

The data class containing all configurable superduper values.
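
One way to read `fold_probability` is as the chance that each incoming record is placed in the validation fold. The sketch below simulates that interpretation with the default of 0.05. The function name `assign_fold` and the fold labels `"train"` and `"valid"` are assumptions made for this example, not values confirmed by this page.

```python
import random


def assign_fold(rng: random.Random, fold_probability: float = 0.05) -> str:
    """Pick a fold label for one record (labels are assumed, not superduper's)."""
    return "valid" if rng.random() < fold_probability else "train"


rng = random.Random(0)  # fixed seed so the simulation is repeatable
folds = [assign_fold(rng) for _ in range(10_000)]
valid_fraction = folds.count("valid") / len(folds)
print(f"validation fraction: {valid_fraction:.3f}")
```

With many records, the validation fraction should land close to the configured probability.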

Downloads

Downloads(self,
folder: Optional[str] = None,
n_workers: int = 0,
headers: Dict = None,
timeout: Optional[int] = None) -> None

| Parameter | Description |
| --- | --- |
| folder | The folder to download files to |
| n_workers | The number of workers to use for downloading |
| headers | The headers to use for downloading |
| timeout | The timeout for downloading |

Describes the configuration for downloading files.

LogBasedStrategy

LogBasedStrategy(self,
type: str = 'logbased',
resume_token: Optional[Dict[str, str]] = None) -> None

| Parameter | Description |
| --- | --- |
| resume_token | The resume token to use for log-based CDC |
| type | The type of CDC strategy |

Describes a log-based strategy for change data capture.

PollingStrategy

PollingStrategy(self,
type: 'str' = 'incremental',
auto_increment_field: Optional[str] = None,
frequency: float = 3600) -> None

| Parameter | Description |
| --- | --- |
| auto_increment_field | The field to use for auto-incrementing |
| frequency | The frequency to poll for changes |
| type | The type of CDC strategy |

Describes a polling strategy for change data capture.
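
The idea behind incremental polling can be sketched as follows: remember the largest value of the auto-increment field seen so far, and on each poll return only rows above it. `poll_once`, the in-memory `table`, and the `id` field are all hypothetical and exist only for illustration; a real poller would query the data backend and wait between polls according to `frequency`, whose unit this page does not state.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def poll_once(rows: List[Row], field: str, last_seen: int) -> Tuple[List[Row], int]:
    """Return rows whose auto-increment field exceeds last_seen, and the new high-water mark."""
    fresh = [r for r in rows if r[field] > last_seen]
    new_last = max((r[field] for r in fresh), default=last_seen)
    return fresh, new_last


# Hypothetical in-memory table standing in for the data backend.
table: List[Row] = [{"id": 1, "txt": "a"}, {"id": 2, "txt": "b"}]
fresh, last_seen = poll_once(table, "id", last_seen=0)

table.append({"id": 3, "txt": "c"})  # a new row arrives between polls
fresh_again, last_seen = poll_once(table, "id", last_seen)
print([r["id"] for r in fresh], [r["id"] for r in fresh_again])
```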

Rest

Rest(self,
uri: Optional[str] = None,
config: Optional[str] = None) -> None

| Parameter | Description |
| --- | --- |
| uri | The URI for the REST service |
| config | The path to the config yaml file for the REST service |

Describes the configuration for the REST service.

Retry

Retry(self,
stop_after_attempt: int = 2,
wait_max: float = 10.0,
wait_min: float = 4.0,
wait_multiplier: float = 1.0) -> None

| Parameter | Description |
| --- | --- |
| stop_after_attempt | The number of attempts to make |
| wait_max | The maximum time to wait between attempts |
| wait_min | The minimum time to wait between attempts |
| wait_multiplier | The multiplier for the wait time between attempts |

Describes how to retry using the tenacity library.
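
To show how the three wait parameters could interact, the sketch below computes a clamped exponential backoff in plain Python. It assumes the fields feed an exponential wait of the kind tenacity offers through `wait_exponential`; this page does not confirm which tenacity wait strategy superduper uses, and `exponential_wait` is a hypothetical helper, not a tenacity or superduper function.

```python
def exponential_wait(
    attempt: int,
    wait_multiplier: float = 1.0,
    wait_min: float = 4.0,
    wait_max: float = 10.0,
) -> float:
    """Wait time before the next try: multiplier * 2^(attempt-1), clamped to [min, max]."""
    raw = wait_multiplier * 2 ** (attempt - 1)
    return max(wait_min, min(raw, wait_max))


waits = [exponential_wait(n) for n in range(1, 6)]
print(waits)  # [4.0, 4.0, 4.0, 8.0, 10.0]
```

Under this assumption the defaults keep early waits at the 4.0 floor and cap later ones at 10.0. With `stop_after_attempt = 2`, only the first of these waits would ever be used.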

VectorSearch

VectorSearch(self,
uri: Optional[str] = None,
type: str = 'in_memory',
backfill_batch_size: int = 100) -> None

| Parameter | Description |
| --- | --- |
| uri | The URI for the vector search service |
| type | The type of vector search service |
| backfill_batch_size | The size of the backfill batch |

Describes the configuration for vector search.
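
A batch size such as `backfill_batch_size` usually means that existing items are loaded into the index in fixed-size chunks. The sketch below shows that chunking on a plain sequence of ids. The `batches` helper is hypothetical, and how superduper actually performs a backfill is not described on this page.

```python
from typing import Iterator, List, Sequence


def batches(ids: Sequence[int], backfill_batch_size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most backfill_batch_size ids."""
    for start in range(0, len(ids), backfill_batch_size):
        yield list(ids[start:start + backfill_batch_size])


sizes = [len(batch) for batch in batches(range(250))]
print(sizes)  # [100, 100, 50]
```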