query_dataset

superduper.backends.query_dataset

query_dataset_factory

query_dataset_factory(**kwargs)
Parameter   Description
kwargs      Keyword arguments to be passed to the query dataset object.

Create a query dataset object.

If the keyword argument data_prefetch is True, a CachedQueryDataset object is created; otherwise, a QueryDataset object is created.
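The dispatch rule can be sketched as follows. This is not the library's source: `QueryDatasetStub` and `CachedQueryDatasetStub` are hypothetical stand-ins for the two real classes. The sketch also assumes that the factory consumes `data_prefetch` itself and drops `prefetch_size` for the non-cached class, since `data_prefetch` appears in neither constructor signature and `prefetch_size` appears only in CachedQueryDataset's.

```python
from typing import Any


class QueryDatasetStub:
    """Hypothetical stand-in for QueryDataset; records its keyword arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class CachedQueryDatasetStub(QueryDatasetStub):
    """Hypothetical stand-in for CachedQueryDataset."""


def query_dataset_factory(**kwargs: Any) -> QueryDatasetStub:
    # Assumption: the factory consumes data_prefetch itself,
    # because neither dataset constructor lists it as a parameter.
    data_prefetch = kwargs.pop("data_prefetch", False)
    if data_prefetch:
        return CachedQueryDatasetStub(**kwargs)
    # Assumption: QueryDataset takes no prefetch_size, so drop it here.
    kwargs.pop("prefetch_size", None)
    return QueryDatasetStub(**kwargs)


cached = query_dataset_factory(data_prefetch=True, fold="train", prefetch_size=50)
plain = query_dataset_factory(fold="valid", prefetch_size=50)
print(type(cached).__name__, cached.kwargs)
print(type(plain).__name__, plain.kwargs)
```

Keeping the flag out of the constructors means each class only ever receives parameters from its own signature.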

CachedQueryDataset

CachedQueryDataset(self,
select: superduper.backends.base.query.Query,
mapping: Optional[ForwardRef('Mapping')] = None,
ids: Optional[List[str]] = None,
fold: Optional[str] = 'train',
transform: Optional[Callable] = None,
db=None,
in_memory: bool = True,
prefetch_size: int = 100)
Parameter       Description
select          A select query object which defines the query to be executed.
mapping         A mapping object to be used for the dataset.
ids             A list of ids to be used for the dataset.
fold            The fold to be used for the dataset.
transform       A callable which can be used to transform the dataset.
db              A datalayer instance to be used for the dataset.
in_memory       A boolean flag to indicate if the dataset should be loaded in memory.
prefetch_size   The number of documents to prefetch from the database.

Cached query dataset for fetching documents from the database.

This class fetches the document corresponding to a given index. It prefetches documents from the database and stores them in memory.

Because documents are fetched in batches of up to prefetch_size rather than one at a time, this reduces the number of database read operations and hence the overall load on the database.
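The prefetching idea can be illustrated with a small, self-contained sketch that does not use superduper. `FakeCollection`, `read_range` and `PrefetchingDataset` are hypothetical names, and the database is simulated by a list that counts read calls. On a cache miss, one read fetches up to `prefetch_size` consecutive documents; later indices are served from memory, and each entry is dropped once served, mirroring the expiry behavior described for ExpiryCache.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


class FakeCollection:
    """Hypothetical stand-in for a database collection that counts reads."""

    def __init__(self, documents: List[Document]) -> None:
        self.documents = documents
        self.reads = 0

    def read_range(self, start: int, stop: int) -> List[Document]:
        # Each call counts as one read operation, however many documents it returns.
        self.reads += 1
        return self.documents[start:stop]


class PrefetchingDataset:
    """Hypothetical sketch: index-based access backed by a prefetch cache."""

    def __init__(self, collection: FakeCollection, prefetch_size: int = 100) -> None:
        self.collection = collection
        self.prefetch_size = prefetch_size
        self._cache: Dict[int, Document] = {}

    def __len__(self) -> int:
        return len(self.collection.documents)

    def __getitem__(self, index: int) -> Document:
        if not 0 <= index < len(self):
            raise IndexError(index)
        if index not in self._cache:
            # Cache miss: fetch up to prefetch_size documents in a single read.
            batch = self.collection.read_range(index, index + self.prefetch_size)
            for offset, doc in enumerate(batch):
                self._cache[index + offset] = doc
        # Serve from memory and expire the entry once it has been fetched.
        return self._cache.pop(index)


collection = FakeCollection([{"_id": str(i), "x": i} for i in range(250)])
dataset = PrefetchingDataset(collection, prefetch_size=100)
xs = [dataset[i]["x"] for i in range(len(dataset))]
print(collection.reads)  # 3
```

Reading 250 documents in order with prefetch_size=100 costs 3 read operations in this sketch rather than 250. The real class may batch, order, or refill its cache differently.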

ExpiryCache

ExpiryCache(self,
/,
*args,
**kwargs)
Parameter   Description
args        Positional arguments passed to the list constructor.
kwargs      Keyword arguments passed to the list constructor.

Expiry Cache for storing documents.

A document is removed from the cache as soon as it has been fetched from it.
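A minimal sketch of this expiry behavior, assuming (as the "*args/**kwargs for list" parameters suggest) that the cache is a `list` subclass. `ExpiryCacheSketch` is a hypothetical name; the real class may differ in its details.

```python
from typing import Any


class ExpiryCacheSketch(list):
    """Hypothetical list whose items are removed as soon as they are read by index."""

    def __getitem__(self, index: Any) -> Any:
        item = super().__getitem__(index)
        # Expire the entry so the same document cannot be served twice.
        del self[index]
        return item


cache = ExpiryCacheSketch(["doc-a", "doc-b", "doc-c"])
first = cache[0]
print(first, cache)  # doc-a ['doc-b', 'doc-c']
```

Because deleting an entry shifts the remaining items down, repeatedly reading index 0 drains the cache in order.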

QueryDataset

QueryDataset(self,
select: superduper.backends.base.query.Query,
mapping: Optional[ForwardRef('Mapping')] = None,
ids: Optional[List[str]] = None,
fold: Optional[str] = 'train',
transform: Optional[Callable] = None,
db: Optional[ForwardRef('Datalayer')] = None,
in_memory: bool = True)
Parameter   Description
select      A select query object which defines the query to be executed.
mapping     A mapping object to be used for the dataset.
ids         A list of ids to be used for the dataset.
fold        The fold to be used for the dataset.
transform   A callable which can be used to transform the dataset.
db          A datalayer instance to be used for the dataset.
in_memory   A boolean flag to indicate if the dataset should be loaded in memory.

Query dataset for fetching documents from the database.
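The constructor parameters above can be illustrated with a simplified, self-contained sketch. It is not the library's implementation: the query is modeled as a hypothetical zero-argument callable `run_query` returning a list of dicts, and the `_fold` and `_id` document keys used for the `fold` and `ids` filters are assumptions. With `in_memory=True` the query runs once at construction; with `in_memory=False` this sketch simply re-runs it on each access, trading extra reads for lower memory use.

```python
from typing import Any, Callable, Dict, List, Optional

Document = Dict[str, Any]


class QueryDatasetSketch:
    """Hypothetical map-style dataset over the results of a query."""

    def __init__(
        self,
        run_query: Callable[[], List[Document]],
        ids: Optional[List[str]] = None,
        fold: Optional[str] = "train",
        transform: Optional[Callable[[Document], Any]] = None,
        in_memory: bool = True,
    ) -> None:
        self.run_query = run_query
        self.ids = ids
        self.fold = fold
        self.transform = transform
        # in_memory=True: run the query once and keep the filtered results.
        self._documents: Optional[List[Document]] = self._load() if in_memory else None

    def _load(self) -> List[Document]:
        docs = self.run_query()
        if self.fold is not None:
            # Assumption: each document records its fold under "_fold".
            docs = [d for d in docs if d.get("_fold") == self.fold]
        if self.ids is not None:
            # Assumption: each document's id is stored under "_id".
            wanted = set(self.ids)
            docs = [d for d in docs if d.get("_id") in wanted]
        return docs

    def _docs(self) -> List[Document]:
        # in_memory=False: re-run the query on every access instead of caching.
        return self._documents if self._documents is not None else self._load()

    def __len__(self) -> int:
        return len(self._docs())

    def __getitem__(self, index: int) -> Any:
        doc = self._docs()[index]
        return self.transform(doc) if self.transform is not None else doc


query_calls: List[int] = []


def run_query() -> List[Document]:
    # Simulated query results; records each time the "database" is hit.
    query_calls.append(1)
    return [
        {"_id": "1", "_fold": "train", "x": 1},
        {"_id": "2", "_fold": "valid", "x": 2},
        {"_id": "3", "_fold": "train", "x": 3},
    ]


train = QueryDatasetSketch(run_query, transform=lambda d: d["x"] * 10)
print(len(train), [train[i] for i in range(len(train))])  # 2 [10, 30]
```

In this sketch, `transform` is applied per document at access time, so the stored results stay untransformed.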