vector_index

superduper.components.vector_index

backfill_vector_search(db,
vi,
searcher)
| Parameter | Description |
| --- | --- |
| db | Datalayer instance. |
| vi | Identifier of vector index. |
| searcher | FastVectorSearch instance to load model outputs as vectors. |

Backfill vector search from model outputs of a given vector index.

ibatch

ibatch(iterable: Iterable[~T],
batch_size: int) -> Iterator[List[~T]]
| Parameter | Description |
| --- | --- |
| iterable | The iterable to batch. |
| batch_size | The number of items in each batch. |

Batch an iterable into chunks of size batch_size.
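The batching behaviour described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the name `ibatch_sketch` is hypothetical, and the assumption that the final batch may be shorter than `batch_size` is ours rather than something the reference states.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def ibatch_sketch(iterable: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items (illustrative only)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(iterable)
    while True:
        # Pull up to batch_size items without materialising the whole iterable.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


batches = list(ibatch_sketch(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because the sketch consumes the input lazily, it also works on generators and other one-shot iterables.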

sqlvector

sqlvector(shape,
bytes_encoding: Optional[str] = None)
| Parameter | Description |
| --- | --- |
| shape | The shape of the vector. |
| bytes_encoding | The encoding of the bytes. |

Create an encoder for a vector (list of ints/floats) of a given shape.

This is used for compatibility with SQL databases, as the default vector

vector

vector(shape,
identifier: Optional[str] = None)
| Parameter | Description |
| --- | --- |
| shape | The shape of the vector. |
| identifier | The identifier of the vector. |

Create an encoder for a vector (list of ints/floats) of a given shape.

VectorIndex

VectorIndex(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: None = <factory>,
*,
upstream: "t.Optional[t.List['Component']]" = None,
plugins: "t.Optional[t.List['Plugin']]" = None,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
cache: 't.Optional[bool]' = True,
status: 't.Optional[Status]' = None,
cdc_table: str = '',
indexing_listener: superduper.components.listener.Listener,
compatible_listener: Optional[superduper.components.listener.Listener] = None,
measure: superduper.vector_search.base.VectorIndexMeasureType = <VectorIndexMeasureType.cosine: 'cosine'>,
metric_values: None = <factory>) -> None
| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| upstream | A list of upstream components. |
| plugins | A list of plugins to be used in the component. |
| cache | (Optional) If set to true, the component will not be cached during the primary job of the component; i.e., on a distributed cluster, this component will be reloaded on every component task, e.g. model prediction. |
| status | What part of the lifecycle the component is in. |
| cdc_table | Table which fires the triggers. |
| indexing_listener | Listener which is applied to created vectors. |
| compatible_listener | Listener which is applied to vectors to be compared. |
| measure | Measure to use for comparison. |
| metric_values | Metric values for this index. |

A component carrying the information to apply a vector index.
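The signature above defaults `measure` to `cosine`. As a rough illustration of what a cosine comparison between a query vector and stored vectors involves, here is a self-contained sketch in plain Python. The function names `cosine_similarity` and `rank_by_cosine` are hypothetical helpers written for this example; they are not part of the `superduper` API, and the library's own search backends may compute the measure differently.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_cosine(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (id, score) pairs sorted from most to least similar."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: ids and vectors are made up for the example.
ranked = rank_by_cosine(
    [1.0, 0.0],
    {"a": [2.0, 0.0], "b": [0.0, 3.0], "c": [1.0, 1.0]},
)
print([key for key, _ in ranked])  # ['a', 'c', 'b']
```

Cosine similarity ignores vector magnitude, which is why `a` scores highest here despite being twice as long as the query.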

DecodeArray

DecodeArray(self,
dtype)
| Parameter | Description |
| --- | --- |
| dtype | Datatype of array. |

Class to decode an array.

EncodeArray

EncodeArray(self,
dtype)
| Parameter | Description |
| --- | --- |
| dtype | Datatype of array. |

Class to encode an array.
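To make the encode/decode pairing concrete, the sketch below round-trips a list of numbers through raw bytes, with both sides parameterised by the same datatype. It is an assumption-laden stand-in: `EncodeArraySketch` and `DecodeArraySketch` are hypothetical classes, the standard-library `array` module takes the place of whatever array type the library works with, and `dtype` here is an `array` type code such as `"d"` rather than the library's own dtype value.

```python
from array import array
from typing import List


class EncodeArraySketch:
    """Turn a list of numbers into raw bytes for a fixed type code."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype  # array-module type code, e.g. "f" or "d" (assumption)

    def __call__(self, values: List[float]) -> bytes:
        return array(self.dtype, values).tobytes()


class DecodeArraySketch:
    """Rebuild the list of numbers from bytes using the same type code."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype

    def __call__(self, payload: bytes) -> List[float]:
        out = array(self.dtype)
        out.frombytes(payload)
        return out.tolist()


encode = EncodeArraySketch("d")
decode = DecodeArraySketch("d")
payload = encode([0.5, -1.25, 3.0])
restored = decode(payload)
print(restored)  # [0.5, -1.25, 3.0]
```

The bytes carry no type information of their own, so decoding with a different type code than the one used for encoding would silently produce wrong values. That is why both classes take `dtype` at construction time.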