Version: 0.7

Execute

superduper allows developers to build their functionality on document stores as well as SQL databases. To enable portability, it includes its own simple query abstraction, which wraps the native queries of db.databackend. Queries are built with a compositional syntax similar to that of pandas and ibis, and the API extends this paradigm to cover vector search.

superduper supports:

Inserts
Selects
Updates
Deletes

In addition, hybrid queries combining vector search and filtering are supported:

Vector-search queries

Selects

All select queries consist of a "chain" of methods executed over a base table.

q = db['<table_name>'].method_1(*args_1, **kwargs_1).method_2(*args_2, **kwargs_2)....

Select queries can be passed around as objects, and are lazily executed with .execute() and .get(...). They play a role in certain Component implementations such as Listener and VectorIndex. Developers may incorporate select queries as associated data in their own custom Component implementations.
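The toy class below is a minimal sketch of what "lazily executed" means for such a chain. It is not superduper's implementation: ToyQuery, its in-memory list-of-dicts table, and the lambda predicates (where superduper uses column expressions such as t['x'] > 2) are all hypothetical. Each chained call only records a step; the data is read when .execute() or .get() runs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]
Step = Callable[[list[Row]], list[Row]]


@dataclass(frozen=True)
class ToyQuery:
    """Hypothetical stand-in for a lazily executed select query."""

    table: list[Row]
    steps: tuple[Step, ...] = ()

    def _then(self, step: Step) -> ToyQuery:
        # Chaining never mutates: each call returns a new query with one more step.
        return ToyQuery(self.table, self.steps + (step,))

    def filter(self, predicate: Callable[[Row], bool]) -> ToyQuery:
        return self._then(lambda rows: [r for r in rows if predicate(r)])

    def select(self, *cols: str) -> ToyQuery:
        return self._then(lambda rows: [{c: r[c] for c in cols} for r in rows])

    def execute(self) -> list[Row]:
        # The data is only read here, when the recorded steps are replayed in order.
        rows = list(self.table)
        for step in self.steps:
            rows = step(rows)
        return rows

    def get(self) -> Row | None:
        # First row of the result, or None if the query matches nothing.
        rows = self.execute()
        return rows[0] if rows else None


rows = [{'x': 1, 'y': 'a'}, {'x': 3, 'y': 'b'}, {'x': 5, 'y': 'c'}]
q = ToyQuery(rows).filter(lambda r: r['x'] > 2).select('x')
print(q.execute())  # [{'x': 3}, {'x': 5}]
print(q.get())      # {'x': 3}
```

Because q is only a description of the work, it can be stored or handed to another component and executed later, which is the property the paragraph above relies on.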

Table select

# All data
data = db['<table>'].execute()

# One datapoint
r = db['<table>'].get()

Select columns/fields

data = db['<table>'].select('x', 'y').execute()

Select rows

t = db['<table>']
data = t.filter(t['x'] > 2).execute()

Select rows and columns

t = db['<table>']
data = t.filter(t['x'] > 2).select('x', 'y').execute()

Get outputs of one or more Listener instances

listener_1 = db.load('Listener', '<listener_1>')
listener_2 = db.load('Listener', '<listener_2>')

joined_outputs_1 = db['<table>'].outputs(listener_1.predict_id).execute()
joined_both_outputs = db['<table>'].outputs(listener_1.predict_id, listener_2.predict_id).execute()

Get the ids of a query

ids = query.ids()   # Any select query

Get the distinct values of a column

distinct_x = query.distinct('x')   # Any select query

Restrict the range of a query to certain primary ids

data = query.subset(ids)
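As a rough illustration of what query.ids(), query.distinct('x') and query.subset(ids) compute, the sketch below applies toy equivalents to an already materialized result. The functions and the primary-id key '_id' are hypothetical; they mirror the described behaviour, not superduper's code.

```python
from typing import Any, Iterable

Row = dict[str, Any]

# Hypothetical result rows; the primary-id key name '_id' is an assumption.
rows: list[Row] = [
    {'_id': 'a', 'x': 1},
    {'_id': 'b', 'x': 2},
    {'_id': 'c', 'x': 1},
]


def ids(result: list[Row]) -> list[str]:
    """Primary ids of the rows in a result."""
    return [r['_id'] for r in result]


def distinct(result: list[Row], key: str) -> list[Any]:
    """Unique values of one column, in first-seen order (values must be hashable)."""
    seen: dict[Any, None] = {}
    for r in result:
        seen.setdefault(r[key], None)
    return list(seen)


def subset(result: list[Row], keep: Iterable[str]) -> list[Row]:
    """Restrict a result to rows whose primary id is in `keep`."""
    wanted = set(keep)
    return [r for r in result if r['_id'] in wanted]


print(ids(rows))                 # ['a', 'b', 'c']
print(distinct(rows, 'x'))       # [1, 2]
print(subset(rows, ['a', 'c']))  # [{'_id': 'a', 'x': 1}, {'_id': 'c', 'x': 1}]
```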

Vector search

Vanilla vector-search

db['<table_name>'].like({'<key>': <value>}, vector_index='<vector_index>', n=<n>).execute()
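Conceptually, .like(..., n=<n>) returns the n rows whose vectors are closest to the vector of the query value. The sketch below shows that idea with a brute-force cosine-similarity scan over an in-memory NumPy array. The function like, the toy vectors and the choice of cosine similarity are assumptions for illustration; the metric and index that a superduper VectorIndex actually uses are not stated in this section.

```python
import numpy as np


def like(query_vec: np.ndarray, vectors: np.ndarray, ids: list[str], n: int):
    """Hypothetical brute-force search: ids and scores of the n most similar rows."""
    q = query_vec / np.linalg.norm(query_vec)
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = m @ q  # cosine similarity of every row with the query
    top = np.argsort(-scores, kind='stable')[:n]  # indices of the n highest scores
    return [ids[i] for i in top], scores[top]


vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ids = ['a', 'b', 'c']
top_ids, top_scores = like(np.array([1.0, 0.1]), vectors, ids, n=2)
print(top_ids)  # ['a', 'c']
```

A full scan like this is exact but costs one dot product per row; dedicated vector indexes typically trade some exactness for speed on large tables.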

Pre-filtered

t = db['<table_name>']
condition = t['<key>'] == <value>   # ==, <=, >=

t.filter(condition).like({'<key>': <value>}, vector_index='<vector_index>', n=<n>).execute()

Post-filtered

t.like({'<key>': <value>}, vector_index='<vector_index>', n=<n>).filter(condition).execute()
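The order of .filter() and .like() matters. The sketch below uses precomputed similarity scores and a boolean column (both hypothetical toy data) to show the difference: pre-filtering restricts the candidates first and then takes the n best, so it returns n matching rows whenever at least n rows match; post-filtering takes the n best overall and then drops non-matching rows, so it can return fewer than n.

```python
import numpy as np


def top_n(scores: np.ndarray, candidates: np.ndarray, n: int) -> np.ndarray:
    """Indices from `candidates` with the n highest scores."""
    candidates = np.asarray(candidates)
    order = np.argsort(-scores[candidates], kind='stable')
    return candidates[order[:n]]


# Hypothetical data: a similarity score per row and a column to filter on.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
flag = np.array([False, False, True, False, True, True])
n = 3
all_rows = np.arange(len(scores))

# Pre-filter: keep matching rows first, then take the n nearest among them.
pre = top_n(scores, all_rows[flag], n)

# Post-filter: take the n nearest overall, then drop rows that do not match.
nearest = top_n(scores, all_rows, n)
post = nearest[flag[nearest]]

print(pre.tolist())   # [2, 4, 5]
print(post.tolist())  # [2]
```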

Insert

superduper supports inserting data that combines JSON-native content with objects that must be serialized to bytes using a Python-based serializer (e.g. pickle, dill or a custom serializer).

superduper's typing system is deliberately non-pedantic: it does only the minimum necessary to get content saved in db.databackend.
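To illustrate the "minimum necessary" idea, the sketch below leaves JSON-native values untouched and only pickles values that json.dumps rejects, storing them as base64 text. This is a hypothetical scheme, not superduper's encoding: the helper names and the MARKER key are invented, and pickle stands in for whichever serializer is configured.

```python
import base64
import datetime
import json
import pickle
from typing import Any

MARKER = '_pickle_b64'  # hypothetical key marking an encoded value


def encode_value(value: Any) -> Any:
    """Keep JSON-native values as they are; pickle anything else to base64 text."""
    try:
        json.dumps(value)
        return value
    except (TypeError, ValueError):
        blob = pickle.dumps(value)
        return {MARKER: base64.b64encode(blob).decode('ascii')}


def decode_value(value: Any) -> Any:
    if isinstance(value, dict) and set(value) == {MARKER}:
        # Only unpickle trusted data: loading a pickle can execute arbitrary code.
        return pickle.loads(base64.b64decode(value[MARKER]))
    return value


def encode_record(record: dict[str, Any]) -> dict[str, Any]:
    return {k: encode_value(v) for k, v in record.items()}


def decode_record(stored: dict[str, Any]) -> dict[str, Any]:
    return {k: decode_value(v) for k, v in stored.items()}


record = {'x': 'test', 'y': 1, 'z': datetime.date(2024, 1, 1)}
stored = encode_record(record)  # 'x' and 'y' unchanged; only 'z' is encoded
restored = decode_record(json.loads(json.dumps(stored)))
print(restored == record)  # True
```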

Data can be inserted in two ways: either by creating a table with typed fields, or by creating a subclass of superduper.base.Base with type annotations:

Available datatypes

Name	Description
`str`	Python string
`int`	Python integer
`float`	Python float
`bool`	Python boolean
`json`	JSON-able objects (`list`, `dict`)
`dillencoder`	Save content base64-encoded in `db.databackend`
`dill`	Save content as bytes in `db.artifact_store`
`file`	Save reference as a file in `db.artifact_store`
`package.module.variable_name`	Custom datatype implementing `superduper.base.datatype.BaseDatatype`
`basetype`	Indicates a `superduper.base.Base` class
`componenttype`	Indicates a `superduper.components.component.Component` type
`componentdict`	A dictionary `str -> Component`
`componentlist`	A list of `Component`
`vector[float:32]`	A searchable `numpy.array` with datatype and shape
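The `dillencoder` and `dill` rows differ in where the serialized bytes live. The sketch below contrasts the two placements: inline as base64 text inside the row, or as raw bytes in a separate store with only a reference kept in the row. It is a hypothetical model: pickle replaces dill to stay within the standard library, and the dict-based artifact store, the SHA-256 key and the '_artifact' field are assumptions, not superduper's storage format.

```python
import base64
import hashlib
import pickle
from typing import Any

# Hypothetical in-memory artifact store: key -> raw bytes.
artifact_store: dict[str, bytes] = {}


def encode_inline(value: Any) -> str:
    """'dillencoder'-style placement: bytes stay in the row, as base64 text."""
    return base64.b64encode(pickle.dumps(value)).decode('ascii')


def decode_inline(text: str) -> Any:
    return pickle.loads(base64.b64decode(text))


def encode_artifact(value: Any) -> dict[str, str]:
    """'dill'-style placement: bytes go to the artifact store; the row keeps a key."""
    blob = pickle.dumps(value)
    key = hashlib.sha256(blob).hexdigest()
    artifact_store[key] = blob
    return {'_artifact': key}


def decode_artifact(ref: dict[str, str]) -> Any:
    return pickle.loads(artifact_store[ref['_artifact']])


value = {'weights': [0.1, 0.2, 0.3]}
row_a = {'z': encode_inline(value)}    # self-contained row
row_b = {'z': encode_artifact(value)}  # small row plus an external blob
```

Base64 makes data roughly a third larger, so inline storage suits small objects, while a separate store keeps rows small when the objects are large.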

Create a Table with typed columns

Example data:

import PIL.Image

from superduper.components.table import Table

data = [
    {'x': 'test', 'y': 1, 'z': PIL.Image.open('my_image_1.png')},
    {'x': 'test', 'y': 2, 'z': PIL.Image.open('my_image_2.png')}
]

db.apply(
    Table(
        'documents',
        fields={'x': 'str', 'y': 'int', 'z': 'dill'}
    )
)

q = db['documents'].insert(data)

Create an implicit Table and typed columns with a Base subclass

import typing as t

from superduper.base import Base

class documents(Base):
    x: str
    y: int
    z: t.Any

db.insert(
    documents(x=r['x'], y=r['y'], z=r['z'])
    for r in data
)
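The class annotations play the same role as the explicit fields={...} dictionary in the Table example. The sketch below shows one plausible way such a mapping could be derived with typing.get_type_hints. The TYPE_NAMES table and the fallback to 'dill' for unrecognized annotations are assumptions for illustration, not superduper's documented rules, and a plain class stands in for the Base subclass.

```python
import typing as t

# Hypothetical mapping from annotations to the datatype names in the table above.
TYPE_NAMES = {str: 'str', int: 'int', float: 'float', bool: 'bool'}


def fields_from_annotations(cls: type) -> dict[str, str]:
    """Map each annotated attribute to a datatype name, defaulting to 'dill'."""
    hints = t.get_type_hints(cls)
    return {name: TYPE_NAMES.get(tp, 'dill') for name, tp in hints.items()}


class documents:  # plain class standing in for a Base subclass
    x: str
    y: int
    z: t.Any


print(documents.__name__)                 # documents
print(fields_from_annotations(documents))  # {'x': 'str', 'y': 'int', 'z': 'dill'}
```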

Update

Update the table where the values in kwargs apply:

db['<table_name>'].update(**kwargs)

Delete

Delete the rows of the table that match the values in kwargs:

db['<table_name>'].delete(**kwargs)
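Assuming the keyword arguments act as an equality filter (every given field must equal its value), the effect of a delete on an in-memory list of rows can be sketched as follows. toy_delete is hypothetical and only models that assumed semantics.

```python
from typing import Any

Row = dict[str, Any]


def toy_delete(rows: list[Row], **kwargs: Any) -> tuple[list[Row], int]:
    """Drop rows whose fields equal all given keyword values.

    Returns the remaining rows and the number of rows deleted. The input list
    is not modified.
    """

    def matches(r: Row) -> bool:
        return all(r.get(k) == v for k, v in kwargs.items())

    kept = [r for r in rows if not matches(r)]
    return kept, len(rows) - len(kept)


rows = [{'x': 'a', 'y': 1}, {'x': 'b', 'y': 1}, {'x': 'a', 'y': 2}]
kept, n_deleted = toy_delete(rows, x='a')
print(kept, n_deleted)  # [{'x': 'b', 'y': 1}] 2
```

Note that with no keyword arguments every row matches, so everything is deleted; guarding against an empty filter is a sensible design choice for a real delete.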
