Skip to main content
Version: 0.6

Execute

superduper allows developers to build their functionality on document-stores as well as SQL databases. To enable portability, it includes it's own simple query abstraction, which wraps the db.databackend native queries. Queries are built using a compositional syntax similar to that used by pandas and ibis. The API also includes extensions of this paradigm to cover vector-searches.

superduper supports:

  • Inserts
  • Selects
  • Updates
  • Deletes

In addition hybrid queries involving a combination of vector-search and filtering are supported:

  • Vector-search queries

Selects​

All select queries consist of a "chain" of methods executed over a base table.

q = db['<table_name>'].method_1(*args_1, **kwargs_1).method_2(*args_2, **kwargs_2)....

Select queries can be passed around as objects, and are lazily executed with .execute() and .get(...). They play a role in certain Component implementations such as Listener and VectorIndex. Developers may incorporate select queries as associated data in their own custom Component implementations.

Table select

# All data
data = db['<table>'].execute()

# One datapoint
r = db['<table>'].get()

Select columns/ fields

data = db['<table>'].select('x', 'y').execute()

Select rows

t = db['<table>']
data = t.filter(t['x'] > 2).execute()

Select rows and columns

t = db['<table>']
data = t.filter(t['x'] > 2).select('x', 'y').execute()

Get outputs of one or more Listener instances

listener_1 = db.load('Listener', '<listener_1>')
listener_2 = db.load('Listener', '<listener_2>')

joined_outputs_1 = db['<table>'].outputs(listener_1.predict_id).execute()
joined_both_outputs = db['<table>'].outputs(listener_1.predict_id, listener_2.predict_id).execute()

Get the ids of a query

ids = query.ids()   # Any select query

Get the distinct values of a column

distinct_x = query.distinct('x')   # Any select query

Restrict the range of a query to certain primary ids

data = query.subset(ids)

Vanilla vector-search

db['<table_name>'].like({'<key>': <value>}, vector_index='<vector_index>', n=<n>).execute()

Pre-filtered

t = db['<table_name>']
condition = t['<key>'] == <value> # ==, <=, >=

t.filter(condition).like({'<key>': <value>}, vector_index='<vector_index>', n=<n>).execute()

Post-filtered

t.like({'<key>': <value>}, vector_index='<vector_index>', n=<n>).filter(condition).execute()

Insert​

superduper supports inserting data which is a combination of JSON-native content, and objects which need to be serialized as bytes using a Python based serialized (e.g. pickle, dill or a custom serializer).

superduper has a typing system which is un-pedantic, doing the minimum necessary to get content saved in the db.databackend.

Inserting data is possible in two ways, either by creating a table with typed fields, or by created a superduper.base.Base class with type annotations:

Available datatypes

NameDescription
strPython string
intPython integer
floatPython float
boolPython boolean
jsonJSON-able objects (list, dict)
dillencoderSave content in db.databackend base64 encoded
dillSave content as bytes in db.artifact_store
fileSave reference as a file in db.artifact_store
package.module.variable_nameCustom datatype implementing superduper.base.datatype.BaseDatatype
basetypeIndicates a superduper.base.Base class
componenttypeIndicates a superduper.components.component.Component type
componentdictA dictionary str -> Component
componentlistA list of Component
vector[float:32]A searchable numpy.array with datatype and shape

Create a Table with typed columns

Example data:

import PIL.Image

data = [
{'x': 'test', 'y': 1, 'z': PIL.Image.open('my_image_1.png')},
{'x': 'test', 'y': 2, 'z': PIL.Image.open('my_image_2.png')}
]
db.apply(
Table(
'documents',
fields={'x': 'str', 'y': 'int', 'z': 'dill'}
)
)

q = db['documents'].insert(data)

Create an implicit Table and typed columns with a Base subclass

from superduper.base.Base

class documents(Base):
x: str
y: int
z: t.Any

db.insert(
documents(x=r['x'], y=r['y'], z=r['z'])
for r in data
)

Update​

Update the table where the values in kwargs apply:

db['<table_name>'].update(**kwargs)

Delete​

Delete rows from the table where the values in kwargs apply:

db['<table_name>'].delete(**kwargs)