Basic insertion

Superduper supports inserting data wrapped as dictionaries in Python. These dictionaries may contain basic JSON-compatible data, but also other data-types to be handled with DataType components. All data inserts are wrapped with the Document wrapper:

data = ... # an iterable of dictionaries
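As a minimal illustration of what such an iterable might look like, the sketch below builds a small list of dictionaries containing only JSON-compatible values. The field names `txt` and `label` are hypothetical and are not taken from the Superduper sample data.

```python
import json

# Hypothetical records: the keys 'txt' and 'label' are illustrative only.
data = [
    {'txt': 'The quick brown fox', 'label': 0},
    {'txt': 'jumps over the lazy dog', 'label': 1},
]

# A JSON round trip confirms that every value is JSON-compatible.
assert json.loads(json.dumps(data)) == data
```

Values that do not survive such a round trip are the ones that would need a DataType component.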

For example, first get some sample data:

!curl -O https://superduperdb-public-demo.s3.amazonaws.com/text.json

Then load the data:

import json

with open('./text.json') as f:
    data = json.load(f)
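The structure of text.json is not described in this section, so it may be worth checking that the loaded object really is an iterable of dictionaries before inserting it. The sketch below is one possible way to do that. The helper `load_records` and the `txt` key are hypothetical, and a temporary stand-in file is used so the example runs without the download.

```python
import json
import os
import tempfile


def load_records(path):
    """Load a JSON file and normalise it to a list of dictionaries."""
    with open(path) as f:
        raw = json.load(f)
    if not isinstance(raw, list):
        raise ValueError(f'expected a JSON array, got {type(raw).__name__}')
    # Assumption: bare values (e.g. strings) are wrapped under a
    # hypothetical 'txt' key; dictionaries are passed through unchanged.
    return [r if isinstance(r, dict) else {'txt': r} for r in raw]


# Self-contained demonstration using a temporary stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.json')
    with open(path, 'w') as f:
        json.dump(['first text', {'txt': 'second text'}], f)
    records = load_records(path)
```

With the real file, the call would presumably be `load_records('./text.json')`, with the key name adjusted to match the Schema in use.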

Usage pattern

ids, jobs = db['collection-name'].insert(data).execute()

MongoDB

ids, jobs = db['collection-name'].insert_many(data).execute()

A Schema which differs from the standard Schema used by "collection-name" may be used with:

ids, jobs = db['collection-name'].insert_many(data).execute(schema=schema_component)

Read more about this in the Schema documentation.

SQL

ids, jobs = db['table-name'].insert(data).execute()

If no Schema has been set up for "table-name", a Schema is auto-inferred. Data not handled by the db.databackend is encoded with pickle by default.
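To make the pickle fallback concrete, the sketch below shows a plain pickle round trip using only the Python standard library. It illustrates the general idea of encoding an arbitrary Python object to bytes. It is not Superduper's actual encoder, and the sample value is hypothetical.

```python
import pickle

# A value that plain JSON cannot represent: a set of complex numbers.
value = {1 + 2j, 3 - 1j}

encoded = pickle.dumps(value)    # bytes that could be stored in a binary column
decoded = pickle.loads(encoded)  # restores an equal Python object
```

Because unpickling can execute arbitrary code, pickled data should only be loaded from trusted sources.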

Monitoring jobs

The second return value of this command, jobs, references the job computations triggered by the Component instances already applied with db.apply(...).

If a Ray cluster has been configured, the jobs may be monitored at the following URI:

from superduper import CFG

print(CFG.cluster.compute.uri)