Skip to main content

Automatic data-types

A major challenge in uniting classical databases with AI, is that the types of data used in AI are often not supported by your database.

To solve this problem, superduper has the abstractions DataType and Schema.

To save developers time, by default, superduper recognizes the type of data and constructs a Schema based on this inference. To learn more about setting these up manually read the following page.

Basic usage​

To learn about this feature, try these lines of code, based on sample image data we've prepared.

curl -O https://superduperdb-public-demo.s3.amazonaws.com/images.zip && unzip images.zip
import os
import PIL.Image

from superduper import superduper

db = superduper('mongomock://test')

images = [PIL.Image.open(f'images/{x}') for x in os.listdir('images') if x.endswith('.png')]

# inserts the images into `db` recognizing datatypes automatically
db['images'].insert_many([{'img': img} for img in images]).execute()

Now if you inspect which components are available, you will see that 2 components have been added to the system:

db.show()
Outputs
[{'identifier': 'pil_image', 'type_id': 'datatype'},
{'identifier': 'AUTO:img=pil_image', 'type_id': 'schema'}]

To verify that the data types were correctly inferred, we can retrieve a single record. The record is a Document which wraps a dictionary with important information:

r = db['images'].find_one().execute()
r
Outputs
Document({'img': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=500x338 at 0x128394190>, '_fold': 'train', '_schema': 'AUTO:img=pil_image', '_id': ObjectId('6658610912e50a99219ba587')})

By calling the .unpack() method, the original data is decoded and unwrapped from the Document. The result in this case is a Python pillow image, which may be used as direct input to functions from, for instance, torchvision or transformers.

r.unpack()['img']
Outputs