Multimodal vector search - images

APPLY = False  # set to True to actually run the pipeline against the database
from superduper import superduper

db = superduper('mongomock:///test_db')
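The mongomock URI gives an in-memory database that is convenient for testing. For a persistent setup you would point superduper at a real backend; the connection string below is an illustrative assumption, so adjust it to your deployment.

# Hypothetical alternative for a persistent setup (not used in this walkthrough):
# db = superduper('mongodb://localhost:27017/test_db')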

Get useful sample data

def getter():
    import subprocess
    # Download and unpack the sample image dataset.
    subprocess.run([
        'curl', '-O', 'https://superduperdb-public-demo.s3.amazonaws.com/images_classification.zip',
    ])
    subprocess.run(['rm', '-rf', 'images'])
    subprocess.run(['rm', '-rf', '__MACOSX'])
    subprocess.run(['unzip', 'images_classification.zip'])
    subprocess.run(['rm', 'images_classification.zip'])
    import json
    from PIL import Image
    with open('images/images.json', 'r') as f:
        data = json.load(f)
    # Keep the first 100 records and load each image with PIL.
    data = data[:100]
    data = [{'img': Image.open(r['image_path'])} for r in data]
    subprocess.run(['rm', '-rf', '__MACOSX'])
    subprocess.run(['rm', '-rf', 'images'])
    return data

if APPLY:
    data = getter()

Build multimodal embedding models

We define the models' output datatype as a vector, so that their outputs can be stored and used for vector search.

from superduper.components.vector_index import sqlvector

output_datatype = sqlvector(shape=(1024,))  # 1024 matches the embedding dimension of CLIP RN50
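If you are not targeting an SQL-compatible backend, superduper also provides a generic vector datatype with the same shape argument; the one-liner below is a sketch of that alternative and is not used in the rest of the walkthrough.

# from superduper import vector
# output_datatype = vector(shape=(1024,))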

Then we define two models: one for text embedding and one for image embedding.

import clip
from superduper import imported
from superduper_torch import TorchModel

# clip.load returns a (model, preprocess) pair.
rn50 = imported(clip.load)('RN50', device='cpu')

compatible_model = TorchModel(
    identifier='clip_text',
    object=rn50[0],
    preprocess=lambda x: clip.tokenize(x)[0],
    postprocess=lambda x: x.tolist(),
    datatype=output_datatype,
    forward_method='encode_text',
)

embedding_model = TorchModel(
    identifier='clip_image',
    object=rn50[0].visual,
    preprocess=rn50[1],
    postprocess=lambda x: x.tolist(),
    datatype=output_datatype,
)
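As an optional sanity check, you can call the models directly before wiring them into an index. This sketch assumes superduper's standard Model.predict interface and that the sample data from above has been loaded; both embeddings should land in the same 1024-dimensional CLIP space.

if APPLY:
    # Text and image are mapped into the same 1024-dimensional space.
    text_vector = compatible_model.predict('a black dog')
    image_vector = embedding_model.predict(data[0]['img'])
    assert len(text_vector) == len(image_vector) == 1024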

Because the two models embed different modalities, we define separate keys specifying which field each model reads when the vector_index computes embeddings.

indexing_key = 'img'
compatible_key = 'text'

Create vector-index

from superduper import VectorIndex, Listener

vector_index_name = 'my-vector-index'

vector_index = VectorIndex(
    vector_index_name,
    # Computes image embeddings for every record in the 'docs' table.
    indexing_listener=Listener(
        key=indexing_key,
        select=db['docs'].select(),
        model=embedding_model,
        identifier='indexing-listener',
    ),
    # Embeds incoming text queries at search time; select=None means this
    # listener is not attached to stored data.
    compatible_listener=Listener(
        key=compatible_key,
        model=compatible_model,
        select=None,
        identifier='compatible-listener',
    )
)
from superduper import Application

application = Application(
    'image-vector-search',
    components=[vector_index],
)

if APPLY:
    db.apply(application, force=True)
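To confirm that everything was registered, you can list the components now known to the database; this assumes superduper's db.show helper.

if APPLY:
    db.show()  # lists registered components, including the vector index and its listeners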

Add the data

The order in which data is added is not important. However, if your data requires a custom Schema in order to work, it's easier to add the Application first and the data later. The advantage of this flexibility is that once the Application is installed, it is waiting for incoming data, so the Application is always up-to-date. This comes in particularly handy in AI scenarios which need to respond to changing news.

if APPLY:
    from superduper import Document

    table_or_collection = db['docs']

    ids = db.execute(table_or_collection.insert([Document(r) for r in data]))

We can perform vector searches with two types of query:

  • Text: provide a text description and find images that match it.
  • Image: provide an image and find visually similar images.
# Option 1: search by text description.
if APPLY:
    item = Document({compatible_key: "Find a black dog."})

# Option 2: search by image (this overwrites the text query above).
if APPLY:
    from IPython.display import display
    search_image = data[0]
    display(search_image)
    item = Document(search_image)

Once we have this search target, we can execute a search as follows.

if APPLY:
    select = db['docs'].like(item, vector_index=vector_index_name, n=5).select()

    results = list(db.execute(select))

    from IPython.display import display
    for result in results:
        display(result[indexing_key])

Create a Template

from superduper import Template, Table, Schema
from superduper.components.dataset import RemoteData
from superduper_pillow import pil_image


template = Template(
    'multimodal_image_search',
    template=application,
    # A default table with a PIL-image schema and remotely fetched sample data.
    default_table=Table(
        'sample_multimodal_image_search',
        schema=Schema(
            'sample_multimodal_image_search/schema',
            fields={'img': pil_image},
        ),
        data=RemoteData('sample_images', getter=getter),
    ),
    # Turn the hard-coded table name and device into template variables.
    substitutions={'docs': 'table_name', 'cpu': 'device'},
    types={
        'device': {
            'type': 'str',
            'default': 'cpu',
        },
        'table_name': {
            'type': 'str',
            'default': 'sample_multimodal_image_search',
        },
    }
)

template.export('.')  # write the template and its artifacts to the current directory
template.template  # inspect the parameterized application
vector_index.indexing_listener.select  # inspect the query the indexing listener runs over
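To reuse the exported template elsewhere, the usual pattern is to fill in its variables and apply the resulting application. The call below is a sketch that assumes templates can be invoked with their declared variables (table_name and device, from the types dictionary above); check the superduper template docs for your version.

# Hypothetical usage of the template (variable names come from 'types' above):
# app = template(table_name='docs', device='cpu')
# db.apply(app, force=True)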