Apply a chunker for search

Note

Applying a chunker is not mandatory for search. If your data is already chunked (e.g. short text snippets or audio clips), or if you are searching through data such as images, which can't be chunked, you can skip this step.

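from superduper import model

# Maximum number of whitespace-separated words per chunk
CHUNK_SIZE = 200

@model(flatten=True, model_update_kwargs={'document_embedded': False})
def chunker(text):
    # Split the text into words, then regroup them into
    # consecutive chunks of at most CHUNK_SIZE words each.
    words = text.split()
    chunks = [' '.join(words[i:i + CHUNK_SIZE]) for i in range(0, len(words), CHUNK_SIZE)]
    return chunks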
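
The splitting logic inside the chunker can be tried on its own before it is wired into superduper. The following is a minimal sketch in plain Python, with no superduper dependency; the names `chunk_words` and `sample` are hypothetical and used only for illustration, and the chunk size is set to 3 so the result is easy to inspect.

```python
def chunk_words(text, chunk_size=200):
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    return [
        ' '.join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# Hypothetical sample input, chosen only to make the output easy to read.
sample = "one two three four five six seven"
chunks = chunk_words(sample, chunk_size=3)
print(chunks)  # ['one two three', 'four five six', 'seven']
```

Note that this approach counts words rather than characters or tokens, and the last chunk may be shorter than the others. Chunks do not overlap, so a sentence that straddles a chunk boundary is split between two chunks.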

Now we apply this chunker to the data by wrapping it in a Listener:

from superduper import Listener

upstream_listener = Listener(
    model=chunker,
    select=select,
    key='x',
    uuid="chunk",
)

db.apply(upstream_listener)