Version: Main branch

Superduper Protocol

Superduper includes a protocol that allows developers to switch back and forth between Python and YAML/JSON formats. The mapping is fairly self-explanatory after reading the examples below.

Writing in Superduper-protocol directly

YAML

_base: "?my_vector_index"
_leaves:
  postprocess:
    _path: superduper.base.code.Code
    code: |
      from superduper import code

      @code
      def postprocess(x):
          return x.tolist()
  my_vector:
    _path: superduper.components.vector_index.vector
    shape: 384
  sentence_transformer:
    _path: superduper.ext.sentence_transformers.model.SentenceTransformer
    datatype: "?my_vector"
    model: "all-MiniLM-L6-v2"
    postprocess: "?postprocess"
  my_query:
    _path: superduper.backends.mongodb.query.parse_query
    query: "documents.find()"
  my_listener:
    _path: superduper.components.listener.Listener
    model: "?sentence_transformer"
    select: "?my_query"
    key: "X"
  my_vector_index:
    _path: superduper.components.vector_index.VectorIndex
    indexing_listener: "?my_listener"
    measure: cosine

Then from the command line:

superduper apply --manifest='<path_to_config>.yaml'

JSON

{
  "_base": "?my_vector_index",
  "_leaves": {
    "postprocess": {
      "_path": "superduper.base.code.Code",
      "code": "from superduper import code\n\n@code\ndef postprocess(x):\n    return x.tolist()"
    },
    "my_vector": {
      "_path": "superduper.components.vector_index.vector",
      "shape": 384
    },
    "sentence_transformer": {
      "_path": "superduper.ext.sentence_transformers.model.SentenceTransformer",
      "datatype": "?my_vector",
      "model": "all-MiniLM-L6-v2",
      "postprocess": "?postprocess"
    },
    "my_query": {
      "_path": "superduper.backends.mongodb.query.parse_query",
      "query": "documents.find()"
    },
    "my_listener": {
      "_path": "superduper.components.listener.Listener",
      "model": "?sentence_transformer",
      "select": "?my_query",
      "key": "X"
    },
    "my_vector_index": {
      "_path": "superduper.components.vector_index.VectorIndex",
      "indexing_listener": "?my_listener",
      "measure": "cosine"
    }
  }
}

Then from the command line:

superduper apply --manifest='<path_to_config>.json'
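
Because every "?name" value in a manifest points at an entry under _leaves, a misspelled reference is an easy mistake to make when writing the protocol by hand. The sketch below is one way to catch that before running the apply command. It uses only the Python standard library and is not part of Superduper; the helper names find_refs and missing_refs are hypothetical, and it assumes that any string starting with "?" is a reference, as the examples above suggest.

```python
import json


def find_refs(value):
    """Yield every "?name" reference found anywhere inside value."""
    if isinstance(value, str):
        if value.startswith("?"):
            yield value[1:]
    elif isinstance(value, dict):
        for item in value.values():
            yield from find_refs(item)
    elif isinstance(value, list):
        for item in value:
            yield from find_refs(item)


def missing_refs(manifest):
    """Return the referenced names that have no entry under _leaves."""
    leaves = manifest.get("_leaves", {})
    refs = set(find_refs(manifest.get("_base"))) | set(find_refs(leaves))
    return sorted(refs - set(leaves))


# Trimmed manifest with a deliberate typo: "?my_qurey" instead of "?my_query".
manifest = json.loads("""
{
  "_base": "?my_vector_index",
  "_leaves": {
    "my_query": {
      "_path": "superduper.backends.mongodb.query.parse_query",
      "query": "documents.find()"
    },
    "my_listener": {
      "_path": "superduper.components.listener.Listener",
      "select": "?my_qurey"
    },
    "my_vector_index": {
      "_path": "superduper.components.vector_index.VectorIndex",
      "indexing_listener": "?my_listener"
    }
  }
}
""")

print(missing_refs(manifest))  # → ['my_qurey']
```

An empty list means every reference resolves to a defined leaf.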

Python

from superduper import superduper
from superduper.base.code import Code
from superduper.components.listener import Listener
from superduper.components.vector_index import VectorIndex, vector
from superduper.ext.sentence_transformers.model import SentenceTransformer


db = superduper('mongomock://')

datatype = vector(shape=384, identifier="my-vec")


def postprocess(x):
    return x.tolist()


postprocess = Code.from_object(postprocess)


model = SentenceTransformer(
    identifier="test",
    datatype=datatype,
    predict_kwargs={"show_progress_bar": True},
    signature="*args,**kwargs",
    model="all-MiniLM-L6-v2",
    device="cpu",
    postprocess=postprocess,
)

listener = Listener(
    identifier="my-listener",
    key="txt",
    model=model,
    select=db['documents'].find(),
    active=True,
    predict_kwargs={}
)

vector_index = VectorIndex(
    identifier="my-index",
    indexing_listener=listener,
    measure="cosine"
)

db.apply(vector_index)
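
The three forms above describe the same graph of objects: each _path names an importable Python callable, the remaining keys become its keyword arguments, and "?name" values point at other leaves. The following is a minimal sketch of that mapping, not Superduper's actual decoder. It assumes _path is a dotted import path (as the examples suggest), uses types.SimpleNamespace from the standard library as a stand-in for real components so it runs without Superduper installed, and the helpers import_path and build are hypothetical names. It does not guard against circular references.

```python
import importlib


def import_path(path):
    """Import the object named by a dotted path such as "package.module.Name"."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def build(manifest):
    """Instantiate the object named by _base, building referenced leaves first."""
    leaves = manifest["_leaves"]
    built = {}  # cache so a leaf referenced twice is only built once

    def resolve(value):
        # A "?name" string stands for the object built from that leaf.
        if isinstance(value, str) and value.startswith("?"):
            return build_leaf(value[1:])
        return value

    def build_leaf(name):
        if name not in built:
            spec = leaves[name]
            factory = import_path(spec["_path"])
            kwargs = {k: resolve(v) for k, v in spec.items() if k != "_path"}
            built[name] = factory(**kwargs)
        return built[name]

    return resolve(manifest["_base"])


# Stand-in manifest: SimpleNamespace accepts arbitrary keyword arguments,
# so it can play the role of any component class for illustration.
demo = {
    "_base": "?index",
    "_leaves": {
        "model": {"_path": "types.SimpleNamespace", "name": "all-MiniLM-L6-v2"},
        "listener": {"_path": "types.SimpleNamespace", "model": "?model", "key": "X"},
        "index": {
            "_path": "types.SimpleNamespace",
            "indexing_listener": "?listener",
            "measure": "cosine",
        },
    },
}

index = build(demo)
print(index.indexing_listener.model.name)  # → all-MiniLM-L6-v2
```

Building leaves on demand from _base means the order of entries under _leaves does not matter in this sketch.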

Converting a `Component` to Superduper-protocol

All components may be converted to Superduper-protocol using the Component.encode method:

encoding = vector_index.encode()

This encoding may be written directly to disk with:

vector_index.export(zip=True)  # outputs to "./my-index.zip"
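
Since the export is an ordinary zip file, its contents can be listed with Python's standard zipfile module before reloading it. The sketch below assumes nothing about the archive's internal layout, which is not described here: list_archive is a hypothetical helper, and the stand-in archive and its member name exist only so the example runs without Superduper installed.

```python
import tempfile
import zipfile
from pathlib import Path


def list_archive(path):
    """Return the names of the files stored in a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Stand-in archive so the sketch is runnable on its own.
# The member name is a placeholder, not the real export layout.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = Path(tmp) / "my-index.zip"
    with zipfile.ZipFile(stand_in, "w") as archive:
        archive.writestr("placeholder.json", "{}")
    names = list_archive(stand_in)

print(names)  # → ['placeholder.json']
```

With a real export, the call would be list_archive("./my-index.zip").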

Developers may reload components from disk with Component.read:

from superduper import Component

reloaded = Component.read('./my-index.zip')
