Basic RAG tutorial
In this tutorial we show you how to do retrieval augmented generation (RAG) with Superduper. Note that this is just an example of the flexibility and power which Superduper gives to developers. Superduper is about much more than RAG and LLMs.
As in the vector-search tutorial, we'll use the Superduper documentation as our corpus. We'll add it to a testing database by downloading the data snapshot:
!curl -O https://superduperdb-public-demo.s3.amazonaws.com/text.json
import json

from superduper import superduper, Document

db = superduper('mongomock://test')

with open('text.json') as f:
    data = json.load(f)

_ = db['docu'].insert_many([{'txt': r} for r in data]).execute()
Let's verify the data in the database by querying one datapoint:
db['docu'].find_one().execute()
The first step in a RAG application is to create a VectorIndex. The results of searching with this index will be used as input to the LLM for answering questions.

Read about VectorIndex here and follow along with the tutorial on vector-search here.
import requests

from superduper import Application, Document, VectorIndex, Listener, vector
from superduper.ext.sentence_transformers.model import SentenceTransformer
from superduper.base.code import Code

# Convert the numpy output of the embedding model to a plain list
def postprocess(x):
    return x.tolist()

# Declare the datatype of the embeddings (384 dimensions for all-MiniLM-L6-v2)
datatype = vector(shape=384, identifier="my-vec")

# Wrap the sentence-transformers embedding model as a Superduper component
model = SentenceTransformer(
    identifier="my-embedding",
    datatype=datatype,
    predict_kwargs={"show_progress_bar": True},
    signature="*args,**kwargs",
    model="all-MiniLM-L6-v2",
    device="cpu",
    postprocess=Code.from_object(postprocess),
)

# The listener computes embeddings for the 'txt' field of every document in 'docu'
listener = Listener(
    identifier="my-listener",
    model=model,
    key='txt',
    select=db['docu'].find(),
    predict_kwargs={'max_chunk_size': 50},
)

# The vector index makes the listener's outputs searchable by cosine similarity
vector_index = VectorIndex(
    identifier="my-index",
    indexing_listener=listener,
    measure="cosine",
)

db.apply(vector_index)
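Before wiring the index into an LLM, it can help to sanity-check it with a standalone vector search. The snippet below is a minimal sketch: the query text and the n=5 setting are arbitrary illustrative choices, and the query uses the same .like(...).find() pattern that the RetrievalPrompt relies on later.

# Optional sanity check: run a vector search directly against the new index.
# The query text here is just an illustrative example.
results = db['docu'].like(
    Document({'txt': 'How do I set up vector search?'}),
    vector_index='my-index',
    n=5,
).find().execute()

for r in results:
    print(r['txt'][:100])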
Now that we've set up a VectorIndex, we can connect this index with an LLM in a number of ways. A simple way to do that is with the SequentialModel. The first part of the SequentialModel executes a query and provides the results to the LLM in the second part.

The RetrievalPrompt component takes a query with a "free" variable as input, signified with <var:???>. This gives users great flexibility with regard to how they fetch the context for their downstream models.
We're using OpenAI, but you can use any type of LLM with Superduper. We have several native integrations (see here) but you can also bring your own model.
from superduper.ext.llm.prompter import *
from superduper import Document
from superduper.components.model import SequentialModel
from superduper.ext.openai import OpenAIChatCompletion

# Vector search for the documents nearest to the (as-yet-unbound) prompt variable
q = db['docu'].like(Document({'txt': '<var:prompt>'}), vector_index='my-index', n=5).find().limit(10)

# Extract the raw text from the retrieved documents
def get_output(c):
    return [r['txt'] for r in c]

# The prompt template runs the query and inserts the results as context for the LLM
prompt_template = RetrievalPrompt('my-prompt', select=q, postprocess=Code.from_object(get_output))

llm = OpenAIChatCompletion('gpt-3.5-turbo')
seq = SequentialModel('rag', models=[prompt_template, llm])

db.apply(seq)
Now we can test the SequentialModel with a sample question:
seq.predict('Tell me about vector-indexes')
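As noted above, you aren't tied to OpenAI. The sketch below shows one way to bring your own model by wrapping a plain Python callable; it assumes ObjectModel is available from superduper.components.model (as in other Superduper examples), and my_local_llm is a hypothetical stand-in for whatever inference code you use.

from superduper.components.model import ObjectModel

# Hypothetical stand-in for your own inference code (local model, another API, ...)
def my_local_llm(prompt: str) -> str:
    # Replace this with a real call to your model of choice
    return "This is where your model's answer would go."

my_llm = ObjectModel(identifier='my-llm', object=my_local_llm)

# The same prompt template can feed retrieved context into your own model
seq_local = SequentialModel('rag-local', models=[prompt_template, my_llm])
db.apply(seq_local)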
Did you know you can use any tools from the Python ecosystem with Superduper? That includes langchain and llamaindex, which can be very useful for RAG applications. Everything we've built so far, together with any such plugins, can be bundled into a single Application component and exported for reuse:
from superduper import Application

# plugin_1 and plugin_2 are placeholders for any extra components you want to ship
# with the app (for example, wrappers around langchain or llamaindex tools),
# defined elsewhere.
app = Application('rag-app', components=[vector_index, seq, plugin_1, plugin_2])

app.encode()
app.export('rag-app')
The export writes the application's definition along with its Python requirements, which you can inspect:

!cat rag-app/requirements.txt
You can read the exported application back and inspect its components:

from superduper import *

app = Component.read('rag-app')
app.info()
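The exported directory can then be shipped and re-applied elsewhere. A minimal sketch, assuming the target datastore (here another hypothetical mongomock URI) already contains a 'docu' collection with the same 'txt' field:

from superduper import superduper, Component

# Hypothetical target deployment; swap in your real database URI
db_prod = superduper('mongomock://prod')

app = Component.read('rag-app')
db_prod.apply(app)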