Architecture
Here is a schematic of the Superduper design.
Explanation
- Superduper expects data and components to be added or updated through a range of client-side mechanisms: scripts, apps, notebooks, or third-party database clients (possibly non-Python).
- Users and programs can add components (models, data encoders, vector indexes, and more) from the client side. These large items are stored in the artifact-store and tracked via the metadata-store.
- When data is inserted into the databackend, the change-data-capture (CDC) component captures these changes as they stream in.
- The CDC component triggers work to be performed in response to these changes, depending on which components are present in the system.
- The work is submitted to the workers via the scheduler. Together, the scheduler and workers make up the compute layer.
- Workers write their outputs back to the databackend and save trained models to the artifact-store.
- The compute layer, databackend, metadata-store, and artifact-store collectively make up the datalayer.
- The datalayer can be queried from the client side, including with hybrid queries or compound-select queries, which combine classical selects with vector searches.
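
The flow described in the list above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not the Superduper API: `ToyDatalayer`, `add_component`, `insert`, `run_workers`, `hybrid_query`, and the `embed` function are all hypothetical names invented for illustration. In-memory lists and dicts stand in for the real databackend and stores, and a simple queue stands in for the scheduler.

```python
import math
from collections import deque
from dataclasses import dataclass, field


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


@dataclass
class ToyDatalayer:
    """Hypothetical stand-in for the datalayer; not the Superduper API."""
    databackend: list = field(default_factory=list)     # records plus worker outputs
    metadata_store: dict = field(default_factory=dict)  # tracks registered components
    artifact_store: dict = field(default_factory=dict)  # holds the large objects
    scheduler: deque = field(default_factory=deque)     # queue of pending jobs

    def add_component(self, name, fn):
        # The component itself goes to the artifact-store; the metadata-store tracks it.
        self.artifact_store[name] = fn
        self.metadata_store[name] = {"kind": "model"}

    def insert(self, records):
        self.databackend.extend(records)
        self._on_change(records)  # plays the role of CDC

    def _on_change(self, records):
        # Which jobs get scheduled depends on which components are present.
        for name in self.metadata_store:
            self.scheduler.append((name, records))

    def run_workers(self):
        # Workers take jobs from the scheduler and write outputs back to the records.
        while self.scheduler:
            name, records = self.scheduler.popleft()
            fn = self.artifact_store[name]
            for record in records:
                record.setdefault("_outputs", {})[name] = fn(record)

    def hybrid_query(self, where, like, component, n=1):
        # Classical select first (the `where` filter), then rank by vector similarity.
        matches = [
            r for r in self.databackend
            if where(r) and component in r.get("_outputs", {})
        ]
        matches.sort(key=lambda r: cosine(r["_outputs"][component], like), reverse=True)
        return matches[:n]


def embed(record):
    """Hypothetical 'model': a 2-d vector of vowel and consonant counts."""
    text = record["txt"]
    vowels = sum(ch in "aeiou" for ch in text)
    return [vowels, len(text) - vowels]


db = ToyDatalayer()
db.add_component("embed", embed)
db.insert([
    {"txt": "tree", "lang": "en"},
    {"txt": "sky", "lang": "en"},
    {"txt": "eau", "lang": "fr"},
])
db.run_workers()

top = db.hybrid_query(where=lambda r: r["lang"] == "en", like=[3, 0], component="embed")
print(top[0]["txt"])  # prints "tree"
```

In this sketch the record `"eau"` is the closest vector match to `[3, 0]`, but the `lang == "en"` filter excludes it, so the query returns `"tree"`. That is the sense in which a hybrid query combines a classical select with a vector search.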