Version: 0.6

Versioning and diffs

superduper includes a versioning and lineage system that tracks the versions and changes of all components installed with superduper. It relies on cryptographic hashing techniques borrowed from blockchain and source version control.

Here is an illustrative example to show you how this works:

from superduper import Component
import typing as t
import pprint

class MyClass(Component):
    breaks = ()
    a: str
    b: int
    c: t.Dict
    d: t.Callable | None = None
    e: Component | None = None

my_instance_1 = MyClass('my_class', a='test', b=2, c={'testing': '123'}, d=lambda x: x + 1)
my_instance_2 = MyClass('my_class', a='test', b=2, c={'testing': '456'}, d=lambda x: x + 2)

superduper relies on three key methods under the hood:

| Method | Description |
| --- | --- |
| Component.diff | Determines which parameters of the Component have changed |
| Component.hash | Determines whether 2 Component instances are the same by parameter value |
| Component.uuid | Determines whether 2 Component instances are the same by breaking changes; used as primary-id in storage |

superduper calls these methods when db.apply is executed, and uses the results to determine whether to replace or update data, or to create a new version of the Component and execute its initialization jobs.
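The decision described above can be pictured with a small sketch. This is not superduper's code: the `Action` enum, the `decide` function, and the exact mapping from comparisons to outcomes are assumptions made for illustration, based only on the description of `.hash` and `.uuid` on this page.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical outcomes; the names are illustrative only.
    SKIP = "skip"                # nothing changed
    UPDATE = "update"            # only non-breaking parameters changed
    NEW_VERSION = "new_version"  # at least one breaking parameter changed


def decide(stored_hash: str, stored_uuid: str, new_hash: str, new_uuid: str) -> Action:
    """Assumed decision rule: compare the full hash first, then the uuid."""
    if stored_hash == new_hash:
        return Action.SKIP
    if stored_uuid == new_uuid:
        return Action.UPDATE
    return Action.NEW_VERSION


print(decide("h1", "u1", "h1", "u1"))  # Action.SKIP
print(decide("h1", "u1", "h2", "u1"))  # Action.UPDATE
print(decide("h1", "u1", "h2", "u2"))  # Action.NEW_VERSION
```

Under this assumed rule, a new version (and its initialization jobs) is only triggered when the uuid changes.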

The .diff method returns the parameters in which the two Component instances differ:

print(my_instance_1.diff(my_instance_2))
# ['c', 'd']
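To make the idea of a parameter-level diff concrete, here is a minimal standalone sketch using only the standard library. `ToyComponent` is a hypothetical stand-in, not the superduper `Component` class, and comparing fields with `!=` is an assumption about how such a diff could work.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class ToyComponent:
    # Hypothetical stand-in for a Component; not the superduper class.
    identifier: str
    a: str
    b: int
    c: Dict[str, Any]

    def diff(self, other: "ToyComponent") -> List[str]:
        # Collect the names of the fields whose values are not equal.
        return [
            f.name
            for f in fields(self)
            if getattr(self, f.name) != getattr(other, f.name)
        ]


x = ToyComponent("my_class", a="test", b=2, c={"testing": "123"})
y = ToyComponent("my_class", a="test", b=3, c={"testing": "456"})
print(x.diff(y))  # ['b', 'c']
```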

The cryptographic hash .hash determines whether the two components are equal by parameter value. In this case two parameters differ, so the hashes are distinct:

print(my_instance_1.hash == my_instance_2.hash)
# False

The cryptographic hash .uuid determines whether the two components are equal when only breaking changes are considered. Because MyClass declares no breaking parameters (breaks = ()), the two uuids are identical:

print(my_instance_1.uuid)
# dbe131726b2b2fb896eb832b3fde10df

print(my_instance_2.uuid == my_instance_1.uuid)
# True

Now we define a new component class that declares breaking parameters:

from superduper import Component
import typing as t
import pprint

class BreakingClass(Component):
    breaks = ('c', 'e')

    a: str
    b: int
    c: t.Dict
    d: t.Callable | None = None
    e: Component | None = None

my_breakable_1 = BreakingClass('my_class', a='test', b=2, c={'testing': '123'}, d=lambda x: x + 1)
my_breakable_2 = BreakingClass('my_class', a='test', b=2, c={'testing': '456'}, d=lambda x: x + 2)

Here the parameter c is listed in breaks and its value differs between the two instances, so the uuids differ:

print(my_breakable_1.uuid == my_breakable_2.uuid)
# False
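The contrast between a full hash and a hash restricted to breaking parameters can be sketched with `hashlib`. The functions `digest`, `full_hash`, and `breaking_uuid` are hypothetical names, and the choice of SHA-256 over sorted JSON is an assumption; superduper's actual hashing scheme may differ.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def digest(payload: Dict[str, Any]) -> str:
    # Serialize with sorted keys so equal payloads always give equal digests.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:32]


def full_hash(params: Dict[str, Any]) -> str:
    # Every parameter contributes, so any change alters the result.
    return digest(params)


def breaking_uuid(identifier: str, params: Dict[str, Any], breaks: Tuple[str, ...]) -> str:
    # Only parameters named in `breaks` contribute (assumed rule).
    selected = {name: params[name] for name in breaks if name in params}
    return digest({"identifier": identifier, "breaks": selected})


p1 = {"a": "test", "b": 2, "c": {"testing": "123"}}
p2 = {"a": "test", "b": 2, "c": {"testing": "456"}}

print(full_hash(p1) == full_hash(p2))  # False
print(breaking_uuid("my_class", p1, ()) == breaking_uuid("my_class", p2, ()))  # True
print(breaking_uuid("my_class", p1, ("c",)) == breaking_uuid("my_class", p2, ("c",)))  # False
```

With an empty `breaks` tuple the changed value of `c` is invisible to the uuid; once `c` is declared breaking, the same change produces a different uuid.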

This also works recursively, so that breaking changes inside nested components propagate upwards:

my_breakable_3 = BreakingClass('my_class', a='test', b=2, c={'testing': '123'}, e=my_breakable_1)
my_breakable_4 = BreakingClass('my_class', a='test', b=2, c={'testing': '123'}, e=my_breakable_2)
print(my_breakable_3.diff(my_breakable_4))
# ['e']

However, if the nested components differ only in non-breaking parameters, the .uuid of the parent is unchanged:

my_breakable_5 = BreakingClass('my_class', a='test', b=2, c={'testing': '123'}, e=my_instance_1)
my_breakable_6 = BreakingClass('my_class', a='test', b=2, c={'testing': '123'}, e=my_instance_2)
print(my_breakable_5.uuid == my_breakable_6.uuid)
# True
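One way to picture this recursive behaviour is a toy model in which a parent folds the uuid of its nested component, rather than its full contents, into its own uuid. `ToyNode` and its fields are hypothetical, and the hashing scheme is an assumption for illustration, not superduper's implementation.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class ToyNode:
    # Hypothetical stand-in for a Component with one nested component.
    identifier: str
    params: Dict[str, Any]
    breaks: Tuple[str, ...] = ()
    child: Optional["ToyNode"] = None

    @property
    def uuid(self) -> str:
        payload = {
            "identifier": self.identifier,
            # Only parameters declared as breaking contribute.
            "breaks": {k: self.params[k] for k in self.breaks if k in self.params},
            # The nested component contributes its uuid, not its full contents.
            "child": self.child.uuid if self.child is not None else None,
        }
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()[:32]


def outer(child: ToyNode) -> ToyNode:
    # Build identical parents that differ only in their nested component.
    return ToyNode("outer", {"c": "123"}, breaks=("c",), child=child)


plain_1 = ToyNode("inner", {"c": "123"})
plain_2 = ToyNode("inner", {"c": "456"})
strict_1 = ToyNode("inner", {"c": "123"}, breaks=("c",))
strict_2 = ToyNode("inner", {"c": "456"}, breaks=("c",))

print(outer(plain_1).uuid == outer(plain_2).uuid)    # True
print(outer(strict_1).uuid == outer(strict_2).uuid)  # False
```

In this sketch a change inside the nested node reaches the parent only when the nested node itself treats the changed parameter as breaking.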