Versioning and diffs
superduper has a robust versioning and lineage system, to track the versions and changes of all components
installed with superduper. This works using cryptographic techniques borrowed from blockchain and source version control.
Here is an illustrative example to show you how this works:
from superduper import Component
import typing as t
import pprint
class MyClass(Component):
breaks = ()
a: str
b: int
c: t.Dict
d: t.Callable | None = None
e: Component | None = None
my_instance_1 = MyClass('my_class', a='test', b=2, c={'testing': '123'}, d=lambda x: x + 1)
my_instance_2 = MyClass('my_class', a='test', b=2, c={'testing': '456'}, d=lambda x: x + 2)
We have three key methods which superduper leverages under the hood:
| Method | Description |
|---|---|
Component.diff | Determines which parameters of the Component have changed |
Component.hash | Determines whether 2 Component instances are the same by parameter value |
Component.uuid | Determines whether 2 Component instances are the same by breaking changes; used as primary-id in storage. |
superduper calls these methods when db.apply is executed, and used to determine whether to replace or update data, or to create a
new version of the Component and execute its initialization jobs.
The .diff method returns in which parameters the 2 Component instances are different:
print(my_instance_1.diff(my_instance_2))
# ['c', 'd']
The cryptographic hash .hash determines whether the two components are equal. In this case there are 2 parameters
which are different, so that the hashes are distinct:
print(my_instance_1.hash == my_instance_2.hash)
# False
The cryptographic hash .uuid determines whether the two components are equal when only considering breaking changes.
You can see that, since this component has no breaking changes (.breaks = ()), the hashes are identical:
print(my_instance_1.uuid)
# dbe131726b2b2fb896eb832b3fde10df
print(my_instance_2.uuid == my_instance_1.uuid)
# True
Now we create a new component, which has breaking changes:
from superduper import Component
import typing as t
import pprint
class BreakingClass(Component):
breaks = ('c', 'e')
a: str
b: int
c: t.Dict
d: t.Callable | None = None
e: Component | None = None
my_breakable_1 = BreakingClass('my_class', a='test', b=2, c={'testing': '123'}, d=lambda x: x + 1)
my_breakable_2 = BreakingClass('my_class', a='test', b=2, c={'testing': '456'}, d=lambda x: x + 2)
In this case, you can see that the parameter c differs, so in this case the hashes differ:
print(my_breakable_1.uuid == my_breakable_2.uuid)
# False
This also works recursively, so that breaking changes inside nested components propagate upwards:
my_breakable_3 = BreakingClass('my_class', a='test', b=2, c={'testing': '123'}, e=my_breakable_1)
my_breakable_4 = BreakingClass('my_class', a='test', b=2, c={'testing': '123'}, e=my_breakable_2)
print(my_breakable_3.diff(my_breakable_4))
# ['e']
However, if the nested component contains only non-breaking changes, this is respected by the .uuid hash:
my_breakable_5 = BreakingClass('my_class', a='test', b=2, c={'testing': '123'}, e=my_instance_1)
my_breakable_6 = BreakingClass('my_class', a='test', b=2, c={'testing': '123'}, e=my_instance_2)
my_breakable_5.uuid == my_breakable_6.uuid
# True