hassila
(Joakim Hassila)
1
Just a micro framework for unique identifier generation that might be of use to someone else. It's inspired by Snowflake but makes slightly different tradeoffs, and generates ~125M identifiers per second on an M1.
Usage
The generatorIdentifier must be unique at any given point in time: either it needs to be set with a configuration file / persisted, or a global broker needs to assign it at runtime to the components that need flake generators, so that the same identifier is never used concurrently.
```swift
import Frostflake

func testFrostflake() {
    // Each generator must be created with its own unique generatorIdentifier
    let frostflakeFactory = Frostflake(generatorIdentifier: 1)
    let frostflake = frostflakeFactory.generate()
    let description = frostflake.frostflakeDescription()
    print(description)
}
```
There's also an optional shared class generator (which gives roughly half the performance):
```swift
Frostflake.setup(generatorIdentifier: 1)
let frostflake1 = Frostflake.generate()
let frostflake2 = Frostflake.generate()
```
Karl
(👑🦆)
2
Looks interesting!
I'm slightly concerned about this though:
One key difference compared to Snowflake is that Frostflake uses a frozen point in time repeatedly until running out of generation identifier for it, which avoids getting the current time for every id generated - it will update that frozen time point for every 1K generated identifiers (by default)
If a single timestamp can be used for up to 1000 IDs, no matter how far apart they are in time, wouldn't that significantly increase the probability of a collision? Especially if the clocks are not very high precision.
I think the disclaimer could be clearer about that. For average developers who have never considered how unique IDs are generated, the higher performance may not be a good trade-off compared to the increased (and less predictable) risk of collisions. It is something that must be considered very carefully.
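For reference, the batching described in the quoted README text can be sketched as follows. This is a hypothetical illustration, not Frostflake's source: the class name, the Unix-seconds epoch, the field order and the exact refresh rule are assumptions, while the 1K refresh interval comes from the quote and the 32/21/11 bit split from the implementation notes later in the thread.

```swift
import Foundation

// Minimal sketch of "frozen" timestamp batching. Hypothetical: not Frostflake's code.
// Assumed layout: 32-bit Unix seconds | 21-bit sequence | 11-bit generator identifier.
final class BatchedTimestampGenerator {
    static let sequenceBits = 21
    static let generatorBits = 11
    static let maxSequence: UInt64 = 1 << sequenceBits   // 2,097,152 ids per second
    static let refreshInterval: UInt64 = 1_000             // re-read the clock every 1K ids

    private let generatorIdentifier: UInt64
    private var frozenSeconds: UInt64
    private var sequence: UInt64 = 0

    init(generatorIdentifier: UInt64) {
        precondition(generatorIdentifier < 1 << Self.generatorBits, "must fit in 11 bits")
        self.generatorIdentifier = generatorIdentifier
        self.frozenSeconds = Self.currentSeconds()
    }

    private static func currentSeconds() -> UInt64 {
        UInt64(Date().timeIntervalSince1970) & 0xFFFF_FFFF
    }

    func generate() -> UInt64 {
        // Consult the clock only every `refreshInterval` ids, or when the sequence is used up.
        if sequence % Self.refreshInterval == 0 || sequence == Self.maxSequence {
            let now = Self.currentSeconds()
            if now > frozenSeconds {   // a new second: restart the sequence
                frozenSeconds = now
                sequence = 0
            }                          // a clock that moved backwards is ignored
        }
        // Abort rather than reuse a (seconds, sequence) pair.
        guard sequence < Self.maxSequence else {
            fatalError("more than 2^21 identifiers requested within one second")
        }
        defer { sequence += 1 }
        return (frozenSeconds << (Self.sequenceBits + Self.generatorBits))
            | (sequence << Self.generatorBits)
            | generatorIdentifier
    }
}

let generator = BatchedTimestampGenerator(generatorIdentifier: 1)
print(String(generator.generate(), radix: 16))
```

Because the cached timestamp only moves forward and the sequence resets only when it does, a single generator never repeats a (timestamp, sequence) pair; a stale timestamp only makes an id less precise as a time marker. Uniqueness across nodes still rests on the generatorIdentifier, which is the point made in the reply below.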
hassila
(Joakim Hassila)
3
A requirement is the assignment of a unique generatorIdentifier, which fundamentally namespaces the identifiers, so there is no risk of a collision. The intended usage is e.g. a cluster system with a few hundred nodes where such an identifier is assigned by a central authority at startup. This is mentioned in the documentation, so as long as the generatorIdentifier is assigned appropriately, collisions shouldn't be an issue.
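The central authority described here can be as simple as a service that leases identifiers to nodes. Below is a minimal in-memory sketch, assuming an 11-bit identifier space; `GeneratorIdentifierBroker`, `acquire(node:)` and `release(node:)` are hypothetical names, not part of Frostflake, and a real cluster would back this with persistent or replicated state rather than a process-local lock.

```swift
import Foundation

/// Hypothetical central authority that leases generator identifiers to nodes.
/// Not part of Frostflake; in-memory only, so it illustrates the rule, not a deployment.
final class GeneratorIdentifierBroker {
    private let lock = NSLock()
    private var free: [UInt16]                    // identifiers not currently leased
    private var leased: [String: UInt16] = [:]    // node name -> leased identifier

    /// The default of 11 bits gives identifiers 0...2047.
    init(identifierBits: Int = 11) {
        precondition((1...16).contains(identifierBits), "identifierBits must be 1...16")
        // Stored in reverse so popLast() hands out the lowest identifier first.
        free = Array((0...UInt16((1 << identifierBits) - 1)).reversed())
    }

    /// Returns the node's identifier, leasing a free one on first request; nil if none are left.
    func acquire(node: String) -> UInt16? {
        lock.lock(); defer { lock.unlock() }
        if let existing = leased[node] { return existing }
        guard let id = free.popLast() else { return nil }
        leased[node] = id
        return id
    }

    /// Returns a node's identifier to the pool. Released identifiers go to the front of
    /// `free`, so they are reused as late as possible.
    func release(node: String) {
        lock.lock(); defer { lock.unlock() }
        if let id = leased.removeValue(forKey: node) {
            free.insert(id, at: 0)
        }
    }
}

let broker = GeneratorIdentifierBroker()
if let id = broker.acquire(node: "node-a") {
    print("node-a uses generatorIdentifier \(id)")
}
```

Reusing released identifiers as late as possible gives the previous owner's clock skew window time to pass before another node picks the same identifier up, in line with the "at a given point in time" requirement.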
Karl
(👑🦆)
4
I see. So when you say:
The generatorIdentifier must be uniquely in use at a given point in time, either it needs to be set with a configuration file / persisted, or a global broker needs to assign it to components that needs flake generators at runtime such that the same identifier is not used concurrently.
By "unique" and "global", you mean across all workers generating IDs.
hassila
(Joakim Hassila)
5
Yep, exactly: at any given point in time (± clock skew between nodes), a given generatorIdentifier may only be used by one worker.
ktoso
(Konrad 'ktoso' Malawski 🐟🏴☠️)
6
ktoso
(Konrad 'ktoso' Malawski 🐟🏴☠️)
7
Quick question. It might be good to provide more details about bit usage: Snowflake uses a node id (or anything, 10 bits), some bits for the timestamp and more for the sequence number to avoid conflicts. It might be good to explain precisely how this library allocates its bits.
hassila
(Joakim Hassila)
8
Thanks @ktoso - I just updated the readme with some clarification:
Implementation notes
A Frostflake is a 64-bit value just like a Snowflake, but the bit allocation differs slightly.
By default, Frostflake allocates 32 bits for the timestamp (a span of ~136 years), 21 bits for the sequence number (allowing up to 2,097,152 identifiers per second for a given generator) and 11 bits for the generator identifier (allowing up to 2,048 unique workers/nodes in a system).
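A sketch of how the default 32/21/11 split could be packed into and unpacked from a `UInt64`. The notes above only give the field widths; the field order (timestamp in the most significant bits, a common choice, as in Snowflake, because ids then sort roughly by creation time) and the `FrostflakeLayout` name are assumptions for illustration. The arithmetic behind the stated ranges: 2^32 seconds ≈ 136 years, 2^21 = 2,097,152, 2^11 = 2,048, and 32 + 21 + 11 = 64.

```swift
/// Hypothetical packing of the default Frostflake split into a UInt64.
/// Field order (timestamp | sequence | generator, high to low) is an assumption.
struct FrostflakeLayout {
    static let timestampBits = 32
    static let sequenceBits = 21
    static let generatorBits = 11

    static let timestampMask: UInt64 = (UInt64(1) << timestampBits) - 1
    static let sequenceMask: UInt64 = (UInt64(1) << sequenceBits) - 1
    static let generatorMask: UInt64 = (UInt64(1) << generatorBits) - 1

    /// Combines the three fields; traps if any field exceeds its bit width.
    static func pack(timestamp: UInt64, sequence: UInt64, generator: UInt64) -> UInt64 {
        precondition(timestamp <= timestampMask && sequence <= sequenceMask
                     && generator <= generatorMask, "field does not fit its bit range")
        return (timestamp << (sequenceBits + generatorBits))
            | (sequence << generatorBits)
            | generator
    }

    /// Splits an identifier back into its three fields.
    static func unpack(_ id: UInt64) -> (timestamp: UInt64, sequence: UInt64, generator: UInt64) {
        (timestamp: (id >> (sequenceBits + generatorBits)) & timestampMask,
         sequence: (id >> generatorBits) & sequenceMask,
         generator: id & generatorMask)
    }
}

let id = FrostflakeLayout.pack(timestamp: 1_700_000_000, sequence: 42, generator: 7)
print(FrostflakeLayout.unpack(id))
```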
A possible future direction would be to let the user choose how the bits are split between the sequence number and the generator identifier, to more easily support different use cases. As long as such a reallocation happens during a service window (which just needs to be longer than the clock difference between the two most out-of-sync nodes in the cluster), the timestamp portion will continue to ensure uniqueness.
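The configurable split suggested as a future direction could look roughly like this: the timestamp keeps its 32 bits and the remaining 32 are divided between the sequence number and the generator identifier. `FlakeBitAllocation` is a hypothetical type, not an existing Frostflake API; `minimumSpacingNanoseconds` shows where the 477 ns limit for the default 21-bit sequence comes from.

```swift
/// Hypothetical configurable split: the timestamp keeps 32 bits and the other
/// 32 bits are divided between sequence number and generator identifier.
struct FlakeBitAllocation {
    let sequenceBits: Int
    let generatorBits: Int

    /// Returns nil unless both parts get at least one bit.
    init?(sequenceBits: Int) {
        guard (1...31).contains(sequenceBits) else { return nil }
        self.sequenceBits = sequenceBits
        self.generatorBits = 32 - sequenceBits
    }

    /// Identifiers a single generator can hand out per second.
    var idsPerSecond: UInt64 { UInt64(1) << sequenceBits }

    /// Number of distinct generator identifiers available.
    var maxGenerators: UInt64 { UInt64(1) << generatorBits }

    /// Average spacing between ids, in nanoseconds, below which the sequence runs out.
    var minimumSpacingNanoseconds: Double { 1e9 / Double(idsPerSecond) }
}

for bits in [18, 21, 24] {
    let a = FlakeBitAllocation(sequenceBits: bits)!
    print("\(bits)/\(a.generatorBits): \(a.idsPerSecond) ids/s per generator, \(a.maxGenerators) generators")
}
```

For example, an 18/14 split gives 262,144 ids per second per generator across up to 16,384 generators, while 24/8 gives 16,777,216 ids per second across 256 generators.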
The current bit allocation is simply tuned for our use case (a large system for us would be < 200 nodes, so we allocated roughly 10x headroom there).
Currently we'll abort if generating more than one identifier per 477 ns on average (that is, more than 2^21 identifiers within one second, since 1 s / 2,097,152 ≈ 477 ns), which is completely unrealistic for how we use it. If one has a use case where that would be even remotely realistic, we'd recommend reallocating the bits if possible, or allocating multiple generator identifiers and using a wrapper that cycles through them round-robin. It's only an issue for synthetic tests for us at least, but YMMV.
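The round-robin workaround could be sketched like this. `FlakeGenerating`, `RoundRobinGenerator` and `ToyGenerator` are hypothetical: the protocol stands in for whatever single-generator type is used (a `Frostflake` instance would likely need a small adapter), and `ToyGenerator` is only a stand-in so the example runs on its own. Each wrapped generator must still hold its own unique generatorIdentifier.

```swift
/// Hypothetical protocol standing in for any single-generator type.
protocol FlakeGenerating: AnyObject {
    func generate() -> UInt64
}

/// Cycles through several generators so each one's per-second sequence space
/// is consumed N times more slowly. Not thread-safe.
final class RoundRobinGenerator: FlakeGenerating {
    private let generators: [any FlakeGenerating]
    private var next = 0

    init(_ generators: [any FlakeGenerating]) {
        precondition(!generators.isEmpty, "need at least one generator")
        self.generators = generators
    }

    func generate() -> UInt64 {
        let id = generators[next].generate()
        next = (next + 1) % generators.count
        return id
    }
}

/// Toy stand-in: generator identifier in the low 11 bits, a counter above it.
final class ToyGenerator: FlakeGenerating {
    private let generatorIdentifier: UInt64
    private var counter: UInt64 = 0

    init(generatorIdentifier: UInt64) { self.generatorIdentifier = generatorIdentifier }

    func generate() -> UInt64 {
        defer { counter += 1 }
        return (counter << 11) | generatorIdentifier
    }
}

let pool = RoundRobinGenerator((1...4).map { ToyGenerator(generatorIdentifier: UInt64($0)) })
let ids = (0..<8).map { _ in pool.generate() }
print(ids.map { $0 & 0x7FF })   // [1, 2, 3, 4, 1, 2, 3, 4]
```

With N wrapped generators each one sees only every Nth request, so the combined capacity is N × 2^21 identifiers per second. Since the wrapper keeps mutable state, share it behind a lock or give each thread its own pool.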