UniqueID

Karl · November 3, 2021, 11:41am

Recently I was checking out some better forms of UUID, and I found it quite interesting so I decided to publish it as a package.

There are possible benefits to using time-ordered UUIDs for both client and server application developers, and they can be quite interesting for use in distributed systems.

[Repository] - [API Reference]

Random and time-ordered UUID generation for Swift.

A UUID is an identifier that is unique across both space and time, with respect to the space of all UUIDs. They are used for multiple purposes, from tagging objects with an extremely short lifetime, to reliably identifying very persistent objects across a network.

UniqueID supports any 128-bit UUID, and is fully compatible with Foundation's UUID . It also includes features to generate 2 kinds of ID:

random : As defined in RFC-4122 (UUIDv4). A 128-bit identifier, consisting of 122 random or pseudo-random bits. These are the most common form of UUIDs; for example, they are the ones Foundation's UUID type creates by default. The idea is that because this is such a large number, the chance of a system observing a collision is so low that it can be safely ignored. That said, they rely heavily on the amount of entropy in the random bits, and when a system is ingesting IDs created by distributed nodes or devices,the chances of collision may be higher.
time-ordered : Generated according to a draft update of RFC-4122 (UUIDv6). A 128-bit identifier, consisting of a fixed-precision timestamp, per-process sequencing number, and 47-bit node ID (which may be random or pseudo-random bits). Whilst RFC-4122 did include time-based UUIDs (UUIDv1), it ordered the bits such that they had poor locality and couldn't be sorted easily. UUIDv6 rearranges these bits,which dramatically improves their usability as database keys. The node ID can be configured to provide even better resilience against collisions.

Why should you consider using time-ordered UUIDs?

The draft's background section makes a good case:

A lot of things have changed in the time since UUIDs were originally created. Modern applications have a need to use (and many have already implemented) UUIDs as database primary keys.

The motivation for using UUIDs as database keys stems primarily from the fact that applications are increasingly distributed in nature. Simplistic "auto increment" schemes with integers in sequence do not work well in a distributed system since the effort required to synchronize such numbers across a network can easily become a burden. The fact that UUIDs can be used to create unique and reasonably short values in distributed systems without requiring synchronization makes them a good candidate for use as a database key in such environments.

However some properties of RFC4122 UUIDs are not well suited to this task. First, most of the existing UUID versions such as UUIDv4 have poor database index locality. Meaning new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of which on common structures used for this (B-tree and its variants) can be dramatic. As such newly inserted values SHOULD be time-ordered to address this.

It cites a number of custom unique identifier schemas used by large-scale services, such as:

Snowflake by Twitter,
Sharded IDs as used by Instagram, or
push-IDs in Google Firebase.

One thing you'll notice about all of these schemes is that they're time-based, and the timestamp is the first component so sorting the ID as bytes also sorts by time. The proposed new UUID formats incorporate many of those techniques.

Version 6 UUIDs, as implemented in this project, have fixed-size timestamp, sequence counter, and node ID components. The proposal also includes v7 and v8 UUIDs (not implemented, maybe one day?), which allow for variable-sized components which implementations can customise.

How can I try time-ordered UUIDs?

UniqueID is losslessly convertible to Foundation's UUID. They have the same storage, the same string representation, and encode/decode to the same JSON values. Because of this, you only need to change how you generate your existing UUIDs - use UUID(.timeOrdered()) to create a time-ordered ID.

import Foundation
import UniqueID

// Change from UUID() to UUID(.timeOrdered()).
struct MyRecord {
  var id: UUID = UUID(.timeOrdered())
  var name: String
}

// Read the timestamp by converting to UniqueID.
let uniqueID = UniqueID(myRecord.id)
print(uniqueID.components(.timeOrdered)?.timestamp)

Do remember though: they are still in draft form, and it is possible (although unlikely) that the format will be changed in the future.

What are some other cool things?

More lenient parsing than UUID; accepts dash-less UUIDs such as 1ec3a5fb6fe964d88004c750087bf2db. You often see this in URLs.
More serialisation options than UUID.
UUIDv6 allows you to use a stable node identifier, meaning you can guarantee collision avoidance in many applications - for example, 48 bits may be sufficient to create a combined
[ user ID | device ID | process counter ] value, with user and device ID combinations being guaranteed unique by a server and reflecting that, at any particular time, a given user's device only has a limited number of processes generating IDs for your database.
Fully compatible with existing UUIDs. Time-ordered IDs occupy a different UUID space to regular, random UUIDs (they have a different version bit set), so they can never collide or be confused.

Show me

10 UUIDv4s against 10 UUIDv6s:

for _ in 0..<10 {
  print(UniqueID.random())
}

DFFC75B4-C92F-4DA9-97CA-7F0EEF067FF2
67E5F28C-5083-4908-BD69-D7E27C8BABA4
3BA8EEF0-DFBE-4AE0-A646-E165FCA9054C
DF92B4B0-F5EE-42E5-9577-A9FC373C71A4
A2F8DD26-D513-4AE6-9E5C-58363885CCB6
BB0B5841-2BC0-49E2-BC5C-362CC34D7225
B08AF1F7-E2D3-4175-913D-369140612FF5
A453FB62-DF71-436F-9AC1-0414793DFA16
485EEB84-A4BA-44FE-BE3B-AD90390B0523
8A9AE1FA-4104-442C-B459-8F682E77F2F4

for _ in 0..<10 {
  print(UniqueID.timeOrdered())
}

1EC3C81E-A35C-69E2-BB38-EDDC5E7E5F5E
1EC3C81E-A361-658C-BB38-65AAEF71CFCF
1EC3C81E-A361-6F6E-BB38-6DE69B9BCA1B
1EC3C81E-A362-698C-BB38-050642A95C73
1EC3C81E-A363-6152-BB38-F105ED78927F
1EC3C81E-A363-6A94-BB38-4DAB2CAE46CD
1EC3C81E-A364-63D6-BB38-6114031916EF
1EC3C81E-A364-6D04-BB38-435A854C2E42
1EC3C81E-A365-66AA-BB38-03504FA2F6FE
1EC3C81E-A365-6F74-BB38-1F5AE9E10389

Both lists are unique, and unique with respect to each other, but the time-ordered ones, naturally, came out in order of creation time. We can even extract the embedded timestamp - in this case, it says the UUID was created on the 3rd of November, 2021 at 08:42:01 UTC (down to 100ns precision, theoretically).

The combination of temporal and spacial components means these UUIDs are still robust to collisions - a new 60-bit universe exists every 100ns, and the IDs within that universe are still alloted based on random bits with high entropy. It's tempting to think you might be paying a high cost in collisions for the ease of use, but it's not as simple as that.

tera · November 3, 2021, 2:26pm

that's interesting, thanks.

i guess it won't make much difference with dictionaries though as they are hash based. (was always wondering if B-tree based dictionaries are faster overall than hash based dictionaries due to the mentioned data locality access pattern differences.)

Karl · November 3, 2021, 2:57pm

Right, hashing tends to destroy a value's internal locality in favour of a good distribution of hash values.

If you use random UUIDs, hashed collections are clearly a convenient choice, but if you can guarantee your UUIDs are almost always generated in sorted order (with good locality), you can start to consider using sorted data structures instead, which offer different memory/operation speed trade-offs.

(Note: Time-ordered UUIDs are almost always generated in sorted order, but they are not monotonic. System clock adjustments can result in later UUIDs comparing as less-than a previous UUID. The algorithm includes safeguards to ensure uniqueness across reasonable clock adjustments)

mickeyl · November 4, 2021, 7:07am

Looks interesting, but how does it compare to the ULID spec? I‘ve been using Swift-ULID since a while and it worked perfectly.

hassila · November 4, 2021, 8:33am

Nice, we got caught with the bad locality many years ago and basically did this ourselves at the time - nice to see it standardized. It’d be nice to not have a foundation dependency though ;-)

Karl · November 4, 2021, 12:10pm

It's similar - as mentioned in the RFC, lots of new unique identifier formats exist which try to solve the same basic problem. CUID is another one I tend to come across quite frequently (IDs are easily recognisable because they begin with a 'c' - e.g. cjld2cyuq0000t3rmniod1foy).

Some of these use smaller identifiers (64 vs 128 bits), or base32/64 strings rather than hexadecimal. It seems the RFC author wanted to include shorter text formats, but it didn't make it to the draft. I can imagine that having 3 official text formats for UUIDs is just too awkward to include in a formal internet standard (of course, any application can decide to use a non-standard text format).

The biggest advantage of UUIDv6 is that it's a lot easier to adopt, compared to a completely different identifier:

Binary compatibility. It uses the same 128-bits as UUIDv4, with version and variant bits in the same position so the same software can process them. They can live side-by-side in a single database table; you might not get all of the performance benefits, but they will never collide.
String compatibility. The default serialisation is still the 8-4-4-4-12 string format, so code which parses UUIDs from JSON continues to work, like nothing changed.
Easily convertible to UUIDv1. It uses exactly the same timestamp epoch, format, and precision, just in a way that's better for sorting. Converting between them is a matter of swapping some bytes:
```
+--------------+------+--------------+
| UUIDv1 Field | Bits | UUIDv6 Field |
+--------------+------+--------------+
| time_low     | 32   | time_high    |
+--------------+------+--------------+
| time_mid     | 16   | time_mid     |
+--------------+------+--------------+
| time_high    | 12   | time_low     |
+--------------+------+--------------+
```
It might be worth adding a function to do that, but on the other hand I'm not sure UUIDv1 caught on enough to matter (mostly because of the weird timestamp layout).

The foundation dependency is limited to this file, and is for only 2 things:

UUID <-> UniqueID conversion
Providing the UUID timestamp as a Date. Once Date moves to the standard library, that can go.

Ideally, we'd have a cross-import overlay for Foundation. In the mean time, it's probably worth adding a build flag to opt-out of them. For App developers, it's really important that their existing codebases using Foundation's UUID continue to work, so I think the right default is to include it.

(But actually, part of the point of having a new "UniqueID" type rather than adding extensions on to UUID was to allow using it without Foundation)

Karl · November 4, 2021, 2:37pm

OK, I've published a v1.0.1 which adds a NO_FOUNDATION_COMPAT build setting. I've confirmed that it removes the need to link against Foundation:

To use it, build with the flag:

swift build -Xswiftc -DNO_FOUNDATION_COMPAT

objdump -p my_test_program

Default:
  NEEDED               libswiftSwiftOnoneSupport.so
  NEEDED               libswiftCore.so
  NEEDED               libswift_Concurrency.so
  NEEDED               libswiftGlibc.so
  NEEDED               libm.so.6
  NEEDED               libpthread.so.0
  NEEDED               libutil.so.1
  NEEDED               libdl.so.2
  NEEDED               libFoundation.so
  NEEDED               libswiftDispatch.so
  NEEDED               libdispatch.so
  NEEDED               libBlocksRuntime.so
  NEEDED               libc.so.6

NO_FOUNDATION_COMPAT:
  NEEDED               libswiftSwiftOnoneSupport.so
  NEEDED               libswiftCore.so
  NEEDED               libswift_Concurrency.so
  NEEDED               libswiftGlibc.so
  NEEDED               libm.so.6
  NEEDED               libpthread.so.0
  NEEDED               libutil.so.1
  NEEDED               libdl.so.2
  NEEDED               libc.so.6

hassila · November 4, 2021, 2:47pm

Awesome, thanks!

Have you run any benchmarks at all btw? Sometimes uuid gen have proved an unexpected bottleneck. I’m off computer otherwise I’d just test it now

Karl · November 4, 2021, 4:21pm

I have run a few simple benchmarks. Here are some results:

// with --warmup-iterations 10
name                            time             std        iterations warmup           
----------------------------------------------------------------------------------------
UniqueID.Random                   2599815.000 ns ±   2.88 %        517   28094269.000 ns
UniqueID.TimeOrdered              2373965.000 ns ±   2.32 %        582   23899091.000 ns
Foundation.UUID.Random            3674193.500 ns ±   7.98 %        404   36614193.000 ns

UniqueID.Random.Parallel        190662983.500 ns ±   6.27 %          8 1621812382.000 ns
UniqueID.TimeOrdered.Parallel   221326256.500 ns ±   6.41 %          6 1793332411.000 ns
Foundation.UUID.Random.Parallel 259019378.000 ns ±   2.38 %          5 2371298030.000 ns

.Random: Generate 10,000 UUIDv4s
.TimeOrdered: Generate 10,000 UUIDv6s
.Random.Parallel: Dispatch.concurrentPerform 32 tasks, each generating 100K UUIDv4s
.TimeOrdered.Parallel: Dispatch.concurrentPerform 32 tasks, each generating 100K UUIDv6s with random node identifiers

So that's 259/237ns per UUID, going down to 59/69ns per UUID in parallel. IIRC, a no-op function call is about 30ns, so I think this is fine (not sure what to compare it to).

It seems to compare favourably to Foundation, and generating time-ordered UUIDs performs about the same as generating random ones. The synchronised sequence counter doesn't seem to have a significant performance impact. It's also worth noting that most users aren't generating UUIDs with this frequency.

Replacing the sequence counter and random node ID with other data, more randomness, or even higher-precision timestamps is what UUIDv7 and v8 are all about. I have some interesting ideas for how we could use protocols to let you build a custom schema with your own allocation of bits to time/space uniqueness, but I wasn't sure if anybody would even be interested in this

Karl · December 18, 2021, 12:46pm

Version 1.0.3 has been released.

Uses os_unfair_lock/pthread_mutex to protect the UUIDv6 generator state.

The critical section is so small that it has basically no impact on performance versus the previous implementation, but it's the right thing to do in environments where you might be pre-empted while holding the lock.
Moved to Swift-DocC for documentation.

I don't have as much time to go through it all as I did with WebURL, but it still has better structure than the previous docs and now uses stable URLs for each version.

The 1.0.3 docs are here.
The documentation for the main branch will always be here.

Also, if you'd like to generate time-ordered UUIDs in other projects, this repository includes links to prototype implementations in other languages.

Karl · December 18, 2021, 4:49pm

@hassila I'll respond to your performance question from the server thread here:

Benchmarked on a raspberry pi, with 10 warmup iterations. The numbers are higher than last time, because the poor little pi can't compete with my MBP -- but I figured it would be better to give results on a non-virtualised Linux machine:

name                            time               std        iterations warmup             
--------------------------------------------------------------------------------------------
UniqueID.Random                    80664045.000 ns ±   0.28 %         17    807416613.000 ns
UniqueID.TimeOrdered               41840993.000 ns ±   0.27 %         33    418653514.000 ns
UniqueID.Random.Parallel         8810209226.500 ns ±   0.43 %          2  87534889116.000 ns
UniqueID.TimeOrdered.Parallel    4723407823.000 ns ±   0.10 %          2  47221976934.000 ns
Foundation.UUID.Random            198502249.000 ns ±   1.77 %          7   1985719259.000 ns
Foundation.UUID.Random.Parallel 21523942759.500 ns ±   0.55 %          2 214897900800.000 ns

So for single-threaded performance:

Random UUIDs (single-threaded) are 59% faster than Foundation
Time-ordered UUIDs (single-threaded) are 48% faster than random IDs, 78% faster than Foundation's random IDs

It could still go much lower, if we enhanced the RandomNumberGenerator API to return more than 64-bits per call. Random UUID generation is dominated by the syscall to get the random bytes, so I expect it would be even faster than time-ordered IDs if we did that.

For parallel performance:

Random UUIDs are 59% faster than Foundation (again)
Time-ordered UUIDs are 46% faster than random IDs, 78% faster than Foundation's random IDs (again).

I did notice a few issues with suboptimal code-gen, and the mutex priority inheritance was leading to a bigger performance drop than I anticipated (it doesn't really make sense with adaptive mutexes anyway), so I pushed a 1.0.3 release, which is what these numbers are from.

I consider this library to basically be stable; I can't think of anything to add or change in the short term. So I'm much less hesitant to push out minor releases even if they only include small tweaks.

Karl · December 18, 2021, 6:21pm

BTW, it is actually possible to use the runtime's random function directly.

The headers say the function "must be supported forever", so it's not technically unsafe or unstable or anything like that, but it's hacky. But it's possible. In a hacky way. But not infeasible.

name                   time             std        iterations warmup           
-------------------------------------------------------------------------------
UniqueID.Random         40059828.000 ns ±   0.30 %         35  401728357.000 ns
UniqueID.TimeOrdered    42554658.000 ns ±   0.44 %         33  424516127.000 ns
Foundation.UUID.Random 193528251.000 ns ±   0.55 %          7 1931932069.000 ns

So that would be ~80% faster than the current implementation in Foundation. Again on the pi, with 10 warmups.

We should really fix the standard library so you can make these calls from Swift.

See: Filling buffers using RandomNumberGenerator

mazz1 · July 5, 2023, 1:37pm

Is UUIDv7 support planned?

Karl · July 9, 2023, 3:28pm

Sorry for the late reply - yes! I am planning to add UUIDv7 soon.

I'm planning to do it in two stages - one is to introduce UUIDv7 support, and the next step would be to release a new SemVer-major version which changes the default .timeOrdered() to produce v7 IDs.

ktoso · July 10, 2023, 2:06am

Seems I missed commenting here huh -- really nice work! Looks like a great library, looking forward to the updates as well

mazz1 · July 13, 2023, 12:38pm

Nice. Here’s an interesting Elixir(server) article about UUIDv7