UniqueID - time-ordered UUIDs in Swift

Recently I was checking out some better forms of UUID, and I found it quite interesting so I decided to publish it as a package.

There are possible benefits to using time-ordered UUIDs for both client and server application developers, and they can be quite interesting for use in distributed systems.


UniqueID

[Repository] - [API Reference]

Random and time-ordered UUID generation for Swift.

A UUID is an identifier that is unique across both space and time, with respect to the space of all UUIDs. They are used for multiple purposes, from tagging objects with an extremely short lifetime, to reliably identifying very persistent objects across a network.

UniqueID supports any 128-bit UUID, and is fully compatible with Foundation's UUID . It also includes features to generate 2 kinds of ID:

  • random : As defined in RFC-4122 (UUIDv4). A 128-bit identifier, consisting of 122 random or pseudo-random bits. These are the most common form of UUIDs; for example, they are the ones Foundation's UUID type creates by default. The idea is that because this is such a large number, the chance of a system observing a collision is so low that it can be safely ignored. That said, they rely heavily on the amount of entropy in the random bits, and when a system is ingesting IDs created by distributed nodes or devices,the chances of collision may be higher.

  • time-ordered : Generated according to a draft update of RFC-4122 (UUIDv6). A 128-bit identifier, consisting of a fixed-precision timestamp, per-process sequencing number, and 47-bit node ID (which may be random or pseudo-random bits). Whilst RFC-4122 did include time-based UUIDs (UUIDv1), it ordered the bits such that they had poor locality and couldn't be sorted easily. UUIDv6 rearranges these bits,which dramatically improves their usability as database keys. The node ID can be configured to provide even better resilience against collisions.

Why should you consider using time-ordered UUIDs?

The draft's background section makes a good case:

A lot of things have changed in the time since UUIDs were originally created. Modern applications have a need to use (and many have already implemented) UUIDs as database primary keys.

The motivation for using UUIDs as database keys stems primarily from the fact that applications are increasingly distributed in nature. Simplistic "auto increment" schemes with integers in sequence do not work well in a distributed system since the effort required to synchronize such numbers across a network can easily become a burden. The fact that UUIDs can be used to create unique and reasonably short values in distributed systems without requiring synchronization makes them a good candidate for use as a database key in such environments.

However some properties of RFC4122 UUIDs are not well suited to this task. First, most of the existing UUID versions such as UUIDv4 have poor database index locality. Meaning new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of which on common structures used for this (B-tree and its variants) can be dramatic. As such newly inserted values SHOULD be time-ordered to address this.

It cites a number of custom unique identifier schemas used by large-scale services, such as:

One thing you'll notice about all of these schemes is that they're time-based, and the timestamp is the first component so sorting the ID as bytes also sorts by time. The proposed new UUID formats incorporate many of those techniques.

Version 6 UUIDs, as implemented in this project, have fixed-size timestamp, sequence counter, and node ID components. The proposal also includes v7 and v8 UUIDs (not implemented, maybe one day?), which allow for variable-sized components which implementations can customise.

How can I try time-ordered UUIDs?

UniqueID is losslessly convertible to Foundation's UUID. They have the same storage, the same string representation, and encode/decode to the same JSON values. Because of this, you only need to change how you generate your existing UUIDs - use UUID(.timeOrdered()) to create a time-ordered ID.

import Foundation
import UniqueID

// Change from UUID() to UUID(.timeOrdered()).
struct MyRecord {
  var id: UUID = UUID(.timeOrdered())
  var name: String
}

// Read the timestamp by converting to UniqueID.
let uniqueID = UniqueID(myRecord.id)
print(uniqueID.components(.timeOrdered)?.timestamp)

Do remember though: they are still in draft form, and it is possible (although unlikely) that the format will be changed in the future.

What are some other cool things?

  • More lenient parsing than UUID; accepts dash-less UUIDs such as 1ec3a5fb6fe964d88004c750087bf2db. You often see this in URLs.

  • More serialisation options than UUID.

  • UUIDv6 allows you to use a stable node identifier, meaning you can guarantee collision avoidance in many applications - for example, 48 bits may be sufficient to create a combined
    [ user ID | device ID | process counter ] value, with user and device ID combinations being guaranteed unique by a server and reflecting that, at any particular time, a given user's device only has a limited number of processes generating IDs for your database.

  • Fully compatible with existing UUIDs. Time-ordered IDs occupy a different UUID space to regular, random UUIDs (they have a different version bit set), so they can never collide or be confused.

Show me

10 UUIDv4s against 10 UUIDv6s:

for _ in 0..<10 {
  print(UniqueID.random())
}

DFFC75B4-C92F-4DA9-97CA-7F0EEF067FF2
67E5F28C-5083-4908-BD69-D7E27C8BABA4
3BA8EEF0-DFBE-4AE0-A646-E165FCA9054C
DF92B4B0-F5EE-42E5-9577-A9FC373C71A4
A2F8DD26-D513-4AE6-9E5C-58363885CCB6
BB0B5841-2BC0-49E2-BC5C-362CC34D7225
B08AF1F7-E2D3-4175-913D-369140612FF5
A453FB62-DF71-436F-9AC1-0414793DFA16
485EEB84-A4BA-44FE-BE3B-AD90390B0523
8A9AE1FA-4104-442C-B459-8F682E77F2F4
for _ in 0..<10 {
  print(UniqueID.timeOrdered())
}

1EC3C81E-A35C-69E2-BB38-EDDC5E7E5F5E
1EC3C81E-A361-658C-BB38-65AAEF71CFCF
1EC3C81E-A361-6F6E-BB38-6DE69B9BCA1B
1EC3C81E-A362-698C-BB38-050642A95C73
1EC3C81E-A363-6152-BB38-F105ED78927F
1EC3C81E-A363-6A94-BB38-4DAB2CAE46CD
1EC3C81E-A364-63D6-BB38-6114031916EF
1EC3C81E-A364-6D04-BB38-435A854C2E42
1EC3C81E-A365-66AA-BB38-03504FA2F6FE
1EC3C81E-A365-6F74-BB38-1F5AE9E10389

Both lists are unique, and unique with respect to each other, but the time-ordered ones, naturally, came out in order of creation time. We can even extract the embedded timestamp - in this case, it says the UUID was created on the 3rd of November, 2021 at 08:42:01 UTC (down to 100ns precision, theoretically).

The combination of temporal and spacial components means these UUIDs are still robust to collisions - a new 60-bit universe exists every 100ns, and the IDs within that universe are still alloted based on random bits with high entropy. It's tempting to think you might be paying a high cost in collisions for the ease of use, but it's not as simple as that.

18 Likes

that's interesting, thanks.

i guess it won't make much difference with dictionaries though as they are hash based. (was always wondering if B-tree based dictionaries are faster overall than hash based dictionaries due to the mentioned data locality access pattern differences.)

Right, hashing tends to destroy a value's internal locality in favour of a good distribution of hash values.

If you use random UUIDs, hashed collections are clearly a convenient choice, but if you can guarantee your UUIDs are almost always generated in sorted order (with good locality), you can start to consider using sorted data structures instead, which offer different memory/operation speed trade-offs.

(Note: Time-ordered UUIDs are almost always generated in sorted order, but they are not monotonic. System clock adjustments can result in later UUIDs comparing as less-than a previous UUID. The algorithm includes safeguards to ensure uniqueness across reasonable clock adjustments)

Looks interesting, but how does it compare to the ULID spec? Iβ€˜ve been using Swift-ULID since a while and it worked perfectly.

1 Like

Nice, we got caught with the bad locality many years ago and basically did this ourselves at the time - nice to see it standardized. It’d be nice to not have a foundation dependency though ;-)

3 Likes

It's similar - as mentioned in the RFC, lots of new unique identifier formats exist which try to solve the same basic problem. CUID is another one I tend to come across quite frequently (IDs are easily recognisable because they begin with a 'c' - e.g. cjld2cyuq0000t3rmniod1foy).

Some of these use smaller identifiers (64 vs 128 bits), or base32/64 strings rather than hexadecimal. It seems the RFC author wanted to include shorter text formats, but it didn't make it to the draft. I can imagine that having 3 official text formats for UUIDs is just too awkward to include in a formal internet standard (of course, any application can decide to use a non-standard text format).

The biggest advantage of UUIDv6 is that it's a lot easier to adopt, compared to a completely different identifier:

  • Binary compatibility. It uses the same 128-bits as UUIDv4, with version and variant bits in the same position so the same software can process them. They can live side-by-side in a single database table; you might not get all of the performance benefits, but they will never collide.

  • String compatibility. The default serialisation is still the 8-4-4-4-12 string format, so code which parses UUIDs from JSON continues to work, like nothing changed.

  • Easily convertible to UUIDv1. It uses exactly the same timestamp epoch, format, and precision, just in a way that's better for sorting. Converting between them is a matter of swapping some bytes:

    +--------------+------+--------------+
    | UUIDv1 Field | Bits | UUIDv6 Field |
    +--------------+------+--------------+
    | time_low     | 32   | time_high    |
    +--------------+------+--------------+
    | time_mid     | 16   | time_mid     |
    +--------------+------+--------------+
    | time_high    | 12   | time_low     |
    +--------------+------+--------------+
    

    It might be worth adding a function to do that, but on the other hand I'm not sure UUIDv1 caught on enough to matter (mostly because of the weird timestamp layout).

The foundation dependency is limited to this file, and is for only 2 things:

Ideally, we'd have a cross-import overlay for Foundation. In the mean time, it's probably worth adding a build flag to opt-out of them. For App developers, it's really important that their existing codebases using Foundation's UUID continue to work, so I think the right default is to include it.

(But actually, part of the point of having a new "UniqueID" type rather than adding extensions on to UUID was to allow using it without Foundation)

2 Likes

OK, I've published a v1.0.1 which adds a NO_FOUNDATION_COMPAT build setting. I've confirmed that it removes the need to link against Foundation:

To use it, build with the flag:

swift build -Xswiftc -DNO_FOUNDATION_COMPAT

objdump -p my_test_program

Default:
  NEEDED               libswiftSwiftOnoneSupport.so
  NEEDED               libswiftCore.so
  NEEDED               libswift_Concurrency.so
  NEEDED               libswiftGlibc.so
  NEEDED               libm.so.6
  NEEDED               libpthread.so.0
  NEEDED               libutil.so.1
  NEEDED               libdl.so.2
  NEEDED               libFoundation.so
  NEEDED               libswiftDispatch.so
  NEEDED               libdispatch.so
  NEEDED               libBlocksRuntime.so
  NEEDED               libc.so.6

NO_FOUNDATION_COMPAT:
  NEEDED               libswiftSwiftOnoneSupport.so
  NEEDED               libswiftCore.so
  NEEDED               libswift_Concurrency.so
  NEEDED               libswiftGlibc.so
  NEEDED               libm.so.6
  NEEDED               libpthread.so.0
  NEEDED               libutil.so.1
  NEEDED               libdl.so.2
  NEEDED               libc.so.6
3 Likes

Awesome, thanks!

Have you run any benchmarks at all btw? Sometimes uuid gen have proved an unexpected bottleneck. I’m off computer otherwise I’d just test it now :slight_smile:

I have run a few simple benchmarks. Here are some results:

// with --warmup-iterations 10
name                            time             std        iterations warmup           
----------------------------------------------------------------------------------------
UniqueID.Random                   2599815.000 ns Β±   2.88 %        517   28094269.000 ns
UniqueID.TimeOrdered              2373965.000 ns Β±   2.32 %        582   23899091.000 ns
Foundation.UUID.Random            3674193.500 ns Β±   7.98 %        404   36614193.000 ns

UniqueID.Random.Parallel        190662983.500 ns Β±   6.27 %          8 1621812382.000 ns
UniqueID.TimeOrdered.Parallel   221326256.500 ns Β±   6.41 %          6 1793332411.000 ns
Foundation.UUID.Random.Parallel 259019378.000 ns Β±   2.38 %          5 2371298030.000 ns
  • .Random: Generate 10,000 UUIDv4s
  • .TimeOrdered: Generate 10,000 UUIDv6s
  • .Random.Parallel: Dispatch.concurrentPerform 32 tasks, each generating 100K UUIDv4s
  • .TimeOrdered.Parallel: Dispatch.concurrentPerform 32 tasks, each generating 100K UUIDv6s with random node identifiers

So that's 259/237ns per UUID, going down to 59/69ns per UUID in parallel. IIRC, a no-op function call is about 30ns, so I think this is fine (not sure what to compare it to).

It seems to compare favourably to Foundation, and generating time-ordered UUIDs performs about the same as generating random ones. The synchronised sequence counter doesn't seem to have a significant performance impact. It's also worth noting that most users aren't generating UUIDs with this frequency.

Replacing the sequence counter and random node ID with other data, more randomness, or even higher-precision timestamps is what UUIDv7 and v8 are all about. I have some interesting ideas for how we could use protocols to let you build a custom schema with your own allocation of bits to time/space uniqueness, but I wasn't sure if anybody would even be interested in this :sweat_smile:

5 Likes
Terms of Service

Privacy Policy

Cookie Policy