How do I implemented a tagged pointer in Swift?

I'm familiar with tagged pointers from C-like languages, which are much more loose about casting and such. I'm trying to implement a BitVector, where a tag bit signals that the vector is small enough to fit in the 64 bit value in-line, otherwise it's a regular 64 bit pointer to a managed buffer.

I'm not sure about what kind of "gotchas" there are around automatic member managing what needs to be memory managed, and just as importantly, not memory managing what shouldn't be memory managed (since this "pointer" is only a pointer some of the time, and would invalid as a pointer in the "inlined" case).

I know that fundamentally enum is the syntax for discriminated unions in Swift, but I'm worried that using an enum would involve it storing its own tag bit in a separate word (which I want to avoid). Although that seems to be how String does it, which is leaving me rather puzzled.

Can someone please point me in the right direction?

Swift's enumerations are smart about the spare bits that exist in pointers. Consider the output of this small script:

class Buffer {
    var test: [Int] = []
}

enum TaggedPointer {
    case buffer(Buffer)
    case inline
}

print(MemoryLayout<TaggedPointer>.size)

The output of this script is 8, demonstrating that the discriminator bit for the enum is stuffed into the pointer to the Buffer class.

But you've probably noticed that my other case is empty. The reason it's empty is because the Swift compiler needs to know what what you store in there is smaller than 64-bits. Consider the program below:

class Buffer {
    var buffer: [Int] = []
}

enum TaggedPointer {
    case buffer(Buffer)
    case inline(Int32)
}

print(MemoryLayout<TaggedPointer>.size)

This also produces 8. As Swift knows that Int32 only needs 4 bytes, it knows that it can safely use one of the spare bits in the Buffer pointer. However, if you try to put Int64 in there, it all goes wrong:

class Buffer {
    var buffer: [Int] = []
}

enum TaggedPointer {
    case buffer(Buffer)
    case inline(Int64)
}

print(MemoryLayout<TaggedPointer>.size)

The output of this script is 9. This is because Swift doesn't know that you promise not to use the top bit of your Int64 (or UInt64), and so cannot safely use the spare bit in the buffer pointer anymore. So you need to be careful to ensure that the size of the other case really is smaller than 64 bits and that Swift knows it.

The other issue is if you're trying to stuff pointers that Swift doesn't understand. For example, consider this program:

enum EmptyTaggedPointer {
    case buffer(UnsafePointer<UInt8>)
    case inline
}

enum TaggedPointer {
    case buffer(UnsafePointer<UInt8>)
    case inline(Int32)
}

print(MemoryLayout<EmptyTaggedPointer>.size)
print(MemoryLayout<TaggedPointer>.size)

This program prints 8 and then 9. Why?

The short answer is that the UnsafePointer family of things all have exactly one uninhabited bit pattern: the all-zero case. This is because the NULL representation of an unsafe pointer corresponds to Optional<UnsafePointer>.none, and so if you have a non-optional one by definition that pattern is uninhabited.

But because Swift doesn't know anything about the provenance of this pointer, it doesn't know what the rules for it are. It can't know whether the tag bits are important or not, so it can't use them.

The TL;DR here is: this might work, if you fit exactly into the use-case that enum handles well. Otherwise you'll have to bit-bang this yourself with unsafe pointer.

14 Likes

That's amazing! I know enums could pack stuff, but I thought pointers were treated as fully inhabited. It's great that it knows some of least significant bits are uninhabited. Do you know of any documentation that elaborates on this? I'm curious how many bits are free. I know malloc aligned to 16 bytes, meaning the smallest 5 bits are always 0 in real pointers, I'm curious if all 5 bits are considered uninhabited by the implementation of enum.

I understand the case inline(Int32) vs case inline distinction, but I have one more question. Say I have an instance of a TaggedPointer, and I have checked its case and determined that it's .inline. How do I access the instance as a bitfield, as a UInt64? I imagine UnsafeBitCast(_:to:) would work, is that the preferred way?

Also curious: Is there some way to define something like a UInt59 (64 bit pointer, minus 5 tag bits), as the payload of inline?

This looks relevant: Multi-Payload Enums:

The ABI will first try to find common spare bits, that is, bits in the data types' binary representations which are either fixed-zero or ignored by valid values of all of the data types

And in GenEnum.cpp:

//     For example, on a 64-bit platform, the least significant 3 bits of
//     a retainable pointer are always zero, so a multi-payload enum can
//     have up to 8 retainable pointer payload cases before any additional
//     storage is required.
1 Like

It's the only way if you plan to do that. I recommend not doing it and only accessing the associated data instead, of course.

Better to ask @jrose here, but I'm not aware of one other than the most annoying, which is to define an enum that wraps a UInt32, a UInt16, a UInt8 and then defines as many cases as necessary to cover the other bits. In this case you'd have three spare bits (the above is a UInt56, you need a UInt59) so you need eight cases. That looks like this:

struct UInt56 {
    var low: UInt32
    var mid: UInt16
    var high: UInt8
}

enum UInt59 {
    case zero(UInt56)
    case one(UInt56)
    case two(UInt56)
    case three(UInt56)
    case four(UInt56)
    case five(UInt56)
    case six(UInt56)
    case seven(UInt56)
}

This works, but I'm sure you can agree it's pretty stupid! It also gets very annoying in the case where you only want to free up one bit, which requires 128 enumeration cases to cover the other 7 bits. On top of that, implementing the mathematical operations on top gets pretty crappy as well due to all the switch statements.

Unofficially, the Builtin module contains every integer size, and you could hack Integers.swift.gyb to generate an Int59 type for you, and it would have the spare bit behavior in enums you expect.

3 Likes

It might be interesting to define a "maximal packable integer" type for this sort of use case, though there's a bunch of little technical details that we would need to figure out.

1 Like

You could use a property delegate for that, maybe, if that feature lands.

3 Likes

lets break some stuff

enum Tagged 
{
    case pointer(UnsafeRawPointer)
    case other
}

let allOnes:UnsafeRawPointer = UnsafeRawPointer(bitPattern: UInt.max)!
let foo:Tagged = .pointer(allOnes + 1)

print(foo)
// other
1 Like

allOnes + 1 is undefined behavior here, since it's going "out of bounds" of the nonexistent object allOnes is pointing to.

3 Likes