Understanding the memory layout of references

Hello all. I'm trying to understand the memory layout optimisations for enumerations, but I feel that I lack some common knowledge.
I know that the first 4 KB of memory are not valid addresses for references in Swift, so they are used as extra inhabitants for enum optimisations. I assumed that a reference's valid values are in the range 0x0000_0000_0000_1000 - 0xFFFF_FFFF_FFFF_FFFF, but it appears that the last byte isn’t used for storing addresses. As far as I can see, it is used as a tag/discriminator in the enumeration.

enum ManyPayloads {
    case A               // 0x00_00_00_00_00_00_00_80
    case B               // 0x01_00_00_00_00_00_00_80
    case C               // 0x02_00_00_00_00_00_00_80
    case D(AnyObject)    // 0xXX_XX_XX_XX_XX_XX_XX_00
    case E(AnyObject)    // 0xXX_XX_XX_XX_XX_XX_XX_40
}

The question is: why is the last byte not used? Is this common behaviour in most languages, or only in Swift? Or is it an exception even in Swift?

Your enum has multiple cases with payloads. Extra inhabitants can be used to represent non-payload cases when there is only one case with a payload, but they are not generally sufficient when there are multiple payloads that must be distinguished from each other. When there are multiple cases with payloads, you generally need external discriminator bits unless the payload types have “spare bits” in common: bits that have a constant value in all payloads, such as the low bits of an aligned pointer.
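A minimal sketch of that distinction, assuming a 64-bit target. The type names (`Box`, `SinglePayload`, `TwoIntegers`, `TwoBoxes`) are hypothetical and only serve the illustration; `MemoryLayout` is the standard-library way to query sizes.

```swift
// Hypothetical payload class for illustration.
final class Box {}

// One payload case: the empty cases can be encoded in the reference's
// extra inhabitants (bit patterns that are never valid references).
enum SinglePayload { case a, b, c, d(Box) }

// Two payload cases whose type has neither extra inhabitants nor spare
// bits: every UInt64 bit pattern is a valid value, so the compiler has
// to add an external tag byte.
enum TwoIntegers { case a(UInt64), b(UInt64) }

// Two payload cases of a native Swift class: the payloads share spare
// bits, which can hold the discriminator.
enum TwoBoxes { case a, b, c, d(Box), e(Box) }

let pointerSize = MemoryLayout<UnsafeRawPointer>.size

print(MemoryLayout<SinglePayload>.size == pointerSize)
print(MemoryLayout<TwoIntegers>.size)   // 9: 8 payload bytes plus 1 tag byte
print(MemoryLayout<TwoBoxes>.size == pointerSize)
```

The class-payload results assume a 64-bit platform, where a native Swift reference leaves bits free; the exact bits chosen are an implementation detail of the compiler.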

AnyObject does not have any spare bits on ObjC-compatible targets because of ObjC tagged object pointers. Tagged object pointers still allow for extra inhabitants, but they make the computation more complicated than you’re describing because an extra inhabitant must not have any possible tagged-object-pointer bit set. (Unfortunately, on x86_64 there are two possible bits in use.)


So, if I understand you correctly, in the common case a "clean" reference in Swift has “spare bits”. Sometimes they are used to tag an object as ObjC (and maybe for other reasons), but when they are not, they can be used to store the discriminator in enumerations.

But then I put an ObjC object in the payload cases of the enum above:

class Car: NSObject { }

I still get an 8-byte enum size. I expected a ninth byte to be allocated for the discriminator, because ObjC references have no spare bits.

Swift classes can’t have tagged-pointer representations, even if they inherit from ObjC.

FWIW I'm getting these enum sizes:

import Foundation

class Car {}
class NSObjectCar: NSObject {}

enum E1 { case a, b, c, d(AnyObject) }                  // 8
enum E2 { case a, b, c, d(AnyObject), e(AnyObject) }    // 9
enum E3 { case a, b, c, d(Car) }                        // 8
enum E4 { case a, b, c, d(Car), e(Car) }                // 8
enum E5 { case a, b, c, d(NSObjectCar) }                // 8
enum E6 { case a, b, c, d(NSObjectCar), e(NSObjectCar) } // 8
enum E7 { case a, b, c, d(NSObject) }                   // 8
enum E8 { case a, b, c, d(NSObject), e(NSObject) }      // 9
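
Numbers like these can be reproduced with `MemoryLayout`. The sketch below covers only the cases that need no Foundation import; `report` is a hypothetical helper, not a standard API. The E2 result depends on the platform: the 9 reported above is for a target with ObjC interop, so the code prints it without assuming a value.

```swift
class Car {}

enum E1 { case a, b, c, d(AnyObject) }
enum E2 { case a, b, c, d(AnyObject), e(AnyObject) }
enum E3 { case a, b, c, d(Car) }
enum E4 { case a, b, c, d(Car), e(Car) }

// Hypothetical helper: formats the size and stride of a type.
func report<T>(_ type: T.Type) -> String {
    return "\(type): size \(MemoryLayout<T>.size), stride \(MemoryLayout<T>.stride)"
}

print(report(E1.self))
print(report(E2.self))  // platform-dependent, see the note above
print(report(E3.self))
print(report(E4.self))
```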

Note that “tagged object pointers” doesn’t mean we mark which objects are ObjC and which are Swift. We don’t need to do that, because Swift classes use a basic object layout that’s compatible with ObjC.

Completely independent of Swift, ObjC “object pointers” are not always pointers to normal objects. On some platforms, the ObjC runtime allows a bit to be set in the pointer which says that (glossing over some details) the rest of the pointer is an arbitrary inline payload. This is used in Foundation to e.g. optimize small strings and integers so that it doesn’t have to always allocate objects for them. Swift doesn’t generally participate in this, it just has to be aware of it when playing bit tricks.
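
To make the idea concrete, here is a toy model of a tagged object word. It is only a sketch: `ToyObjectWord`, its tag position (the lowest bit), and its payload encoding are all hypothetical, and the real ObjC runtime's encoding is platform-specific and not something to rely on.

```swift
// Toy model only; not the ObjC runtime's actual encoding.
struct ToyObjectWord {
    // Hypothetical choice: the lowest bit marks an inline payload.
    static let tagMask: UInt64 = 1
    let bits: UInt64

    // Store a small integer inline instead of allocating an object.
    init(inlineInteger value: UInt32) {
        bits = (UInt64(value) << 1) | ToyObjectWord.tagMask
    }

    // Store an ordinary object address. Aligned addresses always have
    // the tag bit clear; anything else is rejected.
    init?(address: UInt64) {
        guard address & ToyObjectWord.tagMask == 0 else { return nil }
        bits = address
    }

    var isTagged: Bool { bits & ToyObjectWord.tagMask != 0 }

    // The inline payload, or nil when the word is a real address.
    var inlineInteger: UInt32? {
        isTagged ? UInt32(truncatingIfNeeded: bits >> 1) : nil
    }
}

let small = ToyObjectWord(inlineInteger: 42)
let heap = ToyObjectWord(address: 0x1000_2000)
print(small.isTagged, small.inlineInteger as Any)
print(heap?.isTagged as Any)
```

In this model a bit that alignment would otherwise leave constant now varies between values, which is why a type that may hold tagged values cannot offer that bit as a spare bit.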

So NSObjectCar (a subclass of NSObject) has some spare bits for the discriminator, but NSObject itself doesn't. An interesting and confusing result.

I see this is a more complicated question than I thought. I assumed that any reference type in Swift is just a raw pointer under the hood, but there are many nuances with additional tags, etc.

Going back to the original question: can we say how many bits a raw pointer uses to store a memory address? Earlier I thought that all 8 allocated bytes are used, but if there is sometimes room for spare bits, I'm not sure anymore.

Or perhaps the question itself is not well-formed.

NSObject is a superclass of some classes that use tagged-pointer representations for some of their values, such as NSValue and NSString. It has to be able to store values that are tagged pointers because you can convert such a value from one of those types to NSObject. Similarly, protocol types like AnyObject have to be able to represent tagged pointer values because you can convert a tagged pointer value from one of those class types to those protocol types. As a result, none of those types have any spare bits (on platforms that support ObjC interop).

In contrast, we know that the type NSObjectCar cannot have any values that are tagged pointers because Swift classes do not support values with tagged-pointer representation. As a result, we can make stronger assumptions about the bits set in an NSObjectCar value. This should not be surprising, because of course we have more information about an NSObjectCar value than we do for an NSObject value: we know that its dynamic type is always a subclass of NSObjectCar, instead of just a subclass of NSObject.

As a general rule, we assume that all ObjC class types (and types that can store them, like protocol types) can use tagged-pointer representations and that all Swift class types cannot.
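
One assumption that holds for a native Swift class reference can be checked directly: it is a real, aligned heap address, so its low bits are zero. The sketch below reads the reference's bit pattern for inspection only; `Car` is a hypothetical class, and which bits the compiler actually treats as spare is an implementation detail this does not reveal.

```swift
final class Car {}

let car = Car()

// Read the reference's bit pattern as an integer (inspection only).
let bits = UInt(bitPattern: Unmanaged.passUnretained(car).toOpaque())

// Heap objects are at least pointer-aligned, so these low bits are zero.
let alignmentMask = UInt(MemoryLayout<UnsafeRawPointer>.alignment - 1)

print(String(bits, radix: 16))
print(bits & alignmentMask == 0)

// Keep the object alive until after the bit pattern has been inspected.
withExtendedLifetime(car) {}
```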


I see. Many thanks for the explanation! Tagged ObjC pointers are much clearer to me now.