Piercing the String Veil

Michael_Ilseman · February 24, 2021, 4:23pm

Hello everyone, I thought I'd post a decoder ring to help anyone make sense of the stdlib source code that implements these String forms. The names in the source code were established before my post (due to ABI stability), and were chosen to guide implementors (i.e., they appear in symbol names but not in API).

"Immortal" denotes a string that has permanent lifetime, such as a string literal or a small-form string, and thus is not managed by ARC.

"Bridged" denotes that the object coming in originated in Objective-C and thus reference counting should be performed using the Objective-C runtime rather than Swift's runtime.
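To make the two flags above concrete, here is a small hypothetical sketch of how a string's reference-counting strategy might be chosen from them. The type and function names (`RefCountStrategy`, `HypotheticalStringFlags`, `refCountStrategy(for:)`) are invented for illustration and are not stdlib API; the real checks operate on raw bits rather than a struct of `Bool`s.

```swift
// Hypothetical model of which runtime, if any, manages a string's lifetime.
// These names are illustrative only; they are not stdlib API.
enum RefCountStrategy {
    case noRefCounting  // immortal: string literals and small strings
    case swiftRuntime   // native Swift storage
    case objcRuntime    // bridged: the object originated in Objective-C
}

struct HypotheticalStringFlags {
    var isImmortal: Bool
    var isBridged: Bool
}

func refCountStrategy(for flags: HypotheticalStringFlags) -> RefCountStrategy {
    // Immortal strings have permanent lifetime, so ARC is skipped entirely.
    if flags.isImmortal { return .noRefCounting }
    // Bridged objects are retained and released via the Objective-C runtime.
    if flags.isBridged { return .objcRuntime }
    // Everything else goes through Swift's own runtime.
    return .swiftRuntime
}

let literalFlags = HypotheticalStringFlags(isImmortal: true, isBridged: false)
let nativeFlags = HypotheticalStringFlags(isImmortal: false, isBridged: false)
let bridgedFlags = HypotheticalStringFlags(isImmortal: false, isBridged: true)
print(refCountStrategy(for: literalFlags))  // noRefCounting
print(refCountStrategy(for: nativeFlags))   // swiftRuntime
print(refCountStrategy(for: bridgedFlags))  // objcRuntime
```

The immortal check comes first in this sketch because a string with permanent lifetime needs no reference counting regardless of which runtime would otherwise manage it.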

Large refers to not-small, i.e., every form except Immediate-Small. All of these forms share a similar bit layout, including where the count and performance flags are located. Small strings have a completely different bit layout, since they pack their contents directly into the struct, so differentiating between small and large is often one of the first considerations.
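As a rough illustration of why the small-versus-large test comes first, the sketch below models a 16-byte string value as two 64-bit words whose meaning depends on a discriminator bit. The bit positions are assumptions chosen for the example and are not the stdlib's exact layout; `RawStringBits` is a hypothetical name.

```swift
// Simplified, hypothetical model of a 16-byte string value as two 64-bit
// words. Assumptions (not the stdlib's exact layout): bit 63 of `word1`
// marks a small string; a small string's count is in bits 56...59 of
// `word1`; a large string's count is in the low 48 bits of `word0`.
struct RawStringBits {
    var word0: UInt64  // large: count and performance flags; small: payload
    var word1: UInt64  // large: flags and object pointer; small: payload + count

    static let smallBit: UInt64 = 1 << 63

    var isSmall: Bool { word1 & RawStringBits.smallBit != 0 }

    // The two layouts keep the count in different places, so the
    // small-versus-large test has to come first.
    var count: Int {
        if isSmall {
            return Int((word1 >> 56) & 0xF)
        }
        return Int(word0 & 0xFFFF_FFFF_FFFF)
    }
}

let small = RawStringBits(word0: 0, word1: RawStringBits.smallBit | (5 << 56))
let large = RawStringBits(word0: 1_000, word1: 0)
print(small.isSmall, small.count)  // true 5
print(large.isSmall, large.count)  // false 1000
```

Because every large form shares one layout, a single branch on the discriminator is enough to know how to interpret the rest of the bits.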

Native is used for Immediate-Large strings, as they use the stdlib's native storage class, which has a guaranteed fixed offset to the start of the tail-allocated code units. These are the fastest non-small strings, since the pointer to the code units can be derived with a single addition. This offset is referred to as the nativeBias, and it is applied in reverse to string literal addresses (subtracted when the string is formed), which saves a branch everywhere in the (usually inlined) code that accesses a string's contents. I.e., the code paths for reading from natively stored, tail-allocated strings and from immortal string literals are identical.
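The nativeBias trick can be sketched with plain integers standing in for addresses. In this hypothetical model the bias value of 32 is an assumption (think of it as the size of the storage object's header), not the stdlib's actual constant, and `HypotheticalLargeString` is an invented name.

```swift
// Hypothetical illustration of the nativeBias trick, using integers as
// addresses. The bias value here is assumed, not the stdlib's constant.
let nativeBias: UInt = 32

struct HypotheticalLargeString {
    // Native strings: the address of the storage object.
    // Immortal literals: the literal's address minus nativeBias.
    var storedAddress: UInt

    // One addition, with no branch on native versus literal.
    var codeUnitsStart: UInt { storedAddress &+ nativeBias }
}

// A native storage object at 0x1000, with code units tail-allocated
// right after a 32-byte header.
let nativeString = HypotheticalLargeString(storedAddress: 0x1000)

// Literal bytes at 0x8000 in the binary; the bias is applied in reverse
// (subtracted) when the string is formed.
let literalAddress: UInt = 0x8000
let literalString = HypotheticalLargeString(storedAddress: literalAddress &- nativeBias)

print(String(nativeString.codeUnitsStart, radix: 16))   // 1020
print(String(literalString.codeUnitsStart, radix: 16))  // 8000
```

Both reads go through the same `codeUnitsStart` computation, which is the point: the cost of distinguishing the two cases is paid once, when the literal string is created, rather than on every access.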

Shared is used for Indirect strings and Foreign is used for Opaque strings. In hindsight, "Indirect" and "Opaque" are probably better names, and luckily they are only present in the ABI in very minor ways, so we have the opportunity to rename them.
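The decoder ring itself can be summarized as a small lookup. `StringForm` and its members are hypothetical names for this sketch; the right-hand strings are the terms used in the stdlib source as described above.

```swift
// Hypothetical summary mapping each descriptive String form to the name
// used in the stdlib source and symbol names.
enum StringForm: CaseIterable {
    case immediateSmall, immediateLarge, indirect, opaque

    var sourceName: String {
        switch self {
        case .immediateSmall: return "Small"    // e.g. _SmallString
        case .immediateLarge: return "Native"   // native tail-allocated storage
        case .indirect: return "Shared"
        case .opaque: return "Foreign"
        }
    }

    // "Large" means every form except Immediate-Small.
    var isLarge: Bool { self != .immediateSmall }
}

for form in StringForm.allCases {
    print(form, "->", form.sourceName, form.isLarge ? "(large)" : "(small)")
}
```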

_StringObject and _SmallString are layout-equivalent with one another on 64-bit little-endian platforms and byte-swapped on 64-bit big-endian platforms. On 32-bit platforms, small strings need to be packed and unpacked, since we are more bit-constrained there. _SmallString is useful as a view of the subset of bit patterns that _StringObject can take, allowing String functionality to be implemented directly on it.
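The sketch below packs up to 15 UTF-8 code units into two 64-bit words in little-endian byte order, with the count in the low nibble of the final byte. It is a simplified, hypothetical model (`PackedSmall` is an invented name), not the stdlib's exact `_SmallString` layout; it also shows that keeping the bytes in string order on a big-endian machine would require byte-swapping each word.

```swift
// Hypothetical sketch: pack up to 15 UTF-8 code units into two 64-bit words.
// Payload byte i lands at bit 8 * (i % 8) of word i / 8 (little-endian order);
// the count sits in the low nibble of byte 15, the top byte of word1.
struct PackedSmall {
    var word0: UInt64 = 0
    var word1: UInt64 = 0

    init?(_ s: String) {
        let bytes = Array(s.utf8)
        guard bytes.count <= 15 else { return nil }  // too big to be small
        for (i, b) in bytes.enumerated() {
            let shift = UInt64(8 * (i % 8))
            if i < 8 { word0 |= UInt64(b) << shift } else { word1 |= UInt64(b) << shift }
        }
        word1 |= UInt64(bytes.count) << 56
    }

    var count: Int { Int((word1 >> 56) & 0xF) }

    // Unpack the code units and decode them back into a String.
    var string: String {
        var bytes: [UInt8] = []
        for i in 0..<count {
            let word = i < 8 ? word0 : word1
            bytes.append(UInt8(truncatingIfNeeded: word >> UInt64(8 * (i % 8))))
        }
        return String(decoding: bytes, as: UTF8.self)
    }

    // Word values that would keep the code units in string order in memory
    // on a big-endian machine: each word is byte-swapped.
    var bigEndianWords: (UInt64, UInt64) { (word0.byteSwapped, word1.byteSwapped) }
}

let packed = PackedSmall("hello, swift!")!
print(packed.count, packed.string)                            // 13 hello, swift!
print(PackedSmall(String(repeating: "x", count: 16)) == nil)  // true
```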