As John says, there’s no reason to stick too closely to the 32-bit or 64-bit layouts, which support things, like bridged representations, that will never appear on a 16-bit platform. So let’s build it from first principles, or at least second principles, by going off of the descriptions in StringObject.swift.
No Builtin.BridgeObject and a full address space means we don’t have anything better than an enum to represent both owned and unowned cases and still get ARC, so let’s eat that cost up front:
enum Variant {
  case owned(StringBufferOfSomeKind)
  case unowned(UInt16)
} // three bytes total, 2-byte aligned
I made the unowned case UInt16 instead of UnsafeRawPointer so it can pull the same trick as the owned case: the stored value is the address of the string contents minus the offset built into the heap object, so both cases reach the contents by adding the same offset. If we could promise Swift that the "unowned" address is always aligned, we could do better, but for now let's assume we can't.
Now, the enum discriminator is going to burn a byte, so it benefits us to make more cases to emulate flags:
case asciiInline((UInt8, UInt8))
case nonAsciiInline((UInt8, UInt8))
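The size claims so far can be checked on any host, as long as the buffer reference is replaced by something that is two bytes wide with no spare bits. A minimal sketch, assuming a hypothetical BufferRef typealias as that stand-in (on a desktop target a real class reference is pointer-sized and has spare bits, so it would not show the same numbers):

```swift
// Hypothetical stand-in for a 16-bit object reference with no spare bits.
typealias BufferRef = UInt16

enum TwoCaseVariant {
  case owned(BufferRef)
  case unowned(UInt16)
}

enum FourCaseVariant {
  case owned(BufferRef)
  case unowned(UInt16)
  case asciiInline((UInt8, UInt8))
  case nonAsciiInline((UInt8, UInt8))
}

// Both enums need 2 payload bytes plus 1 discriminator byte,
// so the extra cases do not grow the type.
let twoCaseSize = MemoryLayout<TwoCaseVariant>.size
let fourCaseSize = MemoryLayout<FourCaseVariant>.size
let fourCaseAlignment = MemoryLayout<FourCaseVariant>.alignment
print(twoCaseSize, fourCaseSize, fourCaseAlignment)
```

The point of the comparison is that the discriminator byte is paid for as soon as there are two cases, so the two inline cases come for free.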
We know an out-of-line representation will need a count that can span the whole memory. (We could probably limit it to half of memory if we needed another flag bit later.)
var count: UInt16
var variant: Variant
The bigger string formats reserve 16 bits for “non-essential” flags. However, only 5 bits of that are actually defined, and if I’m reading them correctly only two of them are actually useful for embedded: isKnownASCII and isKnownNFC. So I say we start with 8 flag bits and see how long we can get away with it. (It’s possible that more flags could be added out-of-line to the owned case as well, they’d just be slower to access.)
var flags: UInt8
That brings us to 6 bytes, 3 words, while maintaining the majority of the functionality of the “full-size” Strings. For inline strings, we can repurpose flags as a short count, and use the variant payload bytes and the original count as up to 4 bytes of contents.
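Here is a rough sketch of how that repurposing could look, not a worked-out design. BufferRef is again a hypothetical UInt16 stand-in for the buffer reference, the names SmallStringGuts, inline(_:isASCII:) and inlineBytes are made up for illustration, and the byte order used to pack contents into count is an arbitrary assumption:

```swift
typealias BufferRef = UInt16  // hypothetical stand-in for the 16-bit buffer reference

enum Variant {
  case owned(BufferRef)
  case unowned(UInt16)
  case asciiInline((UInt8, UInt8))
  case nonAsciiInline((UInt8, UInt8))
}

struct SmallStringGuts {
  var count: UInt16     // out-of-line: code unit count; inline: contents bytes 2 and 3
  var variant: Variant  // inline: contents bytes 0 and 1 live in the payload
  var flags: UInt8      // out-of-line: flag bits; inline: short count (0...4)

  /// Packs up to four bytes into the inline representation.
  static func inline(_ bytes: [UInt8], isASCII: Bool) -> SmallStringGuts {
    precondition(bytes.count <= 4, "inline form holds at most 4 bytes")
    let padded = bytes + [UInt8](repeating: 0, count: 4 - bytes.count)
    let head = (padded[0], padded[1])
    // Assumed packing: byte 2 in the low half, byte 3 in the high half.
    let tail = UInt16(padded[2]) | (UInt16(padded[3]) << 8)
    return SmallStringGuts(
      count: tail,
      variant: isASCII ? .asciiInline(head) : .nonAsciiInline(head),
      flags: UInt8(bytes.count))
  }

  /// Recovers the inline contents, or nil for the out-of-line cases.
  var inlineBytes: [UInt8]? {
    switch variant {
    case .asciiInline(let head), .nonAsciiInline(let head):
      let all = [head.0, head.1, UInt8(count & 0xFF), UInt8(count >> 8)]
      return Array(all.prefix(Int(flags)))
    case .owned, .unowned:
      return nil
    }
  }
}

let packed = SmallStringGuts.inline(Array("hey!".utf8), isASCII: true)
let roundTripped = packed.inlineBytes
let gutsSize = MemoryLayout<SmallStringGuts>.size
print(roundTripped ?? [], gutsSize)
```

Note that the struct only stays at 6 bytes because Swift places flags in the byte right after the 3-byte variant rather than rounding the variant up to its stride.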
That's one alternative. But actually…maybe this is being too clever. We could also just use an enum for the whole thing.
enum StringGuts {
  case empty
  case inline1(payload: UInt8, flags: UInt8)
  case inline2(payload: (UInt8, UInt8), flags: UInt8)
  case inline3(payload: (UInt8, UInt8, UInt8), flags: UInt8)
  case inline4(payload: (UInt8, UInt8, UInt8, UInt8), flags: UInt8)
  // inline5 has no room for flags
  // Whether or not that’s a good tradeoff probably goes to someone else to answer.
  // Making it specifically asciiInline5 might be a good compromise.
  // On the other hand, if you omit it, you can move the flags out of the enum.
  case inline5((UInt8, UInt8, UInt8, UInt8, UInt8))
  case owned(count: UInt16, buffer: StringBuffer, flags: UInt8)
  case unowned(count: UInt16, adjustedAddress: UInt16, flags: UInt8)
}
This is also 6 bytes! And spells out what we want in general! The only really unusual thing here is having separate cases for all the different lengths of inline strings, but that's just making the best use of our mandatory discriminator byte.
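To make the "count comes from the discriminator" idea concrete, here is a sketch of the same enum with a count accessor. As before, BufferRef is a hypothetical UInt16 stand-in for StringBuffer so the size can be checked on a desktop host, and the utf8Count name is made up for illustration:

```swift
typealias BufferRef = UInt16  // hypothetical stand-in for StringBuffer on a 16-bit target

enum StringGuts {
  case empty
  case inline1(payload: UInt8, flags: UInt8)
  case inline2(payload: (UInt8, UInt8), flags: UInt8)
  case inline3(payload: (UInt8, UInt8, UInt8), flags: UInt8)
  case inline4(payload: (UInt8, UInt8, UInt8, UInt8), flags: UInt8)
  case inline5((UInt8, UInt8, UInt8, UInt8, UInt8))
  case owned(count: UInt16, buffer: BufferRef, flags: UInt8)
  case unowned(count: UInt16, adjustedAddress: UInt16, flags: UInt8)

  /// Inline cases get their length from the case itself;
  /// only the out-of-line cases read a stored count.
  var utf8Count: UInt16 {
    switch self {
    case .empty: return 0
    case .inline1: return 1
    case .inline2: return 2
    case .inline3: return 3
    case .inline4: return 4
    case .inline5: return 5
    case .owned(let n, _, _), .unowned(let n, _, _): return n
    }
  }
}

let three = StringGuts.inline3(payload: (0x61, 0x62, 0x63), flags: 0)
let big = StringGuts.unowned(count: 1000, adjustedAddress: 0x2000, flags: 0)
let enumSize = MemoryLayout<StringGuts>.size
print(three.utf8Count, big.utf8Count, enumSize)
```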
If we want to play layout tricks, we can go further by adding explicit padding and such to the cases that aren't using the full 5 bytes of payload. This puts the flags field in the 5th byte for every case that isn't inline5. As a bonus, the discriminators will get assigned in order if every case has a payload, which means the counts for the inline cases line up with the discriminators. (The optimizer doesn't seem to take advantage of this and I'm not sure why.)
enum StringGuts {
  case empty(padding: (UInt16, UInt16), flags: UInt8)
  case inline1(payload: UInt8, padding: (UInt8, UInt8, UInt8), flags: UInt8)
  case inline2(payload: (UInt8, UInt8), padding: (UInt8, UInt8), flags: UInt8)
  case inline3(payload: (UInt8, UInt8, UInt8), padding: UInt8, flags: UInt8)
  case inline4(payload: (UInt8, UInt8, UInt8, UInt8), flags: UInt8)
  // inline5 has no room for flags
  // Whether or not that’s a good tradeoff probably goes to someone else to answer.
  // Making it specifically asciiInline5 might be a good compromise.
  // On the other hand, if you omit it, you can move the flags out of the enum.
  case inline5((UInt8, UInt8, UInt8, UInt8, UInt8))
  case owned(count: UInt16, buffer: StringBuffer, flags: UInt8)
  case unowned(count: UInt16, adjustedAddress: UInt16, flags: UInt8)
}
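One way to sanity-check the padded layout is to dump the raw bytes of a few values and look at where flags ends up. This is only a sketch: BufferRef is a hypothetical UInt16 stand-in for StringBuffer, rawBytes is a made-up helper, and the placement of the discriminator in the last byte is what I would expect from the current enum layout rather than something this code relies on:

```swift
typealias BufferRef = UInt16  // hypothetical stand-in for StringBuffer on a 16-bit target

enum StringGuts {
  case empty(padding: (UInt16, UInt16), flags: UInt8)
  case inline1(payload: UInt8, padding: (UInt8, UInt8, UInt8), flags: UInt8)
  case inline2(payload: (UInt8, UInt8), padding: (UInt8, UInt8), flags: UInt8)
  case inline3(payload: (UInt8, UInt8, UInt8), padding: UInt8, flags: UInt8)
  case inline4(payload: (UInt8, UInt8, UInt8, UInt8), flags: UInt8)
  case inline5((UInt8, UInt8, UInt8, UInt8, UInt8))
  case owned(count: UInt16, buffer: BufferRef, flags: UInt8)
  case unowned(count: UInt16, adjustedAddress: UInt16, flags: UInt8)
}

/// Copies the in-memory representation of a value into an array.
func rawBytes(_ guts: StringGuts) -> [UInt8] {
  withUnsafeBytes(of: guts) { Array($0) }
}

let emptyBytes = rawBytes(.empty(padding: (0, 0), flags: 0x11))
let inline3Bytes = rawBytes(
  .inline3(payload: (0x61, 0x62, 0x63), padding: 0, flags: 0xA5))
let ownedBytes = rawBytes(.owned(count: 300, buffer: 0x1234, flags: 0x5A))

// Index 4 is the 5th byte, where every padded case keeps its flags.
print(emptyBytes[4], inline3Bytes[4], ownedBytes[4])
// The trailing byte is where the discriminator is expected to live.
print(emptyBytes[5], inline3Bytes[5], ownedBytes[5])
```

With flags at a fixed offset, reading them should not need to switch over the case at all, except to rule out inline5.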
Unfortunately, all of this ought to optimize better than it actually does. Maybe it looks better on an architecture other than x86_64, where the entire enum is passed in a single register and then has to be broken apart with shifts. Given that, though, maybe the struct approach is better for now, limiting inline strings to four bytes.