StringGuts, StringObject internals: how to lay them out on 16-bit platforms?

Hi,

I'm investigating how to adapt String for 16-bit platforms. I'm new to the internals of String. Would it work to just keep the same basic layout as the 32-bit version?

  internal var _count: Int            // 2 bytes with a 16-bit Int
  internal var _variant: Variant      // 2 bytes, assuming a pointer-sized payload
  internal var _discriminator: UInt8  // 1 byte
  internal var _flags: UInt16         // 2 bytes

...which comes to a total of 8 bytes, I think? (7 bytes of fields, plus a padding byte before _flags.)

Any thoughts gratefully received. (Although ideally not the sort of "don't do that on 16-bit, why would you?" thoughts please. :slight_smile:)

Cheers,
Carl

1 Like

That layout is intended to be an approximation of a union (if I remember correctly), in the sense that it is supposed to occupy the same space as two pointers. On 32-bit, the count Int takes 32 bits, which leaves the other 32 bits for the variant/discriminator/flags and brings the total layout to 64 bits. On 16-bit, then, you would somehow need to compress the variant/discriminator/flags down to occupy only 16 bits. Perhaps you could nibble a few bits off the count?
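
To make that concrete, one hypothetical packing of that second 16-bit word might look like this (the field widths are purely illustrative, not a proposal):

struct CompressedWord {
  var rawValue: UInt16

  var variantKind: UInt16 { rawValue & 0b11 }    // 2-bit variant discriminator
  var flags: UInt16 { (rawValue >> 2) & 0b111 }  // 3 flag bits
  var spare: UInt16 { rawValue >> 5 }            // 11 bits you could nibble back into the count
}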

I have a feeling that the advantages of some of the optimizations like smol strings won't really scale down that well, since you will effectively only have 32 bits of storage to work with. Even if you were able to use all the bits, that would be at most 4 UTF-8 code units.

@Michael_Ilseman might be a better resource for stringy things.

What is your platform’s layout for references to ARC-managed objects?

So far I'm only interested in getting AVR support working, which will always use embedded compilation mode with Objective-C disabled, so ARC is handled in EmbeddedRuntime.swift.

public struct HeapObject {
  var metadata: UnsafeRawPointer  // 2 bytes on AVR
  var refcount: Int               // 2 bytes on AVR
...
}

So it's basically two 16-bit words... a 32-bit header on the front of every object.

So...

class A {
  var myVal: UInt8  // 1 byte of payload after the 4-byte header
}

will be 5 bytes on the heap (plus the usual 2 bytes for malloc).

I think that's right. The embedded guys are the authority on it.

I wouldn’t let yourself be overly influenced by the existing representation design; if there are capabilities that don’t scale well down to 16-bit pointers, you may need to rethink how they’re represented.

Protocol types normally give you a value buffer of 3 pointers, which I doubt you want to change for your target. You really don’t want String to require heap allocation when stored in an Any. That gives you an upper limit of 6 bytes.

I think the basic questions you need to wrestle with are:

  • Is there a minimum size for the small-string optimization to be practically useful?
  • What’s the smallest size we can reasonably get the non-small representation down to?

For the former, my guess is “no”: even if we only allowed single-byte strings to avoid allocation, that would still be worthwhile. (Making the zero-length string not allocate is critical, of course, but that could be handled without a general small-string representation. My guess is that that representation is probably still worth it even if it only handles very small strings.)
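
For illustration, one way to get an allocation-free empty string without a general small-string representation is to reserve a single bit pattern for it. (Guts16 and its fields are hypothetical names, not a real layout.)

struct Guts16 {
  var count: UInt16
  var pointerOrPayload: UInt16
  var discriminatorAndFlags: UInt8

  // Reserve the all-zero pattern as the canonical empty string,
  // so that String() never has to allocate.
  static let empty = Guts16(count: 0, pointerOrPayload: 0, discriminatorAndFlags: 0)
  var isEmpty: Bool { count == 0 }
}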

For the latter, this is all about how much of the non-pointer storage is indispensable. We have, I think, 1 bit of storage in the alignment bit of the object pointer, then either 16 or 32 bits for flags and sizes, and that’s it. It looks like the variant encoding requires at least 2 bits, and we’d like three or four bits of flags, though we could theoretically move them into the object if we needed to. How important is it to continue to privilege the rest of that storage?
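
Spelled out as a rough bit budget for a 6-byte object (all of these numbers are guesses, not a design):

// 48 bits total, given the 6-byte upper limit from the value buffer:
//   object pointer       16 bits (plus 1 spare bit from 2-byte alignment)
//   count                16 bits
//   variant encoding      2 bits
//   flags                 3-4 bits
//   remaining           ~10 bits of storage left to privilege, or not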

5 Likes

I thought embedded Swift did not support existentials. Do we use value buffers for anything else these days?

Ah, that’s true. No, if this is an embedded-only platform, there’s no need to worry about protocol types or value buffers.

Honestly, I don't have String on my platform, due to the complexity and the potential for accidental misuse. Even before I switched to Swift, when I used the Arduino C++ platform, their String class was the quickest and most reliable way to cause a heap/stack crash with very ordinary-looking code. Basically, a heap-allocated, easily extensible character buffer is pretty much to be avoided on small microcontrollers.

So I haven't had to deal with this problem in the five years or so of using my own cut-down version of the stdlib. I'm trying to work out something that makes sense, because I want to follow Kuba's vision of keeping embedded Swift a subset of Swift, so that it behaves the same in every way but with some features not available.

It's a tricky balance. I'm thinking it will take a few iterations and some feedback from anyone who uses the first iteration of String on AVR before we find the sensible balance points. At first blush, I tried some patches that used the existing 32-bit structure, just scaled down (pointers and Ints half the size), and that seemed OK.

Embedded doesn't do existentials, although there might still be reasons we'd want to stay at that size. In fact, smaller is probably better.

For people who use String, I think the small-string optimisation is going to be useful even if the payload is only the size of two 16-bit pointers, just 3 characters. In fact, on microcontrollers small strings will be more common.

One simplification is that we won't have Objective-C on AVR at least (it's always disabled), so we might be able to eliminate some discriminator bits. I'd like to tighten the design, but I'm struggling a little to get my head around the design of String internals. Is there a document that explains it? I seem to remember a Swift Evolution proposal?

1 Like

As John says, there’s no reason to stick too closely to the 32-bit layout or the 64-bit layout, which support things, like bridged representations, that will never appear on a 16-bit platform. So let’s build it from first principles, or at least second principles, by going off of the descriptions in StringObject.swift.

With no Builtin.BridgeObject and a full address space to cover, we don’t have anything better than an enum for representing both owned and unowned cases while still getting ARC, so let’s eat that cost up front:

enum Variant {
  case owned(StringBufferOfSomeKind)
  case unowned(UInt16)
} // three bytes total, 2-byte aligned

I made the unowned case UInt16 instead of UnsafeRawPointer so it can pull the same trick of storing an offset some distance before the string buffer, like the offset built into the heap object. If we could promise Swift that the unowned case is always aligned, we could do better, but for now let's assume we can't.
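
To illustrate the trick, purely as a sketch: headerSize matches the 4-byte HeapObject shown earlier, and these helper functions are hypothetical, not stdlib API.

let headerSize: UInt16 = 4  // metadata + refcount, per EmbeddedRuntime.swift above

func contents(owned buffer: UnsafeRawPointer) -> UInt16 {
  // Tail-allocated contents start just past the heap object's header.
  // (UInt is 16 bits on this target, so the conversion is exact.)
  return UInt16(UInt(bitPattern: buffer)) &+ headerSize
}

func contents(unowned biased: UInt16) -> UInt16 {
  // The stored value is (actual address - headerSize), so the same
  // addition finds the contents in both cases.
  return biased &+ headerSize
}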

Now, the enum discriminator is going to burn a byte, so it benefits us to make more cases to emulate flags:

case asciiInline((UInt8, UInt8))
case nonAsciiInline((UInt8, UInt8))

We know an out-of-line representation will need a count that can span the whole memory. (We could probably limit it to half of memory if we needed another flag bit later.)

var count: UInt16
var variant: Variant

The bigger string formats reserve 16 bits for “non-essential” flags. However, only 5 bits of that are actually defined, and if I’m reading them correctly only two of them are actually useful for embedded: isKnownASCII and isKnownNFC. So I say we start with 8 flag bits and see how long we can get away with it. (It’s possible that more flags could be added out-of-line to the owned case as well; they’d just be slower to access.)

var flags: UInt8

That brings us to 6 bytes, 3 words, while maintaining most of the functionality of the “full-size” Strings. For inline strings, we can repurpose flags as a short count and use the variant payload bytes plus the original count as up to 4 bytes of contents.
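
As a sketch of that repurposing, with hypothetical field positions and a made-up “inline” marker bit:

struct InlineString {
  var contents: (UInt8, UInt8, UInt8, UInt8) = (0, 0, 0, 0)  // reuses the count + variant payload bytes
  var shortCountAndFlags: UInt8 = 0

  init?(_ utf8: [UInt8]) {
    guard utf8.count <= 4 else { return nil }  // doesn't fit inline
    withUnsafeMutableBytes(of: &contents) { dest in
      for (i, byte) in utf8.enumerated() { dest[i] = byte }
    }
    shortCountAndFlags = UInt8(utf8.count) | 0x80  // high bit marks "inline"
  }

  var count: Int { Int(shortCountAndFlags & 0x0F) }
}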

That's one alternative. But actually…maybe this is being too clever. We could also just use an enum for the whole thing.

enum StringGuts {
  case empty

  case inline1(payload: UInt8, flags: UInt8)
  case inline2(payload: (UInt8, UInt8), flags: UInt8)
  case inline3(payload: (UInt8, UInt8, UInt8), flags: UInt8)
  case inline4(payload: (UInt8, UInt8, UInt8, UInt8), flags: UInt8)

  // inline5 has no room for flags
  // Whether or not that’s a good tradeoff probably goes to someone else to answer.
  // Making it specifically asciiInline5 might be a good compromise.
  // On the other hand, if you omit it, you can move the flags out of the enum.
  case inline5((UInt8, UInt8, UInt8, UInt8, UInt8))

  case owned(count: UInt16, buffer: StringBuffer, flags: UInt8)
  case unowned(count: UInt16, adjustedAddress: UInt16, flags: UInt8)
}

This is also 6 bytes! And it spells out what we want in general! The only really unusual thing here is having separate cases for all the different lengths of inline strings, but that's just "making the best use of our mandatory discriminator byte" again.

If we want to play layout tricks, we can go further by adding explicit padding and such to the cases that aren't using the full 5 bytes of payload. This puts the flags field in the 5th byte for every case that isn't inline5. As a bonus, the discriminators will get assigned in order if every case has a payload, which means the counts for the inline cases line up with the discriminators. (The optimizer doesn't seem to take advantage of this and I'm not sure why.)

enum StringGuts {
  case empty(padding: (UInt16, UInt16), flags: UInt8)

  case inline1(payload: UInt8, padding: (UInt8, UInt8, UInt8), flags: UInt8)
  case inline2(payload: (UInt8, UInt8), padding: (UInt8, UInt8), flags: UInt8)
  case inline3(payload: (UInt8, UInt8, UInt8), padding: UInt8, flags: UInt8)
  case inline4(payload: (UInt8, UInt8, UInt8, UInt8), flags: UInt8)

  // inline5 has no room for flags
  // Whether or not that’s a good tradeoff probably goes to someone else to answer.
  // Making it specifically asciiInline5 might be a good compromise.
  // On the other hand, if you omit it, you can move the flags out of the enum.
  case inline5((UInt8, UInt8, UInt8, UInt8, UInt8))

  case owned(count: UInt16, buffer: StringBuffer, flags: UInt8)
  case unowned(count: UInt16, adjustedAddress: UInt16, flags: UInt8)
}

Unfortunately, all of this ought to optimize better than it actually does. Maybe it looks better on an architecture that isn't x86_64, where the entire enum is passed in a single register but then has to be broken apart with shifts. Given that, though, maybe the struct approach is better for now, limiting inline storage to four bytes.
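
For reference, that struct approach would look roughly like this, reusing the Variant sketch from earlier (illustrative only):

struct StringGutsStruct {
  var count: UInt16     // full length, or repurposed as 2 bytes of inline contents
  var variant: Variant  // the owned/unowned/inline enum from above
  var flags: UInt8      // isKnownASCII, isKnownNFC, ...; short count when inline
}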

9 Likes

Sounds nice and tight. I'm still really struggling to understand how the String internals work, though. Where are the flags defined? I'm also struggling with the terminology a bit. When you say owned and unowned... I had sort of assumed there are basically three cases with strings...

  1. There's a block of data on the heap containing the string contents, and the String struct's internals hold a reference to it. Possibly the heap memory is something like a tail-allocated class, with key information at the front followed by the Unicode code units themselves, and some high-level information like the count is kept in the String struct for fast access. Presumably it uses COW mechanics like Array, Set and Dictionary: using ARC to check whether the object is uniquely referenced when it is mutated, and making a copy of the underlying buffer if not.

  2. There's a fixed string somewhere, something like an interned string literal in a constant, read-only memory block. For most people these will be the string literals they define in their programs, but it could in theory be something more exotic, I suppose. The important point is that it is not heap memory managed by ARC and our program; it is some other kind of pointer. Again, some top-level information is kept in the String struct. In this case there is probably nowhere else to keep other information.

  3. Small strings: a few Unicode code units stored inline, in place of the pointers, count, etc., together with presumably some flag that indicates this is a small string and some kind of count (which would only need to be a few bits, as the maximum length is something like 15).

So I'm sort of guessing that when you talk about "owned" you mean something like the heap-managed storage in 1) above? And by "unowned" you mean 2)?

Sorry to have to go back to basics, but if I'm going to be the one building this PR for 16-bit String, then I think I'm going to have to understand things a lot better! If someone else wants to pick up [Standard Library] [16-bit] StringObject not compiling on 16 bit platforms · Issue #75143 · swiftlang/swift · GitHub, they are welcome. I'll do it if no one else wants to, but I'll need help!

Cheers,
Carl

p.s. I get what you're saying about optimisation, but your structure looks solid, and I'm fairly sure that on our platform at least it will optimise well. 6 bytes is a good size for AVR objects, and it will often be possible to pass that in registers. That said, 4 bytes would be better for our platform, but we can always take a step in this direction and see where we go.

1 Like

For the (belated) record, this was all based on the large comment blocks strewn throughout StringObject.swift. Your three cases are exactly correct, though the real String has a bit more nuance to account for future evolution (a "native but not stored directly" case) as well as Objective-C NSStrings (referred to as "foreign" cases). Neither of those is relevant in an embedded context.


4 bytes would be tricky! If string literals really can be anywhere in the entire address space, and they're unaligned, you really can't get the Variant enum or its equivalent smaller than 3 bytes. Even ignoring flags, that leaves too little space to include an entire count, unless you want to ban strings over 255 UTF-8 code units. That means you'd have to make a choice of sacrifices:

  • No unowned strings
  • No owned strings
  • Count not stored inline (thus making some optimizations less effective, as well as making string literals more complicated)
  • Force the unowned case to be 2-byte aligned like the owned case, so the bottom bit of both the owned and unowned cases can be used as a discriminator. Maybe steal a bit from the other two bytes to re-un-align it. (This is not something easily expressible in Swift; I don't think we promise that UnsafePointers are correctly aligned. But it could be worth adding a hack for. There's a sketch of the bit trick after this list.)
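
Here's that sketch; PackedVariant and its bit assignments are hypothetical, just to show the shape of the idea:

struct PackedVariant {
  var bits: UInt16  // bit 0: 1 = unowned; bits 1...15: a 2-byte-aligned address

  init(owned address: UInt16) {
    assert(address % 2 == 0, "owned buffers are always 2-byte aligned")
    bits = address
  }
  init(unowned address: UInt16) {
    assert(address % 2 == 0, "unowned data forced to 2-byte alignment")
    bits = address | 1
  }

  var isUnowned: Bool { bits & 1 == 1 }
  var address: UInt16 { bits & ~1 }
}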

Even with this, let's say strings are allowed to be half the total address space. If the count is stored inline, that means there's at most one bit left for inline "flags". The flags are just there for optimization, so partly this depends on how much Unicode is going to be supported in embedded mode, which I know is a generally contentious topic.

These are all interesting trade-offs that I don't think I'm the one qualified to make! The 6-byte versions I showed above are meant to behave a lot like the existing String implementation, and on the one hand, the further we go from that, the less we can share with the "normal" String. On the other hand, the existing implementation had very different goals in mind, and we don't necessarily need to optimize for the same things in embedded-land, and not just embedded but 16-bit embedded.

3 Likes

Alright. That's pretty useful information and really helps me to understand it.

I implemented a really basic version in the end, just following the 32-bit structure. It's not final, but it was good enough to get a basic standard library working, at least as far as AST/verify. I'm having separate issues with IRGen that are really AVR-specific; I'm addressing those in other pull requests, but there's no reason to believe those issues will affect any other 16-bit platform, so I think we've hit a good milestone for 16-bit support.

I feel like once I've got standard compilation working on AVR, I can come back to look at improving String optimisations on 16-bit. This information is perfect for when I do!

Thank you very much Jordan!
Carl

2 Likes