Layout-compatible structs and arrays

I'm doing some work where I need to straddle the line between low-level code internally and a type-friendly layer on top. It would be really beneficial if I could safely make some big assumptions about layout.

I found this post from @jrose where it's stated that "A [Swift] struct with one element has the same layout as that element". I assume that's still true today and isn't going to change.

I have a bunch of generated types M_ that contain a single field C, where C is a (non-Obj-C) class type:

final class C { /* properties */ }
struct M1 { private var c: C }
struct M2 { private var c: C }
// ...

So I should be able to "safely" unsafeBitCast between M_ and C.

By extension, does this also imply that I can bitcast an Array<M_> to Array<C> and vice versa? Assuming that the Array is always a native Swift array and not the bridged Obj-C representation, do I risk violating any guarantees when I perform operations on that array or read/mutate the elements themselves?

The generated code is mostly the same for these two functions, except for type metadata references that include the actual types:

final class C { var s: String = "foo" }
struct M { var c: C }

func firstFunction(_ x: [M]) -> Int {
    var sum = 0
    for e in x { sum += e.c.s.count }
    return sum
}

func secondFunction(_ x: [C]) -> Int {
    var sum = 0
    for e in x { sum += e.s.count }
    return sum
}

So I'm really just trying to figure out how much I can get away with here without violating anything that will bite me later :grimacing:

Even if the types have the same layout, you run the risk of violating aliasing assumptions in the optimizer when you cast between unrelated types like that. But @Andrew_Trick is more qualified to comment on such matters than I am.

That makes sense. To elaborate on what I'm trying to do:

C manages a raw memory buffer. The layout of the data in that buffer is defined by an external schema.

M1, M2, etc. are typed views of the data in some instance of C. All Ms use the same C type but provide it with different schemas describing what's at what offsets in the buffer. An instance of C is self-describing—it references its schema.

We have existing APIs that we'd really like to not break, which look like this:

struct M1 {
  var foo: M2 { get set }
  var bar: [M2] { get set }
}

Storing actual instances of M2 and [M2] in C's buffer is what I'm doing today, but that causes code bloat because doing any operations on that data requires knowing the concrete types to bind the buffer slices to. I do this by generating trampoline functions where C calls into M1, which calls back a generic function on C that takes a metatype hint. It works but I have to generate different trampolines for several operations, and it's becoming a lot.

I could change this to store the underlying C in the buffer instead of the wrapper types, and have the accessors wrap/unwrap those as needed. That's straightforward and cheap for the singular M2 accessor, but I'd really like to avoid the costly manual conversions of [M2] to/from [C].
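As a rough sketch of that alternative: the accessor below wraps and unwraps on every access, which is where the linear cost for [M2] comes from. The `children` property is a hypothetical stand-in for "the Cs stored in C's buffer"; the real storage described above is a raw buffer with a schema, which is not modeled here.

```swift
// Hypothetical stand-in: `children` plays the role of the C references
// that would live in C's schema-described buffer.
final class C {
    var children: [C] = []
}

struct M2 {
    var c: C
}

struct M1 {
    var c: C

    // Same client-facing shape as before, but the storage holds C, not M2.
    var bar: [M2] {
        // O(n): allocates a new array and wraps every element.
        get { c.children.map { M2(c: $0) } }
        // O(n): allocates a new array and unwraps every element.
        set { c.children = newValue.map { $0.c } }
    }
}

var m1 = M1(c: C())
m1.bar = [M2(c: C()), M2(c: C())]
let wrappedCount = m1.bar.count
let storedCount = m1.c.children.count
```

Each read of `bar` builds a fresh array, so a loop that touches `m1.bar[i]` repeatedly pays the conversion every time unless the caller hoists the access.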

Introducing our own collection type that does the wrapping/unwrapping on demand instead of using Array is also an option, but that's also client-API breaking (unless we do the conversion ourselves, so also a linear time copy until we provide some migration path).

I'm trying to have my cake and eat it too here, which may just not be possible.

You can bitcast values as long as the layouts match and you don't change the type of any pointers or class references they hold. Bitcasting to different pointer or reference types can get you into trouble.
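A minimal sketch of the value-level case, using the C and M shapes from earlier in the thread (the initializer on C is added only so the example is self-contained). The layout checks are an extra precaution, not something the cast requires you to write:

```swift
final class C {
    var s: String
    init(s: String) { self.s = s }
}

struct M {
    var c: C
}

// Only cast if the layouts really do agree.
assert(MemoryLayout<M>.size == MemoryLayout<C>.size)
assert(MemoryLayout<M>.stride == MemoryLayout<C>.stride)
assert(MemoryLayout<M>.alignment == MemoryLayout<C>.alignment)

let original = C(s: "foo")

// C -> M: the one reference inside M is still statically typed as C,
// so no pointer or class reference changes type.
let wrapped = unsafeBitCast(original, to: M.self)

// M -> C: back again; both values refer to the same object.
let unwrapped = unsafeBitCast(wrapped, to: C.self)
```

What this does not cover is casting a reference to C into a reference to some other class type, which is the case the caution above is about.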

You can't bitcast arrays because you're not really supposed to know their layout and you would be changing the type of the pointer they hold internally.

If you want a copy-on-write array with Element=M2 you can make an M2Array wrapper on top of Array<M1> whose subscript does the M1 -> M2 bitcast for you.
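Here is one way such a wrapper might look. This is a sketch under assumptions: the storage here is [C] rather than Array<M1>, to match the "store C, expose M2" idea from earlier in the thread, and the `text` property on M2 is hypothetical, added only to have something to read. The wrapper gets copy-on-write behavior for free because its only stored property is an Array.

```swift
final class C {
    var s: String
    init(s: String) { self.s = s }
}

struct M2 {
    private var c: C
    // Hypothetical typed accessor, for illustration only.
    var text: String { c.s }
}

// Hypothetical wrapper: stores [C], vends M2 elements.
struct M2Array: RandomAccessCollection, MutableCollection {
    private var storage: [C]

    init(storage: [C]) { self.storage = storage }

    var startIndex: Int { storage.startIndex }
    var endIndex: Int { storage.endIndex }

    subscript(position: Int) -> M2 {
        // Per-element bitcast; the array itself is never reinterpreted.
        get { unsafeBitCast(storage[position], to: M2.self) }
        set { storage[position] = unsafeBitCast(newValue, to: C.self) }
    }
}

var views = M2Array(storage: [C(s: "foo"), C(s: "quux")])
let total = views.reduce(0) { $0 + $1.text.count }
views.swapAt(0, 1)
```

The trade-off is the one already mentioned: clients see M2Array instead of [M2], so this is source-breaking wherever the API spells out Array.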

If you just want to view the storage as a Span of M2 within some local scope, then you can use Span's ability to simultaneously view the same buffer as different element types:

Span<M2>(_bytes: m1_array.span.bytes)

If you want complete control of the memory buffer, you can build a wrapper over UnsafeRawPointer and pick the element type via load(fromByteOffset:as:), which effectively does the bitcast for you.
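A small sketch of the raw-pointer approach. `RawElementBuffer` is a hypothetical name, and to keep the example short it borrows an array's storage through withUnsafeBytes instead of owning an allocation, so the view is only valid inside the closure. It also assumes every type you load has the same stride as the stored type.

```swift
final class C {
    var s: String
    init(s: String) { self.s = s }
}

struct M {
    var c: C
}

// Hypothetical wrapper over raw memory; the caller picks the element type.
struct RawElementBuffer {
    let base: UnsafeRawPointer
    let count: Int

    // Loads element `index` as `T`. Assumes `T` is layout-compatible with
    // whatever was stored, and that the offset is suitably aligned for `T`.
    func element<T>(at index: Int, as type: T.Type) -> T {
        precondition(index >= 0 && index < count, "index out of range")
        return base.load(fromByteOffset: index * MemoryLayout<T>.stride, as: T.self)
    }
}

let objects = [C(s: "foo"), C(s: "quux")]

// The memory holds C references, but each one is read back as an M.
let lengths = objects.withUnsafeBytes { (bytes: UnsafeRawBufferPointer) -> [Int] in
    let view = RawElementBuffer(base: bytes.baseAddress!, count: objects.count)
    return (0..<view.count).map { view.element(at: $0, as: M.self).c.s.count }
}
```

A version that owns its allocation would also have to initialize, deinitialize, and deallocate the elements itself, which is the price of the extra control.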

Thanks, that confirms that what I wanted to do was not a good idea.

Being able to seamlessly cast arrays like that was most appealing because we could keep the same API surface with zero copies. I'm going to try some alternate implementation strategies (that don't do naughty bit-casting) to see if I can avoid changing the client-facing API.