Layout-compatible structs and arrays

I'm doing some work where I need to toe the line between low-level code internally and a type-friendly layer on top. It would be really beneficial if I could safely make some big assumptions about layout.

I found this post from @jrose where it's stated that "A [Swift] struct with one element has the same layout as that element". I assume that's still true today and isn't going to change.

I have a bunch of generated types M_ that contain a single field C, where C is a (non-Obj-C) class type:

final class C { /* properties */ }
struct M1 { private var c: C }
struct M2 { private var c: C }
// ...

So I should be able to "safely" unsafeBitCast between M_ and C.
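One way to sanity-check the layout assumption is to compare what MemoryLayout reports for the wrapper and for the class reference. This is only a sketch: matching size, stride, and alignment is a necessary condition for the cast, not proof that the cast is sanctioned, and the constant names below are made up for illustration.

```swift
final class C { var s: String = "foo" }
struct M { var c: C }

// A class-typed value is a single reference, so the one-field wrapper
// is expected to report the same size, stride, and alignment.
let sameSize = MemoryLayout<M>.size == MemoryLayout<C>.size
let sameStride = MemoryLayout<M>.stride == MemoryLayout<C>.stride
let sameAlignment = MemoryLayout<M>.alignment == MemoryLayout<C>.alignment

print(sameSize, sameStride, sameAlignment)
```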

By extension, does this also imply that I can bitcast an Array<M_> to Array<C> and vice versa? Assuming that the Array is always a native Swift array and not the bridged Obj-C representation, do I risk violating any guarantees when I perform operations on that array or read/mutate the elements themselves?

The generated code is mostly the same for these two functions, except for type metadata references that include the actual types:

final class C { var s: String = "foo" }
struct M { var c: C }

func firstFunction(_ x: [M]) -> Int {
    var sum = 0
    for e in x { sum += e.c.s.count }
    return sum
}

func secondFunction(_ x: [C]) -> Int {
    var sum = 0
    for e in x { sum += e.s.count }
    return sum
}

So I'm really just trying to figure out how much I can get away with here without violating anything that will bite me later :grimacing:

Even if the types have the same layout, you run the risk of violating aliasing assumptions in the optimizer when you cast between unrelated types like that. But @Andrew_Trick is more qualified to comment on such matters than I am.

That makes sense. To elaborate on what I'm trying to do:

C manages a raw memory buffer. The layout of the data in that buffer is defined by an external schema.

M1, M2, etc. are typed views of the data in some instance of C. All Ms use the same C type but provide it with different schemas describing what's at what offsets in the buffer. An instance of C is self-describing—it references its schema.

We have existing APIs that we'd really like to not break, which look like this:

struct M1 {
  var foo: M2 { get set }
  var bar: [M2] { get set }
}

Storing actual instances of M2 and [M2] in C's buffer is what I'm doing today, but that causes code bloat because doing any operations on that data requires knowing the concrete types to bind the buffer slices to. I do this by generating trampoline functions where C calls into M1, which calls back a generic function on C that takes a metatype hint. It works but I have to generate different trampolines for several operations, and it's becoming a lot.

I could change this to store the underlying C in the buffer instead of the wrapper types, and have the accessors wrap/unwrap those as needed. That's straightforward and cheap for the singular M2 accessor, but I'd really like to avoid the costly manual conversions of [M2] to/from [C].
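Roughly what the store-C-and-convert approach looks like for the array accessor, as a sketch. The `children` and `label` properties are hypothetical stand-ins for whatever C actually keeps in its buffer; the point is only that both the getter and the setter walk the whole array.

```swift
final class C {
    var label: String
    var children: [C] = []   // hypothetical stand-in for the buffer contents
    init(label: String) { self.label = label }
}

struct M2 {
    var c: C
    var label: String { c.label }
}

struct M1 {
    var c: C
    var bar: [M2] {
        get { c.children.map { M2(c: $0) } }        // O(n): wraps every element
        set { c.children = newValue.map { $0.c } }  // O(n): unwraps every element
    }
}

let root = C(label: "root")
var m1 = M1(c: root)
m1.bar = [M2(c: C(label: "a")), M2(c: C(label: "b"))]
let labels = m1.bar.map { $0.label }
```

Each element conversion is just a reference copy, but the array itself is rebuilt on every access, which is the linear cost being discussed.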

Introducing our own collection type that does the wrapping/unwrapping on demand instead of using Array is also an option, but that's also client-API breaking (unless we do the conversion ourselves, so also a linear time copy until we provide some migration path).

I'm trying to have my cake and eat it too here, which may just not be possible.

You can bitcast values as long as the layouts match and you don't change the type of any pointers or class references they hold. Bitcasting to different pointer or reference types can get you into trouble.
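A sketch of a value-level cast that follows this rule. The `reinterpret` helper is hypothetical; it only adds layout checks on top of unsafeBitCast and cannot verify that the reference types inside the values are unchanged, so that part remains the caller's responsibility.

```swift
final class C { let x: String; init(x: String) { self.x = x } }
struct M1 { var c: C }
struct M2 { var c: C }

// Hypothetical helper: traps if the two types disagree on size or alignment.
func reinterpret<T, U>(_ value: T, as type: U.Type) -> U {
    precondition(MemoryLayout<T>.size == MemoryLayout<U>.size)
    precondition(MemoryLayout<T>.alignment == MemoryLayout<U>.alignment)
    return unsafeBitCast(value, to: U.self)
}

let m1 = M1(c: C(x: "42"))
// Both wrappers hold a C, so the class reference keeps its type across the cast.
let m2 = reinterpret(m1, as: M2.self)
let sameObject = m2.c === m1.c
```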

You can't bitcast arrays because you're not really supposed to know their layout and you would be changing the type of the pointer they hold internally.

If you want a copy-on-write array with Element=M2 you can make an M2Array wrapper on top of Array<M1> whose subscript does the M1 -> M2 bitcast for you.
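A minimal sketch of such a wrapper, assuming M1 and M2 each wrap a single C. Only the name M2Array comes from the suggestion above; the rest is illustrative. Copy-on-write behavior is inherited from the underlying Array<M1>.

```swift
final class C { let x: String; init(x: String) { self.x = x } }
struct M1 { var c: C }
struct M2 { var c: C }

struct M2Array: RandomAccessCollection, MutableCollection {
    var base: [M1]
    init(_ base: [M1]) { self.base = base }

    var startIndex: Int { base.startIndex }
    var endIndex: Int { base.endIndex }

    subscript(position: Int) -> M2 {
        // Per-element cast; the array storage itself is never reinterpreted.
        get { unsafeBitCast(base[position], to: M2.self) }
        set { base[position] = unsafeBitCast(newValue, to: M1.self) }
    }
}

var view = M2Array([M1(c: C(x: "1")), M1(c: C(x: "2"))])
view[0] = M2(c: C(x: "3"))
let xs = view.map { $0.c.x }
```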

If you just want to view the storage as a Span of M2 within some local scope, then you can use Span's ability to simultaneously view the same buffer as different element types:

Span<M2>(_bytes: m1_array.span.bytes)

If you want complete control of the memory buffer, you can build a wrapper over UnsafeRawPointer and pick the element type via load(as:), which effectively does the bitcast for you.
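A sketch of what a raw-buffer wrapper could look like, using UnsafeRawPointer's `load(fromByteOffset:as:)` to choose the element type at the point of access. The RawBuffer class and its method names are hypothetical, and the sketch assumes the buffer is never empty and that every type loaded from it is layout-compatible with M1.

```swift
final class C { let x: String; init(x: String) { self.x = x } }
struct M1 { var c: C }
struct M2 { var c: C }

// Hypothetical owner of a raw allocation whose slots are initialized as M1.
final class RawBuffer {
    private let base: UnsafeMutableRawPointer
    let count: Int

    init(_ elements: [M1]) {
        let stride = MemoryLayout<M1>.stride
        let memory = UnsafeMutableRawPointer.allocate(
            byteCount: stride * elements.count,
            alignment: MemoryLayout<M1>.alignment)
        for (i, element) in elements.enumerated() {
            (memory + i * stride).initializeMemory(as: M1.self, repeating: element, count: 1)
        }
        base = memory
        count = elements.count
    }

    deinit {
        // Release the stored references before freeing the allocation.
        base.assumingMemoryBound(to: M1.self).deinitialize(count: count)
        base.deallocate()
    }

    // Copies the element out as T; T must be layout-compatible with M1.
    func load<T>(_ index: Int, as type: T.Type) -> T {
        precondition(index >= 0 && index < count)
        precondition(MemoryLayout<T>.stride == MemoryLayout<M1>.stride)
        return base.load(fromByteOffset: index * MemoryLayout<M1>.stride, as: T.self)
    }
}

let buffer = RawBuffer([M1(c: C(x: "a")), M1(c: C(x: "b"))])
let secondX = buffer.load(1, as: M2.self).c.x
```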

Thanks, that confirms that what I wanted to do was not a good idea.

Being able to seamlessly cast arrays like that was most appealing because we could keep the same API surface with zero copies. I'm going to try some alternate implementation strategies (that don't do naughty bit-casting) to see if I can avoid changing the client-facing API.

@allevato I may be misunderstanding your requirements, but if all your wrapper types use the same storage, why not declare an internal protocol for exposing the storage? That way you can write generic code that will work with any wrapper type without having to bit-cast anything.

internal protocol HavingC { var c: C { get } }
public struct M1: HavingC { internal var c: C }
public struct M2: HavingC { internal var c: C }
// ...

You still get the benefit of no extra overhead: as long as every conformance of HavingC has only one stored property, each value fits inside the 3-pointer inline storage of an existential, so referring to any HavingC will be more-or-less what you want.
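As a sketch of the generic code this enables, here is the summing function from earlier in the thread written once against the protocol. The function name `totalLength` is made up for illustration, and access modifiers are omitted to keep the example short.

```swift
final class C { var s: String = "foo" }

protocol HavingC { var c: C { get } }
struct M1: HavingC { var c: C }
struct M2: HavingC { var c: C }

// One implementation serves every wrapper type; no casting involved.
func totalLength<T: HavingC>(_ items: [T]) -> Int {
    items.reduce(0) { $0 + $1.c.s.count }
}

let fromM1 = totalLength([M1(c: C()), M1(c: C())])
let fromM2 = totalLength([M2(c: C())])
```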

That doesn't address wanting to interchange between [M1] and [C] at zero cost, without copying the array.

Does it have to be an Array? You could define a concrete type conforming to Collection which does the wrapping and unwrapping as necessary.
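For the read-only case, the standard library's lazy map already behaves like such a collection, so a sketch does not even need a custom type. It is a stand-in for the concrete Collection described above, not a full replacement: the result is a LazyMapCollection rather than an Array, and it does not support mutation through the wrapper.

```swift
final class C { let x: String; init(x: String) { self.x = x } }
struct M1 { var c: C }

let storage: [C] = [C(x: "1"), C(x: "2")]

// O(1) to create; each element is wrapped only when it is accessed.
let wrapped = storage.lazy.map { M1(c: $0) }
let lastX = wrapped[wrapped.count - 1].c.x
```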

Yeah, that's exactly what I would do if I were willing to source-break the existing APIs, which are already using arrays, or if I were designing this from scratch.

That's still an option in the future (after giving users a migration path), but for now I was trying to see what I could get away with while sticking to regular Swift arrays. And switching to a custom collection type has its own drawbacks, mainly that the type becomes viral throughout the rest of the client code. They're passing these arrays around already, so they'll end up taking the same hit anyway to do the conversion if they don't want to migrate all their code where the values propagate.

This seems to be working fine... Could this fail in practice?

final class C { let x = "42" }
struct M1 { var c: C }
struct M2 { var c: C }
var arrayOfM1: [M1] = [M1(c: C()), M1(c: C())]
let arrayOfM2 = unsafeBitCast(arrayOfM1, to: [M2].self)
for item in arrayOfM2 { print(item.c.x) }
let arrayOfC = unsafeBitCast(arrayOfM1, to: [C].self)
for item in arrayOfC { print(item.x) }