SE-0333: Expand usability of withMemoryRebound

ben-cohen · December 1, 2021, 5:04am

Hello, Swift community.

The review of SE-0333: Expand usability of withMemoryRebound begins now and runs through December 9, 2021.

Note this proposal is running concurrently with SE-0334 which also relates to unsafe pointer usability.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to the review manager. If you do email me directly, please put "SE-0333" somewhere in the subject line.

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift. When writing your review, here are some questions you might want to answer in your review:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

More information about the Swift evolution process is available at:

https://github.com/apple/swift-evolution/blob/master/process.md

As always, thank you for contributing to Swift.

Ben Cohen
Review Manager

Michael_Ilseman · December 1, 2021, 3:24pm

First off, big +1 in general and I trust that the authors have thought through this domain thoroughly. I do think it's useful to clarify some terminology and concepts though.

As before, T's memory layout must be compatible with that of Pointee

I think it would greatly help this review (as well as this murky part of the Swift language) if you could define what "layout compatible" means and why it's interesting and important. "Compatible" seems to depend on what you'd use it for, while "layout equivalence" would imply "compatible" and more.

I get scrapping type equality in favor of layout equivalence, such as for CGFloat -> Double on 64-bit. But, do we also need them to match in triviality?

I get going to/from a pointer to a C struct and a pointer to the first stored member of that struct. But, when the first member has a smaller alignment than the struct, do we need to track, reason about, and enforce a higher alignment?

lukasa · December 1, 2021, 4:46pm

I'm delighted with this proposal, particularly for the addition of withMemoryRebound(to:) for raw pointers. The absence of this functionality has caused a lot of awkward workarounds to be promulgated, and it'll be good to finally have a "proper" spelling for it.
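A minimal sketch of the raw-pointer spelling mentioned above, assuming a Swift 5.7 or later toolchain (where the SE-0333 additions are available). The function name `sumThroughRawRebound` is hypothetical and only serves the illustration.

```swift
// Hypothetical helper: sums four Int32 values through a UInt32 view of raw memory.
func sumThroughRawRebound() -> UInt32 {
    let count = 4
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: count * MemoryLayout<Int32>.stride,
        alignment: MemoryLayout<Int32>.alignment)
    defer { raw.deallocate() }

    // Bind the memory to Int32 and fill it with the value 7.
    raw.initializeMemory(as: Int32.self, repeating: 7, count: count)

    // Temporarily view the same memory as UInt32, which has the same size
    // and alignment as Int32. The rebinding lasts only for the closure.
    return raw.withMemoryRebound(to: UInt32.self, capacity: count) { words in
        var total: UInt32 = 0
        for i in 0..<count { total += words[i] }
        return total
    }
}
```

Both types are trivial here, so no deinitialization is needed before deallocating.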

glessard · December 5, 2021, 1:01am

Thanks for the question, @Michael_Ilseman. I came up with a draft to answer these questions; I would appreciate feedback:

The concept of layout compatibility is not precisely defined, unfortunately. Here are some rules:

Two types are mutually layout-compatible if their in-memory representation has the same size and alignment.
Some examples of mutually layout-compatible types follow:

identical types
a typealias with the type for which it is an alias
floating point types of the same size
class types and AnyObject existentials
pointer types, such as UnsafePointer and OpaquePointer
frozen structs with a single stored property with their stored property type
frozen enums with a single case with the type of the associated value of their single case

Aggregate types (tuples, array storage, and structs) are mutually layout-compatible if they have the same number of mutually layout-compatible elements. This means that contiguous array storage and homogeneous tuples are mutually layout-compatible as long as they have the same number of elements (and their elements are themselves mutually layout-compatible.)

Some types can be layout compatible but not mutually:

aggregates are layout compatible with larger aggregates of the same kind when their common elements are mutually layout compatible.
an enum associated value's type is layout compatible with its enum type if the enum has only one case with an associated value (and zero or more no-payload cases).

For non-homogeneous tuples and structs whose stored properties are of multiple types, layout compatibility requires the constitutive elements to be mutually layout compatible, and to be ordered in memory with the same stride and alignment.

In the general case, the runtime performs housekeeping tasks when initializing or deinitializing a value, as well as when updating a value. When a value is updated, reference counts may change, leading to possible deinitialization elsewhere. Initialization and deinitialization mean that type-specific code is executed, and therefore memory layout is not enough for compatibility in the general case. In other words, in the most general case, type identity would still be required.

Types variously referred to as "POD" (plain old data) or "trivial" in the documentation do not trigger actions by the runtime after an initialization, a deinitialization or an update. For such types, layout compatibility is sufficient for correct temporary rebinding.
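As an illustration of the trivial case, here is a small sketch that temporarily views `Float` storage as `UInt32` to read the bit patterns. Both types are trivial and share size and alignment. The function name `bitPatterns(of:)` is an assumption made for this example.

```swift
// Hypothetical helper: returns the IEEE 754 bit pattern of each Float.
func bitPatterns(of values: [Float]) -> [UInt32] {
    values.withUnsafeBufferPointer { floats in
        // Float and UInt32 are both trivial, 4 bytes wide and 4-byte aligned,
        // so a temporary rebinding runs no type-specific runtime code.
        floats.withMemoryRebound(to: UInt32.self) { bits in
            Array(bits)
        }
    }
}

let patterns = bitPatterns(of: [1.0, -2.0])
```

Every `UInt32` bit pattern is a valid value, so the only obligation left to the programmer here is matching size and alignment.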

It looks like this only made it to the doc-comments, but:

  ///   If `T` and `Pointee` have different alignments, this pointer
  ///   must be aligned with the larger of the two alignments.

(each version has an appropriate variation on that theme.)
That should be included in the "proposed solution" section. Thanks.

Andrew_Trick · December 6, 2021, 7:41am

Right, while this proposal won't deliver a complete formal spec, we can confidently expect any future spec to cover some obvious cases where the implementation already assumes layout compatibility and would have no reason to change.

Two types are mutually layout-compatible if their in-memory representation has the same size and alignment.

For the purpose of this API, it may be more clear to talk about "layout equivalence" rather than "[mutual] layout compatibility". When we rebind memory, we require one type's stride to be a whole multiple of the other. We can't rebind part of a value. So we can simply state that the contiguous sequence of the smaller type must be layout equivalent with the members of the larger type.

All "mutually layout-compatible" rules here could just be called "layout equivalent".

Some examples of mutually layout-compatible types follow:

identical types

a typealias with the type for which it is an alias

floating point types of the same size

class types and AnyObject existentials

pointer types, such as UnsafePointer and OpaquePointer

frozen structs with a single stored property with their stored property type

frozen enums with a single case with the type of the associated value of their single case

Aggregate types (tuples, array storage, and structs) are mutually layout-compatible if they have the same number of mutually layout-compatible elements. This means that contiguous array storage and homogeneous tuples are mutually layout-compatible as long as they have the same number of elements (and their elements are themselves mutually layout-compatible.)

Some types can be layout compatible but not mutually:

aggregates are layout compatible with larger aggregates of the same kind when their common elements are mutually layout compatible.

I don't think this rule is useful for rebinding memory in its current form. We would need more precise description of structure layout to make it useful. I'm not comfortable making those kinds of guarantees, especially for nonfrozen structs, without more review.

an enum associated value's type is layout compatible with its enum type if the enum has only one case with an associated value (and zero or more no-payload cases).

Similarly, this rule is not useful for rebinding non-frozen enums, because the enum may be larger than its payload.

For non-homogeneous tuples and structs whose stored properties are of multiple types, layout compatibility requires the constitutive elements to be mutually layout compatible, and to be ordered in memory with the same stride and alignment.

This wording needs work. It's trying to say that 'S3' and 'S2_1' below are layout equivalent:

struct S1 {
  var i: Int32
}

struct S2 {
  var i: Int32
  var j: Int32
}

struct S3 {
  var i: Int32
  var j: Int32
  var k: Int32
}

struct S2_1 {
  var s2: S2
  var i: Int32
}

I personally think that's a sensible rule, but we should avoid documenting it until it's been reviewed.
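For readers who want to see what the current compiler does with these two shapes, the following sketch compares their `MemoryLayout` values. It only observes present-day behavior and is not a guarantee of any kind; the helper name `describeLayout` is hypothetical.

```swift
struct S2 {
    var i: Int32
    var j: Int32
}

struct S3 {
    var i: Int32
    var j: Int32
    var k: Int32
}

struct S2_1 {
    var s2: S2
    var i: Int32
}

// Hypothetical helper that formats the three layout numbers of a type.
func describeLayout<T>(_ type: T.Type) -> String {
    "size \(MemoryLayout<T>.size), stride \(MemoryLayout<T>.stride), "
        + "alignment \(MemoryLayout<T>.alignment)"
}

print("S3:  ", describeLayout(S3.self))
print("S2_1:", describeLayout(S2_1.self))

// The third Int32 sits at the same byte offset in both types.
let sameShape =
    MemoryLayout<S3>.size == MemoryLayout<S2_1>.size
    && MemoryLayout<S3>.stride == MemoryLayout<S2_1>.stride
    && MemoryLayout<S3>.alignment == MemoryLayout<S2_1>.alignment
    && MemoryLayout<S3>.offset(of: \S3.k) == MemoryLayout<S2_1>.offset(of: \S2_1.i)
```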

Michael_Ilseman:

I get scrapping type equality in favor of layout equivalence, such as for CGFloat -> Double on 64-bit. But, do we also need them to match in triviality?

Triviality is critical. As Guillaume explains below, the language implementation needs to know how to copy or deinitialize the in-memory value. For nontrivial values, object references need to be in the same positions, and should at least have a superclass/subclass relationship. So, rebinding memory is not meant to be a mechanism to extract the bitpattern of a reference or vice-versa.

For trivial types, the programmer simply needs to guarantee a valid bit pattern for the new type when those bits are evaluated. The language implementation doesn't care.

In the general case, the runtime performs housekeeping tasks when initializing or deinitializing a value, as well as when updating a value. When a value is updated, reference counts may change, leading to possible deinitialization elsewhere. Initialization and deinitialization mean that type-specific code is executed, and therefore memory layout is not enough for compatibility in the general case. In other words, in the most general case, type identity would still be required.

Types variously referred to as "POD" (plain old data) or "trivial" in the documentation do not trigger actions by the runtime after an initialization, a deinitialization or an update. For such types, layout compatibility is sufficient for correct temporary rebinding.

Michael_Ilseman:

I get going to/from a pointer to a C struct and a pointer to the first stored member of that struct. But, when the first member has a smaller alignment than the struct, do we need to track, reason about, and enforce a higher alignment?

It looks like this only made it to the doc-comments, but:
  ///   If `T` and `Pointee` have different alignments, this pointer
  ///   must be aligned with the larger of the two alignments.
(each version has an appropriate variation on that theme.)
That should be included in the "proposed solution" section. Thanks.

Yep. Typed pointers need to be naturally aligned when they are accessed, period. When you rebind memory, it's up to you to ensure the resulting pointer alignment.

Of course, you can always rebind to a type with lower alignment without worry. Rebinding to a higher alignment requires an alignment check.
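The alignment check can be sketched as follows, assuming Swift 5.7 or later. The helpers `isAligned(_:for:)`, `firstWord(of:byteCount:)`, and `alignmentDemo()` are hypothetical names introduced for this example.

```swift
// Hypothetical helper: true when `pointer` satisfies T's alignment.
func isAligned<T>(_ pointer: UnsafeRawPointer, for type: T.Type) -> Bool {
    UInt(bitPattern: pointer) % UInt(MemoryLayout<T>.alignment) == 0
}

// Hypothetical helper: reads the first four bytes as a UInt32, but only when
// the address is suitably aligned for the higher-alignment type.
func firstWord(of bytes: UnsafePointer<UInt8>, byteCount: Int) -> UInt32? {
    guard byteCount >= MemoryLayout<UInt32>.size,
          isAligned(UnsafeRawPointer(bytes), for: UInt32.self) else { return nil }
    return bytes.withMemoryRebound(to: UInt32.self, capacity: 1) { $0.pointee }
}

func alignmentDemo() -> (aligned: UInt32?, misaligned: UInt32?) {
    let storage = UnsafeMutableRawPointer.allocate(
        byteCount: 8, alignment: MemoryLayout<UInt32>.alignment)
    defer { storage.deallocate() }
    let bytes = storage.initializeMemory(as: UInt8.self, repeating: 0xFF, count: 8)

    // The base address is 4-byte aligned; one byte further in, it is not.
    let aligned = firstWord(of: UnsafePointer(bytes), byteCount: 8)
    let misaligned = firstWord(of: UnsafePointer(bytes + 1), byteCount: 7)
    return (aligned, misaligned)
}
```

Going the other way, from `UInt32` to `UInt8`, needs no check because every address satisfies an alignment of 1.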

Michael_Ilseman · December 6, 2021, 7:26pm

I agree, especially if "layout compatible" isn't an existing term of art in Swift. Perhaps compatible is an injection and equivalence is a bijection. "Layout" is well-established, at least if you consider the ABI stability manifesto formal and authoritative (I couldn't find a more recent formal doc). It would definitely be nice (but not required for this proposal) to formally pull some of that out and update it.

@glessard 's posted rules make sense from the perspective of implementation or an operational semantics, where we start with forbidding everything except that which is relaxed in a list of exceptions, which can be extended over time.

For the purpose of establishing a mental model of Swift's notions of layout equivalence and compatibility, I think it's clearer to build up what is equivalent and/or compatible. My working understanding is that two types are layout equivalent if they:

Are of the same "kind" (trivial vs a moveable reference in the same type lineage vs a non-movable entity, etc)
Have the same alignment and size (and thus stride)

Structs containing only a single member are layout equivalent to that member. Thus, if that single member is a tuple, then the struct will be layout equivalent to that tuple (and thus C's layout) such that a point struct holding a tuple is layout equivalent with the raw tuple in memory. IIRC, a pointer to a tuple is layout compatible with a pointer to the first member of the tuple, but not equivalent if the tuple is not single-element.

Sometimes you care about binary compatibility and sometimes you're reasoning about C's typed pointer aliasing rules. It's unfortunate that UnsafePointer has to serve both duties. Swift's binary compatibility requires tracking non-triviality while C's (unhelpfully) constrains even trivial types.

In Swift's model, IIUC, you can view UBP<(Int, Float)> as UBP<MyStruct> if MyStruct is frozen and whose sole member is a tuple of (Int, Float). But this is not necessarily the case in C's model. Does this proposal attempt to square this circle?
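A sketch of the Swift-side view described here. The type `Sample` and the function `sumOfInts(in:)` are hypothetical; the struct is declared in the same module as the code that rebinds it, which this example assumes is enough for the single-member rule to hold in practice.

```swift
// Hypothetical single-member struct wrapping a tuple.
struct Sample {
    var storage: (Int, Float)
}

// Views a buffer of Sample as a buffer of its sole stored property's type.
func sumOfInts(in samples: [Sample]) -> Int {
    samples.withUnsafeBufferPointer { buffer in
        buffer.withMemoryRebound(to: (Int, Float).self) { tuples in
            tuples.reduce(0) { $0 + $1.0 }
        }
    }
}

let total = sumOfInts(in: [Sample(storage: (1, 0.5)), Sample(storage: (2, 1.5))])
```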

I think talking about homogeneity here adds undue complexity without actually defining or clarifying much. UBP<(Int, Float)> is homogeneous as (Int, Float) strided in memory, even if the tuple isn't. Beyond that it doesn't matter what's inside the tuple. Homogeneity would be for expressing that UP<(Int, Int, Int)> can be viewed as UBP<Int> with count of 3.

Yes, if they were tuples instead of structs (or frozen single-member structs that stored the tuples).

My goal is to clarify that "always". It's always ok if they match in "kind" above and you're not using what is in effect a C pointer type in Swift to model C's pointer rules.

Andrew_Trick · December 7, 2021, 7:47pm

Michael_Ilseman:

I agree, especially if "layout compatible" isn't an existing term of art in Swift. Perhaps compatible is an injection and equivalence is a bijection. "Layout" is well-established, at least if you consider the ABI stability manifesto formal and authoritative (I couldn't find a more recent formal doc). It would definitely be nice (but not required for this proposal) to formally pull some of that out and update it.

@glessard 's posted rules make sense from the perspective of implementation or an operational semantics, where we start with forbidding everything except that which is relaxed in a list of exceptions, which can be extended over time.

For the purpose of establishing a mental model of Swift's notions of layout equivalence and compatibility, I think it's clearer to build up what is equivalent and/or compatible. My working understanding is that two types are layout equivalent if they:

Are of the same "kind" (trivial vs a moveable reference in the same type lineage vs a non-movable entity, etc)

Have the same alignment and size (and thus stride)

That is certainly a better mental model.

But this proposal is not saying that two trivial multi-member structs are internally layout equivalent just because they have the same alignment and size. That's a completely reasonable interpretation of @frozen structs if you ask me, but we're not writing the ABI here.

Yes. I agree 100% with this wording. Although I don't think we should mention "layout compatible" at all in this proposal.

To provide some background... The "layout compatible" language cited earlier in this thread was an attempt (long ago) to pin down how the compiler behaves with typed pointers. It answers whether they are allowed to alias and whether they can be type cast using an in-place pointer cast.

There are/were places in the standard library that expect to do an in-place cast from a larger to a smaller type. That's why those layout compatible rules aren't all symmetric.

I think that's a reasonable interpretation of @frozen, but that's only my interpretation. We're not clarifying the ABI in this proposal. (I do think you're doing a good job of that though and would like to see that in a separate proposal). I can imagine saying: here are the ABI layout rules, and any Swift type whose address might be taken needs to follow those rules, not just the public types that participate in the ABI. That would put an additional barrier up for us if we decide to do auto-layout optimization.

I don't think this contradicts anything you've said, but to frame the discussion for everyone else:

The withMemoryRebound API docs aren't the place to write ABI rules. As an unsafe API, it's sufficient to say that the types need to be layout equivalent. We should specifically call out the fact that references need to be in the same position, if not in the API doc, then at least in the proposal text.

In the absence of ABI rules, we can add helpful guidance in the text of this proposal, calling out some common cases that can be relied on in practice (which are mostly obvious anyway). The hallmarks of these cases are

the language implementation already relies on them
there would be no conceivable reason for layout optimization to break them

Yes, to the next order, homogeneity says that UBP<((Int, Float), (Int, Float))> can be viewed as UBP<(Int, Float)> with a count of 2.

Homogeneity is relevant because we need arrays to be layout compatible with homogeneous tuples and structs, regardless whether that is part of the ABI spec.
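The homogeneous-tuple case can be sketched like this, assuming Swift 5.7 or later. The function name `pairs(in:)` is hypothetical.

```swift
// Hypothetical helper: views a homogeneous tuple of two (Int, Float) pairs
// as two contiguous (Int, Float) elements.
func pairs(in nested: ((Int, Float), (Int, Float))) -> [(Int, Float)] {
    withUnsafePointer(to: nested) { pointer in
        // The capacity is expressed in units of the destination type.
        pointer.withMemoryRebound(to: (Int, Float).self, capacity: 2) { elements in
            Array(UnsafeBufferPointer(start: elements, count: 2))
        }
    }
}

let unpacked = pairs(in: ((1, 0.5), (2, 1.5)))
```

The outer tuple's stride is a whole multiple of the element's stride, which is the condition under which the rebinding covers only complete values.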

I think what you're asking for is completely reasonable, and would obviate the need for some special-case rules, but draws on the interpretation of ABI.

Lantua · December 22, 2021, 7:40pm

Sorry for the late comment. I think this is an excellent little addition to the pointer APIs.

That said, I have some qualms about how we use the term layout compatible with little to no explanation. Through no fault of its own, I get that it's because layout decision is an ABI commitment, and we don't want to do that here. However, that puts us in a position where the proposal sits on top of no (formal) foundation. The only two ways anyone would know what layout compatibility entails are that they are in the know or happened to be there when it was brought up in this forum (as an attempt to formalize it).

I want to trust that the authors know what they are doing, but trust is a currency I don't want to use for any critical review; it makes (that part of) the proposal unfalsifiable, which is less than ideal. Furthermore, it's not particularly reassuring to see that we're still debating whether we should use layout compatible or layout equivalent, both of which seem to be slightly different.

I wish we had formal (albeit minimal) guarantees I can point to that would expand over time, but I'll have to trust them on this.

On the layout compatibility, would it be correct to say that the definition of layout compatible in this proposal is symmetric; if T is compatible with Pointee, then so is Pointee with T? I got this impression because I see withMemoryRebound as simply binding Pointee -> T, then back T -> Pointee. So either the layout compatibility is symmetric, or we somehow have a notion of forward and backward bindings, and I don't know which one we're going for.

glessard · December 22, 2021, 8:19pm

We have discussed a change where we refer to "layout equivalence" instead, including a definition of that term. It is much better defined than the term we used in the original version.

Quoting from the diff:

In order to safely use withMemoryRebound, the current rule is that the destination type, T, must be layout equivalent with Pointee. To this we add that, as an alternative, T can be a homogeneous aggregate of Pointee, or Pointee can be a homogeneous aggregate of T.

Two types A and B are layout equivalent when they are, for example:

identical types;
one is a typealias for the other;
trivial scalar types with the same size and alignment, including floating-point and integer types;
one is a class type, and the other is one of its superclass types (including AnyObject);
pointer types, such as UnsafePointer and OpaquePointer;
one is a frozen struct with a single stored property, the other is the type of its stored property;

Homogeneous aggregate types (tuples, array storage, and frozen structs) are layout equivalent if they have the same number of layout-equivalent elements.
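As one illustration of the class rule in that list, the sketch below views storage of a subclass as storage of its superclass. The types `Animal` and `Dog` and the function `names(of:)` are hypothetical.

```swift
class Animal {
    let name: String
    init(name: String) { self.name = name }
}

final class Dog: Animal {}

// Hypothetical helper: reads Dog references through an Animal-typed view.
func names(of dogs: [Dog]) -> [String] {
    dogs.withUnsafeBufferPointer { buffer in
        // Each element is a single object reference, and Animal is a
        // superclass of Dog, so the references occupy the same positions.
        buffer.withMemoryRebound(to: Animal.self) { animals in
            animals.map { $0.name }
        }
    }
}

let dogNames = names(of: [Dog(name: "Rex"), Dog(name: "Fido")])
```

This example only reads through the rebound view. Storing an `Animal` that is not a `Dog` into the buffer would leave invalid values behind once the memory is viewed as `Dog` again.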

Lantua · December 23, 2021, 12:36am

Hmm, I could use that, thanks.

This definition of layout equivalent seems to be symmetric, which is nice. One minor thing though. The homogeneous aggregate part doesn't seem transitive. There are cases like (Int, Int) x 3 <=> Int x 6 and Int x 6 <=> (Int, Int, Int) x 2. Is that intentional? Not that I have a particular use case for it, but it seems like one could rebind (Int, Int, Int) x 2 <=> Int x 6 <=> (Int, Int) x 3 provided proper alignment, which is doable with two withMemoryRebound but not with one.

jrose · December 23, 2021, 12:54am

Some nitpicks on this definition:

The proposal doesn't define "trivial" or "scalar", and I don't think there is such a concept as "scalar" in the user model for Swift. There are some primitive LLVM types, but they're all exposed as structs named Int or Float or whatever.

There are also additional questions around SIMD vectors (an LLVM primitive) vs. an integer of the same total size vs. a tuple of the same elements. But I guess these are not necessarily layout equivalent (because they may all have different alignments).

I'd separate AnyObject out. There's also a question of whether a class and an existential have the same representation (no, unless it's an ObjC protocol or composition of ObjC protocols), and calling AnyObject a "superclass type" leaves room for questioning protocols.

As a bonus, the ObjC protocol thing should be guaranteed too.

I think this should be guaranteed for all structs, not just frozen structs. On the one hand, structs in non-library-evolution modules are not "frozen" and do not have a guaranteed layout, except in this one case. On the other, a non-frozen struct can still be safely reinterpreted within its own module at run time, and unsafely-but-correctly reinterpreted by a client trusting that the layout hasn't changed (perhaps because it's a second module distributed with the first).

glessard · December 23, 2021, 1:10am

We do have the requirement that the base address for a withMemoryRebound call must satisfy the stricter of the alignments of T and Pointee. SIMD types can be used as long as the alignment is matched. This allows you to select the portion of a buffer that is SIMD-compatible to perform SIMD computations, and to round out the rest in a non-SIMD context. One interesting example is if one had a buffer of 3D coordinates (3 Float values) to be scaled. One can perform a withMemoryRebound down to Float, adjust the base address for SIMD alignment, and then use withMemoryRebound to get the SIMD vector type.
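The coordinate-scaling idea could look roughly like the sketch below, assuming Swift 5.7 or later. The function name `scale(_:by:)` and the choice of `SIMD4<Float>` as the vector type are assumptions made for the example, not something the proposal prescribes.

```swift
// Hypothetical helper: scales every component of a buffer of 3D coordinates.
func scale(_ coordinates: inout [(Float, Float, Float)], by factor: Float) {
    let floatCount = coordinates.count * 3
    coordinates.withUnsafeMutableBufferPointer { buffer in
        guard let base = buffer.baseAddress else { return }
        // Step 1: view the homogeneous tuples as their Float elements.
        base.withMemoryRebound(to: Float.self, capacity: floatCount) { floats in
            var i = 0
            let mask = UInt(MemoryLayout<SIMD4<Float>>.alignment - 1)

            // Step 2: scalar loop until the address is SIMD-aligned.
            while i < floatCount, UInt(bitPattern: floats + i) & mask != 0 {
                floats[i] *= factor
                i += 1
            }

            // Step 3: rebind the aligned middle portion to the vector type.
            let vectorCount = (floatCount - i) / 4
            if vectorCount > 0 {
                (floats + i).withMemoryRebound(
                    to: SIMD4<Float>.self, capacity: vectorCount
                ) { vectors in
                    for v in 0..<vectorCount { vectors[v] *= factor }
                }
                i += vectorCount * 4
            }

            // Step 4: scalar loop for whatever does not fill a whole vector.
            while i < floatCount {
                floats[i] *= factor
                i += 1
            }
        }
    }
}

var points: [(Float, Float, Float)] = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
scale(&points, by: 2)
```

The element count is computed before entering the closure so that the array is not accessed while its storage is exclusively borrowed.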

"trivial" is defined outside of this quote, but "scalar" is not defined anywhere.

We tried to be prudent with these definitions; this one does seem excessive.

I'm not sure what you mean about AnyObject. Do you mean that what I wrote is outright wrong or that it is expressed clumsily? (I agree with the second, rereading one week removed.)

jrose · December 23, 2021, 1:38am

Outright wrong, but only in a "technically correct" sense: any class instance reference is layout-equivalent to AnyObject, but a "superclass" is always a class and NSObject has no superclass.

Another case that probably deserves special-casing: Optional of a pointer type or class reference is layout-equivalent with another Optional pointer or class reference. In fact, if layout-equivalence includes extra inhabitants, Optional<A> and Optional<B> are layout-equivalent if A and B are layout-equivalent. (But if it doesn't that's not true in general; consider Optional<Int32> vs. Optional<NonZeroInt32>.)
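The size side of this observation can be checked with `MemoryLayout`. This is a sketch of what the current implementation reports, with a hypothetical class `Node`; it says nothing about whether layout equivalence should formally include extra inhabitants.

```swift
final class Node {}

// Optional pointers and optional class references reuse an invalid bit
// pattern (an extra inhabitant) for nil, so they are no larger than the
// wrapped type.
let pointerSize = MemoryLayout<UnsafePointer<Int>>.size
let optionalPointerSize = MemoryLayout<UnsafePointer<Int>?>.size
let referenceSize = MemoryLayout<Node>.size
let optionalReferenceSize = MemoryLayout<Node?>.size

// Int32 uses every bit pattern, so its Optional needs an additional byte.
let int32Size = MemoryLayout<Int32>.size
let optionalInt32Size = MemoryLayout<Int32?>.size

print(pointerSize, optionalPointerSize)
print(referenceSize, optionalReferenceSize)
print(int32Size, optionalInt32Size)
```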

glessard · December 23, 2021, 2:24am

How about this: one is a class type, and the other is one of its superclass types, or AnyObject.
I think the error was putting it between parentheses, as if it refined "superclass".

Andrew_Trick · December 24, 2021, 11:45pm

The rules apply transitively. We could add a statement to that effect if needed.

"Trivial" is defined in the proposal as "not requiring initialization or deinitialization".

Since we can't talk about builtin types, instead of "scalar", we can refer to integer types, floating point types, and pointer types.

We may have also wanted Unmanaged to be considered trivial. But, for layout rules, I think it's reasonably covered by the single member struct and AnyObject existential rules.

Yes, agreed.

jrose · December 25, 2021, 1:27am

Oops I just thought about it again and it's equivalent to "all class types" if layout equivalence is transitive. Additionally it answers the question of whether layout-equivalence includes extra inhabitants with a definitive no because ObjC class references can be tagged pointers (and therefore so can AnyObject) and Swift ones cannot.

ben-cohen · January 18, 2022, 8:56pm

Review Conclusion

The proposal has been accepted.