SE-0334: Pointer Usability Improvements

Ben_Cohen · December 1, 2021, 5:11am

Hello, Swift community.

The review of SE-0334: Pointer API Usability Improvements begins now and runs through December 9, 2021.

Note this proposal is running concurrently with SE-0333 which also relates to unsafe pointer usability.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to the review manager. If you do email me directly, please put "SE-0334" somewhere in the subject line.

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift. When writing your review, here are some questions you might want to answer in your review:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

More information about the Swift evolution process is available at:

https://github.com/apple/swift-evolution/blob/master/process.md

As always, thank you for contributing to Swift.

Ben Cohen
Review Manager

lukasa · December 1, 2021, 5:02pm

I'm a strong +1 on this proposal. It's reasonable and proportionate, and greatly improves the ergonomics of the pointer types.

Michael_Ilseman · December 1, 2021, 10:12pm

+1 this is great!

The calculation of tOffset above is overly complex.

I agree, I've written this code several times and I still have to do a ~~double~~ triple take to see what it's doing! This usually means I write a series of assert on the resulting addresses just to make sure.

    rawValue = .allocate(byteCount: MemoryLayout<NodeStorage>.size,
                         alignment: MemoryLayout<NodeStorage>.alignment)

Thoughts on an allocate<T>(_ type: T.Type = T.self) helper? (assuming I remembered that whole type dance correctly...)

ksluder · December 2, 2021, 12:12am

+1.

Absolutely. Manually allocating instances of Swift data types is difficult, and correctly computing alignment is a big contributor to that difficultly.

In fact, the example of how to do so contains a potential error: since Swift types can be zero-sized, zero is a conceivably valid value for alignment. Thus this code could trap with an integer underflow:

let tMask   = MemoryLayout<T>.alignment - 1

Yes. Improving pointer ergonomics is a high priority for near-term Swift development.

I do lots of manual allocation in C, which lacks this functionality. It would be most welcome.

A quick reading, plus conversation with the proposal authors through the pitch phase.

By the way, during that quick reading, I noticed this discrepancy between aligned(for:) and alignedUp(for:) still has not been corrected:

extension UnsafeRawPointer {
  public func alignedUp<T>(for: T.type) -> Self
}
This function will round the current pointer up to the next address properly aligned to access an instance of T . When applied to a self already aligned for T , UnsafeRawPointer.aligned(for:) will return self .

Chris_Lattner3 · December 2, 2021, 1:56am

Looks nice to me, a couple of random questions:

Do the mixed comparison operators affect compile time of the constraint solver?

Also, is overload resolution in any surprising ways? Has anyone checked the source compatibility suite?

-Chris

woolsweater · December 2, 2021, 5:19am

Looks excellent; helpful and well-motivated additions to the pointer APIs.

glessard · December 2, 2021, 7:03pm

I checked against the compiler performance benchmark ([DNM] comparison operators for heterogeneous pointer types by glessard · Pull Request #39635 · apple/swift · GitHub), which showed no performance regression. Is there a different measurement I should check?

I have launched a source compatibility run in the same PR.

glessard · December 2, 2021, 7:09pm

I don't think 0 is a valid value for alignment, as it is not a power of 2.
In practice the runtime returns alignment of 1 for Void and Never (both size 0).

Michael_Ilseman · December 2, 2021, 7:32pm

More specifically, types of size 0 have a stride of 1.

jrose · December 6, 2021, 2:07am

I’m in favor of most of this, but I don’t actually think the mixed-type comparison operators are a good idea. If I compare two pointers with different types, it’s possible it’s deliberate, but it’s more likely I made a mistake (a typo or a copy/paste error). I’d suggest restricting these new operators to pointers with the same pointee type, so that the mutable/immutable problem is fixed without losing the otherwise-useful static checking.

glessard · December 7, 2021, 7:30pm

Limiting to same-type pointees would lose the raw-to-typed pointer comparisons, which are another frequent use case that I wouldn't want to lose.

Adding the raw-to-typed comparisons after enforcing Pointee-identity requires either
A) treating the UInt8 pointee in a special way (always allow Self.Pointee == UInt8 and always allow Other.Pointee == UInt8); this adds 10 additional operators. Or
B) adding the set of 5 operators when Self is either UnsafeRawPointer or UnsafeMutableRawPointer and when Other is either UnsafeRawPointer or UnsafeMutableRawPointer; this adds 20 additional operators.

Michael_Ilseman · December 8, 2021, 6:57pm

URP is a special case, given it's an entirely distinct type from UP. I think == overloads using memory address could sense and it doesn't seem like the potential for programmer error is high here. But, it seems far more valuable to allow easier conversion from any UP to URP.

UInt8 / Int8 is very special already (and in C it can alias with any UP/URP freely). It's a huge pain point when working with UP<CChar>. Honestly, an == overload would do very little to help this pain. We need easier conversions, e.g. implicit-only-for-C-bridging or even just plain implicit conversions.

UMP->UP, much like UP->URP, is yet another special case because we use naming to encode capabilities. Similarly UMRP->URP. This is a pain point and == doesn't alleviate this pain much, but better conversions would.

Outside of these special cases, what are you hoping to achieve from these == overloads? I could see walking heterogenous regions with homogenous sub-regions and wanting to do an end-of-subregion check. But, I also wonder if a URP overload is preferable even here, that is you store endPointers as URP because the subregion they're ending is not the same type as the subregion starting at that address.

(Lacking implicit conversions, I end up resorting to inadvisable operators like my magic carrot, which converts in the "obviously" desired direction).

glessard · December 8, 2021, 9:41pm

We are getting the C-bridging ones: SE-0324

re: the goal for these operators
One of the main goals is indeed for working with heterogeneous regions of memory. The more general alternative to the operators-based approach is to have implicit conversions, but they may be worse for compiler performance (and they've generated a lot of pushback every time they've been pitched.)

This being said, I don't feel like the comparison operators are the most important part of this proposal.

Andrew_Trick · December 8, 2021, 9:42pm

Three cases should be supported:

UnsafePointer<T> == UnsafeMutablePointer<T>

UnsafeRawPointer == UnsafePointer<T>

UnsafeRawPointer == UnsafeMutablePointer<T>

Conversions between these types are common. Those conversions lead to the type mismatch across variables. Then comes the expectation that they can still be compared. When that comparison doesn't work, programmers start wading through the memory binding APIs -- basically the worst case scenario.

We can add partially concrete overloads for only the meaningful cases above, along the lines of:

extension UnsafePointer {
  public static func == (lhs: UnsafePointer<Pointee>, rhs: UnsafeMutablePointer<Pointee>) -> Bool
}

That addresses both the concern about mismatched pointer types, and type checker performance.

I realize that would add a large number of operators. The cross product of <,>,<=,>=,== with the 3 combinations of types above, in either of two operand positions. Is that 30 operators?

I think that's the way to go, but if anyone is concerned about adding such a large number of operators, then I suggest we drop this part of the proposal for now.

[edit] I left out the conversions to/from UnsafeMutableRawPointer above. That might double the number of operators.

Karl · January 3, 2022, 3:26pm

2 questions:

Is UnsafePointer<Foo> expected to be properly aligned for Foo? Does the .pointee property assume alignment?
Are there any other features in Swift which might prohibit automatic (not opt-in) packing of struct fields?

My concern is that, if you work with raw pointers to trivial fields of a struct (using copyBytes/storeBytes, etc), there’s no alignment requirement, so it may be easier for the compiler to perform automatic layout optimisation. That opportunity may be lost (if it isn’t already) by providing bound pointers to those fields, meaning packed structs would always require an opt-in attribute of some sort.

Michael_Ilseman · January 3, 2022, 8:49pm

Structs have the single-member layout equivalence rule, that is, if a struct has a single stored member (even if that member is a tuple or aggregate), that struct is layout equivalent with the single stored member. If there's an ABI boundary between you and the struct, then the struct has to be @frozen for you to know if it is single-stored-member. Otherwise, no, IIRC.

If you have a struct with two Bool fields in it, those might not be addressable as they could be packed into separate bits of the same byte. If you want an address, Swift has to materialize the address for you and perform a write-back when you're done. Improving how that is done is(/was) the role of accessors and safety is helped by exclusivity.

For the purposes of UP family of types, IIUC, we're only talking about rules as they pertain to things that are already addressable and with already-defined layout. So, they would only apply to single-member structs, tuples, regions of memory, etc. You can store a tuple as the sole member of a struct to enforce or guarantee a certain layout and the addressability of individual tuple-members if you want to.

Karl · January 4, 2022, 10:56am

Right, but what I'm wondering is whether addressable fields need to be properly aligned. If not (i.e. if we still want the ability to pack structs as an optimisation, without explicit opt-in), then it might not be safe to construct a typed pointer to a field.

We could perhaps just say that we'll return nil if the compiler decides to pack the field in a way that leaves it non-aligned. @John_McCall said in another thread that this is also possible if the compiler decides to make a field non-addressable:

I'm a little uneasy about that (for both cases, struct-packing and bit-packing). It means that the optimiser can change the observable semantics of a program, and that both the proposed pointer-to-field and existing address-of-field APIs are inherently unreliable. The optimiser is built on heuristics which can change at any time.

Andrew_Trick · January 4, 2022, 6:42pm

Right. It's not safe to construct a typed pointer to a misaligned field. The general answer is to make them non-addressable. If the programmer could somehow reliably determine the packed offset, then they could safely access the data with a raw pointer.

Ben_Cohen · January 18, 2022, 8:57pm

Review Conclusion

The proposal has been accepted.