[Pitch] Pointer Usability Improvements

glessard · October 11, 2021, 5:57pm

Here is the second in a series of proposed improvements to the UnsafePointer and UnsafeBufferPointer families. This one concentrates on the pointers.

Note: an updated pitch document can be found below.

Pointer API Usability Improvements

Proposal: SE-NNNN full proposal draft
Authors: Guillaume Lessard, Andrew Trick
Review Manager: TBD
Status: Draft pull request
Implementation: pending
Bugs: rdar://64342031, SR-11156 (rdar://53272880), rdar://22541346
Previous Revision: none

Introduction

This proposal introduces some quality-of-life improvements for UnsafePointer and its Mutable and Raw variants.

Add an API to obtain an UnsafeRawPointer instance that is rounded up from its starting point to a given alignment.
Add an API to obtain a pointer to a stored property of an aggregate T, given an UnsafePointer<T>.
Rename the unchecked subscript of Unsafe[Mutable]Pointer to include the argument label unchecked.
Add the ability to compare pointers of any two types.

Motivation

The everyday use of UnsafePointer and its variants comes with many difficulties unrelated to the unsafeness of the type. We can improve the ergonomics of these types without hiding the unsafeness.

For example, if one needs to advance a pointer to a given alignment, there is no need to force the programmer to derive the proper calculation (or consult a textbook, or copy an answer from Stack Overflow). An API that provides this utility would not take away from the fact that the type is called "unsafe".

Similarly, it is rather difficult to pass a pointer to a property of a struct to, for example, a C function. In such cases, the poor ergonomics lead to code that is less safe than it should be.

From another perspective, the integer subscript on UnsafePointer is different from other subscripts in Swift. Normally, similar-looking subscripts perform bounds checking. The UnsafePointer version does not warn that it does not check its parameter, even though it looks similar to a Collection subscript at the point of use. It would be an improvement to give a name to the way that subscript is unsafe: it is unchecked.

Finally, when dealing with pointers of different types, we often end up in situations where Swift's type system gets in the way. Regardless of their type, pointers represent one unique storage location in memory. As such, casting the type of a pointer in order to be able to compare it to another is not a useful exercise.

Proposed solution

Ability to obtain a pointer properly aligned to store a given type

When using pointers into untyped (raw) memory, it is often desirable to obtain another pointer that is advanced to a given alignment, rather than advanced by a particular offset. The current API provides no help in performing this task, even though the calculation isn't entirely obvious. The programmer should not need to derive the proper calculation, or to consult a textbook.

For example, consider implementing a complex data structure whose nodes include atomic pointers to other nodes in the graph. In order to avoid two allocations per node, we allocate a range of raw memory and manually bind subranges of the allocation. Our example node allocates space for one atomic pointer value and one value of type T:

import Atomics  // module provided by the swift-atomics package

struct Node<T>: RawRepresentable, AtomicValue, AtomicOptionalWrappable {
  typealias AtomicRepresentation = AtomicRawRepresentableStorage<Self>
  typealias AtomicOptionalRepresentation =
                                   AtomicOptionalRawRepresentableStorage<Self>
  typealias NodeStorage = (AtomicOptionalRepresentation, T)

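  let rawValue: UnsafeMutableRawPointer

  // Required by RawRepresentable: the custom init below suppresses the memberwise one.
  init(rawValue: UnsafeMutableRawPointer) { self.rawValue = rawValue }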

  init(_ element: T) {
    rawValue = .allocate(byteCount: MemoryLayout<NodeStorage>.size,
                         alignment: MemoryLayout<NodeStorage>.alignment)

    // bind and initialize atomic storage
    rawValue.initializeMemory(as: AtomicOptionalRepresentation.self,
                              repeating: AtomicOptionalRepresentation(nil),
                              count: 1)
    // bind and initialize payload storage
    let tMask   = MemoryLayout<T>.alignment - 1
    let tOffset = (MemoryLayout<AtomicOptionalRepresentation>.size + tMask) & ~tMask
    rawValue.advanced(by: tOffset)
            .initializeMemory(as: T.self, repeating: element, count: 1)
  }
}

The calculation of tOffset above is overly complex. Calculating the offset from the start of the data structure to the field of type T should be straightforward!

We propose to add a function to help perform this operation on raw pointer types:

extension UnsafeRawPointer {
  public func roundedUp<T>(toAlignmentOf: T.Type) -> Self
}

extension UnsafeMutableRawPointer {
  public func roundedUp<T>(toAlignmentOf: T.Type) -> Self
}

This function would round the current pointer up to the next address that satisfies the alignment of T. When applied to a pointer that is already aligned for T, UnsafeRawPointer.roundedUp(toAlignmentOf:) would return that same pointer unchanged; the operation is idempotent.

The new function would make identifying the storage location of T much more straightforward than in the example above:

  init(_ element: T) {
    rawValue = .allocate(byteCount: MemoryLayout<NodeStorage>.size,
                         alignment: MemoryLayout<NodeStorage>.alignment)

    // bind and initialize atomic storage
    rawValue.initializeMemory(as: AtomicOptionalRepresentation.self,
                              repeating: AtomicOptionalRepresentation(nil),
                              count: 1)
    // bind and initialize payload storage
    rawValue.advanced(by: MemoryLayout<AtomicOptionalRepresentation>.size)
            .roundedUp(toAlignmentOf: T.self)
            .initializeMemory(as: T.self, repeating: element, count: 1)
  }
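For reference, the rounding that roundedUp is described as performing is the same mask arithmetic used for tOffset in the first example. The following is a minimal sketch, not the proposed implementation: it assumes a hypothetical free function named roundedUpSketch(_:toAlignmentOf:), which is not a standard library API, and shows both the rounding and the idempotence described above.

```swift
// Hypothetical stand-in for the pitched
// UnsafeRawPointer.roundedUp(toAlignmentOf:); not a standard library API.
func roundedUpSketch<T>(_ pointer: UnsafeRawPointer,
                        toAlignmentOf _: T.Type) -> UnsafeRawPointer {
  // Alignments are powers of two, so (alignment - 1) masks the low bits.
  let mask = UInt(MemoryLayout<T>.alignment - 1)
  let address = UInt(bitPattern: pointer)
  // Adding the mask, then clearing the low bits, moves the address forward
  // only when it is not already a multiple of the alignment.
  let rounded = (address + mask) & ~mask
  // The input is non-null, so the rounded-up address is non-null too.
  return UnsafeRawPointer(bitPattern: rounded)!
}

// Pointers built from bit patterns are only compared, never dereferenced.
let unaligned = UnsafeRawPointer(bitPattern: 0x1001)!
let aligned = roundedUpSketch(unaligned, toAlignmentOf: Int32.self)  // 0x1004
let again = roundedUpSketch(aligned, toAlignmentOf: Int32.self)      // unchanged
```

Rounding an already-aligned pointer a second time returns it unchanged, which is the property the name roundedUp is meant to convey.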

Ability to obtain a pointer to a member of an aggregate value

When using a pointer to a struct with multiple stored properties, it isn't obvious how to obtain pointers to more than one of the stored properties. For example, consider using the pthreads library, a major C API. The pthreads library uses the return value to indicate error conditions, and modifies values through pointers it receives as parameters. It has many APIs with multiple pointer arguments. One would query a thread's scheduling parameters using pthread_getschedparam, which has the following prototype:

int pthread_getschedparam(pthread_t tid, int *policy, struct sched_param *param);

A Swift user, concerned with keeping related data packaged together, might have elected to define a struct as follows:

struct ThreadSchedulingParameters {
  var policy: Int32 = 0                // C `int`, imported as Int32
  var parameters = sched_param()       // zero-initialized C struct
  var priority: Int { Int(parameters.sched_priority) }
}

Updating a ThreadSchedulingParameters instance using the above C function is not obvious:

var scheduling = ThreadSchedulingParameters()
let thread = pthread_self()  // or a pthread_t obtained from pthread_create(...)
var e = withUnsafeMutableBytes(of: &scheduling) { bytes in
  let o1 = MemoryLayout<ThreadSchedulingParameters>.offset(of: \.policy)!
  let policy_p = bytes.baseAddress!.advanced(by: o1).assumingMemoryBound(to: Int32.self)
  let o2 = MemoryLayout<ThreadSchedulingParameters>.offset(of: \.parameters)!
  let params_p = bytes.baseAddress!.advanced(by: o2).assumingMemoryBound(to: sched_param.self)
  return pthread_getschedparam(thread, policy_p, params_p)
}

We must first reach for the non-obvious withUnsafeMutableBytes rather than for withUnsafeMutablePointer. In so doing, we suppress statically-known type information, only to immediately assert the type using assumingMemoryBound. We can use KeyPath to do better. We shall add a new subscript to UnsafePointer and UnsafeMutablePointer:

extension UnsafeMutablePointer {
  subscript<Property>(property: WritableKeyPath<Pointee, Property>) -> UnsafeMutablePointer<Property>? { get }
}

The return value of this subscript must be optional, because a KeyPath represents a property regardless of its kind (stored or computed). In the case of a computed property, there is no pointer to return and we must return nil.
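Today's API can already approximate this behaviour, because MemoryLayout.offset(of:) returns nil for key paths that do not refer to inline stored storage. The following minimal sketch assumes a hypothetical method named pointerSketch(to:) (neither the pitched subscript nor a standard library API) and a made-up Sample struct; it shows a stored property yielding a usable pointer and a computed property yielding nil.

```swift
extension UnsafeMutablePointer {
  // Hypothetical approximation of the pitched key-path subscript.
  // Returns nil when `property` has no inline storage, e.g. a computed property.
  func pointerSketch<Property>(
    to property: WritableKeyPath<Pointee, Property>
  ) -> UnsafeMutablePointer<Property>? {
    guard let offset = MemoryLayout<Pointee>.offset(of: property) else {
      return nil
    }
    // The stored property lives `offset` bytes past the start of Pointee,
    // and its memory is already bound to Property as part of Pointee.
    return UnsafeMutableRawPointer(self)
      .advanced(by: offset)
      .assumingMemoryBound(to: Property.self)
  }
}

struct Sample {
  var flag: Int8 = 0
  var count: Int32 = 0
  // Computed, so it has no address of its own.
  var doubled: Int32 {
    get { count * 2 }
    set { count = newValue / 2 }
  }
}

var sample = Sample()
let computedHasNoPointer = withUnsafeMutablePointer(to: &sample) { p -> Bool in
  // Write through a pointer to the stored property `count`.
  p.pointerSketch(to: \.count)!.pointee = 42
  return p.pointerSketch(to: \.doubled) == nil
}
```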

With this new subscript, a correct call to pthread_getschedparam becomes much simpler:

var e = withUnsafeMutablePointer(to: &scheduling) {
  pthread_getschedparam(thread, $0[\.policy]!, $0[\.parameters]!)
}

Add `unchecked` argument label to `UnsafePointer`'s integer subscript

In Swift, it is customary for subscripts to check a precondition that their argument is valid. It is reasonable and expected that UnsafePointer should have a less-safe subscript. Unfortunately, the unsafe usage is unmarked at the point of use.

We propose to replace (via deprecation) the existing subscript of UnsafePointer with a subscript that adds an argument label (unchecked). The label will help visually distinguish the unchecked pointer subscript from a "normal" (checked) subscript.

extension UnsafeMutablePointer {
  public subscript(unchecked i: Int) -> Pointee { get set }
}

There is precedent for using the word "unchecked" in the standard library. It is frequently used in internal names: the word currently appears as part of a Swift symbol on 197 lines of the standard library source code. It is also used to mark unchecked preconditions in these public APIs: @unchecked Sendable and Range.init(uncheckedBounds:).
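To show how the label would read at the point of use, here is a minimal sketch that adds a labeled subscript in user code. It is an illustration under stated assumptions, not the pitched implementation: it performs no checks of its own and simply forwards to the existing unlabeled subscript, so behaviour is identical and only the call site changes.

```swift
extension UnsafeMutablePointer {
  // Illustrative forwarding subscript: same behaviour as self[i],
  // but the absence of bounds checking is visible at the call site.
  subscript(unchecked i: Int) -> Pointee {
    get { self[i] }
    nonmutating set { self[i] = newValue }
  }
}

let numbers = UnsafeMutablePointer<Int>.allocate(capacity: 3)
numbers.initialize(repeating: 0, count: 3)
numbers[unchecked: 1] = 7            // reads as unchecked, unlike numbers[1]
let second = numbers[unchecked: 1]
numbers.deinitialize(count: 3)
numbers.deallocate()
```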

Allow comparisons of pointers of any type

Pointers are effectively an index into the fundamental collection that is the computer's memory. Regardless of their type, they represent a unique storage location in memory. As such, having to cast the type of a pointer in order to be able to compare it to another is not a useful exercise.

It's very common to end up with a combination of Mutable and non-Mutable pointers into the same buffer, and the programmer needs to write conversions that satisfy the compiler but have no real effect in the generated code.

To remedy this, we propose to add the following static functions, scoped to the existing _Pointer protocol:

extension _Pointer {
  public static func == <Other: _Pointer>(lhs: Self, rhs: Other) -> Bool

  public static func <  <Other: _Pointer>(lhs: Self, rhs: Other) -> Bool
  public static func <= <Other: _Pointer>(lhs: Self, rhs: Other) -> Bool
  public static func >  <Other: _Pointer>(lhs: Self, rhs: Other) -> Bool
  public static func >= <Other: _Pointer>(lhs: Self, rhs: Other) -> Bool
}

Note that it is always possible to enclose both pointers in a conversion to UnsafeRawPointer. This addition simply removes the necessity to insert conversions that are always legal.
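The following minimal sketch contrasts today's explicit conversions with a heterogeneous comparison. It assumes two hypothetical helper functions, isSameAddress and isBelow, rather than the pitched operators, and compares addresses through UInt(bitPattern:), which accepts any _Pointer. _Pointer is an underscored standard library protocol and is used here only because the pitch scopes its operators to it.

```swift
// Hypothetical helpers approximating the pitched heterogeneous operators.
func isSameAddress<P: _Pointer, Q: _Pointer>(_ lhs: P, _ rhs: Q) -> Bool {
  // UInt(bitPattern:) takes any _Pointer, so no type conversion is needed.
  UInt(bitPattern: lhs) == UInt(bitPattern: rhs)
}

func isBelow<P: _Pointer, Q: _Pointer>(_ lhs: P, _ rhs: Q) -> Bool {
  UInt(bitPattern: lhs) < UInt(bitPattern: rhs)
}

let storage = UnsafeMutablePointer<Int32>.allocate(capacity: 4)
storage.initialize(repeating: 0, count: 4)
let readOnly = UnsafePointer(storage)         // same address, immutable type
let element2 = UnsafeRawPointer(storage + 2)  // untyped pointer to element 2

// Today: each side is converted to UnsafeRawPointer by hand.
let sameToday = UnsafeRawPointer(storage) == UnsafeRawPointer(readOnly)
// With a heterogeneous comparison, those conversions disappear.
let sameSketch = isSameAddress(storage, readOnly)
let ordered = isBelow(readOnly, element2)

storage.deinitialize(count: 4)
storage.deallocate()
```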

Detailed design

Note: please see the draft pull request or the full proposal for details.

Source compatibility

Most of the proposed changes are additive, and therefore source-compatible. The existing pointer subscript would be deprecated, and a fix-it would support an easy transition.

Effect on ABI stability

We intend to implement these changes in an ABI-neutral manner.

Effect on API resilience

The proposed additions will be public API, and will all be marked @_alwaysEmitIntoClient to support back-deployability.

The deprecated integer subscript will remain in place, and will therefore support pre-existing binaries.

Alternatives considered

API to obtain a pointer properly aligned to store a given type

Instead of the proposed function, we could add an API that simply takes an integer and rounds the value of the pointer up to a multiple of that number. We believe that having a type parameter is the correct default. Its disadvantage is that it is not currently possible to define a type whose alignment is greater than 16 bytes, so this function cannot be used to obtain, for example, a pointer aligned to a cache line. On the other hand, the proposed API does not make obtaining such a pointer any harder than it is today.
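A minimal sketch of that integer-based alternative follows. It assumes a hypothetical function roundedUpSketch(_:toMultipleOf:) and an assumed 64-byte cache line (real cache-line sizes vary by hardware). Because the mask arithmetic only works for powers of two, the sketch checks that precondition explicitly, which a type-based API gets for free.

```swift
// Hypothetical integer-alignment variant; not a standard library API.
func roundedUpSketch(_ pointer: UnsafeRawPointer,
                     toMultipleOf alignment: Int) -> UnsafeRawPointer {
  // The mask trick below is only valid for powers of two.
  precondition(alignment > 0 && alignment & (alignment - 1) == 0,
               "alignment must be a power of two")
  let mask = UInt(alignment - 1)
  let rounded = (UInt(bitPattern: pointer) + mask) & ~mask
  return UnsafeRawPointer(bitPattern: rounded)!
}

let cacheLine = 64  // assumed cache-line size
let start = UnsafeRawPointer(bitPattern: 0x2010)!  // never dereferenced
let lineStart = roundedUpSketch(start, toMultipleOf: cacheLine)  // 0x2040
```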

The name of the function could simply be advanced<T>(toAlignmentOf: T.Type). This pairs well with the existing pointer advancement functions, but implies that the returned value is always different from self. The name roundedUp correctly describes the idempotent behaviour.

There is a pre-existing internal API to obtain pointers aligned with a type's alignment, consisting of static members of MemoryLayout<T>. We believe that the functionality is a more natural fit as methods of Unsafe[Mutable]RawPointer.

API to obtain a pointer to a member of an aggregate value

It might be possible to use the @dynamicMemberLookup functionality to make this even more elegant. It isn't clear to the authors what the ABI impact of that approach would be. On the other hand, we know that the approach suggested above can be ABI-neutral.

We could provide the same functionality as a function instead:

func pointer<Property>(to: KeyPath<Pointee, Property>) -> UnsafePointer<Property>?

A subscript could be misconstrued as providing direct access to the stored property, but we feel that the subscript is nevertheless a more elegant solution.

Add `unchecked` argument label to `UnsafePointer`'s integer subscript

The community could decide not to do this. The authors believe that unsafe APIs would be improved by indicating the nature of their unsafety at the point of use, and this pitch is a first step toward such improvements.

In addition to changing the UnsafePointer subscript, we could also add a subscript to UnsafeBufferPointer that includes the unchecked argument label. The behaviour of this additional subscript would differ from that of the existing integer subscript, and it would not be a replacement. As a reminder, UnsafeBufferPointer.subscript(_ i: Int) performs bounds-checking in debug mode and skips bounds-checking in release mode. This behaviour leads to optimization issues when there are three compilation units (the standard library, user code, and a third-party library that uses UnsafeBufferPointer), limiting the optimizations available to the library code.

Adding an unchecked subscript to UnsafeBufferPointer could help the ultimate performance of such third-party libraries. Changing the default behaviour of UnsafeBufferPointer's subscript with regard to bounds-checking is out of scope for this proposal.

Allow comparisons of pointers of any type

Compiler performance is a concern, and operator overloads have been the cause of performance issues in the past. Preliminary compiler performance testing suggests that this addition does not appreciably affect performance.

Acknowledgements

Thanks to Kyle Macomber and the Swift Standard Library team for valuable feedback.

Nobody1707 · October 11, 2021, 9:59pm

This looks good to me. Not having to align pointers by hand is an obvious win, the rationale for the property subscript returning a nullable pointer is sound, and comparing memory locations directly is a perfectly reasonable thing to do. The only thing that even could be controversial is the unchecked subscript, but I'm still reasonably confident that will survive to the actual proposal stage.

lukasa · October 12, 2021, 9:11am

I think all of these improvements seem reasonable. At the risk of trolling, did we consider introducing a -> operator as a spelling for UnsafeMutablePointer(property:)? It's not quite the right operator, but it's close!

glessard · October 12, 2021, 3:53pm

That's more or less the @dynamicMemberLookup direction, and the operator for that would be the dot (.).

Andrew_Trick · October 12, 2021, 6:58pm

I do worry that a subscript (pointer[\.member]) implies pointer dereference. It would be reasonable for that syntax to return or yield an element as opposed to a pointer. But people like the terseness of the subscript form, so I'm ok with it.

Using @dynamicMemberLookup to write pointer.member is both too magical and too close to pointer.pointee for my taste.

glessard · October 12, 2021, 7:23pm

Agreed. FWIW, it also breaks the test suite in unexpected places.

ksluder · October 13, 2021, 3:22pm

Very happy to see this! One nit: the proposal uses roundedUp(toAlignmentOf:) in the example, but advanced(toAlignmentOf:) in the prose. Which is being proposed? My preference is for the latter.

glessard · October 13, 2021, 3:32pm

The pitch is for roundedUp. We originally had advanced, but it had some disadvantages, such as not having a clear "backward" counterpart, and that it doesn't imply idempotence. I corrected the post above.

kylemacomber · October 13, 2021, 3:34pm

After thinking about it more, I probably prefer the method spelling (rather than subscript) for obtaining a pointer to a member of an aggregate value:

var e = withUnsafeMutablePointer(to: &scheduling) {
  pthread_getschedparam(thread, $0[\.policy]!, $0[\.parameters]!)
}

var e = withUnsafeMutablePointer(to: &scheduling) {
  pthread_getschedparam(thread, $0.pointer(to: \.policy)!, $0.pointer(to: \.parameters)!)
}

Subscripts often imply dereferencing (e.g. array[0], dict[key], ptr[0], value[keyPath: keyPath]).
The subscript spelling leads to a large number of special characters being packed together tightly $0[\.]!.

woolsweater · October 14, 2021, 4:20am

Maybe the happy medium is a subscript with a label? pointer[at: \.member] or pointer[offsetTo: \.member]

johnno1962 · October 15, 2021, 9:02am

This is a source-breaking change that is going to break a lot of my code. I don't feel this can really be discussed under the heading of "Pointer Usability Improvements". It is already clearly labeled as an UnsafePointer. Do we really need to become retentive about it and add an unchecked: label? Apart from this, all the rest seems sensible.

glessard · October 15, 2021, 4:05pm

It is not a source-breaking change, as the original subscript would be deprecated rather than obsoleted. The consequence is added warnings, which do not break source. There is no change in semantics, either.

We would like to mark unsafety at the use site, and not stuff every type of unsafety under an unspecific "unsafe" label. UnsafePointer is unmanaged memory, yes, but not every API it has is always unsafe. For example, advancing a pointer is perfectly legal; it is dereferencing the pointer that is potentially incorrect, and therefore unsafe.

Personally, I see the Unsafe umbrella as being less than illuminating, because it does not distinguish manual memory management from bounds checking from pointer dereferencing from type safety from exclusivity checking. Marking UnsafePointer's subscript as unchecked is an attempt to begin rectifying this unhelpfulness.

glessard · October 15, 2021, 4:21pm

Naming the variable pointer as you do makes it read as well as the function, but consider an arbitrary variable name. The at label (or any preposition) doesn't work. offsetTo isn't so bad, but I feel it describes the mechanics more than the result. As of right now the updated proposal pitches the function form rather than the subscript form.

dabrahams · October 18, 2021, 2:37am

I'm really not, though. This seems like an obvious sacrifice of clarity to get brevity, which runs against the grain of Swift.

scanon · October 18, 2021, 1:43pm

I'm not sure how I feel about the spelling of roundedUp<T>(toAlignmentOf: T.Type). Semantically, the operation being performed is aligning, not rounding. It's only "rounding" if you view the address space as being laid out in fractions of a T, which is totally valid but a little bit weird to enshrine in the name.

I think I would prefer something like align<T>(for: T.Type, moving: Direction = .forward), but I'm not totally convinced that's quite right either.

ksluder · October 18, 2021, 3:50pm

When is it practically useful to align downwards? Perhaps when implementing alloca?

scanon · October 18, 2021, 4:18pm

Anything alloca-like, or to get the address of the ragged end of an array that was otherwise processed in larger chunks with aligned vector operations, or a bunch of other reasons.
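As a minimal sketch of the downward case (the function name roundedDownSketch is hypothetical and is not defined by the pitch or the standard library), rounding down only clears the low bits, so it is also idempotent. The example assumes an array ending at a made-up address whose bulk was processed in SIMD4<Float>-sized, aligned chunks, and finds where the ragged tail begins.

```swift
// Hypothetical downward counterpart; illustration only, not a real API.
func roundedDownSketch<T>(_ pointer: UnsafeRawPointer,
                          toAlignmentOf _: T.Type) -> UnsafeRawPointer {
  let mask = UInt(MemoryLayout<T>.alignment - 1)
  // Clearing the low bits moves backward unless already aligned.
  let rounded = UInt(bitPattern: pointer) & ~mask
  // Force-unwrap assumes the result is non-null (pointer >= alignment).
  return UnsafeRawPointer(bitPattern: rounded)!
}

// Pretend an array ends at this address (never dereferenced).
let end = UnsafeRawPointer(bitPattern: 0x302C)!
let lastChunkBoundary = roundedDownSketch(end, toAlignmentOf: SIMD4<Float>.self)
// Bytes in [lastChunkBoundary, end) form the ragged tail.
let tailBytes = lastChunkBoundary.distance(to: end)
```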

ksluder · October 18, 2021, 4:22pm

In that case I think we really want something like advanced<T>(byAlignedMultiple: Int, of: T.Type), so you can pass a positive or negative integer.

Which of course brings up the question of whether advancing by a negative integer should avoid overlapping with the current Pointee, which can’t be known for UnsafeRawPointer…

glessard · October 18, 2021, 4:25pm

That ends up requiring a version that is not idempotent, and in the simple case that ends up requiring additional advanced calls, or subtracting 1 from a size, or such things. I'd rather get a properly aligned value, then move forward or back by a number of strides if necessary.

scanon · October 18, 2021, 4:26pm

You don't want to use "advanced", because that implies that it always moves. In the case where the pointer is already aligned, it should be preserved. As Guillaume said, the key is that this operation should be idempotent, no matter how it is spelled.

Pointer API Usability Improvements

Introduction

Motivation

Proposed solution

Ability to obtain a pointer properly aligned to store a given type

Ability to obtain a pointer to a member of an aggregate value

Add unchecked argument label to UnsafePointer's integer subscript

Allow comparisons of pointers of any type

Detailed design

Source compatibility

Effect on ABI stability

Effect on API resilience

Alternatives considered

API to obtain a pointer properly aligned to store a given type

API to obtain a pointer to a member of an aggregate value

Add unchecked argument label to UnsafePointer's integer subscript

Allow comparisons of pointers of any type

Acknowledgements
