[Pitch] Expand usability of `withMemoryRebound`

Hi!

This is the first of a series of proposed improvements to the UnsafePointer and UnsafeBufferPointer families. I look forward to your feedback!

EDIT: an updated pitch document is posted later in this thread.

Expand usability of withMemoryRebound

Introduction

The function withMemoryRebound(to:capacity:_ body:)
executes a closure while temporarily binding a range of memory to a different type than the callee is bound to.
We propose to lift some notable limitations of withMemoryRebound and enable rebinding to a larger set of types,
as well as rebinding from raw memory pointers and buffers.

Swift-evolution thread: Discussion thread TK

Motivation

When using Swift in a systems programming context or using Swift with libraries written in C,
we occasionally need to temporarily access a range of memory as instances of a different type than has been declared
(the pointer's Pointee type parameter).
In those cases, withMemoryRebound is the tool to reach for,
allowing scoped access to the range of memory as another type.

As a reminder, the function is declared as follows on the type UnsafePointer<Pointee>:

func withMemoryRebound<T, Result>(
  to type: T.Type,
  capacity count: Int,
  _ body: (UnsafePointer<T>) throws -> Result
) rethrows -> Result

This function is currently more limited than necessary.
It requires that the stride of Pointee and T be equal.
This requirement makes many legitimate use cases technically illegal,
even though they could be supported by the compiler.

We propose to allow temporarily binding to a type T whose stride is
a whole fraction or whole multiple of Pointee's stride,
when the starting address is properly aligned for type T.
As before, T's memory layout must be compatible with that of Pointee.
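To make the stride and alignment rule concrete, here is a minimal sketch that checks both conditions (strides that divide one another, and a start address aligned for T) for a given pointer. The helper strideAndAlignmentAllowRebinding and the Pair type are hypothetical illustrations, not standard-library or proposed API. The check cannot verify layout compatibility, which remains the programmer's responsibility.

```swift
/// Hypothetical helper: reports whether the stride and alignment conditions
/// described above hold for rebinding `pointer` to `T`. It does not (and
/// cannot) check that the two memory layouts are compatible.
func strideAndAlignmentAllowRebinding<Pointee, T>(
  _ pointer: UnsafePointer<Pointee>,
  to type: T.Type
) -> Bool {
  let oldStride = MemoryLayout<Pointee>.stride
  let newStride = MemoryLayout<T>.stride
  // One stride must be a whole multiple of the other.
  let stridesCompatible = oldStride % newStride == 0 || newStride % oldStride == 0
  // The starting address must be a multiple of T's alignment.
  let address = UInt(bitPattern: pointer)
  let isAligned = address % UInt(MemoryLayout<T>.alignment) == 0
  return stridesCompatible && isAligned
}

// Hypothetical two-Double aggregate used for illustration.
struct Pair { var x: Double; var y: Double }
```

For example, Double (stride 8) to Pair (stride 16) passes the stride test, while Pair (stride 16) to a three-Double tuple (stride 24) does not.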

For example, imagine that a buffer of Double consisting of a series of (x,y) pairs is returned from data analysis code written in C.
The next step might be to display it in a preview graph, which needs to read CGPoint values.
We need to copy the Double values as pairs to values of type CGPoint:

var count = 0
let pointer: UnsafePointer<Double> = calculation(&count)

var points = Array<CGPoint>(unsafeUninitializedCapacity: count/2) {
  buffer, initializedCount in
  var p = pointer
  for i in buffer.indices where p+1 < pointer+count {
    buffer[i] = CGPoint(x: p[0], y: p[1])
    p += 2
  }
  initializedCount = buffer.count
}

We could do better with an improved version of withMemoryRebound.
Since CGPoint values consist of a pair of CGFloat values,
and CGFloat values are themselves layout-compatible with Double:

var points = Array<CGPoint>(unsafeUninitializedCapacity: count/2) {
  buffer, initializedCount in
  pointer.withMemoryRebound(to: CGPoint.self, capacity: buffer.count) {
    (_, initializedCount) = buffer.initialize(from: UnsafeBufferPointer(start: $0, count: buffer.count))
  }
}
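Below is a self-contained sketch of the rebinding approach above. It assumes a standard library that includes the relaxed withMemoryRebound. To avoid a CoreGraphics dependency, it uses a hypothetical Point struct of two Doubles in place of CGPoint. It also assumes that Point is laid out as two consecutive Doubles, which holds in practice for such a struct. The name makePoints is illustrative.

```swift
// Hypothetical stand-in for CGPoint; assumed to be laid out as two Doubles.
struct Point {
  var x: Double
  var y: Double
}

/// Copies `count` interleaved Doubles (x0, y0, x1, y1, ...) into an array of
/// Point by temporarily viewing the Double memory as Point values.
/// Relies on the relaxed stride rule: Point's stride (16) is a whole
/// multiple of Double's stride (8).
func makePoints(from pointer: UnsafePointer<Double>, count: Int) -> [Point] {
  let pairCount = count / 2
  return Array(unsafeUninitializedCapacity: pairCount) { buffer, initializedCount in
    guard pairCount > 0 else { return }
    pointer.withMemoryRebound(to: Point.self, capacity: pairCount) { points in
      buffer.baseAddress!.initialize(from: points, count: pairCount)
    }
    initializedCount = pairCount
  }
}
```

Because the capacity passed is count / 2, only complete pairs are rebound. A trailing unpaired Double is ignored.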

Alternatively, the data could have been received as bytes from a network request, wrapped in a Data instance.
Previously we would have needed to do:

let data: Data = ...

var points = Array<CGPoint>(unsafeUninitializedCapacity: data.count/MemoryLayout<CGPoint>.stride) {
  buffer, initializedCount in
  data.withUnsafeBytes { data in
    var read = 0
    for i in buffer.indices where (read+2*MemoryLayout<CGFloat>.stride)<=data.count {
      let x = data.load(fromByteOffset: read, as: CGFloat.self)
      read += MemoryLayout<CGFloat>.stride
      let y = data.load(fromByteOffset: read, as: CGFloat.self)
      read += MemoryLayout<CGFloat>.stride
      buffer[i] = CGPoint(x: x, y: y)
    }
    initializedCount = read / MemoryLayout<CGPoint>.stride
  }
}

In this case, the ability to use withMemoryRebound with UnsafeRawBufferPointer improves readability in the same way as the example above:

var points = Array<CGPoint>(unsafeUninitializedCapacity: data.count/MemoryLayout<CGPoint>.stride) {
  buffer, initializedCount in
  data.withUnsafeBytes {
    $0.withMemoryRebound(to: CGPoint.self) {
      (_, initializedCount) = buffer.initialize(from: $0)
    }
  }
}
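The raw-buffer variant can be sketched the same way. This assumes a standard library that includes the proposed UnsafeRawBufferPointer.withMemoryRebound(to:_:). Foundation's Data is replaced by a raw buffer argument, Point is again a hypothetical two-Double stand-in for CGPoint, and decodePoints is an illustrative name. The caller is assumed to supply memory suitably aligned for Point.

```swift
// Hypothetical stand-in for CGPoint; assumed to be laid out as two Doubles.
struct Point { var x: Double; var y: Double }

/// Decodes Point values from raw bytes holding interleaved Doubles, using the
/// proposed UnsafeRawBufferPointer.withMemoryRebound(to:_:). The rebound
/// buffer's count is the byte count divided by Point's stride.
func decodePoints(from bytes: UnsafeRawBufferPointer) -> [Point] {
  let count = bytes.count / MemoryLayout<Point>.stride
  return Array(unsafeUninitializedCapacity: count) { buffer, initializedCount in
    bytes.withMemoryRebound(to: Point.self) { points in
      // initialize(from:) reports the index one past the last written element.
      (_, initializedCount) = buffer.initialize(from: points)
    }
  }
}
```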

Proposed solution

We propose to lift the restriction that the strides of T and Pointee must be equal.
This means that it will now be considered correct to re-bind from a homogeneous aggregate type to the type of its constituent elements,
as they are layout compatible, even though their strides differ.

Instance methods of UnsafePointer<Pointee> and UnsafeMutablePointer<Pointee>

We propose to lift the restriction that the strides of T and Pointee must be equal, when calling withMemoryRebound.
The function declarations remain the same on these two types,
though given the relaxed restriction,
we must clarify the meaning of the capacity argument.
capacity shall mean the number of elements of the temporary type (T) to be temporarily bound, each occupying one stride of T.
The documentation will be updated to reflect the changed behaviour.
We will also add parameter labels to the closure type declaration to benefit code completion (a source-compatible change).

extension UnsafePointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    capacity count: Int,
    _ body: (_ pointer: UnsafePointer<T>) throws -> Result
  ) rethrows -> Result
}

extension UnsafeMutablePointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    capacity count: Int,
    _ body: (_ pointer: UnsafeMutablePointer<T>) throws -> Result
  ) rethrows -> Result
}
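The following sketch illustrates this meaning of capacity when rebinding from an aggregate to its element type. Each Point is viewed as two Doubles, so capacity counts Doubles, not Points. It assumes the relaxed stride rule is available. Point and sumOfCoordinates are hypothetical.

```swift
// Hypothetical two-Double aggregate used for illustration.
struct Point { var x: Double; var y: Double }

/// Adds up every coordinate by viewing the Point storage as Doubles.
/// `capacity` is expressed in elements of the temporary type (Double):
/// each Point contributes two Doubles.
func sumOfCoordinates(_ points: [Point]) -> Double {
  points.withUnsafeBufferPointer { buffer -> Double in
    guard let base = buffer.baseAddress else { return 0 }
    let doubleCount = buffer.count * MemoryLayout<Point>.stride / MemoryLayout<Double>.stride
    return base.withMemoryRebound(to: Double.self, capacity: doubleCount) { doubles in
      UnsafeBufferPointer(start: doubles, count: doubleCount).reduce(0, +)
    }
  }
}
```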

Instance methods of UnsafeRawPointer and UnsafeMutableRawPointer

We propose adding a withMemoryRebound method, which currently does not exist on these types.
Since it operates on raw memory, this version of withMemoryRebound places no restriction on the temporary type (T).
It is therefore up to the program author to ensure type safety when using these methods.
As in the UnsafePointer case, capacity means the number of elements of the temporary type (T) to be temporarily bound, each occupying one stride of T.

extension UnsafeRawPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    capacity count: Int,
    _ body: (_ pointer: UnsafePointer<T>) throws -> Result
  ) rethrows -> Result
}

extension UnsafeMutableRawPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    capacity count: Int,
    _ body: (_ pointer: UnsafeMutablePointer<T>) throws -> Result
  ) rethrows -> Result
}
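As an illustration of the proposed raw-pointer method, the sketch below sums Int32 values handed over as an untyped pointer, as a C API might return them. It assumes a standard library that provides UnsafeRawPointer.withMemoryRebound(to:capacity:_:). The name sumInt32s is hypothetical, and nothing in the code checks that the memory really holds Int32 values.

```swift
/// Sums `count` Int32 values received as an untyped pointer (for example from
/// a hypothetical C API). Nothing relates Int32 to a previous type here, so
/// correctness is the caller's responsibility.
func sumInt32s(at raw: UnsafeRawPointer, count: Int) -> Int32 {
  raw.withMemoryRebound(to: Int32.self, capacity: count) { ints in
    UnsafeBufferPointer(start: ints, count: count).reduce(0, +)
  }
}
```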

Instance methods of UnsafeBufferPointer and UnsafeMutableBufferPointer

We propose to lift the restriction that the strides of T and Pointee must be equal, when calling withMemoryRebound.
The function declarations remain the same on these two types.
The capacity of the rebound buffer will be calculated from the length of the UnsafeBufferPointer<Element> in bytes and the stride of the temporary type.
The documentation will be updated to reflect the changed behaviour.
We will also add parameter labels to the closure type declaration to benefit code completion (a source-compatible change).

extension UnsafeBufferPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (_ buffer: UnsafeBufferPointer<T>) throws -> Result
  ) rethrows -> Result
}

extension UnsafeMutableBufferPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (_ buffer: UnsafeMutableBufferPointer<T>) throws -> Result
  ) rethrows -> Result
}
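The count computation described above can be written out as a small hypothetical helper, reboundCount. It multiplies the buffer's count by the stride of Element and divides by the stride of T. The sketch compares the result with the count that the relaxed UnsafeBufferPointer.withMemoryRebound(to:_:) passes to its closure, so it assumes a standard library with the relaxed behaviour. Point is a hypothetical two-Double struct.

```swift
// Hypothetical two-Double aggregate used for illustration.
struct Point { var x: Double; var y: Double }

/// Hypothetical helper (not a standard-library API): the element count a
/// rebound buffer would have, computed from the original buffer's length in
/// bytes divided by the stride of the temporary type.
func reboundCount<Element, T>(of buffer: UnsafeBufferPointer<Element>, as type: T.Type) -> Int {
  buffer.count * MemoryLayout<Element>.stride / MemoryLayout<T>.stride
}

let points = [Point(x: 1, y: 2), Point(x: 3, y: 4), Point(x: 5, y: 6)]

// Compare the prediction with the count the relaxed
// UnsafeBufferPointer.withMemoryRebound(to:_:) hands to its closure.
let (predicted, actual) = points.withUnsafeBufferPointer { buffer -> (Int, Int) in
  (reboundCount(of: buffer, as: Double.self),
   buffer.withMemoryRebound(to: Double.self) { $0.count })
}
```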

Instance methods of UnsafeRawBufferPointer and UnsafeMutableRawBufferPointer

We propose adding a withMemoryRebound method, which currently does not exist on these types.
Since it operates on raw memory, this version of withMemoryRebound places no restriction on the temporary type (T).
It is therefore up to the program author to ensure type safety when using these methods.
The capacity of the rebound buffer will be calculated from the byte count of the UnsafeRawBufferPointer and the stride of the temporary type.

Finally, to complete the set, we propose to add an assumingMemoryBound function that calculates the capacity of the returned UnsafeBufferPointer.

extension UnsafeRawBufferPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (_ buffer: UnsafeBufferPointer<T>) throws -> Result
  ) rethrows -> Result
  
  public func assumingMemoryBound<T>(to type: T.Type) -> UnsafeBufferPointer<T>
}

extension UnsafeMutableRawBufferPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (_ buffer: UnsafeMutableBufferPointer<T>) throws -> Result
  ) rethrows -> Result

  public func assumingMemoryBound<T>(to type: T.Type) -> UnsafeMutableBufferPointer<T>
}
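The following sketch shows what assumingMemoryBound on a raw buffer amounts to. It is built only from the existing UnsafeMutableRawPointer.assumingMemoryBound(to:). The method is a hypothetical stand-in named assumingMemoryBoundBuffer, to avoid clashing with a standard library that already has the proposed method. Like its pointer counterpart, it performs no binding. It only builds a typed buffer whose count is the byte count divided by T's stride, so it is correct only when the memory is already bound to T.

```swift
extension UnsafeMutableRawBufferPointer {
  /// Hypothetical stand-in for the proposed assumingMemoryBound(to:).
  /// Performs no binding: it only wraps the base address in a typed buffer
  /// whose count is derived from the byte count and T's stride.
  func assumingMemoryBoundBuffer<T>(to type: T.Type) -> UnsafeMutableBufferPointer<T> {
    guard let base = baseAddress else {
      return UnsafeMutableBufferPointer(start: nil, count: 0)
    }
    return UnsafeMutableBufferPointer(
      start: base.assumingMemoryBound(to: T.self),
      count: count / MemoryLayout<T>.stride)
  }
}

// Usage: treat a homogeneous tuple, whose elements are bound to Int,
// as a small mutable collection.
var storage: (Int, Int, Int, Int) = (0, 0, 0, 0)
withUnsafeMutableBytes(of: &storage) { bytes in
  let ints = bytes.assumingMemoryBoundBuffer(to: Int.self)
  for i in ints.indices { ints[i] = i * 10 }
}
```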

Detailed design

Note: please see the draft PR or the full proposal for details.

Source compatibility

This proposal is source-compatible.
Some changes are compatible with existing correct uses of the API,
while others are additive.

Effect on ABI stability

This proposal consists of ABI-preserving changes and ABI-additive changes.

Effect on API resilience

The behaviour change of withMemoryRebound is compatible with previous uses,
since it only lifts restrictions.
Code that depends on the new semantics may not be compatible with old versions of these functions.
Back-deployment of new binaries will be supported by making the updated versions @_alwaysEmitIntoClient.
Compatibility of old binaries with a new standard library will be supported by ensuring that a compatible entry point remains.

Alternatives considered

One alternative is to implement none of this change, and leave withMemoryRebound as is.
The usability problems of withMemoryRebound would remain.

Another alternative is to leave the type layout restrictions as they are for the typed Pointer and BufferPointer types,
but add the withMemoryRebound functions to the RawPointer and RawBufferPointer variants.
In that case, the stride restriction would be no more than a speed bump,
because it would be straightforward to bypass it by transiting through the appropriate Raw variant.
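The sketch below illustrates why the stride restriction would only be a speed bump in that alternative. It rebinds typed Double memory to a wider type simply by converting the typed pointer to UnsafeRawPointer first. It assumes the proposed UnsafeRawPointer.withMemoryRebound(to:capacity:_:) is available. Point and firstPoint are hypothetical.

```swift
// Hypothetical two-Double aggregate used for illustration.
struct Point { var x: Double; var y: Double }

/// Reads the first (x, y) pair by going through the raw variant: once the
/// typed pointer is converted to UnsafeRawPointer, nothing relates Point to
/// the original Double element type, so no stride rule applies between them.
func firstPoint(of doubles: UnsafePointer<Double>) -> Point {
  UnsafeRawPointer(doubles).withMemoryRebound(to: Point.self, capacity: 1) { $0.pointee }
}
```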


Having written code in the Foundation overlay that extremely carefully reinterprets an AnyObject buffer (8-byte stride) as a String buffer (16-byte stride), I'm obviously very much in favor of this proposal 🙂


This example should probably say “on 64-bit platforms” or suchlike, according to the documentation for CGFloat.


Noted: gist updated.


Could the body of the withMemoryRebound(to:capacity:_:) methods be given a buffer pointer? Otherwise, what's the point of providing the capacity/count argument?

It actually tells the compiler the range of memory that needs to be bound to the temporary type. This affects the available optimizations. Without the range, the compiler would not know to re-bind any more than 1 stride worth of memory. We could possibly add a version that passes a BufferPointer to the closure. The re-bound pointer is typically what would be needed when calling a C function, for example.
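A buffer-passing variant like the one mentioned above could be layered on top of the existing method. The sketch below is a hypothetical convenience, withMemoryReboundBuffer, that is not part of the pitch. It passes the closure an UnsafeBufferPointer so the count travels with the pointer. The usage rebinds Int64 to UInt64, whose strides are equal, so it does not depend on the relaxed rule.

```swift
extension UnsafePointer {
  /// Hypothetical convenience (not part of the pitch): rebinds `count`
  /// elements of T and hands the closure a buffer instead of a bare pointer.
  func withMemoryReboundBuffer<T, Result>(
    to type: T.Type,
    capacity count: Int,
    _ body: (UnsafeBufferPointer<T>) -> Result
  ) -> Result {
    withMemoryRebound(to: T.self, capacity: count) { pointer in
      body(UnsafeBufferPointer(start: pointer, count: count))
    }
  }
}
```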


It would simplify your example:

 var points = Array<CGPoint>(unsafeUninitializedCapacity: data.count/2) {
   buffer, initializedCount in
   pointer.withMemoryRebound(to: CGPoint.self, capacity: buffer.count) {
-    (_, initializedCount) = buffer.initialize(from: UnsafeBufferPointer(start: $0, count: buffer.count))
+    (_, initializedCount) = buffer.initialize(from: $0)
   }
 }

But could the new methods have a different argument label to disambiguate? For example, count instead of capacity, if that seems more correct? Would the old methods be deprecated?

The current methods are part of the ABI and are inlinable, so I don't think we would want to deprecate them.

Note that there are many ways to write the example, such as:

points = Array<CGPoint>(unsafeUninitializedCapacity: count/2) {
  buffer, initializedCount in
  pointer.withMemoryRebound(to: CGPoint.self, capacity: buffer.count) {
    buffer.baseAddress!.initialize(from: $0, count: buffer.count)
  }
  initializedCount = buffer.count
}

I may have unwisely picked which version to use for the example.


What's the difference between binding a raw pointer's memory to a type via withMemoryRebound vs. the existing bindMemory API?

assumingMemoryBound on URBP? Yes, please! I use this in my code quite often; it's really great if you need to use a tuple as a small array/mutable collection. I remember being quite worried when I first added this helper to my code, not knowing if it was safe to just assumingMemoryBound the base address and assume the entire region of memory was bound. I did a fair amount of research, digging through the forums and stdlib source, and ultimately concluded that it was safe, because aMB literally does nothing - unlike bindMemory, it doesn't call any built-ins, it quite literally just constructs the requested Swift pointer struct. It's just like a C-style pointer cast, which is what makes it so dangerous if used incorrectly.

Anyway, it's a useful operation, and was not obvious to me that it was safe. It might also not be obvious to others, so I think we should add it and document it, and not require developers to roll their own.

But I think there's a typo - the snippet shows adding it to UMBP, not UMRBP (I know, it's a real word-salad).


At the end of the scope, the memory-binding state is reverted to what it was at the beginning. This requires a new built-in that isn't ready yet, but will be soon.

The problem with just using bindMemory is that you might not know what binding state to change back to. With the typed pointers, it's easy to know what binding state to revert to: Pointee.
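The point about knowing which binding to restore can be sketched with the existing bindMemory(to:capacity:) API. The sketch is a scoped rebinding that binds the memory to T for the duration of a closure and binds it back to Pointee afterwards. It is a hypothetical illustration (withManualRebinding is not a standard-library function). It only works because the typed pointer supplies Pointee; starting from a raw pointer, there would be no type to restore.

```swift
/// Hypothetical illustration (not a standard-library API) of a scoped
/// rebinding built from bindMemory. The original type, Pointee, comes from
/// the typed pointer, which is what makes restoring the binding possible.
func withManualRebinding<Pointee, T, R>(
  _ pointer: UnsafeMutablePointer<Pointee>,
  pointeeCount: Int,
  to type: T.Type,
  _ body: (UnsafeMutableBufferPointer<T>) -> R
) -> R {
  let raw = UnsafeMutableRawPointer(pointer)
  let newCount = pointeeCount * MemoryLayout<Pointee>.stride / MemoryLayout<T>.stride
  let rebound = raw.bindMemory(to: T.self, capacity: newCount)
  // Restore the original binding when the scope ends.
  defer { _ = raw.bindMemory(to: Pointee.self, capacity: pointeeCount) }
  return body(UnsafeMutableBufferPointer(start: rebound, count: newCount))
}
```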

Oh dear. Thanks!


Big +1!

Thanks for everyone's feedback. I made modest updates to the proposal document. The updated version is pasted below:

Expand usability of withMemoryRebound

Introduction

The function withMemoryRebound(to:capacity:_ body:)
executes a closure while temporarily binding a range of memory to a different type than the callee is bound to.
We propose to lift some notable limitations of withMemoryRebound and enable rebinding to a larger set of types,
as well as rebinding from raw memory pointers and buffers.

Swift-evolution thread: Pitch thread

Motivation

When using Swift in a systems programming context or using Swift with libraries written in C,
we occasionally need to temporarily access a range of memory as instances of a different type than has been declared
(the pointer's Pointee type parameter).
In those cases, withMemoryRebound is the tool to reach for,
allowing scoped access to the range of memory as another type.

As a reminder, the function is declared as follows on the type UnsafePointer<Pointee>:

func withMemoryRebound<T, Result>(
  to type: T.Type,
  capacity count: Int,
  _ body: (UnsafePointer<T>) throws -> Result
) rethrows -> Result

This function is currently more limited than necessary.
It requires that the stride of Pointee and T be equal.
This requirement makes many legitimate use cases technically illegal,
even though they could be supported by the compiler.

We propose to allow temporarily binding to a type T whose stride is
a whole fraction or whole multiple of Pointee's stride,
when the starting address is properly aligned for type T.
As before, T's memory layout must be compatible with that of Pointee.

For example, suppose that a buffer of Double consisting of a series of (x,y) pairs is returned from data analysis code written in C.
The next step might be to display it in a preview graph, which needs to read CGPoint values.
We need to copy the Double values as pairs to values of type CGPoint (when executing on a 64-bit platform):

var count = 0
let pointer: UnsafePointer<Double> = calculation(&count)

var points = Array<CGPoint>(unsafeUninitializedCapacity: count/2) {
  buffer, initializedCount in
  var p = pointer
  for i in buffer.indices where p+1 < pointer+count {
    buffer.baseAddress!.advanced(by: i).initialize(to: CGPoint(x: p[0], y: p[1]))
    p += 2
  }
  initializedCount = pointer.distance(to: p)/2
}

We could do better with an improved version of withMemoryRebound.
Since CGPoint values consist of a pair of CGFloat values,
and CGFloat values are themselves layout-compatible with Double (when executing on a 64-bit platform):

var points = Array<CGPoint>(unsafeUninitializedCapacity: count/2) {
  buffer, initializedCount in
  pointer.withMemoryRebound(to: CGPoint.self, capacity: buffer.count) {
    buffer.baseAddress!.initialize(from: $0, count: buffer.count)
  }
  initializedCount = buffer.count
}
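The rebinding shortcut depends on CGFloat being 64 bits wide. Code that must also build for 32-bit targets might therefore check the layouts first and fall back to element-wise conversion. The sketch below is a hypothetical makeCGPoints helper. It assumes that Foundation supplies CGFloat and CGPoint, as it does on Apple platforms and in swift-corelibs-foundation on Linux. It also assumes a standard library with the relaxed withMemoryRebound.

```swift
import Foundation  // provides CGFloat and CGPoint

/// Hypothetical helper: converts interleaved (x, y) Doubles to CGPoint values,
/// rebinding only when CGPoint is laid out as two Doubles and copying
/// element-wise otherwise.
func makeCGPoints(from pointer: UnsafePointer<Double>, count: Int) -> [CGPoint] {
  let pairCount = count / 2
  let layoutsMatch =
    MemoryLayout<CGFloat>.stride == MemoryLayout<Double>.stride &&
    MemoryLayout<CGPoint>.stride == 2 * MemoryLayout<Double>.stride
  guard layoutsMatch else {
    // Fallback (e.g. 32-bit CGFloat): convert each pair explicitly.
    return (0..<pairCount).map { i in
      CGPoint(x: CGFloat(pointer[2 * i]), y: CGFloat(pointer[2 * i + 1]))
    }
  }
  return Array(unsafeUninitializedCapacity: pairCount) { buffer, initializedCount in
    guard pairCount > 0 else { return }
    pointer.withMemoryRebound(to: CGPoint.self, capacity: pairCount) { points in
      buffer.baseAddress!.initialize(from: points, count: pairCount)
    }
    initializedCount = pairCount
  }
}
```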

Alternatively, the data could have been received as bytes from a network request, wrapped in a Data instance.
Previously we would have needed to do:

let data: Data = ...

var points = Array<CGPoint>(unsafeUninitializedCapacity: data.count/MemoryLayout<CGPoint>.stride) {
  buffer, initializedCount in
  data.withUnsafeBytes { data in
    var read = 0
    for i in buffer.indices where (read+2*MemoryLayout<CGFloat>.stride)<=data.count {
      let x = data.load(fromByteOffset: read, as: CGFloat.self)
      read += MemoryLayout<CGFloat>.stride
      let y = data.load(fromByteOffset: read, as: CGFloat.self)
      read += MemoryLayout<CGFloat>.stride
      buffer.baseAddress!.advanced(by: i).initialize(to: CGPoint(x: x, y: y))
    }
    initializedCount = read / MemoryLayout<CGPoint>.stride
  }
}

In this case, the ability to use withMemoryRebound with UnsafeRawBufferPointer improves readability in the same way as the example above:

var points = Array<CGPoint>(unsafeUninitializedCapacity: data.count/MemoryLayout<CGPoint>.stride) {
  buffer, initializedCount in
  data.withUnsafeBytes {
    $0.withMemoryRebound(to: CGPoint.self) {
      (_, initializedCount) = buffer.initialize(from: $0)
    }
  }
}

Proposed solution

We propose to lift the restriction that the strides of T and Pointee must be equal.
This means that it will now be considered correct to re-bind from a homogeneous aggregate type to the type of its constituent elements,
as they are layout compatible, even though their strides differ.

Instance methods of UnsafePointer<Pointee> and UnsafeMutablePointer<Pointee>

We propose to lift the restriction that the strides of T and Pointee must be equal, when calling withMemoryRebound.
The function declarations remain the same on these two types,
though given the relaxed restriction,
we must clarify the meaning of the capacity argument.
capacity shall mean the number of elements of the temporary type (T) to be temporarily bound, each occupying one stride of T.
The documentation will be updated to reflect the changed behaviour.
We will also add parameter labels to the closure type declaration to benefit code completion (a source-compatible change).

extension UnsafePointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    capacity count: Int,
    _ body: (_ pointer: UnsafePointer<T>) throws -> Result
  ) rethrows -> Result
}

extension UnsafeMutablePointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    capacity count: Int,
    _ body: (_ pointer: UnsafeMutablePointer<T>) throws -> Result
  ) rethrows -> Result
}

Instance methods of UnsafeRawPointer and UnsafeMutableRawPointer

We propose adding a withMemoryRebound method, which currently does not exist on these types.
Since it operates on raw memory, this version of withMemoryRebound places no restriction on the temporary type (T).
It is therefore up to the program author to ensure type safety when using these methods.
As in the UnsafePointer case, capacity means the number of elements of the temporary type (T) to be temporarily bound, each occupying one stride of T.

extension UnsafeRawPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    capacity count: Int,
    _ body: (_ pointer: UnsafePointer<T>) throws -> Result
  ) rethrows -> Result
}

extension UnsafeMutableRawPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    capacity count: Int,
    _ body: (_ pointer: UnsafeMutablePointer<T>) throws -> Result
  ) rethrows -> Result
}

Instance methods of UnsafeBufferPointer and UnsafeMutableBufferPointer

We propose to lift the restriction that the strides of T and Pointee must be equal, when calling withMemoryRebound.
The function declarations remain the same on these two types.
The capacity of the rebound buffer will be calculated from the length of the UnsafeBufferPointer<Element> in bytes and the stride of the temporary type.
The documentation will be updated to reflect the changed behaviour.
We will also add parameter labels to the closure type declaration to benefit code completion (a source-compatible change).

extension UnsafeBufferPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (_ buffer: UnsafeBufferPointer<T>) throws -> Result
  ) rethrows -> Result
}

extension UnsafeMutableBufferPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (_ buffer: UnsafeMutableBufferPointer<T>) throws -> Result
  ) rethrows -> Result
}

Instance methods of UnsafeRawBufferPointer and UnsafeMutableRawBufferPointer

We propose adding a withMemoryRebound method, which currently does not exist on these types.
Since it operates on raw memory, this version of withMemoryRebound places no restriction on the temporary type (T).
It is therefore up to the program author to ensure type safety when using these methods.
The capacity of the rebound buffer will be calculated from the byte count of the UnsafeRawBufferPointer and the stride of the temporary type.

Finally, to complete the set, we propose to add an assumingMemoryBound function that calculates the capacity of the returned UnsafeBufferPointer.

extension UnsafeRawBufferPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (_ buffer: UnsafeBufferPointer<T>) throws -> Result
  ) rethrows -> Result
  
  public func assumingMemoryBound<T>(to type: T.Type) -> UnsafeBufferPointer<T>
}

extension UnsafeMutableRawBufferPointer {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (_ buffer: UnsafeMutableBufferPointer<T>) throws -> Result
  ) rethrows -> Result

  public func assumingMemoryBound<T>(to type: T.Type) -> UnsafeMutableBufferPointer<T>
}

Detailed design

Note: please see the draft PR or the full proposal for details.

Source compatibility

This proposal is source-compatible.
Some changes are compatible with existing correct uses of the API,
while others are additive.

Effect on ABI stability

This proposal consists of ABI-preserving changes and ABI-additive changes.

Effect on API resilience

The behaviour change of withMemoryRebound is compatible with previous uses,
since it only lifts restrictions.
Code that depends on the new semantics may not be compatible with old versions of these functions.
Back-deployment of new binaries will be supported by making the updated versions @_alwaysEmitIntoClient.
Compatibility of old binaries with a new standard library will be supported by ensuring that a compatible entry point remains.

Alternatives considered

One alternative is to implement none of this change, and leave withMemoryRebound as is.
The usability problems of withMemoryRebound would remain.

Another alternative is to leave the type layout restrictions as they are for the typed Pointer and BufferPointer types,
but add the withMemoryRebound functions to the RawPointer and RawBufferPointer variants.
In that case, the stride restriction would be no more than a speed bump,
because it would be straightforward to bypass it by transiting through the appropriate Raw variant.
