[pitch] Pointer Family Initialization Improvements & Better Buffer Slices

Rounding out (for the moment) a series of proposed improvements to the UnsafePointer and UnsafeBufferPointer families, this pitch merges two previous pitches that were a little too similar. The result is rather large; apologies for its length.

Besides merging those two pitches, the changes here consist mainly of improvements to the motivation and discussion sections.

Pointer Family Initialization Improvements & Better Buffer Slices

Introduction

The types in the UnsafeMutablePointer family typically require manual management of memory allocations, including the management of their initialization state. Unfortunately, not every relevant type in the family has the necessary functionality to fully manage the initialization state of the memory it represents. The states involved are, after allocation:

  1. Unbound and uninitialized (as returned from UnsafeMutableRawPointer.allocate())
  2. Bound to a type, and uninitialized (as returned from UnsafeMutablePointer<T>.allocate())
  3. Bound to a type, and initialized

Memory can be safely deallocated whenever it is uninitialized.
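As an illustration, the sketch below walks one allocation through all three states using only long-standing standard library API. The function name `demonstrateInitializationStates` is hypothetical and chosen for this example only.

```swift
// Hypothetical helper: walks one allocation through the three states above.
func demonstrateInitializationStates() -> String {
    let count = 3
    // State 1: unbound and uninitialized raw memory.
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: count * MemoryLayout<String>.stride,
        alignment: MemoryLayout<String>.alignment)
    // State 2: bound to String, still uninitialized.
    let typed = raw.bindMemory(to: String.self, capacity: count)
    // State 3: bound and initialized.
    typed.initialize(repeating: "ab", count: count)
    let joined = (0..<count).map { typed[$0] }.joined()
    // Return to the uninitialized state before deallocating; skipping this
    // step could leak heap storage owned by the Strings.
    typed.deinitialize(count: count)
    raw.deallocate()
    return joined
}
```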

We intend to round out initialization functionality for every relevant member of that family: UnsafeMutablePointer, UnsafeMutableRawPointer, UnsafeMutableBufferPointer, UnsafeMutableRawBufferPointer, Slice<UnsafeMutableBufferPointer> and Slice<UnsafeMutableRawBufferPointer>. The functionality will allow managing initialization state in a much greater variety of situations, including easier handling of partially-initialized buffers.

Motivation

Memory allocated using UnsafeMutablePointer, UnsafeMutableRawPointer, UnsafeMutableBufferPointer and UnsafeMutableRawBufferPointer is passed to the user in an uninitialized state. In the general case, such memory must be initialized before it is used in Swift. Memory can be either "initialized" or "uninitialized"; we hereafter refer to this as a memory region's "initialization state".

The methods of UnsafeMutablePointer that interact with initialization state are:

  • func initialize(to value: Pointee)
  • func initialize(repeating repeatedValue: Pointee, count: Int)
  • func initialize(from source: UnsafePointer<Pointee>, count: Int)
  • func assign(repeating repeatedValue: Pointee, count: Int)
  • func assign(from source: UnsafePointer<Pointee>, count: Int)
  • func move() -> Pointee
  • func moveInitialize(from source: UnsafeMutablePointer<Pointee>, count: Int)
  • func moveAssign(from source: UnsafeMutablePointer<Pointee>, count: Int)
  • func deinitialize(count: Int) -> UnsafeMutableRawPointer

This is a fairly complete set.

  • The initialize functions change the state of memory locations from uninitialized to initialized,
    then assign the corresponding value(s).
  • The assign functions update the values stored at memory locations that have previously been initialized.
  • deinitialize changes the state of a range of memory from initialized to uninitialized.
  • The move() function returns the current contents of a memory location and leaves that location uninitialized.
  • The move prefix means that the source range of memory will be deinitialized after the function returns.
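The operations above can be exercised on a small allocation of Strings, as in this sketch. `pointerOperationsDemo` is a hypothetical name, and the update step uses the pointee setter, which carries the same "must already be initialized" precondition as the assign functions.

```swift
// Hypothetical demo of the UnsafeMutablePointer operations listed above.
func pointerOperationsDemo() -> [String] {
    let a = UnsafeMutablePointer<String>.allocate(capacity: 2)
    let b = UnsafeMutablePointer<String>.allocate(capacity: 2)
    // Runs after the return value is computed, when both regions are uninitialized.
    defer {
        a.deallocate()
        b.deallocate()
    }

    // initialize: uninitialized -> initialized.
    a.initialize(to: "first")
    (a + 1).initialize(to: "second")

    // Update an already-initialized location through the pointee setter.
    a.pointee = "FIRST"

    // moveInitialize: `b` becomes initialized, `a` is left uninitialized.
    b.moveInitialize(from: a, count: 2)

    // move(): returns the value and leaves that location uninitialized.
    let moved = (b + 1).move()

    // Copy out, then deinitialize the one element that is still initialized.
    let remaining = b.pointee
    b.deinitialize(count: 1)
    return [remaining, moved]
}
```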

Unfortunately, UnsafeMutablePointer is the only one of the types listed in the introduction that allows full control of initialization state. As a result, complex use cases such as the partial initialization of a buffer become overly complicated.

An example of partial initialization is the insertion of elements in the middle of a collection. This is one of the possible operations needed in an implementation of RangeReplaceableCollection.replaceSubrange(_:with:). Given a RangeReplaceableCollection whose unique storage can be represented by a partially-initialized UnsafeMutableBufferPointer:

mutating func replaceSubrange<C>(_ subrange: Range<Index>, with newElements: C)
  where C: Collection, Element == C.Element {

  // obtain unique storage as UnsafeMutableBufferPointer
  let buffer: UnsafeMutableBufferPointer<Element> = self.myUniqueStorage()
  let oldCount = self.count
  let growth = newElements.count - subrange.count
  let newCount = oldCount + growth
  if growth > 0 {
    assert(newCount <= buffer.count)
    let oldTail = subrange.upperBound..<oldCount
    let newTail = subrange.upperBound+growth..<newCount
    let oldTailBase = buffer.baseAddress!.advanced(by: oldTail.lowerBound)
    let newTailBase = buffer.baseAddress!.advanced(by: newTail.lowerBound)
    newTailBase.moveInitialize(from: oldTailBase,
                               count: oldCount - subrange.upperBound)

    // Update still-initialized values in the original subrange
    var j = newElements.startIndex
    for i in subrange {
      buffer[i] = newElements[j]
      newElements.formIndex(after: &j)
    }
    // Initialize the remaining range
    for i in subrange.upperBound..<newTail.lowerBound {
      buffer.baseAddress!.advanced(by: i).initialize(to: newElements[j])
      newElements.formIndex(after: &j)
    }
    assert(newElements.distance(from: newElements.startIndex, to: j) == newElements.count)
  }
  ...
}
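For readers who want to run the current-API approach, here is a self-contained sketch of the growth branch above, operating on a bare buffer rather than a collection's storage. The function `replaceGrowing` and its parameters are hypothetical, and only the `growth > 0` case is handled.

```swift
/// Hypothetical helper: replaces `subrange` within the first `oldCount`
/// initialized elements of `buffer` with `newElements`, for the case where
/// the contents grow. Returns the new number of initialized elements.
func replaceGrowing<C: Collection>(
    _ buffer: UnsafeMutableBufferPointer<C.Element>,
    initializedCount oldCount: Int,
    subrange: Range<Int>,
    with newElements: C
) -> Int {
    let growth = newElements.count - subrange.count
    precondition(growth > 0, "this sketch only handles growth")
    let newCount = oldCount + growth
    precondition(newCount <= buffer.count, "not enough capacity")
    let base = buffer.baseAddress!

    // Move the tail out of the way; moveInitialize(from:count:) permits
    // overlapping regions. The vacated slots become uninitialized.
    (base + subrange.upperBound + growth).moveInitialize(
        from: base + subrange.upperBound,
        count: oldCount - subrange.upperBound)

    var j = newElements.startIndex
    // Update the still-initialized slots of the original subrange.
    for i in subrange {
        buffer[i] = newElements[j]
        newElements.formIndex(after: &j)
    }
    // Initialize the slots vacated by the moved tail.
    for i in subrange.upperBound..<(subrange.upperBound + growth) {
        (base + i).initialize(to: newElements[j])
        newElements.formIndex(after: &j)
    }
    return newCount
}
```

For example, replacing `1..<2` in `["a", "b", "c", "d"]` with `["x", "y", "z"]` leaves `["a", "x", "y", "z", "c", "d"]` in the first six slots.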

Here, we had to convert to UnsafeMutablePointer to use some of its API, as well as resort to element-by-element copying and initialization. With API enabling buffer operations on the slices of buffers, we could simplify things greatly:

mutating func replaceSubrange<C>(_ subrange: Range<Index>, with newElements: C)
  where C: Collection, Element == C.Element {

  // obtain unique storage as UnsafeMutableBufferPointer
  let buffer: UnsafeMutableBufferPointer<Element> = self.myUniqueStorage()
  let oldCount = self.count
  let growth = newElements.count - subrange.count
  let newCount = oldCount + growth
  if growth > 0 {
    assert(newCount <= buffer.count)
    let oldTail = subrange.upperBound..<oldCount
    let newTail = subrange.upperBound+growth..<newCount
    var m = buffer[newTail].moveInitialize(fromElements: buffer[oldTail])
    assert(m == newTail.upperBound)

    // Update still-initialized values in the original subrange
    m = buffer[subrange].update(fromElements: newElements)
    // Initialize the remaining range
    m = buffer[m..<newTail.lowerBound].initialize(
      fromElements: newElements.dropFirst(m - subrange.lowerBound)
    )
    assert(m == newTail.lowerBound)
  }
  ...
}

In addition to simplifying the implementation, the new methods have the same bounds-checking behaviour as UnsafeMutableBufferPointer, which relieves the implementation of the need to do its own bounds checking.

This proposal aims to add API to control initialization state and improve multiple-element copies for UnsafeMutableBufferPointer, UnsafeMutableRawBufferPointer, Slice<UnsafeMutableBufferPointer> and Slice<UnsafeMutableRawBufferPointer>.

Proposed solution

Note: the pseudo-diffs presented in this section denote added functions with +++ and renamed functions with ---. Unmarked functions are unchanged.

UnsafeMutableBufferPointer

We propose to modify UnsafeMutableBufferPointer as follows:

extension UnsafeMutableBufferPointer {
    func initialize(repeating repeatedValue: Element)
    func initialize<S>(from source: S) -> (S.Iterator, Index) where S: Sequence, S.Element == Element
+++ func initialize<C>(fromElements: C) -> Index where C: Collection, C.Element == Element
--- func assign(repeating repeatedValue: Element)
+++ func update(repeating repeatedValue: Element)
+++ func update<S>(from source: S) -> (unwritten: S.Iterator, updated: Index) where S: Sequence, S.Element == Element
+++ func update<C>(fromElements: C) -> Index where C: Collection, C.Element == Element
+++ func moveInitialize(fromElements: UnsafeMutableBufferPointer) -> Index
+++ func moveInitialize(fromElements: Slice<UnsafeMutableBufferPointer>) -> Index
+++ func moveUpdate(fromElements: UnsafeMutableBufferPointer) -> Index
+++ func moveUpdate(fromElements: Slice<UnsafeMutableBufferPointer>) -> Index
+++ func deinitialize() -> UnsafeMutableRawBufferPointer

+++ func initializeElement(at index: Index, to value: Element)
+++ func updateElement(at index: Index, to value: Element)
+++ func moveElement(from index: Index) -> Element
+++ func deinitializeElement(at index: Index)
}

We would like to use the verb update instead of assign, in order to better communicate the intent of the API. It is currently a common programmer error to use one of the existing assign functions for uninitialized memory; using the verb update instead would express the precondition in the API itself.

The methods that initialize or update from a Collection will have forgiving semantics: they copy as many elements as they can, whether that is every available element or none, and then return the index in the buffer that follows the last element copied. Returning an index is cheaper than returning an iterator and a count. Unlike the existing Sequence functions, they have no preconditions beyond a valid Collection and a valid buffer; if a user needs stricter behaviour, it can be composed from these functions.
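To make these forgiving semantics concrete, here is a minimal sketch written as extensions on today's UnsafeMutableBufferPointer. The method names follow the pitch, but the bodies are an illustrative assumption rather than the proposed implementation: each copies min(buffer count, collection count) elements and returns the index after the last one written.

```swift
// Hypothetical stand-ins for the pitched Collection-based methods.
extension UnsafeMutableBufferPointer {
    /// Precondition (caller's responsibility): the buffer is uninitialized.
    func initialize<C: Collection>(fromElements source: C) -> Index
        where C.Element == Element {
        var i = startIndex
        for element in source {
            // Stop quietly when the buffer is full.
            guard i < endIndex else { break }
            (baseAddress! + i).initialize(to: element)
            i += 1
        }
        return i
    }

    /// Precondition (caller's responsibility): the buffer is initialized.
    func update<C: Collection>(fromElements source: C) -> Index
        where C.Element == Element {
        var i = startIndex
        for element in source {
            guard i < endIndex else { break }
            // The subscript setter requires an initialized element.
            self[i] = element
            i += 1
        }
        return i
    }
}
```

With a three-element buffer, initializing from `[1, 2, 3, 4, 5]` returns 3, and a subsequent update from `[9]` returns 1 and leaves `[9, 2, 3]`.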

The above changes include a method to update a single element. That method is a synonym for the subscript(_ i: Index) setter. We hope that documenting the update action specifically will help clarify its requirement, namely that the buffer element must already be initialized. Experience shows that users frequently miss the initialization requirement of the subscript setter, which is currently documented only together with the subscript getter.

UnsafeMutablePointer

The proposed modifications to UnsafeMutablePointer are renamings:

extension UnsafeMutablePointer {
    func initialize(to value: Pointee)
    func initialize(repeating repeatedValue: Pointee, count: Int)
    func initialize(from source: UnsafePointer<Pointee>, count: Int)
+++ func update(to value: Pointee)
--- func assign(repeating repeatedValue: Pointee, count: Int)
+++ func update(repeating repeatedValue: Pointee, count: Int)
--- func assign(from source: UnsafePointer<Pointee>, count: Int)
+++ func update(from source: UnsafePointer<Pointee>, count: Int)
    func move() -> Pointee
    func moveInitialize(from source: UnsafeMutablePointer, count: Int)
--- func moveAssign(from source: UnsafeMutablePointer, count: Int)
+++ func moveUpdate(from source: UnsafeMutablePointer, count: Int)
    func deinitialize(count: Int) -> UnsafeMutableRawPointer
}

The motivation for these renamings is explained above.

UnsafeMutableRawBufferPointer

We propose to add new functions to initialize memory referenced by UnsafeMutableRawBufferPointer instances.

extension UnsafeMutableRawBufferPointer {
    func initializeMemory<T>(
      as type: T.Type, repeating repeatedValue: T
    ) -> UnsafeMutableBufferPointer<T>

    func initializeMemory<S>(
      as type: S.Element.Type, from source: S
    ) -> (unwritten: S.Iterator, initialized: UnsafeMutableBufferPointer<S.Element>) where S: Sequence

+++ func initializeMemory<C>(
      as type: C.Element.Type, fromElements: C
    ) -> UnsafeMutableBufferPointer<C.Element> where C: Collection

+++ func moveInitializeMemory<T>(
      as type: T.Type, fromElements: UnsafeMutableBufferPointer<T>
    ) -> UnsafeMutableBufferPointer<T>

+++ func moveInitializeMemory<T>(
      as type: T.Type, fromElements: Slice<UnsafeMutableBufferPointer<T>>
    ) -> UnsafeMutableBufferPointer<T>
}

The first addition initializes raw memory from a Collection and behaves similarly to UnsafeMutableBufferPointer.initialize(fromElements:), described above. The other two initialize raw memory by moving data from another range of memory, leaving that other range of memory deinitialized.
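A minimal sketch of the Collection-based raw initializer, written as an extension on today's UnsafeMutableRawBufferPointer. The name follows the pitch; the body, including the alignment precondition and the choice to bind only as many elements as fit, is an assumption for illustration.

```swift
// Hypothetical stand-in for the pitched initializeMemory(as:fromElements:).
extension UnsafeMutableRawBufferPointer {
    /// Binds the memory to `C.Element`, initializes as many elements as fit,
    /// and returns a typed view whose count is the number written.
    /// Any bytes past that view are left uninitialized.
    func initializeMemory<C: Collection>(
        as type: C.Element.Type, fromElements source: C
    ) -> UnsafeMutableBufferPointer<C.Element> {
        guard let base = baseAddress else {
            return UnsafeMutableBufferPointer(start: nil, count: 0)
        }
        precondition(
            Int(bitPattern: base) % MemoryLayout<C.Element>.alignment == 0,
            "raw buffer is misaligned for the element type")
        // Stride is always at least 1, so this division is safe.
        let capacity = count / MemoryLayout<C.Element>.stride
        let typed = base.bindMemory(to: C.Element.self, capacity: capacity)
        var written = 0
        for element in source {
            guard written < capacity else { break }
            (typed + written).initialize(to: element)
            written += 1
        }
        return UnsafeMutableBufferPointer(start: typed, count: written)
    }
}
```

With room for four Ints, a two-element source yields a typed buffer of count 2, while `1...10` yields count 4.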

UnsafeMutableRawPointer

extension UnsafeMutableRawPointer {
+++ func initializeMemory<T>(as type: T.Type, to value: T) -> UnsafeMutablePointer<T>

    func initializeMemory<T>(
      as type: T.Type, repeating repeatedValue: T, count: Int
    ) -> UnsafeMutablePointer<T>

    func initializeMemory<T>(
      as type: T.Type, from source: UnsafePointer<T>, count: Int
    ) -> UnsafeMutablePointer<T>

    func moveInitializeMemory<T>(
      as type: T.Type, from source: UnsafeMutablePointer<T>, count: Int
    ) -> UnsafeMutablePointer<T>
}

The only addition here, initializeMemory(as:to:), initializes a single value.
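For comparison, without this addition a single value is typically initialized in raw memory through the existing `repeating:count:` overload with a count of 1. A small sketch; `storeSingle` is a hypothetical helper name.

```swift
/// Hypothetical helper: initializes one `T` in uninitialized raw memory
/// using the existing initializeMemory(as:repeating:count:) with count 1.
/// Under the pitch, the body would read `raw.initializeMemory(as: T.self, to: value)`.
func storeSingle<T>(_ value: T, into raw: UnsafeMutableRawPointer) -> UnsafeMutablePointer<T> {
    return raw.initializeMemory(as: T.self, repeating: value, count: 1)
}
```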

Slices of BufferPointer

We propose to add to slices of Unsafe[Mutable][Raw]BufferPointer all the BufferPointer-specific methods of their Base. The following declarations detail the additions, which are all intended to behave exactly as the functions on the base BufferPointer types:
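One plausible way to give a slice the behaviour of its base, sketched here on the assumption that rebasing is acceptable, is to view the slice as a standalone UnsafeMutableBufferPointer with `init(rebasing:)`, do the work there, and translate the resulting position back into the base buffer's index space. The extension below is hypothetical and covers only `initialize(fromElements:)`.

```swift
// Hypothetical sketch of one slice method, forwarding through init(rebasing:).
extension Slice {
    /// Initializes the slice's (uninitialized) memory from `source`, copying
    /// as many elements as fit, and returns the index in the base buffer
    /// that follows the last element written.
    func initialize<C: Collection>(fromElements source: C) -> Int
        where Base == UnsafeMutableBufferPointer<C.Element> {
        // View the slice's memory as a buffer whose indices start at 0.
        let rebased = UnsafeMutableBufferPointer(rebasing: self)
        var written = 0
        for element in source {
            guard written < rebased.count else { break }
            (rebased.baseAddress! + written).initialize(to: element)
            written += 1
        }
        // A slice shares its base's indices, so offset from startIndex.
        return startIndex + written
    }
}
```

Initializing `buffer[2..<5]` from a four-element source writes three elements and returns 5, an index in the base buffer rather than in the rebased view.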

extension Slice<UnsafeBufferPointer<T>> {
  public func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (UnsafeBufferPointer<T>) throws -> Result
  ) rethrows -> Result
}
extension Slice<UnsafeMutableBufferPointer<T>> {
  func initialize(repeating repeatedValue: Element)

  func initialize<S: Sequence>(from source: S) -> (S.Iterator, Index)
    where S.Element == Element

  func initialize<C: Collection>(fromElements: C) -> Index
    where C.Element == Element

  func update(repeating repeatedValue: Element)

  func update<S: Sequence>(
    from source: S
  ) -> (unwritten: S.Iterator, updated: Index) where S.Element == Element

  func update<C: Collection>(
    fromElements: C
  ) -> Index where C.Element == Element

  func moveInitialize(fromElements source: UnsafeMutableBufferPointer<Element>) -> Index
  func moveInitialize(fromElements source: Slice<UnsafeMutableBufferPointer<Element>>) -> Index
  func moveUpdate(fromElements source: UnsafeMutableBufferPointer<Element>) -> Index
  func moveUpdate(fromElements source: Slice<UnsafeMutableBufferPointer<Element>>) -> Index

  func deinitialize() -> UnsafeMutableRawBufferPointer

  func initializeElement(at index: Index, to value: Element)
  func updateElement(at index: Index, to value: Element)
  func moveElement(from index: Index) -> Element
  func deinitializeElement(at index: Index)

  func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (UnsafeMutableBufferPointer<T>) throws -> Result
  ) rethrows -> Result
}
extension Slice<UnsafeRawBufferPointer> {
  func bindMemory<T>(to type: T.Type) -> UnsafeBufferPointer<T>
  func assumingMemoryBound<T>(to type: T.Type) -> UnsafeBufferPointer<T>

  func withMemoryRebound<T, Result>(
    to type: T.Type, _ body: (UnsafeBufferPointer<T>) throws -> Result
  ) rethrows -> Result
}
extension Slice<UnsafeMutableRawBufferPointer> {
  func copyMemory(from source: UnsafeRawBufferPointer)
  func copyBytes<C: Collection>(from source: C) where C.Element == UInt8

  func initializeMemory<T>(
    as type: T.Type, repeating repeatedValue: T
  ) -> UnsafeMutableBufferPointer<T>

  func initializeMemory<S: Sequence>(
    as type: S.Element.Type, from source: S
  ) -> (unwritten: S.Iterator, initialized: UnsafeMutableBufferPointer<S.Element>)

  func initializeMemory<C: Collection>(
    as type: C.Element.Type, fromElements: C
  ) -> UnsafeMutableBufferPointer<C.Element>

  func moveInitializeMemory<T>(
    as type: T.Type, fromElements: UnsafeMutableBufferPointer<T>
  ) -> UnsafeMutableBufferPointer<T>

  func moveInitializeMemory<T>(
    as type: T.Type, fromElements: Slice<UnsafeMutableBufferPointer<T>>
  ) -> UnsafeMutableBufferPointer<T>

  func bindMemory<T>(to type: T.Type) -> UnsafeMutableBufferPointer<T>
  func assumingMemoryBound<T>(to type: T.Type) -> UnsafeMutableBufferPointer<T>

  func withMemoryRebound<T, Result>(
    to type: T.Type,
    _ body: (UnsafeMutableBufferPointer<T>) throws -> Result
  ) rethrows -> Result
}

Detailed design

Note: please see the draft PR or the full proposal for details.

Source compatibility

This proposal consists mostly of additions, which are by definition source compatible.

The proposal includes the renaming of four existing functions from assign to update. The existing function names would be deprecated, producing a warning. A fixit will support an easy transition to the renamed versions of these functions.

Effect on ABI stability

The functions proposed here are generally small wrappers around existing functionality. We expect to implement them as @_alwaysEmitIntoClient functions, which means they would have no ABI impact.

The renamed functions can reuse the existing symbol, while the deprecated functions can forward using an @_alwaysEmitIntoClient stub to support the functionality under its previous name. This means they would have no ABI impact.

Effect on API resilience

All functionality implemented as @_alwaysEmitIntoClient will back-deploy. Renamed functions that reuse a previous symbol will also back-deploy.

Alternatives considered

Single element update functions

The single-element update functions, UnsafeMutablePointer.update(to:) and UnsafeMutableBufferPointer.updateElement(at:to:), are synonyms for the setters of UnsafeMutablePointer.pointee and UnsafeMutableBufferPointer.subscript(_ i: Index), respectively. We could therefore choose not to add them.

The setters in question, like the update functions, have a required precondition: the memory they refer to must be initialized. This precondition is often overlooked, which leads to bug reports. The proposed names and cross-references should help clarify the requirements to users.

Renaming assign to update

The renaming of assign to update could be omitted entirely, although we believe that update communicates intent much better than assign does. In The Swift Programming Language, the = symbol is named "the assignment operator", and its function is described as either initializing or updating a value. The current name (assign) is not as clear as the documentation in TSPL, while the proposed name (update) builds on it.

There are only four current symbols to be renamed by this proposal, and their call sites are easily migrated by a fixit. For context, this renaming would change only 6 lines of code in the standard library, outside of the function definitions. If the renaming is omitted, the four new functions proposed in the family should use the name assign as well. The two single-element versions would be assign(_ value:) and assignElement(at:_ value:).

Element-by-element copies from Collection inputs

The initialization and updating functions that copy from Collection inputs use the argument label fromElements. This differs from the pre-existing functions that copy from Sequence inputs. We could use the same argument label (from) as the Sequence versions, but then the Collection versions would have to return the Iterator, which is generally undesirable, especially if a particular Iterator cannot be copied cheaply. If we did not return the Iterator, the Sequence and Collection versions of initialize(from:) would be overloaded by return type alone, and that would be source-breaking: an existing use of the current function that doesn't destructure the returned tuple on assignment could now pick up the Collection overload, whose return value would be incompatible with existing code that assumes a return value of type (Iterator, Int).

Acknowledgments

Kelvin Ma (aka Taylor Swift)'s initial versions of the pitch that became SE-0184 included more functions to manipulate initialization state. These were deferred, and much of that deferred functionality had not been pitched again until now.

Members of the Swift Standard Library team for valuable discussions.


For UnsafeMutableRawBufferPointer.initializeMemory(as:fromElements:), how do I find out how many elements it copied? If the collection is longer than the space in the buffer, the answer is the result’s count, but what happens if it’s shorter? Does the rest of the original buffer get thrown away?

The returned UnsafeMutableBufferPointer is a typed view over the same range of memory. Its count tells you how many elements were copied. If the original raw buffer represented a larger amount of memory than that, the rest of the original raw buffer will remain uninitialized. As with anything in the UnsafePointer family, you'll need to keep track.
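The same bookkeeping applies to the existing Sequence-based initializeMemory(as:from:), which can be used to sketch the answer with today's API. In this hypothetical helper, the count of the returned typed buffer is the number of elements written, and the remaining bytes are what the caller must keep tracking as uninitialized; `values` is assumed to fit in `raw`.

```swift
/// Hypothetical helper: initializes `raw` from `values` and reports how many
/// elements were written and how many bytes remain uninitialized.
/// Assumes `raw` is suitably aligned for Int32 and large enough for `values`.
func initializedAndRemaining(
    _ raw: UnsafeMutableRawBufferPointer, values: [Int32]
) -> (initialized: Int, remainingBytes: Int) {
    let (_, typed) = raw.initializeMemory(as: Int32.self, from: values)
    // The typed view covers exactly the initialized prefix.
    let usedBytes = typed.count * MemoryLayout<Int32>.stride
    return (typed.count, raw.count - usedBytes)
}
```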


Note: the implementation and detailed design were tweaked to avoid creating a new protocol, which would have had unfortunate ABI consequences.

I wonder if we could provide polyfills for older Swift versions, with something like a new apple/swift-stdlib-compatibility package. This would greatly improve the adoption of new APIs, reduce boilerplate at call sites and, eventually, make the transition to Swift 6 a lot easier.