Hello,
Here is the third in a series of pitches related to improvements to the Pointer and BufferPointer family. This pitch is about initialization, and the management of initialization state in your manual allocations.
Initialization improvements for UnsafePointer and UnsafeBufferPointer family
- Proposal: SE-NNNN Initialization improvements for UnsafePointer and UnsafeBufferPointer family
- Author: Guillaume Lessard
- Review Manager: TBD
- Implementation: Draft Pull Request
- Bugs: pending
- Previous Revision: none
Introduction
The types in the UnsafeMutablePointer family typically require manual management of memory allocations, including the management of their initialization state. The states involved are, after allocation:
- Unbound and uninitialized (as returned from UnsafeMutableRawPointer.allocate())
- Bound to a type, and uninitialized (as returned from UnsafeMutablePointer<T>.allocate())
- Bound to a type, and initialized
Memory can be safely deallocated whenever it is uninitialized.
Unfortunately, not every relevant type in the family has the necessary functionality to fully manage the initialization state of its memory. We intend to address this issue in this proposal, and provide functionality to manage initialization state in a much expanded variety of situations.
Motivation
Memory allocated using UnsafeMutablePointer, UnsafeMutableRawPointer, UnsafeMutableBufferPointer and UnsafeMutableRawBufferPointer is passed to the user in an uninitialized state. In the general case, such memory needs to be initialized before it is used in Swift. Memory can be "initialized" or "uninitialized". We hereafter refer to this as a memory region's "initialization state".
The methods of UnsafeMutablePointer that interact with initialization state are:
func initialize(to value: Pointee)
func initialize(repeating repeatedValue: Pointee, count: Int)
func initialize(from source: UnsafePointer<Pointee>, count: Int)
func assign(repeating repeatedValue: Pointee, count: Int)
func assign(from source: UnsafePointer<Pointee>, count: Int)
func move() -> Pointee
func moveInitialize(from source: UnsafeMutablePointer<Pointee>, count: Int)
func moveAssign(from source: UnsafeMutablePointer<Pointee>, count: Int)
func deinitialize(count: Int) -> UnsafeMutableRawPointer
This is a fairly complete set.
- The initialize functions change the state of memory locations from uninitialized to initialized, then assign the corresponding value(s).
- The assign functions assign values into memory locations that are already initialized.
- deinitialize changes the state of a range of memory from initialized to uninitialized.
- The move() function deinitializes a memory location, then returns its current contents.
- The move prefix means that the source range of memory will be deinitialized after the function returns.
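The transitions described in this list can be sketched with the existing UnsafeMutablePointer methods. This is a minimal illustration only; the function name `lifecycleDemo` is made up for this example and is not part of the proposal.

```swift
/// Walks two small allocations through the states listed above.
/// `lifecycleDemo` is a hypothetical name used only for this sketch.
func lifecycleDemo() -> (moved: String, copied: [String]) {
    // Bound to String, uninitialized.
    let source = UnsafeMutablePointer<String>.allocate(capacity: 2)
    let destination = UnsafeMutablePointer<String>.allocate(capacity: 2)

    // uninitialized -> initialized
    source.initialize(to: "a")
    (source + 1).initialize(to: "b")

    // moveInitialize: `destination` becomes initialized and
    // `source` goes back to uninitialized, so it can be deallocated.
    destination.moveInitialize(from: source, count: 2)
    source.deallocate()

    let copied = [destination[0], destination[1]]

    // move(): returns the value and leaves the first slot uninitialized.
    let moved = destination.move()

    // Only the second slot is still initialized at this point.
    (destination + 1).deinitialize(count: 1)
    destination.deallocate()

    return (moved, copied)
}
```

Note that every slot is back in the uninitialized state before `deallocate()` is called, which is the rule stated in the introduction.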
In a complex use-case such as a custom-written data structure, a subrange of memory may transition between the initialized and uninitialized state multiple times during the life of a memory allocation. For example, if a mutable and contiguously allocated CustomArray is called with a sequence of alternating append and removeLast calls, one storage location will get repeatedly initialized and deinitialized. The implementor of CustomArray might want to represent the allocated buffer using UnsafeMutableBufferPointer, but that means they will have to use the UnsafeMutablePointer type instead for initialization and deinitialization.
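As a rough sketch of that situation, here is a deliberately tiny fixed-capacity stack. `TinyStack` and all of its members are hypothetical names invented for this illustration. It stores its buffer as an UnsafeMutableBufferPointer but has to go through `baseAddress` (an UnsafeMutablePointer) for every change of initialization state.

```swift
/// Hypothetical fixed-capacity stack, for illustration only.
/// Copies of this struct share storage; a real type would need
/// proper ownership handling.
struct TinyStack<Element> {
    private let storage: UnsafeMutableBufferPointer<Element>
    private(set) var count = 0

    init(capacity: Int) {
        precondition(capacity > 0)
        storage = UnsafeMutableBufferPointer<Element>.allocate(capacity: capacity)
    }

    mutating func append(_ value: Element) {
        precondition(count < storage.count, "stack is full")
        // Drops down to UnsafeMutablePointer: no bounds checking from here on.
        (storage.baseAddress! + count).initialize(to: value)
        count += 1
    }

    mutating func removeLast() -> Element {
        precondition(count > 0, "stack is empty")
        count -= 1
        // The slot returns to the uninitialized state.
        return (storage.baseAddress! + count).move()
    }

    /// Must be called exactly once before the stack is discarded.
    mutating func destroy() {
        storage.baseAddress!.deinitialize(count: count)
        storage.deallocate()
        count = 0
    }
}
```

Alternating `append` and `removeLast` calls repeatedly initialize and deinitialize the same slot, and the bounds checks have to be written by hand.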
We would like to have a full complement of corresponding functions to operate on UnsafeMutableBufferPointer, but we only have the following:
func initialize(repeating repeatedValue: Element)
func initialize<S: Sequence>(from source: S) -> (S.Iterator, Index)
func assign(repeating repeatedValue: Element)
Missing are methods to assign from a Sequence or a Collection, move elements from another UnsafeMutableBufferPointer, modify the initialization state of the memory at a particular index of the buffer, or to deinitialize (at all). Such functions would add some safety to these operations, as they would add some bounds checking, unlike the equivalent operations on UnsafeMutablePointer, which have no concept of bounds checking.
Similarly, the functions that change the initialization state for UnsafeMutableRawPointer are:
func initializeMemory<T>(as type: T.Type, repeating repeatedValue: T, count: Int) -> UnsafeMutablePointer<T>
func initializeMemory<T>(as type: T.Type, from source: UnsafePointer<T>, count: Int) -> UnsafeMutablePointer<T>
func moveInitializeMemory<T>(as type: T.Type, from source: UnsafeMutablePointer<T>, count: Int) -> UnsafeMutablePointer<T>
Since initialized memory is bound to a type, these cover the essential operations.
(The assign and deinitialize operations only make sense on typed UnsafeMutablePointer<T>.)
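A small sketch of the raw-pointer path, using only existing API: raw memory starts unbound and uninitialized, `initializeMemory` binds and initializes it in one step, and deinitialization happens on the typed pointer that comes back. The function name `rawInitializeDemo` is made up for this example.

```swift
/// Hypothetical demo function; only the pointer calls are real API.
func rawInitializeDemo() -> [Int] {
    let count = 3
    // Unbound and uninitialized.
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: count * MemoryLayout<Int>.stride,
        alignment: MemoryLayout<Int>.alignment)

    // Binds the memory to Int and initializes it.
    let typed = raw.initializeMemory(as: Int.self, repeating: 7, count: count)

    // Assignment is valid here because the memory is initialized.
    typed[1] = 8

    let values = Array(UnsafeBufferPointer(start: typed, count: count))

    // deinitialize returns a raw pointer to the now-uninitialized memory.
    let rawAgain = typed.deinitialize(count: count)
    rawAgain.deallocate()
    return values
}
```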
On UnsafeMutableRawBufferPointer, we only have:
func initializeMemory<T>(as type: T.Type, repeating repeatedValue: T) -> UnsafeMutableBufferPointer<T>
func initializeMemory<S: Sequence>(as type: S.Element.Type, from source: S) -> (unwritten: S.Iterator, initialized: UnsafeMutableBufferPointer<S.Element>)
Missing is an equivalent to moveInitializeMemory, in particular.
Additionally, the buffer initialization functions from Sequence parameters are overly strict, and trap in many situations where the buffer length and the number of elements in a Collection do not match exactly. We can improve on this situation with initialization functions from Collections that behave more nicely.
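For reference, here is a sketch of how the existing Sequence-based `initialize(from:)` is used in a case it supports: a source that is shorter than the buffer. The function name `sequenceInitializeDemo` is made up for this example. The opposite case, a source longer than the buffer, is deliberately not shown, since the documented requirement is that the buffer can accommodate `source.underestimatedCount` elements.

```swift
/// Hypothetical demo function; `initialize(from:)` is the existing API.
func sequenceInitializeDemo() -> (next: Int, values: [Int], leftover: Int?) {
    let buffer = UnsafeMutableBufferPointer<Int>.allocate(capacity: 5)
    defer { buffer.deallocate() }

    // Three elements go into a five-element buffer.
    let result = buffer.initialize(from: [10, 20, 30])
    var unwritten = result.0   // iterator over elements that were not copied
    let next = result.1        // index just past the last initialized element

    let leftover = unwritten.next()

    // Only the range up to `next` is initialized, so only that range
    // may be read and must later be deinitialized.
    let values = Array(buffer[..<next])
    buffer.baseAddress!.deinitialize(count: next)

    return (next, values, leftover)
}
```

The caller has to track the returned index by hand, and still has to reach for `baseAddress` to deinitialize.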
Proposed solution
We propose to modify UnsafeMutableBufferPointer as follows:
extension UnsafeMutableBufferPointer {
func initialize(repeating repeatedValue: Element)
func initialize<S>(from source: S) -> (S.Iterator, Index) where S: Sequence, S.Element == Element
+++ func initialize<C>(fromElements: C) -> Index where C: Collection, C.Element == Element
func assign(repeating repeatedValue: Element)
+++ func assign<S>(from source: S) -> (unwritten: S.Iterator, assigned: Index) where S: Sequence, S.Element == Element
+++ func assign<C>(fromElements: C) -> Index where C: Collection, C.Element == Element
+++ func moveInitialize(fromElements: `Self`) -> Index
+++ func moveInitialize(fromElements: Slice<`Self`>) -> Index
+++ func moveAssign(fromElements: `Self`) -> Index
+++ func moveAssign(fromElements: Slice<`Self`>) -> Index
+++ func deinitialize() -> UnsafeMutableRawBufferPointer
+++ func initializeElement(at index: Index, to value: Element)
+++ func assignElement(at index: Index, _ value: Element)
+++ func moveElement(at index: Index) -> Element
+++ func deinitializeElement(at index: Index)
}
The methods that initialize or assign from a Collection will have forgiving semantics: they copy as many elements as they can, and return the next index in the buffer. Unlike the existing Sequence functions, they include no preconditions, with the understanding that if a user wishes stricter behaviour, they can compose it from these functions.
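To make the forgiving semantics concrete, here is one possible sketch built on existing API. It is not the proposed implementation: `sketchInitialize(fromElements:)` is a hypothetical stand-in name, and the element-by-element loop is an assumption chosen for clarity rather than speed.

```swift
extension UnsafeMutableBufferPointer {
    /// Hypothetical stand-in for the proposed `initialize(fromElements:)`.
    /// Copies as many elements as fit and returns the next index.
    func sketchInitialize<C: Collection>(fromElements source: C) -> Index
    where C.Element == Element {
        guard let base = baseAddress else { return startIndex }
        var next = startIndex
        for element in source {
            if next == endIndex { break }   // buffer is full: stop, don't trap
            (base + next).initialize(to: element)
            next += 1
        }
        return next
    }
}

/// Hypothetical demo function.
func forgivingDemo() -> (short: Int, long: Int) {
    let buffer = UnsafeMutableBufferPointer<String>.allocate(capacity: 3)
    defer { buffer.deallocate() }

    // Fewer elements than the buffer holds.
    let short = buffer.sketchInitialize(fromElements: ["a", "b"])
    buffer.baseAddress!.deinitialize(count: short)

    // More elements than the buffer holds.
    let long = buffer.sketchInitialize(fromElements: ["a", "b", "c", "d"])
    buffer.baseAddress!.deinitialize(count: long)

    return (short, long)
}
```

A caller who wants the stricter behaviour could follow the call with something like `precondition(next == buffer.endIndex)`.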
The above additions include a method to assign a single element. This is a synonym for the subscript(_ i: Index) setter. We hope that documenting the assignment action specifically will help clarify the requirements of that action, which tend to get muddled when documented along with the subscript getter.
Similarly, we propose adding to UnsafeMutablePointer and UnsafeMutableRawPointer:
extension UnsafeMutablePointer {
func initialize(to value: Pointee)
func initialize(repeating repeatedValue: Pointee, count: Int)
func initialize(from source: UnsafePointer<Pointee>, count: Int)
+++ func assign(_ value: Pointee)
func assign(repeating repeatedValue: Pointee, count: Int)
func assign(from source: UnsafePointer<Pointee>, count: Int)
func move() -> Pointee
func moveInitialize(from source: UnsafeMutablePointer, count: Int)
func moveAssign(from source: UnsafeMutablePointer, count: Int)
func deinitialize(count: Int) -> UnsafeMutableRawPointer
}
extension UnsafeMutableRawPointer {
+++ func initializeMemory<T>(as type: T.Type, to value: T) -> UnsafeMutablePointer<T>
func initializeMemory<T>(as type: T.Type, repeating repeatedValue: T, count: Int) -> UnsafeMutablePointer<T>
func initializeMemory<T>(as type: T.Type, from source: UnsafePointer<T>, count: Int) -> UnsafeMutablePointer<T>
func moveInitializeMemory<T>(as type: T.Type, from source: UnsafeMutablePointer<T>, count: Int) -> UnsafeMutablePointer<T>
}
Finally, we propose adding additional functions to initialize UnsafeMutableRawBufferPointers. The first will initialize from a Collection and have less stringent semantics than the existing function that initializes from a Sequence. The other two enable moving a range of memory into an UnsafeMutableRawBufferPointer while deinitializing a typed UnsafeMutableBufferPointer.
extension UnsafeMutableRawBufferPointer {
func initializeMemory<T>(as type: T.Type, repeating repeatedValue: T) -> UnsafeMutableBufferPointer<T>
func initializeMemory<S>(as type: S.Element.Type, from source: S) -> (unwritten: S.Iterator, initialized: UnsafeMutableBufferPointer<S.Element>) where S: Sequence
+++ func initializeMemory<C>(as type: C.Element.Type, fromElements: C) -> UnsafeMutableBufferPointer<C.Element> where C: Collection
+++ func moveInitializeMemory<T>(as type: T.Type, fromElements: UnsafeMutableBufferPointer<T>) -> UnsafeMutableBufferPointer<T>
+++ func moveInitializeMemory<T>(as type: T.Type, fromElements: Slice<UnsafeMutableBufferPointer<T>>) -> UnsafeMutableBufferPointer<T>
}
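As an illustration of the move operation, here is a sketch that wraps the existing `UnsafeMutableRawPointer.moveInitializeMemory(as:from:count:)`. The name `sketchMoveInitializeMemory` is a hypothetical stand-in, and the precondition on the destination size is an assumption of this sketch, not something the pitch specifies.

```swift
extension UnsafeMutableRawBufferPointer {
    /// Hypothetical stand-in for the proposed `moveInitializeMemory(as:fromElements:)`.
    func sketchMoveInitializeMemory<T>(
        as type: T.Type,
        fromElements source: UnsafeMutableBufferPointer<T>
    ) -> UnsafeMutableBufferPointer<T> {
        guard let destination = baseAddress, let sourceBase = source.baseAddress else {
            return UnsafeMutableBufferPointer(start: nil, count: 0)
        }
        // Assumption of this sketch: refuse to move into a buffer that is too small.
        precondition(source.count * MemoryLayout<T>.stride <= count,
                     "destination too small")
        let typed = destination.moveInitializeMemory(
            as: T.self, from: sourceBase, count: source.count)
        return UnsafeMutableBufferPointer(start: typed, count: source.count)
    }
}

/// Hypothetical demo function.
func moveIntoRawDemo() -> [String] {
    let typed = UnsafeMutableBufferPointer<String>.allocate(capacity: 2)
    _ = typed.initialize(from: ["x", "y"])

    let raw = UnsafeMutableRawBufferPointer.allocate(
        byteCount: 2 * MemoryLayout<String>.stride,
        alignment: MemoryLayout<String>.alignment)

    let moved = raw.sketchMoveInitializeMemory(as: String.self, fromElements: typed)
    // `typed` is uninitialized after the move, so it can be deallocated.
    typed.deallocate()

    let values = Array(moved)
    moved.baseAddress!.deinitialize(count: moved.count)
    raw.deallocate()
    return values
}
```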
Detailed design
Note: please see the draft pull request or the full proposal for details.
Source compatibility
This proposal consists solely of additions.
Effect on ABI stability
The functions proposed here are generally small wrappers around existing functionality. We expect to implement them as @_alwaysEmitIntoClient functions, which means they would have no ABI impact.
Effect on API resilience
All functionality implemented as @_alwaysEmitIntoClient will back-deploy.
Alternatives considered
UnsafeMutableBufferPointer.moveElement(at index: Index) -> Element could use from as an argument label. from does read slightly better, but implies the existence of a to that does not exist. On the other hand, the three other related functions harmonize better with the at argument label, and having consistency with those three is good.
The single-element assignment functions, UnsafeMutablePointer.assign(_ value:) and UnsafeMutableBufferPointer.assignElement(_ value:at:), are synonyms for the setters of UnsafeMutablePointer.pointee and UnsafeMutableBufferPointer.subscript(_ i: Index), respectively. Clearly we can elect to not add them.
The setters in question, like the assignment functions, have a required precondition that the memory they refer to must be initialized. Somehow this precondition is often overlooked and that leads to bug reports. The proposed names and cross-references will hopefully help clarify the requirements to users.
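The precondition can be illustrated with a type that counts its deinitializations. This is only a sketch; `DeinitCounter`, `Tracked` and `assignmentDemo` are hypothetical names introduced for this example.

```swift
/// Hypothetical helper that records how many `Tracked` instances were destroyed.
final class DeinitCounter { var value = 0 }

/// Hypothetical class whose deinit is observable.
final class Tracked {
    private let counter: DeinitCounter
    init(_ counter: DeinitCounter) { self.counter = counter }
    deinit { counter.value += 1 }
}

func assignmentDemo() -> (afterAssign: Int, afterDeinitialize: Int) {
    let counter = DeinitCounter()
    let p = UnsafeMutablePointer<Tracked>.allocate(capacity: 1)
    defer { p.deallocate() }

    // Writing `p.pointee = Tracked(counter)` at this point would be
    // undefined behaviour: the setter would first try to destroy whatever
    // bytes happen to sit in the uninitialized memory.
    p.initialize(to: Tracked(counter))

    // Now the memory is initialized, so assignment is valid.
    // It destroys the previous value before storing the new one.
    p.pointee = Tracked(counter)
    let afterAssign = counter.value

    p.deinitialize(count: 1)
    return (afterAssign, counter.value)
}
```

The first instance is destroyed by the assignment and the second by `deinitialize`, which is why the setter is only safe on memory that already holds a value.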
The initialization and assignment functions that copy from Collection inputs use the argument label fromElements. This is different from the pre-existing functions that copy from Sequence inputs. We could use the same argument label (from) as with the Sequence inputs, but that would result in pairs of functions that are overloaded by their return type. Functions with return-type overloads are often unwieldy, and we chose to avoid them in this situation.
Acknowledgments
Kelvin Ma (aka Taylor Swift)'s initial versions of the pitch that became SE-0184 included more functions to manipulate initialization state. These were deferred, but the functionality has not been pitched again until now.