Here is an updated pitch, below. The full document is here.
The changes are clarifications, the assign renaming mentioned in the previous post, and the renaming of moveElement(at index:) to moveElement(from index:) (h/t Nevin).
Initialization improvements for UnsafePointer and UnsafeBufferPointer family
- Proposal: SE-NNNN Initialization improvements for UnsafePointer and UnsafeBufferPointer family
- Author: Guillaume Lessard
- Review Manager: TBD
- Status: Draft Pull Request
- Implementation: pending
- Bugs: rdar://51817146, SR-14982 (rdar://81168547), rdar://74655413
- Previous Revision: none
Introduction
The types in the UnsafeMutablePointer family typically require manual management of memory allocations, including the management of their initialization state. The states involved are, after allocation:
- Unbound and uninitialized (as returned from UnsafeMutableRawPointer.allocate())
- Bound to a type, and uninitialized (as returned from UnsafeMutablePointer<T>.allocate())
- Bound to a type, and initialized
Memory can be safely deallocated whenever it is uninitialized.
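As a minimal sketch of these three states, using only existing standard library calls (the variable names are illustrative), a single allocation can be walked through its whole lifecycle:

```swift
// State 1: unbound and uninitialized raw memory.
let raw = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<String>.stride,
    alignment: MemoryLayout<String>.alignment)

// State 2: bound to String, still uninitialized.
let typed = raw.bindMemory(to: String.self, capacity: 1)

// State 3: bound and initialized; only now may the value be read.
typed.initialize(to: "hello")
let copy = typed.pointee

// Back to uninitialized, at which point deallocation is safe.
typed.deinitialize(count: 1)
raw.deallocate()
```

Skipping the deinitialize step for a non-trivial type such as String would leak the string's storage.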
Unfortunately, not every relevant type in the family has the necessary functionality to fully manage the initialization state of its memory. We intend to address this issue in this proposal, and provide functionality to manage initialization state in a much expanded variety of situations.
Swift-evolution thread: Pitch thread
Motivation
Memory allocated using UnsafeMutablePointer, UnsafeMutableRawPointer, UnsafeMutableBufferPointer and UnsafeMutableRawBufferPointer is passed to the user in an uninitialized state. In the general case, such memory needs to be initialized before it is used in Swift. Memory can be "initialized" or "uninitialized". We hereafter refer to this as a memory region's "initialization state".
The methods of UnsafeMutablePointer that interact with initialization state are:
func initialize(to value: Pointee)
func initialize(repeating repeatedValue: Pointee, count: Int)
func initialize(from source: UnsafePointer<Pointee>, count: Int)
func assign(repeating repeatedValue: Pointee, count: Int)
func assign(from source: UnsafePointer<Pointee>, count: Int)
func move() -> Pointee
func moveInitialize(from source: UnsafeMutablePointer<Pointee>, count: Int)
func moveAssign(from source: UnsafeMutablePointer<Pointee>, count: Int)
func deinitialize(count: Int) -> UnsafeMutableRawPointer
This is a fairly complete set.
- The initialize functions change the state of memory locations from uninitialized to initialized, then assign the corresponding value(s).
- The assign functions update the values stored at memory locations that have previously been initialized.
- deinitialize changes the state of a range of memory from initialized to uninitialized.
- The move() function deinitializes a memory location, then returns its current contents.
- The move prefix means that the source range of memory will be deinitialized after the function returns.
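The state transitions listed above can be traced in a short sketch. It uses only existing UnsafeMutablePointer methods, and it updates through the subscript setter rather than an assign function so that it does not depend on the outcome of the renaming discussed later. The variable names are illustrative.

```swift
let count = 3
let a = UnsafeMutablePointer<String>.allocate(capacity: count)
let b = UnsafeMutablePointer<String>.allocate(capacity: count)

// initialize: a[0..<3] goes from uninitialized to initialized.
a.initialize(repeating: "x", count: count)

// Update of an already-initialized location; the old value is released.
a[1] = "y"

// move(): returns the value and leaves a[2] uninitialized.
let last = (a + 2).move()

// moveInitialize: b[0..<2] becomes initialized, a[0..<2] becomes uninitialized.
b.moveInitialize(from: a, count: 2)
let moved = [b[0], b[1]]

// deinitialize: b[0..<2] returns to the uninitialized state.
b.deinitialize(count: 2)

// Both allocations are now fully uninitialized and can be released.
a.deallocate()
b.deallocate()
```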
In a complex use-case such as a custom-written data structure, a subrange of memory may transition between the initialized and uninitialized state multiple times during the life of a memory allocation. For example, if a mutable and contiguously allocated CustomArray is called with a sequence of alternating append and removeLast calls, one storage location will get repeatedly initialized and deinitialized. The implementor of CustomArray might want to represent the allocated buffer using UnsafeMutableBufferPointer, but that means they will have to use the UnsafeMutablePointer type instead for initialization and deinitialization.
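To make the CustomArray scenario concrete, here is a rough sketch of what such a type might look like with today's API. The type, its fixed capacity, and the destroy() method are all assumptions made for illustration; the point is only that every state change has to go through the unchecked base pointer.

```swift
// Hypothetical CustomArray with fixed capacity, for illustration only.
struct CustomArray<Element> {
    private let storage: UnsafeMutableBufferPointer<Element>
    private(set) var count = 0

    init(capacity: Int) {
        storage = UnsafeMutableBufferPointer<Element>.allocate(capacity: capacity)
    }

    mutating func append(_ value: Element) {
        precondition(count < storage.count, "CustomArray is full")
        // The buffer has no per-element initializer, so we drop down to
        // pointer arithmetic on the base address.
        (storage.baseAddress! + count).initialize(to: value)
        count += 1
    }

    mutating func removeLast() -> Element {
        precondition(count > 0, "CustomArray is empty")
        count -= 1
        // move() returns the value and leaves the slot uninitialized.
        return (storage.baseAddress! + count).move()
    }

    // Must be called by the owner before the value is discarded.
    mutating func destroy() {
        storage.baseAddress?.deinitialize(count: count)
        count = 0
        storage.deallocate()
    }
}

var array = CustomArray<String>(capacity: 2)
array.append("a")
array.append("b")
let removed = array.removeLast()   // slot 1 becomes uninitialized
array.append("c")                  // slot 1 is initialized again
let top = array.removeLast()
let remaining = array.count
array.destroy()
```

Here the preconditions are written by hand; the buffer's own bounds play no part in the initialize and move calls.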
We would like to have a full complement of corresponding functions to operate on UnsafeMutableBufferPointer, but we only have the following:
func initialize(repeating repeatedValue: Element)
func initialize<S: Sequence>(from source: S) -> (S.Iterator, Index)
func assign(repeating repeatedValue: Element)
Missing are methods to assign from a Sequence or a Collection, move elements from another UnsafeMutableBufferPointer, modify the initialization state of a range of memory for a particular index of the buffer, or to deinitialize (at all). Such functions would add some safety to these operations, as they would add some bounds checking, unlike the equivalent operations on UnsafeMutablePointer, which have no concept of bounds checking.
Similarly, the functions that change the initialization state for UnsafeMutableRawPointer are:
func initializeMemory<T>(as type: T.Type, repeating repeatedValue: T, count: Int) -> UnsafeMutablePointer<T>
func initializeMemory<T>(as type: T.Type, from source: UnsafePointer<T>, count: Int) -> UnsafeMutablePointer<T>
func moveInitializeMemory<T>(as type: T.Type, from source: UnsafeMutablePointer<T>, count: Int) -> UnsafeMutablePointer<T>
Since initialized memory is bound to a type, these cover the essential operations.
(The assign and deinitialize operations only make sense on typed UnsafeMutablePointer<T>.)
On UnsafeMutableRawBufferPointer, we only have:
func initializeMemory<T>(as type: T.Type, repeating repeatedValue: T) -> UnsafeMutableBufferPointer<T>
func initializeMemory<S: Sequence>(as type: S.Element.Type, from source: S) -> (unwritten: S.Iterator, initialized: UnsafeMutableBufferPointer<S.Element>)
Missing is an equivalent to moveInitializeMemory, in particular.
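As an illustration of this gap, the following sketch shows one way to move typed elements into a raw buffer today. It goes through the base addresses and uses the existing UnsafeMutableRawPointer.moveInitializeMemory(as:from:count:), so the size check has to be written by hand. The variable names are illustrative.

```swift
let count = 2
let source = UnsafeMutableBufferPointer<String>.allocate(capacity: count)
_ = source.initialize(from: ["a", "b"])

let rawBuffer = UnsafeMutableRawBufferPointer.allocate(
    byteCount: count * MemoryLayout<String>.stride,
    alignment: MemoryLayout<String>.alignment)

// Manual bounds check: the raw pointer call below performs none.
precondition(rawBuffer.count >= count * MemoryLayout<String>.stride)

// Move the elements; afterwards `source` is uninitialized.
let destination = rawBuffer.baseAddress!.moveInitializeMemory(
    as: String.self, from: source.baseAddress!, count: count)
let typed = UnsafeMutableBufferPointer(start: destination, count: count)
let contents = Array(typed)

// Clean up: only the destination still holds initialized values.
source.deallocate()
typed.baseAddress!.deinitialize(count: count)
rawBuffer.deallocate()
```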
Additionally, the buffer initialization functions from Sequence parameters are overly strict, and trap in many situations where the buffer length and the number of elements in a Collection do not match exactly. We can improve on this situation with initialization functions from Collections that behave more nicely.
There are four existing functions that use the assign (or moveAssign) name. This name is unfortunately not especially clear. In The Swift Programming Language, = is called the "assignment operator", and is said to either initialize or update a variable. The word "update" here is much clearer, as it implies the existence of a prior value, which communicates the requirement that a given memory location must have been previously initialized. For this reason, we propose to rename "assign" to "update". This would involve deprecating the existing (rarely-used) functions, with a straightforward fixit. The existing symbol can be reused for purposes of ABI stability.
Proposed solution
Note: in the pseudo-diffs presented in this section, +++ indicates an added symbol, while --- indicates a renamed symbol.
We propose to modify UnsafeMutableBufferPointer as follows:
extension UnsafeMutableBufferPointer {
func initialize(repeating repeatedValue: Element)
func initialize<S>(from source: S) -> (S.Iterator, Index) where S: Sequence, S.Element == Element
+++ func initialize<C>(fromElements: C) -> Index where C: Collection, C.Element == Element
--- func assign(repeating repeatedValue: Element)
+++ func update(repeating repeatedValue: Element)
+++ func update<S>(from source: S) -> (unwritten: S.Iterator, updated: Index) where S: Sequence, S.Element == Element
+++ func update<C>(fromElements: C) -> Index where C: Collection, C.Element == Element
+++ func moveInitialize(fromElements: UnsafeMutableBufferPointer) -> Index
+++ func moveInitialize(fromElements: Slice<UnsafeMutableBufferPointer>) -> Index
+++ func moveUpdate(fromElements: `Self`) -> Index
+++ func moveUpdate(fromElements: Slice<`Self`>) -> Index
+++ func deinitialize() -> UnsafeMutableRawBufferPointer
+++ func initializeElement(at index: Index, to value: Element)
+++ func updateElement(at index: Index, to value: Element)
+++ func moveElement(from index: Index) -> Element
+++ func deinitializeElement(at index: Index)
}
The methods that initialize or update from a Collection will have forgiving semantics, and copy the number of elements that they can, be that every available element or none, and then return the next index in the buffer. Unlike the existing Sequence functions, they include no preconditions beyond having a valid Collection and valid buffer, with the understanding that if a user wishes stricter behaviour, they can compose it from these functions.
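The forgiving semantics described here can be approximated with existing API. The helper below, initializePrefix(of:), is a hypothetical name chosen for this sketch and is not the proposed implementation; it copies as many elements as fit and returns the next index in the buffer.

```swift
extension UnsafeMutableBufferPointer {
    // Hypothetical helper: copies as many elements as fit, never traps on
    // a length mismatch, and returns the index after the last element written.
    func initializePrefix<C: Collection>(of source: C) -> Index
    where C.Element == Element {
        guard var pointer = baseAddress else { return startIndex }
        var written = 0
        for element in source.prefix(count) {
            pointer.initialize(to: element)
            pointer += 1
            written += 1
        }
        return startIndex + written
    }
}

let buffer = UnsafeMutableBufferPointer<Int>.allocate(capacity: 4)

// Fewer elements than slots: only the first two slots are initialized.
let shortEnd = buffer.initializePrefix(of: [1, 2])
buffer.baseAddress!.deinitialize(count: shortEnd)

// More elements than slots: the copy stops at the end of the buffer.
let longEnd = buffer.initializePrefix(of: Array(0..<10))
let stored = Array(buffer[..<longEnd])
buffer.baseAddress!.deinitialize(count: longEnd)
buffer.deallocate()
```

A caller who wants the strict behaviour can compare the returned index against endIndex, or against the collection's count, and trap on a mismatch.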
The above changes include a method to assign a single element. Evidently that is a synonym for the subscript(_ i: Index) setter. We hope that documenting the assignment action specifically will help clarify the requirements of that action, which are evidently muddled when documented along with the subscript getter.
Similarly, we propose adding to UnsafeMutablePointer and UnsafeMutableRawPointer:
extension UnsafeMutablePointer {
func initialize(to value: Pointee)
func initialize(repeating repeatedValue: Pointee, count: Int)
func initialize(from source: UnsafePointer<Pointee>, count: Int)
+++ func update(to value: Pointee)
--- func assign(repeating repeatedValue: Pointee, count: Int)
+++ func update(repeating repeatedValue: Pointee, count: Int)
--- func assign(from source: UnsafePointer<Pointee>, count: Int)
+++ func update(from source: UnsafePointer<Pointee>, count: Int)
func move() -> Pointee
func moveInitialize(from source: UnsafeMutablePointer, count: Int)
--- func moveAssign(from source: UnsafeMutablePointer, count: Int)
+++ func moveUpdate(from source: UnsafeMutablePointer, count: Int)
func deinitialize(count: Int) -> UnsafeMutableRawPointer
}
extension UnsafeMutableRawPointer {
+++ func initializeMemory<T>(as type: T.Type, to value: T) -> UnsafeMutablePointer<T>
func initializeMemory<T>(as type: T.Type, repeating repeatedValue: T, count: Int) -> UnsafeMutablePointer<T>
func initializeMemory<T>(as type: T.Type, from source: UnsafePointer<T>, count: Int) -> UnsafeMutablePointer<T>
func moveInitializeMemory<T>(as type: T.Type, from source: UnsafeMutablePointer<T>, count: Int) -> UnsafeMutablePointer<T>
}
Finally, we propose adding additional functions to initialize UnsafeMutableRawBufferPointers. The first will initialize from a Collection and have less stringent semantics than the existing function that initializes from a Sequence. The other two enable moving a range of memory into an UnsafeMutableRawBufferPointer while deinitializing a typed UnsafeMutableBufferPointer.
extension UnsafeMutableRawBufferPointer {
func initializeMemory<T>(as type: T.Type, repeating repeatedValue: T) -> UnsafeMutableBufferPointer<T>
func initializeMemory<S>(as type: S.Element.Type, from source: S) -> (unwritten: S.Iterator, initialized: UnsafeMutableBufferPointer<S.Element>) where S: Sequence
+++ func initializeMemory<C>(as type: C.Element.Type, fromElements: C) -> UnsafeMutableBufferPointer<C.Element> where C: Collection
+++ func moveInitializeMemory<T>(as type: T.Type, fromElements: UnsafeMutableBufferPointer<T>) -> UnsafeMutableBufferPointer<T>
+++ func moveInitializeMemory<T>(as type: T.Type, fromElements: Slice<UnsafeMutableBufferPointer<T>>) -> UnsafeMutableBufferPointer<T>
}
Detailed design
Note: please see the draft pull request or the full proposal for details.
Source compatibility
This proposal consists mostly of additions.
The proposal includes the renaming of four existing functions from assign to update. The existing function names would be deprecated, producing a warning. A fixit will support an easy transition to the renamed versions of these functions.
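For readers unfamiliar with how such a deprecation surfaces, here is a small sketch on a hypothetical stand-in type (Slot and its methods are invented for this example; the real change would be made in the standard library). The renamed: argument of @available is what lets the compiler offer the fixit.

```swift
// Hypothetical stand-in type, not part of the proposal.
struct Slot {
    var value = 0

    mutating func update(to newValue: Int) {
        value = newValue
    }

    // The old spelling forwards to the new one and carries the rename hint.
    @available(*, deprecated, renamed: "update(to:)")
    mutating func assign(to newValue: Int) {
        update(to: newValue)
    }
}

var slot = Slot()
slot.assign(to: 7)   // still compiles, with a deprecation warning and a fixit
let slotValue = slot.value
```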
Effect on ABI stability
The functions proposed here are generally small wrappers around existing functionality. We expect to implement them as @_alwaysEmitIntoClient functions, which means they would have no ABI impact.
The renamed functions can reuse the existing symbol, while the deprecated functions can use an @_alwaysEmitIntoClient function to support the functionality under its previous name. This would have no ABI impact.
Effect on API resilience
All functionality implemented as @_alwaysEmitIntoClient will back-deploy. Renamed functions that reuse a previous symbol will also back-deploy.
Alternatives considered
The single-element update functions, UnsafeMutablePointer.update(to:) and UnsafeMutableBufferPointer.updateElement(at:to:), are synonyms for the setters of UnsafeMutablePointer.pointee and UnsafeMutableBufferPointer.subscript(_ i: Index), respectively. Clearly we can elect to not add them. The setters in question, like the update functions, have a required precondition that the memory they refer to must be initialized. Somehow this precondition is often overlooked and leads to bug reports. The proposed names and cross-references should help clarify the requirements to users.
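The overlooked precondition is easiest to see with a reference type. In this sketch (Token is a made-up class for illustration), the pointee setter is only valid after the memory has been initialized, because the setter releases the previous value before storing the new one.

```swift
// Hypothetical class used only to make the release of the old value meaningful.
final class Token {
    let id: Int
    init(id: Int) { self.id = id }
}

let pointer = UnsafeMutablePointer<Token>.allocate(capacity: 1)

// Incorrect, left commented out: the memory is uninitialized, so the setter
// would try to release whatever garbage bits are stored there.
// pointer.pointee = Token(id: 1)

// Correct: initialize first, then update through the setter.
pointer.initialize(to: Token(id: 1))
pointer.pointee = Token(id: 2)   // releases Token(id: 1), stores the new value
let currentID = pointer.pointee.id

pointer.deinitialize(count: 1)
pointer.deallocate()
```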
The renaming of assign to update could be omitted entirely, although we believe that update communicates intent much better than assign does. There are only four symbols affected by this renaming, and their replacements are easily migrated by a fixit. For context, this renaming would only change 6 lines of code in the standard library, outside of the function definitions. If the renaming is omitted, the four new functions proposed in the family should use the name assign as well. The two single-element versions would be assign(_ value:) and assignElement(at:_ value:).
The initializing and updating functions that copy from Collection inputs use the argument label fromElements. This is different from the pre-existing functions that copy from Sequence inputs. We could use the same argument label (from) as with the Sequence inputs, but that would mean that we must return the Iterator for the Collection versions, and that is generally not desirable. If we did not return Iterator, then the Sequence and Collection versions of initialize(from:) would be overloaded by their return type, and that would be source-breaking: an existing use of the current function that doesn't immediately destructure the returned tuple could pick up the Collection overload, which would have a return value incompatible with the existing code that makes use of the return value.
Acknowledgments
Kelvin Ma (aka Taylor Swift)'s initial versions of the pitch that became SE-0184 included functions to manipulate initialization state. These were deferred, but the functionality has not been pitched again until now.