Hi Swift Evolution,
Some holiday reading for you, a pitch of a new way of writing mutable computed properties and subscripts.
For more information on the performance implications of this change and why it's an important feature to add, you might also find this talk from last year's Functional Swift Conference useful.
Proposal
We propose the introduction of a new keyword, modify
, for implementing mutable computed properties and subscripts, alongside the current get
and set
.
The bodies of modify
implementations will be coroutines, and they will introduce a new contextual keyword, yield
, that will be used to yield a value to be modified back to the caller. Control will resume after the yield
when the caller returns.
This modify
feature is currently available (but not supported) from Swift 5.0 as _modify
, for experimentation purposes when reviewing this proposal.
Motivation
Swift's get
/set
syntax allows users to expose computed properties and subscripts that behave as l-values. This powerful feature allows for the creation of succinct idiomatic APIs, such as this use of Dictionary
's defaulting subscript:
var wordFrequencies: [String:Int] = [:]
wordFrequency["swift", default: 0] += 1
// wordFrequencies == ["swift":1]
However, while this provides the illusion of "in-place" mutation, this is actually implemented as three separate operations: a get
of a copy of the value, the mutation on that returned value, and finally a set
replacing the original value with the mutated copy.
This can be seen by performing side-effects within the getter and setter as in this sample code:
struct GetSet {
var x: String = "๐๐ฝ Hello"
var property: String {
get { print("Getting",x); return x }
set { print("Setting",newValue); x = newValue }
}
}
var getSet = GetSet()
getSet.property.append(", ๐!")
// prints:
// Getting ๐๐ฝ Hello
// Setting ๐๐ฝ Hello, ๐!
This simulation of in-place mutation works well for user ergnomics, but has a major performance shortcoming. This can be seen in even our simple GetSet
type above. Strings in Swift are "non-trivial" types. Once they grow beyond a small fixed size, they allocate a reference-counted buffer to hold their contents. Mutation is handled via the usual copy-on-write technique: when you make a copy of a string, only the reference to the buffer is copied, not the buffer itself. Then, when the string is mutated, it checks if the buffer is uniquely referenced. If it isn't (because the string has been copied), it first makes a full copy of the buffer before then mutating the buffer, preserving the value semantics of string while avoiding unnecessary eager copies.
Given this, we can see the performance problem when appending to GetSet.property
in our example above:
-
GetSet.property { get }
is called, and returns a copy ofx
. - Because a copy is returned, the buffer backing the string is now multiply referenced.
- The
append
mutation therefore triggers a full copy of the string's buffer. -
GetSet.property { set }
writes this copy back over the top ofx
. - The original string buffer's reference count drops to zero, freeing it up.
So, despite looking like in-place mutation, every mutating operation on x
made through property
is actually causing a full copy of x
's backing buffer. This is a linear operation. If we were doing something like appending to this property in a loop, this loop would end up being quadratic in complexity. This is likely very surprising to the user and can be a major performance pitfall.
Since Swift 1.0, this problem has been avoided with Swift's Array
type, using a non-public mechanism called an "addressor". Instead of a get
and set
, Array.subscript
defined a single operation that returned the address of the element within the array to the caller. Swift would then mutate the value at that address. This worked, but was hard to use correctly and not a feature ever intended to be made available outside the standard library. It also didn't provide the ability to intercept and perform further processing of the value after the caller mutated it โ which is why Dictionary
still performed unexpected copies-on-write on values fetched by key (for reasons explained below).
As part of fixing the ABI for the standard library in Swift 5.0, this addressor mechanism was replace by coroutine-based "accessors". This proposal converts them to a full public feature.
Detailed Design
The GetSet
type above could be implemented with modify
as follows:
struct GetModify {
var x: String = "๐๐ฝ Hello"
var property: String {
get { print("Getting",x); return x }
modify {
print("Yielding",x)
yield &x
print("Post yield",x)
}
}
}
var getModify = GetModify()
getModify.property.append(", ๐!")
// prints:
// Yielding ๐๐ฝ Hello
// Post yield ๐๐ฝ Hello, ๐!
Things to note about this example:
- the
get
is never called โ the property access is handled entirely by themodify
call - the
yield
is similar to areturn
, but control returns to themodify
after theappend
completes - there is no more
newValue
โ the yielded value is modified byappend
- the
yield
uses the&
sigil, similar to passing an argumentinout
Unlike the get
/set
pair, the modify
operation is able to safely yield a value to the caller at "+0
" โ that is, if the yielded value is a reference or contains references (as String
can in our example), their reference counts do not need to be increased when they are yielded. This can be done safely because, unlike with a return
statement, the yield
is just temporarily giving up control to the caller, then resumes after the caller (in this case, the append
) completes. The caller is "borrowing" the value yielded by the coroutine.
The get
is still used in the case of only fetching, not modifying, the property:
_ = getModify.property
// prints:
// Getting ๐๐ฝ Hello, ๐!
While a modify
is sufficient to allow assignment to a property:
getModify.property = "Hi, ๐, 'sup?"
// prints:
// Yielding ๐๐ฝ Hello, ๐!
// Post yield Hi, ๐, 'sup?
it is also possible to supply both a modify
and a set
. The set
will be called in the case of straight assignment, which may be more efficient than first fetching/creating a value to then be overwritten:
struct GetSetModify {
var x: String = "๐๐ฝ Hello"
var property: String {
get { x }
modify { yield &x }
set { print("Setting",newValue); x = newValue }
}
}
var getSetModify = GetSetModify()
getSetModify.property = "Hi ๐, 'sup?"
// prints:
// Setting Hi ๐, 'sup?
Pre- and post-processing in modify
As with set
, modify
gives the property author an opportunity to perform some post-processing on the new value.
Consider the following implementation of an enhanced version of Array.first
that allows the user to modify the first value of the array:
extension Array {
var first: Element? {
get { isEmpty ? nil : self[0] }
modify {
var tmp: Optional<Element>
if isEmpty {
tmp = nil
yield &tmp
if let newValue = tmp {
self.append(newValue)
}
} else {
tmp = self[0]
yield &tmp
if let newValue = tmp {
self[0] = newValue
} else {
self.removeFirst()
}
}
}
}
}
This implementation takes the same approach as Swift.Dictionary
's key-based subscript.
- If the entry was not there, it adds it.
- If
nil
is assigned, it removes it. - Otherwise, it mutates it.
Because the fetch and update code are all contained in one block, the isEmpty
check is not duplicated (unlike with a get
/set
pair). Instead, the state of whether the array was empty or not is captured by the program location in the coroutine when the element is yielded. Notice that there are two yields in this modify
implementation, for the empty and non-empty branches.
The rules for accessor yields are similar to that of deferred initialization of let
variables: it must be possible for the compiler to guarantee there is exactly one yield
on every path. The call must not contain any path with either zero or more than one yield
. This is the case here, as there is a yield in both the if
and the else
. More complex cases where the compiler cannot guarantee this will need refactoring, or use of fatalError()
to assert unreachable code paths.
Yielding and exclusive access
The optional return value of first
in the code above means that, even with a modify
, we have introduced the problem of triggering copy-on-write when mutating via our first
property. We cannot yield the value in the array's buffer directly because it needs to be placed inside an optional. That act of placing inside the optional creates a copy.
We can work around this with some lower-level unsafe code. If the implementation of Array.first
has access to its underlying buffer, it can move that value directly into the optional, yield it, and then move it back:
extension Array {
var first: Element? {
modify {
var tmp: Optional<Element>
if isEmpty {
// Unchanged
} else {
// Illustrative code only, Array's real internals are fiddlier.
// _storage is an UnsafeMutablePointer<Element> to the Array's storage.
// Move first element in _storage into a temporary, leaving that slot
// in the storage buffer as uninintialized memory.
tmp = _storage.move()
// Yield that moved value to the caller
yield &tmp
// Once the caller returns, restore the array to a valid state
if let newValue = tmp {
// Re-initialize the storage slot with the modified value
_storage.initialize(to: newValue)
} else {
// Element removed. Slide other elements down on top of the
// uninitialized first slot:
_storage.moveInitialize(from: _storage + 1, count: self.count - 1)
self.count -= 1
}
}
}
}
During the yield to the caller, the array is in an invalid state: the memory location where the first element is stored is left uninitialized, and must not be accessed. This is safe due to Swift's rules preventing conflicting access to memory. For the full duration of the coroutine, the call to modify
has exclusive access to the array. Unlike a get
, the modify
is guaranteed to have an opportunity to put the element back (or to remove the invalid memory if the entry is set to nil
) after the caller returns from the yield
, restoring the array to a valid state in all circumstances before any other code can access it.
Throwing callers
The above code avoids the CoW issue, but is incorrect in the case of a throwing caller:
try? myArray.first?.throwingMutatingOp()
When throwingMutatingOp
throws, control returns back to the outer caller. The body of Array.first { modify }
terminates, and the code after the yield
does not execute. This would result in the yielded element being discarded, and the memory location for it in the array being left in an uninitialized state (leaving the array corrupted).
For this reason, it is important to put code that restores the array to a valid state into a defer
block. This block is guaranteed to execute, even in the case of the yielded-to function throwing.
extension Array {
var first: Element? {
modify {
var tmp: Optional<Element> = nil
if isEmpty {
// unchanged
} else {
// put array into temporarily invalid state
tmp = _storage.move()
// code to restore valid state put in defer
defer {
if let newValue = tmp {
_storage.initialize(to: newValue)
} else {
_storage.moveInitialize(from: _storage + 1, count: self.count - 1)
self.count -= 1
}
}
// yield that moved value to the caller
yield &tmp
}
}
}
Now that the valid state restoration is in a defer
block, the array is guaranteed to be back in a valid state by the time control is returned to the caller.
The body of the modify
does not get to participate in the error handling. This is not like nested function calls, where an inner call could catch and potentially supress or alter an error. The modify
has no knowledge to what operation it is yielding and whether it could throw, any more than a get
/set
pair would.
Note that our efficient implementation now handles errors slightly differently to the first version that made a copy. That first version was safe in the face of exceptions, but discarded the update if an error was thrown (because the code to unwrap the optional and replace the value never ran). Whereas in this version, the yielded value is always replaced with whatever value is in the optional when the error was thrown. This means that whether the value in the array is updated depends on whether the caller threw the error before or after updating the value.
Alternatives Considered
The need to place mandatory cleanup inside a defer
block is definitely a sharp edge. It would be easy to overlook this and assume that the cleanup code can just be placed after the yield
. An alternative design could be to always run subsequent code, even when the caller throws. There are a number of reasons why the defer
approach is preferable.
The next likely use of coroutines in the language is as an alternative mechanism for sequence iteration caller generators. This could also offer the option for mutating for...in
syntax:
extension Array: MutableGenerator {
// let's not discuss naming/syntax here...
func generateMutably() {
for i in 0..<count {
yield &self[i]
}
}
}
// allowing something like...
for &x in myArray {
x += 1
}
Similar to how any method on a property yielded by modify
can throw, anything in the body of the for
loop could throw, and be caught outside the loop. In addition, the user may just break
out of the loop. In either case, the generateMutably
function should immediately terminate, not run to completion. A simplified model where the straight line code just continues is not viable. More fine-grained control of cleanup is also likely needed. The implementation of generateMutably
might need cleanup either for each element yielded, or after iteration is finished, or both. Using defer
for cleanup provides full control of this.
With both accessors and generators, it might also be desirable to drive different behavior on early vs regular termination. Early termination can be handled in a defer
block, while straight-line code can be used for when the implementation was allowed to run to completion.
Given that a yield
may be to a throwing caller, there is an argument that the keyword should be try yield
to indicate this and remind the user. However, this is likely to just cause noise, and annoy users in the majority of use cases where this does not matter. This use of try
would also be inconsistent with the rest of the language: it would not need to be wrapped in a do...catch
block nor would try?
or try!
make sense.
The majority of modify accessors are expected to be simple, directly yielding some storage. Most times mandatory cleanup will be needed will be when dealing with unsafe constucts as seen in our first
implementation. As always when using unsafe operations, the user must have a full understanding of exactly how the language operates in order to manually ensure safety. As such, understanding how errors are handled when yielding is unavoidable.
Future Directions
The ability to borrow elements from a collection is a key part of the future support of move-only types. If an array were to hold a move-only type, then mutation via a modify
that yields a mutable borrowed value is not just an optimization over a get
/set
pair โ it's essential, as the value could not be copied.
There is a similar possible enhancement for coroutine-based fetching of read-only values: read
instead of get
. This will also be necessary for containers of move-only values. For example, it would be necessary to implement even the current read-only version of Array.first
, similar to the mutating version described here.
However, unlike with modify
, there is little to motivate adding this feature to the language until we have move-only types. A read
property would allow for avoiding reference counting in some circumstances, but this micro-optimization is currently outweighed by the overhead of setting up the coroutine. The exact rules for when to prefer read
over get
also need exploration, so adding this feature should be deferred for now.
Further ergonomic enhancements to the language may be needed over time to use this new feature. For example, it is not possible to yield
from within a closure passed to another function. This makes using this feature in conjunction with the current "scoped" buffer pointer pattern found throughout the standard library difficult. Solving this is a complex language design problem, and should not hold up making the feature available. The future language design will be better informed by feedback from widespread use of modify
.
ABI Implications
Adding a new modify
accessor to an existing type or computed property has the same ABI implications as adding a getter, setter or function. It must be guarded by availability on ABI-stable platforms.
Renaming the current _modify
used by the standard library to modify
does not affect the ABI. The mangling already uses a different (shorter) identifier for these symbols, which will remain the same.