Hi Swift Evolution,
Some holiday reading for you, a pitch of a new way of writing mutable computed properties and subscripts.
For more information on the performance implications of this change and why it's an important feature to add, you might also find this talk from last year's Functional Swift Conference useful.
Proposal
We propose the introduction of a new keyword, modify, for implementing mutable computed properties and subscripts, alongside the current get and set.
The bodies of modify implementations will be coroutines, and they will introduce a new contextual keyword, yield, that will be used to yield a value to be modified back to the caller. Control will resume after the yield when the caller returns.
This modify feature is currently available (but not supported) from Swift 5.0 as _modify, for experimentation purposes when reviewing this proposal.
Motivation
Swift's get/set syntax allows users to expose computed properties and subscripts that behave as l-values. This powerful feature allows for the creation of succinct idiomatic APIs, such as this use of Dictionary's defaulting subscript:
var wordFrequencies: [String:Int] = [:]
wordFrequency["swift", default: 0] += 1
// wordFrequencies == ["swift":1]
However, while this provides the illusion of "in-place" mutation, this is actually implemented as three separate operations: a get of a copy of the value, the mutation on that returned value, and finally a set replacing the original value with the mutated copy.
This can be seen by performing side-effects within the getter and setter as in this sample code:
struct GetSet {
var x: String = "๐๐ฝ Hello"
var property: String {
get { print("Getting",x); return x }
set { print("Setting",newValue); x = newValue }
}
}
var getSet = GetSet()
getSet.property.append(", ๐!")
// prints:
// Getting ๐๐ฝ Hello
// Setting ๐๐ฝ Hello, ๐!
This simulation of in-place mutation works well for user ergnomics, but has a major performance shortcoming. This can be seen in even our simple GetSet type above. Strings in Swift are "non-trivial" types. Once they grow beyond a small fixed size, they allocate a reference-counted buffer to hold their contents. Mutation is handled via the usual copy-on-write technique: when you make a copy of a string, only the reference to the buffer is copied, not the buffer itself. Then, when the string is mutated, it checks if the buffer is uniquely referenced. If it isn't (because the string has been copied), it first makes a full copy of the buffer before then mutating the buffer, preserving the value semantics of string while avoiding unnecessary eager copies.
Given this, we can see the performance problem when appending to GetSet.property in our example above:
-
GetSet.property { get }is called, and returns a copy ofx. - Because a copy is returned, the buffer backing the string is now multiply referenced.
- The
appendmutation therefore triggers a full copy of the string's buffer. -
GetSet.property { set }writes this copy back over the top ofx. - The original string buffer's reference count drops to zero, freeing it up.
So, despite looking like in-place mutation, every mutating operation on x made through property is actually causing a full copy of x's backing buffer. This is a linear operation. If we were doing something like appending to this property in a loop, this loop would end up being quadratic in complexity. This is likely very surprising to the user and can be a major performance pitfall.
Since Swift 1.0, this problem has been avoided with Swift's Array type, using a non-public mechanism called an "addressor". Instead of a get and set, Array.subscript defined a single operation that returned the address of the element within the array to the caller. Swift would then mutate the value at that address. This worked, but was hard to use correctly and not a feature ever intended to be made available outside the standard library. It also didn't provide the ability to intercept and perform further processing of the value after the caller mutated it โ which is why Dictionary still performed unexpected copies-on-write on values fetched by key (for reasons explained below).
As part of fixing the ABI for the standard library in Swift 5.0, this addressor mechanism was replace by coroutine-based "accessors". This proposal converts them to a full public feature.
Detailed Design
The GetSet type above could be implemented with modify as follows:
struct GetModify {
var x: String = "๐๐ฝ Hello"
var property: String {
get { print("Getting",x); return x }
modify {
print("Yielding",x)
yield &x
print("Post yield",x)
}
}
}
var getModify = GetModify()
getModify.property.append(", ๐!")
// prints:
// Yielding ๐๐ฝ Hello
// Post yield ๐๐ฝ Hello, ๐!
Things to note about this example:
- the
getis never called โ the property access is handled entirely by themodifycall - the
yieldis similar to areturn, but control returns to themodifyafter theappendcompletes - there is no more
newValueโ the yielded value is modified byappend - the
yielduses the&sigil, similar to passing an argumentinout
Unlike the get/set pair, the modify operation is able to safely yield a value to the caller at "+0" โ that is, if the yielded value is a reference or contains references (as String can in our example), their reference counts do not need to be increased when they are yielded. This can be done safely because, unlike with a return statement, the yield is just temporarily giving up control to the caller, then resumes after the caller (in this case, the append) completes. The caller is "borrowing" the value yielded by the coroutine.
The get is still used in the case of only fetching, not modifying, the property:
_ = getModify.property
// prints:
// Getting ๐๐ฝ Hello, ๐!
While a modify is sufficient to allow assignment to a property:
getModify.property = "Hi, ๐, 'sup?"
// prints:
// Yielding ๐๐ฝ Hello, ๐!
// Post yield Hi, ๐, 'sup?
it is also possible to supply both a modify and a set. The set will be called in the case of straight assignment, which may be more efficient than first fetching/creating a value to then be overwritten:
struct GetSetModify {
var x: String = "๐๐ฝ Hello"
var property: String {
get { x }
modify { yield &x }
set { print("Setting",newValue); x = newValue }
}
}
var getSetModify = GetSetModify()
getSetModify.property = "Hi ๐, 'sup?"
// prints:
// Setting Hi ๐, 'sup?
Pre- and post-processing in modify
As with set, modify gives the property author an opportunity to perform some post-processing on the new value.
Consider the following implementation of an enhanced version of Array.first that allows the user to modify the first value of the array:
extension Array {
var first: Element? {
get { isEmpty ? nil : self[0] }
modify {
var tmp: Optional<Element>
if isEmpty {
tmp = nil
yield &tmp
if let newValue = tmp {
self.append(newValue)
}
} else {
tmp = self[0]
yield &tmp
if let newValue = tmp {
self[0] = newValue
} else {
self.removeFirst()
}
}
}
}
}
This implementation takes the same approach as Swift.Dictionary's key-based subscript.
- If the entry was not there, it adds it.
- If
nilis assigned, it removes it. - Otherwise, it mutates it.
Because the fetch and update code are all contained in one block, the isEmpty check is not duplicated (unlike with a get/set pair). Instead, the state of whether the array was empty or not is captured by the program location in the coroutine when the element is yielded. Notice that there are two yields in this modify implementation, for the empty and non-empty branches.
The rules for accessor yields are similar to that of deferred initialization of let variables: it must be possible for the compiler to guarantee there is exactly one yield on every path. The call must not contain any path with either zero or more than one yield. This is the case here, as there is a yield in both the if and the else. More complex cases where the compiler cannot guarantee this will need refactoring, or use of fatalError() to assert unreachable code paths.
Yielding and exclusive access
The optional return value of first in the code above means that, even with a modify, we have introduced the problem of triggering copy-on-write when mutating via our first property. We cannot yield the value in the array's buffer directly because it needs to be placed inside an optional. That act of placing inside the optional creates a copy.
We can work around this with some lower-level unsafe code. If the implementation of Array.first has access to its underlying buffer, it can move that value directly into the optional, yield it, and then move it back:
extension Array {
var first: Element? {
modify {
var tmp: Optional<Element>
if isEmpty {
// Unchanged
} else {
// Illustrative code only, Array's real internals are fiddlier.
// _storage is an UnsafeMutablePointer<Element> to the Array's storage.
// Move first element in _storage into a temporary, leaving that slot
// in the storage buffer as uninintialized memory.
tmp = _storage.move()
// Yield that moved value to the caller
yield &tmp
// Once the caller returns, restore the array to a valid state
if let newValue = tmp {
// Re-initialize the storage slot with the modified value
_storage.initialize(to: newValue)
} else {
// Element removed. Slide other elements down on top of the
// uninitialized first slot:
_storage.moveInitialize(from: _storage + 1, count: self.count - 1)
self.count -= 1
}
}
}
}
During the yield to the caller, the array is in an invalid state: the memory location where the first element is stored is left uninitialized, and must not be accessed. This is safe due to Swift's rules preventing conflicting access to memory. For the full duration of the coroutine, the call to modify has exclusive access to the array. Unlike a get, the modify is guaranteed to have an opportunity to put the element back (or to remove the invalid memory if the entry is set to nil) after the caller returns from the yield, restoring the array to a valid state in all circumstances before any other code can access it.
Throwing callers
The above code avoids the CoW issue, but is incorrect in the case of a throwing caller:
try? myArray.first?.throwingMutatingOp()
When throwingMutatingOp throws, control returns back to the outer caller. The body of Array.first { modify } terminates, and the code after the yield does not execute. This would result in the yielded element being discarded, and the memory location for it in the array being left in an uninitialized state (leaving the array corrupted).
For this reason, it is important to put code that restores the array to a valid state into a defer block. This block is guaranteed to execute, even in the case of the yielded-to function throwing.
extension Array {
var first: Element? {
modify {
var tmp: Optional<Element> = nil
if isEmpty {
// unchanged
} else {
// put array into temporarily invalid state
tmp = _storage.move()
// code to restore valid state put in defer
defer {
if let newValue = tmp {
_storage.initialize(to: newValue)
} else {
_storage.moveInitialize(from: _storage + 1, count: self.count - 1)
self.count -= 1
}
}
// yield that moved value to the caller
yield &tmp
}
}
}
Now that the valid state restoration is in a defer block, the array is guaranteed to be back in a valid state by the time control is returned to the caller.
The body of the modify does not get to participate in the error handling. This is not like nested function calls, where an inner call could catch and potentially supress or alter an error. The modify has no knowledge to what operation it is yielding and whether it could throw, any more than a get/set pair would.
Note that our efficient implementation now handles errors slightly differently to the first version that made a copy. That first version was safe in the face of exceptions, but discarded the update if an error was thrown (because the code to unwrap the optional and replace the value never ran). Whereas in this version, the yielded value is always replaced with whatever value is in the optional when the error was thrown. This means that whether the value in the array is updated depends on whether the caller threw the error before or after updating the value.
Alternatives Considered
The need to place mandatory cleanup inside a defer block is definitely a sharp edge. It would be easy to overlook this and assume that the cleanup code can just be placed after the yield. An alternative design could be to always run subsequent code, even when the caller throws. There are a number of reasons why the defer approach is preferable.
The next likely use of coroutines in the language is as an alternative mechanism for sequence iteration caller generators. This could also offer the option for mutating for...in syntax:
extension Array: MutableGenerator {
// let's not discuss naming/syntax here...
func generateMutably() {
for i in 0..<count {
yield &self[i]
}
}
}
// allowing something like...
for &x in myArray {
x += 1
}
Similar to how any method on a property yielded by modify can throw, anything in the body of the for loop could throw, and be caught outside the loop. In addition, the user may just break out of the loop. In either case, the generateMutably function should immediately terminate, not run to completion. A simplified model where the straight line code just continues is not viable. More fine-grained control of cleanup is also likely needed. The implementation of generateMutably might need cleanup either for each element yielded, or after iteration is finished, or both. Using defer for cleanup provides full control of this.
With both accessors and generators, it might also be desirable to drive different behavior on early vs regular termination. Early termination can be handled in a defer block, while straight-line code can be used for when the implementation was allowed to run to completion.
Given that a yield may be to a throwing caller, there is an argument that the keyword should be try yield to indicate this and remind the user. However, this is likely to just cause noise, and annoy users in the majority of use cases where this does not matter. This use of try would also be inconsistent with the rest of the language: it would not need to be wrapped in a do...catch block nor would try? or try! make sense.
The majority of modify accessors are expected to be simple, directly yielding some storage. Most times mandatory cleanup will be needed will be when dealing with unsafe constucts as seen in our first implementation. As always when using unsafe operations, the user must have a full understanding of exactly how the language operates in order to manually ensure safety. As such, understanding how errors are handled when yielding is unavoidable.
Future Directions
The ability to borrow elements from a collection is a key part of the future support of move-only types. If an array were to hold a move-only type, then mutation via a modify that yields a mutable borrowed value is not just an optimization over a get/set pair โ it's essential, as the value could not be copied.
There is a similar possible enhancement for coroutine-based fetching of read-only values: read instead of get. This will also be necessary for containers of move-only values. For example, it would be necessary to implement even the current read-only version of Array.first, similar to the mutating version described here.
However, unlike with modify, there is little to motivate adding this feature to the language until we have move-only types. A read property would allow for avoiding reference counting in some circumstances, but this micro-optimization is currently outweighed by the overhead of setting up the coroutine. The exact rules for when to prefer read over get also need exploration, so adding this feature should be deferred for now.
Further ergonomic enhancements to the language may be needed over time to use this new feature. For example, it is not possible to yield from within a closure passed to another function. This makes using this feature in conjunction with the current "scoped" buffer pointer pattern found throughout the standard library difficult. Solving this is a complex language design problem, and should not hold up making the feature available. The future language design will be better informed by feedback from widespread use of modify.
ABI Implications
Adding a new modify accessor to an existing type or computed property has the same ABI implications as adding a getter, setter or function. It must be guarded by availability on ABI-stable platforms.
Renaming the current _modify used by the standard library to modify does not affect the ABI. The mangling already uses a different (shorter) identifier for these symbols, which will remain the same.