Background: Ownership Manifesto
As part of preparing the Swift ABI for getting locked down in Swift 5, I am changing some aspects of the implementation strategy for storage accesses in Swift 5. A lot of this is purely implementation-level and has no effect on the language, or indeed on programmers at all except for small changes in performance (expected to be minor and generally for the better). But there are two ways that it will surface as new features that can be used to achieve better performance in particular situations. In keeping with Swift's general principle of "progressive disclosure", my expectation is that most programmers will never need to use or even know about these features; nonetheless, they (eventually) need to be pitched and undergo the normal evolution process.
Storage Abstraction
What?
All of this relates to the problem of storage abstraction, i.e. hiding the details of how a storage declaration (a var, let, or subscript) is implemented.
By the implementation of a storage declaration, I mean information like:
- whether the storage is backed by memory,
- the set of accessor functions defined by the storage, and
- the function bodies of those accessors.
Why?
There are currently three reasons why Swift might need to abstract over how storage is implemented:
- The storage might be a protocol requirement, and so Swift has no static knowledge about how it's implemented by the conforming type.
- The storage might be an overridable class member, and so Swift has to assume that the base object is an instance of a subclass which has overridden the storage to use a substantially different implementation.
- The storage might be defined in a binary framework which the current code maintains a stable binary interface to, and so Swift has to assume that the storage's implementation might be different in the version of the library actually present at execution time. (That is, the storage might be "resilient".)
In the future, we may add more reasons to abstract over storage implementations, e.g. by allowing arbitrary storage to be dynamic.
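To make the first two cases concrete, here is a minimal sketch. All type and function names (Titled, Book, Shout, Document, LoggedDocument, rename) are hypothetical and chosen only for illustration; the resilience case is not shown because it needs a separately compiled library.

```swift
// Hypothetical names throughout; only the language features are real.
protocol Titled {
    // Protocol requirement: callers cannot know how `title` is implemented.
    var title: String { get set }
}

struct Book: Titled {
    // Satisfies the requirement with a stored property (backed by memory).
    var title: String
}

struct Shout: Titled {
    private var raw: String
    init(_ raw: String) { self.raw = raw }
    // Satisfies the same requirement with a computed property.
    var title: String {
        get { return raw.uppercased() }
        set { raw = newValue }
    }
}

class Document {
    // Overridable class member: a subclass may replace the implementation.
    var name: String = "untitled"
}

class LoggedDocument: Document {
    var writes = 0
    override var name: String {
        get { return super.name }
        set { writes += 1; super.name = newValue }
    }
}

// Generic code has to go through the abstract accessors of `title`.
func rename<T: Titled>(_ value: inout T, to newTitle: String) {
    value.title = newTitle
}

var book = Book(title: "old")
rename(&book, to: "new")

var shout = Shout("quiet")
rename(&shout, to: "loud")

// The static type is Document, but the access dispatches to the override.
let doc: Document = LoggedDocument()
doc.name = "report"
```

In both rename calls and in the doc.name assignment, the code at the access site is the same regardless of whether the storage turns out to be stored or computed.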
How?
The traditional way of abstracting over storage, familiar from many different languages, is to define a getter and a setter (if mutable). These functions can be synthesized automatically for any reasonable storage implementation, and Swift does this when the accessors are required; for example, if the storage is a simple stored property, the getter returns a copy of the current value of the property and the setter writes its argument value into the property.
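As a rough illustration, the accessors synthesized for a simple stored property behave like the hand-written pair below. The names Settings, ExplicitSettings, and _names are hypothetical, and this is a sketch of the semantics only, not of what the compiler actually emits.

```swift
struct Settings {
    // A simple stored property; Swift synthesizes accessors when required.
    var names: [String] = ["a"]
}

struct ExplicitSettings {
    private var _names: [String] = ["a"]
    // Hand-written equivalent of the getter and setter described above.
    var names: [String] {
        get { return _names }        // returns a copy of the current value
        set { _names = newValue }    // writes the argument into the storage
    }
}

var settings = ExplicitSettings()
var snapshot = settings.names        // the getter hands back an independent value
snapshot.append("b")
let unchanged = settings.names       // the storage has not been touched yet
settings.names = snapshot            // the setter replaces the whole value
```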
However, a getter and a setter aren't very efficient if the storage declaration is actually backed by memory, which is very common. The problem is not that a function call is required: that's a relatively small amount of overhead, and besides, it's essentially unavoidable if we're going to allow the underlying implementation to be an arbitrary computed property. The problem is that forcing the storage to be accessed through this interface may create a large amount of extra work just to satisfy the interface, and the impact is particularly bad if the storage is backed by memory. For example:
- Calling a getter will always copy the value, but the caller may be able to complete its work without needing a separate copy.
- If the storage is of an aggregate value, calling a getter will force the entire value to be copied, but the caller may only wish to copy a small portion of it.
- If the storage is of an aggregate value, calling a setter will always replace the entire value, but the caller may only wish to replace a small portion of it.
- If the caller wishes to read and then modify the current value of the storage (e.g. passing it as an inout argument), it must do the modification on a copy of the current value; there is no way to modify it "in place". This is particularly bad if the value is a copy-on-write structure.
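The last point can be observed directly with a computed property that records its accessor calls. This is a small sketch; AccessLog, Box, and appendFour are hypothetical names introduced for the example.

```swift
// Reference type so that the non-mutating getter can still record a call.
final class AccessLog {
    var events: [String] = []
}

struct Box {
    let log = AccessLog()
    private var storage: [Int] = [1, 2, 3]

    // Storage that is only reachable through a getter and a setter.
    var values: [Int] {
        get { log.events.append("get"); return storage }
        set { log.events.append("set"); storage = newValue }
    }
}

func appendFour(_ array: inout [Int]) {
    array.append(4)
}

var box = Box()
// The inout access calls the getter, mutates the returned value,
// and then calls the setter with the result.
appendFour(&box.values)
```

After the call, the log holds one "get" followed by one "set": the append ran on a value returned by the getter, not on the array held in storage.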
In an effort to address some of these issues (particularly the last one), previous versions of Swift have synthesized a third accessor for mutable storage declarations. This accessor is called materializeForSet, and it is essentially a hacked-in coroutine that yields a mutable pointer. When the storage is just backed by memory, its materializeForSet returns a pointer to that memory. When the storage is instead computed, materializeForSet calls the getter, writes the value into a temporary variable, and yields the address of that variable; it then calls the setter when resumed.
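The computed-storage case can be modelled in ordinary Swift with closures. This is only an approximation of the control flow, not the real materializeForSet ABI; accessThroughTemporary and the celsius example are hypothetical.

```swift
// Hypothetical helper that models the computed-storage path described above.
func accessThroughTemporary<Value>(
    get: () -> Value,
    set: (Value) -> Void,
    _ body: (inout Value) -> Void
) {
    var temporary = get()   // call the getter and store the result in a temporary
    body(&temporary)        // "yield" the temporary to the caller for mutation
    set(temporary)          // on "resume", write the result back with the setter
}

var celsius = 20.0

// A computed Fahrenheit view over the Celsius storage.
accessThroughTemporary(
    get: { celsius * 9 / 5 + 32 },
    set: { celsius = ($0 - 32) * 5 / 9 }
) { fahrenheit in
    fahrenheit += 18
}
```

In the memory-backed case the temporary and the write-back disappear, because the caller is handed the address of the storage itself.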
materializeForSet addresses a lot of the biggest problems with pure getter-setter abstraction, but it's still got some major flaws. A relatively small flaw (at least, for evolution purposes) is that it's pretty hacked-in: it's awkward to generate code for it, and it uses an odd, unsystematic ABI that introduces a fair amount of code bloat and doesn't really fit with some of the structural things we try to do in SIL. The bigger flaw is that it's just for modifications and doesn't really help with the performance issues I mentioned about getters.
For these reasons, I am changing the set of basic accessors used to access abstracted storage. The first change is to replace materializeForSet with a modify coroutine that yields a mutable reference to storage, which is really just an implementation-level improvement. The second change is to conditionally replace the getter with a read coroutine that yields a borrowed value (i.e. a value taken from storage without copying it). These changes give rise to the two new features I mentioned at the top:
Generalized Accessors
The first feature is called Generalized Accessors, and I'm not ready to fully pitch it yet because we're still figuring some things out. Suffice it to say for now that the idea is to allow Swift programmers to directly define the read and modify accessors (materializeForSet has never been implementable in Swift code). This is discussed in some detail in the ownership manifesto.
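For a rough idea of what such accessors might look like, the sketch below uses the underscored _read and _modify spellings that Swift 5 toolchains accept as unofficial stand-ins. That spelling is an assumption made for the example, not the syntax being pitched, and Wrapper is a hypothetical type.

```swift
struct Wrapper {
    private var storage: [Int] = [1, 2, 3]

    var values: [Int] {
        // Yields a borrowed value; the caller copies only if it needs to.
        _read { yield storage }
        // Yields mutable access to the storage so the caller can mutate in place.
        _modify { yield &storage }
    }
}

var wrapper = Wrapper()
wrapper.values.append(4)          // goes through the modify accessor
let count = wrapper.values.count  // goes through the read accessor
```

Each accessor runs up to its yield, hands control to the caller for the duration of the access, and then resumes to finish any cleanup.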
Ownership of Read Values
The second feature is to allow a storage declaration to explicitly control the ownership of a value that's been read out of the storage. For example, does an abstracted access to base.property always produce an owned value or can it produce a borrowed value? On the implementation level, this means: are reads from the abstracted storage implemented by calling a getter or by calling a read coroutine?
There are two reasons why this is important:
- The first reason is that it's semantically critical for move-only types. A var of move-only type that's actually backed by memory cannot be accessed by a getter because the getter, in order to return an owned value, would need to move the value out of the backing memory, leaving it uninitialized. On the other hand, a var of move-only type that's actually implemented with a getter (necessarily creating a new value on every access, which would be strange but not unimaginable) should not generally be accessed with a read coroutine because ownership of the returned value will be irrevocably lost, which is likely contrary to the intent of such an API.
- The second reason is that it matters for performance even when the storage type is copyable. A read coroutine can help avoid a copy, but otherwise it's more expensive to use than a getter because of the need to support the separate phases of a coroutine; this may be worthwhile to avoid copying an Array, but it's overkill to avoid the supposed expense of copying an Int. Also, if the storage is actually implemented with a getter, a read coroutine can't forward ownership out; if the caller really does need its own copy, it'll be forced to copy the yielded borrowed value. And it's quite common for callers to need an independent copy of the value; if they do, and the caller has to perform that copy itself, that's generally worse for code size because most declarations have more call sites than implementations. On the other hand, the caller does generally have more information than the callee (especially with generic code) and can perform the copy more cheaply.
The possible solutions I see here are:
- We can include both a getter and a read coroutine in the set of accessors synthesized for abstracted storage, and then pick one or the other based on how we use the value. To me, this seems untenable because of the code-size impact.
- We can have one accessor and make the decision dynamic by passing an owned-vs.-borrowed flag. I don't think this would really help code size much, if at all, and it'd cause significant problems in SIL.
- We can choose one accessor or the other statically based on the declaration and allow the decision to be overridden with an attribute. For this to be used for resilience, the decision needs to be independent of what accessors are actually defined by the source code.
As reflected in the name of this section, I'm leaning towards the third option but I'm not sure what the right default rule is:
- In the abstract, I think producing a borrowed value is the best default rule, and that we should have some type-based heuristic for deciding that a type is trivial enough that it should always be returned with a getter.
- But in practice I'm worried about the impact of using read coroutines, especially on code size, and especially on properties that are implemented with getters.
The syntax I'm currently leaning towards for declaring the ownership of the returned value is to put it after the colon or (in a subscript) arrow, e.g.:
var title: __owned String { get set }
subscript (index: Int) -> __shared Element { get set }
(Note that these are the stand-in, underscored keywords currently used by the parameter-ownership annotations; that proposal also needs to move forward eventually.)
Another idea would be to make this explicit in the accessors list of a protocol requirement:
var title: String { get set }
subscript (index: Int) -> Element { read set }
But that idea only works for protocol requirements, and it's pretty subtle.
Much like parameter ownership, the language design aspects of this don't actually need to be resolved by Swift 5; we can always change the spelling later. I just want to move towards the right semantic model.