ValueSemantic protocol

dabrahams · November 5, 2020, 8:18pm

Defining Value Semantics for Swift

(with thanks for editorial input to the Swift for TensorFlow Team, and to @anandbits for a crucial insight)

For years we've talked informally about certain types “having value semantics”
and used that statement to draw correct conclusions about the behavior of some
code. But until now, we've never really nailed down what it means for a type to
have value semantics. This document provides a clear definition on which
language and/or library extensions could be based, and explores its
implications.

Requirements of Value Semantic Types

When we say “type X has value semantics,” we mean:

- Each variable of type X has an independent notional value.
- A language-level copy (e.g., let b = a) of a variable of type X has an
  equivalent value.
- Given a local variable a of type X, safe code cannot observe the value of a
  except via an expression that uses a.
- Given a variable a of type X, safe code cannot alter the value of a
  except by one of the following means applied to a or to a property of a
  that reflects all or part of a's value:
  - assignment.
  - invocation of a mutating method.
  - invocation of a mutating accessor of a property or subscript.
  - passing the expression as an inout parameter.
- Concurrent access to the values of distinct variables of type X cannot
  cause a data race.
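
As a rough illustration of the requirements above, here is a minimal sketch. Point, Counter, and bump are hypothetical names made up for this example; the class at the end is only there as a contrast with a type that does not meet the requirements.

```swift
// Hypothetical value-semantic type, for illustration only.
struct Point {
    var x: Int
    var y: Int
    mutating func moveRight(by dx: Int) { x += dx }
}

// Hypothetical helper that mutates its argument through inout.
func bump(_ p: inout Point) { p.y += 1 }

var a = Point(x: 1, y: 2)
let b = a              // language-level copy: b has an equivalent value

a.x = 10               // assignment to a property of a
a.moveRight(by: 5)     // invocation of a mutating method on a
bump(&a)               // a passed as an inout parameter

// Every change above was applied to a, so b's value is unaffected.
print(a.x, a.y)        // 15 3
print(b.x, b.y)        // 1 2

// Contrast: two variables referring to one class instance do not have
// independent values.
final class Counter { var count = 0 }
let c1 = Counter()
let c2 = c1
c2.count += 1
print(c1.count)        // 1
```

The only way the value of a changed was through operations written in terms of a, which is what the third and fourth requirements ask for.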

Safety

Swift's existing definition of safety amounts to “memory safety.”
The above definition says that for value semantic types, some memory-safe
operations (those that can cause value access on distinct instances to race) are
also classified as unsafe. This constraint on the meaning of “safety” does not
break any existing code because no types have yet been classified as value semantic
under this definition.

Corollaries

From the above properties and Swift’s semantics, we can conclude that for a type
with value semantics:

- The value of a let-bound instance is constant for all time.
- Every instance, not only a named variable, has the properties listed above
  for variables.
- Assignment causes the left-hand side to have the same value as the right-hand
  side.
- Any struct, enum, or tuple whose value is composed of the values of its
  stored value-semantic properties also has value semantics.
- The type's value semantics are preserved by any new APIs
  (e.g. extensions) implemented using only safe operations.
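
A small sketch of the composition and extension corollaries, assuming (as is commonly taken for granted) that Int, String, and Array of Int have value semantics. Size, Label, and shout are hypothetical names used only here.

```swift
// Hypothetical types, for illustration only.
struct Size {
    var width: Int
    var height: Int
}

// The value of Label is composed of the values of its stored properties,
// each of which is assumed to have value semantics.
struct Label {
    var text: String
    var size: Size
    var tags: [Int]
}

// A new API written using only safe operations.
extension Label {
    mutating func shout() { text = text.uppercased() }
}

let original = Label(text: "ok", size: Size(width: 3, height: 1), tags: [1, 2])
var copy = original        // copy now has the same value as original

copy.size.width = 40
copy.tags.append(3)
copy.shout()

// original is let-bound, so its value has not changed.
print(original.text, original.size.width, original.tags)  // ok 3 [1, 2]
print(copy.text, copy.size.width, copy.tags)              // OK 40 [1, 2, 3]
```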

Usefulness

Knowing that a type has value semantics is useful because:

- Variables can be safely copied or moved across concurrency boundaries
  without introducing race conditions in code that otherwise appears to be
  safe.
- The value of a non-global variable is immune from “spooky action at a distance.”
- Value-semantic types support equational reasoning.
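
To make the last two points a bit more concrete, here is a sketch using Array of Int, which I'm assuming has value semantics. The function names are hypothetical. It does not attempt to demonstrate the concurrency point.

```swift
// Hypothetical helper: works on its own copy of the input.
func sortedDescending(_ input: [Int]) -> [Int] {
    var local = input
    local.sort(by: >)
    return local
}

// Hypothetical pure function over a value.
func doubledSum(_ xs: [Int]) -> Int {
    return xs.reduce(0, +) * 2
}

let scores = [3, 1, 2]
let ranked = sortedDescending(scores)

// No spooky action at a distance: the callee could not alter scores,
// because nothing was applied to scores itself.
print(scores)  // [3, 1, 2]
print(ranked)  // [3, 2, 1]

// Equational reasoning: alias has the same value as scores, so an
// expression over one can be substituted for the same expression over
// the other.
let alias = scores
print(doubledSum(scores) == doubledSum(alias))  // true
```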

Defining Value

The statement that “X has value semantics” is obviously meaningless without a
concept of the value of an instance of X. This concept must be clear for each
type with value semantics. Likewise, to be useful, the operations that access a
value of an instance of X must be clear. In many ways, the semantics of those
operations constitute a definition of the type's value.

Depending on how narrowly one defines a type's value, any type can be said to
have value semantics. For example, UnsafeMutablePointer can have value
semantics if pointee and the data accessed by subscript (and the operations
that affect them) are excluded from its value but its other members are
included.
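
A sketch of that narrow reading, where the pointer's value is just the address and the pointee is excluded. The function name and the tuple labels are made up for the example; the UnsafeMutablePointer calls are the standard library ones.

```swift
// Hypothetical demo function; wrapped so that cleanup runs via defer.
func pointerValueDemo() -> (sameAddress: Bool, seenThroughP: Int) {
    let p = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    p.initialize(to: 1)
    defer {
        p.deinitialize(count: 1)
        p.deallocate()
    }

    let q = p        // copy: q has an equivalent value (the same address)
    q.pointee = 2    // writes the shared memory; under the narrow
                     // definition this does not change the value of p or q

    // The addresses still compare equal, but the write made through q
    // is visible through p.
    return (p == q, p.pointee)
}

let result = pointerValueDemo()
print(result.sameAddress)   // true
print(result.seenThroughP)  // 2
```

If pointee were counted as part of the value, the write through q would be an alteration of p's value by means not applied to p, and the type would fail the requirements.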

Documenting Value

For types that are consumed only by their authors (e.g. in a quick script), a
clear concept of value can be maintained in the author's brain. For other types,
the value must be documented.

For most types with value semantics, it would be difficult to describe the type
without describing its value, but it could be cumbersome to explicitly and
separately document the value of every such type. Fortunately we can avoid
some of that description by acknowledging the patterns these types usually
follow. I suggest the following convention for types documented as having value
semantics:

Unless explicitly documented otherwise,

- public and internal properties are implicitly part of a type's value.
- Data accessed by public and internal subscripts is implicitly part of a
  type's value.
- private and fileprivate properties and subscripts are implicitly not
  part of the type's value.
- The results of witnesses for any Equatable, Comparable, Hashable, and
  Codable conformances depend entirely on the type's entire value.
- Non-salient attributes of the type (like an Array's capacity) can be
  assumed to be safely accessible on distinct instances, across threads.
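
Here is one way the convention might play out, as a sketch only. The first part uses Array's capacity as the non-salient attribute; Tracked is a hypothetical type with a private property that is kept out of its value by a hand-written ==.

```swift
// Non-salient attribute: capacity is not part of an Array's value.
let small = [1, 2, 3]
var roomy = small
roomy.reserveCapacity(100)
print(small == roomy)          // true
print(roomy.capacity >= 100)   // true

// Hypothetical type following the convention above.
struct Tracked: Equatable {
    var payload: Int               // internal: part of the value
    private var timesRead = 0      // private: not part of the value

    init(payload: Int) { self.payload = payload }

    // Updates private bookkeeping only; the value (payload) is unchanged.
    mutating func read() -> Int {
        timesRead += 1
        return payload
    }

    // The Equatable witness depends on the value and nothing else.
    static func == (lhs: Tracked, rhs: Tracked) -> Bool {
        return lhs.payload == rhs.payload
    }
}

var t1 = Tracked(payload: 7)
let t2 = t1
_ = t1.read()
print(t1 == t2)                // true
```

Note that the synthesized == would have compared timesRead as well, which is why the sketch writes the witness by hand.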

Other APIs probably don't need any special attention in documentation for value
semantics, since it is pretty much impossible to describe their semantics in
documentation without describing how they use, reflect, and/or affect the type's
value.

Deployment in Practice

It follows from the definition of “value semantics” that any APIs that don't
uphold the requirements value semantics imposes on safe operations are unsafe, and
should be labeled as such in public APIs, by the usual Swift convention.

It follows that “Unsafe” in a type name should mean that one is to assume that
operations are unsafe unless otherwise specified. For example, the safe
operations on UnsafePointer should be documented as such to support its claim of
value semantics.

It's important that clients of APIs exposing non-salient attributes know that
these attributes are not part of the instance's value.

Open Questions

I think UnsafePointer should be declared to have value semantics. Whether a
similar thing should be done for UnsafeBufferPointer is an interesting
question; it would imply that there are some Collections whose elements are
not part of their value. Whether that's OK may depend on how extant Collection
algorithms are documented.