On purity grades of value types [was: ValueType protocol]

Alvae · September 20, 2021, 8:31am

I can perhaps paraphrase @dabrahams's definition of value semantics, repeating some of what we've written in our papers.

One way to understand that definition is to first think about what we can and can't do with types that have (mutable) value semantics. More precisely:

If the type T of a variable x has value semantics, then there is no way to observe its value (i.e., read its contents) with an expression that does not mention x.
If the type T of variable x has value semantics, then its value is independent: its contents cannot change due to an operation on another variable.

Both points uphold local reasoning [1] and suggest that reference semantics must not surface. If the value of x can be observed without mentioning x, then there must be a (possibly immutable) reference on x somewhere. Similarly, if the value of x can change via an operation on another variable, then x must share mutable state.

For a theoretical foundation about the idea of value independence, one can look into external uniqueness [2, 3], which I think formally expresses @tera's clean level.

From points 1 and 2, it naturally follows that:

Concurrent access to the values of distinct variables of a type T with value semantics cannot cause a data race.

For a theoretical justification of that assertion, one can look into concurrent separation logic [4], which states that parallel composition of separate environments is race-free. Intuitively, if x has an independent value (in the sense of points 1 and 2), then one can reason locally about its value, without any concern for concurrent execution.

When you add the first point of @dabrahams's definition about notional values, you can derive other interesting properties.

Composition of value types expresses whole/part relationships.
Equality and hashing are well-defined and synthesizable for every value type.

Note for the purists: formally, hashing requires an ad-hoc definition for at least some predefined types, acting as the base cases of an inductive definition.

The fourth point is particularly important, because it answers recurring questions of "depth" in object-oriented languages (e.g., deep vs shallow copy/equality/immutability).

Notional values are what lets us discard capacity when we're talking about the value of an Array, because the notional value of an array is made of its elements. It follows that equality and hashing do indeed derive from the elements contained in the array, not the current state of its internal storage. In other words, the concept of notional value is what allows us to "ignore some bits" in a type's representation. Further, you can represent the notional value of a value type with all sorts of reference shenanigans, as long as you satisfy points 1 and 2. That bit explains why CoW does not violate value semantics.

Note that synthesizability is not a requirement. You can have a value type that does not want to conform to Equatable for any justified reason in a given codebase. The definition only says that (value) equality is well-defined and derives from the type's notional value.

AFAICT, the other points in @dabrahams's definition relate to the mutable part of a value and are more specific to the way Swift expresses mutation (e.g., via assignment, mutating methods, etc.). Indeed, if we outright ban mutation, all types trivially have value semantics and there isn't any useful distinction between structs and classes (as far as value typedness is concerned) [5]. So, we can add the following point:

We use the term mutable value semantics meaning “value semantics with in-place, part-wise mutation.”

The terms "part-wise" and "in-place" are in opposition to pure functional updates, i.e., the fact that any mutation of a variable x can be substituted by the rebinding of x to brand new value. While that approach is perhaps justifiable as a theoretical model for value semantics, it clashes with performance perdictability.

Yes, I think it does.

As it's been made clear already in this thread, substituting struct for class in a type definition is far from sufficient to satisfy value semantics, without even considering unsafe code.

There are multiple ways to unintendedly create references in Swift (which is a bit unfortunate, if you ask me). In particular, global variables and (by default) closure captures have reference semantics. That makes it difficult to establish static rules that would enforce "value typedness" according to the definition I have presented.

Nonetheless, I think that is possible. We would "only" have to be more careful about the way a value can capture references. Swift already proposes one mechanism in that direction, in the form of the @escaping annotation on function parameters. An extension of that idea has been studied the context of Scala [6].

I think that the idea of extending the capability of @escaping (or rather @nonescaping) relates to @tera's fair level.

Purity levels are not in opposition or even orthogonal to the definition of value semantics above. Nonetheless, I am not yet convinced that these distinctions are as useful as they may seem, at least for the user. IMHO, local reasoning is enough.

I also fear that these levels might fail to accurately characterize all possible implementations of a value type. For example, it looks like implementing CoW would immediately put a type into dirty territory. It could be tempting to add yet another level, but that's a slippery slope.

[1] Peter W. O'Hearn et al.: Local Reasoning about Programs that Alter Data Structures. CSL, 2001
[2] Dave Clarke, Tobias Wrigstad: External Uniqueness Is Unique Enough. ECOOP, 2003
[3] Philipp Haller, Martin Odersky: Capabilities for Uniqueness and Borrowing. ECOOP, 2010
[4] Stephen Brookes, Peter W. O'Hearn: Concurrent separation logic. ACM SIGLOG, 2016
[5] Dimitri Racordon, Didier Buchs: Featherweight Swift: a Core calculus for Swift's type system. SLE, 2020
[6] Leo Osvald et al:. Gentrification gone too far? affordable 2nd-class values for fun and (co-)effect. OOPSLA, 2016