ValueSemantic protocol

dabrahams · November 2, 2020, 3:53am

In a recent thread on actor isolation we started discussing a ValueSemantic protocol, and what it might mean:

I'm opening this thread for that topic.

Paul_Cantrell · November 2, 2020, 6:28am

In a comment thread on Chris Lattner’s ActorSendable pitch, Dave Abrahams raised an interesting question: what does “value semantics” mean?

I also have to go over the argument by which Joe [Groff] has convinced me several times that value semantics is not a rigorous concept first! … It boils down mostly to the claim that the interesting property is functional purity (a version that accommodates inout) rather than anything about types, because e.g. you can always use an Int to index some global storage. The Int has value semantics, but you still produce reference semantics. There's also the fact that an UnsafePointer is totally a Value Semantic type if used only for its identity. I always leave these discussions feeling like I've just been bamboozled by his dry wit, but without a solid grasp on an answer.

I replied there that I thought I could give a rigorous enough definition, but also thought Joe would probably unravel it in two breaths. Since we’ve opened this thread…well, here’s my attempt; @Joe_Groff, do your worst!

The intuitive-but-leaky definition for Swift:

Given some variable x of type T, if T has value semantics, then it is impossible for any code to alter any visible behavior of x unless that code actually mentions x.

(Asterisk: this all goes for memory-safe code only.)

So, for example:

var x = A()
var y = x
y.mutateWildly()  // [1]
x.doStuff()       // [2]

If A is a mutable class, for example, then removing [1] could potentially alter the behavior of [2]; we’d have to understand exactly what mutateWildly and doStuff actually do.

However, if the type A has value semantics, then [1] cannot affect the results of [2] because line [1] does not mention x. I think that’s the intuition we’re trying to capture.
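A minimal sketch of that intuition. The types `PointStruct` and `PointClass` are hypothetical stand-ins for `A` and are not from the thread:

```swift
// Hypothetical stand-ins for `A`: one struct, one mutable class.
struct PointStruct {
    var x = 0
    mutating func mutateWildly() { x += 100 }
}

final class PointClass {
    var x = 0
    func mutateWildly() { x += 100 }
}

// Struct: `s2` is an independent copy, so mutating it cannot affect `s1`.
let s1 = PointStruct()
var s2 = s1
s2.mutateWildly()

// Class: `c2` aliases the same object, so the mutation shows up through `c1`.
let c1 = PointClass()
let c2 = c1
c2.mutateWildly()

print(s1.x) // 0
print(c1.x) // 100
```

In the struct case, the line that mutates `s2` never mentions `s1` and cannot change what `s1` does. In the class case, the equivalent line changes `c1` without naming it.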

Joe’s as-summarized-by-Dave remarks raise several challenges to that:

What if code uses an Int as a reference into some global storage?

OpenGL aside

Anybody who’s worked with OpenGL knows the…um, joys of ints that are secretly pointers to complex data structures. Ask me about the bug I once encountered where a graphics library assumed that if glRefA == glRefB then they refer to the same object, causing a bug that only happened in release builds because of ARC reordering. F U N.

Doesn’t an object reference have value semantics? You just copy the reference around as a value, right?
For that matter, don’t pointers have value semantics? A pointer is just a value, after all; is it the value’s fault that the memory it refers to is changing?

To answer this, I’d sharpen my definition above as follows:

Let x, y, z,… with types X, Y, Z,… be all the variables touched by a given Swift expression E, where “touched” means:
- any variable that is lexically present (i.e. appears in the actual source code) for the expression,
- plus the implicit variable memory: Memory for a dereference of any Swift *Pointer type, where Memory does not have value semantics by definition,
- plus all the global variables touched in the transitive closure over function calls (and subscripts, getters, etc.) of E.
If all the types X, Y, Z,… have value semantics, then there does not exist any expression E’ that touches none of x, y, z… but whose insertion before E alters the result of E.
A type T has value semantics if the statement above is true for all possible E which touch a variable of T. (I have a logic mutual recursion problem here, but I think the intent is clear and it’s bedtime in Minneapolis.)

That addresses the challenges as follows:

If E uses some Int as an index into global storage, then E touches that global storage — as does any E’ that alters it. Creating reference-like behavior using an Int thus does not upset the value semantics of Int itself, because the reference-like behavior also requires touching the global storage.
The definition accounts for the effects of aliasing objects, making it clear that mutable class types do not have value semantics.
The implicit memory variable rules out value semantics for pointer dereferences by definition.
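The first point can be sketched roughly as follows. The global `globalStorage` and the constant `handle` are assumptions made up for illustration:

```swift
// Hypothetical global storage that Ints index into (the name is an assumption).
var globalStorage = ["zero", "one", "two"]

let handle = 1  // an Int used like a reference

// E: touches `handle` and, through the subscript, `globalStorage`.
let before = globalStorage[handle]

// E': never mentions `handle`, but it does touch `globalStorage`.
globalStorage[1] = "uno"

// E again: the result changed, yet `handle` itself is still just 1.
let after = globalStorage[handle]

print(before, after, handle) // one uno 1
```

Because both E and E' touch `globalStorage`, the changed result does not count against Int under the sharpened definition; the reference-like behavior lives in the global, not in the Int.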

That’s hardly a pleasant or intuitive definition of value semantics for popular consumption, and probably has a hole or two in it, but then it’s also something that this redhead just sketched out in one sitting instead of going to bed on time.

Hopefully it at least makes the case that there is some rigorous idea lurking here. Joe, pop my happy bubble!

dabrahams · November 2, 2020, 6:44am

I've avoided this topic for long enough, so let's do this.

Let's start with a plain English definition:

Definition:
A type has value semantics when mutation of one variable of that type can never be observed through a different variable of the same type.

This applies to more than just variables (e.g. you can't observe a mutation to a variable through a let; you can't observe a mutation of one Array element from another), but the semantics of Swift ensures that a type meeting the definition also covers those cases.
By this definition, immutable types have value semantics trivially. If you want to emphasize that you're interested in more than just immutable data, you can call it mutable value semantics, but I'm going to leave out “mutable” in this discussion.

Formalizing this definition further for Swift is hard, since what it means to "mutate a variable" and "observe something through a variable" is tough to describe, especially when it comes to classes.

Collapsed the part that is mostly wrong

I think you need a notion of “purity” that allows mutation, but that is hard to nail down. I can describe it roughly as:

Definition
A function is value-pure if and only if it:

calls no methods on class instances, and

mutates no global variables, and

calls no functions or methods that aren't value-pure.

Then you can say a type T has value semantics if and only if it's impossible to write a unary value-pure function whose result when passed a variable b of type T changes after being passed a variable a of that type:

var a, b: T
...
let r0 = f(&b)
_ =      f(&a)
let r1 = f(&b) // r0 must be equivalent to r1 for all possible value-pure functions f

Among the weaknesses in this description, besides the ones I've undoubtedly missed:

it doesn't work for immutable class instances.
“equivalent” isn't well-defined, and we're not going to require Equatable.
I've relied on a notion of mutation in defining value-pure, which is circular.
the value-pure functions you can write depend on the scope you're in; e.g. I could certainly violate value semantics from inside the implementation of Array<Int>, but it still has value semantics.
you probably want a back door for “unsafe” functions (I just keep thinking of new bullets to add!)

It's possible someone can figure out how to close these holes, but in the end it might just be better to stick with the plain English definition of what ValueSemantic means.

cukr · November 2, 2020, 7:36am

With that definition, if I make a type whose interesting methods are all impure, it would count as a type with value semantics :(

class Ref<T> {
    var value: T
    init(value: T) { self.value = value }
}
struct T {
    private var ref = Ref(value: 0)
    func impureFunc() -> Int {
        ref.value += 1
        return ref.value
    }
}
// doesn't count, this func isn't pure
func f(_ arg: inout T) -> Int {
    return arg.impureFunc()
}
// doesn't count, doesn't change when you pass in `a`
func f(_ arg: inout T) -> Int {
    return 0
}

dabrahams · November 2, 2020, 8:08am

Well, there ya go; you're quite right. Like I said, it's very hard to formalize further. We could say that a value-pure function is allowed to use any accessible method on the type (which makes me realize that the property I'm looking for isn't about purity at all), but then there's nothing special about methods if any of the type's stored properties are fileprivate.

cukr · November 2, 2020, 8:25am

I feel like "reference/value semantics" is not something that could be formalized. It depends on the interpretation by the user.

Does UnsafePointer have value or reference semantics? It depends on if you look at it as an address to some memory, or as the reference to the pointee.

Does String have value or reference semantics? It depends on if you look at it as some text, or as an obj-c selector that references a method.

Does "the sentence above" have value or reference semantics? It depends on if you look at it as a chain of 18 ASCII characters, or as a thing written in English that talks about a sentence about Strings and obj-c selectors.

BigSur · November 2, 2020, 8:28am

Or we could call it value-reference duality ~

Avi · November 2, 2020, 9:03am

I think it helps to think of value semantics as follows:

If multiple variables are derived from a common source, they exhibit value semantics if they behave as if they were independent of each other and their source.

A trivial example is String:

var source = "A really, really, long source string"
var a = source, b = source

All 3 of these variables share the same backing store, but they behave as if they were independent -- changing one variable (e.g. a += "more text") has no visible impact on either source or b.

By this definition, an UnsafeMutablePointer does not have value semantics. Two such pointers derived from a common source will not behave as if they were independent. That is, there are actions that can be performed with each instance that will be visible through the other instances.

Structs which contain reference types do not have value semantics, unless they take care, such as Array and String do, to create (deep) copies when mutations are triggered.
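A rough sketch of that "taking care" using the standard library's `isKnownUniquelyReferenced(_:)`. The names `Storage` and `CowBox` are hypothetical, and this is only an illustration of the copy-on-write idea, not how Array or String are actually implemented:

```swift
// Hypothetical reference-typed backing store.
final class Storage {
    var items: [Int]
    init(items: [Int]) { self.items = items }
}

// Hypothetical struct that wraps a reference but copies it before mutating.
struct CowBox {
    private var storage: Storage

    init(_ items: [Int]) { storage = Storage(items: items) }

    var items: [Int] { storage.items }

    mutating func append(_ item: Int) {
        // Copy the backing object first if any other value still shares it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(items: storage.items)
        }
        storage.items.append(item)
    }
}

let source = CowBox([1, 2])
var a = source
let b = source
a.append(3)

print(source.items, a.items, b.items) // [1, 2] [1, 2, 3] [1, 2]
```

Without the uniqueness check, `a.append(3)` would write through the shared `Storage` object and the change would be visible through `source` and `b`.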

cukr · November 2, 2020, 10:37am

@Avi I am interested in what your intuition is. Does adding the following extension to Int mean it no longer has value semantics?

extension Int {
    static var globalValues: [Int:String] = [:]
    var pointee: String {
        get {
            Self.globalValues[self, default: ""]
        }
        nonmutating set {
            Self.globalValues[self] = newValue
        }
    }
}

Avi · November 2, 2020, 11:32am

Interesting example. By the rule I laid out, that would cause Int not to have value semantics. Once you start forming exceptions, there's no end. Much like every other definition that has been proffered.

We could talk about how certain properties are not included, but that's just the opening to a very deep and convoluted rabbit warren.

We could talk about how the type is used. In this example, you've added an arbitrary property that has nothing to do with Ints. It doesn't affect the mathematical properties, nor how two Ints equate or compare. We could say the property thus doesn't count. However, see previous paragraph.

I would say, in the end, that the only useful position to take is context-dependent. If code is known not to cause visible changes in other instances of the type, the code exhibits value semantics with respect to that type. Some types will always behave this way, but some do not, and thus depend on the context in which they are used.

michelf · November 2, 2020, 12:40pm

Perhaps it’s time to revisit my old attempt at formalizing value semantics through purity. What I found out is that value semantics is something a function decides to preserve, or break, depending on what it accesses. So you end up having to classify functions into value-semantics-preserving ones, called pure in the above, and others which are not preserving value semantics.

If you attach value semantics to a type rather than a function, adding an extension could break value semantics, or subclassing a class could break value semantics. Even without extensions: Array breaks value semantics by exposing its capacity, and a class breaks it by making its address observable and comparable (you can’t clone it without making it different at the address level).

And then it depends on the application: if you sort class references but only look at the address, which you treat as opaque (never dereferencing the pointer), then you have value semantics: the values are the addresses, even though they can lead you to some mutable state elsewhere. If you look only at the characters in an immutable NSString without looking at its address, you’re preserving value semantics too; you’re just looking at a different kind of value it carries. There’s a duality here: the same NSString contains two kinds of values that, when mixed, appear to break value semantics. This applies to Array too if you look at its capacity field.
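A small sketch of the two kinds of value a class instance carries. The class `Note` is a hypothetical stand-in for something like an immutable NSString:

```swift
// Hypothetical immutable class standing in for something like NSString.
final class Note {
    let text: String
    init(text: String) { self.text = text }
}

let original = Note(text: "same")
let clone = Note(text: original.text)

// Content view: the clone is indistinguishable from the original.
let sameContent = original.text == clone.text

// Identity view: the clone is observably a different object.
let sameIdentity = original === clone

// Treating identities as opaque values (never looking through them).
let identities = Set([
    ObjectIdentifier(original),
    ObjectIdentifier(clone),
    ObjectIdentifier(original),
])

print(sameContent, sameIdentity, identities.count) // true false 2
```

Code that sticks to one view sees consistent value-like behavior; it is only when the content view and the identity view are mixed that the clone stops being interchangeable with the original.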

There is a subset of value semantics that is useful for concurrency which consists of determining whether we’re accessing shared state. I’d suggest we focus on that instead, as it is much less subjective and does not have superimposed meanings depending on how you look at the type in a particular context.

Avi · November 2, 2020, 12:56pm

Can you explain what you mean here? If I create a copy of an array, the copy's capacity is unaffected by changes to the original, and vice versa.

Lantua · November 2, 2020, 1:00pm

There's one scenario I find interesting:

final class X {
  private var state = ...

  func random() -> Int {
    synchronization.sync {
      // mutate state
      return randomNumberBasedOnState
    }
  }
}

There's no reliance on X being unique, and we'd perceive the same behaviour (i.e. random) whether or not X is also called from another class. For all intents and purposes, you can reason about X here the way you'd expect to for a type with value semantics.

michelf · November 2, 2020, 1:46pm

Say you mutate the array by removing all elements. The capacity of the resulting array value will differ depending on whether the storage is uniquely referenced or not. If uniquely referenced, capacity will stay the same (provided you don’t use keepingCapacity: false in your call to removeAll). If not, it’ll snap back to zero. So you can easily observe whether the array storage is uniquely referenced.

Avi · November 2, 2020, 2:18pm

I don't see this behavior.

var a = [0, 2, 4]
var b = a //[1]

a.capacity
b.capacity

a.removeAll()

a.capacity
b.capacity

Whichever variant of var b = ... I use, the last a.capacity is 0.

michelf · November 2, 2020, 2:38pm

I was mistaken on the details, but this works:

var a = [0, 2, 4]
var b = a // remove this line to change behavior
print(a.capacity)
a.removeAll(keepingCapacity: true)
print(a.capacity)

I thought it’d be zero, but instead it reallocates with a different size (3 -> 4), which still lets you observe the storage reference isn’t unique.

There are also similar ways to get this behavior by removing and appending elements.

xwu · November 2, 2020, 2:51pm

In this example, you’re observing the value of a after mutating a. This particular operation clearly isn’t “pure,” but in the same sense that selecting a random element of a isn’t pure. But in terms of value semantics, no changes to b observably affect the value of a, correct?

Joe_Groff · November 2, 2020, 2:53pm

Well, we don't want optimization to be bound to maintain the observed capacity of values after high-level transformations like constant folding of array operations. It's not really part of the "value" per se. If one models purity/value semantics at the level of individual operations, then I think it'd make sense to treat capacity as impure, even if it doesn't really modify state, because "pure" value semantics code should not change behavior given values that only differ in capacity.
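A hedged sketch of the distinction being drawn here: element values are independent across copies, while capacity sits outside the "value". The variable names are illustrative, and the exact capacity numbers are deliberately not shown because they depend on the standard library implementation:

```swift
var a = [0, 2, 4]
var b = a

// Mutating `b` never changes the elements observed through `a`.
b.removeAll(keepingCapacity: true)
b.append(99)

// `c` has the same elements as `a` but a deliberately different capacity.
var c = [0, 2, 4]
c.reserveCapacity(64)

print(a)                 // [0, 2, 4]
print(b)                 // [99]
print(a == c)            // true: equality ignores capacity
print(c.capacity >= 64)  // true: reserveCapacity guarantees at least this much
```

Code that only relies on what `==` can see behaves the same for `a` and `c`; code that reads `capacity` can tell them apart.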

Paul_Cantrell · November 2, 2020, 3:02pm

I believe that both the “intuitive-but-leaky” and sharpened definitions in my post above address this.

@cukr, using your T, given this code:

var x = T()
var y = x
// [1]
f(&x) // [2]

…then under my definition, [2] only touches x. The expression f(&y) only touches y, but inserting it at [1] changes the result of [2]. Therefore T does not have value semantics.
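That argument can be checked with a runnable sketch. It reuses cukr's types, with `T` renamed to a hypothetical `Counter` so it does not share a name with the generic parameter, and the two runs use separate variables:

```swift
// cukr's reference box (generic parameter renamed for clarity).
final class Ref<Value> {
    var value: Value
    init(value: Value) { self.value = value }
}

// cukr's `T`, renamed to the hypothetical `Counter`.
struct Counter {
    private var ref = Ref(value: 0)
    func impureFunc() -> Int {
        ref.value += 1
        return ref.value
    }
}

func f(_ arg: inout Counter) -> Int {
    return arg.impureFunc()
}

// Run 1: nothing inserted at [1].
var x1 = Counter()
let withoutInsertion = f(&x1)   // [2]

// Run 2: f(&y) inserted at [1]. It never mentions x, yet it changes [2].
var x2 = Counter()
var y2 = x2
_ = f(&y2)                      // [1]
let withInsertion = f(&x2)      // [2]

print(withoutInsertion, withInsertion) // 1 2
```

The copy `y2` shares the same `Ref` object as `x2`, so an expression that only touches `y2` alters the result of an expression that only touches `x2`.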

Paul_Cantrell · November 2, 2020, 3:08pm

My sharpened definition addresses all of these (even the last one) by dealing with context. Under that definition, UnsafePointer does not have value semantics, and String does.

To your last example, which is a lovely illustration of @Joe_Groff’s point (as I understand it) that you can create reference semantics using values, the string “the sentence above” does have value semantics under my definition, because to give it the referential interpretation that may produce different text, you also need to touch the text it references.

The trick of my sharpened definition is to look at whole interactions of multiple variables instead of values in isolation: reference-producing behavior requires something to refer into, and the definition accounts for that.

This example:

michelf:

var a = [0, 2, 4]
var b = a // remove this line to change behavior
print(a.capacity)
a.removeAll(keepingCapacity: true)
print(a.capacity)
I thought it’d be zero, but instead it reallocates with a different size (3 -> 4), which still lets you observe the storage reference isn’t unique.

There are also similar ways to get this behavior by removing and appending elements.

…and this response:

…suggests that we may indeed need to take a Liskov-substitution-like view that the properties in question are in the eye of the beholder.

That’s problematic, though I like Joe’s answer of treating value semantics as applying to some subset of a type’s surface:

This neatly addresses @cukr’s example: you can’t poison a value type by adding an extension method that uses a global, because your extension method isn’t really part of the value-typed surface.

To the other half of Michel’s example:

…I would say that subclassing a class can break value semantics, just as subclassing can break immutability, and thus a class can’t have value semantics unless it prohibits subclassing.

(The reason for this subclass vs extension distinction is that existing code can’t encounter an extension method it didn’t know about at compile time, but can encounter an overridden method it didn’t know about.)
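A minimal sketch of the subclassing half of that distinction. The classes `Label` and `CountingLabel` and the function `renderTwice` are hypothetical names invented for illustration:

```swift
// Hypothetical class that looks immutable but permits subclassing.
class Label {
    let text: String
    init(text: String) { self.text = text }
    func rendered() -> String { text }
}

// A subclass the existing code never knew about adds mutable state.
final class CountingLabel: Label {
    private var calls = 0
    override func rendered() -> String {
        calls += 1
        return "\(text) #\(calls)"
    }
}

// Existing code written against `Label` alone.
func renderTwice(_ label: Label) -> (String, String) {
    let alias = label
    return (label.rendered(), alias.rendered())
}

let plain = renderTwice(Label(text: "hi"))
let counted = renderTwice(CountingLabel(text: "hi"))

print(plain.0, plain.1)     // hi hi
print(counted.0, counted.1) // hi #1 hi #2
```

`renderTwice` was compiled knowing only `Label`, yet dynamic dispatch routes it to the override. An extension method, by contrast, would only run if `renderTwice` called it by name.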