ValueSemantic protocol

I feel like "reference/value semantics" is not something that could be formalized. It depends on the interpretation by the user.

Does UnsafePointer have value or reference semantics? It depends on whether you look at it as an address to some memory, or as a reference to the pointee.

Does String have value or reference semantics? It depends on whether you look at it as some text, or as an obj-c selector that references a method.

Does "the sentence above" have value or reference semantics? It depends on whether you look at it as a chain of 18 ASCII characters, or as a phrase written in English that refers to a sentence about Strings and obj-c selectors.

Or we could call it value-reference duality ~

I think it helps to think of value semantics as follows:

If multiple variables are derived from a common source, they exhibit value semantics if they behave as if they were independent of each other and their source.

A trivial example is String:

var source = "A really, really, long source string"
var a = source, b = source

All 3 of these variables share the same backing store, but they behave as if they were independent -- changing one variable (e.g. a += "more text") has no visible impact on either source or b.

By this definition, an UnsafeMutablePointer does not have value semantics. Two such pointers derived from a common source will not behave as if they were independent. That is, there are actions that can be performed with each instance that will be visible through the other instances.
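
A minimal sketch of the pointer case, assuming a single heap-allocated Int; the names source, p and q are only for illustration:

```swift
// Two pointers copied from a common source share one pointee.
let source = UnsafeMutablePointer<Int>.allocate(capacity: 1)
source.initialize(to: 1)

let p = source
let q = source

// A write through p is visible through q and through source,
// so the three copies do not behave as if they were independent.
p.pointee = 42
let seenThroughQ = q.pointee
let seenThroughSource = source.pointee

// Clean up the allocation.
source.deinitialize(count: 1)
source.deallocate()
```

Both seenThroughQ and seenThroughSource end up as 42, even though only p was written through.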

Structs which contain reference types do not have value semantics unless they take care, as Array and String do, to create (deep) copies when mutations are triggered.
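
As a rough sketch of that "taking care", here is one way a struct wrapping a class could copy on write. Storage and Bag are hypothetical names, and this is not how Array or String are actually implemented:

```swift
// Reference-typed backing storage, hidden from clients of Bag.
final class Storage {
    var items: [Int]
    init(items: [Int]) { self.items = items }
}

struct Bag {
    private var storage = Storage(items: [])

    var items: [Int] { storage.items }

    mutating func add(_ item: Int) {
        // Copy the storage first if another Bag still shares it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(items: storage.items)
        }
        storage.items.append(item)
    }
}

var original = Bag()
original.add(1)
var copy = original   // shares storage until one side mutates
copy.add(2)
// original.items is [1]; copy.items is [1, 2]
```

Without the uniqueness check, copy.add(2) would also be visible through original, which is exactly the failure described above.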

@Avi I'm interested in your intuition here. Does adding the following extension to Int mean it no longer has value semantics?

extension Int {
    static var globalValues: [Int:String] = [:]
    var pointee: String {
        get {
            Self.globalValues[self, default: ""]
        }
        nonmutating set {
            Self.globalValues[self] = newValue
        }
    }
}

Interesting example. By the rule I laid out, that would cause Int not to have value semantics. Once you start forming exceptions, there's no end. Much like every other definition that has been proffered.
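
To make the failure concrete, here is a small sketch that repeats the extension and exercises it through two copies. It assumes Swift 5 language mode, since stricter concurrency checking rejects an unprotected static var; x, y and observed are illustrative names:

```swift
extension Int {
    static var globalValues: [Int: String] = [:]
    var pointee: String {
        get { Self.globalValues[self, default: ""] }
        nonmutating set { Self.globalValues[self] = newValue }
    }
}

let x = 7
let y = x

// A write through x is visible through y, because both copies
// index into the same global dictionary.
x.pointee = "written through x"
let observed = y.pointee
```

Here observed is "written through x", although y was never touched.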

We could talk about how certain properties are not included, but that's just the opening to a very deep and convoluted rabbit warren.

We could talk about how the type is used. In this example, you've added an arbitrary property that has nothing to do with Ints. It doesn't affect the mathematical properties, nor how two Ints equate or compare. We could say the property thus doesn't count. However, see previous paragraph.

I would say, in the end, that the only useful position to take is context-dependent. If code is known not to cause visible changes in other instances of the type, the code exhibits value semantics with respect to that type. Some types will always behave this way, but some do not, and thus depend on the context in which they are used.

Perhaps it’s time to revisit my old attempt at formalizing value semantics through purity. What I found out is that value semantics is something a function decides to preserve, or break, depending on what it accesses. So you end up having to classify functions into value-semantics-preserving ones, called pure in the above, and others which are not preserving value semantics.

If you attach value semantics to a type rather than a function, adding an extension could break value semantics, or subclassing a class could break value semantics. Even without extensions: Array breaks value semantics by exposing its capacity, and a class breaks it by making its address observable and comparable (you can’t clone it without making it different at the address level).

And then it depends on the application: if you sort class references but only look at the address which you treat as opaque (never dereferencing the pointer), then you have value semantics: the values are the addresses, even though they can lead you to some mutable state elsewhere. If you look only at the characters in an immutable NSString without looking at its address, you’re preserving value semantics too; you’re just looking at a different kind of value it carries. There’s a duality here: the same NSString contains two kinds of values that, when mixed, appear to break value semantics. This applies to Array too if you look at its capacity field.
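
A small sketch of the "addresses as values" reading, using ObjectIdentifier as the opaque address; Node and the variable names are hypothetical:

```swift
final class Node {
    var label: String
    init(label: String) { self.label = label }
}

let a = Node(label: "a")
let b = Node(label: "b")

// Treat each reference purely as an opaque identity value.
let ids = [ObjectIdentifier(a), ObjectIdentifier(b), ObjectIdentifier(a)]
let sortedIDs = ids.sorted()      // ObjectIdentifier is Comparable
let distinctCount = Set(ids).count

// Mutating the objects behind the references does not change
// any of the identity values or their ordering.
a.label = "changed"
let orderUnchanged = ids.sorted() == sortedIDs
```

As long as the code never follows an identifier back to its object, nothing done to a or b is observable through ids.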

There is a subset of value semantics that is useful for concurrency which consists of determining whether we’re accessing shared state. I’d suggest we focus on that instead, as it is much less subjective and does not have superimposed meanings depending on how you look at the type in a particular context.

Can you explain what you mean here? If I create a copy of an array, the copy's capacity is unaffected by changes to the original, and vice versa.

There's one scenario I find interesting:

final class X {
  private var state = ...

  func random() -> Int {
    synchronization.sync {
      // mutate state
      return randomNumberBasedOnState
    }
  }
}

There's no reliance on X being unique, and we'd perceive the same behaviour (i.e. random) whether or not X is called from another class. For all intents and purposes, you can reason about X here the way you'd expect to reason about value semantics.
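
For anyone who wants to run something like this, here is one possible concrete version. The NSLock, the LCG constants, and the name SharedGenerator are all assumptions made for illustration, not what the snippet above elides:

```swift
import Foundation

final class SharedGenerator {
    private var state: UInt64 = 1
    private let lock = NSLock()

    func random() -> Int {
        lock.lock()
        defer { lock.unlock() }
        // Advance the state with a linear congruential step
        // (wrapping arithmetic; the constants are only illustrative).
        state = state &* 6364136223846793005 &+ 1442695040888963407
        return Int(state >> 33)
    }
}

let generator = SharedGenerator()
let alias = generator          // a second reference to the same object
let first = generator.random()
let second = alias.random()
```

A caller cannot tell from the results whether anyone else holds a reference to the same generator, which is the point being made.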

Say you mutate the array by removing all elements. The capacity of the resulting array value will differ depending on whether the storage is uniquely referenced or not. If uniquely referenced, capacity will stay the same (provided you don’t use preserveCapacity: false in your call to removeAll). If not, it’ll snap back to zero. So you can easily observe whether the array storage is uniquely referenced.

I don't see this behavior.

var a = [0, 2, 4]
var b = a //[1]

a.capacity
b.capacity

a.removeAll()

a.capacity
b.capacity

Whichever variant of var b = ... I use, the last a.capacity is 0.

I was mistaken on the details, but this works:

var a = [0, 2, 4]
var b = a // remove this line to change behavior
print(a.capacity)
a.removeAll(keepingCapacity: true)
print(a.capacity)

I thought it’d be zero, but instead it reallocates with a different size (3 -> 4), which still lets you observe that the storage reference isn’t unique.

There are also similar ways to get this behavior by removing and appending elements.

In this example, you’re observing the value of a after mutating a. This particular operation clearly isn’t “pure,” but in the same sense that selecting a random element of a isn’t pure. But in terms of value semantics, no changes to b observably affect the value of a, correct?

Well, we don't want optimization to be bound to maintain the observed capacity of values after high-level transformations like constant folding of array operations. It's not really part of the "value" per se. If one models purity/value semantics at the level of individual operations, then I think it'd make sense to treat capacity as impure, even if it doesn't really modify state, because "pure" value semantics code should not change behavior given values that only differ in capacity.

I believe that both the “intuitive-but-leaky” and sharpened definitions in my post above address this.

@cukr, using your T, given this code:

var x = T()
var y = x
// [1]
f(x) // [2]

…then under my definition, [2] only touches x. The expression f(y) only touches y, but inserting it at [1] changes the result of [2]. Therefore T does not have value semantics.
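
A runnable sketch of that test, with a hypothetical stand-in for T (a struct whose copies share a mutable box) and a hypothetical f; neither is the actual type or function from the earlier post:

```swift
// Stand-in T: copies of the struct share one mutable Box.
final class Box { var count = 0 }

struct T {
    let box = Box()
}

// Stand-in f: bumps and returns the shared counter.
func f(_ t: T) -> Int {
    t.box.count += 1
    return t.box.count
}

// Without anything at [1]:
let x = T()
let y = x
let withoutInsert = f(x)   // [2]

// With f(y) inserted at [1]:
let x2 = T()
let y2 = x2
_ = f(y2)                  // [1]
let withInsert = f(x2)     // [2]
```

withoutInsert is 1 and withInsert is 2, so an operation that only touches y changes the result of an operation that only touches x.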

My sharpened definition addresses all of these (even the last one) by dealing with context. Under that definition, UnsafePointer does not have value semantics, and String does.

To your last example, which is a lovely illustration of @Joe_Groff’s point (as I understand it) that you can create reference semantics using values, the string “the sentence above” does have value semantics under my definition, because to give it the referential interpretation that may produce different text, you also need to touch the text it references.

The trick of my sharpened definition is to look at whole interactions of multiple variables instead of values in isolation: reference-producing behavior requires something to refer into, and the definition accounts for that.


This example:

…and this response:

…suggest that we may indeed need to take a Liskov-substitution-like view that the properties in question are in the eye of the beholder.

That’s problematic, though I like Joe’s answer of treating value semantics as applying to some subset of a type’s surface:

This neatly addresses @Lantua’s example: you can’t poison a value type by adding an extension method that uses a global, because your extension method isn’t really part of the value-typed surface.

To the other half of Michel’s example:

…I would say that subclassing a class can break value semantics, just as subclassing can break immutability, and thus a class can’t have value semantics unless it prohibits subclassing.

(The reason for this subclass vs extension distinction is that existing code can’t encounter an extension method it didn’t know about at compile time, but can encounter an overridden method it didn’t know about.)
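
A sketch of the subclass half of that distinction, assuming Swift 5 language mode (the global counter would be rejected under stricter concurrency checking). Temperature, CountingTemperature and report are hypothetical names:

```swift
class Temperature {
    let degrees: Int
    init(degrees: Int) { self.degrees = degrees }
    func describe() -> String { "\(degrees) degrees" }
}

// Existing code, written knowing only about Temperature.
func report(_ t: Temperature) -> String { t.describe() }

// A later subclass routes describe() through shared mutable state.
var callCount = 0
final class CountingTemperature: Temperature {
    override func describe() -> String {
        callCount += 1
        return "\(degrees) degrees (call \(callCount))"
    }
}

let t = CountingTemperature(degrees: 20)
let firstReport = report(t)
let secondReport = report(t)
```

report never mentions the subclass, yet dynamic dispatch makes it return "20 degrees (call 1)" and then "20 degrees (call 2)" for the same immutable-looking instance. An extension method could not be reached from report this way.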

I think it's crucial to observe at the outset of this discussion that this ValueSemantic protocol is a refinement of ActorSendable. The refinement introduces an additional semantic requirement: internal serialization of mutation is not sufficient. Instead, copy-on-write must be used for all heap-allocated data.

I have long maintained that there is a useful notion of "value" that is closely related to (but independent of) the notion of referentially transparent pure functions. As has already been pointed out, many "value" types include members that are not referentially transparent: Array.capacity, Date.init, UUID.init, Int.random(in:), etc. The relationship as I view it is that "value" types predominantly contain pure members. Impure members are the exception to the rule.
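
A quick illustration of the distinction, using two of the members named above next to a pure one; the variable names are only for the example:

```swift
import Foundation

// Pure member: the same input always gives the same result.
let sumA = [1, 2, 3].reduce(0, +)
let sumB = [1, 2, 3].reduce(0, +)

// Impure members: evaluating the same expression twice can differ.
let idA = UUID()
let idB = UUID()
let idsDiffer = idA != idB          // true in practice

let rollA = Int.random(in: 1...6)
let rollB = Int.random(in: 1...6)   // may or may not equal rollA
```

UUID and Int are still "value" types in the sense used here; it is only these particular members that are not referentially transparent.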

We must acknowledge and embrace this distinction between "value"-ness and purity if we are going to have any hope of providing clear semantics for this protocol. A correct conformance to a protocol can only depend on code that is under control of the author of the conformance. As has already been pointed out, anyone can add a trivial impure extension to any type. For this reason, purity of members cannot be considered a necessary condition of a valid conformance. Further, I argue that if the semantics of this protocol cannot depend on the definition of purity then it isn't necessary to provide a definition at this time (that can be defined separately in a future proposal to introduce support for pure functions).

Semantics of ValueSemantic

If the semantics of this protocol aren't (directly) about purity and referential transparency then what are they? To answer this question I'll begin by quoting @Chris_Lattner3's comments about ActorSendable:

ActorSendable is about ensuring memory safety when references are moved across a serialization boundary. For a type that includes stored properties of a non-ActorSendable type, a manual conformance is necessary to ensure data races are not introduced. This manual conformance bears great responsibility to uphold the intended semantics of the protocol.

In my view a refining ValueSemantic protocol should be defined using a very similar strategy. The difference is that this time we're talking about a "boundary" that is crossed every time a value is copied. The property we seek to uphold is "no sharing of mutable data". Each instance of the type must be semantically independent. (Note: independence of copies is subtly different from purity / referential transparency: Array.capacity is semantically independent but not referentially transparent.)

In order to protect against mutable sharing when stored properties include references to heap data it is essential for these references to be encapsulated: they must be non-public. When all references are encapsulated in this way it is possible for the author of the type to avoid mutable sharing using copy-on-write. Trivially, access to global mutable data is off limits: a type that stores and uses an Int index into global mutable data should not provide a conformance.

To summarize, in my view this protocol is really about controlling access and use of references to heap allocated data in order to avoid mutation of shared data. While this protocol does not introduce any new requirements, its semantics do support the default implementation of unsafeSendToActor that simply returns self.
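
A sketch of how that could look in code. The shape of ActorSendable here (a single unsafeSendToActor requirement returning Self) is an assumption for illustration and may not match the proposal's actual declaration; Point is a hypothetical conforming type:

```swift
// Assumed shape of the base protocol; the real one may differ.
protocol ActorSendable {
    func unsafeSendToActor() -> Self
}

// The refinement adds no requirements, only stronger semantics.
protocol ValueSemantic: ActorSendable {}

extension ValueSemantic {
    // Copies are independent, so the value can be sent as-is.
    func unsafeSendToActor() -> Self { self }
}

struct Point: ValueSemantic {
    var x: Int
    var y: Int
}

let point = Point(x: 1, y: 2)
var sent = point.unsafeSendToActor()
sent.x = 10
// point.x is still 1: mutating the sent copy does not affect the original.
```

The default implementation is only sound because the conformance promises no shared mutable data; a type holding an unencapsulated class reference would have to do real work here instead.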

Relationship to purity

When we speak of value semantics we are usually talking about more than just the definition provided above. The members of types conforming to the ValueSemantic protocol as defined above will usually be referentially transparent, especially when all parameters also meet the ValueSemantic requirement. In the latter case, as long as a member avoids access to global mutable data it will be referentially transparent (with rare exceptions such as Array.capacity).

Given this close relationship and the conventional assumptions made about value semantic types I think it's important to alert programmers when a member of a value semantic type breaks referential transparency. Because there is no language support, learning about these exceptions and why they matter is often difficult for programmers not already steeped in FP (and even for some who are).

I also think it would be good for the language to understand the referential transparency of members of value semantic types without requiring annotation beyond the syntax used today. A ValueSemantic conformance could imply purity of members, with an annotation required where non-referentially-transparent members are necessary and intended by the programmer. However, this is a topic for a separate thread (either now or in the future).

Conclusion

Even without language understanding of purity I believe a ValueSemantic protocol would aid significantly in everyday reasoning about code. Crucially, it would distinguish types intended to behave in the conventional value semantic way from value types that have nothing to do with value semantics. This distinction is one that is subtle and difficult to teach. This protocol, along with synthesized conformances, would go a long way towards helping programmers understand when they are stepping outside the world of "trivially composable" value semantics in their type declarations.

The ValueSemantic protocol is not a replacement for language understanding of purity and referential transparency, but it is a valuable complement to it.

(Note: we could bikeshed on the name of this protocol if the community wants to preserve "value semantic" as referring to purity / referential transparency. Regardless of its name, I think a protocol with the semantics described above would be a good thing)

Random note but @tali pointed out that ValueSemantic can be extremely useful for making collection transfers across actors more efficient.

The name ValueSemantic definitely overpromises what this does, IMO. It's only really making a statement about the semantics of ActorSendable, so maybe it could be called PureActorSendable or something like that, since the protocol may be useful for making more efficient refined implementations for collections. (In previous discussions we'd had about language-level purity support, we'd discussed having a way to impose purity over the requirements of an existing protocol, like pure Collection or pure ActorSendable, which could be applicable here as well.)

b only needs to exist to affect the capacity of a after a is first mutated (by forcing a to allocate new storage, or not). Although if you change b first, it’ll stop affecting a, as the storage is no longer shared.

If you call a function that mutates a. My point is: unless you do that, nothing that you do to b affects a—is that correct?

It is trivially the case that other values existing can affect how capacity is allocated for a when you try to mutate a: there is finite memory available, after all.