ValueSemantic protocol

I was mistaken on the details, but this works:

var a = [0, 2, 4]
var b = a // remove this line to change behavior
print(a.capacity)
a.removeAll(keepingCapacity: true)
print(a.capacity)

I thought it’d be zero, but it instead it reallocates with a different size (3 -> 4), which still lets you observe the storage reference isn’t unique.

There are also similar ways to get this behavior by removing and appending elements.

In this example, you’re observing the value of a after mutating a. This particular operation clearly isn’t “pure,” but in the same sense that selecting a random element of a isn’t pure. But in terms of value semantics, no changes to b observably affect the value of a, correct?

Well, we don't want optimization to be bound to maintain the observed capacity of values after high-level transformations like constant folding of array operations. It's not really part of the "value" per se. If one models purity/value semantics at the level of individual operations, then I think it'd make sense to treat capacity as impure, even if it doesn't really modify state, because "pure" value semantics code should not change behavior given values that only differ in capacity.

1 Like

I believe that both the “intuitive-but-leaky” and sharpened definitions in my post above addresses this.

@cukr, using your T, given this code:

var x = T()
var y = x
// [1]
f(x) // [2]

…then under my definition, [2] only touches x. The expression f(y) only touches y, but inserting it at at [1] changes the result of [2]. Therefore T does not have value semantics.

My sharpened definition addresses all of these (even the last one) by dealing with context. Under that definition, UnsafePointer does not have value semantics, and String does.

To your last example, which is a lovely illustration of @Joe_Groff’s point (as I understand it) that you can create reference semantics using values, the string “the sentence above” does have value semantics under my definition, because to give it the referential interpretation that may produce different text, you also need to touch the text it references.

The trick of my sharpened definition is to look at whole interactions of multiple variables instead of values in isolation: reference-producing behavior requires something to refer into, and the definition accounts for that.


This example:

…and this response:

…suggests that we may indeed need to take a Liskov-substitution-like view that the properties in question are in the eye of the beholder.

That’s problematic, though I like Joe’s answer of treating value semantics as applying to some subset of a type’s surface:

This neatly addresses @Lantua’s example: you can’t poison a value type by adding an extension method that uses a global, because your extension method isn’t really part of the value-typed surface.

To the other half of Michel’s example:

…I would say that subclassing a class can break value semantics, just as subclassing can break immutability, and thus a class can’t have value semantics unless it prohibits subclassing.

(The reason for this subclass vs extension distinction is that existing code can’t encounter an extension method it didn’t know about at compile time, but can encounter an overridden method it didn’t know about.)

1 Like

I think it's crucial to observe at the outset of this discussion that this ValueSemantic protocol is a refinement of ActorSendable. The refinement introduces an additional semantic requirement internal serialization of mutation is not sufficient. Instead, copy-on-write must be used for all heap allocated data.

I have long maintained that there is a useful notion of "value" that is closely related to (but independent of) the notion of referentially transparent pure functions. As has already been pointed out, many "value" types include members that are not referentially transparent: Array.capacity, Date.init, UUID.init, Int.random(in:), etc. The relationship as I view it is that "value" types predominantly contain pure members. Impure members are the exception to the rule.

We must acknowledge and embrace this distinction between "value"-ness and purity if we are going to have any hope of providing clear semantics for this protocol. A correct conformance to a protocol can only depend on code that is under control of the author of the conformance. As has already been pointed out, anyone can add a trivial impure extension to any type. For this reason, purity of members cannot be considered a necessary condition of a valid conformance. Further, I argue that if the semantics of this protocol cannot depend on the definition of purity then it isn't necessary to provide a definition at this time (that can be defined separably in a future proposal to introduce support for pure functions).

Semantics of ValueSemantic

If the semantics of this protocol aren't (directly) about purity and referential transparency then what are they? To answer this question I'll begin by quoting @Chris_Lattner3's comments about ActorSendable:

ActorSendable is about ensuring memory safety when references are moved across a serialization boundary. A type that includes stored properties of a non-ActorSendable type a manual conformance is necessary to ensure data races are not introduced. This manual conformance bears great responsibility to uphold the intended semantics of the protocol.

In my view a refining ValueSemantic protocol should be defined using a very similar strategy. The differences are that this time we're talking about a "boundary" that is crossed every time a value is copied. The property we seek to uphold is "no sharing of mutable data". Each instance of the type must be semantically independent. (note: independence of copies is subtly different than purity / referential transparency: Array.capacity is semantically independent but not referentially transparent)

In order to protect against mutable sharing when stored properties include references to heap data it is essential for these references to be encapsulated: they must not be non-public. When all references are encapsulated in this way it is possible for the author of the type to avoid mutable sharing using copy-on-write. Trivially, access to global mutable data is off limits: a type that stores and uses an Int index into global mutable data should not provide a conformance.

To summarize, in my view this protocol is really about controlling access and use of references to heap allocated data in order to avoid mutation of shared data. While this protocol does not introduce any new requirements, its semantics do support the default implementation of unsafeSendToActor that simply returns self.

Relationship to purity

When we speak of value semantics we are usually talking about more than just the definition provided above. The members of types conforming to the ValueSemantic protocol as defined above will usually be referentially transparent, especially when all parameters also meet the ValueSemantic requirement. In the latter case, as long as a member avoids access to global mutable data it will be referentially transparent (with rare exceptions such as Array.capacity).

Given this close relationship and the conventional assumptions made about value semantic types I think it's important to alert programmers when a member of a value semantic type breaks referential transparency. Because there is no language support, learning about these exceptions and why they matter is often difficult for programmers not already steeped in FP (and even for some who are).

I also think it would be good for the language to understand the referential transparency of members of value semantic types without requiring annotation beyond the syntax used today. A ValueSemantic conformance could imply purity of members, with an annotation required where non-referentially-transparent members are necessary and intended by the programmer. However, this is a topic for a separate thread (either now or in the future).

Conclusion

Even without language understanding of purity I believe a ValueSemantic protocol would aid significantly in everyday reasoning about code. Crucially, it would distinguish types intended to behave in the conventional value semantic way from value types that have nothing to do with value semantics. This distinction is one that is subtle and difficult to teach. This protocol, along with synthesized conformances would go a long way towards helping programmers understand when they are stepping outside the world of "trivially composable" value semantics in their type declarations.

The ValueSemantic protocol is not a replacement for language understanding of purity and referential transparency, but it is a valuable complement to it.

(Note: we could bikeshed on the name of this protocol if the community wants to preserve "value semantic" as referring to purity / referential transparency. Regardless of its name, I think a protocol with the semantics described above would be a good thing)

4 Likes

Random note but @tali pointed out that ValueSemantic can be extremely useful for making collection transfers across actors more efficient.

1 Like

The name ValueSemantic definitely overpromises what this does, IMO. It's only really making a statement about the semantics of ActorSendable, so maybe it could be called PureActorSendable or something like that, since the protocol may be useful for making more efficient refined implementations for collections. (In previous discussions we'd had about language-level purity support, we'd discussed having a way to impose purity over the requirements of an existing protocol, like pure Collection or pure ActorSendable, which could be applicable here as well.)

1 Like

b only need to exist to affect the capacity of a after a is first mutated (by forcing a to allocate new storage, or not). Although if you change b first it’ll stop affecting a as storage is no longer shared.

If you call a function that mutates a. My point is: unless you do that, nothing that you do to b affects a—is that correct?

It is trivially the case that other values existing can affect how capacity is allocated for a when you try to mutate a: there is finite memory available, after all.

I guess what you are getting at is that value semantics does not imply mutating the value has to be independent of the global state. Purity would imply that, but not value semantics.

1 Like

I think we can do better than that by formalizing the type's set of basis operations. UnsafePointer can only be said to have value semantics if dereferencing is not classified as a basis operation. I don't think that would be a very useful way to define UnsafePointer's basis, though.

For the purposes of reasoning about multithreading and providing other guarantees related to value semantics (for example, there are some default implementations in RangeReplaceableCollection that only work if the collection has value semantics), it is sufficient to choose one view of what constitutes any type's basis. Then if we want to use just a subset of the type's basis operations, we can either:

  • put it in a subsetting wrapper type, or
  • reason about behavior in terms of the guarantees provided for a hypothetical subsetting wrapper type without actually creating one.

I think there are probably a set of default rules we can follow that minimize the amount of explicit specification needed for the basis operations, e.g., every public method is a basis operation, every property and subscript is either 1 or 2 basis operations, depending on whether it's writable. But these are just defaults; to cover the range of possible APIs, a type's author may need to include arbitrary other operations in the type's basis. Outside the defining file, and certainly outside the defining module, all safe operations must be defined in terms of the basis, and can't introduce any new reference semantics.

Aside

Does String have value or reference semantics? It depends on if you look at it as some text, or as an obj-c selector that references a method.

I don't understand, but it might just be my ignorance of ObjC details. IIUC a method or selector, without an instance, has no mutable state, so there's no possibility of reference semantics.

1 Like

I can see the argument that ValueSemantic overpromises but I don't see a reason the protocol should only be allowed to make statements about the semantics of ActorSendable. Why can't it require encapsulation of references to mutable heap data and copy-on-write? IMO, the semantics should specifically include something to the effect of "copying never introduces sharing of mutable state".

Would something along the lines of IndependentValue promise less than ValueSemantic in your mind?

I've been down that road too…

I don't think that's necessary if you define the type's basis operations. In a type with value semantics, the basis operations preserve logical independence of distinct variables.

If you attach value semantics to a type rather than a function, adding an extension could breaks value semantics, or subclassing a class could breaks value semantics.

Eww, let's not talk about classes :wink: Okay, fine, we can talk about classes. IMO most any non-final class must be considered to have reference semantics.

Even without extensions: Array breaks value semantics by exposing its capacity

AFAICT, all of the examples you've shown exhibit value semantics by my definition.

Also, capacity is what's known as an non-salient attribute of Array, so one shouldn't necessarily expect it to behave in an ordinary way. Of course, “salience” isn't documented anywhere for Array, but it's another aspect needed to nail down value semantics. This is the same thing Joe was describing when he said of capacity:

4 Likes

Isn't that all implied by the contract of unsafeActorSend? My point was that the protocol requirement doesn't make any guarantees about any other operations on the type.

No, Chris specifically wants to support sending lock-free concurrent data structures, etc. These can include shared mutable state.

I'm suggesting that this protocol places a requirement not on a specific operation, but on all declarations of the type within the declaring module. Specifically, the module is required to encapsulate heap references and ensure copy-on-write is used any time that data is mutated.

If we were to model those things like Rust does, then they would not be considered mutable state in the sense you need inout exclusive access to modify them; shared borrows are sufficient to modify them.

That's a strange non-local effect of the declaration, and it still wouldn't be sufficient to make any guarantees about the behavior of operations on the type.

1 Like

Call it strange if you want, but I think these properties are implicitly assumed when people talk about a type having "value semantics". Personally, I think it's extremely useful to be able to talk about properties of a type in aggregate.

FWIW, if ValueSemantic conformance could require an impure annotation on impure members the name would no longer overpromise. The default behavior of members would align with the intended regular semantics of the type and it would be immediately clear to both authors and users when a member is impure. This would move the language forward without requiring explicit pure annotations all over our code and provide a clear meaning for the informal terminology that is pervasive in the community. It would also highlight the impure members that are exceptions to a type's regular semantics. This is important because these members can and do lead to incorrect code.

I know you've suggested a => arrow for pure functions in the past, but I am less convinced of that than I used to be. It would be a lot harder to catch incorrect uses of -> instead of => in code review than it would to catch incorrect uses of impure. Further, no existing code uses => yet lots of that code is pure. Given source stability requirements, I think requiring occasional impure annotations on members of value types when a ValueSemantic conformance is added would be much better than requiring lots of code to replace -> with =>.

It seems like it may be useful to have a separate thread to discuss the topic of pure / value-semantic operations. Is that topic is on the table for discussion in this context? Or would it be deemed out of scope in the timeline of the concurrency proposals?

Since I've now linked to two different examples here, I should point out that some people have gone down this road before us. John Lakos, in particular, did a pretty solid job of attacking the problem for C++, and his work is based on the work of Alexander Stepanov and Sean Parent on “regular types” (Titus Winters has written a pretty good introduction to the idea). IMO if we want to succeed here we'd all do well to read up on the work that has come before us.

Cheers,
Dave

7 Likes

I understand that that's what people implicitly assume, but danger lurks in the holes in those assumptions, and reinforcing people's misconceptions without addressing those holes doesn't seem like a good idea to me.

1 Like
Terms of Service

Privacy Policy

Cookie Policy