ValueSemantic protocol

anandabits · November 2, 2020, 4:56pm

I think it's crucial to observe at the outset of this discussion that this ValueSemantic protocol is a refinement of ActorSendable. The refinement introduces an additional semantic requirement: internal serialization of mutation is not sufficient. Instead, copy-on-write must be used for all heap allocated data.

I have long maintained that there is a useful notion of "value" that is closely related to (but independent of) the notion of referentially transparent pure functions. As has already been pointed out, many "value" types include members that are not referentially transparent: Array.capacity, Date.init, UUID.init, Int.random(in:), etc. The relationship as I view it is that "value" types predominantly contain pure members. Impure members are the exception to the rule.

We must acknowledge and embrace this distinction between "value"-ness and purity if we are going to have any hope of providing clear semantics for this protocol. A correct conformance to a protocol can only depend on code that is under control of the author of the conformance. As has already been pointed out, anyone can add a trivial impure extension to any type. For this reason, purity of members cannot be considered a necessary condition of a valid conformance. Further, I argue that if the semantics of this protocol cannot depend on the definition of purity then it isn't necessary to provide a definition at this time (that can be defined separately in a future proposal to introduce support for pure functions).

Semantics of `ValueSemantic`

If the semantics of this protocol aren't (directly) about purity and referential transparency then what are they? To answer this question I'll begin by quoting @Chris_Lattner3's comments about ActorSendable:

At this point it is worth discussing why the requirement is named “unsafe”SendToActor. The reason is that manual implementations (which are rare, see below) could implement it incorrectly and break memory safety: for example, this would be unsafe (and very unwise!) to add to your codebase:
extension NSMutableString : ActorSendable {
  func unsafeSendToActor() -> Self { self }
}
This is why it is important to include the word “unsafe” in the method requirement. However, this power is also extremely useful -- it allows authors of powerful types to conform their types to this protocol, and thereby allow clients of their libraries to pass their types around safely. This is what allows the proposal to support immutable reference types, references to internally synchronized types that use lock free data structures, types that want to be deep copied, etc.

ActorSendable is about ensuring memory safety when references are moved across a serialization boundary. For a type that includes stored properties of a non-ActorSendable type, a manual conformance is necessary to ensure data races are not introduced. This manual conformance bears great responsibility to uphold the intended semantics of the protocol.

In my view a refining ValueSemantic protocol should be defined using a very similar strategy. The differences are that this time we're talking about a "boundary" that is crossed every time a value is copied. The property we seek to uphold is "no sharing of mutable data". Each instance of the type must be semantically independent. (note: independence of copies is subtly different than purity / referential transparency: Array.capacity is semantically independent but not referentially transparent)

In order to protect against mutable sharing when stored properties include references to heap data it is essential for these references to be encapsulated: they must be non-public. When all references are encapsulated in this way it is possible for the author of the type to avoid mutable sharing using copy-on-write. Trivially, access to global mutable data is off limits: a type that stores and uses an Int index into global mutable data should not provide a conformance.
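As a rough illustration of the encapsulation-plus-copy-on-write pattern described here, the sketch below wraps a class instance in a struct and copies it before mutating whenever it is shared. The names `Storage` and `IntBag` are hypothetical and exist only for this example; `isKnownUniquelyReferenced` is the standard library function commonly used for this check.

```swift
// Hypothetical heap storage; private so the reference never escapes this file.
private final class Storage {
    var values: [Int]
    init(values: [Int]) { self.values = values }
}

// Hypothetical value type that encapsulates a heap reference and copies on write.
struct IntBag {
    private var storage = Storage(values: [])

    // Read-only view of the contents; the reference itself is never exposed.
    var values: [Int] { storage.values }

    mutating func insert(_ value: Int) {
        // If another IntBag still shares this storage, copy it before mutating.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(values: storage.values)
        }
        storage.values.append(value)
    }
}

var a = IntBag()
a.insert(1)
var b = a      // b shares a's storage until one of them mutates
b.insert(2)    // triggers the copy, so a is left untouched
print(a.values, b.values)
```

Because `storage` is private, code outside the defining file has no way to reach the shared reference, which is what makes it possible for the type's author to uphold independence of copies.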

To summarize, in my view this protocol is really about controlling access and use of references to heap allocated data in order to avoid mutation of shared data. While this protocol does not introduce any new requirements, its semantics do support the default implementation of unsafeSendToActor that simply returns self.
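A minimal sketch of how such a refinement might be declared, assuming the `ActorSendable` shape quoted earlier in this post. Both protocols are declared locally so the example compiles on its own; `Point` is a hypothetical conforming type, and nothing here is an actual standard library API.

```swift
// Stand-in for the ActorSendable protocol from the pitch, declared locally for this sketch.
protocol ActorSendable {
    func unsafeSendToActor() -> Self
}

// Hypothetical refinement: no new requirements, only a stronger semantic contract.
protocol ValueSemantic: ActorSendable {}

extension ValueSemantic {
    // Returning self is reasonable only because copies never share mutable state.
    func unsafeSendToActor() -> Self { self }
}

// Hypothetical conforming type made up entirely of value-semantic stored properties.
struct Point: ValueSemantic {
    var x: Int
    var y: Int
}

let original = Point(x: 1, y: 2)
var sent = original.unsafeSendToActor()
sent.x = 10    // mutating the sent copy leaves the original alone
print(original.x, sent.x)
```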

Relationship to purity

When we speak of value semantics we are usually talking about more than just the definition provided above. The members of types conforming to the ValueSemantic protocol as defined above will usually be referentially transparent, especially when all parameters also meet the ValueSemantic requirement. In the latter case, as long as a member avoids access to global mutable data it will be referentially transparent (with rare exceptions such as Array.capacity).

Given this close relationship and the conventional assumptions made about value semantic types I think it's important to alert programmers when a member of a value semantic type breaks referential transparency. Because there is no language support, learning about these exceptions and why they matter is often difficult for programmers not already steeped in FP (and even for some who are).

I also think it would be good for the language to understand the referential transparency of members of value semantic types without requiring annotation beyond the syntax used today. A ValueSemantic conformance could imply purity of members, with an annotation required where non-referentially-transparent members are necessary and intended by the programmer. However, this is a topic for a separate thread (either now or in the future).

Conclusion

Even without language understanding of purity I believe a ValueSemantic protocol would aid significantly in everyday reasoning about code. Crucially, it would distinguish types intended to behave in the conventional value semantic way from value types that have nothing to do with value semantics. This distinction is one that is subtle and difficult to teach. This protocol, along with synthesized conformances, would go a long way towards helping programmers understand when they are stepping outside the world of "trivially composable" value semantics in their type declarations.

The ValueSemantic protocol is not a replacement for language understanding of purity and referential transparency, but it is a valuable complement to it.

(Note: we could bikeshed on the name of this protocol if the community wants to preserve "value semantic" as referring to purity / referential transparency. Regardless of its name, I think a protocol with the semantics described above would be a good thing)

Chris_Lattner3 · November 2, 2020, 5:50pm

Random note but @tali pointed out that ValueSemantic can be extremely useful for making collection transfers across actors more efficient.

Joe_Groff · November 2, 2020, 6:03pm

The name ValueSemantic definitely overpromises what this does, IMO. It's only really making a statement about the semantics of ActorSendable, so maybe it could be called PureActorSendable or something like that, since the protocol may be useful for making more efficient refined implementations for collections. (In previous discussions we'd had about language-level purity support, we'd discussed having a way to impose purity over the requirements of an existing protocol, like pure Collection or pure ActorSendable, which could be applicable here as well.)

michelf · November 2, 2020, 6:19pm

b only needs to exist to affect the capacity of a after a is first mutated (by forcing a to allocate new storage, or not). Although if you change b first it’ll stop affecting a, as storage is no longer shared.

xwu · November 2, 2020, 6:28pm

If you call a function that mutates a. My point is: unless you do that, nothing that you do to b affects a—is that correct?

It is trivially the case that other values existing can affect how capacity is allocated for a when you try to mutate a: there is finite memory available, after all.
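The exchange above can be made concrete with a small sketch. It assumes the usual copy-on-write behavior of `Array`; the capacity numbers are implementation details, so they are printed rather than relied upon, while the independence of the elements is the part that value semantics is concerned with.

```swift
var a = [1, 2, 3]
a.reserveCapacity(64)

let b = a                        // a and b may share one buffer at this point
let capacityWhileShared = a.capacity

a.append(4)                      // a has to stop sharing with b before it can mutate
let capacityAfterMutation = a.capacity

// The elements of the two arrays are independent of each other.
print(a, b)

// Whether these two numbers match is up to the implementation.
print(capacityWhileShared, capacityAfterMutation)
```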

michelf · November 2, 2020, 6:53pm

I guess what you are getting at is that value semantics does not imply mutating the value has to be independent of the global state. Purity would imply that, but not value semantics.

dabrahams · November 2, 2020, 7:33pm

I think we can do better than that by formalizing the type's set of basis operations. UnsafePointer can only be said to have value semantics if dereferencing is not classified as a basis operation. I don't think that would be a very useful way to define UnsafePointer's basis, though.

For the purposes of reasoning about multithreading and providing other guarantees related to value semantics (for example, there are some default implementations in RangeReplaceableCollection that only work if the collection has value semantics), it is sufficient to choose one view of what constitutes any type's basis. Then if we want to use just a subset of the type's basis operations, we can either:

put it in a subsetting wrapper type, or
reason about behavior in terms of the guarantees provided for a hypothetical subsetting wrapper type without actually creating one.

I think there are probably a set of default rules we can follow that minimize the amount of explicit specification needed for the basis operations, e.g., every public method is a basis operation, every property and subscript is either 1 or 2 basis operations, depending on whether it's writable. But these are just defaults; to cover the range of possible APIs, a type's author may need to include arbitrary other operations in the type's basis. Outside the defining file, and certainly outside the defining module, all safe operations must be defined in terms of the basis, and can't introduce any new reference semantics.
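One way to picture a subsetting wrapper is the sketch below, which forwards only some of `Array`'s operations and leaves `capacity` out of the basis. The type name `ElementsOnly` and the choice of forwarded operations are assumptions made for illustration, not something proposed in this thread.

```swift
// Hypothetical subsetting wrapper: only the operations forwarded here form its basis.
struct ElementsOnly<Element: Equatable>: Equatable {
    // Private, so extensions outside this file cannot reach the full Array API.
    private var base: [Element] = []

    var count: Int { base.count }

    subscript(index: Int) -> Element {
        get { base[index] }
        set { base[index] = newValue }
    }

    mutating func append(_ element: Element) { base.append(element) }

    // capacity and reserveCapacity(_:) are deliberately not forwarded.
}

var a = ElementsOnly<Int>()
a.append(1)
var b = a      // an independent copy
b.append(2)
b[0] = 7
print(a.count, a[0], b.count, b[0])
```

Since the wrapped array is private, any operation added in another file has to be written in terms of `count`, the subscript, and `append`, which matches the idea that operations outside the defining file are defined in terms of the basis.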

Aside

Does String have value or reference semantics? It depends on if you look at it as some text, or as an obj-c selector that references a method.

I don't understand, but it might just be my ignorance of ObjC details. IIUC a method or selector, without an instance, has no mutable state, so there's no possibility of reference semantics.

anandabits · November 2, 2020, 7:34pm

I can see the argument that ValueSemantic overpromises but I don't see a reason the protocol should only be allowed to make statements about the semantics of ActorSendable. Why can't it require encapsulation of references to mutable heap data and copy-on-write? IMO, the semantics should specifically include something to the effect of "copying never introduces sharing of mutable state".

Would something along the lines of IndependentValue promise less than ValueSemantic in your mind?

dabrahams · November 2, 2020, 7:52pm

I've been down that road too…

I don't think that's necessary if you define the type's basis operations. In a type with value semantics, the basis operations preserve logical independence of distinct variables.

If you attach value semantics to a type rather than a function, adding an extension could break value semantics, or subclassing a class could break value semantics.

Eww, let's not talk about classes. Okay, fine, we can talk about classes. IMO most any non-final class must be considered to have reference semantics.

Even without extensions: Array breaks value semantics by exposing its capacity

AFAICT, all of the examples you've shown exhibit value semantics by my definition.

Also, capacity is what's known as a non-salient attribute of Array, so one shouldn't necessarily expect it to behave in an ordinary way. Of course, “salience” isn't documented anywhere for Array, but it's another aspect needed to nail down value semantics. This is the same thing Joe was describing when he said of capacity:

Joe_Groff · November 2, 2020, 8:06pm

Isn't that all implied by the contract of unsafeSendToActor? My point was that the protocol requirement doesn't make any guarantees about any other operations on the type.

anandabits · November 2, 2020, 8:10pm

No, Chris specifically wants to support sending lock-free concurrent data structures, etc. These can include shared mutable state.

I'm suggesting that this protocol places a requirement not on a specific operation, but on all declarations of the type within the declaring module. Specifically, the module is required to encapsulate heap references and ensure copy-on-write is used any time that data is mutated.

Joe_Groff · November 2, 2020, 8:16pm

If we were to model those things like Rust does, then they would not be considered mutable state in the sense you need inout exclusive access to modify them; shared borrows are sufficient to modify them.

That's a strange non-local effect of the declaration, and it still wouldn't be sufficient to make any guarantees about the behavior of operations on the type.

anandabits · November 2, 2020, 8:42pm

Call it strange if you want, but I think these properties are implicitly assumed when people talk about a type having "value semantics". Personally, I think it's extremely useful to be able to talk about properties of a type in aggregate.

FWIW, if ValueSemantic conformance could require an impure annotation on impure members the name would no longer overpromise. The default behavior of members would align with the intended regular semantics of the type and it would be immediately clear to both authors and users when a member is impure. This would move the language forward without requiring explicit pure annotations all over our code and provide a clear meaning for the informal terminology that is pervasive in the community. It would also highlight the impure members that are exceptions to a type's regular semantics. This is important because these members can and do lead to incorrect code.

I know you've suggested a => arrow for pure functions in the past, but I am less convinced of that than I used to be. It would be a lot harder to catch incorrect uses of -> instead of => in code review than it would be to catch incorrect uses of impure. Further, no existing code uses => yet lots of that code is pure. Given source stability requirements, I think requiring occasional impure annotations on members of value types when a ValueSemantic conformance is added would be much better than requiring lots of code to replace -> with =>.

It seems like it may be useful to have a separate thread to discuss the topic of pure / value-semantic operations. Is that topic on the table for discussion in this context? Or would it be deemed out of scope in the timeline of the concurrency proposals?

dabrahams · November 2, 2020, 9:32pm

Since I've now linked to two different examples here, I should point out that some people have gone down this road before us. John Lakos, in particular, did a pretty solid job of attacking the problem for C++, and his work is based on the work of Alexander Stepanov and Sean Parent on “regular types” (Titus Winters has written a pretty good introduction to the idea). IMO if we want to succeed here we'd all do well to read up on the work that has come before us.

Cheers,
Dave

Joe_Groff · November 2, 2020, 9:50pm

I understand that that's what people implicitly assume, but danger lurks in the holes in those assumptions, and reinforcing people's misconceptions without addressing those holes doesn't seem like a good idea to me.

anandabits · November 2, 2020, 10:18pm

Sure, that's why I like the idea of requiring an annotation on impure members of value semantic types.

dabrahams · November 2, 2020, 10:49pm

Hi All,

There's a lot to catch up on in this thread, and I actually need to write responses in order to assimilate it, so apologies in advance if things I mention here are covered downthread. @Paul_Cantrell's approach was posted while I was composing my first stab at the problem here, and seems to be following a similar overall strategy…

Paul_Cantrell:

Let x, y, z,… with types X, Y, Z,… be all the variables touched by a given Swift expression E, where “touched” means:

any variable that is lexically present (i.e. appears in the actual source code) for the expression,

plus the implicit variable memory: Memory for a dereference of any Swift *Pointer type, where Memory does not have value semantics by definition,

plus all the global variables touched in the transitive closure over function calls (and subscripts, getters, etc.) of E.

If all the types X, Y, Z,… have value semantics , then there does not exist any expression E’ that touches none of x, y, z… but whose insertion before E alters the result of E.

A type T has value semantics if the statement above is true for all possible E which touch a variable of T. (I have a logic mutual recursion problem here, but I think the intent is clear and it’s bedtime in Minneapolis.)

I think this formulation is missing some quantifiers and other formalities, and I started to try to rephrase it a bit more rigorously so I could understand it, but I ran into problems, mostly surrounding “if the statement above is true.” The statement above is a definitional claim, so it has to be taken to be true.

But I'll try anyway; please tell me if I've got it:

A type T has value semantics if and only if there exists no pair of expressions (E₀, E₁) such that all of the following are true:

1. A variable of type T is touched by E₁.
2. The sets of variables touched by E₀ and E₁ are disjoint.
3. Any variables touched by E₁ that are not of type T have value semantics.
4. (E₀, E₁).1 is not equivalent to E₁

Unfortunately, I'm pretty sure this type satisfies that definition, for the same basic reason my original definition failed: the fact that the set of variables touched by interesting expressions that touch X always touch a Y means that point 3 is never satisfied.

struct X { 
  class Y { var i = 3 }
  let y = Y()
}
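To see why this type is a problem in practice, the sketch below reuses the `X` declaration from above and shows a copy observing a mutation made through another copy. The variable names `x1` and `x2` are only for illustration.

```swift
struct X {
    class Y { var i = 3 }
    let y = Y()
}

let x1 = X()
let x2 = x1     // copies the reference to Y, not the Y instance itself

x2.y.i += 1     // allowed even though x2 is a let, because Y is a class

// Both copies see the change, so they are not logically independent.
print(x1.y.i, x2.y.i)
```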

dabrahams · November 2, 2020, 11:01pm

I think maybe you're misunderstanding the point of this thread; it really has nothing to do with ActorSendable. We're trying to understand the set of promises that should be given by a protocol called ValueSemantic. We have an intuitive idea of what that should mean, which has proven useful for informally reasoning about programs. We're trying to formalize that intuitive idea enough to make it rigorously applicable.

Paul_Cantrell · November 2, 2020, 11:07pm

Yes, that’s the spirit of what I was muddling my way toward. Thanks for cleaning up my logic!

dabrahams:

Unfortunately, I'm pretty sure this type satisfies that definition, for the same basic reason my original definition failed: the fact that the set of variables touched by interesting expressions that touch X always touch a Y means that point 3 is never satisfied.
struct X { 
  class Y { var i = 3 }
  let y = Y()
}

Hmm, that is vexing. I would want E₀ = x.y.i += 1 and E₁ = x.y.i to falsify X having value semantics.

Could it work to alter 3 as follows?

Any variables touched by E₁ that are not of type T either have value semantics, or are reachable by a chain of member accesses from a value of T.

That fixes your particular example. Pondering whether it’s actually getting at an underlying principle (“value semantics are about not bringing along visible references to shared data”), or is just a definition kludge.

Joe_Groff · November 2, 2020, 11:51pm

I still don't think the concept is rigorously applicable divorced from talking about a specific operation. The topic came up in response to the discussion around ActorSendable, but whether values of a type can be sent without copying additional data outside the value doesn't say anything about the properties of any other operations.
