ValueSemantic protocol

Random note but @tali pointed out that ValueSemantic can be extremely useful for making collection transfers across actors more efficient.

1 Like

The name ValueSemantic definitely overpromises what this does, IMO. It's only really making a statement about the semantics of ActorSendable, so maybe it could be called PureActorSendable or something like that, since the protocol may be useful for making more efficient refined implementations for collections. (In previous discussions we'd had about language-level purity support, we'd discussed having a way to impose purity over the requirements of an existing protocol, like pure Collection or pure ActorSendable, which could be applicable here as well.)

2 Likes

b only need to exist to affect the capacity of a after a is first mutated (by forcing a to allocate new storage, or not). Although if you change b first it’ll stop affecting a as storage is no longer shared.

If you call a function that mutates a. My point is: unless you do that, nothing that you do to b affects a—is that correct?

It is trivially the case that other values existing can affect how capacity is allocated for a when you try to mutate a: there is finite memory available, after all.

I guess what you are getting at is that value semantics does not imply mutating the value has to be independent of the global state. Purity would imply that, but not value semantics.

1 Like

I think we can do better than that by formalizing the type's set of basis operations. UnsafePointer can only be said to have value semantics if dereferencing is not classified as a basis operation. I don't think that would be a very useful way to define UnsafePointer's basis, though.

For the purposes of reasoning about multithreading and providing other guarantees related to value semantics (for example, there are some default implementations in RangeReplaceableCollection that only work if the collection has value semantics), it is sufficient to choose one view of what constitutes any type's basis. Then if we want to use just a subset of the type's basis operations, we can either:

  • put it in a subsetting wrapper type, or
  • reason about behavior in terms of the guarantees provided for a hypothetical subsetting wrapper type without actually creating one.

I think there are probably a set of default rules we can follow that minimize the amount of explicit specification needed for the basis operations, e.g., every public method is a basis operation, every property and subscript is either 1 or 2 basis operations, depending on whether it's writable. But these are just defaults; to cover the range of possible APIs, a type's author may need to include arbitrary other operations in the type's basis. Outside the defining file, and certainly outside the defining module, all safe operations must be defined in terms of the basis, and can't introduce any new reference semantics.

Aside

Does String have value or reference semantics? It depends on if you look at it as some text, or as an obj-c selector that references a method.

I don't understand, but it might just be my ignorance of ObjC details. IIUC a method or selector, without an instance, has no mutable state, so there's no possibility of reference semantics.

2 Likes

I can see the argument that ValueSemantic overpromises but I don't see a reason the protocol should only be allowed to make statements about the semantics of ActorSendable. Why can't it require encapsulation of references to mutable heap data and copy-on-write? IMO, the semantics should specifically include something to the effect of "copying never introduces sharing of mutable state".

Would something along the lines of IndependentValue promise less than ValueSemantic in your mind?

1 Like

I've been down that road too…

I don't think that's necessary if you define the type's basis operations. In a type with value semantics, the basis operations preserve logical independence of distinct variables.

If you attach value semantics to a type rather than a function, adding an extension could breaks value semantics, or subclassing a class could breaks value semantics.

Eww, let's not talk about classes :wink: Okay, fine, we can talk about classes. IMO most any non-final class must be considered to have reference semantics.

Even without extensions: Array breaks value semantics by exposing its capacity

AFAICT, all of the examples you've shown exhibit value semantics by my definition.

Also, capacity is what's known as an non-salient attribute of Array, so one shouldn't necessarily expect it to behave in an ordinary way. Of course, “salience” isn't documented anywhere for Array, but it's another aspect needed to nail down value semantics. This is the same thing Joe was describing when he said of capacity:

4 Likes

Isn't that all implied by the contract of unsafeActorSend? My point was that the protocol requirement doesn't make any guarantees about any other operations on the type.

No, Chris specifically wants to support sending lock-free concurrent data structures, etc. These can include shared mutable state.

I'm suggesting that this protocol places a requirement not on a specific operation, but on all declarations of the type within the declaring module. Specifically, the module is required to encapsulate heap references and ensure copy-on-write is used any time that data is mutated.

If we were to model those things like Rust does, then they would not be considered mutable state in the sense you need inout exclusive access to modify them; shared borrows are sufficient to modify them.

That's a strange non-local effect of the declaration, and it still wouldn't be sufficient to make any guarantees about the behavior of operations on the type.

1 Like

Call it strange if you want, but I think these properties are implicitly assumed when people talk about a type having "value semantics". Personally, I think it's extremely useful to be able to talk about properties of a type in aggregate.

FWIW, if ValueSemantic conformance could require an impure annotation on impure members the name would no longer overpromise. The default behavior of members would align with the intended regular semantics of the type and it would be immediately clear to both authors and users when a member is impure. This would move the language forward without requiring explicit pure annotations all over our code and provide a clear meaning for the informal terminology that is pervasive in the community. It would also highlight the impure members that are exceptions to a type's regular semantics. This is important because these members can and do lead to incorrect code.

I know you've suggested a => arrow for pure functions in the past, but I am less convinced of that than I used to be. It would be a lot harder to catch incorrect uses of -> instead of => in code review than it would to catch incorrect uses of impure. Further, no existing code uses => yet lots of that code is pure. Given source stability requirements, I think requiring occasional impure annotations on members of value types when a ValueSemantic conformance is added would be much better than requiring lots of code to replace -> with =>.

It seems like it may be useful to have a separate thread to discuss the topic of pure / value-semantic operations. Is that topic is on the table for discussion in this context? Or would it be deemed out of scope in the timeline of the concurrency proposals?

Since I've now linked to two different examples here, I should point out that some people have gone down this road before us. John Lakos, in particular, did a pretty solid job of attacking the problem for C++, and his work is based on the work of Alexander Stepanov and Sean Parent on “regular types” (Titus Winters has written a pretty good introduction to the idea). IMO if we want to succeed here we'd all do well to read up on the work that has come before us.

Cheers,
Dave

8 Likes

I understand that that's what people implicitly assume, but danger lurks in the holes in those assumptions, and reinforcing people's misconceptions without addressing those holes doesn't seem like a good idea to me.

2 Likes

Sure, that's why I like the idea of requiring an annotation on impure members of value semantic types.

Hi All,

There's a lot to catch up on in this thread, and I actually need to write responses in order to assimilate it, so apologies in advance if things I mention here are covered downthread. @Paul_Cantrell's approach was posted while I was composing my first stab at the problem here, and seems to be following a similar overall strategy…

I think this formulation is missing some quantifiers and other formalities, and I started to try to rephrase it a bit more rigorously so I could understand it, but I ran into problems, mostly surrounding “if the statement above is true.” The statement above is a definitional claim, so it has to be taken to be true.

But I'll try anyway; please tell me if I've got it:

A type T has value semantics if and only if there exists no pair of expressions (E₀, E₁) such that all of the following are true:

  1. A variable of type T is touched by E₁.
  2. The sets of variables touched by E₀ and E₁ are disjoint.
  3. Any variables touched by E₁ that are not of type T have value semantics.
  4. (E₀, E₁).1 is not equivalent to E₁

Unfortunately, I'm pretty sure this type satisfies that definition, for the same basic reason my original definition failed: the fact that the set of variables touched by interesting expressions that touch X always touch a Y means that point 3 is never satisfied.

struct X { 
  class Y { var i = 3 }
  let y = Y()
}

I think maybe you're misunderstanding the point of this thread; it really has nothing to do with ActorSendable. We're trying to understand the set of promises that should be given by a protocol called ValueSemantic. We have an intuitive idea of what that should mean, which has proven useful for informally reasoning about programs. We're trying to formalize that intuitive idea enough to make it rigorously applicable.

2 Likes

Yes, that’s the spirit of what I was muddling my way toward. Thanks for cleaning up my logic!

Hmm, that is vexing. I would want E₀ = x.y.i += 1 and E₁ = x.y.i to falsify X having value semantics.

Could it work to alter 3 as follows?

  1. Any variables touched by E₁ that are not of type T either have value semantics, or are reachable by a chain of member accesses from a value of T.

That fixes your particular example. Pondering whether it’s actually getting at an underlying principle (“value semantics are about not bringing along visible references to shared data”), or is just a definition kludge.

1 Like

I still don't think the concept is rigorously applicable divorced from talking about a specific operation. The topic came up in response to the discussion around ActorSendable, but whether values of a type can be sent without copying additional data outside the value doesn't say anything about the properties of any other operations.

I actually think I have a pretty straightforward definition (ever so slightly recursive) of a type with value semantics. It involves three values instead of two. The only personal device I have is a measly phone, so I can’t tap it out at the moment, but I will shortly... Perhaps it’s woefully inadequate, but it has the virtue of being simple and—near as I can tell turning it over in my head—comports with most or all intuitive notions of value semantics.


Edit: Suppose I have a type T, and I want to know whether it has value semantics.

This type may or may not have a conformance to Equatable, but for clarity, we will not use the definition of equivalence given by any conformance of T to Equatable. That is, we shall not restrict ourselves to "salient" properties as arbitrarily chosen by the author of T.

Instead, consider all possible pairs of two "independently obtained" immutable values a and b of type T. By independently obtained, I mean created by any other means than assignment (let a = b or let b = a).

Now, let's create a mutable binding var c = a.

Then, type T has value semantics if, for every possible operation f (noting that a are b are immutable, so these operations don't mutate a or b) where each consecutive invocation of f(a) is indistinguishable from each other and from f(b), there is no intervening operation mutating c that causes the result of a then subsequent invocation of f(a) to be distinguishable from f(b).

("Indistinguishable" is then defined recursively by the set of operations g on f(a) and f(b) such that g(f(a)) is indistinguishable from g(f(b)) for all such g. The description above is a little clumsy, but hopefully it gets the point across.)

1 Like