ValueSemantic protocol

Call it strange if you want, but I think these properties are implicitly assumed when people talk about a type having "value semantics". Personally, I think it's extremely useful to be able to talk about properties of a type in aggregate.

FWIW, if ValueSemantic conformance could require an impure annotation on impure members the name would no longer overpromise. The default behavior of members would align with the intended regular semantics of the type and it would be immediately clear to both authors and users when a member is impure. This would move the language forward without requiring explicit pure annotations all over our code and provide a clear meaning for the informal terminology that is pervasive in the community. It would also highlight the impure members that are exceptions to a type's regular semantics. This is important because these members can and do lead to incorrect code.

I know you've suggested a => arrow for pure functions in the past, but I am less convinced of that than I used to be. It would be a lot harder to catch incorrect uses of -> instead of => in code review than it would to catch incorrect uses of impure. Further, no existing code uses => yet lots of that code is pure. Given source stability requirements, I think requiring occasional impure annotations on members of value types when a ValueSemantic conformance is added would be much better than requiring lots of code to replace -> with =>.

It seems like it may be useful to have a separate thread to discuss the topic of pure / value-semantic operations. Is that topic is on the table for discussion in this context? Or would it be deemed out of scope in the timeline of the concurrency proposals?

Since I've now linked to two different examples here, I should point out that some people have gone down this road before us. John Lakos, in particular, did a pretty solid job of attacking the problem for C++, and his work is based on the work of Alexander Stepanov and Sean Parent on “regular types” (Titus Winters has written a pretty good introduction to the idea). IMO if we want to succeed here we'd all do well to read up on the work that has come before us.

Cheers,
Dave

8 Likes

I understand that that's what people implicitly assume, but danger lurks in the holes in those assumptions, and reinforcing people's misconceptions without addressing those holes doesn't seem like a good idea to me.

2 Likes

Sure, that's why I like the idea of requiring an annotation on impure members of value semantic types.

Hi All,

There's a lot to catch up on in this thread, and I actually need to write responses in order to assimilate it, so apologies in advance if things I mention here are covered downthread. @Paul_Cantrell's approach was posted while I was composing my first stab at the problem here, and seems to be following a similar overall strategy…

I think this formulation is missing some quantifiers and other formalities, and I started to try to rephrase it a bit more rigorously so I could understand it, but I ran into problems, mostly surrounding “if the statement above is true.” The statement above is a definitional claim, so it has to be taken to be true.

But I'll try anyway; please tell me if I've got it:

A type T has value semantics if and only if there exists no pair of expressions (E₀, E₁) such that all of the following are true:

  1. A variable of type T is touched by E₁.
  2. The sets of variables touched by E₀ and E₁ are disjoint.
  3. Any variables touched by E₁ that are not of type T have value semantics.
  4. (E₀, E₁).1 is not equivalent to E₁

Unfortunately, I'm pretty sure this type satisfies that definition, for the same basic reason my original definition failed: the fact that the set of variables touched by interesting expressions that touch X always touch a Y means that point 3 is never satisfied.

struct X { 
  class Y { var i = 3 }
  let y = Y()
}

I think maybe you're misunderstanding the point of this thread; it really has nothing to do with ActorSendable. We're trying to understand the set of promises that should be given by a protocol called ValueSemantic. We have an intuitive idea of what that should mean, which has proven useful for informally reasoning about programs. We're trying to formalize that intuitive idea enough to make it rigorously applicable.

2 Likes

Yes, that’s the spirit of what I was muddling my way toward. Thanks for cleaning up my logic!

Hmm, that is vexing. I would want E₀ = x.y.i += 1 and E₁ = x.y.i to falsify X having value semantics.

Could it work to alter 3 as follows?

  1. Any variables touched by E₁ that are not of type T either have value semantics, or are reachable by a chain of member accesses from a value of T.

That fixes your particular example. Pondering whether it’s actually getting at an underlying principle (“value semantics are about not bringing along visible references to shared data”), or is just a definition kludge.

1 Like

I still don't think the concept is rigorously applicable divorced from talking about a specific operation. The topic came up in response to the discussion around ActorSendable, but whether values of a type can be sent without copying additional data outside the value doesn't say anything about the properties of any other operations.

I actually think I have a pretty straightforward definition (ever so slightly recursive) of a type with value semantics. It involves three values instead of two. The only personal device I have is a measly phone, so I can’t tap it out at the moment, but I will shortly... Perhaps it’s woefully inadequate, but it has the virtue of being simple and—near as I can tell turning it over in my head—comports with most or all intuitive notions of value semantics.


Edit: Suppose I have a type T, and I want to know whether it has value semantics.

This type may or may not have a conformance to Equatable, but for clarity, we will not use the definition of equivalence given by any conformance of T to Equatable. That is, we shall not restrict ourselves to "salient" properties as arbitrarily chosen by the author of T.

Instead, consider all possible pairs of two "independently obtained" immutable values a and b of type T. By independently obtained, I mean created by any other means than assignment (let a = b or let b = a).

Now, let's create a mutable binding var c = a.

Then, type T has value semantics if, for every possible operation f (noting that a are b are immutable, so these operations don't mutate a or b) where each consecutive invocation of f(a) is indistinguishable from each other and from f(b), there is no intervening operation mutating c that causes the result of a then subsequent invocation of f(a) to be distinguishable from f(b).

("Indistinguishable" is then defined recursively by the set of operations g on f(a) and f(b) such that g(f(a)) is indistinguishable from g(f(b)) for all such g. The description above is a little clumsy, but hopefully it gets the point across.)

1 Like

Matthew, your post contained at least one gold nugget for me, but lots of it, I still don't understand. Sorry if it seems like I'm taking issue with every point; it's just because I'm trying to suss out what you're really saying.

Actually, as I said in my reply to Joe, that fact is incidental to the reason I opened the thread. I'd be interested in the answer to the question, as I described it to Joe, even in the absence of ActorSendable. That said, I'd consider a definition of ValueSemantic to be a failure if it admitted types where the default implementation {self} for unsafeSendToActor was incorrect.

The refinement introduces an additional semantic requirement internal serialization of mutation is not sufficient. Instead, copy-on-write must be used for all heap allocated data.

I think maybe you have described these things in opposition when they are in fact causally related. The way Swift is currently defined, the only way to internally serialize mutation of types with mutable heap allocated data is to use CoW. So it's not an “instead” thing, I think.

Also, I'm currently working on interoperability with C++, and I think it's pretty much inevitable that as a consequence, we'll end up with some types in Swift that eagerly copy heap-allocated data. Those could still be ValueSemantic, so there's nothing fundamental about CoW here (which is an implementation detail anyway).

All that said, I think I don't quite understand why these facts are important. Can you help me out by connecting them to something else you've said?

Where “pure” has the non-traditional meaning that includes an ability to mutate via inout, I presume? As I define them, the basis operations of a value type are always pure. Note, however, that I'm not sure I want to outlaw all operations with side-effects other than inout mutation in the basis operations of a ValueSemantic type. For example, printing to the console is a side-effect that doesn't, notionally, break value semantics.

We must acknowledge and embrace this distinction between "value"-ness and purity if we are going to have any hope of providing clear semantics for this protocol.

That's plausible, but I don't think it's obvious. Care to explain why you say that?

A correct conformance to a protocol can only depend on code that is under control of the author of the conformance.

Maybe I don't understand what you mean by “control.” I'm pretty sure I don't control code in the standard library and yet I think I can depend on that code in a correct conformance.

As has already been pointed out, anyone can add a trivial impure extension to any type. For this reason, purity of members cannot be considered a necessary condition of a valid conformance .

Agreed. In fact, whether something is a member is irrelevant. For example, a type can have fileprivate members and its basis operations can all be defined as public functions at module scope in the same file.

Further, I argue that if the semantics of this protocol cannot depend on the definition of purity then it isn't necessary to provide a definition at this time (that can be defined separably in a future proposal to introduce support for pure functions).

Would you mind spelling out the basis of your argument? Just trying to follow along here.

This is a great point. I think it follows from that point that any operations on a ValueSemantic type that expose reference semantics must be considered unsafe (and so named if they are visible outside the type's domain of definition).

Precedent: withUnsafeMutablePointer(&x), where x is ValueSemantic, exposes reference semantics. Remember, whether operations are members of the type is irrelevant.

(note: independence of copies is subtly different than purity / referential transparency: Array.capacity is semantically independent but not referentially transparent)

Hmm, the expression x.capacity, where x is an Array, seems to meet the Wikipedia definition of referential transparency, which basically means “no side-effects:” you can replace it with its result without changing the meaning of the program. Maybe you mean something else? Please also let me know if other statements about referential transparency in your post need to be revised.

In order to protect against mutable sharing when stored properties include references to heap data it is essential for these references to be encapsulated: they must not be non- public .

One reason I am attracted to the idea of basis operations is that I don't think any definition that depends on a particular level of access control is going to work. Your whole program might be in one file and there might be nothing public anywhere. Another is that it admits regular functions and isn't tied to whether something is a member.

To summarize, in my view this protocol is really about controlling access and use of references to heap allocated data in order to avoid mutation of shared data.

Yes, value semantics is about avoiding mutation of shared data. I don't see how it can reasonably be about heap allocated data at a fundamental level, though; there are lots of types that can “share mutable data” in the absence of heap allocations (e.g. UnsafeMutableBufferPointer can be bound to the memory of a stack-allocated tuple).

Thanks for wading through my questions; I'm really looking forward to hearing more.

Yep, I'm well acquainted with your point-of-view on this, and you've even convinced me several times that you're probably right! You might yet turn out to be right. What I've really been convinced of, though, is that if it can be done, it's hard, and I'm not ready to give up on that basis alone.

My current view, which leaves me hopeful of success, is:

  • Assuming you can rigorously apply the concept when talking about one specific operation, then you can do the same for a well-defined set of operations that constitute a value-semantic basis for the type.
  • It's possible to define the concept so that stepping outside the value semantic basis is clearly identifiable (e.g. marked “unsafe”).
  • The language provides natural boundaries beyond which we can reason that compositions of value semantic basis operations always have value semantics, so you don't need to think about every possible extension individually.
1 Like

Let me guess: it doesn't quite fit in the margin, Monsieur Fermat? :wink:

That's fine, if the author of T has supplied a conformance to Equatable, that defines which properties are salient. Either way, the author is choosing the salient properties. If the conformance to Equatable was written by someone else… I think I'll just assume that's off the table for the sake of your argument.

Nit: that's not assignment, but initialization. Presumably you want to rule out both of them?

Just trying to get a grip on this. Does let a = T(x: b.x, y: b.y, z: b.z) qualify as “independently-created,” where x, y, and z are the stored properties of T and the initializer does “the obvious thing” with its arguments?

You lost me here. What makes one invocation distinguishable from another? Is this a condition you're trying to impose on the semantics of f, or is it a claim you're making about f?

there is no intervening operation mutating c that causes the result of a then subsequent invocation of f(a) to be distinguishable from f(b) .

Sorry, I'm not there yet. Perhaps you could clarify a bit?

3 Likes

A particular wish I’ve had for a ValueSemantics protocol is in writing Siesta, whose purpose is to provide a widely shared observable cache of HTTP resources. Resources have some shared structure (HTTP headers, metadata about staleness etc.), but also user-definable content types.

Siesta-based apps take a sort of “reactive-lite” approach where any code that depends on a particular API response becomes an observer of that response — and any changes to the data trigger notifications.

I would require the content type of Siesta resources to have value semantics if the language let me do it. As it stands, a note buried in Siesta’s documentation merely warns about it.

It seems to me SwiftUI might have use for a similar constraint limiting @State to types with value semantics, for similar reasons? (Forgive me, my SwiftUI knowledge is thin; I may be totally off the mark here.)

Yes, let's take it off the table. That is, we must be able to determine whether a type T can have value semantics even when it doesn't conform to Equatable.

For the sake of strictness, let's rule this out too. That is, a and b are created without reference to each other in the code in any way. We ask about all such possible pairs of "independently created" values.

Sorry. I did say the wording was clumsy. In code:

let result_0 = f(a)
let result_1 = f(a)
// ...repeat an arbitrary number of times:
let result_n = f(a)
let result = f(b)
var c = a
mutate(&c)
let result_prime = f(a)

If result_0 is indistinguishable from result_{1...n} and from result, then the type T has value semantics if(f?) result_prime is indistinguishable from result_{0...n} (and result) for all possible nonmutating operations f (however spelt of course, whether property, subscript, member function, free function, etc.) and mutating functions mutate (however spelt), and all possible pairs of such independently created values a and b of type T.

I define "indistinguishable" recursively by saying that result_1 is indistinguishable from result_2 if(f?) g(result_1) is indistinguishable from g(result_2) for all g that fulfill similar semantic requirements as f given above.


Hmm, this all seems to reduce to just a pretty run-of-the-mill definition of purity. The idea, though, is this:

  • @Joe_Groff believes that value semantics can be rigorously applied with respect to a specific operation.
  • You argue that a type could then be considered to have value semantics if its author defines a set of basis operations to which such rigorous semantics can be applied.
  • Can we instead find some definition of the set of operations on a type to which such rigorous semantics can be applied by some way other than the declaration of such a set by the type's author? If we can, then without the author declaring which operations are basis operations, we can rigorously figure out if a type has value semantics if we can rigorously figure out if any specific operation has value semantics.

I claim that the set of such operations can be the entire set of operations that give indistinguishable results when operating on two "independently created" values that are the "same." By that criterion, for instance, the memory address of two values is excluded from that set of operations, because "independently created" values by any definition of "sameness" will nonetheless have different memory addresses (unless for some reason that type always represents every possible value by a pointer to a fixed location in memory--in which case, then the memory address of such values can be part of the set of operations determining value semantics).

Absolutely! I was trying to emphasize the aspects of ActorSendable that are insufficient but I agree that ValueSemantic is useful on its own.

Fair enough, but a Swift type could wrap C or C++ code that uses sophisticated techniques, right?

Again, fair enough. The property that matters is avoiding mutation of shared data. The implementation technique is incidental, even if copy-on-write is conventional in Swift.

Yes, of course!

I think pure should follow the conventional meaning of referentially transparent. Maybe someday Swift will have an algebraic effect system. If it did I could understand some programmers choosing to make the tradeoff of allowing console effects in their value type members. I also think it would be reasonable to support a debugPrint function that is a no-op in release builds and is allowed in pure functions.

Purity (referential transparency) is a property of operations not types. As Joe and others have repeatedly pointed out this property is fundamental to what we mean by "value semantics". But as they have also pointed out, this property only describes individual operations, not types. Since a protocol is applied to a type an independent concept is necessary which defines a property of a type.

What I was trying to say is that a protocol should not try to impose semantics requirements that are impossible to fulfill, such as "all members of the type must be pure".

If the semantics of a protocol don't depend on the definition of purity why would it be necessary to provide such a definition? That said, there is an intimate relationship with purity and the definition would be useful to have. I was hoping it might be possible to avoid roping in yet another large and complex topic into the already large set of discussions happening right now.

Quoting Wikipedia:

the expression value must be the same for the same inputs and its evaluation must have no side effects

The reason Array.capacity is not referentially transparent is that "the same" here means ==. It's harder to pin down "the same" when an == conformance is not possible but in this example that is not a problem.

I agree that basis operations should be pure and that this is sufficient to capture the notion of a "value semantic" type. It does drag along some complexity that might be good to avoid though. Is it really necessary to determine whether Array.capacity is a basis operation or not to know whether or not Array has value semantics?

My thinking was to try and identify general rules that could be applied to a finite set of code (i.e. the declaring module) to determine whether or not a type has value semantics without relying on a notion of purity or information extrinsic to the code about which operations are basis operations. I used public as a proxy for "implementation boundary". You're right that this does not work for single file or single module programs. This line of thinking may be better suited to providing guidelines for upholding value semantics than it is to defining the notion.

Fair enough. I wasn't thinking about edge cases like this.

No problem. Thanks for pointing out several oversights in my thinking. Looking forward to your reply!

One reason I keep going back to the idea that ValueSemantic types should have pure operations by default is that it is conservative in that it only forces programmers to think about purity when they define an impure operation. This feels more in the spirit of progressive disclosure and would provide important feedback (in the form of error messages) as programmers work.

While an exact set of basis operations would not be explicitly defined, a de-facto basis would be defined (specifically, the pure operations). impure members would be de-facto considered outside the basis. This seems like a useful approximation of basis that doesn't rely on humans coming to agreement on a topic that can be very subtle. It would also serve as a good place to start for teams that are interested in going further and defining a more precise basis for their types.

EDIT: it looks like @xwu is thinking along similar lines

Yeah, I think it's important to distinguish arguments that are just rephrasings of things we've already said from those that bring something new to the table.

I think you mean “if we can rigorously figure out if any specific operation has purity?”

While I think we can reduce the need for explicit basis definitions to zero in most cases by choosing the right defaults, I am not betting on being able to eliminate them entirely, for a number of reasons that I'm having trouble spelling out cogently right now. In general terms, I'm thinking about the fact that it's often important to expose some unsafe and/or impure operations as part of a type's public API, that the operations available on a type are dependent on scope relationships and access control but we want the ValueSemantic conformance to be valid regardless of scope, and that we have no formal notions of purity in the type system.

For me personally, completely eliminating the need for explicit basis definition seems like a hard problem with few benefits, compared to almost completely eliminating the need, which seems obviously possible. The more important problem at this point is to validate or refute my hypothesis that a well-defined pure basis operation set allows us to create a useful definition of ValueSemantic, because without that, it's irrelevant whether the basis sometimes requires explicit declaration.

2 Likes

Sure. I don't think I understand why that's relevant, though.

I think pure should follow the conventional meaning of referentially transparent.

Perhaps, though I'm not sure I understand exactly what that means. “Referentially transparent” applies to expressions and “pure” applies to functions. Maybe you mean something like, “A function is pure iff every valid invocation in which all arguments are side-effect-free is a referentially-transparent expression.”

There's another horrible wrinkle: there's “pure” and there's “pure in the absence of multithreading.” For example, on a single thread (c.f. @UIActor) a memoized function that uses an unsynchronized Dictionary as a cache is perfectly pure for semantic purposes. But it's not thread-safe. IMO conformance to a ValueSemantic protocol should mean the basis operations have thread-safe purity. It's just a more generally-useful constraint.

It's definitely fundamentally related, but I'm less sure that it is more fundamental. To understand whether a function is consistent with value semantics, in general, you need to look at not only the operations performed by the function, but the invariants of the types on which it operates. For example (it may not be the case in Swift today, but once we add move-only types, or C++ interop, it will change) you can have a type with dynamically-allocated storage that is known never to be shared (that's an invariant). Obviously you can mutate that storage freely without breaking value semantics. The invariants of a type are defined by its set of basis operations and their legal usage.

But as they have also pointed out, this property only describes individual operations, not types. Since a protocol is applied to a type an independent concept is necessary which defines a property of a type .

Sure. On the other hand, IMO, there are ways to describe value semantics without defining purity at all. There have been several attempts in this thread that simply describe the properties of referential transparency as they apply to value semantic types.

a protocol should not try to impose semantics requirements that are impossible to fulfill,
such as "all members of the type must be pure".

Maybe you mean, “impossible for the author to enforce?” Certainly I can fulfill that requirement:

struct X {}

Are you just saying we need to account for the possibility that anyone can add any API they want in an extension? If so, sure. Let's stop thinking of members as though they are special, except in defining a useful default basis.

Gotcha! I misread it as "it isn't necessary to provide a definition of ValueSemantic at this time.” Also the use of “cannot” (as opposed to “don't”, which you used just above) was confusing. Thanks for clearing that up.

That said, there is an intimate relationship with purity and the definition would be useful to have. I was hoping it might be possible to avoid roping in yet another large and complex topic into the already large set of discussions happening right now.

Agreed. Actually I typed and deleted the suggestion that maybe we could avoid defining purity for now in this reply.

It is still a problem when the element type is not Equatable, but your point is taken. capacity is a non-salient attribute and thus not part of the value of the array. So the values of the inputs a and b can be the same, but a.capacity can differ from b.capacity.

I agree that basis operations should be pure and that this is sufficient to capture the notion of a "value semantic" type.

I don't think I've quite proven that yet. I need to write down a formal definition and see if it holds up.

It does drag along some complexity that might be good to avoid though. Is it really necessary to determine whether Array.capacity is a basis operation or not to know whether or not Array has value semantics?

I don't think so; I hope not. My post with a formal definition should make the answer clear, though.

My thinking was to try and identify general rules that could be applied to a finite set of code (i.e. the declaring module) to determine whether or not a type has value semantics without relying on a notion of purity or information extrinsic to the code about which operations are basis operations. I used public as a proxy for "implementation boundary". You're right that this does not work for single file or single module programs. This line of thinking may be better suited to providing guidelines for upholding value semantics than it is to defining the notion.

I think it's a fine way to avoid being explicit everywhere, just as I feel comfortable not explicitly documenting the complexity of any property that can be accessed in O(1).

A couple of problems with that:

  • A chain of member accesses does not “reach a variable.” But maybe you just mean s/variable/value/ everywhere in the whole definition?
  • Whether or not something has the syntactic form of a member access is fundamentally irrelevant to value semantics, so I don't see how this can fix the problem.

Yes that's right. I didn't notice the subtle distinction before, probably because I was introduced to the concept in functional languages where functions are defined as expressions. Thanks for pointing it out.

I agree that pure functions must be thread-safe.

There's a further wrinkle in your wrinkle: memoization involves shared mutable state so I wouldn't call it "pure" in implementation. I would expect the compiler to produce an error in this case and require an "unsafe pure" annotation.

Yes, I would expect the compiler to understand this and therefore allow mutation of uniquely owned state in "pure" functions.

I don't know how the compiler could ever understand arbitrary invariants though. Depending on the invariants that are relied upon by an operation, it wouldn't surprise me if "unsafe pure" annotations were necessary. That said, I would not expect to need "unsafe pure" for assertions, preconditions and fatal errors. These are like "bottom" in Haskell.

Yes that is what I mean and fair enough.

Yep. Interestingly, this is an impure operation that I don't think the compiler would have any chance of rejecting. It would require an explicit impure annotation.

Looking forward to seeing what you come up with.

I'm going to give this a shot:

Let the canonical basis set O of a type T be the set of basis operations wrapped such that each inout parameter is replaced with a normal parameter and a corresponding element in result tuple of the transformed operation and replacing throws with a Result wrapper around the result tuple.

A type T has value semantics if and only if for each operation o in O and each valid argument tuple a consisting of expressions ( E0 ... En ) the following is always the case:

let input = a
let result = o(a)
assert(input == a)

Further, it is always the case that for any valid argument tuple b where b == a, it is always the case that o(b) == o(a).

Note: this glosses over the fact that some value-semantic types don't implement ==, tuples don't implement == and Result<T, Error> doesn't implement ==.

1 Like