ValueSemantic protocol

xwu · November 3, 2020, 12:10am

I actually think I have a pretty straightforward definition (ever so slightly recursive) of a type with value semantics. It involves three values instead of two. The only personal device I have is a measly phone, so I can’t tap it out at the moment, but I will shortly... Perhaps it’s woefully inadequate, but it has the virtue of being simple and—near as I can tell turning it over in my head—comports with most or all intuitive notions of value semantics.

Edit: Suppose I have a type T, and I want to know whether it has value semantics.

This type may or may not have a conformance to Equatable, but for clarity, we will not use the definition of equivalence given by any conformance of T to Equatable. That is, we shall not restrict ourselves to "salient" properties as arbitrarily chosen by the author of T.

Instead, consider all possible pairs of two "independently obtained" immutable values a and b of type T. By independently obtained, I mean created by any other means than assignment (let a = b or let b = a).

Now, let's create a mutable binding var c = a.

Then, type T has value semantics if, for every possible operation f (noting that a and b are immutable, so these operations don't mutate a or b) where each consecutive invocation of f(a) is indistinguishable from each other and from f(b), there is no intervening operation mutating c that causes the result of a subsequent invocation of f(a) to be distinguishable from f(b).

("Indistinguishable" is then defined recursively by the set of operations g on f(a) and f(b) such that g(f(a)) is indistinguishable from g(f(b)) for all such g. The description above is a little clumsy, but hopefully it gets the point across.)

dabrahams · November 3, 2020, 12:37am

Matthew, your post contained at least one gold nugget for me, but lots of it, I still don't understand. Sorry if it seems like I'm taking issue with every point; it's just because I'm trying to suss out what you're really saying.

Actually, as I said in my reply to Joe, that fact is incidental to the reason I opened the thread. I'd be interested in the answer to the question, as I described it to Joe, even in the absence of ActorSendable. That said, I'd consider a definition of ValueSemantic to be a failure if it admitted types where the default implementation {self} for unsafeSendToActor was incorrect.

The refinement introduces an additional semantic requirement: internal serialization of mutation is not sufficient. Instead, copy-on-write must be used for all heap allocated data.

I think maybe you have described these things in opposition when they are in fact causally related. The way Swift is currently defined, the only way to internally serialize mutation of types with mutable heap allocated data is to use CoW. So it's not an “instead” thing, I think.

Also, I'm currently working on interoperability with C++, and I think it's pretty much inevitable that as a consequence, we'll end up with some types in Swift that eagerly copy heap-allocated data. Those could still be ValueSemantic, so there's nothing fundamental about CoW here (which is an implementation detail anyway).
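For readers less familiar with the technique under discussion, here is a minimal sketch of copy-on-write. The types IntStorage and CoWInts are hypothetical and exist only for illustration; isKnownUniquelyReferenced is the standard library function normally used for this check. An eagerly copying type would get the same observable behavior by duplicating the storage on every copy instead.

```swift
// Hypothetical reference-type storage; not from the thread.
final class IntStorage {
    var values: [Int]
    init(_ values: [Int]) { self.values = values }
}

// Hypothetical value type that shares storage until the first mutation.
struct CoWInts {
    private var storage: IntStorage

    init(_ values: [Int]) { storage = IntStorage(values) }

    var values: [Int] { storage.values }

    mutating func append(_ x: Int) {
        // Copy the storage only if another value may still be using it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = IntStorage(storage.values)
        }
        storage.values.append(x)
    }
}

let a = CoWInts([1, 2])
var b = a          // shares storage with a for now
b.append(3)        // triggers the copy; a is unaffected
print(a.values)    // [1, 2]
print(b.values)    // [1, 2, 3]
```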

All that said, I think I don't quite understand why these facts are important. Can you help me out by connecting them to something else you've said?

Where “pure” has the non-traditional meaning that includes an ability to mutate via inout, I presume? As I define them, the basis operations of a value type are always pure. Note, however, that I'm not sure I want to outlaw all operations with side-effects other than inout mutation in the basis operations of a ValueSemantic type. For example, printing to the console is a side-effect that doesn't, notionally, break value semantics.

We must acknowledge and embrace this distinction between "value"-ness and purity if we are going to have any hope of providing clear semantics for this protocol.

That's plausible, but I don't think it's obvious. Care to explain why you say that?

A correct conformance to a protocol can only depend on code that is under control of the author of the conformance.

Maybe I don't understand what you mean by “control.” I'm pretty sure I don't control code in the standard library and yet I think I can depend on that code in a correct conformance.

As has already been pointed out, anyone can add a trivial impure extension to any type. For this reason, purity of members cannot be considered a necessary condition of a valid conformance.

Agreed. In fact, whether something is a member is irrelevant. For example, a type can have fileprivate members and its basis operations can all be defined as public functions at module scope in the same file.

Further, I argue that if the semantics of this protocol cannot depend on the definition of purity then it isn't necessary to provide a definition at this time (that can be defined separably in a future proposal to introduce support for pure functions).

Would you mind spelling out the basis of your argument? Just trying to follow along here.

This is a great point. I think it follows from that point that any operations on a ValueSemantic type that expose reference semantics must be considered unsafe (and so named if they are visible outside the type's domain of definition).

Precedent: withUnsafeMutablePointer(to: &x), where x is ValueSemantic, exposes reference semantics. Remember, whether operations are members of the type is irrelevant.
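A small sketch of that precedent, using an Int as the value-semantic x. The variable names are illustrative only. Inside the closure, two copies of the pointer refer to the same storage, so a write through one is visible through the other:

```swift
var x = 1
var observed = 0

withUnsafeMutablePointer(to: &x) { p in
    let q = p              // copying the pointer does not copy the Int
    q.pointee = 2          // a write through one copy...
    observed = p.pointee   // ...is visible through the other
}

print(observed)   // 2
print(x)          // 2
```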

(note: independence of copies is subtly different than purity / referential transparency: Array.capacity is semantically independent but not referentially transparent)

Hmm, the expression x.capacity, where x is an Array, seems to meet the Wikipedia definition of referential transparency, which basically means “no side-effects:” you can replace it with its result without changing the meaning of the program. Maybe you mean something else? Please also let me know if other statements about referential transparency in your post need to be revised.

In order to protect against mutable sharing when stored properties include references to heap data it is essential for these references to be encapsulated: they must not be non-public.

One reason I am attracted to the idea of basis operations is that I don't think any definition that depends on a particular level of access control is going to work. Your whole program might be in one file and there might be nothing public anywhere. Another is that it admits regular functions and isn't tied to whether something is a member.

To summarize, in my view this protocol is really about controlling access and use of references to heap allocated data in order to avoid mutation of shared data.

Yes, value semantics is about avoiding mutation of shared data. I don't see how it can reasonably be about heap allocated data at a fundamental level, though; there are lots of types that can “share mutable data” in the absence of heap allocations (e.g. UnsafeMutableBufferPointer can be bound to the memory of a stack-allocated tuple).

Thanks for wading through my questions; I'm really looking forward to hearing more.

dabrahams · November 3, 2020, 12:56am

Yep, I'm well acquainted with your point-of-view on this, and you've even convinced me several times that you're probably right! You might yet turn out to be right. What I've really been convinced of, though, is that if it can be done, it's hard, and I'm not ready to give up on that basis alone.

My current view, which leaves me hopeful of success, is:

Assuming you can rigorously apply the concept when talking about one specific operation, then you can do the same for a well-defined set of operations that constitute a value-semantic basis for the type.
It's possible to define the concept so that stepping outside the value semantic basis is clearly identifiable (e.g. marked “unsafe”).
The language provides natural boundaries beyond which we can reason that compositions of value semantic basis operations always have value semantics, so you don't need to think about every possible extension individually.

dabrahams · November 3, 2020, 1:08am

Let me guess: it doesn't quite fit in the margin, Monsieur Fermat?

That's fine, if the author of T has supplied a conformance to Equatable, that defines which properties are salient. Either way, the author is choosing the salient properties. If the conformance to Equatable was written by someone else… I think I'll just assume that's off the table for the sake of your argument.

Nit: that's not assignment, but initialization. Presumably you want to rule out both of them?

Just trying to get a grip on this. Does let a = T(x: b.x, y: b.y, z: b.z) qualify as “independently-created,” where x, y, and z are the stored properties of T and the initializer does “the obvious thing” with its arguments?

You lost me here. What makes one invocation distinguishable from another? Is this a condition you're trying to impose on the semantics of f, or is it a claim you're making about f?

there is no intervening operation mutating c that causes the result of a subsequent invocation of f(a) to be distinguishable from f(b).

Sorry, I'm not there yet. Perhaps you could clarify a bit?

Paul_Cantrell · November 3, 2020, 1:48am

A particular wish I’ve had for a ValueSemantics protocol is in writing Siesta, whose purpose is to provide a widely shared observable cache of HTTP resources. Resources have some shared structure (HTTP headers, metadata about staleness etc.), but also user-definable content types.

Siesta-based apps take a sort of “reactive-lite” approach where any code that depends on a particular API response becomes an observer of that response — and any changes to the data trigger notifications.

I would require the content type of Siesta resources to have value semantics if the language let me do it. As it stands, a note buried in Siesta’s documentation merely warns about it.

It seems to me SwiftUI might have use for a similar constraint limiting @State to types with value semantics, for similar reasons? (Forgive me, my SwiftUI knowledge is thin; I may be totally off the mark here.)

xwu · November 3, 2020, 1:53am

Yes, let's take it off the table. That is, we must be able to determine whether a type T can have value semantics even when it doesn't conform to Equatable.

For the sake of strictness, let's rule this out too. That is, a and b are created without reference to each other in the code in any way. We ask about all such possible pairs of "independently created" values.

Sorry. I did say the wording was clumsy. In code:

let result_0 = f(a)
let result_1 = f(a)
// ...repeat an arbitrary number of times:
let result_n = f(a)
let result = f(b)
var c = a
mutate(&c)
let result_prime = f(a)

If result_0 is indistinguishable from result_{1...n} and from result, then the type T has value semantics if(f?) result_prime is indistinguishable from result_{0...n} (and result) for all possible nonmutating operations f (however spelt of course, whether property, subscript, member function, free function, etc.) and mutating functions mutate (however spelt), and all possible pairs of such independently created values a and b of type T.

I define "indistinguishable" recursively by saying that result_1 is indistinguishable from result_2 if(f?) g(result_1) is indistinguishable from g(result_2) for all g that fulfill similar semantic requirements as f given above.
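To make the test concrete, here is a sketch that runs it once for a struct and once for a class. Point and Counter are hypothetical types, and f and mutate are just one choice among "all possible" operations, so a passing run shows nothing in general; a single failing run, as in the class case, is enough to rule a type out.

```swift
// Hypothetical types, not from the thread.
struct Point { var x: Int }
final class Counter {
    var n: Int
    init(n: Int) { self.n = n }
}

// f: a nonmutating observation; mutate: an operation applied to the copy c.
func f(_ p: Point) -> Int { p.x }
func mutate(_ p: inout Point) { p.x += 1 }
func f(_ r: Counter) -> Int { r.n }
func mutate(_ r: inout Counter) { r.n += 1 }

// Struct case.
let a = Point(x: 1)
let b = Point(x: 1)          // created without reference to a
let result0 = f(a)
let result = f(b)
var c = a
mutate(&c)
let resultPrime = f(a)
print(result0 == result, resultPrime == result)   // true true

// Class case.
let ra = Counter(n: 1)
let rb = Counter(n: 1)       // created without reference to ra
let r0 = f(ra)
let r = f(rb)
var rc = ra                  // copies the reference, not the object
mutate(&rc)
let rPrime = f(ra)
print(r0 == r, rPrime == r)  // true false
```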

Hmm, this all seems to reduce to just a pretty run-of-the-mill definition of purity. The idea, though, is this:

@Joe_Groff believes that value semantics can be rigorously applied with respect to a specific operation.
You argue that a type could then be considered to have value semantics if its author defines a set of basis operations to which such rigorous semantics can be applied.
Can we instead find some definition of the set of operations on a type to which such rigorous semantics can be applied by some way other than the declaration of such a set by the type's author? If we can, then without the author declaring which operations are basis operations, we can rigorously figure out if a type has value semantics if we can rigorously figure out if any specific operation has value semantics.

I claim that the set of such operations can be the entire set of operations that give indistinguishable results when operating on two "independently created" values that are the "same." By that criterion, for instance, the memory address of two values is excluded from that set of operations, because "independently created" values by any definition of "sameness" will nonetheless have different memory addresses (unless for some reason that type always represents every possible value by a pointer to a fixed location in memory--in which case, then the memory address of such values can be part of the set of operations determining value semantics).

anandabits · November 3, 2020, 6:16am

Absolutely! I was trying to emphasize the aspects of ActorSendable that are insufficient but I agree that ValueSemantic is useful on its own.

Fair enough, but a Swift type could wrap C or C++ code that uses sophisticated techniques, right?

Again, fair enough. The property that matters is avoiding mutation of shared data. The implementation technique is incidental, even if copy-on-write is conventional in Swift.

Yes, of course!

I think pure should follow the conventional meaning of referentially transparent. Maybe someday Swift will have an algebraic effect system. If it did I could understand some programmers choosing to make the tradeoff of allowing console effects in their value type members. I also think it would be reasonable to support a debugPrint function that is a no-op in release builds and is allowed in pure functions.

Purity (referential transparency) is a property of operations not types. As Joe and others have repeatedly pointed out this property is fundamental to what we mean by "value semantics". But as they have also pointed out, this property only describes individual operations, not types. Since a protocol is applied to a type an independent concept is necessary which defines a property of a type.

What I was trying to say is that a protocol should not try to impose semantics requirements that are impossible to fulfill, such as "all members of the type must be pure".

If the semantics of a protocol don't depend on the definition of purity why would it be necessary to provide such a definition? That said, there is an intimate relationship with purity and the definition would be useful to have. I was hoping it might be possible to avoid roping in yet another large and complex topic into the already large set of discussions happening right now.

Quoting Wikipedia:

the expression value must be the same for the same inputs and its evaluation must have no side effects

The reason Array.capacity is not referentially transparent is that "the same" here means ==. It's harder to pin down "the same" when an == conformance is not possible but in this example that is not a problem.

I agree that basis operations should be pure and that this is sufficient to capture the notion of a "value semantic" type. It does drag along some complexity that might be good to avoid though. Is it really necessary to determine whether Array.capacity is a basis operation or not to know whether or not Array has value semantics?

My thinking was to try and identify general rules that could be applied to a finite set of code (i.e. the declaring module) to determine whether or not a type has value semantics without relying on a notion of purity or information extrinsic to the code about which operations are basis operations. I used public as a proxy for "implementation boundary". You're right that this does not work for single file or single module programs. This line of thinking may be better suited to providing guidelines for upholding value semantics than it is to defining the notion.

Fair enough. I wasn't thinking about edge cases like this.

No problem. Thanks for pointing out several oversights in my thinking. Looking forward to your reply!

anandabits · November 3, 2020, 6:38am

One reason I keep going back to the idea that ValueSemantic types should have pure operations by default is that it is conservative in that it only forces programmers to think about purity when they define an impure operation. This feels more in the spirit of progressive disclosure and would provide important feedback (in the form of error messages) as programmers work.

While an exact set of basis operations would not be explicitly defined, a de-facto basis would be defined (specifically, the pure operations). Impure members would be de-facto considered outside the basis. This seems like a useful approximation of basis that doesn't rely on humans coming to agreement on a topic that can be very subtle. It would also serve as a good place to start for teams that are interested in going further and defining a more precise basis for their types.

EDIT: it looks like @xwu is thinking along similar lines

dabrahams · November 3, 2020, 5:41pm

Yeah, I think it's important to distinguish arguments that are just rephrasings of things we've already said from those that bring something new to the table.

I think you mean “if we can rigorously figure out if any specific operation has purity?”

While I think we can reduce the need for explicit basis definitions to zero in most cases by choosing the right defaults, I am not betting on being able to eliminate them entirely, for a number of reasons that I'm having trouble spelling out cogently right now. In general terms, I'm thinking about the fact that it's often important to expose some unsafe and/or impure operations as part of a type's public API, that the operations available on a type are dependent on scope relationships and access control but we want the ValueSemantic conformance to be valid regardless of scope, and that we have no formal notions of purity in the type system.

For me personally, completely eliminating the need for explicit basis definition seems like a hard problem with few benefits, compared to almost completely eliminating the need, which seems obviously possible. The more important problem at this point is to validate or refute my hypothesis that a well-defined pure basis operation set allows us to create a useful definition of ValueSemantic, because without that, it's irrelevant whether the basis sometimes requires explicit declaration.

dabrahams · November 3, 2020, 6:48pm

Sure. I don't think I understand why that's relevant, though.

I think pure should follow the conventional meaning of referentially transparent.

Perhaps, though I'm not sure I understand exactly what that means. “Referentially transparent” applies to expressions and “pure” applies to functions. Maybe you mean something like, “A function is pure iff every valid invocation in which all arguments are side-effect-free is a referentially-transparent expression.”

There's another horrible wrinkle: there's “pure” and there's “pure in the absence of multithreading.” For example, on a single thread (cf. @UIActor) a memoized function that uses an unsynchronized Dictionary as a cache is perfectly pure for semantic purposes. But it's not thread-safe. IMO conformance to a ValueSemantic protocol should mean the basis operations have thread-safe purity. It's just a more generally-useful constraint.
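A sketch of the kind of memoized function described here. The Memo class is hypothetical. Called from a single thread, square always returns the same result for the same argument, but the cache is unsynchronized shared mutable state, so concurrent calls on the same instance would be a data race.

```swift
// Hypothetical memoizer; the name and shape are illustrative only.
final class Memo {
    private var cache: [Int: Int] = [:]   // unsynchronized shared mutable state

    func square(_ n: Int) -> Int {
        if let hit = cache[n] { return hit }   // read of the shared cache
        let result = n * n
        cache[n] = result                      // write to the shared cache
        return result
    }
}

let memo = Memo()
print(memo.square(12))   // 144, computed
print(memo.square(12))   // 144, served from the cache
```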

It's definitely fundamentally related, but I'm less sure that it is more fundamental. To understand whether a function is consistent with value semantics, in general, you need to look at not only the operations performed by the function, but the invariants of the types on which it operates. For example (it may not be the case in Swift today, but once we add move-only types, or C++ interop, it will change) you can have a type with dynamically-allocated storage that is known never to be shared (that's an invariant). Obviously you can mutate that storage freely without breaking value semantics. The invariants of a type are defined by its set of basis operations and their legal usage.

But as they have also pointed out, this property only describes individual operations, not types. Since a protocol is applied to a type an independent concept is necessary which defines a property of a type.

Sure. On the other hand, IMO, there are ways to describe value semantics without defining purity at all. There have been several attempts in this thread that simply describe the properties of referential transparency as they apply to value semantic types.

a protocol should not try to impose semantics requirements that are impossible to fulfill, such as "all members of the type must be pure".

Maybe you mean, “impossible for the author to enforce?” Certainly I can fulfill that requirement:

struct X {}

Are you just saying we need to account for the possibility that anyone can add any API they want in an extension? If so, sure. Let's stop thinking of members as though they are special, except in defining a useful default basis.

Gotcha! I misread it as "it isn't necessary to provide a definition of ValueSemantic at this time.” Also the use of “cannot” (as opposed to “don't”, which you used just above) was confusing. Thanks for clearing that up.

That said, there is an intimate relationship with purity and the definition would be useful to have. I was hoping it might be possible to avoid roping in yet another large and complex topic into the already large set of discussions happening right now.

Agreed. Actually I typed and deleted the suggestion that maybe we could avoid defining purity for now in this reply.

It is still a problem when the element type is not Equatable, but your point is taken. capacity is a non-salient attribute and thus not part of the value of the array. So the values of the inputs a and b can be the same, but a.capacity can differ from b.capacity.
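The capacity point can be seen directly. In this sketch the two arrays compare equal while their capacities are free to differ; the exact capacity numbers are an implementation detail, so only the lower bound promised by reserveCapacity is relied on.

```swift
let a = [1, 2, 3]
var b = [1, 2, 3]
b.reserveCapacity(100)   // changes only a non-salient attribute of b

print(a == b)                   // true: same value
print(a.capacity, b.capacity)   // implementation-defined; b's is at least 100
```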

I agree that basis operations should be pure and that this is sufficient to capture the notion of a "value semantic" type.

I don't think I've quite proven that yet. I need to write down a formal definition and see if it holds up.

It does drag along some complexity that might be good to avoid though. Is it really necessary to determine whether Array.capacity is a basis operation or not to know whether or not Array has value semantics?

I don't think so; I hope not. My post with a formal definition should make the answer clear, though.

My thinking was to try and identify general rules that could be applied to a finite set of code (i.e. the declaring module) to determine whether or not a type has value semantics without relying on a notion of purity or information extrinsic to the code about which operations are basis operations. I used public as a proxy for "implementation boundary". You're right that this does not work for single file or single module programs. This line of thinking may be better suited to providing guidelines for upholding value semantics than it is to defining the notion.

I think it's a fine way to avoid being explicit everywhere, just as I feel comfortable not explicitly documenting the complexity of any property that can be accessed in O(1).

dabrahams · November 3, 2020, 6:57pm

A couple of problems with that:

A chain of member accesses does not “reach a variable.” But maybe you just mean s/variable/value/ everywhere in the whole definition?
Whether or not something has the syntactic form of a member access is fundamentally irrelevant to value semantics, so I don't see how this can fix the problem.

anandabits · November 3, 2020, 11:03pm

Yes that's right. I didn't notice the subtle distinction before, probably because I was introduced to the concept in functional languages where functions are defined as expressions. Thanks for pointing it out.

I agree that pure functions must be thread-safe.

There's a further wrinkle in your wrinkle: memoization involves shared mutable state so I wouldn't call it "pure" in implementation. I would expect the compiler to produce an error in this case and require an "unsafe pure" annotation.

Yes, I would expect the compiler to understand this and therefore allow mutation of uniquely owned state in "pure" functions.

I don't know how the compiler could ever understand arbitrary invariants though. Depending on the invariants that are relied upon by an operation, it wouldn't surprise me if "unsafe pure" annotations were necessary. That said, I would not expect to need "unsafe pure" for assertions, preconditions and fatal errors. These are like "bottom" in Haskell.

Yes that is what I mean and fair enough.

Yep. Interestingly, this is an impure operation that I don't think the compiler would have any chance of rejecting. It would require an explicit impure annotation.

Looking forward to seeing what you come up with.

I'm going to give this a shot:

Let the canonical basis set O of a type T be the set of its basis operations, each wrapped so that every inout parameter is replaced with a normal parameter plus a corresponding element in the result tuple of the transformed operation, and so that throws is replaced with a Result wrapper around the result tuple.

A type T has value semantics if and only if for each operation o in O and each valid argument tuple a consisting of expressions ( E0 ... En ) the following is always the case:

let input = a
let result = o(a)
assert(input == a)

Further, for any valid argument tuple b where b == a, it is always the case that o(b) == o(a).

Note: this glosses over the fact that some value-semantic types don't implement ==, tuples don't implement == and Result<T, Error> doesn't implement ==.
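As a sketch of what one element of such a canonical basis set might look like, here is Array<Int>.append with its inout self turned into a parameter plus a result. The function canonicalAppend is a hypothetical name, and the throws-to-Result half of the transformation is not shown. [Int] and Int are used so that == is available on the argument tuple.

```swift
// Canonical form of a mutating operation: the inout self becomes an ordinary
// parameter, and the mutated value is returned instead.
func canonicalAppend(_ array: [Int], _ element: Int) -> [Int] {
    var copy = array
    copy.append(element)
    return copy
}

let a = ([1, 2], 3)                       // an argument tuple
let input = a
let result = canonicalAppend(a.0, a.1)
assert(input == a)                        // the arguments were not altered

let b = ([1, 2], 3)                       // a second tuple with b == a
assert(canonicalAppend(b.0, b.1) == result)
print(result)                             // [1, 2, 3]
```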

dabrahams · November 4, 2020, 2:15am

I didn't say that. Actually I'm trying to do this without defining “pure.” What I am saying is:

A useful ValueSemantic protocol could be defined such that basis operations aren't necessarily safe to use from multiple threads that operate on distinct instances. Models of that protocol might be less threadsafe than Int.
However, I don't think that protocol would be useful enough. It certainly wouldn't serve the needs of the concurrency issue that spawned this thread.

Since I think we only get one shot at defining a protocol in this realm, it had better imply that models are as threadsafe as Int.

There's a further wrinkle in your wrinkle: memoization involves shared mutable state so I wouldn't call it "pure" in implementation. I would expect the compiler to produce an error in this case and require an "unsafe pure" annotation.

I am trying to create a useful definition of ValueSemantic that doesn't depend on any new language features such as “unsafe pure” annotations. Evolving new language features to support enforcement of value semantics in the type system is an interesting and noble problem, but not one I'm trying to solve immediately.

The problem I'm trying to solve is actually a prerequisite for the language features. If you can characterize the semantic requirements of ValueSemantic, you can write those into the protocol documentation and ValueSemantic is meaningful, just like Collection is meaningful, even though the compiler doesn't know about its semantic requirements. If you can't characterize the semantic requirements, you also can't build language features to enforce them.

Chris_Lattner3 · November 5, 2020, 7:01pm

Responding to several points above:

I don't think there is any productive type-level definition of ValueSemantic that can take into consideration "every possible" function/extension that would use values of the type. We have to define the property based on the logical behavior of the type's implementation and contract itself.

To stress test this, I think it is fine for UnsafePointer to be ActorSendable (and ValueSemantic), for the same reason that Int is ActorSendable even if you use it to index into a global dictionary: the definition of the value itself obeys value semantics, even though some external operations (dereferencing, indexing) can depend on additional state. This may not seem obvious given that we want to preserve actor isolation here. However, observe that UnsafePointer is explicitly unsafe right there in the name. In contrast, a SafeReferenceSemanticArray should not be actor sendable.

I think the crux of the issue (as many people describe above) is "what is the encapsulated state represented by the value" and "does a copy (with equal sign) give value preserving semantics" over that state?

This is the difference between Array, UnsafePointer, and SafeReferenceSemanticArray: Array and SafeReferenceSemanticArray logically encapsulate the elements in the array. UnsafePointer is not an encapsulation of all global memory.
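The Int analogy can be sketched as follows. The dictionary names is hypothetical global state. The result of an external operation keyed by the Int changes, while the Int itself and its copy are untouched:

```swift
var names: [Int: String] = [7: "seven"]   // hypothetical state external to any Int

let key = 7
let copy = key              // a copy with an equal value
let before = names[key]
names[7] = "sept"           // external state changes
let after = names[key]

print(key == copy)          // true: the Int values are unaffected
print(before == after)      // false: only the indexed state changed
```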

To be explicit, yes, I'm arguing that Array<UnsafePointer> should conform to ValueSemantic. :)

-Chris

anandabits · November 5, 2020, 7:06pm

You said above that you believe UnsafePointer should conform to ActorSendable but didn't specifically say it should conform to ValueSemantic. That seems to be implied by your comment about Array<UnsafePointer> here, is that correct?

Chris_Lattner3 · November 5, 2020, 7:06pm

Right - updated the post to make that explicit, thx

anandabits · November 5, 2020, 7:12pm

Gotcha. I trust your judgement on this. The argument that "unsafe" makes this ok seems reasonable.

xwu · November 5, 2020, 7:26pm

This is a great way of looking at it.

If an Int value could be said to encapsulate the memory at a certain address, then it could equally be said to encapsulate the memory at any offset from that address and, by induction, all memory. That would not be useful.

dabrahams · November 5, 2020, 8:18pm

Defining Value Semantics for Swift

(with thanks for editorial input to the Swift for TensorFlow team, and to @anandabits for a crucial insight)

For years we've talked informally about certain types “having value semantics”
and used that statement to draw correct conclusions about the behavior of some
code. But until now, we've never really nailed down what it means for a type to
have value semantics. This document provides a clear definition on which
language and/or library extensions could be based, and explores its
implications.

Requirements of Value Semantic Types

When we say “type X has value semantics,” we mean:

- Each variable of type X has an independent notional value.
- A language-level copy (e.g., let b = a) of a variable of type X has an
  equivalent value.
- Given a local variable a of type X, safe code cannot observe the value of a
  except via an expression that uses a.
- Given a variable a of type X, safe code cannot alter the value of a
  except by one of the following means applied to a or to a property of a
  that reflects all or part of a's value:
  - assignment.
  - invocation of a mutating method.
  - invocation of a mutating accessor of a property or subscript.
  - passing the expression as an inout parameter.
- Concurrent access to the values of distinct variables of type X cannot
  cause a data race.
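A sketch of the four listed means of alteration, using [Int] as the value-semantic type. The helper increment is hypothetical. Each mutation names a in the expression, and the earlier copy b keeps its value throughout:

```swift
func increment(_ n: inout Int) { n += 1 }   // hypothetical helper

var a = [1, 2, 3]
let b = a            // language-level copy with an equivalent value

a = [0, 2, 3]        // assignment
a.append(4)          // invocation of a mutating method
a[0] = 10            // invocation of a mutating accessor of a subscript
increment(&a[1])     // passing part of a's value as an inout parameter

print(a)   // [10, 3, 3, 4]
print(b)   // [1, 2, 3]
```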

Safety

Swift has an existing definition of safety, which just implies “memory safety.”
The above definition says that for value semantic types, some memory-safe
operations—those that can cause value access on distinct instances to race—are
also classified as unsafe. This constraint on the meaning of “safety” does not
break any existing code because no types have yet been classified as value semantic
under this definition.

Corollaries

From the above properties and Swift’s semantics, we can conclude that for a type
with value semantics:

- The value of a let-bound instance is constant for all time.
- All instances have the same properties as variables.
- Assignment causes the left-hand side to have the same value as the right-hand
  side.
- Any struct, enum, or tuple whose value is composed of the values of its
  stored value-semantic properties also has value semantics.
- The type's value semantic properties are preserved by any new APIs
  (e.g. extensions) implemented using only safe operations.
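The composition corollary can be illustrated with a sketch. Size and Window are hypothetical types whose stored properties are all value-semantic, so mutating a copy of the aggregate leaves the original alone:

```swift
// Hypothetical types composed only of value-semantic stored properties.
struct Size { var width: Int; var height: Int }
struct Window {
    var title: String
    var size: Size
    var tags: [String]
}

let original = Window(title: "Main", size: Size(width: 640, height: 480), tags: ["a"])
var copy = original
copy.size.width = 800
copy.tags.append("b")

print(original.size.width, original.tags)   // 640 ["a"]
print(copy.size.width, copy.tags)           // 800 ["a", "b"]
```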

Usefulness

Knowing that a type has value semantics is useful because:

- Variables can be safely copied or moved across concurrency boundaries
  without introducing race conditions in code that otherwise appears to be
  safe.
- The value of a non-global variable is immune from “spooky action at a
  distance.”
- Value-semantic types support equational reasoning.
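
To make “spooky action at a distance” concrete, the following sketch contrasts a class with a struct. SharedCounter, LocalCounter, and bump(_:) are hypothetical names invented for this illustration:

```swift
// Reference semantics: every binding refers to the same object.
final class SharedCounter {
    var count = 0
}

// Value semantics: every variable has its own independent value.
struct LocalCounter {
    var count = 0
}

// Hypothetical function that mutates whatever object it is handed.
func bump(_ counter: SharedCounter) {
    counter.count += 1
}

let shared = SharedCounter()
let alias = shared
bump(alias)          // no expression here mentions shared ...
print(shared.count)  // ... yet what it observes changed: prints "1"

let local = LocalCounter()
var copy = local
copy.count += 1      // only copy is altered
print(local.count)   // prints "0"
```

With the struct, the fact that local.count is 0 after initialization remains true no matter what happens to copies of it, which is the kind of equational reasoning the list above refers to.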

Defining Value

The statement that “X has value semantics” is obviously meaningless without a
concept of the value of an instance of X. This concept must be clear for each
type with value semantics. Likewise, to be useful, the operations that access a
value of an instance of X must be clear. In many ways, the semantics of those
operations constitute a definition of the type's value.

Depending on how narrowly one defines a type's value, any type can be said to
have value semantics. For example, UnsafeMutablePointer can have value
semantics if pointee and the data accessed by subscript (and the operations
that affect them) are excluded from its value but its other members are
included.
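
A minimal sketch of that narrow reading, assuming the pointer's value is taken to be just its address (the variable names are made up for this example):

```swift
// Allocate and initialize two Ints so that advancing by one stays in bounds.
let p = UnsafeMutablePointer<Int>.allocate(capacity: 2)
p.initialize(repeating: 0, count: 2)

var q = p                       // copies the address, not the pointed-to memory
q.pointee = 42                  // writes through to memory p also refers to
let seenThroughP = p.pointee    // 42: the pointee is shared, so it cannot be
                                // part of an independent value

q = q.advanced(by: 1)           // alters q's address only
let offset = p.distance(to: q)  // 1: p's address is unchanged

p.deinitialize(count: 2)
p.deallocate()

print(seenThroughP, offset)     // prints "42 1"
```

Under this reading the address behaves like an independent value, while everything reachable through pointee or the subscript is excluded from it.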

Documenting Value

For types that are consumed only by their authors (e.g. in a quick script), a
clear concept of value can be maintained in the author's brain. For other types,
the value must be documented.

For most types with value semantics, it would be difficult to describe the type
without describing its value, but it could be cumbersome to explicitly and
separately document the value of every such type. Fortunately we can avoid
some of that description by acknowledging the patterns these types usually
follow. I suggest the following convention for types documented as having value
semantics:

Unless explicitly documented otherwise,

- public and internal properties are implicitly part of a type's value.
- Data accessed by public and internal subscripts is implicitly part of a
  type's value.
- private and fileprivate properties and subscripts are implicitly not
  part of the type's value.
- The results of witnesses for any Equatable, Comparable, Hashable, and
  Codable conformances depend entirely on the type's entire value.
- Non-salient attributes of the type (like an Array's capacity) can be
  assumed to be safely accessible on distinct instances, across threads.
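
The Array capacity case can be sketched like this (variable names are chosen only for the example). Two arrays with the same elements compare and hash as equal even though their capacities may differ, which is what makes capacity a non-salient attribute:

```swift
let small = [1, 2, 3]
var roomy = [1, 2, 3]
roomy.reserveCapacity(100)   // changes capacity, not the elements

let sameValue = (small == roomy)                     // Equatable ignores capacity
let sameHash = (small.hashValue == roomy.hashValue)  // so does Hashable
let hasRoom = (roomy.capacity >= 100)                // capacity is still observable

print(sameValue, sameHash, hasRoom)  // prints "true true true"
```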

Other APIs probably don't need any special attention in documentation for value
semantics, since it is pretty much impossible to describe their semantics in
documentation without describing how they use, reflect, and/or affect the type's
value.

Deployment in Practice

It follows from the definition of “value semantics” that any operations that
don't uphold the requirements value semantics imposes on safe operations are
unsafe, and should be labeled as such when they are public, by the usual Swift
convention.

It follows that “Unsafe” in a type name should mean that one is to assume that
operations are unsafe unless otherwise specified. For example, the safe
operations on UnsafePointer should be documented as such to support its claim of
value semantics.

It's important that clients of APIs exposing non-salient attributes know that
these attributes are not part of the instance's value.

Open Questions

I think UnsafePointer should be declared to have value semantics. Whether a
similar thing should be done for UnsafeBufferPointer is an interesting
question; it would imply that there are some Collections whose elements are
not part of their value. Whether that's OK may depend on how extant Collection
algorithms are documented.

xwu · November 5, 2020, 8:57pm

I think this is a very workable definition at first blush.

Since you argue that value semantics confer additional "safety" above memory safety to types, I think it would be reasonable to consider unsafe operations and unsafe types generally as exceptions.

That is, a type can have value semantics and still offer withUnsafe* APIs that aren't thread-safe, and users will have to read the documentation for those operations to use them correctly across concurrency boundaries.

Similarly, since we don't have to name operations unsafe on types that are already themselves Unsafe*, it makes sense to say that those types neither have nor do not have value semantics. If every operation available on a type is unsafe, it would be a vacuous statement to reckon whether it has value semantics. (For the same reason, it'd be not very useful to argue whether Never has value semantics.)