On purity grades of value types [was: ValueType protocol]

tera · September 16, 2021, 9:58pm

update

it turns out the example i provided was not correct, thanks to @Nickolas_Pohilets for pointing it out. the new corrected example provided

basically it is an example that:

invalidates Set/Dictionary invariants (e.g. duplicated dictionary keys / set elements)
when the app doesn't do anything "obviously" wrong (foot shooter)
which is a pure swift and doesn't involve Objective-C / Foundation bridging.

to not pollute the other thread even further:

let's do a spin-off. the bottom of that thread reveals a problem:
Set / Dictionary invariants could be easily broken if Set elements or Dictionary keys contain values with reference semantics or are themselves reference types. it is easy to get oneself into a situation (either by mistake or on purpose) having two or more equivalent elements in a set, or having a dictionary with duplicated keys.

incorrect example (ignore)

class C: Equatable, Hashable, CustomStringConvertible {
    var string: String
    
    init(_ string: String) {
        self.string = string
    }
    
    static func == (lhs: C, rhs: C) -> Bool {
        lhs.string == rhs.string
    }
    
    func hash(into hasher: inout Hasher) {
        hasher.combine(string)
    }
    
    var description: String {
        string
    }
}

func bar() {
    let hello = C("hello")
    let world = C("world")
    var s: Set<C> = [hello, world]
    var d: [C: String] = [hello: "hello value", world: "world value"]
    print(s)
    print(d)
    hello.string = "world"
    print(s)
    print(d)
    s.remove(C("world")) // will this remove one or both values?
    d[C("world")] = "which value is going to be overridden?"
    print(s)
    print(d)
}

corrected example

class Ref: CustomStringConvertible {
    var string: String
    init(_ string: String) {
        self.string = string
    }
    public var description: String {
        "\(ObjectIdentifier(self)): " + string
    }
}

struct Val: Equatable, Hashable {
    var int: Int = 0
    let ref: Ref
    
    static func == (a: Self, b: Self) -> Bool {
        a.int == b.int && a.ref.string == b.ref.string
    }
    
    func hash(into hasher: inout Hasher) {
        hasher.combine(int)
        hasher.combine(ref.string)
    }
}

func test() {
    let helloRef = Ref("hello")
    let worldRef = Ref("world")
    
    let hello = Val(ref: helloRef)
    let world = Val(ref: worldRef)
    var s: Set<Val> = [hello, world]
    var d: [Val: String] = [hello: "hello value", world: "world value"]
    print(s)
    print(d)
    helloRef.string = "world"
    print(s)
    print(d)
    
    // thanks @lorentey for pointing out that the check is triggered upon resizing.
    // is there an explicit way to trigger the check without resizing?
    
    let newKeys = (1...1000).map { _ in Val(ref: Ref(UUID().uuidString)) }
    newKeys.forEach {
        s.insert($0) // Fatal error: Duplicate elements of type 'K' were found in a Set.
        d[$0] = "XXX" // Fatal error: Duplicate keys of type 'K' were found in a Dictionary.
    }
}

we can ask developers: "please don't do that". but mistakes happen and it is too easy to shoot yourself in the foot here without realising. one possible remedy would be to have a debug switch / environment variable, when activated all instances where Set elements / dictionary keys have reference semantics will trap. i believe we should do this.

however, is it possible to address this issue at compile time? Can we only allow "ValueType" types (a special pseudo protocol ¹) to be used as Set elements / Dictionary keys?

¹ protocol ValueType {} // a pseudo protocol

Int, Double, String, etc - implicitly conforms to ValueType

struct PureStruct { // implicitly conform to ValueType as every field are ValueType
	let v: Int
	let d: Double
}

var dict: [PureStruct: Int] // possible
var s: Set<PureStruct> // possible

(Hashable conformance is also needed here, not showing it to not pollute the example)

struct ImpureStruct {
	let v: Int
	let c: SomeRefType
}

class SomeClass {
...
}

var dict: [ImpureStruct: Int] // error: Dictionary keys must be ValueType
var s: Set<ImpureStruct> // error: Set elements must be ValueType

var dict: [SomeClass: Int] // error: Dictionary keys must be ValueType
var s: Set<SomeClass> // error: Set elements must be ValueType

extension ImpureStruct: ValueType {
	// explicit (or implicit?) COW implementation. not shown
}

extension SomeClass: ValueType {
	// explicit (or implicit?) COW implementation. not shown
}

var dict: [ImpureStruct: Int] // now possible
var s: Set<ImpureStruct> // now possible
var dict: [SomeClass: Int] // now possible
var s: Set<SomeClass> // now possible

with this change it won't be possible to shoot yourself into the foot and ruin Set/Dictionary invariants. it would still be possible to use Reference types (or value types containing values with reference semantics) as Set elements and Dictionary keys, just those of those who explicitly opt-in to have "ValueType" semantics, and whenever the references are mutated the COW machinery will kick it and leave the Set/Dictionary values intact.

thoughts?

(original example that invalidates Dictionary/Set invariants corrected above)

anon9791410 · September 16, 2021, 10:44pm

Disagree. It takes work to override the default Hashable and Equatable implementations for classes. By the time you have learned to do that, I think you're prepared for the easier-to-hit problem scenario of switching from struct to class, when the struct has == and hash defined.

And since it's become easier over time to not write your own Hashable and Equatable implementations for value types either, I don't think anyone will have a problem with this.

If you get to the point where you're defining your own hashability for classes, you've surely got a handle on all this and don't need hand-holding. Does anyone have an example to the contrary since Swift 4.1?

xwu · September 17, 2021, 2:18am

Lots of thoughts:

Nickolas_Pohilets · September 17, 2021, 7:37am

Example that you provided, is not a reference semantics. Part of the reference semantics is also reference equality.

Dictionary and Set work with actual reference semantics just fine, there is no need to constraint that.

I’d rather be interested in banning conformance to Equatable and Hashable by classes altogether. Custom equality hints that you want value semantics, but if you want value semantics with heap-allocated data - you can wrap a class into a struct and conform wrapping struct to Equatable/Hashable.

tera · September 17, 2021, 3:19pm

thank you @Nickolas_Pohilets! example corrected.

Nobody1707 · September 17, 2021, 5:00pm

If you make all of a reference types members immutable value types then it naturally becomes a heap allocated value type. There's no reason to artificially limit reference types to avoid misimplementing Equatable & Hashable.

The only people who would even attempt to make a reference type conform to these protocols in a real codebase would be either Java (or Javascript, Objective-C, etc.) developers who just started learning Swift, or someone experienced in Swift with a legitimate reason to do so. The former would make the class immutable out of habit; while the later would do so because they know what they're doing.

It's not as if the language does anything to encourage using classes to make value types. If anything, it goes out of it's way to encourage you to use structs for such things. Adding extra boilerplate when one does need to do so is just unnecessary.

lorentey · September 18, 2021, 2:31am

This is a far more complicated topic than I could meaningfully contribute to without extensive research. But, one thing jumps out at me:

If the goal with ValueType is to make it safe to use PureStruct as a Dictionary key, then automatic inference won't work at all.

AFAICT types aren't just raw records; they are more than just the sum of their parts. A type that is composed purely of types with "value semantics" isn't necessarily a "value type" itself.

(I'm using scare quotes because I don't know what exactly these terms mean in Swift. The most salient point when it comes to keys in a set/dictionary is that "value types" aren't subject to spooky action at a distance. I.e. if I have a variable that holds a "value", then the only way that "value" may "change" is if someone directly mutates the variable. But, as I said, I don't know precisely what any of these terms mean, so I am ill-prepared to argue about it.)

PureStruct isn't necessarily free of spooky action at a distance effects -- we can't tell for sure because we don't know how its operations are implemented. Here is an example where it has "reference semantics":

var storage: [String] = ...

extension PureStruct: Hashable {
  static func ==(left: Self, right: Self) -> Bool { 
    storage[left.v] == storage[right.v] 
  }
  func hash(into hasher: inout Hasher) { 
    hasher.combine(storage[v]) 
  }
}

Furthermore, I think "value semantics" isn't even an all-or-nothing thing, either. A type can look like a "value" when it comes to its Hashable implementation, but then it can behave more like a "reference" when it comes to CustomStringConvertible. (Or, indeed, vice versa.)

(Worse, we cannot even categorically state that the above definition makes PureStruct unsuitable for use as a dictionary key. This depends on how storage is used elsewhere -- e.g., if it only ever gets appended with new items, then there is no issue whatsoever.)

I suspect [@unchecked] Sendable might have the same problem when it comes to "concurrency". (I'm sure this has come up during its discussions.)

I don't understand any of this, though.

I guess my point is that I'm not sure introducing a protocol will make these things any easier in practice. (Without reliable verification, it's sort of equivalent to requiring a permit, but also automatically issuing one to anyone who shows up to ask for it. Structural inference is like automatically sending permits out to everyone who was born in July, whether or not they want it.)

tera · September 18, 2021, 3:41am

i like it. this is the very essence of it.

PureStruct isn't necessarily free of spooky action at a distance effects -- we can't tell for sure because we don't know how its operations are implemented. Here is an example where it has "reference semantics":
var storage: [String] = ...

extension PureStruct: Hashable {
  static func ==(left: Self, right: Self) -> Bool { 
	storage[left.v] == storage[right.v] 
  }
  func hash(into hasher: inout Hasher) { 
	hasher.combine(storage[v]) 
  }
}

let's pause here.. consider we are not talking about existing Dictionary / Set (where our hands are tied, as these types need to adhere to a bunch of current rules and practices, the very rules and practices we are questioning. let's talk about some platonic new ideal "PureDictionary" and "PureSet", which are somewhat similar to the current counterparts but possibly with more strict rules and restrictions about them. PureDictionary / PureSet will only accept PureValueTypes in them. is the above example a subject of spooky action at a distance? certainly. then that's not a pure value type! obviously there is no way to check what particular EQ / hash method does (now or in the future). but: it is easy to prohibit something we can't control or check. hence from here follows that pure value types can not have custom EQ / hash implementation.

struct PureStruct: PureValueType {
	static func == (left: Self, right: Self) -> Bool {
		// compilation error: can't do that with PureValueType
	}
	func hash(into hasher: inout Hasher) {
		// compilation error: can't do that with PureValueType
	}
}

however there's a question: can PureValueTypes still have reference type members? or are they as dumb as "POD" of C? after all, String looks "pure", right? and it's implementation definitely has some internal reference type for dynamic storage! perhaps they could! but just those of them which implement the CoW machinery and if reference types in question also "behave" (what that means - to be defined later). following the same logic as previously - CoW machinery of pure value types can not be custom (and thus "screwable"), it will have to be provided by the compiler itself, e.g.:

struct PureStruct: PureValueType {
	let refType: RefType
	// compilation error: pure types with ref members must adhere to  standard CoW machinery rules
}

struct PureStruct: PureValueType, CowMachinery {
	let refType: RefType
	// compiler implements the CoW machinery. and there is no way to customize / override it
}

with these pure platonic ideas in place some other things become clear instantly:

not just dictionary keys must be pure - the values as well.. "thou shalt not change "let v = PureDictionary(...)" value via some spooky action at a distance".
PureArray et al join the party, they can also only contain PureValueTypes.
PureString while similar to String can't be created with one of those legacy "no copy" routes (available via "noCopy" initialiser of String / CFString / NSString).
PureData while similar to Data can't be created with "dataWithBytesNoCopy" variants (available via Data / CFData / NSData routes). or be backed by mapped memory files (at all or writeable only)

just the protocol - indeed... it would be like a "comment" that's not compile type checked / and gets out of sync. only if protocol conformance obliges real adhering to the specific rules. e.g. by providing standard implementations of EQ/hash/(Cow when needed) and imposing certain (some would say draconian!) restrictions on types that are impossible to bypass.

you see where i'm heading with all this, from: "here's a guideline to follow" to "if it compiles - it is correct". for some of you this platonic exercise might be too, hmm, "detached from current swift reality". we'll see. maybe that's the direction worth exploring anyway. would be interesting to see what you and others think about these platonic / pure types.

that's exactly what my local woods does with their walk permits scheme

lorentey · September 18, 2021, 4:09am

This is where things get Really Confusing for me.

Is Equatable/Hashable conformance a required feature of "value types"?

What makes these two protocols special when it comes to "valuetypedness"?

What about Comparable? (Consider the same problem in SortedSet.)

What about protocols like Identifiable? (Consider the same problem in something like SparseSet.)

tera · September 18, 2021, 4:45am

you specify Hashable only when needed (e.g. to put it into a PureDictionary / PureSet which is hash based)

just that the inability to customise them would prohibit spooky action at a distance for other pure value collection types (dict/set) they reside in (if they are). thus keeping them pure.

La loi, c'est la loi. Comparable (if value wants to conform to it) would be provided by the compiler itself with no ability of customisation (yes, i am talking about memcmp (and nested memcmp's of nested types). there are somewhat tangentially related precedent and precedent. no, PureTreeDictionary / PureSortedSet will not be sorted according to the current language / locale rules (neither is SortedSet, nor is "Å < Æ")

alternatively...

we could explore a different direction altogether, with some concept of "very pure function that can't depend on anything but the value itself" and whereas it would be a compilation error to try:

var storage: [String] = ...

struct PureStruct: PureValueType, HashableValue {
    verypure func == (right: Self) -> Bool { 
        storage[left.v] == storage[right.v]  // compilation Error:
        // verypure methods can't reference external data
   }

}

i like this less, it's quote far from swift, as it is now.

i don't know about this one, but i believe that would be similar to the situation with EQ/hash/LessThan. perhaps in addition (if needed) you'd point compiler to particular fields (mark them as such) that establish value identity and compiler does everything else.

lorentey · September 18, 2021, 7:14am

My point with the protocols is that none of them seem special to me; the same action-at-a-distance problem can cause issues with pretty much any of them, including custom ones outside the stdlib. E.g., consider the assumptions LazyFilterSequence makes about the Collection value it holds in its storage variable.

We can't add special built-in compiler support for every single protocol out there.

I don't think this is science fiction -- at least it seems possible for the compiler to decide/verify whether a function is "pure" in this sense by some "simple" static analysis. This definition is also straightforward to extend to mutating methods, which seems like a good thing.

Would a "value type" then be defined as one that is composed of other "value types" and is constrained by the type system to only provide "pure" operations? This would at least gets around having to treat protocols specially.

I expect we'd need @unchecked variants for both "pureness" and "valueness" for use when we know better than the compiler. (So that e.g. it becomes possible to use an external side table to implement localized comparisons and the like. Not supporting such things would make this impractical.)

This could work!

So it feels like we would be able to teach the type system about a useful definition of "value types" in this narrow action-at-a-distance sense.

I suspect the end result would feel a bit too fussy to me, though.

For one thing, we'd need to be explicit about what types we intend to be "value types" -- inference wouldn't be safe because unknown operations may exist through extensions. Every "value type" would need to carry an explicit stamp that proclaims it as such.

But it would also limit the intentional use of non-"value types" in dynamic contexts where we know they behave like one. For example, an NSMutableArray (that conformed to Collection) may not be true "value type" in this sense (or any sense), but that doesn't mean that we must reject every attempt at lazy filtering them (or putting them in a hash table / heap etc), or else. It is very much possible (routine, even) to use them in a way that avoids action-at-a-distance problems, by e.g., "simply" refraining to do any operation that triggers such changes while they are being used in a "valuey" way.

This entire thing is very, very similar to the situation with Sendable, but the stakes here aren't nearly as high -- if we get "value typiness" wrong, the effects are usually far less destructive than, say, a use-after-free issue that arises from a race condition.

The consequences of accidentally mutating a dictionary key feel similar to, say, messing up Array indexing. We may get results that are nonsensical, or sometimes we may get a trap in a line of code that is far away from the root cause of the bug, but it doesn't lead to undefined behavior -- the consequences of the error seem limited and knowable. (Randomized hashing makes the symptoms nondeterministic, but this doesn't fundamentally affect the nature of these bugs, it just complicates their reproduction.)

Then again, Optional is fussy, too -- but I think most people agree that teaching the type system about nullability was a good idea. Is "valuetypiness" more like Optional, or is it more like indexing?

I guess what I'm saying is that I don't know enough to form an opinion.

(What are the chances of this exact train of thought appearing somewhere in the (very fascinating) topic linked above and/or the Sendable discussions? I think I have some reading to do.)

tera · September 18, 2021, 1:06pm

this could work indeed. we are talking about different purity degrees, and they can even work together.

types themselves could be marked (bike-shedding):

@purity(absolute) struct Dictionary { ... }

methods also could be marked with their purity level:

@purity(clean) func foo() { ... }

there would be some rules along the lines that "things of purity of level N can only call functions and use data structures of purity of the same level or cleaner". similar rules about function parameters and return values.

what are these levels:

absolute. no custom code allowed. compiler generated simple things like "EQ", "hash", bitwise "lessThan", and similar. limited set of operations / functionality, no customisation. but still might be enough for 90% things developers need! *or even 100% as they are Turing complete, just restrictive. this is the platonic purity i was talking about above.
clean. custom code is allowed but only that code that uses the value itself (or the parameter, which will be of type of certain inherited purity level). the code can't reference external variables, etc. this will allow things like filter / map.
fair. custom code is allowed and it can use external memory "read-only". note that this is already somewhat not clean, as this external state might change externally amid lazy filtering operation! unless we somehow manage to fuse this with ability to designate external memory "really read-only" - those would not be changeable from safe portions of swift (or at all... e.g. memory from ROM or a read-only mapped memory region)
dirty. custom code is allowed and it can use external memory read-write.

what we have in swift now is "dirty" purity level - it is the easiest and most dangerous. "absolute" can (i trust) be introduced reasonably easy - it is the simplest one to implement after dirty, and it is the most rudimentary / most restrictive. "clean" is harder to implement (i believe this is what pure functional languages are using but could be mistaken). "fair" is the hardest. still doable.

tera · September 18, 2021, 3:51pm

further thoughts on the subject:

eventually we may blend "clean" and "fair" purity grades together: after all a constant 123 allowed at level "clean" is not any more superior than a global constant "let colors = ["red", "green", "blue"]". however for the sake of this discussion i'd like to keep these levels separate, if not for anything else then to lay out stages of future work to be done (first "absolute" stage then "clean" stage then "fair" stage).

can "absolute" and "clean" grades blend together? the short answer is no, that would be unwise. consider hash: in all these three levels (absolute, clean and fair) it won't be physically possible to write code that violates the main hash axiom (if "a" == "b" ==> then a.hash == b.hash), so on the correctness front these three levels are equal. but on levels clean/fair it would be trivial to implement a bad hash function, e.g. always returning "0", which will open the app to hash collision attacks.

likewise it would be trivial to write a bogus "lessThan" (or EQ) function that, say, ignores some of the bits of the operands, or simply always returns false. in this case the levels will be different not just on the performance front, but on the correctness front as well! thus it makes total sense to have "absolute" as a separate "absolutely safe" level where nothing can possibly go wrong (at least with value-typeness of pure value type objects) no matter how hard malicious or unexperienced developer would try!

when it comes to practice absolute level would be required for code certified to work in, say, nuclear reactors, life support, or avionics environments. perhaps banking apps. clean/fair purity grade - anything else. dirty purity grade - e.g. to inter-op with other languages / foundation.

answering your question (statement) above "what makes EQ & hash special". the fact that dictionary is part of the language and to make absolute level more capable... array/dictionary/set are all part of the language. and the established practice is for the last two to be hash based -> hence hash is special. and so is EQ. LessThan would also be special if we had "TreeDictionary" in the language. I'd say - let's also make it part of this special "absolute" purity grade team. going forward we may include other things to this special domain to make absolute purity domain more capable. without eq/hash/lessThan - the absolute domain would be safe, but like asm... totally inconvenient.

we may also need another level, in some ways even more restrictive than absolute!

a very special level, not like the others of this scheme, but more like another dimension or perhaps a separate trait: realtime. it won't be able to call anything that can block or cause memory allocation. on the other hand accessing global memory even read-write is not a problem on this level. realtime level would be required, say, for realtime audio thread code. let's not discuss this level in this thread, just keep it in mind.

dabrahams · September 18, 2021, 7:33pm

FWIW, I agree with the fact ther things you’ve written but inasmuch as the main purpose of the term is to make a distinction from value semantics, i don’t think the ability to do identity comparison is actually crucial.

Nickolas_Pohilets · September 19, 2021, 12:55pm

I think the very notion of value semantics is inseparable from equality. Even “equivalence”, because currently Equatable in Swift is a pure-mans proxy solution for that. Ideally it should be an inherent attribute of all types, including functions and weak references.

This probably belongs to ValueSemantic protocol, but I’m too lazy to go over entire discussion (currently on 20/126). I’ll try to digest it and join the discussion later. Meanwhile let me comment here.

I think I can provide a formal theoretical definition for value semantics:

Given expression E, let {x1, x2, … , xn} be a set of variables touched by expression E of types {T1, T2, …, Tn}. This includes let’s and var’s which are lexically used in E, but also all global variables and constants touched by a transitive closure of functions (including initializers, getters, setters, subscripts and deinitializers) callable from E. Including global variables outside Swift code. For example, behind Date.init() there is some hidden global variable which stores current time. Such variable might even reside in external hardware.

Let’s consider evaluating E twice - first when variables have values {v1, v2, … , vn} it evaluates to ve, and second time when variables have values {u1, u2, … , un} it evaluates to ue.

All types in set of types S have value semantics, if “equivalence” of all pairs vi <-> ui, implies equivalence of ve <-> ue for all E touching only variables of {Ti} ⊆ S.

Type Tk does not have value semantics if there exists expression E, touching variables of types {T1, T2, …, Tn}, where all types {Ti | i != k} have value semantics, vi is equivalent to ui for all i, but ve is not equivalent to ue.

If we have an expression E where types of touched variables include type Tk of unknown semantics, and type Tj known to have reference semantics, we cannot say anything about semantics of Tk based on E.

Based on this definition:

Date has value semantics. If you can somehow control global variable holding current time, Date.init() will give you perfectly reproducible results. Similar for Int.random().
Array has reference semantics, because of Array.capacity. But for the practical purposes, I would mark it with some attribute, so that “remaining Array without capacity property” still has value semantics.
Classes have value or reference semantics, depending on how you define equivalence on them. If you compare them only by address, then classes have reference semantics. If you compare them by address and values of all the stored properties, they become value semantics, because now values of properties become part of the value of the class type and part of the corresponding vi/ui. If you compare them only by values of properties, but ignore the address - you still have reference semantics, because, you can get two different ObjectIdentifiers out of two equivalent values.
If you add extra variable for entire memory (or even entire hardware state), every type has value semantics.
If you don’t add such variable, but have an API which allows to read hardware state based on Int, all types having at least two values (so, except Void and Never) have reference semantics.

Also, I suspect that transitive closure of callable functions is not computable. Even if we disregard control flow because of halting problem and do conservative analysis, still because of closures, protocols and subclassing, such set would include almost every function.

Transitive closure of callable functions is needed only for global variables. So maybe instead it should be an all or nothing approach. We can track if functions is referentially pure or touches global variables. It is possible to make purity part of function type. Then we could limit value semantics analysis only to referentially pure expressions.

dabrahams · September 20, 2021, 1:37am

Getting value semantics wrong can easily lead to a race condition. All you have to do is share mutable state but advertise it as part of a well-defined value.

I still believe that I have already posted a useful formal definition of value semantics that is consistent with how we've used the term casually. In fact @Alvae, @shabalin, @dan-zheng, @saeta and I have written several papers based on this definition. Unfortunately can't share them yet since we're still trying to get them accepted somewhere…

dabrahams · September 20, 2021, 1:53am

One more thing before I go: while I very much support continued thinking about the meaning and consequences of value semantics, I am actually opposed to dissecting the concept of value semantics into multiple grades (especially before you've settled on a simple definition). The power of concepts like this one lies in their ability to simplify reasoning about code. I haven't absorbed what you're trying to accomplish by this classification, but it seems inevitable that thinking about correctness will be complicated by such division.

lorentey · September 20, 2021, 2:11am

Okay. I don't know what exactly "equivalence" means, when it comes to Swift entities. Please bear with me while I try to figure it out below.

My intuition is that two entities of type T held by two variables a and b are "equivalent" if there is no way to define a function f that differentiates between them. This is a very loose definition, as I don't know what "differentiates" means -- I expect it involves equivalence of the function's results, which makes this a circular definition.

Perhaps we can resolve this by axiomatically defining equivalence for a simple type like Bool, and only allowing functions that return this type.

The "equivalence" relation over instances of type Bool is defined the obvious way, that is, by equality of the boolean values they represent.
Two instances of the type T, held in variables a and b, are "equivalent" iff for every possible¹ Swift function* of type f: (T) -> Bool it holds that f(a) == f(b).

Once we learn "equivalence" for more types, I believe we can safely loosen (2) to any function that returns a type with a known "equivalence" relation. (By a tiny lemma.)

(¹: This is probably too strict. For one thing, functions that have undefined behavior should not be considered.)

For example, I expect this definition of "equivalence" considers Int values "equivalent" iff they represent the same integer. This is good, because it matches my intuition about them. (I don't know how I'd go about proving this, though.)

As it happens, Int's Equatable conformance happens to define the an equivalence relation that exactly matches this definition of "equivalence". But this seems like the exception, not the rule.

In actual Swift practice inside and outside the stdlib, Equatable simply defines an arbitrary equivalence relation; that is all it does. (And that's only when we're lucky enough that the conformance doesn't violate requirements -- looking at you, floating point types.) It most definitely isn't being used to define the "Equivalence"; at best, it just defines an equivalence; and usually a pretty lousy one at that.

All we can say for sure is that if a is "equivalent" to b, then it is guaranteed that Equatable.== will return the same value for them, no matter what value we pass in the second operand. But this statement doesn't give us anything; it follows directly from the definition. The implication is strictly one way only, and Equatable.== isn't special at all -- the same holds for literally every other operation.

Clearly, the sign property differentiates between the Double values representing "positive zero" and "negative zero".

let a: Double = 0
let b: Double = -1 / .infinity
print(a == b) // true
print(a.sign, b.sign) // plus, minus

This suggests to me that these values aren't "equivalent", even though they belong in the same equivalence class under Equatable. (Ignoring the reflexivity issue with NaNs, of course.)

Notably, my intuition is that Double is obviously a "value type". The fact that it has a loosey-goosey == definition does not preclude it from being one.

(The same goes for String, despite capacity, utf8 and the multitude of other ways we can differentiate between Equatably equal, but not "equivalent" instances.

To me, as a naive but enthusiastic student of Swift, Array feels like a "value type" as long as its Element is a "value type". capacity feels completely irrelevant to me, and so are things like ObjectIdentifier(array as AnyObject) -- their behavior seems just fine, perfectly consistent with what I imagine a "value type" to be.)

Nickolas_Pohilets:

Given expression E , let {x1, x2, … , xn} be a set of variables touched by expression E of types {T1, T2, …, Tn} . This includes let ’s and var ’s which are lexically used in E , but also all global variables and constants touched by a transitive closure of functions (including initializers, getters, setters, subscripts and deinitializers) callable from E . Including global variables outside Swift code. For example, behind Date.init() there is some hidden global variable which stores current time. Such variable might even reside in external hardware.

Let’s consider evaluating E twice - first when variables have values {v1, v2, … , vn} it evaluates to ve , and second time when variables have values {u1, u2, … , un} it evaluates to ue .

All types in set of types S have value semantics, if “equivalence” of all pairs vi <-> ui , implies equivalence of ve <-> ue for all E touching only variables of {Ti} ⊆ S .

This is a nice formal definition of something. I don't understand what it defines -- I can feel bits of pieces of some shape, but what I'm feeling seems different to my intuitive notion of "value typiness" in Swift. (Which, as I explained in a previous post, largely revolves around the lack of spooky action at a distance -- because that is the one thing I really do need to care about as a Swift programmer when I decide whether a type is appropriate for use as a Set element or a LazyFilterSequence base.)

This definition only specifies the notion of "value type" relative to some set S. What is S? Is "value typiness" a global property of a type, or does it depends on the choice of S? I'm assuming it is meant to be a global property (otherwise the passage about the type Tk does not make sense to me), but I don't think this obviously follows from the definition.

More to the point, if we took the highlighted passage seriously, then literally every type would be a "value type".

Array would be a "value type".
- capacity isn't problematic at all -- its input state includes the value of the _capacityAndFlags field in the array's storage representation, which is a variable of type Int, imported from C.
- If the theory is that capacity proves that Array isn't a "value type" because it allows us to distinguish between two "equivalent" arrays, then that's not true, either. By the definition above, two "equivalent" arrays must have the same capacity.
  
  Array.init involves a storage allocation, so the transitive closure of its input state includes the state of the heap. When run on equivalent input state, Array.init will always give us an array with a predetermined capacity.
- Similarly, the expression ObjectIdentifier(array as AnyObject) consistently returns the same value, as long as we can faithfully reproduce its input state. (Which includes the array's storage address in memory, or (depending on the Element type) the state of the heap.
NSMutableArray would be a "value type". The major difference between it and Array is that NSMutableArray's mutation operations do not implement copy on write; but this detail is irrelevant from the perspective of the formalism. (Mutations operations include the original value as their input.)
SystemRandomNumberGenerator would probably be considered a "value type" too. There is no way to produce two instances of SystemRandomNumberGenerator that aren't equivalent -- it's a unit type, after all. I expect the input state of SRNG.next() includes some aspect of the quantum state of the universe (whatever that means), which makes it rather difficult to replicate.

I don't understand this definition, either, but my vague intuitive concept of what should be a "value type" in Swift feels to be a much better match with this shape!

(What does it mean to "observe" the value of a variable a? What does it mean to "alter" the value of a? I look forward to reading those papers.)

tera · September 20, 2021, 3:29am

there's a catch though... if you start insisting on a single simple definition of value semantics it is unlikely you'll ever end up with several grades of value type purity. and if you start with a line of thought allowing several different grades of purity - you may end with several different definitions for it.. the end result is very different depending upon where you start.

i agree but at the same time you'll potentially throw the baby out with the bath water and end up with a lowest common denominator compromise definition. consider a few "user stories" like this very simple one:

let a = some value type expression 1
let b = some value type expression 2
let eq1 = a == b
let eq2 = a == b
assert(eq1 == eq2) // assert 1
let eq3 = b == a
assert(eq1 == eq3) // assert 2

let hash1 = a.hashValue
let hash2 = a.hashValue
assert(hash1 == hash2) // assert 3

let lt1 = a < b
let lt2 = !(b >= a)
assert(lt1 == lt2) // assert 4

can this assert or not in the "value semantics" definition you are trying to define? can your value semantics definition even start answering simple questions like that? will that definition depend upon developer "playing fair" and always "following the rules" which are "attached" to the definitions or will it work "no matter what"? the above grade system at least gives answers to questions like those (assert1 and assert3 will never trigger in absolute/clean/fair purity grade code but might trigger in dirty grade code, assert2 and assert 4 will never trigger in absolute purity grade code but might trigger in other grades).

Alvae · September 20, 2021, 8:31am

I can perhaps paraphrase @dabrahams's definition of value semantics, repeating some of what we've written in our papers.

One way to understand that definition is to first think about what we can and can't do with types that have (mutable) value semantics. More precisely:

If the type T of a variable x has value semantics, then there is no way to observe its value (i.e., read its contents) with an expression that does not mention x.
If the type T of variable x has value semantics, then its value is independent: its contents cannot change due to an operation on another variable.

Both points uphold local reasoning [1] and suggest that reference semantics must not surface. If the value of x can be observed without mentioning x, then there must be a (possibly immutable) reference on x somewhere. Similarly, if the value of x can change via an operation on another variable, then x must share mutable state.

For a theoretical foundation about the idea of value independence, one can look into external uniqueness [2, 3], which I think formally expresses @tera's clean level.

From points 1 and 2, it naturally follows that:

Concurrent access to the values of distinct variables of a type T with value semantics cannot cause a data race.

For a theoretical justification of that assertion, one can look into concurrent separation logic [4], which states that parallel composition of separate environments is race-free. Intuitively, if x has an independent value (in the sense of points 1 and 2), then one can reason locally about its value, without any concern for concurrent execution.

When you add the first point of @dabrahams's definition about notional values, you can derive other interesting properties.

Composition of value types expresses whole/part relationships.
Equality and hashing are well-defined and synthesizable for every value type.

Note for the purists: formally, hashing requires an ad-hoc definition for at least some predefined types, acting as the base cases of an inductive definition.

The fourth point is particularly important, because it answers recurring questions of "depth" in object-oriented languages (e.g., deep vs shallow copy/equality/immutability).

Notional values are what lets us discard capacity when we're talking about the value of an Array, because the notional value of an array is made of its elements. It follows that equality and hashing do indeed derive from the elements contained in the array, not the current state of its internal storage. In other words, the concept of notional value is what allows us to "ignore some bits" in a type's representation. Further, you can represent the notional value of a value type with all sorts of reference shenanigans, as long as you satisfy points 1 and 2. That bit explains why CoW does not violate value semantics.

Note that synthesizability is not a requirement. You can have a value type that does not want to conform to Equatable for any justified reason in a given codebase. The definition only says that (value) equality is well-defined and derives from the type's notional value.

AFAICT, the other points in @dabrahams's definition relate to the mutable part of a value and are more specific to the way Swift expresses mutation (e.g., via assignment, mutating methods, etc.). Indeed, if we outright ban mutation, all types trivially have value semantics and there isn't any useful distinction between structs and classes (as far as value typedness is concerned) [5]. So, we can add the following point:

We use the term mutable value semantics meaning “value semantics with in-place, part-wise mutation.”

The terms "part-wise" and "in-place" are in opposition to pure functional updates, i.e., the fact that any mutation of a variable x can be substituted by the rebinding of x to brand new value. While that approach is perhaps justifiable as a theoretical model for value semantics, it clashes with performance perdictability.

Yes, I think it does.

As it's been made clear already in this thread, substituting struct for class in a type definition is far from sufficient to satisfy value semantics, without even considering unsafe code.

There are multiple ways to unintendedly create references in Swift (which is a bit unfortunate, if you ask me). In particular, global variables and (by default) closure captures have reference semantics. That makes it difficult to establish static rules that would enforce "value typedness" according to the definition I have presented.

Nonetheless, I think that is possible. We would "only" have to be more careful about the way a value can capture references. Swift already proposes one mechanism in that direction, in the form of the @escaping annotation on function parameters. An extension of that idea has been studied the context of Scala [6].

I think that the idea of extending the capability of @escaping (or rather @nonescaping) relates to @tera's fair level.

Purity levels are not in opposition or even orthogonal to the definition of value semantics above. Nonetheless, I am not yet convinced that these distinctions are as useful as they may seem, at least for the user. IMHO, local reasoning is enough.

I also fear that these levels might fail to accurately characterize all possible implementations of a value type. For example, it looks like implementing CoW would immediately put a type into dirty territory. It could be tempting to add yet another level, but that's a slippery slope.

[1] Peter W. O'Hearn et al.: Local Reasoning about Programs that Alter Data Structures. CSL, 2001
[2] Dave Clarke, Tobias Wrigstad: External Uniqueness Is Unique Enough. ECOOP, 2003
[3] Philipp Haller, Martin Odersky: Capabilities for Uniqueness and Borrowing. ECOOP, 2010
[4] Stephen Brookes, Peter W. O'Hearn: Concurrent separation logic. ACM SIGLOG, 2016
[5] Dimitri Racordon, Didier Buchs: Featherweight Swift: a Core calculus for Swift's type system. SLE, 2020
[6] Leo Osvald et al:. Gentrification gone too far? affordable 2nd-class values for fun and (co-)effect. OOPSLA, 2016