So would that mean Float.nan.isIdentical(Float.nan)canât be implemented? And then the same would be true for any struct containing a Float as a field if that field is part of ==?
I personally donât understand why it has to be tied to equality. To me the question isIdentical should answer is âis that value interchangeable with this other value so that there is no way to observe a difference?â. And if this is the question it should provide an answer for, itâs pretty much answered by looking at the bit pattern (where in some cases some unused bits can be ignored).
I don't see why Float would implement isIdentical, since it's not a CoW type or something that needs a "fast path" for sameness testing. This isn't a completely general operation that's being proposed; it only makes sense for types with specific representations.
Even ignoring that, technically by the definition of Equatable, Float shouldn't be able to implement == either because Float.nan doesn't satisfy the reflexivity requirement, but that special case is a concession that's been made to make floating point types still support common useful operations. So I don't think it's useful to focus on the potential Float case, because as mentioned above, it's already awkward.
This definition is awfully close to the definition of substitutability with respect to Equatable, which deems that "any two instances that compare equally can be used interchangeably in any code that depends on their values". Drawing a distinction between that and "no way to observe a difference", where "observing a difference" would mean "knowing its memory address", feels far too subtle.
So this sort of makes me feel that isIdentical is the wrong name for this because it encourages people to think of it as a formal relation, when IMO it's really just answering the question "do these two values have the same internal representation".
Float already violates the semantic requirements of Equatable anyway. The problem is with Float. Thereâs no reason to hamstring isIdentical over this. If we think this method is worth adding to the standard library, then it really should follow the relations @dnadoba listed.
All I mean is thereâs no reason to link isIdentical meaning to Equatable. Thereâs no useful reason a type using Float as part of its equality testing shouldnât be able to have isIdentical. Requiring this would limit what you can use isIdentical for while providing no benefit.
All isIdentical is telling you is whether the values are known to be interchangeable with no observable difference. âObservable differencesâ should cover observable differences in equality. But .nan not being equal to itself (even if in violation of Equatable's principles) does not create an observable difference if you interchange the values, so itâs not really relevant for the purpose of isIdentical.
As for Float not being worth optimizing with isIdentical⊠hereâs an example of a struct with a Float inside carrying an additional payload that isnât part of its definition of equality:
struct FloatWithHistory: Equatable {
let value: Float
let history: String
// Operations on this type keep track of how the value was computed in `history`
static func +(a: Self, b: Self) -> Self {
Self(
value: a.value + b.value,
history: "\(a.history) + \(b.history)"
}
// Equality is based solely on the value's equality
static func ==(a: Self, b: Self) -> Bool {
a.value = b.value
}
// But FloatWithHistory is identical to another only if it has
// identical history (histories between two equal values aren't
// interchangeable)
func isIdentical(a: Self, b: Self) -> Bool {
a.bitPattern == b.bitPattern && a.history.isIdentical(b.history)
}
}
This is a type that violates the equality principle since equality is based on Float.==. But why should it not be allowed to participate in the isIdentical fast path because of this? Would it really case a problem?
I'm in favour of defining === and !== for this purpose, as they effectively mean "is identical" for object references today, but I can appreciate that not everyone wants more operators in the language.
We have a carve-out in the semantics of Equatable for âexceptionalâ values to accommodate floating-point NaNâthis is well trodden ground in that we have decided that (other than acknowledging the value exists) we will not contort or invent new APIs to regularize it.
For the same reason, we should not consider NaN behavior further than that for these purposes: for each proposed semantic guarantee, read in a caveat âor else the value is exceptional.â For example:
a.isIdentical(b) implies a == b (or else a and b are exceptional values)
In that light both Johnâs proposed semantics and Davidâs seem perfectly reasonable.
I might be missing something since that's essentially what I had said in the quoted text ("concession" vs. "carve-out", "special case" vs. "exceptional values"), but I believe we're in agreement?
They are, but I worry that they either imply too much instead of stating it outright, or they don't guarantee enough. As long as we're clear about the purpose of this API (exposing a way to quickly detect known-sameness of potentially large values via their internal memory representation), then having those formalized semantics certainly doesn't hurt.
So I want to make sure we're not stopping there, because there are a lot of types that could implement the operation satisfying those semantics that probably shouldn't. Would we accept a proposal that adds Int.isIdentical(to:), whose only sensible implementation is the same as Int.==? It would satisfy those semantics. If so, I think we risk muddying API surfaces because the operation becomes very close to but subtly different from ==. If not, then we need to make sure that we also capture the reasons why not in the proposed semantics for this new operation.
And not to open up a can of worms, but the Swift docs on Equatable makes a big statement that is not really true:
Equality implies substitutabilityâany two instances that compare equally can be used interchangeably in any code that depends on their values.
An easy counterexample to this is the following:
func f(_ string: String) -> Int {
string.utf8.count
}
With this function, f("caf\u{e9}") != f("cafe\u{301}") yet "caf\u{e9}" == "cafe\u{301}", contradicting "substitutability" defined above. What the docs calls "substitutability" is also known as "well-defined" functions, and while related to equivalence relations, it certainly isn't implied by having one.
However, isIdenticalshould satisfy this substitutability property, so it is another property that could be added to the docs to describe its semantics.
The way we square this observation is to say that this isn't code that "depends on [the string's] value." That is, since the equivalence relation given by String.== considers Unicode canonicalization, the UTF-8 view of a string is a "non-value aspect," to borrow terminology from the same documentation you're quoting. (Indeed, offering different views of the same underlying storage is the way in which the stdlib provides access to different equivalence relations.)
It would be rather pedantic to insist that spelling such views as a computed property like string.utf8 is not allowed because it'd violate a semantic guaranteeâin favor of, say, a more verbose and less fluent converting initializer String.UTF8View(string). Instead, the pragmatic approach is simply to regard utf8 as a "non-value aspect."
If I understand correctly that you are referring to substitutability with respect to all publicly exposed APIs whatsoever, rather than a flexible approach where some subset of the APIs can be deemed to be a "non-identity aspect," I think we actually don't want to commit to this, and particularly for String.
For example, consider the fair number of flags for "known ASCII," "known UTF-8," etc. It would be needlessly rigid to early-exit a fast-path comparison simply because one of these flags was pre-computed in one string but not another while they share the same underlying storage buffer. On the other hand, it would also be needlessly constricting to prohibit exposing APIs to access memoized properties simply because a type has implemented a useful isidentical method.
All of that seems a bit hand-wavy to me (why would a collection of UTF8 bytes underlying a string be a "non-value aspect" to the string?), and so is why it surprised me for the docs to mention "substitutability" at all. It starts with a very concise definition of "substitutability" (a.k.a. well-definedness more broadly), and then muddies with terms it doesn't really define.
Sorry, but I am actually arguing the opposite. Functions that are not well-defined can be useful, given that they are documented as such. I am only arguing that "substitutability" is actually not apart of Equatable's semantics at all and probably doesn't even belong in the docs, at least not as it is stated now.
I don't think I'm arguing that either. I only wanted to point out that this substitutability concept mentioned in the docs seems more relevant to isIdentical than to == and so perhaps this would be the best place to mention it.
Why "but"? === already exists, so if it is reused of the feature being discussed we will not have "another" operator.
The difference in Nan's could be observed:
let x = Double(bitPattern: 0b0_11111111111_1000000000000000000000000000000000000000000000000000)
let y = Double(bitPattern: 0b0_11111111111_1100000000000000000000000000000000000000000000000000)
print(x, x.bitPattern)
print(y, y.bitPattern) // different bit patterns
However I'd agree with you that identical should not be necessarily coupled with equivalence.
I don't understand how this feature will work for structs. E.g.:
struct S {
let items: [Double]
let value: String
}
the feature (whatever the syntax is chosen) works out of the box
I'd need to conform S to some protocol, then it works regardless of the fields
I'd need to conform S to some protocol, but it will compile and work only if the fields conform to the protocol
I'd need to provide my own implementation of this feature.
it's not supported.
Which of these do we want?
Say we decided that (4) is inconvenient and (3) is the way to go. Then I'd prefer a situation that a single inline array field of the struct doesn't break the ability of the struct values to be comparable for "identicalness". On that ground it would be beneficial to have everything included, even if the identicalness operation is not meaningfully faster than comparison.
(2) also makes sense (if an array or a dictionary of two fields could be compared for identicalness regardless of the fields why can't a struct of two fields).
I guess I spared the discussion from talking about nan payload bits. That said, itâs similar to my example FloatWithHistory: a type having an additional field that does not participate in equality but should be taken into account for isIdentical because it is carrying information. Itâs also not for nothing that in that example I was comparing the floatâs bitPattern for identity comparison (something that is easy to miss in a custom implementation).
Ideally, I would say (1): make it always valid using a bitwise comparison, possibly capped to a certain length chosen by the caller (to provide an upper limit in time spent for the comparaison relative to the callerâs lost opportunity cost). This can result in false negatives (due to padding/unused bits), but itâs still correct for the purpose of opportunistically detecting identical values.
(Maybe we could even limit false positives in the bitwise comparison with a special memcmp-equivalent intrinsic that would ignore (when possible) unused bits and padding when the compiler is aware the typeâs layout. Itâs not exactly needed since itâs always valid to return false, but itâd reduce false negatives when the type layout is statically known.)
If we go the protocol route, itâll be either manual implementations everywhere, or a compiler-generated implementation like Equatable that would call isIdentical on all the fields. Calling isIdentical for all the fields could transform the check into a more expensive operation than what might be appropriate; there could be a hundred fields! Should it limit itself to the first 5 fields?
Moreover, going the protocol route means a lot less types will be able to be checked for isIdentical. All the structs trivially wrapping a single String or Array, all the enums with cases wrapping a String or an Array: they arenât exactly a rarity and they deserve this opportunistic check as much as raw String and Array do. Thatâs a lot of conformances people would have to add manually everywhere just to make this opportunistic check work.
Because of this, I think a simple bitwise comparison is the way to go. It works with every type and can easily be capped to a chosen size decided by the caller based on the opportunity cost of the operation that can be skipped when the two values are known identical.
This proposal specifically isn't asking for a generalized equivalence relationship; it's defining isIdentical(to:) as an operation that quickly peers at the underlying memory location of some property of the value to determine whether more computation needs to be done. So if you have two structs at different memory addresses, then their InlineArray fields would be tautologically not identical by the proposed definition.
The reason it doesn't make sense to include InlineArray is because, unlike CoW types where "copies" might share underlying storage, an InlineArray would only ever be "identical" to itself according to the proposed definition.
It sounds like you're proposing a different operation, either involving protocols (which is not being proposed here, there was a separate pitch for that) or which doesn't satisfy the desired requirements of being fast (i.e., comparing underlying memory addresses instead of contents).
I think all types â big or small could benefit from this "is identical" check. For some types, like Array or String, this operation could be meaningfully faster than a standard comparison, while for others it may not offer a significant speedup. In all cases, though, the check would take time proportional to the number of bytes in the value:
Array: 8 bytes
String: 16 bytes
Int and Double: 8 bytes
Arbitrary struct: N bytes
To put it another way:
Suppose I have a dictionary with 20 fields. If I can call "is identical" on two instances without needing any custom code, I'd expect that behaviour to remain consistent even if I refactor the dictionary into a struct. In other words, I should still be able to perform "is identical" checks on two struct instances without writing additional code.
The semantics are clear as long as theyâre confined to String, Array, and cie: just compare the âgutsâ. Donât follow the pointer into the storage.
Personally, I also think the semantics of InlineArray for isIdentical are very clear: the content of the array is inline so what you then compare is the content. For instance: InlineArray<1, String> has exactly the same memory layout as String, so thereâs no reason it should behave differently when it comes to isIdentical.
If the inline array is âtoo bigâ for comparing in a âreasonable timeâ, pick a threshold size (number of elements, number of bytes, whatever) and once you pass that threshold return false instead of actually comparing. This threshold is a deference to practicality related to how often you can expect identical values and how much work you can skip in cases things are identical, so itâs sort of caller-dependent.
We would actively not want this for fast path comparison. This is because it would cause double the work if isIdentical does the same thing as == and then returns false, unless the compiler can optimize it out. That defeats the purpose of an API that's supposed to help users make things go faster as a manual optimization.
Again, we are deliberately not considering a generalized API or a protocol here. We want to find the right semantics for Array and other concrete stdlib types.
I think this gives us a good framework to get started. Thanks!
My basic strategy here is to "work backwards". I'm going to start by attempting to define these semantics on the concrete types in this proposal. I will then attempt to "pull up" these semantics to a "meta contract" that might apply across all concrete types that adopt isIdentical.
1.4.a and 1.4.b are just derived implications of 1.4. It might not necessary to spell this out in the proposal⊠but we can just stub them out here for now.
These all look reasonable and I don't see much room for controversy here. And this looks like it would also apply to Substring.
Extra Credit Assignment
A potential sidequest here might be to determine to what extent there are more derived semantic axioms here. For example:
If we start with the assumption that String adopts the semantic contract of Equatableand we add the semantic contract of Reflexivity to isIdentical⊠could we then somehow derive the semantic contracts of Symmetry and Transitivity on isIdentical? Because we lack the ability to prove that a == b implies a.isIdentical(b) I'm sort of thinking this proof is not possible.
If we can somehow prove that Symmetry and Transitivity on isIdentical are derived properties that could clean some things up for us later on down the road.
As we demonstrated before any of these "container" types can lead to problems because of Float and Double. However:
Which then gives us:
`Set`:
4.1 `a.isIdentical(a)` is always `true` (Reflexivity)
4.2 `a.isIdentical(b)` implies `b.isIdentical(a)` (Symmetry)
4.3 `a.isIdentical(b)` and `b.isIdentical(c)` implies `a.isIdentical(c)` (Transitivity)
4.4 `a.isIdentical(b)` implies `a == b` (*or else `a` and `b` are exceptional values*)
4.4.a (derived) `a != b` implies `a.isIdentical(b) != true` (*or else `a` and `b` are exceptional values*)
4.4.b (derived) `a == b` does *not* imply `a.isIdentical(b)`
This is legit and LSG understands why this is consistent with the long standing precedent shipping in standard library.
Things get a little more complex when we start to consider Dictionary:
`Dictionary`:
5.1 `a.isIdentical(a)` is always `true` (Reflexivity)
5.2 `a.isIdentical(b)` implies `b.isIdentical(a)` (Symmetry)
5.3 `a.isIdentical(b)` and `b.isIdentical(c)` implies `a.isIdentical(c)` (Transitivity)
5.4 `a.isIdentical(b)` implies `a == b` (*or else `a` and `b` are exceptional values*)
5.4.a (derived) `a != b` implies `a.isIdentical(b) != true` (*or else `a` and `b` are exceptional values*)
5.4.b (derived) `a == b` does *not* imply `a.isIdentical(b)`
This works for us when Dictionary is Equatable⊠but Dictionary is only conditionallyEquatable. We do not currently intend to restrict isIdentical along the Value element being Equatable. We can try and generalize this to handle both cases:
`Dictionary`:
6.1 `a.isIdentical(a)` is always `true` (Reflexivity)
6.2 `a.isIdentical(b)` implies `b.isIdentical(a)` (Symmetry)
6.3 `a.isIdentical(b)` and `b.isIdentical(c)` implies `a.isIdentical(c)` (Transitivity)
`Dictionary` is `Equatable`:
6.4 `a.isIdentical(b)` implies `a == b` (*or else `a` and `b` are exceptional values*)
6.4.a (derived) `a != b` implies `a.isIdentical(b) != true` (*or else `a` and `b` are exceptional values*)
6.4.b (derived) `a == b` does *not* imply `a.isIdentical(b)`
A potential "missing piece" here is that nothing about 6.1 through 6.3 tells us under what conditions two Dictionary values that do not adopt Equatable are ever not identical to each other. I don't see this as a big blocker⊠we can file a mental TODO and the header doc comments for Dictionary can offer more guidance on that.
We can carry the Dictionary semantics over to Array, ArraySlice, and ContiguousArray and those also work.
Now that we have some basic semantics defined for the concrete types included in this proposal⊠how easy is it for us to then generalize these semantics as a "meta contract" that could potentially apply to any concrete type?
Let's begin with the two basic requirements from the earlier pitch:
Here is an attempt to begin to formalize that across some type T that is Equatable:
`T` is `Equatable`:
7.1 `isIdentical(a)` it is able to return a correct answer
7.2 `isIdentical(a)` is *meaningfully* faster than `==`
What exactly did we mean by "correct" answer?
`T` is `Equatable`:
8.1 `a.isIdentical(b)` implies `a == b` (*or else `a` and `b` are exceptional values*)
8.1.a (derived) `a != b` implies `a.isIdentical(b) != true` (*or else `a` and `b` are exceptional values*)
8.1.b (derived) `a == b` does *not* imply `a.isIdentical(b)`
8.2 `isIdentical(a)` is *meaningfully* faster than `==`
An isIdentical method that always returns false is "legal" but effectively worthless: let's think about how to enforce that isIdenticalmust return truesometime:
`T` is `Equatable`:
9.1 `a.isIdentical(a)` is always `true` (Reflexivity)
9.2 `a.isIdentical(b)` implies `a == b` (*or else `a` and `b` are exceptional values*)
9.2.a (derived) `a != b` implies `a.isIdentical(b) != true` (*or else `a` and `b` are exceptional values*)
9.2.b (derived) `a == b` does *not* imply `a.isIdentical(b)`
9.3 `isIdentical(a)` is *meaningfully* faster than `==`
Because T might not always be Equatable:
`T`:
10.1 `a.isIdentical(a)` is always `true` (Reflexivity)
`T` is `Equatable`:
10.2 `a.isIdentical(b)` implies `a == b` (*or else `a` and `b` are exceptional values*)
10.2.a (derived) `a != b` implies `a.isIdentical(b) != true` (*or else `a` and `b` are exceptional values*)
10.2.b (derived) `a == b` does *not* imply `a.isIdentical(b)`
10.3 `isIdentical(a)` is *meaningfully* faster than `==`
Which all the concrete types we investigated so far would conform to. And AFAIK this would also "backdeploy" to the isIdentical method shipping on Span (which is not Equatable):
Investigating 10.1 a little closer we see this addresses some earlier feedback:
Because we require T to return truesometime we effectively have blocked the library maintainer from always returning false and technically satisfying our semantic contract.
Investigating 10.3 a little closer we see this addresses some earlier feedback:
We don't really have a legit way to enforce runtime performance through our compiler⊠but if we clearly communicate that isIdenticalmust return meaningfully faster than the == operator we are communicating that this should not be applied to types such as Int or InlineArray.
One missing piece we saw previously is that when T is not Equatable we do not formally define under what conditions isIdentical ever returns false. Again I do not see this as a big blocker. My thinking is we can just make it the choice of the library maintainer to communicate that in more detail on their concrete type.
At the risk of sounding like a broken record, we ought to have good, real-world motivating examples for the use of this API to determine our desired semantics. For instance, for values of some type T that are neverEquatable, what can one usefully do that isIdentical would make faster or more expressive?
For that matter, what concrete types in the stdlib that are never Equatable are we thinking about here?