No, I don't think you are reading that right. I understood @xwu's point to be that Hasher
itself 'haphazardly' changes its hashing behavior such that we could quite easily adopt any change in hashing behavior we want without worry—in some sense it is 'maximally' unreliable and free of Hyrum's law.
OK, thanks, Frederick. I'm really tired, I must not have parsed this right. I apologize; I removed that outburst.
Hasher
gets closer to the ideal, but it is not free of Hyrum's Law, either. Changing the hash encoding of a type is technically still an observable change, and this does regularly lead to issues. This is particularly the case if hash(into:)
ever gets marked @inlinable
-- which sadly it often is. (Even I found myself being forced to mark it inlinable in the stdlib (within generic types in particular), because the alternative was unacceptably slow.)
Hasher
also prominently (and very explicitly) does not document the specific hash function it implements (I suppose this is what you meant by referring to "undocumented" details). This (and random seeding) was indeed designed to allow us to change the hash function at will -- however, this still does not prevent Hyrum's Law from raising its ugly head: if anyone ever breaks Hasher's type barrier to extract its random seed, or to replace it with their own choice, then all bets are off again. I cannot really prevent that; all I can do is to hope it won't prevent us from shipping a replacement, if it ever becomes necessary. A theoretical problem is not an actual problem, and it should not prevent us from landing routine changes.
Trying to over-specify an entry point is counter-productive -- I think our job is to establish useful abstractions that free clients from having to think about every single tiny detail. Nobody but String
's maintainers should care whether some hidden bit in its representation is compared by isIdentical(to:)
-- in practice, practically no-one would need know about it, and the few people who do can trivially look it up in the implementation.
(Over-explaining things is also dangerous; developers do get scared when faced with complex documentation, and in my experience they often end up trying to reimplement the same thing on their own, often ending up breaking encapsulation and/or implementing a far less efficient home-brew alternative.)
Sure, it is an important question. However, whether or not an API ships inlinable is not usually fodder for Swift Evolution discussions -- it is an implementation detail that tends to be refined on our way to shipping the thing. @inlinable
may get added or removed (or, say, replaced with @_alwaysEmitIntoClient) due to regular, routine engineering work; it would be quite messy to have to formally update the associated proposal whenever we do that.
Quite right—and I agree these decisions in the scenarios you outline are an implementation detail and would like them to remain as such.
My point being—sorry if I'm being ineffective at articulating it—that by giving a definition of identicality in the vein that you have for this proposed API, that "there must be no way to distinguish between one instance and a copy of the other through their public API surface," I fear you've incorporated all observable behavior into the semantics of the operation, and thereby elevated all of these concerns into API-level concerns.
I think what you're saying is that it'd be a can of worms, which is what I'm saying too. My view is that by (perhaps in some cases unnecessarily strictly) adhering to the view that the operation compares what you've termed representational equality rather than trying to define the semantics based on distinctions via the behavior of any part of the public API, we avoid all of the above worms.
The alternative I find very intriguing which you've laid out is to make the implementation opaque—which, quite rightly, is not something we normally have to discuss but in this case would be essential to guaranteeing the semantics you've articulated without restricting future evolution of the type. To my mind, documenting opaqueness here is in a similar vein to how we document per-execution Hasher
behavior—also not something we typically do but important for understanding that functionality.
A potentially "weaker" semantic guarantee is to worry only about "value" state:
There are two main semantic implications of our proposed operation on String
: a true
value implies that other
is identical-indistinguishable to self
. An important implication of being identical-indistinguishable is also being interchangeable-substitutable. And we already know that substitutable implies equal.
But interchangeable-substitutable is a weaker semantic guarantee than identical-indistinguishable. Interchangeable-substitutable is a guarantee on the public "value" state. Identical-indistinguishable is a guarantee on the public value and non-value state.
If LSG insisted that this operation must ship as isKnownSubstitutable(for:)
we would no longer be making any guarantees about the subset of public state that is non-value. We would only be making a semantic guarantee about the public value state.
This of course drops potentially useful and impactful information on the floor… if the engineer using our function wants to know if these two representations are true "copies" identical-indistinguishable then we no longer communicate that.
If LSG wanted to get real fancy we could ship isIdentical(to:)
and isKnownSubstitutable(for:)
. But I don't really see any need to overthink this and have to then produce documentation explaining the difference between isIdentical
and isKnownSubstitutable
and ==
. Let's just go ahead pick one and ship that.
If LSG wants to ship isIdentical(to:)
then we should ship isIdentical(to:)
and stay consistent with the prior art from Span
. If LSG wants to ship isKnownSubstitutable(for:)
then we should ship that and add it to Span
. If LSG wants to ship both for one strong guarantee and one weak guarantee then we should ship that.
But after 100 comments I'm ready to just hand it over to LSG and ask them to pick one. I agree that isIdentical(to:)
is the right choice and the right semantic guarantee for now. LSG is welcome to take that advice and recommendation and make their decision using all the information available here to them. But I'm sort of beginning to run out of ideas how else to express my opinions without just repeating something that was already covered here.

My point being—sorry if I'm being ineffective at articulating it—that by giving a definition of identicality in the vein that you have for this proposed API, that "there must be no way to distinguish between one instance and a copy of the other through their public API surface," I fear you've incorporated all observable behavior into the semantics of the operation, and thereby elevated all of these concerns into API-level concerns.
I think what you're saying is that it'd be a can of worms, which is what I'm saying too. My view is that by (perhaps in some cases unnecessarily strictly) adhering to the view that the operation compares what you've termed representational equality rather than trying to define the semantics based on distinctions via the behavior of any part of the public API, we avoid all of the above worms.
OK; I think I get your point!
What bothers me about the representational definition is that it necessarily breaks through the veil of type abstraction, opening the magic box -- it inherently has to talk about the bits and blobs that are actually stored in memory, and it has to somehow explain how to sort them into two buckets based on whether they contribute to an instance's identity or not. Even assuming we wish to avoid giving type authors any freedom to reserve certain bits for special effects, any potential layout-derived padding bytes still need to be addressed.
As a type author, I'd find it deeply uncomfortable to have to spell out whether my abstract type's layout happens to include any such unspecified/padding bytes, much less to try to figure out where exactly they are in the representation. As a low-level stdlib engineer working on @frozen
types, I would most likely be able to reason about these things, but Swift types generally do not have stable layout -- so it isn't very practical at all to try reasoning about their underlying bytes. (I believe the layout algorithm isn't documented, and it may change between compiler releases.)
For this reason, I think I would prefer to avoid recommending developers to implement their own isIdentical(to:)
operations by comparing the bytes exposed by withUnsafeBytes(of:)
. I think that would have the potential of making quite a mess, even if everyone would limit this activity to types that they own/maintain (which I do not expect they would).
In contrast, the functional/operational specification approach avoids punching through abstraction boundaries. It's pretty easy to figure out if a field affects the result of any public API -- and finding that out immediately settles the matter, as we've seen just now when you pulled up UTF8Span
's fresh new properties.
It also lends itself to a readable, non-magical (and safe) implementation: at first approximation, we simply have to compare all direct stored properties through isIdentical(to:)
, ===
, bitPattern
comparisons, or regular ==
invocations, as needed.

For this reason, I think I would prefer to avoid recommending developers to implement their own
isIdentical(to:)
operations by comparing the bytes exposed bywithUnsafeBytes(of:)
. I think that would have the potential of making quite a mess, even if everyone would limit this activity to types that they own/maintain (which I do not expect they would).
Ah, I see your point. I agree it would be unwise to recommend straight-through bitwise comparison of contiguous memory as a general strategy for implementing this facility, for the reasons that you state. Whether it’s the best way of squeezing the highest performance out of standard library types, that’s an implementation detail that I’d happily leave to standard librarians.
To go back to Hashable
, as I think it’s illustrative, you’ve been very clear that a proper implementation should feed every member to the hasher that contributes to value equality. For this API, I agree that a parallel exhortation that you have to compare every member that contributes to identicality would suffice. But it still leaves the question of how to know which members that would include or exclude.
When talking about representation, I don’t necessarily think it always has to drop down to the level of discussing which bit is stored exactly where. (Perhaps in this I’m overly influenced by IEEE floating point, where there’s the notion of a value’s representation as sign, exponent, and significand distinct both from the bit pattern encoding and from the value itself.)
I wonder if we could converge on a (handwavy) notion that values of a type are represented in memory by various fields, perhaps based on what Swift must copy in order to correctly make an independent copy of the value, and that these fields should be compared. The exact memory locations are not necessarily needed to do this (unless desired as a handrolled optimization for advanced use) but it would be a notion that is independent of what the rest of the API surface area of the type happens to be. Indeed, we end up at the same point where, to a first approximation, all directly stored properties need to be involved in the comparison.
If we ever introduce the "opposite" operation (that would give a definite "not same" quickly or "don't know"), what would we name it?
Consider the following table that includes a row for that distant future hypothetical operation:
Name | Synonym for | Complexity | Examples |
---|---|---|---|
== |
"is equal" | O(n) | Current Equatable |
isIdentical |
"is identical" | O(1) | References or bits being equal |
TBD | "definitively not equal" | O(1) | Data counts or hashes being different |
Note that "not identical" would not be the right name for that operation as we could quickly tell different values apart even when they are not identical, the examples would be differers data counts or different hashes. "notEqual" - again, not quite right, as the false
could then be misinterpreted as "equal" instead of "don't know". "knownToBeNotEqual" or, perhaps, removing the double negative - "knownDifferent" sounds right.
Then it word be tempting to rename the second row of this table as well to get to a consistent table:
Name | Synonym for | Complexity | Examples |
---|---|---|---|
== | "is equal" | O(n) | Current Equatable |
isKnownEqual |
"known to be equal" | O(1) | References or bits being equal |
isKnownDifferent |
"known to be different" | O(1) | Data counts or hashes being different |
OTOH, “isEqual”, “isKnownIdentical” and “isKnownDifferent” trichotomy is very precise in meaning and could also work.
On something else that was brought up above: performance flags. I admit I don't have the full picture here, but it does feel to me that:
- they should NOT be masked out / ignored during discussed identity check. Even if they are not exposed via public API they could be observable (e.g. via they performance implications).
- for "long" strings they should probably be in the underlying string object to begin with, not in the outer 16-byte String value. If done so they would be totally irrelevant to this topic.

Name Synonym for Complexity Examples == "is equal" O(n) Current Equatable isKnownEqual
"known to be equal" O(1) References or bits being equal isKnownDifferent
"known to be different" O(1) Data counts or hashes being different
As of right now I'm still opposed to putting "equal" in the name itself. We propose a new method on Array
without constraining that availability based on whether or not this Array
is Equatable
. Given all the extended and spirited discussions over 100 comments about choosing a name that communicates correctly… it seems to me like we are not improving correctness by returning true
for isKnownEqual
if that Array
itself has no ability to return true
from ==
because it has no ==
operator to begin with.

OTOH, “isEqual”, “isKnownIdentical” and “isKnownDifferent” trichotomy is very precise in meaning and could also work.
I would also argue that isKnownIdentical
is actually not very precise in meaning. A return value of true
seems to imply "we know" that other
is identical to self
. A return value of false
seems to imply we don't-slash-can't know that other
is identical to self
. The actual concrete types we have proposed for this operation do know: it's a binary decision. It either is or it isn't and there really is not any sort of legit "undetermined state" here where we would be unable to return a meaningful result.
Seungsahn Haengwon

If we ever introduce the "opposite" operation (that would give a definite "not same" quickly or "don't know"), what would we name it?
The Korean Zen Master Seungsahn Haengwon practiced "Don't Know" Buddhism. Students would ask "What is the nature of the Buddha" and the answer would be "Don't know".
The "alternatives considered" are just that: alternatives that are interesting enough to briefly discuss in our proposal document… but we see no compelling reason to include them in our detailed design at this moment.
As an abstract thought exercise is it impactful to then not only debate the name of the current operation being proposed on its own but to then begin a secondary debate for the name of a hypothetical second operation we explicitly decided not to include in this proposal? And then begin a tertiary debate back to the current operation being proposed leveraging the hypothetical second operation?
Don't know. There's my answer. Don't know. What is the right name for "must not be equal"? Don't know. How would the name we choose for "must not be equal" live alongside "is identical"? Don't know.

I would also argue that
isKnownIdentical
is actually not very precise in meaning. A return value oftrue
seems to imply "we know" thatother
is identical toself
. A return value offalse
seems to imply we don't-slash-can't know thatother
is identical toself
. The actual concrete types we have proposed for this operation do know: it's a binary decision. It either is or it isn't and there really is not any sort of legit "undetermined state" here where we would be unable to return a meaningful result.
Know what, I have to agree, it's logical. Between isIdentical
isKnownEqual
and isKnownIdentical
only the first two make sense, the last one doesn't as you rightfully pointed out.
Seungsahn Haengwon
I mean at some point in future we may find ourselves in a "if we had thought about that upfront we'd have chosen another name but we didn't so now we have to live with that decision" situation and my thought exercise above was an attempt to prevent that.

A return value of
false
seems to imply we don't-slash-can't know thatother
is identical toself
. The actual concrete types we have proposed for this operation do know: it's a binary decision. It either is or it isn't and there really is not any sort of legit "undetermined state" here where we would be unable to return a meaningful result.
I’m struggling to understand what you mean here. It must depend on a particular interpretation of “identical.”
This proposal will establish a naming pattern (a term of art) for future APIs, and it makes sense to choose a name that preserves the flexibility to return false negatives.

I’m struggling to understand what you mean here. It must depend on a particular interpretation of “identical.”
A representation that is "identical-indistinguishable" is a representation that is "interchangeable-substitutable". And a representation that is "interchangeable-substitutable" is equal by the existing definition from Equatable
.
But not all representations that are interchangeable-substitutable are also identical-indistinguishable.

This proposal will establish a naming pattern (a term of art) for future APIs, and it makes sense to choose a name that preserves the flexibility to return false negatives.
TBF… I've been a little sloppy here before in the pitch threads about using "false negative" for a situation where other
is not identical to self
but other
is equal to self
. But this is probably not a completely correct "false negative" because it's a false negative looking at the implications of what a return value of true
would give us: identical-indistinguishable implies interchangeable-substitutable implies equal.
I fail to see any compelling reason why a concrete type would need an API to return false
indicating "we can't" determine if other
is an identical representation. The concrete types in this proposal can determine if other
is an identical representation and I don't know if there is any reason a library maintainer would ever not have the ability to return a meaningful answer for that operation.
For the sake of simplicity let's suppose is(Known)Identical
is implemented in terms of
memcmp(...) == 0
Then for any two values we know the result, it is either false
or true
.
And false
means:
we know it is not identical
and not that:
we don't know if it's identical or not
Now, false could also mean or be interpreted as:
we don't know it is equal or not
in those regards isKnownEqual
could be a reasonable choice. But definitely not isKnownIdentical
.

in those regards
isKnownEqual
could be a reasonable choice.
As I said before: if an engineer here believed that isKnownEqual
was a "good" choice I would strongly argue that isKnownSubstitutable
is the better choice. When the concrete type adopts Equatable
this gives us the same information and semantic guarantees. When the concrete type does not adopt Equatable
this no longer implies something that can't be possible because we don't have an ==
operator.
But either choice is still a weaker semantic guarantee than isIdentical
. Returning true
from isIdentical
tells us that other
is identical-indistinguishable and therefore interchangeable-substitutable. Returning true from isKnownEqual
or isKnownSubstitutable
tells us that other
is interchangeable-substitutable without telling us other
is also identical-indistinguishable. It's potentially useful and impactful information we are dropping on the floor and I see no compelling we want or need to ship a weaker guarantee for engineers.
You convinced me on that one as well... After all the two arrays might be identical but not necessarily equal or even Equatable... Or the infamous Nan
peculiarity. I take my vote back from both isKnownIdentical
and isKnownEqual
. On isKnownSubstitutable
... I don't quite like the sound of it, but on the other hand we want something deliberately scary, so it might work.

On
isKnownSubstitutable
... I don't quite like the sound of it, but on the other hand we want something deliberately scary, so it might work.
Again… please keep in mind that choosing isKnownSubstitutable
is not just a "more scary" name for the same semantic guarantees… it's a more scary name for weaker semantic guarantees than what is currently being proposed. I do not believe this is the correct tradeoff for us… but the community can continue to share their opinions and LSG can use their best judgement next week.
I searched above and found yours:
One drawback here is we are now making a weaker guarantee. Returning
true
to indicateother
is identical toself
is a stronger statement that returningtrue
to indicate we knowother
is substitutable forself
. We're dropping potentially useful and impactful information on the floor with that. This is also inconsistent with all our prior art that indicatesisIdentical(to:)
should be the correct name for this operation.
Could you expand on that? I don't quite understand what you are saying here. Maybe an example will help.

One drawback here is we are now making a weaker guarantee. Returning
true
to indicateother
is identical toself
is a stronger statement that returningtrue
to indicate we knowother
is substitutable forself
. We're dropping potentially useful and impactful information on the floor with that. This is also inconsistent with all our prior art that indicatesisIdentical(to:)
should be the correct name for this operation.Could you expand on that? I don't quite understand what you are saying here.
Expand on what exactly? I see a statement about identical implying substitutable and a different statement about prior art in this space.

I fail to see any compelling reason why a concrete type would need an API to return
false
indicating "we can't" determine ifother
is an identical representation. The concrete types in this proposal can determine ifother
is an identical representation
Well, not exactly. String
is the difficult case here where the implementation details have bearing on the semantics and (as you say) the naming also.
Specifically, we have reserved but unused performance bits in String
. Certainly, you can require the semantics you've articulated above for String
. However, if the semantics are that we must never return false
if self
and other
are identical in all observable ways, then the implementation must not require these unused performance bits to be the same.
Unless I'm mistaken, that is not the approach taken in this proposal in the form it's being reviewed. It's certainly trickier to implement, requiring that the implementation never be emitted into the client and never inlined, and requiring that it be revisited in tandem with any changes to the type. Also, requiring careful bit-by-bit accounting in this way is—for the same reasons that @lorentey outlined above—not exactly ideal semantics for generalization to end-user types.
The pragmatic approach sufficient for the motivating use case is to compare the reserved bits and return false
when they don't match, because we can't know for sure. Again, unless I'm mistaken, it is the approach that's actually presented in this proposal. What that means is that the semantics are such that we are leaving room to return false negatives—that is just another way of stating what we've said above, that representational equality is sufficient but overly strict. (Whether, in practice, it is actually possible to construct otherwise identical values that differ in the unused performance bits without bitwise manipulation is another matter.)

Specifically, we have reserved but unused performance bits in
String
. Certainly, you can require the semantics you've articulated above forString
. However, if the semantics are that we must never returnfalse
ifself
andother
are identical in all observable ways, then the implementation must not require these unused performance bits to be the same.
Ok… fair enough. Here's a Dirty Trick:
let s1 = "Hello, SE-0494!"
let s2 = s1
print(s2.isIdentical(to: s1)) // true
let s3 = BitterFlipper(s2)
print(s3.isIdentical(to: s2)) // false
print(s3.isIdentical(to: s1)) // false
Here we have:
- We know that
s2
is identical tos1
. - We know that
s3
flipped some unused performance bit ons2
. - We know that
s3
is not identical tos2
.
3.1. You are looking for some observable way thats3
is different froms2
. - We know that
s3
is not identical tos1
.
4.1. But we also know thats2
is identical tos1
.
So we have s3
and s2
that produce an observable difference across some public API:
print(s3.isIdentical(to: s1)) // false
print(s2.isIdentical(to: s1)) // true
Is this "cheating"? Is this "not fair"? Maybe… but if you were looking for some public API that returns some observable difference between two values that might have flipped some unused performance bit: well… here you go.
@lorentey provided an excellent example of String’s special bits that keep 2 instances identical while these bits differ.
The definitions of these functions in proposal are felt hard to understand, as they seems similar to ==.
I’ve found the following clarifications useful for myself:

What bothers me about the representational definition is that it necessarily breaks through the veil of type abstraction, opening the magic box -- it inherently has to talk about the bits and blobs that are actually stored in memory
if parts of the type's representation do not impact the actual result of any operation (which is clearly the case with String's opportunistic performance bits), then I think it would be quite reasonable to leave the matter or comparing them to the type author's judgment
we do not have a way to mechanically (and quickly) compare the representations of two instances of a copyable type in a reliable way that skips over unspecified padding bytes
This clarify that isIdentical is not about raw representation

There is a clear need to have a lightning fast way to reliably figure out if two
String
types are identical, in the sense of not being able to distinguish them from copies without breaking encapsulation

it approaches substitutability from an operational angle, based on the public API surface
This clearly explain the goal

it would be worth the effort/cost to introduce a quick universal mechanical test for representational identity

my expectation is that if
a.isIdentical(to: b)
returns true, then there must be no way to distinguish between the two instances through their public API surface
This helps better understand the semantics and the goal

Allowing the implementation to get inlined would not give us any real benefit! The String representation includes reserved bits that we may well decide to start using in the future; we do not know what their semantics might be and we should not put arbitrary constraints on that.

I would prefer to avoid recommending developers to implement their own
isIdentical(to:)
operations by comparing the bytes exposed bywithUnsafeBytes(of:)

Crucially,
isIdentical(to:)
must always have constant complexity. (That is to say, for each specific concrete type that implements it, we must be able to provide a constant upper bound on (say) the number of instructions the operation needs to execute. The
These will be good to mention in the proposal as guideline or warning for those how decide implement these functions to own types.

at first approximation, we simply have to compare all direct stored properties through
isIdentical(to:)
,===
,bitPattern
comparisons, or regular==
invocations, as needed.
This helps to better understand both semantics and that isIdentical
operation itself is very type specific. All those questions about Double.nan, -0 / +0 and similar become clear.

NaNs aren't at all relevant here. Accordingly, the proposal should not need to attempt trying to define "exceptional" values as if they were legitimate --
Float
andDouble
are violatingEquatable
's reflexivity requirement, but that's a problem between them andEquatable
For me it seems to be good to mention this example for better understanding the nuances and relation between isIdentical and ==.
I’ve recently talked to some folks about this term, more concretely about String._isIdentical
. Some of them are frustrated by its meaning and difference with ==, another think about it like “oh, this is like == but faster“.
Personally agree with the term “isIdentical“ as this term definitely says it is not about equality.
The only problem I see is that all these methods will be easily discoverable in public API and misunderstood.
Current is String._isIdentical is not easily discoverable with autocomplete.
However, I think it is enough to add proper documentation where at the beginning it is clearly stated that isIdentical
:
- Differs a lot from ==. Precisely it is not a fast way to compare values for equality.
- Not for general usage / for specific needs
If isIdentical
term seems to be misleading for LSG as the result of review, then my choice is isKnownIdentical(to:)
– such term sounds far less similar to isEqual
for folks from another languages, beginners and those who even hasn’t read the API documentation before using it.
If so, current isIdentical
presented in Span types should be deprecated, unfortunately.