I noticed that the proposal does not include the addition of isIdentical(to:)
methods on the Unsafe[Mutable][Raw]BufferPointer
and UTF8Span
type families. I think we should add those, to avoid unnecessary inconsistencies with our Span
types.
Unlike with the types we are proposing, I can of course implement these outside the stdlib myself using public API already present, but I find myself annoyed whenever I have to do so, which happens notably often.
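For illustration, a do-it-yourself version for UnsafeBufferPointer might look like the sketch below. It assumes that "identical" means the same base address and the same count. The name isIdenticalBuffer(to:) is made up for this example so that it cannot collide with whatever the stdlib ends up shipping.

```swift
extension UnsafeBufferPointer {
  /// Hypothetical helper, not stdlib API: true when both buffers cover the
  /// same memory region (same base address, same element count).
  func isIdenticalBuffer(to other: Self) -> Bool {
    self.baseAddress == other.baseAddress && self.count == other.count
  }
}

let values = [10, 20, 30, 40]
values.withUnsafeBufferPointer { whole in
  // A buffer over the first two elements: same start, shorter length.
  let prefix = UnsafeBufferPointer(rebasing: whole[0 ..< 2])
  print(whole.isIdenticalBuffer(to: whole))   // true
  print(whole.isIdenticalBuffer(to: prefix))  // false
}
```

It only takes a few lines, but every project that needs it has to write (and name) its own copy.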
I do continue to believe isIdentical(to:)
is the natural name for this operation -- that's precisely why we proposed this exact same operation under this very same name, in both SE-0447 and SE-0467, for Span
and RawSpan
.
If the Language Steering Group ends up determining that the isIdentical(to:)
name is not right for this operation, please do remember to update the Span APIs accordingly, so that we have consistent names throughout the stdlib.
Underscored methods are not public API, and we don't need to go through this (frankly) soul-sucking process to add them, especially ones as tiny as these ones happen to be. I do believe these isIdentical(to:)
methods are worthy of becoming public API, as they are filling a clear functional hole that isn't possible to safely work around outside the stdlib.
To be very clear, I continue to have zero interest in exposing any ObjectIdentifier
-like externalized identifiers. I strongly believe that to be a wholly unnecessary complication, and it would be a mistake to do so. I've previously posted my detailed reasoning:
I do see concrete, significant value in exposing simple tests for checking if two concrete collection/container types happen to be identical. I routinely wish to be able to perform such checks.
As far as I am aware, this proposal does not introduce any new terms of art. SE-0447 and SE-0467 have already established isIdentical(to:)
to denote this specific operation. This proposal merely adds this same operation on some other container types where we need it.
I'm assuming that by "representation", you mean the raw bytes that get stored in memory. Does that include any padding bytes with unspecified, arbitrarily varying values, that aren't semantically part of the actual value?
Do we have an existing mechanism that would allow us to quickly compare the representations of two values of the same type without including such bits? If we do, then the stdlib could simply ship that as a free-standing two-argument function. As you kindly pointed out, I'm a mere library engineer; but as far as I'm aware, we do not have such an operation. (Do we? It doesn't strike me as something regular everyday code would routinely need to do at runtime, but, again, I'm no compiler engineer.)
However, I'm not sure we'd actually want isIdentical(to:)
to map to such a representational comparison, even if that was a thing. Representational equality is a sufficient test for identicalness, of course, but I don't think it's always a necessary one.
Consider our String
type -- it includes optional performance bits (isASCII
, isNFC
, with room for others) that may conceivably be set on one copy of the exact same string instance and not on another. I'd argue that String.isIdentical(to:)
should probably ignore these bits, even though they are technically part of the underlying representation. (Although it wouldn't be a huge deal if we failed to do that -- these bits are only present if the strings happen to be large, and we could choose not to force isIdentical(to:)
to go into all the various string forms. The point is that this should be up to the type author to decide these questions.)
Another example is something like String.Index
, which has non-essential UTF-8/16 encoding signal bits that are (for historical reasons) not necessarily always set (for example, in case an index was produced by an older binary). If we were to add an isIdentical(to:)
method on that type, it may be desirable to allow an index value that does not have any of these bits to compare identical to one that does happen to carry them. (Even though it would technically violate transitivity to do so, and even though the distinction does lead to some observable behavioral changes when the index is applied to the wrong string instance.)
Kyle did make a similar point, although with a far less plausible scenario: (paraphrased here to fix some issues)
struct Subarray<Element> {
  let _base: [Element]
  var _start: Int
  var _length: Int // Can sometimes be negative, for whatever weird reason
  func isIdentical(to other: Self) -> Bool {
    // Same backing array, and either the same (start, length) pair or the
    // same range described from the opposite end with a negated length.
    self._base.isIdentical(to: other._base)
    && (
      (self._start == other._start && self._length == other._length)
      || (self._start + self._length == other._start
          && other._start + other._length == self._start)
    )
  }
}
This does stretch suspension of disbelief a bit. (Why would we not want to normalize ranges? It'd just complicate access with no apparent benefit.) However, it is not inconceivable for a type to incidentally end up with multiple equivalent ways to represent the "same" value, while still allowing strictly O(1) identity checks.
We used the name isIdentical(to:)
for the same API on [Raw]Span
and Mutable[Raw]Span
in SE-0447 and SE-0467.
I really don't think it'd be wise to throw out perfectly suitable names we've already shipped on a whim.
The proposed String.isIdentical
, Array.isIdentical
etc routines are not doing anything more or less than what the Span
operations do. They return false if and only if the two given values are distinguishable through their public API, just like the operations on borrowing spans.
I'm a little slow today; could you please spell out this trivial implementation of Optional.isIdentical(to:)
for me?
The one in the proposal's Alternatives Considered section is just that. It's something one would consider for a fleeting moment, then discard as obviously nonviable. The implementation given there only works on optional arrays, as if that was a particularly deserving special case, which it isn't. We certainly wouldn't want to add a myriad distinct overloads of isIdentical(to:)
on Optional to handle every specific type that comes with an isIdentical
implementation.
What would be technically possible is to define a protocol around isIdentical(to:)
, and have Optional
conditionally conform to it when its wrapped type does. I do not personally see any reason we'd want to do that. If somebody wants to experiment to see if a protocol like that would have useful applications, then they are free to explore that idea in a package; as long as this proposal ships the isIdentical(to:)
methods, the protocol itself does not need to get defined in the stdlib. (Until and unless someone finds an actual, clear need for it, of course. At which point, they should write a separate proposal.)
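For anyone who does want to experiment in a package, the protocol approach could look roughly like this. Everything here is hypothetical: the IdentityComparable protocol, the Token type, and its storage class are invented for the sketch, and none of it is proposed for the stdlib.

```swift
/// Hypothetical protocol; nothing like it exists in the stdlib.
protocol IdentityComparable {
  func isIdentical(to other: Self) -> Bool
}

/// Optional forwards to its wrapped value; two nils count as identical.
extension Optional: IdentityComparable where Wrapped: IdentityComparable {
  func isIdentical(to other: Self) -> Bool {
    switch (self, other) {
    case (nil, nil): return true
    case let (a?, b?): return a.isIdentical(to: b)
    default: return false
    }
  }
}

/// Toy conforming type: a value wrapping a reference to immutable storage.
final class TokenStorage {
  let payload: String
  init(_ payload: String) { self.payload = payload }
}

struct Token: IdentityComparable {
  let storage: TokenStorage
  func isIdentical(to other: Token) -> Bool { storage === other.storage }
}

let token = Token(storage: TokenStorage("a"))
let copy: Token? = token
let separate: Token? = Token(storage: TokenStorage("a"))
let missing: Token? = nil

print(copy.isIdentical(to: token))     // true
print(copy.isIdentical(to: separate))  // false
print(missing.isIdentical(to: nil))    // true
print(copy.isIdentical(to: missing))   // false
```

A single conditional conformance covers every conforming wrapped type, which is what per-type overloads on Optional cannot do.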
Huh. What does isIdentical(to:)
have to do with vague concepts like "underlying storage"?
My expectation is that if (and only if) a.isIdentical(to: b)
returns true, then there must be no way to distinguish between one instance and a copy of the other through their public API surface. That is to say, a
must behave exactly as if it was a direct copy of b
, and vice versa.
Crucially, isIdentical(to:)
must always have constant complexity. (That is to say, for each specific concrete type that implements it, we must be able to provide a constant upper bound on (say) the number of instructions the operation needs to execute. The bound is allowed to depend on type arguments.)
This is a clear, library-level definition, not a representational one. The "public API surface" is something that is carefully designed by the type author, and it does not necessarily fully expose all details of the underlying representation -- although often it does happen to do that.
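A minimal sketch of a type that meets both expectations follows. SharedSlice and SliceStorage are hypothetical names invented for this example. The assumption in the sketch is that a true result guarantees the two values cannot be told apart through the public API, while a false result is merely the conservative answer a constant-time check can give.

```swift
/// Hypothetical immutable backing storage.
final class SliceStorage<Element> {
  let elements: [Element]
  init(_ elements: [Element]) { self.elements = elements }
}

/// Hypothetical container whose public API exposes only its elements.
struct SharedSlice<Element: Equatable>: Equatable {
  private let storage: SliceStorage<Element>
  private let bounds: Range<Int>

  init(_ elements: [Element]) {
    self.storage = SliceStorage(elements)
    self.bounds = 0 ..< elements.count
  }

  private init(storage: SliceStorage<Element>, bounds: Range<Int>) {
    self.storage = storage
    self.bounds = bounds
  }

  var count: Int { bounds.count }

  subscript(offset: Int) -> Element {
    storage.elements[bounds.lowerBound + offset]
  }

  /// Shares storage with the receiver; only the bounds change.
  func dropFirst() -> SharedSlice {
    let lower = min(bounds.lowerBound + 1, bounds.upperBound)
    return SharedSlice(storage: storage, bounds: lower ..< bounds.upperBound)
  }

  /// O(n): compares the elements one by one.
  static func == (lhs: Self, rhs: Self) -> Bool {
    lhs.storage.elements[lhs.bounds] == rhs.storage.elements[rhs.bounds]
  }

  /// O(1): one reference comparison plus two integer comparisons.
  func isIdentical(to other: Self) -> Bool {
    storage === other.storage && bounds == other.bounds
  }
}

let a = SharedSlice([1, 2, 3])
let b = a                       // plain copy
let c = SharedSlice([1, 2, 3])  // equal contents, separate storage

print(a.isIdentical(to: b))              // true
print(a == c)                            // true
print(a.isIdentical(to: c))              // false
print(a.isIdentical(to: a.dropFirst()))  // false
```

The check never looks at the elements, so its cost does not depend on count.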
Do y'all see any actual problem with this definition?
(If we wanted to restrict this to representational tests, then the "public" qualifier can be dropped -- a.isIdentical(to: b)
would return true if and only if a
would not be distinguishable from a copy of b
through any operation, and vice versa. I do think the definition works better if it doesn't have to talk about the bytes exposed by things such as withUnsafeBytes(of:)
-- no self-respecting collection author would like their clients to ever mess with those.)
I think this definition is very similar to how "value equality" is usually described, except for two things:
- It seems no two people agree on how that applies to a real programming language (Equatable most definitely does not model this in Swift, no matter its pretensions of "substitutability". In my definition, isIdentical(to:) limits concerns to the public API of a type, which I hope is a relatively well-understood concept.)
- We have our crucial extra requirement of O(1) complexity.
The definition applies the same way to Span
as it does to Array
, UnsafeBufferPointer
, Set
or String
.
This proposal only suggests adding these members to data structure implementations with in-memory storage, because that's where we have the most dire need for them.
For Collection types in particular, I expect the definition to have the following two notable consequences:
- Index values must be completely interchangeable across the two identical collection instances. Indices from one collection must be fully valid and usable in the other. (Note that this is one glaringly obvious area where Equatable fails to live up to its own "substitutability" expectations.)
- Operations like withContiguousStorageIfAvailable must not expose pointer values that may be used to distinguish between the two collection instances. (If the collection has in-place storage, then the pointers may vary depending on the location of the specific instance. But then we cannot use them to disprove that one of the instances is a copy of the other.)
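The index point can be seen with today's ArraySlice, where == already holds between values whose indices are not interchangeable. This snippet only illustrates the Equatable gap using existing stdlib API; it does not call any proposed isIdentical(to:) method.

```swift
let a = [0, 1, 2][1...]  // ArraySlice with elements [1, 2], indices 1 ..< 3
let b = [1, 2][...]      // ArraySlice with elements [1, 2], indices 0 ..< 2

print(a == b)                            // true
print(a.startIndex, b.startIndex)        // 1 0
print(a[a.startIndex], b[a.startIndex])  // 1 2
```

Here a and b compare equal, yet a's start index selects a different element in b, and a's last valid index (2) is out of bounds for b.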
To be honest, I think those axioms are not very useful, and people are rightly using them as easy target practice.
Again, my expectation is that if a.isIdentical(to: b)
returns true, then there must be no way to distinguish between the two instances through their public API surface -- a
must be indistinguishable from a copy of b
and vice versa.
That's it; I do believe that's what the documentation needs to say about the matter. (Along with the crucial expectation of O(1) complexity, and perhaps a pragmatic carveout to ignore differences in execution time on types like String
, if we feel like we want to actually allow fancy things like that.)
It would in fact be quite reasonable to extend Float
and Double
with isIdentical(to:)
methods -- a representational definition would be exactly right.
extension Float {
func isIdentical(to other: Self) -> Bool { self.bitPattern == other.bitPattern }
}
Here, I implemented this in an extension outside of the stdlib, using perfectly valid, safe code that only calls supported public API. We can ship it in the stdlib if we want, of course, but there is no particular reason to do so.
Notably, I cannot do the same with String
or Array
without assuming things about their representation that I have no business assuming unless I'm within the stdlib, where these types are defined. (Such as that they do not have unspecified/padding bits in their representation, and there aren't any other reasons representational equality would be a sufficient, but not necessary test for identity.)
Trying to define isIdentical(to:)
outside the stdlib would be quite a layering violation. I think using unsafeBitCast
would technically work on all the platforms we currently support, but that's not a function that we should be encouraging anyone to call on standard collection instances. (Or, frankly, pretty much ever.)
NaNs aren't at all relevant here. Accordingly, the proposal should not need to try to define "exceptional" values as if they were legitimate -- Float
and Double
are violating Equatable
's reflexivity requirement, but that's a problem between them and Equatable
; it has absolutely nothing to do with isIdentical(to:)
. Nada.
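A quick check of that, reusing the Float extension from earlier in this post (repeated here so the snippet stands alone):

```swift
extension Float {
  // Same extension as above: a purely representational comparison.
  func isIdentical(to other: Self) -> Bool { self.bitPattern == other.bitPattern }
}

let nan = Float.nan
print(nan == nan)                // false: Equatable is not reflexive for NaN
print(nan.isIdentical(to: nan))  // true: the bit patterns match

let zero: Float = 0
let negativeZero = -zero
print(zero == negativeZero)                // true
print(zero.isIdentical(to: negativeZero))  // false: the sign bit differs
```

The identity check is reflexive for every value, NaNs included, and it disagrees with == in both directions.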
If someone wants to precisely formalize what "indistinguishable" means, they should of course feel welcome to try to do so in their own free time. We are generally using regular English sentences to explain what operations do in Swift, not formal math.

Back to this proposal, it's the
String
family of types where I struggle, because I think isIdentical(to:)
can be easily misinterpreted there. Even though String
is a collection, it's a very specific kind of collection that is more than just the sum of its parts. Someone might reach for it thinking it's testing "are these strings identical w.r.t. their code points" (i.e., not checking canonical equivalence).
By the exact same argument, someone might also reach for String.==
thinking it's testing code point equality. (People do that quite often.) Is that not a problem? Should we rename String.==
to String.hasEqualCharacters(to:)
to avoid such misunderstanding? (But then, why would it be okay to assume that everyone knows that the term "Character" means "extended grapheme cluster" in our stdlib?)
One really good way to avoid arbitrarily confusing people is to stick to using consistent names for the same operations; String.isIdentical(to:)
serves the same basic purpose as Span.isIdentical(to:)
does -- I do strongly believe they need to have the same names. Sticking to consistent names establishes a strong pattern that avoids confusion.
The LSG can of course choose whatever name it wants -- at the end of the day, these aren't going to be prominently used APIs, and their names certainly do not need to be perfect. But I think the name really, really should be consistent across all standard types that provide this operation; otherwise we really would invite constant, unending, actual confusion and frustration.
The best way to do that is to simply continue to use the names we've already shipped in 6.2. The next best way is to figure out a name y'all like, and rename the Span/RawSpan operations to match.