SE-0494: Add `isIdentical(to:)` Methods for Quick Comparisons to Concrete Types

lorentey · October 1, 2025, 6:06am

I noticed that the proposal does not include the addition of isIdentical(to:) methods on the Unsafe[Mutable][Raw]BufferPointer and UTF8Span type families. I think we should add those, to avoid unnecessary inconsistencies with our Span types.

Unlike with the types we are proposing, I can of course implement these outside the stdlib myself using public API already present, but I find myself annoyed whenever I have to do so, which happens notably often.
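For the unsafe buffer pointer families, the out-of-stdlib version is indeed short. Here is a minimal sketch. It assumes that identity for these types means "same base address and same count". The extensions are illustrative only and are not shipped stdlib API:

```swift
// Hypothetical extensions written outside the stdlib with public API only.
// Two buffers are treated as identical when they describe the same memory
// region: the same base address and the same element (or byte) count.
extension UnsafeBufferPointer {
    func isIdentical(to other: Self) -> Bool {
        baseAddress == other.baseAddress && count == other.count
    }
}

extension UnsafeRawBufferPointer {
    func isIdentical(to other: Self) -> Bool {
        baseAddress == other.baseAddress && count == other.count
    }
}

let storage = UnsafeMutableBufferPointer<Int>.allocate(capacity: 4)
storage.initialize(repeating: 0)

let whole = UnsafeBufferPointer(storage)
let copy = whole                                            // same region
let prefix = UnsafeBufferPointer(rebasing: storage[0..<2])  // same start, shorter

let copyIsIdentical = whole.isIdentical(to: copy)      // true
let prefixIsIdentical = whole.isIdentical(to: prefix)  // false
let rawIsIdentical = UnsafeRawBufferPointer(whole)
    .isIdentical(to: UnsafeRawBufferPointer(copy))     // true

// Only addresses and counts were compared; release the memory afterwards.
storage.deallocate()
```

The check never dereferences the memory, so it stays O(1) regardless of the buffer's length.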

I do continue to believe isIdentical(to:) is the natural name for this operation -- that's precisely why we proposed this exact same operation under this very same name, in both SE-0447 and SE-0467, for Span and RawSpan.

If the Language Steering Group ends up determining that the isIdentical(to:) name is not right for this operation, please do remember to update the Span APIs accordingly, so that we have consistent names throughout the stdlib.

Underscored methods are not public API, and we don't need to go through this (frankly) soul-sucking process to add them, especially ones as tiny as these ones happen to be. I do believe these isIdentical(to:) methods are worthy of becoming public API, as they are filling a clear functional hole that isn't possible to safely work around outside the stdlib.

To be very clear, I continue to have zero interest in exposing any ObjectIdentifier-like externalized identifiers. I strongly believe that to be a wholly unnecessary complication, and it would be a mistake to do so. I've previously posted my detailed reasoning:

[Pitch] `Distinguishable` protocol for Quick Comparisons

Exposing a fully formed identity raises too many questions for comfort:

Will people expect that identifier to change whenever the container is mutated? What if some container types did that, and people started assuming that it is a general requirement?

Will people expect that the identifier only changes when the container is mutated in a way that makes it reallocate its storage? (Again, what if some containers did that, and people start assuming that is a general requirement?)

Will people expect that identifiers never get reused, even if a container gets destroyed? (That has been a constant issue with ObjectIdentifier, despite the docs.)

To properly design a container identifier, we'd need to think about such questions, and some poor soul will likely have to write documentation on the recommended design patterns for such types. I believe this would be completely unnecessary work. We don't need to overcomplicate this like that -- the actual problem doesn't need it.

I do see concrete, significant value in exposing simple tests for checking if two concrete collection/container types happen to be identical. I routinely wish to be able to perform such checks.

As far as I am aware, this proposal does not introduce any new terms of art. SE-0447 and SE-0467 have already established isIdentical(to:) to denote this specific operation. This proposal merely adds this same operation on some other container types where we need it.

I'm assuming that by "representation", you mean the raw bytes that get stored in memory. Does that include any padding bytes with unspecified, arbitrarily varying values, that aren't semantically part of the actual value?

Do we have an existing mechanism that would allow us to quickly compare the representations of two values of the same type without including such bits? If we do, then the stdlib could simply ship that as a free-standing two-argument function. As you kindly pointed out, I'm a mere library engineer; but as far as I'm aware, we do not have such an operation. (Do we? It doesn't strike me as something regular everyday code would routinely need to do at runtime, but, again, I'm no compiler engineer.)

However, I'm not sure we'd actually want isIdentical(to:) to map to such a representational comparison, even if that was a thing. Representational equality is a sufficient test for identicalness, of course, but I don't think it's always a necessary one.

Consider our String type -- it includes optional performance bits (isASCII, isNFC, with room for others) that may conceivably be set on one copy of the exact same string instance and not on another. I'd argue that String.isIdentical(to:) should probably ignore these bits, even though they are technically part of the underlying representation. (Although it wouldn't be a huge deal if we failed to do that -- these bits are only present if the strings happen to be large, and we could choose not to force isIdentical(to:) to go into all the various string forms. The point is that this should be up to the type author to decide these questions.)
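To make the String case concrete, here is a minimal sketch using a hypothetical ToyString type. This is not the stdlib String's actual layout. Each copy of ToyString carries its own memoized isASCII hint. Its isIdentical(to:) compares only the shared storage reference, so a copy that has picked up the hint still compares identical to one that has not:

```swift
// Hypothetical toy type; this is NOT the stdlib String's real layout.
final class ToyStorage {
    let utf8: [UInt8]
    init(_ utf8: [UInt8]) { self.utf8 = utf8 }
}

struct ToyString {
    private let storage: ToyStorage
    // Opportunistic hint stored inline in each copy; nil means "not computed".
    // It can make later queries faster but never changes their results.
    private var cachedIsASCII: Bool? = nil

    init(_ string: String) {
        storage = ToyStorage(Array(string.utf8))
    }

    var count: Int { storage.utf8.count }

    // Memoizes the answer in this copy only.
    var isASCII: Bool {
        mutating get {
            if let known = cachedIsASCII { return known }
            let result = storage.utf8.allSatisfy { $0 < 0x80 }
            cachedIsASCII = result
            return result
        }
    }

    // O(1): compares the shared storage reference and deliberately ignores
    // the hint, since no public operation can use it to tell copies apart.
    func isIdentical(to other: ToyString) -> Bool {
        storage === other.storage
    }
}

let original = ToyString("hello")
var warmed = original     // shares storage with `original`
_ = warmed.isASCII        // sets the hint on `warmed` only

let warmedIsIdentical = original.isIdentical(to: warmed)  // true
let sameCount = original.count == warmed.count            // true
```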

Another example is something like String.Index, which has non-essential UTF-8/16 encoding signal bits that are (for historical reasons) not necessarily always set (for example, in case an index was produced by an older binary). If we were to add an isIdentical(to:) method on that type, it may be desirable to allow an index value that does not have any of these bits to compare identical to one that does happen to carry them. (Even though it would technically violate transitivity to do so, and even though the distinction does lead to some observable behavioral changes when the index is applied to the wrong string instance.)

Kyle did make a similar point, although with a far less plausible scenario: (paraphrased here to fix some issues)

```swift
struct Subarray<Element> {
  let _base: [Element]
  var _start: Int
  var _length: Int // Can sometimes be negative, for whatever weird reason

  // Relies on the proposed Array.isIdentical(to:).
  func isIdentical(to other: Self) -> Bool {
    self._base.isIdentical(to: other._base)
    && (
      (self._start == other._start && self._length == other._length)
      || (self._start + self._length == other._start && other._start + other._length == self._start)
    )
  }
}
```

This does stretch suspension of disbelief a bit. (Why would we not want to normalize ranges? It'd just complicate access with no apparent benefit.) However, it is not inconceivable for a type to incidentally end up with multiple equivalent ways to represent the "same" value, while still allowing strictly O(1) identity checks.
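Here is a self-contained variant of the Subarray example. It uses a hypothetical reference-backed Base class in place of the proposed Array.isIdentical(to:). The sketch shows that the two encodings of the same slice produce the same elements through the public accessor, while the identity check stays O(1):

```swift
// Hypothetical reference-backed storage standing in for Array, so that the
// sketch compiles without the proposed Array.isIdentical(to:).
final class Base<Element> {
    let elements: [Element]
    init(_ elements: [Element]) { self.elements = elements }
}

struct Subarray<Element> {
    let base: Base<Element>
    var start: Int
    // A negative length denotes the range (start + length)..<start.
    var length: Int

    // Public view of the slice; both encodings produce the same elements.
    var elements: ArraySlice<Element> {
        let lower = length >= 0 ? start : start + length
        return base.elements[lower ..< lower + abs(length)]
    }

    // O(1): (s, l) and (s + l, -l) describe the same slice of the same base.
    func isIdentical(to other: Self) -> Bool {
        guard base === other.base else { return false }
        let sameEncoding = start == other.start && length == other.length
        let flippedEncoding = start + length == other.start
            && other.start + other.length == start
        return sameEncoding || flippedEncoding
    }
}

let shared = Base([10, 20, 30, 40, 50, 60])
let forward = Subarray(base: shared, start: 2, length: 3)    // 30, 40, 50
let backward = Subarray(base: shared, start: 5, length: -3)  // 30, 40, 50

let encodingsIdentical = forward.isIdentical(to: backward)   // true
let encodingsEqual = forward.elements == backward.elements   // true
```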

We used the name isIdentical(to:) for the same API on [Raw]Span and Mutable[Raw]Span in SE-0447 and SE-0467.

I really don't think it'd be wise to throw out perfectly suitable names we've already shipped on a whim.

The proposed String.isIdentical, Array.isIdentical, etc. routines are not doing anything more or less than what the Span operations do. They return false if and only if the two given values are distinguishable through their public API, just like the operations on borrowing spans.

I'm a little slow today; could you please spell out this trivial implementation of Optional.isIdentical(to:) for me?

The one in the proposal's Alternatives Considered section is just that. It's something one would consider for a fleeting moment, then discard as obviously nonviable. The implementation given there only works on optional arrays, as if that was a particularly deserving special case, which it isn't. We certainly wouldn't want to add myriad distinct overloads of isIdentical(to:) on Optional to handle every specific type that comes with an isIdentical implementation.

What would be technically possible is to define a protocol around isIdentical(to:), and have Optional conditionally conform to it when its wrapped type does. I do not personally see any reason we'd want to do that. If somebody wants to experiment to see if a protocol like that would have useful applications, then they are free to explore that idea in a package; as long as this proposal ships the isIdentical(to:) methods, the protocol itself does not need to get defined in the stdlib. (Until and unless someone finds an actual, clear need for it, of course. At which point, they should write a separate proposal.)
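A package-level protocol of this kind might look like the following sketch. IdentityComparable is a hypothetical name that is not part of the stdlib or the proposal. The point is that a single conditional conformance on Optional replaces per-type overloads:

```swift
// Hypothetical package-level protocol; the name is illustrative only.
protocol IdentityComparable {
    /// Returns true only if `self` and `other` cannot be told apart through
    /// public API. Expected to run in O(1).
    func isIdentical(to other: Self) -> Bool
}

// One conditional conformance covers every conforming wrapped type.
extension Optional: IdentityComparable where Wrapped: IdentityComparable {
    func isIdentical(to other: Self) -> Bool {
        switch (self, other) {
        case (.none, .none):
            return true
        case let (lhs?, rhs?):
            return lhs.isIdentical(to: rhs)
        default:
            return false
        }
    }
}

// A toy conforming type for the demo.
final class Token {}
struct Handle: IdentityComparable {
    let token: Token
    func isIdentical(to other: Handle) -> Bool { token === other.token }
}

let h = Handle(token: Token())
let a: Handle? = h
let b: Handle? = h
let c: Handle? = Handle(token: Token())
let missing: Handle? = nil

let sameWrapped = a.isIdentical(to: b)       // true
let differentWrapped = a.isIdentical(to: c)  // false
let bothNil = missing.isIdentical(to: nil)   // true
let oneNil = a.isIdentical(to: missing)      // false
```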

Huh. What does isIdentical(to:) have to do with vague concepts like "underlying storage"?

My expectation is that if (and only if) a.isIdentical(to: b) returns true, then there must be no way to distinguish between one instance and a copy of the other through their public API surface. That is to say, a must behave exactly as if it was a direct copy of b, and vice versa.

Crucially, isIdentical(to:) must always have constant complexity. (That is to say, for each specific concrete type that implements it, we must be able to provide a constant upper bound on (say) the number of instructions the operation needs to execute. The bound is allowed to depend on type arguments.)

This is a clear, library-level definition, not a representational one. The "public API surface" is something that is carefully designed by the type author, and it does not necessarily fully expose all details of the underlying representation -- although often it does happen to do that.

Do y'all see any actual problem with this definition?

(If we wanted to restrict this to representational tests, then the "public" qualifier can be dropped -- a.isIdentical(to: b) would return true if and only if a would not be distinguishable from a copy of b through any operation, and vice versa. I do think the definition works better if it doesn't have to talk about the bytes exposed by things such as withUnsafeBytes(of:) -- no self-respecting collection author would like their clients to ever mess with those.)

I think this definition is very similar to how "value equality" is usually described, except for two things:

- It seems no two people agree on how that applies to a real programming language (Equatable most definitely does not model this in Swift, no matter its pretensions of "substitutability". In my definition, isIdentical(to:) limits concerns to the public API of a type, which I hope is a relatively well-understood concept.)
- We have our crucial extra requirement of O(1) complexity.

The definition applies the same way to Span as it does to Array, UnsafeBufferPointer, Set or String.

Optionally, we can still allow a bit of pragmatic flexibility about the effect of hidden opportunistic performance flags or incidental representational variability, like the two examples I discussed above. For instance, the definition can still allow the two values to differ in how fast they run certain operations (as in, whether or not they can take opportunistic fast paths), but the same operations must still always end up with indistinguishable results. Of course, if String's performance bits or the dummy `Subarray` type's `_start` and `_length` values were directly exposed by public API, then `isIdentical(to:)` would necessarily have to compare them. If they aren't, then I see absolutely no issue with allowing type authors to come up with their own decision on whether to allow identical instances to have incidental differences like that.

This proposal only suggests adding these members to data structure implementations with in-memory storage, because that's where we have the most dire need for them.

For Collection types in particular, I expect the definition to have the following two notable consequences:

- Index values must be completely interchangeable across the two identical collection instances. Indices from one collection must be fully valid and usable in the other. (Note that this is one glaringly obvious area where Equatable fails to live up to its own "substitutability" expectations.)
- Operations like withContiguousStorageIfAvailable must not expose pointer values that may be used to distinguish between the two collection instances. (If the collection has in-place storage, then the pointers may vary depending on the location of the specific instance. But then we cannot use them to disprove that one of the instances is a copy of the other.)

To be honest, I think those axioms are not very useful, and people are rightly using them as easy target practice.

Again, my expectation is that if a.isIdentical(to: b) returns true, then there must be no way to distinguish between the two instances through their public API surface -- a must be indistinguishable from a copy of b and vice versa.

That's it; I do believe that's what the documentation needs to say about the matter. (Along with the crucial expectation of O(1) complexity, and perhaps a pragmatic carveout to ignore differences in execution time on types like String, if we feel like we want to actually allow fancy things like that.)

It would in fact be quite reasonable to extend Float and Double with isIdentical(to:) methods -- a representational definition would be exactly right.

```swift
extension Float {
  func isIdentical(to other: Self) -> Bool { self.bitPattern == other.bitPattern }
}
```

Here, I implemented this in an extension outside of the stdlib, using perfectly valid, safe code that only calls supported public API. We can ship it in the stdlib if we want, of course, but there is no particular reason to do so.

Notably, I cannot do the same with String or Array without assuming things about their representation that I have no business assuming unless I'm within the stdlib, where these types are defined. (Such as that they do not have unspecified/padding bits in their representation, and there aren't any other reasons representational equality would be a sufficient, but not necessary test for identity.)

Trying to define isIdentical(to:) outside the stdlib would be quite a layering violation. I think using unsafeBitCast would technically work on all the platforms we currently support, but that's not a function that we should be encouraging anyone to call on standard collection instances. (Or, frankly, pretty much ever.)

NaNs aren't at all relevant here. Accordingly, the proposal should not need to try to define "exceptional" values as if they were legitimate -- Float and Double are violating Equatable's reflexivity requirement, but that's a problem between them and Equatable; it has absolutely nothing to do with isIdentical(to:). Nada.
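A quick check of the bit-pattern extension against == on IEEE 754 Float shows that the two operations disagree in both directions without any conflict between them. The extension is repeated here so the snippet stands alone:

```swift
// The bit-pattern extension, repeated so this snippet is self-contained.
extension Float {
    func isIdentical(to other: Self) -> Bool { self.bitPattern == other.bitPattern }
}

let zero: Float = 0.0
let negativeZero = -zero                                  // sign bit set
let zerosEqual = zero == negativeZero                     // true
let zerosIdentical = zero.isIdentical(to: negativeZero)   // false: sign bit differs

let nan = Float.nan
let nanEqualsItself = nan == nan                          // false
let nanIdenticalToItself = nan.isIdentical(to: nan)       // true: same bits
```

Equatable's reflexivity problem with NaN has no bearing on the bit-pattern test, which is reflexive by construction.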

If someone wants to precisely formalize what "indistinguishable" means, they should of course feel welcome to try to do so on their own free time. We are generally using regular English sentences to explain what operations do in Swift, not formal math.

By the exact same argument, someone might also reach for String.== thinking it's testing code point equality. (People do that quite often.) Is that not a problem? Should we rename String.== to String.hasEqualCharacters(to:) to avoid such misunderstanding? (But then, why would it be okay to assume that everyone knows that the term "Character" means "extended grapheme cluster" in our stdlib?)

One really good way to avoid arbitrarily confusing people is to stick to using consistent names for the same operations; String.isIdentical(to:) serves the same basic purpose as Span.isIdentical(to:) does -- I do strongly believe they need to have the same names. Sticking to consistent names establishes a strong pattern that avoids confusion.

The LSG can of course choose whatever name it wants -- at the end of the day, these aren't going to be prominently used APIs, and their names certainly do not need to be perfect. But I think the name really, really should be consistent across all standard types that provide this operation; otherwise we really would invite constant, unending, actual confusion and frustration.

The best way to do that is to simply continue to use the names we've already shipped in 6.2. The next best way is to figure out a name y'all like, and rename the Span/RawSpan operations to match.

sveinhal · October 1, 2025, 9:33am

Swift has already established the isKnown construct in, e.g., isKnownUniquelyReferenced.
When something "is known" to have a property, it has that property. But when "is known" is false, it is uncertain whether it has that property.

I think most people understand that, and if they don't immediately get the meaning of it, the name has just enough friction for you to pause and reflect and/or investigate the docs, methinks.
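For reference, here is how the existing convention works in the stdlib's isKnownUniquelyReferenced(_:). A name like isKnownIdentical(to:), as used in Pitch #2, would read the same way: true is a guarantee, and false only means "not established":

```swift
// isKnownUniquelyReferenced(_:) is real stdlib API. `true` is a guarantee;
// `false` only means uniqueness could not be established.
final class Box {}

var box = Box()
let uniqueAtFirst = isKnownUniquelyReferenced(&box)  // true: one strong reference

let alias = box
let uniqueWithAlias = withExtendedLifetime(alias) {
    isKnownUniquelyReferenced(&box)                  // false: `alias` shares it
}
```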

Tino · October 1, 2025, 11:57am

Imo the quoted explanation (or a variation) should be part of the proposal. I still think exposing some sort of identity property would massively reduce the risk of misunderstanding, but it really is a can of worms as soon as people store the value.

However, I think there is another option to eliminate confusion with an additional type:

Why does isIdentical have to return a Bool? Naturally, we are in three-state territory, so why not define an enum? This would have the added benefit of being useful if there are cases where people just want to know if two objects are not equal.

I guess it’s the case of two nils – that should be a really fast check, even if the base type is terribly huge ;-).
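Here is a sketch of the three-state idea. It uses a hypothetical QuickComparison enum and a toy Blob type, neither of which is part of the proposal. A different count lets the check report a known difference in O(1), and two nils are trivially identical:

```swift
// Hypothetical three-state result; the names are illustrative only.
enum QuickComparison {
    case identical  // indistinguishable, decided in O(1)
    case different  // known to differ, decided in O(1)
    case unknown    // a full, possibly O(n), comparison is still needed
}

// Toy copy-on-write-style value, assumed for this sketch only.
final class BlobStorage {
    let bytes: [UInt8]
    init(_ bytes: [UInt8]) { self.bytes = bytes }
}

struct Blob: Equatable {
    let storage: BlobStorage

    static func == (lhs: Blob, rhs: Blob) -> Bool {
        lhs.storage.bytes == rhs.storage.bytes  // O(n)
    }

    func quickCompare(to other: Blob) -> QuickComparison {
        if storage === other.storage { return .identical }
        if storage.bytes.count != other.storage.bytes.count { return .different }
        return .unknown
    }
}

// Optional case: two nils are identical no matter how large Blob is.
func quickCompare(_ a: Blob?, _ b: Blob?) -> QuickComparison {
    switch (a, b) {
    case (.none, .none): return .identical
    case let (x?, y?): return x.quickCompare(to: y)
    default: return .different
    }
}

// The quick check used as a fast path in front of ==.
func blobsEqual(_ a: Blob, _ b: Blob) -> Bool {
    switch a.quickCompare(to: b) {
    case .identical: return true
    case .different: return false
    case .unknown: return a == b
    }
}

let s = BlobStorage([1, 2, 3])
let x = Blob(storage: s)
let y = Blob(storage: s)                      // shares storage with x
let z = Blob(storage: BlobStorage([1, 2, 3])) // equal bytes, separate storage
let w = Blob(storage: BlobStorage([1, 2]))    // different count
```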

xwu · October 1, 2025, 12:04pm

lorentey:

However, I'm not sure we'd actually want isIdentical(to:) to map to such a representational comparison, even if that was a thing. Representational equality is a sufficient test for identicalness, of course, but I don't think it's always a necessary one.

Consider our String type -- it includes optional performance bits (isASCII, isNFC, with room for others) that may conceivably be set on one copy of the exact same string instance and not on another. I'd argue that String.isIdentical(to:) should probably ignore these bits, even though they are technically part of the underlying representation. (Although it wouldn't be a huge deal if we failed to do that -- these bits are only present if the strings happen to be large, and we could choose not to force isIdentical(to:) to go into all the various string forms. The point is that this should be up to the type author to decide these questions.)

This is the crux of the matter: This proposal, which is proposing to add this API to String, must reckon with this question. It is probably the most challenging case in the standard library, and it will give evidence to help answer the questions you and all of us continue to reckon with:

Is representational equality always necessary? Given that virtually everything about the underlying representation for these concrete types under consideration is visible some way or other, is there a principled definition of the “public” API surface that allows us to ignore some bits (recalling that even if you ignore unsafe APIs, we still have public isKnown* APIs that check some of the performance bits)?

And if we open up this distinction, would it mean that every time we add to the public API of String we’ll need to double check that this operation stops ignoring any newly publicly exposed bits? If so, would the implementation of this API have to be never-emit-into-client never-inlinable on ABI-stable platforms, and would that even be viable as a fast path?

My sense is that the most we can reasonably do with this operation is to mask out any leading or trailing padding bits that we would declare not to be part of the representation, and that the notion of identicality is in practice going to be yoked to representational equality. Indeed, if I amend your expectation that “if ~~(and only if)~~ a.isIdentical(to: b) returns true, then there must be no way to distinguish between one instance and a copy of the other through their public API surface, which for an ABI-stable type includes any possible future safe API because we want this implementation to be inlinable,” then it’s hard to imagine any alternative. If so, we should say it, because without doing so there's going to be a lengthy reckoning like this every time we try to implement it for another type.

It is this: no such conversation as we’re having was or could be necessary for Span (or the unsafe pointer types), because it models nothing more or less than contiguous storage. By contrast, the semantics of the proposed additions to the concrete types here expose questions that you pose above, and the answer to them is not so clear. The question about spelling is merely a proxy for the underlying question about semantics.

Indeed, I think this goes to the foundational argument why these APIs should be part of the stdlib.

But—and this is not a counter for an actual design (which would be bad), but rather a thought experiment to tease out the semantics we want—consider that what you've outlined here lends itself to the following—again using deliberately bad names because it's not meant to be an actual design:

```swift
protocol HasRepresentationEqualityUsefulForIdentity {
  static var bitMaskToHideUnspecifiedBits: <<some type>> { get }
}

extension Array: HasRepresentationEqualityUsefulForIdentity {
  static var bitMaskToHideUnspecifiedBits: <<some type>> { ~0 }
}

// Top-level stdlib free function
func areIdentical<T: HasRepresentationEqualityUsefulForIdentity>(_ a: T, _ b: T) -> Bool {
  return unsafeBitCastOrSomethingSafer(a, to: <<some type>>) & T.bitMaskToHideUnspecifiedBits ==
    unsafeBitCastOrSomethingSafer(b, to: <<some type>>) & T.bitMaskToHideUnspecifiedBits
}
```

woolsweater · October 1, 2025, 4:38pm

Removed by me, since the text to which it referred has also been removed.

davedelong · October 1, 2025, 5:52pm

I am a reluctant +0.5 on this proposal. I understand the desire to have a concrete way to identify O(1) “equality”, even if I think it will rarely actually be used.

I find myself convinced by the arguments that:

- This should be expressed as a protocol, so it can be used as a generic constraint
- The name needs work

To that end, I propose this simplification:

```swift
// the new requirement
public protocol TriviallyEquatable {
    // return true iff the type can guarantee O(1) equality checking
    // if that guarantee cannot be provided, return false (the default)
    // and implement Equatable
    func isTriviallyEqual(to other: Self) -> Bool
}

// the default implementation, implying all existing types have O(n) equality checking
extension TriviallyEquatable {
    public func isTriviallyEqual(to other: Self) -> Bool { return false }
}

// Equatable now refines TriviallyEquatable
public protocol Equatable: TriviallyEquatable { ... }
```

Some thoughts:

- I love @QuinceyMorris’s terminology of “trivial” identicalness. I find that a compelling descriptor of low-effort and low-cost checking.
- Making this a protocol separate from Equatable means it can be used to identify any type that can do a trivial equatability check.
- Calling this “Equatable” means we avoid the “what is Identical” and “what is a copy” questions.
- Refining Equatable makes adoption by other types (including the pointer APIs) simpler, since they’ll already have an implementation that they can override at their leisure.

bbrk24 · October 1, 2025, 5:55pm

While I mostly agree with your take on this, I don't think this is possible or desirable. Many types have either no mechanism for trivial equality, or have no reason to expose it (e.g. Int). Furthermore, I don't think the inheritance tree for existing protocols can be changed for ABI reasons.

davedelong · October 1, 2025, 6:07pm

Hm, I don’t recall specifically whether protocol inheritance can change. I thought I remembered seeing that before, but I could be misattributing it to something else (retroactive Sendability perhaps?).

Perhaps this could be inverted then:

```swift
public protocol TriviallyEquatable: Equatable { ... }
```

That would kind of be like Hashable, where I know that values with different hashes cannot be equal, so it’s not necessary to check the parent’s implementation of static func ==.

I stand by the rest of my comments.

Mordil · October 1, 2025, 6:07pm

I agree, I think the relationship should be inverse. If you can be TriviallyEquatable, you must also be Equatable.

bbrk24 · October 1, 2025, 6:08pm

I don't think that's right either. Array is only conditionally Equatable if its elements are Equatable, but it could unconditionally adopt TriviallyEquatable. I think they have to be unrelated protocols.

scanon · October 1, 2025, 6:28pm

Some quick notes on protocols.

- We cannot make an existing protocol refine a newly introduced protocol. This would be ABI breaking.
- There is not a natural refinement relationship between the hypothetical protocol and Equatable.
- The proposal under discussion quite explicitly makes such a protocol out-of-scope. Let’s stay on topic.

Mordil · October 1, 2025, 6:53pm

I don’t disagree, I was speaking directly to the suggestion of the proposed hierarchy of the protocols.

But I also agree with the authors that we’re probably not at the stage yet of being able to formalize this concept into a protocol.

vanvoorden · October 1, 2025, 7:24pm

xwu:

lorentey:

Consider our String type -- it includes optional performance bits (isASCII, isNFC, with room for others) that may conceivably be set on one copy of the exact same string instance and not on another. I'd argue that String.isIdentical(to:) should probably ignore these bits, even though they are technically part of the underlying representation. (Although it wouldn't be a huge deal if we failed to do that -- these bits are only present if the strings happen to be large, and we could choose not to force isIdentical(to:) to go into all the various string forms. The point is that this should be up to the type author to decide these questions.) [emphasis added]

This is the crux of the matter: This proposal, which is proposing to add this API to String, must reckon with this question. It is probably the most challenging case in the standard library, and it will give evidence to help answer the questions you and all of us continue to reckon with:

Is representational equality always necessary? Given that virtually everything about the underlying representation for these concrete types under consideration is visible some way or other, is there a principled definition of the “public” API surface that allows us to ignore some bits (recalling that even if you ignore unsafe APIs, we still have public isKnown* APIs that check some of the performance bits)?

And if we open up this distinction, would it mean that every time we add to the public API of String we’ll need to double check that this operation stops ignoring any newly publicly exposed bits? If so, would the implementation of this API have to be never-emit-into-client never-inlinable on ABI-stable platforms, and would that even be viable as a fast path?

So these are all legit engineering questions… what I'm just not clear on is why these engineering questions should be the domain of a design proposal review stage. Are these not appropriate engineering questions that can — and should — be discussed during diff implementation review? Why must we block design proposal review if the library maintainers building and shipping the concrete types can choose the implementation that works for them under the constraints they are operating under?

If the constraints that our library maintainers are operating under mean that our "is identical" test is essentially a de facto test for "same representation" (which I would argue it is not), then that's the implementation that we ship. And if the constraints that they are operating under mean that our "is identical" test is not a test for "same representation", then those library maintainers have the freedom to use their best judgement to ship an isIdentical(to:) that chooses to ignore optional performance bits, on the understanding that they take responsibility for these questions about ABI stability and future APIs that might expose those performance bits. Must the discussion about the implementation continue to take place here in design proposal review?

vanvoorden · October 1, 2025, 7:36pm

Our first pitch thread proposed a new Distinguishable protocol which was completely independent of Equatable. The big ergonomic problem for us was then managing the amount of generic overloads this would lead to for engineers using the API:

[Pitch] `Distinguishable` protocol for Quick Comparisons

The approach being pitched is this:
```swift
public protocol Distinguishable {
   func isIdentical(to other: Self) -> Bool
}

func f3<S>(sequence: S) async throws
where S: AsyncSequence, S.Element: Distinguishable
{
  var oldElement: S.Element?
  for try await element in sequence {
    if oldElement.isIdentical(to: element) { continue } // Quick shortcut
    oldElement = element
    doLinearOperation(with: element)
  }
}
```
Note that Distinguishable is part of the algorithm's requirements; we cannot call this function on anything that doesn't conform to this niche protocol. To support calling it on classic Equatable elements, we'd need to add an overload that falls back to that, and a third overload if we also want to support types that conform to both. I don't think there is a reasonable way to merge the Equatable and Distinguishable variants into a single function; the only way I can see to do that is by doing runtime as? downcasts, and doing that would generally be costly enough not to be worth the effort of implementing the shortcut in the first place.
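Here is a sketch of the overload problem. It assumes the pitched Distinguishable protocol and uses a synchronous Sequence in place of AsyncSequence for brevity. The function name processChanges and the demo types are hypothetical. The third overload exists only so that types conforming to both protocols resolve unambiguously:

```swift
// The protocol from the first pitch, reproduced for this sketch.
protocol Distinguishable {
    func isIdentical(to other: Self) -> Bool
}

// Overload 1: Distinguishable elements only; skips O(1)-identical repeats.
func processChanges<S: Sequence>(
    _ sequence: S, _ work: (S.Element) -> Void
) where S.Element: Distinguishable {
    var previous: S.Element?
    for element in sequence {
        if let last = previous, last.isIdentical(to: element) { continue }
        previous = element
        work(element)
    }
}

// Overload 2: Equatable elements only; falls back to ==, which may be O(n).
func processChanges<S: Sequence>(
    _ sequence: S, _ work: (S.Element) -> Void
) where S.Element: Equatable {
    var previous: S.Element?
    for element in sequence {
        if let last = previous, last == element { continue }
        previous = element
        work(element)
    }
}

// Overload 3: without it, a type conforming to both protocols would make
// the call ambiguous between overloads 1 and 2.
func processChanges<S: Sequence>(
    _ sequence: S, _ work: (S.Element) -> Void
) where S.Element: Distinguishable & Equatable {
    var previous: S.Element?
    for element in sequence {
        if let last = previous, last.isIdentical(to: element) || last == element { continue }
        previous = element
        work(element)
    }
}

// Demo types, hypothetical.
final class Snapshot {}
struct Frame: Distinguishable {
    let snapshot: Snapshot
    func isIdentical(to other: Frame) -> Bool { snapshot === other.snapshot }
}
struct Version: Distinguishable, Equatable {
    let number: Int
    func isIdentical(to other: Version) -> Bool { number == other.number }
}

let s1 = Snapshot(), s2 = Snapshot()
var frameWork = 0
processChanges([Frame(snapshot: s1), Frame(snapshot: s1), Frame(snapshot: s2)]) { _ in
    frameWork += 1                                  // overload 1
}

var intWork: [Int] = []
processChanges([1, 1, 2, 2, 3]) { intWork.append($0) }  // overload 2

var versionWork: [Int] = []
processChanges([Version(number: 1), Version(number: 1), Version(number: 2)]) {
    versionWork.append($0.number)                   // overload 3
}
```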

This led to the second pitch thread, which proposed adding a member requirement to Equatable. But we failed to come up with any compelling reason why this must ship right now as a protocol in standard library.

[Pitch #2] Add `isKnownIdentical` Method for Quick Comparisons to `Equatable`

Because the requirement is abstract and provided by arbitrary Equatable types, it struggles to define its expected semantics, which often only make sense in the context of a specific conforming type like Array or String. It also has to introduce optionality to the result type, which really feels like it ought to be reflected statically: if many types do not provide a meaningful implementation of this operation, perhaps it shouldn't be available on those types.

We would prefer to move forward with only the proposal that adds concrete isIdentical methods to the CoW types in the library. If experience with that proposal reveals that it is useful to have a generalization, we can reconsider this then. That is how generic proposals are best developed, by generalizing them from an existing pattern that's come to be well-understood. I've agreed to be review manager for these proposals, so I'll go read that proposal document and see if I have comments to make on the PR.

One clear missing piece for us to prioritize shipping a protocol in standard library would be if anyone from standard library could highlight for us code operating on a generic context that demonstrates clear and measurable performance improvements once this protocol is available in standard library. Another option would be if anyone from a "high-profile" ABI stable library like SwiftUI could highlight for us a place where having this protocol available from standard library demonstrates clear and measurable performance improvements over some generic context.

If we don't have those perf wins waiting for us… then it's not clear for us why this must ship in standard library at this moment. Our preference is for library maintainers that want this protocol to ship their own protocol for their own generic contexts to constrain over. And if this incubates "in the wild" and the community seems to find some impactful use out of this pattern then I'm all for coming back to discuss either the Distinguishable protocol — or update to Equatable — at some future point in time.

lorentey · October 1, 2025, 7:37pm

Must it? I really don't think this question matters to folks who need to reach for isIdentical(to:) -- if parts of the type's representation do not impact the actual result of any operation (which is clearly the case with String's opportunistic performance bits), then I think it would be quite reasonable to leave the matter of comparing them to the type author's judgment. I don't think it's even necessary to document it.

The existence and semantics of these bits are entirely private implementation details of the String type; if I recall correctly, they were never proposed, or even ever referenced from any Swift Evolution proposal. We don't even rely on them that much yet -- they're mostly an engineering idea that's been stuck on the slow burner; I somewhat expect them to become more prominent at some point, but that can happen anytime between next month and never. Do we really need to force them into the spotlight now like this?

There is a clear need to have a lightning-fast way to reliably figure out whether two String values are identical, in the sense of not being able to distinguish them from copies without breaking encapsulation. This is basically a substitutability test, intended to address a somewhat niche but real need.
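As an illustration of the substitutability test, here is a sketch on a hypothetical copy-on-write type (`IntBag` and `BagStorage` are invented for this example; this is not how `String` is implemented). Two values that share the same storage object cannot be distinguished through any public operation, so comparing the storage reference answers the question in O(1); a `false` result says nothing about equality.

```swift
/// Hypothetical copy-on-write container used only to illustrate the idea.
final class BagStorage {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

struct IntBag: Equatable {
    private var storage: BagStorage

    init(_ elements: [Int]) { storage = BagStorage(elements) }

    var elements: [Int] { storage.elements }

    mutating func append(_ element: Int) {
        // Copy-on-write: clone the storage if any other copy still shares it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = BagStorage(storage.elements)
        }
        storage.elements.append(element)
    }

    /// O(1) substitutability test: values that share storage cannot be told
    /// apart through the public API. `false` does not imply `!=`.
    func isIdentical(to other: IntBag) -> Bool {
        storage === other.storage
    }

    static func == (lhs: IntBag, rhs: IntBag) -> Bool {
        // Use the cheap check as a fast path before the O(n) comparison.
        lhs.isIdentical(to: rhs) || lhs.elements == rhs.elements
    }
}
```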

It's true, these isIdentical(to:) operations are related to deep semantic questions on the nature of value semantics, and how it relates to our standard copy-on-write collection types, which are leaky abstractions. The specification of isIdentical(to:) I'm suggesting is a pragmatic one that avoids having to formally settle that messy subject -- it approaches substitutability from an operational angle, based on the public API surface.

This operational approach leaves some wiggle room for engineers to play within the confines of the specification, as usual. This is a feature, because it lets us solve practical problems -- like avoiding a hypothetical bug where one copy of the same string may have incidentally gained opportunistic performance bits that a (semantically) identical copy has not. (For example, we may find a magical way to locally memoize the results of isNFC/isASCII tests as we perform them. One trivial way of doing this would be to add an explicit operation for "greasing" a string instance (like a dictionary key) by analyzing its contents up front and setting NFC/ASCII/etc bits as appropriate. As a practical matter, we'd likely expect the result to still compare identical to the original instance -- as sometimes performance work necessarily involves pragmatic compromises like that.)
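A rough sketch of the "greasing" idea, assuming a hypothetical string-like type whose only performance bit is a private ASCII hint. Because no public operation's result depends on the hint, the operational specification lets `isIdentical(to:)` ignore it, and a greased value still compares identical to an ungreased copy. All names (`Text`, `TextStorage`, `grease()`) are illustrative; this is not the actual `String` representation.

```swift
/// Hypothetical string-like storage.
final class TextStorage {
    let bytes: [UInt8]
    init(_ bytes: [UInt8]) { self.bytes = bytes }
}

struct Text {
    private var storage: TextStorage
    /// Private hint: when `true`, the contents are known to be ASCII.
    /// It never changes the result of any public operation, only its speed.
    private var knownASCII = false

    init(_ string: String) { storage = TextStorage(Array(string.utf8)) }

    /// Same answer with or without the hint; the hint only skips the scan.
    var isASCII: Bool {
        knownASCII || storage.bytes.allSatisfy { $0 < 0x80 }
    }

    /// Hypothetical "greasing" operation: analyze the contents up front.
    mutating func grease() {
        knownASCII = storage.bytes.allSatisfy { $0 < 0x80 }
    }

    /// Operational identity: the hint is ignored because no public result
    /// depends on it, so a greased copy still counts as identical.
    func isIdentical(to other: Text) -> Bool {
        storage === other.storage
    }
}
```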

John's representational approach to defining identity is also workable; but (AFAIK) we do not have a way to mechanically (and quickly) compare the representations of two instances of a copyable type in a reliable way that skips over unspecified padding bytes. So we'd still need to have concrete entry points that type authors need to implement on individual types -- so that still leaves wiggle room for them to customize the implementation. I think the benefit of an operational specification is that it does not force us into breaking the spec when we decide that some bits should not be compared.
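To illustrate the padding problem, here is a small sketch with a hypothetical two-field struct. The layout numbers reflect how current Swift compilers lay out this struct on 64-bit platforms; they are not a documented guarantee. The point is that a raw byte comparison would also inspect the unspecified padding bytes, so a reliable representational check ends up as per-type, field-by-field code anyway.

```swift
/// Hypothetical two-field value type with internal padding.
struct Tagged {
    var tag: UInt8       // 1 byte
    var payload: UInt64  // 8 bytes, 8-byte aligned on 64-bit platforms
}

// With current 64-bit layouts, `payload` sits at offset 8, so bytes 1..<8
// are padding with unspecified contents. A memcmp-style comparison of two
// `Tagged` values could report a difference that no field reflects.
let taggedSize = MemoryLayout<Tagged>.size
let payloadOffset = MemoryLayout<Tagged>.offset(of: \Tagged.payload)
let paddingBytes = taggedSize - MemoryLayout<UInt8>.size - MemoryLayout<UInt64>.size

/// What a per-type entry point has to do instead: compare field by field,
/// which skips the padding and lets the type author choose which fields count.
extension Tagged {
    func hasSameFields(as other: Tagged) -> Bool {
        tag == other.tag && payload == other.payload
    }
}
```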

Importantly, I do not believe it would be worth the effort or cost to introduce a quick, universal, mechanical test for representational identity: the need for it does not come up all that often, and resources would be better spent on things that actually matter in practice.

xwu · October 1, 2025, 7:59pm

Yes, in my view, it is blocking because the question influences what semantic guarantees are made (or even possible to make) and, in turn, the naming discussion. As @lorentey points out, it also touches on philosophical questions as to notions of equality for value types. Finally, it engages policy questions because deployment of an inlinable implementation that does not use representation equality would constrain future evolution of the concrete type. We here assume responsibility for questions about future APIs: this is not some afterthought delegated to "our library maintainers."

Perhaps we are thinking about different bits. Whether a string is known ASCII or known NFC is already exposed as public API (e.g.: str.utf8Span.isKnownASCII).

But in any case, if it is possible that other bits currently entirely private might be exposed as public API any time sooner than never, then either the implementation of isIdentical(to:) must never be inlinable (it wouldn't be "lightning fast" then) or it must already consider these bits. I'm not sure I see any other way of squaring this while adhering to your formulation of what it means to be identical; what am I missing?

lorentey · October 1, 2025, 8:06pm

Great! I skipped over that bit. I believe this entirely settles the matter then -- those bits will need to be compared.

(That's probably a minor regret, but it's really not that important.)

Hm, in fact I do expect String.isIdentical(to:) to ship as a strictly non-inlinable entry point. Dynamic dispatch is cheap.

String and its views aren't generic types, and so it is very much desirable to keep implementations like this opaque. Allowing the implementation to get inlined would not give us any real benefit! The String representation includes reserved bits that we may well decide to start using in the future; we do not know what their semantics might be and we should not put arbitrary constraints on that.

xwu · October 1, 2025, 8:17pm

Right, but the larger point I'd like to make is that whatever specific bits we've exposed now are ultimately immaterial. Rather, my claim is that your formulation where identicality means "there must be no way to distinguish between one instance and a copy of the other through their public API surface" runs up against Hyrum's law:

With enough users and enough time, there probably isn't anything we can call "entirely private" behavior—not unless we intend to actively exercise that privilege by haphazardly and without warning changing the undocumented behavior (for example, on a per execution basis as was done for seeding Hasher). For example, I'll bet that there's somebody out there observing what you think are entirely private bits of String in a load-bearing way. (And with enough Evolution participants and enough time, there probably isn't any internal state that won't become observable through some proposed public API...)

Oh! Now that is interesting and a major detail in my view. And another way in which this would contrast with Span.isIdentical(to:)—I'd expect we'd agree that it'd be a nonstarter to make that API opaque.

Do you think Array and its ilk should also have opaque implementations? And should then the design for these APIs boil down to, essentially, either representational equality or opaque?
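For a type whose entire representation is its public API, representational equality is trivial to state. A minimal sketch for `UnsafeBufferPointer`, using only its public `baseAddress` and `count` properties; the method name `hasSameRepresentation(as:)` is hypothetical and chosen to avoid clashing with any standard library method.

```swift
extension UnsafeBufferPointer {
    /// Hypothetical helper (not a standard library API): two buffer pointers
    /// are representationally equal when they have the same base address
    /// and the same count. This is O(1) and never reads the elements.
    func hasSameRepresentation(as other: UnsafeBufferPointer<Element>) -> Bool {
        baseAddress == other.baseAddress && count == other.count
    }
}
```

Under this definition, a rebased prefix of the same allocation is not identical to the whole buffer, since its count differs.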

vanvoorden · October 1, 2025, 8:30pm

The current "stub" implementation diff uses @_aEIC (@_alwaysEmitIntoClient), based on this feedback from @Alejandro:

Because @backDeployed commits this as the stdlib's ABI vs. @_aEIC which does not. If we find we need to replace this in the future with some more generalized thing or such, we pay the price of having to maintain this forever instead of just being able to update the definition. @_aEIC is the best attribute in my opinion because it is the "pay for what you use" attribute both for the stdlib and the client. The stdlib doesn't have to take the code size hit (unless it started using it in its own opaque implementation) or the ABI hit, and clients don't pay for anything unless they use it themselves or use something that uses it.
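The two options under discussion can be sketched side by side on a hypothetical `Document` type. `@_alwaysEmitIntoClient` is an underscored attribute, so treat this as an illustration of the trade-off rather than a recommended pattern outside the standard library; `@backDeployed` is left out because it also requires platform availability annotations. The method names are invented to keep the two variants apart.

```swift
/// Hypothetical storage class; `@usableFromInline` lets inlinable or
/// always-emit-into-client code in this module refer to it.
@usableFromInline
internal final class DocumentStorage {
    init() {}
}

public struct Document {
    @usableFromInline
    internal var _storage: DocumentStorage

    public init() { _storage = DocumentStorage() }

    /// Option A: opaque entry point. In a library built with library
    /// evolution, clients call this through the library's exported symbol,
    /// so the body can change later (say, to compare newly used bits).
    public func isIdenticalOpaque(to other: Document) -> Bool {
        _storage === other._storage
    }

    /// Option B: the body is copied into each client that uses it, so the
    /// library exports no new symbol, but already-compiled clients keep
    /// whatever logic they were built with.
    @_alwaysEmitIntoClient
    public func isIdenticalEmitted(to other: Document) -> Bool {
        _storage === other._storage
    }
}
```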

lorentey · October 1, 2025, 8:51pm

Hyrum's law can be used to defeat any change to any part of any public API surface. The proof of the pudding is whether a particular change actually breaks any actual code, and how much. This is a call engineers need to routinely make as they work on real life performance and correctness issues; validating these calls is a routine part of the process of shipping updates. This is the real world; we do not deal with absolutes.

If someone deliberately breaks encapsulation, then all bets are off. The thing is, people really do not tend to do that unless they are forced to. One way they can find themselves being forced into that corner is by types not providing a full set of necessary operations, such as the isIdentical(to:) methods we're supposedly trying to discuss here.