[Pitch] `Distinguishable` protocol for Quick Comparisons

Oh we had exhaustive arguments on every single aspect. The post I wrote already summarized what I remember discussing.

  1. Checking that two Span instances are addressing the same memory is a basic need that is not readily implementable outside the stdlib (unlike with UBP). (Checking whether one span is contained in another is a similarly important need, and that opened even more discussions.)
  2. Adding an === overload would be the obvious move, but it seemed unlikely it'd be a good idea to start defining ad hoc operator overloads in Swift (at least not on the level of the stdlib).
  3. Defining == to compare referential identities would be incredibly confusing/misleading, as people have come to expect that a container type (like Span wants to be) would implement its equality test by comparing contents.
  4. For a number of years we've already been using the isIdentical(to:) name for exactly this purpose, including in stdlib internals and in public API inside packages like swift-collections. It seems like as straightforward/boring/obvious a name as they come -- I don't remember anyone misunderstanding it to mean anything other than what it is.
  5. The one notable drawback (of both isIdentical(to:) and ===) is that it is quite verbose to test if two optionals contain identical spans, unless we also add extra helpers on Optional itself. Luckily, Optional did not support nonescapable wrapped types at the time Span was proposed; it does now, but this still seems a minor wrinkle -- we have some far more fundamental usability issues with nonescapable/noncopyable optionals.

These are just the concerns I remember discussing with Guillaume; while I've been quite shamelessly exploiting that we're working within shouting distance, he was also receiving useful feedback from others, including some folks much higher up in the pecking order than me.

Perhaps I've missed some earlier posts on this topic, but this is the first time I see mention of a mustBeEqual function; for what it's worth, I have a difficult time figuring out what it would do, based on just its name. To me, it reads like it states a requirement, which gives me the impression it might be for doing nonstandard precondition checks.

For what it's worth, I'm fully on board with the first half of this! (I'm willing to help push for it. Caveat: it isn't clear if that will help or hinder the desired outcome.)

I am also willing to work on shipping any missing isIdentical(to:) methods on types defined by swift-collections, provided the stdlib ends up adding the prerequisite dependencies. (E.g., OrderedSet.isIdentical would currently need to forward to Array.isIdentical.)

I'm not fully ready to sign up swift-collections to ship a protocol like Distinguishable though -- I need more convincing that (1) it would fit within the mandate of that package, and (2) that it would actually carry its weight.

I really think protocols are best started as implementation details in code that needs them. swift-collection itself has already gathered a small selection of "marker-like" protocols (_SortedCollection and _UniqueCollection) that help it implement generic fast paths, similar to the niche that Distinguishable might want to fill for equality checks. In principle, introducing protocols that express static guarantees about the shapes of collection types seems like a good fit for that package; but so far it didn't seem necessary to expose these helpers, and their particulars in fact make it safer for them to remain internal indefinitely -- making them public just on the off chance someone might find a use for them tends not to work well.

(Note: This line of objection would quickly melt away if some high-profile ABI stable codebase like SwiftUI had a good reason to want fast generic identity tests -- but in that case, I think extending Equatable is the far better option -- see below.)

Returning the abstract "identity" of a container as a value in its own right exposes information about implementation details that do not need to be exposed, and are not worth exposing.

I am fully on board with the proposition that people sometimes do have a legitimate need for a way to quickly test if two containers are entirely identical without going through the a full equality test. (I regularly need that myself.) This is merely asking for a narrow performance hook -- an entirely reasonable request, with very few opportunities for abuse or unforeseen consequences. We can easily satisfy this need by simply adding a single member function. (If it turns out we also need to allow doing this shortcuts in contexts that are generic over Equatable, then the most direct way to support that is to bolt a new requirement directly onto Equatable.)

There is no need to go overboard and invent the notion of an abstract identity, decoupled from the object it identifies. Abusing the Identifiable protocol would be even worse -- the purpose of that protocol is to allow reasonably efficient identifier-based lookups (such as while calculating the difference between two snapshots of a model state), by formalizing the idea of hashable user-level identifiers (such as UUIDs, serial numbers, or database keys), to allow their use as keys in Dictionary or Set. I think it would be a Bad Idea to encourage people to do that sort of thing with the "identity" of an Array value.

Identifiable allows (and encourages) conformer types to choose whatever hashable Identifier type they prefer. In general this makes it tricky to optimize code that uses it, by forcing the var identity getters and the == call to separately go through non-inlinable/non-optimizable dynamic dispatch through some witness tables. This is actively harming the "quick shortcut to ==" use case that the pitch is set out to solve. (It trades one opaque isIdentical(to:) function call that returns a simple Bool for three separate calls that return and take some opaque identifier type. That is not a good trade to make if we're trying to speed things up.)

Array is expressly designed not to expose an identity for itself -- it prefers to pretend it is a simple value type. By only adding a narrow isIdentical(to:) function, we'd be chipping away at this illusion, but we wouldn't completely shattering it.

Exposing a fully formed identity raises too many questions for comfort:

  • Will people expect that identifier to change whenever the container is mutated? What if some container types did that, and people started assuming that it is a general requirement?
  • Will people expect that the identifier only changes when the container is mutated in a way that makes it reallocate its storage? (Again, what if some containers did that, and people start assuming that is a general requirement?)
  • Will people expect that identifiers never get reused, even if a container gets destroyed? (That has been a constant issue with ObjectIdentifier, despite the docs.)

To properly design a container identifier, we'd need to think about such questions, and some poor soul will likely have to write documentation on the recommended design patterns for such types. I believe this would be completely unnecessary work. We don't need to go overcomplicate this like that -- the actual problem doesn't need it.

Assuming there is a legitimate, real need to run isIdentical(to:) in generic contexts, then I do suspect that the most efficient and most practical way to support that is to add opt-in identity tests directly as a new requirement to Equatable. (This opinion is not strong, because I'm not fully convinced that the premise is true yet.)

I feel that proposed solution of defining algorithms that explicitly require Distinguishable conformances unnecessarily pushes an incidental implementation shortcut (whether the algorithm wants to implement a fast path for identical objects) into the public API surface.

The approach being pitched is this:

public protocol Distinguishable {
   func isIdentical(to other: Self) -> Bool 
}

func f3<S>(sequence: S) async throws 
where S: AsyncSequence, S.Element: Distinguishable 
{
  var oldElement: S.Element?
  for try await element in sequence {
    if oldElement.isIdentical(to: element) { continue } // Quick shortcut
    oldElement = element
    doLinearOperation(with: element)
  }
}

Note that Distinguishable is part of the algorithm's requirements; we cannot call this function on anything that doesn't conform to this niche protocol. To support calling it on classic Equatable elements, we'd need to add an overload that falls back to that, and a third overload if we also want to support types that conform to both. I don't think there is a reasonable way to merge the Equatable and Distinguishable variants into a single function; the only way I can see to do that is by doing runtime as? downcasts, and doing that would generally be costly enough not to be worth the effort of implementing the shortcut in the first place.

As a library author, I would not love if I was forced to add a bunch of overloads to my public API only to add a trivial fast path over ==. And as an author of copy-on-write Equatable types, I already routinely implement == to start with a trivial identity test; pasting that code into a new method does not require much effort from me. As a user of Swift libraries, I prefer not to have to wade through a myriad overloads of the same basic function to find the variant I'm looking for.

It would be far less mentally taxing on me (in all three hats) if we could simply go with a new requirement on Equatable:

public protocol Equatable {
   static func ==(left: Self, right: Self) -> Bool
    func isKnownIdentical(to other: Self) -> Bool
}
extension Equatable {
   func isKnownIdentical(to other: Self) -> Bool { false }
}

func f3<S>(sequence: S) async throws
where S: AsyncSequence, S.Element: Equatable
{
  var oldElement: S.Element?
  for try await element in sequence {
    if oldElement.isKnownIdentical(to: element) { continue }
    oldElement = element
    doLinearOperation(with: element)
  }
}

There are two drawbacks I can see: (1) adding a new requirement to a commonly conformed protocol Equatable would ever so slightly increase the overall code size of Swift binaries with Equatable conformances (i.e., most binaries), and (probably much more importantly) (2) the new requirement would get added to the member namespace of every type that conforms to Equatable, including ones like Bool where its semantics don't really make sense.

The second issue may be somewhat mitigated by taking inspiration from Jordan, and making the new requirement return a tristate bool:

public protocol Equatable {
   static func ==(left: Self, right: Self) -> Bool

    // Returns if `self` can be quickly determined to be identical to `other`.
    // 
    // - A `nil` result indicates that the type does not implement a fast test for 
    //   this condition, and that it only provides the full `==` implementation.
    // - A `true` result indicates that the two values are definitely identical 
    //   (for example, they might share their hidden reference to the same 
    //   storage representation). By reflexivity, `==` is guaranteed to return 
    //   `true` in this case.
    // - A `false` result indicates that the two values aren't identical. Their
    //   contents may or may not still compare equal in this case.
    //
    // Complexity: O(1). 
    func isKnownIdentical(to other: Self) -> Bool?
}
extension Equatable {
   func isKnownIdentical(to other: Self) -> Bool? { nil }
}

Note how this differs from Jordan's design in the meaning of false. I intended this modification to resolve some fundamental objections to the tristate variant, but I haven't fully thought through what complications it introduces in their place. I think it's a good idea for a member requirement solve just one problem -- the idea of quickly testing "definitely not equal" is a separate, far more controversial/ambiguous task, and it should not be forced to get tackled by the same member.

I think having a distinct nil value as default does help by adding a little more sense to having this member on flat Equatable types like Bool or Int. The name isKnownIdentical does not quite match the behavior in this case, but it's pretty close. (isIdenticalIfAvailable would be quite bad, I think.)

The Standard Library includes ample precedent for adding optional protocol requirements in this nil-returning way. The result is usually not the most user friendly API, but it fits an established pattern, and it helps that (unlike withContiguousStorageIfAvailable and friends) this isn't a higher-order function:

func f3<S>(sequence: S) async throws
where S: AsyncSequence, S.Element: Equatable
{
  var oldElement: S.Element?
  for try await element in sequence {
    // Is this readable enough? Is it easy to figure out how to write it?
    if oldElement.isKnownIdentical(to: element) ?? false { continue }
    oldElement = element
    doLinearOperation(with: element)
  }
}
7 Likes