[Pitch #2] Add `isKnownIdentical` Method for Quick Comparisons to `Equatable`

It's noticeable that every code example calling isKnownIdentical given here is followed by ?? false (assuming I didn't miss one that does otherwise). Are there any cases where the caller would gain anything from the distinction between false and nil? Is it just a pointless leaky abstraction?

This analogy came to mind: asking "do we know already the thing is 1 meter long?" and being told "I have no way to know without measuring" (analogous to nil) compared to "no, I measured it approximately before but not precisely enough to say" (analogous to false). To both answers, I'd say "dude, just yes or no would have been fine :roll_eyes:".

Why isn't yes or no fine for the question isKnownIdentical?

Unless there's some super-subtle reason a caller would care to distinguish false and nil results, then how about just:

public protocol Equatable {
  func isKnownIdentical(to other: Self) -> Bool
}
extension Equatable {
  func isKnownIdentical(to other: Self) -> Bool { false }
}

I think it's a totally fair point that putting this optional value on all Equatable types could be considered a form of "static ambiguity" if the alternative was to enforce it through an independent protocol like Distinguishable. But that ambiguity might even have some desirable side effects down the road…

Some observations here:

[Insert disclaimer here about avoiding strong opinions about how specific concrete types should choose to implement isKnownIdentical.]

Conforming a container to a potential Distinguishable protocol would probably not require that the elements of that container conform to Distinguishable. As mentioned in my previous comment, a check for isKnownIdentical needs to return in constant time. A container of N elements should not perform N different operations before returning from isKnownIdentical. The isKnownIdentical check should probably be implemented as a "shallow" operation on the root of the container itself… not a "deep" operation that needs information from every node in its tree.
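To make the "shallow" idea concrete, here is a minimal sketch of a copy-on-write container whose check compares only the storage reference. `CowList`, `Storage`, and `Opaque` are hypothetical names invented for illustration; nothing here is taken from the pitch or the standard library's implementation.

```swift
// Hypothetical copy-on-write container; all names are illustrative only.
struct CowList<Element> {
    // Reference-typed backing storage, shared between copies until one mutates.
    private final class Storage {
        var items: [Element]
        init(_ items: [Element]) { self.items = items }
    }

    private var storage: Storage

    init(_ items: [Element]) {
        storage = Storage(items)
    }

    var items: [Element] { storage.items }

    mutating func append(_ element: Element) {
        // Copy the storage first if another value still shares it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.items)
        }
        storage.items.append(element)
    }

    // Shallow, O(1) check: compares storage references and never visits the elements.
    // `true` implies the contents are the same; `false` only means "not known".
    func isKnownIdentical(to other: CowList) -> Bool? {
        storage === other.storage
    }
}

struct Opaque { var id: Int }   // deliberately not Equatable

let a = CowList([Opaque(id: 1), Opaque(id: 2)])
let b = a                                        // shares storage with `a`
var c = a
c.append(Opaque(id: 3))                          // triggers a copy; `c` gets its own storage
let d = CowList([Opaque(id: 1), Opaque(id: 2)])  // same contents, separate storage
```

Note that `d` holds the same contents as `a` but the check still does not report them as identical, and that the element type never needs to be Equatable for the check to work.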

Because we are talking about a new member either on Equatable directly or as a refinement on Equatable… we are indirectly implying that the elements of a container such as Array must be Equatable because Array is only Equatable if its elements are Equatable. But that's more an artifact of our decision to go through Equatable for an operation on identity equality. It's not because an operation to determine identity equality on an Array should directly in any way need to check the elements for value equality.

If Distinguishable was an independent protocol like it was in the previous pitch… a type like Array could conform to Distinguishable even if the elements in that Array were not Equatable.

If Distinguishable was an independent protocol that would then get us into the situation documented in the pitch we wanted to avoid:

func f2<S>(sequence: S) async throws
where S: AsyncSequence, S.Element: Equatable {
  var oldElement: S.Element?
  for try await element in sequence {
    if oldElement == element { continue }
    oldElement = element
    doLinearOperation(with: element)
  }
}

func f4<S>(sequence: S) async throws
where S: AsyncSequence, S.Element: Distinguishable {
  var oldElement: S.Element?
  for try await element in sequence {
    if oldElement?.isKnownIdentical(to: element) ?? false { continue }
    oldElement = element
    doLinearOperation(with: element)
  }
}

func f5<S>(sequence: S) async throws
where S: AsyncSequence, S.Element: Distinguishable, S.Element: Equatable {
  var oldElement: S.Element?
  for try await element in sequence {
    if oldElement?.isKnownIdentical(to: element) ?? false { continue }
    oldElement = element
    doLinearOperation(with: element)
  }
}

What would this look like if Distinguishable refined Equatable? What would a generic algorithm look like if we want to also operate on types that are Equatable and not Distinguishable? How many overloads would product engineers need for a generic algorithm to handle both those cases: assuming we do not want to leverage something like a dynamic runtime resolution.
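For comparison, here is a rough sketch of the single-overload shape when the requirement lives on the same protocol as equality. Because we cannot add a requirement to Equatable from our own code, `QuickComparable` is a hypothetical stand-in for "Equatable plus the pitched requirement"; `Plain`, `Box`, `Shared`, and `countProcessed` are also invented for illustration, and a synchronous Sequence is used instead of AsyncSequence to keep it short.

```swift
// Hypothetical stand-in for Equatable carrying the pitched requirement.
protocol QuickComparable: Equatable {
    func isKnownIdentical(to other: Self) -> Bool?
}

extension QuickComparable {
    // Default: no fast path is available.
    func isKnownIdentical(to other: Self) -> Bool? { nil }
}

// A type that simply accepts the default.
struct Plain: QuickComparable {
    var value: Int
}

final class Box {
    let values: [Int]
    init(_ values: [Int]) { self.values = values }
}

// A type that overrides the default with a cheap reference check.
struct Shared: QuickComparable {
    let box: Box
    static func == (lhs: Shared, rhs: Shared) -> Bool { lhs.box.values == rhs.box.values }
    func isKnownIdentical(to other: Shared) -> Bool? { box === other.box }
}

// One generic algorithm serves both kinds of type; no second overload is needed.
func countProcessed<S: Sequence>(_ sequence: S) -> Int where S.Element: QuickComparable {
    var oldElement: S.Element?
    var processed = 0
    for element in sequence {
        if oldElement?.isKnownIdentical(to: element) ?? false { continue }
        oldElement = element
        processed += 1   // stands in for doLinearOperation(with:)
    }
    return processed
}

let box = Box([1, 2])
let sharedCount = countProcessed([Shared(box: box), Shared(box: box), Shared(box: Box([1, 2]))])
let plainCount = countProcessed([Plain(value: 1), Plain(value: 1)])
```

With the default in place, types like `Plain` never skip work, while types like `Shared` skip the repeated element, all through the same function.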

There's another side effect to consider WRT creating a new protocol in standard library: ABI and source stability. My understanding is if we add a new protocol in standard library we can immediately adopt that protocol on standard library types… but if at some point in the future we want to then adopt that protocol on another existing type from standard library that is a breaking change. I'm not completely sure how that rule would be affected by Distinguishable being a refinement protocol… but if it would lead to the same results that can also limit our engineers in the standard library. If we ship this Distinguishable protocol in 6.3 and in 6.4 standard library engineers take some existing type and "turn it into a COW" those engineers AFAIK would be blocked on adopting Distinguishable on that type.

First of all… please correct me if I'm wrong about my understanding of ABI and source stability when adding new refinement protocols. I would also say that ABI and source stability on its own is not a very compelling reason not to ship a new protocol. But ABI and source stability could be one more data point against a new protocol together with reducing the amount of generic overloads that product engineers would need for their generic algorithms.

Would there be anything blocking a product engineer from adopting these semantics themselves with an extension on Equatable in their own repo?

//  EquatableUtils.swift

extension Equatable {
  func isKnownIdentical(to other: Self) -> Bool {
    self.isKnownIdentical(to: other) ?? false
  }
}

I wanted to add some prior art here: recently, as we were adding isIsolatingCurrentContext to Executor, which serves a similar purpose, we also opted to go with Bool? in order to be able to distinguish "we don't know" from "definitely no". It is a useful piece of information to have, and I do think it fits the API design proposed here as well.

There's also this public-but-underscored example on Sequence:

Here is the default implementation that returns nil:

And here's where we use that nil to perform some conditional logic:

Looks like a desirable feature to me: I'd like to check if two arrays are "known to be identical" regardless of whether the array elements are Equatable / "known to be identical" or not.

I don't like that I'd need to implement something like this manually... It would be quite a burden to use. memcmp works out of the box (aside from the padding bytes issue), and as for padding: Swift knows about padding, so something like "memcmp_padding_aware" is not totally unimaginable, and it would work out of the box without a manual implementation. I wish this feature were as simple as comparing reference types with ===, which works out of the box.

The flexibility of an independent Distinguishable protocol was discussed in our pitch. We prefer Equatable to reduce the amount of generic overload specializations that product engineers would need to declare if Distinguishable was independent.

That's not to say Distinguishable could never ship as an independent protocol… but our POV for right now in this pitch is we propose an API on Equatable.

It's not that I think memcmp is a bad default. I can even agree that memcmp might be a good default for some types.

As of today… nothing I have read or learned across three different discussion-pitch threads has persuaded me that memcmp is the best default.

Remember our two most important goals for isKnownIdentical:

  • If isKnownIdentical returns true these instances must be equal by value.
  • The isKnownIdentical must return in constant time and should perform its operation meaningfully faster than a check for value equality.

Returning nil as a default satisfies both these goals. Returning nil is never a "bug" in the sense that we would ever give a false positive (where we return true for identity equality and return false for value equality). Returning nil also returns in constant time and will always perform meaningfully faster than a check for value equality (unless a library maintainer for some reason decided their == operator is hardcoded to always return true or false).

It's possible that we could "prove" that a default operation on memcmp would never lead to a false positive result for isKnownIdentical… but that would still leave us dealing with goal number two: a default operation on memcmp would not return meaningfully faster than a check for value equality for many types.

Another POV is: do we want memcmp to be an "opt out" default… or do we want memcmp to be an "opt in" override? The decision to implement isKnownIdentical on memcmp belongs with the library maintainer. That is the engineer that has the right context to make that decision. My opinion has not changed on this: the default value built in the infra to accompany the requirement on Equatable should return nil.
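As a rough illustration of the "opt in" direction, a library maintainer who decides memcmp is right for their type could write something like the sketch below. `Span` is a hypothetical type invented for this example; it has two fields of the same size so that its layout contains no padding bytes, which is an assumption this approach depends on.

```swift
import Foundation  // provides memcmp

// Hypothetical type: two same-sized integer fields, so the layout has no padding bytes.
struct Span: Equatable {
    var lower: Int64
    var upper: Int64

    // Opt-in override chosen by the type's author: a bytewise comparison of the two values.
    func isKnownIdentical(to other: Span) -> Bool? {
        let sameBytes: Bool = withUnsafeBytes(of: self) { lhs in
            withUnsafeBytes(of: other) { rhs in
                // Both buffers have the same non-zero size, so baseAddress is non-nil.
                memcmp(lhs.baseAddress!, rhs.baseAddress!, lhs.count) == 0
            }
        }
        return sameBytes
    }
}

let first = Span(lower: 1, upper: 2)
let second = Span(lower: 1, upper: 2)
let third = Span(lower: 1, upper: 3)
```

For a type this small the bytewise check is unlikely to beat ==, which is the goal-two concern: whether memcmp pays off depends on the type, so the decision sits with whoever owns it.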

Just to kind of brainstorm around that: potential future directions could ease the friction for a library maintainer or product engineer that wants memcmp… but doesn't want to stress out about building a "boilerplatey" implementation by hand. A macro could potentially codegen that from one declaration added on the type. That macro wouldn't even need to ship in standard library: it could incubate as a community package.

Firstly, isIsolatingCurrentContext: @ktoso says the nil result is 'in order to be able to determine the "we don't know" vs "definitely no"'. For isKnownIdentical however, the two suggested non-true results aren't like this, but rather "we don't know" and again "we don't know".

Secondly, _customContainsEquatableElement: when given this nil result the caller needs to do more work to determine a correct value. But for isKnownIdentical, the caller needs to do more, identical work when receiving either nil or false.

I'm arguing that isKnownIdentical as proposed doesn't have a semantically tripartite result of true, false, and nil, but rather a semantic result of true and not-true, where the latter is expressed by either false or nil.
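A small sketch of the caller's side of this argument: `isEqual` is a hypothetical helper, and `known` stands for whatever the pitched isKnownIdentical(to:) call returned. It is only meant to show the shape of the branches, not a recommended API.

```swift
// Hypothetical helper: `known` is the result of the pitched isKnownIdentical(to:) call.
func isEqual<T: Equatable>(_ lhs: T, _ rhs: T, known: Bool?) -> Bool {
    switch known {
    case .some(true):
        return true          // identity implies value equality, so == can be skipped
    case .some(false):
        return lhs == rhs    // not known to be identical: fall back to ==
    case .none:
        return lhs == rhs    // no fast path available: the same fallback
    }
}
```

The last two branches are the same code, which is the observation being made here: from the caller's side, false and nil both mean "go do the full comparison".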

Well, not quite arguing it; I want to explore that possibility :^) I'm asking if there's any conceivable case where the caller would care about false vs. nil, because I can't think of any.

Yes, I'm thinking of the caller's point of view. The Equatable type knows the difference between those 2 not-true states, but we could imagine lots of functions where the callee has internal state it may feel like leaking to the caller even though it's not needed or ever helpful.

Somebody who gets what I'm talking about, could you come up with an example? The one I was going to include was the Character property var isWhitespace: Bool instead being a function returning .true, .falseBecauseNonWhitespaceASCII, .falseBecauseNonWhitespaceUnicode, .falseBecauseEmoji but that's not even a good example because the caller might indeed like to know both if whitespace and separately if ASCII or emoji making this conceivably useful, even though weird for a function named isWhitespace.

To callers of isKnownIdentical the nil vs false result is, in contrast, useless trivia AFAIK. Again, I'm asking to be proved wrong, to illuminate a reason the caller might find use in the proposed tripartite function result.

I think I understand, but I'm not 100% sure. Are you suggesting that if memcmp is frequently used to implement the isKnownIdentical check, we could just add a macro so that a type could just be decorated with something like @MemoryComparable to automatically generate an isKnownIdentical that just returns memcmp's result?
If that is what you are suggesting, what's the downside to just including that macro in the new feature rather than having it live in "a community package" instead?