Why does Hashable require Equatable?

I wouldn't characterise it as "backlash". Just disagreement.

As we frequently say around here, protocols are not just bags of syntax - they have semantic meaning. Hashable primarily exists to support user-defined types in hashed collections. It's the very first sentence of the documentation:

You can use any type that conforms to the Hashable protocol in a set or as a dictionary key.

Equatable is part of the essential semantics for such usage; that is why it is part of the protocol. It is not, and does not claim to be, the only way to produce digests of custom data. For instance, as I mentioned previously, it does not support custom hashing algorithms, so it can't be used for a high-quality Bloom filter in the first place. You would need your own protocol for that, and you are more than welcome to create one.
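For illustration, such a protocol might look something like the sketch below (all names hypothetical). A Bloom filter needs several independent hash functions, so the element has to feed its bytes to a caller-chosen hasher rather than to the single stdlib Hasher algorithm:

// Hypothetical hashing protocol for Bloom-filter-style uses (a sketch):
// the data structure, not the element type, chooses the hash function(s).
protocol ByteHasher {
    mutating func combine(bytes: UnsafeRawBufferPointer)
    func finalize() -> UInt64
}

protocol MultiHashable {
    /// Feed the value's significant bytes to an arbitrary hasher.
    func feed<H: ByteHasher>(into hasher: inout H)
}

// A Bloom filter would then be generic over both:
// struct BloomFilter<Element: MultiHashable, H: ByteHasher> { ... }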

It is not uncommon for consumers of a protocol to need only part of what it offers. As an example, @taylorswift has just posted a thread wishing for a more limited version of RangeReplaceableCollection which would only support appending (AppendableCollection). Ultimately, protocols are the units of useful abstraction for generic code, where "useful" has to be judged in the context of the particular use-cases the authors intend to support.

(as it happens, RRC does have some design flaws in that regard; it is not a particularly useful unit of abstraction due to its inability to maintain positional information across mutations, but that's a separate topic).

It would not be practical to make every function its own protocol, and even if we did so, it would not necessarily allow for more powerful generic abstractions - more important than the function or type names are the semantics each protocol encapsulates. Hashable types support hashing using a single algorithm only, and require Equatable, as that is what is required to support Dictionary keys and Set elements. That is part of its semantics.

6 Likes

I sort of touched on this up-thread, but let me try to be a little clearer: I’m not at all trying to assert that it is impossible to write useful algorithms on a Hashable type which never invoke their corresponding Equatable.== witness. What I am arguing is that when you (properly) implement Hashable there is always some underlying equivalence relation you’re relying on that satisfies the semantics of Equatable, and so it makes sense for the language to enforce this by requiring that Hashable: Equatable.

In the case of your FancySet implementation, you’ve cleverly sidestepped the need to call == heterogeneously by using ===, but this usage is still relying on the underlying equivalence relation defined on ObjectIdentifier: you know that a === b iff ObjectIdentifier(a) == ObjectIdentifier(b), and that ObjectIdentifier satisfies the appropriate Equatable/Hashable semantics.

So the ‘issue’ you’re trying to solve by implementing == as fatalError() only arises because you don’t really want individual Node-conforming types to themselves be Hashable (or Equatable) at all. Rather, you only want to be able to derive a Hashable value from them based on reference identity (namely, ObjectIdentifier). In fact, if a Node-conforming type tried to implement hashValue as anything but your proposed ObjectIdentifier-based implementation, it would potentially cause your FancySet to behave very sub-optimally (because such an implementation might admit many more hash collisions than expected).

Instead, what I think you want here looks something like this, which also makes the Equatable/Hashable connection clear:

protocol NodeEntry: AnyObject {
  var next: Node? { get set }
}

struct Node: Hashable {
  var entry: NodeEntry

  // hash(into:) is the modern Hashable requirement; 'hashValue' is derived from it.
  func hash(into hasher: inout Hasher) {
    hasher.combine(ObjectIdentifier(entry))
  }
  static func == (lhs: Self, rhs: Self) -> Bool {
    lhs.entry === rhs.entry // or, equivalently, 'ObjectIdentifier(lhs.entry) == ObjectIdentifier(rhs.entry)'
  }
}
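A quick illustration of the behaviour this buys (ListEntry is a hypothetical conforming class):

final class ListEntry: NodeEntry {
    var next: Node?
}

let entry = ListEntry()
var visited: Set<Node> = []
visited.insert(Node(entry: entry))
visited.insert(Node(entry: entry)) // same identity: no-op
// visited.count == 1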

ETA:

I’m not trying to simply be reactionary against the idea that hashValue might occasionally be useful independent of == invocation. Instead, I’m attempting to tease out, in the examples you’ve provided, where the hidden equivalence relation you’re relying on is hiding, and how we can potentially improve the example by making the relation explicit via a ‘true’ conformance to Hashable. As a categorical matter, I think it’s particularly unhelpful if such examples end up having types’ Hashable conformances be highly coupled with their use in a particular data structure (as with FancySet).

When I read this I immediately thought of the single responsibility principle as a counterpoint, although I don’t feel expert enough on the topic to make any real assertions myself. What do you all think about the idea that protocols should in fact never be found to have a “useful half” (although it’s of course infeasible to achieve 100%)? Because then why wouldn’t we just find a proper name for that half and make it its own protocol that the other one inherits from?

Is that not what happened here? Hashable requires Equatable.

From the person who coined the term:

Another wording for the Single Responsibility Principle is:

Gather together the things that change for the same reasons. Separate those things that change for different reasons.

If you think about this you’ll realize that this is just another way to define cohesion and coupling. We want to increase the cohesion between things that change for the same reasons, and we want to decrease the coupling between those things that change for different reasons.

However, as you think about this principle, remember that the reasons for change are people. It is people who request changes. And you don’t want to confuse those people, or yourself, by mixing together the code that many different people care about for different reasons.

The Single Responsibility Principle

It relates to reasons for change, not to the idea that every type should be reduced to its minimal useful interface. Indeed, the first part of the principle advises developers to gather related things together.

Even if there are hypothetical users who will only ever need some subset of a protocol's functionality, that is not, in itself, a violation of the SRP or necessarily an indication that the protocol's scope is too broad. Every protocol author decides which kinds of generic algorithms or data-structures their protocol should facilitate, and they all need to draw a line somewhere.

If somebody needs the ability to calculate a hash value without Equatable (or to define an AppendableCollection, or whatever else), in order to support some particular data types with particular generic algorithms/data structures, they are free to do so -- of course, they/their users will need to manually add conformances to existing types, which is annoying, but conceptually not relevant.

Perhaps one day we'll be able to lift that limitation to some extent (e.g. to say that all RRCs can conform to AppendableCollection via a synthesised conformance if required), which would help with the boilerplate when you have compatible/partially-compatible abstractions, but for now the language is unable to do that.

3 Likes

Don't you think there's a bit of circular reasoning here? "It is this way because we need to draw a line somewhere and we decided to draw it here." The topic is questioning the location of this drawn line. E.g. the documentation could have said this instead:

You can use any type that conforms to both the Equatable and Hashable protocols in a set or as a dictionary key.

And the only practical inconvenience would be that we'd need to include both conformances in the majority of cases (or we'd just create a combo EquatableAndHashable protocol / type alias and use that instead).
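For illustration, that combo under the hypothetical split would be a one-line protocol composition (EquatableAndHashable is an invented name; today Hashable already implies Equatable, so the extra constraint is redundant but legal):

// Hypothetical spelling if Hashable did not refine Equatable:
typealias EquatableAndHashable = Equatable & Hashable

// Generic code would constrain against the composition:
func hasDuplicates<T: Equatable & Hashable>(_ values: [T]) -> Bool {
    Set(values).count != values.count
}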

OTOH, with the line drawn where it currently is, the Equatable Self requirement "puts all hashable types squarely in the homogeneous, statically dispatched world" (to use the WWDC parlance), whereas without the Equatable requirement, Hashable types would be more flexible.

Consider this example: a file on a magnetic tape. It is easy to calculate the file's hashValue in one pass (as the tape advances, we accumulate and combine bytes). Yet it would be a nightmare (from a hardware perspective) to compare two files on the same tape for equality, e.g. one near the beginning of the tape and one near its end: assuming a file doesn't fit into memory, there would be constant fast-forwarding and rewinding - read a chunk of one file, seek to the other, read a chunk, compare, repeat. The author of the corresponding value type would rightfully not want to provide a meaningful == operation for such a medium.
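(The one-pass accumulation itself is straightforward with the standard Hasher; a sketch, with error handling and availability checks elided:)

import Foundation

// One-pass hash of a file too large for memory: stream fixed-size chunks
// and combine them, never rewinding (a sketch).
func onePassHash(of url: URL) throws -> Int {
    let file = try FileHandle(forReadingFrom: url)
    defer { try? file.close() }
    var hasher = Hasher()
    while let chunk = try file.read(upToCount: 1 << 20), !chunk.isEmpty {
        chunk.withUnsafeBytes { hasher.combine(bytes: $0) }
    }
    return hasher.finalize()
}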

Please clarify what you mean here, as we can override var hashValue or func hash(into:) - in the latter case I can do something along these lines:

struct S: Hashable {
    var a, b, c: Int

    // A salted variant alongside the synthesized Hashable conformance.
    func hashWithSalt(_ salt: Int) -> Int {
        var hasher = Hasher() // Hasher.combine is mutating, so this must be 'var'
        hasher.combine(a)
        hasher.combine(b)
        hasher.combine(c)
        hasher.combine(salt)
        return hasher.finalize()
    }
}

The biggest issue with regard to a Bloom filter (which typically needs the same hashes across app launches and devices) would be the built-in hash randomisation, but I guess one can use a custom (albeit now deprecated) "hashValue" implementation to circumvent that.
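For illustration, the kind of stable, salt-parameterised function a Bloom filter wants is also easy to write outside Hasher entirely; FNV-1a here purely as an example, not a quality recommendation:

// Deterministic across launches and devices, unlike Hasher (a sketch).
// FNV-1a with a salt folded into the offset basis.
func stableHash(_ bytes: some Sequence<UInt8>, salt: UInt64) -> UInt64 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325 ^ salt // FNV offset basis
    for byte in bytes {
        hash ^= UInt64(byte)
        hash = hash &* 0x100_0000_01b3              // FNV prime
    }
    return hash
}

// k hashes for a Bloom filter, e.g.:
// let indices = (0..<k).map { stableHash(key.utf8, salt: UInt64($0)) }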

Additional flexibility isn't an inherent good in language design, particularly when it comes at the cost of semantic clarity. We could make Equatable itself more 'flexible' by having the signature of == be (Any, Any) -> Bool, or by requiring a func equals(_ other: Any) -> Bool instance method a la Java, but this would IMO make the purpose of Equatable less clear and potentially make implementations more boilerplate-y and error-prone, since users would have to handle the type juggling themselves. (Indeed, Swift's entire static type system provides better semantic clarity at the expense of restricting flexibility.)

Further, I haven't been convinced that Swift isn't already flexible enough to express the examples you've raised by slightly tweaking the approach (without resorting to a fake == implementation). And to the extent that there are (today) some fundamental limitations on what you can do with Equatable (e.g., heterogeneous comparison) I'm not convinced that relaxing the relationship between Equatable and Hashable is the 'right' answer. Hashable fundamentally relies on the definition of an equivalence relation, and the language is, IMO, better off for having made this semantic requirement very explicit in the type system.

If it's never a desirable operation to compare these files directly for equality, what advantages does a HashableButNotEquatable conformance give you here over working with the precomputed hashValue as an (Equatable, Hashable) Int directly?

1 Like

i disagree with this, we should never be encouraging people to retroactively conform out-of-module types to out-of-module protocols (with at least one requirement). that just opens the door for conflicting conformances, duplicate conformances, etc. and it is a documentation nightmare. because it really is hard to understand/remember that ArraySlice is an AppendableCollection, but only when FooModule is imported in the current file.

so as much as i wish the standard library had AppendableCollection or something like that, i don’t vend it as a library protocol because i think that would create more problems than it would solve.

1 Like

I feel like this discussion isn't really going anywhere useful.

There's a short and simple answer to the question as asked: almost every API in the standard library constrained by Hashable also requires Equatable, so if Hashable didn't refine Equatable, then every one of those APIs would be generic over Hashable & Equatable, and that would be unnecessarily verbose. Read Hashable as Hashable (and Equatable), because that's what it means, for pragmatic reasons.

If you want to write your own API constrained only to Hashable without Equatable, define a JustHashable protocol and use that. We don't have names for every possible point in the conformance hierarchy; we have AdditiveArithmetic but not Semigroup or Magma or ... We try to pick the most useful set for defining the API that the standard library provides, and to give them reasonable names. Developers are encouraged to define their own protocols when they have to express different requirements.
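Concretely, that protocol can be tiny, and existing Hashable types can adopt it with one line each (a sketch; JustHashable is the invented name from above):

// Hashing-only protocol; Equatable deliberately not required.
protocol JustHashable {
    func hash(into hasher: inout Hasher)
}

// Existing Hashable types already have the witness, so retroactive
// conformance is a one-liner per type:
extension Int: JustHashable {}
extension String: JustHashable {}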

14 Likes

not sure how much this ties into what was previously discussed in this thread, but i did just realize that Equatable can be a pretty challenging concept when it gets mixed with canonical equivalence. i am writing a library that vends an opaque Decimal128 type that looks like:

extension BSON
{
    /// An opaque IEEE 754-2008 decimal.
    @frozen public
    struct Decimal128:Hashable, Equatable, Sendable
    {
        /// The low 64 bits of this decimal value.
        public 
        var low:UInt64
        /// The high 64 bits of this decimal value.
        public 
        var high:UInt64
    }
}

and it needs to be Hashable and Equatable to get along with the rest of the library, but it really shouldn’t be, because it is using bitwise equivalence instead of “proper” decimal equivalence.

i’ve got a similar issue with an opaque Millisecond type, which is also Equatable but really shouldn’t be.

extension BSON
{
    /// A number of UTC milliseconds since the Unix epoch.
    ///
    /// This library does not have calendar-aware facilities. Therefore, UTC milliseconds
    /// are not ``Comparable``, at least when importing this module alone.
    ///
    /// Take caution when using this type’s ``Hashable`` and ``Equatable`` conformances.
    /// Two equivalent `Millisecond` values do not necessarily reference the same
    /// instant in time.
    @frozen public
    struct Millisecond:Hashable, Sendable
    {
        public
        let value:Int64
    }
}

i think the problem with Equatable is it presumes one universal notion of equality per type, but “equality” can mean a lot of different things depending on how you’re using it.

1 Like

This feels simply like a bug, like a rational type using bitwise equivalence instead of first reducing.

Types can have as many notions of equality as they want, but only one of them gets to be spelled == and participate in the Equatable conformance.
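One standard way to spell a second notion is a wrapper type, so each equivalence relation gets its own ==; a sketch using Double, where bitwise and semantic equality famously differ (e.g. -0.0 == 0.0):

// Sketch: the 'other' notion of equality gets its own type, so both can
// participate in Equatable/Hashable without fighting over '=='.
struct BitwiseDouble: Hashable {
    var value: Double
    // Bitwise notion: distinguishes -0.0 from 0.0, compares NaNs by payload.
    static func == (lhs: Self, rhs: Self) -> Bool {
        lhs.value.bitPattern == rhs.value.bitPattern
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(value.bitPattern)
    }
}

// Double's own == stays the IEEE notion; the wrapper carries the other one.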

3 Likes

from what i can tell, this is also what mongo-swift-driver does (although there could always be a custom ==(_:_:) witness in another file i wasn’t able to find through github search).

implementing the decimal128 standard is way out-of-scope for a BSON library.

1 Like

Equality is pretty easy to implement; in particular, it's easier than most other operations, including conversions to other formats--if you don't have equality, what can you do with the type?

1 Like

right, that’s why i went with bitwise equality, since that is better than no equatability at all.

if you don’t mind me mining your numerics knowledge a bit, how does one implement decimal128 equality? there is a dearth of existing libraries, and the wikipedia article is pretty dense.

If you're not doing any computational operations at all, this isn't absurd, since you're effectively treating it more like a string than a number; maybe you do want 1.0e2 and 100.0 to be different. But as soon as you start doing arithmetic, this doesn't make a lot of sense.

  1. look at the combination field (bits 122:126). If either is b11111, you have a NaN, and so they are unequal.
  2. if either is b11110, it's an infinity, so the other must match and have the same signbit (127) for them to be equal.
  3. extract the exponent and significand. The b11xxy form is always non-canonical for Decimal128 (so it represents zero), so you only need to worry about decoding bxxyyy with xx != 11: concatenate xx with the exponent continuation field (bits 110:121) and yyy with the significand continuation field (bits 0:109).
  4. If the significand is bigger than 0x1ED09BEAD87C0378D8E63FFFFFFFF (that is, 10^34 - 1), the encoding is non-canonical and the value is zero.
  5. If both are zero, they are equal.
  6. If the signs do not match, they are unequal.
  7. If the difference between the exponents is bigger than 33, they are unequal.
  8. Multiply the significand of the value with the larger exponent by 10^(exponent difference). If this overflows, they are unequal.
  9. If the significands are equal after rescaling, they are equal. Otherwise, unequal.
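To make those last steps concrete, here is a sketch of steps 5-9 only, assuming steps 1-4 have already classified and decoded the operands. DecodedDecimal and decimalEqual are invented helpers, UInt128 assumes a Swift 6 toolchain, and this is an untested illustration, not a vetted IEEE 754 implementation:

// Steps 5-9 from above, on already-decoded canonical fields (a sketch).
struct DecodedDecimal {
    var isNegative: Bool
    var exponent: Int        // unbiased
    var significand: UInt128 // canonical: 0 ... 10^34 - 1
}

func decimalEqual(_ a: DecodedDecimal, _ b: DecodedDecimal) -> Bool {
    // Step 5: zeros equal each other, regardless of sign or exponent.
    if a.significand == 0 || b.significand == 0 {
        return a.significand == 0 && b.significand == 0
    }
    // Step 6: signs must match.
    guard a.isNegative == b.isNegative else { return false }
    // Step 7: significands are < 10^34, so a gap over 33 can never close.
    let (lo, hi) = a.exponent <= b.exponent ? (a, b) : (b, a)
    let delta = hi.exponent - lo.exponent
    guard delta <= 33 else { return false }
    // Step 8: rescale the larger-exponent significand; overflow means unequal.
    var scaled = hi.significand
    for _ in 0..<delta {
        let (product, overflow) = scaled.multipliedReportingOverflow(by: 10)
        if overflow { return false }
        scaled = product
    }
    // Step 9: compare the rescaled significands.
    return scaled == lo.significand
}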
7 Likes

thank you so much!!

is this a 128-bit multiplication? (since the significand has 113 bits?) how do you do a 128-bit multiply in swift?
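(For what it's worth, one way to do that without a 128-bit integer type is UInt64.multipliedFullWidth(by:), which returns the full double-width product as (high, low) halves; on Swift 6 toolchains, the stdlib's UInt128 with multipliedReportingOverflow(by:) does this directly. An untested sketch:)

// Multiply a two-limb (high, low) 128-bit value by a 64-bit factor,
// reporting overflow past 128 bits (a sketch).
func multiply128(high: UInt64, low: UInt64, by factor: UInt64)
    -> (high: UInt64, low: UInt64, overflow: Bool)
{
    let (h0, l0) = low.multipliedFullWidth(by: factor)  // low limb product
    let (h1, l1) = high.multipliedFullWidth(by: factor) // high limb product
    let (mid, carry) = l1.addingReportingOverflow(h0)   // merge the middles
    return (mid, l0, h1 != 0 || carry)                  // any bits >= 128 set?
}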

That is a limitation of the current implementation of the language runtime, not a conceptual problem of how we model the semantic characteristics of data.

For something like RRC, you could slice up that functionality in a billion different ways. Yes, there's appending - that could be one protocol. What about prepending? Should that be another protocol? What about insertions in the middle? What about removals, etc...? There are hypothetical data types which can support one or more of these operations, but not all of them, and hypothetical generic algorithms which only need one or more of them, but not all of them.
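To make that slicing concrete, the start of such a swarm might look like this (all names invented, purely to illustrate the combinatorics, not a recommendation):

// A few of the many possible slices as literal protocols:
protocol Appendable {
    associatedtype Element
    mutating func append(_ element: Element)
}
protocol Prependable {
    associatedtype Element
    mutating func prepend(_ element: Element)
}
protocol MidInsertable {
    associatedtype Element
    mutating func insert(_ element: Element, at index: Int) // index type simplified
}
// ...plus removals, replacements, etc., with RangeReplaceableCollection
// inheriting from every one of them.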

It is, IMO, not acceptable to say that all of these operations must be individual protocols, and that RRC should inherit from all of them, just because the language runtime is today unable to resolve conflicts from overlapping conformances. Solutions to that problem have been floated in the past ("private conformances"), and maybe one day something will be implemented - allowing you to define abstractions based on what you really need, while taking advantage of existing abstractions to provide broad support out-of-the-box.

In other words, if this is a significant problem for you, I'd recommend advocating for better language support for overlapping conformances, instead of advocating for swarms of microprotocols.

2 Likes

i don’t think it’s actionable to design libraries around language features that do not yet exist, for the same reason we cannot design libraries around microprotocols that do not exist. so i do not vend AppendableCollection as part of the BSON module, i have just written its API in terms of RangeReplaceableCollection, despite its many shortcomings.

unfortunately it seems the only solution presently available to us is to specify things in terms of real standard library protocols that do exist, even if that involves “lying” about certain aspects of the conformance.