Why does Hashable require Equatable?

Jumhyn · November 5, 2022, 10:16pm

But the Element type of your BloomFilter here is not S or WrappedS—it's the type of the objects which you are add-ing and maybeHas-ing, i.e., String. Your S and WrappedS are still relying on the underlying guarantee that the 'same' value will always produce the same hash value. In the case of S this arises because we want that two S values containing the same String will always produce the same hasher state when they are hashed as part of WrappedS, and WrappedS further relies on this guarantee to ensure that given the 'same' S and the 'same' salt i, you'll produce the same index for bools every time.

That you've deliberately broken the implementation for == while also ensuring it doesn't get called isn't really material here. The implementation of BloomFilter here only works because of the semantic guarantees imposed on String (and transitively, S) by the combination of Equatable and Hashable.

To look at this another way, suppose that String exposed a public capacity property that gave the current capacity of the underlying buffer, and we implemented your example exactly as above, except that we implemented S.hash(into:) as:

    func hash(into hasher: inout Hasher) {
        hasher.combine(string)
        hasher.combine(string.capacity)
    }

Clearly, this gives us a broken BloomFilter implementation because we could potentially have:

var s = "Hello"
filter.add(s)
s.append("!")
s.removeLast()
filter.maybeHas(s) // false

But how can you explain what's broken here without invoking the notion of 'sameness'? And how would you describe what's broken in the following implementation of WrappedS.hash(into:) without invoking the same?

func hash(into hasher: inout Hasher) {
  hasher.combine(s)
  hasher.combine(salt)
  hasher.combine(Int.random())
}

My point, ultimately, is that you haven't really implemented Hashable without Equatable, because there are still very important properties that you want your Hashable implementation to have in order for BloomFilter to work properly. And you'll find that if you try to describe what those properties are, it's precisely a description of the relationship between Hashable and the correct implementation of Equatable for your types.

ETA: Yet another way to think about this is that the relationship between Equatable and Hashable is about more than just particular implementations of protocol requirements. It's about the fundamental idea that hashability requires you to invoke a notion of equality which agrees with your hash value in some way. Yes, it's possible to write algorithms which depend only on hashability and never directly invoke equality, but such algorithms still depend on the underlying equality notion in order to work properly.

tera · November 5, 2022, 10:40pm

String is of course irrelevant here. Here's a simplified version that has custom types only and has corrected salting / hashing implementation:

Simplified version

// Notion of "sameness" a.value == b.value
// Notion of EQ - absent / broken
struct Value: Hashable {
    var value: Int
    func hash(into hasher: inout Hasher) {
        hasher.combine(value)
    }
    static func == (a: Self, b: Self) -> Bool {
        fatalError("EQ is deliberatly froken here")
    }
    func hashWithSalt(_ salt: Int) -> Int {
        Value(value: value + salt).hashValue
    }
}

struct BloomFilter {
    var bools: [Bool]
    let k: Int
    
    init(size: Int, k: Int) {
        bools = [Bool](repeating: false, count: size)
        self.k = k
    }
    
    private func indexForString(_ value: Value, _ i: Int) -> Int {
        let v = value.hashWithSalt(i)
        return Int(UInt.init(bitPattern: v) % UInt(bools.count))
    }
    
    mutating func add(_ value: Value) {
        for i in 0 ..< k {
            bools[indexForString(value, i)] = true
        }
    }
    
    func maybeHas(_ value: Value) -> Bool { // if false - definitely doesn't have
        for i in 0 ..< k {
            if !bools[indexForString(value, i)] { return false }
        }
        // maybe has
        return true
    }
}

var filter = BloomFilter(size: 10000, k: 10)
filter.add(Value(value: 1))
filter.add(Value(value: 2))
filter.add(Value(value: 3))
filter.add(Value(value: 4))

print(filter.maybeHas(Value(value: 1))) // true
print(filter.maybeHas(Value(value: 2))) // true
print(filter.maybeHas(Value(value: 3))) // true
print(filter.maybeHas(Value(value: 4))) // true
print(filter.maybeHas(Value(value: 5))) // false
print(filter.maybeHas(Value(value: 6))) // false

Maybe we are not that far apart then? I can agree that hashability must agree with the "implied notion of sameness", what I am disagreeing with is that hashability's implied notion of sameness must be necessarily expressed as the matching Equatable (explicit or autosynthesized). I believe the above example shows that.

Jumhyn · November 5, 2022, 11:15pm

But this is precisely what Hashable: Equatable (along with the protocols' respective documentation) expresses: that to be Hashable, a type must have:

A notion of sameness (i.e., an equivalence relation)
A hash function
'Agreement' between the notion of sameness and the values produced by the hash function

You can of course write implementations which violate the semantic guarantees of Swift protocols and then write code against those implementations which carefully avoid the brokenness, but that doesn't imply such conformances are actually useful.

Suppose Swift did separate Hashable and Equatable. Because, as you agree, Hashable is useless without an implied notion of sameness, we'd likely want to have Hashable: HashableEquivalent (or else define static func hashableEquivalent(_: Self, _: Self) -> Bool on Hashable directly).

But would such a setup actually be desirable? If HashableEquivalent and Equatable implementations were permitted to diverge, then we could have a == b (Equatable) but a.hashValue != b.hashValue (or similarly, we could have filter.maybeContains(a) == false but largeExpensiveBackingArray.contains(a) == true). So, we'd probably want to require that if a type is both Equatable and Hashable, that the implementations of Self.== and Self.hashableEquivalent always agree.

But then, are there any Hashable types for which it would not make sense to conform to Equatable? I think not. Your BloomFilter implementation purports to be able to answer "is element x definitely not in the collection?" but what would answering that question mean without a highly suitable notion of 'same' for the purposes of an Equatable implementation?

tera · November 6, 2022, 12:25am

A bit hypothetical example but here you go:
Imagine a value type representing a file.
It has hashValue implemented and in order to make hashing quick - it used a specially calculated hashValue that is easily updated when individual blocks of file are changed (without the need to recalculate the hash value / re-read the whole file).
It could have had EQ implementation but that would be extremely slow and as excessive reads lead to disk degradation the implementer decided not providing it. Besides there might be no second file to compare with (see use case examples below) so it is not possible to call it at all even if it wasn't slow.
Value of this types are used with a bloom filter to provide an answer of the form: a given file might be in a file set with high probability / and a given file is definitely not in a file set.
And finally let's assume for a second it can be useful for something (e.g. to answer a question like "your computer might have a virus with high probability" or "it's highly likely this is a pirate movie", etc.

Q&A:

does this value has an implied "notion of sameness""? - yes, when the corresponding bytes of two files are the same.
does this value have an explicit or autosynthesized EQ?
a) no it does not (if it was allowed).
b) yes but we are providing some bogus implementation just to make compiler happy.
will this value work with the mentioned bloom filter to get the mentioned probabilistic answer - yes
will this value work as a Dictionary Key / Set or other similar collection with "normal" usage of Hashability - nope.

xwu · November 6, 2022, 1:45am

I’m confused what you are trying to illustrate with this example: you’re describing a type with a broken Hashable implementation and a broken Equatable implementation?

jeremyabannister · November 6, 2022, 4:39am

I think tera is arguing that there are valid situations in which one would want the option to implement Hashable but not Equatable, and that therefore, even though it obviously would never change, technically it is a flaw/not correct/etc that Hashable inherits from Equatable. I feel not much clear opinion about this topic myself, I'm just clarifying what I think tera's point is.

The thought does come to my mind that, if conceptually/philosophically speaking Hashable: Equatable is a valid inheritance relationship, but in some obscure contexts it's implementationally not ideal, isn't that exactly what fatalError() is for? To allow us to skip writing implementations if we don't need to run them?

xwu · November 6, 2022, 1:10pm

But @tera’s example doesn’t show a valid Hashable implementation, by their own description, so how does it support that argument?

jeremyabannister · November 6, 2022, 2:30pm

Here's a protocol that just occurred to me that I thought perhaps could serve as a useful thought experiment for the purposes of the debate:

protocol OnlyConceptuallyEquatable: Equatable {}
extension OnlyConceptuallyEquatable {
    static func == (lhs: Self, rhs: Self) -> Bool {
        fatalError()
    }
}

It represents a value that is conceptually equatable but infeasible to implement as such for whatever reason, like in your hypothetical example:

Jumhyn · November 6, 2022, 3:14pm

I’m a bit confused too—setting aside implementation concerns for this hypothetical data structure, when you get to the point of inserting your ‘file hash’ object into the bloom filter, the only data you’re hashing with each of the k functions is the precomputed hash value. So what you’d really be testing with that bloom filter is “is the file’s hash value definitely not in the set?” This isn’t an invalid operation given the constraints you’ve mentioned, but I don’t think it makes sense to think about that type as a FileContents type that is hashable and not equatable; rather, it’s a FileContentsHash which is both equatable and itself hashable.

Or, if instead your FileContentsHash type somehow tracks each of the k hash functions independently, then you have a type which is much more deeply tied to the implementation of the bloom filter and not really suitable for a general Hashable conformance anyway.

xwu · November 6, 2022, 4:07pm

…such as?

jeremyabannister · November 6, 2022, 4:31pm

I really don't know, it's not my horse in this race. I just thought the OnlyConceptuallyEquatable concept was fun to think about in a similar way to @Jumhyn 's distinction between the FileContents type and the FileContentsHash type.

Karl · November 6, 2022, 4:35pm

I think there is a difference in perspective here - between a type that conforms to Hashable, and a type that consumes Hashable values.

For a type that conforms to Hashable, Equatable is also vital, because you don't know how somebody is going to use those hashes. If they use your type as Dictionary keys, or as elements of a Set, you MUST define ==. Hash values may collide, and there must be a way to resolve whether those colliding hash values come from equivalent values.

For a type that consumes Hashable values, it knows exactly what its requirements are, and it might not call a particular method/operator. From its perspective, some protocol requirements may not be strictly necessary for its task - a bloom filter does not need to resolve colliding hash values.

That said, I think this is a largely hypothetical concern. The only time you would be able to implement Hashable but omit Equatable (or fatalError it) is when you can guarantee that your type will only be consumed by types that do not care about resolving hash collisions. I don't think this is what Hashable is designed to model - that's a different protocol.

By the way - I don't think you'd actually be able to implement a high-quality bloom filter just with Hashable, since the standard library's Hasher is a struct, not a protocol. You couldn't visit the same data members with multiple hash functions, which IIUC is what bloom filters typically do. Hashable isn't as flexible as Codable, for instance.

taylorswift · November 6, 2022, 5:19pm

i think a lot of times when i conform to Equatable i don’t actually mean equatable, i just mean “uniquely-identified” in some context. like:

struct Article:Identifiable
{
    let id:String
    let content:String
}
extension Article:Hashable
{
    static
    func == (lhs:Self, rhs:Self) -> Bool
    {
        lhs.id == rhs.id
    }
    func hash(into hasher:inout Hasher)
    {
        self.id.hash(into: &hasher)
    }
}

and then use it like:

let articles:[Article: Comments]

but i don’t really want Article to be Equatable like that, because ignoring the content is only valid in one of the contexts where i am using it.

Jumhyn · November 6, 2022, 5:48pm

As you note, this feels a bit like an abuse of Equatable/Hashable—is there a reason your dictionary can’t look up the comments by Article.ID instead of Article?

taylorswift · November 6, 2022, 11:28pm

because then you give up the ability to iterate through (Article, Comments) at the same time:

let articles:[Article]
let comments:[Article.ID: Comments]

for article in articles
{
    guard let comments:Comments = comments[article.id]
    else
    {
        // ???
    }
}

not only do you lose the guarantee that an Article always comes with a Comments, it also involves two hash table lookups instead of one.

or:

let articles:[Article.ID: (Comments, Article.EverythingExceptID)]

for (id, (comments, _body) in articles
{
    let article:Article = .init(id: id, _body: _body)
}

but there is the additional work of factoring Article into Article.ID and Article.EverythingExceptID.

Jumhyn · November 6, 2022, 11:35pm

I wouldn’t really expect this refactoring—rather, the values stored in the dictionary would just be (Article, Comments) (where Article continues to include the id property).

taylorswift · November 6, 2022, 11:42pm

it’s not memory-efficient since ID is stored twice in each collection element. sometimes this doesn’t matter, because Article doesn’t occur very frequently and it comes with much larger remotely-allocated data. but sometimes it matters a lot if the remotely-allocated stuff is shared, and Article is serving in more of a metadata descriptor-like role where there are a lot more instances of Article than its underlying storage, and the ID makes up most of its inline footprint.

Jumhyn · November 7, 2022, 12:04am

Well, in many such cases it seems like ID could be shared as well to avoid the duplication, but yeah, I suppose under those narrow constraints it would potentially make sense to pull Article apart into Article.ID and Article.Metadata or whatever. IMO that’s still preferable to ‘lying’ about your Hashable/Equatable conformance.

tera · November 7, 2022, 12:10am

That is exactly what I meant, thank you @jeremyabannister to put it so clear.

To be honest I'm quite surprised to see such a backlash here to a mere and benign idea that there are situations when Hashable is useful without the matching Equatable implementation. One of the frequent responses is "what would you do with this hash if you don't compare elements afterwards". To answer that I'm showing some examples, which, I admit, might be not ideal as I quickly cooked something to demo the concept illustrating my point.

Please note, that I am not suggesting that people here would explain "hurrah, now we see!" and then change Hashable to not include Equatable in the standards library – obviously the change won't happen (and I am fine with this status quo).

Yet another example illustrating the point:

FancySet

<<<<<<< Hashable
typealias TheHashable = Hashable
=======
typealias TheHashable = MyHashable
protocol MyHashable {
    var hashValue: Int { get }
}
>>>>>>> MyHashable

protocol Node: AnyObject, TheHashable {
    var next: NodeType? { get set }
}

<<<<<<< Hashable
typealias NodeType = any Node
=======
typealias NodeType = Node
>>>>>>> MyHashable

class A: Node {
<<<<<<< Hashable
    static func == (a: A, b: A) -> Bool {
        fatalError()
    }
=======
>>>>>>> MyHashable
    var hashValue: Int {
        ObjectIdentifier(self).hashValue
    }
    var next: NodeType?
}

class B: Node {
<<<<<<< Hashable
    static func == (a: B, b: B) -> Bool {
        fatalError()
    }
=======
>>>>>>> MyHashable
    var hashValue: Int {
        ObjectIdentifier(self).hashValue
    }
    var next: NodeType?
}

struct FancySet {
    var root0000: NodeType?
    var root0001: NodeType?
    var root0002: NodeType?
    // ...
    var root1023: NodeType?

    subscript(_ index: Int) -> NodeType? {
        get {
            switch index {
            case 0: return root0000
            case 1: return root0001
            case 2: return root0002
            // ...
            case 1023: return root1023
            default: fatalError()
            }
        }
        set {
            switch index {
            case 0: root0000 = newValue
            case 1: root0001 = newValue
            case 2: root0002 = newValue
            // ...
            case 1023: root1023 = newValue
            default: fatalError()
            }
        }
    }
    
    func index(of node: NodeType) -> Int {
        Int(UInt(bitPattern: node.hashValue) % 1024)
    }

    mutating func insert(_ node: NodeType) {
        node.next = self[index(of: node)]
        self[index(of: node)] = node
    }
    
    mutating func removeNode(_ node: NodeType) {
        // TODO: todo
    }
    
    func contains(_ node: NodeType) -> Bool {
        var current = self[index(of: node)]
        while current != nil {
            if node === current {
                return true
            }
            current = current?.next
        }
        return false
    }
}

func test() {
    var fs = FancySet()
    let a = A()
    let b = B()
    let c = B()
    fs.insert(a)
    fs.insert(b)
    print(fs.contains(a)) // true
    print(fs.contains(b)) // true
    print(fs.contains(c)) // false
}

Note that it contains "two-in-one" implementations in a form of merge conflict markers: the upper version uses the current Hashable the lower uses MyHashable that doesn't require Equatable.

it is a fancy specialised Set implementation that has some unique features and quirks:
fixed memory structure that doesn't allocate memory (handy in some situations).
gives O(1) add/search performance for a small number of elements.
only supports reference typed elements.
requires elements having a gettable / settable field "next".
elements can be placed only on one list.
the FancySet is heterogeneous (can store elements of different types).

Note that this implementation doesn't need Equatable from its elements – Equatable would not help in this case (you can't compare apple's to oranges with ==). Instead it is using "===" to disambiguate elements with the same hash value, === operator is defined for all reference type elements and we can compare apple's and orange's with ===. Also note some other differences between Hashable/MyHashable implementations - as the current Hashable has "self" requirement (via Equatable) it "infects" Node type and all usages of Node as existential are limited (e.g. I have to use any Node for Hashable whilst just Node is ok for MyHashable.

From my pov this example shows that there are uses of Hashable operation that do not require matching Equatable operation.

But... I am not pushing my POV, just looking for the truth, and when I see red called white (and that's what I see) my brain protests - but obviously some of you may see this case totally different, and I admit it might well be my perspective which is skewed, so I would appreciate if you prove me wrong – in case I am wrong – so I can take my rose glasses off, or whatever it is which is skewing my view. This case or others. Cheers.

taylorswift · November 7, 2022, 12:26am

this is exactly what i have ended up doing every time i ran into this in the past. one thing that would really make this pain point go away is if we had “ID-collection” types like

struct IdentifierDictionary<Key, Value> where Key:Identifiable {}
struct IdentifierSet<Element> where Element:Identifiable {}

now that we have Identifiable in the standard library. then Key/Element would not themselves have to satisfy Equatable, only ID would need to.