Set with custom Equatable and default Hashable structs

While the Set-based implementation presented by @blangmuir is logically sound, elegant, and easy to comprehend, I recently found that this approach has considerable repercussions for performance.

We were using the following generic structure:

struct UnorderedPair<T> { let a, b: T }

Our initial implementation used Set to provide an order-independent hash value. For our use case, however, we noticed pretty severe performance issues down the line (millions of such values may be inserted into a Set or Dictionary). Profiling showed that the Set incurred a significant penalty for managing its underlying reference-counted storage.
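
For reference, here is a sketch of the Set-based variant (a reconstruction on my part, not @blangmuir's exact code, and the `==` body is my assumption of the intended order-independent equality):

```swift
// Requires T: Hashable so that Set([a, b]) can be formed.
struct UnorderedPair<T> { let a, b: T }

extension UnorderedPair: Hashable where T: Hashable {
    static func == (lhs: UnorderedPair, rhs: UnorderedPair) -> Bool {
        return (lhs.a == rhs.a && lhs.b == rhs.b) || (lhs.a == rhs.b && lhs.b == rhs.a)
    }

    func hash(into hasher: inout Hasher) {
        // A Set hashes its elements order-independently, but building one
        // allocates reference-counted storage on every call.
        hasher.combine(Set([a, b]))
    }
}
```

The order-independence falls out of Set's own Hashable conformance, which is what makes this version so easy to reason about.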

Upon looking into it, we tried providing a custom implementation of hashValue using the ^ (bitwise XOR) operation instead.

SE-0206 deprecates custom implementations of hashValue, but we kept this one as a performance baseline, for reasons discussed below.

extension UnorderedPair: Hashable where T: Hashable {
    static func == (lhs: UnorderedPair, rhs: UnorderedPair) -> Bool {
        return (lhs.a == rhs.a && lhs.b == rhs.b) || (lhs.a == rhs.b && lhs.b == rhs.a)
    }
    var hashValue: Int { return a.hashValue ^ b.hashValue }
}

Measuring these two implementations revealed a remarkable performance difference.

The measurements were taken with the XCTestCase.measure(_:) API, in the Release configuration, with hashValue / hash(into:) marked @inlinable.
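
For anyone who wants to reproduce this outside of a test target, here is a standalone approximation of the harness (the numbers below came from XCTestCase.measure(_:); this sketch just times a loop manually, and the pair definition is repeated so the snippet compiles on its own):

```swift
import Foundation

struct UnorderedPair<T> { let a, b: T }

extension UnorderedPair: Hashable where T: Hashable {
    static func == (lhs: UnorderedPair, rhs: UnorderedPair) -> Bool {
        return (lhs.a == rhs.a && lhs.b == rhs.b) || (lhs.a == rhs.b && lhs.b == rhs.a)
    }
    var hashValue: Int { return a.hashValue ^ b.hashValue }
}

let pair = UnorderedPair(a: "hello", b: "world")
var sink = 0
let start = Date()
for _ in 0 ..< 1_000_000 {
    sink ^= pair.hashValue // keep the work from being optimized away
}
let elapsed = Date().timeIntervalSince(start)
print("1_000_000 hashValue computations took \(elapsed)s")
```

Absolute timings from a loop like this won't match XCTest's, but the relative gap between implementations should be visible either way.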

For generating 1_000_000 hash values for a given UnorderedPair<String>, we got the following results:

Implementation       Time       Delta
hashValue / ^        0.0535s    ---
hash(into:) / Set    1.753s     3175%

For generating 1_000_000 hash values for a given UnorderedPair<Int>, we got the following results:

Implementation       Time       Delta
hashValue / ^        0.0326s    ---
hash(into:) / Set    1.764s     5311%

First, is using the ^ operation logically correct and safe in these cases? I have tested several million iterations to ensure that there are no collisions between hash values where there shouldn't be, and so far things check out. I would be very interested to know if there is danger here, though!
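
One property worth flagging while testing for collisions (my own observation, so please correct me if it doesn't matter in practice): x ^ x == 0 for any x, so every pair whose two components are equal hashes to zero, and all such pairs collide with each other:

```swift
struct UnorderedPair<T> { let a, b: T }

extension UnorderedPair: Hashable where T: Hashable {
    static func == (lhs: UnorderedPair, rhs: UnorderedPair) -> Bool {
        return (lhs.a == rhs.a && lhs.b == rhs.b) || (lhs.a == rhs.b && lhs.b == rhs.a)
    }
    var hashValue: Int { return a.hashValue ^ b.hashValue }
}

// a.hashValue ^ a.hashValue == 0 for any a, so these all share hash value 0:
let collisions = Set(["foo", "bar", "baz"].map { UnorderedPair(a: $0, b: $0).hashValue })
// collisions == [0]
```

These pairs remain distinct under ==, so correctness is unaffected; they just all land in the same hash bucket, which could matter if equal-component pairs are common in the data.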

We noticed an interesting interaction between hashValue and hash(into:) that @lorentey or others could perhaps elucidate. To get a further handle on things, we tried out the following implementation:

func hash(into hasher: inout Hasher) {
    hasher.combine(a.hashValue ^ b.hashValue)
}
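
As a sanity check (a standalone sketch with the type definitions repeated, and the `==` body again my assumption), this variant still produces order-independent hashes, since hashValue is synthesized from hash(into:) when only the latter is implemented:

```swift
struct UnorderedPair<T> { let a, b: T }

extension UnorderedPair: Hashable where T: Hashable {
    static func == (lhs: UnorderedPair, rhs: UnorderedPair) -> Bool {
        return (lhs.a == rhs.a && lhs.b == rhs.b) || (lhs.a == rhs.b && lhs.b == rhs.a)
    }
    func hash(into hasher: inout Hasher) {
        // XOR first, then feed a single Int through the Hasher.
        hasher.combine(a.hashValue ^ b.hashValue)
    }
}

let p = UnorderedPair(a: "x", b: "y")
let q = UnorderedPair(a: "y", b: "x")
// p and q hash identically, so a Set deduplicates them.
```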

For generating 1_000_000 hash values for a given UnorderedPair<String>, we got the following results:

Implementation       Time       Delta
hashValue / ^        0.0535s    ---
hash(into:) / ^      0.107s     99%

For generating 1_000_000 hash values for a given UnorderedPair<Int>, we got the following results:

Implementation       Time       Delta
hashValue / ^        0.0326s    ---
hash(into:) / ^      0.062s     91%

Are these implementations logically different? What could be contributing to the performance differences here?

Let me know if I am missing something obvious, as I am not an expert in any of these domains. Feel free to point me to places for further research. I am currently looking into SE-0206 itself, as well as the linked thread on combining hashes.
