Set with 'custom Equatable and default Hashable Structs'

lorentey · October 5, 2018, 2:26pm

You're on the right track! Your last hash(into:) implementation is a great way to hash components whose order doesn't matter for equality.

extension UnorderedPair: Equatable where T: Equatable {
    static func == (lhs: UnorderedPair, rhs: UnorderedPair) -> Bool {
      return 
        (lhs.a == rhs.a && lhs.b == rhs.b) || 
        (lhs.a == rhs.b && lhs.b == rhs.a)
    }
}

extension UnorderedPair: Hashable where T: Hashable {    
    func hash(into hasher: inout Hasher) {
      hasher.combine(a.hashValue ^ b.hashValue)
    }
}

Creating and immediately discarding a Set works great, but as you've seen, it has allocation overhead.

The reason hashValue comes out first in your hashing benchmarks is that you aren't measuring the same amount of work: the hashValue case doesn't include the postprocessing Set and Dictionary has to do on the returned value to guarantee good dispersion. Set and Dictionary never uses the hashValue as is to find the correct bucket for elements -- they always need to run the value through a hasher to reduce the chance of performance regressions due to hash value clustering:

https://github.com/apple/swift/blob/swift-4.2-branch/stdlib/public/core/Dictionary.swift#L2560-L2566

As a general rule, conforming to Hashable by implementing hashValue is never faster than implementing hash(into:) and feeding the same value to it. (It is, however, often slower.)

A benchmark that would capture real-life hashing performance is to measure the time it takes to insert values of your type into a Set. In the case of UnorderedPair, the two implementations come out as exactly the same in this benchmark.

There is no reason whatsoever to ever implement hashValue in code targeting Swift 4.2 or above.