How is synthesized Hashable implemented?

kelin · September 7, 2018, 6:48pm

What hash function it uses and how does it calculate hashValue for enum's with associated values and structs?

lorentey · September 7, 2018, 10:04pm

The hash function in Swift 4.2 is SipHash-1-3; however, this is an implementation detail that is subject to change in any release. (Note that hashing changed considerably between Swift 4.1 and 4.2. I'm going to concentrate on Swift 4.2 here. Swift 4.1 uses a less robust algorithm to generate hash values, but compiler synthesis works in a roughly similar way.)

The new Hasher API (introduced by SE-0206, implemented in 4.2) exposes the standard hash function as public API. It has been carefully designed to prevent exposing the underlying algorithm; for example, the seed value is not user-selectable, so e.g. you can’t use Hasher to produce repeatable-looking hashes that may break later in a new version of the standard library.

SE-0206 also adds a new requirement to Hashable that is better suited to universal hash functions such as SipHash. Automatically generated Hashable conformances now implement this new hash(into:) requirement. hashValue is still generated, of course, but the implementation simply forwards the actual job of hashing to hash(into:). (See below.)

Automatic synthesis of Hashable conformance is relatively easy for structs: the compiler generates a hash(into:) implementation that simply feeds all stored properties into the supplied hasher:

struct Dog {
  let name: String
  let breed: String
  let age: Int
}

extension Dog: Hashable {
  func hash(into hasher: inout Hasher) { // Generated automatically
    hasher.combine(name)
    hasher.combine(breed)
    hasher.combine(age)
  }
}

Note that the order in which properties are fed to the hasher is significant. Synthesized implementations currently follow declaration order, but this may change before the ABI stabilizes. (In case we need to do something special for resilient structs, for example.)

Hashable synthesis is a bit more complicated for enums, because hash(into:) needs to ensure that two different cases never feed the same data to the hasher. Fortunately, there is an easy trick for doing this: we simply feed the hasher with a distinguishing value that is unique to each case. For cases with associated values, we feed those to the hasher, too.

enum List<T> {
  case empty
  indirect case node(value: T, rest: List<T>)
}

extension List: Hashable where T: Hashable {
  func hash(into hasher: inout Hasher) { // Generated automatically
    switch self {
      case .empty:
        hasher.combine(0) // discriminator
      case let .node(value: value, rest: rest):
        hasher.combine(1) // discriminator
        hasher.combine(value)
        hasher.combine(rest)
    }
  }
}

The discriminator value is currently assigned sequentially to each case in the enum, in declaration order. However, it is not wise to write code that relies on this. (We may need to change it to some other scheme to e.g. better support extensible enums.)

As for the old hashValue requirement, the compiler always synthesizes it the same way, in terms of the hash(into:) definition:

extension Dog {
  var hashValue: Int {
    var hasher = Hasher()
    self.hash(into: &hasher)
    return hasher.finalize()
  }
}

(Actually, this is implemented by a function in the stdlib; the generated hashValue consists of a call to it. But it boils down to the same code.)

The code that implements automatic Hashable (and Equatable) synthesis lives in lib/Sema/DerivedConformanceEquatableHashable.cpp. Look there for the nitty gritty details, including how hash(into:) is generated for types that only implement hashValue.

kelin · September 9, 2018, 7:55am

Wow, that code is extremely hard to read, I need compiler-brain to understand it. Swift 4.1 uses the same SipHash-1-3 function, right?

lorentey · September 10, 2018, 4:19pm

Swift 4.1 used a less rigorous hash function "inspired by" MurmurHash:

github.com

apple/swift/blob/swift-4.1-branch/stdlib/public/core/Hashing.swift#L78-L87


      
          internal static func hash16Bytes(_ low: UInt64, _ high: UInt64) -> UInt64 {
            // Murmur-inspired hashing.
            let mul: UInt64 = 0x9ddfea08eb382d69
            var a: UInt64 = (low ^ high) &* mul
            a ^= (a >> 47)
            var b: UInt64 = (high ^ a) &* mul
            b ^= (b >> 47)
            b = b &* mul
            return b
          }

This function has good dispersion, but it isn't randomly seeded. (I seem to remember that it originally came from llvm, but I may be wrong.)

Synthesized hashing in 4.1 was based around the _mixInt function in the same file. _mixInt was designed to mix up the bits of an integer to defeat clustering -- it wasn't really intended for combining hash values.

Note that on platforms other than Apple's, Swift 4.1 did use randomly seeded SipHash-1-3 to hash Strings, as a special case.