Combining hashes

lorentey · March 10, 2018, 1:06am

As a step towards a better hashing API, the Standard Library now uses a randomly seeded, high-quality, resilient hash function:

I'm working towards extending this to synthesized Hashable implementations; once that is done, all hashing will be based on the new hash function and we'll be able to remove _mixInt and _combineHashValues from the standard library.

The _Hasher type and the new _hash(into:) requirement in Hashable seem like good candidates for a proposal to make them public: (Note that API names aren't final; they merely reflect the current code's choices.)

/// Represents Swift's standard hash function.
/// The hash function maps an arbitrary number of bytes to 
/// a single fixed-size integer value (the hash), such that slight changes in
/// the input byte sequence will usually result in wildly different hashes.
///
/// The byte sequence to be hashed is fed to the function by a series
/// of `append` calls, followed by `finalize` to extract the hash value.
///
/// A hash function initialized with the same key will always map the 
/// same byte sequence to the same hash value.
struct Hasher {
  // Initialize a new hasher using a per-execution random key
  // seeded at process startup.
  init()
  // Initialize a new hasher using the specified key.
  init(key: (UInt64, UInt64))

  // Feeds bits from `value` into the hasher.
  mutating func append<H: Hashable>(_ value: H)

  // Feed a specific sequence of bits to the hasher
  mutating func append(bits: Int)
  mutating func append(bits: UInt64)
  mutating func append(bits: UInt32)
  // etc.

  /// Finalize the hasher's state and extract a hash value.
  mutating func finalize() -> UInt64
}

protocol Hashable: Equatable {
  var hashValue: Int { get }
  func hash(into hasher: inout Hasher)
}

Here is how you'd implement Hashable in new code:

struct BraveNewWorld: Hashable {
  let a: Int
  let b: String

  func hash(into hasher: inout Hasher) {
    hasher.append(a)
    hasher.append(b)
  }
  // hashValue is automatically synthesized by the compiler
}

Of course, existing code that defines hashValue would keep working, too: (Plus, there may be legitimate cases where hashValue is the most suitable API even if hash(into:) becomes available.)

struct LegacyCode: Hashable {
  let a: Int
  let b: String

  var hashValue: Int {
    return a.hashValue ^ b.hashValue // Not a great combinator
  }

  // hash(into:) is automatically synthesized by the compiler
}

Naturally, the most sensible choice would be to let the compiler do all the work:

struct Sensible: Hashable {
  let a: Int
  let b: String

  // Both hashValue and hash(into:) are synthesized by the compiler
}

I feel like this would be a much nicer way to approach hashing. What do you think?