As a step towards a better hashing API, the Standard Library now uses a randomly seeded, high-quality, resilient hash function:
I'm working towards extending this to synthesized Hashable implementations; once that is done, all hashing will be based on the new hash function and we'll be able to remove _mixInt and _combineHashValues from the standard library.
The _Hasher type and the new _hash(into:) requirement in Hashable seem like good candidates for a proposal to make them public: (Note that API names aren't final; they merely reflect the current code's choices.)
/// Represents Swift's standard hash function.
/// The hash function maps an arbitrary number of bytes to
/// a single fixed-size integer value (the hash), such that slight changes in
/// the input byte sequence will usually result in wildly different hashes.
///
/// The byte sequence to be hashed is fed to the function by a series
/// of `append` calls, followed by `finalize` to extract the hash value.
///
/// A hash function initialized with the same key will always map the
/// same byte sequence to the same hash value.
struct Hasher {
// Initialize a new hasher using a per-execution random key
// seeded at process startup.
init()
// Initialize a new hasher using the specified key.
init(key: (UInt64, UInt64))
// Feeds bits from `value` into the hasher.
mutating func append<H: Hashable>(_ value: H)
// Feed a specific sequence of bits to the hasher
mutating func append(bits: Int)
mutating func append(bits: UInt64)
mutating func append(bits: UInt32)
// etc.
/// Finalize the hasher's state and extract a hash value.
mutating func finalize() -> UInt64
}
protocol Hashable: Equatable {
var hashValue: Int { get }
func hash(into hasher: inout Hasher)
}
Here is how you'd implement Hashable in new code:
struct BraveNewWorld: Hashable {
let a: Int
let b: String
func hash(into hasher: inout Hasher) {
hasher.append(a)
hasher.append(b)
}
// hashValue is automatically synthesized by the compiler
}
Of course, existing code that defines hashValue would keep working, too: (Plus, there may be legitimate cases where hashValue is the most suitable API even if hash(into:) becomes available.)
struct LegacyCode: Hashable {
let a: Int
let b: String
var hashValue: Int {
return a.hashValue ^ b.hashValue // Not a great combinator
}
// hash(into:) is automatically synthesized by the compiler
}
Naturally, the most sensible choice would be to let the compiler do all the work:
struct Sensible: Hashable {
let a: Int
let b: String
// Both hashValue and hash(into:) are synthesized by the compiler
}
I feel like this would be a much nicer way to approach hashing. What do you think?