Combining hashes

I believe Hashable.hashValue is not a great API. It combines two distinct tasks that should be kept separate:

  1. Choosing a particular hash function
  2. Feeding it with data

When we manually implement Hashable, the job should just be about identifying bits and pieces of our data that must participate in hashing. We shouldn't need to think about how these bits are shredded into a fixed-width value, because it is a completely different concern, and even worse, it requires careful consideration and specialist knowledge. Almost nobody is able (or willing) to provide a good implementation of hashValue from scratch, because task (1) is hard.
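To make the mixing of the two tasks concrete, here is a minimal sketch of the kind of hand-written hashValue that tends to appear in practice. The type GridPoint is hypothetical, invented only for illustration, and the XOR is deliberately weak:

```swift
// Hypothetical example type; not from any real codebase.
struct GridPoint: Hashable {
    var x: Int
    var y: Int

    var hashValue: Int {
        // Task (2): decide that x and y take part in hashing.
        // Task (1): improvise a hash function on the spot.
        // XOR is symmetric, so (1, 2) and (2, 1) collide, and every
        // point with x == y hashes to 0.
        return x ^ y
    }
}
```

The author of GridPoint only needed to say "x and y matter", but the API also forced them to pick a mixing function, and this one collides on very common inputs.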

SE-0185 (synthesized Equatable and Hashable conformance) is a partial solution to this problem. If we augmented it with some sort of facility (e.g., a @transient attribute) to omit specific fields from synthesized implementations, it would reduce the problem of implementing Hashable to task (2) above, leaving the choice of a suitable hash function to the standard library.

Exposing _combineHashValues as public API is one way to help people who still need to implement Hashable manually. However, it is not the best choice to expose it as is: it lacks support for initializing/managing/finalizing internal state, so it cannot implement an important class of hash functions. Additionally, there is the issue of discoverability; the hashValue API itself does not naturally guide developers to use helper functions like this. (Although this can probably be solved with documentation.)
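A rough sketch of the difference, under stated assumptions: combineSketch below is a hypothetical stand-in for a pairwise helper in the spirit of _combineHashValues (it is not the stdlib implementation), and StatefulHasherSketch is a toy type with an FNV-1a-style update step, invented here to show the initialize/update/finalize shape that a pairwise function cannot express.

```swift
// Hypothetical stateless helper: it only ever sees two Ints, so it has
// nowhere to keep a seed, a running length, or a finalization step.
func combineSketch(_ a: Int, _ b: Int) -> Int {
    // Wrapping multiply and add; an arbitrary toy mixer.
    return (a &* 31) &+ b
}

// Hypothetical stateful hasher with explicit phases.
struct StatefulHasherSketch {
    private var state: UInt64
    private var count: UInt64 = 0

    // Initialize: the seed could, for example, be randomized per process.
    init(seed: UInt64) {
        state = seed ^ 0xcbf29ce484222325
    }

    // Update: fold one value into the running state
    // (an FNV-1a-style step applied to whole words, as a toy).
    mutating func append(_ value: Int) {
        state = (state ^ UInt64(bitPattern: Int64(value))) &* 0x100000001b3
        count &+= 1
    }

    // Finalize: mix in how many values were seen, then truncate to Int.
    func finalize() -> Int {
        let mixed = (state ^ count) &* 0x100000001b3
        return Int(truncatingIfNeeded: mixed)
    }
}
```

With the stateful shape, seeding and finalization live in one place; with the pairwise helper, every caller would have to thread that state through by hand.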

Even worse, the concept of a "good hashValue" is not universal -- some applications require secure, randomized, complex hashes, while others may be satisfied with a less secure function that calculates faster. Baking the hash function directly into our hashValue implementation makes it harder to reuse our data types. Potentially, it also makes it harder to replace the hash function in case a bug (such as a security issue) comes to light later.

If I were to redesign hashing from scratch, I would assign the responsibility of selecting a particular hash function to the collections that rely on hashing, and change Hashable so that it is concerned only with task (2) above. @Vincent_Esche's HashVisitable pitch from last year is one way to do this. I'm currently experimenting with a slightly modified approach that changes Hashable directly.
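Here is a minimal sketch of what such a split could look like. All names (HasherSketch, HashVisitableSketch, Coordinate, SumHasher, FNVStyleHasher, bucketIndex) are hypothetical and chosen for illustration; this is neither the exact API from the HashVisitable pitch nor anything in the stdlib.

```swift
// Task (1): a hash function with internal state (hypothetical protocol).
protocol HasherSketch {
    init()
    mutating func append(_ value: Int)
    func finalize() -> Int
}

// Task (2): a type only says which of its parts participate in hashing.
protocol HashVisitableSketch {
    func hash<H: HasherSketch>(into hasher: inout H)
}

struct Coordinate: HashVisitableSketch {
    var x: Int
    var y: Int

    func hash<H: HasherSketch>(into hasher: inout H) {
        hasher.append(x)
        hasher.append(y)
    }
}

// A fast but weak hash function: a wrapping sum.
struct SumHasher: HasherSketch {
    private var state = 0
    init() {}
    mutating func append(_ value: Int) { state = state &+ value }
    func finalize() -> Int { return state }
}

// A somewhat stronger toy: an FNV-1a-style step applied to whole words.
struct FNVStyleHasher: HasherSketch {
    private var state: UInt64 = 0xcbf29ce484222325
    init() {}
    mutating func append(_ value: Int) {
        state = (state ^ UInt64(bitPattern: Int64(value))) &* 0x100000001b3
    }
    func finalize() -> Int { return Int(truncatingIfNeeded: state) }
}

// The collection side chooses the hash function, not the element type.
func bucketIndex<H: HasherSketch, T: HashVisitableSketch>(
    of value: T, using hasherType: H.Type, bucketCount: Int
) -> Int {
    var hasher = H()
    value.hash(into: &hasher)
    // Reinterpret as unsigned so the remainder is never negative.
    return Int(UInt(bitPattern: hasher.finalize()) % UInt(bucketCount))
}
```

Coordinate never names a hash function, so the same type works unchanged with either hasher, e.g. `bucketIndex(of: Coordinate(x: 3, y: 4), using: SumHasher.self, bucketCount: 8)`.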

In any case, the Hasher concept from Vincent's pitch is a likely candidate interface for replacing _combineHashValues and _mixInt internally in the stdlib. If we want to expose the hash function used for synthesized hashing as public stdlib API, I think Hasher is the way to go!
