I believe Hashable.hashValue is not a great API. It combines two distinct tasks that should be kept separate:
1. Choosing a particular hash function
2. Feeding it with data
When we manually implement Hashable, the job should just be about identifying the bits and pieces of our data that must participate in hashing. We shouldn't need to think about how these bits are shredded into a fixed-width value: that is a completely different concern and, even worse, one that requires careful consideration and specialist knowledge. Almost nobody is able (or willing) to provide a good implementation of hashValue from scratch, because task (1) is hard.
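To make this concrete, here is a small sketch of the kind of hand-written hashValue people tend to come up with. The Point type is made up for illustration, and the Hashable conformance is left out so that the snippet stands alone:

```swift
// Hypothetical type for illustration; not taken from any real codebase.
struct Point {
    var x: Int
    var y: Int

    // Task (2): x and y are the fields that should participate in hashing.
    // Task (1): the ad-hoc XOR below *is* the hash function, and it is weak.
    var hashValue: Int {
        return x ^ y
    }
}

let a = Point(x: 1, y: 2)
let b = Point(x: 2, y: 1)

// XOR is symmetric, so swapping the coordinates yields the same hash.
print(a.hashValue == b.hashValue) // prints "true"
```

The author of Point only wanted to say "x and y matter", but ended up designing a hash function as well, and a poor one: every pair of mirrored points collides, and every point on the diagonal hashes to zero.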
SE-0185 is a partial solution to this problem. If we augmented it with some sort of facility (e.g., a @transient attribute) to omit specific fields from synthesized implementations, it would reduce the problem of implementing Hashable to task (2) above, leaving the choice of a suitable hash function to the standard library.
Exposing _combineHashValues as public API is one way to help people who still need to implement Hashable manually. However, exposing it as is would not be the best choice: it lacks support for initializing, managing, and finalizing internal state, so it cannot implement an important class of hash functions. There is also the issue of discoverability: the hashValue API itself does not naturally guide developers toward helper functions like this. (Although this can probably be solved with documentation.)
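To illustrate what I mean by internal state, here is a minimal sketch of a hasher with explicit initialize/combine/finalize steps. It uses 64-bit FNV-1a purely as a stand-in because it is short; the FNV1aHasher name and its methods are hypothetical and are not stdlib API:

```swift
// Hypothetical stateful hasher; 64-bit FNV-1a is used only as a simple example.
struct FNV1aHasher {
    // Initialization: the state starts at the 64-bit FNV offset basis.
    private var state: UInt64 = 0xcbf29ce484222325

    // Feed a single byte into the running state (XOR, then multiply by the FNV prime).
    mutating func combine(byte: UInt8) {
        state ^= UInt64(byte)
        state = state &* 0x100000001b3
    }

    // Feed an Int as eight bytes, least significant byte first.
    mutating func combine(_ value: Int) {
        var bits = UInt64(truncatingIfNeeded: value)
        for _ in 0..<8 {
            combine(byte: UInt8(truncatingIfNeeded: bits))
            bits >>= 8
        }
    }

    // Finalization: trivial for FNV-1a, but functions such as SipHash
    // run extra rounds here, which a stateless combiner has no place for.
    func finalize() -> Int {
        return Int(truncatingIfNeeded: state)
    }
}

// Hashes a pair of integers through one hasher instance.
func fnvHash(_ first: Int, _ second: Int) -> Int {
    var hasher = FNV1aHasher()
    hasher.combine(first)
    hasher.combine(second)
    return hasher.finalize()
}

print(fnvHash(1, 2) == fnvHash(1, 2)) // same input, same hash
```

A free function that merely merges two Int values has nowhere to keep this state between calls, which is the limitation described above.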
Even worse, the concept of a "good hashValue" is not universal -- some applications require secure, randomized, complex hashes, while others may be satisfied with a less secure function that is faster to compute. Baking the hash function directly into our hashValue implementation makes it harder to reuse our data types. Potentially, it also makes it harder to replace the hash function in case a bug (such as a security issue) comes to light later.
If I were to redesign hashing from scratch, I would assign the responsibility of selecting a particular hash function to the collections that rely on hashing, and change Hashable to be concerned only with task (2) above. @Vincent_Esche's HashVisitable pitch from last year is one way to do this. I'm currently experimenting with a slightly modified approach that changes Hashable directly.
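Roughly, the split could look like the sketch below. This is only loosely in the spirit of the pitch and is not its actual API; SketchHasher, SketchHashable, SimpleHasher, and bucketIndex are all names invented for this example, and the mixing step in SimpleHasher is a placeholder rather than a recommended hash function:

```swift
// Hypothetical protocol for task (1): the hash function and its state.
protocol SketchHasher {
    init()
    mutating func combine(_ value: Int)
    func finalize() -> Int
}

// Hypothetical protocol for task (2): types only feed their data to a hasher.
protocol SketchHashable {
    func hash<H: SketchHasher>(into hasher: inout H)
}

struct Point: SketchHashable {
    var x: Int
    var y: Int

    // No mixing logic here; we just list the fields that participate.
    func hash<H: SketchHasher>(into hasher: inout H) {
        hasher.combine(x)
        hasher.combine(y)
    }
}

// A deliberately simple placeholder hasher (FNV-style multiply and XOR).
struct SimpleHasher: SketchHasher {
    private var state: UInt64 = 0xcbf29ce484222325

    init() {}

    mutating func combine(_ value: Int) {
        state = (state ^ UInt64(truncatingIfNeeded: value)) &* 0x100000001b3
    }

    func finalize() -> Int {
        return Int(truncatingIfNeeded: state)
    }
}

// The collection, not the element type, chooses the hash function.
// Assumes bucketCount > 0.
func bucketIndex<T: SketchHashable, H: SketchHasher>(
    for value: T, using hasherType: H.Type, bucketCount: Int
) -> Int {
    var hasher = hasherType.init()
    value.hash(into: &hasher)
    let bits = UInt(bitPattern: hasher.finalize())
    return Int(bits % UInt(bucketCount))
}

let index = bucketIndex(for: Point(x: 1, y: 2), using: SimpleHasher.self, bucketCount: 16)
print(index >= 0 && index < 16) // prints "true"
```

With this shape, swapping in a different hash function (say, a randomized one) would mean passing a different hasher type to the collection; Point would not need to change at all.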
In any case, the Hasher concept from Vincent's pitch is a likely candidate interface for replacing _combineHashValues and _mixInt internally in the stdlib. If we want to expose the hash function used for synthesized hashing as public stdlib API, I think Hasher is the way to go!