These are really interesting points; thank you for raising them.
There is a small semantic issue here -- it isn't clear if we can allow hasher.combine(42 as Int64)
and hasher.combine(42 as Int8)
to produce different hashes.
They are different types, so I'd argue we can and should be free to do that. However, the standard library defines heterogeneous comparison operators between all fixed-width integer types. This means that (42 as Int8) == (42 as Int64)
, and having the two values produce different hashes may be confusing. (AnyHashable
and its interaction with NSNumber
produces additional complications.)
The bits:
label for the primitive combine
functions makes it clear that the bit width of integer values is significant.
The implementation of hash(into:)
for integer types is currently quite a bit more complicated than that. I hope to make hasher.combine(foo)
mean the same thing as hasher.combine(bits: foo)
for all integer types, but it will take more discussion and some extra work.
(For Int
, the current implementation does indeed end up calling combine(bits: Int)
, but this is an accidental consequence of Hasher
not implementing the full API proposed here, and we will likely need to change it once we have combine(bits: Int8)
working.)
We shouldn't be doing unnecessary work. However, as far as I can see, commutative hashing requires calculating a separate hash value for each element, and spawning a new hasher is simply the correct way to do that.
Hasher.finalize()
is not in itself slow; calling it only harms performance when the call can be eliminated.
Can you provide an example how a Hasher
protocol would make things faster here? (I'd hesitate to have Set
and Dictionary
produce weaker hash values.)
This is a good point! We changed the name from append
to combine
late in the drafting process; I think combine
is clearly better, but it does not by itself imply strict ordering. We can (and should) make up for this in the documentation. (append
is uncomfortably close to Array.append
. While Hasher.combine
and Array.append
are similar operations in an abstract sense, they actually mean completely different things.)
I agree too, and your implementation is the correct way to do it. (I.e., spawning/finalizing a separate hasher to generate the cached hash, then feeding the hash value in hash(into:)
.)
Note that caching the hash involves mutating operations, so for value types, you cannot do it directly in hash(into:)
. (This is by design, and the same applied for hashValue
.) This is not a real limitation, as you can simply force-generate the cached hash sometime before you start using the value in hashing contexts.
Hm; maybe we should simply turn the example into a demonstration of how the compiler implements hash(into:)
. It would serve the same purpose,, but it would avoid getting bogged down in questions like why wouldn't you want to hash the color of a GridPoint
.