Meaning of 'must' in `Hashable` documentation

In the documentation for Hashable.hash(into:) we currently have the following line:

The components used for hashing must be the same as the components compared in your type’s == operator implementation.

IIRC, back when Hashable was simply based on the hashValue requirement, the semantic "rule" of the protocol was something like "values which return true for val1 == val2 must also return true for val1.hashValue == val2.hashValue," which is a notably weaker guarantee.

In particular, the current semantics of Hashable would seem to prohibit implementing a sort of "fast path" version of hash(into:) which only hashes a subset of the components used for Equatable.

The top-level Hashable documentation features a somewhat confusing statement (emphasis added):

Hashing a value means feeding its essential components into a hash function, represented by the Hasher type. Essential components are those that contribute to the type’s implementation of Equatable.Two instances that are equal must feed the same values to Hasher in hash(into:), in the same order.

The final sentence seems in-line with the former, looser semantics for Hashable, but it is seemingly contradicted by other elements of the documentation.

Are the newer, stricter semantics of Hashable meant to be interpreted as I've understood them? If so, why the change?

4 Likes

See the discussion here:

In particular, @lorentey writes:

Beyond simplifying hashing, the intent of SE-0206 is to enable Swift to provide certain guarantees about its quality. In particular: as long as hash(into:) feeds enough data the hasher to unambiguously decide equality, Swift attempts to guarantee that collision attacks won't be possible . For this to work, it is critically important for Hashable implementations to include everything that Equatable.== looks at; and this is especially the case for the basic boundary types that come built-in with Swift, like Data .

So the short answer, I guess, is that the protocol attempts through its documentation to ensure a certain hash quality for conforming types. Attempting a “fast path” which does not adhere to its recommendations will degrade that quality, which has associated consequences for its downstream uses.

6 Likes

Thanks @xwu, that indeed answers my question! This excerpt is also highly relevant:

While this is not a hard requirement for user code, for boundary types provided in the stdlib/SDK, we require that hashing isn't just consistent with equality, but that it's equivalent to it. The Swift test suite has checks to actively enforce this -- this is possible through repeatedly salting the hash function. "Optimizing" hashing by omitting some of the data compared by == is generally a mistake in Swift, because it completely breaks all guarantees about the strength of hashing, and opens the door to (accidental or deliberate) collision attacks.It's perfectly acceptable to hash a gigabyte of data if someone inserts some large value (such as a big collection) as a key in a hash table. Multi-megabyte String keys are easy to protect against; hidden hashing weaknesses aren't.

Yep, makes total sense. I will note, though, that this is not expressed in the documentation as a "recommendation"—it's expressed as a "must" which, by my reading, means that types which don't satisfy this detail do not validly conform to Hashable (just as types for which == is not an equivalence relation have not validly conformed to Equatable).

Given @lorentey's statement that the equivalence between hashability and equatability is "not a hard requirement for user code," I'm still curious whether the documentation intends to express this as a recommendation or a requirement for valid conformers.

1 Like

It's also worth noting that the as-accepted wording for hash(into:)'s documentation was:

  /// Hash the essential components of this value into the hash function
  /// represented by `hasher`, by feeding them into it using its `combine`
  /// methods.
  ///
  /// Essential components are precisely those that are compared in the type's
  /// implementation of `Equatable`.
  func hash(into hasher: inout Hasher)

So it appears that this has been the rule since the dawn of SE-0206.

2 Likes