Thanks to everyone who has participated in the review thus far! The core team discussed this proposal in our meeting today. There are still a couple of days left in the review period, but we're tentatively inclined to accept this proposal, and wanted to suggest some changes based on the public review discussion and our own review:
- As @jrose noted, `init(seed:)`'s behavior may be misleading to users who expect it to enable deterministic hashing, and it has no real benefit over a collection implementation seeding the hasher directly with `combine` calls. To avoid misleading users, and to simplify the interface, we strongly suggest not including `init(seed:)` in the public `Hasher` API, instead suggesting that hashed collection implementers seed their hashers manually.
- As a further simplification, the core team observes that the many `combine(bits:)` overloads on `Hasher` that take fixed-size integers don't need to be public API. Instead, they can be `@usableFromInline` entry points invoked by `@inlinable` implementations of `hash(into:)` on the integer types themselves:
```swift
extension Hasher {
  @usableFromInline internal mutating func _combine(bits: UInt8)
  /* etc. */
}

extension UInt8: Hashable {
  @inlinable public func hash(into hasher: inout Hasher) {
    hasher._combine(bits: self)
  }
}
```
This greatly reduces the public API surface area of `Hasher`, in particular eliminating the large overload set for `combine(bits:)`, while preserving the performance characteristics of the proposed design, since using the generic `combine<H: Hashable>` method with a fixed-size integer type would naturally inline down to a call to the corresponding `@usableFromInline` entry point. This would reduce the public `Hasher` API to only `init()`, `combine<H: Hashable>(_: H)`, the `combine(bits:)` method that takes a raw buffer of hash input, and `finalize()`.
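As a rough sketch of how that reduced surface would be used, assuming the proposal's names land as described (only the generic `combine` and `finalize()` are exercised here), an incremental hashing session would look like:

```swift
// Sketch of the reduced public Hasher API in use.
var hasher = Hasher()           // init()
hasher.combine(42)              // generic combine<H: Hashable>(_: H)
hasher.combine("swift")         // same generic entry point for any Hashable
let digest = hasher.finalize()  // finalize() consumes the hasher's state

// Within a single process, feeding the same input in the same order
// yields the same digest, even though the seed is randomized per launch.
var hasher2 = Hasher()
hasher2.combine(42)
hasher2.combine("swift")
assert(digest == hasher2.finalize())
```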
With those changes, the proposal is likely to be accepted. The core team also considered many of the points raised in public review. @Gankro raised the importance of mixins for defending against collision-family attacks in hashes of collection values, and many people suggested dedicated APIs for facilitating this in response. @lorentey noted in reply that a dedicated API for this doesn't really carry its weight, nor does it shield the implementer of a collection hash from needing to know that they need mixins or what information to mix into the hasher, and the core team concurs with this point of view. Documentation will describe the importance of mixins and what to mix in when implementing a collection's hash.
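As an illustrative example of the mixin technique that documentation would describe (the `Bag` type here is hypothetical), a collection can mix its element count into the hasher before combining the elements, so that nested collections with the same flattened contents still feed distinct byte streams to the hasher:

```swift
// Hypothetical collection type; the mixin here is the element count.
struct Bag<Element: Hashable>: Hashable {
    var elements: [Element]

    static func == (lhs: Bag, rhs: Bag) -> Bool {
        return lhs.elements == rhs.elements
    }

    func hash(into hasher: inout Hasher) {
        // Mixing in the count discriminates, e.g., a bag of [[1], [2]]
        // from a bag of [[1, 2]]; without it, both would combine the
        // identical element stream 1, 2 and form a collision family.
        hasher.combine(elements.count)
        for element in elements {
            hasher.combine(element)
        }
    }
}
```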
In the same post, @Gankro also pointed out the usefulness of a one-shot hashing interface for certain families of hash function that change algorithm based on input size, and so benefit from a complete view of the hash input at once. In the proposed design, the hasher implementation is intended to be fixed and chosen by the standard library, and the current algorithm being considered does not require one-shot hashing to benefit. If a future version of Swift decided to change the hash function to one that benefited from one-shot hashing, we should be able to introduce it resiliently by adding a new requirement with a default implementation in terms of the current incremental hashing interface. In a follow-up post, @Gankro described problems Rust had doing exactly that, while acknowledging that some of those problems stem from other design choices specific to Rust's hashing scheme, particularly the ability to index their hashed containers with types other than the key type of the container itself. It isn't clear that Swift would suffer these same issues when resiliently introducing one-shot hashing, or that Swift would have cause to switch the standard hash function to one that required it, so the core team would like to err on the side of simplicity and leave it out of the current design.
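To make the resilience argument concrete, here is a hedged sketch of how such a retrofit could work. The protocol and method names below are purely illustrative stand-ins, not proposed API; the point is only that a new requirement with a default implementation in terms of the incremental interface leaves existing conformances valid:

```swift
// Illustrative stand-in for Hashable, not the real protocol.
protocol IncrementallyHashable {
    func hash(into hasher: inout Hasher)
    // Hypothetical later addition: a one-shot entry point.
    func oneShotHash() -> Int
}

extension IncrementallyHashable {
    // Default implementation in terms of the incremental interface,
    // so conformances written before the requirement existed keep
    // compiling (and, with library evolution, keep linking) unchanged.
    func oneShotHash() -> Int {
        var hasher = Hasher()
        hash(into: &hasher)
        return hasher.finalize()
    }
}
```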
@lemire raised the concern that the proposed design does not allow for rolling hash functions. @lorentey replied that `Hashable`'s primary purpose is hashed collections, and that the usefulness of rolling hashes is largely independent of that purpose, and the core team concurs.
@hlovatt and others asked how to cache hashing work in this design; doing so is less obvious and more error-prone than before. @lorentey replied, and the core team agrees, that automated caching is an orthogonal design question that could be addressed in a separate proposal, and that hash caching should in general be used sparingly, as it's rarely a win in practice.
Many people raised concerns about the naming of methods in the proposed design. In particular, @nnnnnnnn noted that perhaps `hash(into:)` ought to be deemphasized as a static method, to avoid polluting documentation and code completion, and many people raised the issue that `combine` may imply order insensitivity. The core team feels that the documentation/code-completion pollution problem is a larger issue that deserves more holistic consideration. While many alternative names for `combine` were suggested, the core team did not find any of them significantly clearer, and is inclined to accept the names as is.
The core team considered the question of whether the old `hashValue` requirement could be eliminated altogether in a future version of Swift. There are significant source compatibility concerns in doing so; @blangmuir also lays these concerns out nicely in his post. Deprecating `hashValue` would not affect source compatibility with existing Swift 3 or 4 code while discouraging new code from being written to implement it, but the protocol requirement cannot be completely removed without breaking source compatibility, since there is no perfect automated migration path. However, a special case in the `Hashable` synthesis logic may be possible: in theory, the compiler could see that a type is declared to conform to `Hashable` with only a `hashValue` implementation provided, and in that situation synthesize a `hash(into:)` implementation in terms of `hashValue`. It would be nice to be able to eliminate `hashValue` entirely from the API and ABI of the standard library, but it isn't worth breaking source stability for, and the core team does not want to gate acceptance of this proposal on even more compiler work to handle special-case compatibility hacks.
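For a type like the hypothetical one below that only provides `hashValue`, the forwarding shim such synthesis could produce is straightforward; it is shown here written by hand, which has the same effect today:

```swift
struct LegacyID: Hashable {
    var raw: Int

    // Pre-existing, Swift 4-style implementation.
    var hashValue: Int { return raw }

    // What the special-case synthesis could produce: hash(into:)
    // defined in terms of the user's existing hashValue.
    func hash(into hasher: inout Hasher) {
        hasher.combine(self.hashValue)
    }
}
```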
Thanks again to everyone who's given feedback in the review discussion. The review period is still scheduled to continue through this Friday, so if you have any other concerns to raise about the proposal or responses to the core team's decisions thus far, you still have time to provide them, and your feedback is welcome.