[Accepted] SE-0206: Hashable Enhancements

From that point of view, “stir in” and “feed from” aren’t very clear about implying order. You can make an argument in their favor, but having to make an argument to end users is the thing we’re trying to avoid.

What about refine(using:)? Each step of a refinement depends on the previous intermediate result.


What is your evaluation of the proposal?
+1

Is the problem being addressed significant enough to warrant a change to Swift?

Does this proposal fit well with the feel and direction of Swift?

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?
I read the proposal and have followed the discussions.

I think that the shed might be best colored combine(next:)

I think if the code is mildly restructured you can get some benefits:

  1. Performance improvements by not copying hashers with a large state (use a class).
  2. Prevent/discourage people from calling finalize.
  3. Provide a selection of hashing algorithms.
  4. Enable caching of expensive hashes.

The restructuring I am suggesting is:

protocol MyHashable { // As before, except not an `inout` since `Hasher` is a class. Spelt `MyHashable` to allow example code to run.
    func hash(into: Hasher)
}

// Similar to before, but without `finalize` or `init` since the user shouldn't call these. Must be a class since hashers can have a large internal state and therefore should not be bitwise copied.
protocol Hasher: AnyObject {
    func combine(bits: UInt64) // Only combine methods.
    // Other types of combine ...
}

// Mixin `finalize` and `init`.
protocol HasherImplementation: Hasher { // A HasherImplementation is a Hasher.
    init() // No arguments, since one use is as a default hasher. Seed using combine.
    func finalize() -> UInt64 // Good to be same type as Random, so hashing and random interoperate.
}

enum Hashers { // Library of hash functions.
    typealias Default = Sip
    final class Sip: HasherImplementation { // *Not* real SipHash but instead something simple to post!
        private var hash: UInt64 = 211 // Real seed is longer!
        func finalize() -> UInt64 {
            // Real code ...
            return hash
        }
        func combine(bits: UInt64) {
            hash = 499 &* hash &+ bits // Real code more involved!
        }
        // Other types of combine ...
    }
}

/// Protocol that enables an implementing class to track modifications so that something that uses hashes can cache the hash.
protocol CacheableHash: AnyObject, MyHashable { // Only classes!
    var modificationCount: UInt64 { get } // Incremented each time a modification is made.
}

// A user of a hashing algorithm (set, dictionary, etc.) would normally support caching by using this class rather than the `Hasher` directly.
final class HashCache {
    private struct Cached {
        weak var element: CacheableHash?
        let modificationCount: UInt64
        let hash: UInt64
    }
    private var cache = [ObjectIdentifier: Cached]() // Would be a sorted dictionary ideally!
    private let hasherFactory: () -> HasherImplementation
    private var numberOfHashes = 0
    init(hasherFactory: @escaping () -> HasherImplementation = { Hashers.Default.init() }) {
        self.hasherFactory = hasherFactory
    }
    func hash(for element: MyHashable, withSeed seed: UInt64 = 0) -> UInt64 {
        defer {
            if numberOfHashes == 131 {
                numberOfHashes = 0
                for entry in cache where entry.value.element == nil {
                    cache.removeValue(forKey: entry.key) // Remove collected elements.
                }
            } else {
                numberOfHashes += 1
            }
        }
        guard let e = element as? CacheableHash else {
            let hasher = hasherFactory()
            hasher.combine(bits: seed)
            element.hash(into: hasher)
            return hasher.finalize()
        }
        let id = ObjectIdentifier(e)
        if let cached = cache[id] {
            if cached.element == nil {
                cache.removeValue(forKey: id)
            } else if cached.modificationCount == e.modificationCount {
                return cached.hash
            }
        }
        let hasher = hasherFactory()
        hasher.combine(bits: seed)
        e.hash(into: hasher)
        let hash = hasher.finalize()
        cache[id] = Cached(element: e, modificationCount: e.modificationCount, hash: hash)
        return hash
    }
}

The Hashable protocol, called MyHashable above to prevent name clashes, has a hash(into:) method that accepts a class and therefore is not an inout parameter. This is more efficient for hashers that have a large state.

The Hasher protocol only has combine methods, not init or finalize, thus preventing accidental calls to something you shouldn’t use. As noted above, hashers are classes.

HasherImplementation mixes in init and finalize.

Hashers is a library of hash functions with a nominated default hasher. Initially probably only Sip would be provided and hence Sip would be the default.

CacheableHash is a protocol that you implement if you want the hash cache to cache the hash value because it is expensive to calculate. CacheableHash requires the implementer to be a class because the cache needs to track the objects whose hashes it has cached.

HashCache wraps a HasherImplementation and adds caching for CacheableHash types. You would normally wrap a HasherImplementation in a HashCache because HashCache works with both types that implement CacheableHash and types that don’t, i.e. it is universal. You can choose which hasher to use with HashCache, but by default it uses Hashers.Default. The code for HashCache uses a dictionary; in practice a sorted dictionary would be used (improved performance and less space).

To demonstrate how to use this API, a simple Bloom filter and some tests that demonstrate HashCache with both cached and non-cached types are given below:

/// Example of something that can cache hashes.
struct Bloom64<T> where T: MyHashable {
    private var hashersSeeds = zip([HashCache(), HashCache(), HashCache()], [1021, 2053, 4099] as [UInt64])
    private var bloom: UInt64 = 0
    mutating func insert(_ element: T) {
        for hash in hashes(for: element) {
            bloom |= hash
        }
    }
    func contains(_ element: T) -> Bool {
        for hash in hashes(for: element) {
            guard bloom & hash != 0 else {
                return false
            }
        }
        return true
    }
    private func hashes(for element: T) -> [UInt64] {
        return hashersSeeds.map { hasherSeed in
            (1 as UInt64) << (hasherSeed.0.hash(for: element, withSeed: hasherSeed.1) & 0x3F)
        }
    }
}

// Make `UInt64` conform to `MyHashable` for testing of cheap hashes.
extension UInt64: MyHashable {
    func hash(into: Hasher) {
        into.combine(bits: self)
    }
}
var bi = Bloom64<UInt64>()
bi.insert(1 as UInt64)
print(bi.contains(1 as UInt64)) // T
print(bi.contains(0 as UInt64)) // F

// Example of an expensive class that has its hash cached.
class Expensive: CacheableHash {
    private(set) var modificationCount: UInt64 = 0
    private var state: UInt64 // Real code would contain a lot of state!
    init(_ initial: UInt64) {
        state = initial
    }
    func hash(into: Hasher) {
        print("Calculating hash for \(state) with count of \(modificationCount)")
        into.combine(bits: state) // Real code would be an extensive data structure to hash.
    }
    func modify() {
        state += 1
        modificationCount += 1 // Any 'mutating' method *must* increment count.
    }
}
var be = Bloom64<Expensive>()
let e0 = Expensive(0)
let e1 = Expensive(11)
be.insert(e1) // "Calculating hash for 11 with count of 0" x 3 for e1
print(be.contains(e1)) // No hash calculation because the cached value for e1 is used, T
print(be.contains(e0)) // "Calculating hash for 0 with count of 0" x 3 because 1st time for e0, F
e1.modify() // Invalidates e1's cached hashes.
print(be.contains(e1)) // "Calculating hash for 12 with count of 1" x 3 because e1 modified, F

I think it is a modest change that has great benefits. What do you think?



@lorentey you posted an example of an enum with associated values somewhere above, but I couldn’t quite follow what you said about it then. Is it reasonable to prefix each case with an index-like value when creating a new hash value?

I wonder if this proposal will eventually allow us to synthesize the hash(into:) function for enums with associated values, which is currently missing. I couldn’t find the reasoning why synthesizing the hash value for enums with associated values is source breaking. If someone can clarify this here or send me a PM, I’d very much appreciate it.

In my code I use a very ugly solution, because I didn’t come up with anything better and I know that at least this solution works as expected:

enum Foo : Hashable {
  case a
  case b
  case c(Int)
  case d(Int, String)

  var hashValue: Int {
    switch self {
    case .a:
      return "a".hashValue
    case .b:
      return "b".hashValue
    case .c(let intValue):
      return "c-\(intValue)".hashValue
    case .d(let intValue, let stringValue):
      return "d-\(intValue)\(stringValue)".hashValue
    }
  }

  static func == (lhs: Foo, rhs: Foo) -> Bool { ... }
}

I really liked the example of yours which associated an index to each case before consuming the associated values of the enum case.

Yes! Feeding discriminator values to the hasher is the correct way to do it.
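A minimal sketch of that pattern, assuming the proposed hash(into:) API (the Token enum and its case ordinals are hypothetical): each branch feeds a case discriminator before its associated values, so equal payloads in different cases don’t collide trivially.

```swift
enum Token: Hashable {
    case number(Int)
    case word(String)

    func hash(into hasher: inout Hasher) {
        switch self {
        case .number(let n):
            hasher.combine(0) // discriminator for .number
            hasher.combine(n)
        case .word(let s):
            hasher.combine(1) // discriminator for .word
            hasher.combine(s)
        }
    }
}
```

(== is still synthesized for the enum, so only the hashing side needs to be written by hand.)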

I have good news! This was included in SE-0185, and it is already fully supported in the current compiler. This proposal merely updates generated code to use hash(into:). For example, this works just fine:

enum Foo: Hashable {
  case a(Int, String?, [Double])
  case b([String: Bool])
  case c(Set<Foo>)
}

SE-0185 does require that associated values implement Hashable. (Note that the example above makes use of conditional Hashable conformances introduced by SE-0143 that did not make it into the 4.1 release but are implemented on the master branch.)
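To spell out what the synthesis buys you, here is a small sketch (the Payload enum is hypothetical) that relies entirely on the compiler-generated conformance; no hashValue, hash(into:), or == needs to be written:

```swift
// Both payload types ([Double] via conditional conformance, [String: Bool])
// are Hashable, so the whole enum's conformance is synthesized.
enum Payload: Hashable {
    case a(Int, String?, [Double])
    case b([String: Bool])
}

let stored: Set<Payload> = [.a(1, nil, [2.5]), .b(["x": true])]
```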


To me, this seems like an extremely significant change: it replaces a single struct with three protocols, a caseless enum, and two classes. It may seem like a surface-level change to you, but it significantly complicates the hashing API. The new hash(into:) and (especially!) HashCache would also very likely not benchmark the way you expect.

We did carefully consider making Hasher a protocol. The proposal explains why we decided to go with a simple struct instead. There may be a use case for custom Hashers, but it hasn’t come up in review discussions so far.

The Bloom filter you’ve shown does not utilize multiple Hasher implementations, so it argues more against generalizing Hasher than in favor of it. This is as it should be; we expect that the stdlib’s Hasher implementation will be good enough to support Bloom filters – there is no reason to implement custom Hashers for that. (Note that Bloom64 allocates a shockingly large amount of space to cache hash values; I believe this goes directly against the spirit of bloom filtering, and some of that space would be much better used as additional state space for the filter itself. The current version of insert/contains also yields a much higher rate of false positives than what Bloom suggested.)

The idea to partially automate hash caching is intriguing, but it seems like it is orthogonal to this proposal, and it can be built on top of it later. It also adds a brand new concept for tracking mutations that seems non-obvious enough to deserve its own separate discussion thread.

(I’d like to note again that, in order to improve hash table performance, we can’t reuse hash values across Set and Dictionary instances (or even during the lifetime of a single collection instance). So caching full raw hash values would rarely be helpful for these collections.)

This is a remarkably confident assertion. It’s difficult to correctly reason about performance without actually comparing benchmarks. I strongly suspect you may be overestimating the cost of passing around an inout struct, and underestimating the cost of allocating/deallocating a class instance or of passing around a protocol existential. Performance considerations weren’t the only reason we chose the design we proposed, but they were indeed quite significant – and the hundreds of benchmarks I had run over the past few months were decidedly in favor of an interface that uses inout structs.


Er, I don’t quite follow. What exactly is the problem with having, say, an instance of Set cache its own hash value? (Or more accurately, cache its “prehash”, namely the xor of the hash values of its elements.)

Any mutation would clear the cached hash so it can never become stale, and I doubt that setting an optional integer to nil has any significant performance implications relative to mutating a set.

The prehash is only used in the set’s own hash(into: Hasher) implementation, so it never “leaks”. Instantiating a new set using the elements of an existing set doesn’t look at either set’s prehash.

Storing one set into another calls the inner set’s hash(into:) method with the outer set’s hasher, exactly as expected. And instantiating a new set using the elements of the outer set involves calling the inner set’s hash(into:) method with the new set’s hasher. So the inner set’s cached prehash gets reused successfully, while the outer sets are free to maintain independent hashers.

• • •

Edit: Scratch all that. We can’t cache hashes (or even prehashes) because the elements could be classes, which can be mutated without it being “mutating”.

• • •

Edit 2: Wait, if a class instance can have its hash value change out from under a Set, that poses larger problems even without caching. How do we handle sets where the element is a class today?

• • •

Edit 3: It appears we do not handle that situation today. The set claims not to contain the mutated class instance, although it does show up during iteration.
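A sketch of that failure mode (Box is a hypothetical class; whether the post-mutation lookup misses depends on bucket layout, so it isn’t asserted): mutating the instance changes its hash while it sits in the table, so membership tests probe the wrong bucket even though iteration still visits it.

```swift
final class Box: Hashable {
    var value: Int
    init(_ value: Int) { self.value = value }
    func hash(into hasher: inout Hasher) { hasher.combine(value) }
    static func == (lhs: Box, rhs: Box) -> Bool { return lhs.value == rhs.value }
}

let box = Box(1)
let boxes: Set<Box> = [box]
box.value = 2 // mutated while stored in the set

// `boxes.contains(box)` now hashes the new value and will usually miss,
// but iterating the set still visits the stored instance.
let visitedDuringIteration = boxes.contains(where: { $0 === box })
```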

Hold your horses! :wink: I was talking about the specific idea of reusing the raw hash values across Set/Dictionary instances, not about hash caching in general. Hash tables must choose a new seed when they resize themselves, to prevent unacceptable performance regressions. (This is unrelated to collision attacks.)

Using the memoized result of a secondary hash function to reduce the number of bytes fed into Hasher in hash(into:) is a perfectly valid optimization. (That being said, it does have some fundamental correctness tradeoffs, and I believe it should only be used sparingly, and with copious amounts of justification in the form of before/after profiling data.)
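A sketch of that optimization (BigBlob and its secondary hash are hypothetical): the digest is computed once at initialization, and hash(into:) feeds only the memoized digest to the Hasher instead of every byte.

```swift
struct BigBlob: Hashable {
    let bytes: [UInt8]
    private let prehash: Int // memoized; safe because `bytes` is immutable

    init(_ bytes: [UInt8]) {
        self.bytes = bytes
        var h = 0
        for b in bytes { h = 31 &* h &+ Int(b) } // hypothetical secondary hash
        self.prehash = h
    }
    static func == (lhs: BigBlob, rhs: BigBlob) -> Bool { return lhs.bytes == rhs.bytes }
    func hash(into hasher: inout Hasher) {
        hasher.combine(prehash) // feed the digest, not all the bytes
    }
}
```

Correctness is preserved because equal values produce equal digests, but note the tradeoff mentioned above: the secondary hash weakens the guarantees of the seeded Hasher, so this should be justified with profiling.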

There is nothing wrong with that, except it would needlessly slow down mutations (especially bulk removals) for Sets that will never be hashed – which I expect to be the usual case. Much less importantly, it also interferes with using Sets as keys in advanced hashing scenarios, like Bloom filters.

Keeping an up-to-date hash cache for Sets is still an important optimization; it’s just probably better done outside of the standard library. (But if the performance tradeoff seems worth it, the standard Set may end up doing it too. This is an implementation detail that is independent of the proposal we’re discussing.)

No, it’s a serious programming error to do that.

While this is an interesting topic, it too seems unrelated to the proposal at hand.

Cocoa handles this by defining the NSCopying protocol, and making a copy of all keys inserted into NSSet/NSDictionary instances. Foundation differentiates between mutable/immutable classes, where a copy of a mutable class gets to be immutable. This works fine within types defined in Foundation, but it’s just a coding convention, not a language-enforced rule. You’re free to ignore this convention when you’re defining your own NSObject subclasses, and then you’ll run into the same problem.

Swift chose to handle the same issue by emphasizing value types over reference types. Value types are literally impossible to mutate while they’re stored in Set/Dictionary's hash table. Notably, this is enforced by the type system of the language itself, rather than just a convention.
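A small illustration of that guarantee (Point is a hypothetical struct): inserting a value type copies it, so later mutations of the local variable cannot disturb the hash table.

```swift
struct Point: Hashable {
    var x: Int
    var y: Int
}

var p = Point(x: 1, y: 2)
let points: Set<Point> = [p] // the set stores an independent copy
p.x = 99                     // mutates only the local variable
```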


Okay, that makes sense.


The same issue arises for a struct that contains a class instance. If the class instance is visible anywhere outside the struct (either through another reference, or even as a read-only property on the struct itself), the hash value of the struct can be changed out from under a set it is in. This applies even if the struct uses copy-on-write, because no write to the struct is occurring.

And yes, this is not related to the subject of the current thread, except that it came to mind while discussing cached hashes.


This review is focused only on source compatibility. I read that section in depth and skimmed the rest of the proposal. I did not read all of the review thread or previous discussion.

When upgrading to Swift 4.2, Hashable types written for earlier versions of Swift will need to be migrated to implement hash(into:) instead of hashValue. There are two options for doing this:

IIUC, we will synthesize the new hash(into:) based on the user’s hashValue implementation when using the pre-4.2 language mode. Why not do the same in 4.2 and deprecate hashValue instead of forcing users to update it immediately? A warning combined with excellent fixits is a great way to get everyone to migrate over time, but it’s painful for it to become an error in the same release that the new functionality is added. The bar for making it an error in 4.2 should be that we can migrate it fully automatically, but…

The compiler should simplify the migration process by providing fix-its for both options. A fix-it to remove hashValue should only be provided if automatic synthesis is available. For the second option, it would be nice to have the compiler suggest a full implementation of hash(into:), but a template would probably suffice:

… I believe this is not sufficient for automatic migration. We will not be able to ensure that the behaviour of the program doesn’t change with either of these options. I think we really need the user to make the decision to change their hashing.


Thank you for bringing this up; ease of migration is a hugely important issue.

While the compiler would produce a deprecation warning in 4.2 mode, synthesis would continue to work, and no code would need to be actually changed until/unless hashValue is actually removed from the protocol, which is not going to happen soon (if ever).

(It may also be a good idea to defer hashValue's formal deprecation till a future language version. We’d like to encourage people to implement hash(into:), but there may be gentler ways of doing that than formally deprecating hashValue. However, it seems like a bad idea to support both requirements indefinitely – we do want to get rid of the legacy interface at some point.)

I agree wholeheartedly. It may be helpful to provide it as an interactive fix-it attached to the deprecation warning, though.

The proposal doesn’t cover fully automated migration of existing hashValue implementations; it seems difficult to do that correctly in the most general case (with complications like caching, partial hashing, etc.). However, automatic migration may be possible for the simpler cases.

For example, (in theory at least) the migrator could recognize that the hashValue implementation below calls the getters of all stored properties without touching anything else, and it could replace it with a hash(into:) implementation or remove it altogether:

struct Foo: Hashable {
  var a: Int
  var b: Double

  var hashValue: Int { return a.hashValue ^ 31 &* b.hashValue }
  static func ==(l: Foo, r: Foo) -> Bool { return l.a == r.a && l.b == r.b }
}

(The same goes for the == implementation.) Of course, the “without touching anything else” part may not be easy to formally verify. (The code does call the hashValue properties along with a bunch of Int operations. Do we implement a whitelist?)

This migration would still change the meaning of the code in evil cases like the one below, where a.hashValue is canceled out from the returned hash value:

var hashValue: Int {
  var hash = 0
  hash ^= a.hashValue
  hash = 31 &* b.hashValue
  return hash
}

I’d argue a human migrator would probably not fare any better, and I’d also argue the original intention behind this code was probably to hash both fields; the second xor was simply lost somewhere along the way.

As an improvement to this, the migrator could recognize partial hashing, and try to do the right thing there, too.

Thanks to everyone who has participated in the review thus far! The core team discussed this proposal in our meeting today. There are still a couple of days left in the review period, but we’re tentatively inclined to accept this proposal, and wanted to suggest some changes based on public review discussion and our own review:

  • As @jrose noted, init(seed:)'s behavior may be misleading to users who expect it to enable deterministic hashing, and has no real benefit over a collection implementation seeding the hasher directly with combine calls. To avoid misleading users, and to simplify the interface, we strongly suggest not including init(seed:) in the public Hasher API, instead suggesting that hashed collection implementers seed their hashers manually.

  • As a further simplification, the core team observes that the many combine(bits:) overloads on Hasher that take fixed-sized integers don’t need to be public API. Instead, they can be @usableFromInline entry points invoked by @inlinable implementations of hash(into:) on the integer types themselves:

    extension Hasher {
      @usableFromInline internal mutating func _combine(bits: UInt8)
      /* etc. */
    }

    extension UInt8: Hashable {
      @inlinable public func hash(into hasher: inout Hasher) {
        hasher._combine(bits: self)
      }
    }

    This greatly reduces the public API surface area of Hasher, in particular eliminating the large overload set for combine(bits:), while preserving the performance characteristics of the proposed design, since using the generic combine<H: Hashable> method with a fixed-sized integer type would naturally inline down to a call to the corresponding @usableFromInline entry point. This would reduce the public Hasher API to only init(), combine<H: Hashable>(_: H), the combine(bits:) method that takes a raw buffer of hash input, and finalize() as public API.

With those changes, the proposal is likely to be accepted. The core team also considered many of the points raised in public review. @Gankro raised the importance of mixins for defending against collision family attacks in hashes of collection values, and many people suggested dedicated APIs for facilitating this in response. @lorentey noted in reply that dedicated API for this doesn’t really carry its weight or shield the implementer of a collection hash from needing to know that they need mixins or what information to mix into the hasher, and the core team concurs with this point of view. Documentation will describe the importance of this practice.

In the same post, @Gankro also pointed out the usefulness of a one-shot hashing interface for certain families of hash function that change algorithm based on input size, and so benefit from a complete view of the hash input at once. In the proposed design, the hasher implementation is intended to be fixed and chosen by the standard library, and the current algorithm being considered does not require one-shot hashing to benefit. If a future version of Swift decided to change the hash function to one that benefited from one-shot hashing, we should be able to introduce a new requirement resiliently by adding a new requirement with a default implementation in terms of the current incremental hashing interface. In a follow-up post, @Gankro described problems Rust had doing exactly that, albeit acknowledging that some of those problems are due to other Rust-specific design choices in their hashing design, particularly the ability to index their hashed containers with types other than the key type of the container itself. It isn’t clear that Swift would suffer these same issues resiliently introducing one-shot hashing, or that Swift would have cause to switch the standard hash function to one that required it, so the core team would like to err on the side of simplicity and leave it out of the current design.

@lemire raised the concern that the proposed design does not allow for rolling hash functions. @lorentey replied that Hashable's primary purpose is hashed collections, and that the usefulness of rolling hashes is largely independent of Hashable's intended purpose, and the core team concurs.

@hlovatt and others asked how to cache hashing work in this design; it’s more non-obvious and error-prone to do so than before. @lorentey replied, and the core team agrees, that automated caching is an orthogonal design question that could be added in a separate proposal, and that hash caching should in general be used sparingly as it’s rarely a win in practice.

Many people raised concerns about the naming of methods in the proposed design. Particularly, @nnnnnnnn noted that perhaps hash(into:) ought to be deemphasized as a static method, to avoid polluting documentation and code completion, and many people raised the issue that combine may imply order insensitivity. The core team feels that the documentation/code completion pollution problem is a larger issue that deserves more holistic consideration. While many alternative names for combine were suggested, the core team did not find any of them significantly clearer. The core team is inclined to accept the names as is.

The core team considered the question of whether the old hashValue requirement could be eliminated altogether in a future version of Swift. There are significant source compatibility concerns in doing so; @blangmuir also lays these concerns out nicely in his post. Deprecating hashValue would not affect source compatibility with existing Swift 3 or 4 code while discouraging new code from being written to implement it, but the protocol requirement cannot be completely removed without breaking source compatibility, since there is no perfect automated migration path. However, a special case in the Hashable synthesis logic may be possible; in theory, the compiler could see that a type is declared to conform to Hashable with only a hashValue implementation provided, and in that situation, the compiler could then synthesize a hash(into:) implementation in terms of hashValue. It would be nice to be able to eliminate hashValue entirely from the API and ABI of the standard library, but it isn’t worth breaking source stability for, and the core team does not want to gate acceptance of this proposal on even more compiler work to handle special-case compatibility hacks.
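As an illustration of that special case, here is what the synthesized shim could amount to, written by hand (Legacy is hypothetical, and this is hand-written, not compiler output): the derived hash(into:) simply feeds the legacy hashValue to the hasher.

```swift
struct Legacy: Hashable {
    var id: Int

    // The pre-4.2 requirement the author actually wrote.
    var hashValue: Int { return id &* 31 }

    // What the compiler could synthesize from it.
    func hash(into hasher: inout Hasher) {
        hasher.combine(self.hashValue)
    }

    static func == (lhs: Legacy, rhs: Legacy) -> Bool { return lhs.id == rhs.id }
}
```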

Thanks again to everyone who’s given feedback in the review discussion. The review period is still scheduled to continue through this Friday, so if you have any other concerns to raise about the proposal or responses to the core team’s decisions thus far, you still have time to provide them, and your feedback is welcome.


Just out of curiosity, do you have a bottom-line estimate of the performance cost of making Hasher a protocol rather than a unitary struct?

Minor question that I don’t see addressed in the proposal: will Hasher itself conform to Hashable?

It’s not obvious to me why it would. Is there a use case you had in mind?

I really like the proposal but the one feature that I would like is a way to explicitly control the hash value given by finalize() rather than being forced into the underlying implementation used by Hasher.

The use case I see for this is when working with non-Swift tools (such as C, Python, etc.) which can have hash values. Being able to have consistent hash values of a datatype in each realm is very handy.

For example if I have this in a C header:

struct Foo {
   int bar;
   int baz;
};

int ComputeHashOfFoo(struct Foo foo);

I would personally want the hash value that I compute in C land to be the same as in Swift if I added an extension like this:

extension Foo: Hashable {
    func hash(into hasher: inout Hasher) {
        hasher.fix(to: Int(ComputeHashOfFoo(self)))
    }
}

Also, I’m not at all attached to the name fix(to: Int) or just a function like this being the interface to accomplish this task, it is just a quick example.

I imagine this isn’t a very common use case so to discourage the use of fixing the value it should probably be named something like unsafelyLockFinalizedValue(to: Int) or something along those lines. My brain is a little too fried to think of better naming at the moment…

It could, and it would be cool, but you’d need something that differentiates two Hashers in a way that is amenable to hashing.
This could be accomplished by extracting a representation of each Hasher’s hashing algorithm using some advanced reflection feature. Then, you’d feed that representation to a Hasher and get their hash values.

I don’t think something like that exists in Swift atm, but damn that would be cool to make someday. A hash table with Hashers as keys? Hell yeah, why not :D

I wanted to raise a couple of questions about the design of finalize that came up in discussions yesterday:

First of all, a recurring theme in this discussion was the finalize method and the fact that, as proposed, it invalidates the Hasher value it’s invoked on so that further combine operations trap. There’s a straightforward answer to this problem in the language today: finalize could be made into a nonmutating consuming method. In Swift today, that would prevent the finalize operation from invalidating the hasher, since it would receive a copy of the hasher state instead of modifying the hasher in-place. Improperly invoking finalize on an inout hasher argument would give you a hash value based on the current hasher state up to this point, but not invalidate the hasher for future use. Looking forward to the proposed ownership model for Swift, a hash(into:) implementation could mark the hasher argument as move-only (and maybe Hasher itself could be made move-only or explicit-copy-only conditional on Swift language version), making it so that you wouldn’t be allowed to finalize during a hash(into:) operation, since you can’t consume an inout-borrowed value. It’s debatable whether silently providing an incomplete hash value is a better failure mode than trapping after improper use of finalize in the short term, but this design choice could lead to the correct long-term behavior further down in Swift’s roadmap.

Next, I’d like to suggest the idea of using a stronger type for the return value of finalize. @lorentey pointed out a potential problem if in the future we added support for one-shot hashing, since a one-shot hashOnly has the potential to be misimplemented by ignoring its hasher and simply returning a fixed value:

struct MyType: Hashable {
  func hashOnly(into hasher: Hasher) -> Int {
    // We ought to use the hasher to generate a hash so that we factor in its seed,
    // but nothing's forcing us to:
    return 1738
  }
}

This is even worse than hashValue since we’d expect hashOnly to return a pre-seeded value, so the raw hash value would be accepted blindly instead of getting mixed in with the seed like a plain hashValue would. We could use the type system to direct use here, by wrapping up the final hash value in a struct that only the standard library Hasher can construct:

@fixedLayout public struct HashValue {
  public let value: Int

  // Only usable by Hasher. Could be public API with a sufficiently scary name?
  fileprivate init(_rawHashValue: Int) { self.value = _rawHashValue }
}

That way, hashOnly or another API that is intended to return a strong hash value can return the HashValue type instead of a plain Int to ensure that the result in fact came from a Hasher.


@lorentey Thanks for commenting.

I do consider the change I suggested as a surface change, primarily because it retains the innovative step of providing the hasher instead of receiving a hash value.

Actually I expect them currently to be slower because of retain/release overhead; however, once we get the ownership manifesto and can mark the hasher as @nonescaping (or whatever it is called), then I would expect that, as in other languages, the cost of passing a class will be lower than passing a largish struct.

I can think of lots of uses for custom hashers:

  1. Implementations of Bloom filters often provide many different hashing functions, e.g. https://github.com/Baqend/Orestes-Bloomfilter provides 10+.
  2. You may well want to know exactly which hasher is used; for example, many Bloom filter implementations choose Murmur as the best compromise between speed and hash quality.
  3. A library of hashers could, maybe not at first but later on, contain common message hashers like MD5 and SHA-1.

The Bloom filter I wrote was just an example; I don’t consider it serious code. I wanted to demonstrate that the proposed API would enable more exotic uses of hashers whilst still enabling caching and not requiring tons of boilerplate code. As noted above, there are many useful hashers.

Yes, it could be a separate proposal.

Two points:

  1. The caching shown is far from optimal, to keep the post short, as noted in the comments. In practice a SortedDictionary would perform better than the example’s Dictionary. (But SortedDictionary doesn’t exist yet!)
  2. The extra space is only used for caching; if nothing caches then the overhead would be small (see point 1). If something is cached then yes, you are trading off space for speed, but that is what the programmer has decreed worthwhile.

In summary: I think the main point is the restructuring to enable future growth and to anticipate the implementation of the ownership manifesto, and I agree that the caching could be separate.

Thanks for putting all your work into your proposal, it is a really worthwhile improvement to Swift.


After sleeping on this, I’m getting increasingly convinced we should just embrace that one-shot hashing provides an unsafe escape hatch from Hasher:

protocol Hashable {
  func hash(into hasher: inout Hasher)
  func unsafeHashValue(seed: (UInt64, UInt64)) -> Int
}

extension Hashable {
  public func unsafeHashValue(seed: (UInt64, UInt64)) -> Int {
    var hasher = Hasher(seed: seed)
    self.hash(into: &hasher)
    return hasher.finalize()
  }
}

In addition to one-shot hashing, this would also allow people to experiment with new hash functions without having to change the stdlib. (unsafeHashValue is essentially the same as the hashValue property, but with a seed.)

(To be clear, we should not do this now. Adding this as public API would be a separate proposal. But introducing a HashValue type now may turn out to be overly cautious later.)
