Let Optional, Dictionary and Array conditionally conform to Hashable

Unfortunately this feature came too far into the 4.1 beta period to be included in that release; however, it's now implemented in the current 4.2 snapshots.

With the Core Team's approval, we amended SE-0143 to explicitly cover Hashable conformances (for the same types that are conditionally Equatable), and the implementation has now landed on the master branch (thanks to @Morten_Bek_Ditlevsen and @xwu who provided the initial implementation).

There is room for potential future work; in particular, it may be worth drafting a proposal to conditionally implement Equatable and Hashable for the following types, that aren't covered by SE-0143:

  • Slice
  • AnyCollection
  • DictionaryLiteral
  • CollectionOfOne
  • EmptyCollection
  • anything else we're currently missing

Some of these are not obviously a good idea -- we've already touched on Slice during earlier discussions, but it may be worth revisiting it, if only to summarize what we learned. AnyCollection raises similar issues; it's worth discussing it separately. (Hashable conformances of these are slightly controversial when the underlying collection type does not prescribe a particular ordering for its elements -- such as Set and Dictionary.)

3 Likes

(missed this; wasn't aware it moved past 'pitch'. 2 drive-by comments...)

  1. It's probably worth asking the Foundation folks about performance. For example, NSDictionary's hash value is just its count. I guess they decided it wasn't worth hashing the contents? Or perhaps only under certain conditions?

    ([1: "hello", 2: "goodbye"] as NSDictionary).hashValue // returns 2
    ([3: "yo", 4: "dude"] as NSDictionary).hashValue // returns 2
    ([5: "seriously", 6: "its", 7: "just", 8: "count"] as NSDictionary).hashValue // returns 4
    
  2. I think we could lower the collection requirements so their elements only have to be Equatable. We can provide a specialised override if you want to hash the contents for a better result (if the cost is worth it). Or am I wrong?

    extension Dictionary: Hashable where Value: Equatable {
      var hashValue: Int { return count } // Basic hash.
    }
    extension Dictionary where Value: Hashable {
      var hashValue: Int { /* fancy stuff */ }
    }
    

Array etc. are already copy on write so no big deal to cache hash.

We did discuss this with the Foundation team during PR review; see https://github.com/apple/swift/pull/15382

We hash the contents because it provides the best performance for large hash tables. For small hash tables, this is not necessarily the case; however, the correct place to start optimizing that is in the implementation of Set and Dictionary.

NSDictionary.count is a terrible hash value. Unfortunately, content-based hashing is problematic in Foundation because it doesn't align well with Cocoa's mutation model. (NSArray elements and NSDictionary values aren't copied when they're placed in the collection.) This is not the case in Swift, where stdlib collections are COW value types -- it's impossible to mutate a hash key. (Unless it includes a mutable reference type. We don't currently have a nice solution for that, and this change does not materially change that.)

Yeah I'm not suggesting it's great, only that, IMO, Hashing is a sort of rough pre-Equatable check. That's why hash collisions are a normal (if undesirable) fact of life but == collisions are not. Even if you can't or don't want to hash the contents of the collection, you can at least encode information about its shape.

I actually wonder about NSDictionary's implementation - it copies its keys, so surely it could derive a hash at least from those. It seems to me like even when the values aren't hashable, we could still do better than count.

I'm not an expert on the mathematics of hashing, but I've always been taught that it's a tradeoff between speed and accuracy - and that you should almost always optimise for speed and leave accuracy to ==. That's why it seems strange to me that we require accuracy - I think it's a 'bonus', in that it might allow us to generate a better value, but not critical to the concept of hashing a collection in general.

It doesn't really bother me all that much (Hashable is easy to implement, and the new hash(into:) system will make it even easier), it's just that I browsed the discussion (!!!) and everybody seemed to just take it for granted that a collection's elements must be Hashable for the collection itself to be.

This is one of the most treacherous misconceptions about hashing; I got burnt by it myself. It also came up in the PR, but let me make another attempt at dispelling it here.

The count of a general-purpose collection makes a terrible hash value, period. It literally wipes away the magic sauce that makes hash tables work.

There is a really simple Law of Hashing; it states that we must feed the hash function with all the bits we use to decide equality. There are no ifs or buts about this -- hashing stops working when we violate this rule. If we forget more than a couple of bits of essential data, then our speedy hash table is all but guaranteed to ruin our day by turning into an unsorted array in the most embarrassing moment possible.

Some data sets contain redundancies that allow equality decisions to be made based on a mere fraction of the bits in each item. That's great! If we can guarantee such redundancies, then we can make both hashing and equality checks faster by writing clever custom code for hashValue and ==, without violating the Law.

However, Array and Dictionary are general-purpose data structures that know absolutely nothing about the data that they will be asked to hold. Therefore, for these types, hashing the full contents is the only Lawful choice.

People who break the Law of Hashing get exiled into the quadratic swamps.

Please don't break the Law.

9 Likes

I'd like to include a shim of the implementation in 4.1.

@lorentey I've looked at Optional.swift on the swift-4.2-branch and I do not see any mention of the word Hashable.

Is the extension somewhere else?

Ah, that's a good point. The feature is on the master branch; it's not yet on swift-4.2-branch.

If you're looking to backport these conditional conformances to 4.1, please be aware that (1) doing so will cause source compatibility issues down the road, and (2) the implementation you'll find on master uses internal stdlib interfaces that aren't available in 4.1. Earlier commits from the PR above have 4.1-compatible implementations, but they're still using using undocumented interfaces. Simply copying those into your own codebase would be a bad idea.

In general, only the standard library is allowed to conform stdlib types to stdlib protocols. (This rule isn't currently enforced by the language, but consider what would happen if two modules in the same app happened to define the same conformance.)

Hah! That's good to know… :slight_smile: