Let Optional, Dictionary and Array conditionally conform to Hashable

Morten_Bek_Ditlevsen · January 26, 2018, 8:18pm

Hi all,
I just tried out Swift 4.1 synthesis of Equatable and Hashable.
This is truly awesome, and removes boilerplate code that is bothersome to maintain.
In case of Hashable it also eliminates the need for the developer to know and understand the intricacies of creating good hashes.
When synthesizing Hashable I noticed that Array, Optional and Dictionary do not conditionally conform to Hashable, while Set (unconditionally) does.
While it is fairly easy to add the support through conditional conformances (awesome!), the developer still has to know and understand how to create good hashes.
I looked at Set’s Hashable implementation and made similar implementations for the other three types.

Are there any reasons that these should not be added to the standard library??

I’d be happy to give the implementation a go and try writing a formal proposal.

Sincerely,
/morten

moiseev · January 26, 2018, 10:14pm

@Ben_Cohen, IIRC we made Array etc. conditionally Equatable without an evolution proposal. This addition seem to be from the same bucket. Do we need a proposal or the implementation would suffice?

saagarjha · January 27, 2018, 12:13am

I'm on board for Optional and Array, but when would Dictionary conform? When both its Keys and Values are Hashable?

xwu · January 27, 2018, 4:07am

That's the only way, no?

Morten_Bek_Ditlevsen · January 27, 2018, 1:21pm

Right, when both keys and values are Hashable. When two dictionaries are equal they must hash to the same value, so that requirement is given.

Morten_Bek_Ditlevsen · January 28, 2018, 2:44pm

And Dictionary keys are already Hashable, so the extra requirement is only for the values to be Hashable.

Morten_Bek_Ditlevsen · January 28, 2018, 3:16pm

I am right now trying to move my very simple implementation into the standard library, and I am meeting a few obstacles, never having tried to build/test the entire swift codebase before.

I have a few questions, but I don't know if this is the correct forum to ask the questions in.
If it's not the best place to ask these questions, please have me excused.

For reference, here is my first shot (still lacking tests for Array and Dictionary):

For Optional, I have added the implementation to stdlib/public/core/Optional.swift right after the conditional conformance to Equatable.

For testing the Optional conditional conformance to Hashable, I have basically duplicated the tests for Equatable:

OptionalTests.test("Hashable") {
    expectEqual([true, false, false, false, false, true], testRelation { $0.hashValue == $1.hashValue })
    expectEqual([false, true, true, true, true, false], testRelation { $0.hashValue != $1.hashValue })
}

So this verifies that equal optionals have equal hashes (and for these specific values that the non-equal optionals also happen to have non-equal hashes).

Is this a sufficient test?

With regards to Array, I have added the implementation to stdlib/public/core/Arrays.swift.gyb.
Here I have also followed the style for Equatable cond. conf. so my implementation is added for both ContiguousArray, ArraySlice and Array. Does this choice make sense? I am uncertain.

With regards to Dictionary, the implementation is added to stdlib/public/core/HashedCollections.swift.gyb - where the (unconditional) conformance for Set already resides.
Here I am a bit uncertain as to whether I need special cases for cocoa vs. native Dictionaries - or whether my more 'naive' implementation is fine.

For both Array and Dictionary I have no clue as to where I should add the tests.

Regarding the build/test process:

Is the easiest way to build/test to:

Follow build instructions
Run initial test before making changes
After changing stdlib only, run ninja swift-stdlib from SWIFT_BUILD_DIR
Run utils/build-script --test from 'swift' dir

Number 4 takes quite a lot of time. I have tried adding --test-paths test/stdlib/Optional.swift command line arg to build-script, but I'm guessing that's not the correct format, because it didn't appear to work.

Anyone with experience in building and running tests for the swift project, please feel free to comment if I am on the right track or not. :-)

Sincerely,
/morten

Morten_Bek_Ditlevsen · January 28, 2018, 3:35pm

One additional question:
The Hashable conformance for Set includes a few FIXMEs. I don't know if I should copy these to my implementations since I do not understand them. :-|
They are:

@_inlineable // FIXME(sil-serialize-all)

and

// FIXME(ABI)#177: <rdar://problem/18915294> Cache Set<T> hashValue

Should similare fixmes be added for any of the implementations for Optional, the Array types and Dictionary?

Sincerely,
/morten

nnnnnnnn · January 29, 2018, 5:28pm

Thanks for working on an implementation!

There's a separate standard library development category for questions like this, and you can also open a PR (prefixed with WIP) and ask for feedback that way. With the PR, it's a little easier to comment on specific parts of your code.

There's a checkHashable function that tests different aspects of the conformance—you can see how Set uses it in "validation-test/stdlib/Set.swift".

Makes sense to me! Note that the way you're using _mixInt is commutative, so the hash value for [1, 2, 3] and [3, 2, 1] will end up the same. That makes sense for sets and dictionaries (where the order shouldn't affect equality or hashing) but for arrays you can use the _combineHashValues function instead.

I think your naive implementation should be fine, as it follows what Set is already doing. Tests could go in "validation-tests/stdlib/Arrays.swift.gyb" and ".../Dictionary.swift".

Once you've built once for tests, you should be able to run subsequent tests individually with "utils/run-test".

moiseev · January 29, 2018, 6:43pm

Please copy the first on (@_inlineable), and in fact, please add this line to every new method/property that you add to the standard library.

As for the second one, here is the text from that radar:

Computing the hashValue of a Set is O(n), so we should cache it and only update in mutating methods.

If this applies to your new methods, it won't hurt to copy the FIXME. It will simplify the task of finding all the relevant places in code that should be updated.

moiseev · January 29, 2018, 6:50pm

Even when you only change the standard library, running utils/build-script is still fine. There is not too much overhead there.

It might be a little better to invoke the build-script with --skip-build-benchmarks --skip-test-cmark --release-debuginfo --debug-swift-stdlib. This will avoid building unnecessary things, as well as speed up the stdlib build.

Morten_Bek_Ditlevsen · January 29, 2018, 7:55pm

Hi Nate,
Thanks for your reply - and for the pointer to the stdlib dev category.

I updated the implementation to use _combineHashValues. I looked at the implementation for Set for inspiration, but failed to realize that Set does naturally not have the ordering issue that Array does.

Following advise from Xiaodi Wu the new implementation follows the example from the code that autosynthesizes Hashable support for enums and structs.

I will have a look at testing, and then do a WIP-PR.

Thanks!

Morten_Bek_Ditlevsen · January 29, 2018, 7:56pm

Hi Max,
Thanks - I will add the FIXMEs to the new methods too.

Morten_Bek_Ditlevsen · January 29, 2018, 8:13pm

Regarding the commutativity: For Dictionaries, the dictionary values should be combined using _combineHashValues too, right?
Because [0: "a", 1: "b"] would otherwise hash equal to [0: "b", 1: "a"], right?

In my current suggestion I use _combineHashValues for all keys and values.

But if the dictionary keys could be thought of as a Set, then all the keys could combined by xor'ing _mixInts and the result of this could then be combined with all the values using _combineHashes.
I guess this would probably be a slight performance optimization, but would such an optimization be worthwile?
I guess it's peanuts compared to caching the hash.

Nevin · January 29, 2018, 8:30pm

You could treat the dictionary as a set of (Key, Value) pairs. That is, for each (Key, Value) pair, use _combineHashValues on that key and its value, then merge the results as you would for a Set.

Morten_Bek_Ditlevsen · January 29, 2018, 8:58pm

I have now added tests using checkHashable for Optional, Array and Dictionary.

I have created a WIP PR: https://github.com/apple/swift/pull/14247

Thanks for the pointers to running the tests much, much faster than I previously did, @nnnnnnnn and @moiseev.

@Nevin right now I am still doing _combineHashValues on all keys and values, but your suggestion would cut that use in half, so that sounds like a good idea.

jawbroken · January 30, 2018, 3:47am

Is that implementation for Dictionary correct? I haven't looked into the currently implementation, but I thought that the iteration order of key-value pairs in a Dictionary was undefined, i.e. equal Dictionaries may iterate in a different order, which would make your hash value unequal for equal dictionaries. See the implementation of Equatable for Dictionaries in that same file, for example.

Morten_Bek_Ditlevsen · January 30, 2018, 5:10am

Thanks a lot, jawbroken, that makes perfect sense.
I will try adding a failing test for equal dictionaries traversed in different order, and then refactor to the suggested “consider each (k, v) as an element in a Set”.

lorentey · January 30, 2018, 4:11pm

Strong +1; types in the standard library should conform to Hashable whenever possible.

Here are some additional types that could benefit from conditional Hashable conformance, in order of decreasing (subjective) importance:

Range and the partial range types
Slice, with caveat below
DictionaryLiteral
CollectionOfOne
EmptyCollection

(I expect the last two would pop up mostly when they're used as constrained type parameters of generic types.)

Slice supports unordered collections, so if we want to make it Hashable, we need to decide if lexicographic equality and ordered hashing is suitable for it. @nnnnnnnn has a convincing argument that it is:

Indeed, a Set.SubSequence value is decidedly not substitutable with slices from other sets, even if they happen to contain the same elements.

hlovatt · January 30, 2018, 10:09pm

Other languages, e.g. Java, have set types that have the concept of order, either order of addition or are sorted. These order retaining sets if added to Swift should conform to Hashable since element order is known and similarly slices of these sets should also be Hashable.

However an unordered set shouldn’t conform to Hashable since if the sets are equal, they contain the same elements, then they should have the same hash. Similarly slices of unordered sets shouldn’t be Hashable.