Hashable performance regression

Hello. I'm trying to get rid of the 'Hashable.hashValue' is deprecated as a protocol requirement warnings, but I discovered that the new hashing function hash(into hasher: inout Hasher) is a bit slower than hashValue, even if I combine only one value inside the hasher.

Hashable performance is important to my app. I'm using a DI implementation built on top of dictionaries like [TypeKey: TypeCreationClosure], where TypeKey is a Hashable class. The application registers a lot of types on start and resolves them later.
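
Roughly, the pattern looks like this (a simplified sketch; the real TypeKey combines several UUIDs and the container is more involved):

import Foundation

// Simplified sketch of the DI pattern described above.
final class TypeKey: Hashable {
    let id: UUID
    init(id: UUID) { self.id = id }

    static func == (lhs: TypeKey, rhs: TypeKey) -> Bool { lhs.id == rhs.id }
    func hash(into hasher: inout Hasher) { hasher.combine(id) }
}

typealias TypeCreationClosure = () -> Any

var registrations: [TypeKey: TypeCreationClosure] = [:]

func register(_ key: TypeKey, _ creation: @escaping TypeCreationClosure) {
    registrations[key] = creation
}

func resolve(_ key: TypeKey) -> Any? {
    registrations[key]?()
}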

With the new Hashable implementation I have an application startup time regression, and this is critical for me.

I wrote a small test project where you can see the regression: GitHub - west0r/HashablePerformanceTest. The tests run 1000 times, with 1000 iterations of Set operations per run.

old hashValue implementation: tests run in 103 ms
new hash(into:) implementation (with 5 combine calls): tests run in 144 ms (40% slower)
new hash(into:) implementation (with 1 combine call): tests run in 107 ms (4% slower)

I would like to keep Hashable.hashValue for performance reasons, but it's deprecated. Sooner or later I will have to migrate to the new implementation and take the regression.

How can I avoid the regression with the new Hashable implementation, or disable the warnings?
Does the Swift team have any plans to improve Hashable performance in the future?

3 Likes

OldHashable and TweakedHashable both calculate and store their hash in init(), which runs twice per repetition, after which they only return/combine it in the hashing method. NewHashable calculates the hash in the hashing method each time, which is called (estimated, might depend on Set internals) around 17000 times per repetition, so I don't think this benchmark is measuring quite the right things.

I'm actually amazed it's only 40% slower. Maybe the compiler is optimizing away some of those calls?

(Also, equal hashes don't mean the objects are equal, so the == implementation isn't quite right either)

1 Like

I strongly recommend correctly implementing Hashable requirements — i.e., combining all 5 UUIDs into the Hasher, and fixing the bug @ahti highlighted in your implementation of ==. Whenever possible, prefer to use compiler-synthesized Hashable conformance instead of hand-rolling your own implementations.
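
For reference, if the key can be a struct, the entire conformance can be synthesized. A minimal sketch, assuming the five-UUID key shape from the benchmark (property names are illustrative):

import Foundation

// The compiler synthesizes both == and hash(into:) from all stored properties.
struct TypeKey: Hashable {
    let value1, value2, value3, value4, value5: UUID
}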

If you are loading enough data into a Dictionary at startup that it produces a measurable performance issue, then it may be a good idea to consider approaching the problem in a different way. (E.g., load the data on a background queue, factoring out places where you need synchronous access. If the data stays the same across executions, consider implementing your own serializable hash table implementation with a perfect hash function, and loading it from a resource instead.)
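
As a rough illustration of the background-loading idea (a sketch, not a drop-in solution; the serial queue ensures that any resolve call waits until setup has finished):

import Dispatch

// Sketch: build the registrations off the main thread, and funnel all
// access through the same serial queue so lookups wait for setup.
final class LazyContainer {
    private let queue = DispatchQueue(label: "di.setup")
    private var registrations: [String: () -> Any] = [:]

    init(makeRegistrations: @escaping () -> [String: () -> Any]) {
        queue.async {
            self.registrations = makeRegistrations()
        }
    }

    func resolve(_ key: String) -> Any? {
        queue.sync { registrations[key]?() }
    }
}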

How big is your Dictionary? Why do you need to synchronously load data into it at startup?

Things to consider:

  1. Outside of microbenchmarks, creating all the values that you insert into the hashed collection tends to be far more complicated than hashing them. You can use the Time Profiler template of Instruments to get a good picture of where time is spent at startup. I would be very much interested to know if Hasher operations show up as significant!

  2. Have you harvested low-hanging performance fruit like reserving capacity before inserting elements into the hashed collection? That alone can lead to a 2x improvement. (See the sketch after this list.)

  3. Precomputing a hash value is sometimes helpful, but not nearly as often as one would expect — it only helps if you expect to repeatedly hash the same instance, and if you have enough data to make up for the overhead of feeding the precomputed hash into Hasher. The correct way to precompute a hash value is to feed the data into your own Hasher instance:

    init() {
      var hasher = Hasher()
      hasher.combine(value1)
      hasher.combine(value2)
      hasher.combine(value3)
      hasher.combine(value4)
      hasher.combine(value5)
      self.precomputedHash = hasher.finalize()
    }
    
    func hash(into hasher: inout Hasher) {
      hasher.combine(precomputedHash)
    }
    
    static func ==(left: Self, right: Self) -> Bool {
      // hash equality is a necessary but not sufficient condition of equality
      guard left.precomputedHash == right.precomputedHash else { return false }
      return left.value1 == right.value1 && left.value2 == right.value2 && …
    }
    
  4. If you have constant data, consider switching to a sorted array instead. Binary search tends to be a bit slower at lookups than hashing, but if you can sort the elements at compile time, populating a sorted array can be a trivial operation.

  5. Using ^ as a hash function is generally a bad idea. That said, if you are confident that the data won’t ever come from an untrusted source, then it may sometimes speed things up — until things change and it suddenly doesn’t. Properly using Hasher will result in reliable performance, no matter what data is stored in the hashed collection.
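
Here is the reserveCapacity sketch promised in point 2 (the element count is illustrative):

var table: [Int: String] = [:]
table.reserveCapacity(10_000) // one allocation up front instead of repeated rehashing
for i in 0 ..< 10_000 {
    table[i] = String(i)
}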

11 Likes

OldHashable and TweakedHashable both calculate and store their hash in init(), which runs twice per repetition, after which they only return/combine it in the hashing method. NewHashable calculates the hash in the hashing method each time, which is called (estimated, might depend on Set internals) around 17000 times per repetition, so I don't think this benchmark is measuring quite the right things.

@ahti it's measuring the right things. NewHashable is implemented the way Apple recommends in code examples, and it's slow. TweakedHashable is faster because it uses one combine() call and caches the hash value in init, but it's still slower than OldHashable anyway.

(Also, equal hashes don't mean the objects are equal, so the == implementation isn't quite right either)

Yes, it's a dummy implementation, but that's not important for the test. I'm testing only Hashable, not Equatable, and avoiding comparisons of instance values.

I strongly recommend correctly implementing Hashable requirements — i.e., combining all 5 UUIDs into the Hasher, and fixing the bug @ahti highlighted in your implementation of ==. Whenever possible, prefer to use compiler-synthesized Hashable conformance instead of hand-rolling your own implementations.

I'm testing different Hashable implementations and trying to pick the best one by performance.

If you are loading enough data into a Dictionary at startup that it produces a measurable performance issue, then it may be a good idea to consider approaching the problem in a different way. (E.g., load the data on a background queue, factoring out places where you need synchronous access. If the data stays the same across executions, consider implementing your own serializable hash table implementation with a perfect hash function, and loading it from a resource instead.)
How big is your Dictionary? Why do you need to synchronously load data into it at startup?

It's part of a dependency injection container. I register all the application's services in it, and a lot of them get resolved (fetched by a key which implements the Hashable protocol) on app startup. I need a dictionary, but I can't use anything other than the Hashable protocol and Hasher, due to Dictionary's constraints.

Outside of microbenchmarks, creating all the values that you insert into the hashed collection tends to be far more complicated than hashing them. You can use the Time Profiler template of Instruments to get a good picture of where time is spent at startup. I would be very much interested to know if Hasher operations show up as significant!

I have application startup tests on my CI server. I get a performance degradation when I change only the Hashable implementation, nothing else. I don't need Time Profiler because I reproduced the degradation in the demo project (from GitHub).

Have you harvested low-hanging performance fruit like reserving capacity before inserting elements into the hashed collection? That alone can lead to a 2x improvement.

I can't reserve capacity because I don't know the number of insertions; they happen dynamically from different parts of the application. And it won't help me anyway, because the slow hash implementation is used not only for insertions but also for getting objects from the dictionary by key, and I have more lookups than insertions.

Precomputing a hash value is sometimes helpful, but not nearly as often as one would expect — it only helps if you expect to repeatedly hash the same instance, and if you have enough data to make up for the overhead of feeding the precomputed hash into Hasher. The correct way to precompute a hash value is to feed the data into your own Hasher instance:

I do a lot of getting values from the dictionary by a Hashable key, so I need precomputing.
OK, say I precompute my hash correctly; it still won't speed up my code.

If you have constant data, consider switching to a sorted array instead. Binary search tends to be a bit slower at lookups than hashing, but if you can sort the elements at compile time, populating a sorted array can be a trivial operation.

I need the ability to get values by key, not by index. Binary search is fast; the problem is with the concrete new hash implementation. It's slower than the old hashValue even if I precompute the hash.

Using ^ as a hash function is generally a bad idea. That said, if you are confident that the data won’t ever come from an untrusted source, then it may sometimes speed things up — until things change and it suddenly doesn’t. Properly using Hasher will result in reliable performance, no matter what data is stored in the hashed collection.

I know, but it's fast and normally works for this case. Hasher is slower even if I precompute and combine only one value into it. Hasher is slower in itself.


I appreciate all your advice, but it doesn't help me. I know all that, but I get a 4-5% performance degradation in the application when I only change hashValue to Hasher. My benchmark shows it. It's not the best benchmark, but it's close to my application's startup pattern. I trigger hash computation as much as I can and see that the different hash implementations differ in performance. Either way, Hasher is slower than hashValue, even precomputed, in both my application and my benchmark.

Unfortunately, I need to replace hashValue with Hasher to remove the Xcode warnings.

I think this shows a problem with Hasher for cases where you need to use the hash of one instance many times. Hasher brings performance overhead.

I need to know:

  1. How to avoid this overhead without writing a custom Dictionary implementation.
  2. How to get rid of the warnings and keep the good old hashValue (a temporary solution, because Apple will obsolete it in the future).
  3. Does the Swift team have any plans to make Hashable performance better? Maybe custom hash functions, custom Hashers, HashVisitable, etc.
1 Like

In most use cases, correctness tends to be by far more important than raw performance.

If you want to have the absolute best Dictionary insertion/lookup performance, then the right way to achieve that without compromising correctness is to switch to a Key type that consists of fewer bits than five whole UUIDs. (A single UUID alone is usually considered a pretty good universal identifier. For things within a single process, a unique 8-byte address (such as an ObjectIdentifier corresponding to a live object) is a frequent & very convenient choice.)
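
For example, in a DI container the metatype of the service being registered makes a natural 8-byte key. A sketch (assuming registration can be keyed by the service type; adapt as needed to your design):

// Sketch: keying the container by the ObjectIdentifier of a metatype.
// Hashing a single 8-byte identifier is about as cheap as hashing gets.
final class Container {
    private var registrations: [ObjectIdentifier: () -> Any] = [:]

    func register<Service>(_ type: Service.Type, _ create: @escaping () -> Service) {
        registrations[ObjectIdentifier(type)] = create
    }

    func resolve<Service>(_ type: Service.Type) -> Service? {
        registrations[ObjectIdentifier(type)]?() as? Service
    }
}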

What sort of performance degradation? I.e., how many percentage points, how many milliseconds?

Again, how big is your Dictionary? (You said it was for dependency injection -- how many dependencies do you have?) How many times is the dictionary actually queried during app startup?

For what it's worth, as a rule of thumb, I expect a million-element Dictionary to be able to look up about ten million random items per second; but the exact numbers vary wildly with hardware details.

What exactly is your Key type in the actual app?

Your microbenchmark tells me that hashing/comparing 80 bytes instead of 8 bytes only leads to a 40% regression in overall Dictionary performance. I am actually pretty happy with that statistic -- I'm not sure how much more we can reasonably narrow that gap!

The microbenchmark tells me very little about how (and if!) this piece of information actually affects your application performance. The Time Profiler instrument lets you objectively quantify how much of your observed performance degradation can be attributed to Hasher. (Knowing nothing about your app, I still feel very confident promising that it's not actually spending significant time in Hasher.combine/finalize. You are very welcome to prove me wrong, though!)

Changing one thing can have surprising consequences -- adding slightly more code to an innocuous function can randomly change inlining decisions in its callers that end up triggering huge performance swings. We've also routinely measured 10-15% performance fluctuations in microbenchmarks just because unrelated code randomly got realigned in the generated binary in a way that led to slightly worse instruction cache utilization.

Note that even if you only implement hashValue, the returned value is still fed to a Hasher instance to scramble its bits. (Otherwise sequential hash values would lead to arbitrarily long lookup chains in our hash table implementation.)

Please do believe me when I say that an (unexplained) 4-5% performance degradation in a hashing microbenchmark rarely if ever directly translates to a measurable performance degradation in any real world app.

5% is below the typical threshold for random microbenchmark fluctuations in Swift -- sorry for being dismissive, but the phase of the moon can probably have a bigger impact on the performance of compiled Swift code. :wink:

One way to try figuring out why one benchmark is 5% slower is to compare Time Profiler results for the two runs. Where does one spend more time than the other? (For such a small difference, the results can be inconclusive, though.) Look at the disassembly -- how does it differ between the two benchmarks?

For reference, I predict that both the hashValue-based benchmark and the one that feeds a single integer to the hasher in hash(into:) will spend exactly the same time in Hasher.

If you need me to take a look, I can probably set aside some time for it this weekend.

There is no way to disable deprecation warnings other than by refraining from using deprecated features.

Again, (unless your code needs to run on Swift 4.1 or below) there is no possible reason to prefer implementing hashValue. The only reason it's still a Hashable requirement is to preserve compatibility with code written before Swift 4.2. (Otherwise we would've already turned it into a simple non-customizable extension property.)

This isn't just because I personally hate hashValue (although frankly I do, and I've written a whole tirade against it in the proposal that deprecated it :see_no_evil:), but because implementing it does not achieve anything that you cannot achieve far better in every way with hash(into:).

If you write code that only implements hashValue, your type will still have the following compiler-synthesized hash(into:) implementation, and Dictionary simply ends up calling this instead of calling hashValue directly:

func hash(into hasher: inout Hasher) {
  hasher.combine(self.hashValue)
}

Implementing hashValue gets you the exact same amount of Hasher invocations as implementing hash(into:).

There is no technical reason to ever implement hashValue in new code.

There is no technical reason to ever implement hashValue in new code.

There is no technical reason to ever implement hashValue in new code.

We're always on the lookout for performance improvement opportunities. I personally agonize over every detail of Dictionary performance.

However, there is no actual indication that hashing is particularly slow -- so resources are probably much better spent on more fruitful optimizations elsewhere. (Hint: getting real-world Time Profiler reports where Hasher shows up as a significant percentage would definitely help reprioritizing things!)

Note that the current SipHash implementation has plenty fast throughput -- if you have enough data, it measures as only 2x slower than xoring the same data, which is mind-boggling to me (it's surprisingly close to being limited by RAM speed, rather than CPU). However, its finalization makes it "slow" (when compared to xor!) in the more typical case when only a handful of bytes are being hashed. (In the configuration the stdlib is currently using, finalize is about three times as much work as combining a single UInt64 value.)
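
If you want a feel for those relative costs, a crude sketch like the following (an illustration, not a rigorous benchmark; use a proper harness for meaningful numbers) exercises the single-combine-plus-finalize path:

import Foundation

let iterations = 1_000_000
var sink = 0
let start = Date()
for i in 0 ..< iterations {
    var hasher = Hasher()
    hasher.combine(i)         // one word of input...
    sink ^= hasher.finalize() // ...followed by the fixed finalization cost
}
print("hashed \(iterations) integers in \(Date().timeIntervalSince(start))s (sink: \(sink))")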

Switching to a standard hash function other than SipHash is definitely in the cards at some point; however, SipHash has a high reputation for security that makes it a very comfortable choice. Set and Dictionary are very sensitive to hash collisions, and if someone was able to reliably generate them, these collection types would regress to performance that's worse than an unsorted array.

So I need to balance potential best case performance against the risk of triggering easily reproducible worst-case performance, and I'd need a bit of convincing before I accepted (or created) a patch that switched to a different function. (Again, Time Profiler data showing significant improvements in a real app would be a helpful incentive!)

A probably more fruitful optimization opportunity would be to weaken hashing in smaller dictionaries, where accidental (or deliberate) hash collisions aren't impacting performance that much -- not necessarily by switching to a different hash function (although that's an option), but by setting a limit on how much data standard variable-width types feed into Hasher. I have plans to investigate this direction. This scheme may or may not pan out in practice, though -- hashed collection performance can be somewhat finicky.

Supporting custom hash functions would require the introduction of a generic hash(into:) requirement, which would generally make hashing significantly slower, not faster. It would also complicate things for no particularly good reason; the current standard hash function seems plenty fast, and I like the simplicity of never asking people to implement their own. I also sleep better at night knowing that if the current function proves insecure, we can replace it in every Dictionary instance in every ABI-stable Swift program by simply pushing out an OS update.

So no, HashVisitable is unlikely to ever happen as a hashed collections feature in the standard library. (Do check out CryptoKit.HashFunction, though!)

14 Likes

I recommend applying this small patch to your microbenchmark and rerunning it.

diff --git a/TestHashable/TestHashable.swift b/TestHashable/TestHashable.swift
index 3e37875..87f4b47 100644
--- a/TestHashable/TestHashable.swift
+++ b/TestHashable/TestHashable.swift
@@ -53,7 +53,7 @@ class TweakedHashable: Hashable {
     }
     
     static func == (lhs: TweakedHashable, rhs: TweakedHashable) -> Bool {
-        lhs.hashValue == rhs.hashValue
+        lhs.hashableValue == rhs.hashableValue
     }
     
     func hash(into hasher: inout Hasher) {

In the original benchmark, the TweakedHashable test unnecessarily performs three times as many Hasher invocations as the OldHashable one, and it still comes out only ~5% slower! That's a pretty awesome result in my book. There is nothing slow about Hasher; it's plenty quick for this use case.

Nevertheless, the recommendations above still stand. None of the three types in the benchmark project implement Equatable correctly -- basing == on a 64-bit hash value is a recipe for disaster (independent of whether it's generated by XOR or Hasher). And even worse, it's an infrequent, unreproducible disaster, which is the worst sort.

With the above patch applied, the results come out slightly in favor of the new implementation.

I believe this means hash(into:) is officially 3.2% faster than hashValue. Sounds good to me!

9 Likes

I must say, @lorentey, I really enjoy it when you post about hashing. Always informative, always a good read. Thank you very much for looking into it so closely :pray:

6 Likes