In most use cases, correctness tends to be far more important than raw performance.
If you want the absolute best Dictionary insertion/lookup performance, then the right way to achieve it without compromising correctness is to switch to a Key type that consists of fewer bits than five whole UUIDs. (A single UUID alone is usually considered a pretty good universal identifier. For things within a single process, a unique 8-byte address (such as an ObjectIdentifier corresponding to a live object) is a frequent and very convenient choice.)
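To make that concrete, here is a minimal sketch of my own (the type names are made up for illustration):

```swift
import Foundation

// Hypothetical 80-byte key: every insertion/lookup hashes and
// compares five UUIDs' worth of bits.
struct WideKey: Hashable {
    let a, b, c, d, e: UUID
}

// If the key identifies a live object, an 8-byte ObjectIdentifier
// is far cheaper to hash and compare.
final class Dependency {}

struct NarrowKey: Hashable {
    let id: ObjectIdentifier
    init(_ object: AnyObject) { id = ObjectIdentifier(object) }
}
```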
What sort of performance degradation? I.e., how many percentage points, how many milliseconds?
Again, how big is your Dictionary? (You said it was for dependency injection -- how many dependencies do you have?) How many times is the dictionary actually queried during app startup?
For what it's worth, as a rule of thumb, I expect a million-element Dictionary to be able to look up about ten million random items per second; but the exact numbers vary wildly with hardware details.
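(If you want to check that figure on your own machine, a quick-and-dirty sketch along these lines will do -- the sizes and key type here are arbitrary choices of mine:)

```swift
import Dispatch

// Build a million-element dictionary with simple integer keys.
var dict = [Int: Int](minimumCapacity: 1_000_000)
for i in 0 ..< 1_000_000 { dict[i] = i }

// Look up ten million random keys and time it.
let keys = (0 ..< 10_000_000).map { _ in Int.random(in: 0 ..< 1_000_000) }
let start = DispatchTime.now().uptimeNanoseconds
var checksum = 0
for key in keys { checksum &+= dict[key]! }
let seconds = Double(DispatchTime.now().uptimeNanoseconds - start) / 1e9
print("\(Double(keys.count) / seconds) lookups/sec (checksum \(checksum))")
```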
What exactly is your Key type in the actual app?
Your microbenchmark tells me that hashing/comparing 80 bytes instead of 8 bytes only leads to a 40% regression in overall Dictionary performance. I am actually pretty happy with that statistic -- I'm not sure how much more we can reasonably narrow that gap!
The microbenchmark tells me very little about how (and if!) this piece of information actually affects your application performance. The Time Profiler instrument lets you objectively quantify how much of your observed performance degradation can be attributed to Hasher. (Knowing nothing about your app, I still feel very confident about promising it's not actually spending a significant time in Hasher.combine/finalize. You are very welcome to prove me wrong, though!)
Changing one thing can have surprising consequences -- adding slightly more code to an innocuous function can randomly change inlining decisions in its callers that can end up triggering huge performance swings. We've also routinely measured 10-15% performance fluctuations in microbenchmarks just because unrelated code randomly got realigned in the generated binary in a way that led to slightly worse instruction cache utilization.
Note that even if you only implement hashValue, the returned value is still fed to a Hasher instance to scramble its bits. (Otherwise sequential hash values would lead to arbitrarily long lookup chains in our hash table implementation.)
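(You can see the scrambling directly with a tiny demo like this one:)

```swift
// Sequential inputs come out of Hasher thoroughly scrambled
// (and differently on each process launch, due to random seeding).
for i in 0 ..< 4 {
    var hasher = Hasher()
    hasher.combine(i)
    print(i, hasher.finalize())
}
```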
Please do believe me when I say that an (unexplained) 4-5% performance degradation in a hashing microbenchmark rarely if ever directly translates to a measurable performance degradation in any real-world app.
5% is below the typical threshold for random microbenchmark fluctuations in Swift -- sorry for being dismissive, but the phase of the moon can probably have a bigger impact on the performance of compiled Swift code. 
One way to try figuring out why one benchmark is 5% slower is to compare Time Profiler results for the two runs. Where does one spend more time than the other? (For such a small difference, the results can be inconclusive, though.) Look at the disassembly -- how does it differ between the two benchmarks?
For reference, I predict that both the hashValue-based benchmark and the one that feeds a single integer to the hasher in hash(into:) will spend exactly the same time in Hasher.
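(For clarity, these are the two shapes I mean -- a sketch with a made-up key type:)

```swift
struct LegacyKey: Hashable {
    let id: Int
    // Deprecated customization point; the compiler wraps this in a
    // synthesized hash(into:) that calls hasher.combine(self.hashValue).
    var hashValue: Int { id }
}

struct ModernKey: Hashable {
    let id: Int
    // Feeds the same single integer to the hasher directly.
    func hash(into hasher: inout Hasher) { hasher.combine(id) }
}
```

Either way, hashing one key costs exactly one combine of a single Int plus one finalize.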
If you need me to take a look, I can probably set aside some time for it this weekend.
There is no way to disable deprecation warnings other than by refraining from using deprecated features.
Again, (unless your code needs to run on Swift 4.1 or below) there is no possible reason to prefer implementing hashValue. The only reason it's still a Hashable requirement is to preserve compatibility with code written before Swift 4.2. (Otherwise we would've already turned it into a simple non-customizable extension property.)
This isn't just because I personally hate hashValue (although frankly I do, and I've written a whole tirade against it in the proposal that deprecated it), but because implementing it does not achieve anything that you cannot achieve far better in every way with hash(into:).
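To illustrate, here's a minimal multi-field example of my own:

```swift
struct Point: Hashable {
    let x, y: Int

    // With hash(into:), each component is fed to the hasher, which
    // does the bit mixing for us. Compare with the old hashValue idiom
    //     var hashValue: Int { x.hashValue ^ y.hashValue }
    // which collides for every (x, y) / (y, x) pair.
    func hash(into hasher: inout Hasher) {
        hasher.combine(x)
        hasher.combine(y)
    }
}
```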
If you write code that only implements hashValue, your type will still have the following compiler-synthesized hash(into:) implementation, and Dictionary simply ends up calling this instead of calling hashValue directly:
```swift
func hash(into hasher: inout Hasher) {
    hasher.combine(self.hashValue)
}
```
Implementing hashValue gets you the exact same number of Hasher invocations as implementing hash(into:).
There is no technical reason to ever implement hashValue in new code.
There is no technical reason to ever implement hashValue in new code.
There is no technical reason to ever implement hashValue in new code.
We're always on the lookout for performance improvement opportunities. I personally agonize over every detail of Dictionary performance.
However, there is no actual indication that hashing is particularly slow -- so resources are probably much better spent on more fruitful optimizations elsewhere. (Hint: getting real-world Time Profiler reports where Hasher shows up as a significant percentage would definitely help reprioritize things!)
Note that the current SipHash implementation has remarkably fast throughput -- if you have enough data, it measures as only 2x slower than xoring the same bytes, which is mind-boggling to me (it's surprisingly close to being limited by RAM speed rather than CPU). However, its finalization step makes it "slow" (when compared to xor!) in the more typical case when only a handful of bytes are being hashed. (In the configuration the stdlib is currently using, finalize is about three times as much work as combining a single UInt64 value.)
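(For reference, the two extremes look like this in code; the buffer size is an arbitrary choice of mine:)

```swift
// Bulk hashing: cost is dominated by SipHash's compression throughput.
let buffer = [UInt8](repeating: 42, count: 16 * 1024 * 1024)
var bulk = Hasher()
buffer.withUnsafeBytes { bulk.combine(bytes: $0) }
let bulkHash = bulk.finalize()

// Tiny key: cost is dominated by the fixed finalization step.
var tiny = Hasher()
tiny.combine(42 as UInt64)
let tinyHash = tiny.finalize()

print(bulkHash, tinyHash)
```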
Switching to a standard hash function other than SipHash is definitely in the cards at some point; however, SipHash has a high reputation for security that makes it a very comfortable choice. Set and Dictionary are very sensitive to hash collisions, and if someone were able to reliably generate them, these collection types would regress to performance that's worse than an unsorted array.
So I need to balance potential best case performance against the risk of triggering easily reproducible worst-case performance, and I'd need a bit of convincing before I accepted (or created) a patch that switched to a different function. (Again, Time Profiler data showing significant improvements in a real app would be a helpful incentive!)
A probably more fruitful optimization opportunity would be to weaken hashing in smaller dictionaries, where accidental (or deliberate) hash collisions aren't impacting performance that much -- not necessarily by switching to a different hash function (although that's an option), but by setting a limit on how much data standard variable-width types feed into Hasher. I have plans to investigate this direction. This scheme may or may not pan out in practice, though -- hashed collection performance can be somewhat finicky.
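(To sketch the idea with a purely hypothetical wrapper of my own -- this is NOT a standard library feature, and not necessarily the shape the real thing would take:)

```swift
// Bounded hashing for a variable-width key: hash only a prefix,
// but compare the full value. Equal strings still hash equally;
// long strings sharing a 16-byte prefix (and length) collide,
// trading hash quality for hashing speed.
struct PrefixHashedKey: Hashable {
    let value: String
    // The synthesized == still compares the full string, as it must.

    func hash(into hasher: inout Hasher) {
        for byte in value.utf8.prefix(16) {
            hasher.combine(byte)
        }
        hasher.combine(value.utf8.count)
    }
}
```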
Supporting custom hash functions would require the introduction of a generic hash(into:) requirement, which would generally make hashing significantly slower, not faster. It would also complicate things for no particularly good reason; the current standard hash function seems plenty fast, and I like the simplicity of never asking people to implement their own. I also sleep better at night knowing that if the current function proves insecure, we can replace it in every Dictionary instance in every ABI-stable Swift program by simply pushing out an OS update.
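(For reference, my reading of roughly what such a generic requirement would have to look like -- a sketch, not an actual proposal:)

```swift
// A hypothetical visitor-style design. Because the hash(into:)
// requirement is generic here, hashed collections can no longer bind
// it to the one concrete, fully optimized Hasher; calls would go
// through generic dispatch at every hashing site.
protocol HashVisitor {
    mutating func combine(_ value: UInt64)
    mutating func combine(bytes: UnsafeRawBufferPointer)
}

protocol HashVisitable {
    func hash<V: HashVisitor>(into visitor: inout V)
}
```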
So no, HashVisitable is unlikely to ever happen as a hashed collections feature in the standard library. (Do check out CryptoKit.HashFunction, though!)