PSA: The stdlib now uses randomly seeded hash values

(Karoy Lorentey) #1

With PR #14913, the stdlib has switched to a randomly seeded, high-quality hash function for all hashing. (The function is currently SipHash-1-3, but this is an implementation detail that is subject to change.)

“Random seeding” means that hashValue properties will return different values on each execution of a Swift program. This is an important tool for improving the reliability of the Standard Library’s hashing collections, Set and Dictionary. In particular, random seeding enables better protection against (accidental or deliberate) hash-flooding attacks.

This change fulfills a long-standing prophecy in Hashable's documentation:

Hash values are not guaranteed to be equal across different executions of your program. Do not save hash values to use during a future execution.

All Hashable types in the standard library (including primitive types like Bool and Int) now generate randomized hash values:

$ swift
 1> 10.hashValue
$R0: Int = 3189672894122490707
 2> 20.hashValue
$R1: Int = -731278079191967151
 3> true.hashValue
$R2: Int = 1698870856037189238

$ swift
 1> 10.hashValue
$R0: Int = -7576852868862274754
 2> 20.hashValue
$R1: Int = 6522600449632548270
 3> true.hashValue
$R2: Int = 2902202285030183828

I expect hash randomization will have minimal/no impact on the vast majority of existing code. Please let us know by posting here if you have questions or concerns.

Synthesized Hashable implementations are currently still using a lesser form of hashing; I’m preparing an update to change this in PR #15122. Once this gets done, the next step is to consider making the new hashing interface public – we can discuss this in Combining hashes.

Combining hashes
What's the intended behavior of this program that uses a Set as the Sequence it currently is?
(Karoy Lorentey) #2

Update: See below on how to disable hash randomization in more recent builds.

If you need deterministic hash values (for example, to make your test cases repeatable), you can currently override the random seed by setting the _Hasher._secretKey property to a pair of UInt64 values.

_Hasher._secretKey = (0, 0)

Because updating the seed affects all hash values in the current process, you’ll have to do this before the first Set or Dictionary is created. The StdlibUnitTest package has an example of how this may be done – it ensures that the seed is overridden when the first TestSuite is initialized.

Note that _Hasher._secretKey is not intended for use in regular Swift programs, and will probably disappear in a future update. (In all likelihood, it will be replaced by a better interface for overriding the seed.)

(Karoy Lorentey) #3

In certain controlled environments, such as while running certain CI tests, it may be helpful to selectively disable hash seed randomization, so that hash values and Set/Dictionary orderings remain consistent across executions.

In recent trunk development builds, you can disable hash seed randomization by defining the environment variable SWIFT_DETERMINISTIC_HASHING with the value of 1. The Swift runtime looks at this variable during process startup and, if it is defined, replaces the random seed with a constant value.

If you need more control over the seed, please let us know by describing your use case below.

Note that hash randomization has no effect on the performance of Swift programs; however, it is an important factor in the reliability of Set and Dictionary, so generally it should not be disabled.

Additionally, setting SWIFT_DETERMINISTIC_HASHING does not guarantee that hash values remain stable across different versions of the compiler/stdlib – please do not rely on specific hash values and do not save them across executions.