Hmm, I wonder why the checksum in the 3rd row changed for `Float` and `Double` but not `Float80`. I would expect it to stay the same for all three types, since the range `-greatestFiniteMagnitude ... 0` uses the standard codepath in both versions, which has not changed.

Oops, looks like I accidentally posted results after having edited the third range (tried with `-1 ... 0`), sorry about that.

Here are the results again, with the correct 3rd range.

I've added the checksums for the previous version of your code after the ones for this latest version.

```
Float
0.226241222 seconds (checksum: 138254 (138254))
0.053931518 seconds (checksum: 5896375 (3957719))
0.212501097 seconds (checksum: 3631585 (3631585))
0.051749585 seconds (checksum: 5896271 (3957703))
Double
0.280503323 seconds (checksum: 2981475560360739 (2981475560360739))
0.079444674 seconds (checksum: 1323779508271287 (315125084029635))
0.277631542 seconds (checksum: 759283211863005 (759283211863005))
0.079980958 seconds (checksum: 1323779508272975 (315125084030247))
Float80
1.086825454 seconds (checksum: 1370150508309074436 (1370150508309074436))
0.377088032 seconds (checksum: 1289353272936233143 (8536050357597743157))
7.324208065 seconds (checksum: 4117734076763979444 (4117734076763979444))
0.392084622 seconds (checksum: 1289353272936204111 (8536050357597755341))
```

I’m noticing that the “fast path” is actually slower when using the `SystemRandomNumberGenerator`. With both `Float` and `Double`, my implementation is about 2x slower than the stdlib’s for `0..<1`, but only 1.5x slower for `0..<1.5`.

This makes sense, since the “fast path” requires 2 calls to the generator whereas the standard path only needs 1 call. When the generator is slow, that extra call has a major effect.
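One way to sanity-check the call-count reasoning is to wrap a generator in a small instrumented counter (purely an illustrative helper, not part of either implementation) and see how many times `next()` is actually invoked:

```swift
/// Counts how often next() is called on the wrapped generator.
/// Illustrative helper for checking the 1-call-vs-2-calls claim.
struct CountingRNG: RandomNumberGenerator {
    var wrapped = SystemRandomNumberGenerator()
    private(set) var callCount = 0

    mutating func next() -> UInt64 {
        callCount += 1
        return wrapped.next()
    }
}

// Example: count the generator calls made by the stdlib's random(in:).
var rng = CountingRNG()
_ = Double.random(in: 0 ..< 1, using: &rng)
print(rng.callCount)
```

Passing a `CountingRNG` to each implementation makes the extra call on the fast path directly observable.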

In contrast, with `Xoroshiro128Plus`, my implementation is about 4x faster for `0..<1`, and about 12x faster for `0..<1.5`.

When the generator is fast, a second call to it doesn’t matter and the simpler logic of the fast path does indeed make it faster.

Interestingly, with `Float80` and the system generator, my implementation is only about 1.2x slower than the stdlib’s on `0..<1`, and 1.6x slower on `0..<1.5`.

• • •

Edit: I just added a check so the “fast path” is not used with `SystemRandomNumberGenerator`. And…it significantly sped up `Float` and `Double` on `0..<1`, so they’re only 1.2x slower than the stdlib with the system random number generator now.

It did bump `Float80` down to 1.3x slower than the stdlib, but I’m much more interested in optimizing for `Float` and `Double`.

This is getting close!

• • •

Edit 2: For many subnormal ranges like `0 ..< .leastNormalMagnitude/4`, using the system RNG my implementation is about 0–5% faster than the stdlib for `Float` and `Double`, and about 35–40% faster for `Float80`.

And using xoroshiro, my implementation is 25–30% faster for `Float` and `Double`, and a whopping 45–50% faster for `Float80`.

I’m starting to think this might actually be viable.

Yes, but the fast path could make more efficient use of the random bits and thereby require 1 rather than 2 calls to `generator.next()`, at least for `Float` and `Double` (for `Float80` it would require 2 calls), eg:

```
extension BinaryFloatingPoint where
    RawExponent: FixedWidthInteger,
    RawSignificand: FixedWidthInteger
{
    static func randomThroughRawExponent<R: RandomNumberGenerator>(
        _ maxExponentBitPattern: RawExponent,
        using generator: inout R
    ) -> Self
    {
        var bits: UInt64
        var tzSum = 0
        // Note: a labeled `do` can't be continued in Swift, so use a loop.
        again: while true {
            bits = generator.next()
            let maxZS = Int(truncatingIfNeeded: maxExponentBitPattern) &- tzSum
            let tzs = min(bits.trailingZeroBitCount, maxZS)
            tzSum &+= tzs
            if tzs == 64 && tzSum < maxExponentBitPattern { continue again }
            let consumedBits = tzs &+ 1
            bits &>>= consumedBits
            if (64 &- consumedBits) < Self.significandBitCount {
                bits = generator.next()
            }
            break again
        }
        let e = maxExponentBitPattern &- RawExponent(truncatingIfNeeded: tzSum)
        let shifts = RawSignificand.bitWidth &- Self.significandBitCount
        let mask = ~RawSignificand(0) &>> shifts
        let s = RawSignificand(truncatingIfNeeded: bits) & mask
        return Self(sign: .plus, exponentBitPattern: e, significandBitPattern: s)
    }
}
```

Using `SystemRandomNumberGenerator()`, this implementation will (on average) be as fast as the current stdlib `F.random(in: 0 ... powerOfTwo)` when `F` is `Float` or `Double`, and it will actually be faster than the current stdlib implementation when `F` is `Float80`.

To support `-powerOfTwo ..< powerOfTwo` ranges, one more bit would be needed, but that shouldn't change the performance.

Using a fast pseudorandom number generator, this implementation of the fast path seems to be about twice as fast as that of the current implementation (per this commit). But for some reason my code is (significantly) *slower* for `Float` than for `Double`, which seems counterintuitive (I'd expect it to be equally fast or maybe a tiny bit faster for `Float`).

EDIT: Ah, it turned out that the cause of this was the call to `min` (somehow) and an `<` operator between different integer types, as well as the optimizer happening to inline the method for `Double` but not for `Float`. So the above implementation was giving me this (counterintuitive) result:

```
Float
0.233315511 seconds
Double
0.153776696 seconds
```

And this improved version of the method gives me the following (more expected) result:

```
extension BinaryFloatingPoint where
    RawExponent: FixedWidthInteger,
    RawSignificand: FixedWidthInteger
{
    @inline(__always)
    static func randomThroughRawExponent<R: RandomNumberGenerator>(
        _ maxExponentBitPattern: RawExponent,
        using generator: inout R
    ) -> Self
    {
        let maxExpInt = Int(truncatingIfNeeded: maxExponentBitPattern)
        var bits: UInt64
        var tzSum = 0
        // Note: a labeled `do` can't be continued in Swift, so use a loop.
        again: while true {
            bits = generator.next()
            let maxZs = maxExpInt &- tzSum
            let tzs_ = bits.trailingZeroBitCount
            let tzs = tzs_ < maxZs ? tzs_ : maxZs
            tzSum &+= tzs
            if tzs == 64 && tzSum < maxExpInt { continue again }
            let consumedBits = tzs &+ 1
            bits &>>= consumedBits
            if (64 &- consumedBits) < Self.significandBitCount {
                bits = generator.next()
            }
            break again
        }
        let e = maxExponentBitPattern &- RawExponent(truncatingIfNeeded: tzSum)
        let shifts = RawSignificand.bitWidth &- Self.significandBitCount
        let mask = ~RawSignificand(0) &>> shifts
        let s = RawSignificand(truncatingIfNeeded: bits) & mask
        return Self(sign: .plus, exponentBitPattern: e, significandBitPattern: s)
    }
}
```

```
Float
0.14883914 seconds
Double
0.148902778 seconds
```

Looking at the performance of the fast path in your current implementation, I see that it too is slightly slower for `Float` than for `Double`, although not as much as was the case for my code. Adding `@inline(__always)` to the relevant methods fixed that. With this change I see this (with WyRand):

```
Your fast path:
0.33 seconds for Float and Double, 2.11 seconds for Float80
My fast path (the improved version):
0.14 seconds for Float and Double, 1.01 seconds for Float80
```

Thus there should be no need to special case away (an improved implementation of) the fast path for the system generator.

Good point. I can rework the `randomUpToExponent` function to use the leftover bits for the significand when possible.

I can even introduce an `allowNegative` argument, so it will use a leftover bit to handle the symmetric-bounds case (eg. `-1..<1`), and thus avoid calling `Bool.random()`.
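A sketch of how that leftover bit could replace `Bool.random()` (the helper name and calling convention here are made up for illustration): consume one bit from an already-generated word and shift it away, leaving the rest available for the significand.

```swift
/// Applies a random sign to `magnitude` by consuming one leftover random bit,
/// avoiding a separate Bool.random() call. Hypothetical helper.
func applyRandomSign(to magnitude: Double, consuming bits: inout UInt64) -> Double {
    let negative = (bits & 1) == 1
    bits &>>= 1  // the remaining bits stay available for other uses
    return negative ? -magnitude : magnitude
}

var leftover: UInt64 = 0b101
let a = applyRandomSign(to: 0.5, consuming: &leftover)  // low bit 1 -> negative
let b = applyRandomSign(to: 0.5, consuming: &leftover)  // low bit 0 -> positive
```

Since `Bool.random()` burns a full `generator.next()` word per call, reusing a spare bit like this saves a generator call per sample in the symmetric case.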

My first crack at it (using the system RNG, and compared to the stdlib’s `random(in:)`) is 10% slower for `Float` and `Double`, and 15% faster for `Float80`, on both `0..<1` and `-1..<1`.

And on `0 ..< .leastNormalMagnitude`, it’s 10% *faster* for `Float` and `Double`, and a remarkable 3x faster for `Float80`.

I did not yet eliminate the heterogeneous comparisons, but I did locate and mark all of them in the file with a comment (`// Heterogeneous compare`). All of them are between `RawExponent` and either `Int` or `UInt64`.

Rather than special-case this one function, if those comparisons are really eating the performance then I’d like to take care of all of them together.

• • •

In your implementation, you unconditionally convert from `RawExponent` to `Int`. I’m not really comfortable with that, because (a) the standard library types use `UInt`, which could exceed `Int.max` (even though it can’t actually for floating-point values), and (b) an arbitrary floating-point type could use a larger integer type.

I’m going to try to special-case by checking whether `exponentBitCount < Int.bitWidth`, and if not then whether `maxExp` itself fits in an `Int` (eg. `maxExp._binaryLogarithm() < Int.bitWidth &- 1`).
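That special-casing could be sketched like this. The method name `exponentFitsInInt` is hypothetical, and I've used `init(clamping:)` in place of the underscored `_binaryLogarithm()` so the sketch sticks to public API:

```swift
extension BinaryFloatingPoint where RawExponent: FixedWidthInteger {
    /// True when maxExp is guaranteed to be representable as an Int,
    /// so an Int-based fast path can be taken safely. Hypothetical helper.
    static func exponentFitsInInt(_ maxExp: RawExponent) -> Bool {
        // If the exponent field itself is narrower than Int,
        // every exponent bit pattern fits.
        if Self.exponentBitCount < Int.bitWidth { return true }
        // Otherwise check the value itself; clamping avoids traps
        // regardless of the relative widths of the two types.
        return maxExp <= RawExponent(clamping: Int.max)
    }
}
```

Note that both comparisons here are homogeneous, so this check shouldn't reintroduce the heterogeneous-comparison cost discussed above.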

Your `bits` variable is `UInt64`, so this can only put at most 64 bits into the significand. That’s enough for the standard library types (including `Float80`), but for the general case it’s not sufficient.

(The fix is pretty easy: first check if the number of available bits is at least `significandBitCount`, and if not then generate a random `RawSignificand`.)
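That fix could look roughly like this (a sketch with illustrative names, not the actual patch): use the leftover bits when enough remain, otherwise draw a fresh full-width `RawSignificand`, which also covers types whose significand exceeds 64 bits.

```swift
extension BinaryFloatingPoint where RawSignificand: FixedWidthInteger {
    /// Fills the significand from leftover bits when enough remain;
    /// otherwise draws a fresh full-width RawSignificand so that types
    /// with significands wider than 64 bits are still handled correctly.
    /// Hypothetical helper; names are illustrative.
    static func randomSignificand<R: RandomNumberGenerator>(
        leftover bits: UInt64,
        availableBitCount: Int,
        using generator: inout R
    ) -> RawSignificand {
        let mask = ~RawSignificand(0) &>> (RawSignificand.bitWidth &- significandBitCount)
        if availableBitCount >= significandBitCount {
            return RawSignificand(truncatingIfNeeded: bits) & mask
        }
        // Not enough leftover bits: generate a full-width value instead.
        return (generator.next() as RawSignificand) & mask
    }
}
```

The `next() as RawSignificand` call relies on the stdlib's generic `next<T: FixedWidthInteger & UnsignedInteger>()`, which pulls as many 64-bit words as `T` needs.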

• • •

Edit: I just tested my work-in-progress implementation with xoroshiro on `0 ..< .leastNormalMagnitude`, and compared to the stdlib mine is 5x faster for `Float`, 6x faster for `Double`, and a blazing 14x faster for `Float80`.

Cool, not using floating-point `*` and `+` probably makes a lot of difference for subnormals, especially for `Float80`, which on top of those usually slow subnormal arithmetic operations also seems to be subject to a lot of missed optimization opportunities.

However, note that the stdlib implementation is around 15x (`Float` and `Double`) and 72x (`Float80`) slower for `0 ..< leastNormalMagnitude` than it is for eg `0 ..< 1` or `0 ..< 1024`.

The implementation in my previous post is about 14x faster than the stdlib for the subnormal range for `Float` and `Double`, and 120x faster for `Float80`. But it doesn't handle the sign bit etc of course.

EDIT: BTW, I have a very simple sketch for a `PseudorandomNumberGenerator` protocol, something which I think should be added to the Random API of the stdlib, since it's needed for eg initializing or restarting a PRNG with some specific state or seed in a generic context. I've needed it for writing tests for this stuff.

Here it is, together with the WyRand generator:

```
protocol PseudorandomNumberGenerator: RandomNumberGenerator {
    associatedtype State
    init(seed: UInt64)
    init?(state: State)
}

extension PseudorandomNumberGenerator {
    init(seed: UInt64? = nil) {
        self.init(seed: seed ?? UInt64.random(in: .min ... .max))
    }
}

/// The wyrand generator, translated from:
/// https://github.com/wangyi-fudan/wyhash
struct WyRand: PseudorandomNumberGenerator {
    typealias State = UInt64
    private(set) var state: State

    init(seed: UInt64) {
        self.state = seed
    }

    init(state: State) {
        // Every UInt64 value is a valid WyRand state, so no need for this
        // initializer to be failable; it will satisfy the requirement anyway.
        self.state = state
    }

    mutating func next() -> UInt64 {
        state &+= 0xa0761d6478bd642f
        let mul = state.multipliedFullWidth(by: state ^ 0xe7037ed1a0b428db)
        return mul.high ^ mul.low
    }
}
```

This depends a lot on the platform. On Apple's recent ARM cores, there's zero penalty for operations with subnormal numbers. On x86, it can easily be 100x.

On platforms with slow subnormal operations, *everything* is going to be slow, so it's not really a concern if random number generation is also slow. This is an extremely niche corner case, that is broadly not worth worrying about (and certainly not worth designing around--if fast subnormals fall out naturally from other optimizations, great, but it doesn't make sense to spend software engineering effort on them).

Without any constraints on `State`, `init(state:)` is almost useless as a protocol requirement (it remains useful on concrete types, of course). What's the reason to put it on the protocol?

The reason for the associated type `State` is simply this:

```
init?(state: State)
```

which is useful when, for example, you are running tests and want to rerun something from a specific state (not seed). (For the WyRand generator, the state and seed happen to be the same, but that is not true for every PRNG; some will not take 0 as a state, and the `State` might be 128 bits or whatever.)

But without a public `state` property or getter, you can't save the state to be able to restart it. Plus, you need the constraint that `State` is POD for any of this to work.

What you definitely *do* want to have (and seem to be missing) is some mechanism to provide an `init()` that fills the whole state from the system generator. I can see a few different ways to put that into the protocol, with slightly different tradeoffs.

I get similar performance for `WyRand` and `Xoroshiro128Plus` for both `Float` and `Double`, but `WyRand` is 2x slower for `Float80`.

• • •

In any event, I’ve uploaded my latest version: link to diff.

I gave you a mention in the “author” comment, I marked the lines that have heterogeneous comparisons, and I made `randomUpToExponent` faster as we’ve been discussing.

The `randomExponent` function now returns a triple of (exponent, spare bits, bit count), so the extra bits can be repurposed for the significand. I also made it generic over the exponent type, with a specialization for `Int`. That way if the exponent is small it can first be cast to `Int` and run fast, but if it’s large the code still works.

Seeing the improvement that comes from avoiding a single heterogeneous comparison, the next optimization target will likely be the other three.

I have that protocol too, but instead of the `init(seed:)` I have a generic init that takes a `RandomNumberGenerator` inout.

```
protocol PseudorandomNumberGenerator: RandomNumberGenerator {
    associatedtype State
    var state: State { get }
    init?(state: State)
    init<Source: RandomNumberGenerator>(from: inout Source)
}

extension PseudorandomNumberGenerator {
    init() {
        var entropy = SystemRandomNumberGenerator()
        self.init(from: &entropy)
    }
}

struct WyRand: PseudorandomNumberGenerator {
    typealias State = UInt64
    private(set) var state: State

    init<Source: RandomNumberGenerator>(from source: inout Source) {
        state = source.next()
    }

    init(state: State) {
        // Every UInt64 value is a valid WyRand state, so no need for this
        // initializer to be failable; it will satisfy the requirement anyway.
        self.state = state
    }

    mutating func next() -> UInt64 {
        state &+= 0xa0761d6478bd642f
        let mul = state.multipliedFullWidth(by: state ^ 0xe7037ed1a0b428db)
        return mul.high ^ mul.low
    }
}
```

I find `init(state:)` is usually more useful than `init(seed:)`, and if I want it then I usually need to implement it using `init(from:)` anyway.

Yes, there should be a property, here's my not-as-sketchy actual protocol:

```
/// A type that produces a reproducible stream of uniformly distributed
/// pseudorandom bits.
protocol PseudorandomNumberGenerator {
    associatedtype State

    var state: State { get }

    /// Creates a new instance of this generator with a state that is
    /// completely determined by the given seed. Any `UInt64` is allowed and
    /// maps to a valid state. Each seed may or may not map to a unique state
    /// depending on the generator.
    init(seed: UInt64)

    /// Creates a new instance of this generator with the given state.
    /// The initializer fails if the given state is invalid for this generator.
    init?(state: State)

    /// Advances the state of the generator and returns the next 64 uniformly
    /// distributed pseudorandom bits.
    mutating func next() -> UInt64
}
```

(It doesn't inherit from RandomNumberGenerator for a couple of irrelevant reasons.)

How would you add the constraint that `State` can be eg `UInt32` or `(UInt64, UInt64)`? Without any constraints (and just manually ensuring that any conforming type doesn't have a strange `State` like `Void`, `Never` or a class type), I can use the getter and init no matter the type of the `State`, and it has worked for me for some years now.

You mean this?

No, because that only consumes 64 bits of randomness, which is insufficient for a lot of applications. The empty `init()` should set the entire internal state using the system RNG.

Sure. I had one in which I used `SecRandomCopyBytes` at some time, but I didn't use it so I just kept the one that does it given a seed.

Note that `init(seed: UInt64)` is required/expected to set *all* bits in the state in a sufficiently scrambled way, given the seed (for example, xoroshiro uses splitmix64 a couple of times to scramble the seed and set the entire state). For a generator with a state of less than 64 bits, some seeds will map to the same state, and for generators with more than 64 bits, not all states will be reachable from the available seeds. But in my experience it's useful to have a single common type for the seed, just as it is for the resulting output (`next() -> UInt64`; a generator with a 32-bit state would have to advance its state two times in order to produce the next output).
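For reference, here's a minimal SplitMix64 of the kind xoroshiro's reference code uses for exactly this seed-to-state expansion (constants from the public-domain reference implementation; the `xoroshiroState` helper is just an illustration of seeding a 128-bit state):

```swift
/// SplitMix64: a tiny PRNG commonly used to scramble a 64-bit seed
/// into a larger generator state. Constants from the reference C code.
struct SplitMix64 {
    var state: UInt64

    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z &>> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z &>> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z &>> 31)
    }
}

/// Expands a single 64-bit seed into a scrambled 128-bit xoroshiro-style state.
func xoroshiroState(seed: UInt64) -> (UInt64, UInt64) {
    var mixer = SplitMix64(state: seed)
    return (mixer.next(), mixer.next())
}
```

Because every step of the scramble is a full-width multiply/xor/shift mix, even adjacent seeds produce thoroughly different states.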

(This protocol has just been for personal / in-house use, and is only meant as a sketch here, not as a formal proposal.)

Is it even possible to express the POD constraint on an associated type, or is that a requirement that can only be documented?

I'm not at all sure about the heterogeneous comparisons though, I was just trying out various things and I think it made some difference in my specific case. Do you mean you saw an improvement by avoiding that heterogeneous comparison?

It's not possible today, but it's fairly likely to be possible at some (near) future point, because we'd like to use it in a bunch of other places as well.

So what did you mean by this?

EDIT: I guess what you meant is that it would work given something more like @Nobody1707's:

```
protocol PseudorandomNumberGenerator: RandomNumberGenerator {
    associatedtype State
    var state: State { get }
    init?(state: State)
    init<Source: RandomNumberGenerator>(from: inout Source)
}
```

That the function described cannot be used as intended.

There are a variety of workarounds available: you could constrain the state to be exposed in the protocol API as `[UInt64]`, or have a `FixedWidthInteger` constraint, or require init to/from `Data` or something similar instead of exposing the actual type. All of those options would get the functionality you want without actually limiting things very much. If the language had a POD constraint, that would be more general, of course, but we don't have to throw our hands up in the air just because it doesn't exist yet =)
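A sketch of the first workaround, with made-up protocol and type names: expose the state through the protocol as `[UInt64]`, whatever the concrete `State` representation is.

```swift
protocol WordSerializablePRNG: RandomNumberGenerator {
    /// The full internal state, flattened to 64-bit words for storage/transfer.
    var stateWords: [UInt64] { get }
    /// Fails if the words don't form a valid state for this generator.
    init?(stateWords: [UInt64])
}

/// WyRand's single-word state makes the conformance trivial.
struct WyRandBox: WordSerializablePRNG {
    private(set) var state: UInt64

    var stateWords: [UInt64] { [state] }

    init?(stateWords: [UInt64]) {
        guard stateWords.count == 1 else { return nil }
        state = stateWords[0]
    }

    mutating func next() -> UInt64 {
        state &+= 0xa0761d6478bd642f
        let mul = state.multipliedFullWidth(by: state ^ 0xe7037ed1a0b428db)
        return mul.high ^ mul.low
    }
}
```

A tuple-state generator like jsf64 would expose `[state.a, state.b, state.c, state.d]` the same way, so the save/restore code stays fully generic.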

Sorry for being slow / tired, but what function, `init(state:)`?
?

I use the protocol and `init(state: State)` and it works; here is a generator with a larger-than-64-bit state as an example:

```
/// The jsf64 generator, translated from:
/// www.pcg-random.org/posts/bob-jenkins-small-prng-passes-practrand.html
struct Jsf64: PseudorandomNumberGenerator {
    typealias State = (a: UInt64, b: UInt64, c: UInt64, d: UInt64)
    private(set) var state: State

    init(seed: UInt64) {
        self.state = (a: 0xf1ea5eed, b: seed, c: seed, d: seed)
        for _ in 0 ..< 20 { _ = next() }
    }

    init(state: State) {
        // Not sure if every state is a valid state … test or search internet.
        self.state = state
    }

    mutating func next() -> UInt64 {
        func rot64(_ x: UInt64, _ k: UInt64) -> UInt64 {
            return (x &<< k) | (x &>> (64 &- k))
        }
        let e = state.a &- rot64(state.b, 7)
        state.a = state.b ^ rot64(state.c, 13)
        state.b = state.c &+ rot64(state.d, 37)
        state.c = state.d &+ e
        state.d = e &+ state.a
        return state.d
    }
}
```
```