I've been using my own little random api for some years, and as SE-0202: Random Unification is now implemented in recent snapshots, I've started to try it out, adding some PRNGs and doing some performance testing.
My use cases are image and audio processing and other situations where what I need is a PRNG that is as fast as possible but with sufficiently good quality. So the default RNG, Random.default
, is not an option for me (it's not seedable and much too slow).
Therefore I do my performance testing using PRNGs that I have previously implemented and tested, such as SplitMix64 and Xoroshiro128Plus.
Now to my observation about generating Double values in the half open unit range [0, 1).
Random floating point values in this particular range (and a couple of other ones) are very common in my use cases. The following command line program demonstrates that using the method provided by the Random API for this:
let rndDoubleInUnitRange = Double.random(in: 0 ..< 1, using: &rg)
is about 45 times slower than using an alternative method (see implementation in the program below):
let rndDoubleInUnitRange = Double.randomInUnitRange(using: &rg)
I think I know why this is: The former method (the one provided by the Random API) calls the generator's next(upperBound:)
internally, which performs a loop etc, in order to support any range, and it doesn't have special cases for the unit range or other common ranges for which it suffices to use (some of) the 64 bits from the generator's next()
directly.
Note that I haven't investigated this thoroughly so it might be that next(upperBound:)
doesn't have to loop (and call .next()
repeatedly), for the particular range 0.0 ..< 1.0
, and if so, the slow down is caused by something else, or my tests are flawed.
Question: I'd like to be able to use the methods that the Random API provides rather than rolling my own, but it will not be possible in cases like this (since it makes my code 45 times slower if I do). Would it be possible/sensible to modify the implementation of FloatingPoint.random(in: using: )
by adding checks for unit range and perhaps a couple of other common ranges, eg [-1, 1), to see if it would get about 45 times faster for these common ranges, or perhaps the checks (short paths?) should actually go in next(upperBound:)
?
Or perhaps none of that is possible or sensisble, and it would be better to add separate methods, like my .randomInUnitRange(using: )
?
Or is this considered too niche for motivating changes or additions to the Random API? (If so, then I guess I'll just have to forget about the Random API and continue using my own.)
Here's the demo program, in case anyone would like to have a critical look (don't forget -O / release build! And note that you need a recent dev snapshot with the Random API implementation to compile it):
import AppKit
// Just as a side note, I'm including this protocol even though it isn't
// needed or relevant for this thread or demo, but it might be interesting to
// start a separate thread discussing if we should add something like this
// protocol to the Random API.
protocol PseudoRandomNumberGenerator : RandomNumberGenerator {
associatedtype State
init(state: State)
init(seed: UInt64)
}
extension PseudoRandomNumberGenerator {
// All PRNGs get an init() which seeds the PRNG with a truly
// random seed (from the default RNG).
init() {
self.init(seed: Random.default.next())
}
}
// I'll be using the following simple PRNG for this demonstration since it is
// about 200 times faster than the default RNG Random.default (on my machine),
// and it results in less fluctuating timing results than when using Random.default.
// See the comments at the end for my speculations about why that might be.
struct SplitMix64 : PseudoRandomNumberGenerator {
// Based on http://xoshiro.di.unimi.it/splitmix64.c
// It is a very fast generator passing BigCrush.
var state: UInt64
init(state: UInt64) {
self.state = state
}
init(seed: UInt64) {
self.init(state: seed)
}
mutating func next() -> UInt64 {
state &+= 0x9e3779b97f4a7c15
var z = (state ^ (state &>> 30)) &* 0xbf58476d1ce4e5b9
z = (z ^ (z &>> 27)) &* 0x94d049bb133111eb
return z ^ (z &>> 31)
}
}
// This is the interesting part. It is a method that produces a Double value
// in unit range [0, 1), just like Double.random(in: 0 ..< 1, using: &rng).
extension Double {
// This seems to be almost 50 times faster than
// Double.random(in: 0 ..< 1, using: &rg)
static func randomInUnitRange<R>(using rng: inout R) -> Double
where R: RandomNumberGenerator
{
let ui64 = rng.next()
return Double(ui64 >> 11) * (Double.ulpOfOne / 2)
}
}
func executionTime<C, R>(
of count: Int,
_ operation: (C, inout R) -> C,
using rng: inout R) -> Double
where C: Numeric, R: RandomNumberGenerator
{
var checksum: C = 0
let t0 = CACurrentMediaTime()
for _ in 0 ..< count { checksum = operation(checksum, &rng) }
let t1 = CACurrentMediaTime()
let time = t1 - t0
print(" time:", time, "seconds (checksum: \(checksum))")
return time
}
@_transparent // The non-baseline tests will be about 4 x slower without this …
func test<C, R>(label: String,
using rng: inout R,
operation: (C, inout R) -> C) -> Double
where C: Numeric, R: RandomNumberGenerator
{
print("\n\(label):")
var minTime = Double.infinity
for _ in 0 ..< 7 {
let time = executionTime(of: 10_000_000, operation, using: &rng)
minTime = min(minTime, time)
}
print(" minimum time:", minTime, "seconds")
return minTime
}
func test<R: RandomNumberGenerator>(using rng: inout R) {
let minTimeOfExistingMethod =
test(label: "Existing method", using: &rng) { (checksum, rg) in
return checksum + Double.random(in: 0 ..< 1, using: &rg)
}
let minTimeOfMyMethod =
test(label: "My method", using: &rng) { (checksum, rg) in
return checksum + Double.randomInUnitRange(using: &rg)
}
let minTimeOfBaseline =
test(label: "Baseline", using: &rng) { (checksum, rg) in
return checksum ^ rg.next()
}
let ratio = (minTimeOfExistingMethod - minTimeOfBaseline) /
(minTimeOfMyMethod - minTimeOfBaseline)
print("\nMy method was \(ratio) times faster than the existing method.")
}
var prng = SplitMix64()
test(using: &prng)
// test(using: &Random.default)
// Uncomment the above line if you're interested in seeing the results when
// using Random.default instead of SplitMix64. You'll have to wait much longer
// and the result will be very erratic. Lowering the number of operation-calls per
// test will make it even more erratic. I suppose this is because the time it
// takes to call the underlying random function is fluctuating a bit and
// because it is so large compared to the time it takes to perform
// the UInt64 -> Double in unit range conversions.
Here's an example compile and run on my MBP, 2 GHz i7, late 2013:
$ /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2018-05-22-a.xctoolchain/usr/bin/swiftc -O -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk test.swift
$ ./test
Existing method:
time: 0.30905857200014 seconds (checksum: 4999285.093711603)
time: 0.30168970900012937 seconds (checksum: 5000657.718667782)
time: 0.30922266299967305 seconds (checksum: 5000852.352980945)
time: 0.29837784899973485 seconds (checksum: 4999644.374941994)
time: 0.29741116300101567 seconds (checksum: 4999527.347985679)
time: 0.30160584000077506 seconds (checksum: 5001061.41999039)
time: 0.3061574759994983 seconds (checksum: 5000080.011050173)
minimum time: 0.29741116300101567 seconds
My method:
time: 0.016804585999125266 seconds (checksum: 4999748.809723855)
time: 0.01670135200038203 seconds (checksum: 4999867.282415561)
time: 0.017937091000931105 seconds (checksum: 5000992.65858493)
time: 0.0171803799985355 seconds (checksum: 5001887.357391258)
time: 0.01743405500019435 seconds (checksum: 5001176.764179978)
time: 0.017931289999978617 seconds (checksum: 4998501.019897718)
time: 0.017273727000429062 seconds (checksum: 4999930.674665641)
minimum time: 0.01670135200038203 seconds
Baseline:
time: 0.011295344000245677 seconds (checksum: 434670723333784857)
time: 0.011931514000025345 seconds (checksum: 565380713243076264)
time: 0.01088286199956201 seconds (checksum: 7609415011435692545)
time: 0.010823972999787657 seconds (checksum: 7875593923504742495)
time: 0.01154418300029647 seconds (checksum: 17857085083325635480)
time: 0.011229211000681971 seconds (checksum: 3830828303701629253)
time: 0.011102663000201574 seconds (checksum: 7544674021828474327)
minimum time: 0.010823972999787657 seconds
My method was 48.761053179018354 times faster than the existing method.
(I assume that the situation would be about the same for Float
, but I haven't tested it, since it is actually worse, since you can get two Float
values, or a simd float2
, from a single UInt64
bit pattern, and I guess having methods that returns a random (Float, Float)
or float2
in unit range (like I have in my own little random api) would be crossing the line of too niche ... which is kind of sad in a way, because scenarios requiring things like this constitutes my (and I guess at least some other people's) only real need for a Random API.)