Creating an array of random numbers in Swift is slow

Jerry1144 · June 2, 2025, 11:10am

If only Swift offered rounding-mode settings on floating-point numbers' init() methods, or a dedicated setting. Unsigned ConVert To Float (UCVTF), rounding down, would be the one-liner that we want, instead of manually selecting 52 bits and discarding the remaining 12.

scanon · June 2, 2025, 12:31pm

On arm64, yes. That instruction doesn't exist on many other targets (most notably the baseline x86_64 target), so spelling the transformation that way would pessimize the code generation everywhere else. Bitmasking and subtraction (or in the other formulation, multiplication) are pretty much supported everywhere, which is why those idioms are used for generic library code.
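For reference, the two idioms mentioned above can be sketched roughly as follows. The function names are hypothetical and chosen only for illustration; this is a minimal sketch, not the standard library's implementation.

```swift
// Bitmask and subtraction: keep the top 52 bits of x as the fraction of a
// Double in [1, 2) (exponent bits 0x3FF), then subtract 1 to land in [0, 1).
func unitDoubleByMask(_ x: UInt64) -> Double {
    let bits = (x >> 12) | 0x3FF0_0000_0000_0000
    return Double(bitPattern: bits) - 1.0
}

// Multiplication: keep the top 53 bits of x, which convert to Double exactly,
// then scale by 2^-53 to land in [0, 1).
func unitDoubleByMultiply(_ x: UInt64) -> Double {
    Double(x >> 11) * 0x1.0p-53
}

let maskLow = unitDoubleByMask(0)
let maskHigh = unitDoubleByMask(UInt64.max)
let mulLow = unitDoubleByMultiply(0)
let mulHigh = unitDoubleByMultiply(UInt64.max)
```

Neither version depends on a rounding mode, because the shift discards the extra bits before any conversion happens. The mask version yields multiples of 2^-52, the multiply version multiples of 2^-53.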

tera · June 2, 2025, 1:35pm

If I don't want the lower bound, the upper bound, or both included, and the algorithm includes them, I could simply filter those values out:

func filtered_random() -> Double {
    while true {
        let v = some_random_between_0_and_1()
        if v != 0 && v != 1 { return v }
    }
}

wigging · June 8, 2025, 1:29am

If x is a 64-bit unsigned integer, would Float(x >> 40) * 0x1.0p-24 convert it to a 32-bit float on interval [0, 1)?

sbooth · June 10, 2025, 11:16pm

Yes, I believe that will work, although if you only need 32-bit floats, faster PRNGs such as xoshiro128+ are available.
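A quick way to check the endpoints of that conversion is sketched below. The helper name unitFloat is hypothetical. The last line shows why the shift keeps only 24 bits: with 32 bits the integer no longer fits in Float's 24-bit significand, so the conversion can round up and produce exactly 1.0.

```swift
// Hypothetical helper: keep the top 24 bits of x, which Float represents
// exactly, then scale by 2^-24.
func unitFloat(_ x: UInt64) -> Float {
    Float(x >> 40) * 0x1.0p-24
}

let smallest = unitFloat(0)            // lower endpoint of the interval
let largest = unitFloat(UInt64.max)    // largest value the helper can return

// Keeping 32 bits instead: UInt32.max rounds up to 2^32 when converted.
let tooManyBits = Float(UInt32.max) * 0x1.0p-32
```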

wigging · June 11, 2025, 12:58am

Good point. I'll see if I can make a Swift version of xoshiro128+ and compare it to the Swift wyrand (float version) that I shared above.

sbooth · June 11, 2025, 1:32am

I took a stab at an initial implementation that you're welcome to try.

wigging · June 11, 2025, 3:10am

Thanks. Here is my version based on your implementation. But it doesn't run because RandomNumberGenerator requires that next() return a UInt64, not a UInt32. Since xoshiro128+ is 32-bit, I don't see how you can use it with RandomNumberGenerator without converting to 64-bit.

struct Xoshiro128: RandomNumberGenerator {
    private var state: (UInt32, UInt32, UInt32, UInt32)

    init(seed: (UInt32, UInt32, UInt32, UInt32)? = nil) {
        let r = UInt32.random(in: 1..<1000)
        state = seed ?? (r, r, r, r)
    }

    mutating func next() -> UInt32 {
        let result = state.0 &+ state.3
        let t = state.1 << 9

        state.2 ^= state.0
        state.3 ^= state.1
        state.1 ^= state.2
        state.0 ^= state.3

        state.2 ^= t

        state.3 = (state.3 << 11) | (state.3 >> (32 - 11))

        return result
    }
}

Andropov · June 11, 2025, 6:07am

Note that if the seed is nil, this will initialize the state with the same random number repeated four times (e.g. (768, 768, 768, 768)) instead of generating four random numbers.
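One possible way to get four independent words is sketched below, assuming the system generator is an acceptable seed source. The function name makeSeed is hypothetical. It also rejects the all-zero state, from which xoshiro-family generators never escape.

```swift
// Hypothetical helper: draw four independent 32-bit words from the system
// generator and retry in the (extremely unlikely) all-zero case.
func makeSeed() -> (UInt32, UInt32, UInt32, UInt32) {
    var rng = SystemRandomNumberGenerator()
    while true {
        let seed = (
            UInt32.random(in: .min ... .max, using: &rng),
            UInt32.random(in: .min ... .max, using: &rng),
            UInt32.random(in: .min ... .max, using: &rng),
            UInt32.random(in: .min ... .max, using: &rng)
        )
        if seed != (0, 0, 0, 0) { return seed }
    }
}

let seed = makeSeed()
```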

sbooth · June 11, 2025, 12:49pm

For that PRNG it probably doesn't make sense for an implementation to conform to RandomNumberGenerator.

wigging · June 12, 2025, 12:31am

Here is an example where nextUniform returns the float value. I compared this to the WyRand example where I convert the double to float, and this Xoshiro128Plus is not faster; it is actually a little slower than the WyRand double-to-float conversion.

Also, @Andropov, this addresses your comment about the seed.

import Accelerate

struct Xoshiro128Plus: RandomNumberGenerator {
    private var state: (UInt32, UInt32, UInt32, UInt32)

    init(seed: (UInt32, UInt32, UInt32, UInt32)? = nil) {
        state = seed ?? (arc4random(), arc4random(), arc4random(), arc4random())
    }

    mutating func next32() -> UInt32 {
        let result = state.0 &+ state.3
        let t = state.1 << 9

        state.2 ^= state.0
        state.3 ^= state.1
        state.1 ^= state.2
        state.0 ^= state.3

        state.2 ^= t

        state.3 = (state.3 << 11) | (state.3 >> (32 - 11))

        return result
    }

    mutating func next() -> UInt64 {
        let upper = UInt64(next32())
        let lower = UInt64(next32())
        return (upper << 32) | lower
    }

    mutating func nextUniform() -> Float {
        let x = next32()
        return Float(x >> 8) * 0x1.0p-24
    }
}

wigging · June 12, 2025, 1:59am

The BNNS library has a random float generator. This example fills a 100,000,000-element float array almost 3x faster than the Xoshiro128Plus that I shared above. But I don't see anything in BNNS for generating a random double array; it seems to be restricted to integer and float types.

import Accelerate

let n = 100_000_000

let result = Array<Float>(unsafeUninitializedCapacity: n) { buffer, initCount in

    var descriptor = BNNSNDArrayDescriptor(data: buffer, shape: .vector(n))!
    let randomGenerator = BNNSCreateRandomGenerator(BNNSRandomGeneratorMethodAES_CTR, nil)

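    BNNSRandomFillUniformFloat(randomGenerator, &descriptor, 0, 1)

    // Release the generator once the buffer has been filled.
    BNNSDestroyRandomGenerator(randomGenerator)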

    initCount = n
}
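For doubles, one fallback is to fill the buffer from any RandomNumberGenerator using the multiply idiom discussed earlier in the thread. This is only a sketch: the function name randomDoubles is hypothetical, it does not use BNNS, and it makes no claim about matching BNNS's speed.

```swift
// Hypothetical helper: fill an uninitialized buffer with doubles in [0, 1)
// drawn from any generator conforming to RandomNumberGenerator.
func randomDoubles<G: RandomNumberGenerator>(count: Int, using rng: inout G) -> [Double] {
    Array(unsafeUninitializedCapacity: count) { buffer, initializedCount in
        for i in 0..<count {
            let bits: UInt64 = rng.next()
            // Keep the top 53 bits and scale by 2^-53.
            buffer.initializeElement(at: i, to: Double(bits >> 11) * 0x1.0p-53)
        }
        initializedCount = count
    }
}

var generator = SystemRandomNumberGenerator()
let doubles = randomDoubles(count: 1_000, using: &generator)
```

Any of the generators in this thread that conform to RandomNumberGenerator could be passed in place of SystemRandomNumberGenerator.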

Nobody1707 · June 12, 2025, 2:56pm

How does Wyrand stack up to the official 64-bit Xoroshiro128++ by Sebastiano Vigna?

import SwiftShims

@_transparent
func rotl(_ x: UInt64, _ k: Int) -> UInt64 {
  (x &<< k) | (x &>> (64 &- k))
}

struct Xoroshiro128PlusPlus: RandomNumberGenerator {
    public typealias Seed = (UInt64, UInt64)
    private var state: (UInt64, UInt64)

    init(seed: (UInt64, UInt64) = (0, 0)) {
        state = if seed != (0, 0) {
          seed
        } else {
          withUnsafeTemporaryAllocation(of: Seed.self, capacity: 1) {
            swift_stdlib_random($0.baseAddress!, MemoryLayout<Seed>.size)
            return $0.moveElement(from: 0)
          }
        }
    }

    mutating func next() -> UInt64 {
        let s0 = state.0
        var s1 = state.1
        let result = rotl(s0 &+ s1, 17) &+ s0

        s1 ^= s0
        state.0 = rotl(s0, 49) ^ s1 ^ (s1 << 21)
        state.1 = rotl(s1, 28)

        return result
    }

    mutating func nextUniform() -> Float64 {
        Double(next() >> 11) * 0x1.0p-53
    }
}

Original C code.

EDIT: It is supposed to be Xoroshiro.

wigging · June 13, 2025, 2:25am

In your example code, the struct is named Xoshiro but I think you mean Xoroshiro. Anyway, WyRand is about 2x faster than Xoroshiro128++ on my Mac.

Nobody1707 · June 13, 2025, 1:27pm

That's pretty fast. Wyrand doesn't start failing statistical tests until you start producing 32TB of randomness, so that should be fine for most uses.