An API for bulk random bytes

I'm proposing that we add the following requirement to the RandomNumberGenerator protocol:

mutating func fill(_ buffer: UnsafeMutableRawBufferPointer)

with a default implementation that uses the existing next() API to produce 64b at a time, and a customized implementation for SystemRandomNumberGenerator that directly requests the specified amount of random bytes from the underlying platform APIs.

On its own, this doesn't change much, but we can then adopt it within the standard library in cases where we know that we will need more than 64b of randomness (e.g. shuffling an array) to realize significant speed gains (by amortizing per-call overhead, especially for the SystemRNG). Those changes are not included in this proposal, because they are purely implementation details of the standard library.

This API does make an operation available that is useful on its own in some narrow circumstances (e.g. seeding the state of a higher-throughput RNG), generating random values larger than 64b, or sampling non-uniform distributions. This has very real ergonomic and performance advantages for these use cases. A follow-on proposal once move-only types are accepted will use this API to provide an adapter that buffers the output of a provided source and conforms to the RNG protocol, making this process easier still.

35 Likes

Yes, this is fantastic.

There was already a pitch a while ago: Pitch: Requesting larger amounts of randomness from SystemRandomNumberGenerator
cc: @lukasa

2 Likes

I definitely see the need for this so big +1 from my side. However, only adding an unsafe version seems to be at odds with Swifts safety first principal. I think we should at least add a safe version of this e.g.:

func randomBytes(count: Int) -> [UInt8]

as proposed by @lukasa.

This can be implemented on top of the proposed fill(_:) or the other way around.

Alternatively we could also add new API to Array e.g.:

extension Array {
    init<Generator: RandomNumberGenerator>(randomBytesCount: Int, using: inout Generator)
}

A version where one could choose the integer type and range would be nice too but I'm not sure if this can be implement efficiently or if there is even an actually need for that.

2 Likes

Allocating new storage entirely defeats the purpose of this API (to remove as much overhead as possible from bulk-randomness generation).

A follow-on proposal would provide API to produce bulk random numbers, rather than bulk random bytes, but the thing that we need as the fundamental building-block (and therefore on the protocol) is a non-allocating "fill this buffer" operation.

5 Likes

+1

Will this be able to replace SecRandomCopyBytes Apple Developer Documentation

But is … non throwing :grinning:

We could certainly make it throw, though in practice these calls "never" fail (for sufficiently hand-wavy notions of "never"). Throwing could be beneficial on more niche platforms, however.

1 Like

Right, makes perfect sense, but least they would be called with de facto try (since throwing), so no need to check any semi-awkward errSecSuccess

Could it take a MutableCollection of bytes instead? Possibly with a check for UMRBP and/or contiguous storage as the first thing. That would solve the unsafe-API concern without losing the no-allocation guarantee.

6 Likes

+1 - I also pitched this before.

I agree with @dnadoba that this shouldn't require an unsafe buffer, and my first thought was, as @jrose suggested, to use MutableCollection instead. So +1 to that as well.

IMO this API should scale to use-cases such as filling an Array with random bytes or scribbling over a Data, without requiring those developers to use unsafe constructs. If some random number generators can make use of contiguous mutable storage (e.g. because they are C functions provided by the system), they should do so as an implementation detail.


Also, if we're taking a look at the RNG API, I'd also suggest (once again) that we add static member syntax for SystemRandomNumberGenerator. It's annoying to type out, and now that we can infer generic parameters, it would make these functions a lot more ergonomic to read and write:

// We should add this:
extension RandomNumberGenerator where Self == SystemRandomNumberGenerator {
    static var system: Self { Self() }
}

// To allow code like this:
func doSomething(rng: some RandomNumberGenerator = .system) {
  ...  
}
3 Likes

The problem is that this is usually not exactly the signature you want. You actually want the parameter to be inout and we sadly don't yet support default values for inout parameters.

5 Likes

Ahh, shame.

I should have looked at the previous pitch more carefully. You could still use it in an explicit overload if you write it like this:

extension RandomNumberGenerator where Self == SystemRandomNumberGenerator {

  static var system: Self {
    get { SystemRandomNumberGenerator() }
    set { /* No-op, throwaway instance. */ }
  }
}

// To allow:
func doSomething() {
  doSomething(rng: &.system)
}

But yeah... it's not as nice as just using it as a default argument. sigh... maybe one day...

3 Likes

What element type(s) should it use? The most obvious choices to me are:

  • UInt8 (i.e. UnsafeRawBufferPointer.Element)
  • UInt64 (i.e. the return type of RandomNumberGenerator.next())
  • a generic FixedWidthInteger & UnsignedInteger (i.e. the generic type argument to RandomNumberGenerator.next(upperBound:))

How would this method work in cases where the collection doesn't provide contiguous storage? I can see three possibilities:

  1. It would repeatedly call the random byte–generation function for each element. This can be expensive on non-Apple platforms like Linux, where getrandom, getentropy, and /dev/urandom all require one syscall per use amortized (/dev/urandom requires a call to open on first use). Each syscall requires an expensive userspace -> kernel context switch.

  2. It would create a buffer using withUnsafeTemporaryAllocation, fill the buffer with random bytes, then copy the buffer's contents to the collection. This option would require the least amount of potential random number generation–related syscalls, but it may require a heap allocation (including potential heap allocation–related syscalls) if the collection is large enough.

  3. It would allocate a buffer up to a certain size such that it can always be stack-allocated. It would then repeatedly fill that buffer with random bytes, then copy the buffer's contents to the collection until the entire collection has been filled. This would never require a heap allocation, but it would require multiple syscalls to fill large collections.

All three options here create overhead which undermines the purpose of this API (which is, to quote @scanon, "to remove as much overhead as possible from bulk-randomness generation").


I think we should go with the original proposed method that accepts an UnsafeMutableRawBufferPointer. Then, we could build safe constructs on top of that method like the following:

Click to view code
extension Array where Element: FullWidthInteger {
    static func randomlyFilled(count: Int) -> Self {
        var generator = SystemRandomNumberGenerator()
        return randomlyFilled(count: count, using: &generator)
    }
    
    static func randomlyFilled(count: Int, using generator: inout some RandomNumberGenerator) -> Self {
        return Self(unsafeUninitializedCapacity: count, initializingWith: { buffer, initializedCount in
            generator.fill(UnsafeMutableRawBufferPointer(buffer))
            initializedCount = buffer.count
        })
    }
}

extension RangeReplaceableCollection where Element: FullWidthInteger {
    static func randomlyFilled(count: Int) -> Self {
        var generator = SystemRandomNumberGenerator()
        return randomlyFilled(count: count, using: &generator)
    }
    
    static func randomlyFilled(count: Int, using generator: inout some RandomNumberGenerator) -> Self {
        // Perhaps we should consider adding a failable init(unsafeUninitializedCapacity:initializingWith:)
        // initializer to `RangeReplaceableCollecion` to avoid unnecessary overhead here
        return withUnsafeTemporaryAllocation(of: Element.self, capacity: count) { buffer in
            generator.fill(UnsafeMutableRawBufferPointer(buffer))
            return Self(buffer)
        }
    }
}

extension MutableCollection where Element: FullWidthInteger {
    mutating func randomlyFill() {
        var generator = SystemRandomNumberGenerator()
        randomlyFill(using: &generator)
    }
    
    mutating func randomlyFill(using generator: inout some RandomNumberGenerator) {
        let wasFilledContiguously = withContiguousMutableStorageIfAvailable { buffer in
            generator.fill(UnsafeMutableRawBufferPointer(buffer))
        } != nil
        
        if !wasFilledContiguously {
            withUnsafeTemporaryAllocation(of: Element.self, capacity: count) { buffer in
                generator.fill(UnsafeMutableRawBufferPointer(buffer))
                var index = startIndex
                var bufferIndex = 0
                while bufferIndex != buffer.count {
                    self[index] = buffer[bufferIndex]
                    formIndex(after: &index)
                    bufferIndex += 1
                }
            }
        }
    }
}

/// An integer where the in-memory representation of the integer is equivalent to the value of the integer.
protocol FullWidthInteger: FixedWidthInteger {}
extension UInt8: FullWidthInteger {}
extension UInt16: FullWidthInteger {}
extension UInt32: FullWidthInteger {}
extension UInt64: FullWidthInteger {}
extension Int8: FullWidthInteger {}
extension Int16: FullWidthInteger {}
extension Int32: FullWidthInteger {}
extension Int64: FullWidthInteger {}
3 Likes

Theo gave a good talk several years ago about why randomness operations shouldn't encode failure in their API, ideally should never fail, and in the un-ideal case where they can fail, they should do so by trapping. The advice is mostly relevant to cryptographic uses of random number generators, but since SystemRandomNumberGenerator claims to be cryptographically secure, it's probably right to stick to that.

Putting on my security goggles, I'd also like to see if there's any reasonable way that this can be done without using unsafe pointers. I don't feel great about unsafe pointers in protocols because it forces implementations to use unsafe code, and that's particularly annoying for standard library protocols. Is the goal that the caller should be able to put the memory in automatic storage, and that the implementation should be able to get a pointer to contiguous storage to (as an example) call arc4random_buf?

+1 I've wanted this for a long time. I'm also tentatively for having this API be generic instead of just for UnsafePointer.

But for the protocol requirement itself, I can see why a throwing affordance might be more flexible.

Why? There's not really any reason to implement a PRNG or a CSPRNG that can fail after it's been initialized, and even if there were, what is your client code going to do when the randomness request fails?

1 Like

The implementation you describe either calls getrandom() every time the user calls next(), or it already has a large-ish buffer that it can use.

I would also argue that it shouldn't be a goal to map this feature as closely to getrandom() as possible. A high-quality implementation would call it once at initialization time to seed a CSPRNG and never go to the kernel again.