Pitch: Requesting larger amounts of randomness from SystemRandomNumberGenerator

lukasa · July 23, 2019, 3:54pm

Introduction

I propose adding a new method to SystemRandomNumberGenerator: randomBytes(count:), which will enable generating an arbitrary number of random bytes. This is a minor quality of life pitch that can provide a small but meaningful performance improvement to randomness-heavy code.

Motivation

SE-0202 introduced the RandomNumberGenerator protocol and provided a single standard library conforming type: SystemRandomNumberGenerator. This SystemRandomNumberGenerator implementation was deliberately designed to be cryptographically secure, allowing it to be used as a source of randomness when working in cryptographically sensitive contexts.

While this is a great first step, the implementation as defined has one major deficiency when used in cryptographic contexts: the only mechanism to obtain random numbers caps the maximum amount of randomness that can be extracted in one call at 64 bits. This is because the entire RandomNumberGenerator protocol is:

public protocol RandomNumberGenerator {
  mutating func next() -> UInt64
}

Unfortunately it is vanishingly rare that 64 bits of randomness is sufficient in a cryptographic context. When used for generating random keys or random secrets, the absolute minimum number of bits necessary in almost any case is 128, and frequently 256 bits are needed. This necessitates multiple calls to next() in order to get the full quantity of random bytes. As those bytes are often required to be in a form that can be passed to a C library or converted to a large integer type, it is also quite common to need to pass these integers through a raw pointer type, requiring awkward pointer management.
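To make the awkwardness concrete, here is a minimal sketch of what obtaining a 256-bit key looks like with only next() available. The helper name make256BitKey is hypothetical and not part of any library; the byte order of the result is whatever the platform uses natively.

```swift
// Sketch only: `make256BitKey` is a hypothetical helper, not a standard library API.
func make256BitKey() -> [UInt8] {
    var generator = SystemRandomNumberGenerator()
    // Four separate calls, each yielding 64 bits, to reach 256 bits.
    let words: [UInt64] = (0..<4).map { _ -> UInt64 in generator.next() }
    // Reinterpret the integers as raw bytes (native byte order) so they
    // can be handed to a byte-oriented API.
    return words.withUnsafeBytes { raw in Array(raw) }
}

let key = make256BitKey()
```

Each call to next() is a separate request to the underlying system source, which is the cost the following paragraphs describe.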

In addition to the above minor awkwardness, there is a performance cost incurred here on many platforms. SystemRandomNumberGenerator uses:

arc4random_buf on Apple platforms
getrandom when available on Linux
getentropy on a grab bag of platforms
/dev/urandom on both Linux and the grab bag in cases when getrandom or getentropy aren't available.

On Apple platforms the extra calls don't matter too badly as arc4random_buf is provided by libc. However, getrandom and getentropy are syscalls, necessitating a userspace->kernel context switch for each call. /dev/urandom requires an amortised 1 syscall (open is amortised across the runtime of the program). Anything requiring syscalls may naturally require more if they are interrupted by a signal.

As cryptographic code is already computationally fairly expensive, requiring multiple syscalls in this hot CPU path is less than ideal. Given that these lower level APIs are quite capable of returning us more than 8 bytes at a time (and in fact the underlying Swift implementation is written for an arbitrary quantity of randomness), I propose we provide an alternative path when more randomness is required.

Proposed Solution

Extend SystemRandomNumberGenerator with a new function.

extension SystemRandomNumberGenerator {
    /// Provides `count` random bytes from the system
    /// random number generator.
    func randomBytes(count: Int) -> [UInt8]
}

This function would return an Array containing count random bytes. In principle count is unlimited, though in practice the maximum value is constrained by the various system APIs. A reasonable upper bound for a single call is UInt32.max, as that covers the platforms with the most severe restrictions (Windows can only do 2³² bytes in one go). This proposal would suggest baking that limitation into the API documentation, as once you're asking for 2³² bytes of randomness the cost of the syscall starts being less than the cost of shuffling memory around.

The implementation of this function is trivial, as the currently existing swift_stdlib_random function already supports this use-case.

Effect on ABI stability

None.

Alternatives Considered

Extending RandomNumberGenerator

In principle providing a whole buffer full of randomness is useful in many other contexts than cryptographic code. This is particularly true when trying to do things with a fast, non-CS PRNG, e.g. for simulations. This would be a motivation to extend the entire RandomNumberGenerator protocol.

However, extending protocols is tricky, and I don't fully understand the ABI implications here. We could reduce the API cost to nothing by providing a default implementation in terms of next(), of course, but I am unsure of the effects on ABI stability. In order to reduce the scope of this proposal, therefore, I have decided not to propose extending the entire protocol. If the community believes we both can and should do that, I believe it's a fairly straightforward extension of this pitch.
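For illustration, a default implementation in terms of next() could look roughly like the sketch below. It is written as an ordinary protocol extension, which is all that can be done from outside the standard library; a real protocol requirement with a default would have to live in the standard library itself. The SplitMix64 type is a stand-in for the kind of fast, non-CS PRNG mentioned above and is an assumption of this example, not part of the pitch.

```swift
extension RandomNumberGenerator {
    /// Hypothetical default: builds `count` bytes from repeated next() calls.
    mutating func randomBytes(count: Int) -> [UInt8] {
        precondition(count >= 0, "count must be non-negative")
        var bytes = [UInt8]()
        bytes.reserveCapacity(count)
        while bytes.count < count {
            let word: UInt64 = next()
            let needed = count - bytes.count
            // Emit the word's bytes in little-endian order; the final
            // word is truncated if fewer than 8 bytes are still needed.
            withUnsafeBytes(of: word.littleEndian) { raw in
                bytes.append(contentsOf: raw.prefix(needed))
            }
        }
        return bytes
    }
}

/// Illustrative non-cryptographic generator (SplitMix64), e.g. for simulations.
struct SplitMix64: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

var simulationRNG = SplitMix64(state: 42)
let sample = simulationRNG.randomBytes(count: 20)
```

A generator with a cheaper bulk path, such as SystemRandomNumberGenerator, would provide its own implementation rather than rely on this default.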

Writing bytes into buffers

One potential cost that can still be incurred here is the cost of a memory copy. It would be nice to be able to ask the system random number generator to write the random bytes into a user-provided buffer of bytes. This improves the performance story further when those random bytes are going to be manipulated by some other library.

However, as SE-0256 (MutableContiguousCollection) was rejected, there is no clear abstraction for what the type would need to be. In principle RangeReplaceableCollection is the right choice, but that is not substantially cheaper than users writing the code themselves as it will still necessitate a memory copy.

The only alternative, without reintroducing SE-0256, would be for this API to accept an UnsafeMutableBufferPointer. The performance gain from this is probably not worth creating such a prominent unsafe API, so until or unless something like SE-0256 lands, or a compelling performance case is made for adding this unsafe API, I elected to hold off for now.

johannesweiss · July 23, 2019, 4:17pm

+1 from me.

However, I think the 'Extending RandomNumberGenerator' alternative is even better. I don't think there's an API/ABI implication here because library evolution works and we can add a function to the protocol if it comes alongside a default implementation for all RNGs that don't implement the buffer function yet. But as Cory points out, this could be seen as an extension and we could totally start with just implementing it on SystemRandomNumberGenerator.

Regarding how to provide the buffer: Why not vend a [UInt8]? Sure, that will often require the user to copy the data out but that is true for an UnsafeRawBufferPointer too. (And of course SE-0256 would be even better but as you point out it was rejected...)
Sorry, I misread the alternative that was proposed. You wanted to write into a user-provided UnsafeMutableRawBufferPointer, I think I agree that's probably a step too far.

AlexanderM · July 23, 2019, 4:20pm

Isn't Data more suitable than [UInt8]?

lukasa · July 23, 2019, 4:22pm

Data is not in the standard library, so SystemRandomNumberGenerator cannot rely on it.

lukasa · July 23, 2019, 4:25pm

I'm happy to go down this road if the ABI experts concur that it's safe: it definitely seems more logical to me.

Ben_Cohen · July 23, 2019, 5:01pm

SE-0256 being rejected isn't really the problem here. Even if it had been accepted, it wouldn't have allowed for Data or UnsafeMutableRawBufferPointer because they are untyped, and it proposed a typed API.

But it is also not necessary to optimize this to take an inout C: Collection where Element == UInt8 because we have withContiguousMutableStorageIfAvailable, allowing a [UInt8] to be used without unnecessary allocation.
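One possible reading of this suggestion, sketched below: a generic fill function that writes directly into the collection's existing storage when withContiguousMutableStorageIfAvailable offers it, and falls back to element-by-element assignment otherwise. The function name fillWithRandomBytes and its signature are assumptions made for this example, not a proposed API.

```swift
/// Hypothetical helper: overwrites every element of `buffer` with random bytes.
func fillWithRandomBytes<C: MutableCollection, G: RandomNumberGenerator>(
    _ buffer: inout C, using generator: inout G
) where C.Element == UInt8 {
    // Fast path: write straight into contiguous storage, no extra allocation.
    let usedFastPath = buffer.withContiguousMutableStorageIfAvailable { storage -> Bool in
        var offset = 0
        while offset < storage.count {
            var word: UInt64 = generator.next()
            let chunk = min(8, storage.count - offset)
            for _ in 0..<chunk {
                // Take the low byte, then shift the next byte into place.
                storage[offset] = UInt8(truncatingIfNeeded: word)
                word >>= 8
                offset += 1
            }
        }
        return true
    } ?? false

    if !usedFastPath {
        // Slow path for collections without contiguous storage.
        var index = buffer.startIndex
        while index != buffer.endIndex {
            buffer[index] = UInt8.random(in: UInt8.min ... UInt8.max, using: &generator)
            buffer.formIndex(after: &index)
        }
    }
}

var systemRNG = SystemRandomNumberGenerator()
var scratch = [UInt8](repeating: 0, count: 32)
fillWithRandomBytes(&scratch, using: &systemRNG)
```

With a [UInt8] argument the fast path is taken, so the caller's existing array is reused rather than a new one being allocated.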

Tino · July 23, 2019, 5:09pm

Actually, I'd go even further and replace next with something like the suggested method... -> +1

lukasa · July 23, 2019, 5:11pm

As it's already shipped, that's not really workable now. Changing the required function that conforming types need to implement cannot be done without breaking source, which seems unnecessary here.

beccadax · July 23, 2019, 5:28pm

The documentation does not say that. It says:

While the system generator is automatically seeded and thread-safe on every platform, the cryptographic quality of the stream of random data produced by the generator may vary.

While the underlying source of randomness in Darwin is a CSPRNG, I don’t believe that any of the (admittedly fairly thin) Swift APIs built on top of it have been audited by security folks. Is my recollection incorrect?

lukasa · July 23, 2019, 5:31pm

My belief is that the documentation is bad and needs correcting. See: Strengthen SystemRandomNumberGenerator docs. by Lukasa · Pull Request #26294 · apple/swift · GitHub. My thesis is outlined on that issue, and I'm reluctant to derail this thread with a discussion of that thesis.

Can I suggest that we keep all discussion of that topic either on that issue or in a new thread? We can consider the discussion of this pitch contingent on my assertion of cryptographic security being true, and if it turns out to be false then we can consider revising the pitch.

dnadoba · April 1, 2021, 10:45am

Anything blocking this proposal? Almost all answers seem to be positive.
I just need exactly this kind of API to get 16 random bytes to generate a nonce for the Sec-WebSocket-Key as defined in RFC 6455.
It would be enough if it were implemented on SystemRandomNumberGenerator.
For now I will just generate two UInt64 but it would be nice if we add this.
Anything I can help with to move this forward?
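A minimal sketch of the two-UInt64 workaround described above, assuming Foundation is available for Base64 encoding. The function name makeWebSocketKey is hypothetical; RFC 6455 specifies the header value as a Base64-encoded 16-byte nonce.

```swift
import Foundation

/// Hypothetical helper: builds a Sec-WebSocket-Key value from two next() calls.
func makeWebSocketKey() -> String {
    var generator = SystemRandomNumberGenerator()
    var nonce = [UInt8]()
    nonce.reserveCapacity(16)
    for _ in 0..<2 {
        // Each call contributes 8 of the 16 nonce bytes.
        let word: UInt64 = generator.next()
        withUnsafeBytes(of: word) { raw in
            nonce.append(contentsOf: raw)
        }
    }
    return Data(nonce).base64EncodedString()
}

let webSocketKey = makeWebSocketKey()
```

With the pitched randomBytes(count:) the loop would collapse into a single request for 16 bytes.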