Time to make jrand48 and friends available again?

You aren't missing anything, we don't have truly random numbers, and truly random numbers aren't desirable for almost any use-case anyway.

Also, despite my concerns raised in Clarify the cryptographic properties of SystemRandomNumberGenerator, it's been made explicitly clear that SystemRandomNumberGenerator is not required to be a CSPRNG. That means that the RandomNumberGenerator protocol does not imply a CSPRNG either, and so we don't really need a new protocol for non-cryptographically secure random number generators.

We could have a protocol for a sendable random number generator, but I doubt that's very useful. The only thing it adds is a init(seed:) method to RandomNumberGenerator, and given that the whole point of having seedable generators is that they're replayable, I can't imagine a world where you care what the seed is, but not what the RNG is.

3 Likes

Writing generic code for simulation or testing and adding support for replaying a specific scenario given a certain state and/or seed (having to hard code a specific generator into that code would be strange).

Ie, the pair of a generator and seed / state is what is needed to replay a scenario.

1 Like

Sure, but that's supportable today. You're generic over RandomNumberGenerator.

Consider this code, which you can write today:

class Simulation<RNG: RandomNumberGenerator> {
    private var rng: RNG

    init(_ rng: RNG) {
        self.rng = rng
    }

    func simulate() {
        // Do some stuff, generating random values out of RNG
    }   
}

let simulation = Simulation(SystemRandomNumberGenerator())

This is completely expressible today. If you want to use a sendable random number generator, there are two ways to go. One is we can make seedability a property of a protocol and amend to this:

class Simulation<RNG: SeedableRandomNumberGenerator> {
    private var rng: RNG

    init(seed: RNG.Seed) {
        self.rng = RNG.init(seed: seed)
    }

    func simulate() {
        // Do some stuff, generating random values out of RNG
    }   
}

let simulation = Simulation<MersenneTwister>(seed: seed)

Or, alternatively, you can just have an RNG whose init method takes a seed, and then pass it as a value to Simulation. This requires no change to Simulation, you just do this:

let simulation = Simulation(MersenneTwister(seed: seed))

The only reason I can see to make SeedableRandomNumberGenerator a protocol is so that you can instantiate it with a seed in a generic context. But I don't see any case where that's useful: if you're trying to make things replayable, the initial state vector is not seed, it's the tuple of (RNG, seed). Why allow those handles to be controlled separately?

I think this is getting off topic, and I don't want to spend more time motivating the need for a PseudorandomNumberGenerator protocol, you might be right that there is no real need for it (any apparent need would go away by reformulating the problem in some (better?) way) and if so then so be it. :slight_smile:

But here is an important semantic difference between an RNG like the system generator and a deterministic seedable PRNG.
func test<R: RandomNumberGenerator>(using generator: inout R) {
    var a = generator
    var b = generator
    print(a.next() == b.next())
}
var srng = SystemRandomNumberGenerator()
var prng = WyRand()
test(using: &srng) // false (except possibly about once in 2^64)
test(using: &prng) // true (always)

This difference could be a semantic requirement of a PseudorandomNumberGenerator protocol.


As a followup question to the one of the OP, is there a reason why eg jrand48() is not available when drand48() is (via Darwin)?

The arc4random man page says, “The subsystem is re-seeded from the kernel random number subsystem on a regular basis, and also upon fork(2).” My understanding has always been that the kernel RNG uses a source of true entropy. Am I mistaken?

Well, my point was that some of these "low-quality" RNG APIs, especially the obscure ones, are probably not much of a security risk now that we have first-class a RNG in the standard library. But upon consideration, I know it's Apple's right to make its own business decisions about what to expose with the Darwin module, and opinions voiced here probably won't change those decisions.

In the meantime, I'm using this:

/// A linear congruential pseudo-random number generator.
struct LinearCongruential: RandomNumberGenerator {
  /// The last value returned by `self.next`, or what `self` was seeded with.
  private var lastValue: UInt64

  /// Creates an instance with the given seed.
  ///
  /// Instances created with the same seed produce the same sequence of
  /// pseuedo-random results from `next()`.
  init(seed: UInt64 = 0) {
    lastValue = seed
  }

  /// Returns a value from a uniform, independent distribution of binary data.
  mutating func next() -> UInt64 {
    // A "good value" chosen from https://arxiv.org/pdf/2001.05304.pdf (Steele,
    // Guy; Vigna, Sebastiano (15 January 2020). "Computationally easy,
    // spectrally good multipliers for congruential pseudorandom number
    // generators". arXiv:2001.05304 [cs.DS].)
    let a: UInt64 = 0xaf251af3b0f025b5
    // As long as “m”, the modulus value, is a power of 2, the generator is
    // insensitive to the value of c, provided they are relatively prime.
    let c: UInt64 = 1
    lastValue = lastValue &* a &+ c
    return lastValue
  }
}
1 Like

That's a good follow-up question. I'm not sure, maybe because its not as performant because drand48() avoids int to double conversion. Also, because of a design decision as it's not required for most applications (quoting @lukasa).

But I'd be interested to know the real reason as well.

In fairness, jrand48 is objectively not great, even separate from security concerns. 48 bits of internal state is insufficient for many purposes, and simply returning the high-order bits of state doesn't get as much mixing as you'd like to have (producing significant statistical issues), and you're not getting a meaningful speed advantage over other, better choices in exchange. The PCG and Xoroshiro families, in particular, are better choices along basically every dimension, and there are a number of other good choices as well.

I absolutely agree that we should provide some seeded PRNGs out of the box, but jrand48 and other "legacy" PRNGs shouldn't be included in that (both for the reasons I identified above, and because the C library interfaces have implicit state, so they are unpredictable under reentrancy).

In order to make seedable PRNGs truly useful, we also need to improve the APIs that transform the raw PRNG output into numbers (e.g. random(in: Range)) so that they can be updated to improve performance and fix bugs while allowing users to opt into getting exactly the same sequences for all time.

8 Likes

A seeded random number generator is one of my top wishes for the standard library. I'd love to see this topic move forward and become a proposal.

6 Likes

I'm not sure if any part of Swift uses this, but the RDRAND instruction on my Ryzen uses a series of ring oscillators to produce bits which are truly random (or as close as is physically possible).

The system is free to use RDRAND to implement the API that the Swift API uses. Some do.

In my case, anything random-ish and repeatable is truly useful, especially when compared to having no way to do it at all. True randomness is just not that important for regression testing most of the time. That's why I started by proposing to make something as “objectively not-great” as jrand48 available. It would at least give me a way of doing this without having to spend an hour or more in research to reimplement an approach that isn't just objectively awful.

The flaws that I'm talking about are not a departure from true randomness; I'm talking about clear failures on mainstream statistical tests (e.g. testU01 (pdf¹)), which have real implications for how meaningful coverage is when they're employed for test-case generation. Failing these statistical checks means that when you generate test cases using these generators, you are susceptible to getting less meaningful coverage that you would be getting if you used a higher-quality generator. That's not always the end of the world, but in certain catastrophic cases, it can lead to very meaningful breakdowns in test coverage (particularly when sampling a high-dimensional space, it can easily work out that your tests only sample a small number of hyperplanes, leaving most of the space uncovered).

It's important to be clear that the only reason anyone should use any of the rand48 generators today is if they have a hard compatibility guarantee with existing software that uses it. It is far from the Pareto frontier of performance vs. statistical quality. There's no reason to make it available when other seedable generators are available that are better in every way. We should provide interfaces to those in the standard library; in the very short-term, there are numerous public packages that wrap them.

I confess not understanding needing to reimplement it--if I actually needed jrand48 specifically, I would wrap it in a C function and import that.


¹ A particularly relevant snippet:

The first category contains linear congruential generators, which obey the recurrence x_i = (ax_{i−1} + c) mod m with output function u_i = x_i/m; they are denoted by LCG(m, a, c). LCGs with power-of-two modulus m = 2^e are known to be badly behaved, especially in their least significant bits [L’Ecuyer 1990]... LCG(248, 25214903917, 11) is the drand48 from the Unix standard library. The RNG in java.util.Random of the Java standard library is based on the same recurrence, but uses two successive values to build each output ... This is an improvement over drand48, but still not acceptable.

In my case I just need a blob of repeatable numbers that I don't have to invent myself; I'm not using randomness to try to create coverage. If they were all the same value or had a few other regular properties (e.g. a strided sequence) that are trivial to generate, I could imagine fooling myself about the results of my algorithm, but the algorithm isn't taking branches based on the values or deciding the size of the test set or anything. As long as they're not obviously regular, there's not much danger for my use case. I could generate them truly randomly once and read them from a file; that's just a lot of drudge-work.

There's no reason to make it available when other seedable generators are available that are better in every way.

That's the problem: such things aren't (portably) available in Swift for my purposes, AFAICT.

I confess not understanding needing to reimplement it--if I actually needed jrand48 specifically, I would wrap it in a C function and import that.

I think you're missing my meaning. I didn't mean “reimplement jrand48 specifically,” I meant “reimplement the facility it gives me of a repeatable source of irregular data.” And dropping into C to get it would be another productivity drain.

¹ A particularly relevant snippet:

The first category contains linear congruential generators, which obey the recurrence x_i = (ax_{i−1} + c) mod m with output function u_i = x_i/m ; they are denoted by LCG(m, a, c). LCGs with power-of-two modulus m = 2^e are known to be badly behaved, especially in their least significant bits [L’Ecuyer 1990]...

From what I can tell from Wikipedia and the Steele and Vigna paper (a successor to L'Ecuyer's), that statement is painting with rather a broad brush. It seems to me, from my only-several-hours of research, that well-behaved LCGs with power-of-two modulus do exist, which is why I eventually wrote this one based on numbers from Steele and Vigna. Am I mistaken?

I am far from being a numerics expert and I confess to not understanding the papers at a deep level, so I have stopped where I am, with little confidence, having previously wasted an hour or so trying to code against jrand48 before discovering it was unavailable. Personally, especially until/unless some expert validates my LCG, I'd have been better served if that first effort had paid off.

1 Like

The reason I mentioned upthread that a seedable random number generator is one of my top wishes for the standard library is pretty much this: I am not an expert in this area either.

Randomized testing (property-based, metamorphic, etc) is an extremely important test strategy and should be supported directly by Swift without requiring personal expertise in random number generators or a 3rd party library.

5 Likes

To add another voice to this chorus, I’m about as close to an expert as a non-expert can get.

I have a degree in math, I’ve been reading about RNGs (and studying their implementations) as a hobby for years. I’m currently in the process of rewriting the Swift standard library’s floating-point random number generator to do what it says on the tin (generate any value in the range with correct probability). Just today I wrote a proof-of-concept reimplementation of RandomNumberGenerator.next(upperBound:) -> T that’s 10-20% faster than the standard library’s version for the hardest cases, and equal speed for the general case.

And despite all of that, I do not know how to properly choose an RNG.

I have a general hazy feeling that Xoroshiro256++ (or **) is probably where one wants to be for most non-cryptographic applications, but I can’t back that up with any solid rationale.

Now, I don’t necessarily need a repeatable RNG to be in the standard library, but it would certainly be nice to have a generally-agreed-on-by-experts option available. Putting it in the Numerics package would also work just fine for me.

2 Likes

Certainly non-catastrophic ones exist. The entropy in the low-order bits is prone to be not very good, but that can be fine, depending on use. PCG, which is one of the families that I consider to be worth using, pairs a simple LCG with an "output function" to overcome this limitation (jrand48 effectively does, too, but with a much less good output function--dropping the low-order 16 bits--but it's still better than nothing). The nice thing about this approach is that the only dependency between iterations is the LCG, so on a modern wide core, you get equivalent performance to a simple LCG, but with better statistical properties.

To give you a little reassurance, based on what you've told me about your use case, what you're doing is fine for now.

@nevin: yeah, the recent Xoroshiro variants and PCG-family are the two most well-known "pretty good choices" (largely because of the ongoing spat between their creators--no such thing as bad publicity :joy:).

1 Like

And yet, where are they, for Swift? Please consider your distaste for jrand48 as a motivator. ;-)

1 Like

Yeah, just to be totally clear, I'm definitely not saying "there's nothing to be done here." There's a very real and very important feature that you're asking for.

FWIW, the PCG Paper looks fascinating. I wish I had about a week to read it!

2 Likes