SE-0202: Random Unification

If I understand it correctly, that code in fact chooses uniformly among some number (2^52 for Double, 2^23 for Float) of evenly-spaced values that span the range. Thus in general there will be many representable values in the range which are impossible to produce. For example, the range 0..<1 contains approximately 2^62 Double and 2^30 Float values.
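
To make that fixed-granularity reading concrete, here is a hypothetical sketch of the approach being described (not the actual standard library code, and written against a mutating next() as argued for further down): pick one of 2^52 evenly spaced offsets and scale it into the range, so any value lying between two adjacent steps can never be produced.

func fixedGranularityRandom<G: RandomNumberGenerator>(
    in range: Range<Double>,
    using generator: inout G
) -> Double {
    // Keep the top 52 bits of a 64-bit word: one of 2^52 equally likely steps.
    let step = generator.next() >> 12
    // Map the step into [0, 1) with spacing 2^-52, then scale into the range.
    // (Overflow for very wide ranges is ignored; this is only an illustration.)
    let unit = Double(step) * 0x1p-52
    return range.lowerBound + unit * (range.upperBound - range.lowerBound)
}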

The suggested alternative, namely to operate as though a real number were chosen uniformly from the range and then rounded to a representable value, is more subtle. It would make every representable value in the range possible to produce, with probability proportional to the gap between it and the next representable value.

I have made an attempt at implementing such a function, and it works almost-correctly. There are some thorny rounding issues that I haven't quite sorted out, though I'm sure @scanon knows how to do it right.

A sketch of one possible implementation:
extension BinaryFloatingPoint {
    static func random<T: RandomNumberGenerator>(in range: Range<Self>, using generator: T) -> Self {
        // Handle the easy cases
        precondition(!range.isEmpty)
        guard range.lowerBound.isFinite && range.upperBound.isFinite else {
            return .nan
        }
        if (range.lowerBound.nextUp == range.upperBound) {
            return range.lowerBound
        }
        
        // Subdivide the interval into 2^64 sections, choose one at random,
        // and repeat recursively until the subinterval has only one value.
        func randomHelper(_ subrange: Range<Self>) -> Self {
            let n: UInt64 = generator.next()
            
            // FIXME - This will underflow if Self.exponentBitCount is small
            let scale: Self = 0x1p-64
            
            // To avoid overflow here, we multiply before subtracting.
            // FIXME - Need to handle underflow
            let stepSize = (subrange.upperBound * scale) - (subrange.lowerBound * scale)
            
            // FIXME - Need this to be rounded down, not rounded to nearest
            let a = subrange.lowerBound + Self(n) * stepSize
            
            let b = a + stepSize
            if (a == b) || (a.nextUp == b) {
                return a
            }
            return randomHelper(a ..< b)
        }
        
        // Do the subdividing thing, and if any weird rounding issues make the
        // value fall outside the target range, then try again until it works.
        var result = randomHelper(range)
        while !range.contains(result) {
            result = randomHelper(range)
        }
        return result
    }
}

Note that the vast majority of the time the randomHelper() function returns a value on its first pass, with no recursion necessary. For Double it will need a second pass approximately 1 in 1024 times, and for Float only 1 in 2^40.

Below is a possible seeded generator that the library could provide. Whilst implementing this I noticed that next in RandomNumberGenerator should be mutating.

public protocol RandomNumberGenerator {
    // The core requirement for producing random numbers.
    // Must be implemented by every RNG.
    mutating func next() -> UInt64 // *Must* be mutating for a seeded generator.
}

/// Swift version of Xorshift+ from: https://en.wikipedia.org/wiki/Xorshift
struct Xorshift128Plus: RandomNumberGenerator {
    private var xS: UInt64
    private var yS: UInt64
    
    /// Two seeds, `x` and `y`, are required for the random number generator (default values are provided for both).
    init(xSeed: UInt64 = 0, ySeed: UInt64 = UInt64.max) {
        xS = xSeed == 0 && ySeed == 0 ? UInt64.max : xSeed // Seed cannot be all zeros.
        yS = ySeed
    }
    
    mutating func next() -> UInt64 {
        var x = xS
        let y = yS
        xS = y
        x ^= x << 23 // a
        yS = x ^ y ^ (x >> 17) ^ (y >> 26) // b, c
        return yS &+ y
    }
}

var rng = Xorshift128Plus()
print(rng.next()) // 18446743798831644671
print(rng.next()) // 18446743798823260096
print(rng.next()) // 16140918656667222015
print(rng.next()) // 13835128698895859711
print(rng.next()) // 16140954115928756175
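
Since the whole point of a seeded generator is reproducibility, a quick check (purely illustrative) that identical seeds yield identical streams:

var a = Xorshift128Plus(xSeed: 123, ySeed: 456)
var b = Xorshift128Plus(xSeed: 123, ySeed: 456)
assert(a.next() == b.next()) // same seeds, same stream
assert(a.next() == b.next())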

The generator is fast, passes BigCrush even when reversed, and takes up little memory.

2 Likes

It's probably just the bad example seed, but those are some very suspect-looking pseudorandom numbers.

You may be misunderstanding. What I said, and I think others did as well, is that PRNG selection really is a selection: there is a choice to be made. It's not insanely difficult, probably not difficult at all once you understand the basics, but there are tradeoffs involved.

But the main argument for me is that whatever got chosen could never be changed. This could leave us in a situation where next year a far better algorithm is found (better magic numbers, more likely, but bear with me), yet Swift would forever be left with a subpar old generator that eventually nobody should use anymore. I don't want to end up in a place where newcomers ask for a seeded PRNG and the answer is "yeah, there is this old thing in the stdlib, but don't ever use it, it's horribly outdated".

1 Like

If we can't even change an implementation detail like the choice of which RNG is used by default, I predict Swift will become really ugly quite soon:
After all, the ideal generator would be completely unpredictable, so anyone who actually relies on determinism should be using a custom generator in the first place.

Nonetheless, it might be convenient to be able to specify a custom generator to be used as the default (though besides testing, I see little reason to do so).
Imho the same is true for the cryptography context: when someone designs encryption without knowing about the pitfalls of randomness, the default PRNG won't make a difference to the quality of the results.

The ideal PRNG, in the context of the post you're replying to, is not completely unpredictable. It's deliberately entirely predictable given the seed. This would be in addition to the (unseedable) secure RNG that is the default outlined in the proposal.

3 Likes

@Alejandro, here's another alternative which hides the default generator.

public protocol RandomNumberGenerator : AnyObject {
  func next() -> UInt64
}

extension Bool {
  public static func random(
    using generator: RandomNumberGenerator? = nil
  ) -> Bool {
    var number: UInt64 = 0
    if let generator = generator {
      number = generator.next()
    } else {
      withUnsafeMutableBytes(of: &number) { bytes in
        _stdlib_random(bytes)
      }
    }
    return number % 2 == 0
  }
}
  • The size of the RandomNumberGenerator? parameter is two words.

  • There's no need for the other next() and next(upperBound:) methods.

  • Access to the default generator would only be via Bool, Double, Int, etc.
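
For illustration, a seeded generator conforming to this class-constrained protocol could look like the following sketch (the SplitMix64 step and the SeededGenerator name are arbitrary choices for this example, not part of any proposal):

/// Hypothetical conforming type: reference semantics, so next() needs no mutating.
final class SeededGenerator: RandomNumberGenerator {
  private var state: UInt64
  init(seed: UInt64) { state = seed }

  func next() -> UInt64 {
    // SplitMix64 step, used here purely as an example of a seeded stream.
    state &+= 0x9E3779B97F4A7C15
    var z = state
    z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
    z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
    return z ^ (z >> 31)
  }
}

let seeded = Bool.random(using: SeededGenerator(seed: 42)) // reproducible
let secure = Bool.random()                                 // default, secure path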

All of the real subtlety is in what you do at the endpoints. Under the interpretation where you uniformly choose a value with fixed granularity, this is straightforward (since it's exactly the same as the integer case).

Under the "sample a uniform real number on the interval, and round to the nearest representable value" interpretation, it's considerably more subtle. The first issue: there are a bunch of real numbers in [0,1) that round to 1.0, but most people will not expect (0..<1.0).random() to return 1.0. Truncating instead of rounding addresses this, but then you give double-weight to zero in any interval that spans it, so you actually probably want to round towards minus infinity.

Rounding this way also has the nice feature that (1..<2).random() produces all values in the binade with equal probability.
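
A small numeric illustration of that endpoint issue for Double (these are just the standard binary64 facts):

let belowOne = 1.0.nextDown   // largest Double less than 1.0
print(belowOne)               // 0.9999999999999999, i.e. 1 - 2^-53
print(1.0 - belowOne)         // 1.1102230246251565e-16 == 2^-53
// Every real number in (1 - 2^-54, 1) is closer to 1.0 than to belowOne,
// so round-to-nearest would let (0..<1.0).random() return 1.0.
// Rounding toward minus infinity maps that sliver back onto belowOne instead.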

3 Likes

Yes, that is what the 3rd FIXME comment in the code under the disclosure triangle in my previous post was getting at. I want it to round down, I just don't know how to actually *make it* round down. (I am aware of the existence of IEEE 754 floating-point rounding modes, but I don't know how to change them in Swift.)

The 2nd FIXME deals with situations like (-1.0)..<(0.5.ulp.nextDown) where, as written, positive numbers will never occur because the upper bound is too small (relative to lower bound) to affect stepSize. Also, really small bounds that underflow to 0 when multiplied by 2^-64, like Double.leastNormalMagnitude.

The 1st is just the implicit assumption that 2^-64 is representable, which it might not be for some super-small floating point type.

Why? By this logic, no software system could ever provide a PRNG, because they're all operating under these same constraints. And yet they do...

It seems to me the solution to this issue is entirely trivial. Part of the benefit of putting RNGs under a common API is that it makes them easily substitutable. Just as Random.default makes no specific claims about exactly what RNG you'll be connecting with, neither should PseudoRandom.default or PseudoRandom(seed:).

If the default PRNG were to change, clients could easily instantiate, e.g., MersenneTwisterPseudoRandom(seed:) to recover the prior behavior. Or if they're particularly forward-looking and anticipate a need for long-term reproducibility, they can just bypass the default and instantiate that specific PRNG in their original code.

2 Likes

(hope this isn't too off-topic but here I go)

The rationale behind min() and max() being functions is that they take linear time to compute (they have to scan the entire collection), while first and last take only constant time. random() does take constant time too, but it returns different values even when the collection remains constant. In conclusion, there are two different requirements at work here:

  • Computed properties have to return in constant time.
  • Computed properties can only change if the receiver changes.
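
Concretely (using the names Swift eventually shipped, randomElement() rather than random(), but the same reasoning applies):

let xs = [3, 1, 2]
_ = xs.min()            // function: O(n), must scan the whole collection
_ = xs.first            // property: O(1), and stable while xs is unchanged
_ = xs.randomElement()  // function: O(1), but may differ between calls even though xs doesn't change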
5 Likes

They do, and most use, like you said, Mersenne Twister. Now have a look at this CS Stackexchange question and answers about MT. Mersenne Twister is not a great PRNG by today's standards. It performs worse in BigCrush than Xorshift variants (which have already been mentioned here), while being slower, completely predictable after only a few outputs, and using a huge state array of 2.5 KiB.

Yet most languages ship with this as their PRNG. Why? Because they can't change to a better one without breaking compatibility. MT being the PRNG in all those languages you mentioned is a perfect example of the pitfall I want Swift to avoid.

This would defeat the purpose of using a seeded PRNG where you absolutely need the promise that output for one seed will stay the same.

Changing the PRNG would thus be a change that is source-breaking (at run-time no less), so that's unlikely to happen.

1 Like

I cannot see why the library can't provide one seeded random number generator (RNG) at first and then provide others at some point in the future. I don't think you can retire an old RNG, but you can add new ones, and adding new ones is not a breaking change and therefore no problem. I would suggest using the name of the algorithm rather than 'default', because you generally want to know which generator you are using.

3 Likes

This sounds like a perfect role for one of the "non-standard libraries" that people keep talking about. A well-curated statistics module would be a welcome addition to the ecosystem. There is no need to burden the standard library with multiple random number generators, at least not right away.

4 Likes

One of the reasons Random exists is to provide developers with a default generator for this sort of API. Take, for example, Foundation's Data:

extension Data {
  public init<T: RandomNumberGenerator>(
    randomBytes amount: Int,
    using generator: T
  ) {
    // ...
  }

  public init(randomBytes amount: Int) {
    self.init(randomBytes: amount, using: Random.default)
  }
}

Looking at this from a user's perspective, they have no way of using the first initializer without implementing their own RNG (or importing one). Random solves this problem.

1 Like

The extension would only need one initializer:

public init(randomByteCount: Int, using generator: RandomNumberGenerator? = nil)

But either way, how would such an initializer be implemented? Should a next(_ bytes: UnsafeMutableRawBufferPointer) method be added to the protocol? This could be in addition to, or instead of, the generic next<T: FixedWidthInteger & UnsignedInteger>() -> T method. It could even replace the next() -> UInt64 method, but in that case other generators might need to use an internal buffer.
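
One possible shape for that buffer-filling requirement, sketched as a protocol extension with a default implementation on top of next() -> UInt64 (this assumes the proposal's value-type-friendly protocol with a mutating next(); the signature is just this post's suggestion, not anything that was proposed):

extension RandomNumberGenerator {
  // Hypothetical requirement: fill a raw buffer by slicing 64-bit words
  // from next() into it, 8 bytes at a time.
  mutating func next(_ bytes: UnsafeMutableRawBufferPointer) {
    var offset = 0
    while offset < bytes.count {
      let word = next()
      withUnsafeBytes(of: word) { source in
        let count = Swift.min(source.count, bytes.count - offset)
        for i in 0..<count {
          bytes[offset + i] = source[i]
        }
        offset += count
      }
    }
  }
}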

Another suggestion is to rename Random to DefaultRandomNumberGenerator. Its property could be renamed from default to shared (if using a final class instead of a struct).

1 Like

There are of course plenty of possible implementations of such an initializer under the current design.
Example:

extension Data {
  public init(randomBytes amount: Int, using generator: RandomNumberGenerator) {
    self.init(count: amount)
    withUnsafeMutableBytes { (ptr: UnsafeMutablePointer<UInt8>) -> () in
      for i in 0 ..< amount {
        // next() returns a UInt64; keep only its low byte for this element.
        ptr.advanced(by: i).pointee = UInt8(truncatingIfNeeded: generator.next())
      }
    }
  }
}

There might be better implementations, but for the sake of an example it works well.

Sorry for not responding sooner and for not being completely clear in the proposal! Yes, these are the semantics that I'm trying to achieve here (uniform over the real numbers in the range).

2 Likes

It works, but when using the default generator, _stdlib_random would be called once per byte instead of only once.
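
For comparison, a default-generator path could fill the whole buffer with a single call, assuming the internal _stdlib_random(_: UnsafeMutableRawBufferPointer) entry point used earlier in this thread (illustrative only; that symbol is not public API):

extension Data {
  public init(randomBytes amount: Int) {
    self.init(count: amount)
    withUnsafeMutableBytes { (ptr: UnsafeMutablePointer<UInt8>) -> () in
      // One call fills every byte, instead of one call per byte.
      _stdlib_random(UnsafeMutableRawBufferPointer(start: ptr, count: amount))
    }
  }
}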