SE-0202: Random Unification

(hope this isn’t too off-topic but here I go)

The rationale behind min() and max() being functions is that they take linear time to compute (they have to scan the entire collection), while first and last take only constant time. random() also takes constant time, but it returns different values even when the collection itself doesn't change. So there are two different requirements at work here:

  • Computed properties have to return in constant time.
  • Computed properties can only change if the receiver changes.
5 Likes

They do, and most use, like you said, Mersenne Twister. Now have a look at this CS Stack Exchange question and its answers about MT. Mersenne Twister is not a great PRNG by today's standards. It performs worse in BigCrush than Xorshift variants (which have already been mentioned here), while being slower, completely predictable after observing a relatively small number of outputs, and using a huge 2.5 KiB state array.

Yet most languages ship with this as their PRNG. Why? Because they can’t change to a better one without breaking compatibility. MT being the PRNG in all those languages you mentioned is a perfect example of the pitfall I want Swift to avoid.

This would defeat the purpose of using a seeded PRNG, where you absolutely need the guarantee that the output for a given seed stays the same.

Changing the PRNG would thus be a change that is source-breaking (at run-time no less), so that’s unlikely to happen.

1 Like

I cannot see why the library can't provide one seeded random number generator (rng) at first and then at some point in the future provide others. I don't think you can retire an old rng, but you can add new ones; adding new ones is not a breaking change and therefore not a problem. I would suggest using the name of the algorithm rather than 'default', because you generally want to know which generator you are using.

3 Likes

This sounds like a perfect role for one of the “non-standard libraries” that people keep talking about. A well-curated statistics module would be a welcome addition to the ecosystem. There is no need to burden the standard library with multiple random number generators, at least not right away.

4 Likes

One of the reasons Random exists is to provide developers with a default generator for this sort of API. Take, for example, Foundation's Data:

extension Data {
  public init<T: RandomNumberGenerator>(
    randomBytes amount: Int,
    using generator: T
  ) {
    // ...
  }

  public init(randomBytes amount: Int) {
    self.init(randomBytes: amount, using: Random.default)
  }
}

Looking at this from a user's perspective, they have no way of using the first initializer without implementing their own rng (or importing one). Random solves this problem.
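
To make that concrete, call sites would look roughly like this (myCustomGenerator here is just a stand-in for whatever generator a user would otherwise have to supply):

// With the convenience overload that defaults to Random.default,
// no generator is needed at the call site.
let token = Data(randomBytes: 32)

// Without it, every caller would have to choose, import, or implement
// a generator themselves just to get random bytes.
let token2 = Data(randomBytes: 32, using: myCustomGenerator)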

1 Like

The extension would only need one initializer:

public init(randomByteCount: Int, using generator: RandomNumberGenerator? = nil)

But either way, how would such an initializer be implemented? Should a next(_ bytes: UnsafeMutableRawBufferPointer) method be added to the protocol? This could be in addition to, or instead of, the generic next<T: FixedWidthInteger & UnsignedInteger>() -> T method. It could even replace the next() -> UInt64 method, but in that case other generators might need to use an internal buffer.

Another suggestion is to rename Random to DefaultRandomNumberGenerator. Its property could be renamed from default to shared (if using a final class instead of a struct).

1 Like

There are of course plenty of possible implementations of such an initializer using the current design.
Example:

extension Data {
  public init(randomBytes amount: Int, using generator: RandomNumberGenerator) {
    self.init(count: amount)
    withUnsafeMutableBytes { (ptr: UnsafeMutablePointer<UInt8>) -> () in
      for i in 0 ..< amount {
        // The generic next() infers UInt8 from the pointee type here.
        ptr.advanced(by: i).pointee = generator.next()
      }
    }
  }
}

There might be better implementations, but for the sake of an example it works well.

Sorry for not responding sooner and not being completely clear in the proposal! Yes, these are the semantics that I'm trying to achieve here (namely, a uniform distribution over the reals).

2 Likes

It works, but if using the default generator, _stdlib_random would be called repeatedly, instead of only once.

Looking over the proposal again, I have to wonder: should we include more protocol requirements (with default implementations) in RandomNumberGenerator?

For example, in addition to the “next() -> UInt64” function, we could make the generic FixedWidthInteger version a protocol requirement, so developers of custom generators can use optimized implementations.

Perhaps more importantly, we could include a requirement akin to arc4random_buf(), which takes an UnsafeMutableRawBufferPointer and fills it with random bits. Depending on the algorithm, this could potentially bring a significant speedup.
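
As a rough sketch of what I mean (the name and shape are only illustrative, and this assumes the proposal's current non-mutating next()):

extension RandomNumberGenerator {
  /// Hypothetical buffer-filling method. It would also be declared as a
  /// protocol requirement, so generators backed by something like
  /// arc4random_buf could replace this naive default with a single call.
  public func fill(_ buffer: UnsafeMutableRawBufferPointer) {
    var offset = 0
    while offset < buffer.count {
      var word = next() // next() -> UInt64
      let count = min(8, buffer.count - offset)
      withUnsafeBytes(of: &word) { source in
        for i in 0 ..< count {
          buffer[offset + i] = source[i]
        }
      }
      offset += count
    }
  }
}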

8 Likes

So yeah, this is something that has been brought up a few times, and I hope to answer it as best as I can.

Regarding the generic next<T: FixedWidthInteger & UnsignedInteger>() -> T version of next(): this isn't a requirement because generators are good at one thing: producing a number. Requiring generators to do a job they aren't designed for doesn't seem correct. This is reflected in the single requirement, next(), and the default implementation of the generic next().

I agree that developers should have control over their custom generators, and they do! Nothing in the current design prevents developers from implementing their own version of the generic next(). I actually encourage developers to implement their own version of this function so that they can optimize it for their custom needs.

I touched on this a little in the discussion, but I want to add that another goal of this proposal is to move away from these lower-level facilities, such as arc4random_buf.

These types of APIs don't fit well with the "general use" design this proposal is built around. While I understand there are many use cases for such a facility, it only solves so many problems. The current design is geared to satisfy many, if not all, of the common pain points. I understand it doesn't solve everybody's problems, and for some use cases it is deliberately not sufficient. Again, I encourage those developers to design solutions that solve every problem they have. More power to you!

It sounds like you are saying two different things at once, so perhaps one or the other of us has a misunderstanding. I am going to write a few paragraphs about how protocol requirements and default implementations work in Swift, and I hope this doesn’t come across as patronizing.

Protocol requirements—the methods declared within the protocol itself—are dynamically dispatched. When a variable (or function parameter) is constrained to conform to the protocol, protocol requirements can be called on that variable, and those calls will use the underlying concrete type's own implementation of those methods.

Protocol requirements with default implementations—meaning the method is declared within the protocol, and a protocol extension provides an implementation—are also dynamically dispatched (because they are protocol requirements). However, conforming types are not required to provide an implementation (because there is a default implementation).

Protocol extension methods—methods which are *not* declared within the protocol, but for which a protocol extension provides an implementation—are statically dispatched. When a variable (or parameter) is constrained to conform to the protocol, protocol extension methods can be called on that variable. However, those calls will always use the implementation from the protocol extension. Even if the concrete type has its own version, it will not be called.
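
Here is a tiny, self-contained illustration of that difference (the names are made up):

protocol Example {
  func requirement() -> String              // a customization point
}

extension Example {
  func requirement() -> String { return "default requirement" }
  func extensionOnly() -> String { return "extension only" }  // not a requirement
}

struct Custom: Example {
  func requirement() -> String { return "custom requirement" }
  func extensionOnly() -> String { return "custom extension only" }
}

func callViaProtocol<T: Example>(_ value: T) {
  print(value.requirement())    // prints "custom requirement" (dynamic dispatch)
  print(value.extensionOnly())  // prints "extension only" (static dispatch)
}

callViaProtocol(Custom())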

• • •

In other words, protocol requirements define *customization points*: If you want conforming types to have the *ability* to provide their own version of a function, then that function must be a protocol requirement. If you do not want them to have that ability, then it must not be a requirement.

Moreover, if you want conforming types to be *able* to provide their own version, but not *required* to do so, then it must be a protocol requirement *and* there must be a default implementation.

Does that make sense?

• • •

In particular, if we make the generic version of next() a protocol requirement, and also give it a default implementation, then conforming types will *not* be required to implement it (because there is a default implementation) but they will be *able* to do so if they desire.

Currently, with generic next() as an extension method—not a protocol requirement—conforming types are *unable* to provide a custom implementation. Or rather, if they do provide their own implementation, it will not be called from a generic context.

Similarly, if we make a “fill the buffer with random bits” method, both putting it in the protocol itself and providing an implementation in an extension, then conforming types will get that behavior for free (because it is in an extension) while still having the ability to customize it if they need to.

• • •

In summary:

  • Protocol requirements are customization points.
• Extension methods are not.
• Protocol requirements with default implementations are customization points which conforming types don’t have to implement if they don’t want to.

I think that last one is what we want for generic next() and for “fill a buffer with random bytes”. Conforming types don’t have to implement them (because there is a default implementation), but if they do implement them then their custom version will be called (unlike a pure extension method).

8 Likes

Argh, my apologies. I agree these need to be requirements with default implementations to accompany them. This raises the question: should next(upperBound:) also be a requirement? (I feel as if it should now.) I'm not sure if I can update the proposal at this point (can I?), but I will definitely update it after a decision to accept or reject it has been made.

Glad we’re on the same page now! :-)

For next(upperBound:), do we expect some generators to have an optimized custom version? If so, then yes.

I don’t know the procedural specifics. Perhaps update the proposal and ping the review manager @Ben_Cohen?

You can absolutely update your proposal, you just need to keep @Ben_Cohen in the loop about what you’re changing and why.

I have a couple of questions: doesn’t the lack of mutating in this design preclude most structs from implementing RandomNumberGenerator? If so, why isn’t the protocol constrained to AnyObject?

Also, what was your reasoning for not making next mutating? I’m not saying it should be; I just think that the reason it isn’t should be documented.

2 Likes

That’s a great point, @hlovatt and @Nobody1707. For small generators like Xoroshiro128+, a struct is a natural fit. There’s no reason to require a class (and thus a heap allocation) for a generator whose state is just two UInt64 values.

So I concur: RandomNumberGenerator methods like next() should be mutating.
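
For example, a struct-based Xoroshiro128+ generator could look roughly like this (just a sketch, and the conformance assumes next() is changed to a mutating requirement):

struct Xoroshiro128PlusStruct: RandomNumberGenerator {
    // The whole state is just 16 bytes; no heap allocation needed.
    var state: (UInt64, UInt64)

    init(state: (UInt64, UInt64)) {
        precondition(state != (0, 0), "The state must not be all zeros.")
        self.state = state
    }

    // Assumes the protocol requirement is changed to `mutating func next()`.
    mutating func next() -> UInt64 {
        let result = state.0 &+ state.1
        let t = state.1 ^ state.0
        state.0 = ((state.0 << 55) | (state.0 >> 9)) ^ t ^ (t << 14)
        state.1 = (t << 36) | (t >> 28)
        return result
    }
}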

5 Likes

There is a reason to let the generator have an identity; it does have state, after all. When you pass a generator as an argument through various functions, you expect its state to have changed after you “get it back” and continue to use it for the next thing, right? So either all those random(… using: …) calls have to take the generator as inout, or you use a final class.

Details

I’ve experimented with different ways of structuring a Random API in Swift (with a focus on high-performance, seedable PRNGs for graphics/audio/general data processing) over the last couple of years, and one of the first things I settled on was to use final class for the generators. I’ve tested the performance of various implementations of e.g. SplitMix64 and Xoroshiro128+ generators extensively, both in micro benchmarks and in several real use cases, and using a final class rather than a struct doesn’t incur any overhead at all (the optimizer does a good job in this case). Using a struct, on the other hand, made the generators impossible or impractical to use in a lot of common situations.

Here’s an example generator from the current version of my little lib:

/// The xoroshiro128+ generator, translated from:
/// http://xoroshiro.di.unimi.it/xoroshiro128plus.c
/// This generator is higher quality and "generally faster" than SplitMix64.
final class Xoroshiro128Plus : PseudoRandomGenerator {
    var state: (UInt64, UInt64)
    /// The state of Xoroshiro128Plus must not be everywhere zero.
    init?(state: (UInt64, UInt64)) {
        guard state.0 != 0 || state.1 != 0 else { return nil }
        self.state = state
    }
    init(seed: UInt64) {
        // Uses SplitMix64 to scramble the given seed into a valid state:
        let sm = SplitMix64(seed: seed)
        state = (sm.next(), sm.next())
    }
    func next() -> UInt64 {
        func rol55(_ x: UInt64) -> UInt64 {
            return (x << UInt64(55)) | (x >> UInt64(9))
        }
        func rol36(_ x: UInt64) -> UInt64 {
            return (x << UInt64(36)) | (x >> UInt64(28))
        }
        let result = state.0 &+ state.1
        let t = state.1 ^ state.0
        state = (rol55(state.0) ^ t ^ (t << UInt64(14)), rol36(t))
        return result
    }
}

This is really fast. Here is an example of using it to produce random simd float2 values in the range [-1, 1) (for x and y). It uses a method similar to the UInt64-to-unit-range-Double conversion I showed in my first post in this thread, but here the 64 bits are split into two 32-bit values that are converted to x and y in [-1, 1):

import QuartzCore // for CACurrentMediaTime
import simd

func test() {
    // Generating and adding a billion random float2 values takes 1.7 seconds.
    // (MacBook Pro (Retina, 15-inch, Late 2013), 2 GHz Intel Core i7)
    let rg = Xoroshiro128Plus()
    let sampleCount = 5
    let iterationCount = 1_000_000_000
    for _ in 0 ..< sampleCount {
        var sum = float2(0, 0)
        let t0 = CACurrentMediaTime()
        for _ in 0 ..< iterationCount {
            let v = rg.next().float2InMirroredUnitRange // xy in [-1, 1)
            sum = sum + v
        }
        let t1 = CACurrentMediaTime()
        print("time:", t1 - t0, "seconds, sum:", sum)
    }
}
test()

Here’s the output:

time: 1.77398550696671 seconds, sum: float2(8866.03, -3739.31)
time: 1.7318291279953 seconds, sum: float2(-2833.5, 9740.43)
time: 1.71480693900958 seconds, sum: float2(-9193.63, 31648.9)
time: 1.80105038301554 seconds, sum: float2(35128.7, 45299.6)
time: 1.92562632507179 seconds, sum: float2(7253.67, 28513.9)
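
For reference, a conversion like float2InMirroredUnitRange could be written roughly like this. This is only an illustrative sketch of the bit trick described above, not the exact extension from the earlier post:

import simd

extension UInt64 {
    /// Illustrative sketch only: maps 64 random bits to a float2 with x and y
    /// uniformly distributed in [-1, 1). Each 32-bit half keeps its top 24 bits,
    /// which are scaled to [0, 2) and then shifted to [-1, 1).
    var float2InMirroredUnitRangeSketch: float2 {
        let hi = UInt32(truncatingIfNeeded: self >> 32)
        let lo = UInt32(truncatingIfNeeded: self)
        let scale: Float = 1.0 / 8_388_608.0 // 2^-23
        return float2(Float(hi >> 8) * scale - 1.0,
                      Float(lo >> 8) * scale - 1.0)
    }
}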
3 Likes

So, would you agree that RandomNumberGenerator should be constrained to AnyObject?

Probably (though why : AnyObject rather than : class?). Here’s what I have for my little lib:

protocol RandomGenerator : class {
    /// Returns the next random bit pattern and advances the state of the
    /// random generator.
    func next() -> UInt64
}

/// A pseudo random UInt64 bit pattern generator type.
///
/// The generated UInt64 values can be converted to other types by using eg
/// extensions on UInt64 for converting it to Double or float2 in unit range.
///
/// A random generator only has to implement two initializers and the
/// next() -> UInt64 method.
protocol PseudoRandomGenerator : RandomGenerator {
    associatedtype State
    
    /// The current state of the random generator.
    var state: State { get }
    
    /// Creates a new random generator with the given state. The initializer
    /// fails if the given state is invalid according to the random generator.
    init?(state: State)
    
    /// Creates a new random generator with a state that is determined by
    /// `seed`. Each `seed` must result in a unique valid state.
    init(seed: UInt64)
}

And they get a default initializer (with a truly random seed) like this:

import Security

extension PseudoRandomGenerator {
    /// Creates a new pseudo random generator seeded with a cryptographically
    /// secure random seed.
    init() {
        var seed: UInt64 = 0
        withUnsafeMutableBytes(of: &seed) { (ptr) -> Void in
            let sc = SecRandomCopyBytes(nil, ptr.count, ptr.baseAddress!)
            precondition(sc == errSecSuccess)
        }
        self.init(seed: seed)
    }
}
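
Usage then looks something like this (a small sketch using the Xoroshiro128Plus generator shown earlier in the thread):

// Reproducible sequence from an explicit seed:
let seeded = Xoroshiro128Plus(seed: 12345)
print(seeded.next(), seeded.next()) // the same two values on every run

// Fresh generator seeded from SecRandomCopyBytes via the default init:
let fresh = Xoroshiro128Plus()
print(fresh.next()) // different on every run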
2 Likes