SE-0202: Random Unification

That sort of implementation detail is not actually relevant to Swift Evolution. The proposal does not specify the algorithm, so it can always be updated in the future if improvements are found.

…that said, I just looked at the implementation for next(upperBound:) and I’m pretty sure it does not work as intended.

The current code
public func next<T: FixedWidthInteger & UnsignedInteger>(upperBound: T) -> T {
  let range = T.max % upperBound
  var random: T = 0
  
  repeat {
    random = self.next()
  } while random < range
  
  return random % upperBound
}

The line “let range = T.max % upperBound” really ought to be something like:

let temp = T.max % upperBound + 1
let range = (temp == upperBound) ? 0 : temp

I was only asking because I always use multipliedFullWidth in mine, and I was worried I was missing a drawback. It was a bit off-topic though, so sorry about that.

I like it overall.

Here are a few things I would like to see added:

  • An additional protocol for seedable/repeatable PRNGs that adds the ability to seed, store and restore a generator's state. That way we can build on them even if we don't know which one is being used.
  • More definition on the expected memory model. e.g. Are the using: parameters inout?
  • The variants of next defined in the protocol itself so they can be overridden by a particular implementation (From the comments, I believe this was the intent)
  • collection.randomElement() would be slightly clearer than collection.random()
  • Bool.random(oneIn: 1_000_000) //Returns true one out of the given UInt times.

Yes, definitely

Yes, the additions are minimal, but provide a good base to build on.

I would really like to see a protocol for seeded/repeatable generators as well, since those are the ones I will use most often, and it would be nice to interoperate with other's code in this area rather than reinventing the wheel for each PRNG...

I have built several random facilities in Swift myself, and this doesn't have as much power, but I am hoping it gives enough standardization that I can utilize the work of others and vice versa.

I like how GameKit allows interchangeable pieces of randomness, and I would like to see that in Swift in general.

I read and participated in the full discussion, and have built lots of randomness code for Swift (mainly for graphical effects which require a repeatable source).

4 Likes

I'm hugely +1 for this proposal. We definitely need to do something like it, and this design seems quite reasonable (though some quibbling over details below).

I'm glad that this isn't trying to solve all the world's RNG API needs, e.g. distributions. Let's get the basics right, then consider building on top of it.

I read the proposal in depth, but did not follow the extensive discussion threads, nor have I read the extensive posts on the review thread.

Questions/suggestions:

// The following are examples on how to get full width integers
let randomInt = Int.random(in: .min ... .max)

shouldn't we provide Int.random() that does that? Later: ah, I see it in alternatives considered. I don't find the argument there strongly motivated. Yes, people could use the wrong thing if they are familiar with C APIs, but the range based API you provide is a lot more convenient than what C provides, and will lead people towards doing the right thing. I don't think the goal of avoiding misuse is significant enough to prevent providing the obvious API. If we don't include random(), it will be a FAQ somewhere because people will keep asking for it and wondering why it isn't included.

The collection API addition seems like it should be named randomElement(...), not random(...) because the other random(...) methods return an instance of Self - in this case, a user could reasonably expect it to return a random collection. I see this in alternatives considered and don't agree with the rationale. This seems like a very confusing name to me, this is unlike "first" which can only plausibly mean one thing.

Other random comments (bad pun :roll_eyes: ):

  • I agree that random functionality shouldn't expose fallibility to the user just to support the weird case where /dev/urandom can't be accessed.

  • "Why not make the default rng non-secure?" - I agree with the rationale, but shouldn't we provide a "Random.insecure" implementation that provides a faster implementation that isn't worried about crypto security? I agree with safe default, but that doesn't mean we should underserve the performance sensitive case.

Overall, thank you Alejandro so much for pushing this forward. I know that this has been an epicly long quest, and I'm super thankful that you're championing the effort!!

-Chris

6 Likes

That’s great feedback - can you please make a comment to that effect on the implementation?

I'm concerned that this is the only reply in the thread to Jens' post. The substance of his posting indicates that the proposal doesn't address important parts of the problem or ensure that they can be addressed in future. I haven't yet evaluated the proposal to see if I agree with Jens' assessment, but if he's wrong I'd like to see a vigorous rebuttal here.

1 Like

Some playground experiments suggest Jens is quite correct:

/// From the proposal implementation
func algo0(_ bitPattern: UInt64) -> Double {
    return Double(
        sign: .plus,
        exponentBitPattern: (1 as Double).exponentBitPattern,
        significandBitPattern: bitPattern
    ) - 1
}

/// Jens’s proposed alternative
func algo1(_ bitPattern: UInt64) -> Double {
    return Double(bitPattern) * Double.ulpOfOne / 2
}

// algo0 generates the number in the interval 1..<2, which loses
// one bit of significand precision. When we subtract 1 to move it
// to the interval 0..<1, we can see the significand stepping by 2
// for values slightly below 1.0 (which will have the lowest
// precision of any in the output range):

algo0((1 << 52) - 1).significandBitPattern  // 4503599627370494
algo0((1 << 52) - 2).significandBitPattern  // 4503599627370492

// algo1 claims to cover twice as many possible values, and indeed,
// we can see the resulting bit pattern stepping by 1 just below 1.0:

algo1((1 << 53) - 1).significandBitPattern  // 4503599627370495
algo1((1 << 53) - 2).significandBitPattern  // 4503599627370494

// We can verify that algo0 is doubling the precision of algo1 by
// checking that one step below 1.0 for algo0 is two steps for algo1:

algo0((1 << 52) - 1) == algo1((1 << 53) - 2)  // true

// ...and for conclusive evidence that algo1 is as dense as any
// algorithm that generates evenly spaced numbers can possiby be:

algo1((1 << 53) - 1) == 1.0.nextDown  // true

However, this an implementation detail; I don’t see that it fundamentally undermines the proposal’s design in any way.

4 Likes

I believe I have an updated implementation utilizing your method. I will hopefully push the changes tomorrow. If you have any more concerns, please post a comment on the implementation. Feedback greatly appreciated!

1 Like

I wasn't writing explicitly in reply to Jens, but the playground I posted is meant to demonstrate how different kinds of generators and distributions can be built on top of the functionality in this proposal. In particular, it includes several distributions modeled as infinite sequences, including a Gaussian/normal distribution, a couple different uniform distributions, and a discrete distribution that (I think) really shows off the composability of the proposed APIs tied to collection types.

While this proposal doesn't attempt to go as far with its random capabilities as the C++ STL, in what it does tackle it is far easier to use and it lays the foundation for different kinds of distributions in third-party libraries or for standard library adoption with future proposals.

1 Like

Quick review manager note: I just want to reinforce this point. If there is a concern about the candidate implementation, but not one that alters anything about the proposed API, we should discuss that on a separate thread on the standard library developer forum rather than in the review of the API.

Of course, if you think the implementation issue is caused by the API itself, that's appropriate for the review discussion.

2 Likes

As we're approaching the end of the review period, I wanted to try and focus the discussion in the last few days.

There's been a lot of great posts about whether the standard library should include other features, such as different random number distributions, or other random sources such as a seeded generator. These features may well be suitable as additions to the standard library, and can be considered as future proposals building on this one if it is accepted, but shouldn't be considered requirements for accepting/rejecting this proposal itself.

What is important is that there is confidence that the API as proposed can accommodate them. Any feedback on ways in which the APIs could prove problematic for integrating different kinds of distributions or generators would be very much appreciated. In assessing this, I'd urge everyone to take a look at Nate's playground demonstrating some of the ways this can be done with this API. In hindsight, it seems like the proposal could have included some of this material, and maybe we can retrospectively integrate some of it, but I don't think we need to make accepting the proposal contingent on that.

There are also some other topics that have been brought up, and for which additional justifications for or against what's proposed would be beneficial in finalizing the review:

  • the naming of Collection.random() versus randomElement() (see the rationale here)
  • whether that method should return an optional vs trap on empty, and relatedly whether, if it trapped, the static methods like Int.random(in:) would still be necessary.

(note, this is a call for rationales for a particular choice, rather than just a vote for a particular style)

Thanks to everyone for participating so far,
Ben

6 Likes

I'm working on an example implementation with a free Random(in:using:) function and an AnyObject constraint on RandomNumber generator, just for comparison purposes.

A nice side effect of the AnyObject constraint is that none of the functions need to be generic over the generator, and I can include a default value in all of them (which I'm assuming didn't work otherwise, since the playground implemented the default value with a manual override).

Hopefully I'm able to provide my justification as best as possible here for the points you mention.

I'm certainly open to the idea (not attached to either spelling). We must also understand that this rename also affects the range based API. (0 ..< 10).randomElement(). Spellings like this, in my opinion, gives the static T.random(in:) even more justification as randomElement() is semantically tied to collections. While Ranges can conditionally conform to become collections, I fear this spelling could hurt the meaning of what it means to get a random number from a range. I agree, this spelling makes a lot of sense for collections, but probably only collections.

I agree with where this line of thinking is coming from, but to me, it seems reasonable that a user should expect that there would a difference in the API when one is a static method vs an instance method.

Again, I don't have feelings towards one spelling or the other. I'm simply providing my rationale for naming the method the name it has now.

In regards to the second point, I think many others have already given great rationale for why this should return an optional. Requiring that this method trap would, in my opinion, cause more harm than good. Literal arrays are the exception in that you know for sure that the array is not empty, but it's very easy to not know what the arrays contents are at runtime. This results in this sort of boilerplate code where you're always checking if the array is empty before calling the random method. I feel Donald summarizes this perfectly.

Not only that rationale, but because there's already a precedent set by other facilities such as min() and max(). I keep bringing these back up because I believe a user could ask such questions like, "Why does min() and max() return optionals, but random() doesn't?" I'm a huge fan of consistency, and aligning random() with these other APIs seems to fit rather nicely.

In the case we do make this trap, is the static methods like Int.random(in:) still necessary? Yes.

If the naming of this method was randomElement(), one could make the argument that floating point ranges need to be consistent (huge fan of btw) with ranges that conditionally conform to collection. To me, it doesn't make sense to get a random element from a range that isn't a collection.
(1.0 ..< 10.0).randomElement() doesn't make sense because there is no element to select from. This justifies the static T.random(in:) methods which doesn't cause this confusion.

Now, if the naming of this method were named random() however, in the alternatives considered I wrote, Having a range as the primary spelling makes it fairly simple to get a random number from. Ranges are also very desirable because it doesn't encourage modulo bias. I still stand by this, but Nate brings up a great point that this design simply doesn't hold for other types that want to incorporate random functionality. Data.random(byteCount:using:) simply cannot be expressed so simply like a range. It's my mistake to not have mentioned this somewhere in the proposal, but I'm glad we're getting a chance to discuss it now.

I apologize if the proposal doesn't go so in depth in terms of my justification, but like I said, really glad that we get to discuss it now.

1 Like

I agree that the protocol should be constrained to class types, but having the generator not be generic would prevent the optimizer from specializing and inlining it, and I'd say efficiency is very important here.


And more generally, I still think that there hasn’t been nearly enough (hardly any?) actual practical usage/testing/experimentation using the proposed API in order to know how, or even whether or not, it can be successfully extended with custom (seedable) PRNGs, and eg Gaussian distributions etc, and IMHO it sounds like a lot of people here are OK with just assuming that this API is a good starting point and that everything will just work out even if we haven't actually tried to do anything real with it (ie try to use it for some little game with procedurally generated graphics and levels, or a little raytracer (which could use 2d gaussian values for multii-sampling) or something).

I haven’t looked into the playground of @nnnnnnnn which seems to be the only effort that has been made in this direction. But for example, it's impossible to measure efficiency in a playground settings.

My guess/feeling is that I will have to continue using my own little Random library (that I have been using, testing and modifying for a couple of years) even If this proposal is accepted. My requirements are PRNG, and the generators and the stuff based on them (conversions into various floating point ranges, other distributions, etc) have to be as fast as is possible, and I've sometimes needed different generators for different scenarious, sometimes trading speed for quality, perhaps a generator with a state that is of suitable size for a particular task, etc. I have never needed a non-seedable cryptographically secure RNG, but I understand that there might be others that have, and I understand the rationale of having it be the default. My use cases have been graphics/audio/signal processing (which would include games with procedurally generated graphics, levels and stuff like that).

I don’t have the time to dig deeper in this (trying out and/or providing example code etc) but maybe these negative ramblings of mine can provide some sort of non-negative value anyway.

1 Like

I think the RandomNumberGenerator's next methods should be mutating to permit structs to be able to adopt it sanely.

Please see this being discussed in these earlier posts.

Again, designing this API while at the same time actually using it in the context of some real example use cases would IMHO show that there is no need for speculation, and the only sensible choice for PRNG types is final classes (no performance trade off compared to structs if they are final and inlinable, and no point in trying to pretend that they don't have identity/state).

There are two points here that you’re asking for

  1. Non-uniform distributions: like your example of a 2D Gaussian. If you have a 1D Uniform distribution, you can already get any other distribution. There’s a map between any two 1D distributions, and multidimensionality is easy to achieve as well. Therefore, if there’s already a Uniform Dist. in the proposal, I think this is already solved.

  2. N-D Noise Functions: what you want to procedurally generate in videogames (like Minecraft) is something akin to Perlin Noise and Simplex Noise. These are seeded functions: given a seed k, you get a continuous function f_k(•) that takes a vector of N elements and returns a value between 0 and 1. These functions are hard to design, but the API might be ready for them.
    For Perlin Noise at least, you just need to, given the seed, compute a random floating point vector for every square/cube/hcube on a “sufficiently large” grid.
    Noise functions might look like magic, but in this case, I think all we’d need is the ability to (a) given a seed, (b) generate random vectors for a list of vectors. The behavior should be such that, as long as the key is the same, the generated output vectors for each input vector are always the same.

I don’t know if the proposal gives space to seeded functions, but given it’s already a big proposal, I’d be happy if it’s good enough to be extendable to include them. Like in an external library, for example.

That’s what you’re asking for, right @Jens? Good enough extendability?
I don’t think we can have PRNGs right out of the bat with this proposal, but we can ask for a good base for the future.

It's more complicated than this in practice (when you take into account optimizability etc). And these are things that has to be tested rather than assumed/speculated. The currently discussed Random API could be designed in ways that make these extensions straight forward and successful/fast or practically impossible, forcing users to make eg their gaussian 2d distr random number code on the side, unrelated to the Random API.

But perhaps this thread has seen enough of my negativity now : ) Maybe I will find the time to channel my concerns via requests for improvements instead, once this Random API has been approved and is available in dev snapshots.

1 Like

Ahh, I see what you’re saying now. Hmm. Yes. I don’t know how extendable this proposal is in terms of performance. I’ll take a look, maybe I can learn something.

What would be ideal is some way to call into LAPACK-level functions. I’ve read LAPACK provides some fast random generators, but I dunno how seedable they are.

The proposal does intend to use some battle-tested C libraries, fwiw. So that’s kinda fast already.

Anyway. I get your point now. I’ll read more and come back if I find any answers.

I'm just saying that if I have a fast implementation of some PRNG in Swift (I have!), and some fast ways of turning the 64 bit patterns it generates into say unit range [0, 1) or [-1, 1) Float values, then I'd like to be able to extend the proposed Random API using these things that I have without them being slowed down by the way the Random API was implemented/designed.

1 Like

I think the API only needs a seedable RNG protocol. If you standardize the behavior through a protocol, it shouldn’t add any overhead.

Maybe after the dust settles on this proposal and we get to play with the release, a new proposal extending its functionality to have seedable generators would be just right :slight_smile: