SE-0202: Random Unification

Jon_Hull · April 3, 2018, 3:12pm

I definitely think the random(in:) methods are necessary in any case. Ranges do cover the most common use-cases, but we are building something here that is meant to be extensible by third parties over time, and we shouldn't forget that.

The ranges are a cute short hand for values which can fit in ranges, but random(in: using:) provides a good default template for other types to build on. Data has been mentioned as a prime example, but there are many others (I, for example, have random colors and CGVectors in my code) that would benefit from a common/predictable call format.

I also agree with Chris' assessment that we should have Int.random(using:) which returns a random value from the full range. The worry about people using % under the new system is overblown. Worst case, we can add a fixit suggesting the .random(in:) version.

Jon_Hull · April 3, 2018, 4:05pm

Optional-returning randomElement() seems correct for collection, but trapping random() seems better for Ranges. Would it be too much to just do both? Put one on Collection and the other on Ranges? I know Integer ranges are also collections, so there would be some minor overlap, but it might be a small price to pay for the improvement elsewhere. (Also note that they would have different names & signatures so it shouldn't cause too much trouble).

Dante-Broggi · April 3, 2018, 4:48pm

The fact that seedable RandomNumberGenerators have state does not detract from the fact that getting its next value is an inherently mutating operation.
Additionally, having next declared mutating should not prevent classes from adopting it.

Alejandro · April 3, 2018, 5:12pm

If we agree that T.random(in:) should be the primary spelling, I don’t see a reason to give ranges a special API. In the implementation, floating point ranges don’t have a random interface at all because the idea is that the primary spelling should be Double.random(in:). Giving integer ranges a special API gives off the impression that we need to also support floating point ranges. Not only would ranges have two different methods that essentially do the same thing, but there would be two exact spellings for getting a random number. The only reason the proposal mentions integer ranges at all is because they receive this shiny new method on collection by default. We can’t really stop ranges from having this, but the implementation at least supports the functionality.

nnnnnnnn · April 3, 2018, 6:22pm

Chris, if we have the rest of these APIs, in what contexts do you see people needing a random full-width Int? In our pre-review discussion of the proposal, the idea was that those cases generally need random "data" rather than integer values, so we would push authors toward using the UInt64s produced by RandomNumberGenerator.next() instead.

dabrahams · April 3, 2018, 9:30pm

As I've said in the past, I don't think it's important to cover all or most of what C++ covers in the standard library; however, I think it's important to demonstrate that it is possible to write components that, together with the standard library, and without duplicating its efforts, cover that space. Jens' post seemed to suggest that the proposal didn't hit that mark. Ideally I'd like to know that @Jens is now satisfied. If he's not, I'm happy to dig deeper to satisfy myself

xwu · April 3, 2018, 9:51pm

I've been delinquent in reviewing this proposal, and I apologize for slipping this in under the wire here.

I know some of the points below have already been covered by other reviewers, but as the thread has grown unwieldy, I have to apologize that it's hard to keep track of who said what and won't attempt to cite individuals. Instead I'll keep it brief---

What is your evaluation of the proposal?

This proposal addresses a pressing need. We've discussed the APIs at length during the pitch phase, and @Alejandro deserves a lot of credit for working so diligently to synthesize the discussion. In its final form, the API largely reflects what was consensus during the pitch phase as to the proper breadth for a standard library implementation.

I will set aside issues with implementation for commentary at a later time and in a different forum. Instead, I will bring up a few concerns here about the design that I think would merit addressing:

Design of `RandomNumberGenerator`

Some PRNG algorithms only generate 32-bits of randomness at a time. I think it'd be fair to allow the result of next() to be of any associated type that conforms to UnsignedInteger, not just UInt64. This does not impair the ability to write a performant generic default implementation of next().

Visibility of the default RNG

As mentioned during the pitch phase, I agree with the decision that the default RNG should not be seedable; since it's not seedable, and since it's a thread-safe singleton that should never need to be instantiated, I don't think the default RNG needs to be a publicly visible entity. That is to say, I don't think we need to have a public type named Random. This may decrease the API surface area of the proposed additions.

Choice of default RNG

As others have mentioned, it isn't necessary to specify the exact implementation as part of the proposal. But the principles behind choosing to use one implementation or another does go to the intended design.

Here, IIUC, the enunciated principle is that the default RNG should be as secure as pragmatically feasible, which I think is the right choice.

However, I am concerned about the philosophy behind using a cryptographically-secure arc4random for newer versions of macOS and SecCopyRandomBytes for earlier versions. They reflect different philosophical approaches to "pragmatic feasibility." In my understanding, the latter API is a potentially slower one that is meant to be suitable for users to seed their own PRNGs with extremely good random bits, while the former API is a cryptographically-secure PRNG seeded and re-seeded periodically using extremely good random bits.

Now, in practice, there may be little distinction between the two (as to performance or security), but the question remains: is Swift's default RNG a slower one intended to provide extremely good random bits even at a performance cost, or a cryptographically-secure PRNG seeded using those random bits? I think the answer should be the latter. But more importantly, I think we ought to decide on an answer as part of the design.

Static methods named `random`

As mentioned during the pitch phase, I think Bool.random() doesn't pull its own weight:

@Alejandro has said that Bool.random(using:) isn't intended to allow different distributions (i.e., an unfair coin toss); in that case, a 50:50 choice between heads or tails doesn't seem to be worth two methods in the standard library when it is trivial to implement in other ways.

`Collection` APIs

I agree with @Chris_Lattner3 that randomElement might be a more suitable name. In general, having multiple distinct methods named random that have different semantics seems like an inferior choice when we have the option of distinguishing them without impairing readability.

I agree with others that a shuffle API seems separable from the rest of the proposal; like Bool.toggle, it may be worth considering separately.

Moreover, I'd urge study as to whether a more general API could take the place of shuffle: specifically, a method that allows the user to choose n elements randomly without replacement. When n is equal to count, the method is equivalent to shuffle. This design incidentally has the nice benefit that it (a method to choose n elements randomly without replacement) would be to randomElement as prefix is to first.

Is the problem being addressed significant enough to warrant a change to Swift?

Yes.

Does this proposal fit well with the feel and direction of Swift?

Yes, with the caveats discussed above.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I have used similar features in many other languages; I think the proposed design generally compares favorably and very smartly avoids certain pitfalls (such as unwittingly encouraging modulo bias).

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

In-depth study.

xwu · April 3, 2018, 9:55pm

How would you square that with empiric evidence brought forward during the pitch phase that, in Apple's code base, using % is both widespread and almost entirely wrong?

dabrahams · April 3, 2018, 10:00pm

I don't understand why we are proposing to support

Int.random(in: a..<b)

since it's equivalent to

(a..<b).randomElement()

It seems like you need T.random(in: a..<b) only for types without an integer stride, which basically boils down to floating point at the moment.

xwu · April 3, 2018, 10:07pm

I was of that opinion as well at one point, but I've warmed to @Alejandro's design.

First, notionally, you can't write an implementation of the latter (randomElement) without some way of generating a random index distance (aka Int), so the former method is more "foundational" in semantics even if not as a matter of implementation necessity.

Second, and relatedly, users reaching for "I want a random integer" are unlikely immediately to rephrase that as, "I want a random element from the finite collection of integers within a bounded range." The former API is likely to be significantly more discoverable for this use case, at least IMHO.

Nevin · April 3, 2018, 10:36pm

I think that it is more common to want “n elements randomly without replacement” than “one random element” from a collection, and certainly more generally useful. However, *designing* such an API may be a challenge.

On the other hand, shuffle() makes it trivially easy to get “n elements randomly without replacement”, thus solving the base problem. And even if we later add a tailored method for “n random elements”, we would *still* want shuffle() for, well, shuffling.

So in my view, we should accept shuffle(), and defer consideration of “n random elements”. In other words, I support the proposal on this point.

I still hold the opinion that the best possible spelling is simply random(a..<b), for both integer and floating-point types.

dabrahams · April 3, 2018, 10:38pm

I don't feel particularly compelled by the first point, but discoverability is a more practical issue. In principle I'm still uncomfortable with that as grounds for redundancy in such simple APIs, but I don't think the harm is too great in this case.

dabrahams · April 3, 2018, 10:49pm

Having looked at the playground, it seems to cover the basics. @Jens, it didn't take long; why not have a look and see what you think?

dabrahams · April 3, 2018, 11:00pm

It's not bad, and arguably more discoverable than any of the T. spellings.

xwu · April 3, 2018, 11:16pm

Designing or implementing? Although I'm not sure why either would be a challenge.

On the contrary, getting n elements randomly without replacement seems to me to be the base problem; shuffle is just the specific case where n == count and I don't see why we'd need a separate method for that.

By contrast, I was writing to suggest that we omit shuffle() and defer consideration of these APIs to a later proposal.

felix91gr · April 3, 2018, 11:21pm

Deferring shuffle() makes sense. It’s almost a derivative of the core of this proposal. In most scenarios, it’s better to make two smaller steps than a huge one. The shuffle() kind of functionality can wait and become the second step that builds on top of this.

Ben_Cohen · April 3, 2018, 11:41pm

It is equivalent to shuffled, not shuffle. I am not sure you can draw an equivalent parallel to an in-place operation.

Shuffling arrays is one of the most requested standard library features, and "how do I shuffle an array?" is the #4 most linked swift question on stack overflow. Even if you can reproduce shuffled using choose* (which I think could itself make a good follow-on proposal) I think there is a strong case that both names should exist.

I would urge against delaying its inclusion beyond this proposal.

* or should it be choosed? chosen? choiced?

Nobody1707 · April 3, 2018, 11:47pm

If I remember correctly, the compiler can inline method calls when passing existentials of protocols with an AnyObject constraint (assuming the concrete type was final), but I'll have to actually check that. If it can't then the functions can just be made generic over the generator again.

Jon_Hull · April 4, 2018, 12:33am

I think a big part of that is due to most sample code (and stack overflow) using %. If we introduce "This is the way to do basic randomness in Swift" and we make sure all of the sample code uses .random(in:) then I think it should go a long way to fixing the issue.

My main point was that it is such a predictable pattern that we can just look for .random() % n and provide a warning/fixit that suggests .random(in:). We don't have to cripple the rest of the API for fear of misuse.

masters3d · April 4, 2018, 3:21am

I think collection.random() is going to be confusing because it is unclear which part of random. There is not a clear indication that .random() is singular. I think the better way to go with something like collection.firstRandom() or collection.first(randomGenerator: ). This would be more closely related to .first(where:). Another possibility would be:

public func first<T: RandomNumberGenerator>(where randomGenerator: T) -> Element? {...}
public func firstRandom() -> Element? {
    return first(where: Random.default)
}

I prefer firstRandom and the overload because they are easier to find with auto completion.
.first clearly communicates it is only going to return one thing. This is better umbrella to have this functionality in the same way that we already provide other finding method on fist like where.