SE-0202: Random Unification

rparada · March 24, 2018, 2:13am

How come this returns an optional?

 (0..<15).random()

GarthSnyder · March 24, 2018, 6:22am

This is my concern as well. This is a crucial missing feature, not just because there's no API for seeding, but also because there's no default implementation of a seeded RNG in the kit.

Seeded RNGs are not just older, inferior, and messier versions of modern systems like /dev/urandom. They serve a completely different purpose: namely, they facilitate the development of systems that behave both "randomly" and reproducibly.

For example, if randomized torture testing of a data structure implementation reveals a bug, you can add the random seed to the test suite to ensure that the exact sequence of operations that revealed the original bug will be used as part of regression testing. Nobody cares that this supposedly "random" test sequence was one of only 2^32 possible options.

Lots of scientific and numerical computation approaches rely on "random" sequences, too, and it's often helpful to be able to reproduce these. If you tweak your markov chain monte carlo model, you want to see how much that model change affects the outputs. If you must also account for the fact that the model processes different random data every time it runs, that makes it much harder to pick out the effects of changes.

The current proposal draft doesn't do anything to prevent you from implementing a seeded RNG if you want to, but it doesn't help you out, either. All the points made to justify this proposal apply equally well to those who need seeded RNGs. The default seeded RNG should either select the underlying system's best available option or implement a portable algorithm that meets modern standards for randomness quality.

ahti · March 24, 2018, 7:16am

I disagree. I would go so far as to say it would be a mistake to include a seeded PRNG in the standard library for multiple reasons.

Firstly, from what I could find, all seedable PRNGs on Linux rely on some form of global state, which would not be a good idea for a RNG you might want to instantiate multiple times. Furthermore, no generator we could use is guaranteed to be available on all platforms Swift supports (or may support in the future), which is bad, because we'd want them to produce the same values for the same seeds on any platform.

So the only thing we could do is ship a PRNG implemented in the standard library. But with PRNGs, there are tradeoffs you need to decide on. Do you value speed over uniformity/predictability? What about memory usage? How important is the cycle length for your use-case?

So if we chose an algorithm to implement in the stdlib, it wouldn't be appropriate for all use-cases. And once we chose one, we could never ever change that choice, since (again) the random values generated for a seed must remain the same across Swift versions.

The only viable way to include a seeded PRNG would thus be to not include one but many, and probably to add new ones as new, better algorithms are found. This to me sounds like the job of one or many SPM packages, not the standard library.

ahti · March 24, 2018, 7:38am

+1 to RandomNumberGenerator and Random. Some of the criticism in this thread regarding the ways to get random numbers, collection elements etc. sound reasonable to me, but I could also live with the design as proposed.

One thing I wish the proposal would address more clearly is the matter of distributions, I assume they'd also implement RandomNumberGenerator and wrap an underlying generator, but a simple example of how that would work would be nice. Maybe also rationale for not having a separate protocol modelling a Distribution.

Absolutely. Random in Swift is a huge mess currently, and is important enough to live in the stdlib rather than SPM packages.

Yup. Providing a protocol to allow custom generators feels very swifty.

I have used basic random functionality in other languages. This Proposal seems to provide more flexibility in choosing custom generators when getting random elements from collections/numbers. (I have not used the C++ random system though).

I also once wrote a small SPM package that included pretty much the same protocol, but a few seeded PRNGs instead of the Random struct.

I read the proposal, a little of the discussion thread and the previous reviews here.

Jens · March 24, 2018, 8:05am

I agree that the stdlib is not a statistics library, which is another reason for why:

SiliconUnicorn · March 24, 2018, 1:15pm

+1 to RandomNumberGenerator.
I think that this is a wonderful idea that clearly improves the ability of the stdlib to provide cross-platform support. I also like that the proposal provides cryptographically secure random numbers, which, while not necessarily essential, will improve the security of apps made with Swift.

Paul_Cantrell · March 24, 2018, 2:22pm

+1. This is a reasonable, carefully thought-out approach to adding sorely missing functionality.

My compliments to the proposal authors for so clearly spelling out each design decision and the tradeoffs involved. I had many questions as I read, and the writing addressed each one. Well done!

Part of me felt like collection.random() should not be optional, but the comparison to .min() is compelling. It convinced me.

Despite that, like many others here, I find that the optionality of (0..<100).random() still sits poorly. Subjectively, (0..<0).random() feels more like division by zero than like .first or .min(), and it rankles me that (0..<100).random() requires a ! even though it is trivially statically safe (though not within the shape of Swift’s type system).

However, making Int.random(in: 0..<100) the primary mechanism for numeric randomness does ease many of these concerns — and @stephencelis makes a great point that we can abbreviate this as .random(in: 0..<100) in most circumstances.

I worry that the distinction between range.random() and .random(in: range) is too confusing. It is subtle, and not at all obvious at the point of use. This will be a perpetual pain point in Swift if we accept this proposal.

Considering all the possible alternative and all tradeoffs involved, however, I can't come up with a clearly better alternative. I'm inclined to accept the proposal as is.

P.S. A fixit for “numeric range.random() in non-optional context → .random(in: range)” would save untold developer tears.

kdawgwilk · March 24, 2018, 2:52pm

I am feeling pretty good with how the proposal is looking but one thing stuck out to me as to why first and last are computed properties and min() max() are methods along with random() being added would be a method? I feel that unifying these in the future would be beneficial but is definitely out of of scope for this proposal. Keeping that in mind I feel that random() being proposed would be more Swifty as a computed property instead of method and would line up with first and last which I assume are used more frequently than min() or max()

soroush · March 24, 2018, 2:59pm

This is a good proposal and I would happily accept it as is.

I think we should do this. It would help composability a lot imo.

I do think it’s a bit weird that if you have let emptyRange = 0..<0, emptyRange.random() returns an optional and Int.random(in: emptyRange) traps. Was Int.random(upTo: 10) considered? Or is stuff like Int.random(upTo: 10) - 5 problematic and we want to dissuade people from using it?

What are the benefits of making this a protocol? I assume it's primarily so that we can bring static RNGs for testing purposes. Will we also be able to build RNGs that support different distributions (gaussian, normal, etc)?

nnnnnnnn · March 24, 2018, 3:14pm

I'm an enthusiastic supporter of this proposal — it solves a few different problems that are tricky to implement correctly on your own, especially across platforms, and does so in a way that additional functionality can be built on top.

I've updated the playground I sent out earlier to use the proposed APIs — if you're wondering about what distributions and other generators could look like in practice, please take a look! The C++ approach is pretty overwrought IMO; the proposed approach offers the same benefits while still being user-friendly when that amount of power or control isn't needed. (cc @Nevin @Jens @ahti)

Download Playground →

One of the big questions is why the numeric generating APIs are static methods rather than just using the collection-based generator or a global function. There are a few competing goals that make this an inherently tricky proposition:

While ranges of integers form a collection, ranges of floating-point values do not. For Double to have a consistent interface for generating random values as Int (a preferred goal of mine), we'd need manual overloads of Range<T: BinaryFloatingPoint>.random() and the same for ClosedRange.
While integers and even floating-point types make sense to use with just a range, for other types this pattern falls down. While not part of this proposal, extensions to other types could follow the pattern established here:
```
let myVector = Vector.random(x: 20...50, y: 20...50, using: myGenerator)
let myData = Data.random(byteCount: 400, using: myGenerator)
```
Lastly, to me at least, generating a random number and picking a random element of a collection are fundamentally different operations. The collection method must deal with emptiness to be consistent with other collection methods, but generating a number from an empty range is a programming error and should be treated as such. The overlap between these methods is unfortunate (perhaps a different name could help), but I don't think it's ruinous.

nnnnnnnn · March 24, 2018, 3:38pm

That's great feedback - can you please make a comment to that effect on the implementation?

Alejandro · March 24, 2018, 3:42pm

This is definitely an interesting question to be asked, "What happens when we need to update one of these sources in the future?" To me I don't think such request would require another evolution proposal (as nothing is being changed from a user's perspective), but yes I don't want to have to be constrained to these sources for as long as time. I'm pretty sure a detailed pull request with sufficient information and details explaining the need to replace one of these sources in the future would be sufficient.

Alejandro · March 24, 2018, 3:52pm

.random() on collections return an optional because it follows other memberwise methods like .min() .max(). Removing this optional return type means that developers can just do collection.random() without even thinking about checking if the collection was empty or not. Returning an optional at least gives the developer some sort of acknowledgement that this method can fail if the collection is empty. What the developer does at this point, just !ing it on the fixit, or doing some checks to make sure the collection is not empty, is completely up to them.

Alejandro · March 24, 2018, 4:18pm

I never said the design was bad, I was just stating that his argument that repeating type information is un-Swifty doesn't hold for the free function either if others wanted anything other than Int or Double.

Yes, this is intentional. It is not a requirement of the protocol, but I do this for Random because we gain some small performance improvements by using _stdlib_random directly.

This is an interesting topic. My concern is what does it mean when a generator returns an optional? Deterministic generators should be able to always provide a value, meaning the chance of an empty value just isn't possible. I can see a world where developers may need to interface with the generator itself, and having to constantly ! the result of next() seems wrong for a facility with no semantic ties to an optional.

Zhu_Shengqi · March 24, 2018, 5:05pm

I think it's the user's responsibility to check if the collection is empty, like indexing an array. Otherwise we have to write expressions like [1, 2, 3].random()! , which confuse user more. If the user checks the collection is not empty (otherwise he do something else if is empty), he still needs to unwrap the optional. So what's the point of the optional return type?

DonaldHays · March 24, 2018, 5:51pm

+1 to the proposal.

I think I would echo Nevin's thoughts that rather than specifying the exact underlying strong random APIs that will be used on different platforms, the proposal should perhaps only mandate the required attributes of APIs to be used for the default generator, and the question of which APIs are used on which platforms are an implementation detail rather than part of the spec.

I personally have projects that have included similar Random APIs. If this proposal were implemented, I would be able to delete large amounts of my own code that's obviated by the APIs here, and the rest could be built on top the foundation that this proposal establishes. I wouldn't have to keep any APIs that do the same thing as this proposal, just in a different way.

In particular, in my own projects, I would implement a seed-based PRNG that conforms to RandomNumberGenerator, and would implement some distribution types that are similar to GameplayKit's GKRandomDistribution that take a RandomNumberGenerator as an argument in their initializers. But I would be able to delete my equivalent to the RandomNumberGenerator, my equivalent to the default strong random number generator, and copious APIs for generating random numbers from various common types and working with collections.

Some elements in the proposal I initially considered opposing, but eventually came around to agreeing with.

I initially felt that the proposal should include at least one seed-based PRNG. I was unable, however, to come up with which PRNG I felt would be the best candidate if there were only one in the standard library, and agreed with the sentiment that including multiple PRNGs quickly becomes unreasonable. GameplayKit, for example, ships with 3 PRNGs (ARC4, Linear Congruential, and Mersenne Twister), all of which have reasonable arguments for and against being the only default PRNG. I feel that when a developer needs a deterministic PRNG, it's likely they would also need certain runtime performance characteristics, have a certain threshold for required or acceptable randomness quality, and probably also depend on stability of implementation going forward (if you're using seeds, you probably always want the same numbers for the same seeds no matter what. Some PRNGs could reasonably have underlying constants changed between API versions (like if new constants were discovered for a LCG that have better distribution), but doing so would produce different streams of numbers for the same seed). Because of that, I think it's reasonable that developers requiring a PRNG should provide their own.

I felt for a good while that the API to get a random element from a collection should trap on an empty collection (being considered a programmer error), instead of returning an optional. There are two use cases that brought me around to agreeing with the proposal's approach. First, the case of a pre-known collection, like an array of constant numbers, or an array containing a 52-card deck. This is the case that argues for a non-optional API, because it feels nonsensical to ask Random to pick a card from an empty deck, and because developers will force unwrap the result when they know the collection will never be empty. The second case is where the collection you wish to pick from is the result of a runtime query, and might reasonably be empty. If optional, you can safely chain together the collection-returning query call with the API to get the random element (let element: Type? = query().random()), but if it traps on an empty collection, you have much more to write to be safe (let collection = query(); let element: Type? = collection.isEmpty ? nil : collection.random()). I feel the inconveniences of an optional API are outweighed by the inconveniences of a non-optional API.

soroush · March 24, 2018, 5:53pm

Iterators return an optional because returning a nil signals the end of the iteration. In the case of a random iterator, it would go on forever, so it never needs to return nil. Perhaps we could rename next() to generate() (as in RandomNumberGenerator) and then have next() (from IteratorProtocol) just forward to generate()?

I really think having our RNG sources be iterators as well will help us compose them into more interesting structures more easily.

jamieb · March 24, 2018, 5:56pm

+1 on the proposal, and the implementation.

Is the problem being addressed significant enough to warrant a change to Swift?

Definitely. Most languages have a solid random API built into them.

Does this proposal fit well with the feel and direction of Swift?

Yes.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I feel that this is similar to most implementations of this in other languages, and it's pretty standard. The proposal makes sense, and would be a very beneficial add.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I read the proposal.

jarod · March 24, 2018, 6:04pm

In the optional-returning version, the empty check would be redundant, so you wouldn't need to do it.

If the method returns a non-optional, I think it will be common for developers to introduce crashes by using it without considering the emptiness of the collection. It's more concise to just unwrap the optional than to do an empty check anyways:

if let element = array.random() {
	// ...
}

// vs

if !array.isEmpty {
	let element = array.random()
	// ...
}

Plus, force-unwrapping in a situation where you're certain the collection is non-empty is only one extra character while still making you consider whether it's actually the right thing to do.

David_Keck · March 24, 2018, 6:16pm

Absolutely agree this proposal is a great idea. Randomness should be easy to implement for experimention. It’s a good teaching aid as well. :) love the proposed syntax too.