SE-0202: Random Unification

I agree that the stdlib is not a statistics library, which is another reason for why:

+1 to RandomNumberGenerator.
I think that this is a wonderful idea that clearly improves the ability of the stdlib to provide cross-platform support. I also like that the proposal provides cryptographically secure random numbers, which, while not necessarily essential, will improve the security of apps made with Swift.

+1. This is a reasonable, carefully thought-out approach to adding sorely missing functionality.

My compliments to the proposal authors for so clearly spelling out each design decision and the tradeoffs involved. I had many questions as I read, and the writing addressed each one. Well done!

Part of me felt like collection.random() should not be optional, but the comparison to .min() is compelling. It convinced me.

Despite that, like many others here, I find that the optionality of (0..<100).random() still sits poorly. Subjectively, (0..<0).random() feels more like division by zero than like .first or .min(), and it rankles me that (0..<100).random() requires a ! even though it is trivially statically safe (though not within the shape of Swift’s type system).

However, making Int.random(in: 0..<100) the primary mechanism for numeric randomness does ease many of these concerns — and @stephencelis makes a great point that we can abbreviate this as .random(in: 0..<100) in most circumstances.

I worry that the distinction between range.random() and .random(in: range) is too confusing. It is subtle, and not at all obvious at the point of use. This will be a perpetual pain point in Swift if we accept this proposal.

Considering all the possible alternative and all tradeoffs involved, however, I can’t come up with a clearly better alternative. I’m inclined to accept the proposal as is.

P.S. A fixit for “numeric range.random() in non-optional context → .random(in: range)” would save untold developer tears.

1 Like

I am feeling pretty good with how the proposal is looking but one thing stuck out to me as to why first and last are computed properties and min() max() are methods along with random() being added would be a method? I feel that unifying these in the future would be beneficial but is definitely out of of scope for this proposal. Keeping that in mind I feel that random() being proposed would be more Swifty as a computed property instead of method and would line up with first and last which I assume are used more frequently than min() or max()

This is a good proposal and I would happily accept it as is.

I think we should do this. It would help composability a lot imo.

I do think it’s a bit weird that if you have let emptyRange = 0..<0, emptyRange.random() returns an optional and Int.random(in: emptyRange) traps. Was Int.random(upTo: 10) considered? Or is stuff like Int.random(upTo: 10) - 5 problematic and we want to dissuade people from using it?

What are the benefits of making this a protocol? I assume it's primarily so that we can bring static RNGs for testing purposes. Will we also be able to build RNGs that support different distributions (gaussian, normal, etc)?

I’m an enthusiastic supporter of this proposal — it solves a few different problems that are tricky to implement correctly on your own, especially across platforms, and does so in a way that additional functionality can be built on top.

I’ve updated the playground I sent out earlier to use the proposed APIs — if you’re wondering about what distributions and other generators could look like in practice, please take a look! The C++ approach is pretty overwrought IMO; the proposed approach offers the same benefits while still being user-friendly when that amount of power or control isn’t needed. (cc @Nevin @Jens @ahti)

Download Playground →

One of the big questions is why the numeric generating APIs are static methods rather than just using the collection-based generator or a global function. There are a few competing goals that make this an inherently tricky proposition:

  • While ranges of integers form a collection, ranges of floating-point values do not. For Double to have a consistent interface for generating random values as Int (a preferred goal of mine), we’d need manual overloads of Range<T: BinaryFloatingPoint>.random() and the same for ClosedRange.
  • While integers and even floating-point types make sense to use with just a range, for other types this pattern falls down. While not part of this proposal, extensions to other types could follow the pattern established here:
    let myVector = Vector.random(x: 20...50, y: 20...50, using: myGenerator)
    let myData = Data.random(byteCount: 400, using: myGenerator)
    
  • Lastly, to me at least, generating a random number and picking a random element of a collection are fundamentally different operations. The collection method must deal with emptiness to be consistent with other collection methods, but generating a number from an empty range is a programming error and should be treated as such. The overlap between these methods is unfortunate (perhaps a different name could help), but I don’t think it’s ruinous.
3 Likes

That’s great feedback - can you please make a comment to that effect on the implementation?

This is definitely an interesting question to be asked, “What happens when we need to update one of these sources in the future?” To me I don’t think such request would require another evolution proposal (as nothing is being changed from a user’s perspective), but yes I don’t want to have to be constrained to these sources for as long as time. I’m pretty sure a detailed pull request with sufficient information and details explaining the need to replace one of these sources in the future would be sufficient.

.random() on collections return an optional because it follows other memberwise methods like .min() .max(). Removing this optional return type means that developers can just do collection.random() without even thinking about checking if the collection was empty or not. Returning an optional at least gives the developer some sort of acknowledgement that this method can fail if the collection is empty. What the developer does at this point, just !ing it on the fixit, or doing some checks to make sure the collection is not empty, is completely up to them.

1 Like

I never said the design was bad, I was just stating that his argument that repeating type information is un-Swifty doesn't hold for the free function either if others wanted anything other than Int or Double.

Yes, this is intentional. It is not a requirement of the protocol, but I do this for Random because we gain some small performance improvements by using _stdlib_random directly.

This is an interesting topic. My concern is what does it mean when a generator returns an optional? Deterministic generators should be able to always provide a value, meaning the chance of an empty value just isn't possible. I can see a world where developers may need to interface with the generator itself, and having to constantly ! the result of next() seems wrong for a facility with no semantic ties to an optional.

I think it’s the user’s responsibility to check if the collection is empty, like indexing an array. Otherwise we have to write expressions like [1, 2, 3].random()! , which confuse user more. If the user checks the collection is not empty (otherwise he do something else if is empty), he still needs to unwrap the optional. So what’s the point of the optional return type?

+1 to the proposal.

I think I would echo Nevin’s thoughts that rather than specifying the exact underlying strong random APIs that will be used on different platforms, the proposal should perhaps only mandate the required attributes of APIs to be used for the default generator, and the question of which APIs are used on which platforms are an implementation detail rather than part of the spec.

I personally have projects that have included similar Random APIs. If this proposal were implemented, I would be able to delete large amounts of my own code that’s obviated by the APIs here, and the rest could be built on top the foundation that this proposal establishes. I wouldn’t have to keep any APIs that do the same thing as this proposal, just in a different way.

In particular, in my own projects, I would implement a seed-based PRNG that conforms to RandomNumberGenerator, and would implement some distribution types that are similar to GameplayKit’s GKRandomDistribution that take a RandomNumberGenerator as an argument in their initializers. But I would be able to delete my equivalent to the RandomNumberGenerator, my equivalent to the default strong random number generator, and copious APIs for generating random numbers from various common types and working with collections.

Some elements in the proposal I initially considered opposing, but eventually came around to agreeing with.

I initially felt that the proposal should include at least one seed-based PRNG. I was unable, however, to come up with which PRNG I felt would be the best candidate if there were only one in the standard library, and agreed with the sentiment that including multiple PRNGs quickly becomes unreasonable. GameplayKit, for example, ships with 3 PRNGs (ARC4, Linear Congruential, and Mersenne Twister), all of which have reasonable arguments for and against being the only default PRNG. I feel that when a developer needs a deterministic PRNG, it’s likely they would also need certain runtime performance characteristics, have a certain threshold for required or acceptable randomness quality, and probably also depend on stability of implementation going forward (if you’re using seeds, you probably always want the same numbers for the same seeds no matter what. Some PRNGs could reasonably have underlying constants changed between API versions (like if new constants were discovered for a LCG that have better distribution), but doing so would produce different streams of numbers for the same seed). Because of that, I think it’s reasonable that developers requiring a PRNG should provide their own.

I felt for a good while that the API to get a random element from a collection should trap on an empty collection (being considered a programmer error), instead of returning an optional. There are two use cases that brought me around to agreeing with the proposal’s approach. First, the case of a pre-known collection, like an array of constant numbers, or an array containing a 52-card deck. This is the case that argues for a non-optional API, because it feels nonsensical to ask Random to pick a card from an empty deck, and because developers will force unwrap the result when they know the collection will never be empty. The second case is where the collection you wish to pick from is the result of a runtime query, and might reasonably be empty. If optional, you can safely chain together the collection-returning query call with the API to get the random element (let element: Type? = query().random()), but if it traps on an empty collection, you have much more to write to be safe (let collection = query(); let element: Type? = collection.isEmpty ? nil : collection.random()). I feel the inconveniences of an optional API are outweighed by the inconveniences of a non-optional API.

6 Likes

Iterators return an optional because returning a nil signals the end of the iteration. In the case of a random iterator, it would go on forever, so it never needs to return nil. Perhaps we could rename next() to generate() (as in RandomNumberGenerator) and then have next() (from IteratorProtocol) just forward to generate()?

I really think having our RNG sources be iterators as well will help us compose them into more interesting structures more easily.

5 Likes

+1 on the proposal, and the implementation.

Is the problem being addressed significant enough to warrant a change to Swift?

Definitely. Most languages have a solid random API built into them.

Does this proposal fit well with the feel and direction of Swift?

Yes.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I feel that this is similar to most implementations of this in other languages, and it’s pretty standard. The proposal makes sense, and would be a very beneficial add.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I read the proposal.

In the optional-returning version, the empty check would be redundant, so you wouldn’t need to do it.

If the method returns a non-optional, I think it will be common for developers to introduce crashes by using it without considering the emptiness of the collection. It’s more concise to just unwrap the optional than to do an empty check anyways:

if let element = array.random() {
	// ...
}

// vs

if !array.isEmpty {
	let element = array.random()
	// ...
}

Plus, force-unwrapping in a situation where you’re certain the collection is non-empty is only one extra character while still making you consider whether it’s actually the right thing to do.

Absolutely agree this proposal is a great idea. Randomness should be easy to implement for experimention. It’s a good teaching aid as well. :) love the proposed syntax too.

I don’t think RNG needs to extend IteratorProtocol - in my experimentation I haven’t used them as sequences, just as sources for different amounts of random data. Would you mind saying a little more or showing what you mean by composing RNGs?

I was going to say that if you want 10 random numbers, Array(rng.prefix(10)) is preferable to (0..<10).map({ Int.random(in: range) }), but then realized that 10 random Int64 numbers isn’t all that valuable, since you’d get modulo bias if you tried to mod them. Consider this retracted!

The distributions in the playground better address the type of thing I’m after, and they do conform to Sequence.

1 Like

+1 from me.

Definitely. I have written a half-dozen ad-hoc wrappers around arc4random_buf, each time requiring me to carefully pour over man pages, avoid biases, and attempt to ensure uniformity. I’m not confident I got it right every time.

I appreciate the author providing reasoning and justification for decisions in the “Alternatives Considered” section.

Yes. It provides a convenient safe default and supports extensibility (e.g. @nnnnnnnn’s playground).

This feels in the top tier of convenient language support.

For serious usage, I would want to use a package similar to STL’s <random>. However, that feels beyond the current scope of the stdlib and this proposal handles convenient and common usage well and is extensible.

A quick read of prior threads, @nnnnnnnn’s Playground, the proposal, and a history of accumulated annoyances from not having this.

However, I’m unsure and unversed regarding what a default notion of a random floating point value should be. I would like this proposal to clarify the semantics, e.g. uniform in real numbers vs representable values. The proposal should ideally have rationale for the decision. (I recall this topic coming up in one of the mega-threads, but have since forgotten the details).

2 Likes

+1 for this. Great work.

The one thing that got me was Collection.random; I wasn’t immediately assuming Element. The property name doesn’t feel as obvious as it does for FixedWidthInteger.

Also, while potentially out of scope, I was also expecting a convenience initializer for Array similar to init(repeating:count:), but which takes a RandomNumberGenerator.

EDIT

I chose .random() over .randomElement() here because .randomElement() doesn’t match with things like .first, .last, .min(), or .max().

.random .any?

1 Like