SE-0202: Random Unification

palimondo · April 4, 2018, 3:23am

Were you searching for selected?

Jens · April 4, 2018, 8:14am

I have looked at it and I think it looks nice. But I still think there should have been two or three mini projects (like a little raytracer, a game etc) that makes practical use of the proposed Random API in a performance sensitive context. Design decisions could then have been based on insights from these mini projects rather than on speculations.

Also, as mentioned earlier, I think some parts of the Random API could have been separated into stand-alone APIs that would have been useful on their own, as well as together with the Random API, eg range conversions (converting from the full Range of UInt64 to a unit range floating point value).

I do not have any further complaints : ) But once the proposal has been approved and included in master, I might get back with some requests for improvements.

tali · April 4, 2018, 9:38am

I still hold the opinion that the best possible spelling is simply random(a..<b), for both integer and floating-point types.

Seconded.
I find it much more natural when random/randomElement is not modelled as a member, because ranges/collections don't have any randomness of their own.

tali · April 4, 2018, 10:22am

One thing I have been thinking about:

What about separating the API into something like:

/// just low level source for random bits
protocol RandomSource {
  public func next() -> UInt64
  public func next<T: FixedWidthInteger & UnsignedInteger>() -> T
}

and

/// higher-level API to get random numbers/elements
struct RandomNumberGenerator {
    let source: RandomSource
    func get<T: FixedWidthInteger>(_ range: Range<T>) -> T
    func get<T: FixedWidthInteger>(_ range: ClosedRange<T>) -> T
    func get<T: BinaryFloatingPoint>(_ range: Range<T>) -> T
    func get<T: BinaryFloatingPoint>(_ range: ClosedRange<T>) -> T
    func get<T: Collection>(_ elements: T) -> T.Element?
}

var random = RandomNumberGenerator(source: PlatformDefaultRandomSource())

This way it's always clear where the randomness is coming from and how to use a different source.
The global random object can be used to get all sorts of random stuff and is easy to discover / understand / extend.

If we want the shorter random(0..<5) syntax instead of random.get(0..<5), maybe we can build upon the DynamicCallable proposal. Or maybe somebody finds some other name which is short and better fits into a sentence (e.g. random.from(0..<5)?).

Jens · April 4, 2018, 11:28am

Storing the RandomSource as an existential in RandomNumberGenerator would probably prevent many optimizations, making it unnecessarily slow, and as source can be a reference type or value type, it's unclear how a (seedable/pseudo-) RandomNumberGenerator should be passed in order to preserve state changes between calls (should it always be inout or should all RandomSources be class-types? I'd say the latter makes most sense). Also, I guess the get methods could be called next.

Again, some concrete example use cases / projects in which the Random API could be put to actual use would quickly reveal these and many other issues/questions, as well as serving as an actual foundation for the design decisions, currently there is no such foundation except vague speculations.

It looks like a practical foundation for and evaluation of the design of the API will have to wait until after a(n essentially untested) proposed API gets approved and implemented, which seems a little backwards to me.

Tino · April 4, 2018, 11:41am

Afaics the current process supports nothing else - real world tests aren't part of Swift evolution.
But imho there have been decisions in the past that would have needed a evaluation period more than this proposal:
My expectations of randomness in the stdlib is that it's flexible and convenient - if people have special needs, they can still roll their own (ideally, but not necessarily, utilizing the groundwork discussed here).
Especially cryptographic security is an aspect I'd rather like to be added only if the price to pay for it is extremely low.
(btw: It would be nice if there would be an update of the proposal - afaics, there even have been some changes people have agreed on ;-)

Jens · April 4, 2018, 12:16pm

Most proposals are about eg adding a single method, in which case it is probably possible to evaluate it via discussion and small code snippets.

But in this case, some fundamental parts of the proposed API are still unclear / up for discussion (eg should generators be constrained to class types or not, should random methods "pollute" many other types or not, what performance impact does various designs decisions have, etc). I don't think these things had been as unclear by now if the API had been designed while using it in a couple of actual example projects.

That said, I can of course live with this proposal being approved, but I'm pretty sure that I will have to continue using my own little random library, not being able to integrate it with this one, for reasons that will probably only be clear once I try to do that with the actual implementation, and in some real world scenarios.

Ben_Cohen · April 12, 2018, 9:17pm

Acceptance In Principal/Proposed Further Discussion

Thanks to everyone for participating in this review. The core team intends broadly to accept the proposal, but would like further discussion in a few specific areas of the API design.

Below are the areas we would appreciate further discussion on. In particular, the question of static methods versus free functions, and the value vs reference type nature of generators.

Any posts on the specific topics below would be much appreciated. The review period will be extended to Monday, April 16th to allow for this discussion.

Generators as reference vs value types

The nature of the generator API implies that the generators must be reference-y, since they aren't passed inout to functions like shuffle, but aren't required to be reference types. This isn't great. The options are:

live with this
require they be a class.
change the APIs so they are taken inout.

Requiring classes forces an allocation for cases where the generator does not need to hold mutable state (for example, in a generator that sources from the OS). However, that allocation probably is not material to the other costs involved in random number-related operations.

Changing to inout makes the API more weighty to use. It would imply that generators could be value types. Value-typed seeded generators could easily lead to bugs – the user may not realize that by making a copy, they are essentially forking the generation at the point of the copy.

However, inout would be the correct syntax for a move-only generator type, once Swift has this concept, which would prevent this problem.

Static versus free functions

The team thinks making the random(in: Range<T>) family free functions, rather than static methods on T, may help with discoverability. There is also a term-of-art argument to be made: many APIs have random as a free function. If going with the term-of-art justification, this would also suggest that the argument label could be dropped too.

Naming of `Collection.random`

The team felt that Collection.randomElement would work better than Collection.random. Unlike .first or .max, the word "random" doesn't noun well, so adding "Element" improves readability. The team agrees that this method should return an optional value rather than trap, consistent with similar APIs like min and max.

Implementation Details

The core team believes it would be better to leave implementation details of the default source to the individual platforms, rather than defining the implementation for each platform as part of the proposal. The characteristics of the implementation should be documented by each platform. Implementations are strongly encouraged to prefer a secure RNG if this is practical on their platform. The team agrees with the decision to not have the generator API be failable.

Jens · April 12, 2018, 10:15pm

Generators as reference vs value types

+1 for this. I have tried out both ways extensively in my own prng library (in which I care a lot about efficiency), and in my tests, Swift's optimizer (and my module internal library) has made sure that there is no performance overhead when using a final class for the prng compared to using a struct passed via inout, or having the prng simply be a closure with captured state for that matter. I can't say anything about the future move-only aspect, but in my experience, allowing struct prngs (and passing them via inout) simply brings a lot of potential trouble without any real benefits.

Static versus free functions

I agree that making the random(in: Range<T>) family free functions is the right decision, for the same reasons stated by the core team.

Naming of Collection.random

I'd prefer if this was also part of the free function family, eg

func random<C, R>(elementFrom: C, using: R = default) -> C.Element
    where C: Collection, R: RandomNumberGenerator
{
    ...
}

for the same reasons as above.

Joe_Groff · April 12, 2018, 10:22pm

Ben_Cohen:

Generators as reference vs value types
The nature of the generator API implies that the generators must be reference-y, since they aren’t passed inout to functions like shuffle, but aren’t required to be reference types. This isn’t great. The options are:

live with this
require they be a class.
change the APIs so they are taken inout.

Requiring classes forces an allocation for cases where the generator does not need to hold mutable state (for example, in a generator that sources from the OS). However, that allocation probably is not material to the other costs involved in random number-related operations.

Changing to inout makes the API more weighty to use. It would imply that generators could be value types. Value-typed seeded generators could easily lead to bugs – the user may not realize that by making a copy, they are essentially forking the generation at the point of the copy.

However, inout would be the correct syntax for a move-only generator type, once Swift has this concept, which would prevent this problem.

To me, the value type inout approach seems like the right thing to do. As you noted, this is the right design using move-only types, and would allow for value type stateful PRNGs to be defined and used with some confidence in their thread safety being guarded by the value semantics exclusivity model. Most users of RNGs probably do not really want the indirection and allocation overhead or the dynamic exclusivity enforcement and race-unsafety complications of using classes. Although it's true that accidentally copying a random number generator value could lead to unexpected forking, copying an RNG to fork generation is also a reasonable thing to do in some circumstances. The failure mode seems similar in principle to accidentally copying a Hasher as proposed in SE-0206, and forking a hash operation has about the same ability to go unnoticed as accidentally forking a PRNG. We seem to be comfortable with a value-semantics-based design for Hasher, Swift users are hopefully familiar with value types and how they behave, and value types seem like the right long-term design for RNGs if we get the ownership model, so that seems like the right way to go.

Ben_Cohen · April 12, 2018, 10:24pm

This would be highly inconsistent with other ways of extracting elements from a collection, which are all methods.

It also creates a problem with shuttling back and forth from right to left:

random(stuff.dropFirst()).map({ Foo(bar, $0) })
// instead of the clear progression left-to-right
stuff.dropFirst().random().map({ Foo(bar, $0) })

(yes, initializers also have this problem... it's a problem...)

Jens · April 12, 2018, 10:29pm

@Ben_Cohen, I see your points and may agree with you.

hlovatt · April 12, 2018, 11:11pm

+1 from me. Much easier to pass round without accidentally copying, which makes no sense.

+1 to both - more natural in Swift.

+1 reads more naturally.

-1. I think it is best to specify that it must be secure random. From experience on very small platforms, micro controllers, it is easy to implement a secure random using a timer that is running for some other reason and a hash algorithm. Therefore no excuse not be secure.

ahti · April 12, 2018, 11:22pm

Using a clock as a seed to a hash algorithm is about as far from a secure random as you can get. On micro controllers especially, since they often do not have real-time clocks and offer very predictable performance characteristics, which means your seed might well be the same value every run.

I'm not sure where that leaves me wrt requiring security from the default generator. On some platforms, it might be very hard to impossible to find a secure source of randomness, but we also don't want code designed for cryptographic randomness to just blindly run using a less secure generator.

Just thinking out loud: Maybe we should just mark the default generator @unavailable on platforms where we can't provide a cryptographically secure source of randomness?

hlovatt · April 13, 2018, 12:14am

The answer below may be very specific to our application which has IO, etc. as a foreground task and uses random in the background.

The way we have coded the random function is to mix in the timer value into a hash function each time random is called. This is random for us because IO, etc. which are on interrupts are a higher priority and are naturally not deterministic and therefore the calls to random are non-deterministically spaced. (The IO etc. is non-deterministic because messages come from other processes, the communication channel incorporates random-delay retry, they depend on external stimuli, etc.)

Do you have an example of where you need random and something like the above isn't possible?

Not saying there isn't such a case, just that I haven't encountered one.

benrimmington · April 13, 2018, 12:25am

If the RandomNumberGenerator.next(upperBound:) method is required, could it be changed to use PartialRangeUpTo<T> (and perhaps PartialRangeThrough<T>) instead?

generator.next(upperBound: count) // Half-open or closed range ???

generator.next(in: ..<count) // Partial half-open range.

generator.next(in: ...count) // Partial closed range.

Alternatively, the upperBound: label could be upTo: and through: instead.

DonaldHays · April 13, 2018, 1:37am

+1 to this -1.

+1 to this.

I feel that changing the specification for the default generator from "requires secure" to "strongly encourages secure" materially undermines its value proposition to such an extent that clients that require a secure generator would no longer be able to use the default generator. Even if they know that—for example—Darwin and Linux currently have secure implementations, the specification means that such an attribute can not be be relied upon to apply in future versions.

If it is not possible for a platform to provide a secure implementation of the default generator, I feel it would be better for that platform to mark the default generator as unavailable.

ahti · April 13, 2018, 2:05am

I'm not saying that method doesn't work to provide good randomness, but as far as I can tell (I am not a cryptographer, so if someone more knowledgable wants to jump in) it does not result in cryptographically secure randomness. Time is easily predictable and hashing by nature deterministic. This means we can probably brute-force or even predict the values produced (depending on how precisely we can tell when random is called and how easy it is to verify a guess).

davedelong · April 13, 2018, 2:38am

Properties have the implied convention of being idempotent: no matter how many times you call them, they "should" return the same value.

I'm very firmly of the option that .random (or .randomElement) on Collection should be a method: .random() /.randomElement()

Alejandro · April 13, 2018, 2:40am

Well I've got good news, they are! The confusion comes from Ben not adding the () at the end.