[Proposal] Random Unification

xwu · October 5, 2017, 4:26am

Glibc random is not cryptographically secure, nor is it thread-safe.
getrandom() fails in more ways than arc4random and is therefore a primitive
on which we can base a shim but isn't suitable by itself. Swift used to use
libbsd internally on Linux but that dependency was removed. Currently, it
uses the C++ Mersenne twister engine on all platforms, which is not
cryptographically secure, with a device random seed that is not guaranteed
to be random by the C++ standard if no device is available. Likely,
therefore, a shim will have to be manually implemented.

···

On Wed, Oct 4, 2017 at 23:17 Alejandro Alonso via swift-evolution <swift-evolution@swift.org> wrote:

I think this is a good idea. It does raise the question of what our
default generator for Linux will be if we use Darwin’s arc4random(3). Do we
use Glibc’s random()? If so, what do we seed it with?

- Alejandro

On Oct 4, 2017, 6:26 PM -0500, Ben Cohen via swift-evolution <swift-evolution@swift.org>, wrote:

On Sep 30, 2017, at 3:23 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Sep 11, 2017, at 9:43 PM, Brent Royal-Gordon <brent@architechies.com> wrote:

On Sep 9, 2017, at 10:31 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

- I’d love to see several of the most common random kinds supported, and I
agree it would be nice (but not required IMO) for the default to be
cryptographically secure.

I would be very careful about choosing a "simple" solution. There is a
long, sad history of languages trying to provide a "simple" random number
generator and accidentally providing a powerful footgun instead. But:

- We should avoid the temptation to nuke this mosquito with a heavy handed
solution designed to solve all of the world’s problems: For example, the
C++ random number stuff is crazily over-general. The stdlib should aim to
solve (e.g.) the top 3 most common cases, and let a more specialized
external library solve the fully general problem (e.g. seed management,
every distribution imaginable, etc).

That's not to say we need to have seven engines and twenty distributions
like C++ does. The standard library is not a statistics package; it exists
to provide basic abstractions and fundamental functionality. I don't think
it should worry itself with distributions at all. I think it needs to
provide:

1. The abstraction used to plug in different random number generators
(i.e. an RNG protocol of some kind).

2. APIs on existing standard library types which perform basic
randomness-related functions correctly—essentially, encapsulating Knuth.
(Specifically, I think selecting a random element from a collection (which
also covers generating a random integer in a range), shuffling a mutable
collection, and generating a random float will do the trick.)

3. A default RNG with a conservative design that will sometimes be too
slow, but will never be insufficiently random.
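
As a rough illustration of the three pieces listed above, here is a minimal Swift sketch. Every name in it (`SketchGenerator`, `next(below:)`, `nextUnitDouble()`, `randomChoice(using:)`, `fisherYatesShuffle(using:)`, `SplitMix64`) is hypothetical and does not come from the thread or the proposal. `SplitMix64` is a deterministic stand-in used only to make the example reproducible; it is not a candidate for the conservative default described in point 3.

```swift
// Point 1: a hypothetical generator abstraction with one requirement.
protocol SketchGenerator {
    mutating func next() -> UInt64
}

extension SketchGenerator {
    // Uniform integer in 0..<upperBound. Rejection sampling avoids modulo bias.
    mutating func next(below upperBound: UInt64) -> UInt64 {
        precondition(upperBound > 0, "upperBound must be positive")
        // 2^64 mod upperBound: how many low values must be rejected.
        let threshold = (0 &- upperBound) % upperBound
        while true {
            let value = next()
            if value >= threshold { return value % upperBound }
        }
    }

    // Uniform Double in 0..<1 built from the top 53 bits of one output.
    mutating func nextUnitDouble() -> Double {
        return Double(next() >> 11) * 0x1.0p-53
    }
}

// Point 2: basic operations on existing standard library protocols.
extension Collection {
    // Returns nil for an empty collection, like `first` and `last`.
    func randomChoice<G: SketchGenerator>(using generator: inout G) -> Element? {
        guard !isEmpty else { return nil }
        let offset = Int(generator.next(below: UInt64(count)))
        return self[index(startIndex, offsetBy: offset)]
    }
}

extension MutableCollection where Self: RandomAccessCollection {
    // Fisher-Yates shuffle: swap each position with a uniformly chosen
    // position at or after it.
    mutating func fisherYatesShuffle<G: SketchGenerator>(using generator: inout G) {
        var remaining = count
        var current = startIndex
        while remaining > 1 {
            let offset = Int(generator.next(below: UInt64(remaining)))
            swapAt(current, index(current, offsetBy: offset))
            formIndex(after: &current)
            remaining -= 1
        }
    }
}

// Deterministic stand-in generator (assumption: for illustration only).
struct SplitMix64: SketchGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}
```

Under these assumptions, a call site would create a generator such as `var g = SplitMix64(state: 42)` and pass it as `&g` to `randomChoice(using:)` or `fisherYatesShuffle(using:)`; a test package could supply its own conforming type in the same way.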

If you want to pick elements with a Poisson distribution, go get a
statistics framework; if you want repeatable random numbers for testing,
use a seedable PRNG from XCTest or some other test tools package. These can
leverage the standard library's RNG protocol to work with existing random
number generators or random number consumers.

+1 to this general plan!

This pretty much exactly matches my preferences.

If random numbers go into the std lib, users should be able to
customize the source of randomness for speed or test reproducibility, but
the API should default to something sensible without the user having to know
it’s configurable. On Darwin that default should be based on arc4random(3). The
std lib doesn’t need to provide other non-default random sources.
Non-random sources for testing should be part of test frameworks and plug
in easily.

The proposal should include shuffle and random element from collection,
which are much-requested and not really the controversial part so won't
hold up the overall progress of the proposal.

(and no need for distributions other than uniform IMO, or otherwise)

-Chris

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

xwu · November 6, 2017, 3:08am

I can agree with almost everything in here. I think `.random` on
`RandomAccessCollection` should mimic the current design with `.first` and
`.last` by returning an optional. In terms of the naming of this, we have
to look at how Python structures the call site: `random.choice([1, 2, 3,
4])`. To me this reads as "random choice within this array." This works
because of how it’s called. With the proposed solution, we are calling to
get a random element directly from the array. So I stand by naming this
`random`.

On the subject of bikeshedding the names, I can wholeheartedly agree to use
`RandomNumberGenerator`. As for `Randomizable`, I agree
there might be a better name for this, but the question is what?

That is a good question. I don't like `RandomlySamplable`; it might be that
`Randomizable` could be the least bad option.

One more comment that I neglected to make. You don't need to use an `enum`
in order to make a type non-initializable. A `struct` can do the same with
a private init. There are established singleton patterns for Swift and an
`enum` with a single case is...atypical. I'm also not sure why this should
be a public type at all. Why expose `Random.default.next()`? Shouldn't the
user always use `T.random`? Afaict, `Random` can be an internal type
`_Random` that the user never sees at all.
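
The struct-with-private-init pattern mentioned above might look roughly like the following. This is a sketch, not the proposal's code: the body of `next()` is an assumption that simply defers to `SystemRandomNumberGenerator`, which ships in the standard library from Swift 4.2 onward and did not exist when this thread was written.

```swift
// Hypothetical shape of the singleton discussed above.
struct Random {
    // The one shared instance.
    static let `default` = Random()

    // A private initializer: `Random()` does not compile outside this declaration.
    private init() {}

    // Assumption: take the bits from the system generator.
    func next() -> UInt64 {
        var system = SystemRandomNumberGenerator()
        return system.next()
    }
}
```

Making the type internal (for example, renaming it `_Random` and dropping any `public` modifier) would hide `Random.default.next()` from users entirely, leaving `T.random` as the only entry point.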

···

On Sun, Nov 5, 2017 at 8:21 PM, Alejandro Alonso <aalonso128@outlook.com> wrote:

On Nov 5, 2017, 7:56 PM -0600, Xiaodi Wu <xiaodi.wu@gmail.com>, wrote:

I like this particular version. In particular, the choice of algorithms
is, afaict, correct and that is incredibly important. I had overlooked that
`arc4random` is cryptographically secure past a certain version of macOS,
but you are absolutely right. I am also on board with the fatal error
suggestion if random entropy is unavailable; I think it must be amply
documented, though.

I do think, however, that you're overloading too many things into the word
"random" when they're not the same. Take a look at Python, which is pretty
widely used for numerics. There's `rand` and `random` for getting a random
integer or floating-point value, and there's `choice` and `sample` for
choosing one or more values out of a collection without replacement. These
are sufficiently different tasks and don't all need to be called "random"
or satisfy the same requirement of the same protocol. Put another way, it's
absolutely *not* inconsistent for numeric types to have `random()` while
collection types have a differently named method.

By contrast, I think the great length of text trying to justify naming all
of these facilities `random` in order to parallel `first` and `last` shows
how the proposed design is comparatively weaker. You have to argue that (a)
`Int.random` shouldn't return an optional value because it'd be unwieldy,
and therefore `(0..<5).random` shouldn't either because it would then be
inconsistent; but (b) that `(0..<5).random` should be spelled and behave
like `(0..<5).first` and `(0..<5).last` even though the user must handle
empty collections totally differently because the return types are not the
same. Either `(0..<5).random` should behave analogously to `first` and
`last` or it should not. If it should, it only makes sense to return a
result of type `T?`. After all, if a collection doesn't have a `first`
item, then it can't have a `random` item. Put another way, having a `first`
item is a prerequisite to having a randomly selectable item. The behavior
of the Swift APIs would be very inconsistent if `first` returns `T?` but
`random` returns `T`. However, I agree that unwrapping `Int.random` every
time would be burdensome, and it would not make sense to have a type
support `random` but not have any instantiable values; therefore, returning
an optional value doesn't make sense, and it follows that `Int.random`
*shouldn't* behave like `first` or `last`.

Once you stop trying to make what Python calls `rand/randint` and
`choice/sample` have the same names, then finding a Swifty design for the
distinct facilities becomes much easier, and it suggests a pretty elegant
result (IMO):

[1, 2, 3, 4].choice // like `first` or `last`, this gets you a value of type Int?
[1, 2, 3, 4].sampling(2) // like `prefix(2)` or `suffix(2)`, this gets you a subsequence with at most two elements

Int.random // this gets you a random Int; or it may trap
Float.random // this gets you a random Float; or it may trap

With that, it also becomes clear why--and I agree with you--an independent
`Int.random(in: 0..<5)` is not necessary. `(0..<5).choice` is fine, and it
can now appropriately return a value of type `T?` because it no longer
needs to parallel `Int.random`.
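
A minimal sketch of how `choice` and `sampling(_:)` could behave, assuming they are layered on `randomElement()` and `shuffled()` (standard library methods available from Swift 4.2, after this thread). Two simplifications are assumptions of this sketch: `sampling(_:)` returns an `Array` rather than a subsequence, and it shuffles the whole collection even when only a few elements are requested.

```swift
extension Collection {
    // Like `first` and `last`: nil when the collection is empty.
    var choice: Element? {
        return randomElement()
    }

    // Like `prefix(_:)`: at most `count` elements, chosen without replacement.
    func sampling(_ count: Int) -> [Element] {
        precondition(count >= 0, "cannot sample a negative number of elements")
        return Array(shuffled().prefix(count))
    }
}
```

With this in place, `(0..<5).choice` has type `Int?`, and `[1].sampling(5)` returns the single available element rather than trapping.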

* * *

More in the bikeshedding arena, I take issue with some of the names:

- I reiterate my comment that `Randomizable` is not the best name. There
are multiple dictionary definitions of "randomize" and one is "make
unpredictable, unsystematic, or random in order or arrangement." Wikipedia
gives at least five different contextual meanings for the word. What you're
doing here is specifically **random sampling** and we can do better to
clarify that, I think.

- While I agree that `RNG` can be cryptic, the alternative should be
`RandomNumberGenerator` (as it's called in other languages);
`RandomGenerator` is not quite accurate. Again, we're _consuming_
randomness to _generate_ numbers (or values of other type, based on the
result of a generated number). We're not _generating_ randomness.

On Sun, Nov 5, 2017 at 6:33 PM, Alejandro Alonso <aalonso128@outlook.com> wrote:

[Proposal] Random Unification by Azoy · Pull Request #760 · apple/swift-evolution · GitHub is the current API and
proposed solution.

- Alejandro

On Nov 5, 2017, 6:18 PM -0600, Xiaodi Wu <xiaodi.wu@gmail.com>, wrote:

My comments are directed to the "more up-to-date" document that you just
linked to in your reply to Jon. Is that one outdated? If so, can you send a
link to the updated proposal and implementation for which you're soliciting
feedback?

On Sun, Nov 5, 2017 at 6:12 PM, Alejandro Alonso <aalonso128@outlook.com> wrote:

The proposal and implementation have the current updated API. The link I
sent Jon was the one I brought up a few weeks ago which is outdated now.
The proposal answers all of your questions. As for `.random` being a
function, some would argue that it behaves in the same way as `.first` and
`.last` which are properties.

- Alejandro

On Nov 5, 2017, 6:07 PM -0600, Xiaodi Wu <xiaodi.wu@gmail.com>, wrote:

A few quick thoughts:

I know that there's been some discussion that `(1...10).random` is the
best spelling, but I'd like to push back on that suggestion. When I want a
random number, I tend to think of the type I want first ("I want a random
integer") and then a range ("I want a random integer between a and b"), not
the other way around. My intuition is that `Int.random(in:)` will be more
discoverable, both on that basis and because it is more similar to other
languages' syntax (`Math.random` in JavaScript and `randint` in NumPy, for
example). It also has the advantage that the type is explicit, which I
think is particularly useful in this case because the value itself is,
well, random.
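
The two spellings being compared can be put side by side. `Int.random(in:)` is the form that exists in the standard library from Swift 4.2; the `random` property on ranges below is a hypothetical extension written only for this comparison.

```swift
// Hypothetical range-first spelling, defined in terms of the type-first API.
extension ClosedRange where Bound == Int {
    var random: Int { return Int.random(in: self) }
}

let typeFirst = Int.random(in: 1...10)   // the type is named at the call site
let rangeFirst = (1...10).random         // the type is implied by the range
```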

I would also argue that, `random` is most appropriately a method and not
a property; there's no hard and fast rule for this, but the fact that the
result is stochastic suggests (to me) that it's not a "property" of the
range (or, for that matter, of the type).

I would reiterate here my qualms about `Source` being the term used for
a generator. These types are not a _source_ of entropy but rather a
_consumer_ of entropy.

`UnsafeRandomSource` needs to be renamed; "unsafe" has a specific
meaning in Swift--that is, memory safety, and this is not it. Moreover,
it's questionable whether this protocol is useful in any sense. What useful
generic algorithms can one write with such a protocol?

`XoroshiroRandom` cannot be seeded by any `Numeric` value; depending on
the specific algorithm it needs a seed of a specific bit width. If you
default the shared instance to being seeded with an `Int` then you will
have to have distinct implementations for 32-bit and 64-bit platforms. This
is inadvisable. On that note, your `UnsafeRandomSource` needs to have an
associated type and not a generic `<T : Numeric>` for the seed.
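
To illustrate the associated-type suggestion, here is a hedged sketch in which each generator declares the exact seed type it needs. `SeedableGenerator` and `Xoroshiro128Plus` are hypothetical names. The update step follows the xoroshiro128+ scheme with the rotation and shift constants (55, 14, 36) from the 2016 reference version as recalled here; treat those constants as an assumption and check them against the reference code before relying on them.

```swift
// Hypothetical protocol: the seed type is chosen by each conforming generator.
protocol SeedableGenerator {
    associatedtype Seed
    init(seed: Seed)
    mutating func next() -> UInt64
}

struct Xoroshiro128Plus: SeedableGenerator {
    // 128 bits of state on every platform, independent of the width of Int.
    typealias Seed = (UInt64, UInt64)
    private var state: Seed

    init(seed: Seed) {
        precondition(seed != (0, 0), "seed must not be all zeros")
        state = seed
    }

    // Rotate left by k bits (0 < k < 64).
    private static func rotl(_ x: UInt64, _ k: UInt64) -> UInt64 {
        return (x << k) | (x >> (64 - k))
    }

    mutating func next() -> UInt64 {
        let s0 = state.0
        var s1 = state.1
        let result = s0 &+ s1
        s1 ^= s0
        state.0 = Xoroshiro128Plus.rotl(s0, 55) ^ s1 ^ (s1 << 14)
        state.1 = Xoroshiro128Plus.rotl(s1, 36)
        return result
    }
}
```

Because `Seed` is fixed per type, a call such as `Xoroshiro128Plus(seed: (1, 2))` type-checks identically on 32-bit and 64-bit platforms, whereas a generic `<T : Numeric>` seed would accept values of the wrong width.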

The default random number generator should be cryptographically secure;
however, it's not clear to me that it should be device random.

I agree with others that alternative random number generators other than
the default RNG (and, if not default, possibly also the device RNG) should
be accommodated by the protocol hierarchy but not necessarily supplied in
the stdlib.

The term `Randomizable` means something specific which is not how it's
used in your proposed protocol.

There's still the open question, not answered, about how requesting an
instance of the hardware RNG behaves when there's insufficient or no
entropy. Does it return nil, throw, trap, or wait? The proposed API does
not clarify this point, although based on the method signature it cannot
return nil or throw. Trapping might be acceptable but I'd be interested to
hear your take as to why it is preferable.
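
The options in this question differ mainly in the signature the caller sees. The sketch below spells out three of them against a hypothetical `EntropyHook` closure that returns nil when no entropy is available; none of these names are from the proposal. The fourth option, waiting, would keep the non-optional signature and block until the hook succeeds.

```swift
enum EntropyError: Error {
    case unavailable
}

// Hypothetical low-level hook: nil means the system had no entropy to give.
typealias EntropyHook = () -> UInt64?

// Option 1: surface the failure as an optional.
func nextOrNil(from hook: EntropyHook) -> UInt64? {
    return hook()
}

// Option 2: surface the failure as a thrown error.
func nextOrThrow(from hook: EntropyHook) throws -> UInt64 {
    guard let value = hook() else { throw EntropyError.unavailable }
    return value
}

// Option 3: trap. This is the only one of the three that fits a plain
// `-> UInt64` signature.
func nextOrTrap(from hook: EntropyHook) -> UInt64 {
    guard let value = hook() else { fatalError("no entropy available") }
    return value
}
```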

On Sun, Nov 5, 2017 at 4:43 PM, Alejandro Alonso via swift-evolution <swift-evolution@swift.org> wrote:

For the proof of concept, I had accidentally deleted that one. I have a
more up to date one which was discussed a few weeks later.
Swift Random Unification Design · GitHub

- Alejandro

On Nov 5, 2017, 4:37 PM -0600, Jonathan Hull <jhull@gbis.com>, wrote:

Is there a link to the writeup? The one in the quote 404s.

Thanks,
Jon

On Nov 5, 2017, at 2:10 PM, Alejandro Alonso via swift-evolution <swift-evolution@swift.org> wrote:

Hello once again Swift evolution community. I have taken the time to
write up the proposal for this thread, and have provided an implementation
for it as well. I hope to once again get good feedback on the overall
proposal.

- Alejandro

On Sep 8, 2017, 11:52 AM -0500, Alejandro Alonso via swift-evolution <swift-evolution@swift.org>, wrote:

Hello swift evolution, I would like to propose a unified approach to
`random()` in Swift. I have a simple implementation here
https://gist.github.com/Azoy/5d294148c8b97d20b96ee64f434bb4f5. This
implementation is a simple wrapper over existing random functions so
existing code bases will not be affected. Also, this approach introduces a
new random feature for Linux users that gives them access to upper bounds,
as well as a lower bound for both Glibc and Darwin users. This change would
be implemented within Foundation.

I believe this simple change could have a very positive impact on new
developers learning Swift and experienced developers being able to write
single random declarations.

I’d like to hear about your ideas on this proposal, or any
implementation changes if need be.

- Alejandro

Alejandro · November 6, 2017, 3:15am

Yes, the user should only use `T.random`. I will go ahead and make `Random` an internal struct.

- Alejandro

Alejandro · November 6, 2017, 3:32am

An error came up in development: default arguments cannot use internal types, which affects `shuffle(using:)` and `shuffled(using:)`. To remedy this, I would have to add another method requirement with no arguments whose default implementation uses `_Random`. This would work, but it would appear in the stdlib documentation as two separate functions.
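
A sketch of that two-overload workaround, under stated assumptions: `sketchShuffled` is a hypothetical name chosen to avoid clashing with the standard library, the generator constraint uses the `RandomNumberGenerator` protocol that exists from Swift 4.2, and `_Random` here is a stand-in that forwards to `SystemRandomNumberGenerator` rather than the implementation under discussion.

```swift
// Internal stand-in for the default generator; never appears in a public signature.
internal struct _Random: RandomNumberGenerator {
    mutating func next() -> UInt64 {
        var system = SystemRandomNumberGenerator()
        return system.next()
    }
}

extension Array {
    // Public overload 1: the caller supplies the generator.
    public func sketchShuffled<G: RandomNumberGenerator>(using generator: inout G) -> [Element] {
        var result = self
        result.shuffle(using: &generator)
        return result
    }

    // Public overload 2: no arguments. The internal type is used only in the
    // body, so it does not leak into the public interface.
    public func sketchShuffled() -> [Element] {
        var generator = _Random()
        return sketchShuffled(using: &generator)
    }
}
```

The cost is the one noted above: documentation lists two functions where a single function with a default argument would otherwise have sufficed.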