[Proposal] Random Unification

Ben_Cohen · October 5, 2017, 4:30pm

I don’t see much of a case for making it random(in: SpecificCollection) instead of genericCollection.random().

One possible reason is if you exclude half-open ranges, only having CountableClosedRange, then you don’t have to account for the possibility of an empty collection (via an optional or a trap) because they cannot be empty. But closed ranges aren’t the currency type – half-open ranges are. So it’d hit usability if you have to convert from one to t'other often.

Other possibility is discovery. But given the common use case is “random element from collection”, I don’t expect this to be an issue as it will quickly become common knowledge that this feature is available.

···

On Oct 4, 2017, at 9:12 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

```
extension Int {
    static func random(in range: Countable{Closed}Range<Int>) -> Int
}
```

Nice. Should these be initializers like:

extension Int {
init(randomIn: Countable{Closed}Range<Int>)
}

Ben_Cohen · October 6, 2017, 6:47pm

Instead of methods on specific concrete types, I’d prefer to have a Sequence or Collection-conforming type that you could compose with existing operations like init from a Sequence or replaceSubrange:

// Just a rough sketch...
struct RandomNumbers<T: FixedWidthInteger>: RandomAccessCollection {
    let _range: ClosedRange<T>
    let _count: Int

    var startIndex: Int { return 0 }
    var endIndex: Int { return _count }
    subscript(i: Int) -> T { return _range.random() }

    init(of: T.Type, count: Int) {
        _range = T.min...T.max
        _count = count
    }
    init(count: Int) {
        self = RandomNumbers(of: T.self, count: count)
    }
    // possibly a constructor that takes a specific range
    init<R: RangeExpression>(in: R, count: Int) where R.Bound == T {
        // etc
    }
}

for x in RandomNumbers(of: UInt8.self, count: 10) {
    print(x)
}

let d = Data(RandomNumbers(count: 10))

var a = Array(0..<10)
a.replaceSubrange(5..., with: RandomNumbers(count: 5))

The tricky thing is that making it a Collection violates the (admittedly unwritten) rule that a collection should be multi-pass, in the sense that multiple passes produce different numbers. Then again, so can lazy collections with impure closures. At least it isn’t variable length (unlike a lazy filter). And making it a collection of defined size means it can be used efficiently with things like append and replaceSubrange, which reserve space in advance.
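For readers who want to try the idea, here is a runnable adaptation of the sketch. It is not the author's code: it assumes Swift 4.2 or later and swaps the pitched `range.random()` for `T.random(in:)`, the spelling that later shipped in the standard library. The property names `range` and `length` and the `ClosedRange` initializer are illustrative choices.

```swift
// Adaptation of the RandomNumbers sketch; assumes Swift 4.2 or later.
struct RandomNumbers<T: FixedWidthInteger>: RandomAccessCollection {
    let range: ClosedRange<T>
    let length: Int

    var startIndex: Int { return 0 }
    var endIndex: Int { return length }

    // Each access draws a fresh value, so repeated passes yield different numbers.
    subscript(i: Int) -> T { return T.random(in: range) }

    // Defaults to the full domain of T when no range is given.
    init(in range: ClosedRange<T> = T.min...T.max, count: Int) {
        self.range = range
        self.length = count
    }
}

let bytes = Array(RandomNumbers<UInt8>(count: 10))

var a = Array(0..<10)
a.replaceSubrange(5..., with: RandomNumbers(in: 100...109, count: 5))
```

Because the collection reports a fixed count up front, `Array.init` and `replaceSubrange` know the final size before reading any elements, which is the efficiency point made above.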

···

On Oct 4, 2017, at 19:26, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

extension Data {
static func random(byteCount: Int) -> Data
}

davedelong · November 17, 2017, 4:15pm

I agree with this. The only app-based use case I can think of for a full-range random value would be to construct a unique temporary file name. But that’s easily replaced with UUID().uuidString or mkstemp() (or whatever it’s called).

Dave

···

On Nov 17, 2017, at 9:10 AM, Gwendal Roué via swift-evolution <swift-evolution@swift.org> wrote:

On Nov 17, 2017, at 16:04, Alejandro Alonso via swift-evolution <swift-evolution@swift.org> wrote:

If we go back to your example, you never call FixedWidthInteger.random either, you call range.random. Does this mean integer types shouldn’t have .random? No, because it means getting a random number from its internal range (an alias for (min ... max).random). I think we can all agree that Integer.random is a nicer API than making a range of its bounds. The same goes for Date.random and Color.random.

- Alejandro

Hello,

I'm not a randomness expert, but it has never happened in my developer life (backend & frontend app developer) that I have used a pure random value from the full domain of the random type. In this life:

- Int.random is _always_ followed by % modulo. Unless the better arc4random_uniform(max) is used.
- Color.random is _never_ used, because random colors look bad.
- Date.random is _never_ used, because time is a physical unit, and random points in time do not match any physical use case.

This does not mean that random values from the full domain are useless. Of course not: math apps, fuzzers, etc. need them.

Yet a range-based API would be very welcome to regular app developers. So would Array.randomElement(), Array.shuffled(), etc., because there are plenty of naive and bad algorithms for those simple tasks.

Gwendal Roué

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Jon_Hull · November 17, 2017, 11:30pm

Just to play devil’s advocate, wouldn’t they see random(in:) in the autocomplete when typing ‘random’?

Thanks,
Jon

···

On Nov 17, 2017, at 3:09 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

Certainly it's hard to defend Date.random (and yes, it might be useful for a fuzzer, but that's a very niche use case--and in that case the fuzzer should probably also generate invalid/non-existent dates, which surely Date.random should not do). But actually, Int.random followed by % is the much bigger issue and a very good cautionary tale for why T.random is not a good idea. Swift should help users do the correct thing, and getting a random value across the full domain and computing an integer modulus is never the correct thing to do because of modulo bias, yet it's a very common error to make. We are much better off eliminating this API and encouraging use of the correct API, thereby reducing the likelihood of users making this category of error.

If (and I agree with this) the range-based notation is less intuitive (0..<10.random is certainly less discoverable than Int.random), then we ought to offer an API in the form of `Int.random(in:)` but not `Int.random`. This does not preclude a `Collection.random` API as Alejandro proposes, of course, and that has independent value as Gwendal says.

dwaite · November 17, 2017, 11:16pm

<snip>

Certainly it's hard to defend Date.random (and yes, it might be useful for a fuzzer, but that's a very niche use case--and in that case the fuzzer should probably also generate invalid/non-existent dates, which surely Date.random should not do). But actually, Int.random followed by % is the much bigger issue and a very good cautionary tale for why T.random is not a good idea. Swift should help users do the correct thing, and getting a random value across the full domain and computing an integer modulus is never the correct thing to do because of modulo bias, yet it's a very common error to make. We are much better off eliminating this API and encouraging use of the correct API, thereby reducing the likelihood of users making this category of error.

+1.

-DW

···

On Nov 17, 2017, at 4:09 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

Alejandro · November 17, 2017, 5:03pm

I can think of many cases where you actually need a random color, not because it doesn’t look good, but because you need a color you don’t have.

The proposed solution most definitely includes a range based random api for those who don’t need the full domain (0 ..< 10).random. What I’m trying to express is that some people need the full domain (Type.random) and some need the range based api ((min ..< max).random) (or ...). The proposed solution covers both of these and provides solutions to each.

- Alejandro

···

On Nov 17, 2017, at 10:10, Gwendal Roué <gwendal.roue@gmail.com> wrote:

xwu · November 17, 2017, 11:08pm

Certainly it's hard to defend Date.random (and yes, it might be useful for
a fuzzer, but that's a very niche use case--and in that case the fuzzer
should probably also generate invalid/non-existent dates, which surely
Date.random should not do). But actually, Int.random followed by % is the
much bigger issue and a very good cautionary tale for why T.random is not a
good idea. Swift should help users do the correct thing, and getting a
random value across the full domain and computing an integer modulus is
never the correct thing to do because of modulo bias, yet it's a very
common error to make. We are much better off eliminating this API and
encouraging use of the correct API, thereby reducing the likelihood of
users making this category of error.

If (and I agree with this) the range-based notation is less intuitive
(0..<10.random is certainly less discoverable than Int.random), then we
ought to offer an API in the form of `Int.random(in:)` but not
`Int.random`. This does not preclude a `Collection.random` API as Alejandro
proposes, of course, and that has independent value as Gwendal says.
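For reference, the API shape argued for here is the one that later shipped in the Swift 4.2 standard library. A short sketch of those calls, assuming Swift 4.2 or later (the variable names are illustrative):

```swift
// Range-based integer API: uniform over the range, no modulo needed.
let diceRoll = Int.random(in: 1...6)
let digit = Int.random(in: 0..<10)      // half-open ranges are accepted too

// Collection-based API: returns an Optional because the collection may be empty.
let suits = ["spades", "hearts", "diamonds", "clubs"]
let suit = suits.randomElement()
let nothing = [Int]().randomElement()   // nil

// Shuffling without hand-rolling an algorithm.
let shuffledSuits = suits.shuffled()
```

Note that `Int.random(in:)` traps on an empty range rather than returning an optional, while `randomElement()` reports emptiness through `nil`.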

···

On Fri, Nov 17, 2017 at 10:10 AM, Gwendal Roué via swift-evolution <swift-evolution@swift.org> wrote:

Brent_Royal-Gordon · November 18, 2017, 1:11am

But actually, Int.random followed by % is the much bigger issue and a very good cautionary tale for why T.random is not a good idea. Swift should help users do the correct thing, and getting a random value across the full domain and computing an integer modulus is never the correct thing to do because of modulo bias, yet it's a very common error to make. We are much better off eliminating this API and encouraging use of the correct API, thereby reducing the likelihood of users making this category of error.

Amen.

If (and I agree with this) the range-based notation is less intuitive (0..<10.random is certainly less discoverable than Int.random), then we ought to offer an API in the form of `Int.random(in:)` but not `Int.random`. This does not preclude a `Collection.random` API as Alejandro proposes, of course, and that has independent value as Gwendal says.

If we're not happy with the range syntax, maybe we should put `random(in:)`-style methods on the RNG protocol as extension methods instead. Then there's a nice, uniform style:

  let diceRoll = rng.random(in: 1...6)
  let card = rng.random(in: deck)
  let isHeads = rng.random(in: [true, false])
  let probability = rng.random(in: 0.0...1.0) // Special FloatingPoint overload

The only issue is that this makes the default RNG's name really important. Something like:

DefaultRandom.shared.random(in: 1...6)

Will be a bit of a pain for users.

Maybe we call the default RNG instance `random`, and then give the `random(in:)` methods another name, like `choose(in:)`?

  let diceRoll = random.choose(in: 1...6)
  let card = random.choose(in: deck)
  let isHeads = random.choose(in: [true, false])
  let probability = random.choose(in: 0.0...1.0)

  let diceRoll = rng.choose(in: 1...6)
  let card = rng.choose(in: deck)
  let isHeads = rng.choose(in: [true, false])
  let probability = rng.choose(in: 0.0...1.0)

This would allow us to keep the default RNG's type private and expose it only as an existential—which means more code will treat RNGs as black boxes, and people will extend the RNG protocol instead of the default RNG struct—while also putting our default random number generator under the name `random`, which is probably where people will look for such a thing.
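A minimal sketch of what such extension methods could look like, written against the `RandomNumberGenerator` protocol that later shipped in Swift 4.2. The method names are hypothetical. As an assumption made to sidestep overload ambiguity (an integer `ClosedRange` is itself a `Collection`), the collection variant is spelled `choose(from:)` here instead of `choose(in:)`.

```swift
extension RandomNumberGenerator {
    // Uniform integer from a closed range.
    mutating func choose<T: FixedWidthInteger>(in range: ClosedRange<T>) -> T {
        return T.random(in: range, using: &self)
    }

    // Floating-point overload.
    mutating func choose(in range: ClosedRange<Double>) -> Double {
        return Double.random(in: range, using: &self)
    }

    // Element of a collection; nil when the collection is empty.
    mutating func choose<C: Collection>(from collection: C) -> C.Element? {
        return collection.randomElement(using: &self)
    }
}

var rng = SystemRandomNumberGenerator()
let diceRoll: Int = rng.choose(in: 1...6)
let probability = rng.choose(in: 0.0...1.0)
let isHeads = rng.choose(from: [true, false])
```

Because the methods live on the protocol, any custom generator picks them up without extra code.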

···

On Nov 17, 2017, at 3:09 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

--
Brent Royal-Gordon
Architechies

Alejandro · November 17, 2017, 6:02pm

I agree with all your points, but that’s if we were implementing a randomness source. What we’re providing is a random number generator that takes some randomness from a source and uses that to give the user an actual number it knows about. That differs from a randomness source because a generator doesn’t create randomness, it simply uses it to generate a random number. For this reason I think that RNGs should not write to a pointer.

- Alejandro

···

On Nov 17, 2017, at 11:29, David Waite <david@alkaline-solutions.com> wrote:

On Nov 16, 2017, at 8:12 PM, Alejandro Alonso via swift-evolution <swift-evolution@swift.org> wrote:

While this could work, I don’t believe this aligns with Swift. SecRandomCopyBytes and arc4random_buf do it this way because of the languages they were built in, and for SecRandomCopyBytes, it needs to also return an error (in the form of its return value). For custom generators this doesn’t make sense to me because it seems as if each generator will have the same code to return an integer of a different size from what it’ll be producing. I really like Xiaodi’s solution to explicitly state what type of integer a custom generator will return, and as a default implementation, we provide a way to transform that.

- Alejandro

The random source does not know about integers or floats or colors. It just provides randomness. Higher-level code is what determines how to generate, for example, a random number between 1 and 27 with equal probability, which could theoretically require more than 64 bits of randomness (with the probability of that depending on luck and the algorithm used).

A double is not a uniform distribution, so likewise simply casting a 64 bit integer value to a double would not yield appropriate results (you’d have a significant chance of NaN values, for instance)
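To make the point concrete, here is a small sketch. It reads "casting" as reinterpreting the 64 random bits as a `Double`, which is an assumption about what was meant. The alternative shown, keeping 53 bits and scaling by 2^-53, is one common technique and not something proposed in this thread. The function name `unitDouble(from:)` is hypothetical.

```swift
// Reinterpreting raw bits: any pattern with an all-ones exponent and a
// nonzero significand is NaN, so some random patterns are not usable numbers.
let nanPattern: UInt64 = 0x7FF8_0000_0000_0001
let reinterpreted = Double(bitPattern: nanPattern)

// One common alternative: keep the top 53 bits and scale into [0, 1).
func unitDouble(from bits: UInt64) -> Double {
    return Double(bits >> 11) * 0x1.0p-53
}

let low = unitDouble(from: 0)
let mid = unitDouble(from: 1 << 63)
let high = unitDouble(from: UInt64.max)
```

Every 53-bit integer is exactly representable as a `Double`, so the scaled values are evenly spaced and never reach 1.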

That’s why I would recommend having the random source just be a sequence of bytes. The higher-level API for choosing random elements from a Strideable or shuffling an array *should* be the interface that developers use, rather than reading directly from the random source.

I proposed “read” below because it is compatible with the signature on InputStream, which both would allow you to easily bridge /dev/random or /dev/urandom in as well as have predefined data as a source of randomness for predictable testing.

Predictable randomness and multiple sources of randomness are important in a few scenarios, including gaming where the same “random” choices need to be made locally for each player to keep the games in sync.
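A minimal sketch of predictable randomness, assuming the `RandomNumberGenerator` protocol from Swift 4.2 or later. The generator uses the SplitMix64 algorithm purely as an illustration. It is not cryptographically secure and is not something proposed in this thread.

```swift
// A small seedable generator (SplitMix64), for illustration only.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) { state = seed }

    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// Two players seeded identically make the same "random" choices.
var playerA = SplitMix64(seed: 2017)
var playerB = SplitMix64(seed: 2017)
let rollsA = (0..<5).map { _ in Int.random(in: 1...6, using: &playerA) }
let rollsB = (0..<5).map { _ in Int.random(in: 1...6, using: &playerB) }
```

The same approach gives reproducible tests: fix the seed and the sequence of draws is fixed too.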

-DW

On Nov 15, 2017, 11:26 AM -0600, Nate Cook <natecook@apple.com>, wrote:
On Nov 13, 2017, at 7:38 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Mon, Nov 13, 2017 at 7:12 PM, Alejandro Alonso <aalonso128@outlook.com> wrote:
After thinking about this for a while, I don’t agree with an associated type on RandomNumberGenerator. I think a generic FixedWidthInteger & UnsignedInteger should be sufficient. If there were an associated type, and the default for Random was UInt32, then there might be some arguments about allowing Double to utilize the full 64 bit precision. We could make Random32 and Random64, but I think people will ask why there isn’t a Random8 or Random16 for those bit widths. The same could also be said that any experienced developer would know that his PRNG would be switched if he asked for 32 bit or 64 bit.

I don't understand. Of course, Double would require 64 bits of randomness. It would obtain this by calling `next()` as many times as necessary to obtain the requisite number of bits.

At base, any PRNG algorithm yields some fixed number of bits on each iteration. You can certainly have a function that returns an arbitrary number of random bits (in fact, I would recommend that such an algorithm be a protocol extension method on RandomNumberGenerator), but it must be built on top of a function that returns a fixed number of bits, where that number is determined on a per-algorithm basis. Moreover--and this is important--generating a random unsigned integer of arbitrary bit width in a sound way is actually subtly _different_ from generating a floating-point value of a certain bit width, and I'm not sure that one can be built on top of the other. Compare, for example:

https://github.com/xwu/NumericAnnex/blob/c962760bf974a84ec57d8c5e94c91f06584e2453/Sources/PRNG.swift#L157

    guard result == errSecSuccess else { return nil }
#endif
    return value
  }
}

extension PRNG {
  /// Generates a pseudo-random unsigned integer of type `T` in the range from 0
  /// to `2 ** min(bitCount, T.bitWidth)` (exclusive), where `**` is the
  /// exponentiation operator.
  public func _random<T : FixedWidthInteger & UnsignedInteger>(
    _: T.Type = T.self, bitCount: Int = T.bitWidth
  ) -> T {
    let randomBitWidth = Self._randomBitWidth
    let bitCount = Swift.min(bitCount, T.bitWidth)
    if T.bitWidth == Element.bitWidth &&
      randomBitWidth == Element.bitWidth &&
      bitCount == T.bitWidth {
      // It is an awkward way of spelling `next()`, but it is necessary.
      guard let next = first(where: { _ in true }) else { fatalError() }
      return T(truncatingIfNeeded: next)

https://github.com/xwu/NumericAnnex/blob/c962760bf974a84ec57d8c5e94c91f06584e2453/Sources/PRNG.swift#L316

  }
}

// FIXME: If `FloatingPoint.init(_: FixedWidthInteger)` is added
// then it becomes possible to remove the constraint `Element == UInt64`.

extension PRNG where Element == UInt64 {
  /// Generates a pseudo-random binary floating-point value of type `T` in the
  /// range from 0 to 1 (exclusive) with `min(bitCount, T.significandBitCount)`
  /// bits of precision.
  public func _random<T : BinaryFloatingPoint>(
    _: T.Type = T.self, bitCount: Int = T.significandBitCount
  ) -> T {
    let bitCount = Swift.min(bitCount, T.significandBitCount)
    let (quotient, remainder) =
      bitCount.quotientAndRemainder(dividingBy: Self._randomBitWidth)
    let k = Swift.max(1, remainder == 0 ? quotient : quotient + 1)
    let step = T(Self.max - Self.min)
    let initial = (0 as T, 1 as T)
    // Call `next()` exactly `k` times.
    let (dividend, divisor) = prefix(k).reduce(initial) { partial, next in
(These are essentially Swift versions of C++ algorithms.)

Basically, what I'm saying is that RandomNumberGenerator needs a `next()` method that returns a fixed number of bits, and extension methods that build on that to return T : FixedWidthInteger & UnsignedInteger of arbitrary bit width or U : BinaryFloatingPoint of an arbitrary number of bits of precision. Each individual RNG does not need to reimplement the latter methods, just a method to return a `next()` value of a fixed number of bits. You are welcome to use my implementation.
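A minimal sketch of that layering, under stated assumptions: the protocol name `ChunkedGenerator`, its 32-bit `next()`, and the `Counter` test source are all hypothetical, and the extension handles only whole-type widths, not an arbitrary `bitCount`. It is not the NumericAnnex implementation.

```swift
// Each generator supplies only a fixed-width `next()`.
protocol ChunkedGenerator {
    mutating func next() -> UInt32
}

extension ChunkedGenerator {
    // Built once on top of `next()`: fills T by concatenating 32-bit chunks,
    // least significant chunk first. Narrow types keep the low bits of one chunk.
    mutating func random<T: FixedWidthInteger & UnsignedInteger>(
        _: T.Type = T.self
    ) -> T {
        var result: T = 0
        var filled = 0
        while filled < T.bitWidth {
            let chunk = next()
            result |= T(truncatingIfNeeded: chunk) << filled
            filled += 32
        }
        return result
    }
}

// A deterministic stand-in source, so the chunking is easy to follow.
struct Counter: ChunkedGenerator {
    var value: UInt32
    mutating func next() -> UInt32 {
        defer { value &+= 1 }
        return value
    }
}

var wideSource = Counter(value: 1)
let wide: UInt64 = wideSource.random()     // chunks 1 and 2

var narrowSource = Counter(value: 0x1FF)
let narrow: UInt8 = narrowSource.random()  // low 8 bits of one chunk
```

A new generator only has to implement `next()`; the width-adapting logic is inherited from the extension.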

An alternative to this is to have the random generator write a specified number of bytes to a pointer’s memory, as David Waite and others have suggested. This is the same way arc4random_buf and SecRandomCopyBytes are implemented. Each random number generator could then choose the most efficient way to provide the requested number of bytes. The protocol could look something like this:

protocol RandomNumberGenerator {
    /// Writes the specified number of bytes to the given pointer’s memory.
    func read(into p: UnsafeMutableRawPointer, bytes: Int)
}

This is less user-friendly than having a next() method, but I think that’s a good thing—we very much want people who need a random value to use higher-level APIs and just pass the RNG as a parameter when necessary.
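A sketch of how a higher-level API could sit on top of such a byte-oriented protocol. The protocol is renamed `ByteSource` here (a hypothetical name, to avoid clashing with the standard library's `RandomNumberGenerator`), and `ConstantSource` is a stand-in that is obviously not random.

```swift
// Byte-oriented source, mirroring the protocol shape described above.
protocol ByteSource {
    /// Writes the specified number of bytes to the given pointer's memory.
    func read(into p: UnsafeMutableRawPointer, bytes: Int)
}

extension ByteSource {
    // Higher-level helper: fills the storage of an integer with source bytes.
    func next<T: FixedWidthInteger>(_: T.Type = T.self) -> T {
        var value: T = 0
        withUnsafeMutableBytes(of: &value) { buffer in
            read(into: buffer.baseAddress!, bytes: buffer.count)
        }
        return value
    }
}

// Stand-in source that writes a fixed byte; useful only for demonstration.
struct ConstantSource: ByteSource {
    func read(into p: UnsafeMutableRawPointer, bytes: Int) {
        for offset in 0..<bytes {
            p.storeBytes(of: 0xAB, toByteOffset: offset, as: UInt8.self)
        }
    }
}

let word: UInt32 = ConstantSource().next()
let byte: UInt8 = ConstantSource().next()
```

Callers never touch the pointer themselves; only the helper in the extension does.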

Nate

xwu · November 18, 2017, 12:10am

That’s the point. Using “Int.random(in: 0...9)” gives you a result that has
an equal chance of being any integer between zero and nine, while
“Int.random % 10” does not.
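The bias is easy to see by exhaustively enumerating a small type. The sketch below reduces every `UInt8` value modulo 10 and counts how often each residue appears. `UInt8` is chosen only to keep the enumeration small; the same effect applies to wider types.

```swift
// Count how many of the 256 UInt8 values land on each residue modulo 10.
var counts = [Int](repeating: 0, count: 10)
for value in UInt8.min...UInt8.max {
    counts[Int(value % 10)] += 1
}
// 256 = 25 * 10 + 6, so residues 0 through 5 occur 26 times and
// residues 6 through 9 occur only 25 times.
print(counts)
```

A uniformly random byte reduced this way is therefore slightly more likely to produce 0 through 5 than 6 through 9, which is the modulo bias a range-based API avoids.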

···

On Fri, Nov 17, 2017 at 17:30 Jonathan Hull <jhull@gbis.com> wrote:

Just to play devil’s advocate, wouldn’t they see random(in:) in the
autocomplete when typing ‘random’?

Thanks,
Jon

Alejandro · November 23, 2017, 4:55am

I pushed some updates to the proposal with a reflected API, but I do not agree that we should rid the API of T.random just because some users will misuse it. I think the correct solution here is to include T.random(in:), which does not return an optional and so is not just a second spelling of (min ... max).random. As Jonathan said, autocomplete will display both of these, and users will be able to select random(in:). I also disagree that T.random is _always_ followed by modulo: arc4random() has the whole domain of UInt32 as its range, and users don’t put a modulo there because they know, from online tutorials or from reading the documentation, that the correct way to do it is arc4random_uniform(). If we did get rid of T.random, users who want a random byte, for instance, would have to write UInt8.random(in: 0...255) every time, and developers would write wrappers over this. I believe the correct solution is to keep T.random for those who won’t misuse it and T.random(in:) for those who need a random value within a range.

- Alejandro

···

On Nov 17, 2017, 5:09 PM -0600, Xiaodi Wu <xiaodi.wu@gmail.com>, wrote:

davedelong · October 4, 2017, 8:23pm

Couldn’t the generators just do seeding at init time, and then throw or return nil if it fails, for whatever reason?
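A rough sketch of that idea, with loud assumptions: `SeededGenerator`, `EntropyUnavailable`, and the `entropy` closure are hypothetical names, the closure stands in for whatever the platform offers, and the SplitMix64 mixing step is used only to make the example self-contained.

```swift
struct EntropyUnavailable: Error {}

struct SeededGenerator: RandomNumberGenerator {
    private var state: UInt64

    // Seeding happens once, here; failure surfaces at init time
    // instead of at some later call to next().
    init(entropy: () -> UInt64?) throws {
        guard let seed = entropy() else { throw EntropyUnavailable() }
        state = seed
    }

    // After a successful init, producing numbers cannot fail.
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// A source with no entropy fails up front.
let failed = try? SeededGenerator(entropy: { nil })

// A source that succeeds yields a generator usable with the range-based API.
var generator = try! SeededGenerator(entropy: { 42 })
let roll = Int.random(in: 1...6, using: &generator)
```

This only covers generators that are seeded once; a generator that mixes in fresh entropy later could still fail after init.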

Dave

···

On Oct 4, 2017, at 12:28 PM, Jacob Williams via swift-evolution <swift-evolution@swift.org> wrote:

I agree with Dave’s assertion that this should be in a separate Random library. Not every project needs random numbers and there could possibly be a SecureRandom that exclusively uses CSPRNGs for its functionality.

I also agree that trapping is not a preferred behavior. Optionals are slightly better, but in many instances the developer doesn’t care if the random number is secure or non-reproducible. They really just want some number within a specified range that “seems” random enough for that instant. SecureRandom numbers could be optional or trap, as lacking entropy might significantly affect the usage of the random number.

On Oct 4, 2017, at 10:33 AM, Félix Cloutier via swift-evolution <swift-evolution@swift.org> wrote:

Anything that hasn't killed the process seems fine, and you have to start from `main` for anything else. On iOS, you can be suspended at any time, but the program will only continue from the point that it was suspended if it hasn't been torn down; otherwise, it has to restart from the beginning and reload the known UI state that it is responsible for saving. Unless we go out of our way to destroy the PRNG, it won't go away from under the program's feet. I'm not aware of any OS that will core dump programs on shutdown and try to rehydrate them on reboot.

Félix

On Oct 4, 2017, at 3:05 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Wed, Oct 4, 2017 at 04:55 Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
Seems like the API would be actively hiding the possibility of failure so that you’d have to be in the know to prevent it. Those who don’t know about it would be hunting down a ghost as they’re trying to debug, especially if their program crashes rarely, stochastically, and non-reproducibly because a third party library calls random() in code they can’t see. I think this makes trapping the least acceptable of all options.

I agree with Felix’s concern, which is why I brought up the question, but ultimately the issue is unavoidable. It’s not down to global instance or not. If your source of random numbers is unseedable and may mix in additional entropy at any time, then it may fail at any time because when a hardware restart might happen may be transparent to the process. The user must know about this or else we are laying a trap (pun intended).

On Wed, Oct 4, 2017 at 04:49 Jonathan Hull <jhull@gbis.com> wrote:
@Xiaodi: What do you think of the possibility of trapping in cases of low entropy, and adding an additional global function that checks for entropy so that conscientious programmers can avoid the trap and provide an alternative (or error message)?

Thanks,
Jon

On Oct 4, 2017, at 2:41 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Wed, Oct 4, 2017 at 02:39 Félix Cloutier <felixcloutier@icloud.com> wrote:
I'm really not enthusiastic about `random() -> Self?` or `random() throws -> Self` when the only possible error is that some global object hasn't been initialized.

The idea of having `random` straight on integers and floats and collections was to provide a simple interface, but using a global CSPRNG for those operations comes at a significant usability cost. I think that something has to go:

Drop the random methods on FixedWidthInteger, FloatingPoint
...or drop the CSPRNG as a default
Drop the optional/throws, and trap on error

I know I wouldn't use the `Int.random()` method if I had to unwrap every single result, when getting one non-nil result guarantees that the program won't see any other nil result again until it restarts.

From the perspective of an app that can be suspended and resumed at any time, “until it restarts” could be as soon as the next invocation of `Int.random()`, could it not?

Félix

On Oct 3, 2017, at 11:44 PM, Jonathan Hull <jhull@gbis.com> wrote:

I like the idea of splitting it into 2 separate “Random” proposals.

The first would have Xiaodi’s built-in CSPRNG which only has the interface:

On FixedWidthInteger:
  static func random() throws -> Self
  static func random(in range: ClosedRange<Self>) throws -> Self

On Double:
  static func random() throws -> Double
  static func random(in range: ClosedRange<Double>) throws -> Double

(Everything else we want, like shuffled(), could be built in later proposals by calling those functions)
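As a rough illustration of building on the throwing primitives, here is a Fisher-Yates shuffle layered on a single `random(in:)`-style call. The free function `randomInt(in:)` is a hypothetical stand-in for the proposed throwing primitive; in this sketch it simply forwards to the standard library's `Int.random(in:)` (which postdates this thread and never fails), and `shuffledSketch()` is named to avoid clashing with any real `shuffled()`.

```swift
// Hypothetical stand-in for the proposed throwing primitive
// `static func random(in:) throws -> Self`. Assumption: it forwards to the
// standard library's Int.random(in:), so it never actually throws here.
func randomInt(in range: ClosedRange<Int>) throws -> Int {
    return Int.random(in: range)
}

extension Array {
    // Fisher-Yates shuffle written only in terms of the primitive above.
    func shuffledSketch() throws -> [Element] {
        var result = self
        guard result.count > 1 else { return result }
        // Walk from the last index down to 1, swapping each slot with a
        // uniformly chosen slot at or before it.
        for i in stride(from: result.count - 1, to: 0, by: -1) {
            let j = try randomInt(in: 0...i)
            result.swapAt(i, j)
        }
        return result
    }
}

let deck = Array(1...10)
let shuffledDeck = try deck.shuffledSketch()
print(shuffledDeck)
```

Because the primitive throws, the error propagates through `shuffledSketch()` unchanged, so callers decide how to handle a lack of entropy.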

The other option would be to remove the ‘throws’ from the above functions (perhaps fatalError-ing), and provide an additional function which can be used to check that there is enough entropy (so as to avoid the crash or fall back to a worse source when the CSPRNG is unavailable).

Then a second proposal would bring in the concept of RandomSources (whatever we call them), which can return however many random bytes you ask for… and a protocol for types which know how to initialize themselves from those bytes. That might be spelled like 'static func random(using: RandomSource) -> Self'. As a convenience, the source would also be able to create FixedWidthIntegers and Doubles (both with and without a range), and would also have the coinFlip() and oneIn(UInt) -> Bool functions. Most types should be able to build themselves off of that. There would be a default source which is built from the first protocol.
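One way this could look, as a sketch only: a byte-producing source protocol plus a protocol for types that build themselves from bytes. The names `RandomSource`, `RandomlyInitializable`, `randomByteCount`, and `CountingSource` are assumptions made for illustration; `CountingSource` is deliberately non-random so the behavior is easy to follow. `oneIn(_:)` is left out because it needs an unbiased bounded draw.

```swift
// Hypothetical: a source that returns however many random bytes are requested.
protocol RandomSource {
    mutating func randomBytes(count: Int) -> [UInt8]
}

// Hypothetical: a type that knows how to initialize itself from raw bytes.
protocol RandomlyInitializable {
    static var randomByteCount: Int { get }
    init(randomBytes: [UInt8])
}

extension RandomlyInitializable {
    // The `random(using:)` spelling from the post, made generic over the source.
    static func random<S: RandomSource>(using source: inout S) -> Self {
        return Self(randomBytes: source.randomBytes(count: randomByteCount))
    }
}

extension UInt32: RandomlyInitializable {
    static var randomByteCount: Int { return 4 }
    init(randomBytes: [UInt8]) {
        // Assemble the bytes big-endian into a single value.
        var value: UInt32 = 0
        for byte in randomBytes {
            value = (value << 8) | UInt32(byte)
        }
        self = value
    }
}

extension RandomSource {
    // Convenience built on the byte primitive: use the low bit of one byte.
    mutating func coinFlip() -> Bool {
        return (randomBytes(count: 1)[0] & 1) == 1
    }
}

// Deterministic demo source that emits 0, 1, 2, ... (NOT random).
struct CountingSource: RandomSource {
    var next: UInt8 = 0
    mutating func randomBytes(count: Int) -> [UInt8] {
        var bytes: [UInt8] = []
        for _ in 0..<count {
            bytes.append(next)
            next = next &+ 1
        }
        return bytes
    }
}

var source = CountingSource()
let value = UInt32.random(using: &source)  // consumes bytes 0, 1, 2, 3
let flip = source.coinFlip()               // consumes byte 4
print(value, flip)
```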

I also really think we should have a concept of Repeatably-Random as a subprotocol for the second proposal. I see far too many shipping apps which have bugs due to using arc4Random when they really needed a repeatable source (e.g. patterns and lines jump around when you resize things). If it was an easy option, people would use it when appropriate. This would just mean a sub-protocol which has an initializer which takes a seed, and the ability to save/restore state (similar to CGContexts).
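A minimal sketch of what a repeatable source could offer: a seed initializer plus state that can be saved and restored. The protocol name `RepeatableRandomSource` and its members are assumptions, and it is written as a standalone protocol here rather than as a refinement of a source protocol. The generator uses SplitMix64-style mixing purely as an example of a small deterministic algorithm; it is not cryptographically secure.

```swift
// Hypothetical protocol: seedable, with state that can be saved and restored.
protocol RepeatableRandomSource {
    associatedtype State
    init(seed: UInt64)
    var state: State { get set }
    mutating func next() -> UInt64
}

// Small deterministic generator using SplitMix64-style mixing.
// Illustration only; not suitable for security-sensitive use.
struct SplitMixSource: RepeatableRandomSource {
    var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    mutating func next() -> UInt64 {
        state = state &+ 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Same seed, same sequence: a layout pass can be replayed exactly.
var first = SplitMixSource(seed: 42)
var second = SplitMixSource(seed: 42)
let firstRun = [first.next(), first.next(), first.next()]
let secondRun = [second.next(), second.next(), second.next()]

// Save the state, draw some values, restore, and draw the same values again.
let saved = first.state
let afterSave = [first.next(), first.next()]
first.state = saved
let replayed = [first.next(), first.next()]
print(firstRun == secondRun, afterSave == replayed)
```

Storing the seed (or the saved state) alongside a document is what would keep patterns from jumping around on resize.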

The second proposal would also include things like shuffled() and shuffled(using:).

Thanks,
Jon

On Oct 3, 2017, at 9:31 PM, Alejandro Alonso <aalonso128@outlook.com> wrote:

I really like the schedule here. After reading for a while, I do agree with Brent that the stdlib should be very primitive in the functionality that it provides. I also agree that the most important part right now is designing the internal crypto that the numeric types use to return their respective random numbers. On the discussion of how we should handle not enough entropy with the device random, from a user's perspective it makes sense that calling .random should just give me a random number, but from a developer's perspective I see Optional being the best choice here. While I think blocking could, in most cases, provide the user an easier API, we have to do this right and be safe here by providing a value that indicates that there is room for error. As for the generator abstraction, I believe there should be a bare-bones protocol that sets a layout for new generators, and we should focus on its requirements.

Whether or not RandomAccessCollection and MutableCollection should get .random and .shuffle/.shuffled in this first proposal is completely up in the air for me. It makes sense, to me, to include .random in this proposal and open another one for .shuffle/.shuffled, but I can see arguments that we should create something separate for these two, or include all of it in this proposal.

- Alejandro

On Sep 27, 2017, 7:29 PM -0500, Xiaodi Wu <xiaodi.wu@gmail.com>, wrote:

On Wed, Sep 27, 2017 at 00:18 Félix Cloutier <felixcloutier@icloud.com> wrote:

On Sep 26, 2017, at 4:14 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier <felixcloutier@icloud.com> wrote:

It's possible to use a CSPRNG-grade algorithm and seed it once to get a reproducible sequence, but when you use it as a CSPRNG, you typically feed entropy back into it at nondeterministic points to ensure that even if you started with a bad seed, you'll eventually get to an alright state. Unless you keep track of when entropy was mixed in and what the values were, you'll never get a reproducible CSPRNG.

We would give developers a false sense of security if we provided them with CSPRNG-grade algorithms that we called CSPRNGs and that they could seed themselves. Just because it says "crypto-secure" in the name doesn't mean that it'll be crypto-secure if it's seeded with time(). Therefore, "reproducible" vs "non-reproducible" looks like a good distinction to me.

I disagree here, in two respects:

First, whether or not a particular PRNG is cryptographically secure is an intrinsic property of the algorithm; whether it's "reproducible" or not is determined by the published API. In other words, the distinction between CSPRNG vs. non-CSPRNG is important to document because it's semantics that cannot be deduced by the user otherwise, and it is an important one for writing secure code because it tells you whether an attacker can predict future outputs based only on observing past outputs. "Reproducible" in the sense of seedable or not is trivially noted by inspection of the published API, and it is rather immaterial to writing secure code.

Cryptographically secure is not a property that I'm comfortable applying to an algorithm. You cannot say that you've made a cryptographically secure thing just because you've used all the right algorithms: you also have to use them right, and one of the most critical components of a cryptographically secure PRNG is its seed.

A cryptographically secure algorithm isn’t sufficient, but it is necessary. That’s why it’s important to mark them as such. If I'm a careful developer, then it is absolutely important to me to know that I’m using a PRNG with a cryptographically secure algorithm, and that the particular implementation of that algorithm is correct and secure.

It is a *feature* of a lot of modern CSPRNGs that you can't seed them:

You cannot seed or add entropy to std::random_device

Although std::random_device may in practice be backed by a software CSPRNG, IIUC, the intention is that it can provide access to a hardware non-deterministic source when available.

You cannot seed or add entropy to CryptGenRandom
You can only add entropy to /dev/(u)random
You can only add entropy to BSD's arc4random

Ah, I see. I think we mean different things when we say PRNG. A PRNG is an entirely deterministic algorithm; the output is non-random and the algorithm itself requires no entropy. If a PRNG is seeded with a random sequence of bits, its output can "appear" to be random. A CSPRNG is a PRNG that fulfills certain criteria such that its output can be appropriate for use in cryptographic applications in place of a truly random sequence *if* the input to the CSPRNG is itself random.

The examples you give above *incorporate* a CSPRNG, environment entropy, and a set of rules about when to mix in additional entropy in order to produce output indistinguishable from a random sequence, but they are *not* themselves really *pseudorandom* generators because they are not deterministic. Not only do such sources of random numbers not require an interface to allow seeding, they do not even have to be publicly instantiable: Swift need only expose a single thread-safe instance (or an instance per thread) of a single type that provides access to CryptGenRandom/urandom/arc4random, since after all the output of multiple instances of that type should be statistically indistinguishable from the output of only one.

What I was trying to respond to, by contrast, is the design of a hierarchy of protocols CSPRNG : PRNG (or, in Alejandro's proposal, UnsafeRandomSource : RandomSource) and the appropriate APIs to expose on each. This is entirely inapplicable to your examples. It stands to reason that a non-instantiable source of random numbers does not require a protocol of its own (a hypothetical RNG : CSPRNG), since there is no reason to implement (if done correctly) more than a single publicly non-instantiable singleton type that could conform to it. For that matter, the concrete type itself probably doesn't need *any* public API at all. Instead, extensions to standard library types such as Int that implement conformance to the protocol that Alejandro names "Randomizable" could call internal APIs to provide all the necessary functionality, and third-party types that need to conform to "Randomizable" could then in turn use `Int.random()` or `Double.random()` to implement their own conformance. In fact, the concrete random number generator type doesn't need to be public at all. All public interaction could be through APIs such as `Int.random()`.

Just because we can expose a seed interface doesn't mean we should, and in this case I believe that it would go against the prime objective of providing secure random numbers.

If we're talking about a Swift interface to a non-deterministic source of random numbers like urandom or arc4random, then, as I write above, not only do I agree that it doesn't need to be seedable, it also does not need to be instantiable at all, does not need to conform to a protocol that specifically requires the semantics of a non-deterministic source, does not need to expose any public interface whatsoever, and doesn't itself even need to be public. (Does it even need to be a type, as opposed to simply a free function?)

In fact, having reasoned through all of this, we can split the design task into two. The most essential part, which definitely should be part of the stdlib, would be an internal interface to a cryptographically secure platform-specific entropy source, a public protocol named something like Randomizable (to be bikeshedded), and the appropriate implementations on Boolean, binary integer, and floating point types to conform them to Randomizable so that users can write `Bool.random()` or `Int.random()`. The second part, which can be a separate proposal or even a standalone core library or third-party library, would be the protocols and concrete types that implement pseudorandom number generators, allowing for reproducible pseudorandom sequences. In other words, instead of PRNGs and CSPRNGs being the primitives on which `Int.random()` is implemented, `Int.random()` should be the standard library primitive which allows PRNGs and CSPRNGs to be seeded.
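A sketch of the first half of that split, under stated assumptions: `Randomizable` is the placeholder protocol name from the discussion, the entropy source stays hidden behind the conformances, and a third-party type (`Point`, hypothetical) builds its own conformance from `Int.random()`. To keep the sketch runnable, the conformances lean on today's standard-library `Bool.random()` and `Int.random(in:)`, which postdate this thread.

```swift
// Placeholder protocol name taken from the discussion.
protocol Randomizable {
    static func random() -> Self
}

// Bool already has a `random()` in today's standard library, which
// satisfies the requirement without any new code.
extension Bool: Randomizable {}

extension Int: Randomizable {
    // Assumption: forwards to the standard library's full-range draw;
    // the entropy source itself is never exposed to callers.
    static func random() -> Int {
        return Int.random(in: Int.min...Int.max)
    }
}

// A hypothetical third-party type conforming purely via `Int.random()`.
struct Point: Randomizable {
    var x: Int
    var y: Int

    static func random() -> Point {
        return Point(x: Int.random(), y: Int.random())
    }
}

// Generic code can ask any conforming type for values.
func randomValues<T: Randomizable>(count: Int) -> [T] {
    return (0..<count).map { _ in T.random() }
}

let flips: [Bool] = randomValues(count: 5)
let points: [Point] = randomValues(count: 3)
print(flips, points)
```

Under this layout, a seedable PRNG in a separate library could take its seed from `Int.random()` rather than the other way around.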

If your attacker can observe your seeding once, chances are that they can observe your reseeding too; then, they can use their own implementation of the PRNG (whether CSPRNG or non-CSPRNG) and reproduce your pseudorandom sequence whether or not Swift exposes any particular API.

On Linux, the random devices are initially seeded with machine-specific but rather invariant data that makes /dev/urandom spit out predictable numbers. It is considered "seeded" after a root process writes POOL_SIZE bytes to it. On most implementations, this initial seed is stored on disk: when the computer shuts down, it reads POOL_SIZE bytes from /dev/urandom and saves them in a file, and the contents of that file are loaded back into /dev/urandom when the computer starts. A scenario where someone can read that file is certainly not less likely than a scenario where /dev/urandom was deleted. That doesn't mean that they have kernel code execution or that they can pry into your process, but they have a good shot at guessing your seed and subsequent RNG results if no stirring happens.

Sorry, I don't understand what you're getting at here. Again, I'm talking about deterministic algorithms, not non-deterministic sources of random numbers.

Secondly, I see no reason to justify the notion that, simply because a PRNG is cryptographically secure, we ought to hide the seeding initializer (because one has to exist internally anyway) from the public. Obviously, one use case for a deterministic PRNG is to get reproducible sequences of random-appearing values; this can be useful whether the underlying algorithm is cryptographically secure or not. There are innumerably many ways to use data generated from a CSPRNG in non-cryptographically secure ways and omitting or including a public seeding initializer does not change that; in other words, using a deterministic seed for a CSPRNG would be a bad idea in certain applications, but it's a deliberate act, and someone who would mistakenly do that is clearly incapable of *using* the output from the PRNG in a secure way either; put a third way, you would be hard pressed to find a situation where it's true that "if only Swift had not made the seeding initializer public, this author would have written secure code, but instead the only security hole that existed in the code was caused by the availability of a public seeding initializer mistakenly used." The point of having both explicitly instantiable PRNGs and a layer of simpler APIs like "Int.random()" is so that the less experienced user can get the "right thing" by default, and the experienced user can customize the behavior; any user that instantiates his or her own ChaCha20Random instance is already calling for the power user interface; it is reasonable to expose the underlying primitive operations (such as seeding) so long as there are legitimate uses for it.

Nothing prevents us from using the same algorithm for a CSPRNG that is safely pre-seeded and a PRNG that people seed themselves, mind you. However, especially when it comes to security, there is a strong responsibility to drive developers into a pit of success: the most obvious thing to do has to be the right one, and suggesting to cryptographically-unaware developers that they have everything they need to manage their own seed is not a step in that direction.

I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly calling it cryptographically-secure, because it is not unless you know what to do with it. It is emphatically not far-fetched to imagine a developer who thinks that they can outdo the standard library by using their own ChaCha20Random instance after it's been seeded with time() if we let them know that it's "cryptographically secure". If you're a power user and you don't like the default, known-good CSPRNG, then you're hopefully good enough to know that ChaCha20 is considered a cryptographically-secure algorithm without help labels from the language, and you know how to operate it.

I'm fully aware of the myths surrounding /dev/urandom and /dev/random. /dev/urandom might never run out, but it is also possible for it not to be initialized at all, as in the case of some VM setups. In some older versions of iOS, /dev/[u]random is reportedly sandboxed out. On systems where it is available, it can also be deleted, since it is a file. The point is, all of these scenarios cause an error during seeding of a CSPRNG. The question is, how to proceed in the face of inability to access entropy. We must do something, because in that case we cannot return a cryptographically secure answer. Rare trapping on invocation of Int.random() or permanently waiting for a never-to-be-initialized /dev/urandom would be terrible to debug, but returning an optional or throwing all the time would be verbose. How to design this API?

If the only concern is that the system might not be initialized enough, I'd say that whatever returns an instance of a global, framework-seeded CSPRNG should return an Optional, and the random methods that use the global CSPRNG can trap and scream that the system is not initialized enough. If this is a likely error for you, you can check if the CSPRNG exists or not before jumping.

Also note that there is only one system for which Swift is officially distributed (Ubuntu 14.04) on which the only way to get entropy from the OS is to open a random device and read from it.

Again, I'm not only talking about urandom. As far as I'm aware, every API to retrieve cryptographically secure sequences of random bits on every platform for which Swift is distributed can potentially return an error instead of random bits. The question is, what design for our API is the most sensible way to deal with this contingency? On rethinking, I do believe that consistently returning an Optional is the best way to go about it, allowing the user to either (a) supply a deterministic fallback; (b) raise an error of their own choosing; or (c) trap--all with a minimum of fuss. This seems very Swifty to me.
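The three options (a), (b), and (c) can be sketched as follows. `SecureRandom`, its `entropyAvailable` flag, `nextInt(in:)`, and `AppError` are all hypothetical names used to simulate an Optional-returning API; the flag exists only so the failure path can be exercised, and the successful path forwards to today's `Int.random(in:)`.

```swift
// Hypothetical Optional-returning API. `entropyAvailable` simulates whether
// the platform entropy source could be reached.
struct SecureRandom {
    var entropyAvailable: Bool

    func nextInt(in range: ClosedRange<Int>) -> Int? {
        guard entropyAvailable else { return nil }
        return Int.random(in: range)
    }
}

enum AppError: Error {
    case noEntropy
}

let healthy = SecureRandom(entropyAvailable: true)
let starved = SecureRandom(entropyAvailable: false)

// (a) Supply a deterministic fallback.
let withFallback = starved.nextInt(in: 1...6) ?? 1

// (b) Raise an error of the caller's own choosing.
func rollOrThrow(_ source: SecureRandom) throws -> Int {
    guard let roll = source.nextInt(in: 1...6) else {
        throw AppError.noEntropy
    }
    return roll
}

// (c) Opt in to trapping, visibly, with a force unwrap.
let trapping = healthy.nextInt(in: 1...6)!

print(withFallback, trapping)
```

Each choice is made at the call site, so a reader of the calling code can see which failure behavior was selected.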

* What should the default CSPRNG be? There are good arguments for using a cryptographically secure device random. (In my proposed implementation, for device random, I use Security.framework on Apple platforms (because /dev/urandom is not guaranteed to be available due to the sandbox, IIUC). On Linux platforms, I would prefer to use getrandom() and avoid using file system APIs, but getrandom() is new and unsupported on some versions of Ubuntu that Swift supports. This is an issue in and of itself.) Now, a number of these facilities strictly limit or do not guarantee availability of more than a small number of random bytes at a time; they are recommended for seeding other PRNGs but *not* as a routine source of random numbers. Therefore, although device random should be available to users, it probably shouldn’t be the default for the Swift standard library as it could have negative consequences for the system as a whole. There follows the significant task of implementing a CSPRNG correctly and securely for the default PRNG.

Theo gave a talk a few years ago <https://www.youtube.com/watch?v=aWmLWx8ut20> on randomness and how these problems are approached in LibreSSL.

Certainly, we can learn a lot from those like Theo who've dealt with the issue. I'm not in a position to watch the talk at the moment; can you summarize what the tl;dr version of it is?

I saw it three years ago, so I don't remember all the details. The gist is that:

OpenBSD's random is available from extremely early in the boot process with reasonable entropy
LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG (which doesn't actually use ARC4)
That implementation of arc4random is good because it is fool-proof and it has basically no failure mode
Stirring is good, having multiple components take random numbers from the same source probably makes results harder to guess too
Getrandom/getentropy is in all ways better than reading from random devices

Vigorously agree on all points. Thanks for the summary.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


xwu · October 4, 2017, 4:19pm

If trapping is OK, then surely returning Optional is superior; any user who
is OK with trapping can make that decision for themselves by writing
`random()!`. Everyone else can then see clearly that trapping is a
possibility, which is important.

···

On Wed, Oct 4, 2017 at 11:09 David Waite <david@alkaline-solutions.com> wrote:

On Oct 4, 2017, at 4:05 AM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

I agree with Felix’s concern, which is why I brought up the question, but
ultimately the issue is unavoidable. It’s not down to global instance or
not. If your source of random numbers is unseedable and may mix in
additional entropy at any time, then it may fail at any time because when a
hardware restart might happen may be transparent to the process. The user
must know about this or else we are laying a trap (pun intended).

I'm of the mindset (which might be controversial) that we should attempt
to expose legacy and cryptographically secure random number generators,
even a mixed algorithmic/entropy source like /dev/urandom, but that we
should not expose /dev/random at all. If someone is trying to use
/dev/random legitimately (such as to generate one-time-pads) they will have
to take into account that systems like linux still use algorithmic entropy
to drive /dev/random. If someone really has this sort of use case, they
have exceeded the bounds of the system randomness protocol.

Without /dev/random support as a requirement, the only failure cases I
know of are reading too much random data in one operation (which could be
solved by repeated calls) or calling before sufficient entropy has been set
up in /dev/urandom (such as in a system startup process). I'd be fine with
the second one being a special case, with such systems needing to know that
use of the /dev/urandom-backed generator before randomness has been set up
will trap or return predictable information on certain platforms.

-DW

nnnnnnnn · October 5, 2017, 5:02pm

```
extension Int {
    static func random(in range: Countable{Closed}Range<Int>) -> Int
}
```

Nice. Should these be initializers like:

```
extension Int {
    init(randomIn: Countable{Closed}Range<Int>)
}
```

I don’t see much of a case for making it random(in: SpecificCollection) instead of genericCollection.random().

I see a couple points in favor of these static methods (or initializers) on the numeric types:

1) The collection method will need to return an optional to match the semantics of existing methods (like min()). If this is the only method available, every time someone needs a random value in the range 1...10, they’ll need to unwrap the result (with either force unwrapping, which people will complain about, or some kind of conditional binding, which is its own problem). Even if the semantics are the same (trapping on an empty range), the user experience of using a non-optional method will be better.

2) Floating-point ranges won’t get the collection method, so either we’ll have inconsistent APIs (random FP value is non-optional, random integer is optional) or we’ll make the FP API optional just to match. Both of those seem bad.

One possible reason is if you exclude half-open ranges, only having CountableClosedRange, then you don’t have to account for the possibility of an empty collection (via an optional or a trap) because they cannot be empty. But closed ranges aren’t the currency type – half-open ranges are. So it’d hit usability if you have to convert from one to t'other often.

Other possibility is discovery. But given the common use case is “random element from collection”, I don’t expect this to be an issue as it will quickly become common knowledge that this feature is available.

Agreed here: I don’t think discovery is really an issue between the two kinds. However, I don’t think the overlap in features (two ways to generate random integers) is a problem, especially as we’d have better alignment between integer and floating-point methods.

Nate

···

On Oct 5, 2017, at 11:30 AM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 4, 2017, at 9:12 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

xwu · November 18, 2017, 2:31am

But actually, Int.random followed by % is the much bigger issue and a very
good cautionary tale for why T.random is not a good idea. Swift should help
users do the correct thing, and getting a random value across the full
domain and computing an integer modulus is never the correct thing to do
because of modulo bias, yet it's a very common error to make. We are much
better off eliminating this API and encouraging use of the correct API,
thereby reducing the likelihood of users making this category of error.

Amen.

If (and I agree with this) the range-based notation is less intuitive
(0..<10.random is certainly less discoverable than Int.random), then we
ought to offer an API in the form of `Int.random(in:)` but not
`Int.random`. This does not preclude a `Collection.random` API as Alejandro
proposes, of course, and that has independent value as Gwendal says.

If we're not happy with the range syntax, maybe we should put
`random(in:)`-style methods on the RNG protocol as extension methods
instead. Then there's a nice, uniform style:

let diceRoll = rng.random(in: 1...6)
let card = rng.random(in: deck)
let isHeads = rng.random(in: [true, false])
let probability = rng.random(in: 0.0...1.0) // Special FloatingPoint overload

The only issue is that this makes the default RNG's name really important.
Something like:

DefaultRandom.shared.random(in: 1...6)

Will be a bit of a pain for users.

I did in fact implement this style of RNG in NumericAnnex, but I'm not
satisfied with the design myself. Not only is it a bit of an ergonomic
thorn, there's also another drawback that actually has weighty implications:

Users aren't conditioned to reuse RNG instances. Perhaps, it is because it
can "feel" wrong that multiple random instances should come from the *same*
RNG. Instead, it "feels" more right to initialize a new RNG for every
random number. After all, if one RNG is random, two must be randomer! This
error is seen with some frequency in other languages that adopt this
design, and they sometimes resort to educating users through documentation
that isn't consistently heeded.

Of course, you and I both know that this is not ideal for performance.
Moreover, for a number of PRNG algorithms, the first few hundred or
thousand iterations can be more predictable than later iterations. (Some
algorithms discard the first n iterations, but whether that's adequate
depends on the quality of the seed, IIUC.) Both of these issues don't apply
specifically to a default RNG type that cannot be initialized and always
uses entropy from the global pool, but that's not enough to vindicate the
design, IMO. By emphasizing *which* RNG instance is being used for random
number generation, the design encourages non-reuse of non-default RNGs,
which is precisely where this common error matters for performance (and
maybe security).

Maybe we call the default RNG instance `random`, and then give the
`random(in:)` methods another name, like `choose(in:)`?

let diceRoll = random.choose(in: 1...6)
let card = random.choose(in: deck)
let isHeads = random.choose(in: [true, false])
let probability = random.choose(in: 0.0...1.0)
let diceRoll = rng.choose(in: 1...6)
let card = rng.choose(in: deck)
let isHeads = rng.choose(in: [true, false])
let probability = rng.choose(in: 0.0...1.0)

This would allow us to keep the default RNG's type private and expose it
only as an existential—which means more code will treat RNGs as black
boxes, and people will extend the RNG protocol instead of the default RNG
struct—while also putting our default random number generator under the
name `random`, which is probably where people will look for such a thing.

I've said this already in my feedback, but it can get lost in the long
chain of replies, so I'll repeat myself here because it's relevant to the
discussion. I think one of the major difficulties of discussing the
proposed design is that Alejandro has chosen to use a property called
"random" to name multiple distinct functions which have distinct names in
other languages. In fact, almost every method or function is being named
"random." We are tripping over ourselves and muddling our thinking (or at
least, I find myself doing so) because different things have the exact same
name, and if I'm having this trouble after deep study of the design, I
think it's a good sign that this is going to be greatly confusing to users
generally.

First, there's Alejandro's _static random_, which he proposes to return an
instance of type T given a type T. In Python, this is named `randint(a, b)`
for integers, and `random` (between 0 and 1) or `uniform(a, b)` for
floating-point types. The distinct names reflect the fact that `randint` and
`uniform` are mathematically quite different (one samples a *discrete*
uniform distribution and the other a *continuous* uniform distribution),
and I'm not aware of non-numeric types offering a similar API in Python.
These distinct names accurately reflect critiques from others on this list
that the proposed protocol `Randomizable` lumps together types that don't
share any common semantics for their _static random_ method, and that the
protocol is of questionable utility because types in general do not share
sufficient semantics such that one can do interesting work in generic code
with such a protocol.

Then there's Alejandro's _instance random_, which he proposes to return an
element of type T given an instance of a collection of type T. In Python,
this is named "choice(seq)" (for one element, or else throws an error) and
"sample(seq, k)" (for k elements). As I noted, Alejandro was right to
draw an analogy between _instance random_ and other instance properties of
a Collection such as `first` and `last`. In fact, the behavior of Python's
"choice" (if modified to return an Optional) and "sample", as a pair, would
fit in very well next to Swift's existing pairs of `first` and `prefix(k)`
and `last` and `suffix(k)`. We could trivially Swiftify the names here; for
example:

[1, 2, 3].first
[1, 2, 3].any // or `choice`, or `some`, or...
[1, 2, 3].last

[1, 2, 3].prefix(2)
[1, 2, 3].sample(2)
[1, 2, 3].suffix(2)

I'm going to advocate again for _not_ naming all of these distinct things
"random". Even in conducting this discussion, it's so hard to keep track of
what particular function a person is giving feedback about.
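
As a rough illustration of how such a pair might sit next to `first`/`prefix(k)`, here is a minimal sketch. It assumes Swift 4.2 or later, whose standard library (which postdates this thread) provides `randomElement()` and `shuffled()`. The names `choice` and `sample(_:)` are taken from the alternatives floated above and are hypothetical, and returning fewer than `k` elements for a short collection is an assumption made to match `prefix(k)`.

```swift
extension Collection {
    /// Hypothetical name: a uniformly chosen element, or nil when the
    /// collection is empty, mirroring how `first` and `last` behave.
    var choice: Element? {
        return randomElement()
    }

    /// Hypothetical name: up to `k` elements taken from distinct positions
    /// in random order. Like `prefix(k)`, it returns fewer elements when
    /// the collection holds fewer than `k`.
    func sample(_ k: Int) -> [Element] {
        precondition(k >= 0, "sample size must be non-negative")
        return Array(shuffled().prefix(k))
    }
}

let numbers = [1, 2, 3]
print(numbers.first as Any, numbers.choice as Any, numbers.last as Any)
print(numbers.prefix(2), numbers.sample(2), numbers.suffix(2))
```

Shuffling the whole collection to take `k` elements is wasteful for large inputs; it is used here only to keep the sketch short.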

···

On Fri, Nov 17, 2017 at 7:11 PM, Brent Royal-Gordon <brent@architechies.com> wrote:

On Nov 17, 2017, at 3:09 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

xwu · November 18, 2017, 12:11am

That’s the point. Using “Int.random(in: 0...9)” gives you a result that
has an equal chance of being any integer between zero and nine, while
“Int.random % 9” does not.

(...and nor does “Int.random % 10”. Typo.)
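
To make the bias concrete, here is a small sketch that enumerates a full domain instead of sampling it. It uses `UInt8` rather than `Int` purely so that every value can be visited (an assumption made for illustration); the same effect exists for any domain whose size is not a multiple of the modulus.

```swift
// Reduce every UInt8 value (0...255) modulo 10 and count how often each
// residue appears. An unbiased mapping would need identical counts.
var counts = [Int](repeating: 0, count: 10)
for value in UInt8.min...UInt8.max {
    counts[Int(value % 10)] += 1
}
print(counts)  // prints [26, 26, 26, 26, 26, 26, 25, 25, 25, 25]
```

Because 256 is not divisible by 10, the residues 0 through 5 each come up once more than 6 through 9. For a signed type there is a further wrinkle: Swift's `%` keeps the sign of the dividend, so the result can be negative.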

···

On Fri, Nov 17, 2017 at 18:10 Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Fri, Nov 17, 2017 at 17:30 Jonathan Hull <jhull@gbis.com> wrote:

Just to play devil’s advocate, wouldn’t they see random(in:) in the
autocomplete when typing ‘random’?

Thanks,
Jon

On Nov 17, 2017, at 3:09 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Fri, Nov 17, 2017 at 10:10 AM, Gwendal Roué via swift-evolution <swift-evolution@swift.org> wrote:

> On Nov 17, 2017, at 4:04 PM, Alejandro Alonso via swift-evolution <swift-evolution@swift.org> wrote:
>
> If we go back to your example, you never call FixedWidthInteger.random
> either, you call range.random. Does this mean integer types shouldn’t have
> .random? No, because it means get a random number from its internal range
> (alias to (min ... max).random). I think we can all agree that
> Integer.random is a nicer API than making a range of its bounds. The same
> goes for Date.random and Color.random.
>
> - Alejandro

Hello,

I'm not an expert in randomness, but it has never happened in my developer life
(backend & frontend app developer) that I have used a pure random value
from the full domain of the random type. In this life:

- Int.random is _always_ followed by % (modulo), unless the better
arc4random_uniform(max) is used.
- Color.random is _never_ used, because random colors look bad.
- Date.random is _never_ used, because time is a physical unit, and
random points in time do not match any physical use case.

This does not mean that random values from the full domain are useless.
Of course not: math apps, fuzzers, etc. need them.

Yet a range-based API would be very welcome to regular app developers,
as would Array.randomElement(), Array.shuffled(), etc., because there are
plenty of naive and bad algorithms for those simple tasks.

Certainly it's hard to defend Date.random (and yes, it might be useful
for a fuzzer, but that's a very niche use case--and in that case the fuzzer
should probably also generate invalid/non-existent dates, which surely
Date.random should not do). But actually, Int.random followed by % is the
much bigger issue and a very good cautionary tale for why T.random is not a
good idea. Swift should help users do the correct thing, and getting a
random value across the full domain and computing an integer modulus is
never the correct thing to do because of modulo bias, yet it's a very
common error to make. We are much better off eliminating this API and
encouraging use of the correct API, thereby reducing the likelihood of
users making this category of error.

If (and I agree with this) the range-based notation is less intuitive
((0..<10).random is certainly less discoverable than Int.random), then we
ought to offer an API in the form of `Int.random(in:)` but not
`Int.random`. This does not preclude a `Collection.random` API as Alejandro
proposes, of course, and that has independent value as Gwendal says.
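
For readers wondering what a bias-free `Int.random(in:)` has to do internally, the following is a minimal sketch of rejection sampling, the same general idea as `arc4random_uniform`. The function name `uniformValue(below:using:)` is hypothetical, and the sketch assumes the `RandomNumberGenerator` protocol and `SystemRandomNumberGenerator` from Swift 4.2 or later; it is not the proposal's implementation.

```swift
/// Hypothetical helper: returns a value in 0..<upperBound with every
/// outcome equally likely, by discarding raw draws that would cause
/// modulo bias.
func uniformValue<G: RandomNumberGenerator>(
    below upperBound: UInt64,
    using generator: inout G
) -> UInt64 {
    precondition(upperBound > 0, "upperBound must be positive")
    // 2^64 mod upperBound, computed with wrapping subtraction. Rejecting
    // raw values below this threshold leaves a count of accepted values
    // that is an exact multiple of upperBound.
    let threshold = (0 &- upperBound) % upperBound
    var raw: UInt64 = generator.next()
    while raw < threshold {
        raw = generator.next()
    }
    return raw % upperBound
}

var generator = SystemRandomNumberGenerator()
let digit = uniformValue(below: 10, using: &generator)  // a value in 0...9
print(digit)
```

The loop rarely runs more than once: for a bound of 10, only 6 of the 2^64 possible raw values are rejected.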

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


xwu · November 23, 2017, 6:39am

I pushed some updates to the proposal with a reflected API, but I do not
agree that we should rid the API of T.random just because some users will
misuse it. I think the correct solution here is to include T.random(in:)
(which does not return an optional, so it is not just a second spelling of
(min ... max).random). Like Jonathan said, autocomplete will display both of
these and users will be able to select random(in:). I also disagree that
T.random is _always_ followed by modulo because if we look at arc4random(),
its range is the whole domain of UInt32. Users don’t put a modulo here
because they know the correct way to do it is through arc4random_uniform(),
either through online tutorials, or by reading documentation. If we did get
rid of T.random, users who want a random byte, for instance, would have to
write UInt8.random(in: 0 ... 255) every time. Developers will make wrappers
over this. I believe the correct solution is to keep T.random for those who
won’t misuse it and T.random(in:) for those who need a random value
within a range.

This is an exceedingly rare use case. Do you think it will be common that a
developer will want a single random byte? There will be much better ways to
request multiple bytes (through Data, for instance). And if a user wants
something like a value between 0...255 specifically, such as in the case of
elements of an RGB tuple, then (0...255).random is clearly a superior
spelling as compared to UInt8.random.

Quite simply, having seen how it is used in other languages, T.random is
vastly more likely to be misused than profitably used, by which I mean used
in a situation where an alternative spelling is clearly not good enough. It
should not be the simplest API offered, as its correct use is anything but
simple.

···

On Wed, Nov 22, 2017 at 22:55 Alejandro Alonso <aalonso128@outlook.com> wrote:

On Nov 17, 2017, 5:09 PM -0600, Xiaodi Wu <xiaodi.wu@gmail.com>, wrote:


gparker42 · November 27, 2017, 9:42pm

What evidence do you have that "users…know the correct way to do it"?

I searched for arc4random and arc4random_uniform in a large Apple codebase.
723 files used arc4random.
296 files used arc4random_uniform.

By eyeball, at least 80% of the arc4random() uses were of the form
arc4random() % something
or
(some_float_type)arc4random() / something
which are non-uniform ways to get a random integer and a random floating-point value, respectively.

Few calls to arc4random() were obviously initializing random bytes for purposes like crypto or networking, but much of Apple's crypto code would likely have used other API like arc4random_buf() or CCRandomCopyBytes() so that may not be a good estimate.
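
The floating-point pattern above goes wrong partly because a `Float` cannot represent every 32-bit integer, so distinct raw values collapse onto the same result. The sketch below shows that collapse and one common alternative: keep only as many random bits as a `Double` significand holds and scale them into [0, 1). The name `unitDouble(using:)` is hypothetical, and the `RandomNumberGenerator` protocol assumed here comes from Swift 4.2 or later.

```swift
// Float has a 24-bit significand, so neighbouring 32-bit integers can
// round to the same Float value before any division happens.
print(Float(UInt32.max - 1) == Float(UInt32.max))  // true

/// Hypothetical helper: a Double in [0, 1) built from the top 53 bits of
/// one raw draw, so every representable outcome is equally likely.
func unitDouble<G: RandomNumberGenerator>(using generator: inout G) -> Double {
    let bits: UInt64 = generator.next()
    // 53 bits fit exactly in a Double; multiplying by 2^-53 maps them
    // onto evenly spaced values in [0, 1).
    return Double(bits >> 11) * 0x1.0p-53
}

var source = SystemRandomNumberGenerator()
print(unitDouble(using: &source))
```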

···

On Nov 22, 2017, at 8:55 PM, Alejandro Alonso via swift-evolution <swift-evolution@swift.org> wrote:

I pushed some updates to the proposal with a reflected API, but I do not agree that we should rid the API of T.random just because some users will misuse it. I think the correct solution here is to include T.random(in:) (which does not return an optional, so it is not just a second spelling of (min ... max).random). Like Jonathan said, autocomplete will display both of these and users will be able to select random(in:). I also disagree that T.random is _always_ followed by modulo because if we look at arc4random(), its range is the whole domain of UInt32. Users don’t put a modulo here because they know the correct way to do it is through arc4random_uniform(), either through online tutorials, or by reading documentation.

--
Greg Parker <gparker@apple.com> Runtime Wrangler

tali · November 27, 2017, 11:20pm

Hello,

Maybe we call the default RNG instance `random`, and then give the `random(in:)` methods another name, like `choose(in:)`?

  let diceRoll = random.choose(in: 1...6)
  let card = random.choose(in: deck)
  let isHeads = random.choose(in: [true, false])
  let probability = random.choose(in: 0.0...1.0)

  let diceRoll = rng.choose(in: 1...6)
  let card = rng.choose(in: deck)
  let isHeads = rng.choose(in: [true, false])
  let probability = rng.choose(in: 0.0...1.0)

I like this design a lot. After all, `random` is not a property of some type or instance; rather, we want to generate a new random element within some range or from some given set.
Modeling that as methods of the RNG seems much more natural.
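
A minimal sketch of what "methods of the RNG" could look like follows. The wrapper type `Chooser` is hypothetical, and it leans on `Int.random(in:using:)`, `Double.random(in:using:)` and `randomElement(using:)` from the Swift 4.2+ standard library, which did not exist when this thread was written. Only closed ranges of `Int` and `Double` are covered, to keep it short.

```swift
/// Hypothetical wrapper illustrating the `choose(in:)` spelling.
struct Chooser<G: RandomNumberGenerator> {
    var generator: G

    /// A uniformly chosen integer from a closed range.
    mutating func choose(in range: ClosedRange<Int>) -> Int {
        return Int.random(in: range, using: &generator)
    }

    /// A uniformly chosen floating-point value from a closed range.
    mutating func choose(in range: ClosedRange<Double>) -> Double {
        return Double.random(in: range, using: &generator)
    }

    /// A uniformly chosen element, or nil for an empty collection.
    mutating func choose<C: Collection>(in collection: C) -> C.Element? {
        return collection.randomElement(using: &generator)
    }
}

var rng = Chooser(generator: SystemRandomNumberGenerator())
let diceRoll: Int = rng.choose(in: 1...6)
let isHeads: Bool? = rng.choose(in: [true, false])
let probability: Double = rng.choose(in: 0.0...1.0)
print(diceRoll, isHeads as Any, probability)
```

The explicit type annotations at the call sites are there to keep overload resolution unambiguous in this sketch. Note that the collection overload returns an Optional because the collection may be empty, whereas the closed-range overloads never can be.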

···

--
Martin

Ben_Cohen · October 4, 2017, 11:42pm

If trapping is OK, then surely returning Optional is superior; any user who is OK with trapping can make that decision for themselves by writing `random()!`. Everyone else can then see clearly that trapping is a possibility, which is important.

It’s important not to underestimate both the extent to which using `!` scares people, _and_ the extent to which beginners use it inappropriately. Both of these mean that it’s very important the standard library not include anything that makes force-unwrapping a common/routine operation (looking at you, UnsafeBufferPointer.baseAddress).

Given that arc4random, the recommended source of random numbers on Darwin, is always successful, requiring a force-unwrap when the user is using the default source doesn’t seem like the right move.

···

On Oct 4, 2017, at 9:20 AM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Wed, Oct 4, 2017 at 11:09 David Waite <david@alkaline-solutions.com> wrote:

On Oct 4, 2017, at 4:05 AM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

I agree with Felix’s concern, which is why I brought up the question, but ultimately the issue is unavoidable. It’s not down to global instance or not. If your source of random numbers is unseedable and may mix in additional entropy at any time, then it may fail at any time, because a hardware restart can happen at a moment that is transparent to the process. The user must know about this or else we are laying a trap (pun intended).

I'm of the mindset (which might be controversial) that we should attempt to expose legacy and cryptographically secure random number generators, even a mixed algorithmic/entropy source like /dev/urandom, but that we should not expose /dev/random at all. If someone is trying to use /dev/random legitimately (such as to generate one-time pads) they will have to take into account that systems like Linux still use algorithmic entropy to drive /dev/random. If someone really has this sort of use case, they have exceeded the bounds of the system randomness protocol.

Without /dev/random support as a requirement, the only failure cases I know of are reading too much random data in one operation (which could be solved by repeated calls) or calling before sufficient entropy has been set up in /dev/urandom (such as in a system startup process). I'd be fine with the second one being a special case, with such systems needing to know that using the /dev/urandom-backed generator before randomness has been set up will trap or return predictable information on certain platforms.
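
The "repeated calls" remedy for oversized reads can be sketched as a loop that keeps reading until the request is satisfied. This is only an illustration under stated assumptions: the helper name `urandomBytes(count:)` is hypothetical, it uses Foundation's `FileHandle`, and it reports failure with an Optional instead of trapping, which is one of the two options debated in this thread and not a settled design.

```swift
import Foundation

/// Hypothetical helper: reads `count` bytes from /dev/urandom, looping
/// because a single read may return fewer bytes than requested.
/// Returns nil if the device cannot be opened or stops producing data.
func urandomBytes(count: Int) -> [UInt8]? {
    guard let handle = FileHandle(forReadingAtPath: "/dev/urandom") else {
        return nil
    }
    defer { handle.closeFile() }

    var bytes: [UInt8] = []
    bytes.reserveCapacity(count)
    while bytes.count < count {
        // Ask only for what is still missing.
        let chunk = handle.readData(ofLength: count - bytes.count)
        if chunk.isEmpty { return nil }
        bytes.append(contentsOf: chunk)
    }
    return bytes
}

if let key = urandomBytes(count: 16) {
    print(key.count)
}
```

The sketch does nothing about the second failure case (reading before the system has gathered enough entropy), which cannot be detected from the bytes themselves.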

-DW