SE-0202 Amendment Proposal: Replace `Collection.randomElement` customization point with `randomIndex`, or remove the customization point entirely

Joe_Groff · June 19, 2018, 10:52pm

Following up from my recent pitch thread, I've prepared a couple of formal amendments for review to address the forward-compatibility issues with the Collection.randomElement requirement that was added as part of SE-0202. Since randomElement returns an element by value, it would not be implementable by a Collection with move-only elements when such things are supported. Two approaches came to light in discussion:

Replace the randomElement requirement with a randomIndex requirement, and implement randomElement in terms of that requirement: SE-0202 amendment: Replace `Collection.randomElement` requirement with `randomIndex`. by jckarter · Pull Request #864 · apple/swift-evolution · GitHub
Remove the randomElement requirement entirely, since the huge collections it strives to support are problematic in general, and randomElement is far from the only problem: SE-0202 amendment: Remove `Collection.randomElement` requirement. by jckarter · Pull Request #863 · apple/swift-evolution · GitHub

Note that in both proposals, randomElement still remains as public standard library API; the proposals only remove it as a protocol requirement for customization. I'd like the Swift community's feedback as to which of these amendments, if either, is most acceptable. Thanks!

Alejandro · June 20, 2018, 1:09am

I would love to see the addition of randomIndex replace randomElement as a requirement. I agree such collections like Int.min ... Int.max are problematic and that users should be using APIs such as Int.random(in:). I'm not too attached to (Int.min ... Int.max).randomElement()! working at all, but with the randomIndex solution, it might be possible to implement randomElement for these ranges to work. Of course, this wouldn't work in a generic sense as it would use Collection.randomElement. I haven't implemented this, so I'm not sure if we could make such facility work with randomIndex, but I wanted to point out a possibility. It could be that we don't want to support this type of operation on these types of ranges going forward and they can simply trap which is completely fine by me.

It is somewhat unfortunate that these types of operations on these problematic ranges have to trap. I'm unsure if in the future we're able to catch such implementations and provide helpful compile time diagnostics regarding the issue.

gwendal.roue · June 20, 2018, 1:20am

Hello, since SE-0202 amendment: Replace Collection.randomElement requirement with randomIndex does not remove randomElement from the standard library, it looks like the obvious choice for Swift users - assuming one of those amendments would be accepted.

On the other side, does SE-0202 amendment: Remove Collection.randomElement requirement mean that we lose randomElement for good? That it will be removed from the standard library entirely? That Swift users will have to write their own userland extensions to bring it back? And that eventually those Swift users will suffer from source breakage then those "move-only" types are introduced?

Generally speaking, it is difficult to reason as long as we don't know the implications of move-only types in terms of source breakage and general language convenience. Something is not clear at all: how could we keep first, last, and indexed subscript, if randomElement has to be removed?

The initial pitch did not address those questions with enough clarity. I don't say that my opinion matters much here. I say that move-only types look like an very bold authority move from the Core Team, with quite unclear consequences.

Alejandro · June 20, 2018, 1:25am

In the Remove Collection.randomElement requirement amendment, it is not removing the function from the stdlib. It is simply changing from a requirement + default implementation to simply an extension method.

Before:

protocol Collection {
  func randomElement() -> Element?
}
extension Collection {
  func randomElement() -> Element? {}
}

After:

extension Collection {
  func randomElement() -> Element? {}
}

Note that in both amendments, randomElement is going to be an extension method rather than a requirement + default imeplementation.

gwendal.roue · June 20, 2018, 1:49am

I apologize for reacting too fast in front of some diffs.

We had initially:

For Collection we add a random method with default implementation for collections to get a random element.

Amendment 1 says instead:

For Collection we add a random method with default implementation for collections to get the index of a random element, as well as get a random element directly.

And amendment 2 goes this way:

For Collection we add an extension method for collections to get a random element.

The first amendment makes it possible for huge collections to provide random elements, when the second amendment simplifies the design at the cost of runtime traps for huge collections.

I think I nailed it All right my vote goes to the second one, then!

Joe_Groff · June 20, 2018, 3:59pm

Sorry for the confusion. I've edited the title and original post to hopefully be clearer.

hooman · June 20, 2018, 11:16pm

My vote: Drop randomElement requirement and replace it with randomIndex requirement.

Wildchild9 · June 21, 2018, 12:19am

Quite frankly, I think both can be helpful. I would love if randomIndex was added, but I don't think that we should replace randomElement as they both have uses.

Ben_Cohen · June 21, 2018, 12:36am

That would lead to exceptionally bad ergonomics for the feature. Since the return value is optional, you'd have to write collection.randomIndex().map { collection[$0] } or some other ugliness to get an element, the common desired result.

I expect it's pretty rare to need to actually get a random index. It's mainly if you want to mutate a random element – but I suspect most needs like that would involve something like removing the random element, and unless you're doing that as a one off, doing so in a loop would end up being an O(n^2) operation. Chances are you want some other higher-level operation, like "choose n without replacement" and there are better algorithms for that kind of thing (std lib proposals for which would be very welcome!)

If you still really need to get a random index, you can always call collection.indices.randomElement()

That's why I'd vote for #2, or, if keeping Range.randomElement working for large ranges is considered important, i'd be in favor of adding it as an implementation detail with an underscore rather than bloating the API.

jawbroken · June 21, 2018, 1:02am

What's the implication of collection.indices.randomElement() for large collections? I am guessing “not good”, given the rest of your post, otherwise randomElement could just be implemented using that. I'm still thinking about @beccadax's concern from the pitch thread about breaking reasonable code. I suppose it depends how often people end up using .randomElement() vs .random(in:). Perhaps a targeted warning about using .randomElement() with known large ranges would cover most cases.

Ben_Cohen · June 21, 2018, 1:17am

Not sure what gave you the impression it wouldn't be good. Should be basically the same as the performance on the collection. All it's doing is getting a count (distance from start to end) then moving a random distance forward. This will either be a trivial operation (if Indices is a Range) or near-identical to an operation on the underlying collection to which it will forward the calls (if Indices is a DefaultIndices... you might pay for some refcounting in creating the wrapper).

Bear in mind, the constraints require Collection.Indices has the same capabilities (e.g. random access) as the collection.

jawbroken · June 21, 2018, 1:36am

Sorry, I was unclear, I meant in terms of breaking for the large collections/ranges that are the main point of contention here, not for performance.

Ben_Cohen · June 21, 2018, 2:12am

Oh I see. It’s basically the same problem. If the collection is too big, so will it’s indices be. The only solution to that (unless someone comes up with something clever, you never know) is to have a customization point on the protocol so that big collections can add special handling.

Nobody1707 · June 21, 2018, 3:06am

Didn't we already remove said customization point from Collection?

hlovatt · June 21, 2018, 5:42am

It seems like randomIndex doesn't really solve the problem that randomElement has re. move semantics, it just moves it (pun intended). With that in mind should move be rethought? It seems like you can't do any mutation on a move semantics collection, that is a big limitation.

I was envisaging move as smarter. EG:

let e = c.randomElement // OK

Also:

func randomElement<C, E>(_ c: @move C) -> E where C: Collection, E.Element == E {
    return c.randomElement
}
let e = randomElement(c) 
// c never used again therefore above OK.

But:

func randomElement<C, E>(_ c: @move C) -> E where C: Collection, E.Element == E {
    return c.randomElement
}
let e = randomElement(c) 
let e2 = randomElement(c) // Not OK, c is being used after it has been moved.

PS Getting uncertain that move is worth adding, if it causes all these problems with APIs and is yet another annotation - isn't the present design better?

huon · June 21, 2018, 6:08am

I think it does solve it, assuming that subscripting doesn't consume the whole collection, and we have a sketch of design for that: generalized accessors. Could you give a bit more detail about why you think it's only moving it?

To be clear, that's the problem with move semantics and randomElement: consuming (i.e. losing all later access) to the whole collection to get a single random element is problematic and limiting.

hlovatt · June 21, 2018, 6:29am

Yeah my example wasn't great.

Better example (hopefully), I was hoping that move would work like this:

let e1 = c.randomElement! // OK
let e2 = c.randomElement! // OK
// `c` hasn't been moved so you can call `randomElement` multiple times.

Should be able to move to a function and do the same:

func twoRandomElements<C, E>(_ c: @move C) -> (E, E) where C: Collection, E.Element == E {
    return (c.randomElement!, c.randomElement!) // OK to use `c` twice since `twoRandomElements` now owns `c`.
}
let es = twoRandomElements(c) // `c` goes out of scope here.
// `c` never used again therefore above OK.

But:

let e = twoRandomElements(c) // `c` goes out of scope here.
let e2 = twoRandomElements(c) // Not OK, `c` is being used after it has been `moved`.

Alejandro · June 21, 2018, 6:35am

Could it be possible to return a non optional for the randomIndex() and return an optional for the randomElement()? This would somewhat mimic the behavior for startIndex and first while at the same time solve the weird ergonomic issue with returning an optional index.

let a = [Int]()
a.startIndex // 0
a.randomIndex() // This will return a.startIndex (which is = a.endIndex)
a.endIndex // 0

a.first // nil
a.randomElement() // nil
a.last // nil

beccadax · June 21, 2018, 8:58am

As I said during the pitch thread, I much prefer changing randomElement(using:) to randomIndex(using:) over removing the customization point entirely. I think users are much more likely to be bitten by randomElement() trapping on large ranges than they would for most calls.

The draft which does this (apple/swift-evolution #864) looks good to me.

Ben_Cohen · June 21, 2018, 2:11pm

That would be confusingly inconsistent. Other searching operations that return an index return an optional, not endIndex.

I just don't think exposing a random index fetching API is useful to users – it's just a workaround for a specific edge case. So if we were to keep that workaround, I'd rather do it with a hidden _randomIndex() method.