SE-0202 Amendment Proposal: Replace `Collection.randomElement` customization point with `randomIndex`, or remove the customization point entirely

Nevin · June 21, 2018, 2:35pm

Why not? Any time you want to actually mutate the collection, you need indices not elements.

For example, say you have two collections with the same type of element, and you want to swap a random element from one with a random element from the other. This requires getting a random index in each collection, then swapping the elements at those indices.

This concept is a general one: anything you can do with an element of a collection, you can do the same thing using an index to access the element. But there are things you can do with an index which you cannot do with an element alone.

So index-based operations are inherently more fundamental than element-based ones.

Ben_Cohen · June 21, 2018, 3:11pm

The criteria for adding something as an API for the standard library is not "is it a low-level building block for some other operation". Sometimes those building blocks are, themselves, also needed to create multiple different higher-level operations, so they merit adding on their own. But sometimes they aren't useful except to build this one thing, and in that case, they shouldn't be added as public API.

I don't think your example of swapping two random elements in a collection is something that needs to be done often (or ever?). On the other hand, if you take the very common case of shuffling an entire collection, it wouldn't be particularly useful to pick a random index from the collection. To implement something like the fisher-yates shuffle, you need to move progressively through the collection, swapping random elements in a narrowing range. Having a randomIndex operation would not be a significant readability improvement over just advancing the left side of that narrowing range by a random distance. And even if it were, it would require a slicing operation that would lean on the optimizer to make efficient.

Adding API also needs to consider the "do no harm" angle. I suspect that the most common use of getting a random index would be misuse in cases where the user wants "without replacement" functionality:

while let idx = cards.randomIndex() {
  deal(cards[idx])
  cards.remove(at: idx)
}

Of course, if a feature is genuinely useful, we should add it even if it can be misused—hence we have firstIndex(where:). But we shouldn't add a non-useful API that invites misuse.

edit: I thought of a compelling use case for a random index: slicing a collection into two randomly-sized parts. If other common uses come up it would weigh more in favor of adding the index version despite the risks.

edit 2: thinking about that some more, maybe not so great, as it reveals another hidden gotcha: can the randomIndex include the endIndex or not? Unlike firstIndex(where:) that precludes that possibility, including it might sometimes be what you want. For example, does this code ever return an empty index for the first half, second half, or neither, or both? It's not obvious without reading the docs for randomIndex:

let r = a.randomIndex()!
let fst = a[..<r]
let snd = a[r...]

Joe_Groff · June 21, 2018, 4:01pm

To be clear, neither proposed amendment suggests removing randomElement as API, only as a customization point.

beccadax · June 21, 2018, 4:50pm

This is an interesting alternative which might solve the problem with ranges. On the other hand, in the past I've occasionally needed to implement these random workaround requirements outside the standard library (e.g. _customIndexOfEquatable(_:) when writing sorted collections) and it always felt like a dirty hack.

Ben_Cohen · June 21, 2018, 7:12pm

Yeah, this is a more general problem. There is no good way of distinguishing between things that implementors might need/want to implement, but that users should never see/use, vs requirements that are features for users. We use _ but that's kind of unsatisfactory because we also use that to indicate "hands off, std lib only" features. It would be nice to have a more explicit way to separate the two.

Paul_Cantrell · June 21, 2018, 7:46pm

I’ve often wanted variants of this. If this evolved into a separate proposal at some point, it would have my support.

hooman · June 21, 2018, 11:37pm

Apologies, I forgot to check the proposed randomIndex() signature. I oppose the proposed signature of randomIndex(). It should not return an optional. It should return endIndex iff the collection is empty.

Karl · June 22, 2018, 1:00pm

I'm not sure if this deserves another thread, but I also think this method should be moved to RandomAccessCollection. The protocol is literally named to indicate that it supports random access. I'm not sure why randomIndex() would live anywhere else.

For example: what happens when somebody calls randomIndex() on a String? Calculating the count is non-trivial, and then finally returning the index at some Character offset requires another non-trivial walk along the grapheme clusters. String cannot efficiently return a random Character (that's why it doesn't conform to RAC) -- but, for example, an Array<Character> could because of how its data is stored. As I understand it, identifying those inefficiencies is largely the point of the Collection protocol hierarchy.

Are there any interesting collections which cannot locate indexes at specific offsets in reasonable time (as required by RAC), but can produce a totally random index in good time?

Otherwise, I think randomIndex composes well with the existing hierarchy of Collection requirements:

Collection: index(after:), index(_:, offsetBy:) in O(n)
BidirectionalCollection: index(before:)
RandomAccessCollection: index(_:, offsetBy:) in O(1), allows for randomIndex(using:)