Proposed amendment to SE-202: Replace `Collection.randomElement` requirement with `randomIndex`

I agree that it is not necessary, but I prefer the clarity that it brings. In particular:

  1. I find these massive protocols, like Collection, difficult to to parse. It would be much easier for me to parse a number of small tightly focused protocols.

  2. An extra protocol is a documentation point were the reason for the methods, in this case that the Collection is move only, can be explained.

This sounds good. The extension points could be underscored if you don't want to deal with the public API bloat right away. Are there Sequence implementations with customized min/max that are not also Collections, though?

The API bloat question maybe leads into the more general ongoing discussions about index ergonomics. Doing anything with indexes is hard due to the need to constantly reiterate the collection being worked on. I would say that the arguments for randomIndex apply equally well to any requirement on Collection—a requirement that returns indexes is a more efficient building block for collection manipulations beyond just grabbing an element—but right now you always want the corresponding element-returning API too because it'd be too awkward to force user code to apply the index every time. If there were a higher-order convenience API that allowed you to apply a key path or closure that's Collection -> Index, so you could write c[\.someIndex] as a shorthand for c[c.someIndex] or c.someIndex.map { c[$0] }, that might help.

4 Likes

In discussion with the @Ben_Cohen and the rest of the core team, the question was raised of why randomElement should be a protocol requirement at all. If we remove it, then the immediate need to address the forward-compatibility problem with the Collection protocol goes away. Ben explained that the intent of the customization point was specifically to allow large Ranges with spans larger than Int.max to be able to produce random elements, such as Int.min ..< Int.max. However, such ranges are already problematic Collections because even more fundamental operations like count will also overflow and crash, as would nearly any operation that attempted to span the collection. We knew this was a tradeoff when eliminating the IndexDistance associated type from collections, and it seems strange to special-case this API. At least in first-order code that isn't generically manipulating collections, the more obvious way to get any random integer is with the Int.random(in: .min ..< .max) API. Does anyone strongly object to removing this customization point from the protocol without a replacement? randomIndex may still be a useful API on its own, but it would no longer be a compatibility imperative to add it.

6 Likes

This seems like a nasty trap we’ve accidentally laid for users. Generic algorithms using randomElement() can crash…but only on a ClosedRange of an integer type as large or larger than Int…and only rarely, unless you use Int64 on a 32-bit architecture, UInt64, or some kind of BigInt type. It feels like the kind of behavior that makes users superstitious about a particular API, so they end up cribbing half-baked “defenses” from Stack Overflow that have been mutating for five years and have a 50/50 chance of even stopping what they’re meant to stop.

I don’t love it, but I think we probably have to go with randomIndex(using:). (We probably don’t need a default-Random-using variant; this is a customization point.)

1 Like

To be clear, the trap isn't specific to randomElement, though. Anything that tries to measure distance between indexes greater than Int.max, including count, can trap.

That’s true, but most things you would do with such a distance sound unreasonable anyway—if you’re preallocating a buffer larger than Int.max, or preparing to compare two Collections larger than Int.max, the operation you want to perform is unreasonable and the crash when you try to fetch the count saves you from a much slower, more expensive failure. That isn’t true for randomElement()—it’s eminently reasonable to write, say, func randomElements(_ n: Int) -> Set<Element> in terms of randomElement(), eminently reasonable to use it on a big range, and eminently reasonable to feel very angry that the standard library fails you.

3 Likes