Proposed amendment to SE-202: Replace `Collection.randomElement` requirement with `randomIndex`

hlovatt · June 3, 2018, 10:17pm

I agree that it is not necessary, but I prefer the clarity that it brings. In particular:

I find these massive protocols, like Collection, difficult to to parse. It would be much easier for me to parse a number of small tightly focused protocols.
An extra protocol is a documentation point were the reason for the methods, in this case that the Collection is move only, can be explained.

Joe_Groff · June 5, 2018, 5:11pm

Ben_Cohen:

Unfortunately, min and max are another story, given Comparable -ordered collections are common ( Indices , ranges, hopefully soon a sorted set or dictionary) so it's definitely good to allow collections that know they're sorted to have an O(n) version of these.

Doing the same trick of switching to returning the index instead of the element presents a couple of challenges:

They're a naming nightmare. minIndex is wrong – it's the index of the minimum element, not the minimum index (which is always startIndex ). So you need to go for something waffly like minimumElementIndex .

They return an optional, and handling that is a real pain: c.minIndex.map { c[$0] } . So we'd definitely need to keep the element-returning versions too, and that's an unfortunate amount of redundancy.

That said, I don't see a real problem adding both the index versions on the protocol plus the convenience element versions via extensions that can in future be constrained to only copyable types.

This sounds good. The extension points could be underscored if you don't want to deal with the public API bloat right away. Are there Sequence implementations with customized min/max that are not also Collections, though?

The API bloat question maybe leads into the more general ongoing discussions about index ergonomics. Doing anything with indexes is hard due to the need to constantly reiterate the collection being worked on. I would say that the arguments for randomIndex apply equally well to any requirement on Collection—a requirement that returns indexes is a more efficient building block for collection manipulations beyond just grabbing an element—but right now you always want the corresponding element-returning API too because it'd be too awkward to force user code to apply the index every time. If there were a higher-order convenience API that allowed you to apply a key path or closure that's Collection -> Index, so you could write c[\.someIndex] as a shorthand for c[c.someIndex] or c.someIndex.map { c[$0] }, that might help.

Joe_Groff · June 15, 2018, 6:18pm

In discussion with the @Ben_Cohen and the rest of the core team, the question was raised of why randomElement should be a protocol requirement at all. If we remove it, then the immediate need to address the forward-compatibility problem with the Collection protocol goes away. Ben explained that the intent of the customization point was specifically to allow large Ranges with spans larger than Int.max to be able to produce random elements, such as Int.min ..< Int.max. However, such ranges are already problematic Collections because even more fundamental operations like count will also overflow and crash, as would nearly any operation that attempted to span the collection. We knew this was a tradeoff when eliminating the IndexDistance associated type from collections, and it seems strange to special-case this API. At least in first-order code that isn't generically manipulating collections, the more obvious way to get any random integer is with the Int.random(in: .min ..< .max) API. Does anyone strongly object to removing this customization point from the protocol without a replacement? randomIndex may still be a useful API on its own, but it would no longer be a compatibility imperative to add it.

beccadax · June 15, 2018, 6:37pm

This seems like a nasty trap we’ve accidentally laid for users. Generic algorithms using randomElement() can crash…but only on a ClosedRange of an integer type as large or larger than Int…and only rarely, unless you use Int64 on a 32-bit architecture, UInt64, or some kind of BigInt type. It feels like the kind of behavior that makes users superstitious about a particular API, so they end up cribbing half-baked “defenses” from Stack Overflow that have been mutating for five years and have a 50/50 chance of even stopping what they’re meant to stop.

I don’t love it, but I think we probably have to go with randomIndex(using:). (We probably don’t need a default-Random-using variant; this is a customization point.)

Joe_Groff · June 15, 2018, 6:40pm

To be clear, the trap isn't specific to randomElement, though. Anything that tries to measure distance between indexes greater than Int.max, including count, can trap.

beccadax · June 15, 2018, 6:47pm

That’s true, but most things you would do with such a distance sound unreasonable anyway—if you’re preallocating a buffer larger than Int.max, or preparing to compare two Collections larger than Int.max, the operation you want to perform is unreasonable and the crash when you try to fetch the count saves you from a much slower, more expensive failure. That isn’t true for randomElement()—it’s eminently reasonable to write, say, func randomElements(_ n: Int) -> Set<Element> in terms of randomElement(), eminently reasonable to use it on a big range, and eminently reasonable to feel very angry that the standard library fails you.