Proposed amendment to SE-202: Replace `Collection.randomElement` requirement with `randomIndex`

Joe_Groff · May 30, 2018, 8:16pm

SE-202, the "Random Unification" proposal, adds the following requirement to the Collection protocol:

protocol Collection {
  ...
  func randomElement<T: RandomNumberGenerator>(using generator: inout T) -> Element?
}

This was added as a requirement because collections with more than Int.max elements (for instance, the range Int.min ... Int.max) would not be able to use a generic implementation. However, this poses a forward-compatibility problem for Collection if we add move-only types in the future, since this requirement cannot be implemented by a collection with move-only elements, since returning the chosen element would require copying it out of the collection. We could make this requirement be consuming, but that's unlikely to be very useful, since picking a single random element out of a collection is unlikely to be the final thing you do with a collection.

There's a more primitive operation we could make a Collection requirement, which instead returns the index of a random element in the collection:

protocol Collection {
  func randomIndex<T: RandomNumberGenerator>(using generator: inout T) -> Index?
}

randomElement can then be implemented as an extension method in terms of this requirement (for copyable Collections):

extension Collection {
  func randomElement<T: RandomNumberGenerator>(using generator: inout T) -> Element? {
    if let index = randomIndex(using: &generator) {
      return self[index]
    } else {
      return nil
    }
  }
}

Aside from future compatibility with move-only types, randomIndex seems like a better primitive operation, since it can used to build other interesting operations like "insert an element at a random index", "split a collection into random sub-collections", etc. Does this seem like a reasonable amendment to the proposal?

hlovatt · May 30, 2018, 8:53pm

Seems like a good suggestion.

Nevin · May 30, 2018, 9:48pm

I’m going to speak for myself, and possibly others, when I say that I do not know what a move-only type is. Or rather, I have looked up the definition—an instance of a move-only type cannot be copied, it can only be moved—but I do not understand the benefit or use. Why would we want move-only types in Swift, and what are they good for?

Joe_Groff · May 30, 2018, 9:54pm

You may want to read @John_McCall's "ownership manifesto": https://github.com/apple/swift/blob/master/docs/OwnershipManifesto.md

griotspeak · May 30, 2018, 11:01pm

more specifically, the section about non-copyable types swift/OwnershipManifesto.md at main · apple/swift · GitHub

jeremyp · May 31, 2018, 9:29am

As a general rule, I don't think hacking things about just because of some future feature that may or may not ever get implemented is a bad idea.

However, in this case, I think the other arguments for having randomIndex are fairly compelling and, in hindsight, this is how it should have been designed in the first place IMO.

DevAndArtist · May 31, 2018, 12:23pm

As far as I'm concerned, the whole stdlib was designed around a lot of missing generic features which may or may not ever be implemented in Swift. Now we have a lot of those and the stdlib become much much simpler. That said, I don't buy this rule, as long as everything is crafted carefully it should be fine.

Joe_Groff · May 31, 2018, 4:46pm

I agree with the general rule. In this case, ownership and move-only types are features we know fairly certainly we want to support in the near future, and this protocol requirement poses a clear obstacle for delivering the feature in the way we'd like.

beccadax · June 1, 2018, 2:24am

Don't other members like first have the same problem?

Should we address this in the moveonly design by allowing a protocol to say "this requirement only applies to non-moveonly conformances"? I've wanted a similar feature for other purposes—particularly, I've wanted to be able to put a func index(of elem: Element) -> Index? where Element: Equatable requirement so we could replace the private _customIndexOfEquatableElement(_:) requirement with something publicly usable.

hlovatt · June 1, 2018, 3:57am

Why not just subdivide the protocol; I dislike these massive protocols that you get stuck with all the requirements? Rather than introduce another where clause item. You can retain backwards compatibility with a 'convenience protocol' that inherits all the 'sub-protocols'.

felix91gr · June 1, 2018, 5:05am

I don't quite understand what you mean. What would a "convenience protocol" like you say look like?

Joe_Groff · June 1, 2018, 2:36pm

I raised the issue with first in another thread. @Ben_Cohen said it's a unnecessary customization point he can remove.

@Douglas_Gregor has expressed concerns about implementing such a feature. All things equal, it'd be better to avoid tying the move-only feature to more prerequisite features if we can avoid it.

hlovatt · June 1, 2018, 11:32pm

protocol MoveOnlyCollection {
    // Methods that are valid for move semantics like `randomIndex`.
}

protocol Collection: MoveOnlyCollection { // Convenience protocol if you want the lot. 
    // Other methods like `randomEment`. 
}

felix91gr · June 1, 2018, 11:44pm

Ah, I think I get it now. The idea would be that, since move-only collections can have some but not all of the functions that normal collections can, you use a small protocol (MoveOnlyCollection) and expand it to give it all the normal-but-not-moveonly-collection's functions (Collection)

That said though, it looks strange to say that Collection : MoveOnlyCollection. I'd prefer to have the smaller one be called CoreCollection, and have a shim-protocol that's named MoveOnlyCollection. That'd also allow for future "only for moveonly collection"-functions, if we'd ever need them.

EDIT: maybe your naming is actually spot-on. We could benefit from having explicit moveonly guarantees for what functions can be shared between the moveonly and non-moveonly collections.

Ben_Cohen · June 2, 2018, 4:54pm

Yeah, I can't see the justification for keeping first (and last) as protocol requirements.

Unfortunately, min and max are another story, given Comparable-ordered collections are common (Indices, ranges, hopefully soon a sorted set or dictionary) so it's definitely good to allow collections that know they're sorted to have an O(n) version of these.

Doing the same trick of switching to returning the index instead of the element presents a couple of challenges:

They're a naming nightmare. minIndex is wrong – it's the index of the minimum element, not the minimum index (which is always startIndex). So you need to go for something waffly like minimumElementIndex.
They return an optional, and handling that is a real pain: c.minIndex.map { c[$0] }. So we'd definitely need to keep the element-returning versions too, and that's an unfortunate amount of redundancy.

That said, I don't see a real problem adding both the index versions on the protocol plus the convenience element versions via extensions that can in future be constrained to only copyable types.

Nevin · June 2, 2018, 8:18pm

I read the entire thing once before, and I just re-read the section on non-copyable types. I’ll trust that you and John and the rest know what you’re talking about, but that document did not convey to me an understanding of what move-only types are for, nor why we should want them.

Ben_Cohen:

Unfortunately, min and max are another story, given Comparable -ordered collections are common ( Indices , ranges, hopefully soon a sorted set or dictionary) so it's definitely good to allow collections that know they're sorted to have an O(n) version of these.

Doing the same trick of switching to returning the index instead of the element presents a couple of challenges:

They're a naming nightmare. minIndex is wrong – it's the index of the minimum element, not the minimum index (which is always startIndex ). So you need to go for something waffly like minimumElementIndex .

They return an optional, and handling that is a real pain: c.minIndex.map { c[$0] } . So we'd definitely need to keep the element-returning versions too, and that's an unfortunate amount of redundancy.

That said, I don't see a real problem adding both the index versions on the protocol plus the convenience element versions via extensions that an in future be constrained to only copyable types.

The naming is easy: indexOfMin and indexOfMax. I’ve written and used these myself for various purposes.

Although, to mesh with the recent renaming of index(where:) to firstIndex(where:), perhaps it ought to be firstIndexOfMin, etc.

xwu · June 2, 2018, 9:48pm

Do we care that it returns specifically the first index? I'd imagine that, for most uses, it's sufficient that it returns an index, and guaranteeing that it's the first one seems like it might be overly restrictive.

I really don't think it's a problem to name that property minElementIndex. It's clear and not overly verbose.

Ben_Cohen · June 2, 2018, 10:13pm

Abbreviations are usually only allowed when they’re terms of art, which min and max are but these index versions aren’t. But if the element versions remain, we can probably abbreviate the index versions for consistency.

I don’t think the first naming applies here. There are many possible equal elements, but only one minimum.

Edit: duh, no, sorry, I guess the same minimum element could appear a second time. But still, it doesn’t seem to have the same importance as the matching first/last symmetry.

Ben_Cohen · June 2, 2018, 10:26pm

It shouldn’t be necessary to introduce a second protocol for the fetching of elements. This could be done via extension where the element type were copyable. A protocol is necessary to add customization points, but here the necesssry customization can be on the index returning versions (i.e. getting the minimum of a sorted collection can be done in constant time).

Nevin · June 3, 2018, 12:08am

Thinking about it more…

min is currently defined on Sequence. If we go through with the revamp to make Sequence just a typealias for Collection with auto-generated indices, then there’s no problem.

But the way things stand today, Sequence.min cannot be implemented in terms of indexOfMin, because sequences don’t have indices.