[Pitch] Make Collection Super-Convenient and Retire Sequence

I like this proposal and @bjhomer’s suggestion of a typealias.

Infinite collections have to be processed lazily therefore why not make lazy collections single pass and potentially infinite? Whereas non-lazy collections are by definition finite and multipass. Further lazy collections cannot be used in for loops, that way there is no possibility of nested for loops contravening the single pass requirement.

I just updated the proposal to include @bjhomer's suggestion.

2 Likes

Seeing this as a complete pitch, I have moved from agnostic to positive about making this change to Sequence. In particular, I agree that most Sequences that are currently single-pass are accidentally so, because of the relative ease of conforming to Sequence vs Collection, and providing an equally simple way to define a Collection is a very nice solution to this issue.

I got a couple of private questions about this pitch this morning, which I'll answer here:

  1. Q: If a collection can be infinite, what is count going to do? A: it can loop forever, just like an implementation of max() might for an infinite Sequence of Floats today, or it can trap, like a recursive implementation of the same method might.
  2. Q: What will happen to the sequence() functions? A: they would automatically continue to work, but would now create a conforming Collection. As with everything that newly gains a Collection conformance, it would be up to the user to ensure multipass behavior (usually by inspection but at worst by using something like GeneratorCollection).

I think I'd describe my current opinion of this plan as "cautiously optimistic." It's really interesting that the source compatibility issues are actually lessened by just aliasing Sequence than the inversion you originally described. For example, in cases where someone has extended Sequence, those extensions should still work for any types that they did before.

One way of thinking about this change is that it's adding a new way to build a custom Collection type — before, you had to provide, at minimum, lower and upper index bounds and an (Index) -> Element subscript, now you'll be able to just provide a () -> Element? method and be done with it. Even though there won't be a separate protocol, this fits well with the existing Collection hierarchy in that if you need a bit more performance you can often just do a bit more work in your implementation. In a way we'd still have the same hierarchy:

protocol benefits considerations
Collection (via next() [née Sequence]) easy to implement forward traversal only, unoptimized index type
Collection (via indexes) better indexing performance forward traversal only, custom index types can be more work
BidirectionalCollection backward traversal need to add index(before:)
RandomAccessCollection better overall performance need to handle offsets and distance
11 Likes

There's one part of your table I don't follow — how do you get better indexing performance by implementing custom indexes? SequenceIndex keeps a reference to its element so you can return that in O(1) time.

In fact, you might be able to remove that entirely, because calling next() on Base.Iterator should give you the same value?

Regarding the proposal as a whole — I'm not sure. It seems a lot like using a sledgehammer to crack a nut, although it does solve a problems. My preference is still towards semantically changing Sequence to be multipass (a la [Pitch] Remove the single-pass requirement on Sequence), purely for the limited effects on existing source, but Dave's pitch also solves the associatedtype weirdness when jumping from Sequence to Collection.

The index that you get if you just write the next() method is pretty heavyweight—each index needs to contain an Int counter, the element at that index, and the full iteration state. If you can write a smaller index type (or even use Int or something!) you should get better performance.

Of course, this is a case where you should really only need to optimize this if it's causing problems. If you just need to create a lightweight sequence type and then iterate it, you wouldn't even be touching any of the indices.

2 Likes

This demonstrates that it's possible. It's a tradeoff between storing the element and calling the next() function twice as often in an advance-index-and-subscript loop. I think storing the element probably wins, but it's just a guess.

Thanks for writing this up @dabrahams!

I have spent quite a bit of time thinking about this since our initial discussion and I am convinced we really need to spend some time considering both the containment aspect of sequence/collection (i.e. they are a collection of values) and the ordering aspect (i.e. those values are ordered in some way... even if the ordering is arbitrary).

I had this all written down at some point, but seemed to misplace that. Please forgive any errors as I am trying to recreate it from memory after several months.

Containment

On the containment side, let's start with something fairly abstract:

//Something which contains (possibly uncountably many) values of some type
protocol Container {
    associatedtype Element
    
    var isEmpty:Bool {get} //Does the container contain any elements

    func randomElement() -> Element? //A random element from the container
    func randomElement(using rng: inout RandomNumberGenerator) -> Element? //A random element using a particular RNG
}

extension Container where Element:Equatable {
    func contains(_ target:Element) -> Bool {...} //Does the container contain the target element?
}

It may seem strange that I started by allowing uncountably many elements, but note that this is also compatible with things like Range, and would give things like Ranges and Collections a common base protocol. At this level of abstraction, all we can do is ask what type of Element the container/collection has, if it is empty, and ask for a random element. If the element is equatable, we can ask if it is contained in the Container. Notice that we can't yet iterate over the things in any order.

Next up:

//A Container which has a countable number of items which can be iterated through in a reasonable time
protocol CountableContainer : Container {
    var count : Int {get}
    func contains(where: (Element)->Bool) -> Bool
    func unorderedElements() -> Iterator<Element>
    //Any other functions enabled by unordered iteration (e.g. map, filter, etc...)
}

Here we gain our count, and the ability to iterate over the container's contents in an arbitrary (undefined) ordering. This could underly CountableRange, but notice that this also fits Set and Dictionary better than Collection does, because they shouldn't guarantee an ordering. CountableContainers should be for...in-able. They should also be equatable if their Element is Equatable.

I would also argue without providing details, that we should provide NonEmptyContainer, and if it isn't possible to combine naturally with CountableContainer, NonEmptyCountableContainer (which would be the equivalent of CountableContainer & NonEmptyContainer w/o the diamond issue). These would underly ClosedRange and CountableClosedRange.

I would also like to see a SingleContainer, which contains a single element.

Ordering

There are a couple different ways we could go for ordering. We could define a protocol which creates/defines an ordering of a particular Element. For simplicity's sake, I am going to instead posit the idea of a "natural ordering", where there is one ordering which is obvious.

protocol Collection : CountableContainer, Ordered {...}
protocol Range : Container, Ordered {...}

That is, Collection is just a CountableContainer with a natural ordering defined somehow within it. Range is a Container with a natural ordering defined on it. With Collection, we get things like first, last, and the ability to index into it with a subscript.

I just looked at the clock and I need to be up in a couple of hours. I can explain better later if needed, but that should give you the gist...

4 Likes

Deprecating Sequence obviously solves all issues that protocol has ;-), so imho it's preferable over what we have now.
But apparently, this step alone doesn't solve all problems, but turns some of them into Collection-issues.

It's also a big change in one very important aspect of Swift, so I say it's crucial to get this right on the first try to avoid a situation similar to what we had with access control.

Therefor, I think it makes sense to split this huge topic into threads:
There are now at least three different designs that have been suggested, and supplementary to separate threads for all of those, imho it would make sense to collect the aspects of the status quo that should be improved.

On top of that, we could also look at existing solutions -- Swift may be different than some other languages, but collections are quite fundamental, so I'm sure we can learn from some of them.

4 Likes

unorderedElements() seems like a lie, because the Iterator is going to iterate over them in some order. Presumably that order would be fixed (because there is no advantage in having it somehow be random for any implementation of any collection I've ever heard of), so you can easily define all the things you save for Collection on it (first, last, indexing, etc). But this is all back in Set/Sequence thread territory where some people are going to be adamant that these are unordered types, when actually the order is just not controllable/specified/whatever. I remain very positive about the pitch in this thread, which actually eliminates some of the hierarchy while retaining the simplicity of Sequence, but negative about building a more complex hierarchy with benefits that are very limited at best, as far as I can tell.

3 Likes

I like the direction this proposal is going, and I agree with your premises. I don't think that Sequence is carrying its weight, and I agree that anything actually single pass that is modeled in terms of Sequence is probably broken in practice.

I think that merging Sequence into Collection is very tantalizing and making a separate Stream protocol makes a lot of sense.

I would love to see this happen, but it is also hard to evaluate the effects and details of this proposal, I think those details can only be understood with a prototype implementation.

-Chris

7 Likes

If you are going to be pedantic, the very act of iteration imposes an order, even if that order is ephemeral. The name unorderedElements() implies no reliable ordering, and that seems good enough. As I understand it, one of the issues with sets and dictionaries is the requirement that order be preserved across iterations, provided no mutations of state. The proposed name implies that this requirement has been dropped, and it implies that the order may be arbitrary.

1 Like

FWIW, that's a misunderstanding. There's no desire to allow consecutive iterations to present different orderings. In fact doing anything that changed the iteration order in subsequent passes would break thread-safety of these value types (or would require that they employ expensive and otherwise-unneeded locks and/or atomics).

HTH,
Dave

3 Likes

Thank you for explaining. I still think there's a difference between what we desire for implementation reasons and what expectations we wish to set for the application developer. Even if some hypothetical hardware architecture made variance impossible, we'd still want the app developer to write code as if there could be.

Maybe you would. I would not, FWIW.

I think I understand a bit better now. Please let me know if I got it.

A Sequence guarantees nothing with respect to multiple passes, not even that a second pass will be possible. A Collection guarantees both that 1) multiple passes will be possible, and 2) iteration order follows Index order. Thus, if nothing disturbs the indexes (i.e. no mutations), the iteration order must be preserved across successive passes.

If the above is correct, than unorderedElements() is indeed a poor name, because it implies that one of the requirements of Collection is not true.

Perhaps a better name would be arbitrarilyOrderedElements(), or indexOrderedElements().

It isn't so much across multiple passes as the fact that Sets and Dictionaries which are equal may have different orderings based on implementation details and how they were created. This has caused errors when people expected them to have a defined order that was the same for given content.

The idea with unorderedElements() is that you are able to iterate over the elements, but there is no guarantee at all on the order of elements. To get a guarantee of ordering, you need a sub-protocol like Collection.

But what is that index order for a Set containing the integers 1, 2, & 3? There are 6 possibilities and you may run into any of them for equivalent Sets based on internal implementation details of Set leaking out (details which might change in future releases). We really shouldn't be relying on that ordering in these cases... hence unorderedElements() for Set and Dictionary, and a defined ordering for things that naturally have one.

I was thinking in terms of Collection, and not your proposed Container protocol. (Lost track of the thread.) indexOrder() indeed makes little sense in that context.

I too would like to see some code.