[Pitch] Make Collection Super-Convenient and Retire Sequence

There's one part of your table I don't follow — how do you get better indexing performance by implementing custom indexes? SequenceIndex keeps a reference to its element so you can return that in O(1) time.

In fact, you might be able to remove that entirely, because calling next() on Base.Iterator should give you the same value?

Regarding the proposal as a whole — I'm not sure. It seems a lot like using a sledgehammer to crack a nut, although it does solve a problems. My preference is still towards semantically changing Sequence to be multipass (a la [Pitch] Remove the single-pass requirement on Sequence), purely for the limited effects on existing source, but Dave's pitch also solves the associatedtype weirdness when jumping from Sequence to Collection.

The index that you get if you just write the next() method is pretty heavyweight—each index needs to contain an Int counter, the element at that index, and the full iteration state. If you can write a smaller index type (or even use Int or something!) you should get better performance.

Of course, this is a case where you should really only need to optimize this if it's causing problems. If you just need to create a lightweight sequence type and then iterate it, you wouldn't even be touching any of the indices.

2 Likes

This demonstrates that it's possible. It's a tradeoff between storing the element and calling the next() function twice as often in an advance-index-and-subscript loop. I think storing the element probably wins, but it's just a guess.

Thanks for writing this up @dabrahams!

I have spent quite a bit of time thinking about this since our initial discussion and I am convinced we really need to spend some time considering both the containment aspect of sequence/collection (i.e. they are a collection of values) and the ordering aspect (i.e. those values are ordered in some way... even if the ordering is arbitrary).

I had this all written down at some point, but seemed to misplace that. Please forgive any errors as I am trying to recreate it from memory after several months.

Containment

On the containment side, let's start with something fairly abstract:

//Something which contains (possibly uncountably many) values of some type
protocol Container {
    associatedtype Element
    
    var isEmpty:Bool {get} //Does the container contain any elements

    func randomElement() -> Element? //A random element from the container
    func randomElement(using rng: inout RandomNumberGenerator) -> Element? //A random element using a particular RNG
}

extension Container where Element:Equatable {
    func contains(_ target:Element) -> Bool {...} //Does the container contain the target element?
}

It may seem strange that I started by allowing uncountably many elements, but note that this is also compatible with things like Range, and would give things like Ranges and Collections a common base protocol. At this level of abstraction, all we can do is ask what type of Element the container/collection has, if it is empty, and ask for a random element. If the element is equatable, we can ask if it is contained in the Container. Notice that we can't yet iterate over the things in any order.

Next up:

//A Container which has a countable number of items which can be iterated through in a reasonable time
protocol CountableContainer : Container {
    var count : Int {get}
    func contains(where: (Element)->Bool) -> Bool
    func unorderedElements() -> Iterator<Element>
    //Any other functions enabled by unordered iteration (e.g. map, filter, etc...)
}

Here we gain our count, and the ability to iterate over the container's contents in an arbitrary (undefined) ordering. This could underly CountableRange, but notice that this also fits Set and Dictionary better than Collection does, because they shouldn't guarantee an ordering. CountableContainers should be for...in-able. They should also be equatable if their Element is Equatable.

I would also argue without providing details, that we should provide NonEmptyContainer, and if it isn't possible to combine naturally with CountableContainer, NonEmptyCountableContainer (which would be the equivalent of CountableContainer & NonEmptyContainer w/o the diamond issue). These would underly ClosedRange and CountableClosedRange.

I would also like to see a SingleContainer, which contains a single element.

Ordering

There are a couple different ways we could go for ordering. We could define a protocol which creates/defines an ordering of a particular Element. For simplicity's sake, I am going to instead posit the idea of a "natural ordering", where there is one ordering which is obvious.

protocol Collection : CountableContainer, Ordered {...}
protocol Range : Container, Ordered {...}

That is, Collection is just a CountableContainer with a natural ordering defined somehow within it. Range is a Container with a natural ordering defined on it. With Collection, we get things like first, last, and the ability to index into it with a subscript.

I just looked at the clock and I need to be up in a couple of hours. I can explain better later if needed, but that should give you the gist...

4 Likes

Deprecating Sequence obviously solves all issues that protocol has ;-), so imho it's preferable over what we have now.
But apparently, this step alone doesn't solve all problems, but turns some of them into Collection-issues.

It's also a big change in one very important aspect of Swift, so I say it's crucial to get this right on the first try to avoid a situation similar to what we had with access control.

Therefor, I think it makes sense to split this huge topic into threads:
There are now at least three different designs that have been suggested, and supplementary to separate threads for all of those, imho it would make sense to collect the aspects of the status quo that should be improved.

On top of that, we could also look at existing solutions -- Swift may be different than some other languages, but collections are quite fundamental, so I'm sure we can learn from some of them.

4 Likes

unorderedElements() seems like a lie, because the Iterator is going to iterate over them in some order. Presumably that order would be fixed (because there is no advantage in having it somehow be random for any implementation of any collection I've ever heard of), so you can easily define all the things you save for Collection on it (first, last, indexing, etc). But this is all back in Set/Sequence thread territory where some people are going to be adamant that these are unordered types, when actually the order is just not controllable/specified/whatever. I remain very positive about the pitch in this thread, which actually eliminates some of the hierarchy while retaining the simplicity of Sequence, but negative about building a more complex hierarchy with benefits that are very limited at best, as far as I can tell.

3 Likes

I like the direction this proposal is going, and I agree with your premises. I don't think that Sequence is carrying its weight, and I agree that anything actually single pass that is modeled in terms of Sequence is probably broken in practice.

I think that merging Sequence into Collection is very tantalizing and making a separate Stream protocol makes a lot of sense.

I would love to see this happen, but it is also hard to evaluate the effects and details of this proposal, I think those details can only be understood with a prototype implementation.

-Chris

7 Likes

If you are going to be pedantic, the very act of iteration imposes an order, even if that order is ephemeral. The name unorderedElements() implies no reliable ordering, and that seems good enough. As I understand it, one of the issues with sets and dictionaries is the requirement that order be preserved across iterations, provided no mutations of state. The proposed name implies that this requirement has been dropped, and it implies that the order may be arbitrary.

1 Like

FWIW, that's a misunderstanding. There's no desire to allow consecutive iterations to present different orderings. In fact doing anything that changed the iteration order in subsequent passes would break thread-safety of these value types (or would require that they employ expensive and otherwise-unneeded locks and/or atomics).

HTH,
Dave

3 Likes

Thank you for explaining. I still think there's a difference between what we desire for implementation reasons and what expectations we wish to set for the application developer. Even if some hypothetical hardware architecture made variance impossible, we'd still want the app developer to write code as if there could be.

Maybe you would. I would not, FWIW.

I think I understand a bit better now. Please let me know if I got it.

A Sequence guarantees nothing with respect to multiple passes, not even that a second pass will be possible. A Collection guarantees both that 1) multiple passes will be possible, and 2) iteration order follows Index order. Thus, if nothing disturbs the indexes (i.e. no mutations), the iteration order must be preserved across successive passes.

If the above is correct, than unorderedElements() is indeed a poor name, because it implies that one of the requirements of Collection is not true.

Perhaps a better name would be arbitrarilyOrderedElements(), or indexOrderedElements().

It isn't so much across multiple passes as the fact that Sets and Dictionaries which are equal may have different orderings based on implementation details and how they were created. This has caused errors when people expected them to have a defined order that was the same for given content.

The idea with unorderedElements() is that you are able to iterate over the elements, but there is no guarantee at all on the order of elements. To get a guarantee of ordering, you need a sub-protocol like Collection.

But what is that index order for a Set containing the integers 1, 2, & 3? There are 6 possibilities and you may run into any of them for equivalent Sets based on internal implementation details of Set leaking out (details which might change in future releases). We really shouldn't be relying on that ordering in these cases... hence unorderedElements() for Set and Dictionary, and a defined ordering for things that naturally have one.

I was thinking in terms of Collection, and not your proposed Container protocol. (Lost track of the thread.) indexOrder() indeed makes little sense in that context.

I too would like to see some code.

There are a lot of changes happening in Swift at the moment, we will hopefully have generalized existentials for protocols and also ownership that allows for move semantics. I think both of these would have a large effect on a collections API. Therefore I would suggest not putting any effort into Collections until these two items come to pass.

PS Just to be clear I am in favour of updating the Collections API, from very early on I criticised its design, but I think existentials and ownership will have a profound effect.

1 Like

I am supportive of addressing the issues with the collection protocols in general. However, like Howard I also think it might be better to take a step back and revisit them more holistically when other language features are in place.

I have been having some difficulty forming an opinion regarding the relatively narrow proposed changes that resolve some issues while introducing new ones. Allowing valid conformances of Collection to trap when count is accessed seems particularly egregious to me. IMO we should be willing to commit to a design with strong semantics associated with a protocol.

It is ok for a requirement to have preconditions that apply to all valid conformances. For example, requiring indices provided to be valid indices into the current instance is perfectly sensible. On the other hand, allowing individual conformances to have idiosyncratic oreconditions (or worse simply not support a required member) seems awful to me.

It should be possible to write generic algorithms that are valid for every type that declares a valid conformance to the protocols required by the algorithm. It should also be possible to clearly state a semantic requirement that is common in a broad family of algorithms (such as the requirement that it be possible to process every element).

I haven’t had as much time as I would like recently but am continuing to think about this topic. It is very important to get it right if or when we do have the opportunity to make some breaking changes in this area. Perhaps some targeted changes make sense in the near future but perhaps they would simply cause churn without getting us closer to a true solution to the underlying problems.

It would be much easier to evaluate that if we had a clear vision of what the long-term goal is when we have an ownership system, etc. This would include the iteration redesign @dabrahams has mentioned and perhaps some other as-yet-unstated goals. I wonder if a manifesto-style document may be in order here.

4 Likes

Nobody's proposing to do that. Collection.count would always have the precondition “the collection is finite.” The fact that some collection types always satisfy that precondition is perfectly normal.

I think there's a valid concern here, which I'd phrase as, “lots of algorithms that currently have no preconditions will gain a precondition that the collection is finite, and that's unfortunate.” But if you throw 0..<Int.max at most of these algorithms, they'll fall over as well, with the practical result being a trap or an apparent hang, so I'm not very compelled by this concern.

1 Like

I'm not sure generalized existentials will have much effect on a rework of the collection protocol heirarchy, but I do think that ownership will. Being able to express in the type system that a given sequence won't (or will) be consumed by iteration would be very helpful towards producing a useful set of protocols.

1 Like

Sure. I think the reason I see this one differently is because the entire semenatics of this member is to express the precise size of the collection. Including this “requirement” without requiring the collection to be finite in the first place feels like a pretty clear contradiction in the semantics of the protocol to me. count simply makes no sense at all as a requirement of a protocol that allows infinite conformances.

I have been thinking a lot about this issue since you posted your current thoughts last week. IMO it gets at a deeper design issue. I am still thinking about it, but the “container” direction that some have mentioned is one that I feel warrants further consideration.

Here are a few of the things I’ve been thinking about:

  1. Many algorithms require the ability to process every element. Ideally this would be a semantic requirement of a protocol and not a precondition that is repeated all over the place.
  2. Many types can guarantee this capability. Most obviously all container types. If every element was explicitly constructed prior to invoking an algorithm (which is true of all containers) it should be possible to process every element.
  3. There are some “lazy” types such as ranges which have some instances for which processing every element is sensible and other instances for which it is not sensible. These types seem to be at the heart of the design issue we are facing.

As I mentioned earlier, I don’t have any strong opinions yet. This is a very tough problem especially given the source compatibility concerns we have to consider. I really appreciate your willingness to take it on despite the difficulty.

1 Like