[Pitch] Make Collection Super-Convenient and Retire Sequence

Thank you for explaining. I still think there's a difference between what we desire for implementation reasons and what expectations we wish to set for the application developer. Even if some hypothetical hardware architecture made variance impossible, we'd still want the app developer to write code as if there could be.

Maybe you would. I would not, FWIW.

I think I understand a bit better now. Please let me know if I got it.

A Sequence guarantees nothing with respect to multiple passes, not even that a second pass will be possible. A Collection guarantees both that 1) multiple passes will be possible, and 2) iteration order follows Index order. Thus, if nothing disturbs the indexes (i.e. no mutations), the iteration order must be preserved across successive passes.
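
A quick way to see guarantee (2) in code (a hypothetical helper, purely for illustration, not proposed API):

```swift
// For any Collection, iteration order matches index order, so on an
// unmutated instance two successive for-in passes visit the same elements
// in the same order.
func elementsFollowIndexOrder<C: Collection>(_ c: C) -> Bool where C.Element: Equatable {
    // Walk the elements and the indices in parallel and compare.
    zip(c, c.indices).allSatisfy { pair in pair.0 == c[pair.1] }
}

elementsFollowIndexOrder([10, 20, 30])         // true
elementsFollowIndexOrder(Set(["a", "b", "c"])) // also true: even a Set's iteration
                                               // follows its (arbitrary) index order
```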

If the above is correct, then unorderedElements() is indeed a poor name, because it implies that one of the requirements of Collection does not hold.

Perhaps a better name would be arbitrarilyOrderedElements(), or indexOrderedElements().

It isn't so much across multiple passes as the fact that Sets and Dictionaries which are equal may have different orderings based on implementation details and how they were created. This has caused errors when people expected them to have a defined order that was the same for given content.

The idea with unorderedElements() is that you are able to iterate over the elements, but there is no guarantee at all on the order of elements. To get a guarantee of ordering, you need a sub-protocol like Collection.

But what is that index order for a Set containing the integers 1, 2, & 3? There are 6 possibilities and you may run into any of them for equivalent Sets based on internal implementation details of Set leaking out (details which might change in future releases). We really shouldn't be relying on that ordering in these cases... hence unorderedElements() for Set and Dictionary, and a defined ordering for things that naturally have one.
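
To make that concrete with today's Set (not the hypothetical unorderedElements() API):

```swift
// Two sets that compare equal can still iterate in different orders, because
// iteration order falls out of hashing and insertion history, not the elements
// themselves. With Swift's per-process hash seeding, the order can even differ
// between runs of the same program.
let a: Set = [1, 2, 3]

var b = Set<Int>()
b.reserveCapacity(64)              // different storage layout than `a`
for x in [3, 2, 1] { b.insert(x) }

print(a == b)                      // true: same elements
print(Array(a), Array(b))          // orders may or may not match; neither is guaranteed
```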

I was thinking in terms of Collection, and not your proposed Container protocol. (Lost track of the thread.) indexOrder() indeed makes little sense in that context.

I too would like to see some code.

There are a lot of changes happening in Swift at the moment: we will hopefully have generalized existentials for protocols and also ownership that allows for move semantics. I think both of these would have a large effect on a collections API. Therefore I would suggest not putting any effort into Collections until these two items come to pass.

PS Just to be clear, I am in favour of updating the Collections API; I criticised its design from very early on, but I think existentials and ownership will have a profound effect.

I am supportive of addressing the issues with the collection protocols in general. However, like Howard I also think it might be better to take a step back and revisit them more holistically when other language features are in place.

I have been having some difficulty forming an opinion regarding the relatively narrow proposed changes that resolve some issues while introducing new ones. Allowing valid conformances of Collection to trap when count is accessed seems particularly egregious to me. IMO we should be willing to commit to a design with strong semantics associated with a protocol.

It is ok for a requirement to have preconditions that apply to all valid conformances. For example, requiring that indices provided be valid indices into the current instance is perfectly sensible. On the other hand, allowing individual conformances to have idiosyncratic preconditions (or, worse, simply not supporting a required member) seems awful to me.

It should be possible to write generic algorithms that are valid for every type that declares a valid conformance to the protocols required by the algorithm. It should also be possible to clearly state a semantic requirement that is common in a broad family of algorithms (such as the requirement that it be possible to process every element).

I haven’t had as much time as I would like recently but am continuing to think about this topic. It is very important to get it right if or when we do have the opportunity to make some breaking changes in this area. Perhaps some targeted changes make sense in the near future but perhaps they would simply cause churn without getting us closer to a true solution to the underlying problems.

It would be much easier to evaluate that if we had a clear vision of what the long-term goal is when we have an ownership system, etc. This would include the iteration redesign @dabrahams has mentioned and perhaps some other as-yet-unstated goals. I wonder if a manifesto-style document may be in order here.

Nobody's proposing to do that. Collection.count would always have the precondition “the collection is finite.” The fact that some collection types always satisfy that precondition is perfectly normal.

I think there's a valid concern here, which I'd phrase as, “lots of algorithms that currently have no preconditions will gain a precondition that the collection is finite, and that's unfortunate.” But if you throw 0..<Int.max at most of these algorithms, they'll fall over as well, with the practical result being a trap or an apparent hang, so I'm not very compelled by this concern.
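
To illustrate what such a precondition looks like in practice (the function and its doc comment here are hypothetical, not taken from the pitch):

```swift
/// Counts the elements that satisfy `predicate`.
///
/// - Precondition: `elements` is finite. (Under the pitch, wording like this
///   would appear on many algorithms that currently state no precondition.)
func countMatches<C: Collection>(in elements: C, where predicate: (C.Element) -> Bool) -> Int {
    var n = 0
    for element in elements where predicate(element) { n += 1 }
    return n
}

countMatches(in: [1, 2, 3, 4]) { $0 % 2 == 0 }   // 2

// Technically finite, but indistinguishable in practice from a hang:
// countMatches(in: 0..<Int.max) { $0 < 0 }
```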

I'm not sure generalized existentials will have much effect on a rework of the collection protocol hierarchy, but I do think that ownership will. Being able to express in the type system that a given sequence won't (or will) be consumed by iteration would be very helpful towards producing a useful set of protocols.

Sure. I think the reason I see this one differently is because the entire semantics of this member is to express the precise size of the collection. Including this “requirement” without requiring the collection to be finite in the first place feels like a pretty clear contradiction in the semantics of the protocol to me. count simply makes no sense at all as a requirement of a protocol that allows infinite conformances.

I have been thinking a lot about this issue since you posted your current thoughts last week. IMO it gets at a deeper design issue. I am still thinking about it, but the “container” direction that some have mentioned is one that I feel warrants further consideration.

Here are a few of the things I’ve been thinking about:

  1. Many algorithms require the ability to process every element. Ideally this would be a semantic requirement of a protocol and not a precondition that is repeated all over the place.
  2. Many types can guarantee this capability, most obviously all container types. If every element has been explicitly constructed prior to invoking an algorithm (which is true of all containers), it should be possible to process every element.
  3. There are some “lazy” types such as ranges which have some instances for which processing every element is sensible and other instances for which it is not sensible. These types seem to be at the heart of the design issue we are facing.
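
To make the split in points 1–3 concrete, here is a very rough sketch (the names and the empty marker protocol are mine, purely illustrative, not a proposal):

```swift
/// Hypothetical marker protocol: conforming types promise, as a documented
/// semantic requirement, that every element already exists and can be visited
/// in full. Algorithms constrained to Container need no "must be finite"
/// precondition of their own.
protocol Container: Collection {}

extension Array: Container {}        // every element was explicitly constructed
extension Set: Container {}
extension Dictionary: Container {}

// Range stays a plain Collection: 0..<10 is perfectly processable,
// but 0..<Int.max shows why the guarantee can't be attached to the type.
```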

As I mentioned earlier, I don’t have any strong opinions yet. This is a very tough problem especially given the source compatibility concerns we have to consider. I really appreciate your willingness to take it on despite the difficulty.

Even putting aside the slim difference between an infinite Collection and one that is very large, I would say that a key problem with splitting the hierarchy into FiniteCollection and InfiniteCollection is that the type system will not be able to help you maintain the separation in many cases. So this will require an escape hatch, like .assumingFinite or similar, to assert that the Collection is actually finite and convert between them, or to call methods that you know are reasonable for your particular infinite Collection and given parameters.

Things like count, map, etc. should obviously not be called directly on an infinite Collection, though map can be on a lazy view, but other parts of the API are not so clear cut. Quickly skimming the Collection documentation and Sequence documentation, it seems like it would be ambiguous as to whether the resulting Collection is finite or infinite, or whether the method is reasonable to call (e.g. won't infinite loop) as it depends on the contents of the Collection and the given parameters, in the following cases:

  • subscript(bounds: Range<Self.Index>)
  • index(of:)
  • index(where:)
  • distance(from:to:)
  • split(separator:maxSplits:omittingEmptySubsequences:), which can be safe given judicious use of maxSplits and some knowledge of the contents
  • first(where:)
  • prefix(while:)
  • lexicographicallyPrecedes(_:), theoretically safe if you know the two are not equal, I guess

And I'm not immediately sure about things like prefix(upTo:), suffix(from:), and things like starts(with:) which are safe if only one of the Sequences is infinite, but not if both are. So the situation is kind of messy and it's hard to strike a balance between using the type system to help developers and removing or hiding possibly useful functionality.
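
For what it's worth, prefix(while:) already behaves this way with today's lazy, unbounded sequences; the same ambiguity would carry straight over to a hypothetical InfiniteCollection:

```swift
// The natural numbers as a conceptually infinite sequence.
let naturals = sequence(first: 0, next: { $0 + 1 })

// Terminates: the predicate eventually fails, so only ten elements are visited.
let small = naturals.prefix(while: { $0 < 10 })   // [0, 1, ..., 9]

// Never terminates: the predicate never fails.
// let everything = naturals.prefix(while: { $0 >= 0 })
```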

Yes I agree. Like I said, I’m not sure what the best solution is.

To clear things up, count would never trap in Dave’s proposal; it just potentially wouldn’t terminate. So I think we are searching for a protocol requirement to prevent the halting problem?

FWIW, in Scala 2.12, the Traversable trait has the following relevant customization points:

Size info operations isEmpty, nonEmpty, size, and hasDefiniteSize: Traversable collections can be finite or infinite. An example of an infinite traversable collection is the stream of natural numbers Stream.from(0). The method hasDefiniteSize indicates whether a collection is possibly infinite. If hasDefiniteSize returns true, the collection is certainly finite. If it returns false, the collection has not been fully elaborated yet, so it might be infinite or finite.

The next refinement in the collections hierarchy, trait Iterable, adds operations that work from the end (takeRight, dropRight). And it is the refinement after that, trait Seq, that represents sequences.

A sequence is a kind of iterable that has a length and whose elements have fixed index positions, starting from 0.

Scala is currently in the middle of its third rework of collections for the upcoming version 2.13 and will remove Traversable and go with just Iterable… which seems like a rough correspondence to Dave’s proposal here.
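
For reference, the hasDefiniteSize contract described above, transliterated into a hypothetical Swift shape (the names and protocol are purely illustrative):

```swift
protocol Traversable {
    associatedtype Element

    /// Visits every element once (the only abstract operation in Scala's trait).
    func forEach(_ body: (Element) throws -> Void) rethrows

    /// `true` means the collection is certainly finite; `false` means it has
    /// not been fully elaborated yet, so it might be finite or infinite.
    var hasDefiniteSize: Bool { get }
}
```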

The following are slicing operations, and thus are always “lazy” and work fine with infinite Collections:

Also, in your survey, don't neglect the algorithms that are already defined on Sequence today, many of which will never finish or will trap if the input is infinite, e.g. map(_:), reduce(_:_:), filter(_:), min(), max().

To be clear, I'd love to have a better answer for infinite sequences and collections, but my sense is that no answer exists that is better in practice, especially given the existence of very large collections, of memory constraints on the machine, of time constraints on our users' patience, etc. And even if you live in the idealized world of pure abstractions, where only true infinities are bad, it is a problem that exists today for many Sequence methods, and that we wouldn't be making any worse. So, IMO, the thing to do is improve the parts of the design for which we have a clear path to forward progress, and explicitly not spend energy trying to make improvements that are unlikely to bear fruit.

I don't think the separation between technically finite (but very large) and infinite is really useful. If we decide we need to deal with infinite collections (e.g. using lazy map), then I think we should make the split point based on whether it can be iterated over in a "reasonable" time... where reasonable is defined somewhat hand-wavily.

Thanks for the corrections. These kinds of details weren't that clear from my brief skim of the online documentation.

If it wasn't clear, I was trying to explore the API that is “ambiguous” when used with an infinite Collection, i.e. sometimes the returned Collection will be infinite and sometimes it will be finite; or sometimes it will cause an infinite loop and sometimes it won't, depending on the contents of the collection and the given parameters. I already presumed that all the methods you mention would definitely be removed from the hypothetical InfiniteCollection protocol.

I completely agree with you. I was just trying to explore what maintaining this separation might look like, but it only confirmed to me that this probably isn't worth enforcing, or even possible to enforce, at the type system level.

I think conceptualizing a one-pass sequence as a Stream better represents things like a random number generator or user input. The output can be created but it can't be repeated and it may be infinite.

As you started out discussing, Sequence is most commonly a walk along knowable data, so is multipass. A walk produces iterable values. A knowable data source can have multiple walks, like striding, which is what @soroush and I were discussing in https://forums.swift.org/t/pitch-adding-strideable-sequences. We quickly converged on behavior that unified sequences and collections.

As for streams, I started putting together a very rough concept of Stream last week because I wanted to think through the problem space: stream.swift · GitHub. I slightly disagree with @davedelong about its utility (he writes, "They’re rare enough in practice that I don’t think we need a way to generalize their existence"). I think any real-world input, whether audio from a microphone, feeds from a website, tweets from social media, or, as mentioned above, random numbers and user input, can be well represented as a stream.
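
To give a flavor of what I mean, here is a minimal sketch in the spirit of the gist (not its actual contents; names are illustrative):

```swift
/// A single-pass, possibly infinite source of elements. Elements are consumed
/// as they are produced; there is no way to rewind and repeat them.
protocol Stream {
    associatedtype Element
    mutating func next() -> Element?   // nil means the stream is exhausted
}

/// A random-number generator is a natural Stream: infinite and unrepeatable.
struct RandomInts: Stream {
    var rng = SystemRandomNumberGenerator()
    mutating func next() -> Int? {
        return Int.random(in: Int.min...Int.max, using: &rng)
    }
}
```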

The proposal for Stream looks a lot like an RxSwift Observable. Do we need such a thing in the standard library?

You may be right. You have certainly spent far more time than the rest of us thinking about this.

At a minimum, I think we should make the use of “finite” in the documentation more precise. For example:

a collection's indices form a finite range of the positions of the collection's elements

Given that we allow 0..<Int.max, this sentence carries a connotation in the direction of the mathematical notion of "finite". At the same time, we have algorithm precondition documentation that says "must be finite". This clearly carries a narrower connotation: that it must be possible in practice to process every element. Obviously your proposal to allow infinite collections eliminates the former connotation, leaving us with the latter.