[Pitch] Make Collection Super-Convenient and Retire Sequence

Sorry I wasn’t trying to be annoying but I am concerned that a better collection library design would use generalised existentials extensively. In the ideal world many methods like filter, map, or slice would return AnyCollection<T>, which is only possible after type system improvements.

Therefore we would be making changes now onely to have to go back and revisit collection changes again, so my suggestion of sitting on theses changes is to prevent too much churn in everyones code.

I am assuming that generalised existentials will happen soon though, if they are 12 months away then I would go with changes now.

I mostly agree with this. I don’t think we should wait to START designing it, but we may want to make the changes all at once, which might mean differing the actual implementation until at least the ownership stuff has been defined.

We should definitely start thinking about the design now though, since it is going to take awhile, and my least favorite additions have been the ones where we have rushed…

1 Like

FWIW, although that may look attractive on the surface, if you dig deeper into the implications, including performance, static type safety, and the extent of implied code breakage, I am convinced it will turn out to be wrong. I don’t have anything against speculative discussion of possible redesigns based on language features we don’t yet have, but IMO further discussion of this direction, if people want to pursue it, would be better in a separate thread. :wink:

5 Likes

I’m all for this unification, but please call it Sequence instead of Collection. The whole point of the thing is that it has an order, like all sequences do. The name Collection could then be reused for something more appropriate, maybe something that isn’t iterable at all.

5 Likes

Well, I tried, but at the moment I can’t even get past step 1. Just adding this, which should be benign, causes the standard library to not compile.

Can we just take Sequences' functionality and move it to Range (while also removing Sequence). After all, you can strongly argue philosophically that ranges and sequences are identical. Also, feel like it would work real nicely if we can take out subsequence as a requirement for collection and expand on Ranges capability by adding SubRanges and many other goodies that it should by default have. This would enrich String APIs A LOT. Plus, people would actually start using Ranges then.

1 Like

I am strongly in support of doing something about Sequence, but I’ve arrived at this position in a different way from Dave and I have a different set of concerns about the status quo as well as about practical impact of the proposal presented here.

It is rather long, so you might want some tasty beverage to wash this down:

Well, I found a workaround, so the work is currently unblocked. Until SR-7605 is fixed, we will have the compiler deducing Index = DefaultIndex<Self> for any Collection without an explicit nested Index type. After it is fixed, though, the picture changes slightly: either we make Sequence a refinement of Collection that supplies DefaultIndex<Self> as a default, or Sequence = Collection and we have to migrate existing Sequence instances to include typealias Index = DefaultIndex<Self>.

1 Like

If the behavior of SR-7605 is desirable, would a default associatedtype suffice?

protocol Collection {
    // ...
    associatedtype Index: Comparable = DefaultIndex<Self>
    // ...
}

I like @mpangburn solution, I have had a lot of problems with type inference of associated types and generally use explicit type aliases or defaults to overcome the problems.

FWIW, single pass streams are very common in network programming. Asynchronous stream processing is a large part of reactive and functional reactive programming for example. There's also all sorts of interesting problems with buffering and back-pressure as well (Chris mentioned bounded queue depths in his concurrency manifesto https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782 for example). I would kill for better stream support in Swift. It would be awesome if they could be an extension to the async/await constructs Chris talks about in https://gist.github.com/lattner/429b9070918248274f25b714dcfc7619.

As @anandabits points out though we should be careful to distinguish between pull streams (synchronous) and push streams (asynchronous).

(Sorry for being kind of off topic, I'm reading chronologically.)

2 Likes

Good question. There are plenty of collections that look something like:

struct X : RandomAccessCollection {
  let startIndex = 0
  let endIndex = 3
  subscript(_ x: Int) -> Bool { return x > 2 }
}

and for these to be deduced as having Index = Int the compiler would have to prefer deducing Index from the startIndex/endindex/subscript declarations over deducing it from an explicit default. But the compiler's type deduction rules are…somewhat… underspecified :wink:, so… maybe? I think we'd need to ask @Douglas_Gregor

Update: @nnnnnnnn found a great workaround and we are even able to declare the associated type default without breaking any tests. See https://github.com/apple/swift/pull/17220

5 Likes

What happens to LazySequence in the brave new world sketched in this proposal?

typealias LazySequence = LazyCollection

ersump'n.

3 Likes

I need time to write a full response, but I completely disagree with this premise. Sequence and Collection are separate concepts; neither is a special case of the other, and neither makes the other obsolete; we need both.

The only commonality is that Sequence and Collection need to be processed in for-in statements. The mistake we made here was insisting on One True Way to make something be able to be processed by for-in. So, an alternate solution is for for-in to follow two paths:

  • For both sequences and collections that also conform to Sequence, we do what we currently do.
  • For collections that are not also sequences (e.g. Set and Dictionary after we change the hierarchy), we call indices.makeIterator(). We'll insist that indices is a Sequence, even if the corresponding collection isn't. For these un-sequenced collections, Index only has to be Equatable, not also Comparable. Collections that could have multiple ways to iterate through there elements could provide several make*Iterator functions, but would have to canonize one for for-in.

(The inspiration for this came from remembering one of my fixed-sized array ideas. FSAs wouldn't conform to Sequence, but still would be processable through for-in. That breaks the rules of for-in's One True Way, modifying its rules simply by fiat. When I saw this thread, I wondered if that idea could be applied here.)

5 Likes

Perhaps I need to wait for the full response, but I'm not sure what you're saying here. If there's one canonical way to iterate through the elements of something then in what way is it not a Sequence? And why wouldn't a fixed-size array be a Sequence? I remain very negative about the idea that you should be able to use something in a for-in loop but pretend it isn't a Sequence by introducing a more complex protocol hierarchy. This matter also seems somewhat unrelated to the topic of this thread, in that you could still split up Collection to stop Set and Dictionary from conforming to it, or move conformance to a view/property, in the future proposed in this thread.

2 Likes

I strongly hope this is going on, but one aspect that disturbs me a little bit with Set and Dictionary not being Collections anymore is that there won't be any common type that groups all collections / containers.
Technically, there is no reason there has to be such a supertype, but for someone coming from another language, it might be quite irritating to have collections that aren't Collections...

2 Likes

Another thing that would benefit from having a non-Sequence based Collection-esque super-protocol is bi-(or higher)-axial-collections. That is have a protocol for things which have Elements and can vend Sequence views onto those elements by row, column, etc. Like matrices.

3 Likes

For something that IS-A sequence, the relative order of the elements is part of the whole object’s state. For containers that ARE-NOT-A sequence, there is no relative order conceptually, but we have to pick something practically in order for for-in to work. If a container does have a canonical linear order, then it should probably publicly be a Sequence; my use of “canonical” in the previous post was for implementing, not for modeling.

I’ve been imagining fixed-sized arrays as structural/compound types, so they can’t conform to any protocol. Even if we add protocol support to compound types, I think there were subtleties to how my FSA interface worked that made conforming to Sequence by default a bad idea. One way I gave support for Sequence operations was to have a global Standard Library function to flatten a FSA to a Unsafe(Mutable)BufferPointer in a closure.

Multi-axial containers, like @Dante-Broggi mentioned is his/her response are an example of something that could be iterated over all elements, but not necessarily have a model-level linear order. (They may have an implementation-level linear order.) Besides user-defined types, multi-dimensional FSAs are another example of multi-axial types.