[Pitch] Remove destructive consumption from Sequence

Swift is a language that embraces value semantics. Many common
iterators *can* be implemented with value semantics. Just because we
can’t implement *all* iterators with value semantics doesn’t mean we
should require them to have reference semantics. It just means you
can’t *assume* value semantics when working with iterators in generic
code unless / until we have a way to specify a value semantics
constraint. That’s not necessarily a bad thing, especially when it
leaves the door open to interesting future possibilities.

-Matthew

I'm kind of undecided about this personally. I think one of the
problems with Swift is that the only indication that you have a
reference type is that you can declare it as a constant yet still
call mutating methods upon it; that isn't a very reliable way of
identifying one, however. This may be more of a GUI/IDE issue though,
in that something being a class isn't always obvious at a glance.

I wonder, could we somehow force iterators stored in variables to be
passed via inout? This would make it pretty clear that you're using
the same iterator and not a copy in all cases, encouraging you to
obtain another if you really do need to perform multiple passes.
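For illustration, something along those lines can already be expressed
as a convention today: algorithms that consume an iterator take it
inout, so the `&` at the call site makes the consumption visible (a
minimal sketch, helper name hypothetical):

    func firstTwo<I: IteratorProtocol>(_ it: inout I) -> (I.Element?, I.Element?) {
        return (it.next(), it.next())
    }

    var it = [1, 2, 3].makeIterator()
    let (a, b) = firstTwo(&it)   // Optional(1), Optional(2)
    _ = it.next()                // the same iterator resumes at 3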

I'm going to push single-pass iteration on the stack briefly and talk
about the topic that's been under discussion here: infinite multipass
sequences.

## Fitting “Infinite Multipass” Into the Model

It remains to be decided whether it's worth doing, but if it's to
happen

I definitely think it’s worth doing.

Opinions are nice, but rationales are better. How will we understand
*why* it's worth doing?

I agree.

The rationale has been discussed quite a bit already in this thread.
The current protocols do not provide the semantics many people are
assuming in their code, leading to a lot of code that is incorrect
despite the fact that it usually works in practice.

This is especially common in the case of the finiteness assumption.
That assumption is so pervasive that it seems very wise to me to
encode it as a semantic requirement in a protocol.
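For example (a hypothetical extension, just to illustrate the trap),
this compiles for any Sequence but never returns when the sequence is
infinite:

    extension Sequence where Element: Equatable {
        // Counts occurrences by walking the whole sequence.
        func occurrenceCount(of target: Element) -> Int {
            var n = 0
            for x in self where x == target { n += 1 }
            return n
        }
    }

    [1, 2, 1].occurrenceCount(of: 1)   // 2
    // sequence(first: 0) { $0 &+ 1 }.occurrenceCount(of: 5) loops forever.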

IMO these are problems worth addressing, especially now that we have
a good handle on what a solution would look like.

I really appreciate the attention that the library team has given to
this.

, the standard library team thinks the right design is roughly
this:

/// A multipass sequence that may be infinite.
protocol Collection {

    // Only eager algorithms that can terminate are available here.
    func index(where predicate: (Element) -> Bool) -> Index

    // All lazy algorithms are available here.
    var lazy: ...

    var startIndex: Index
    var endIndex: Index // possibly not reachable from startIndex

    associatedtype SubSequence : Collection
    // Do we need an associated FiniteSubsequence, e.g. for prefixes?
}

protocol FiniteCollection : Collection {

    // All eager algorithms are available here.
    func map(...) -> ...
    var count: ...
}

protocol BidirectionalCollection : Collection { ... }

protocol RandomAccessCollection : BidirectionalCollection { … }

Does this design entirely break with the relationship between
collections and iterators (dropping `makeIterator` as a protocol
requirement)? If so, would for..in (over collections) be built on top
of indices and use `formIndex(after:)`? Would it require a finite
collection (unless we add `until` to the language and then allow
`for..in..until` to work with infinite collections)?

All of these points are up for discussion.

Cool. I think the collection for..in has some nice advantages, but
since it sounds like we’ll probably keep Iterator around it might be
best to take the approach of making them both work.

You already know that I would prefer to see the current for..in built
on finite sequences and allow for..in..until to be used with infinite
sequences if we add that in the future. :)

John McCall pointed out to
me that an index-based for..in would make it possible to implement

for inout x in y { mutate(&x) }

That would be very nice!

I think it might also increase performance. I don’t know exactly how
for..in is implemented today, but the implementation of
IndexingIterator compares position to endIndex. If for..in is also
checking the optional for nil, that’s an extra comparison. With an
index-based for..in we shouldn’t need to construct the optional in the
first place. Maybe optimizations like this already exist? But even if
they do, it seems like they wouldn’t be possible in some cases where
the type of the sequence isn’t statically known.
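Roughly, the difference between the two lowerings looks like this (a
sketch; the real desugaring lives in the compiler):

    let xs = [10, 20, 30]

    // Iterator-based, as today: each step builds an Optional and tests it.
    var it = xs.makeIterator()
    while let x = it.next() { print(x) }

    // Index-based: compares indices directly and never constructs the
    // Optional; with a MutableCollection it could also mutate in place.
    var i = xs.startIndex
    while i != xs.endIndex {
        print(xs[i])
        xs.formIndex(after: &i)
    }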

Would we still retain `IndexingIterator` even if we break the
relationship in the protocol requirements?

Yes: it should be possible to implement Collection algorithms in terms
of Iterator algorithms, and IndexingIterator provides the means. That
said, I think the makeIterator requirement does little harm, especially
when it can be defaulted for Collections.
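For reference, a simplified sketch of that bridge (the real
IndexingIterator lives in the standard library):

    struct SimpleIndexingIterator<Elements: Collection>: IteratorProtocol {
        let elements: Elements
        var position: Elements.Index

        init(_ elements: Elements) {
            self.elements = elements
            position = elements.startIndex
        }

        // Traversal is expressed entirely in terms of indices.
        mutating func next() -> Elements.Element? {
            guard position != elements.endIndex else { return nil }
            defer { elements.formIndex(after: &position) }
            return elements[position]
        }
    }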

I like this answer.

Would it still be possible to do things like zip a multi-pass sequence
with a single-pass sequence (assuming we keep single-pass sequences or
add them back eventually)? This seems like a use case worth
supporting in some way.

Yes. If you can create an Iterator from a Collection, and you can zip
Iterators, you can do this.
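A sketch of how that composes (ZipIterator is hypothetical; the
stdlib's zip works on Sequences):

    struct ZipIterator<A: IteratorProtocol, B: IteratorProtocol>: IteratorProtocol {
        var a: A, b: B
        mutating func next() -> (A.Element, B.Element)? {
            guard let x = a.next(), let y = b.next() else { return nil }
            return (x, y)
        }
    }

    // The multipass collection contributes its iterator; the
    // single-pass source is used directly.
    var z = ZipIterator(a: [1, 2, 3].makeIterator(),
                        b: AnyIterator { Int.random(in: 0..<10) })
    while let pair = z.next() { print(pair) }   // three pairs, then nil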

Yes, of course. I’m glad we would keep this relationship intact.

One subtle change I think this implies is that things like
`LazyFilterSequence` can implement `makeIterator` with constant
complexity, deferring the O(N) complexity to the first call to `next`.

I don't believe that's a difference, though I could be wrong.

You’re right, I was wrong. `LazyFilterSequence` just constructs an
iterator and returns it. `LazyFilterCollection` has to loop until it
finds the first item matching the predicate in its `startIndex`
implementation. The part I was missing is that `IndexingIterator`
gets the `startIndex` in its initializer.

`startIndex` for `LazyFilterCollection` currently has O(N) complexity.
The complexity of a complete iteration doesn’t change and probably
isn’t a big deal, but it’s worth noting.
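The cost is easy to see in a standalone sketch (hypothetical helper):
finding the filtered view's first position requires scanning the base
up front.

    func firstMatchingIndex<C: Collection>(
        in base: C, where predicate: (C.Element) -> Bool
    ) -> C.Index {
        var i = base.startIndex
        while i != base.endIndex, !predicate(base[i]) {
            base.formIndex(after: &i)   // O(N) in the worst case
        }
        return i
    }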

Filtered collection views always require a bit of hand-waving around
performance guarantees; I don't think that changes.

I’ve been looking at some code that wraps a sequence and considering
how it would be impacted. With iterators it looks like this:

guard let element = base.next()
else { return nil }

With collections and indices it would look something like this:

base.formIndex(after: &index)
guard index != base.endIndex
else { return endIndex }

let element = base[index]

That’s not too bad but it is more verbose.

Sequence today is a single-pass thing. If you are wrapping Sequence
today presumably you'd wrap an Iterator tomorrow, and you wouldn't have
to deal with indices.

If we’re going to push people towards collections and indices we
should try to make common patterns like “update the iteration state
and return the next element if there is one” simpler.

That's IndexingIterator.

Cool, I wrote this thinking that was going away.

This could be accomplished with an extension method along these lines:

guard let element = base.formIndex(after: &index, .returningOptionalElement)
else { return endIndex }

With an implementation something like:

enum FormIndexResult {
    case returningOptionalElement
}

extension Collection {
    /// Advances `i` and returns the element there, or `nil` once `i`
    /// reaches `endIndex`.
    func formIndex(after i: inout Self.Index,
                   _ result: FormIndexResult) -> Self.Element? {
        formIndex(after: &i)
        return i == endIndex ? nil : self[i]
    }
}

This would provide similar functionality to `IndexingIterator` without
coupling the storage of `elements` and `position` (which is important
if you’re wrapping a collection and need to wrap the collection and
its indices independently).

I'm afraid I don't understand. Could you be more explicit about what
you have in mind?

The idea was to provide functionality similar to `IndexingIterator` in
the sense that the following code would provide equivalent functionality
to `iterator.next()`, but expressed in terms of a collection and an index:

let optionalElement = myCollection.formIndex(after: &myIndex, .returningOptionalElement)

vs

let optionalElement = myIterator.next()

The single case enum is just there to provide a label that
differentiates the overload.

If we’re keeping IndexingIterator this probably isn’t necessary. I
still have a use case for it but it is rather obscure.

I can imagine wanting a design like the above for cases where
implementing the endIndex requires adding an extra bit of state, e.g. in

struct LazyPrefix<Base: Collection> : Collection {
  init(_ base: Base, where predicate: (Base.Element) -> Bool)
  ...
}

you don't want to traverse the base collection eagerly just to come up
with an endIndex, so you store an optional Base.Index in
LazyPrefix.Index, which is nil for the endIndex. In these cases, index
comparison is less efficient than it might otherwise be.
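Concretely, the extra state might look like this (names hypothetical;
Comparable elided for brevity):

    struct LazyPrefixIndex<Base: Collection>: Equatable {
        var base: Base.Index?   // nil stands in for endIndex

        static func == (l: LazyPrefixIndex, r: LazyPrefixIndex) -> Bool {
            return l.base == r.base   // Optional comparison: the extra cost
        }
    }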

But my answer for these cases is simple: use a specialized Iterator
that will be more efficient than IndexingIterator. Is there a reason
that doesn't work for your case?

I would index a custom iterator in my case,

I don't know what it means to index an iterator.

Wow, I have no idea how I ended up writing that. I must have gotten
distracted somehow. I meant that a custom iterator wouldn't be
relevant in my case.

but I am talking about code that would live inside the implementation
of `index(after:)` in the `Collection` conformance of a wrapper
`Collection` that bears some resemblance to `LazyFlattenCollection`.
In this case you are receiving your `Index` which wraps `Base.Index`.

I don't understand, but maybe it doesn't matter.

Sorry, maybe I would need to provide more context. But it's not
important; as I said, it's a relatively obscure case.

FWIW, I'm pretty
confident we could write a specialized Collection protocol that, given
some equatable State and a next() method, supplied all the basic
Collection requirements, for the cases where the Iterator is the most
efficient means of traversal, e.g.

     protocol IteratingCollection : Collection {
       associatedtype IterationState : Equatable
       func next(state: inout IterationState) -> Element?
       func startState() -> IterationState
     }

     extension IteratingCollection {
       // all the collection requirements implemented in terms
       // of startState() and next(state:)
     }
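As a standalone toy (protocol name hypothetical), the idea is that an
equatable state plus next() is enough to drive a traversal:

    protocol StateIterable {
        associatedtype IterationState: Equatable
        associatedtype Element
        func startState() -> IterationState
        func next(state: inout IterationState) -> Element?
    }

    // An infinite Fibonacci source described purely by state + next().
    struct Fibonacci: StateIterable {
        struct IterationState: Equatable { var a = 0, b = 1 }
        func startState() -> IterationState { IterationState() }
        func next(state: inout IterationState) -> Int? {
            defer { state = IterationState(a: state.b, b: state.a + state.b) }
            return state.a
        }
    }

    let fib = Fibonacci()
    var s = fib.startState()
    for _ in 0..<6 { print(fib.next(state: &s)!) }   // 0 1 1 2 3 5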

This would be cool. It could be the basis of a method taking a start state and a closure (I think you mentioned a method like this earlier).

Like I said, this is a pretty obscure case and I would never suggest
including it on the grounds of a use case that is only relevant to
`index(after:)` implementations in wrapper collections. :) I brought
it up because I thought you might have been suggesting a more drastic
change that removed the collection iterators.

On second thought, I believe it is important to have a way to support
existing “partially formed” multipass sequences that don't expose
copying or equality for their iteration states.

Can you provide examples of these? I’m having difficulty thinking of
one.

NSSet is an example.
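(NSSet is multipass in practice, but its iteration state exposes
neither copying nor equality, so it can't readily vend indices:

    import Foundation

    let set: NSSet = [1, 2, 3]
    var it = set.makeIterator()   // NSFastEnumerationIterator: not
    while let x = it.next() {     // Equatable, no copy semantics
        print(x)
    }
)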

Iterator is the right way to do that. So I think we need to keep
Iterator around.

I don’t have any objection to keeping it either. :) Hopefully we’d
still be able to improve the design in the future if / when new
enabling language features come along.

In the meantime, people would be able to implement their own protocol
for single pass sequences. What they would lose is for..in as well as
the standard library algorithms. I’m not sure how many people this
would impact or how big the impact would be for them. We have seen a
couple of examples in this discussion, but probably not enough to
asses the overall impact.

One thing you don’t mention here is a distinction between finite and
infinite single-pass sequences (iterators). I don’t know if the
finite / infinite distinction is as important here, but wanted to
point it out. Obviously if we remove support for single-pass sequences
now we could defer that discussion until we’re ready to bring back
support for them.

There are a few possible answers I can think of:

1. Do the “obvious” thing and create a separate protocol for finite
single-pass sequences

2. Decide that the combination of infinite and single-pass is rare
enough (/dev/urandom, temperature sensor) that it's better to just
ask people handling them to be careful and not, e.g., try to “count”
them.

3. Decide that everything on a single-pass sequence is lazy. Since you
can only take a single pass anyway, people won't observe their
closures being called more often than necessary, which was the main
motivator for making map, filter, et al. eager on collections without
an explicit .lazy.

Implications of #3:

* Any “partially-formed” multipass sequences (modeling only Iterator)
would be free to expose an accurate underestimatedCount, thereby
optimizing the process of copying into an array. The lazy filter
Iterator adaptor would have an underestimatedCount of 0.

* All algorithms that require multiple passes, such as sorted(), would
be unavailable on Iterator. You'd have to construct an Array (or
other MutableCollection) and sort that in-place. Of course,
constructing an Array from an Iterator could still go on forever if
the Iterator turned out to be infinite, which means, at some level #3
is just a refinement of #2 that makes it less error-prone.

Do you lean towards any of these?

Yes, #3.

We can always make the few operations that have to be eager—such as
Array construction from an Iterator—explicit with a label or something:

    Array(claimingFiniteness: someIterator)
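A sketch of what that initializer could look like (hypothetical, not
an actual stdlib API):

    extension Array {
        init<I: IteratorProtocol>(claimingFiniteness iterator: I)
            where I.Element == Element
        {
            self.init()
            var it = iterator
            // Eagerly drains the iterator; never returns if it's
            // infinite, which is exactly the claim the label asks
            // the caller to make.
            while let x = it.next() { append(x) }
        }
    }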

This makes sense. Finite single-pass iterators can always be added in
the future if compelling use cases emerge. We’re not taking anything
away.

All of the code I have looked at that makes a finite assumption would
be converted to require `Collection` in the new model.

I think you mean FiniteCollection, yes?

Yes, sorry.


Trying my hardest to summarize the talking points of the conversation so far. Please let me know if I’ve missed any particular points:

- The thread was started (by yours truly) as an offshoot of another on-list conversation involving Dave Abrahams. In it, I voiced a concern that there were not enough legitimate needs for a single-pass sequence as a core type (supporting concepts such as for..in) to warrant the possibility of single-pass behavior in Sequence when writing dependent algorithms.
  - There was a concern about the possible mutability of the methods causing complexities around standard naming, in particular the recurring effort to rename map/filter/reduce ‘terms of art’.
  - For example in the case of filter vs filtered, the immutably-named version might still destroy the ability to access the original data

- Dave Abrahams’s primary concern about the existing system is that the restriction that Sequence is single-pass is not intuitive, and that generalized functions over sequences would be tested with the commonly-available Sequences, all of which are multi-pass.
  - Since single- vs. multi-pass can affect the algorithmic complexity and memory requirements of their code, this could be a considerably difficult bug to solve later on.
  - There is also a desire to simplify things by reducing the # of types if at all possible, and likewise a distaste for increasing the number of types or overall complexity
  - One example here would be that if single-pass sequences are rare, it may be possible to drop Sequence completely and just have Collection

- Single-pass sequences seem to be very rare and centered around I/O. One interesting example given was a database row cursor.

- A concern was voiced (by yours truly) that Sequences, unlike Collections, can today be:
  - Single pass/destructive
  - Infinite
  - Non-indexed
and that the elimination of single pass behavior would not eliminate the other differentiating factors

It was countered that
  - Infinite sequences may be representable with Collection
  - It is possible to have the state in a programmatic generator be encoded in or as the Index in a collection, so even programmatic sequences can support indexes

- A single-pass sequence is likely best represented by not implementing Sequence, but rather by having an Iterator implementation directly.
  - Such an Iterator should be a reference type rather than one presented as having value semantics - if you actually support value semantics, then your state is self-contained and thus you can support multi-pass.
  - IteratorProtocol might gain several of the sequence methods, such as dropFirst() and prefix().
  - Scala’s name for a single-pass sequence is TraversableOnce, and such descriptive naming would be useful in developer understanding on how to use single-pass sequences
  - There was a counterpoint that the current model might be appropriate, and that the changes may be underestimating developers’ ability to deal with the complexity of single- and multi-pass sequences represented by the same type.

- It is a possibility that for..in would work with two distinct protocols (such as IteratorProtocol and Sequence) , rather than having single pass iteration be differentiated by where you are in an inheritance chain
  - There was also a concern that if we have a single root type that is for..in-able, we would [just be shuffling things around], e.g. calling Sequence multi-pass, and creating a new SinglePassSequence above it.

- While single-vs-multipass is a distinction which needs to be made, there is some disagreement over whether it makes sense to differentiate infinite vs finite sequences. A collection with UInt64.max elements is finite but is still effectively infinite in terms of computation time. E.g., algorithms might fail due to memory consumption or performance for a 2 billion entry collection the same as they would fail for an infinite one.

- There is a desire to eliminate Sequence if possible, and just have Collection. However, Collection has the requirement of returning a finite count and is Indexable.
  - Indexable requires an Index which can be used to look up a collection value as an O(1) operation
  - Indexable's indices are Comparable/Equatable, although there is a possibility that Comparable could be dropped to simplify implementation of Collections
  - Indexable requires an end index, which for an infinite Collection would be a synthetic value
  - Indexable includes a distance between two indexes, thus having similar problems to Collection.count

- Around infinite sequences, there has been some discussion on whether it is worth differentiating them via different types to provide developer warning that they may be working with an infinite loop. There was discussion of a for..in..until syntax for this (I presume because for..in..while would be harder grammar-wise?). The justification is that requiring until requires the developer to reason about the infinite sequence.

- For computed sequences, it would be possible to use an iterator directly as the index, as the index is not required to be an Integer type to meet Collection requirements.

- There was a discussion of having an Iterator and a FiniteIterator. It was also mentioned that Iterator could be func next() -> T while FiniteIterator is func next() -> T?. There was also discussion that there might be a PossiblyInfiniteIterator with Finite and Infinite iterators as refinements.

- There was discussion about forcing value-semantic iterators to be Collections. This could be done by allowing for infinite Collections, using something similar to a value-semantic iterator as the Index, while reducing the requirements for Index to support such usage (e.g. drop Comparable)

- There was a strawman to move eager operations (map, etc) and count to a FiniteCollection sub-protocol of Collection. (note: it was not clarified what other protocols Collection maintains, like Indexable, where Indexable still has requirements for functions like distance() )

- "AFAIK there is no interesting multipass Sequence that cannot reasonably be made to support indexing.” -Dave Abrahams

- Discussion steered toward using sequence(first:next:) and sequence(state:next:) to create collections, and using this as a replacement for iterator/sequence usage

- If Iterator and Sequence are no longer part of multi-pass sequences, one could instead support for..in using formIndex and subscripting.
  - This would support subscript setters, and thus mutation of values in a for..in

- Dave Abrahams indicated a desire to support “partially formed” multipass sequences which cannot easily meet Indexable requirements, via the existing iterator system. An example given was collections imported from Foundation, such as NSSet.
  - Thus it would be likely that Collection would retain makeIterator. Thus iterators themselves would not give a destructive vs nondestructive guarantee.

- Three possible approaches were given for differentiating finite and infinite single-pass sequences, to protect against generalized algorithms that would go into an infinite loop when called:
  1. Separate protocols (provide count only on FiniteIterator)
  2. Implicit additional requirements from iterator source documentation (don’t call count on the /dev/urandom-backed iterator)
  3. Make all provided iterator methods lazy (instead of count, provide underestimatedCount or possibly have count return an optional)

-DW