[Pitch] Remove destructive consumption from Sequence

Swift is a language that embraces value semantics. Many common
iterators *can* be implemented with value semantics. Just because we
can’t implement *all* iterators with value semantics doesn’t mean we
should require them to have reference semantics. It just means you
can’t *assume* value semantics when working with iterators in generic
code unless / until we have a way to specify a value semantics
constraint. That’s not necessarily a bad thing especially when it
leaves the door open to interesting future possibilities.

-Matthew

I'm kind of undecided about this personally. I think one of the
problems with Swift is that the only indication that you have a
reference type is that you can declare it as a constant yet still
call mutating methods upon it; that isn't a very positive way of
identifying one, however. This may be more of a GUI/IDE issue though, in
that something being a class isn't always that obvious at a glance.

I wonder, could we somehow force iterators stored in variables to be
passed via inout? This would make it pretty clear that you're using
the same iterator and not a copy in all cases, encouraging you to
obtain another if you really do need to perform multiple passes.
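For illustration, a sketch of what that could look like; the `consumeNext` helper is hypothetical, just to show how an `inout` requirement surfaces the mutation at the call site:

```swift
// Hypothetical helper: advancing an iterator only through an inout
// parameter makes the consumption visible at the call site.
func consumeNext<I: IteratorProtocol>(_ iterator: inout I) -> I.Element? {
    return iterator.next()
}

var it = [1, 2, 3].makeIterator()
let first = consumeNext(&it)  // the `&` signals you're mutating shared state
```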

I'm going to push single-pass iteration on the stack briefly and talk
about the topic that's been under discussion here: infinite multipass
sequences.

## Fitting “Infinite Multipass” Into the Model

It remains to be decided whether it's worth doing, but if it's to
happen

I definitely think it’s worth doing.

Opinions are nice, but rationales are better. How will we understand
*why* it's worth doing?

I agree.

The rationale has been discussed quite a bit already in this thread. The current protocols do not provide the semantics many people are assuming in their code, leading to a lot of code that is incorrect despite the fact that it usually works in practice.

This is especially true of the assumption of finiteness. That assumption is so common it seems very wise to me to encode it as a semantic requirement in a protocol.

IMO these are problems worth addressing, especially now that we have a good handle on what a solution would look like.

I really appreciate the attention that the library team has given to
this.

If it's to happen, the standard library team thinks the right design is
roughly this:

/// A multipass sequence that may be infinite
protocol Collection {

  // Only eager algorithms that can terminate available here
  func index(where predicate: (Element)->Bool) -> Index

  // all lazy algorithms available here
  var lazy: ...

  var startIndex: Index
  var endIndex: Index // possibly not reachable from startIndex

  associatedtype SubSequence : Collection
  // do we need an associated FiniteSubsequence, e.g. for prefixes?
}

protocol FiniteCollection : Collection {

  // All eager algorithms available here
  func map(...) ->
  var count: ...
}

protocol BidirectionalCollection : Collection { ... }

protocol RandomAccessCollection : BidirectionalCollection { … }

Does this design entirely break with the relationship between
collections and iterators (dropping `makeIterator` as a protocol
requirement)? If so, would for..in (over collections) be built on top
of indices and use `formIndex(after:)`? Would it require a finite
collection (unless we add `until` to the language and then allow
`for..in..until` to work with infinite collections)?

All of these points are up for discussion.

Cool. I think the collection for..in has some nice advantages, but since it sounds like we’ll probably keep Iterator around it might be best to take the approach of making them both work.

You already know that I would prefer to see the current for..in built on finite sequences and allow for..in..until to be used with infinite sequences if we add that in the future. :)

John McCall pointed out to
me that an index-based for..in would make it possible to implement

for inout x in y { mutate(&x) }

That would be very nice!

I think it might also increase performance. I don’t know exactly how for..in is implemented today, but the implementation of IndexingIterator compares position to endIndex. If for..in is also checking the optional for nil, that’s an extra comparison. With an index-based for..in we shouldn’t need to construct the optional in the first place. Maybe optimizations like this already exist? But even if they do, it seems like they wouldn’t be possible in some cases where the type of the sequence isn’t statically known.
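For reference, `IndexingIterator` can be approximated along these lines (a simplified sketch, not the actual standard library source):

```swift
// Simplified sketch of an IndexingIterator-style type; the real
// standard library implementation differs in details.
struct SimpleIndexingIterator<Elements: Collection>: IteratorProtocol {
    private let elements: Elements
    private var position: Elements.Index

    init(_ elements: Elements) {
        self.elements = elements
        self.position = elements.startIndex  // startIndex captured up front
    }

    mutating func next() -> Elements.Element? {
        // One index comparison per step, plus the Optional the caller checks.
        guard position != elements.endIndex else { return nil }
        defer { elements.formIndex(after: &position) }
        return elements[position]
    }
}
```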

Would we still retain `IndexingIterator` even if we break the
relationship in the protocol requirements?

Yes: it should be possible to implement Collection algorithms in terms
of Iterator algorithms, and IndexingIterator provides the means. That
said, I think the makeIterator requirement does little harm, especially
when it can be defaulted for Collections.

I like this answer.

Would it still be possible to do things like zip a multi-pass sequence
with a single-pass sequence (assuming we keep single-pass sequences or
add them back eventually)? This seems like a use case worth
supporting in some way.

Yes. If you can create an Iterator from a Collection, and you can zip
Iterators, you can do this.

Yes, of course. I’m glad we would keep this relationship intact.

One subtle change I think this implies is that things like
`LazyFilterSequence` can implement `makeIterator` with constant
complexity, deferring the O(N) complexity to the first call to `next`.

I don't believe that's a difference, though I could be wrong.

You’re right, I was wrong. `LazyFilterSequence` just constructs an iterator and returns it. `LazyFilterCollection` has to loop until it finds the first item matching the predicate in its `startIndex` implementation. The part I was missing is that `IndexingIterator` gets the `startIndex` in its initializer.

`startIndex` for `LazyFilterCollection` currently has O(N) complexity.
The complexity of a complete iteration doesn’t change and probably
isn’t a big deal, but it’s worth noting.

Filtered collection views always require a bit of hand-waving around
performance guarantees; I don't think that changes.
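To make the hand-waving concrete, here is a sketch of why a filtered view's `startIndex` is O(N); the type is hypothetical and simplified, not `LazyFilterCollection` itself:

```swift
// Hypothetical filtered view: startIndex must skip ahead to the first
// element satisfying the predicate, which may walk the whole base.
struct FilteredView<Base: Collection> {
    let base: Base
    let isIncluded: (Base.Element) -> Bool

    var startIndex: Base.Index {
        var i = base.startIndex
        while i != base.endIndex && !isIncluded(base[i]) {
            base.formIndex(after: &i)  // O(N) in the worst case
        }
        return i
    }
}
```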

I’ve been looking at some code that wraps a sequence and considering
how it would be impacted. With iterators it looks like this:

guard let element = base.next()
else { return nil }

With collections and indices it would look something like this:

base.formIndex(after: &index)
guard index != baseEndIndex
  else { return endIndex }

let element = base[index]

That’s not too bad but it is more verbose.

Sequence today is a single-pass thing. If you are wrapping Sequence
today presumably you'd wrap an Iterator tomorrow, and you wouldn't have
to deal with indices.

If we’re going to push people towards collections and indices we
should try to make common patterns like “update the iteration state
and return the next element if there is one" simpler.

That's IndexingIterator.

Cool, I wrote this thinking that was going away.

This could be accomplished with an extension method along these lines:

guard let element = base.formIndex(after: &index, .returningOptionalElement)
   else { return endIndex }

With an implementation something like:

enum FormIndexResult {
   case returningOptionalElement
}
extension Collection {
   func formIndex(after i: inout Self.Index,
                  _ result: FormIndexResult) -> Self.Element? {
      formIndex(after: &i)
      return i != endIndex ? self[i] : nil
   }
}

This would provide similar functionality to `IndexingIterator` without
coupling the storage of `elements` and `position` (which is important
if you’re wrapping a collection and need to wrap the collection and
its indices independently).

I'm afraid I don't understand. Could you be more explicit about what
you have in mind?

The idea was to provide functionality similar to `IndexingIterator` in the sense the following code would provide equivalent functionality to `iterator.next()` but expressed in terms of a collection and an index:

let optionalElement = myCollection.formIndex(after: &myIndex, .returningOptionalElement)

vs

let optionalElement = myIterator.next()

The single case enum is just there to provide a label that differentiates the overload.

If we’re keeping IndexingIterator this probably isn’t necessary. I still have a use case for it but it is rather obscure. IndexingIterator probably covers the vast majority of use cases.

Q: Why should there be indices on an infinite multipass sequence?
A: Because the operations on indices apply equally well whether the
sequence is finite or not. Find the index of a value in the
sequence, slice the sequence, find again, etc.

Q: Why is there an endIndex on an infinite sequence?
A: So you can write algorithms such as index(where:) once.

Q: Why not allow endIndex to have a different type from startIndex?

A: It appears to offer insufficient benefit for the associated
  complexity in typical usage. A classic use case that argues for a
  different endIndex type is the null-terminated C string. But you
  can't index one of those safely without actually counting the
  length,
  and once you've done that you can make the endIndex an Int.

It’s also worth noting that we can use `Optional` with `nil` as the
`endIndex` sentinel if necessary.

True, that's a useful technique when there's no underlying storage in
the collection (e.g. a Fibonacci sequence).

## Single Pass Iteration

The refinement relationship between Sequence and Collection is
problematic, because it means either:

a) algorithms such as map on single-pass sequences claim to be
nonmutating even though it's a lie (status quo)

b) those algorithms can't be used on immutable (“let bound”)
multipass sequences. IMO that would be totally unacceptable.

If we drop the refinement, we can have a saner world. We also don't
need to separate Sequence and Iterator anymore. We can simply drop
Sequence altogether, and the protocol for single-pass iteration
becomes Iterator.

Makes sense to me.

### Mutation and Reference Semantics

Everything in Swift is copiable via `let copy = thing` (let's please
not argue over the definition of copy for classes; this is the one
built into the lowest level of the language—I refer to the other one,
that requires allocation, as “clone”).

Anything you do with a sequence that's truly single-pass mutates the
sequence *and all of its copies*. Therefore, such a type *fundamentally*
has reference semantics. One day we may be able to model single-pass
sequences with “move-only” value types, which cannot be copied. You
can find move-only types in languages like Rust and C++, but they are
not supported by Swift today. So it seems reasonable that all
Iterators in Swift today should be modeled as classes.

I think this makes a lot of sense in the model you are proposing. All
multipass structures are collections. Any sequence that can only
support a single pass is modeled as an iterator which inherently has
identity. Making this distinction strong will prevent any confusion.
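As a sketch, a truly single-pass source modeled as a class would look something like this (the type and its array-backed storage are hypothetical stand-ins; a real one would wrap I/O):

```swift
// Hypothetical single-pass source modeled as a class: copies of the
// reference share the same consumption state.
final class LineIterator: IteratorProtocol {
    private var remaining: [String]  // stand-in for a real I/O source
    init(_ lines: [String]) { self.remaining = lines }
    func next() -> String? {
        return remaining.isEmpty ? nil : remaining.removeFirst()
    }
}

let it = LineIterator(["a", "b"])
let copy = it   // both names refer to the same iteration state
_ = it.next()   // consumes "a" through `copy` as well
```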

The fact that Swift doesn't have a mutation model for classes,
though, means that mutating methods on a class constrained protocol
can't be labeled as such. So consuming operations on a
class-constrained Iterator protocol would not be labeled as mutating.

The standard library team is currently trying to evaluate the
tradeoffs in this area. One possibility under consideration is
simply dropping support for single-pass sequences until Swift can
support move-only value types and/or gets a mutation model for class
instances. It would be very interesting to know about any real-world
models of single-pass sequences that people are using in Swift, since
we don't supply any in the standard library.

I’m happy to see you mention a mutation model for class instances! (I
don’t mean to sidetrack the discussion, but would love to see that
someday)

I don’t have any objection to dropping support for single-pass
sequences temporarily. It’s possible that I would feel differently if
I was making use of them in my own code but I’m not.

On second thought, I believe it is important to have a way to support
existing “partially formed” multipass sequences that don't expose
copying or equality for their iteration states.

Can you provide examples of these? I’m having difficulty thinking of one.

Iterator is the right way to do that. So I think we need to keep Iterator around.

I don’t have any objection to keeping it either. :) Hopefully we’d still be able to improve the design in the future if / when new enabling language features come along.

In the meantime, people would be able to implement their own protocol
for single pass sequences. What they would lose is for..in as well as
the standard library algorithms. I’m not sure how many people this
would impact or how big the impact would be for them. We have seen a
couple of examples in this discussion, but probably not enough to
assess the overall impact.

One thing you don’t mention here is a distinction between finite and
infinite single-pass sequences (iterators). I don’t know if the
finite / infinite distinction is as important here, but wanted to
point it out. Obviously if we remove support for single-pass sequences
now we could defer that discussion until we’re ready to bring back
support for them.

There are a few possible answers I can think of:

1. Do the “obvious” thing and create a separate protocol for finite
  single-pass sequences

2. Decide that the combination of infinite and single-pass is rare
  enough (/dev/urandom, temperature sensor) that it's better to just
  ask people handling them to be careful and not, e.g., try to “count”
  them.

3. Decide that everything on a single-pass sequence is lazy. Since you
  can only take a single pass anyway, people won't observe their
  closures being called more often than necessary, which was the main
  motivator for making map, filter, et al. eager on collections without
  an explicit .lazy.

Implications of #3:

* Any “partially-formed” multipass sequences (modeling only Iterator)
would be free to expose an accurate underestimatedCount, thereby
optimizing the process of copying into an array. The lazy filter
Iterator adaptor would have an underestimatedCount of 0.

* All algorithms that require multiple passes, such as sorted(), would
be unavailable on Iterator. You'd have to construct an Array (or
other MutableCollection) and sort that in-place. Of course,
constructing an Array from an Iterator could still go on forever if
the Iterator turned out to be infinite, which means, at some level #3
is just a refinement of #2 that makes it less error-prone.
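A sketch of what a lazy adaptor under #3 might look like; the type is hypothetical, not an existing standard library one:

```swift
// Hypothetical lazy map adaptor over a bare iterator: the transform
// runs only as elements are pulled, so a single pass is never wasted.
struct LazyMapIterator<Base: IteratorProtocol, T>: IteratorProtocol {
    var base: Base
    let transform: (Base.Element) -> T

    mutating func next() -> T? {
        return base.next().map(transform)
    }
}
```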

Do you lean towards any of these?

···

On Jul 1, 2016, at 11:51 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams@apple.com> wrote:
on Wed Jun 29 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:

On 29 Jun 2016, at 00:10, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

--
Dave

Q: Why not allow endIndex to have a different type from startIndex?
A: It appears to offer insufficient benefit for the associated
complexity in typical usage. A classic use case that argues for a
different endIndex type is the null-terminated C string. But you
can't index one of those safely without actually counting the length,
and once you've done that you can make the endIndex an Int.

It’s also worth noting that we can use `Optional` with `nil` as the `endIndex` sentinel if necessary.

I don’t believe this is true? Optional cannot support Equatable w/o generics changes. I believe we would need a different wrapper type.

The fact that Swift doesn't have a mutation model for classes, though,
means that mutating methods on a class constrained protocol can't be
labeled as such. So consuming operations on a class-constrained
Iterator protocol would not be labeled as mutating.

The standard library team is currently trying to evaluate the tradeoffs
in this area. One possibility under consideration is simply dropping
support for single-pass sequences until Swift can support move-only
value types and/or gets a mutation model for class instances. It would
be very interesting to know about any real-world models of single-pass
sequences that people are using in Swift, since we don't supply any in
the standard library.

I’m happy to see you mention a mutation model for class instances! (I don’t mean to sidetrack the discussion, but would love to see that someday)

I don’t have any objection to dropping support for single-pass sequences temporarily. It’s possible that I would feel differently if I was making use of them in my own code but I’m not.

I might see an issue if there were enough single-pass sequences that we worried about them trying to wedge themselves into Collection.

In the meantime, people would be able to implement their own protocol for single pass sequences. What they would lose is for..in as well as the standard library algorithms. I’m not sure how many people this would impact or how big the impact would be for them. We have seen a couple of examples in this discussion, but probably not enough to assess the overall impact.

I “feel” the model for Iterator as a separate possibly single-pass/infinite thing makes sense. But I also suspect unique requirements will come up for I/O use cases; who knows if some concurrency primitives might provide a better fit at that point in time?

-DW

···

On Jul 1, 2016, at 9:59 AM, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams@apple.com> wrote:

I use it in a LazyRowSequence<T: SqlModelConvertible> where
querying Sqlite in WAL mode allows multiple concurrent readers to
get point-in-time snapshots of the database. The query can’t be
replayed without buffering all the rows in memory because Sqlite’s
step functions are not bi-directional. In some cases we are talking
about tens of thousands of rows (or even hundreds of thousands) and
the ability to avoid buffering them is a feature, not a bug.

IMHO the typical case for single-pass is IO.

Yes. Also, truly-random number generators.

In this case it would work just as well as LazyRowIterator<T> assuming
the language allows for-in on an iterator.

So, you're not interested in using algorithms like map and filter on
these things?

If it was just about for-in, we could say you can for-in over any
instance of

  ()->T?
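Today's `AnyIterator` already comes close: it wraps exactly such a closure and is itself for-in-able.

```swift
// AnyIterator lets a () -> T? closure drive a for-in loop directly.
var state = 0
let fives = AnyIterator<Int> {
    state += 5
    return state <= 20 ? state : nil
}
for n in fives { print(n) }  // prints 5, 10, 15, 20
```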

Fair point; we actually do take advantage of the lazy versions of filter and map.

Does it make sense to say that single-pass sequences are always lazy?

                Iterable
               /        \
     LazyIterable    EagerIterable
          |                |
    LazyCollection     Collection

LazyCollection is a wrapper around Collection so you can still use .lazy the way you would today.
LazyIterables are single-pass.
EagerIterables are multi-pass.

Maybe this doesn’t work because we’re talking about somewhat orthogonal things? I could imagine trying to model the idea of single vs multi pass and lazy vs eager separately:

protocol Iterable {
    associatedtype Iterator: IteratorProtocol
    associatedtype Element = Iterator.Element
    func makeIterator() -> Iterator
}
protocol MultiIterable: Iterable { }
protocol SingleIterable: Iterable { }
protocol LazyIterable: Iterable { }
protocol EagerIterable: Iterable { }

extension MultiIterable where Self: EagerIterable {
    func map<U>(t: @noescape (Element) -> U) -> [U] { }
}

extension MultiIterable where Self: LazyIterable {
    func map<U>(t: (Element) -> U) -> LazyMapMultiIterable<Self> { }
}

extension SingleIterable where Self: LazyIterable {
    func map<U>(t: (Element) -> U) -> LazyMapSingleIterable<Self> { }
}

I guess it comes back to what you and others have pointed out - it might not be worth the effort to provide this level of flexibility.

Russ

···

On Jun 30, 2016, at 3:37 PM, Dave Abrahams <dabrahams@apple.com> wrote:

`sequence(state:next:)` could be adapted into a possibly-infinite `Collection`, and it's not much more difficult to use than `AnySequence`. (Although it would be more efficient to use a custom iterator for it than it would be to iterate with its indices.)

···

On Jul 1, 2016, at 12:34 AM, Haravikk <swift-evolution@haravikk.me> wrote:

Sequences are currently dead easy to implement, and to implement in an ad-hoc way via AnySequence(body:), how would that be done under this required indexing scheme?

--
Brent Royal-Gordon
Architechies

Sequences are currently dead easy to implement, and to implement in
an ad-hoc way via AnySequence(body:), how would that be done under
this required indexing scheme?

`sequence(state:next:)` could be adapted into a possibly-infinite
`Collection`, and it's not much more difficult to use than
`AnySequence`.

Yes, the state would gain an Equatable constraint, that's all.

(Although it would be more efficient to use a custom
iterator for it than it would be to iterate with its indices.)

Whether the resulting collection is going to be efficient after
optimization depends on many factors. I can easily imagine this being
optimal:

  for i in collection(
     first: 0, next: { state in state < 100 ? state + 5 : nil }) {
     print(i)
  }

However, it is true that the most reliable path to efficiency is always
going to be to encode the state and next in a protocol conformance

···

on Fri Jul 01 2016, Brent Royal-Gordon <brent-AT-architechies.com> wrote:

On Jul 1, 2016, at 12:34 AM, Haravikk <swift-evolution@haravikk.me> wrote:

--
Dave

The subscript and index aren't there for the sequence's benefit, and you
can't know how users will want to use the sequence. The whole point of
making a type conform to a generic protocol is that you can't anticipate
all of the type's uses, so you want to make it work with as much other
code as possible. If I'm defining a new sequence type, how can I know
whether someone might want to use index(of:) or slicing on it?

···

on Fri Jul 01 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:

On 30 Jun 2016, at 23:39, Dave Abrahams <dabrahams@apple.com> wrote:
All multi-pass sequences can benefit from subscripts.

Sorry, not really what I meant, but rather; how many sequences are
really going to use them?

--
Dave

If Iterators become reference types that model single-pass sequences and
become for-in-able, as the write-up suggests, couldn't Sequence be
stipulated to be multipass and retain its refinement relationship with
Collection?

AFAIK there is no interesting multipass Sequence that cannot reasonably be
made to support indexing.

There *is* existing code that exposes multipass data structures without
exposing the ability to compare iteration state for equality.

It’s worth noting that indices require comparability, not just
equality. I think comparability might cause more difficulty than
equality (but haven’t thought too hard about it).

It does, but I am ever more strongly motivated to drop the comparability
requirement.

In every case I can think of, index equality could easily have been
exposed, but wasn't. These designs can't be adapted to model
Collection.

Why can’t they be adapted to model Collection if equality could have
been exposed? Is it because comparability would be difficult?

Comparable refines Equatable, so it's at *least* as hard ;-)

Those designs are real, but I am unconvinced they are worth supporting
directly with a separate protocol in the standard library; I'm willing
to accept the idea that those data structures will simply be limited to
modeling Iterator.

Can you elaborate on what designs / data structures you’re talking
about here?

Cocoa Dictionaries and Sets are examples. The enumeration interface
doesn't have any facility for copying or comparing enumeration state.

···

on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jun 30, 2016, at 5:32 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
on Thu Jun 30 2016, Xiaodi Wu <xiaodi.wu-AT-gmail.com> wrote:

On Thu, Jun 30, 2016 at 12:26 Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

on Wed Jun 29 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:

On 29 Jun 2016, at 00:10, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

--
Dave
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

As noted elsewhere: yes, I am strongly considering that.

···

on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jun 30, 2016, at 5:39 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Thu Jun 30 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:

On 30 Jun 2016, at 18:26, Dave Abrahams <dabrahams@apple.com> wrote:

Q: Why should there be indices on an infinite multipass sequence?
A: Because the operations on indices apply equally well whether the
sequence is finite or not. Find the index of a value in the
sequence, slice the sequence, find again, etc.

Would it not make more sense for sequences that can benefit from
subscripts to just conform to Indexable?

All multi-pass sequences can benefit from subscripts.

It seems like having basically all of the Collection-specific methods
on Sequence too would be confusing, and dilute any notion that these
are sequences where elements are supposed to be consumed in-order, as
it would give the illusion that you can skip around with the
convenience of an array.

If traversal consumes the sequence, it is a single-pass thing.

It’s also worth noting that Collection does not imply the ability to
skip around “with the convenience of an array” if that means skipping
around in constant time. You need RandomAccessCollection to get that
guarantee.

There's also the issue of how you would even index something that's
potentially infinite; you'd need to use a big integer in which case
you could end up with your indices growing to infinite size over time?

It's trivial; the index contains the iteration state. The only
fundamental difference between the constraints on Iterator and the
constraints on an Index is that Iterator doesn't support comparison for
equality.

Not just equality, but also general `Comparable` (you’re not
considering changing that are you?).

--
Dave

All multi-pass sequences can benefit from subscripts.

Sorry, not really what I meant, but rather; how many sequences are
really going to use them?

The subscript and index aren't there for the sequence's benefit, and you
can't know how users will want to use the sequence. The whole point of
making a type conform to a generic protocol is that you can't anticipate
all of the type's uses, so you want to make it work with as much other
code as possible. If I'm defining a new sequence type, how can I know
whether someone might want to use index(of:) or slicing on it?

This isn’t entirely true. Sometimes we know *exactly* how a type will be used in an application. It conforms to a generic protocol in order to take advantage of pre-existing generic code (algorithms, etc) in the standard library (or other libraries).

If the library contains algorithms expressed in terms of the weakest constraints possible (i.e. the current `Sequence` algorithms) sometimes we can accomplish what is necessary without needing to implement additional members on our type, even if it is possible to implement them and theoretically useful (but not useful to the current requirements of the application).

That said, I generally like the direction you’re suggesting.

I have been considering the incremental complexity of conforming to `Collection` rather than `Sequence` carefully. The one place that I still need to think further about is the `Comparable` requirement on indices and whether that might lead to nontrivial complexity over a `Sequence` conformance.

-Matthew

···

On Jul 1, 2016, at 9:47 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Fri Jul 01 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:

On 30 Jun 2016, at 23:39, Dave Abrahams <dabrahams@apple.com> wrote:

--
Dave

If Iterators become reference types that model single-pass sequences and
becomes for-in-able, as the write-up suggests, couldn't Sequence be
stipulated to be multipass and retain its refinement relationship with
Collection?

AFAIK there is no interesting multipass Sequence that cannot reasonably be
made to support indexing.

There *is* existing code that exposes multipass data structures without
exposing the ability to compare iteration state for equality.

It’s worth noting that indices require comparability, not just
equality. I think comparability might cause more difficulty than
equality (but haven’t thought too hard about it).

It does, but I am ever more strongly motivated to drop the comparability
requirement.

Dropping that would eliminate the questions that are still lingering in my mind about getting rid of `Sequence`.

Are there any important algorithms that rely on the ability to compare indices?

In every case I can think of, index equality could easily have been
exposed, but wasn't. These designs can't be adapted to model
Collection.

Why can’t they be adapted to model Collection if equality could have
been exposed? Is it because comparability would be difficult?

Comparable refines Equatable, so it's at *least* as hard ;-)

Right. You only mentioned that index equality could have been exposed but didn’t mention comparability. I was wondering whether the potential additional difficulty is why they couldn’t model Collection.

Those designs are real, but I am unconvinced they are worth supporting
directly with a separate protocol in the standard library; I'm willing
to accept the idea that those data structures will simply be limited to
modeling Iterator.

Can you elaborate on what designs / data structures you’re talking
about here?

Cocoa Dictionaries and Sets are examples. The enumeration interface
doesn't have any facility for copying or comparing enumeration state.

What impact would that have on the bridged value types (which is how we should be using those in Swift)?

···

On Jul 1, 2016, at 11:07 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jun 30, 2016, at 5:32 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
on Thu Jun 30 2016, Xiaodi Wu <xiaodi.wu-AT-gmail.com> wrote:

On Thu, Jun 30, 2016 at 12:26 Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

on Wed Jun 29 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:

On 29 Jun 2016, at 00:10, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

Swift is a language that embraces value semantics. Many common
iterators *can* be implemented with value semantics. Just because we
can’t implement *all* iterators with value semantics doesn’t mean we
should require them to have reference semantics. It just means you
can’t *assume* value semantics when working with iterators in generic
code unless / until we have a way to specify a value semantics
constraint. That’s not necessarily a bad thing especially when it
leaves the door open to interesting future possibilities.

-Matthew

I'm kind of undecided about this personally. I think one of the
problems with Swift is that the only indication that you have a
reference type is that you can declare it as a constant, yet still
call mutating methods upon it, this isn't a very positive way of
identifying it however. This may be more of a GUI/IDE issue though, in
that something being a class isn't always that obvious at a glance.

I wonder, could we somehow force iterators stored in variables to be
passed via inout? This would make it pretty clear that you're using
the same iterator and not a copy in all cases, encouraging you to
obtain another if you really do need to perform multiple passes.

I'm going to push single-pass iteration on the stack briefly and talk
about the topic that's been under discussion here: infinite multipass
sequences.

## Fitting “Infinite Multipass” Into the Model

It remains to be decided whether it's worth doing, but if it's to
happen, the standard library team thinks the right design is roughly
this:

/// A multipass sequence that may be infinite
protocol Collection {

  // Only eager algorithms that can terminate available here
  func index(where predicate: (Element)->Bool) -> Index

  // all lazy algorithms available here
  var lazy: ...

  var startIndex: Index
  var endIndex: Index // possibly not reachable from startIndex

  associatedtype SubSequence : Collection
  // do we need an associated FiniteSubsequence, e.g. for prefixes?
}

protocol FiniteCollection : Collection {

  // All eager algorithms available here
  func map(...) ->
  var count: ...
}

protocol BidirectionalCollection : Collection { ... }

protocol RandomAccessCollection : BidirectionalCollection { ... }

Q: Why should there be indices on an infinite multipass sequence?
A: Because the operations on indices apply equally well whether the
sequence is finite or not. Find the index of a value in the
sequence, slice the sequence, find again, etc.

Q: Why is there an endIndex on an infinite sequence?
A: So you can write algorithms such as index(where:) once.

Q: Why not allow endIndex to have a different type from startIndex?
A: It appears to offer insufficient benefit for the associated
complexity in typical usage. A classic use case that argues for a
different endIndex type is the null-terminated C string. But you
can't index one of those safely without actually counting the length,
and once you've done that you can make the endIndex an Int.
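The point of keeping a single index type for both bounds is that generic algorithms are written once against the same bounds check. A sketch (the helper name `firstIndexSketch` is illustrative, chosen to avoid colliding with standard library methods):

```swift
// Written once against endIndex; for an infinite collection the bound is
// simply never reached, and the loop runs until the predicate matches.
extension Collection {
    func firstIndexSketch(where predicate: (Element) -> Bool) -> Index? {
        var i = startIndex
        while i != endIndex {
            if predicate(self[i]) { return i }
            formIndex(after: &i)
        }
        return nil
    }
}

let i = [3, 1, 4, 1, 5].firstIndexSketch { $0 > 3 }  // 2 (the index of 4)
```

If `endIndex` had a different type from `startIndex`, every such loop would need an extra generic parameter or a heterogeneous comparison, which is the "associated complexity" the answer above refers to.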

## Single Pass Iteration

The refinement relationship between Sequence and Collection is
problematic, because it means either:

a) algorithms such as map on single-pass sequences claim to be
nonmutating even though it's a lie (status quo)

b) those algorithms can't be used on immutable (“let bound”) multipass
sequences. IMO that would be totally unacceptable.

If we drop the refinement, we can have a saner world. We also don't
need to separate Sequence and Iterator anymore. We can simply drop
Sequence altogether, and the protocol for single-pass iteration becomes
Iterator.

### Mutation and Reference Semantics

Everything in Swift is copiable via `let copy = thing` (let's please not
argue over the definition of copy for classes; this is the one built
into the lowest level of the language—I refer to the other one, that
requires allocation, as “clone”).

Anything you do with a sequence that's truly single-pass mutates the
sequence *and all of its copies*. Therefore, such a type *fundamentally*
has reference semantics. One day we may be able to model single-pass
sequences with “move-only” value types, which cannot be copied. You can
find move-only types in languages like Rust and C++, but they are not
supported by Swift today. So it seems reasonable that all Iterators in
Swift today should be modeled as classes.
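A small sketch of why the class model is honest here (the type `LineIterator` is illustrative, not standard library API): copying the reference does not copy the iteration state, so consumption through one name is visible through the other.

```swift
// A truly single-pass source modeled as a class: `let copy = thing`
// copies the reference, not the state.
final class LineIterator: IteratorProtocol {
    private var lines: [String]
    init(_ lines: [String]) { self.lines = lines }
    // A class method satisfies IteratorProtocol's `mutating func next()`.
    func next() -> String? { lines.isEmpty ? nil : lines.removeFirst() }
}

let a = LineIterator(["one", "two", "three"])
let b = a            // copies the reference only
_ = a.next()         // consumes "one" through either name
print(b.next()!)     // "two" — b observes a's consumption
```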

The fact that Swift doesn't have a mutation model for classes, though,
means that mutating methods on a class constrained protocol can't be
labeled as such. So consuming operations on a class-constrained
Iterator protocol would not be labeled as mutating.

The standard library team is currently trying to evaluate the tradeoffs
in this area. One possibility under consideration is simply dropping
support for single-pass sequences until Swift can support move-only
value types and/or gets a mutation model for class instances. It would
be very interesting to know about any real-world models of single-pass
sequences that people are using in Swift, since we don't supply any in
the standard library.

--
Dave
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
Dave

--
Dave

You can construct an example where it's detectable if you know enough
about the sequence. For example, you could zip together elements from
0..<∞ with /dev/urandom, and you would have a single-pass sequence where
the number of consumed elements was detectable.

I don't understand how the detectability of how many elements were
consumed is relevant to the design, though.

···

on Fri Jul 01 2016, Xiaodi Wu <xiaodi.wu-AT-gmail.com> wrote:

On Fri, Jul 1, 2016 at 11:51 AM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

>> On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams@apple.com> wrote:
>>
>>
>> on Wed Jun 29 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:
>>
>
>>>> On 29 Jun 2016, at 00:10, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:
>>>>
>>>> Swift is a language that embraces value semantics. Many common
>>>> iterators *can* be implemented with value semantics. Just because we
>>>> can’t implement *all* iterators with value semantics doesn’t mean we
>>>> should require them to have reference semantics. It just means you
>>>> can’t *assume* value semantics when working with iterators in generic
>>>> code unless / until we have a way to specify a value semantics
>>>> constraint. That’s not necessarily a bad thing especially when it
>>>> leaves the door open to interesting future possibilities.
>>>>
>>>> -Matthew
>>>
>>> I'm kind of undecided about this personally. I think one of the
>>> problems with Swift is that the only indication that you have a
>>> reference type is that you can declare it as a constant, yet still
>>> call mutating methods upon it, this isn't a very positive way of
>>> identifying it however. This may be more of a GUI/IDE issue though, in
>>> that something being a class isn't always that obvious at a glance.
>>>
>>> I wonder, could we somehow force iterators stored in variables to be
>>> passed via inout? This would make it pretty clear that you're using
>>> the same iterator and not a copy in all cases, encouraging you to
>>> obtain another if you really do need to perform multiple passes.
>>
>> I'm going to push single-pass iteration on the stack briefly and talk
>> about the topic that's been under discussion here: infinite multipass
>> sequences.
>>
>> ## Fitting “Infinite Multipass” Into the Model
>>
>> It remains to be decided whether it's worth doing, but if it's to
>> happen
>
> I definitely think it’s worth doing.

Opinions are nice, but rationales are better. How will we understand
*why* it's worth doing?

> I really appreciate the attention that the library team has given to
> this.
>
>> , the standard library team thinks the right design is roughly
>> this:
>>
>> /// A multipass sequence that may be infinite
>> protocol Collection {
>>
>> // Only eager algorithms that can terminate available here
>> func index(where predicate: (Element)->Bool) -> Index
>>
>> // all lazy algorithms available here
>> var lazy: ...
>>
>> var startIndex: Index
>> var endIndex: Index // possibly not reachable from startIndex
>>
>> associatedtype SubSequence : Collection
>> // do we need an associated FiniteSubsequence, e.g. for prefixes?
>> }
>>
>> protocol FiniteCollection : Collection {
>>
>> // All eager algorithms available here
>> func map(...) ->
>> var count: ...
>> }
>>
>> protocol BidirectionalCollection : Collection { ... }
>>
>> protocol RandomAccessCollection : BidirectionalCollection { … }
>
> Does this design entirely break with the relationship between
> collections and iterators (dropping `makeIterator` as a protocol
> requirement)? If so, would for..in (over collections) be built on top
> of indices and use `formIndex(after:)`? Would it require a finite
> collection (unless we add `until` to the language and then allow
> `for..in..until` to work with infinite collections)?

All of these points are up for discussion. John McCall pointed out to
me that an index-based for..in would make it possible to implement

  for inout x in y { mutate(&x) }

> Would we still retain `IndexingIterator`even if we break the
> relationship in the protocol requirements?

Yes: it should be possible to implement Collection algorithms in terms
of Iterator algorithms, and IndexingIterator provides the means. That
said, I think the makeIterator requirement does little harm, especially
when it can be defaulted for Collections.

> Would it still be possible to do things like zip a multi-pass sequence
> with a single-pass sequence (assuming we keep single-pass sequences or
> add them back eventually)? This seems like a use case worth
> supporting in some way.

Yes. If you can create an Iterator from a Collection, and you can zip
Iterators, you can do this.
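The zip in question can be sketched at the iterator level (assuming the hypothetical world where single-pass things model only `IteratorProtocol`; `ZipIterator` is an illustrative name, not the standard library's `zip`):

```swift
// An iterator-level zip: stops as soon as either side is exhausted.
struct ZipIterator<A: IteratorProtocol, B: IteratorProtocol>: IteratorProtocol {
    var a: A
    var b: B
    mutating func next() -> (A.Element, B.Element)? {
        guard let x = a.next(), let y = b.next() else { return nil }
        return (x, y)
    }
}

// A multipass collection contributes via its iterator; the other side can
// be a single-pass source.
var pairs = ZipIterator(a: [1, 2, 3].makeIterator(),
                        b: ["x", "y"].makeIterator())
while let (n, s) = pairs.next() { print(n, s) }  // 1 x, then 2 y
```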

> One subtle change I think this implies is that things like
> `LazyFilterSequence` can implement `makeIterator` with constant
> complexity, deferring the O(N) complexity to the first call to `next`.

I don't believe that's a difference, though I could be wrong.

> `startIndex` for `LazyFilterCollection` currently has O(N) complexity.
> The complexity of a complete iteration doesn’t change and probably
> isn’t a big deal, but it’s worth noting.

Filtered collection views always require a bit of hand-waving around
performance guarantees; I don't think that changes.

> I’ve been looking at some code that wraps a sequence and considering
> how it would be impacted. With iterators it looks like this:
>
> guard let element = base.next()
> else { return nil }
>
> With collections and indices it would look something like this:
>
> base.formIndex(after: &index)
> guard index != baseEndIndex
> else { return endIndex }
>
> let element = base[index]
>
> That’s not too bad but it is more verbose.

Sequence today is a single-pass thing. If you are wrapping Sequence
today presumably you'd wrap an Iterator tomorrow, and you wouldn't have
to deal with indices.

> If we’re going to push people towards collections and indices we
> should try to make common patterns like “update the iteration state
> and return the next element if there is one" simpler.

That's IndexingIterator.

> This could be accomplished with an extension method along these lines:
>
> guard let element = base.formIndex(after: &index, .returningOptionalElement)
>   else { return endIndex }
>
> With an implementation something like:
>
> enum FormIndexResult {
>   case returningOptionalElement
> }
> extension Collection {
> func formIndex(after i: inout Self.Index, _ result:
> FormIndexResult) -> Self.Element?
> }
>
> This would provide similar functionality to `IndexingIterator` without
> coupling the storage of `elements` and `position` (which is important
> if you’re wrapping a collection and need to wrap the collection and
> its indices independently).

I'm afraid I don't understand. Could you be more explicit about what
you have in mind?

>> Q: Why should there be indices on an infinite multipass sequence?
>> A: Because the operations on indices apply equally well whether the
>> sequence is finite or not. Find the index of a value in the
>> sequence, slice the sequence, find again, etc.
>>
>> Q: Why is there an endIndex on an infinite sequence?
>> A: So you can write algorithms such as index(where:) once.
>>
>> Q: Why not allow endIndex to have a different type from startIndex?
>>
>> A: It appears to offer insufficient benefit for the associated
>> complexity in typical usage. A classic use case that argues for a
>> different endIndex type is the null-terminated C string. But you
>> can't index one of those safely without actually counting the
>> length,
>> and once you've done that you can make the endIndex an Int.
>
> It’s also worth noting that we can use `Optional` with `nil` as the
> `endIndex` sentinel if necessary.

True, that's a useful technique when there's no underlying storage in
the collection (e.g. a Fibonacci sequence).
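Since a `Collection` index must be `Comparable`, the `nil`-as-sentinel trick needs a thin wrapper rather than `Optional` itself. A sketch (the name `SentinelIndex` is illustrative, not standard library API):

```swift
// Wraps iteration state so an `end` case serves as the endIndex sentinel,
// ordered after every `at` case so generic bounds checks keep working.
enum SentinelIndex<State: Comparable>: Comparable {
    case at(State)
    case end

    static func < (l: Self, r: Self) -> Bool {
        switch (l, r) {
        case let (.at(a), .at(b)): return a < b
        case (.at, .end):          return true
        default:                   return false
        }
    }
}

let lo: SentinelIndex<Int> = .at(1)
print(lo < .end)  // true: every in-bounds index precedes the sentinel
```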

>>
>> ## Single Pass Iteration
>>
>> The refinement relationship between Sequence and Collection is
>> problematic, because it means either:
>>
>> a) algorithms such as map on single-pass sequences claim to be
>> nonmutating even though it's a lie (status quo)
>>
>> b) those algorithms can't be used on immutable (“let bound”)
>> multipass sequences. IMO that would be totally unacceptable.
>>
>> If we drop the refinement, we can have a saner world. We also don't
>> need to separate Sequence and Iterator anymore. We can simply drop
>> Sequence altogether, and the protocol for single-pass iteration
>> becomes Iterator.
>
> Makes sense to me.
>
>>
>> ### Mutation and Reference Semantics
>>
>> Everything in Swift is copiable via `let copy = thing` (let's please
>> not argue over the definition of copy for classes; this is the one
>> built into the lowest level of the language—I refer to the other one,
>> that requires allocation, as “clone”).
>>
>> Anything you do with a sequence that's truly single-pass mutates the
>> sequence *and all of its copies*. Therefore, such a type *fundamentally*
>> has reference semantics. One day we may be able to model single-pass
>> sequences with “move-only” value types, which cannot be copied. You
>> can find move-only types in languages like Rust and C++, but they are
>> not supported by Swift today. So it seems reasonable that all
>> Iterators in Swift today should be modeled as classes.
>
> I think this makes a lot of sense in the model you are proposing. All
> multipass structures are collections. Any sequence that can only
> support a single pass is modeled as an iterator which inherently has
> identity. Making this distinction strong will prevent any confusion.
>
>> The fact that Swift doesn't have a mutation model for classes,
>> though, means that mutating methods on a class constrained protocol
>> can't be labeled as such. So consuming operations on a
>> class-constrained Iterator protocol would not be labeled as mutating.
>>
>> The standard library team is currently trying to evaluate the
>> tradeoffs in this area. One possibility under consideration is
>> simply dropping support for single-pass sequences until Swift can
>> support move-only value types and/or gets a mutation model for class
>> instances. It would be very interesting to know about any real-world
>> models of single-pass sequences that people are using in Swift, since
>> we don't supply any in the standard library.
>
> I’m happy to see you mention a mutation model for class instances! (I
> don’t mean to sidetrack the discussion, but would love to see that
> someday)
>
> I don’t have any objection to dropping support for single-pass
> sequences temporarily. It’s possible that I would feel differently if
> I was making use of them in my own code but I’m not.

On second thought, I believe it is important to have a way to support
existing “partially formed” multipass sequences that don't expose
copying or equality for their iteration states. Iterator is the right
way to do that. So I think we need to keep Iterator around.

> In the meantime, people would be able to implement their own protocol
> for single pass sequences. What they would lose is for..in as well as
> the standard library algorithms. I’m not sure how many people this
> would impact or how big the impact would be for them. We have seen a
> couple of examples in this discussion, but probably not enough to
> assess the overall impact.
>
> One thing you don’t mention here is a distinction between finite and
> infinite single-pass sequences (iterators). I don’t know if the
> finite / infinite distinction is as important here, but wanted to
> point it out. Obviously if we remove support for single-pass sequences
> now we could defer that discussion until we’re ready to bring back
> support for them.

There are a few possible answers I can think of:

1. Do the “obvious” thing and create a separate protocol for finite
   single-pass sequences

2. Decide that the combination of infinite and single-pass is rare
   enough (/dev/urandom, temperature sensor) that it's better to just
   ask people handling them to be careful and not, e.g., try to “count”
   them.

Not really feeling sufficiently in my element (excuse the pun) to comment
on most of this discussion, but here I thought I'd chime in. What's
interesting about your two examples (/dev/urandom, temperature sensor) is
that, though single-pass, they should be insensitive to destructive
consumption, no? By which I mean, if some function returns 5 elements from
the "sequence", in both scenarios it would be undetectable whether it
consumes 5, 10 or 15 elements in the process, IIUC. Are there other
examples of infinite, single-pass sequences where destructive consumption
would make a difference?

--
Dave

Swift is a language that embraces value semantics. Many common
iterators *can* be implemented with value semantics. Just because we
can’t implement *all* iterators with value semantics doesn’t mean we
should require them to have reference semantics. It just means you
can’t *assume* value semantics when working with iterators in generic
code unless / until we have a way to specify a value semantics
constraint. That’s not necessarily a bad thing especially when it
leaves the door open to interesting future possibilities.

-Matthew

I'm kind of undecided about this personally. I think one of the
problems with Swift is that the only indication that you have a
reference type is that you can declare it as a constant, yet still
call mutating methods upon it, this isn't a very positive way of
identifying it however. This may be more of a GUI/IDE issue though, in
that something being a class isn't always that obvious at a glance.

I wonder, could we somehow force iterators stored in variables to be
passed via inout? This would make it pretty clear that you're using
the same iterator and not a copy in all cases, encouraging you to
obtain another if you really do need to perform multiple passes.

I'm going to push single-pass iteration on the stack briefly and talk
about the topic that's been under discussion here: infinite multipass
sequences.

## Fitting “Infinite Multipass” Into the Model

It remains to be decided whether it's worth doing, but if it's to
happen

I definitely think it’s worth doing.

Opinions are nice, but rationales are better. How will we understand
*why* it's worth doing?

I agree.

The rationale has been discussed quite a bit already in this thread.
The current protocols do not provide the semantics many people are
assuming in their code, leading to a lot of code that is incorrect
despite the fact that it usually works in practice.

This is especially frequent in the case of the finite assumption.
This assumption is so common it seems very wise to me to encode it as
a semantic requirement in a protocol.

IMO these are problems worth addressing, especially now that we have a
good handle on what a solution would look like.

I really appreciate the attention that the library team has given to
this.

, the standard library team thinks the right design is roughly
this:

/// A multipass sequence that may be infinite
protocol Collection {

  // Only eager algorithms that can terminate available here
  func index(where predicate: (Element)->Bool) -> Index

  // all lazy algorithms available here
  var lazy: ...

  var startIndex: Index
  var endIndex: Index // possibly not reachable from startIndex

  associatedtype SubSequence : Collection
  // do we need an associated FiniteSubsequence, e.g. for prefixes?
}

protocol FiniteCollection : Collection {

  // All eager algorithms available here
  func map(...) ->
  var count: ...
}

protocol BidirectionalCollection : Collection { ... }

protocol RandomAccessCollection : BidirectionalCollection { … }

Does this design entirely break with the relationship between
collections and iterators (dropping `makeIterator` as a protocol
requirement)? If so, would for..in (over collections) be built on top
of indices and use `formIndex(after:)`? Would it require a finite
collection (unless we add `until` to the language and then allow
`for..in..until` to work with infinite collections)?

All of these points are up for discussion.

Cool. I think the collection for..in has some nice advantages, but
since it sounds like we’ll probably keep Iterator around it might be
best to take the approach of making them both work.

You already know that I would prefer to see the current for..in built
on finite sequences and allow for..in..until to be used with infinite
sequences if we add that in the future. :)

John McCall pointed out to
me that an index-based for..in would make it possible to implement

for inout x in y { mutate(&x) }

That would be very nice!

I think it might also increase performance. I don’t know exactly how
for..in is implemented today, but the implementation of
IndexingIterator compares position to endIndex. If for..in is also
checking the optional for nil, that’s an extra comparison.
We shouldn't need to actually construct the optional in the first
place using an index-based for..in. Maybe optimizations like this
already exist? But even if they do, it seems like they wouldn’t be
possible in some cases where the type of the sequence isn’t statically
known.

Would we still retain `IndexingIterator`even if we break the
relationship in the protocol requirements?

Yes: it should be possible to implement Collection algorithms in terms
of Iterator algorithms, and IndexingIterator provides the means. That
said, I think the makeIterator requirement does little harm, especially
when it can be defaulted for Collections.

I like this answer.

Would it still be possible to do things like zip a multi-pass sequence
with a single-pass sequence (assuming we keep single-pass sequences or
add them back eventually)? This seems like a use case worth
supporting in some way.

Yes. If you can create an Iterator from a Collection, and you can zip
Iterators, you can do this.

Yes, of course. I’m glad we would keep this relationship intact.

One subtle change I think this implies is that things like
`LazyFilterSequence` can implement `makeIterator` with constant
complexity, deferring the O(N) complexity to the first call to `next`.

I don't believe that's a difference, though I could be wrong.

You’re right, I was wrong. `LazyFilterSequence` just constructs an
iterator and returns it. `LazyFilterCollection` has to loop until it
finds the first item matching the predicate in its `startIndex`
implementation. The part I was missing is that `IndexingIterator`
gets the `startIndex` in its initializer.

`startIndex` for `LazyFilterCollection` currently has O(N) complexity.
The complexity of a complete iteration doesn’t change and probably
isn’t a big deal, but it’s worth noting.

Filtered collection views always require a bit of hand-waving around
performance guarantees; I don't think that changes.

I’ve been looking at some code that wraps a sequence and considering
how it would be impacted. With iterators it looks like this:

guard let element = base.next()
else { return nil }

With collections and indices it would look something like this:

base.formIndex(after: &index)
guard index != baseEndIndex
  else { return endIndex }

let element = base[index]

That’s not too bad but it is more verbose.

Sequence today is a single-pass thing. If you are wrapping Sequence
today presumably you'd wrap an Iterator tomorrow, and you wouldn't have
to deal with indices.

If we’re going to push people towards collections and indices we
should try to make common patterns like “update the iteration state
and return the next element if there is one" simpler.

That's IndexingIterator.

Cool, I wrote this thinking that was going away.

This could be accomplished with an extension method along these lines:

guard let element = base.formIndex(after: &index, .returningOptionalElement)
   else { return endIndex }

With an implementation something like:

enum FormIndexResult {
   case returningOptionalElement
}
extension Collection {
   func formIndex(after i: inout Self.Index,
                  _ result: FormIndexResult) -> Self.Element?
}

This would provide similar functionality to `IndexingIterator` without
coupling the storage of `elements` and `position` (which is important
if you’re wrapping a collection and need to wrap the collection and
its indices independently).

I'm afraid I don't understand. Could you be more explicit about what
you have in mind?

The idea was to provide functionality similar to `IndexingIterator` in
the sense the following code would provide equivalent functionality to
`iterator.next()` but expressed in terms of a collection and an index:

let optionalElement = myCollection.formIndex(after: &myIndex, .returningOptionalElement)

vs

let optionalElement = myIterator.next()

The single case enum is just there to provide a label that
differentiates the overload.

If we’re keeping IndexingIterator this probably isn’t necessary. I
still have a use case for it but it is rather obscure.
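For concreteness, a runnable version of that sketch (the names `FormIndexResult` and `returningOptionalElement` come from the quoted proposal, not the standard library; the enum label exists only to disambiguate the overload):

```swift
enum FormIndexResult { case returningOptionalElement }

extension Collection {
    // Advances the caller-owned index and returns the element it now
    // denotes, or nil once it reaches endIndex. The caller must not pass
    // an index already equal to endIndex (formIndex(after:) would trap).
    func formIndex(after i: inout Index,
                   _ result: FormIndexResult) -> Element? {
        formIndex(after: &i)
        return i == endIndex ? nil : self[i]
    }
}

let xs = [10, 20, 30]
var i = xs.startIndex
print(xs.formIndex(after: &i, .returningOptionalElement)!)  // 20
```

The collection and the index live in separate variables, which is the decoupling of `elements` and `position` the quoted message asks for.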

I can imagine wanting a design like the above for cases where
implementing the endIndex requires adding an extra bit of state, e.g. in

  struct LazyPrefix<Base: Collection> : Collection {
    init(_ base: Base, where: (Base.Element) -> Bool)
    ...
  }

you don't want to traverse the base collection eagerly just to come up
with an endIndex, so you store an optional Base.Index in
LazyPrefix.Index, which is nil for the endIndex. In these cases, index
comparison is less efficient than it might otherwise be.

But my answer for these cases is simple: simply use a specialized
Iterator that will be more efficient than IndexingIterator. Is there a
reason that doesn't work for your case?

On second thought, I believe it is important to have a way to support
existing “partially formed” multipass sequences that don't expose
copying or equality for their iteration states.

Can you provide examples of these? I’m having difficulty thinking of
one.

NSSet is an example.

Iterator is the right way to do that. So I think we need to keep
Iterator around.

I don’t have any objection to keeping it either. :) Hopefully we’d
still be able to improve the design in the future if / when new
enabling language features come along.

In the meantime, people would be able to implement their own protocol
for single pass sequences. What they would lose is for..in as well as
the standard library algorithms. I’m not sure how many people this
would impact or how big the impact would be for them. We have seen a
couple of examples in this discussion, but probably not enough to
assess the overall impact.

One thing you don’t mention here is a distinction between finite and
infinite single-pass sequences (iterators). I don’t know if the
finite / infinite distinction is as important here, but wanted to
point it out. Obviously if we remove support for single-pass sequences
now we could defer that discussion until we’re ready to bring back
support for them.

There are a few possible answers I can think of:

1. Do the “obvious” thing and create a separate protocol for finite
  single-pass sequences

2. Decide that the combination of infinite and single-pass is rare
  enough (/dev/urandom, temperature sensor) that it's better to just
  ask people handling them to be careful and not, e.g., try to “count”
  them.

3. Decide that everything on a single-pass sequence is lazy. Since you
  can only take a single pass anyway, people won't observe their
  closures being called more often than necessary, which was the main
  motivator for making map, filter, et al. eager on collections without
  an explicit .lazy.
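A minimal sketch of what #3 could look like (the type and method names here are hypothetical, not standard library API): `map` on a single-pass iterator just wraps it, and the closure runs only as `next()` is called.

```swift
// Hypothetical lazy-by-default map for single-pass iterators.
struct LazyMapIterator<Base: IteratorProtocol, Element>: IteratorProtocol {
    var base: Base
    let transform: (Base.Element) -> Element

    mutating func next() -> Element? {
        // The transform fires only when an element is demanded.
        return base.next().map(transform)
    }
}

extension IteratorProtocol {
    // Hypothetical helper; named `lazyMap` to avoid clashing with Sequence.map.
    func lazyMap<T>(_ transform: @escaping (Element) -> T) -> LazyMapIterator<Self, T> {
        return LazyMapIterator(base: self, transform: transform)
    }
}

var doubled = [1, 2, 3].makeIterator().lazyMap { $0 * 2 }
```

Since only a single pass is possible anyway, a caller can never observe the closure running more or fewer times than with an eager `map`.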

Implications of #3:

* Any “partially-formed” multipass sequences (modeling only Iterator)
would be free to expose an accurate underestimatedCount, thereby
optimizing the process of copying into an array. The lazy filter
Iterator adaptor would have an underestimatedCount of 0.

* All algorithms that require multiple passes, such as sorted(), would
be unavailable on Iterator. You'd have to construct an Array (or
other MutableCollection) and sort that in-place. Of course,
constructing an Array from an Iterator could still go on forever if
the Iterator turned out to be infinite, which means, at some level #3
is just a refinement of #2 that makes it less error-prone.

Do you lean towards any of these?

Yes, #3.

We can always make the few operations that have to be eager—such as
Array construction from an Iterator—explicit with a label or something:

      Array(claimingFiniteness: someIterator)
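`Array(claimingFiniteness:)` is a hypothetical spelling from this thread, but the shape of such an explicitly eager initializer is straightforward; a sketch:

```swift
extension Array {
    /// Hypothetical initializer: the caller explicitly claims the
    /// iterator is finite; this loops forever if the claim is false.
    init<I: IteratorProtocol>(claimingFiniteness iterator: I) where I.Element == Element {
        self.init()
        var iterator = iterator
        while let element = iterator.next() {
            append(element)
        }
    }
}

let eager = Array(claimingFiniteness: [5, 6, 7].makeIterator())
```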

···

on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jul 1, 2016, at 11:51 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams@apple.com> wrote:
on Wed Jun 29 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:

On 29 Jun 2016, at 00:10, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

--
Dave

Swift is a language that embraces value semantics. Many common
iterators *can* be implemented with value semantics. Just because we
can’t implement *all* iterators with value semantics doesn’t mean we
should require them to have reference semantics. It just means you
can’t *assume* value semantics when working with iterators in generic
code unless / until we have a way to specify a value semantics
constraint. That’s not necessarily a bad thing especially when it
leaves the door open to interesting future possibilities.

-Matthew

I'm kind of undecided about this personally. I think one of the
problems with Swift is that the only indication that you have a
reference type is that you can declare it as a constant, yet still
call mutating methods upon it, this isn't a very positive way of
identifying it however. This may be more of a GUI/IDE issue though, in
that something being a class isn't always that obvious at a glance.

I wonder, could we somehow force iterators stored in variables to be
passed via inout? This would make it pretty clear that you're using
the same iterator and not a copy in all cases, encouraging you to
obtain another if you really do need to perform multiple passes.

I'm going to push single-pass iteration on the stack briefly and talk
about the topic that's been under discussion here: infinite multipass
sequences.

## Fitting “Infinite Multipass” Into the Model

It remains to be decided whether it's worth doing, but if it's to
happen

I definitely think it’s worth doing.

Opinions are nice, but rationales are better. How will we understand
*why* it's worth doing?

I agree.

The rationale has been discussed quite a bit already in this thread.
The current protocols do not provide the semantics many people are
assuming in their code, leading to a lot of code that is incorrect
despite the fact that it usually works in practice.

This is especially frequent in the case of the finite assumption.
This assumption is so common it seems very wise to me to encode it as
a semantic requirement in a protocol.

IMO these are problems worth addressing, especially now that we have a
good handle on what a solution would look like.

I really appreciate the attention that the library team has given to
this.

, the standard library team thinks the right design is roughly
this:

/// A multipass sequence that may be infinite
protocol Collection {

// Only eager algorithms that can terminate available here
func index(where predicate: (Element)->Bool) -> Index

// all lazy algorithms available here
var lazy: ...

var startIndex: Index
var endIndex: Index // possibly not reachable from startIndex

associatedtype SubSequence : Collection
// do we need an associated FiniteSubsequence, e.g. for prefixes?
}

protocol FiniteCollection : Collection {

// All eager algorithms available here
func map(...) ->
var count: ...
}

protocol BidirectionalCollection : Collection { ... }

protocol RandomAccessCollection : BidirectionalCollection { … }

Does this design entirely break with the relationship between
collections and iterators (dropping `makeIterator` as a protocol
requirement)? If so, would for..in (over collections) be built on top
of indices and use `formIndex(after:)`? Would it require a finite
collection (unless we add `until` to the language and then allow
`for..in..until` to work with infinite collections)?

All of these points are up for discussion.

Cool. I think the collection for..in has some nice advantages, but
since it sounds like we’ll probably keep Iterator around it might be
best to take the approach of making them both work.

You already know that I would prefer to see the current for..in built
on finite sequences and allow for..in..until to be used with infinite
sequences if we add that in the future. :)

John McCall pointed out to
me that an index-based for..in would make it possible to implement

for inout x in y { mutate(&x) }
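Roughly, such a loop could desugar to an index-based traversal that mutates through the subscript; a sketch (`mutate` here is just a placeholder for illustration):

```swift
// Placeholder mutation for illustration.
func mutate(_ x: inout Int) { x += 1 }

var y = [1, 2, 3]

// What `for inout x in y { mutate(&x) }` might desugar to:
var i = y.startIndex
while i != y.endIndex {
    mutate(&y[i])          // mutates the element in place
    y.formIndex(after: &i)
}
```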

That would be very nice!

I think it might also increase performance. I don’t know exactly how
for..in is implemented today, but the implementation of
IndexingIterator compares position to endIndex. If for..in is also
checking the optional for nil, that's an extra comparison.
We shouldn't need to actually construct the optional in the first
place using an index-based for..in. Maybe optimizations like this
already exist? But even if they do, it seems like they wouldn’t be
possible in some cases where the type of the sequence isn’t statically
known.

Would we still retain `IndexingIterator`even if we break the
relationship in the protocol requirements?

Yes: it should be possible to implement Collection algorithms in terms
of Iterator algorithms, and IndexingIterator provides the means. That
said, I think the makeIterator requirement does little harm, especially
when it can be defaulted for Collections.

I like this answer.

Would it still be possible to do things like zip a multi-pass sequence
with a single-pass sequence (assuming we keep single-pass sequences or
add them back eventually)? This seems like a use case worth
supporting in some way.

Yes. If you can create an Iterator from a Collection, and you can zip
Iterators, you can do this.

Yes, of course. I’m glad we would keep this relationship intact.

One subtle change I think this implies is that things like
`LazyFilterSequence` can implement `makeIterator` with constant
complexity, deferring the O(N) complexity to the first call to `next`.

I don't believe that's a difference, though I could be wrong.

You’re right, I was wrong. `LazyFilterSequence` just constructs an
iterator and returns it. `LazyFilterCollection` has to loop until it
finds the first item matching the predicate in its `startIndex`
implementation. The part I was missing is that `IndexingIterator`
gets the `startIndex` in its initializer.

`startIndex` for `LazyFilterCollection` currently has O(N) complexity.
The complexity of a complete iteration doesn’t change and probably
isn’t a big deal, but it’s worth noting.

Filtered collection views always require a bit of hand-waving around
performance guarantees; I don't think that changes.

I’ve been looking at some code that wraps a sequence and considering
how it would be impacted. With iterators it looks like this:

guard let element = base.next()
else { return nil }

With collections and indices it would look something like this:

base.formIndex(after: &index)
guard index != baseEndIndex
else { return endIndex }

let element = base[index]

That’s not too bad but it is more verbose.

Sequence today is a single-pass thing. If you are wrapping Sequence
today presumably you'd wrap an Iterator tomorrow, and you wouldn't have
to deal with indices.

If we’re going to push people towards collections and indices we
should try to make common patterns like “update the iteration state
and return the next element if there is one" simpler.

That's IndexingIterator.

Cool, I wrote this thinking that was going away.

This could be accomplished with an extension method along these lines:

guard let element = base.formIndex(after: &index, .returningOptionalElement)
  else { return endIndex }

With an implementation something like:

enum FormIndexResult {
  case returningOptionalElement
}
extension Collection {
  func formIndex(after i: inout Self.Index,
                 _ result: FormIndexResult) -> Self.Element?
}

This would provide similar functionality to `IndexingIterator` without
coupling the storage of `elements` and `position` (which is important
if you’re wrapping a collection and need to wrap the collection and
its indices independently).

I'm afraid I don't understand. Could you be more explicit about what
you have in mind?

The idea was to provide functionality similar to `IndexingIterator` in
the sense the following code would provide equivalent functionality to
`iterator.next()` but expressed in terms of a collection and an index:

let optionalElement = myCollection.formIndex(after: &myIndex, .returningOptionalElement)

vs

let optionalElement = myIterator.next()

The single case enum is just there to provide a label that
differentiates the overload.
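Filled out into compilable form, the sketch might read as follows (assuming `next()`-style semantics: return the element at the current position, then advance):

```swift
/// A single-case enum used only to label the overload.
enum FormIndexResult {
    case returningOptionalElement
}

extension Collection {
    /// Returns the element at `i` and advances `i`, or returns nil
    /// once `i` has reached `endIndex`, mirroring `Iterator.next()`.
    func formIndex(after i: inout Index, _ result: FormIndexResult) -> Element? {
        guard i != endIndex else { return nil }
        let element = self[i]
        formIndex(after: &i)
        return element
    }
}

let letters = ["a", "b", "c"]
var index = letters.startIndex
var collected: [String] = []
while let element = letters.formIndex(after: &index, .returningOptionalElement) {
    collected.append(element)
}
```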

If we’re keeping IndexingIterator this probably isn’t necessary. I
still have a use case for it but it is rather obscure.

I can imagine wanting a design like the above for cases where
implementing the endIndex requires adding an extra bit of state, e.g. in

struct LazyPrefix<Base: Collection> : Collection {
  init(_ base: Base, where: (Base.Element)->Bool)
  ...
}

you don't want to traverse the base collection eagerly just to come up
with an endIndex, so you store an optional Base.Index in
LazyPrefix.Index, which is nil for the endIndex. In these cases, index
comparison is less efficient than it might otherwise be.

But my answer for these cases is simple: simply use a specialized
Iterator that will be more efficient than IndexingIterator. Is there a
reason that doesn't work for your case?

I would use a custom iterator in my case, but I am talking about code that would live inside the implementation of `index(after:)` in the `Collection` conformance of a wrapper `Collection` that bears some resemblance to `LazyFlattenCollection`. In this case you are receiving your `Index` which wraps `Base.Index`.

Like I said, this is a pretty obscure case and I would never suggest including it on the grounds of a use case that is only relevant to `index(after:)` implementations in wrapper collections. :) I brought it up because I thought you may have been suggesting a more drastic change that removed the collection iterators.

On second thought, I believe it is important to have a way to support
existing “partially formed” multipass sequences that don't expose
copying or equality for their iteration states.

Can you provide examples of these? I’m having difficulty thinking of
one.

NSSet is an example.

Iterator is the right way to do that. So I think we need to keep
Iterator around.

I don’t have any objection to keeping it either. :) Hopefully we’d
still be able to improve the design in the future if / when new
enabling language features come along.

In the meantime, people would be able to implement their own protocol
for single pass sequences. What they would lose is for..in as well as
the standard library algorithms. I’m not sure how many people this
would impact or how big the impact would be for them. We have seen a
couple of examples in this discussion, but probably not enough to
assess the overall impact.

One thing you don’t mention here is a distinction between finite and
infinite single-pass sequences (iterators). I don’t know if the
finite / infinite distinction is as important here, but wanted to
point it out. Obviously if we remove support for single-pass sequences
now we could defer that discussion until we’re ready to bring back
support for them.

There are a few possible answers I can think of:

1. Do the “obvious” thing and create a separate protocol for finite
single-pass sequences

2. Decide that the combination of infinite and single-pass is rare
enough (/dev/urandom, temperature sensor) that it's better to just
ask people handling them to be careful and not, e.g., try to “count”
them.

3. Decide that everything on a single-pass sequence is lazy. Since you
can only take a single pass anyway, people won't observe their
closures being called more often than necessary, which was the main
motivator for making map, filter, et al. eager on collections without
an explicit .lazy.

Implications of #3:

* Any “partially-formed” multipass sequences (modeling only Iterator)
would be free to expose an accurate underestimatedCount, thereby
optimizing the process of copying into an array. The lazy filter
Iterator adaptor would have an underestimatedCount of 0.

* All algorithms that require multiple passes, such as sorted(), would
be unavailable on Iterator. You'd have to construct an Array (or
other MutableCollection) and sort that in-place. Of course,
constructing an Array from an Iterator could still go on forever if
the Iterator turned out to be infinite, which means, at some level #3
is just a refinement of #2 that makes it less error-prone.

Do you lean towards any of these?

Yes, #3.

We can always make the few operations that have to be eager—such as
Array construction from an Iterator—explicit with a label or something:

     Array(claimingFiniteness: someIterator)

This makes sense. Finite single-pass iterators can always be added in the future if compelling use cases emerge. We’re not taking anything away.

All of the code I have looked at that makes a finite assumption would be converted to require `Collection` in the new model.

···

On Jul 1, 2016, at 6:32 PM, Dave Abrahams <dabrahams@apple.com> wrote:
on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jul 1, 2016, at 11:51 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams@apple.com> wrote:
on Wed Jun 29 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:

On 29 Jun 2016, at 00:10, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

--
Dave

Correct.

···

on Tue Jul 05 2016, David Waite <david-AT-alkaline-solutions.com> wrote:

On Jul 1, 2016, at 9:59 AM, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams@apple.com> wrote:

Q: Why not allow endIndex to have a different type from startIndex?
A: It appears to offer insufficient benefit for the associated
complexity in typical usage. A classic use case that argues for a
different endIndex type is the null-terminated C string. But you
can't index one of those safely without actually counting the length,
and once you've done that you can make the endIndex an Int.

It’s also worth noting that we can use `Optional` with `nil` as the `endIndex` sentinel if necessary.

I don’t believe this is true? Optional cannot support Equatable w/o
generics changes. I believe we would need a different wrapper type.

--
Dave

Q: Why not allow endIndex to have a different type from startIndex?
A: It appears to offer insufficient benefit for the associated
complexity in typical usage. A classic use case that argues for a
different endIndex type is the null-terminated C string. But you
can't index one of those safely without actually counting the length,
and once you've done that you can make the endIndex an Int.

It’s also worth noting that we can use `Optional` with `nil` as the `endIndex` sentinel if necessary.

I don’t believe this is true? Optional cannot support Equatable w/o generics changes. I believe we would need a different wrapper type.

That's true right now, but conditional conformances are very near the top, if not the top, item on the generics priority list. I'm hoping they make Swift 3.x rather than 4.0 (but have no knowledge of the likelihood of that).

···

Sent from my iPad

On Jul 5, 2016, at 3:32 PM, David Waite <david@alkaline-solutions.com> wrote:

On Jul 1, 2016, at 9:59 AM, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:
On Jun 30, 2016, at 12:26 PM, Dave Abrahams <dabrahams@apple.com> wrote:

The fact that Swift doesn't have a mutation model for classes, though,
means that mutating methods on a class constrained protocol can't be
labeled as such. So consuming operations on a class-constrained
Iterator protocol would not be labeled as mutating.

The standard library team is currently trying to evaluate the tradeoffs
in this area. One possibility under consideration is simply dropping
support for single-pass sequences until Swift can support move-only
value types and/or gets a mutation model for class instances. It would
be very interesting to know about any real-world models of single-pass
sequences that people are using in Swift, since we don't supply any in
the standard library.

I’m happy to see you mention a mutation model for class instances! (I don’t mean to sidetrack the discussion, but would love to see that someday)

I don’t have any objection to dropping support for single-pass sequences temporarily. It’s possible that I would feel differently if I was making use of them in my own code but I’m not.

I might see an issue if there were enough single-pass sequences that we worried about them trying to wedge themselves into Collection.

In the meantime, people would be able to implement their own protocol for single pass sequences. What they would lose is for..in as well as the standard library algorithms. I’m not sure how many people this would impact or how big the impact would be for them. We have seen a couple of examples in this discussion, but probably not enough to asses the overall impact.

I “feel” the model for Iterator as a separate possibly single-pass/infinite thing makes sense. But I also suspect unique requirements will come up for I/O use cases; who knows if some concurrency primitives might provide a better fit at that point in time?

-DW

I use it in a LazyRowSequence<T: SqlModelConvertible> where
querying Sqlite in WAL mode allows multiple concurrent readers to
get point-in-time snapshots of the database. The query can’t be
replayed without buffering all the rows in memory because Sqlite’s
step functions are not bi-directional. In some cases we are talking
about tens of thousands of rows (or even hundreds of thousands) and
the ability to avoid buffering them is a feature, not a bug.

IMHO the typical case for single-pass is IO.

Yes. Also, truly-random number generators.

In this case it would work just as well as LazyRowIterator<T> assuming
the language allows for-in on an iterator.

So, you're not interested in using algorithms like map and filter on
these things?

If it was just about for-in, we could say you can for-in over any
instance of

  ()->T?
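For what it's worth, `AnyIterator` in today's standard library is close to exactly this: it wraps a `() -> T?` closure and conforms to `Sequence`, so it is already for-in-able:

```swift
var n = 0
let upToThree = AnyIterator<Int> {
    n += 1
    return n <= 3 ? n : nil   // returning nil ends the iteration
}

var seen: [Int] = []
for x in upToThree {          // a single pass over the closure's output
    seen.append(x)
}
```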

Fair point; we actually do take advantage of the lazy versions of
filter and map.

Does it make sense to say that single-pass sequences are always lazy?

It might. Would it be inconvenient?

              Iterable
             /        \
   LazyIterable    EagerIterable
        |                |
  LazyCollection     Collection

LazyCollection is a wrapper around Collection so you can still use
.lazy the way you would today.

Is it a protocol? Wrappers are usually generics.

LazyIterables are single-pass. EagerIterables are multi-pass.
Maybe this doesn’t work because we’re talking about somewhat
orthogonal things? I could imagine trying to model the idea of single
vs multi pass and lazy vs eager separately:

The goal of generic programming is not to separate every idea into its
own protocol. Discovering the natural dependency relationships and
clusters of requirements and capabilities is a crucial part of it. As
far as I can tell, lazy operations are always appropriate.

···

on Thu Jun 30 2016, Russ Bishop <xenadu-AT-gmail.com> wrote:

On Jun 30, 2016, at 3:37 PM, Dave Abrahams <dabrahams@apple.com> wrote:

protocol Iterable {
    associatedtype Iterator: IteratorProtocol
    associatedtype Element = Iterator.Element
    func makeIterator() -> Iterator
}
protocol MultiIterable: Iterable { }
protocol SingleIterable: Iterable { }
protocol LazyIterable: Iterable { }
protocol EagerIterable: Iterable { }

extension MultiIterable where Self: EagerIterable {
    func map<U>(t: @noescape (Element) -> U) -> [U] { }
}

extension MultiIterable where Self: LazyIterable {
    func map<U>(t: (Element) -> U) -> LazyMapMultiIterable<Self> { }
}

extension SingleIterable where Self: LazyIterable {
    func map<U>(t: (Element) -> U) -> LazyMapSingleIterable<Self> { }
}

I guess it comes back to what you and others have pointed out - it
might not be worth the effort to provide this level of flexibility.

--
Dave

All multi-pass sequences can benefit from subscripts.

Sorry, not really what I meant, but rather; how many sequences are
really going to use them?

The subscript and index aren't there for the sequence's benefit, and you
can't know how users will want to use the sequence. The whole point of
making a type conform to a generic protocol is that you can't anticipate
all of the type's uses, so you want to make it work with as much other
code as possible. If I'm defining a new sequence type, how can I know
whether someone might want to use index(of:) or slicing on it?

This isn’t entirely true. Sometimes we know *exactly* how a type will
be used in an application. It conforms to a generic protocol in order
to take advantage of pre-existing generic code (algorithms, etc) in
the standard library (or other libraries).

If the library contains algorithms expressed in terms of the weakest
constraints possible (i.e. the current `Sequence` algorithms)
sometimes we can accomplish what is necessary without needing to
implement additional members on our type, even if it is possible to
implement them and theoretically useful (but not useful to the current
requirements of the application).

Yes, but requirements clustering is absolutely essential to creating
coherent generic libraries. Otherwise you end up with a billion little
granular protocols, to no real purpose.

That said, I generally like the direction you’re suggesting.

I have been considering the incremental complexity of conforming to
`Collection` rather than `Sequence` carefully. The one place that I
still need to think further about is the `Comparable` requirement on
indices and whether that might lead to nontrivial complexity over a
`Sequence` conformance.

It might, but let's drop it. The only *real* reason we have it is to
provide nice precondition checking for Range construction, and we can
simply make that conditional on comparability.

···

on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jul 1, 2016, at 9:47 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Fri Jul 01 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:

On 30 Jun 2016, at 23:39, Dave Abrahams <dabrahams@apple.com> wrote:

--
Dave

If Iterators become reference types that model single-pass sequences and
becomes for-in-able, as the write-up suggests, couldn't Sequence be
stipulated to be multipass and retain its refinement relationship with
Collection?

AFAIK there is no interesting multipass Sequence that cannot reasonably be
made to support indexing.

There *is* existing code that exposes multipass data structures without
exposing the ability to compare iteration state for equality.

It’s worth noting that indices require comparability, not just
equality. I think comparability might cause more difficulty than
equality (but haven’t thought too hard about it).

It does, but I am ever more strongly motivated to drop the comparability
requirement.

Dropping that would eliminate the questions that are still lingering
in my mind about getting rid of `Sequence`.

Are there any important algorithms that rely on the ability to compare
indices?

No. When the collection has random access, you can always measure
distance using the collection and use its sign. Algorithms that depend
on that ability all require random access anyway. If the collection is
not random-access but the Index type is comparable, you can use that as
well, which may make some new things possible.

In every case I can think of, index equality could easily have been
exposed, but wasn't. These designs can't be adapted to model
Collection.

Why can’t they be adapted to model Collection if equality could have
been exposed? Is it because comparability would be difficult?

Comparabile refines Equatable, so it's at *least* as hard ;-)

Right. You only mentioned that index equality could have been exposed
but didn’t mention comparability. I was wondering whether the
potential additional difficulty is why they couldn’t model Collection.

Those designs are real, but I am unconvinced they are worth supporting
directly with a separate protocol in the standard library; I'm willing
to accept the idea that those data structures will simply be limited to
modeling Iterator.

Can you elaborate on what designs / data structures you’re talking
about here?

Cocoa Dictionaries and Sets are examples. The enumeration interface
doesn't have any facility for copying or comparing enumeration state.

What impact would that have on the bridged value types (which is how
we should be using those in Swift).

None. Today we have a horrible workaround in Swift to make those model
Collection: storing an array of keys. At some point we can get
Foundation to do something friendlier for us, but in the meantime,
that still works.

···

on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jul 1, 2016, at 11:07 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jun 30, 2016, at 5:32 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
on Thu Jun 30 2016, Xiaodi Wu <xiaodi.wu-AT-gmail.com> wrote:

--
Dave

All multi-pass sequences can benefit from subscripts.

Sorry, not really what I meant, but rather; how many sequences are
really going to use them?

The subscript and index aren't there for the sequence's benefit, and you
can't know how users will want to use the sequence. The whole point of
making a type conform to a generic protocol is that you can't anticipate
all of the type's uses, so you want to make it work with as much other
code as possible. If I'm defining a new sequence type, how can I know
whether someone might want to use index(of:) or slicing on it?

This isn’t entirely true. Sometimes we know *exactly* how a type will
be used in an application. It conforms to a generic protocol in order
to take advantage of pre-existing generic code (algorithms, etc) in
the standard library (or other libraries).

If the library contains algorithms expressed in terms of the weakest
constraints possible (i.e. the current `Sequence` algorithms)
sometimes we can accomplish what is necessary without needing to
implement additional members on our type, even if it is possible to
implement them and theoretically useful (but not useful to the current
requirements of the application).

Yes, but requirements clustering is absolutely essential to creating
coherent generic libraries. Otherwise you end up with a billion little
granular protocols, to no real purpose.

That said, I generally like the direction you’re suggesting.

I have been considering the incremental complexity of conforming to
`Collection` rather than `Sequence` carefully. The one place that I
still need to think further about is the `Comparable` requirement on
indices and whether that might lead to nontrivial complexity over a
`Sequence` conformance.

It might, but let's drop it. The only *real* reason we have it is to
provide nice precondition checking for Range construction, and we can
simply make that conditional on comparability.

Cool. This sounds like a good approach.

···

On Jul 1, 2016, at 11:54 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:

On Jul 1, 2016, at 9:47 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Fri Jul 01 2016, Haravikk <swift-evolution-AT-haravikk.me> wrote:

On 30 Jun 2016, at 23:39, Dave Abrahams <dabrahams@apple.com> wrote:

--
Dave

Swift is a language that embraces value semantics. Many common
iterators *can* be implemented with value semantics. Just because we
can’t implement *all* iterators with value semantics doesn’t mean we
should require them to have reference semantics. It just means you
can’t *assume* value semantics when working with iterators in generic
code unless / until we have a way to specify a value semantics
constraint. That’s not necessarily a bad thing especially when it
leaves the door open to interesting future possibilities.

-Matthew

I'm kind of undecided about this personally. I think one of the
problems with Swift is that the only indication that you have a
reference type is that you can declare it as a constant, yet still
call mutating methods upon it, this isn't a very positive way of
identifying it however. This may be more of a GUI/IDE issue though, in
that something being a class isn't always that obvious at a glance.

I wonder, could we somehow force iterators stored in variables to be
passed via inout? This would make it pretty clear that you're using
the same iterator and not a copy in all cases, encouraging you to
obtain another if you really do need to perform multiple passes.

I'm going to push single-pass iteration on the stack briefly and talk
about the topic that's been under discussion here: infinite multipass
sequences.

## Fitting “Infinite Multipass” Into the Model

It remains to be decided whether it's worth doing, but if it's to
happen

I definitely think it’s worth doing.

Opinions are nice, but rationales are better. How will we understand
*why* it's worth doing?

I agree.

The rationale has been discussed quite a bit already in this thread.
The current protocols do not provide the semantics many people are
assuming in their code, leading to a lot of code that is incorrect
despite the fact that it usually works in practice.

This is especially frequent in the case of the finite assumption.
This assumption is so common it seems very wise to me to encode it as
a semantic requirement in a protocol.

IMO these are problems worth addressing, especially now that we have a
good handle on what a solution would look like.

I really appreciate the attention that the library team has given to
this.

, the standard library team thinks the right design is roughly
this:

/// A multipass sequence that may be infinite
protocol Collection {

  // Only eager algorithms that can terminate are available here
  func index(where predicate: (Element) -> Bool) -> Index

  // All lazy algorithms are available here
  var lazy: ...

  var startIndex: Index
  var endIndex: Index // possibly not reachable from startIndex

  associatedtype SubSequence : Collection
  // do we need an associated FiniteSubsequence, e.g. for prefixes?
}

protocol FiniteCollection : Collection {

  // All eager algorithms are available here
  func map(...) -> ...
  var count: ...
}

protocol BidirectionalCollection : Collection { ... }

protocol RandomAccessCollection : BidirectionalCollection { … }
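For concreteness, here is how the split plays out in today's Swift, where an infinite source is expressible via `sequence(first:next:)`: lazy algorithms remain safe on it, while whole-sequence eager algorithms are the kind that would be confined to `FiniteCollection` under the sketch above. (This is an illustration of the distinction, not the proposed API.)

```swift
// An infinite source of the natural numbers.
let naturals = sequence(first: 0, next: { $0 + 1 })

// Lazy algorithms are safe on infinite sources: nothing runs until
// elements are demanded, and prefix(_:) bounds the demand.
let evens = naturals.lazy.filter { $0 % 2 == 0 }
let firstThreeEvens = Array(evens.prefix(3))
print(firstThreeEvens) // [0, 2, 4]

// Eager, whole-sequence algorithms such as sorted() or count would
// never terminate here, which is why the sketch reserves them for
// the finite refinement.
```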

Does this design entirely break with the relationship between
collections and iterators (dropping `makeIterator` as a protocol
requirement)? If so, would for..in (over collections) be built on top
of indices and use `formIndex(after:)`? Would it require a finite
collection (unless we add `until` to the language and then allow
`for..in..until` to work with infinite collections)?

All of these points are up for discussion.

Cool. I think the collection for..in has some nice advantages, but
since it sounds like we’ll probably keep Iterator around it might be
best to take the approach of making them both work.

You already know that I would prefer to see the current for..in built
on finite sequences, and to allow for..in..until to be used with
infinite sequences if we add that in the future. :)

John McCall pointed out to
me that an index-based for..in would make it possible to implement

for inout x in y { mutate(&x) }

That would be very nice!
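A sketch of what the index-based desugaring could enable, written as an ordinary function over a concrete collection in today's Swift (the `for inout` transform itself is hypothetical; this just shows the index-driven loop it would lower to):

```swift
// Hypothetical lowering of `for inout x in y { y-element += 1 }`:
// iterate by index and mutate through the subscript, rather than
// iterating over copies produced by an iterator.
func incrementAll(_ y: inout [Int]) {
    var i = y.startIndex
    while i != y.endIndex {
        y[i] += 1            // mutable access to the element in place
        y.formIndex(after: &i)
    }
}

var numbers = [1, 2, 3]
incrementAll(&numbers)
print(numbers) // [2, 3, 4]
```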

I think it might also increase performance. I don't know exactly how
for..in is implemented today, but the implementation of
IndexingIterator compares position to endIndex. If for..in is also
checking the optional for nil, that's an extra comparison. With an
index-based for..in we wouldn't need to construct the optional in the
first place. Maybe optimizations like this already exist? But even if
they do, it seems like they wouldn't be possible in some cases where
the type of the sequence isn't statically known.

Would we still retain `IndexingIterator` even if we break the
relationship in the protocol requirements?

Yes: it should be possible to implement Collection algorithms in terms
of Iterator algorithms, and IndexingIterator provides the means. That
said, I think the makeIterator requirement does little harm, especially
when it can be defaulted for Collections.

I like this answer.

Would it still be possible to do things like zip a multi-pass sequence
with a single-pass sequence (assuming we keep single-pass sequences or
add them back eventually)? This seems like a use case worth
supporting in some way.

Yes. If you can create an Iterator from a Collection, and you can zip
Iterators, you can do this.

Yes, of course. I’m glad we would keep this relationship intact.

One subtle change I think this implies is that things like
`LazyFilterSequence` can implement `makeIterator` with constant
complexity, deferring the O(N) complexity to the first call to `next`.

I don't believe that's a difference, though I could be wrong.

You’re right, I was wrong. `LazyFilterSequence` just constructs an
iterator and returns it. `LazyFilterCollection` has to loop until it
finds the first item matching the predicate in its `startIndex`
implementation. The part I was missing is that `IndexingIterator`
gets the `startIndex` in its initializer.

`startIndex` for `LazyFilterCollection` currently has O(N) complexity.
The complexity of a complete iteration doesn’t change and probably
isn’t a big deal, but it’s worth noting.
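The O(N) `startIndex` is easy to observe in today's standard library; a small sketch counting predicate calls (the exact count assumes the current `LazyFilterCollection` behavior, where `startIndex` scans from the base's start until the first match):

```swift
var calls = 0
let filtered = (1...1000).lazy.filter { n -> Bool in
    calls += 1
    return n >= 10
}

// Merely asking for startIndex runs the predicate over 1...10,
// because the first matching element must be located up front.
let start = filtered.startIndex
print(calls) // 10 under the current implementation
_ = start
```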

Filtered collection views always require a bit of hand-waving around
performance guarantees; I don't think that changes.

I’ve been looking at some code that wraps a sequence and considering
how it would be impacted. With iterators it looks like this:

guard let element = base.next()
else { return nil }

With collections and indices it would look something like this:

base.formIndex(after: &index)
guard index != baseEndIndex
else { return endIndex }

let element = base[index]

That’s not too bad but it is more verbose.

Sequence today is a single-pass thing. If you are wrapping Sequence
today presumably you'd wrap an Iterator tomorrow, and you wouldn't have
to deal with indices.

If we’re going to push people towards collections and indices we
should try to make common patterns like “update the iteration state
and return the next element if there is one" simpler.

That's IndexingIterator.

Cool, I wrote this thinking that was going away.

This could be accomplished with an extension method along these lines:

guard let element = base.formIndex(after: &index, .returningOptionalElement)
  else { return endIndex }

With an implementation something like:

enum FormIndexResult {
  case returningOptionalElement
}
extension Collection {
  func formIndex(after i: inout Self.Index,
                 _ result: FormIndexResult) -> Self.Element?
}

This would provide similar functionality to `IndexingIterator` without
coupling the storage of `elements` and `position` (which is important
if you’re wrapping a collection and need to wrap the collection and
its indices independently).

I'm afraid I don't understand. Could you be more explicit about what
you have in mind?

The idea was to provide functionality similar to `IndexingIterator` in
the sense that the following code would provide equivalent
functionality to `iterator.next()`, but expressed in terms of a
collection and an index:

let optionalElement = myCollection.formIndex(after: &myIndex, .returningOptionalElement)

vs

let optionalElement = myIterator.next()

The single case enum is just there to provide a label that
differentiates the overload.

If we’re keeping IndexingIterator this probably isn’t necessary. I
still have a use case for it but it is rather obscure.
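A runnable sketch of the labeled overload described above, under the stated assumption that the single-case enum exists purely to disambiguate the call site; none of these names are standard library API:

```swift
enum FormIndexResult {
    case returningOptionalElement
}

extension Collection {
    /// Advances `i` and returns the element at the new position,
    /// or nil once `i` reaches `endIndex`.
    /// - Precondition: `i` is a valid index other than `endIndex`.
    func formIndex(after i: inout Index,
                   _ result: FormIndexResult) -> Element? {
        formIndex(after: &i)
        return i == endIndex ? nil : self[i]
    }
}

let base = [10, 20, 30]
var index = base.startIndex
let element = base.formIndex(after: &index, .returningOptionalElement)
print(element as Any) // Optional(20)
```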

I can imagine wanting a design like the above for cases where
implementing the endIndex requires adding an extra bit of state, e.g. in

struct LazyPrefix<Base: Collection> : Collection {
   init(_ base: Base, where: (Base.Element) -> Bool)
   ...
}

you don't want to traverse the base collection eagerly just to come up
with an endIndex, so you store an optional Base.Index in
LazyPrefix.Index, which is nil for the endIndex. In these cases, index
comparison is less efficient than it might otherwise be.
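A sketch of the Index shape described here, with hypothetical names: storing an optional `Base.Index`, where `nil` plays the role of `endIndex`, avoids eagerly traversing the base at construction, at the price of an extra Optional check on every comparison:

```swift
struct LazyPrefixIndex<Base: Collection>: Comparable {
    // nil means "past the last matching element", i.e. endIndex.
    var base: Base.Index?

    static func == (l: Self, r: Self) -> Bool { l.base == r.base }
    static func < (l: Self, r: Self) -> Bool {
        switch (l.base, r.base) {
        case let (a?, b?): return a < b
        case (_?, nil):    return true   // any real index precedes endIndex
        default:           return false
        }
    }
}

let i = LazyPrefixIndex<[Int]>(base: 0)
let end = LazyPrefixIndex<[Int]>(base: nil)
print(i < end) // true
```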

But my answer for these cases is simple: simply use a specialized
Iterator that will be more efficient than IndexingIterator. Is there a
reason that doesn't work for your case?

I would index a custom iterator in my case,

I don't know what it means to index an iterator.

but I am talking about code that would live inside the implementation
of `index(after:)` in the `Collection` conformance of a wrapper
`Collection` that bears some resemblance to `LazyFlattenCollection`.
In this case you are receiving your `Index` which wraps `Base.Index`.

I don't understand, but maybe it doesn't matter. FWIW, I'm pretty
confident we could write a specialized Collection protocol that, given
some equatable State and a next() method, supplied all the basic
Collection requirements, for the cases where the Iterator is the most
efficient means of traversal, e.g.

      protocol IteratingCollection : Collection {
        associatedtype IterationState : Equatable
        func next(_ state: inout IterationState) -> Element?
        func startState() -> IterationState
      }

      extension IteratingCollection {
        // all the collection requirements implemented in terms
        // of startState() and next(...)
      }
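In today's Swift, the shape Dave sketches can be approximated by letting the `Index` wrap the equatable state directly; a minimal, hypothetical example (none of these names are real API):

```swift
// A collection driven by an Equatable state plus a "next" step:
// counts start, start-1, ..., 1. The Index wraps the state, and
// endIndex is the state from which the step would produce nothing.
struct CountdownCollection: Collection {
    let start: Int

    struct Index: Comparable, Equatable {
        var state: Int
        // Larger remaining state means an *earlier* position.
        static func < (l: Index, r: Index) -> Bool { l.state > r.state }
    }

    var startIndex: Index { Index(state: start) }
    var endIndex: Index { Index(state: 0) }
    subscript(i: Index) -> Int { i.state }
    func index(after i: Index) -> Index { Index(state: i.state - 1) }
}

print(Array(CountdownCollection(start: 3))) // [3, 2, 1]
```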

Like I said, this is a pretty obscure case and I would never suggest
including it on the grounds of a use case that is only relevant to
`index(after:)` implementations in wrapper collections. :) I brought
it up because I thought you may have been suggesting a more drastic
change that removed the collection iterators.

On second thought, I believe it is important to have a way to support
existing “partially formed” multipass sequences that don't expose
copying or equality for their iteration states.

Can you provide examples of these? I’m having difficulty thinking of
one.

NSSet is an example.

Iterator is the right way to do that. So I think we need to keep
Iterator around.

I don’t have any objection to keeping it either. :) Hopefully we’d
still be able to improve the design in the future if / when new
enabling language features come along.

In the meantime, people would be able to implement their own protocol
for single-pass sequences. What they would lose is for..in as well as
the standard library algorithms. I’m not sure how many people this
would impact or how big the impact would be for them. We have seen a
couple of examples in this discussion, but probably not enough to
assess the overall impact.

One thing you don’t mention here is a distinction between finite and
infinite single-pass sequences (iterators). I don’t know if the
finite / infinite distinction is as important here, but wanted to
point it out. Obviously if we remove support for single-pass
sequences now we could defer that discussion until we’re ready to
bring back support for them.

There are a few possible answers I can think of:

1. Do the “obvious” thing and create a separate protocol for finite
single-pass sequences

2. Decide that the combination of infinite and single-pass is rare
enough (/dev/urandom, temperature sensor) that it's better to just
ask people handling them to be careful and not, e.g., try to “count”
them.

3. Decide that everything on a single-pass sequence is lazy. Since you
can only take a single pass anyway, people won't observe their
closures being called more often than necessary, which was the main
motivator for making map, filter, et al. eager on collections without
an explicit .lazy.

Implications of #3:

* Any “partially-formed” multipass sequences (modeling only Iterator)
would be free to expose an accurate underestimatedCount, thereby
optimizing the process of copying into an array. The lazy filter
Iterator adaptor would have an underestimatedCount of 0.

* All algorithms that require multiple passes, such as sorted(), would
be unavailable on Iterator. You'd have to construct an Array (or
other MutableCollection) and sort that in-place. Of course,
constructing an Array from an Iterator could still go on forever if
the Iterator turned out to be infinite, which means, at some level #3
is just a refinement of #2 that makes it less error-prone.

Do you lean towards any of these?

Yes, #3.

We can always make the few operations that have to be eager—such as
Array construction from an Iterator—explicit with a label or something:

     Array(claimingFiniteness: someIterator)
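The `claimingFiniteness` label is hypothetical (nothing like it exists in the standard library today), but the shape is implementable; a sketch:

```swift
extension Array {
    /// Eagerly drains `iterator` into a new array. The label makes the
    /// caller's finiteness claim explicit: if the iterator is infinite,
    /// this never returns.
    init<I: IteratorProtocol>(claimingFiniteness iterator: I)
        where I.Element == Element {
        self.init()
        var it = iterator
        while let element = it.next() { append(element) }
    }
}

let someIterator = [1, 2, 3].makeIterator()
print(Array(claimingFiniteness: someIterator)) // [1, 2, 3]
```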

This makes sense. Finite single-pass iterators can always be added in
the future if compelling use cases emerge. We’re not taking anything
away.

All of the code I have looked at that makes a finite assumption would
be converted to require `Collection` in the new model.

I think you mean FiniteCollection, yes?

···

on Fri Jul 01 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:


--
Dave