[Pitch] Remove destructive consumption from Sequence


(David Waite) #1

Today, a Sequence differs from a Collection in that:

- A sequence can be infinitely or indefinitely sized, or could require an O(n) operation to count the values in the sequence. A collection has a finite number of elements, and the fixed size is exposed as an O(1) or O(n) operation via ‘count’

- A collection is indexable, with those indices being usable for various operations including forming subsets, comparisons, and manual iteration

- A sequence may or may not be destructive, where a destructive sequence consumes elements during traversal, making them unavailable on subsequent traversals. Collection operations are required to be non-destructive

I would like to Pitch removing this third differentiation, the option for destructive sequences.

My main motivation for proposing this is the potential for developer confusion. As stated during one of the previous threads on the naming of map, flatMap, filter, etc. methods on Sequence, Sequence has a naming requirement not typical of the rest of the Swift standard library in that many methods on Sequence may or may not be destructive. As such, naming methods for any extensions on Sequence is challenging as the names need to not imply immutability.

It would still be possible to have Generators which operate destructively, but such Generators would not conform to the needs of Sequence. As such, the most significant impact would be the inability to use such Generators in a for..in loop, although one could make the case for a lower-level Iterable-style interface for requesting destructive-or-nondestructive generators for the purpose of generic algorithms which do not care whether the data given is consumed as part of them doing their work. I do not make this case here, as instead I plan to make the case that destructive generators would be a rare beast.

From the Swift project documentation at https://github.com/apple/swift/blob/d95921e5a838d7cc450f6fbc2072bd1a5be95e24/docs/SequencesAndCollections.rst#sequences

"Because this construct is generic, s could be

  • an array
  • a set
  • a linked list
  • a series of UI events
  • a file on disk
  • a stream of incoming network packets
  • an infinite series of random numbers
  • a user-defined data structure
  • etc.”

The disruption of such a change on this list of examples would likely be:

- The series of UI events from a UI framework likely indicates a queue, where iterating over a generator would consume items from the head of that queue.

However, this is not typically how events are handled in such systems - instead, events are often represented either via an event loop dispatch, registered calls made by a thread pool, or a reactive mechanism. Such a stream of incoming UI events would likely be blocking, as such a signaling method for new events would still be needed at the queue level. When you consider UI events are already usually serialized onto a single thread, using a queue at the application level is an extra complexity over the event queue that is already being used at the runloop level or kernel level.

- A file on disk would likely be iterating as a series of UInt8 values, Characters, String lines, or other pattern matches. If built at a low enough level, such as on top of NSInputStream, this would also represent reading a file from a URL.

In this case there are three example behaviors I’d like to call attention to:
1. The developer wants to operate on the file as a set of data, in which case one would expect a Data value, String, or [String] to be created to represent the scenarios above.
2. The developer wants to parse a structured format, in which case such iteration behaviors would likely be insufficient
3. The developer wants to iterate on the input data as a stream, without waiting for the data to fully arrive or retaining it in memory. I suspect there is less overlap with this sort of developer and the developer who wants a framework-provided String-per-line iteration.

- Streams of incoming network packets build on the two previous points:
Like UI events, a stream of incoming network packets may be better suited to an event dispatch or reactive mechanism. Like file access, a stream of incoming network packets is likely to require processing beyond what is easily obtainable with a for..in loop.

Likewise, it is possible for data to be segmented across network packets at the application level, making a for..in loop possibly have to leak the contents of previous packets to process a single framed network message. It is also more likely that a network connectivity issue would disrupt the packets, requiring either additional error recovery processes to be defined around such a for..in loop, or an interface similar to reactive observables where stream close and errors can be represented as part of the iterated state

- It is unlikely that a for..in loop would be over a random number sequence.

However, if you did want to use random number sequences in a for..in loop, a ‘random' number sequence is reproducible if it has the same seed.

Non-repeating behavior does not require consuming bytes except where the ‘random’ sequence needs to be reproducible as a whole, such as gaming and cryptography applications. New ‘random’ data can be had by simply iterating a new random number generator with a new random seed.

However, iterating over an external random source like /dev/random would no longer be allowed via a for..in loop, because multiple iterations would yield different data.

-DW


(Dave Abrahams) #2

Today, a Sequence differs from a Collection in that:

- A sequence can be infinitely or indefinitely sized, or could require
an O(n) operation to count the values in the sequence.

The latter being no different from Collection.

A collection has a finite number of elements, and the fixed size is
exposed as an O(1) or O(n) operation via ‘count’

I don't believe we've actually nailed down that Collection is finite.

Oh, gee, Nate's documentation edits do
that. (https://github.com/apple/swift/commit/6e274913)
Nate, did we discuss this explicitly or did it slip in unnoticed?

The one crucial distinction in Collection is that you can make multiple
passes over the same elements.

- A collection is indexable, with those indices being usable for
various operations including forming subsets, comparisons, and manual
iteration

- A sequence may or may not be destructive, where a destructive
sequence consumes elements during traversal, making them unavailable
on subsequent traversals. Collection operations are required to be
non-destructive

I would like to Pitch removing this third differentiation, the option
for destructive sequences.

I have been strongly considering this direction myself, and it's
something we need to decide about for Swift 3.

My main motivation for proposing this is the potential for developer
confusion. As stated during one of the previous threads on the naming
of map, flatMap, filter, etc. methods on Sequence, Sequence has a
naming requirement not typical of the rest of the Swift standard
library in that many methods on Sequence may or may not be
destructive. As such, naming methods for any extensions on Sequence is
challenging as the names need to not imply immutability.

I don't think the names are really the worst potential cause of
confusion here. There's also the fact that you can conform to Sequence
with a destructively-traversed “value type” that has no mutating
methods.

It would still be possible to have Generators which operate

<Ahem> “Iterators,” please.

destructively, but such Generators would not conform to the needs of
Sequence. As such, the most significant impact would be the inability
to use such Generators in a for..in loop,

Trying to evaluate this statement, it's clear we're missing lots of
detail here:

* Would you remove Sequence?
* If so, what Protocol would embody “for...in-able?”
* If not, would you remove Collection?
* What role would Iterator play?

···

on Wed Jun 22 2016, David Waite <swift-evolution@swift.org> wrote:

--
Dave


(Anton Zhilin) #3

One practical implication of treating Sequence as infinite is in lazy
filter. If you pass Sequence, but not Collection, to lazy filter, it will
guarantee to iterate it only once. If it's a collection, it will feel free
to iterate it twice.

So one benefit of single-pass sequences is assumptions about behaviour of
an API: if it can work with sequences, then it will only iterate them
once.


(Matthew Johnson) #4

Today, a Sequence differs from a Collection in that:

- A sequence can be infinitely or indefinitely sized, or could require
an O(n) operation to count the values in the sequence.

The latter being no different from Collection.

A collection has a finite number of elements, and the fixed size is
exposed as an O(1) or O(n) operation via ‘count’

I don't believe we've actually nailed down that Collection is finite.

Oh, gee, Nate's documentation edits do
that. (https://github.com/apple/swift/commit/6e274913)
Nate, did we discuss this explicitly or did it slip in unnoticed?

The one crucial distinction in Collection is that you can make multiple
passes over the same elements.

- A collection is indexable, with those indices being usable for
various operations including forming subsets, comparisons, and manual
iteration

- A sequence may or may not be destructive, where a destructive
sequence consumes elements during traversal, making them unavailable
on subsequent traversals. Collection operations are required to be
non-destructive

I would like to Pitch removing this third differentiation, the option
for destructive sequences.

I have been strongly considering this direction myself, and it's
something we need to decide about for Swift 3.

I believe this is a problem that should be solved.

I also believe distinguishing between finite and infinite sequences is a good idea (along with preventing for..in from being used with an infinite sequence)

My main motivation for proposing this is the potential for developer
confusion. As stated during one of the previous threads on the naming
of map, flatMap, filter, etc. methods on Sequence, Sequence has a
naming requirement not typical of the rest of the Swift standard
library in that many methods on Sequence may or may not be
destructive. As such, naming methods for any extensions on Sequence is
challenging as the names need to not imply immutability.

I don't think the names are really the worst potential cause of
confusion here. There's also the fact that you can conform to Sequence
with a destructively-traversed “value type” that has no mutating
methods.

I agree, names are not the primary issue.

Another issue is that you cannot currently write generic code that might need to iterate a sequence more than once. You currently have to over-constrain types to `Collection` even if you don’t need to do anything other than iterate the elements (the discussion about whether `LazyFilterSequnce` has a bug in its `underestimateCount` is relevant here).

It would still be possible to have Generators which operate

<Ahem> “Iterators,” please.

destructively, but such Generators would not conform to the needs of
Sequence. As such, the most significant impact would be the inability
to use such Generators in a for..in loop,

Trying to evaluate this statement, it's clear we're missing lots of
detail here:

* Would you remove Sequence?
* If so, what Protocol would embody “for...in-able?”
* If not, would you remove Collection?
* What role would Iterator play?

If we’re going to consider alternative designs it is worth considering the semantic space available. For the sake of discussion, here is a model that captures the various semantics that exist (the names are just strawmen):

                           Iterable
                           / \
                          / \
                         / \
    FiniteIterable MultipassIterable
                        \ /
                          \ /
                           \ /
                          Sequence
                                 >
                                 >
                          Collection

`Iterable` corresponds to the current `Sequence` - no semantics beyond iteration are required. Infinite, single-pass “sequences” may conform.

`for..in` naturally requires `FiniteIterable`, but does not require the `MultipassIterable`.

There are many interesting infinite `MultipassIterable` types. These include any dynamically generated sequence, such as a mathematical sequence (even numbers, odd numbers, etc). This is also what the existing `Sequence` would become if we drop support for destructive sequences and do nothing else (note: it would still be possible to accidentally write a `for..in` loop over an infinite sequence).

Under this model `Sequence` brings together `FiniteIterable` and `MultipassIterable`. This describes the most common models of `Sequence`, can safely be used in a `for..in` loop, and does support “destructive” single pass sequences.

`FiniteIterable` and `MultipassIterable` introduce independent and important semantic requirements. If we’re going to consider changes here, I think it is worth at least considering introducing the distinction.

This is obviously much more complex than than the current design. The most obvious simplification would be to drop `Iterable` if we don’t have any compelling use cases for infinite, single pass sequences. One downside to doing this is that the syntactic requirements would need to be repeated in both `FiniteIterable` and `MultipassIterable`

Another obvious simplification would be to also remove `Sequence` (which becomes a “convenience” protocol under this model) and require types that can conform to both `FiniteIterable` and `MultipassIterable` to do so directly.

If chose to make both simplifications we could also rename the remaining `FiniteIterable` and `MultipassIterable` to something simpler like `Iterable` and `Sequence`.

               (for..in) (the existing `Sequence` with an additional multipass semantic requirement)
               Iterable Sequence
                        \ /
                          \ /
                           \ /
                          Collection

I’m interested in hearing what others think about this way of thinking about the available design space.

-Matthew

···

On Jun 22, 2016, at 3:57 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
on Wed Jun 22 2016, David Waite <swift-evolution@swift.org> wrote:

--
Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(David Waite) #5

<Ahem> “Iterators,” please.

That makes me happy - for some reason I thought it was still GeneratorProtocol

destructively, but such Generators would not conform to the needs of
Sequence. As such, the most significant impact would be the inability
to use such Generators in a for..in loop,

Trying to evaluate this statement, it's clear we're missing lots of
detail here:

* Would you remove Sequence?
* If so, what Protocol would embody “for...in-able?”

No, I would just remove the allowance in the documentation and API design for a destructive/consuming iteration. Sequence would be the interface to getting access to repeatable iteration, without the need for meeting the other requirements for Collection.

Sequence would be what java would refer to as Iterable, C# refers to as IEnumerable<>, but perhaps it is closest to the Enumerable mixin module in Ruby in terms of default utility methods supplied based on an implementation of ‘each’ (forEach() in Swift). Sequence is for…in-able because Sequence defined makeIterator(). The only difference is that subsequent calls to makeIterator() are expected to return conceptually equivalent elements in the same order.

* If not, would you remove Collection?

I see value in the other differentiating factors between Collection (not needing to be indexable, not needing to return a definitive count, and possibly not needing to be finite - although it sounds like collections may be effectively infinite to the extent that the integer type returned from count allows.

A Fibonacci-providing sequence can be written as a rather small closure, while my understanding of Collection would require either more code (that I likely don’t need), rather funky indexes that preserve state internally, and/or a switch to a non-iterative algorithm.

* What role would Iterator play?

So far I haven’t considered whether IteratorProtocol would be destructive, since my motivation was primarily around consistent Sequence behavior. By that light, an Iterator returned by a method other than Sequence.makeIterator() could still consume data. However, I wonder if supporting such Iterators provides enough value to be worthwhile - there aren’t that many methods that consume an IteratorProtocol (the AnyIterator<> initializer is the only one I know of) and without applicability to a Sequence such an iterator has less value.

-DW

···

On Jun 22, 2016, at 2:57 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:


(Dave Abrahams) #6

Today, a Sequence differs from a Collection in that:

- A sequence can be infinitely or indefinitely sized, or could require
an O(n) operation to count the values in the sequence.

The latter being no different from Collection.

A collection has a finite number of elements, and the fixed size is
exposed as an O(1) or O(n) operation via ‘count’

I don't believe we've actually nailed down that Collection is finite.

Oh, gee, Nate's documentation edits do
that. (https://github.com/apple/swift/commit/6e274913)
Nate, did we discuss this explicitly or did it slip in unnoticed?

The one crucial distinction in Collection is that you can make multiple
passes over the same elements.

- A collection is indexable, with those indices being usable for
various operations including forming subsets, comparisons, and manual
iteration

- A sequence may or may not be destructive, where a destructive
sequence consumes elements during traversal, making them unavailable
on subsequent traversals. Collection operations are required to be
non-destructive

I would like to Pitch removing this third differentiation, the option
for destructive sequences.

I have been strongly considering this direction myself, and it's
something we need to decide about for Swift 3.

I believe this is a problem that should be solved.

I also believe distinguishing between finite and infinite sequences is
a good idea (along with preventing for..in from being used with an
infinite sequence)

for..in over an infinite sequence is not necessarily wrong. I'm not
confident it should be prevented. It's certainly less dangerous than
using `forEach`, from which there is no possibility of escape.

My main motivation for proposing this is the potential for developer
confusion. As stated during one of the previous threads on the naming
of map, flatMap, filter, etc. methods on Sequence, Sequence has a
naming requirement not typical of the rest of the Swift standard
library in that many methods on Sequence may or may not be
destructive. As such, naming methods for any extensions on Sequence is
challenging as the names need to not imply immutability.

I don't think the names are really the worst potential cause of
confusion here. There's also the fact that you can conform to Sequence
with a destructively-traversed “value type” that has no mutating
methods.

I agree, names are not the primary issue.

Another issue is that you cannot currently write generic code that
might need to iterate a sequence more than once. You currently have
to over-constrain types to `Collection` even if you don’t need to do
anything other than iterate the elements (the discussion about whether
`LazyFilterSequnce` has a bug in its `underestimateCount` is relevant
here).

That's not an over-constraint. Multi-pass-ness *is* the fundamental
distinction between Sequence and Collection. AFAIK there's no multipass
sequence that cannot support indices.

It would still be possible to have Generators which operate

<Ahem> “Iterators,” please.

destructively, but such Generators would not conform to the needs of
Sequence. As such, the most significant impact would be the inability
to use such Generators in a for..in loop,

Trying to evaluate this statement, it's clear we're missing lots of
detail here:

* Would you remove Sequence?
* If so, what Protocol would embody “for...in-able?”
* If not, would you remove Collection?
* What role would Iterator play?

If we’re going to consider alternative designs it is worth considering
the semantic space available. For the sake of discussion, here is a
model that captures the various semantics that exist (the names are
just strawmen):

                           Iterable
                           / \
                          / \
                         / \
    FiniteIterable MultipassIterable
                        \ /
                          \ /
                           \ /
                          Sequence
                                 >
                                 >
                          Collection

`Iterable` corresponds to the current `Sequence` - no semantics beyond
iteration are required. Infinite, single-pass “sequences” may
conform.

Just to keep things straight, let's please adjust one thing at a time.
Adding new concepts should be separated from renaming existing ones. I
don't want to have to qualify “Sequence” every time I use it in this
thread to clarify whether I mean the existing one or what you're
proposing.

`for..in` naturally requires `FiniteIterable`,

I'm less than confident.

FWIW, I'm also not entirely confident that single-pass things should be
part of *this* picture at all. It might be that single-pass things
should be entirely separate and forced to be reference types where
traversal mutates the instance. E.g., the current Iterator could gain a
class constraint and become the only representation of single-pass
sequences.

but does not require the `MultipassIterable`.

There are many interesting infinite `MultipassIterable` types. These
include any dynamically generated sequence, such as a mathematical
sequence (even numbers, odd numbers, etc).

These are all potential Collections, AFAICT. I see no reason they
shouldn't be treated as such.

This is also what the existing `Sequence` would become if we drop
support for destructive sequences and do nothing else (note: it would
still be possible to accidentally write a `for..in` loop over an
infinite sequence).

Under this model `Sequence` brings together `FiniteIterable` and
`MultipassIterable`. This describes the most common models of
`Sequence`, can safely be used in a `for..in` loop, and does support
“destructive” single pass sequences.

`FiniteIterable` and `MultipassIterable` introduce independent and
important semantic requirements. If we’re going to consider changes
here, I think it is worth at least considering introducing the
distinction.

This is obviously much more complex than than the current design.

Another reason I'm reluctant. Whatever we do, IMO, should simplify things.

···

on Wed Jun 22 2016, Matthew Johnson <swift-evolution@swift.org> wrote:

On Jun 22, 2016, at 3:57 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
on Wed Jun 22 2016, David Waite <swift-evolution@swift.org> wrote:

The most obvious simplification would be to drop `Iterable` if we
don’t have any compelling use cases for infinite, single pass
sequences. One downside to doing this is that the syntactic
requirements would need to be repeated in both `FiniteIterable` and
`MultipassIterable`

Another obvious simplification would be to also remove `Sequence`
(which becomes a “convenience” protocol under this model) and require
types that can conform to both `FiniteIterable` and
`MultipassIterable` to do so directly.

If chose to make both simplifications we could also rename the
remaining `FiniteIterable` and `MultipassIterable` to something
simpler like `Iterable` and `Sequence`.

               (for..in) (the existing `Sequence` with an additional multipass semantic requirement)
               Iterable Sequence
                        \ /
                          \ /
                           \ /
                          Collection

I’m interested in hearing what others think about this way of thinking about the available design space.

-Matthew

--
Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
Dave


(Brent Royal-Gordon) #7

I think you're tremendously overcomplicating the design.

If we’re going to consider alternative designs it is worth considering the semantic space available. For the sake of discussion, here is a model that captures the various semantics that exist (the names are just strawmen):

                          Iterable
                          / \
                         / \
                        / \
   FiniteIterable MultipassIterable
                       \ /
                         \ /
                          \ /
                         Sequence
                                >
                                >
                         Collection

`Iterable` corresponds to the current `Sequence` - no semantics beyond iteration are required. Infinite, single-pass “sequences” may conform.

Okay, so let's start by undoing that renaming to make clear which semantics in this hierarchy are actually new and which ones are already part of the standard library.

                        Sequence
                          / \
                         / \
                        / \
   FiniteSequence MultipassSequence
                       \ /
                         \ /
                          \ /
               FiniteMultipassSequence
                                >
                                >
                         Collection

`for..in` naturally requires `FiniteIterable`,

Why? There's nothing conceptually incoherent about looping over an infinite sequence. Even ignoring errors, you can `break` out, you can `exit()` from an interior method, you can abort or fail a precondition, or you can just let it loop forever. Forbidding potentially-infinite for loops `for` loops is an arbitrary limitation.

but does not require the `MultipassIterable`.

There are many interesting infinite `MultipassIterable` types. These include any dynamically generated sequence, such as a mathematical sequence (even numbers, odd numbers, etc). This is also what the existing `Sequence` would become if we drop support for destructive sequences and do nothing else (note: it would still be possible to accidentally write a `for..in` loop over an infinite sequence).

Okay, but all of these can be represented as infinite Collections.

Let me explain. If a Sequence is truly multipass—that is, you can iterate over it many times and it will always return the same elements—then it must look something like this:

  struct MyElement { … }
  
  struct MyState {
    …
    init() { … }
    mutating func formNextElement(in: MySequence) { … }
    var currentElement(in: MySequence) -> MyElement? { … }
  }

  struct MySequence: MultipassSequence {
    struct Iterator: IteratorProtocol {
      var sequence: MySequence
      var state: MyState
      
      init(sequence: MySequence, initialState: MyState) {
        self.sequence = sequence
        self.state = initialState
      }
      
      mutating func next() -> MyElement? {
        defer { someState.formNextElement(in: sequence) }
        return someState.currentElement(in: sequence)
      }
    }
    
    func formIterator() -> MyIterator {
      return MyIterator(sequence: self, initialState: MyState())
    }
  }

Now, the pieces may *appear* to be different—the Iterator may not actually need a reference to the Sequence, the state may be stored in many properties instead of just one, the code to iterate and get the current element may be directly in `next()` instead of in methods `next()` calls—but any multipass Sequence and Iterator could be refactored to match this pattern, so all multipass Sequences and Iterators are equivalent to it.

But you can convert MySequence into a possibly infinite MyCollection like so:

  struct MyCollection: PossiblyInfiniteCollection {
    enum Index: Comparable {
      var offset: Int
      var state: MyState?
    }
        
    var startIndex: Index {
      return Index(offset: 0, state: MyState())
    }
    var endIndex: Index {
      return Index(offset: Int.max, state: nil)
    }
    
    func formIndex(after index: inout Index) {
      index.state!.formNextElement(in: self)
      if index.state!.currentElement(in: self) == nil {
        index.offset = Int.max
      }
      else {
        index.offset! += 1
      }
    }
    
    subscript(index: Index) -> MyElement {
      return index.state!.currentElement(in: self)
    }
  }
  
  func == (lhs: MyCollection.Index, rhs: MyCollection.Index) -> Bool {
    return lhs.offset == rhs.offset
  }
  
  func < (lhs: MyCollection.Index, rhs: MyCollection.Index) -> Bool {
    return lhs.offset < rhs.offset
  }

In other words, whatever state (other than the sequence itself) the Iterator needs to keep in its stored properties can instead be stored inside an index, and whatever logic the Iterator needs to implement its `next()` method can instead be split between `formIndex(after:)` and `subscript(_:)`. Any MultipassSequence can be written as a PossiblyInfiniteCollection if you're willing to put in the effort. And this is a boon to its multipass-ness, because you can not only start over again from the beginning, but start over from *any* step in the iteration.

If any MultipassSequence can be written as a PossiblyInfiniteCollection, then we can eliminate another protocol:

                        Sequence
                          / \
                         / \
                        / \
   FiniteSequence PossiblyInfiniteCollection
                       \ /
                         \ /
                          \ /
                         Collection

Now we can move any members which may not terminate early (like `map` or `last`) or are ill-defined when infinite (`count`) into `FiniteSequence` and `Collection`, while leaving any which often do terminate early (like `first` or `index(of:)`) in `Sequence` and `PossiblyInfiniteCollection`.

Is this necessary? I don't really know. Arguably, we could simply provide this hierarchy:

                        Sequence
                          / \
                         / \
                        / \
      IteratorProtocol Collection

Where Iterators must be treated as potentially infinite, Collections are always finite, and you are asked (or, with a "closed protocol" feature, forced) to conform to one or the other, rather than to Sequence directly. Then the non-infinite-safe APIs like `map` can move onto `Collection` and we'll have covered most of the use cases which matter.

···

--
Brent Royal-Gordon
Architechies


(Matthew Johnson) #8

Today, a Sequence differs from a Collection in that:

- A sequence can be infinitely or indefinitely sized, or could require
an O(n) operation to count the values in the sequence.

The latter being no different from Collection.

A collection has a finite number of elements, and the fixed size is
exposed as an O(1) or O(n) operation via ‘count’

I don't believe we've actually nailed down that Collection is finite.

Oh, gee, Nate's documentation edits do
that. (https://github.com/apple/swift/commit/6e274913)
Nate, did we discuss this explicitly or did it slip in unnoticed?

The one crucial distinction in Collection is that you can make multiple
passes over the same elements.

- A collection is indexable, with those indices being usable for
various operations including forming subsets, comparisons, and manual
iteration

- A sequence may or may not be destructive, where a destructive
sequence consumes elements during traversal, making them unavailable
on subsequent traversals. Collection operations are required to be
non-destructive

I would like to Pitch removing this third differentiation, the option
for destructive sequences.

I have been strongly considering this direction myself, and it's
something we need to decide about for Swift 3.

I believe this is a problem that should be solved.

I also believe distinguishing between finite and infinite sequences is
a good idea (along with preventing for..in from being used with an
infinite sequence)

for..in over an infinite sequence is not necessarily wrong. I'm not
confident it should be prevented. It's certainly less dangerous than
using `forEach`, from which there is no possibility of escape.

Sure, it’s not wrong in the sense that sometimes an infinite loop is valid. But I think it would almost always be wrong (an accident) in practice. If you really want to loop over an infinite sequence maybe it’s a good thing to require you to do that explicitly.

My main motivation for proposing this is the potential for developer
confusion. As stated during one of the previous threads on the naming
of map, flatMap, filter, etc. methods on Sequence, Sequence has a
naming requirement not typical of the rest of the Swift standard
library in that many methods on Sequence may or may not be
destructive. As such, naming methods for any extensions on Sequence is
challenging as the names need to not imply immutability.

I don't think the names are really the worst potential cause of
confusion here. There's also the fact that you can conform to Sequence
with a destructively-traversed “value type” that has no mutating
methods.

I agree, names are not the primary issue.

Another issue is that you cannot currently write generic code that
might need to iterate a sequence more than once. You currently have
to over-constrain types to `Collection` even if you don’t need to do
anything other than iterate the elements (the discussion about whether
`LazyFilterSequnce` has a bug in its `underestimateCount` is relevant
here).

That's not an over-constraint. Multi-pass-ness *is* the fundamental
distinction between Sequence and Collection. AFAIK there's no multipass
sequence that cannot support indices.

If we do nail down that Collection has to be finite then this is not the case. There are infinite multipass sequences.

If we require Sequence to be multipass (that is the same as removing destructive consumption, isn’t it?) then what would be the distinction between Sequence and Collection? The fact that we are making Collection finite?

It would still be possible to have Generators which operate

<Ahem> “Iterators,” please.

destructively, but such Generators would not conform to the needs of
Sequence. As such, the most significant impact would be the inability
to use such Generators in a for..in loop,

Trying to evaluate this statement, it's clear we're missing lots of
detail here:

* Would you remove Sequence?
* If so, what Protocol would embody “for...in-able?”
* If not, would you remove Collection?
* What role would Iterator play?

If we’re going to consider alternative designs it is worth considering
the semantic space available. For the sake of discussion, here is a
model that captures the various semantics that exist (the names are
just strawmen):

                          Iterable
                          / \
                         / \
                        / \
   FiniteIterable MultipassIterable
                       \ /
                         \ /
                          \ /
                         Sequence
                                >
                                >
                         Collection

`Iterable` corresponds to the current `Sequence` - no semantics beyond
iteration are required. Infinite, single-pass “sequences” may
conform.

Just to keep things straight, let's please adjust one thing at a time.
Adding new concepts should be separated from renaming existing ones. I
don't want to have to qualify “Sequence” every time I use it in this
thread to clarify whether I mean the existing one or what you're
proposing.

I should have avoided using the name “Sequence” here just to keep things more clear. Sorry about that.

`for..in` naturally requires `FiniteIterable`,

I'm less than confident.

See above.

FWIW, I'm also not entirely confident that single-pass things should be
part of *this* picture at all. It might be that single-pass things
should be entirely separate and forced to be reference types where
traversal mutates the instance.

This seems like a reasonable direction.

E.g., the current Iterator could gain a
class constraint and become the only representation of single-pass
sequences.

Hmm. I would have to give this more thought. Do we really want to require all conformances of `Iterator` to be reference types? What would be the performance impact of that?

but does not require the `MultipassIterable`.

There are many interesting infinite `MultipassIterable` types. These
include any dynamically generated sequence, such as a mathematical
sequence (even numbers, odd numbers, etc).

These are all potential Collections, AFAICT. I see no reason they
shouldn't be treated as such.

If we allow Collections to be infinite.

This is also what the existing `Sequence` would become if we drop
support for destructive sequences and do nothing else (note: it would
still be possible to accidentally write a `for..in` loop over an
infinite sequence).

Under this model `Sequence` brings together `FiniteIterable` and
`MultipassIterable`. This describes the most common models of
`Sequence`, can safely be used in a `for..in` loop, and does support
“destructive” single pass sequences.

`FiniteIterable` and `MultipassIterable` introduce independent and
important semantic requirements. If we’re going to consider changes
here, I think it is worth at least considering introducing the
distinction.

This is obviously much more complex than than the current design.

Another reason I'm reluctant. Whatever we do, IMO, should simplify things.

Sure. I’m not actually proposing we implement this model, I am just trying to explore the design space and think through how we can simplify by introducing desirable semantic requirements.

I think part of the complexity of the current model is due to the protocols not introducing semantic requirements that the most heavily used models actually support. Making it clear where multipass is valid and making a distinction between finite and infinite sequences will simplify things IMO. Figuring out the best way to organize them is the tricky part. I think Brent is heading in the right direction in his post.

···

On Jun 22, 2016, at 6:41 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
on Wed Jun 22 2016, Matthew Johnson <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Jun 22, 2016, at 3:57 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
on Wed Jun 22 2016, David Waite <swift-evolution@swift.org> wrote:

The most obvious simplification would be to drop `Iterable` if we
don’t have any compelling use cases for infinite, single pass
sequences. One downside to doing this is that the syntactic
requirements would need to be repeated in both `FiniteIterable` and
`MultipassIterable`

Another obvious simplification would be to also remove `Sequence`
(which becomes a “convenience” protocol under this model) and require
types that can conform to both `FiniteIterable` and
`MultipassIterable` to do so directly.

If chose to make both simplifications we could also rename the
remaining `FiniteIterable` and `MultipassIterable` to something
simpler like `Iterable` and `Sequence`.

              (for..in) (the existing `Sequence` with an additional multipass semantic requirement)
              Iterable Sequence
                       \ /
                         \ /
                          \ /
                         Collection

I’m interested in hearing what others think about this way of thinking about the available design space.

-Matthew

--
Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

--
Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Matthew Johnson) #9

I think you're tremendously overcomplicating the design.

This isn’t intended as a proposed design. It is intended to start a discussion exploring the design space.

If we’re going to consider alternative designs it is worth considering the semantic space available. For the sake of discussion, here is a model that captures the various semantics that exist (the names are just strawmen):

                         Iterable
                         / \
                        / \
                       / \
  FiniteIterable MultipassIterable
                      \ /
                        \ /
                         \ /
                        Sequence
                               >
                               >
                        Collection

`Iterable` corresponds to the current `Sequence` - no semantics beyond iteration are required. Infinite, single-pass “sequences” may conform.

Okay, so let's start by undoing that renaming to make clear which semantics in this hierarchy are actually new and which ones are already part of the standard library.

                       Sequence
                         / \
                        / \
                       / \
  FiniteSequence MultipassSequence
                      \ /
                        \ /
                         \ /
              FiniteMultipassSequence
                               >
                               >
                        Collection

Yes, I should have done it this way to begin with.

`for..in` naturally requires `FiniteIterable`,

Why? There's nothing conceptually incoherent about looping over an infinite sequence. Even ignoring errors, you can `break` out, you can `exit()` from an interior method, you can abort or fail a precondition, or you can just let it loop forever. Forbidding potentially-infinite for loops `for` loops is an arbitrary limitation.

The argument for doing this is that `for..in` is syntactic sugar and looping over an infinite sequence using it is often going to be a mistake. I think it’s reasonable to suggest that it be a little be more difficult to write an infinite loop in order to prevent one from doing so accidentally. The position you and Dave have taken is also reasonable.

but does not require the `MultipassIterable`.

There are many interesting infinite `MultipassIterable` types. These include any dynamically generated sequence, such as a mathematical sequence (even numbers, odd numbers, etc). This is also what the existing `Sequence` would become if we drop support for destructive sequences and do nothing else (note: it would still be possible to accidentally write a `for..in` loop over an infinite sequence).

Okay, but all of these can be represented as infinite Collections.

Let me explain. If a Sequence is truly multipass—that is, you can iterate over it many times and it will always return the same elements—then it must look something like this:

  struct MyElement { … }
  
  struct MyState {
    …
    init() { … }
    mutating func formNextElement(in: MySequence) { … }
    var currentElement(in: MySequence) -> MyElement? { … }
  }

  struct MySequence: MultipassSequence {
    struct Iterator: IteratorProtocol {
      var sequence: MySequence
      var state: MyState
      
      init(sequence: MySequence, initialState: MyState) {
        self.sequence = sequence
        self.state = initialState
      }
      
      mutating func next() -> MyElement? {
        defer { someState.formNextElement(in: sequence) }
        return someState.currentElement(in: sequence)
      }
    }
    
    func formIterator() -> MyIterator {
      return MyIterator(sequence: self, initialState: MyState())
    }
  }

Now, the pieces may *appear* to be different—the Iterator may not actually need a reference to the Sequence, the state may be stored in many properties instead of just one, the code to iterate and get the current element may be directly in `next()` instead of in methods `next()` calls—but any multipass Sequence and Iterator could be refactored to match this pattern, so all multipass Sequences and Iterators are equivalent to it.

But you can convert MySequence into a possibly infinite MyCollection like so:

  struct MyCollection: PossiblyInfiniteCollection {
    enum Index: Comparable {
      var offset: Int
      var state: MyState?
    }
        
    var startIndex: Index {
      return Index(offset: 0, state: MyState())
    }
    var endIndex: Index {
      return Index(offset: Int.max, state: nil)
    }
    
    func formIndex(after index: inout Index) {
      index.state!.formNextElement(in: self)
      if index.state!.currentElement(in: self) == nil {
        index.offset = Int.max
      }
      else {
        index.offset! += 1
      }
    }
    
    subscript(index: Index) -> MyElement {
      return index.state!.currentElement(in: self)
    }
  }
  
  func == (lhs: MyCollection.Index, rhs: MyCollection.Index) -> Bool {
    return lhs.offset == rhs.offset
  }
  
  func < (lhs: MyCollection.Index, rhs: MyCollection.Index) -> Bool {
    return lhs.offset < rhs.offset
  }

In other words, whatever state (other than the sequence itself) the Iterator needs to keep in its stored properties can instead be stored inside an index, and whatever logic the Iterator needs to implement its `next()` method can instead be split between `formIndex(after:)` and `subscript(_:)`. Any MultipassSequence can be written as a PossiblyInfiniteCollection if you're willing to put in the effort. And this is a boon to its multipass-ness, because you can not only start over again from the beginning, but start over from *any* step in the iteration.

If any MultipassSequence can be written as a PossiblyInfiniteCollection, then we can eliminate another protocol:

                       Sequence
                         / \
                        / \
                       / \
  FiniteSequence PossiblyInfiniteCollection
                      \ /
                        \ /
                         \ /
                        Collection

Now we can move any members which may not terminate early (like `map` or `last`) or are ill-defined when infinite (`count`) into `FiniteSequence` and `Collection`, while leaving any which often do terminate early (like `first` or `index(of:)`) in `Sequence` and `PossiblyInfiniteCollection`.

Is this necessary? I don't really know. Arguably, we could simply provide this hierarchy:

                       Sequence
                         / \
                        / \
                       / \
     IteratorProtocol Collection

Where Iterators must be treated as potentially infinite, Collections are always finite, and you are asked (or, with a "closed protocol" feature, forced) to conform to one or the other, rather than to Sequence directly. Then the non-infinite-safe APIs like `map` can move onto `Collection` and we'll have covered most of the use cases which matter.

Thanks! This is exactly the kind of discussion I intended to encourage with my post.

This looks pretty good to me but it doesn’t address the issue that initiated this thread - removing destructive consumption (non-multipass) from Sequence. How do envision that fitting in?

Can you also elaborate on what you envision `IteratorProtocol` would look like in this picture? You depict it refining `Sequence`. That is not the current design in the standard library so it isn’t immediately clear to me.

-Matthew

···

On Jun 22, 2016, at 7:27 PM, Brent Royal-Gordon <brent@architechies.com> wrote:

--
Brent Royal-Gordon
Architechies


(Andrew Bennett) #10

I agree that there should be a type for a non-destructive re-entrant
sequence.

IIRC in past discussions one of the counter arguments given was IO stream
sequences. It is likely undesirable to buffer the entire stream, but
without buffering there's no guarantee of getting the same values.

These discussions were back when swift-evolution started, sorry I couldn't
find a link. I think it was discussed in the context of a non-mutating
Generator.

···

On Thursday, 23 June 2016, Matthew Johnson via swift-evolution < swift-evolution@swift.org> wrote:

> On Jun 22, 2016, at 3:57 PM, Dave Abrahams via swift-evolution < > swift-evolution@swift.org <javascript:;>> wrote:
>
>
> on Wed Jun 22 2016, David Waite <swift-evolution@swift.org > <javascript:;>> wrote:
>
>> Today, a Sequence differs from a Collection in that:
>>
>> - A sequence can be infinitely or indefinitely sized, or could require
>> an O(n) operation to count the values in the sequence.
>
> The latter being no different from Collection.
>
>> A collection has a finite number of elements, and the fixed size is
>> exposed as an O(1) or O(n) operation via ‘count’
>
> I don't believe we've actually nailed down that Collection is finite.
>
> Oh, gee, Nate's documentation edits do
> that. (https://github.com/apple/swift/commit/6e274913)
> Nate, did we discuss this explicitly or did it slip in unnoticed?
>
> The one crucial distinction in Collection is that you can make multiple
> passes over the same elements.
>
>> - A collection is indexable, with those indices being usable for
>> various operations including forming subsets, comparisons, and manual
>> iteration
>>
>> - A sequence may or may not be destructive, where a destructive
>> sequence consumes elements during traversal, making them unavailable
>> on subsequent traversals. Collection operations are required to be
>> non-destructive
>>
>> I would like to Pitch removing this third differentiation, the option
>> for destructive sequences.
>
> I have been strongly considering this direction myself, and it's
> something we need to decide about for Swift 3.

I believe this is a problem that should be solved.

I also believe distinguishing between finite and infinite sequences is a
good idea (along with preventing for..in from being used with an infinite
sequence)

>
>> My main motivation for proposing this is the potential for developer
>> confusion. As stated during one of the previous threads on the naming
>> of map, flatMap, filter, etc. methods on Sequence, Sequence has a
>> naming requirement not typical of the rest of the Swift standard
>> library in that many methods on Sequence may or may not be
>> destructive. As such, naming methods for any extensions on Sequence is
>> challenging as the names need to not imply immutability.
>
> I don't think the names are really the worst potential cause of
> confusion here. There's also the fact that you can conform to Sequence
> with a destructively-traversed “value type” that has no mutating
> methods.

I agree, names are not the primary issue.

Another issue is that you cannot currently write generic code that might
need to iterate a sequence more than once. You currently have to
over-constrain types to `Collection` even if you don’t need to do anything
other than iterate the elements (the discussion about whether
`LazyFilterSequnce` has a bug in its `underestimateCount` is relevant here).

>
>> It would still be possible to have Generators which operate
>
> <Ahem> “Iterators,” please.
>
>> destructively, but such Generators would not conform to the needs of
>> Sequence. As such, the most significant impact would be the inability
>> to use such Generators in a for..in loop,
>
> Trying to evaluate this statement, it's clear we're missing lots of
> detail here:
>
> * Would you remove Sequence?
> * If so, what Protocol would embody “for...in-able?”
> * If not, would you remove Collection?
> * What role would Iterator play?

If we’re going to consider alternative designs it is worth considering the
semantic space available. For the sake of discussion, here is a model that
captures the various semantics that exist (the names are just strawmen):

                           Iterable
                           / \
                          / \
                         / \
    FiniteIterable MultipassIterable
                        \ /
                          \ /
                           \ /
                          Sequence
                                 >
                                 >
                          Collection

`Iterable` corresponds to the current `Sequence` - no semantics beyond
iteration are required. Infinite, single-pass “sequences” may conform.

`for..in` naturally requires `FiniteIterable`, but does not require the
`MultipassIterable`.

There are many interesting infinite `MultipassIterable` types. These
include any dynamically generated sequence, such as a mathematical sequence
(even numbers, odd numbers, etc). This is also what the existing
`Sequence` would become if we drop support for destructive sequences and do
nothing else (note: it would still be possible to accidentally write a
`for..in` loop over an infinite sequence).

Under this model `Sequence` brings together `FiniteIterable` and
`MultipassIterable`. This describes the most common models of `Sequence`,
can safely be used in a `for..in` loop, and does support “destructive”
single pass sequences.

`FiniteIterable` and `MultipassIterable` introduce independent and
important semantic requirements. If we’re going to consider changes here,
I think it is worth at least considering introducing the distinction.

This is obviously much more complex than than the current design. The
most obvious simplification would be to drop `Iterable` if we don’t have
any compelling use cases for infinite, single pass sequences. One downside
to doing this is that the syntactic requirements would need to be repeated
in both `FiniteIterable` and `MultipassIterable`

Another obvious simplification would be to also remove `Sequence` (which
becomes a “convenience” protocol under this model) and require types that
can conform to both `FiniteIterable` and `MultipassIterable` to do so
directly.

If chose to make both simplifications we could also rename the remaining
`FiniteIterable` and `MultipassIterable` to something simpler like
`Iterable` and `Sequence`.

               (for..in) (the existing `Sequence` with an
additional multipass semantic requirement)
               Iterable Sequence
                        \ /
                          \ /
                           \ /
                          Collection

I’m interested in hearing what others think about this way of thinking
about the available design space.

-Matthew

>
>
> --
> Dave
>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org <javascript:;>
> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <javascript:;>
https://lists.swift.org/mailman/listinfo/swift-evolution


(David Waite) #11

Just wanted to communicate more around this point specifically. If Collections can be infinite (probably adding special meaning to count returning Collection.IndexDistance.max), then the only difference between Sequence and Collection is the indexes.

However, I’m concerned about the delta between an iterator and a full collection. For an example:

class FibIterator : IteratorProtocol {
  var last = 0, current = 1

  func next() -> Int? {
    (last, current) = (current, current + last)
    return current
  }
}

If subscript indexing on collections isn't required to be an O(1) operation, I don’t see a reason for Sequence to exist - we can simply enumerate the sequence with a numeric index, and iterate up to that count to resolve. But this makes things surprising for those implementing generic algorithms across Collections.

I don’t see a way to get an O(1) integer index and meet all the efficiency constraints of Collection without either memoization or additional complexity on implementing FibIterator.

1. we could use integer indexes and use a non-recursive technique for calculating the fibonacci value at the specified index. FibIterator basically is rewritten into a function “fibonacci(at index:Int)”.

2. We could use a copy of FibIterator as the index, since it is a value type. FibIterator would need to implement Comparable.

  2a. We make the index InfiniteSequenceIndex< FibIterator >, where the wrapper is there to define a consistent EndIndex value.

  However, Collection’s index(_:offsetBy) allows for negative offsets, which would not be accomplished in O(n).

  2b. If FibIterator gains an extra method to become bidirectional we can support index(_:offsetBy) in O(n) time. Note that you would probably want to have a bidirectional iterator define its own endIndex.

-DW

···

On Jun 22, 2016, at 5:41 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

I agree, names are not the primary issue.

Another issue is that you cannot currently write generic code that
might need to iterate a sequence more than once. You currently have
to over-constrain types to `Collection` even if you don’t need to do
anything other than iterate the elements (the discussion about whether
`LazyFilterSequnce` has a bug in its `underestimateCount` is relevant
here).

That's not an over-constraint. Multi-pass-ness *is* the fundamental
distinction between Sequence and Collection. AFAIK there's no multipass
sequence that cannot support indices.


(Dave Abrahams) #12

That would be wrong unless there exist substantial examples of a
multipass Sequence that *can't* meet the other requirements of
Collection without loss of efficiency. And since I can write an adaptor
that turns any multipass sequence into a Collection, I think it's
trivial to prove that no such examples exist.

···

on Wed Jun 22 2016, David Waite <david-AT-alkaline-solutions.com> wrote:

On Jun 22, 2016, at 2:57 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

<Ahem> “Iterators,” please.

That makes me happy - for some reason I thought it was still GeneratorProtocol

destructively, but such Generators would not conform to the needs of
Sequence. As such, the most significant impact would be the inability
to use such Generators in a for..in loop,

Trying to evaluate this statement, it's clear we're missing lots of
detail here:

* Would you remove Sequence?
* If so, what Protocol would embody “for...in-able?”

No, I would just remove the allowance in the documentation and API
design for a destructive/consuming iteration. Sequence would be the
interface to getting access to repeatable iteration, without the need
for meeting the other requirements for Collection.

--
-Dave


(David Waite) #13

I believe this is a problem that should be solved.

I also believe distinguishing between finite and infinite sequences is
a good idea (along with preventing for..in from being used with an
infinite sequence)

for..in over an infinite sequence is not necessarily wrong. I'm not
confident it should be prevented. It's certainly less dangerous than
using `forEach`, from which there is no possibility of escape.

Sure, it’s not wrong in the sense that sometimes an infinite loop is valid. But I think it would almost always be wrong (an accident) in practice. If you really want to loop over an infinite sequence maybe it’s a good thing to require you to do that explicitly.

The problem is that the only way to know a loop is infinite reliably is to have the developer opt-in.

Also, operations on said infinite sequence might or might not be infinite - a Fibonacci sequence.first { $0 == 0 } would go forever since the sequence starts with 1 and never decreases.

Then we have the issue that we have finite sequences which will still take longer than the typical hardware lifetime.

let x = 0..<Int.max #9223372036854775806
foo(1..<Int.max) # even if foo contains a loop which evaluates a billion elements a second, will take ~292 years to complete

In UI programming unintentionally expensive operations are easier to detect since the runloop forms a natural watchdog. This is how spinning beachballs are born.

That's not an over-constraint. Multi-pass-ness *is* the fundamental
distinction between Sequence and Collection. AFAIK there's no multipass
sequence that cannot support indices.

That indices are expected to be O(1) in Collection (under penalty of documentation and/or surprise) means that there is an impact above just implementing an Iterator for implementing a Collection, is there not?

FWIW, I'm also not entirely confident that single-pass things should be
part of *this* picture at all. It might be that single-pass things
should be entirely separate and forced to be reference types where
traversal mutates the instance.

This seems like a reasonable direction.

E.g., the current Iterator could gain a
class constraint and become the only representation of single-pass
sequences.

Hmm. I would have to give this more thought. Do we really want to require all conformances of `Iterator` to be reference types? What would be the performance impact of that?

I am also unclear here - are you talking of moving the current functionality used for Iterator someplace else, or creating a new SinglePassIteratorProtocol?

-DW

···

On Jun 22, 2016, at 7:12 PM, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

On Jun 22, 2016, at 6:41 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
on Wed Jun 22 2016, Matthew Johnson <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Jun 22, 2016, at 3:57 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:


(Andrew Bennett) #14

Could we have a BufferedSequence type that is identity in most cases?

···

On Thursday, 23 June 2016, Andrew Bennett <cacoyi@gmail.com> wrote:

I agree that there should be a type for a non-destructive re-entrant
sequence.

IIRC in past discussions one of the counter arguments given was IO stream
sequences. It is likely undesirable to buffer the entire stream, but
without buffering there's no guarantee of getting the same values.

These discussions were back when swift-evolution started, sorry I couldn't
find a link. I think it was discussed in the context of a non-mutating
Generator.

On Thursday, 23 June 2016, Matthew Johnson via swift-evolution < > swift-evolution@swift.org > <javascript:_e(%7B%7D,'cvml','swift-evolution@swift.org');>> wrote:

> On Jun 22, 2016, at 3:57 PM, Dave Abrahams via swift-evolution < >> swift-evolution@swift.org> wrote:
>
>
> on Wed Jun 22 2016, David Waite <swift-evolution@swift.org> wrote:
>
>> Today, a Sequence differs from a Collection in that:
>>
>> - A sequence can be infinitely or indefinitely sized, or could require
>> an O(n) operation to count the values in the sequence.
>
> The latter being no different from Collection.
>
>> A collection has a finite number of elements, and the fixed size is
>> exposed as an O(1) or O(n) operation via ‘count’
>
> I don't believe we've actually nailed down that Collection is finite.
>
> Oh, gee, Nate's documentation edits do
> that. (https://github.com/apple/swift/commit/6e274913)
> Nate, did we discuss this explicitly or did it slip in unnoticed?
>
> The one crucial distinction in Collection is that you can make multiple
> passes over the same elements.
>
>> - A collection is indexable, with those indices being usable for
>> various operations including forming subsets, comparisons, and manual
>> iteration
>>
>> - A sequence may or may not be destructive, where a destructive
>> sequence consumes elements during traversal, making them unavailable
>> on subsequent traversals. Collection operations are required to be
>> non-destructive
>>
>> I would like to Pitch removing this third differentiation, the option
>> for destructive sequences.
>
> I have been strongly considering this direction myself, and it's
> something we need to decide about for Swift 3.

I believe this is a problem that should be solved.

I also believe distinguishing between finite and infinite sequences is a
good idea (along with preventing for..in from being used with an infinite
sequence)

>
>> My main motivation for proposing this is the potential for developer
>> confusion. As stated during one of the previous threads on the naming
>> of map, flatMap, filter, etc. methods on Sequence, Sequence has a
>> naming requirement not typical of the rest of the Swift standard
>> library in that many methods on Sequence may or may not be
>> destructive. As such, naming methods for any extensions on Sequence is
>> challenging as the names need to not imply immutability.
>
> I don't think the names are really the worst potential cause of
> confusion here. There's also the fact that you can conform to Sequence
> with a destructively-traversed “value type” that has no mutating
> methods.

I agree, names are not the primary issue.

Another issue is that you cannot currently write generic code that might
need to iterate a sequence more than once. You currently have to
over-constrain types to `Collection` even if you don’t need to do anything
other than iterate the elements (the discussion about whether
`LazyFilterSequnce` has a bug in its `underestimateCount` is relevant here).

>
>> It would still be possible to have Generators which operate
>
> <Ahem> “Iterators,” please.
>
>> destructively, but such Generators would not conform to the needs of
>> Sequence. As such, the most significant impact would be the inability
>> to use such Generators in a for..in loop,
>
> Trying to evaluate this statement, it's clear we're missing lots of
> detail here:
>
> * Would you remove Sequence?
> * If so, what Protocol would embody “for...in-able?”
> * If not, would you remove Collection?
> * What role would Iterator play?

If we’re going to consider alternative designs it is worth considering
the semantic space available. For the sake of discussion, here is a model
that captures the various semantics that exist (the names are just
strawmen):

                           Iterable
                           / \
                          / \
                         / \
    FiniteIterable MultipassIterable
                        \ /
                          \ /
                           \ /
                          Sequence
                                 >
                                 >
                          Collection

`Iterable` corresponds to the current `Sequence` - no semantics beyond
iteration are required. Infinite, single-pass “sequences” may conform.

`for..in` naturally requires `FiniteIterable`, but does not require the
`MultipassIterable`.

There are many interesting infinite `MultipassIterable` types. These
include any dynamically generated sequence, such as a mathematical sequence
(even numbers, odd numbers, etc). This is also what the existing
`Sequence` would become if we drop support for destructive sequences and do
nothing else (note: it would still be possible to accidentally write a
`for..in` loop over an infinite sequence).

Under this model `Sequence` brings together `FiniteIterable` and
`MultipassIterable`. This describes the most common models of `Sequence`,
can safely be used in a `for..in` loop, and does support “destructive”
single pass sequences.

`FiniteIterable` and `MultipassIterable` introduce independent and
important semantic requirements. If we’re going to consider changes here,
I think it is worth at least considering introducing the distinction.

This is obviously much more complex than than the current design. The
most obvious simplification would be to drop `Iterable` if we don’t have
any compelling use cases for infinite, single pass sequences. One downside
to doing this is that the syntactic requirements would need to be repeated
in both `FiniteIterable` and `MultipassIterable`

Another obvious simplification would be to also remove `Sequence` (which
becomes a “convenience” protocol under this model) and require types that
can conform to both `FiniteIterable` and `MultipassIterable` to do so
directly.

If chose to make both simplifications we could also rename the remaining
`FiniteIterable` and `MultipassIterable` to something simpler like
`Iterable` and `Sequence`.

               (for..in) (the existing `Sequence` with an
additional multipass semantic requirement)
               Iterable Sequence
                        \ /
                          \ /
                           \ /
                          Collection

I’m interested in hearing what others think about this way of thinking
about the available design space.

-Matthew

>
>
> --
> Dave
>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Patrick Smith) #15

There’s a great discussion going on here, and I don’t mean to distract. But I want to ask: to get an infinite sequence using the Swift standard library, you’re going to have to create one yourself, say using the new `sequence()` functions? There’s nothing built in that is already infinite, so people will know what they are creating, that’s the explicit step, and there’s no need to have safety to infinitely loop over something. And if someone accidentally creates an unwanted infinite loop, that’s a fix in the creation of the sequence, not in its looping.

Patrick

···

On 23 Jun 2016, at 11:12 AM, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

Sure, it’s not wrong in the sense that sometimes an infinite loop is valid. But I think it would almost always be wrong (an accident) in practice. If you really want to loop over an infinite sequence maybe it’s a good thing to require you to do that explicitly.


(Dave Abrahams) #16

It's not necessarily a problem. In almost any context where Iterators
are used they could be promoted by the optimizer from the heap to the
stack.

···

on Wed Jun 22 2016, Matthew Johnson <swift-evolution@swift.org> wrote:

E.g., the current Iterator could gain a
class constraint and become the only representation of single-pass
sequences.

Hmm. I would have to give this more thought. Do we really want to
require all conformances of `Iterator` to be reference types? What
would be the performance impact of that?

--
-Dave


(Dave Abrahams) #17

Indices don't have to be integers, and often arent. You can always
bundle an integer with an Iterator and a single result buffer to produce
an Index.

···

on Fri Jun 24 2016, David Waite <david-AT-alkaline-solutions.com> wrote:

On Jun 22, 2016, at 5:41 PM, Dave Abrahams via swift-evolution > <swift-evolution@swift.org> wrote:

I agree, names are not the primary issue.

Another issue is that you cannot currently write generic code that
might need to iterate a sequence more than once. You currently

have

to over-constrain types to `Collection` even if you don’t need to

do

anything other than iterate the elements (the discussion about

whether

`LazyFilterSequnce` has a bug in its `underestimateCount` is

relevant

here).

That's not an over-constraint. Multi-pass-ness *is* the fundamental
distinction between Sequence and Collection. AFAIK there's no

multipass

sequence that cannot support indices.

Just wanted to communicate more around this point specifically. If
Collections can be infinite (probably adding special meaning to count
returning Collection.IndexDistance.max), then the only difference
between Sequence and Collection is the indexes.

However, I’m concerned about the delta between an iterator and a full
collection. For an example:

class FibIterator : IteratorProtocol {
  var last = 0, current = 1

  func next() -> Int? {
    (last, current) = (current, current + last)
    return current
  }
}

If subscript indexing on collections isn't required to be an O(1)
operation, I don’t see a reason for Sequence to exist - we can simply
enumerate the sequence with a numeric index, and iterate up to that
count to resolve. But this makes things surprising for those
implementing generic algorithms across Collections.

I don’t see a way to get an O(1) integer index and meet all the
efficiency constraints of Collection without either memoization or
additional complexity on implementing FibIterator.

--
-Dave


(Russ Bishop) #18

E.g., the current Iterator could gain a
class constraint and become the only representation of single-pass
sequences.

Hmm. I would have to give this more thought. Do we really want to require all conformances of `Iterator` to be reference types? What would be the performance impact of that?

I wouldn’t think so.

destructively, but such Generators would not conform to the needs of
Sequence. As such, the most significant impact would be the inability
to use such Generators in a for..in loop,

Trying to evaluate this statement, it's clear we're missing lots of
detail here:

* Would you remove Sequence?
* If so, what Protocol would embody “for...in-able?”

No, I would just remove the allowance in the documentation and API design for a destructive/consuming iteration. Sequence would be the interface to getting access to repeatable iteration, without the need for meeting the other requirements for Collection.

Sequence would be what java would refer to as Iterable, C# refers to as IEnumerable<>, but perhaps it is closest to the Enumerable mixin module in Ruby in terms of default utility methods supplied based on an implementation of ‘each’ (forEach() in Swift). Sequence is for…in-able because Sequence defined makeIterator(). The only difference is that subsequent calls to makeIterator() are expected to return conceptually equivalent elements in the same order.
-DW

IEnumerable<> does not guarantee the ability to repeatedly enumerate, generally around I/O which is the same place I use a destructively iterating sequence in Swift - lazily enumerating rows from a database query.

Russ

···

On Jun 22, 2016, at 6:12 PM, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote: