Proposal: CollectionType.cycle property for an infinite sequence


(Lily Ballard) #1

## Introduction

Add a new property `cycle` to CollectionType that returns an infinite SequenceType that yields the elements of the collection in a loop.

## Motivation

It's sometimes useful to be able to have an infinite sequence. For example, `CollectionOfOne(x).cycle` could be used to have an infinite sequence of a single element (similar to Repeat but without a count). A common use for infinite sequences is zipping with a finite sequence. As far as I'm aware, the stdlib does not currently provide any way to create such an infinite sequence.

## Proposed solution

Extend CollectionType with a new property `cycle` that yields a type that conforms to SequenceType. This sequence yields each element of the collection in an infinite loop.

## Detailed design

2 new types would be added:

struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }

CollectionType would be extended with a property:

extension CollectionType {
    public var cycle: CycleSequence<Self> { get }
}

This is an extension of CollectionType instead of SequenceType because it requires a multi-pass sequence (and SequenceType does not provide that guarantee). The returned type conforms to SequenceType instead of CollectionType because there is no possible `endIndex` that satisfies the requirement of being reachable from `startIndex` by zero or more applications of `successor()`.

Because the default eager versions of map and filter will execute forever on an infinite sequence, CycleSequence conforms to LazySequenceType instead of SequenceType in order to provide lazy versions of those functions. Additionally, it will provide implementations of the eager versions that simply trigger a fatalError(), as the alternative is an infinite loop that consumes more and more memory.

## Impact on existing code

None

-Kevin Ballard


[Starter Pitch] Introducing a `cycled` method to `Sequence`
Pitch: Sequence enhancements: chaining, repeating, batching
[Starter Pitch] Introducing a `cycled` method to `Sequence`
(Robert Widmann) #2

+1. Stream support is long overdue.

~Robert Widmann

2015/12/28 2:20、Kevin Ballard via swift-evolution <swift-evolution@swift.org> のメッセージ:

···

## Introduction

Add a new property `cycle` to CollectionType that returns an infinite SequenceType that yields the elements of the collection in a loop.

## Motivation

It's sometimes useful to be able to have an infinite sequence. For example, `CollectionOfOne(x).cycle` could be used to have an infinite sequence of a single element (similar to Repeat but without a count). A common use for infinite sequences is zipping with a finite sequence. As far as I'm aware, the stdlib does not currently provide any way to create such an infinite sequence.

## Proposed solution

Extend CollectionType with a new property `cycle` that yields a type that conforms to SequenceType. This sequence yields each element of the collection in an infinite loop.

## Detailed design

2 new types would be added:

struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }

CollectionType would be extended with a property:

extension CollectionType {
   public var cycle: CycleSequence<Self> { get }
}

This is an extension of CollectionType instead of SequenceType because it requires a multi-pass sequence (and SequenceType does not provide that guarantee). The returned type conforms to SequenceType instead of CollectionType because there is no possible `endIndex` that satisfies the requirement of being reachable from `startIndex` by zero or more applications of `successor()`.

Because the default eager versions of map and filter will execute forever on an infinite sequence, CycleSequence conforms to LazySequenceType instead of SequenceType in order to provide lazy versions of those functions. Additionally, it will provide implementations of the eager versions that simply trigger a fatalError(), as the alternative is an infinite loop that consumes more and more memory.

## Impact on existing code

None

-Kevin Ballard
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(plx) #3

Personally I’d say this should be a -1 for standard-library inclusion.

Swift’s not really built to handle infinite sequences right now; until they are handed better by the standard library convenience methods for creating them shouldn’t be in the standard library.

You’d also want to call `fatalError` for at least `reduce`, `reverse`, `sort`, `split`(?), `flatMap`, `dropLast`, `suffix`, and `forEach`.

`startsWith` and `elementsEqual` and `lexicographicComparison` are all broken if you call them like e.g. `self.startsWith(self)`.

You can conceivably implement a non-crashing `contains`, `minElement` and `maxElement` on `CycleSequence` by calling through to the wrapped collection, but that’ll seemingly evaporate as soon as your `CycleSequence` winds up hidden inside an `AnySequence`.

Which illustrates why this is a -1 for me; there's nothing wrong with the functionality in isolation and there’s nothing wrong with infinite sequences, but the standard library should play well with itself, and this wouldn’t play well with the rest of the standard library.

That opinion could change as the language changes or the standard library evolves.

···

On Dec 28, 2015, at 1:20 AM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

## Introduction

Add a new property `cycle` to CollectionType that returns an infinite SequenceType that yields the elements of the collection in a loop.

## Motivation

It's sometimes useful to be able to have an infinite sequence. For example, `CollectionOfOne(x).cycle` could be used to have an infinite sequence of a single element (similar to Repeat but without a count). A common use for infinite sequences is zipping with a finite sequence. As far as I'm aware, the stdlib does not currently provide any way to create such an infinite sequence.

## Proposed solution

Extend CollectionType with a new property `cycle` that yields a type that conforms to SequenceType. This sequence yields each element of the collection in an infinite loop.

## Detailed design

2 new types would be added:

struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }

CollectionType would be extended with a property:

extension CollectionType {
   public var cycle: CycleSequence<Self> { get }
}

This is an extension of CollectionType instead of SequenceType because it requires a multi-pass sequence (and SequenceType does not provide that guarantee). The returned type conforms to SequenceType instead of CollectionType because there is no possible `endIndex` that satisfies the requirement of being reachable from `startIndex` by zero or more applications of `successor()`.

Because the default eager versions of map and filter will execute forever on an infinite sequence, CycleSequence conforms to LazySequenceType instead of SequenceType in order to provide lazy versions of those functions. Additionally, it will provide implementations of the eager versions that simply trigger a fatalError(), as the alternative is an infinite loop that consumes more and more memory.

## Impact on existing code

None

-Kevin Ballard
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Dave Abrahams) #4

## Introduction

Add a new property `cycle` to CollectionType that returns an infinite SequenceType that yields the elements of the collection in a loop.

The name should give an explicit indication that the result is infinite, e.g. “repeatedForever".

## Motivation

It's sometimes useful to be able to have an infinite sequence. For example, `CollectionOfOne(x).cycle` could be used to have an infinite sequence of a single element (similar to Repeat but without a count). A common use for infinite sequences is zipping with a finite sequence. As far as I'm aware, the stdlib does not currently provide any way to create such an infinite sequence.

If this is, as I suspect, the primary use case, I would prefer a much clearer incantation than CollectionOfOne(x).repeatedForever. The part before the parentheses is mostly irrelevant. I suppose [x].repeatedForever could work, but needing an array allocation makes me sad. I wish I had something better to suggest… This may in fact be the best we can do.

## Proposed solution

Extend CollectionType with a new property `cycle` that yields a type that conforms to SequenceType. This sequence yields each element of the collection in an infinite loop.

## Detailed design

2 new types would be added:

struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }

CollectionType would be extended with a property:

extension CollectionType {
   public var cycle: CycleSequence<Self> { get }
}

This is an extension of CollectionType instead of SequenceType because it requires a multi-pass sequence (and SequenceType does not provide that guarantee).

You can copy the elements into an array in that case. Whether or not we should provide that is an open question, but we do similar things for, e.g., reverse().

The returned type conforms to SequenceType instead of CollectionType because there is no possible `endIndex` that satisfies the requirement of being reachable from `startIndex` by zero or more applications of `successor()`.

Yeah… I’m not sure we want to do so, but we should consider loosening that requirement. After all, x.repeatedForever.prefix(1000) is in principle a perfectly cromulent collection that shouldn’t require copying x’s elements into new storage.

Because the default eager versions of map and filter will execute forever on an infinite sequence, CycleSequence conforms to LazySequenceType instead of SequenceType in order to provide lazy versions of those functions.

It would arguably be more appropriate to only provide repeatedForever on instances of LazySequenceType. The idea has always been that users are surprised when they see their map/filter closure’s side-effects happen multiple times, or at odd times, so we make them write “.lazy” to specifically opt into that behavior. I’ve always had mixed feelings about this, thinking that maybe it would be better to educate people about laziness, but that’s what we currently do.

We could weasel out of the “multiple side-effects” problem by declaring that since the result is not a collection you can only make one pass through any part of the result, so if your side-effect over-fires, it’s on you. I wouldn’t be in favor of making this sequence *actually* single-pass though, and this doesn’t solve the “side-effects at odd times” issue.

Additionally, it will provide implementations of the eager versions that simply trigger a fatalError(), as the alternative is an infinite loop that consumes more and more memory.

Why do this? What would these eager APIs look like?

···

On Dec 27, 2015, at 11:20 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

## Impact on existing code

None

-Kevin Ballard
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Lily Ballard) #5

Personally I’d say this should be a -1 for standard-library inclusion.

Swift’s not really built to handle infinite sequences right now; until they are handed better by the standard library convenience methods for creating them shouldn’t be in the standard library.

As far as I can tell, the only way in which it's "not really built" to handle this is that there are multiple constructs that attempt to eagerly consume an entire sequence, and using these with an infinite sequence would end up looping forever. But I don't consider that to be a particularly serious problem. You could "fix" it by refactoring SequenceType into two protocols, SequenceType (for possibly-infinite sequences) and FiniteSequenceType (for known-finite sequences) and then going over the entire standard library and updating various spots to use FiniteSequenceType, except this would be very limiting (sequences that are not known if they're infinite to the compiler could still be valid for various eager algorithms if the programmer knows it will be finite in practice).

You’d also want to call `fatalError` for at least `reduce`, `reverse`, `sort`, `split`(?), `flatMap`, `dropLast`, `suffix`, and `forEach`.

You can only do it for the ones defined in the protocol, not ones defined in extensions. This means map, filter, forEach, and suffix.

split is actually perfectly implementable for a CycleSequence, although it does need a custom implementation. split is bounded by at most Int.max splits, which means it is guaranteed to terminate even for an infinite sequence (although the final subsequence does need to be infinite[1]). Even if there are no separators in the cycle, it can just return the cycle itself.

[1] Interestingly, the current implementation actually dumps the remainder into an Array and returns that (wrapped in AnySequence), which is curious because it would be perfectly legal for it to just wrap the generator up in AnySequence and return that instead. I'm tempted to submit a PR to change that now, as it just seems like unnecessary work to use an array.

`startsWith` and `elementsEqual` and `lexicographicComparison` are all broken if you call them like e.g. `self.startsWith(self)`.

That's true, but what do you really expect when you're calling them with two infinite sequences? It's not so much that they're broken as it is that you're creating an infinite loop without any way to break out. And FWIW, lexicographicalCompare is actually something you might want to explicitly support on infinite sequences if you know your sequences aren't equal and want to find out which one is less than the other.

There are plenty of ways to shoot yourself in the foot. I don't think infinite sequences are really the big problem you're making them out to be.

You can conceivably implement a non-crashing `contains`, `minElement` and `maxElement` on `CycleSequence` by calling through to the wrapped collection, but that’ll seemingly evaporate as soon as your `CycleSequence` winds up hidden inside an `AnySequence`.

You can't override those anyway in a generic context, because they're not members of the protocol, they're just extensions. You could implement them such that your implementation is called when working on the concrete CycleSequence type, but I'm not sure if that's a great idea to do that when the actual behavior differs from calling it generically on SequenceType (well, triggering a fatalError() instead of an infinite loop is fine because they're both Bottom, but returning a valid result in one context and looping infinitely in the other seems bad).

Of course, these methods could actually be moved into the protocol itself, which would let us override them. I'm not entirely sure why they aren't in the protocol to begin with, I guess because there's not much need for overriding these outside of infinite sequences (well, finite sorted sequences could provide an optimized min/maxElement, and an optimized version of contains(_: Self.Generator.Element), but maybe there's tradeoffs to doing this, e.g. maybe there's some reason why having a large protocol witness table is a bad idea).

Which illustrates why this is a -1 for me; there's nothing wrong with the functionality in isolation and there’s nothing wrong with infinite sequences, but the standard library should play well with itself, and this wouldn’t play well with the rest of the standard library.

Ultimately, there's not much difference between an infinite sequence and a sequence of Int.max elements. The latter is finite, but it's so massive (especially on 64-bit) that any kind of eager processing is going to hit the same problems as an infinite sequence. Every problem you describe will be a problem with the simple sequence `(0..<Int.max)` as well.

-Kevin Ballard

···

On Tue, Dec 29, 2015, at 08:14 AM, plx via swift-evolution wrote:

That opinion could change as the language changes or the standard library evolves.

> On Dec 28, 2015, at 1:20 AM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:
>
> ## Introduction
>
> Add a new property `cycle` to CollectionType that returns an infinite SequenceType that yields the elements of the collection in a loop.
>
> ## Motivation
>
> It's sometimes useful to be able to have an infinite sequence. For example, `CollectionOfOne(x).cycle` could be used to have an infinite sequence of a single element (similar to Repeat but without a count). A common use for infinite sequences is zipping with a finite sequence. As far as I'm aware, the stdlib does not currently provide any way to create such an infinite sequence.
>
> ## Proposed solution
>
> Extend CollectionType with a new property `cycle` that yields a type that conforms to SequenceType. This sequence yields each element of the collection in an infinite loop.
>
> ## Detailed design
>
> 2 new types would be added:
>
> struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
> struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }
>
> CollectionType would be extended with a property:
>
> extension CollectionType {
> public var cycle: CycleSequence<Self> { get }
> }
>
> This is an extension of CollectionType instead of SequenceType because it requires a multi-pass sequence (and SequenceType does not provide that guarantee). The returned type conforms to SequenceType instead of CollectionType because there is no possible `endIndex` that satisfies the requirement of being reachable from `startIndex` by zero or more applications of `successor()`.
>
> Because the default eager versions of map and filter will execute forever on an infinite sequence, CycleSequence conforms to LazySequenceType instead of SequenceType in order to provide lazy versions of those functions. Additionally, it will provide implementations of the eager versions that simply trigger a fatalError(), as the alternative is an infinite loop that consumes more and more memory.
>
> ## Impact on existing code
>
> None
>
> -Kevin Ballard
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Dave Abrahams) #6

Could you define what you mean by “stream support?” Whatever it is, I doubt simply adding an infinitely-repeating sequence type is enough to get you there.

-Dave

···

On Dec 27, 2015, at 11:35 PM, Developer via swift-evolution <swift-evolution@swift.org> wrote:

+1. Stream support is long overdue.


(Lily Ballard) #7

>
> ## Introduction
>
> Add a new property `cycle` to CollectionType that returns an infinite SequenceType that yields the elements of the collection in a loop.

The name should give an explicit indication that the result is infinite, e.g. “repeatedForever".

All of the precedent I'm aware of calls this operation "cycle".

Haskell: http://hackage.haskell.org/package/base-4.8.1.0/docs/Prelude.html#v:cycle
Rust: https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.cycle
Ruby: http://ruby-doc.org/core-2.2.3/Enumerable.html#method-i-cycle
Python: https://docs.python.org/3/library/itertools.html?highlight=cycle#itertools.cycle
D: http://dlang.org/phobos/std_range.html#cycle

I'm not sure offhand what other languages to even look to for this.

> ## Motivation
>
> It's sometimes useful to be able to have an infinite sequence. For example, `CollectionOfOne(x).cycle` could be used to have an infinite sequence of a single element (similar to Repeat but without a count). A common use for infinite sequences is zipping with a finite sequence. As far as I'm aware, the stdlib does not currently provide any way to create such an infinite sequence.

If this is, as I suspect, the primary use case, I would prefer a much clearer incantation than CollectionOfOne(x).repeatedForever. The part before the parentheses is mostly irrelevant. I suppose [x].repeatedForever could work, but needing an array allocation makes me sad. I wish I had something better to suggest… This may in fact be the best we can do.

Honestly, I'd actually like to take Repeat and remove the count. I've never used this type and I've never seen any code that uses this type. The current counted behavior of Repeat could then be recovered by saying `Repeat(elt).prefix(count)`. Of course, this would only repeat a single element, so we'd still need Cycle to repeat sequences (which is why I suggested `CollectionOfOne(x).cycle` as that has the same behavior, although of course Repeat then just becomes a special case of `CollectionOfOne(x).cycle.prefix(count)`).

> ## Proposed solution
>
> Extend CollectionType with a new property `cycle` that yields a type that conforms to SequenceType. This sequence yields each element of the collection in an infinite loop.
>
> ## Detailed design
>
> 2 new types would be added:
>
> struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
> struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }
>
> CollectionType would be extended with a property:
>
> extension CollectionType {
> public var cycle: CycleSequence<Self> { get }
> }
>
> This is an extension of CollectionType instead of SequenceType because it requires a multi-pass sequence (and SequenceType does not provide that guarantee).

You can copy the elements into an array in that case. Whether or not we should provide that is an open question, but we do similar things for, e.g., reverse().

You can, but I prefer not to hide array creation like that whenever possible. If I need to cycle some SequenceType I can just wrap it in Array myself, e.g. `Array(seq).cycle`.

> The returned type conforms to SequenceType instead of CollectionType because there is no possible `endIndex` that satisfies the requirement of being reachable from `startIndex` by zero or more applications of `successor()`.

Yeah… I’m not sure we want to do so, but we should consider loosening that requirement. After all, x.repeatedForever.prefix(1000) is in principle a perfectly cromulent collection that shouldn’t require copying x’s elements into new storage.

Loosening this breaks the `count` property. There's no valid value of `count` that can be returned for an infinite sequence. The property could of course just loop infinitely (or fatalError), but that strikes me as being a much worse idea than e.g. map() looping forever, because with map() we have a lazy replacement but there is no lazy replacement for count.

We could of course give CycleSequence a separate prefix() function that returns a collection (but the prefix() from SequenceType must return SubSequence, which cannot be a collection).

> Because the default eager versions of map and filter will execute forever on an infinite sequence, CycleSequence conforms to LazySequenceType instead of SequenceType in order to provide lazy versions of those functions.

It would arguably be more appropriate to only provide repeatedForever on instances of LazySequenceType. The idea has always been that users are surprised when they see their map/filter closure’s side-effects happen multiple times, or at odd times, so we make them write “.lazy” to specifically opt into that behavior. I’ve always had mixed feelings about this, thinking that maybe it would be better to educate people about laziness, but that’s what we currently do.

We could weasel out of the “multiple side-effects” problem by declaring that since the result is not a collection you can only make one pass through any part of the result, so if your side-effect over-fires, it’s on you. I wouldn’t be in favor of making this sequence *actually* single-pass though, and this doesn’t solve the “side-effects at odd times” issue.

Good point. I figured that an infinite sequence is "obviously" lazy, but we do have a precedent right now of requiring `lazy` to get lazy sequences. I also have mixed feelings about this (the strongest argument in favor of the status quo is being able to use @noescape functions for the array versions, but I end up having to use `lazy` very often anyway to avoid intermediate array creation anyway).

The reason why I didn't put this on LazySequenceType to begin with is so far we only require `lazy` directly before invoking an operation that takes a closure. `cycle()` doesn't take a closure, so no need to ask that it be lazy. But since operations chained off of it (like `map()`) should be lazy, it makes sense to require the `lazy` anyway.

> Additionally, it will provide implementations of the eager versions that simply trigger a fatalError(), as the alternative is an infinite loop that consumes more and more memory.

Why do this? What would these eager APIs look like?

The only reason to do this is because the default implementations of the functions will crash the program anyway, after entering an infinite loop and gobbling up memory. So I figured it's nicer to just fatalError() immediately as we can provide a much better error that way.

Of course, technically, these eager functions could terminate anyway, if they throw an error. I still think fatalError() is a good idea because nobody writes code that calls one of these methods and expects to get an error in 100% of cases, but if this is a contentious point we can easily remove the fatalError() implementations from the proposal.

-Kevin Ballard

···

On Tue, Dec 29, 2015, at 04:11 PM, Dave Abrahams wrote:

> On Dec 27, 2015, at 11:20 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:


(Andrew Bennett) #8

+1 looks good, I had a go at implementing it and I think it may require
changes not discussed in the proposal.

You've covered the potential issues fairly well, to be a little more
explicit these are the issues I've found:

1) LazySequenceType's property array cannot be defined without an infinite
sized array.
2) what should [1,2,3].cycle.suffix(4) return? [3,1,2,3] probably has the
least surprises, but it's like asking what's the number before infinity.

3) dropLast should return AnySequence(self), but requires specialisation,
this may have to also be a fatalError (see below).

One issue I don't think you've mentioned, and I don't seem to be able to
resolve is this:

let mySequence = [1,2,3].cycle.dropLast(1)

mySequence.suffix(7)

This could have well defined behaviour (see part 2 above), however the
implementation has some issues.

In this case mySequence is an AnySequence<Int>, mySequence.suffix(7) uses
AnySequence's specialisation and so tries to iterate over the entire
sequence to find the suffix. AnySequence<Int> is type-erased so there's no
way to specialise when the underlying sequence is infinite (to get a valid
implementation of suffix).

Potential solutions:
* Replace erased Any* types with a more flexible alternative that doesn't
remove type information (higher kinded types perhaps).
* Replace SequenceType with two protocols FiniteSequenceType and
InfiniteSequenceType, have type erased versions of each, duplicate all the
things.
* Add a property to SequenceType to indicate if it's definitely finite
(undefined?), AnySequence uses a different backing implementation depending
on this boolean.

Here's the implementation I used in a playground to toy with this:

public struct CycleSequence<Base : CollectionType> : LazySequenceType {

    private let base: Base

    private init(_ collection: Base) {

        self.base = collection

    }

    @warn_unused_result

    public func generate() -> CycleGenerator<Base> {

        return CycleGenerator<Base>(base.generate)

    }

    public var array: [Base.Generator.Element] {

        fatalError("This is undefined!")

    }

}

public struct CycleGenerator<Base : CollectionType> : GeneratorType {

    private let generatorProducer: () -> Base.Generator

    private var generator: Base.Generator

    private init(_ generatorProducer: () -> Base.Generator) {

        self.generatorProducer = generatorProducer

        self.generator = generatorProducer()

    }

    @warn_unused_result

    public mutating func next() -> Base.Generator.Element? {

        if let element = generator.next() {

            return element

        }

        generator = generatorProducer()

        return generator.next()

    }

}

extension CycleSequence {

    @warn_unused_result

    public func dropLast(n: Int) -> AnySequence<Base.Generator.Element> {

        return AnySequence(self)

    }

    @warn_unused_result

    public func suffix(maxLength: Int) -> AnySequence<Base.Generator.Element>
{

        let maxCount = base.count.toIntMax()

        let sequenceLength = maxCount >= Int.max.toIntMax() ? Int.max : Int
(maxCount)

        if maxLength < sequenceLength {

            return AnySequence(base.suffix(maxLength))

        }

        return self.dropFirst(sequenceLength - (maxLength %
sequenceLength)).prefix(maxLength)

    }

}

extension CollectionType {

    public var cycle: CycleSequence<Self> { return CycleSequence(self) }

}

// this produces an infinite loop when evaluating .suffix(7)

let cycle = ["a", "b", "c"].cycle

cycle.dropLast(1).suffix(7).forEach { print("suffix: \($0)") }

···

On Mon, Dec 28, 2015 at 6:35 PM, Developer via swift-evolution < swift-evolution@swift.org> wrote:

+1. Stream support is long overdue.

~Robert Widmann

2015/12/28 2:20、Kevin Ballard via swift-evolution <
swift-evolution@swift.org> のメッセージ:

> ## Introduction
>
> Add a new property `cycle` to CollectionType that returns an infinite
SequenceType that yields the elements of the collection in a loop.
>
> ## Motivation
>
> It's sometimes useful to be able to have an infinite sequence. For
example, `CollectionOfOne(x).cycle` could be used to have an infinite
sequence of a single element (similar to Repeat but without a count). A
common use for infinite sequences is zipping with a finite sequence. As far
as I'm aware, the stdlib does not currently provide any way to create such
an infinite sequence.
>
> ## Proposed solution
>
> Extend CollectionType with a new property `cycle` that yields a type
that conforms to SequenceType. This sequence yields each element of the
collection in an infinite loop.
>
> ## Detailed design
>
> 2 new types would be added:
>
> struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
> struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }
>
> CollectionType would be extended with a property:
>
> extension CollectionType {
> public var cycle: CycleSequence<Self> { get }
> }
>
> This is an extension of CollectionType instead of SequenceType because
it requires a multi-pass sequence (and SequenceType does not provide that
guarantee). The returned type conforms to SequenceType instead of
CollectionType because there is no possible `endIndex` that satisfies the
requirement of being reachable from `startIndex` by zero or more
applications of `successor()`.
>
> Because the default eager versions of map and filter will execute
forever on an infinite sequence, CycleSequence conforms to LazySequenceType
instead of SequenceType in order to provide lazy versions of those
functions. Additionally, it will provide implementations of the eager
versions that simply trigger a fatalError(), as the alternative is an
infinite loop that consumes more and more memory.
>
> ## Impact on existing code
>
> None
>
> -Kevin Ballard
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Lily Ballard) #9

+1 looks good, I had a go at implementing it and I think it may
require changes not discussed in the proposal.

You've covered the potential issues fairly well, to be a little more
explicit these are the issues I've found:

1) LazySequenceType's property array cannot be defined without an
infinite sized array.

Good point, I forgot about that one. But that's really just the same
thing as saying `Array(seq)`, so I don't have any problems with just
saying that it's an error to access that property on an infinite
sequence (in this case I'd just make it fatalError()).

I do wish I could mark those sorts of things with @available(*,
unavailable, message="this cannot be used on an infinite sequence") to
provide a compile-time error for anyone accessing it on the concrete
type (generic access via the protocol wouldn't catch that of course),
but I discovered that if you try and use that on a function like map(),
Swift actually goes ahead and adds the default implementation to your
type anyway (basically throwing away your marked-as-unavailable method).
Which I suppose makes some sense, but I wish there was an alternative
that worked.

2) what should [1,2,3].cycle.suffix(4) return? [3,1,2,3] probably has
the least surprises, but it's like asking what's the number before
infinity.

Nothing. You can't take a suffix on an infinite list. There is no end to
it. That method should be overridden to fatalError() (or if not, it
would just loop forever).

3) dropLast should return AnySequence(self), but requires
specialisation, this may have to also be a fatalError (see below).

Good point. Since there is no end to the sequence, dropLast() on an
infinite sequence is still an infinite sequence. Honestly, the default
implementation should work fine, but it's probably a good idea to just
override it to return AnySequence(self) as you suggest anyway because
it's an easy win.

One issue I don't think you've mentioned, and I don't seem to be able
to resolve is this:

let mySequence = [1,2,3].cycle.dropLast(1) mySequence.suffix(7)

This could have well defined behaviour (see part 2 above), however the
implementation has some issues.

The only well-defined behavior this can have is to loop forever (or to
abort with a fatalError). It's simply an error to take a suffix() of an
infinite sequence.

In this case mySequence is an AnySequence<Int>, mySequence.suffix(7)
uses AnySequence's specialisation and so tries to iterate over the
entire sequence to find the suffix. AnySequence<Int> is type-erased so
there's no way to specialise when the underlying sequence is infinite
(to get a valid implementation of suffix).

That's fine, looping forever is a perfectly reasonable course of action
when you try and take a suffix() of an infinite sequence.

Potential solutions: * Replace erased Any* types with a more flexible
alternative that doesn't remove type information (higher kinded types
perhaps).

The whole point of the Any* types is they do remove type information.

* Replace SequenceType with two protocols FiniteSequenceType and
InfiniteSequenceType, have type erased versions of each, duplicate all
the things.

What's the point of this? All you can do with that is get rid of a
couple of methods that would loop forever on infinite sequences, but
it's a lot of work and a lot of duplication for what seems like an
extremely small win.

I'd much rather just come up with some alternative to @available(*,
unavailable) that actually leaves the method intact but provides a compile-
time error if you call it. This would be strictly intended for protocol
methods, as you'd still need to provide an implementation (such as
`fatalError("not supported")`) that would be called when the method is
accessed via a generic type bound on the protocol (or via an existential
protocol value, for protocols that support that).

* Add a property to SequenceType to indicate if it's definitely finite
(undefined?), AnySequence uses a different backing implementation
depending on this boolean.

And what would AnySequence do with that information? All it could really
do is make sure to call fatalError() instead of looping forever when a
method like suffix() is called.

-Kevin Ballard

···

On Tue, Dec 29, 2015, at 02:27 AM, Andrew Bennett wrote:


(plx) #10

Personally I’d say this should be a -1 for standard-library inclusion.

Swift’s not really built to handle infinite sequences right now; until they are handed better by the standard library convenience methods for creating them shouldn’t be in the standard library.

As far as I can tell, the only way in which it's "not really built" to handle this is that there are multiple constructs that attempt to eagerly consume an entire sequence, and using these with an infinite sequence would end up looping forever. But I don't consider that to be a particularly serious problem. You could "fix" it by refactoring SequenceType into two protocols, SequenceType (for possibly-infinite sequences) and FiniteSequenceType (for known-finite sequences) and then going over the entire standard library and updating various spots to use FiniteSequenceType, except this would be very limiting (sequences that are not known if they're infinite to the compiler could still be valid for various eager algorithms if the programmer knows it will be finite in practice).

Indeed, on my wishlist I would like to see the standard protocols refactored to something more like this:

SequenceType // can be iterated
FiniteSequenceType : SequenceType, // of finite length
StableSequenceType : SequenceType, // can be re-iterated identically
CollectionType : StableSequenceType, FiniteSequenceType, (etc.) // otherwise as it is now

…but can understand the wish to not overly-complicate the basic protocol hierarchy (and also to avoid baking-in nice-to-have, but impossible-to-really-enforce semantic requirements; I’d trust the standard library to use them properly, but not typical 3rd party code, somewhat defeating the purpose).

Everything else is a difference of outlook; we agree on the facts and differ in interpretation.

I consider concrete types that adopt a protocol only to simply call `fatalError` for most of the protocol methods to be pathological — often useful, but still pathological — and thus far the standard library doesn’t contain any such pathological types (to my knowledge).

`cycle` is useful but not useful enough to be the standard library’s first “pathological” type, so it’s still a -1 as proposed.

This is nothing specific to `cycle` and my opinion here could change were the language or the standard library to evolve in various ways.

You’d also want to call `fatalError` for at least `reduce`, `reverse`, `sort`, `split`(?), `flatMap`, `dropLast`, `suffix`, and `forEach`.

You can only do it for the ones defined in the protocol, not ones defined in extensions. This means map, filter, forEach, and suffix.

split is actually perfectly implementable for a CycleSequence, although it does need a custom implementation. split is bounded by at most Int.max splits, which means it is guaranteed to terminate even for an infinite sequence (although the final subsequence does need to be infinite[1]). Even if there are no separators in the cycle, it can just return the cycle itself.

[1] Interestingly, the current implementation actually dumps the remainder into an Array and returns that (wrapped in AnySequence), which is curious because it would be perfectly legal for it to just wrap the generator up in AnySequence and return that instead. I'm tempted to submit a PR to change that now, as it just seems like unnecessary work to use an array.

`startsWith` and `elementsEqual` and `lexicographicComparison` are all broken if you call them like e.g. `self.startsWith(self)`.

That's true, but what do you really expect when you're calling them with two infinite sequences? It's not so much that they're broken as it is that you're creating an infinite loop without any way to break out. And FWIW, lexicographicalCompare is actually something you might want to explicitly support on infinite sequences if you know your sequences aren't equal and want to find out which one is less than the other.

There are plenty of ways to shoot yourself in the foot. I don't think infinite sequences are really the big problem you're making them out to be.

You can conceivably implement a non-crashing `contains`, `minElement` and `maxElement` on `CycleSequence` by calling through to the wrapped collection, but that’ll seemingly evaporate as soon as your `CycleSequence` winds up hidden inside an `AnySequence`.

You can't override those anyway in a generic context, because they're not members of the protocol, they're just extensions. You could implement them such that your implementation is called when working on the concrete CycleSequence type, but I'm not sure if that's a great idea to do that when the actual behavior differs from calling it generically on SequenceType (well, triggering a fatalError() instead of an infinite loop is fine because they're both Bottom, but returning a valid result in one context and looping infinitely in the other seems bad).

Of course, these methods could actually be moved into the protocol itself, which would let us override them. I'm not entirely sure why they aren't in the protocol to begin with, I guess because there's not much need for overriding these outside of infinite sequences (well, finite sorted sequences could provide an optimized min/maxElement, and an optimized version of contains(_: Self.Generator.Element), but maybe there's tradeoffs to doing this, e.g. maybe there's some reason why having a large protocol witness table is a bad idea).

I don’t think `contains` or `minElement/maxElement` *can* be part of the protocol (in the sense of overridable) at this time (they require `Element` satisfy certain type constraints), but they certainly should be if the type system someday would support that.

···

On Dec 29, 2015, at 1:17 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:
On Tue, Dec 29, 2015, at 08:14 AM, plx via swift-evolution wrote:

Which illustrates why this is a -1 for me; there's nothing wrong with the functionality in isolation and there’s nothing wrong with infinite sequences, but the standard library should play well with itself, and this wouldn’t play well with the rest of the standard library.

Ultimately, there's not much difference between an infinite sequence and a sequence of Int.max elements. The latter is finite, but it's so massive (especially on 64-bit) that any kind of eager processing is going to hit the same problems as an infinite sequence. Every problem you describe will be a problem with the simple sequence `(0..<Int.max)` as well.

-Kevin Ballard

That opinion could change as the language changes or the standard library evolves.

On Dec 28, 2015, at 1:20 AM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

## Introduction

Add a new property `cycle` to CollectionType that returns an infinite SequenceType that yields the elements of the collection in a loop.

## Motivation

It's sometimes useful to be able to have an infinite sequence. For example, `CollectionOfOne(x).cycle` could be used to have an infinite sequence of a single element (similar to Repeat but without a count). A common use for infinite sequences is zipping with a finite sequence. As far as I'm aware, the stdlib does not currently provide any way to create such an infinite sequence.

## Proposed solution

Extend CollectionType with a new property `cycle` that yields a type that conforms to SequenceType. This sequence yields each element of the collection in an infinite loop.

## Detailed design

2 new types would be added:

struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }

CollectionType would be extended with a property:

extension CollectionType {
  public var cycle: CycleSequence<Self> { get }
}

This is an extension of CollectionType instead of SequenceType because it requires a multi-pass sequence (and SequenceType does not provide that guarantee). The returned type conforms to SequenceType instead of CollectionType because there is no possible `endIndex` that satisfies the requirement of being reachable from `startIndex` by zero or more applications of `successor()`.

Because the default eager versions of map and filter will execute forever on an infinite sequence, CycleSequence conforms to LazySequenceType instead of SequenceType in order to provide lazy versions of those functions. Additionally, it will provide implementations of the eager versions that simply trigger a fatalError(), as the alternative is an infinite loop that consumes more and more memory.

## Impact on existing code

None

-Kevin Ballard
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Craig Cruden) #11

Could you define what you mean by “stream support?” Whatever it is, I doubt simply adding an infinitely-repeating sequence type is enough to get you there.

I can guess — but it is only a guess.

A function defines an infinite “set” of values (like the digits of pi). A stream is just a special type of traversable (lazy) which does not evaluate until asked for the next in a sequence of the set. A function defined in a stream will thus only continue calculating next digits when asked for them. Similarly you could have a collection (head/tail) and you ask for the head and you get it, but the rest (tail) is just the tail as a whole and none of the values in it are really defined until you traverse down to the next head of the rest of the tail. Once it is evaluated it is stored in memory for future evaluations. If you were to fully evaluate the function it would never finish, and if it were to finish — you would probably run out of memory.

As usual I created these examples with the Scala 2.8 REPL but I think most if not all should work in 2.7.

···

On 2015-12-30, at 7:12:41, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 27, 2015, at 11:35 PM, Developer via swift-evolution <swift-evolution@swift.org> wrote:

+1. Stream support is long overdue.

Could you define what you mean by “stream support?” Whatever it is, I doubt simply adding an infinitely-repeating sequence type is enough to get you there.

-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Andrew Bennett) #12

Hi Kevin,

The issue I was seeing was because AnySequence uses the protocol extension
on SequenceType for its implementations of filter/suffix/prefix etc. So I
don't think it's taking into account any implementations of those on the
base sequence's type.

The problem basically comes down to this:

AnySequence([1,2,3].cycle).suffix(1) uses SequenceType's implementation of
suffix.
[1,2,3].cycle.suffix(1) uses CycleSequence's implementation of suffix.

If you provide any custom implementations for CycleSequence AnySequence
will not preserve that intent. This is probably a bug.

Another side-effect of AnySequence is that @available(*, unavailable,
message="this cannot be used on an infinite sequence") on CycleSequence will
not work if you wrap it with AnySequence. It would still be nice to do
this, it's just not robust.

I've added a bug report (SR-413 <https://bugs.swift.org/browse/SR-413>) to
add missing SequenceType methods to AnySequence so that they use the
correct version. I think this should address my concerns.

If that bug is fixed then the only remaining problem I have is to decide is
if suffix/array should enter an infinite loop or fatalError. Personally I'm
leaning towards fatalError as it's more likely to let a developer know
what's wrong.

···

On Wed, Dec 30, 2015 at 5:44 AM, Kevin Ballard <kevin@sb.org> wrote:

On Tue, Dec 29, 2015, at 02:27 AM, Andrew Bennett wrote:

+1 looks good, I had a go at implementing it and I think it may require
changes not discussed in the proposal.

You've covered the potential issues fairly well, to be a little more
explicit these are the issues I've found:

1) LazySequenceType's property array cannot be defined without an
infinite sized array.

Good point, I forgot about that one. But that's really just the same thing
as saying `Array(seq)`, so I don't have any problems with just saying that
it's an error to access that property on an infinite sequence (in this case
I'd just make it fatalError()).

I do wish I could mark those sorts of things with @available(*,
unavailable, message="this cannot be used on an infinite sequence") to
provide a compile-time error for anyone accessing it on the concrete type
(generic access via the protocol wouldn't catch that of course), but I
discovered that if you try and use that on a function like map(), Swift
actually goes ahead and adds the default implementation to your type anyway
(basically throwing away your marked-as-unavailable method). Which I
suppose makes some sense, but I wish there was an alternative that worked.

2) what should [1,2,3].cycle.suffix(4) return? [3,1,2,3] probably has the
least surprises, but it's like asking what's the number before infinity.

Nothing. You can't take a suffix on an infinite list. There is no end to
it. That method should be overridden to fatalError() (or if not, it would
just loop forever).

3) dropLast should return AnySequence(self), but requires specialisation,
this may have to also be a fatalError (see below).

Good point. Since there is no end to the sequence, dropLast() on an
infinite sequence is still an infinite sequence. Honestly, the default
implementation should work fine, but it's probably a good idea to just
override it to return AnySequence(self) as you suggest anyway because it's
an easy win.

One issue I don't think you've mentioned, and I don't seem to be able to
resolve is this:

let mySequence = [1,2,3].cycle.dropLast(1)

mySequence.suffix(7)

This could have well defined behaviour (see part 2 above), however the
implementation has some issues.

The only well-defined behavior this can have is to loop forever (or to
abort with a fatalError). It's simply an error to take a suffix() of an
infinite sequence.

In this case mySequence is an AnySequence<Int>, mySequence.suffix(7) uses
AnySequence's specialisation and so tries to iterate over the entire
sequence to find the suffix. AnySequence<Int> is type-erased so there's no
way to specialise when the underlying sequence is infinite (to get a valid
implementation of suffix).

That's fine, looping forever is a perfectly reasonable course of action
when you try and take a suffix() of an infinite sequence.

Potential solutions:
* Replace erased Any* types with a more flexible alternative that doesn't
remove type information (higher kinded types perhaps).

The whole point of the Any* types is they do remove type information.

* Replace SequenceType with two protocols FiniteSequenceType and
InfiniteSequenceType, have type erased versions of each, duplicate all the
things.

What's the point of this? All you can do with that is get rid of a couple
of methods that would loop forever on infinite sequences, but it's a lot of
work and a lot of duplication for what seems like an extremely small win.

I'd much rather just come up with some alternative to @available(*,
unavailable) that actually leaves the method intact but provides a
compile-time error if you call it. This would be strictly intended for
protocol methods, as you'd still need to provide an implementation (such as
`fatalError("not supported")`) that would be called when the method is
accessed via a generic type bound on the protocol (or via an existential
protocol value, for protocols that support that).

* Add a property to SequenceType to indicate if it's definitely finite
(undefined?), AnySequence uses a different backing implementation depending
on this boolean.

And what would AnySequence do with that information? All it could really
do is make sure to call fatalError() instead of looping forever when a
method like suffix() is called.

-Kevin Ballard


(Lily Ballard) #13

Personally I’d say this should be a -1 for standard-library
inclusion.

Swift’s not really built to handle infinite sequences right now;
until they are handed better by the standard library convenience
methods for creating them shouldn’t be in the standard library.

As far as I can tell, the only way in which it's "not really built"
to handle this is that there are multiple constructs that attempt to
eagerly consume an entire sequence, and using these with an infinite
sequence would end up looping forever. But I don't consider that to
be a particularly serious problem. You could "fix" it by refactoring
SequenceType into two protocols, SequenceType (for possibly-infinite
sequences) and FiniteSequenceType (for known-finite sequences) and
then going over the entire standard library and updating various
spots to use FiniteSequenceType, except this would be very limiting
(sequences that are not known if they're infinite to the compiler
could still be valid for various eager algorithms if the programmer
knows it will be finite in practice).

Indeed, on my wishlist I would like to see the standard protocols
refactored to something more like this:

SequenceType // can be iterated FiniteSequenceType : SequenceType, //
of finite length StableSequenceType : SequenceType, // can be re-
iterated identically CollectionType : StableSequenceType,
FiniteSequenceType, (etc.) // otherwise as it is now

…but can understand the wish to not overly-complicate the basic
protocol hierarchy (and also to avoid baking-in nice-to-have, but impossible-to-really-
enforce semantic requirements; I’d trust the standard library to use
them properly, but not typical 3rd party code, somewhat defeating the
purpose).

Everything else is a difference of outlook; we agree on the facts and
differ in interpretation.

I consider concrete types that adopt a protocol only to simply call
`fatalError` for most of the protocol methods to be pathological —
often useful, but still pathological — and thus far the standard
library doesn’t contain any such pathological types (to my knowledge).

It only calls `fatalError` because the default implementation would loop
forever anyway (and consume more and more memory until this eventually
causes a crash). You can trivially remove the `fatalError()`
implementations and still have the CycleSequence type, the only reason
to have them is because we may as well tell the user exactly what they
did wrong instead of letting the program enter an infinite loop that
tries to eat all the memory in the system.

`cycle` is useful but not useful enough to be the standard library’s
first “pathological” type, so it’s still a -1 as proposed.

There's plenty of use for infinite sequences. And all infinite sequences
are "pathological" in the same sense. The fact that the stdlib doesn't
provide any convenient ways to produce infinite sequences right now is a
deficiency, not an advantage. As previously stated, the issues with
infinite sequences also manifest with extremely long finite sequences
(such as `(1..<Int.max)`). Also, the stdlib does already provide a way
to make your own infinite sequences already, which is to use
`anyGenerator()` with a function that can never return `nil`.

Which is to say, the only thing special about Cycle is that it can
eagerly detect some infinite loops and trigger a fatalError().

It's also worth pointing out here that CycleSequence is inherently lazy
(it conforms to LazySequenceType), which provides lazy versions of
map/filter/etc., so you're only ever going to actually hit the eager
SequenceType variants when either working generically with <T:

or if you explicitly constrain the return value of

map()/filter() to Array (which will cause it to resolve to the
SequenceType version). But in both cases, the problems with Cycle are,
as I said before, the same problems you'd get with something like
`(1..<Int.max)`.

You can conceivably implement a non-crashing `contains`,
`minElement` and `maxElement` on `CycleSequence` by calling through
to the wrapped collection, but that’ll seemingly evaporate as soon
as your `CycleSequence` winds up hidden inside an `AnySequence`.

You can't override those anyway in a generic context, because they're
not members of the protocol, they're just extensions. You could
implement them such that your implementation is called when working
on the concrete CycleSequence type, but I'm not sure if that's a
great idea to do that when the actual behavior differs from calling
it generically on SequenceType (well, triggering a fatalError()
instead of an infinite loop is fine because they're both Bottom, but
returning a valid result in one context and looping infinitely in the
other seems bad).

Of course, these methods could actually be moved into the protocol
itself, which would let us override them. I'm not entirely sure why
they aren't in the protocol to begin with, I guess because there's
not much need for overriding these outside of infinite sequences
(well, finite sorted sequences could provide an optimized
min/maxElement, and an optimized version of contains(_:
Self.Generator.Element), but maybe there's tradeoffs to doing this,
e.g. maybe there's some reason why having a large protocol witness
table is a bad idea).

I don’t think `contains` or `minElement/maxElement` *can* be part of
the protocol (in the sense of overridable) at this time (they require
`Element` satisfy certain type constraints), but they certainly should
be if the type system someday would support that.

They actually can. There's two variants of each function; one which
takes a comparison function, and one which requires the
Comparable/Equatable bound. The version for Comparable/Equatable can't
be part of the protocol itself (and shouldn't be; it's just a
convenience for providing the appropriate comparison function), but the
version that takes a function has no bounds on it and could be part of
the protocol itself.

-Kevin Ballard

···

On Tue, Dec 29, 2015, at 03:39 PM, plx via swift-evolution wrote:

On Dec 29, 2015, at 1:17 PM, Kevin Ballard via swift-evolution <swift- >> evolution@swift.org> wrote:
On Tue, Dec 29, 2015, at 08:14 AM, plx via swift-evolution wrote:


(Dave Abrahams) #14

Personally I’d say this should be a -1 for standard-library inclusion.

Swift’s not really built to handle infinite sequences right now; until they are handed better by the standard library convenience methods for creating them shouldn’t be in the standard library.

As far as I can tell, the only way in which it's "not really built" to handle this is that there are multiple constructs that attempt to eagerly consume an entire sequence, and using these with an infinite sequence would end up looping forever. But I don't consider that to be a particularly serious problem. You could "fix" it by refactoring SequenceType into two protocols, SequenceType (for possibly-infinite sequences) and FiniteSequenceType (for known-finite sequences) and then going over the entire standard library and updating various spots to use FiniteSequenceType, except this would be very limiting (sequences that are not known if they're infinite to the compiler could still be valid for various eager algorithms if the programmer knows it will be finite in practice).

Indeed, on my wishlist I would like to see the standard protocols refactored to something more like this:

SequenceType // can be iterated
FiniteSequenceType : SequenceType, // of finite length
StableSequenceType : SequenceType, // can be re-iterated identically
CollectionType : StableSequenceType, FiniteSequenceType, (etc.) // otherwise as it is now

This is interesting. A few concerns:

First, we have tried not to create any distinct protocols with identical syntactic requirements, because we think it makes the world clearer; we think people are more likely to assign incorrect protocols when all the operations they want are available, but don’t have the right semantics. That isn’t to say we shouldn’t start doing it, but it would be a break from the past.

Higher protocol granularity has a high comprehensibility cost. Distinguishing protocols based on semantic requirements alone may make the library harder to understand. I’ve heard some people’s heads have exploded from simply encountering CollectionType.

Next, it’s a principle of generic programming that every protocol should be justified by both algorithms that exploit its requirements (e.g. extension methods) and some real-world models that can’t reasonably also conform to a more-refined protocol. For example, we have a ForwardIndexType because a singly-linked list has multipass forward traversal and can’t do bidirectional traversal. In order to evaluate any proposal for new protocols, we’d need to see all of these things.

…but can understand the wish to not overly-complicate the basic protocol hierarchy (and also to avoid baking-in nice-to-have, but impossible-to-really-enforce semantic requirements; I’d trust the standard library to use them properly, but not typical 3rd party code, somewhat defeating the purpose).

Well, I guess I should have read ahead and most of my lecture above was needless! I’m posting it anyway because I think it spells out some important principles we’ll need to refer to later.

···

On Dec 29, 2015, at 3:39 PM, plx via swift-evolution <swift-evolution@swift.org> wrote:

On Dec 29, 2015, at 1:17 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
On Tue, Dec 29, 2015, at 08:14 AM, plx via swift-evolution wrote:

Everything else is a difference of outlook; we agree on the facts and differ in interpretation.

I consider concrete types that adopt a protocol only to simply call `fatalError` for most of the protocol methods to be pathological — often useful, but still pathological — and thus far the standard library doesn’t contain any such pathological types (to my knowledge).

`cycle` is useful but not useful enough to be the standard library’s first “pathological” type, so it’s still a -1 as proposed.

This is nothing specific to `cycle` and my opinion here could change were the language or the standard library to evolve in various ways.

You’d also want to call `fatalError` for at least `reduce`, `reverse`, `sort`, `split`(?), `flatMap`, `dropLast`, `suffix`, and `forEach`.

You can only do it for the ones defined in the protocol, not ones defined in extensions. This means map, filter, forEach, and suffix.

split is actually perfectly implementable for a CycleSequence, although it does need a custom implementation. split is bounded by at most Int.max splits, which means it is guaranteed to terminate even for an infinite sequence (although the final subsequence does need to be infinite[1]). Even if there are no separators in the cycle, it can just return the cycle itself.

[1] Interestingly, the current implementation actually dumps the remainder into an Array and returns that (wrapped in AnySequence), which is curious because it would be perfectly legal for it to just wrap the generator up in AnySequence and return that instead. I'm tempted to submit a PR to change that now, as it just seems like unnecessary work to use an array.

`startsWith` and `elementsEqual` and `lexicographicComparison` are all broken if you call them like e.g. `self.startsWith(self)`.

That's true, but what do you really expect when you're calling them with two infinite sequences? It's not so much that they're broken as it is that you're creating an infinite loop without any way to break out. And FWIW, lexicographicalCompare is actually something you might want to explicitly support on infinite sequences if you know your sequences aren't equal and want to find out which one is less than the other.

There are plenty of ways to shoot yourself in the foot. I don't think infinite sequences are really the big problem you're making them out to be.

You can conceivably implement a non-crashing `contains`, `minElement` and `maxElement` on `CycleSequence` by calling through to the wrapped collection, but that’ll seemingly evaporate as soon as your `CycleSequence` winds up hidden inside an `AnySequence`.

You can't override those anyway in a generic context, because they're not members of the protocol, they're just extensions. You could implement them such that your implementation is called when working on the concrete CycleSequence type, but I'm not sure if that's a great idea to do that when the actual behavior differs from calling it generically on SequenceType (well, triggering a fatalError() instead of an infinite loop is fine because they're both Bottom, but returning a valid result in one context and looping infinitely in the other seems bad).

Of course, these methods could actually be moved into the protocol itself, which would let us override them. I'm not entirely sure why they aren't in the protocol to begin with, I guess because there's not much need for overriding these outside of infinite sequences (well, finite sorted sequences could provide an optimized min/maxElement, and an optimized version of contains(_: Self.Generator.Element), but maybe there's tradeoffs to doing this, e.g. maybe there's some reason why having a large protocol witness table is a bad idea).

I don’t think `contains` or `minElement/maxElement` *can* be part of the protocol (in the sense of overridable) at this time (they require `Element` satisfy certain type constraints), but they certainly should be if the type system someday would support that.

Which illustrates why this is a -1 for me; there's nothing wrong with the functionality in isolation and there’s nothing wrong with infinite sequences, but the standard library should play well with itself, and this wouldn’t play well with the rest of the standard library.

Ultimately, there's not much difference between an infinite sequence and a sequence of Int.max elements. The latter is finite, but it's so massive (especially on 64-bit) that any kind of eager processing is going to hit the same problems as an infinite sequence. Every problem you describe will be a problem with the simple sequence `(0..<Int.max)` as well.

-Kevin Ballard

That opinion could change as the language changes or the standard library evolves.

On Dec 28, 2015, at 1:20 AM, Kevin Ballard via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

## Introduction

Add a new property `cycle` to CollectionType that returns an infinite SequenceType that yields the elements of the collection in a loop.

## Motivation

It's sometimes useful to be able to have an infinite sequence. For example, `CollectionOfOne(x).cycle` could be used to have an infinite sequence of a single element (similar to Repeat but without a count). A common use for infinite sequences is zipping with a finite sequence. As far as I'm aware, the stdlib does not currently provide any way to create such an infinite sequence.

## Proposed solution

Extend CollectionType with a new property `cycle` that yields a type that conforms to SequenceType. This sequence yields each element of the collection in an infinite loop.

## Detailed design

2 new types would be added:

struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }

CollectionType would be extended with a property:

extension CollectionType {
  public var cycle: CycleSequence<Self> { get }
}

This is an extension of CollectionType instead of SequenceType because it requires a multi-pass sequence (and SequenceType does not provide that guarantee). The returned type conforms to SequenceType instead of CollectionType because there is no possible `endIndex` that satisfies the requirement of being reachable from `startIndex` by zero or more applications of `successor()`.

Because the default eager versions of map and filter will execute forever on an infinite sequence, CycleSequence conforms to LazySequenceType instead of SequenceType in order to provide lazy versions of those functions. Additionally, it will provide implementations of the eager versions that simply trigger a fatalError(), as the alternative is an infinite loop that consumes more and more memory.

## Impact on existing code

None

-Kevin Ballard
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


[Pitch] Remove the single-pass requirement on Sequence
(Dave Abrahams) #15

Yes, I understand the usual concept of a “stream," but it doesn’t help me understand what stream *support* entails.

-Dave

···

On Dec 29, 2015, at 4:29 PM, Craig Cruden <ccruden@novafore.com> wrote:

Could you define what you mean by “stream support?” Whatever it is, I doubt simply adding an infinitely-repeating sequence type is enough to get you there.

I can guess — but it is only a guess.

A function defines an infinite “set” of values (like the digits of pi). A stream is just a special type of traversable (lazy) which does not evaluate until asked for the next in a sequence of the set. A function defined in a stream will thus only continue calculating next digits when asked for them. Similarly you could have a collection (head/tail) and you ask for the head and you get it, but the rest (tail) is just the tail as a whole and none of the values in it are really defined until you traverse down to the next head of the rest of the tail. Once it is evaluated it is stored in memory for future evaluations. If you were to fully evaluate the function it would never finish, and if it were to finish — you would probably run out of memory.


(Lily Ballard) #16

Hi Kevin,

The issue I was seeing was because AnySequence uses the protocol
extension on SequenceType for its implementations of
filter/suffix/prefix etc. So I don't think it's taking into account
any implementations of those on the base sequence's type.

The problem basically comes down to this:

AnySequence([1,2,3].cycle).suffix(1) uses SequenceType's
implementation of suffix. [1,2,3].cycle.suffix(1) uses
CycleSequence's implementation of suffix.

If you provide any custom implementations for CycleSequence
AnySequence will not preserve that intent. This is probably a bug.

The only way to actually fix this is to have AnySequence contain a
closure for each method defined in SequenceType that calls the
underlying implementation (and wraps the result in AnySequence as
necessary). This is rather heavy-weight, which is probably why it
doesn't do that.

Really, I don't think it's particularly surprising that wrapping a
sequence in AnySequence causes the sequence's own implementations of the
optional methods to not be called. It would be great if there was a
simple way to fix this, but it's not worth the cost of reifying all the
runtime type information necessary.

Another side-effect of AnySequence is that @available(*, unavailable,
message="this cannot be used on an infinite sequence") onCycleSequence
will not work if you wrap it with AnySequence. It would still be nice
to do this, it's just not robust.

Since that's a compile-time check, it is quite impossible to ever
support that on a sequence that's wrapped in AnySequence.

-Kevin Ballard

I've added a bug report (SR-413[1]) to add missing SequenceType
methods to AnySequence so that they use the correct version. I think
this should address my concerns.

If that bug is fixed then the only remaining problem I have is to
decide is if suffix/array should enter an infinite loop or fatalError.
Personally I'm leaning towards fatalError as it's more likely to let a
developer know what's wrong.

__

+1 looks good, I had a go at implementing it and I think it may
require changes not discussed in the proposal.

You've covered the potential issues fairly well, to be a little more
explicit these are the issues I've found:

1) LazySequenceType's property array cannot be defined without an
infinite sized array.

Good point, I forgot about that one. But that's really just the same
thing as saying `Array(seq)`, so I don't have any problems with just
saying that it's an error to access that property on an infinite
sequence (in this case I'd just make it fatalError()).

I do wish I could mark those sorts of things with @available(*,
unavailable, message="this cannot be used on an infinite sequence")
to provide a compile-time error for anyone accessing it on the
concrete type (generic access via the protocol wouldn't catch that of
course), but I discovered that if you try and use that on a function
like map(), Swift actually goes ahead and adds the default
implementation to your type anyway (basically throwing away your marked-as-
unavailable method). Which I suppose makes some sense, but I wish
there was an alternative that worked.

2) what should [1,2,3].cycle.suffix(4) return? [3,1,2,3] probably
has the least surprises, but it's like asking what's the number
before infinity.

Nothing. You can't take a suffix on an infinite list. There is no end
to it. That method should be overridden to fatalError() (or if not,
it would just loop forever).

3) dropLast should return AnySequence(self), but requires
specialisation, this may have to also be a fatalError (see below).

Good point. Since there is no end to the sequence, dropLast() on an
infinite sequence is still an infinite sequence. Honestly, the
default implementation should work fine, but it's probably a good
idea to just override it to return AnySequence(self) as you suggest
anyway because it's an easy win.

One issue I don't think you've mentioned, and I don't seem to be
able to resolve is this:

let mySequence = [1,2,3].cycle.dropLast(1) mySequence.suffix(7)

This could have well defined behaviour (see part 2 above), however
the implementation has some issues.

The only well-defined behavior this can have is to loop forever (or
to abort with a fatalError). It's simply an error to take a suffix()
of an infinite sequence.

In this case mySequence is an AnySequence<Int>, mySequence.suffix(7)
uses AnySequence's specialisation and so tries to iterate over the
entire sequence to find the suffix. AnySequence<Int> is type-erased
so there's no way to specialise when the underlying sequence is
infinite (to get a valid implementation of suffix).

That's fine, looping forever is a perfectly reasonable course of
action when you try and take a suffix() of an infinite sequence.

Potential solutions: * Replace erased Any* types with a more
flexible alternative that doesn't remove type information (higher
kinded types perhaps).

The whole point of the Any* types is they do remove type information.

* Replace SequenceType with two protocols FiniteSequenceType and
InfiniteSequenceType, have type erased versions of each, duplicate
all the things.

What's the point of this? All you can do with that is get rid of a
couple of methods that would loop forever on infinite sequences, but
it's a lot of work and a lot of duplication for what seems like an
extremely small win.

I'd much rather just come up with some alternative to @available(*,
unavailable) that actually leaves the method intact but provides a
compile-time error if you call it. This would be strictly intended
for protocol methods, as you'd still need to provide an
implementation (such as `fatalError("not supported")`) that would be
called when the method is accessed via a generic type bound on the
protocol (or via an existential protocol value, for protocols that
support that).

* Add a property to SequenceType to indicate if it's definitely
finite (undefined?), AnySequence uses a different backing
implementation depending on this boolean.

And what would AnySequence do with that information? All it could
really do is make sure to call fatalError() instead of looping
forever when a method like suffix() is called.

-Kevin Ballard

Links:

  1. https://bugs.swift.org/browse/SR-413

···

On Tue, Dec 29, 2015, at 10:12 PM, Andrew Bennett wrote:

On Wed, Dec 30, 2015 at 5:44 AM, Kevin Ballard <kevin@sb.org> wrote:

On Tue, Dec 29, 2015, at 02:27 AM, Andrew Bennett wrote:


(Craig Cruden) #17

I have never tried to implement a “stream” in Swift, so I am not sure if there is an easy way to do it…. but I don’t see a collection of type stream as I have in Scala.

http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Stream

···

On 2015-12-30, at 7:39:54, Dave Abrahams <dabrahams@apple.com> wrote:

On Dec 29, 2015, at 4:29 PM, Craig Cruden <ccruden@novafore.com <mailto:ccruden@novafore.com>> wrote:

Could you define what you mean by “stream support?” Whatever it is, I doubt simply adding an infinitely-repeating sequence type is enough to get you there.

I can guess — but it is only a guess.

A function defines an infinite “set” of values (like the digits of pi). A stream is just a special type of traversable (lazy) which does not evaluate until asked for the next in a sequence of the set. A function defined in a stream will thus only continue calculating next digits when asked for them. Similarly you could have a collection (head/tail) and you ask for the head and you get it, but the rest (tail) is just the tail as a whole and none of the values in it are really defined until you traverse down to the next head of the rest of the tail. Once it is evaluated it is stored in memory for future evaluations. If you were to fully evaluate the function it would never finish, and if it were to finish — you would probably run out of memory.

Yes, I understand the usual concept of a “stream," but it doesn’t help me understand what stream *support* entails.

-Dave


(plx) #18

Personally I’d say this should be a -1 for standard-library inclusion.

Swift’s not really built to handle infinite sequences right now; until they are handed better by the standard library convenience methods for creating them shouldn’t be in the standard library.

As far as I can tell, the only way in which it's "not really built" to handle this is that there are multiple constructs that attempt to eagerly consume an entire sequence, and using these with an infinite sequence would end up looping forever. But I don't consider that to be a particularly serious problem. You could "fix" it by refactoring SequenceType into two protocols, SequenceType (for possibly-infinite sequences) and FiniteSequenceType (for known-finite sequences) and then going over the entire standard library and updating various spots to use FiniteSequenceType, except this would be very limiting (sequences that are not known if they're infinite to the compiler could still be valid for various eager algorithms if the programmer knows it will be finite in practice).

Indeed, on my wishlist I would like to see the standard protocols refactored to something more like this:

SequenceType // can be iterated
FiniteSequenceType : SequenceType, // of finite length
StableSequenceType : SequenceType, // can be re-iterated identically
CollectionType : StableSequenceType, FiniteSequenceType, (etc.) // otherwise as it is now

This is interesting. A few concerns:

First, we have tried not to create any distinct protocols with identical syntactic requirements, because we think it makes the world clearer; we think people are more likely to assign incorrect protocols when all the operations they want are available, but don’t have the right semantics. That isn’t to say we shouldn’t start doing it, but it would be a break from the past.

Higher protocol granularity has a high comprehensibility cost. Distinguishing protocols based on semantic requirements alone may make the library harder to understand. I’ve heard some people’s heads have exploded from simply encountering CollectionType.

Next, it’s a principle of generic programming that every protocol should be justified by both algorithms that exploit its requirements (e.g. extension methods) and some real-world models that can’t reasonably also conform to a more-refined protocol. For example, we have a ForwardIndexType because a singly-linked list has multipass forward traversal and can’t do bidirectional traversal. In order to evaluate any proposal for new protocols, we’d need to see all of these things.

Speaking frankly, I’d say that there are few benefits to a refactoring along the lines sketched above until/unless you want to have solid support for things like a product-sequence or product-collection; you would gain *significant* advantages from such a refactored hierarchy in such scenarios, but I can’t think of any meaningful improvement the refactoring would offer in the essentially-linear case. (Note that this is distinct from examples of concrete models that are naturally e.g. finite-but-not-stable and vice-versa; they exist, but the rest is already long enough as it is).

A full example is beyond my time-budget here, but I can give the flavor of where it makes a difference somewhat quickly.

Consider a hypothetical `ProductSequence2<A:SequenceType,B:SequenceType>` that enumerates the cartesian product of two sequences (in an unspecified order); the naive implementation has problems with non-stable sequences and with infinite-sequences, but I’ll only discuss the infinite case b/c it’s more relevant here.

When either—or both—of the sequences are infinite, the order-of-iteration has actual observable consequences; as a concrete example, if you want this to work out:

  ProductSequence2([1,2].cycle,[1,2].cycle).contains((2,2)) == true

…then we have this:

- if `a` is finite and `b` is not, you want to iterate like (a0,b0), (a1,b0), (a2, b0) … , (a0, b1), (a1, b1), …, etc
- if `b` is finite and `a` is not, you want to iterate like (a0,b0), (a0,b1), (a0, b2) … , (a1, b0), (a1, b1), …, etc
- if neither `a` nor `b` is finite, you want to iterate like (a0, b0), (a1,b0), (a0,b1), (a2, b0), (a1, b1), (a0, b2), … etc

…as with those orderings you *will* eventually reach each pair in the product (which seems strictly superior to an iteration order which will leave many members of the product forever un-visited).

Importantly, the third ordering is inefficient in the general case, and thus isn’t a suitable default; you’d only want it in places where you’re *intentionally* using infinite series.

This is a concrete example of the sort of thing that leaves me preferring that the standard library *not* contain infinite sequences at this time: such sequences *highly* benefit from special handling, but there’s no good way at this time to generically provide such handling in a general context within the bounds of the current language and standard library .

If there’s a realistic chance of such a refactoring being accepted if sufficiently-well-proposed I can prepare something, but I’m admittedly skeptical given the likely magnitude of the associated changes. In particular, the API on types like `ForwardIndexType` is rather unfortunate for things like a hypothetical product-collection scenario; since such products are among the stronger motivating uses, this means it’s a protocol refactoring + a lot of API changes to the existing standard library, and probably making things clunkier in the common / linear cases.

···

On Dec 29, 2015, at 6:38 PM, Dave Abrahams <dabrahams@apple.com> wrote:

On Dec 29, 2015, at 3:39 PM, plx via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Dec 29, 2015, at 1:17 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
On Tue, Dec 29, 2015, at 08:14 AM, plx via swift-evolution wrote:

…but can understand the wish to not overly-complicate the basic protocol hierarchy (and also to avoid baking-in nice-to-have, but impossible-to-really-enforce semantic requirements; I’d trust the standard library to use them properly, but not typical 3rd party code, somewhat defeating the purpose).

Well, I guess I should have read ahead and most of my lecture above was needless! I’m posting it anyway because I think it spells out some important principles we’ll need to refer to later.

Everything else is a difference of outlook; we agree on the facts and differ in interpretation.

I consider concrete types that adopt a protocol only to simply call `fatalError` for most of the protocol methods to be pathological — often useful, but still pathological — and thus far the standard library doesn’t contain any such pathological types (to my knowledge).

`cycle` is useful but not useful enough to be the standard library’s first “pathological” type, so it’s still a -1 as proposed.

This is nothing specific to `cycle` and my opinion here could change were the language or the standard library to evolve in various ways.

You’d also want to call `fatalError` for at least `reduce`, `reverse`, `sort`, `split`(?), `flatMap`, `dropLast`, `suffix`, and `forEach`.

You can only do it for the ones defined in the protocol, not ones defined in extensions. This means map, filter, forEach, and suffix.

split is actually perfectly implementable for a CycleSequence, although it does need a custom implementation. split is bounded by at most Int.max splits, which means it is guaranteed to terminate even for an infinite sequence (although the final subsequence does need to be infinite[1]). Even if there are no separators in the cycle, it can just return the cycle itself.

[1] Interestingly, the current implementation actually dumps the remainder into an Array and returns that (wrapped in AnySequence), which is curious because it would be perfectly legal for it to just wrap the generator up in AnySequence and return that instead. I'm tempted to submit a PR to change that now, as it just seems like unnecessary work to use an array.

`startsWith` and `elementsEqual` and `lexicographicComparison` are all broken if you call them like e.g. `self.startsWith(self)`.

That's true, but what do you really expect when you're calling them with two infinite sequences? It's not so much that they're broken as it is that you're creating an infinite loop without any way to break out. And FWIW, lexicographicalCompare is actually something you might want to explicitly support on infinite sequences if you know your sequences aren't equal and want to find out which one is less than the other.

There are plenty of ways to shoot yourself in the foot. I don't think infinite sequences are really the big problem you're making them out to be.

You can conceivably implement a non-crashing `contains`, `minElement` and `maxElement` on `CycleSequence` by calling through to the wrapped collection, but that’ll seemingly evaporate as soon as your `CycleSequence` winds up hidden inside an `AnySequence`.

You can't override those anyway in a generic context, because they're not members of the protocol, they're just extensions. You could implement them such that your implementation is called when working on the concrete CycleSequence type, but I'm not sure if that's a great idea to do that when the actual behavior differs from calling it generically on SequenceType (well, triggering a fatalError() instead of an infinite loop is fine because they're both Bottom, but returning a valid result in one context and looping infinitely in the other seems bad).

Of course, these methods could actually be moved into the protocol itself, which would let us override them. I'm not entirely sure why they aren't in the protocol to begin with, I guess because there's not much need for overriding these outside of infinite sequences (well, finite sorted sequences could provide an optimized min/maxElement, and an optimized version of contains(_: Self.Generator.Element), but maybe there's tradeoffs to doing this, e.g. maybe there's some reason why having a large protocol witness table is a bad idea).

I don’t think `contains` or `minElement/maxElement` *can* be part of the protocol (in the sense of overridable) at this time (they require `Element` satisfy certain type constraints), but they certainly should be if the type system someday would support that.

Which illustrates why this is a -1 for me; there's nothing wrong with the functionality in isolation and there’s nothing wrong with infinite sequences, but the standard library should play well with itself, and this wouldn’t play well with the rest of the standard library.

Ultimately, there's not much difference between an infinite sequence and a sequence of Int.max elements. The latter is finite, but it's so massive (especially on 64-bit) that any kind of eager processing is going to hit the same problems as an infinite sequence. Every problem you describe will be a problem with the simple sequence `(0..<Int.max)` as well.

-Kevin Ballard

That opinion could change as the language changes or the standard library evolves.

On Dec 28, 2015, at 1:20 AM, Kevin Ballard via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

## Introduction

Add a new property `cycle` to CollectionType that returns an infinite SequenceType that yields the elements of the collection in a loop.

## Motivation

It's sometimes useful to be able to have an infinite sequence. For example, `CollectionOfOne(x).cycle` could be used to have an infinite sequence of a single element (similar to Repeat but without a count). A common use for infinite sequences is zipping with a finite sequence. As far as I'm aware, the stdlib does not currently provide any way to create such an infinite sequence.

## Proposed solution

Extend CollectionType with a new property `cycle` that yields a type that conforms to SequenceType. This sequence yields each element of the collection in an infinite loop.

## Detailed design

2 new types would be added:

struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }

CollectionType would be extended with a property:

extension CollectionType {
  public var cycle: CycleSequence<Self> { get }
}

This is an extension of CollectionType instead of SequenceType because it requires a multi-pass sequence (and SequenceType does not provide that guarantee). The returned type conforms to SequenceType instead of CollectionType because there is no possible `endIndex` that satisfies the requirement of being reachable from `startIndex` by zero or more applications of `successor()`.

Because the default eager versions of map and filter will execute forever on an infinite sequence, CycleSequence conforms to LazySequenceType instead of SequenceType in order to provide lazy versions of those functions. Additionally, it will provide implementations of the eager versions that simply trigger a fatalError(), as the alternative is an infinite loop that consumes more and more memory.

## Impact on existing code

None

-Kevin Ballard
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Robert Widmann) #19

I think we have to make some things clear. Unfortunately, I used the word stream when I meant cycle, they are very different things from an implementer's perspective. A stream is a generalized version of a cycle, where the latter admits infinite sequences of any kind or mix of data (say, the Fibonacci example here), the latter admits only infinite representations based on an initial finite input. This makes many divergent cases in the former perfectly valid in the latter through proper manipulation of an underlying finite sequence. If you want to see what a Cycle looks like, Rust has std::iter::Cycle. Swiftz, on the other hand, shows what a proper stream would look like (and having written the thing, I do not wish to see support for it in the Swift Standard Library).

~Robert Widmann

2015/12/29 18:39、Craig Cruden <ccruden@novafore.com> のメッセージ:

···

I have never tried to implement a “stream” in Swift, so I am not sure if there is an easy way to do it…. but I don’t see a collection of type stream as I have in Scala.

http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Stream

On 2015-12-30, at 7:39:54, Dave Abrahams <dabrahams@apple.com> wrote:

On Dec 29, 2015, at 4:29 PM, Craig Cruden <ccruden@novafore.com> wrote:

Could you define what you mean by “stream support?” Whatever it is, I doubt simply adding an infinitely-repeating sequence type is enough to get you there.

I can guess — but it is only a guess.

A function defines an infinite “set” of values (like the digits of pi). A stream is just a special type of traversable (lazy) which does not evaluate until asked for the next in a sequence of the set. A function defined in a stream will thus only continue calculating next digits when asked for them. Similarly you could have a collection (head/tail) and you ask for the head and you get it, but the rest (tail) is just the tail as a whole and none of the values in it are really defined until you traverse down to the next head of the rest of the tail. Once it is evaluated it is stored in memory for future evaluations. If you were to fully evaluate the function it would never finish, and if it were to finish — you would probably run out of memory.

Yes, I understand the usual concept of a “stream," but it doesn’t help me understand what stream *support* entails.

-Dave


(Lily Ballard) #20

Refactoring like that can always be done later; introducing infinite
sequences now shouldn't make it any harder to refactor the protocols. If
anything, it will provide more practical experience with how infinite
sequences interact with the current protocol hierarchy that would help
guide the design of any refactoring (or, perhaps more importantly, help
determine if such a refactoring is worthwhile).

A big concern I have with refactoring like that is you'd rarely ever be
able to actually bound an algorithm on FiniteSequenceType, because there
will always be ways of constructing sequences that the compiler can't
tell if it's infinite. Trivial example is `ary.cycle.takeWhile(pred)`.
This is infinite if `pred(elt)` is true for every element of `ary` and
finite otherwise. But there's tons of other ways of creating such
indeterminate sequences. Bounding algorithms (such as `map(..) -> [T]`
or `reduce()`) on finite sequences only would be rather limiting, as
users who know their sequence must be finite but can't prove it to the
compiler would be unable to use methods that are in fact perfectly safe.

What you could do with such a refactoring is optimize certain algorithms
if they use a more precise sequence type, but I'm not actually sure what
sort of optimizations you can make with your proposed protocol
hierarchy. In fact, the only real optimizations that come to mind are
optimizing when you know you're working with a CycleSequence, because
for various algorithms you really only have to iterate the underlying
elements once (e.g. assuming a pure predicate, CycleSequence can
implement contains() safely, although we don't actually have pure in the
language right now so that's not necessarily a safe assumption; also,
contains() isn't a protocol method, just an extension method, so we
can't actually do that anyway).

On that note, core language team, anyone know offhand why SequenceType
has a bunch of extension methods that aren't part of the protocol? For
example, contains(_ predicate:). The only real reason I can think of is
to shrink the protocol witness table, but surely that's not particularly
meaningful. I warrant that contains(_ predicate:) doesn't really have
any reason to be overridden by anything except sequences that knowingly
repeat elements (e.g. CycleSequence and Repeat), and even that's only if
you assume the predicate is pure, but there's some other methods that
make sense to override on some sequences (like minElement(_
isOrderedBefore:) for any sequence that has a defined ordering).

-Kevin Ballard

···

On Wed, Dec 30, 2015, at 11:01 AM, plx via swift-evolution wrote:

On Dec 29, 2015, at 6:38 PM, Dave Abrahams >> <dabrahams@apple.com> wrote:

On Dec 29, 2015, at 3:39 PM, plx via swift-evolution <swift- >>> evolution@swift.org> wrote:

On Dec 29, 2015, at 1:17 PM, Kevin Ballard via swift-evolution <swift- >>>> evolution@swift.org> wrote:

On Tue, Dec 29, 2015, at 08:14 AM, plx via swift-evolution wrote:

Personally I’d say this should be a -1 for standard-library
inclusion.

Swift’s not really built to handle infinite sequences right now;
until they are handed better by the standard library convenience
methods for creating them shouldn’t be in the standard library.

As far as I can tell, the only way in which it's "not really built"
to handle this is that there are multiple constructs that attempt
to eagerly consume an entire sequence, and using these with an
infinite sequence would end up looping forever. But I don't
consider that to be a particularly serious problem. You could "fix"
it by refactoring SequenceType into two protocols, SequenceType
(for possibly-infinite sequences) and FiniteSequenceType (for known-
finite sequences) and then going over the entire standard library
and updating various spots to use FiniteSequenceType, except this
would be very limiting (sequences that are not known if they're
infinite to the compiler could still be valid for various eager
algorithms if the programmer knows it will be finite in practice).

Indeed, on my wishlist I would like to see the standard protocols
refactored to something more like this:

SequenceType // can be iterated FiniteSequenceType : SequenceType,
// of finite length StableSequenceType : SequenceType, // can be re-
iterated identically CollectionType : StableSequenceType,
FiniteSequenceType, (etc.) // otherwise as it is now

This is interesting. A few concerns:

First, we have tried not to create any distinct protocols with
identical syntactic requirements, because we think it makes the world
clearer; we think people are more likely to assign incorrect
protocols when all the operations they want are available, but don’t
have the right semantics. That isn’t to say we shouldn’t start doing
it, but it would be a break from the past.

Higher protocol granularity has a high comprehensibility cost.
Distinguishing protocols based on semantic requirements alone may
make the library harder to understand. I’ve heard some people’s
heads have exploded from simply encountering CollectionType.

Next, it’s a principle of generic programming that every protocol
should be justified by both algorithms that exploit its requirements
(e.g. extension methods) and some real-world models that can’t
reasonably also conform to a more-refined protocol. For example, we
have a ForwardIndexType because a singly-linked list has multipass
forward traversal and can’t do bidirectional traversal. In order to
evaluate any proposal for new protocols, we’d need to see all of
these things.

Speaking frankly, I’d say that there are few benefits to a refactoring
along the lines sketched above until/unless you want to have solid
support for things like a product-sequence or product-collection; you
would gain *significant* advantages from such a refactored hierarchy
in such scenarios, but I can’t think of any meaningful improvement the
refactoring would offer in the essentially-linear case. (Note that
this is distinct from examples of concrete models that are naturally
e.g. finite-but-not-stable and vice-versa; they exist, but the rest is
already long enough as it is).

A full example is beyond my time-budget here, but I can give the
flavor of where it makes a difference somewhat quickly.

Consider a hypothetical
`ProductSequence2<A:SequenceType,B:SequenceType>` that enumerates the
cartesian product of two sequences (in an unspecified order); the
naive implementation has problems with non-stable sequences and with
infinite-sequences, but I’ll only discuss the infinite case b/c it’s
more relevant here.

When either—or both—of the sequences are infinite, the order-of-
iteration has actual observable consequences; as a concrete example,
if you want this to work out:

ProductSequence2([1,2].cycle,[1,2].cycle).contains((2,2)) == true

…then we have this:

- if `a` is finite and `b` is not, you want to iterate like (a0,b0),
  (a1,b0), (a2, b0) … , (a0, b1), (a1, b1), …, etc
- if `b` is finite and `a` is not, you want to iterate like (a0,b0),
  (a0,b1), (a0, b2) … , (a1, b0), (a1, b1), …, etc
- if neither `a` nor `b` is finite, you want to iterate like (a0, b0),
  (a1,b0), (a0,b1), (a2, b0), (a1, b1), (a0, b2), … etc

…as with those orderings you *will* eventually reach each pair in the
product (which seems strictly superior to an iteration order which
will leave many members of the product forever un-visited).

Importantly, the third ordering is inefficient in the general case,
and thus isn’t a suitable default; you’d only want it in places where
you’re *intentionally* using infinite series.

This is a concrete example of the sort of thing that leaves me
preferring that the standard library *not* contain infinite sequences
at this time: such sequences *highly* benefit from special handling,
but there’s no good way at this time to generically provide such
handling in a general context within the bounds of the current
language and standard library .

If there’s a realistic chance of such a refactoring being accepted if
sufficiently-well-proposed I can prepare something, but I’m admittedly
skeptical given the likely magnitude of the associated changes. In
particular, the API on types like `ForwardIndexType` is rather
unfortunate for things like a hypothetical product-collection
scenario; since such products are among the stronger motivating uses,
this means it’s a protocol refactoring + a lot of API changes to the
existing standard library, and probably making things clunkier in the
common / linear cases.

…but can understand the wish to not overly-complicate the basic
protocol hierarchy (and also to avoid baking-in nice-to-have, but
impossible-to-really-enforce semantic requirements; I’d trust the
standard library to use them properly, but not typical 3rd party
code, somewhat defeating the purpose).

Well, I guess I should have read ahead and most of my lecture above
was needless! I’m posting it anyway because I think it spells out
some important principles we’ll need to refer to later.

Everything else is a difference of outlook; we agree on the facts
and differ in interpretation.

I consider concrete types that adopt a protocol only to simply call
`fatalError` for most of the protocol methods to be pathological —
often useful, but still pathological — and thus far the standard
library doesn’t contain any such pathological types (to my
knowledge).

`cycle` is useful but not useful enough to be the standard library’s
first “pathological” type, so it’s still a -1 as proposed.

This is nothing specific to `cycle` and my opinion here could change
were the language or the standard library to evolve in various ways.

You’d also want to call `fatalError` for at least `reduce`,
`reverse`, `sort`, `split`(?), `flatMap`, `dropLast`, `suffix`,
and `forEach`.

You can only do it for the ones defined in the protocol, not ones
defined in extensions. This means map, filter, forEach, and suffix.

split is actually perfectly implementable for a CycleSequence,
although it does need a custom implementation. split is bounded by
at most Int.max splits, which means it is guaranteed to terminate
even for an infinite sequence (although the final subsequence does
need to be infinite[1]). Even if there are no separators in the
cycle, it can just return the cycle itself.

[1] Interestingly, the current implementation actually dumps the
    remainder into an Array and returns that (wrapped in
    AnySequence), which is curious because it would be perfectly
    legal for it to just wrap the generator up in AnySequence and
    return that instead. I'm tempted to submit a PR to change that
    now, as it just seems like unnecessary work to use an array.

`startsWith` and `elementsEqual` and `lexicographicComparison` are
all broken if you call them like e.g. `self.startsWith(self)`.

That's true, but what do you really expect when you're calling them
with two infinite sequences? It's not so much that they're broken
as it is that you're creating an infinite loop without any way to
break out. And FWIW, lexicographicalCompare is actually something
you might want to explicitly support on infinite sequences if you
know your sequences aren't equal and want to find out which one is
less than the other.

There are plenty of ways to shoot yourself in the foot. I don't
think infinite sequences are really the big problem you're making
them out to be.

You can conceivably implement a non-crashing `contains`,
`minElement` and `maxElement` on `CycleSequence` by calling
through to the wrapped collection, but that’ll seemingly evaporate
as soon as your `CycleSequence` winds up hidden inside an
`AnySequence`.

You can't override those anyway in a generic context, because
they're not members of the protocol, they're just extensions. You
could implement them such that your implementation is called when
working on the concrete CycleSequence type, but I'm not sure if
that's a great idea to do that when the actual behavior differs
from calling it generically on SequenceType (well, triggering a
fatalError() instead of an infinite loop is fine because they're
both Bottom, but returning a valid result in one context and
looping infinitely in the other seems bad).

Of course, these methods could actually be moved into the protocol
itself, which would let us override them. I'm not entirely sure why
they aren't in the protocol to begin with, I guess because there's
not much need for overriding these outside of infinite sequences
(well, finite sorted sequences could provide an optimized
min/maxElement, and an optimized version of contains(_:
Self.Generator.Element), but maybe there's tradeoffs to doing this,
e.g. maybe there's some reason why having a large protocol witness
table is a bad idea).

I don’t think `contains` or `minElement/maxElement` *can* be part of
the protocol (in the sense of overridable) at this time (they
require `Element` satisfy certain type constraints), but they
certainly should be if the type system someday would support that.

Which illustrates why this is a -1 for me; there's nothing wrong
with the functionality in isolation and there’s nothing wrong with
infinite sequences, but the standard library should play well with
itself, and this wouldn’t play well with the rest of the standard
library.

Ultimately, there's not much difference between an infinite
sequence and a sequence of Int.max elements. The latter is finite,
but it's so massive (especially on 64-bit) that any kind of eager
processing is going to hit the same problems as an infinite
sequence. Every problem you describe will be a problem with the
simple sequence `(0..<Int.max)` as well.

-Kevin Ballard

That opinion could change as the language changes or the standard
library evolves.

On Dec 28, 2015, at 1:20 AM, Kevin Ballard via swift-evolution >>>>>> <swift-evolution@swift.org> wrote:

## Introduction

Add a new property `cycle` to CollectionType that returns an
infinite SequenceType that yields the elements of the collection
in a loop.

## Motivation

It's sometimes useful to be able to have an infinite sequence.
For example, `CollectionOfOne(x).cycle` could be used to have an
infinite sequence of a single element (similar to Repeat but
without a count). A common use for infinite sequences is zipping
with a finite sequence. As far as I'm aware, the stdlib does not
currently provide any way to create such an infinite sequence.

## Proposed solution

Extend CollectionType with a new property `cycle` that yields a
type that conforms to SequenceType. This sequence yields each
element of the collection in an infinite loop.

## Detailed design

2 new types would be added:

struct CycleSequence<Base : CollectionType> : LazySequenceType {
... } struct CycleGenerator<Base : CollectionType> :
GeneratorType { ... }

CollectionType would be extended with a property:

extension CollectionType { public var cycle: CycleSequence<Self>
{ get } }

This is an extension of CollectionType instead of SequenceType
because it requires a multi-pass sequence (and SequenceType does
not provide that guarantee). The returned type conforms to
SequenceType instead of CollectionType because there is no
possible `endIndex` that satisfies the requirement of being
reachable from `startIndex` by zero or more applications of
`successor()`.

Because the default eager versions of map and filter will execute
forever on an infinite sequence, CycleSequence conforms to
LazySequenceType instead of SequenceType in order to provide lazy
versions of those functions. Additionally, it will provide
implementations of the eager versions that simply trigger a
fatalError(), as the alternative is an infinite loop that
consumes more and more memory.

## Impact on existing code

None

-Kevin Ballard
_______________________________________________
swift-evolution mailing list swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_________________________________________________
swift-evolution mailing list swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution