Proposal: CollectionType.cycle property for an infinite sequence

Refactoring like that can always be done later; introducing infinite sequences now shouldn't make it any harder to refactor the protocols. If anything, it will provide more practical experience with how infinite sequences interact with the current protocol hierarchy that would help guide the design of any refactoring (or, perhaps more importantly, help determine if such a refactoring is worthwhile).

A big concern I have with refactoring like that is you'd rarely ever be able to actually bound an algorithm on FiniteSequenceType, because there will always be ways of constructing sequences that the compiler can't tell if it's infinite. Trivial example is `ary.cycle.takeWhile(pred)`. This is infinite if `pred(elt)` is true for every element of `ary` and finite otherwise. But there's tons of other ways of creating such indeterminate sequences. Bounding algorithms (such as `map(..) -> [T]` or `reduce()`) on finite sequences only would be rather limiting, as users who know their sequence must be finite but can't prove it to the compiler would be unable to use methods that are in fact perfectly safe.

If you will accept that I would prefer the standard library types to interpret the standard library protocols as having already made such a binding — as, indeed, the type signatures necessarily imply — the rest of my objection should be comprehensible.

I don’t find the repeated points about what can happen due to user error to be interesting.

What you could do with such a refactoring is optimize certain algorithms if they use a more precise sequence type, but I'm not actually sure what sort of optimizations you can make with your proposed protocol hierarchy. In fact, the only real optimizations that come to mind are optimizing when you know you're working with a CycleSequence, because for various algorithms you really only have to iterate the underlying elements once (e.g. assuming a pure predicate, CycleSequence can implement contains() safely, although we don't actually have pure in the language right now so that's not necessarily a safe assumption; also, contains() isn't a protocol method, just an extension method, so we can't actually do that anyway).

I already gave the example of a product sequence. There’s a simple implementation that is correct when both input sequences are finite-and-stable, but not necessarily if otherwise; the proposed refactoring would statically provides enough information to allow one to choose the minimally-*pessimal* implementation that is still correct for such inputs.

How much correctness matters is, of course, essentially subjective.

On that note, core language team, anyone know offhand why SequenceType has a bunch of extension methods that aren't part of the protocol? For example, contains(_ predicate:). The only real reason I can think of is to shrink the protocol witness table, but surely that's not particularly meaningful. I warrant that contains(_ predicate:) doesn't really have any reason to be overridden by anything except sequences that knowingly repeat elements (e.g. CycleSequence and Repeat), and even that's only if you assume the predicate is pure, but there's some other methods that make sense to override on some sequences (like minElement(_ isOrderedBefore:) for any sequence that has a defined ordering).

I am not sure how, precisely, you would propose to override the closure-taking variant of `minElement` to take advantage of an intrinsic ordering, but I’d be curious to see it.

I’d hope that the non-closure-accepting variants would be made overridable once the type system supports it.

···

On Dec 30, 2015, at 3:39 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

-Kevin Ballard

On Wed, Dec 30, 2015, at 11:01 AM, plx via swift-evolution wrote:

On Dec 29, 2015, at 6:38 PM, Dave Abrahams <dabrahams@apple.com <mailto:dabrahams@apple.com>> wrote:

On Dec 29, 2015, at 3:39 PM, plx via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Dec 29, 2015, at 1:17 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Tue, Dec 29, 2015, at 08:14 AM, plx via swift-evolution wrote:

Personally I’d say this should be a -1 for standard-library inclusion.

Swift’s not really built to handle infinite sequences right now; until they are handed better by the standard library convenience methods for creating them shouldn’t be in the standard library.

As far as I can tell, the only way in which it's "not really built" to handle this is that there are multiple constructs that attempt to eagerly consume an entire sequence, and using these with an infinite sequence would end up looping forever. But I don't consider that to be a particularly serious problem. You could "fix" it by refactoring SequenceType into two protocols, SequenceType (for possibly-infinite sequences) and FiniteSequenceType (for known-finite sequences) and then going over the entire standard library and updating various spots to use FiniteSequenceType, except this would be very limiting (sequences that are not known if they're infinite to the compiler could still be valid for various eager algorithms if the programmer knows it will be finite in practice).

Indeed, on my wishlist I would like to see the standard protocols refactored to something more like this:

SequenceType // can be iterated
FiniteSequenceType : SequenceType, // of finite length
StableSequenceType : SequenceType, // can be re-iterated identically
CollectionType : StableSequenceType, FiniteSequenceType, (etc.) // otherwise as it is now

This is interesting. A few concerns:

First, we have tried not to create any distinct protocols with identical syntactic requirements, because we think it makes the world clearer; we think people are more likely to assign incorrect protocols when all the operations they want are available, but don’t have the right semantics. That isn’t to say we shouldn’t start doing it, but it would be a break from the past.

Higher protocol granularity has a high comprehensibility cost. Distinguishing protocols based on semantic requirements alone may make the library harder to understand. I’ve heard some people’s heads have exploded from simply encountering CollectionType.

Next, it’s a principle of generic programming that every protocol should be justified by both algorithms that exploit its requirements (e.g. extension methods) and some real-world models that can’t reasonably also conform to a more-refined protocol. For example, we have a ForwardIndexType because a singly-linked list has multipass forward traversal and can’t do bidirectional traversal. In order to evaluate any proposal for new protocols, we’d need to see all of these things.

Speaking frankly, I’d say that there are few benefits to a refactoring along the lines sketched above until/unless you want to have solid support for things like a product-sequence or product-collection; you would gain *significant* advantages from such a refactored hierarchy in such scenarios, but I can’t think of any meaningful improvement the refactoring would offer in the essentially-linear case. (Note that this is distinct from examples of concrete models that are naturally e.g. finite-but-not-stable and vice-versa; they exist, but the rest is already long enough as it is).

A full example is beyond my time-budget here, but I can give the flavor of where it makes a difference somewhat quickly.

Consider a hypothetical `ProductSequence2<A:SequenceType,B:SequenceType>` that enumerates the cartesian product of two sequences (in an unspecified order); the naive implementation has problems with non-stable sequences and with infinite-sequences, but I’ll only discuss the infinite case b/c it’s more relevant here.

When either—or both—of the sequences are infinite, the order-of-iteration has actual observable consequences; as a concrete example, if you want this to work out:

ProductSequence2([1,2].cycle,[1,2].cycle).contains((2,2)) == true

…then we have this:

- if `a` is finite and `b` is not, you want to iterate like (a0,b0), (a1,b0), (a2, b0) … , (a0, b1), (a1, b1), …, etc
- if `b` is finite and `a` is not, you want to iterate like (a0,b0), (a0,b1), (a0, b2) … , (a1, b0), (a1, b1), …, etc
- if neither `a` nor `b` is finite, you want to iterate like (a0, b0), (a1,b0), (a0,b1), (a2, b0), (a1, b1), (a0, b2), … etc

…as with those orderings you *will* eventually reach each pair in the product (which seems strictly superior to an iteration order which will leave many members of the product forever un-visited).

Importantly, the third ordering is inefficient in the general case, and thus isn’t a suitable default; you’d only want it in places where you’re *intentionally* using infinite series.

This is a concrete example of the sort of thing that leaves me preferring that the standard library *not* contain infinite sequences at this time: such sequences *highly* benefit from special handling, but there’s no good way at this time to generically provide such handling in a general context within the bounds of the current language and standard library .

If there’s a realistic chance of such a refactoring being accepted if sufficiently-well-proposed I can prepare something, but I’m admittedly skeptical given the likely magnitude of the associated changes. In particular, the API on types like `ForwardIndexType` is rather unfortunate for things like a hypothetical product-collection scenario; since such products are among the stronger motivating uses, this means it’s a protocol refactoring + a lot of API changes to the existing standard library, and probably making things clunkier in the common / linear cases.

…but can understand the wish to not overly-complicate the basic protocol hierarchy (and also to avoid baking-in nice-to-have, but impossible-to-really-enforce semantic requirements; I’d trust the standard library to use them properly, but not typical 3rd party code, somewhat defeating the purpose).

Well, I guess I should have read ahead and most of my lecture above was needless! I’m posting it anyway because I think it spells out some important principles we’ll need to refer to later.

Everything else is a difference of outlook; we agree on the facts and differ in interpretation.

I consider concrete types that adopt a protocol only to simply call `fatalError` for most of the protocol methods to be pathological — often useful, but still pathological — and thus far the standard library doesn’t contain any such pathological types (to my knowledge).

`cycle` is useful but not useful enough to be the standard library’s first “pathological” type, so it’s still a -1 as proposed.

This is nothing specific to `cycle` and my opinion here could change were the language or the standard library to evolve in various ways.

You’d also want to call `fatalError` for at least `reduce`, `reverse`, `sort`, `split`(?), `flatMap`, `dropLast`, `suffix`, and `forEach`.

You can only do it for the ones defined in the protocol, not ones defined in extensions. This means map, filter, forEach, and suffix.

split is actually perfectly implementable for a CycleSequence, although it does need a custom implementation. split is bounded by at most Int.max splits, which means it is guaranteed to terminate even for an infinite sequence (although the final subsequence does need to be infinite[1]). Even if there are no separators in the cycle, it can just return the cycle itself.

[1] Interestingly, the current implementation actually dumps the remainder into an Array and returns that (wrapped in AnySequence), which is curious because it would be perfectly legal for it to just wrap the generator up in AnySequence and return that instead. I'm tempted to submit a PR to change that now, as it just seems like unnecessary work to use an array.

`startsWith` and `elementsEqual` and `lexicographicComparison` are all broken if you call them like e.g. `self.startsWith(self)`.

That's true, but what do you really expect when you're calling them with two infinite sequences? It's not so much that they're broken as it is that you're creating an infinite loop without any way to break out. And FWIW, lexicographicalCompare is actually something you might want to explicitly support on infinite sequences if you know your sequences aren't equal and want to find out which one is less than the other.

There are plenty of ways to shoot yourself in the foot. I don't think infinite sequences are really the big problem you're making them out to be.

You can conceivably implement a non-crashing `contains`, `minElement` and `maxElement` on `CycleSequence` by calling through to the wrapped collection, but that’ll seemingly evaporate as soon as your `CycleSequence` winds up hidden inside an `AnySequence`.

You can't override those anyway in a generic context, because they're not members of the protocol, they're just extensions. You could implement them such that your implementation is called when working on the concrete CycleSequence type, but I'm not sure if that's a great idea to do that when the actual behavior differs from calling it generically on SequenceType (well, triggering a fatalError() instead of an infinite loop is fine because they're both Bottom, but returning a valid result in one context and looping infinitely in the other seems bad).

Of course, these methods could actually be moved into the protocol itself, which would let us override them. I'm not entirely sure why they aren't in the protocol to begin with, I guess because there's not much need for overriding these outside of infinite sequences (well, finite sorted sequences could provide an optimized min/maxElement, and an optimized version of contains(_: Self.Generator.Element), but maybe there's tradeoffs to doing this, e.g. maybe there's some reason why having a large protocol witness table is a bad idea).

I don’t think `contains` or `minElement/maxElement` *can* be part of the protocol (in the sense of overridable) at this time (they require `Element` satisfy certain type constraints), but they certainly should be if the type system someday would support that.

Which illustrates why this is a -1 for me; there's nothing wrong with the functionality in isolation and there’s nothing wrong with infinite sequences, but the standard library should play well with itself, and this wouldn’t play well with the rest of the standard library.

Ultimately, there's not much difference between an infinite sequence and a sequence of Int.max elements. The latter is finite, but it's so massive (especially on 64-bit) that any kind of eager processing is going to hit the same problems as an infinite sequence. Every problem you describe will be a problem with the simple sequence `(0..<Int.max)` as well.

-Kevin Ballard

That opinion could change as the language changes or the standard library evolves.

On Dec 28, 2015, at 1:20 AM, Kevin Ballard via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

## Introduction

Add a new property `cycle` to CollectionType that returns an infinite SequenceType that yields the elements of the collection in a loop.

## Motivation

It's sometimes useful to be able to have an infinite sequence. For example, `CollectionOfOne(x).cycle` could be used to have an infinite sequence of a single element (similar to Repeat but without a count). A common use for infinite sequences is zipping with a finite sequence. As far as I'm aware, the stdlib does not currently provide any way to create such an infinite sequence.

## Proposed solution

Extend CollectionType with a new property `cycle` that yields a type that conforms to SequenceType. This sequence yields each element of the collection in an infinite loop.

## Detailed design

2 new types would be added:

struct CycleSequence<Base : CollectionType> : LazySequenceType { ... }
struct CycleGenerator<Base : CollectionType> : GeneratorType { ... }

CollectionType would be extended with a property:

extension CollectionType {
  public var cycle: CycleSequence<Self> { get }
}

This is an extension of CollectionType instead of SequenceType because it requires a multi-pass sequence (and SequenceType does not provide that guarantee). The returned type conforms to SequenceType instead of CollectionType because there is no possible `endIndex` that satisfies the requirement of being reachable from `startIndex` by zero or more applications of `successor()`.

Because the default eager versions of map and filter will execute forever on an infinite sequence, CycleSequence conforms to LazySequenceType instead of SequenceType in order to provide lazy versions of those functions. Additionally, it will provide implementations of the eager versions that simply trigger a fatalError(), as the alternative is an infinite loop that consumes more and more memory.

## Impact on existing code

None

-Kevin Ballard
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

On that note, core language team, anyone know offhand why
SequenceType has a bunch of extension methods that aren't part of the
protocol? For example, contains(_ predicate:). The only real reason I
can think of is to shrink the protocol witness table, but surely
that's not particularly meaningful. I warrant that contains(_
predicate:) doesn't really have any reason to be overridden by
anything except sequences that knowingly repeat elements (e.g.
CycleSequence and Repeat), and even that's only if you assume the
predicate is pure, but there's some other methods that make sense to
override on some sequences (like minElement(_ isOrderedBefore:) for
any sequence that has a defined ordering).

I am not sure how, precisely, you would propose to override the closure-
taking variant of `minElement` to take advantage of an intrinsic
ordering, but I’d be curious to see it.

Oops, you're entirely right. I wasn't thinking carefully about it; it's
the Comparable version that would be able to take advantage of an
intrinsic ordering, but of course we can't override that using the
current language rules as it requires the Equatable bound. Of course,
once we add the ability to have conditional protocol conformance, then
we can move things like that into a separate protocol and declare
SequenceType as conforming to it whenever Element : Equatable, which
would let those methods then become overridable.

Although interestingly, contains() actually has a hack right now where
it invokes a hidden SequenceType method called
_customContainsEquatableElement(). This is explicitly intended to let
sequences override the Equatable version of contains() if the sequence
knows it can do better than a linear search.

I’d hope that the non-closure-accepting variants would be made
overridable once the type system supports it.

Agreed.

-Kevin Ballard

···

On Wed, Dec 30, 2015, at 02:27 PM, plx via swift-evolution wrote: