[Pitch] Remove the single-pass requirement on Sequence

(Nate Cook) #22

For me, the idea of adding an Iterable protocol for possibly single-pass sequences falls apart when we consider exactly what it looks like. Let’s assume we add it, and now any Iterable type can be iterated using a for-in loop. Is that all the protocol does? Seems like it would be pretty useful to iterate over an Iterable and transform each element in some way, right? So let’s move map from Sequence down to Iterable. What if we want to add up all the values in an Iterable of integers? Better move reduce down, too. And so it goes with everything on Sequence right now, and you’re back where you started. I have traveled down this road many times, and no matter which of the requirements I’m questioning, I’ve always ended up at the same place.

While Sequence has its particular semantics, the methods you define can also have their own preconditions. For example, the Sequence protocol doesn’t guarantee finite-ness, but plenty of sequence methods (map, filter, reduce, etc) require that the subject be finite. So two possible options, then, are to require one of the two inputs of the cartesian product to be a Collection, or on the other hand to let them both be sequences but document the method with the requirement that one of them “yield the same elements on every iteration” or something.

(Soroush Khanlou) #23

Infinite sequences and sequences which are sequences simply for the ease of defining them: e.g., sequence(first: view, next: { $0.superview }).

A big part of the reason I started this thread was to see if I was off-base with my (exhuastive?) list of cases for when single-pass sequences were used. It seems like the most important one of these cases is randomness. For this, I refer back to my OP:

If whatever CSRNG we land on requires a single-pass nature, then perhaps it’s better implemented as an Iterator that’s exposed for use publicly. We can add some facility to generate a fixed number of them, like:

Array(elementsFrom: RandomIterator<Int>(), count: 10)

What other single-pass sequences are so valuable that we choose to impose the burden of handling them to all people working with sequences?

(Xiaodi Wu) #24

Precisely. But, orthogonal to that answer–which I could not have put better–your function really should be constrained to Collection types. The two additional guarantees you get, over and above Sequence, are that conforming types are multi-pass and finite. You actually need both of these guarantees for computing a cartesian product.

Well, actually. My takeaway from previous discussions is that one of the intentions behind the present design is deliberately to encourage (“require”) developers to provide for indexable access on all multi-pass and finite sequences. As far as I can tell, it’s not an oversight or bug, but is regarded as a feature.

Collection conformance permits departing from the documented performance requirements if your type can’t support them. With that license, there ought to be very few scenarios in which the “average developer” couldn’t provide an implementation of indices and subscripting on any Collection type without breaking a sweat.

Failing that, if you have no intention of supporting the functionality, use fatalError() and document it. There are types in the standard library shipping even today that bail using fatalError() instead of implementing certain protocol requirements. It’s not ideal, but you have that option if you simply disagree with the protocol designer’s intentions or can’t implement the requirement for whatever reason.

(Xiaodi Wu) #25

As I wrote to you earlier, sequence(first:next:) does not provide a multipass guarantee. This is not some implementation oversight; by design, there’s no intention ever of doing so.

Regarding infinite sequences, what multi-pass infinite sequences do you have in mind where one would not want to provide indices? As I wrote in an earlier reply, by construction, a multipass sequence has at least one meaningful set of indices, because the element you obtain on the nth iteration will always be the same element, and therefore n can be used as an index for that element.

(James Froggatt) #26

Looking over this thread, it’s clear to me the distinction between Collection and Sequence. What I’m unsure about, which may in part be motivating the suggestion to make Sequence multipass, is what guarantees the Sequence protocol currently provides over IteratorProtocol.

Could someone clarify this point? The docs don’t seem to address this directly.

(^) #27

this isn’t true, you can have infinite cartesian products as long as only the outermost loop is infinite. for example

for base in 0...
    for offset in 0 ..< 3 
        base * 8 + offset

// 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, …

(Xiaodi Wu) #28

We’re not in disagreement here. Only the inner loop requires iterating over a finite sequence if you’re returning another sequence, just as only the inner loop needs to iterate over a multi-pass sequence. That’s exactly what I was referring to (and Nate as well) in writing about constraining to Collection, and this is an excellent demonstration of how the existing protocols are well designed to encourage correct usage.

(Saagar Jha) #29

Does the inner sequence have to be finite? If we’re generating the cartesian product ℕ×ℕ, wouldn’t this suffice?

for outer in 0... {
	for inner in 0... {
		print("\(outer) \(inner)")
		if (outer == inner) {

(Ole Begemann) #30

Related to this, @dabrahams said this in an earlier discussion of this topic (in December 2015) (this is in response to a pitch for introducing FiniteSequence and StableSequence protocols):

First, we have tried not to create any distinct protocols with identical syntactic requirements, because we think it makes the world clearer; we think people are more likely to assign incorrect protocols when all the operations they want are available, but don’t have the right semantics. That isn’t to say we shouldn’t start doing it, but it would be a break from the past.

Higher protocol granularity has a high comprehensibility cost. Distinguishing protocols based on semantic requirements alone may make the library harder to understand. I’ve heard some people’s heads have exploded from simply encountering CollectionType.

Next, it’s a principle of generic programming that every protocol should be justified by both algorithms that exploit its requirements (e.g. extension methods) and some real-world models that can’t reasonably also conform to a more-refined protocol. For example, we have a ForwardIndexType because a singly-linked list has multipass forward traversal and can’t do bidirectional traversal. In order to evaluate any proposal for new protocols, we’d need to see all of these things.

(Edit: added the link.)

(Dave Abrahams) #31

[quote=“GarthSnyder, post:21, topic:7964”]

Is it a recommended approach to allow generic functions to assume that their input Sequences are in fact multiply iterable? (In other words, just let everything collapse at run time if inappropriate Sequences happen to be supplied as arguments?)

No; in general a generic function that only has a Sequence constraint can’t assume the sequence is multi-pass. That’s what Collection is for.

(^) #32

i mean i guess that’s not really an infinite sequence, you’re doing [0, ∞) ∩ [0, outer] and that’s a finite interval. if it was a real infinite sequence, the inner loop would never terminate

(Soroush Khanlou) #33

Sequence isn’t a refinement of Iterator, the way that Collection is a refinement of Sequence. A Sequence is an object that gives you the ability to create an Iterator. An Iterator represents a mutating cursor that steps through the items in its Sequence one-by-one. You can kind of think of the sequence as a parent to the iterator. I think the only semantic requirement of an Iterator is that once it returns nil (which terminates the iteration) it never returns an non-nil value after that.

(Saagar Jha) #34

It’s the same thing as

for outer in 0... {
	for inner in 0... {
		print("\(outer) \(inner)")

just reordered in a way that makes it countable. The difference between my sequence and one in which “the inner loop would never terminate” is that mine requires the sequence to be multi-pass, and yours maintains a single pass.

(James Froggatt) #35

Thanks for the response, but my original phrasing may have been unclear. Do the requirements for Sequence, that an object may generate iterators, allow for any useful generic functionality over Iterator itself other than the potential for multipass? Otherwise, is there an implementation reason?

The example given in the Iterator docs, of a standard Sequence implementation, appears to simply add multi-pass support to the combined Iterator & Sequence example from Sequence’s doc, without providing any alternative use cases.

edit: less words.

(^) #36

this is exactly the same as

for inner in 0... 
    print("0 \(inner)")

assumming 0… is an actual infinite sequence and not just 0 … Int.max

(Soroush Khanlou) #37

One thing that Sequence can do that Iterator can’t is be used in for...in loops. Anything that conforms to Sequence can be used in a for loop. Does that answer your question?

(James Froggatt) #38

Partially, but the requirement of Sequence conformance for _ in use seems arbitrary. Couldn’t every Iterator be retroactively conformed to Sequence, since Sequence provides a default implementation for makeIterator() where Self: Iterator? Would such a conformance ever be invalid?

(Soroush Khanlou) #39

It wouldn’t ever be invalid, but I do think it’d be a bit anti-idiomatic. However, the standard library does include an IteratorSequence which wraps an iterator and promotes it to a sequence:


(Saagar Jha) #40

Yes, I realize that. What I’m saying is it’s possible to take the cartesian product of two (countably) infinite sequences if you do it right, which was posted in response to


I don’t buy your argument that what I did was not an infinite sequence. 0... by itself is certainly one, and my cross product will eventually iterate through all of it–if it wasn’t finite it would be illegal to call it ℕ×ℕ, which is what it produces.

(James Froggatt) #42

Bringing this back to the main topic:

My initial reasoning was backwards, which is obvious is retrospect - Iterator is a subset of Sequence, those which are single-pass, rather than a protocol with looser requirements than Sequence. Iterators must be single-pass, as the next() function requires storing the current index.

Given that Iterator is a strict subset of Sequence (as conforming any Iterator to Sequence is trivial), and given this subset is of every single-pass Sequence: either Iterator should be regarded as inheriting from Sequence, or Sequence should remove single-pass iteration from it’s domain.

But from this perspective, it’s clear that Iterator does inherit from Sequence ‘in spirit’, but in practice formal inheritance would require makeIterator() returning self be present on every Iterator, which would make for a bad API, leading to the current design.

A proposed MultipassSequence protocol would inherit from Sequence, being placed between Sequence and Collection, and represent the subset of Sequences which are not Iterators.

However, the current Sequence/Collection protocols each make important guarantees from an efficiency perspective. A MultipassSequence protocol, while providing symmetry to Iterator, would only allow certain Sequence algorithms to be more concise, but with a major tradeoff of losing genericity over single-pass Sequences.

If this summary of the situation is generally correct, I’m in support of not adding a multipass protocol.