[Pitch] Remove the single-pass requirement on Sequence

nnnnnnnn · January 21, 2018, 8:18am

For me, the idea of adding an Iterable protocol for possibly single-pass sequences falls apart when we consider exactly what it looks like. Let's assume we add it, and now any Iterable type can be iterated using a for-in loop. Is that all the protocol does? Seems like it would be pretty useful to iterate over an Iterable and transform each element in some way, right? So let's move map from Sequence down to Iterable. What if we want to add up all the values in an Iterable of integers? Better move reduce down, too. And so it goes with everything on Sequence right now, and you're back where you started. I have traveled down this road many times, and no matter which of the requirements I'm questioning, I've always ended up at the same place.

While Sequence has its particular semantics, the methods you define can also have their own preconditions. For example, the Sequence protocol doesn't guarantee finite-ness, but plenty of sequence methods (map, filter, reduce, etc) require that the subject be finite. So two possible options, then, are to require one of the two inputs of the cartesian product to be a Collection, or on the other hand to let them both be sequences but document the method with the requirement that one of them "yield the same elements on every iteration" or something.

soroush · January 21, 2018, 6:12pm

Infinite sequences and sequences which are sequences simply for the ease of defining them: e.g., sequence(first: view, next: { $0.superview }).

A big part of the reason I started this thread was to see if I was off-base with my (exhuastive?) list of cases for when single-pass sequences were used. It seems like the most important one of these cases is randomness. For this, I refer back to my OP:

If whatever CSRNG we land on requires a single-pass nature, then perhaps it's better implemented as an Iterator that's exposed for use publicly. We can add some facility to generate a fixed number of them, like:

Array(elementsFrom: RandomIterator<Int>(), count: 10)

What other single-pass sequences are so valuable that we choose to impose the burden of handling them to all people working with sequences?

xwu · January 21, 2018, 6:36pm

Precisely. But, orthogonal to that answer--which I could not have put better--your function really should be constrained to Collection types. The two additional guarantees you get, over and above Sequence, are that conforming types are multi-pass and finite. You actually need both of these guarantees for computing a cartesian product.

Well, actually. My takeaway from previous discussions is that one of the intentions behind the present design is deliberately to encourage ("require") developers to provide for indexable access on all multi-pass and finite sequences. As far as I can tell, it's not an oversight or bug, but is regarded as a feature.

Collection conformance permits departing from the documented performance requirements if your type can't support them. With that license, there ought to be very few scenarios in which the "average developer" couldn't provide an implementation of indices and subscripting on any Collection type without breaking a sweat.

Failing that, if you have no intention of supporting the functionality, use fatalError() and document it. There are types in the standard library shipping even today that bail using fatalError() instead of implementing certain protocol requirements. It's not ideal, but you have that option if you simply disagree with the protocol designer's intentions or can't implement the requirement for whatever reason.

xwu · January 21, 2018, 6:41pm

As I wrote to you earlier, sequence(first:next:) does not provide a multipass guarantee. This is not some implementation oversight; by design, there's no intention ever of doing so.

Regarding infinite sequences, what multi-pass infinite sequences do you have in mind where one would not want to provide indices? As I wrote in an earlier reply, by construction, a multipass sequence has at least one meaningful set of indices, because the element you obtain on the nth iteration will always be the same element, and therefore n can be used as an index for that element.

MutatingFunk · January 21, 2018, 7:12pm

Looking over this thread, it's clear to me the distinction between Collection and Sequence. What I'm unsure about, which may in part be motivating the suggestion to make Sequence multipass, is what guarantees the Sequence protocol currently provides over IteratorProtocol.

Could someone clarify this point? The docs don't seem to address this directly.

taylorswift · January 21, 2018, 7:24pm

this isn’t true,, you can have infinite cartesian products as long as only the outermost loop is infinite. for example

for base in 0...
{
    for offset in 0 ..< 3 
    {
        base * 8 + offset
    }
}

// 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, …

xwu · January 21, 2018, 7:33pm

We're not in disagreement here. Only the inner loop requires iterating over a finite sequence if you're returning another sequence, just as only the inner loop needs to iterate over a multi-pass sequence. That's exactly what I was referring to (and Nate as well) in writing about constraining to Collection, and this is an excellent demonstration of how the existing protocols are well designed to encourage correct usage.

saagarjha · January 21, 2018, 11:42pm

Does the inner sequence have to be finite? If we're generating the cartesian product ℕ×ℕ, wouldn't this suffice?

for outer in 0... {
	for inner in 0... {
		print("\(outer) \(inner)")
		if (outer == inner) {
			break
		}
	}
}

ole · January 22, 2018, 12:05am

Related to this, @dabrahams said this in an earlier discussion of this topic (in December 2015) (this is in response to a pitch for introducing FiniteSequence and StableSequence protocols):

First, we have tried not to create any distinct protocols with identical syntactic requirements, because we think it makes the world clearer; we think people are more likely to assign incorrect protocols when all the operations they want are available, but don’t have the right semantics. That isn’t to say we shouldn’t start doing it, but it would be a break from the past.

Higher protocol granularity has a high comprehensibility cost. Distinguishing protocols based on semantic requirements alone may make the library harder to understand. I’ve heard some people’s heads have exploded from simply encountering CollectionType.

Next, it’s a principle of generic programming that every protocol should be justified by both algorithms that exploit its requirements (e.g. extension methods) and some real-world models that can’t reasonably also conform to a more-refined protocol. For example, we have a ForwardIndexType because a singly-linked list has multipass forward traversal and can’t do bidirectional traversal. In order to evaluate any proposal for new protocols, we’d need to see all of these things.

(Edit: added the link.)

dabrahams · January 22, 2018, 12:40am

[quote="GarthSnyder, post:21, topic:7964"]

Is it a recommended approach to allow generic functions to assume that their input Sequences are in fact multiply iterable? (In other words, just let everything collapse at run time if inappropriate Sequences happen to be supplied as arguments?)

No; in general a generic function that only has a Sequence constraint can't assume the sequence is multi-pass. That's what Collection is for.

taylorswift · January 22, 2018, 5:28am

i mean i guess that’s not really an infinite sequence, you’re doing [0, ∞) ∩ [0, outer] and that’s a finite interval. if it was a real infinite sequence, the inner loop would never terminate

soroush · January 22, 2018, 4:55pm

Sequence isn't a refinement of Iterator, the way that Collection is a refinement of Sequence. A Sequence is an object that gives you the ability to create an Iterator. An Iterator represents a mutating cursor that steps through the items in its Sequence one-by-one. You can kind of think of the sequence as a parent to the iterator. I think the only semantic requirement of an Iterator is that once it returns nil (which terminates the iteration) it never returns an non-nil value after that.

saagarjha · January 22, 2018, 7:01pm

It's the same thing as

for outer in 0... {
	for inner in 0... {
		print("\(outer) \(inner)")
	}
}

just reordered in a way that makes it countable. The difference between my sequence and one in which "the inner loop would never terminate" is that mine requires the sequence to be multi-pass, and yours maintains a single pass.

MutatingFunk · January 22, 2018, 7:51pm

Thanks for the response, but my original phrasing may have been unclear. Do the requirements for Sequence, that an object may generate iterators, allow for any useful generic functionality over Iterator itself other than the potential for multipass? Otherwise, is there an implementation reason?

The example given in the Iterator docs, of a standard Sequence implementation, appears to simply add multi-pass support to the combined Iterator & Sequence example from Sequence's doc, without providing any alternative use cases.

edit: less words.

taylorswift · January 22, 2018, 10:14pm

this is exactly the same as

for inner in 0... 
{
    print("0 \(inner)")
}

assumming 0… is an actual infinite sequence and not just 0 … Int.max

soroush · January 22, 2018, 10:30pm

One thing that Sequence can do that Iterator can't is be used in for...in loops. Anything that conforms to Sequence can be used in a for loop. Does that answer your question?

MutatingFunk · January 22, 2018, 10:39pm

Partially, but the requirement of Sequence conformance for _ in use seems arbitrary. Couldn't every Iterator be retroactively conformed to Sequence, since Sequence provides a default implementation for makeIterator() where Self: Iterator? Would such a conformance ever be invalid?

soroush · January 23, 2018, 12:03am

It wouldn't ever be invalid, but I do think it'd be a bit anti-idiomatic. However, the standard library does include an IteratorSequence which wraps an iterator and promotes it to a sequence:

https://developer.apple.com/documentation/swift/iteratorsequence

saagarjha · January 23, 2018, 12:23am

Yes, I realize that. What I'm saying is it's possible to take the cartesian product of two (countably) infinite sequences if you do it right, which was posted in response to

and

I don't buy your argument that what I did was not an infinite sequence. 0... by itself is certainly one, and my cross product will eventually iterate through all of it–if it wasn't finite it would be illegal to call it ℕ×ℕ, which is what it produces.

MutatingFunk · January 23, 2018, 1:07am

Bringing this back to the main topic:

My initial reasoning was backwards, which is obvious is retrospect - Iterator is a subset of Sequence, those which are single-pass, rather than a protocol with looser requirements than Sequence. Iterators must be single-pass, as the next() function requires storing the current index.

Given that Iterator is a strict subset of Sequence (as conforming any Iterator to Sequence is trivial), and given this subset is of every single-pass Sequence: either Iterator should be regarded as inheriting from Sequence, or Sequence should remove single-pass iteration from it's domain.

But from this perspective, it's clear that Iterator does inherit from Sequence ‘in spirit’, but in practice formal inheritance would require makeIterator() returning self be present on every Iterator, which would make for a bad API, leading to the current design.

A proposed MultipassSequence protocol would inherit from Sequence, being placed between Sequence and Collection, and represent the subset of Sequences which are not Iterators.

However, the current Sequence/Collection protocols each make important guarantees from an efficiency perspective. A MultipassSequence protocol, while providing symmetry to Iterator, would only allow certain Sequence algorithms to be more concise, but with a major tradeoff of losing genericity over single-pass Sequences.

If this summary of the situation is generally correct, I'm in support of not adding a multipass protocol.