[Pitch] Remove the single-pass requirement on Sequence

The example in my post: sequence(first: 1, next: { $0 == 4 ? nil : $0 + 1 }).dropFirst() is a great example of a sequence I wouldn't expect to be single-pass, but is.

Yes, exactly! In a world with my proposal, if people want to make single-pass sequences, they "can", in the same way that they "can" make infinite collections now. The compiler will allow it, but you still shouldn't.

This is no doubt a problem, although I do think it's less of a problem until Swift nails down its randomness story. There are resolutions to the problem, like seeding each RandomSequence once, when it is initialized. That way, you can iterate it more than once, and get the same values each time, but any new RandomSequence you make will have different elements. I'm not an expert on randomness, though, so I admit I may be a bit out of my depth here.

My issue with this proposal is that, putting aside the semantics of Sequence, the protocol has a syntactic meaning: “Instances can be iterated with a for/in loop”. I’m not comfortable with the suggestion that it should be impossible to use a for/in loop on a single-pass sequence.

Maybe the semantics should be severed from the syntax: we should have an Iterable protocol, which is refined by Sequence, which is refined by Collection. But we then have to ask ourselves, what is Sequence for? Any type which can be multipass can have reusable indices. (If nothing else, the iterator’s state can be packaged up as an “index”.) So when do we want to provide a multipass guarantee, but not want to provide indices?

Swift’s randomness story isn’t very well-defined at the moment, and it’s possible that a RandomSequence could be initialized with a seed that would create the exact same sequence each time if necessary.

There are certainly some types of RNGs which use a seed and thus could be multipass. There are also types of RNGs which really, really can’t, and they’re usually the ones with the strongest randomness guarantees. RNGs are a separate discussion (and one I’ve removed myself from), but I don’t think we should design an RNG system which specifically excludes the strongest implementations.

1 Like

I agree with Brent. It would be unfortunate if we can’t use for...in to read content of, say, a packet.

I’m not sure I understand the point. How would your idea help you know whether that value is a single-pass sequence or not, or to use it correctly? And where have you encountered such values being returned from a library without documentation that it’s single-pass, as you say occurs in “a lot of cases”?

Soroush mentioned the traversal of data structures such as linked lists, trees, and graphs. These cases seem like good examples to me.

As mentioned above, one can forcibly define indices for any multiply enumerable sequence. But that doesn't necessarily mean that such indices are semantically meaningful or of practical value. What's the natural interpretation of the endIndex of a linked list? The fact that you can torture such sequences into being indexable doesn't mean that you should, or that you should be required to.

More broadly, I don't think the sequence provider's perspective is all that relevant. I see the thrust of this proposal as being more about the consumer's perspective. Given that you want to implement a generic facility such as eachPair(), how can you write it to cover the broadest possible spectrum of inputs, and how can Swift best support that?

The current taxonomy forces some code to be written in terms of Collection when it could just as well be written in terms of multiply enumerable Sequences, were those a thing. It's hard to quantify exactly how often that happens, but my subjective impression is that it's a common situation---common enough that on reading the OP, I immediately thought, "Oh yes, I know where he's going with this." I've often required Collections as inputs because I needed reiterability and knew that in the instant case, the arguments would happen to be Collections. But I'm always aware of the code smell.

Cartesian products are a paradigmatic example. They're used every-freaking-where, and you just can't define them on Sequences unless you're willing to transcribe at least one complete sequence to fixed storage. Yes, they're easy to implement in terms of Collections, but often, you don't control the data sources. All I want is to be able to write

for elt1 in seq1 {
    for elt2 in seq2 {
        // Do something with (elt1, elt2)...
    }
}

No explicit indexes needed, wanted, or visible on the horizon.

I have the sense that this was intended as something of a straw man; nevertheless, +1 from me for this variant. :grinning:

I think there's a role for nonreenumerable sequences. I just wish they weren't as prominent in the API as they currently are. The middle layer (Sequence above) should be the general-purpose lowest common denominator. It's fine with me if Iterables don't support things like Cartesian products without assistance; they'd be genuinely lower-level constructs.

It's also not clear to me why going from two to three layers should compromise for...in (or syntax generally) in any way. Couldn't all these layers support that?

3 Likes

I also want an Iterable protocol to sever the semantics of Sequence from it’s syntax.
Not (directly) for its multi-enumerability, but for its order, in Unordered Collections.
Because I want to be able to express that while the order of elements could be decided by calls to arc4random(), each element I receive in one iteration, I will receive in a second iteration, eventually.

3 Likes

A Collection doesn’t need to be either bidirectionally or random-access traversible–that’s guaranteed by more refined protocols. But, an index has meaningful information for any multipass sequence. For one, you know you have the same value if you have the same index (inherent to being multipass), and that can be pretty valuable (no pun intended). And to answer your question, the natural interpretation of the endIndex of a linked list is that, if you have advanced from startIndex, you know that there are no other indices you haven’t visited.

Let me ask you, what “meaning” or “value” for indices of a collection would be “tortured” for linked lists, trees, or graphs but not for other Collection types that don’t conform to BidirectionalCollection or other *Collection protocols?

I think, given Nate’s feedback, in fact the current design helps you to write correct algorithms for the broadest spectrum of inputs. By ignoring the possibility of a single-pass sequence, the “easier” implementation of eachPair() would not be able to accommodate even seedable (i.e., repeatable) pseudorandom sequences, whereas a trivial reworking required by the current semantics of Sequence is correct for all conforming sequence types.

To entertain the notion of changing the protocol hierarchy of Swift sequences, this will require empiric evidence.

Core team members have stated that their intention is that types that conform to Sequence but not Collection should be rare. I asked Soroush, and I will ask you too, what libraries have you encountered where this poses a problem, and for what algorithms that you want to implement?

Your stated for elt1 in seq1 { for elt2 in seq2 { ... } } can be constrained to Sequence without problems today, so I’m not sure what that example is supposed to illustrate in the way of proposed changes.

I'm putting this first because my whole complaint may be based on a misunderstanding of the API.

If I understand it correctly, this form is syntactically fine but semantically undefined: "A conforming sequence that is not a collection is allowed to produce an arbitrary sequence of elements in [a] second for-in loop."

Is it a recommended approach to allow generic functions to assume that their input Sequences are in fact multiply iterable? (In other words, just let everything collapse at run time if inappropriate Sequences happen to be supplied as arguments?) This doesn't seem very formal, but it does have a certain simplicity and appeal. I had formerly assumed that since Sequences were not guaranteed to be reiterable, you should not presume to reiterate them.

When "torture" enters the API discussion, can "Hitler" be far behind? :-) Sorry, my fault.

You're right: indexing on something like a graph or tree is not inherently more obscure than indexing on any other abstract Collection. However, I'd argue that much of the indexing in the standard library is already pretty tortured. Lord help you if you need to do complex character-level processing on a String without first converting it to an array of Characters.

Mind you, I don't claim that there's anything wrong here. Indexing is abstract algebra, and libraries must be full-featured and formally correct. I just question whether, as an average developer who just wants to emit a sequence of values within the context of a specific app, I should be forced to define my own abstract algebra of data structure indices. It's extraneous. If nobody is ever going to call that code, I shouldn't be required to write it just to conform to the standard library's requirements.

I'm not sure I agree that the rewrite is trivial. Nate didn't post the "correct" code; Soroush did, as an example of how tedious it is to be correct under the current standards.

In this case, the "correct" code is longer and more complicated but also potentially faster. However, for something like a Cartesian product (all combinations of elements from multiple sequences), the current regime imposes additional overhead costs.

I acknowledge the rest of this as hand-waving, but I don't think libraries are the real issue here. This is about how the patterns and examples set by the standard library generalize to other code and influence the use of Swift generally. Of course robust libraries promote abstractions to their highest possible level, e.g., Collection vs. Sequence. Who cares if a library developer has to spend a few hours developing an abstract indexing algebra for a particular type of collection? Somebody will eventually use that stuff.

But, the standard library is also a general model for how to structure Swift code. Simple, everyday things should be simple. It makes sense for the standard library to accommodate these straightforward use cases even when doing so wouldn't significantly change the architecture of the standard library itself.

Given that, it makes sense for the standard protocol hierarchy to acknowledge the existence of sequences that are multiply iterable but not indexable.

For me, the idea of adding an Iterable protocol for possibly single-pass sequences falls apart when we consider exactly what it looks like. Let’s assume we add it, and now any Iterable type can be iterated using a for-in loop. Is that all the protocol does? Seems like it would be pretty useful to iterate over an Iterable and transform each element in some way, right? So let’s move map from Sequence down to Iterable. What if we want to add up all the values in an Iterable of integers? Better move reduce down, too. And so it goes with everything on Sequence right now, and you’re back where you started. I have traveled down this road many times, and no matter which of the requirements I’m questioning, I’ve always ended up at the same place.

While Sequence has its particular semantics, the methods you define can also have their own preconditions. For example, the Sequence protocol doesn’t guarantee finite-ness, but plenty of sequence methods (map, filter, reduce, etc) require that the subject be finite. So two possible options, then, are to require one of the two inputs of the cartesian product to be a Collection, or on the other hand to let them both be sequences but document the method with the requirement that one of them “yield the same elements on every iteration” or something.

2 Likes

Infinite sequences and sequences which are sequences simply for the ease of defining them: e.g., sequence(first: view, next: { $0.superview }).

A big part of the reason I started this thread was to see if I was off-base with my (exhuastive?) list of cases for when single-pass sequences were used. It seems like the most important one of these cases is randomness. For this, I refer back to my OP:

If whatever CSRNG we land on requires a single-pass nature, then perhaps it's better implemented as an Iterator that's exposed for use publicly. We can add some facility to generate a fixed number of them, like:

Array(elementsFrom: RandomIterator<Int>(), count: 10)

What other single-pass sequences are so valuable that we choose to impose the burden of handling them to all people working with sequences?

Precisely. But, orthogonal to that answer–which I could not have put better–your function really should be constrained to Collection types. The two additional guarantees you get, over and above Sequence, are that conforming types are multi-pass and finite. You actually need both of these guarantees for computing a cartesian product.

Well, actually. My takeaway from previous discussions is that one of the intentions behind the present design is deliberately to encourage (“require”) developers to provide for indexable access on all multi-pass and finite sequences. As far as I can tell, it’s not an oversight or bug, but is regarded as a feature.

Collection conformance permits departing from the documented performance requirements if your type can’t support them. With that license, there ought to be very few scenarios in which the “average developer” couldn’t provide an implementation of indices and subscripting on any Collection type without breaking a sweat.

Failing that, if you have no intention of supporting the functionality, use fatalError() and document it. There are types in the standard library shipping even today that bail using fatalError() instead of implementing certain protocol requirements. It’s not ideal, but you have that option if you simply disagree with the protocol designer’s intentions or can’t implement the requirement for whatever reason.

As I wrote to you earlier, sequence(first:next:) does not provide a multipass guarantee. This is not some implementation oversight; by design, there’s no intention ever of doing so.

Regarding infinite sequences, what multi-pass infinite sequences do you have in mind where one would not want to provide indices? As I wrote in an earlier reply, by construction, a multipass sequence has at least one meaningful set of indices, because the element you obtain on the nth iteration will always be the same element, and therefore n can be used as an index for that element.

Looking over this thread, it’s clear to me the distinction between Collection and Sequence. What I’m unsure about, which may in part be motivating the suggestion to make Sequence multipass, is what guarantees the Sequence protocol currently provides over IteratorProtocol.

Could someone clarify this point? The docs don’t seem to address this directly.

this isn’t true, you can have infinite cartesian products as long as only the outermost loop is infinite. for example

for base in 0...
{
    for offset in 0 ..< 3 
    {
        base * 8 + offset
    }
}

// 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, …

We’re not in disagreement here. Only the inner loop requires iterating over a finite sequence if you’re returning another sequence, just as only the inner loop needs to iterate over a multi-pass sequence. That’s exactly what I was referring to (and Nate as well) in writing about constraining to Collection, and this is an excellent demonstration of how the existing protocols are well designed to encourage correct usage.

Does the inner sequence have to be finite? If we’re generating the cartesian product ℕ×ℕ, wouldn’t this suffice?

for outer in 0... {
	for inner in 0... {
		print("\(outer) \(inner)")
		if (outer == inner) {
			break
		}
	}
}

Related to this, @dabrahams said this in an earlier discussion of this topic (in December 2015) (this is in response to a pitch for introducing FiniteSequence and StableSequence protocols):

First, we have tried not to create any distinct protocols with identical syntactic requirements, because we think it makes the world clearer; we think people are more likely to assign incorrect protocols when all the operations they want are available, but don’t have the right semantics. That isn’t to say we shouldn’t start doing it, but it would be a break from the past.

Higher protocol granularity has a high comprehensibility cost. Distinguishing protocols based on semantic requirements alone may make the library harder to understand. I’ve heard some people’s heads have exploded from simply encountering CollectionType.

Next, it’s a principle of generic programming that every protocol should be justified by both algorithms that exploit its requirements (e.g. extension methods) and some real-world models that can’t reasonably also conform to a more-refined protocol. For example, we have a ForwardIndexType because a singly-linked list has multipass forward traversal and can’t do bidirectional traversal. In order to evaluate any proposal for new protocols, we’d need to see all of these things.

(Edit: added the link.)

2 Likes

[quote=“GarthSnyder, post:21, topic:7964”]

Is it a recommended approach to allow generic functions to assume that their input Sequences are in fact multiply iterable? (In other words, just let everything collapse at run time if inappropriate Sequences happen to be supplied as arguments?)

No; in general a generic function that only has a Sequence constraint can’t assume the sequence is multi-pass. That’s what Collection is for.

i mean i guess that’s not really an infinite sequence, you’re doing [0, ∞) ∩ [0, outer] and that’s a finite interval. if it was a real infinite sequence, the inner loop would never terminate

Sequence isn’t a refinement of Iterator, the way that Collection is a refinement of Sequence. A Sequence is an object that gives you the ability to create an Iterator. An Iterator represents a mutating cursor that steps through the items in its Sequence one-by-one. You can kind of think of the sequence as a parent to the iterator. I think the only semantic requirement of an Iterator is that once it returns nil (which terminates the iteration) it never returns an non-nil value after that.