Can’t a Sequence be potentially infinite, whereas a collection has a
defined count/endIndex? Other than that, I agree with your
statement.
Here is what I see as the appropriate structure:
Iterator: single destructive pass, potentially infinite (should be for..in-able)
Sequence: guaranteed non-destructive multi-pass (vends Iterators), potentially infinite (should be subscript-able and gain most of Collection, but lose anything that relies on it ending)
Collection: multi-pass, guaranteed finite (no changes from its current form, except extra inits from Iterator/Sequence with end conditions)
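Roughly sketched in code (purely illustrative, just to show the shape I have in mind; the declarations intentionally shadow the standard library names):

    protocol Iterator {
        // Single destructive pass, potentially infinite.
        associatedtype Element
        mutating func next() -> Element?
    }

    protocol Sequence {
        // Guaranteed non-destructive multi-pass, still potentially infinite:
        // every call to makeIterator() replays the same elements.
        associatedtype Iter: Iterator
        func makeIterator() -> Iter
        // Would gain subscripting and most of Collection's API, minus anything
        // that relies on the sequence ending (count, dropLast, eager map, ...).
    }

    protocol Collection: Sequence {
        // Multi-pass and guaranteed finite. Unchanged from its current form,
        // apart from extra inits that build a Collection from an
        // Iterator/Sequence together with an explicit end condition.
    }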
Right now we are allowed to have an infinite sequence, but calling
dropLast or non-lazy map will cause an infinite loop. These cases
could be made much safer by considering the potentially infinite and
finite cases separately…
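A toy sketch of the hazard, using the standard library’s `sequence(first:next:)` to build a conceptually infinite sequence (the commented-out lines are the ones that would spin forever):

    // A conceptually infinite sequence of the natural numbers.
    let naturals = sequence(first: 0) { $0 + 1 }

    // Fine: a lazy map only computes values on demand.
    let squares = naturals.lazy.map { $0 * $0 }
    print(Array(squares.prefix(5)))      // [0, 1, 4, 9, 16]

    // Never terminates: the eager map tries to materialize every element.
    // let allSquares = naturals.map { $0 * $0 }

    // Never terminates: dropLast must find the end before it can drop anything.
    // let allButLast = naturals.dropLast()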
I think this is pointing in the right general direction. It would
make working with `Sequence` much more straightforward and allow us to
depend on the multi-pass property that is true in practice of the most
common models of `Sequence`.
But I agree that we should give much more careful consideration to
finite / infinite generally and for..in specifically.
Now that I have been thinking about the finite / infinite distinction
more closely I have begun to notice a lot of code that is written
generically using `Sequence` where a for..in loop is really what is required, but the “finite sequence” precondition is not explicitly stated. Interestingly, this is the case with the standard library’s
eager `map` (but not the case with `dropLast` which explicitly notes
the precondition). I have been somewhat surprised to realize how
common this “bug” is (i.e. not stating a precondition). I think we
have gotten away with it thus far because the sequences most people
use most of the time in practice are finite. But that doesn’t mean we
should accept this as good enough - IMO it is way too easy to forget to
document this precondition (and obviously easier for users to overlook
than preconditions that are actually encoded in the type system,
violations of which are caught at compile time).
The fact that this pattern is so pervasive is what I meant when I said
for..in “naturally” requires a finite sequence.
IMO it’s better to encode preconditions in the type system when that
is practical, and especially when the precondition is shared by a vast
majority of code written using a particular construct (in this case a
for..in loop written using the most generic for..in-able protocol).
I think the safest solution is to take the position that writing an
infinite loop is relatively uncommon and is a more “advanced”
technique, and thus should be done explicitly. Do people really write
infinite loops often enough that the convenience of using for..in when
writing infinite loops outweighs the safety benefit of preventing
accidental infinite loops? I haven’t seen a compelling argument for
this.
Good questions. I'd also add: “do infinite sequences come up often
enough that accidentally looping on them forever is a problem?”
That’s a good question as well. In practice the frequency of infinite
sequences likely depends on the domain.
IMO this falls into the same category as “do single-pass sequences
come up often enough that attempting to iterate over them twice is a
problem?” To paraphrase your previous post:
Today, people see a beautiful, simple protocol (Sequence) to which many
things conform. They don't recognize that there's a semantic restriction
(you can’t assume it is finite!) on it, so they write libraries
of functions that may iterate a Sequence to termination. They test
their libraries with the most commonly-available Sequences, e.g. Arrays
and Ranges, which happen to be finite. Their tests pass! But their
constraints are wrong, their whole model of how to write generic code
over sequences is wrong, and some of their code is wrong.
IMO this is a problematic programming model.
I definitely don’t mean to put words in your mouth here, but the
logical structure of the argument appears identical to me regardless
of which issue it is applied to. I am only trying to make that point.
Oh, I fully agree. We're *in* this discussion because neither
single-pass nor infinite sequences come up very often. It raises the
question of whether the conceptual framework ought to accommodate them at
all, and if they should be included, how they should be represented.
We’ve considered good practical examples of both. They exist and people use them. I think we’re better off having a common vocabulary for them rather than requiring everyone who uses these concepts to invent their own.
I agree that the question of how best to represent them is crucial.
The quotation you cited above isn't arguing that single-pass sequences
are important enough to represent. It is trying to point out that the
refinement relationship between single- and multipass sequences has an
undesirable effect. I'm certain the same thing could occur for infinite
and finite sequences.
I’m not sure I agree. Neither are clearly modeled in the current library so I don’t think we know what the impact would be if they were. I’m willing to bet that people would choose the correct constraints if they were available (modeled accurately in the standard library, well documented, etc).
What both of these *definitely* have in common is that they are purely semantic requirements that cannot be stated explicitly using the syntax of the language. You have reminded us many times that protocols should not just be about syntax, but should also emphasize semantics. We shouldn’t shy away from introducing a protocol that doesn’t add any syntactic requirements if that is necessary to capture important semantic requirements.
FWIW, Max pointed out to me on Friday that Scala's concept for
possibly-single-pass sequences is called “TraversableOnce.” I think
that name goes a long way to solving the problem. I'm not sure how we'd
apply the same idea to finite/infinite sequences, though.
I was thinking about how best to represent these while mowing the lawn this afternoon and had an interesting thought. If we add one additional protocol and pin down some of the semantics that have been discussed in this thread we could have a pretty straightforward model that captures the various semantics using a protocol in each of 4 quadrants:
                          Single Pass                                  Multi Pass

Potentially Infinite      IteratorProtocol       <-- makeIterator() --   Sequence
                                 ^                                           ^
                                 | (refines)                                 | (refines)
Finite                    FiniteIteratorProtocol  <-- makeIterator() --   Collection
The semantic refinement on `Sequence` is that it must now produce a new iterator every time one is requested (and they must all produce identical sequences of values as long as the sequence was not mutated in between calls to `makeIterator`).
The semantic refinement on `Collection` is that it must now be finite and its `Iterator` must conform to `FiniteIteratorProtocol` rather than `IteratorProtocol`.
`FiniteIteratorProtocol` is the new protocol which introduces the semantic requirement that calls to `next` must “eventually” produce `nil`. “Eventually” might be a bit fuzzy but should carry the intent that your program won’t hang or be killed by the OS (i.e. diverge) if you iterate until `nil` is reached.
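In code the delta over today’s protocols could be quite small (a rough sketch; `FiniteIteratorProtocol` is the only genuinely new declaration, and the tightened `Collection` constraint is shown as a comment):

    /// Single pass, now guaranteed to terminate: next() eventually returns nil.
    /// A purely semantic refinement; it adds no syntactic requirements.
    protocol FiniteIteratorProtocol: IteratorProtocol {}

    // Sequence would keep its current requirements but gain the semantic rule
    // that makeIterator() always vends a fresh iterator over the same elements.

    // Collection would additionally require its iterator to be finite, roughly:
    //
    //     protocol Collection: Sequence {
    //         associatedtype Iterator: FiniteIteratorProtocol
    //         // ...existing requirements unchanged...
    //     }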
You mentioned in another thread that for..in wouldn’t necessarily have to use a single protocol (or refinement relationship). Under this model I would propose that we make for..in operate on top of `FiniteIteratorProtocol`, but also have the ability to get the iterator from a `Collection`. (I’m still advocating for restricting the for..in sugar to finite iteration, but the same idea would work at the “potentially infinite” level using `IteratorProtocol` and `Sequence`.) The only change from the current state (aside from the finite restriction) is to allow you to directly provide an iterator to the for..in loop rather than requiring the compiler to get a new one via a call to `makeIterator`.
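To make that concrete, the desugaring might look roughly like this (hypothetical, shown with today’s names so it actually runs):

    // Hypothetical desugaring of `for n in numbers { print(n) }`.
    let numbers = [1, 2, 3]

    // Collection case: the compiler asks for a fresh (finite) iterator.
    var iterator = numbers.makeIterator()
    while let n = iterator.next() {
        print(n)
    }

    // Iterator case (the proposed addition): you could hand an existing
    // iterator straight to for..in; the compiler would skip the
    // makeIterator() call and simply consume the iterator you supplied.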
If we do adopt something like this model, I think a couple of guidelines are clear regardless of whether for..in works exclusively with finite concepts or also admits potentially infinite ones. All code that requires terminating iteration should be written with `Collection` or `FiniteIteratorProtocol`, not `Sequence` or `IteratorProtocol`. Following from this, all finite sequences should conform to `Collection`, not just `Sequence`, so they continue to work with all code that requires finite sequences. One advantage of restricting for..in to the finite concepts is that it encourages people to follow this guideline so that their types will be usable with for..in, finite constraints, etc.
In order to make it easy to follow the second guideline, the library should strive to make it as easy as possible to implement a forward-only `Collection` if you can vend an iterator and guarantee it is finite. Ideally it would be no more difficult than it is to implement `Sequence` today. Further customization of the implementation would be done as an optimization, not just to achieve basic capabilities.
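To illustrate, here is one possible (deliberately naive) shape for such an adapter; the name is made up and the eager buffering is just a placeholder, since a real implementation would presumably memoize lazily:

    /// Wraps a finite iterator as a forward-only Collection by buffering its
    /// elements up front. Deliberately naive; shown only to illustrate the API.
    struct IteratorBackedCollection<I: IteratorProtocol>: Collection {
        private let elements: [I.Element]

        /// The caller promises the iterator is finite.
        init(_ iterator: I) {
            var iterator = iterator
            var buffer: [I.Element] = []
            while let element = iterator.next() { buffer.append(element) }
            elements = buffer
        }

        var startIndex: Int { return elements.startIndex }
        var endIndex: Int { return elements.endIndex }
        func index(after i: Int) -> Int { return elements.index(after: i) }
        subscript(position: Int) -> I.Element { return elements[position] }
    }

    // Usage: vend an iterator you know to be finite, get Collection's API back.
    let numbers = IteratorBackedCollection([3, 1, 2].makeIterator())
    print(numbers.count, numbers.sorted())   // 3 [1, 2, 3]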
One other topic that came up is whether the single pass concepts should be required to have reference semantics. If we go in that direction it is clear that in the previous model `IteratorProtocol` would have a class requirement which would be inherited by `FiniteIteratorProtocol`.
However, I am not convinced we should require reference semantics here. This requirement doesn’t exist today and I haven’t heard anyone say it has directly been a problem. The problem seems to be due to the fact that `Sequence` is missing a semantic requirement everyone expects. I don’t know the rationale that was originally behind that decision and am curious. Is it because for..in works via `Sequence` and it was desirable to have for..in work with single-pass sequences? The tweak to for..in suggested above addresses that.
Further, consider an algorithm which has multiple “choices” available and would like to be able to take advantage of backtracking. This can be achieved trivially with value semantic iterators by simply making a copy at any time during iteration and returning to the copy if a dead end is reached. You could sort of mimic this with a reference semantic iterator by conforming the iterator itself to `Sequence` or `Collection` and calling `makeIterator` to “clone” it. Obviously this would only be possible with intimate knowledge of the iterator and would usually not be possible retroactively, whereas it would be trivial to take advantage of iterators that have value semantics.
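A toy example of what I mean; value semantics does all of the work here:

    // Backtracking with a value-semantic iterator: a checkpoint is just a copy.
    var iterator = [1, 2, 3, 4, 5].makeIterator()

    _ = iterator.next()          // consume 1
    _ = iterator.next()          // consume 2

    let checkpoint = iterator    // value copy: remembers "position after 2"

    _ = iterator.next()          // explore: 3
    _ = iterator.next()          // explore: 4

    iterator = checkpoint        // dead end: backtrack by restoring the copy
    print(iterator.next()!)      // prints 3 again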
-Matthew
···
On Jun 27, 2016, at 2:41 PM, Dave Abrahams <dabrahams@apple.com> wrote:
on Mon Jun 27 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:
On Jun 27, 2016, at 11:46 AM, Dave Abrahams <dabrahams@apple.com> wrote:
on Mon Jun 27 2016, Matthew Johnson <matthew-AT-anandabits.com> wrote:
On Jun 26, 2016, at 10:56 PM, Jonathan Hull via swift-evolution <swift-evolution@swift.org> wrote:
If we adopt that position then for..in would need to be built on top
of a guaranteed finite construct. This would allow programmers to
continue writing generic code against the most generic for..in-able
construct while eliminating a precondition that is often (usually?)
unstated and likely unconsidered.
If we do decide to move forward with infinite for..in loops I think we
need to establish strong guidance around how to properly write generic
code with these protocols. Should such code really be constrained to
`Collection` rather than `Sequence` (i.e. should a potentially
infinite `Sequence` have an eager map)? If this is the guidance,
should it be paired with guidance that all finite sequences should
conform to `Collection`? Or is it sufficient to just educate
developers about this issue and expect people to document the “finite
Sequence” precondition when the constraint is `Sequence` rather than
`Collection`?
I hope we will give serious consideration to these questions while
this topic is open for discussion.
-Matthew
Thanks,
Jon
on Wed Jun 22 2016, David Waite <david-AT-alkaline-solutions.com> wrote:
On Jun 22, 2016, at 2:57 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
<Ahem> “Iterators,” please.
That makes me happy - for some reason I thought it was still GeneratorProtocol
destructively, but such Generators would not conform to the needs of
Sequence. As such, the most significant impact would be the inability
to use such Generators in a for..in loop,
Trying to evaluate this statement, it's clear we're missing lots of
detail here:
* Would you remove Sequence?
* If so, what Protocol would embody “for...in-able?”
No, I would just remove the allowance in the documentation and API
design for a destructive/consuming iteration. Sequence would be the
interface to getting access to repeatable iteration, without the need
for meeting the other requirements for Collection.
That would be wrong unless there exist substantial examples of a
multipass Sequence that *can't* meet the other requirements of
Collection without loss of efficiency. And since I can write an adaptor
that turns any multipass sequence into a Collection, I think it's
trivial to prove that no such examples exist.
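(For illustration only, a sketch of one way such an adaptor could look; the name is made up, each index simply carries the underlying iterator state, and it assumes the base sequence is multipass and finite:)

    /// Adapts a multipass (repeatable), finite sequence into a Collection.
    /// Sketch only: each index carries the iterator state, so a full traversal
    /// costs the same as iterating the underlying sequence once.
    struct MultipassCollection<Base: Sequence>: Collection {
        struct Index: Comparable {
            // Int.max marks the end; real positions count from zero.
            fileprivate var offset: Int
            fileprivate var element: Base.Iterator.Element?
            fileprivate var state: Base.Iterator?

            static func == (lhs: Index, rhs: Index) -> Bool { return lhs.offset == rhs.offset }
            static func < (lhs: Index, rhs: Index) -> Bool { return lhs.offset < rhs.offset }
        }

        let base: Base

        var startIndex: Index {
            var iterator = base.makeIterator()
            guard let first = iterator.next() else { return endIndex }
            return Index(offset: 0, element: first, state: iterator)
        }

        var endIndex: Index {
            return Index(offset: Int.max, element: nil, state: nil)
        }

        func index(after i: Index) -> Index {
            var iterator = i.state!
            guard let next = iterator.next() else { return endIndex }
            return Index(offset: i.offset + 1, element: next, state: iterator)
        }

        subscript(position: Index) -> Base.Iterator.Element {
            return position.element!
        }
    }

    // Any repeatable, finite sequence now has Collection's full API.
    let evens = MultipassCollection(base: stride(from: 0, to: 10, by: 2))
    print(evens.count, Array(evens.dropLast()))   // 5 [0, 2, 4, 6]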
--
-Dave