[Proposal] Change guarantee for GeneratorType.next() to always return nil past end


(Patrick Pijnappel) #1

*Situation*
Currently GeneratorType.next() requires callers to not call next() after it
has returned nil once, even encouraging a preconditionFailure() if this is
violated:

  /// - Requires: `next()` has not been applied to a copy of `self`

  /// since the copy was made, and no preceding call to `self.next()`

  /// has returned `nil`. Specific implementations of this protocol

  /// are encouraged to respond to violations of this requirement by

  /// calling `preconditionFailure("...")`.

However, all 28 GeneratorTypes in the standard library actually do return
nil on repeated calls past the end.

*Silent corner case*
Because basically all generators keep returning nil, it's not unlikely
people will write their code based on the assumption it will always return
nil (breaking the requirement) and that will almost always work – until
someone passes in a GeneratorType that actually does raise the recommended
preconditionFailure(). It becomes a silent corner case.

*Adds caller burden*
To avoid breaking the requirement, the caller will not uncommonly have to
track extra state and branch (e.g. keep a done/atEnd boolean). For example
the UTF-8 & UTF-16 decoders do this (incidentally introducing an extra
branch into a hot code path), while the UTF-32 decoder doesn't actually
check – passing the requirement on to its caller (without this being
documented). From personal experience implementing several custom
generators I've found that respecting the requirement invariably adds
complexity to the implementation.

*Doesn't prevent bugs*
There seems to be little advantage to having it crash on past-the-end calls
to next(). So far I haven't seen any examples where not crashing would
silently hide a bug – and again even if there was it wouldn't help much
considering almost no generators actually do crash.

*Proposal*
Change the guarantee for GeneratorType.next() to keep returning nil
indefinitely after it has been exhausted.


(Dmitri Gribenko) #2

Situation
Currently GeneratorType.next() requires callers to not call next() after it
has returned nil once, even encouraging a preconditionFailure() if this is
violated:

  /// - Requires: `next()` has not been applied to a copy of `self`
  /// since the copy was made, and no preceding call to `self.next()`
  /// has returned `nil`. Specific implementations of this protocol
  /// are encouraged to respond to violations of this requirement by
  /// calling `preconditionFailure("...")`.

I'd like to add more context to this discussion. We added this
requirement a while ago. [1] The reason for introducing it was not
an attempt to flag bugs in client code. Rather, we were not convinced
that all generators can return nil repeatedly without loss of
efficiency or extra storage burden.

[1] https://github.com/apple/swift/commit/304b4f33ae74a5abd09da485bbc435dfa2ade522
and rdar://problem/17392226

Adds caller burden
To avoid breaking the requirement, the caller will not uncommonly have to track extra state and branch

I would actually say the opposite -- running a non-trivial algorithm
on generators is a very uncommon thing to do. The 99% use case for
generators is implicit usage from the for-in loop. This is why
allowing generators to be as simple as possible and pushing the
requirement for extra branches into non-trivial algorithms made sense
for us when we introduced this requirement.

Silent corner case
Because basically all generators keep returning nil, it's not unlikely people will write their code based on the assumption it will always return nil

This is what concerns me the most about the current rules.

Dmitri

···

On Wed, Mar 2, 2016 at 10:47 PM, Patrick Pijnappel via swift-evolution <swift-evolution@swift.org> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Haravikk) #3

+1 from me on the grounds that I had no idea that this was an expectation of GeneratorType.next(), so yeah; all my generators currently return `nil` infinitely past the end. Does anyone know the history of this requirement? Perhaps it’s a newer one and I didn’t notice it being added?

The only advantage to a failure vs continuing nil that I can think of is if the generator is returning an optional type anyway; if the developer doesn’t check correctly for nil from the generator vs a stored value of nil then they could end up with an infinite loop, but this seems unlikely given that they’d be dealing with a type of Element?? rather than Element?, so whatever they’re doing with the results should fail due to type mismatches anyway.

So yeah, there’s possibly (and I stress possibly) a narrow case in which failure might catch a bug, but in most cases the optional behaviour should be enough, and the generator itself should be handling indices correctly internally to prevent issues with stepping past .endIndex etc.

···

On 3 Mar 2016, at 06:47, Patrick Pijnappel via swift-evolution <swift-evolution@swift.org> wrote:

Situation
Currently GeneratorType.next() requires callers to not call next() after it has returned nil once, even encouraging a preconditionFailure() if this is violated:

  /// - Requires: `next()` has not been applied to a copy of `self`
  /// since the copy was made, and no preceding call to `self.next()`
  /// has returned `nil`. Specific implementations of this protocol
  /// are encouraged to respond to violations of this requirement by
  /// calling `preconditionFailure("...")`.

However, all 28 GeneratorTypes in the standard library actually do return nil on repeated calls past the end.

Silent corner case
Because basically all generators keep returning nil, it's not unlikely people will write their code based on the assumption it will always return nil (breaking the requirement) and that will almost always work – until someone passes in a GeneratorType that actually does raise the recommended preconditionFailure(). It becomes a silent corner case.

Adds caller burden
To avoid breaking the requirement, the caller will not uncommonly have to track extra state and branch (e.g. keep a done/atEnd boolean). For example the UTF-8 & UTF-16 decoders do this (incidentally introducing an extra branch into a hot code path), while the UTF-32 decoder doesn't actually check – passing the requirement on to its caller (without this being documented). From personal experience implementing several custom generators I've found that respecting the requirement invariably adds complexity to the implementation.

Doesn't prevent bugs
There seems to be little advantage to having it crash on past-the-end calls to next(). So far I haven't seen any examples where not crashing would silently hide a bug – and again even if there was it wouldn't help much considering almost no generators actually do crash.

Proposal
Change the guarantee for GeneratorType.next() to keep returning nil indefinitely after it has been exhausted.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Patrick Pijnappel) #4

Hmm I see.

Do we have any example cases where returning nil repeatedly would require
extra branches or state?

The generators in the standard library don't (*) – the usual pattern is
either of the following:
- 1) Check state if we're at end, if so return nil 2) get return value 3)
advance state. Since the state is not mutated before returning nil,
repeating nil is automatic.
- 1) Call next() on one or more wrapped generators 2) return some
transformation of that. If the wrapped generators repeat nil, repeating nil
is also automatic for the wrapper.

If you would have a generator setup that doesn't automatically repeat nil,
omitting a nil-repeat check might be dangerous considering the risk other
code hadn't considered the case.

(*) StrideThroughGenerator & ZipGenerator have a done flag, but need these
even without repeating nil. JoinGenerator has an .End state but actually
doesn't have to – even to repeat nil.

···

On Thu, Mar 3, 2016 at 8:12 PM, Dmitri Gribenko <gribozavr@gmail.com> wrote:

On Wed, Mar 2, 2016 at 10:47 PM, Patrick Pijnappel via swift-evolution > <swift-evolution@swift.org> wrote:
> Situation
> Currently GeneratorType.next() requires callers to not call next() after
it
> has returned nil once, even encouraging a preconditionFailure() if this
is
> violated:
>
> /// - Requires: `next()` has not been applied to a copy of `self`
> /// since the copy was made, and no preceding call to `self.next()`
> /// has returned `nil`. Specific implementations of this protocol
> /// are encouraged to respond to violations of this requirement by
> /// calling `preconditionFailure("...")`.

I'd like to add more context to this discussion. We added this
requirement a while ago. [1] The reason for introducing it was not
an attempt to flag bugs in client code. Rather, we were not convinced
that all generators can return nil repeatedly without loss of
efficiency or extra storage burden.

[1]
https://github.com/apple/swift/commit/304b4f33ae74a5abd09da485bbc435dfa2ade522
and rdar://problem/17392226

> Adds caller burden
> To avoid breaking the requirement, the caller will not uncommonly have
to track extra state and branch

I would actually say the opposite -- running a non-trivial algorithm
on generators is a very uncommon thing to do. The 99% use case for
generators is implicit usage from the for-in loop. This is why
allowing generators to be as simple as possible and pushing the
requirement for extra branches into non-trivial algorithms made sense
for us when we introduced this requirement.

> Silent corner case
> Because basically all generators keep returning nil, it's not unlikely
people will write their code based on the assumption it will always return
nil

This is what concerns me the most about the current rules.

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Dave Abrahams) #5

Hmm I see.

Do we have any example cases where returning nil repeatedly would require
extra branches or state?

Off the top of my head: a stream of random numbers that stops when it
encounters a zero.

The generators in the standard library don't (*) – the usual pattern is
either of the following:
- 1) Check state if we're at end, if so return nil 2) get return value 3)
advance state. Since the state is not mutated before returning nil,
repeating nil is automatic.
- 1) Call next() on one or more wrapped generators 2) return some
transformation of that. If the wrapped generators repeat nil, repeating nil
is also automatic for the wrapper.

If you would have a generator setup that doesn't automatically repeat nil,
omitting a nil-repeat check might be dangerous considering the risk other
code hadn't considered the case.

(*) StrideThroughGenerator & ZipGenerator have a done flag, but need these
even without repeating nil. JoinGenerator has an .End state but actually
doesn't have to – even to repeat nil.

What algorithms or components can be simplified by taking advantage of
this extra guarantee? If the category of code that can use it is
broader than the category of generators that would suffer an overhead or
implementation complexity, it might be worth doing.

My intuition is that both categories are small.

···

on Thu Mar 03 2016, Patrick Pijnappel <swift-evolution@swift.org> wrote:

On Thu, Mar 3, 2016 at 8:12 PM, Dmitri Gribenko <gribozavr@gmail.com> wrote:

On Wed, Mar 2, 2016 at 10:47 PM, Patrick Pijnappel via swift-evolution >> <swift-evolution@swift.org> wrote:
> Situation
> Currently GeneratorType.next() requires callers to not call next() after
it
> has returned nil once, even encouraging a preconditionFailure() if this
is
> violated:
>
> /// - Requires: `next()` has not been applied to a copy of `self`
> /// since the copy was made, and no preceding call to `self.next()`
> /// has returned `nil`. Specific implementations of this protocol
> /// are encouraged to respond to violations of this requirement by
> /// calling `preconditionFailure("...")`.

I'd like to add more context to this discussion. We added this
requirement a while ago. [1] The reason for introducing it was not
an attempt to flag bugs in client code. Rather, we were not convinced
that all generators can return nil repeatedly without loss of
efficiency or extra storage burden.

[1]
https://github.com/apple/swift/commit/304b4f33ae74a5abd09da485bbc435dfa2ade522
and rdar://problem/17392226

> Adds caller burden
> To avoid breaking the requirement, the caller will not uncommonly have
to track extra state and branch

I would actually say the opposite -- running a non-trivial algorithm
on generators is a very uncommon thing to do. The 99% use case for
generators is implicit usage from the for-in loop. This is why
allowing generators to be as simple as possible and pushing the
requirement for extra branches into non-trivial algorithms made sense
for us when we introduced this requirement.

> Silent corner case
> Because basically all generators keep returning nil, it's not unlikely
people will write their code based on the assumption it will always return
nil

This is what concerns me the most about the current rules.

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave


(Lily Ballard) #6

Yes. My proposed .takeWhile() and .dropWhile() sequence adaptors
(https://github.com/apple/swift-evolution/pull/95) would hit this case.
Both of those adaptors would need to keep around extra state and in
order to keep returning nil.

My preferred solution for this case is to add a new Generator adaptor
called a FuseGenerator, with a convenience method .fuse(). All this
adaptor does is include the extra state in order to ensure it keeps
returning nil forever. This way Generators don't have to keep the state
for that guarantee, and the majority case where client codes doesn't
rely on this guarantee doesn't need the check either, and in the rare
case where this guarantee is important all the user has to do is call
.fuse() on the generator and use the result of that instead.

All that said, I would be strongly in favor of dropping the language
about triggering a precondition failure. I'd prefer to leave it as implementation-
defined behavior, which an encouragement to keep returning nil if it's
easy to do so. A benefit of this is Generators could opt to explicitly
define their post-nil behavior, e.g. TakeWhileGenerator could explicitly
document that after it has returned nil, subsequent calls to .next()
will continue to consume the underlying generator and return another
stream of elements terminating in `nil` (with the caveat that if the
underlying generator is exhausted then behavior depends on the
underlying generator's post-nil behavior). Granted, this probably isn't
useful in most cases, but it could be useful upon occasion as a way to
lazily split a sequence without building intermediate data structures
(assuming that the underlying generator is fused or defines its post-nil
behavior as returning nil forever).

FWIW, Rust uses precisely the solution I've described here (and in fact
I'm responsible for its std::iter::Fuse iterator). It defines
Iterator::next() such that calling .next() after it has returned None
may or may not return more elements (but Iterators are not supposed to
assert in this case, they should always return something). And it has
the .fuse() convenience method that returns a std::iter::Fuse iterator
that provides the always-returns-None guarantee. And in practice, almost
nobody ever has to actually use .fuse(), since almost nobody writes
algorithms that cares about the behavior after next() returns None (and
in the rare case where they do, they're typically using some concrete
Iterator that has defined behavior, as opposed to working on arbitrary
Iterators).

-Kevin Ballard

···

On Thu, Mar 3, 2016, at 03:24 AM, Patrick Pijnappel via swift-evolution wrote:

Hmm I see.

Do we have any example cases where returning nil repeatedly would
require extra branches or state?


(Brent Royal-Gordon) #7

What algorithms or components can be simplified by taking advantage of
this extra guarantee? If the category of code that can use it is
broader than the category of generators that would suffer an overhead or
implementation complexity, it might be worth doing.

My intuition is that both categories are small.

I do agree that both categories are probably small. But they have very different consequences:

* If you make the guarantee, certain rare types of generators will need a little more storage and an extra branch.
* If you don't make the guarantee, code which works most of the time will fail when combined with certain generators.

It seems like we're facing a tradeoff between a tiny efficiency gain in rare cases and predictable semantics in all cases. I'm not convinced efficiency is the right choice here.

···

--
Brent Royal-Gordon
Architechies


(Patrick Pijnappel) #8

What algorithms or components can be simplified by taking advantage of this
extra guarantee?

Any generator that somehow buffers their underlying generator (as it can't
tell whether it already tried to refill the buffer). For example UTF8 &
UTF16's decode() both have 3 instead of 2 branches on ASCII input because
of this.

Off the top of my head: a stream of random numbers that stops when it
encounters

a zero.

You could generate the next random number in advance (you take a O(1) hit
instead of O(n)). Of course consuming more than you need is not always
allowed and the O(1) could outweigh the branch.

Overall I'd say performance-wise both categories are small (though taking
the standard library as sample case, we have some examples of the former
but not the latter).

We'd be left with the safety concern of code which fails in rare corner
cases.

···

On Sat, Mar 5, 2016 at 11:30 AM, Dave Abrahams via swift-evolution < swift-evolution@swift.org> wrote:

on Thu Mar 03 2016, Patrick Pijnappel <swift-evolution@swift.org> wrote:

> Hmm I see.
>
> Do we have any example cases where returning nil repeatedly would require
> extra branches or state?

Off the top of my head: a stream of random numbers that stops when it
encounters a zero.

>
> The generators in the standard library don't (*) – the usual pattern is
> either of the following:
> - 1) Check state if we're at end, if so return nil 2) get return value 3)
> advance state. Since the state is not mutated before returning nil,
> repeating nil is automatic.
> - 1) Call next() on one or more wrapped generators 2) return some
> transformation of that. If the wrapped generators repeat nil, repeating
nil
> is also automatic for the wrapper.
>
> If you would have a generator setup that doesn't automatically repeat
nil,
> omitting a nil-repeat check might be dangerous considering the risk other
> code hadn't considered the case.
>
> (*) StrideThroughGenerator & ZipGenerator have a done flag, but need
these
> even without repeating nil. JoinGenerator has an .End state but actually
> doesn't have to – even to repeat nil.

What algorithms or components can be simplified by taking advantage of
this extra guarantee? If the category of code that can use it is
broader than the category of generators that would suffer an overhead or
implementation complexity, it might be worth doing.

My intuition is that both categories are small.

>
>
> On Thu, Mar 3, 2016 at 8:12 PM, Dmitri Gribenko <gribozavr@gmail.com> > wrote:
>
>> On Wed, Mar 2, 2016 at 10:47 PM, Patrick Pijnappel via swift-evolution > >> <swift-evolution@swift.org> wrote:
>> > Situation
>> > Currently GeneratorType.next() requires callers to not call next()
after
>> it
>> > has returned nil once, even encouraging a preconditionFailure() if
this
>> is
>> > violated:
>> >
>> > /// - Requires: `next()` has not been applied to a copy of `self`
>> > /// since the copy was made, and no preceding call to
`self.next()`
>> > /// has returned `nil`. Specific implementations of this protocol
>> > /// are encouraged to respond to violations of this requirement by
>> > /// calling `preconditionFailure("...")`.
>>
>> I'd like to add more context to this discussion. We added this
>> requirement a while ago. [1] The reason for introducing it was not
>> an attempt to flag bugs in client code. Rather, we were not convinced
>> that all generators can return nil repeatedly without loss of
>> efficiency or extra storage burden.
>>
>> [1]
>>
https://github.com/apple/swift/commit/304b4f33ae74a5abd09da485bbc435dfa2ade522
>> and rdar://problem/17392226
>>
>> > Adds caller burden
>> > To avoid breaking the requirement, the caller will not uncommonly have
>> to track extra state and branch
>>
>> I would actually say the opposite -- running a non-trivial algorithm
>> on generators is a very uncommon thing to do. The 99% use case for
>> generators is implicit usage from the for-in loop. This is why
>> allowing generators to be as simple as possible and pushing the
>> requirement for extra branches into non-trivial algorithms made sense
>> for us when we introduced this requirement.
>>
>> > Silent corner case
>> > Because basically all generators keep returning nil, it's not unlikely
>> people will write their code based on the assumption it will always
return
>> nil
>>
>> This is what concerns me the most about the current rules.
>>
>> Dmitri
>>
>> --
>> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
>> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/
>>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Brent Royal-Gordon) #9

Yes. My proposed .takeWhile() and .dropWhile() sequence adaptors (https://github.com/apple/swift-evolution/pull/95) would hit this case. Both of those adaptors would need to keep around extra state and in order to keep returning nil.

How much extra state? If it's merely "an extra Bool" or "wrap the inner generator in an Optional and nil it when you hit the end", it might be worth it.

(Actually, can we see the implementations you're thinking of using? Because the ones I can imagine for `dropWhile`, at least, would keep returning `nil` as long as the underlying generator did.)

All that said, I would be strongly in favor of dropping the language about triggering a precondition failure. I'd prefer to leave it as implementation-defined behavior, which an encouragement to keep returning nil if it's easy to do so.

I think "try to return `nil` but it's okay if you can't" is the worst of all worlds from a reliability standpoint. If most, but not all, generators continue to return `nil`, then people will write code assuming that all generators continue to return `nil`.

One thing that might help is to make sure we have examples of the full range of generator behaviors—multi-pass and single-pass, nil forever and not, etc.—in the standard library and list an example of each in the SequenceType documentation along with an encouragement to test with all of them. But a lot of people will miss that.

(Another reason to lean towards always returning `nil`: it's testable. In fact, I was thinking it might be nice to have something in XCTest where you could hand it a Sequence, or perhaps a closure that creates one, and an array containing the values you expect it to have, and it would assert the basic properties we expect from a Sequence, like that the elements come out in the right order, it returns nil at the end, `underestimateCount()` is not incorrect, etc. The `nil` semantic, unlike the other possible ones, is something you could include in that test.)

···

--
Brent Royal-Gordon
Architechies


(Dave Abrahams) #10

Hmm I see.

Do we have any example cases where returning nil repeatedly would
require extra branches or state?

Yes. My proposed .takeWhile() and .dropWhile() sequence adaptors
(https://github.com/apple/swift-evolution/pull/95) would hit this case.
Both of those adaptors would need to keep around extra state and in
order to keep returning nil.

Are there realistic cases where this state and check would produce
measurable overhead?

Do you have an implementation somewhere I could look at?

My preferred solution for this case is to add a new Generator adaptor
called a FuseGenerator, with a convenience method .fuse(). All this
adaptor does is include the extra state in order to ensure it keeps
returning nil forever. This way Generators don't have to keep the state
for that guarantee, and the majority case where client codes doesn't
rely on this guarantee doesn't need the check either, and in the rare
case where this guarantee is important all the user has to do is call
.fuse() on the generator and use the result of that instead.

Between the lack of predictability being addressed here and the addition
of a rarely-used additional component, that's a lot of complexity in the
library; is there a clear benefit?

All that said, I would be strongly in favor of dropping the language
about triggering a precondition failure. I'd prefer to leave it as implementation-
defined behavior, which an encouragement to keep returning nil if it's
easy to do so. A benefit of this is Generators could opt to explicitly
define their post-nil behavior, e.g. TakeWhileGenerator could explicitly
document that after it has returned nil, subsequent calls to .next()
will continue to consume the underlying generator and return another
stream of elements terminating in `nil` (with the caveat that if the
underlying generator is exhausted then behavior depends on the
underlying generator's post-nil behavior). Granted, this probably isn't
useful in most cases, but it could be useful upon occasion as a way to
lazily split a sequence without building intermediate data structures
(assuming that the underlying generator is fused or defines its post-nil
behavior as returning nil forever).

TakeWhileGenerator could also provide a method that returns the
TakeWhileGenerator for the next segment. Since you have to know that
you've got a TakeWhileGenerator to take advantage of this special
behavior, a different interface is just as useful and probably results
in clearer code.

FWIW, Rust uses precisely the solution I've described here (and in fact
I'm responsible for its std::iter::Fuse iterator). It defines
Iterator::next() such that calling .next() after it has returned None
may or may not return more elements (but Iterators are not supposed to
assert in this case, they should always return something). And it has
the .fuse() convenience method that returns a std::iter::Fuse iterator
that provides the always-returns-None guarantee. And in practice, almost
nobody ever has to actually use .fuse(), since almost nobody writes
algorithms that cares about the behavior after next() returns None (and
in the rare case where they do, they're typically using some concrete
Iterator that has defined behavior, as opposed to working on arbitrary
Iterators).

This is about balancing performance, flexibility, simplicity of
specification, and predictability. Rust has a very C++-like approach to
zero-overhead abstractions, in the sense that it's willing to introduce
many small wrinkles to avoid even the theoretical possibility of a
performance penalty. Having cut my language/library-design teeth in the
C++ world, I understand that approach really well, but I have also seen
it do some damage (some of which I was personally responsible for!) and
I want to be more conservative with Swift. In short, I'm still leaning
towards specifying post-nil behavior, and I'd want to have more evidence of
the value of leaving it unspecified before doing so.

···

on Sat Mar 05 2016, Kevin Ballard <swift-evolution@swift.org> wrote:

On Thu, Mar 3, 2016, at 03:24 AM, Patrick Pijnappel via swift-evolution wrote:

--
-Dave


(Dave Abrahams) #11

Me neither; I'm just trying to focus the analysis.

···

on Fri Mar 04 2016, Brent Royal-Gordon <swift-evolution@swift.org> wrote:

What algorithms or components can be simplified by taking advantage of
this extra guarantee? If the category of code that can use it is
broader than the category of generators that would suffer an overhead or
implementation complexity, it might be worth doing.

My intuition is that both categories are small.

I do agree that both categories are probably small. But they have very different consequences:

* If you make the guarantee, certain rare types of generators will
  need a little more storage and an extra branch.

* If you don't make the guarantee, code which works most of the time
  will fail when combined with certain generators.

It seems like we're facing a tradeoff between a tiny efficiency gain
in rare cases and predictable semantics in all cases. I'm not
convinced efficiency is the right choice here.

--
-Dave


(Dave Abrahams) #12

What algorithms or components can be simplified by taking advantage of this
extra guarantee?

Any generator that somehow buffers their underlying generator (as it can't
tell whether it already tried to refill the buffer). For example UTF8 &
UTF16's decode() both have 3 instead of 2 branches on ASCII input because
of this.

Off the top of my head: a stream of random numbers that stops when it
encounters
a zero.

You could generate the next random number in advance (you take a O(1) hit
instead of O(n)). Of course consuming more than you need is not always
allowed and the O(1) could outweigh the branch.

Overall I'd say performance-wise both categories are small (though taking
the standard library as sample case, we have some examples of the former
but not the latter).

We'd be left with the safety concern of code which fails in rare corner
cases.

Thanks, this is helpful. I'm leaning in favor of making the change.

···

on Fri Mar 04 2016, Patrick Pijnappel <swift-evolution@swift.org> wrote:

On Sat, Mar 5, 2016 at 11:30 AM, Dave Abrahams via swift-evolution < > swift-evolution@swift.org> wrote:

on Thu Mar 03 2016, Patrick Pijnappel <swift-evolution@swift.org> wrote:

> Hmm I see.
>
> Do we have any example cases where returning nil repeatedly would require
> extra branches or state?

Off the top of my head: a stream of random numbers that stops when it
encounters a zero.

>
> The generators in the standard library don't (*) – the usual pattern is
> either of the following:
> - 1) Check state if we're at end, if so return nil 2) get return value 3)
> advance state. Since the state is not mutated before returning nil,
> repeating nil is automatic.
> - 1) Call next() on one or more wrapped generators 2) return some
> transformation of that. If the wrapped generators repeat nil, repeating
nil
> is also automatic for the wrapper.
>
> If you would have a generator setup that doesn't automatically repeat
nil,
> omitting a nil-repeat check might be dangerous considering the risk other
> code hadn't considered the case.
>
> (*) StrideThroughGenerator & ZipGenerator have a done flag, but need
these
> even without repeating nil. JoinGenerator has an .End state but actually
> doesn't have to – even to repeat nil.

What algorithms or components can be simplified by taking advantage of
this extra guarantee? If the category of code that can use it is
broader than the category of generators that would suffer an overhead or
implementation complexity, it might be worth doing.

My intuition is that both categories are small.

>
>
> On Thu, Mar 3, 2016 at 8:12 PM, Dmitri Gribenko <gribozavr@gmail.com> >> wrote:
>
>> On Wed, Mar 2, 2016 at 10:47 PM, Patrick Pijnappel via swift-evolution >> >> <swift-evolution@swift.org> wrote:
>> > Situation
>> > Currently GeneratorType.next() requires callers to not call next()
after
>> it
>> > has returned nil once, even encouraging a preconditionFailure() if
this
>> is
>> > violated:
>> >
>> > /// - Requires: `next()` has not been applied to a copy of `self`
>> > /// since the copy was made, and no preceding call to
`self.next()`
>> > /// has returned `nil`. Specific implementations of this protocol
>> > /// are encouraged to respond to violations of this requirement by
>> > /// calling `preconditionFailure("...")`.
>>
>> I'd like to add more context to this discussion. We added this
>> requirement a while ago. [1] The reason for introducing it was not
>> an attempt to flag bugs in client code. Rather, we were not convinced
>> that all generators can return nil repeatedly without loss of
>> efficiency or extra storage burden.
>>
>> [1]
>>
https://github.com/apple/swift/commit/304b4f33ae74a5abd09da485bbc435dfa2ade522
>> and rdar://problem/17392226
>>
>> > Adds caller burden
>> > To avoid breaking the requirement, the caller will not uncommonly have
>> to track extra state and branch
>>
>> I would actually say the opposite -- running a non-trivial algorithm
>> on generators is a very uncommon thing to do. The 99% use case for
>> generators is implicit usage from the for-in loop. This is why
>> allowing generators to be as simple as possible and pushing the
>> requirement for extra branches into non-trivial algorithms made sense
>> for us when we introduced this requirement.
>>
>> > Silent corner case
>> > Because basically all generators keep returning nil, it's not unlikely
>> people will write their code based on the assumption it will always
return
>> nil
>>
>> This is what concerns me the most about the current rules.
>>
>> Dmitri
>>
>> --
>> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
>> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/
>>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave


(Dmitri Gribenko) #13

After thinking about this more, in context of implementation
experience, existing generators in the library, I'm also in favor of
providing this guarantee.

Dmitri

···

On Fri, Mar 4, 2016 at 5:51 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

on Fri Mar 04 2016, Patrick Pijnappel <swift-evolution@swift.org> wrote:

What algorithms or components can be simplified by taking advantage of this
extra guarantee?

Any generator that somehow buffers their underlying generator (as it can't
tell whether it already tried to refill the buffer). For example UTF8 &
UTF16's decode() both have 3 instead of 2 branches on ASCII input because
of this.

Off the top of my head: a stream of random numbers that stops when it
encounters
a zero.

You could generate the next random number in advance (you take a O(1) hit
instead of O(n)). Of course consuming more than you need is not always
allowed and the O(1) could outweigh the branch.

Overall I'd say performance-wise both categories are small (though taking
the standard library as sample case, we have some examples of the former
but not the latter).

We'd be left with the safety concern of code which fails in rare corner
cases.

Thanks, this is helpful. I'm leaning in favor of making the change.

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Lily Ballard) #14

> Yes. My proposed .takeWhile() and .dropWhile() sequence adaptors (https://github.com/apple/swift-evolution/pull/95) would hit this case. Both of those adaptors would need to keep around extra state and in order to keep returning nil.

How much extra state? If it's merely "an extra Bool" or "wrap the inner generator in an Optional and nil it when you hit the end", it might be worth it.

(Actually, can we see the implementations you're thinking of using? Because the ones I can imagine for `dropWhile`, at least, would keep returning `nil` as long as the underlying generator did.)

Oh you're right, it's just takeWhile() that would actually have usable post-nil behavior. dropWhile() would end up with the exact same post-nil behavior as its underlying generator.

> All that said, I would be strongly in favor of dropping the language about triggering a precondition failure. I'd prefer to leave it as implementation-defined behavior, which an encouragement to keep returning nil if it's easy to do so.

I think "try to return `nil` but it's okay if you can't" is the worst of all worlds from a reliability standpoint. If most, but not all, generators continue to return `nil`, then people will write code assuming that all generators continue to return `nil`.

Most people don't write code that relies on the post-nil behavior of generators at all. Heck, most people don't even deal with generators directly, they deal with sequences, and sequences already have implementation-defined behavior of calling .generate() twice.

In fact, I don't think the post-nil behavior of generators is even the biggest potential gotcha. A much bigger issue is how generators behave when you make copies of them. If you ever copy a generator you're supposed to never touch the original again, but it's easy to violate that accidentally, and most generators actually behave sanely when copied anyway.

One thing that might help is to make sure we have examples of the full range of generator behaviors—multi-pass and single-pass, nil forever and not, etc.—in the standard library and list an example of each in the SequenceType documentation along with an encouragement to test with all of them. But a lot of people will miss that.

I agree that having more examples is a good thing.

(Another reason to lean towards always returning `nil`: it's testable. In fact, I was thinking it might be nice to have something in XCTest where you could hand it a Sequence, or perhaps a closure that creates one, and an array containing the values you expect it to have, and it would assert the basic properties we expect from a Sequence, like that the elements come out in the right order, it returns nil at the end, `underestimateCount()` is not incorrect, etc. The `nil` semantic, unlike the other possible ones, is something you could include in that test.)

Defining the post-nil behavior as implementation-defined means that, by definition, you don't need to test how arbitrary generators behave there. For specific concrete GeneratorTypes that define their post-nil semantics (and I think we'd want to try and define this for all of the stdlib GeneratorTypes) you can test the documented semantics. For any generator that defines it as returning nil forever (which I'd expect e.g. IndexingGenerator to do) you can easily test that semantic. And for generators like TakeWhileGenerator that effectively reset their behavior when returning nil, you can test that behavior too (e.g. `(1..<10).takeWhile({ $0 % 4 != 0 })` would return the sequences `[1,2,3]`, `[5,6,7]`, `[9]`, and then start returning nil forever).

Besides, you can't actually test that a given GeneratorType returns nil forever, you can only test that it returns nil for some finite number of calls to next(). It would be entirely possible to have a GeneratorType whose behavior depends on some external signal and will return nil until such time as that external signal has data again. For example, you could have a GeneratorType that reads lines from stdin in a non-blocking fashion, and so it would return nil until the user has typed another line of text. Or you could have a GeneratorType that represents an atomic FIFO queue and returns nil if the queue is empty.

-Kevin Ballard

···

On Sat, Mar 5, 2016, at 12:47 PM, Brent Royal-Gordon wrote:


Stream API
(Patrick Pijnappel) #15

Pull-request for the change here: https://github.com/apple/swift/pull/1544

···

On Sat, Mar 5, 2016 at 1:18 PM, Dmitri Gribenko via swift-evolution < swift-evolution@swift.org> wrote:

On Fri, Mar 4, 2016 at 5:51 PM, Dave Abrahams via swift-evolution > <swift-evolution@swift.org> wrote:
>
> on Fri Mar 04 2016, Patrick Pijnappel <swift-evolution@swift.org> wrote:
>
>>>
>>> What algorithms or components can be simplified by taking advantage of
this
>>> extra guarantee?
>>
>> Any generator that somehow buffers their underlying generator (as it
can't
>> tell whether it already tried to refill the buffer). For example UTF8 &
>> UTF16's decode() both have 3 instead of 2 branches on ASCII input
because
>> of this.
>>
>>> Off the top of my head: a stream of random numbers that stops when it
>>> encounters
>>> a zero.
>>
>> You could generate the next random number in advance (you take a O(1)
hit
>> instead of O(n)). Of course consuming more than you need is not always
>> allowed and the O(1) could outweigh the branch.
>>
>> Overall I'd say performance-wise both categories are small (though
taking
>> the standard library as sample case, we have some examples of the former
>> but not the latter).
>>
>> We'd be left with the safety concern of code which fails in rare corner
>> cases.
>
> Thanks, this is helpful. I'm leaning in favor of making the change.

After thinking about this more, in context of implementation
experience, existing generators in the library, I'm also in favor of
providing this guarantee.

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Lily Ballard) #16

>> Hmm I see.
>>
>> Do we have any example cases where returning nil repeatedly would
>> require extra branches or state?
>
> Yes. My proposed .takeWhile() and .dropWhile() sequence adaptors
> (https://github.com/apple/swift-evolution/pull/95) would hit this case.
> Both of those adaptors would need to keep around extra state and in
> order to keep returning nil.

Are there realistic cases where this state and check would produce
measurable overhead?

I don't know, but at a bare minimum this is extra complexity on the implementation side.

Do you have an implementation somewhere I could look at?

No I don't. I didn't want to spend the time implementing this when I wasn't sure if the proposal would even be accepted.

> My preferred solution for this case is to add a new Generator adaptor
> called a FuseGenerator, with a convenience method .fuse(). All this
> adaptor does is include the extra state in order to ensure it keeps
> returning nil forever. This way Generators don't have to keep the state
> for that guarantee, and the majority case where client codes doesn't
> rely on this guarantee doesn't need the check either, and in the rare
> case where this guarantee is important all the user has to do is call
> .fuse() on the generator and use the result of that instead.

Between the lack of predictability being addressed here and the addition
of a rarely-used additional component, that's a lot of complexity in the
library; is there a clear benefit?

The added complexity is just a single extra GeneratorType adaptor and corresponding method. But it reduces complexity elsewhere, as GeneratorType implementations no longer have to worry about their post-nil behavior (unless the type itself wishes to make specific guarantees about that). Many generators won't notice because they'll naturally hit a fixed point when they return nil (e.g. indexing generators won't increment their index at that point), but generator adaptors, or generators that represent some external source of data, won't have to keep around the extra state to track this anymore.

And on the caller side, my claim is that there are very few people who end up writing code that invokes post-nil behavior to begin with (I believe the most common way to hit this case is when writing your own GeneratorType, where you have to decide what your generator's post-nil behavior is). The callers who care about getting nil forever can just use the `.fuse()` adaptor, and leaving this as implementation-defined gives the flexibility of providing other behaviors. Not only is this more flexible for existing generators, but it may allow other generator-like types to adopt the GeneratorType protocol when they couldn't have easily done so otherwise (e.g. my previously-mentioned idea of a FIFO queue that explicitly doesn't want to return nil forever could adopt GeneratorType, but if generators must return nil forever then the FIFO queue can't adopt it, and the author may choose to not go through the effort of writing a separate generator on the off chance that someone wants to use the FIFO queue in a context that needs a generator).

> All that said, I would be strongly in favor of dropping the language
> about triggering a precondition failure. I'd prefer to leave it as implementation-
> defined behavior, which an encouragement to keep returning nil if it's
> easy to do so. A benefit of this is Generators could opt to explicitly
> define their post-nil behavior, e.g. TakeWhileGenerator could explicitly
> document that after it has returned nil, subsequent calls to .next()
> will continue to consume the underlying generator and return another
> stream of elements terminating in `nil` (with the caveat that if the
> underlying generator is exhausted then behavior depends on the
> underlying generator's post-nil behavior). Granted, this probably isn't
> useful in most cases, but it could be useful upon occasion as a way to
> lazily split a sequence without building intermediate data structures
> (assuming that the underlying generator is fused or defines its post-nil
> behavior as returning nil forever).

TakeWhileGenerator could also provide a method that returns the
TakeWhileGenerator for the next segment. Since you have to know that
you've got a TakeWhileGenerator to take advantage of this special
behavior, a different interface is just as useful and probably results
in clearer code.

Clever, but it's still additional complexity. I rather like the fact that this described behavior just naturally occurs with the TakeWhileGenerator when implemented in the simplest way possible.

> FWIW, Rust uses precisely the solution I've described here (and in fact
> I'm responsible for its std::iter::Fuse iterator). It defines
> Iterator::next() such that calling .next() after it has returned None
> may or may not return more elements (but Iterators are not supposed to
> assert in this case, they should always return something). And it has
> the .fuse() convenience method that returns a std::iter::Fuse iterator
> that provides the always-returns-None guarantee. And in practice, almost
> nobody ever has to actually use .fuse(), since almost nobody writes
> algorithms that cares about the behavior after next() returns None (and
> in the rare case where they do, they're typically using some concrete
> Iterator that has defined behavior, as opposed to working on arbitrary
> Iterators).

This is about balancing performance, flexibility, simplicity of
specification, and predictability. Rust has a very C++-like approach to
zero-overhead abstractions, in the sense that it's willing to introduce
many small wrinkles to avoid even the theoretical possibility of a
performance penalty. Having cut my language/library-design teeth in the
C++ world, I understand that approach really well, but I have also seen
it do some damage (some of which I was personally responsible for!) and
I want to be more conservative with Swift. In short, I'm still leaning
towards specifying post-nil behavior, and I'd want to have more evidence of
the value of leaving it unspecified before doing so.

That's a fair point. But I think the important sentence from my comparison to Rust is "And in practice, almost nobody ever has to actually use .fuse(), …". In my experience, very few people ever write code that invokes post-nil behavior, and so I don't like the fact that everyone has to pay for the cost involved in tracking the extra state needed for post-nil behavior.

-Kevin Ballard

···

On Sat, Mar 5, 2016, at 09:12 PM, Dave Abrahams via swift-evolution wrote:

on Sat Mar 05 2016, Kevin Ballard <swift-evolution@swift.org> wrote:
> On Thu, Mar 3, 2016, at 03:24 AM, Patrick Pijnappel via swift-evolution wrote:


(Brent Royal-Gordon) #17

In fact, I don't think the post-nil behavior of generators is even the biggest potential gotcha. A much bigger issue is how generators behave when you make copies of them. If you ever copy a generator you're supposed to never touch the original again, but it's easy to violate that accidentally, and most generators actually behave sanely when copied anyway.

That's true, but it's just not addressable until we have some kind of borrowing system in place.

(Another reason to lean towards always returning `nil`: it's testable. In fact, I was thinking it might be nice to have something in XCTest where you could hand it a Sequence, or perhaps a closure that creates one, and an array containing the values you expect it to have, and it would assert the basic properties we expect from a Sequence, like that the elements come out in the right order, it returns nil at the end, `underestimateCount()` is not incorrect, etc. The `nil` semantic, unlike the other possible ones, is something you could include in that test.)

Defining the post-nil behavior as implementation-defined means that, by definition, you don't need to test how arbitrary generators behave there.

True, but then you're imposing a constraint on the clients of generators: "Don't call next() again after it turns nil". I think it will be far easier to write generic tests of any generator than it will be to write generic tests of any function that *uses* a generic generator.

(Although I suppose you could assist in that testing by writing a wrapper around any sequence/collection/generator which enforces all rules as strictly as possible.)

Besides, you can't actually test that a given GeneratorType returns nil forever, you can only test that it returns nil for some finite number of calls to next(). It would be entirely possible to have a GeneratorType whose behavior depends on some external signal and will return nil until such time as that external signal has data again. For example, you could have a GeneratorType that reads lines from stdin in a non-blocking fashion, and so it would return nil until the user has typed another line of text. Or you could have a GeneratorType that represents an atomic FIFO queue and returns nil if the queue is empty.

That's true, but you can still usefully test the common cases.

···

--
Brent Royal-Gordon
Architechies


(Dmitri Gribenko) #18

The concern is that people who do need to invoke .fuse(), won't,
because all generators they are likely to try in practice will 'just
work' and return a continuous stream of nils.

I think what this really comes down to is the trade off between a
subtle correctness issue and a small performance win for a small
number of data types (which can be even non-existent as Patrick
Pijnappel shows). Given the general direction of Swift, I'm inclined
to choose correctness here.

Dmitri

···

On Sun, Mar 6, 2016 at 5:58 PM, Kevin Ballard via swift-evolution <swift-evolution@swift.org> wrote:

That's a fair point. But I think the important sentence from my comparison to Rust is "And in practice, almost nobody ever has to actually use .fuse(), …".

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Dmitri Gribenko) #19

See stdlib/private/StdlibCollectionUnittest/MinimalCollections.swift.gyb.

Dmitri

···

On Sat, Mar 5, 2016 at 5:40 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

In fact, I don't think the post-nil behavior of generators is even the biggest potential gotcha. A much bigger issue is how generators behave when you make copies of them. If you ever copy a generator you're supposed to never touch the original again, but it's easy to violate that accidentally, and most generators actually behave sanely when copied anyway.

That's true, but it's just not addressable until we have some kind of borrowing system in place.

(Another reason to lean towards always returning `nil`: it's testable. In fact, I was thinking it might be nice to have something in XCTest where you could hand it a Sequence, or perhaps a closure that creates one, and an array containing the values you expect it to have, and it would assert the basic properties we expect from a Sequence, like that the elements come out in the right order, it returns nil at the end, `underestimateCount()` is not incorrect, etc. The `nil` semantic, unlike the other possible ones, is something you could include in that test.)

Defining the post-nil behavior as implementation-defined means that, by definition, you don't need to test how arbitrary generators behave there.

True, but then you're imposing a constraint on the clients of generators: "Don't call next() again after it turns nil". I think it will be far easier to write generic tests of any generator than it will be to write generic tests of any function that *uses* a generic generator.

(Although I suppose you could assist in that testing by writing a wrapper around any sequence/collection/generator which enforces all rules as strictly as possible.)

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Lily Ballard) #20

> In fact, I don't think the post-nil behavior of generators is even the biggest potential gotcha. A much bigger issue is how generators behave when you make copies of them. If you ever copy a generator you're supposed to never touch the original again, but it's easy to violate that accidentally, and most generators actually behave sanely when copied anyway.

That's true, but it's just not addressable until we have some kind of borrowing system in place.

>> (Another reason to lean towards always returning `nil`: it's testable. In fact, I was thinking it might be nice to have something in XCTest where you could hand it a Sequence, or perhaps a closure that creates one, and an array containing the values you expect it to have, and it would assert the basic properties we expect from a Sequence, like that the elements come out in the right order, it returns nil at the end, `underestimateCount()` is not incorrect, etc. The `nil` semantic, unlike the other possible ones, is something you could include in that test.)
>
> Defining the post-nil behavior as implementation-defined means that, by definition, you don't need to test how arbitrary generators behave there.

True, but then you're imposing a constraint on the clients of generators: "Don't call next() again after it turns nil".

Sure, but as I've already stated, I believe very few people ever write code that depends on post-nil behavior anyway. In fact, it wouldn't surprise me to learn that far more people have to deal with this case while writing their own GeneratorTypes than there are people who hit this merely by using pre-existing GeneratorTypes.

I think it will be far easier to write generic tests of any generator than it will be to write generic tests of any function that *uses* a generic generator.

(Although I suppose you could assist in that testing by writing a wrapper around any sequence/collection/generator which enforces all rules as strictly as possible.)

You could always just test the behavior of the FuseGenerator wrapping it, if you really want to be able to rely on always-returning-nil behavior. But I'm not sure what tests you'd write that even hits this case, except for a test that's designed specifically to test the post-nil behavior (which we've already established isn't necessary to check for arbitrary generators if GeneratorType leaves that as implementation-defined).

-Kevin Ballard

···

On Sat, Mar 5, 2016, at 05:40 PM, Brent Royal-Gordon wrote: