2 x lazy collection bugs


(Howard Lovatt) #1

Hi,

I think I have found a couple of bugs in lazy collections; just checking
before filling. The following code is a filter map version of the Sieve of
Eratosthenes:

func lazyFilterMapForEachLoop(limit: Int = 9) -> [Int] {

    var possibles = Array(count: limit + 1, repeatedValue: true)

    return (2 ... limit).lazy.filter { i in // Has to be lazy and
sequential so that `possibles` is updated before it is used

        print("i BF: \(i)")

        return possibles[i]

    }.map { i in

        print("i AF: \(i)")

        (i * i).stride(through: limit, by: i).forEach { j in

            possibles[j] = false

        }

        return i

    }

}

It produces for a limit of 9:

*i BF: 2*

*i BF: 3*

*i BF: 4*

*i BF: 5*

*i BF: 6*

*i BF: 7*

*i BF: 8*

*i BF: 9*

*i BF: 2*

*i AF: 2*

*i BF: 3*

*i AF: 3*

*i BF: 4*

*i BF: 5*

*i AF: 5*

*i BF: 6*

*i BF: 7*

*i AF: 7*

*i BF: 8*

*i BF: 9*

*fatal error: Index out of range*

There are a couple of odd things about this:

   1. The filter closure is called twice per trial integer! First without
   preceding to the map stage at all, i.e. all values are filtered out! Second
   time through the numbers it does proceed to the map stage as you would
   expect.
   2. It produces an "Index out of range" error despite the fact that
   maximum number processed is 9 which is an allowable index of `possibles`.

Is this a couple of bugs or have I misunderstood something?

Thanks in advance,

  -- Howard.


(Dmitri Gribenko) #2

Hi,

I think I have found a couple of bugs in lazy collections; just checking
before filling. The following code is a filter map version of the Sieve of
Eratosthenes:

func lazyFilterMapForEachLoop(limit: Int = 9) -> [Int] {

    var possibles = Array(count: limit + 1, repeatedValue: true)

    return (2 ... limit).lazy.filter { i in // Has to be lazy and sequential
so that `possibles` is updated before it is used

Before looking at your questions below, I want to note that this is
not a supported way of using the lazy subsystem. The issue is that
you seem to be relying on the evaluation order. It is not guaranteed,
even if you can reliably reproduce a certain evaluation order in the
current version of the library.

The filter closure is called twice per trial integer! First without
preceding to the map stage at all, i.e. all values are filtered out! Second
time through the numbers it does proceed to the map stage as you would
expect.

Again, remember that you are using the *lazy* subsystem. The
evaluation order of the closures you are passing to the operations is
not guaranteed. Closures can be evaluated once, more than once, or
not at all, and in the order that the library choses. Closures are
required to be pure, so you are not allowed to observe how many times
and in which order they were actually evaluated.

The reason why the closure is evaluated twice is because you end up
calling an eager map(), which tries to reserve the array capacity
suitable for the result. To do that, map() calls 'count' on the lazy
filter collection, which triggers the computation the first time.

It is an open question though, whether calling the closure twice is
profitable here, and what can we do about it. The idea though is that
those reallocations can sometimes be quite expensive when the closure
is cheap to execute. OTOH, when the closure is expensive to evaluate,
calling it twice is a loss.

It produces an "Index out of range" error despite the fact that maximum
number processed is 9 which is an allowable index of `possibles`.

Are you sure you didn't mean to use ..< instead of ...?

Dmitri

···

On Mon, Feb 8, 2016 at 8:56 PM, Howard Lovatt via swift-users <swift-users@swift.org> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Howard Lovatt) #3

Thanks for the reply Dmitri - comments inline below:

  -- Howard.

> Hi,
>
> I think I have found a couple of bugs in lazy collections; just checking
> before filling. The following code is a filter map version of the Sieve
of
> Eratosthenes:
>
> func lazyFilterMapForEachLoop(limit: Int = 9) -> [Int] {
>
> var possibles = Array(count: limit + 1, repeatedValue: true)
>
> return (2 ... limit).lazy.filter { i in // Has to be lazy and
sequential
> so that `possibles` is updated before it is used

Before looking at your questions below, I want to note that this is
not a supported way of using the lazy subsystem. The issue is that
you seem to be relying on the evaluation order. It is not guaranteed,
even if you can reliably reproduce a certain evaluation order in the
current version of the library.

Not sure where it says that it won't be sequential - do you have a
reference?

> The filter closure is called twice per trial integer! First without
> preceding to the map stage at all, i.e. all values are filtered out!
Second
> time through the numbers it does proceed to the map stage as you would
> expect.

Again, remember that you are using the *lazy* subsystem. The
evaluation order of the closures you are passing to the operations is
not guaranteed. Closures can be evaluated once, more than once, or
not at all, and in the order that the library choses. Closures are
required to be pure, so you are not allowed to observe how many times
and in which order they were actually evaluated.

Not sure where it says it can be called more than one - the filter might

be a very expensive operation like a database lookup!

The reason why the closure is evaluated twice is because you end up
calling an eager map(), which tries to reserve the array capacity
suitable for the result. To do that, map() calls 'count' on the lazy
filter collection, which triggers the computation the first time.

map on a lazy collection results in another lazy collection, so I don't

see this.

It is an open question though, whether calling the closure twice is
profitable here, and what can we do about it. The idea though is that
those reallocations can sometimes be quite expensive when the closure
is cheap to execute. OTOH, when the closure is expensive to evaluate,
calling it twice is a loss.

> It produces an "Index out of range" error despite the fact that maximum
> number processed is 9 which is an allowable index of `possibles`.

Are you sure you didn't mean to use ..< instead of ...?

No I meant 2 ... limit, the array is of size limit + 1.

PS The equivalent code in Java works as expected.

···

On 9 February 2016 at 16:17, Dmitri Gribenko <gribozavr@gmail.com> wrote:

On Mon, Feb 8, 2016 at 8:56 PM, Howard Lovatt via swift-users > <swift-users@swift.org> wrote:

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Jens Alfke) #4

Not sure where it says that it won't be sequential - do you have a reference?

Not sure where it says it can be called more than one - the filter might be a very expensive operation like a database lookup!

I think it’s more that the documentation doesn’t specify sequential ordering, and that it doesn’t say the closure is only called once … so you can’t assume that those conditions are true. You’re just inferring that they are, based on your intuition about the implementation.

—Jens

···

On Feb 8, 2016, at 9:28 PM, Howard Lovatt via swift-users <swift-users@swift.org> wrote:


(Dmitri Gribenko) #5

Thanks for the reply Dmitri - comments inline below:

  -- Howard.

> Hi,
>
> I think I have found a couple of bugs in lazy collections; just checking
> before filling. The following code is a filter map version of the Sieve
> of
> Eratosthenes:
>
> func lazyFilterMapForEachLoop(limit: Int = 9) -> [Int] {
>
> var possibles = Array(count: limit + 1, repeatedValue: true)
>
> return (2 ... limit).lazy.filter { i in // Has to be lazy and
> sequential
> so that `possibles` is updated before it is used

Before looking at your questions below, I want to note that this is
not a supported way of using the lazy subsystem. The issue is that
you seem to be relying on the evaluation order. It is not guaranteed,
even if you can reliably reproduce a certain evaluation order in the
current version of the library.

Not sure where it says that it won't be sequential - do you have a
reference?

I'm sorry, it is hard to provide a reference to internal discussions
that we had when we were designing these things, so please take my
word for it :slight_smile:

If the documentation isn't saying anything about this, then it should
be improved to be more explicit.

The same is actually true even for the non-lazy map() and filter() --
you shouldn't be assuming the evaluation order for eager functions
either. It is just that in practice it is only efficient to build the
result array in exactly one pass, from start to the end, so the
implementation is unlikely to change. We also realize that many
people are likely to be accidentally relying on the evaluation order
there, so it would be hard to change it. But still, it is not a
guarantee.

> The filter closure is called twice per trial integer! First without
> preceding to the map stage at all, i.e. all values are filtered out!
> Second
> time through the numbers it does proceed to the map stage as you would
> expect.

Again, remember that you are using the *lazy* subsystem. The
evaluation order of the closures you are passing to the operations is
not guaranteed. Closures can be evaluated once, more than once, or
not at all, and in the order that the library choses. Closures are
required to be pure, so you are not allowed to observe how many times
and in which order they were actually evaluated.

Not sure where it says it can be called more than one - the filter might be
a very expensive operation like a database lookup!

The reason why the closure is evaluated twice is because you end up
calling an eager map(), which tries to reserve the array capacity
suitable for the result. To do that, map() calls 'count' on the lazy
filter collection, which triggers the computation the first time.

map on a lazy collection results in another lazy collection, so I don't see
this.

There are two overloads -- one from CollectionType protocol, that
returns an Array, and another one from lazy, that returns a lazy
collection. Both are visible, the second is preferred. Since you
used the map() function in a position that requires the Array return
type to match with the function's return type (via the return
statement), the lazy overload can't match, so the type checker
selected the first one.

It is an open question though, whether calling the closure twice is
profitable here, and what can we do about it. The idea though is that
those reallocations can sometimes be quite expensive when the closure
is cheap to execute. OTOH, when the closure is expensive to evaluate,
calling it twice is a loss.

> It produces an "Index out of range" error despite the fact that maximum
> number processed is 9 which is an allowable index of `possibles`.

Are you sure you didn't mean to use ..< instead of ...?

No I meant 2 ... limit, the array is of size limit + 1.

Oh, right, now I see the issue. The issue again caused by the
non-purity of the closure passed to filter(). The first time the
closure always says true, which promises a large filtered collection.
But then, during the second pass, it starts rejecting elements it
previously accepted, and delivers a collection smaller than promised.

PS The equivalent code in Java works as expected.

It is hard to say what the equivalent Java code would be, since Java
does not have a Swift lazy subsystem. Java has streams, but it would
be wrong to assume that the two operate according to the same exact
rules.

Dmitri

···

On Mon, Feb 8, 2016 at 9:28 PM, Howard Lovatt <howard.lovatt@gmail.com> wrote:

On 9 February 2016 at 16:17, Dmitri Gribenko <gribozavr@gmail.com> wrote:

On Mon, Feb 8, 2016 at 8:56 PM, Howard Lovatt via swift-users >> <swift-users@swift.org> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Jens Alfke) #6

It makes sense that the closures passed to functional operations like filter() and map() should be pure, but I don’t see that documented. It would probably be a good idea to call this out explicitly, for the benefit of those not used to functional programming.

(I think someone earlier in this thread said that “all closures should be pure”, but of course that’s not true in Swift in general. This exposes the tension between functional and non-functional uses of Swift.)

[I’m by no means an expert at functional programming, but I develop a database engine that uses map/reduce for indexing, and I’m very used to having to remind users that their map functions need to be pure :slight_smile: ]

—Jens

···

On Feb 8, 2016, at 11:54 PM, Dmitri Gribenko via swift-users <swift-users@swift.org> wrote:

Oh, right, now I see the issue. The issue again caused by the
non-purity of the closure passed to filter().


(Howard Lovatt) #7

But it does say it is in-order. It says for SequenceType:

"A type that can be iterated with a for...in loop"

which means it must be in-order (for loops are allowed side effects). Also
why would you call something SequenceType if it wasn't sequential? It also
categorically says that you can't iterate multiple times reliably:

"As a consequence, it is not possible to run multiple for loops on a
sequence to "resume" iteration".

So why am I seeing two passes?

And an index out of bounds?

  -- Howard.

···

On 9 February 2016 at 17:26, Jens Alfke <jens@mooseyard.com> wrote:

On Feb 8, 2016, at 9:28 PM, Howard Lovatt via swift-users < > swift-users@swift.org> wrote:

Not sure where it says that it won't be sequential - do you have a
reference?

Not sure where it says it can be called more than one - the filter might
be a very expensive operation like a database lookup!

I think it’s more that the documentation *doesn’t* specify sequential
ordering, and that it *doesn’t* say the closure is only called once … so
you can’t assume that those conditions are true. You’re just inferring that
they are, based on your intuition about the implementation.

—Jens


(Howard Lovatt) #8

I am using StrideThrough which is a SequenceType and therefore the quotes I
used in the previous email were from the SequenceType documentation. There
is no random access for a SequenceType, all you can do is call a generator.
Which is what `for` does and therefore the generator must return elements
in the sequence defined by the underlying struct or class, in this case
StrideThrough. The documentation for StrideThrough says "A SequenceType of
values formed by striding over a closed interval." Which to me implies in
order since you take one stride and then another of the same size; it isn't
called RandomWalk!

Anyway: you can see from the printout that it is sequential.

But it does two passes and generates an out of bounds!

  -- Howard.

···

On 9 February 2016 at 17:39, Jens Alfke <jens@mooseyard.com> wrote:

Did you mean to reply just to me?

On Feb 8, 2016, at 10:33 PM, Howard Lovatt <howard.lovatt@gmail.com> > wrote:

which means it must be in-order (for loops are allowed side effects). Also
why would you call something SequenceType if it wasn't sequential?

The fact that the collection *can* be accessed sequentially doesn’t
guarantee that it will be. If you called a binary search function on it,
the comparison function wouldn’t be called in order, for example.

It also categorically says that you can't iterate multiple times reliably:

I’ve lost track of what method you’re calling, to be honest. Maybe it’s
defined on Collection, not Sequence.

—Jens