Standard library behavior change for LazyMapCollection prefix to act as a Sequence

We just had a discussion on a Slack about some unexpected behavior with lazy collections, given this small snippet of code:

let foo = (1...30).lazy.compactMap { test($0) }.prefix(4)

The intention here is that test is an expensive operation that may sometimes return nil, and we want to collect the first 4 non-nil results while minimizing the number of tests run. That isn't what happens, though. Let's make this bigger and more complete to see the steps:

func test(_ i: Int) -> Int? {
  print("running \(i)")
  return i % 2 == 0 ? i : nil
}
print("Step 1")
let foo = (1...30).lazy
print("Step 2")
let foo2 = foo.compactMap(test)
print("Step 3")
let foo3 = foo2.prefix(4)
print("Done")
foo3.forEach { print("output \($0)") }

Which produces this confusing output:

Step 1
Step 2
Step 3
running 1
running 2
running 3
running 4
running 5
running 6
running 7
running 8
running 9
running 10
running 1
running 2
Done
running 2
output 2
running 3
running 4
output 4
running 5
running 6
output 6
running 7
running 8
output 8
running 9

Notice how the call to prefix seems to do a lot of extra work. Ideally we'd only see "running" lines once for 1 through 8 and that's all. Why does this output look the way it does?

Well, the standard library defines a LazyMapCollection, which is a LazyMapSequence whose base sequence conforms to Collection, and then makes LazyMapCollection also conform to Collection. Therefore, because what we start with is a Collection (the ClosedRange (1...30)), foo through foo3 are all LazyMapCollections.

And because of that, Swift chooses the Collection version of prefix, which makes a slice of the original collection, rather than the Sequence version of prefix (what we would have expected), which would just have an iterator that counts up to the max length. The extra "running" lines are due to making a slice subsequence.

I can't think when that would be the desirable behavior for code like this.

So I'd like to pitch an addition to the standard library:

extension LazyMapCollection {
    func prefix(_ maxLength: Int) -> PrefixSequence<Self> {
        return PrefixSequence(self, maxLength: maxLength)
    }
}

This would make LazyMapCollections use the Sequence behavior for prefix calls, rather than the Collection behavior.

Thanks to Tal Atlas for the original problem statement, and Olivier Halligon for the expanded code that makes the problem clear.

3 Likes

Note that the lazy compactMap returns a LazyMapCollection<LazyFilterCollection<LazyMapCollection>>, since the range it's called on is also a Collection. Calling prefix(_ maxLength:) on this indeed picks Collection’s overload which crucially performs index calculations in order to return a SubSequence.

LazyFilterCollection seems to be causing the issue here, and indeed you will see similar behavior if you replace the lazy compactMap in the original code by a lazy filter. The problem is that LazyFilterCollection needs to call its predicate and evaluate its elements in order to do index calculations. So even though Collection’s prefix(_ maxLength:) looks lazy, it isn’t lazy for LazyFilterCollection (and consequently it also isn’t lazy for something like LayMapCollection<LazyFilterCollection>).

Therefore, to keep code like

collection.lazy.filter { ... }.prefix(n)

and

collection.lazy.compactMap { ... }.prefix(n)

truly lazy, it should call an overload of prefix(_ maxLength:) that doesn’t do any index calculations.

A reasonable fix seems to be to give LazyCollectionProtocol an overload of prefix(_ maxLength:) (and perhaps some other collection methods) that returns a lazy sequence, rather than a lazy collection. This would avoid any index calculations and premature closure calls.

The (slight) downside of this approach is that every lazy prefix(_ maxLength:) will then return a sequence even if it could have returned a collection, particularly when there's no filtering involved, like in the case of

(0..<10).map { $0 * 2 }.prefix(5)

LazyMapCollection itself doesn’t call its closure or evaluate its elements when doing index calculations, so such a change would technically be a regression for code like this.

(To get the best of both worlds, we would possibly need an extra lazy collection protocol for the purpose of opting into "index calculations are done lazily"? But that might be more trouble than it’s worth)

This seems like a good improvement to me. It would also need to make PrefixSequence conditionally a collection when the base is a collection, to preserve the current behavior (which requires availability for protocols, which we don't yet have but hope to get soon-ish). The source compatibility consequences would also need thinking through. The old slice-returning prefix would still exist since it comes from Collection, so you could always force it through type context, but you don't always have that.

As pointed out by @timv, I'm becoming more and more convinced that the problematic step here is the automatic opt-in to LazyFilterCollection, which is a Collection that doesn't meet the normal performance guarantees because advancing an index is arbitrarily expensive.

I would argue that performing a filter on a lazy Collection ought to automatically result in a Sequence and not a Collection. This matches the type to the necessary underlying behavior, which needs to iterate over the base in order to do any collection-y thing. In which case, the original code would work fine (since the implementation of lazy compactMap involves a lazy filter).

This is potentially a big breaking change, obviously, because lazy filter would be so much more common than lazy prefix. So I kinda doubt it could happen. Nevertheless, it seems right to me.

4 Likes