Lazy sequences and type inference

I don't quite understand how lazy sequences interact with type inference. What's going on here?

func f1(_ array: [Int]) -> [Int] {
    array.lazy
        .map { print("map triple: \($0)"); return $0 * 3 }
        .filter { print("filter even: \($0)"); return $0 % 2 == 0 }
        .map { print("map double: \($0)"); return $0 * 2 }
}

print("f1 - WHY?")
let r1 = f1(Array(1...2))
print(r1)

This prints:

f1 - WHY?
map triple: 1
filter even: 3
map triple: 2
filter even: 6
map triple: 1
filter even: 3
map triple: 2
filter even: 6
map triple: 2
map double: 6
[12]

Clearly, some of the steps are executed multiple times. The issue appears to be the way in which the resulting lazy sequence is converted into [Int]. Manual conversion to Array fixes the problem:

func f2(_ array: [Int]) -> [Int] {
    Array(array.lazy
        .map { print("map triple: \($0)"); return $0 * 3 }
        .filter { print("filter even: \($0)"); return $0 % 2 == 0 }
        .map { print("map double: \($0)"); return $0 * 2 })
}

print("\nf2 - manual conversion to Array")
let r2 = f2(Array(1...2))
print(r2)

Now the output is as expected:

f2 - manual conversion to Array
map triple: 1
filter even: 3
map triple: 2
filter even: 6
map double: 6
[12]

Surprisingly, another way to fix this is to pass a Sequence to the function rather than an Array:

func f3<S: Sequence<Int>>(_ seq: S) -> [Int] {
    seq.lazy
        .map { print("map triple: \($0)"); return $0 * 3 }
        .filter { print("filter even: \($0)"); return $0 % 2 == 0 }
        .map { print("map double: \($0)"); return $0 * 2 }
}

print("\nf3 - Sequence as input")
let r3 = f3((1...2))
print(r3)

Again, the output is as expected:

f3 - Sequence as input
map triple: 1
filter even: 3
map triple: 2
filter even: 6
map double: 6
[12]

I understand that lazy sequences don't cache their results which is why some steps may get executed repeatedly. What I don't really understand is how type inference, type conversions and input types interact in these examples. This seems pretty hard to reason about and in fact makes me never want to touch lazy sequences with a bargepole.

2 Likes

Keep in mind that lazy sequences have two different methods called map -- one that returns a LazyMapSequence, and one that returns an Array. I would only expect that to explain the difference between f1 and f2 though (f1 calls the latter, while f2 calls the former and then converts the result); I don't understand why f3 is different from f1.

The duplicate execution is because that last map(…) is resolving to Collection.map, which calls count on its base collection (a LazyFilterSequence<LazyMapSequence<LazySequence<[Int]>.Elements, Int>>), which - because it's a lazy filter sequence - has to actually evaluate the entire sequence in order to determine the count. Collection.map only calls count to pre-size its result buffer - it then actually iterates over its base collection to populate the result. Thus the duplicate execution.

This seems like a bug, basically. There's a version of map from Sequence which should be selected in preference (when using lazy sequences) because it uses underestimatedCount which avoids the extra execution of your whole pipeline.

If you force the compiler to treat it as a generic Sequence then it selects the correct version of map and you don't get the duplicate execution:

func f1(_ array: [Int]) -> [Int] {
    let i1: some Sequence<Int> = array.lazy
        .map {
            print("map triple: \($0)")
            return $0 * 3 }
        .filter {
            print("filter even: \($0)")
            return $0 % 2 == 0
        }

    return i1
        .map {
            print("map double: \($0)")
            return $0 * 2
        }
}

It's also a bit of a mystery to me why LazyFilterSequence conforms to Collection. That seems like a bug. Lazy sequences should not be Collections, otherwise you end up with problems like this.

Addendum

The reason why your other workarounds work is:

  • When you explicitly convert to an array, it lets the result of that last map "float" and so the compiler is free to pick the preferred version, which is the one that returns another lazy sequence (another LazyMapSequence<…> in this case). You then explicitly initialise an Array with that sequence, using the Array(_ s: some Sequence) initialiser.

    This is technically slightly less efficient, since you're instantiating an additional lazy sequence object, beyond what you strictly need.

  • When you pass in a Sequence instead of a Collection, to your function, then the resulting lazy sequences don't conform to Collection (their conformance is conditional on their base being a Collection), so that bad version of map simply isn't available, leaving only the good version (from Sequence).

5 Likes

Great analysis! Thanks for that