Why is Array.flatMap so slow? (why yes, again!)

I’m doing a basic planarize operation ([[a1, b1, c1], [a2, b2, c2], [a3, b3, c3]][a1, a2, a3, b1, b2, b3, c1, c2, c3]) and am running into an 80% slowdown when using a function expression as opposed to a looped-based one.

import func Glibc.clock

struct S:RandomAccessCollection
{
    let a:Int, 
        b:Int, 
        c:Int, 
        d:Int 
    
    var startIndex:Int 
    {
        return 0
    }
    
    var endIndex:Int 
    {
        return 4
    }

    subscript(index:Int) -> Int 
    {
        switch index 
        {
            case 0:
                return self.a
            case 1:
                return self.b 
            case 2:
                return self.c 
            case 3:
                return self.d
            default:
                fatalError("(_RGBAColor) index \(index) out of range")
        }
    }
}

let structured:[S] = (0 ..< 1 << 16).map 
{
    _ in
    .init(  a: Int.random(in: .min ... .max), 
            b: Int.random(in: .min ... .max), 
            c: Int.random(in: .min ... .max), 
            d: Int.random(in: .min ... .max))
}

func planarFunctional(_ structured:[S]) -> [Int]
{
    return (0 ..< 4).map 
    {
        (ci:Int) in
        
        structured.map{ $0[ci] }
    }.flatMap{ $0 }
}

func planarLoop(_ structured:[S]) -> [Int]
{
    var planar:[Int] = []
        planar.reserveCapacity(structured.count << 2)
    for ci:Int in 0 ..< 4
    {
        for element:S in structured 
        {
            planar.append(element[ci])
        }
    }
    
    return planar
}

func planarHybrid(_ structured:[S]) -> [Int]
{
    var planar:[Int] = []
        planar.reserveCapacity(structured.count << 2)
    for ci:Int in 0 ..< 4
    {
        planar.append(contentsOf: structured.map{ $0[ci] })
    }
    
    return planar
}


let t1:Int = clock()
let planar1:[Int] = planarLoop(structured)
print(clock() - t1)
print(planar1.last as Any)

let t2:Int = clock()
let planar2:[Int] = planarFunctional(structured)
print(clock() - t2)
print(planar2.last as Any)

let t3:Int = clock()
let planar3:[Int] = planarHybrid(structured)
print(clock() - t3)
print(planar3.last as Any)
$ swiftc -O planar.swift
$ ./planar 
1539 // loop
Optional(5101610777180498905)
2698 // functional (map + flatmap)
Optional(5101610777180498905)
1713 // hybrid (loop + flatmap)
Optional(5101610777180498905)

Last time, it was a missing RandomAccessCollection conformance that was slowing down the flatmap. But all the involved types here are RandomAccessCollections, including the 0 ..< 4 range. What is going on?

Without digging too deeply here—

The pattern s.map { ... }.flatMap { $0 } is likely the source of the difference. Running map first produces an intermediate array for every element before flattening them all.
flatMap exists to enable both the transformation and the flattening, so this call can be simplified to
s.flatMap { ... }, which should get you noticeable improvement.

Alternatively, using lazy (i.e. s.lazy.map { ... }.flatMap { $0 }) should be as performant, as the intermediate arrays from the call to map will never be stored.

replacing it with

func planarFunctional(_ structured:[S]) -> [Int]
{
    return (0 ..< 4).flatMap 
    {
        (ci:Int) in
        
        structured.map{ $0[ci] }
    }
}

still has a 70% slowdown compared with the loop-based version :(

$ ./planar 
1548
Optional(7070142529585507702)
2651
Optional(7070142529585507702)
1672
Optional(7070142529585507702)

What result do you get with this slight modification:

// ...

func test() {
    let structured:[S] = (0 ..< 1 << 16).map // NOTE: Moved this in here from global constant.
    {
        _ in
        .init(  a: Int.random(in: .min ... .max),
                b: Int.random(in: .min ... .max),
                c: Int.random(in: .min ... .max),
                d: Int.random(in: .min ... .max))
    }

    for _ in 0 ..< 5 {
        print("----------------")
        let t1:Int = clock()
        let planar1:[Int] = planarLoop(structured)
        print(clock() - t1)
        print(planar1.last as Any)
        
        let t2:Int = clock()
        let planar2:[Int] = planarFunctional(structured)
        print(clock() - t2)
        print(planar2.last as Any)
        
        let t3:Int = clock()
        let planar3:[Int] = planarHybrid(structured)
        print(clock() - t3)
        print(planar3.last as Any)
    }
}
test()

?

Here's a typical result when running that on my machine:

----------------
2758
Optional(-416902280685619520)
4751
Optional(-416902280685619520)
2430
Optional(-416902280685619520)
----------------
1307
Optional(-416902280685619520)
1652
Optional(-416902280685619520)
1357
Optional(-416902280685619520)
----------------
1270
Optional(-416902280685619520)
1631
Optional(-416902280685619520)
1576
Optional(-416902280685619520)
----------------
1301
Optional(-416902280685619520)
1628
Optional(-416902280685619520)
1342
Optional(-416902280685619520)
----------------
1291
Optional(-416902280685619520)
1588
Optional(-416902280685619520)
1405
Optional(-416902280685619520)

Note how the first run differs a lot from the subsequent ones, and that they fluctuate quite a bit between each run.

You are still allocating an additional array in the flatMap version that you aren't with the loop, here:

This really doesn't seem like a problem well suited to flatMap. It doesn't fit the usual pattern of "I have a function that takes an element and returns an array, and I want to apply it to all the elements of an array and get back a flattened array". Yes, you can transform your problem into that, but why is that good?

The loop is clearer as well as being more efficient (including the ability to use external knowledge to reserve space up-front).

3 Likes