Performance of TaskGroup

Hello,

I've noticed that TaskGroup has poor performance as the number of subtasks grow.

I've created sample SwiftUI app ( see below ) to demonstrate the issue.

If you run the code ( with release mode & optimisations enabled ) and increase the number of operations, you will notice that all functions that rely on TaskGroup are getting slower and then completely stuck, while simple loop with async calls or DispatchGroup approach runs under a second.

There were other threads, where it was mentioned that using TaskGroup should be better since it might know number of tasks and schedule better.

Is this expected behaviour?

Sample App
import SwiftUI

struct ContentView: View {
    
    @State var number: String = "100"
    
    var body: some View {
        VStack {
            
            TextField("Number of operations", text: $number)
            
            Button("Actors TaskGroup", action: {
                Task.detached(priority: .high) {
                    await runTaskGroup(name: "Actors TaskGroup", counters: Factory.actors(numberOfIterations()))
                }
            })
            
            Button("Actors Simple Loop", action: {
                Task.detached(priority: .high) {
                    await runSimpleLoop(name: "Actors Simple Loop", counters: Factory.actors(numberOfIterations()))
                }
            })
            
            Button("Queues TaskGroup", action: {
                Task.detached(priority: .high) {
                    await runTaskGroup(name: "Queues TaskGroup", counters: Factory.queues(numberOfIterations()))
                }
            })
            
            Button("Queues Simple Loop", action: {
                Task.detached(priority: .high) {
                    await runSimpleLoop(name: "Queues Simple Loop", counters: Factory.queues(numberOfIterations()))
                }
            })
            
            Button("Queues Dispatch-Group", action: {
                
                let queues = Factory.queues(numberOfIterations())
                let dispatcher = DispatchQueue(label: "Queues Dispatch-Group", qos: .userInitiated)
                
                let start = CFAbsoluteTimeGetCurrent()
                
                dispatcher.async {
                    
                    let incrementGroup = DispatchGroup()
                    for queue in queues{
                        incrementGroup.enter()
                        queue.increase(completion: { incrementGroup.leave() })
                    }
                    
                    incrementGroup.notify(queue: dispatcher) {
                        
                        let readGroup = DispatchGroup()
                        var result: [Int] = []
                        
                        for queue in queues {
                            readGroup.enter()
                            queue.read(completion: { value in
                                dispatcher.async {
                                    result.append(value)
                                    readGroup.leave()
                                }
                            })
                        }
                        
                        readGroup.notify(queue: dispatcher) {
                            let end = CFAbsoluteTimeGetCurrent()
                            precondition(result.allSatisfy({ $0 == 1 }))
                            print("Queues Dispatch-Group iterations: \(queues.count), took: \(end - start)")
                        }
                    }
                }
            })
        }
    }
    
    private func numberOfIterations() -> Int {
        return Int(number) ?? 1
    }
}

protocol Counter {
    func increase() async
    func read() async -> Int
}

actor CounterActor: Sendable, Counter {
    private var value: Int = 0
    func increase() async { value += 1 }
    func read() async -> Int { value }
}

final class CounterQueue: Counter, @unchecked Sendable {
    
    private var value: Int = 0
    private let queue = DispatchQueue(label: "CQ", qos: .userInteractive)
    
    func increase(completion: @escaping () -> Void) {
        queue.async {
            self.value += 1
            completion()
        }
    }
    
    func increase() async {
        await withCheckedContinuation({ continuation in
            self.increase(completion: { continuation.resume() })
        })
    }
    
    func read(completion: @escaping (Int) -> Void) {
        queue.async { completion(self.value) }
    }
    
    func read() async -> Int {
        await withCheckedContinuation({ continuation in
            self.read(completion: { continuation.resume(returning: $0) })
        })
    }
}

struct Factory {
    
    static func actors(_ numberOfEntries: Int) -> [CounterActor] {
        var result: [CounterActor] = []
        result.reserveCapacity(numberOfEntries)
        for _ in 1...numberOfEntries {
            result.append(CounterActor())
        }
        return result
    }
    
    static func  queues(_ numberOfEntries: Int) -> [CounterQueue] {
        var result: [CounterQueue] = []
        result.reserveCapacity(numberOfEntries)
        for _ in 1...numberOfEntries {
            result.append(CounterQueue())
        }
        return result
    }
}

func runTaskGroup<C: Counter>(name: String, counters: [C]) async {
    
    let start = CFAbsoluteTimeGetCurrent()
    
    await withTaskGroup(of: Void.self, body: { group in
        for counter in counters { group.addTask { await counter.increase() } }
        for await _ in group { }
    })
    
    await withTaskGroup(of: Int.self, returning: Void.self, body: { group in
        
        for counter in counters { group.addTask { await counter.read() } }
        
        var result: [Int] = []
        for await value in group {
            result.append(value)
        }
        
        let end = CFAbsoluteTimeGetCurrent()
        precondition(result.allSatisfy({ $0 == 1 }))
        print("\(name) Iterations: \(counters.count), took: \(end - start)")
    })
}

func runSimpleLoop<C: Counter>(name: String, counters: [C]) async {
    
    let start = CFAbsoluteTimeGetCurrent()
    
    for counter in counters {
        await counter.increase()
    }
    
    var result: [Int] = []
    for counter in counters {
        let value = await counter.read()
        result.append(value)
    }
    
    let end = CFAbsoluteTimeGetCurrent()
    precondition(result.allSatisfy({ $0 == 1 }))
    print("\(name) Iterations: \(counters.count), took: \(end - start)")
}

I doubt it’s the planned behavior but it seems clear you need to manage your own batching. You’ll want to measure your performance to find the optimal batch size for your workload.

I can't speak to whether this is a known issue, but please always feel free to send us bug reports about performance issues, either in feedback or in the public Swift bug tracker.

1 Like

Isolated reproducers would help; There is a bunch of performance work with it I never got to with groups.

Please file a reproducer (without UI stuff) on bugs.swift.org and we'll have a look, thanks.

I don't see the problem. Here is my results on Mac mini M1 with Xcode 13.3 beta:

Actors TaskGroup Iterations: 10000, took: 0.5801550149917603
Actors Simple Loop Iterations: 10000, took: 0.010395050048828125
Queues TaskGroup Iterations: 10000, took: 0.7475920915603638
Queues Simple Loop Iterations: 10000, took: 0.6819630861282349
Queues Dispatch-Group iterations: 10000, took: 1.522475004196167

I made a change to make it a fair comparison: I removed priority / qos specifications as they are not directly comparable. A fair comparison would be between the default configuration of each case.

This type of microbenchmark is unrealistic and the results may not reflect the real world performance. For example, Actors Simple Loop Iterations case gets fully optimized and no actual await is happening in the loop.

My version with some cosmetic changes
import SwiftUI

struct ContentView: View {
    
    @State var number: String = "100"
    
    var body: some View {
        VStack {
            
            TextField("Number of operations", text: $number)
            
            Button("Actors TaskGroup") { Task.detached {
                    await runTaskGroup(name: "Actors TaskGroup", counters: Factory.actors(numberOfIterations()))
            }}
            
            Button("Actors Simple Loop") { Task.detached {
                    await runSimpleLoop(name: "Actors Simple Loop", counters: Factory.actors(numberOfIterations()))
            }}
            
            Button("Queues TaskGroup") { Task.detached {
                    await runTaskGroup(name: "Queues TaskGroup", counters: Factory.queues(numberOfIterations()))
            }}
            
            Button("Queues Simple Loop") { Task.detached {
                    await runSimpleLoop(name: "Queues Simple Loop", counters: Factory.queues(numberOfIterations()))
            }}
            
            Button("Queues Dispatch-Group") {
                
                let queues = Factory.queues(numberOfIterations())
                let dispatcher = DispatchQueue(label: "Queues Dispatch-Group")
                
                let start = CFAbsoluteTimeGetCurrent()
                
                dispatcher.async {
                    
                    let incrementGroup = DispatchGroup()
                    for queue in queues{
                        incrementGroup.enter()
                        queue.increase { incrementGroup.leave() }
                    }
                    
                    incrementGroup.notify(queue: dispatcher) {
                        
                        let readGroup = DispatchGroup()
                        var result: [Int] = []
                        
                        for queue in queues {
                            readGroup.enter()
                            queue.read { value in dispatcher.async {
                                    result.append(value)
                                    readGroup.leave()
                            }}
                        }
                        
                        readGroup.notify(queue: dispatcher) {
                            let end = CFAbsoluteTimeGetCurrent()
                            precondition(result.allSatisfy({ $0 == 1 }))
                            print("Queues Dispatch-Group iterations: \(queues.count), took: \(end - start)")
                        }
                    }
                }
            }
        }
    }
    
    private func numberOfIterations() -> Int {
        return Int(number) ?? 1
    }
}

protocol Counter: Sendable {
    func increase() async
    func read() async -> Int
}

actor CounterActor: Sendable, Counter {
    private var value: Int = 0
    func increase() async { value += 1 }
    func read() async -> Int { value }
}

final class CounterQueue: Counter, @unchecked Sendable {
    
    private var value: Int = 0
    private let queue = DispatchQueue(label: "CQ")
    
    func increase(completion: @escaping () -> Void) {
        queue.async {
            self.value += 1
            completion()
        }
    }
    
    func increase() async { await withCheckedContinuation { continuation in
            self.increase { continuation.resume() }
    }}
    
    func read(completion: @escaping (Int) -> Void) {
        queue.async { completion(self.value) }
    }
    
    func read() async -> Int { await withCheckedContinuation { continuation in
            self.read { continuation.resume(returning: $0) }
    }}
}

struct Factory {
    
    static func actors(_ numberOfEntries: Int) -> [CounterActor] {
        var result: [CounterActor] = []
        result.reserveCapacity(numberOfEntries)
        for _ in 1...numberOfEntries {
            result.append(CounterActor())
        }
        return result
    }
    
    static func  queues(_ numberOfEntries: Int) -> [CounterQueue] {
        var result: [CounterQueue] = []
        result.reserveCapacity(numberOfEntries)
        for _ in 1...numberOfEntries {
            result.append(CounterQueue())
        }
        return result
    }
}

func runTaskGroup<C: Counter>(name: String, counters: [C]) async {
    
    let start = CFAbsoluteTimeGetCurrent()
    
    await withTaskGroup(of: Void.self) { group in
        for counter in counters {
            group.addTask { await counter.increase() }
        }
    }
    
    await withTaskGroup(of: Int.self, returning: Void.self) { group in
        
        for counter in counters {
            group.addTask { await counter.read() }
        }
        
        var result: [Int] = []
        for await value in group {
            result.append(value)
        }
        
        let end = CFAbsoluteTimeGetCurrent()
        precondition( result.allSatisfy { $0 == 1 } )
        print("\(name) Iterations: \(counters.count), took: \(end - start)")
    }
}

func runSimpleLoop<C: Counter>(name: String, counters: [C]) async {
    
    let start = CFAbsoluteTimeGetCurrent()
    
    for counter in counters {
        await counter.increase()
    }
    
    var result: [Int] = []
    for counter in counters {
        let value = await counter.read()
        result.append(value)
    }
    
    let end = CFAbsoluteTimeGetCurrent()
    precondition(result.allSatisfy { $0 == 1 } )
    print("\(name) Iterations: \(counters.count), took: \(end - start)")
}

You can run my example from this post. I've verified the performance hasn't improved under Xcode 13.3. Simply batching the tasks down to 10 simultaneous executors improves performance by 14,576%. According to the Time Profiler, most of that time is spent in swift_taskGroup_attachChild, which I'm guessing is due to contention on the underlying task lock.

Testing more queue sizes in my original example reveals a more than linear growth in time.

10_000  0.33886992931365967
20_000  1.435670018196106
30_000  3.091923952102661
40_000  5.573296070098877
50_000  9.05544900894165
75_000  19.68508505821228
100_000 35.64383399486542

From the look of it, taking the task lock also blocks the execution of the children, so the actual work finishes fairly quickly once they're all enqueued. This also leads to constant memory growth as the children are all active in memory as the rest are being enqueued, so even slightly larger children can really balloon memory.

Changes in Xcode won't affect this; this is OS code now.

Of course, that's different on non-Apple platforms.

I've filed SR-15785 in regard to the performance issue. Hilariously, the M1 Max actually performs worse than the i9 here: the 100,000 children case takes 89s. Not sure what could cause that difference.

1 Like

Batching allows 100_000 child tasks to complete in 0.125s.

@main
struct ConcurrencyTest {
    static func main() async {
        let start = CFAbsoluteTimeGetCurrent()
        await withTaskGroup(of: Void.self) { group in
            @Sendable
            func doNothing() {}

            var tasksToPerform = 99_990
            for _ in 0..<10 {
                group.addTask(operation: doNothing)
            }

            for await _ in group {
                if tasksToPerform > 0 {
                    group.addTask(operation: doNothing)
                    tasksToPerform -= 1
                }
            }
        }
        print(CFAbsoluteTimeGetCurrent() - start)
    }
}
3 Likes

(edited to hide completely incorrect diagnosis)

Ignore my original diagnosis here

My tools install is a bit wonky right now, but I did some hand-sampling of @Jon_Shier's test case in lldb, and it looks like we're spinning on the task status record lock. Specifically, this loop in swift_taskGroup_attachChild:

0x7ffb1f6fd990 <+80>:  movq   %rax, %rbx
0x7ffb1f6fd993 <+83>:  movq   0xc8(%rax), %rax
0x7ffb1f6fd99a <+90>:  testq  %rax, %rax
0x7ffb1f6fd99d <+93>:  jne    0x7ffb19afa990            ; <+80>

Will update further when it's not a weekend, I just realized I shouldn't be working :innocent:

10 Likes

Turns out I misread what that snippet was doing! Anyway, with some pointers from @rokhinip and @ktoso, I've put up a fix here: Fix SR-15785 by adding a cached reference to the last linked list node, avoiding n^2 append scaling by Catfish-Man ¡ Pull Request #41165 ¡ apple/swift ¡ GitHub

A quick test of @Jon_Shier's benchmark shows a 50x+ speedup despite me testing on a debug build :slight_smile:

Looking into adding a regression test now.

22 Likes

I was comparing efficiency of using Task groups vs pure tasks, reusing @Jon_Shier code and found out there's quite a gap in performance. Is this just the cost of doing structured concurrency ? Good news is that the gap stays linear when increasing repetitions, Tasks are always more or less 4x faster than TaskGroups when doing nothing in tasks.

With 100_000 reps :
around 0.12 with task groups
around 0.03 with pure tasks

@main
struct ConcurrencyTest {
    static func main() async {
        await taskGroupTest()
        await taskTest()
    }
    
    static func taskGroupTest() async {
        let start = CFAbsoluteTimeGetCurrent()
        await withTaskGroup(of: Void.self) { group in
            @Sendable
            func doNothing() {}

            for _ in 0..<100_000 {
                group.addTask(operation: doNothing)
            }

            for await _ in group {
                
            }
        }
        print(CFAbsoluteTimeGetCurrent() - start)
    }
    
    static func taskTest() async {
        @Sendable
        func doNothing() {}
        
        let start = CFAbsoluteTimeGetCurrent()
        let tasks = Array(0..<100_000).map { _ in Task { doNothing() } }
        for task in tasks {
            await task.value
        }
        print(CFAbsoluteTimeGetCurrent() - start)
    }
}

I'm not sure that's surprising - the TaskGroup is doing more; it has to coordinate the subtasks and track their conclusions, collate their results, etc.

1 Like

Worth filing an issue on GitHub with the contents of this post. You’re observing 1000s of tasks contending on the groups lock, so yeah it’s expected to be slower. Without the group there’s no contention point.

Do note however that these pieces of code are not equivalent. The group will iterate results in “completion order”. And the unstructured code will iterate and wait in the order of the tasks in the collection, potentially waiting for the first task to complete a long time while all others may have completed already. This may or may not matter to you.

There could be some optimization we could do with groups in general but it has not been a priority so far. Issues can help in maybe dedicating some time to optimizing the locking strategy employed by the group.

No promises, but better than leaving these observations on the forums where they’ll be forgotten. Thanks in advance!

3 Likes