You can run my example from this post. I've verified that the performance hasn't improved under Xcode 13.3. Simply batching the tasks so that at most 10 run simultaneously improves performance by 14,576%. According to the Time Profiler, most of the time in the unbatched version is spent in swift_taskGroup_attachChild, which I'm guessing is due to contention on the underlying task lock.
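
For anyone who wants to try the batching approach, here is a minimal sketch of one way to cap a task group at 10 in-flight child tasks. It is not the code from the linked example: `runBatched`, its parameters, and the sliding-window strategy (add one new child each time one finishes) are assumptions made for illustration, and it makes no claim about reproducing the numbers above.

```swift
// Hypothetical helper (not from the original example): runs `work` for each
// index in 0..<count while keeping at most `maxConcurrent` child tasks in the
// group at any one time. Results are returned in index order.
func runBatched<T: Sendable>(
    count: Int,
    maxConcurrent: Int = 10,
    work: @escaping @Sendable (Int) async -> T
) async -> [T] {
    await withTaskGroup(of: (Int, T).self, returning: [T].self) { group in
        var results = [T?](repeating: nil, count: count)
        let limit = max(1, maxConcurrent)
        var next = 0

        // Seed the group with the first batch of child tasks.
        while next < min(limit, count) {
            let index = next
            group.addTask { (index, await work(index)) }
            next += 1
        }

        // Each time a child finishes, store its result and add one more child,
        // so the group never holds more than `limit` pending tasks.
        while let (finished, value) = await group.next() {
            results[finished] = value
            if next < count {
                let index = next
                group.addTask { (index, await work(index)) }
                next += 1
            }
        }

        // Every slot has been filled by the time the group drains.
        return results.compactMap { $0 }
    }
}
```

Calling it looks like `let squares = await runBatched(count: 1_000) { $0 * $0 }`. The point of the design is that far fewer child tasks are attached to the group at once, which is the path the profiler flagged; whether that is actually what relieves the contention is still my guess.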