To detect this problem, I might profile the app (⌘I) in Instruments. E.g., I might use the “Time Profiler” template and then manually add the “Swift Actors” and “Swift Tasks” tools, too. Then, when you run the app, you can use the “Alive Tasks” lane of the “Swift Tasks” tool to identify situations where tasks were started but never allowed to finish. The “Alive Tasks” lane should always drop back down to zero when the app reaches quiescence. If it does not, you have a deadlock somewhere.
You might also sprinkle the code with OSSignposter instrumentation (e.g., replacing print statements with signposter.emitEvent), and, for tasks that take a little time, add a signposter.beginInterval at the start and a signposter.endInterval at the end:
import os.log

let poi = OSSignposter(subsystem: "Subsystem", category: .pointsOfInterest)

final class BarrierTests {
    func test() async {
        let subsystem = Subsystem()

        await withTaskGroup(of: Void.self) { group in
            for index in 0 ..< 1000 {
                // By the way, to fix the problem in this example, uncomment the
                // following line; this will constrain the concurrency and solve
                // the problem here:
                //
                // if index > 4 { await group.next() }

                poi.emitEvent(#function, "adding task \(index)")
                group.addTask { subsystem.performWork(id: index) }
            }
            await group.waitForAll()
        }
    }
}

final class Subsystem {
    let queue = DispatchQueue(label: "my concurrent queue", attributes: .concurrent)

    func performWork(id: Int) {
        if id == 0 {
            write(id: id)
        } else {
            read(id: id)
        }
    }

    func write(id: Int) {
        let status = poi.beginInterval(#function, id: poi.makeSignpostID(), "\(id)")
        poi.emitEvent(#function, "schedule exclusive \(id)")
        queue.async(flags: .barrier) {
            poi.emitEvent(#function, "execute exclusive \(id)")
            poi.endInterval(#function, status)
        }
        poi.emitEvent(#function, "schedule exclusive \(id) done")
    }

    func read(id: Int) {
        let status = poi.beginInterval(#function, id: poi.makeSignpostID(), "\(id)")
        poi.emitEvent(#function, "schedule \(id)")
        queue.sync {
            poi.emitEvent(#function, "execute \(id)")
            poi.endInterval(#function, status)
        }
        poi.emitEvent(#function, "schedule \(id) done")
    }
}
I would also advise using the “Hangs” tool (which is part of this “Time Profiler” Instruments template). Personally, I always reduce the “Reporting Threshold” to “Include All Potential Interaction Delays (>33 ms)”.
Regarding that other question, the problem is the combination of thread explosion (more than 64 threads), the limited number of threads in the cooperative thread pool, and the fact that TaskGroup does not guarantee FIFO behavior. The most compelling solution is to avoid unbridled parallel execution altogether. (See the commented-out line in my example above, inside the loop that creates the tasks, which constrains the concurrency to a handful of tasks at a time.)
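As an aside, that throttling technique can be expressed as a general pattern: add the first n child tasks, and then, before adding each subsequent one, wait for a prior task to finish. Here is a minimal sketch of that idea (the helper name processThrottled and its parameters are my own, not from the example above):

```swift
import Foundation

/// Process `items`, keeping no more than `maxConcurrency` child tasks in flight.
func processThrottled<T: Sendable>(
    _ items: [T],
    maxConcurrency: Int,
    work: @escaping @Sendable (T) async -> Void
) async {
    await withTaskGroup(of: Void.self) { group in
        for (index, item) in items.enumerated() {
            // Once `maxConcurrency` tasks have been added, wait for one
            // to finish before adding the next.
            if index >= maxConcurrency {
                await group.next()
            }
            group.addTask { await work(item) }
        }
        await group.waitForAll()
    }
}
```

This keeps the cooperative thread pool from being saturated, while still enjoying a (constrained) degree of parallelism.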
And if you’re using a reader-writer pattern, like that other question, consider retiring it. It’s one of those patterns that feels like it should enjoy great benefits, but it (a) is almost always slower than a simple lock; and (b) frequently introduces problems in thread-explosion scenarios. I know it looks like the concurrent reads should offer great benefits, but in practice they rarely do.
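For what it’s worth, a sketch of what retiring the reader-writer pattern might look like, replacing the concurrent queue and barrier with a simple NSLock (the stored `value` property is purely illustrative; an OSAllocatedUnfairLock or an actor would be reasonable alternatives):

```swift
import Foundation

final class Subsystem: @unchecked Sendable {
    private let lock = NSLock()
    private var value = 0   // hypothetical state protected by the lock

    // A plain lock replaces the concurrent-queue-with-barrier dance: both
    // reads and writes briefly acquire the lock, and nothing ever blocks a
    // cooperative-pool thread waiting on a dispatch queue.
    func write(_ newValue: Int) {
        lock.lock()
        defer { lock.unlock() }
        value = newValue
    }

    func read() -> Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}
```

Because the critical sections are so short, the contention a reader-writer pattern is supposed to mitigate is negligible, and the lock-based version sidesteps the GCD interaction entirely.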