How to detect Task + GCD Deadlock?

Right now I have a project that mixes Swift Tasks and legacy GCD threading code. The problem is that after some time, on some devices, we observe something that looks like a deadlock: the code performing the Task + GCD work just stops, as if it's waiting for some other thread to finish.
I've found a related thread:
Deadlock When Using DispatchQueue from Swift Task
Now I am wondering: what can I do to detect this kind of deadlock using Xcode?

To detect this problem, I would profile the app (⌘-I) in Instruments. E.g., I might use the “Time Profiler” template and then manually add the “Swift Actors” and “Swift Tasks” tools, too. Then, when you run the app, you can use the “Alive Tasks” lane in the “Swift Tasks” tool to identify situations where tasks were started but weren’t allowed to finish. The “Alive Tasks” lane should always drop back down to zero when the app has reached quiescence; if it has not, you’ve got a deadlock somewhere.

You might also sprinkle the code with OSSignposter instrumentation (e.g., replacing print statements with signposter.emitEvent, and, for those tasks that take a little time, adding a signposter.beginInterval at the start and a signposter.endInterval at the end):

import os.log

let poi = OSSignposter(subsystem: "Subsystem", category: .pointsOfInterest)

final class BarrierTests {
    func test() async {
        let subsystem = Subsystem()
        await withTaskGroup(of: Void.self) { group in
            // Launching 1,000 tasks at once can flood the cooperative thread pool.
            for index in 0 ..< 1000 {
                // by the way, to fix the problem in this example, uncomment the following line; this will constrain the concurrency and solve the problem here
                //
                // if index > 4 { await group.next() }

                poi.emitEvent(#function, "adding task \(index)")

                group.addTask { subsystem.performWork(id: index) }
            }

            await group.waitForAll()
        }
    }
}

final class Subsystem {
    // A reader-writer pattern: concurrent reads; barrier (i.e., exclusive) writes.
    let queue = DispatchQueue(label: "my concurrent queue", attributes: .concurrent)

    func performWork(id: Int) {
        if id == 0 { write(id: id) }
        else { read(id: id) }
    }

    func write(id: Int) {
        let status = poi.beginInterval(#function, id: poi.makeSignpostID(), "\(id)")

        poi.emitEvent(#function, "schedule exclusive \(id)")
        // Writes are dispatched asynchronously, with a barrier for exclusive access.
        queue.async(flags: .barrier) {
            poi.emitEvent(#function, " execute exclusive \(id)")
            poi.endInterval(#function, status)
        }
        poi.emitEvent(#function, "schedule exclusive \(id) done")
    }

    func read(id: Int) {
        let status = poi.beginInterval(#function, id: poi.makeSignpostID(), "\(id)")

        poi.emitEvent(#function, "schedule \(id)")
        // Reads are dispatched synchronously, blocking the calling thread (here, a
        // cooperative-pool thread) until the closure runs; this is what deadlocks
        // once the pool is exhausted.
        queue.sync {
            poi.emitEvent(#function, " execute \(id)")
            poi.endInterval(#function, status)
        }
        poi.emitEvent(#function, "schedule \(id) done")
    }
}

I would also advise using the “Hangs” tool (which is part of this “Time Profiler” Instruments template). Personally, I always reduce the “Reporting Threshold” down to “Include All Potential Interaction Delays (>33 ms)”.


Regarding that other question: the problem there is a combination of thread explosion (more than 64 threads), the limited number of threads in the cooperative thread pool, and the fact that TaskGroup does not guarantee FIFO behavior. The most compelling solution is to avoid unbridled parallel execution. (See my example above, where there is a line you can uncomment, inside the loop creating the tasks, to constrain the concurrency to no more than four at a time; the sketch below pulls that pattern out on its own.)
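
Pulled out on its own, that constrained-concurrency pattern looks something like this (a minimal sketch: the performAllWork name and the maxConcurrency width are mine, Subsystem is the type from the example above, and, like that example, it sidesteps strict-concurrency Sendable checking):

func performAllWork(using subsystem: Subsystem) async {
    let maxConcurrency = 4

    await withTaskGroup(of: Void.self) { group in
        for index in 0 ..< 1000 {
            // Once `maxConcurrency` tasks are in flight, wait for one to finish
            // before adding the next, so the group never fans out wider than that.
            if index >= maxConcurrency { await group.next() }

            group.addTask { subsystem.performWork(id: index) }
        }

        await group.waitForAll()
    }
}

Because the group never has more than a handful of child tasks running, the queue.sync calls can never exhaust the cooperative thread pool.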

And if you’re using a reader-writer pattern, like that other question does, consider retiring it. It’s one of those patterns that feels like it should enjoy great benefits, but it (a) is almost always slower than a simple lock; and (b) frequently introduces problems in thread-explosion scenarios. I know it looks like the concurrent reads should offer great benefits, but in practice they often do not.
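
For example, here is a minimal sketch of what retiring it might look like, replacing the concurrent queue with a simple lock (this SynchronizedDictionary type is hypothetical, not from that other question; it only illustrates the shape):

import Foundation

final class SynchronizedDictionary<Key: Hashable, Value>: @unchecked Sendable {
    private let lock = NSLock()
    private var storage: [Key: Value] = [:]

    subscript(key: Key) -> Value? {
        get {
            // Reads take the same lock as writes; no barrier dance required.
            lock.lock()
            defer { lock.unlock() }
            return storage[key]
        }
        set {
            lock.lock()
            defer { lock.unlock() }
            storage[key] = newValue
        }
    }
}

Both reads and writes take the same lock, and nothing ever blocks waiting for GCD to spin up another worker thread. (An actor is another good alternative, if you can make the call sites async.)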

Thank you!