How to detect Task + GCD Deadlock?

Right now I have a project that mixes Swift Tasks with legacy GCD threading code. The problem is that after some time, on some devices, we observe something that looks like a deadlock: the code performing the Task + GCD work simply stops, as if it's waiting for some other thread to finish.
I've found a related thread:
Deadlock When Using DispatchQueue from Swift Task
Now I am wondering: what can I do to detect this kind of deadlock using Xcode?

To detect this problem, I might profile the app (⌘I) in Instruments. E.g., I might use the “Time Profiler” template and then manually add the “Swift Actors” and “Swift Tasks” tools, too. Then, when you run the app, you can use the “Alive Tasks” lane in the “Swift Tasks” tool to identify situations where tasks were started but were never allowed to finish. The “Alive Tasks” lane should always drop back down to zero once the app has reached quiescence. If it has not, you’ve got a deadlock somewhere.

You might also sprinkle the code with OSSignposter instrumentation (e.g., replacing print statements with signposter.emitEvent, and, for those tasks that take a little time, adding a signposter.beginInterval at the start and a signposter.endInterval at the end):

import os

let poi = OSSignposter(subsystem: "Subsystem", category: .pointsOfInterest)

final class BarrierTests {
    func test() async {
        let subsystem = Subsystem()
        await withTaskGroup(of: Void.self) { group in
            for index in 0 ..< 1000 {
                // by the way, to fix the problem in this example, uncomment the following line; this will constrain the concurrency and solve the problem here
                //
                // if index > 4 { await group.next() }

                poi.emitEvent(#function, "adding task \(index)")

                group.addTask { subsystem.performWork(id: index) }
            }

            await group.waitForAll()
        }
    }
}

final class Subsystem {
    let queue = DispatchQueue(label: "my concurrent queue", attributes: .concurrent)

    func performWork(id: Int) {
        if id == 0 { write(id: id) }
        else { read(id: id) }
    }

    func write(id: Int) {
        let status = poi.beginInterval(#function, id: poi.makeSignpostID(), "\(id)")

        poi.emitEvent(#function, "schedule exclusive \(id)")
        queue.async(flags: .barrier) {
            poi.emitEvent(#function, " execute exclusive \(id)")
            poi.endInterval(#function, status)
        }
        poi.emitEvent(#function, "schedule exclusive \(id) done")
    }

    func read(id: Int) {
        let status = poi.beginInterval(#function, id: poi.makeSignpostID(), "\(id)")

        poi.emitEvent(#function, "schedule \(id)")
        queue.sync {
            poi.emitEvent(#function, " execute \(id)")
            poi.endInterval(#function, status)
        }
        poi.emitEvent(#function, "schedule \(id) done")
    }
}

I would also advise using the “Hangs” tool (which is part of this “Time Profiler” Instruments template). Personally, I always reduce the “Reporting Threshold” down to “Include All Potential Interaction Delays (>33 ms)”.


Regarding that other question: the problem there is a combination of thread explosion (more than 64 threads), the limited number of threads in the cooperative thread pool, and the fact that TaskGroup does not guarantee FIFO behavior. The most compelling solution is to avoid unbridled parallel execution in the first place. (See the line in my example above that can be uncommented inside the loop creating the tasks, to constrain the degree of concurrency.)
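To make that constrained-concurrency pattern stand on its own, here is a minimal sketch: seed the group with a fixed number of child tasks, then only add another as a previous one finishes. (`process(_:)` and `runConstrained` are hypothetical names for illustration, not from the example above.)

```swift
import Foundation

// A placeholder for whatever work each child task performs.
func process(_ index: Int) async {
    try? await Task.sleep(nanoseconds: 10_000_000) // simulate ~10 ms of work
}

// Run `count` units of work, but never more than `maxConcurrency` at once:
// once the group holds `maxConcurrency` children, wait for one to finish
// before adding the next.
func runConstrained(count: Int, maxConcurrency: Int = 4) async {
    await withTaskGroup(of: Void.self) { group in
        for index in 0 ..< count {
            if index >= maxConcurrency { await group.next() }
            group.addTask { await process(index) }
        }
        await group.waitForAll()
    }
}
```

The key design point is that `group.next()` suspends (rather than blocks), so waiting for capacity never ties up a cooperative thread.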

And if you’re using a reader-writer pattern, like that other question does, consider retiring it. It’s one of those patterns that feels like it should enjoy great benefits, but it (a) is almost always slower than locks; and (b) frequently introduces problems in thread-explosion scenarios. I know it looks like the concurrent reads should be a big win, but in practice they often are not.
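For example, the reader-writer `Subsystem` above could be replaced with a simple lock. This is just a sketch under the assumption that the queue was protecting some mutable state (the `value` property here is a stand-in, not from the original example):

```swift
import Foundation

// A sketch of retiring the reader-writer (concurrent queue + barrier)
// pattern in favor of a plain lock. A lock held only for the duration of
// a short critical section never parks the calling thread on another
// queue's worker, so it sidesteps the thread-explosion deadlock entirely.
final class LockedSubsystem {
    private let lock = NSLock()
    private var value = 0  // placeholder for the protected state

    func write(_ newValue: Int) {
        lock.lock()
        defer { lock.unlock() }
        value = newValue
    }

    func read() -> Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}
```

On recent OS versions you might reach for `OSAllocatedUnfairLock` instead of `NSLock`; either way, the design point is the same: keep the critical section tiny and synchronous, and don’t dispatch at all.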


Thank you!

Hi. I am also interested in a way to detect this.

The proposed solution with “Time Profiler” works, but in practice it is not great:

  • A developer needs to run the profiler against the specific problematic part of the app
  • In an app with hundreds of screens, this is not feasible or scalable: you can’t reliably and consistently detect deadlocks across all potential runtime paths

Is there a way to detect such problems at build time or at run time?

Example:

  • At run-time
  • Detect when a cooperative thread is blocked waiting on a GCD sync operation
  • Use a run-time reporting tool to notify developers of the potential deadlock

I would like to have a way, at build time or run time, to prevent sync invocation of GCD from inside a cooperative thread used to run Swift concurrency code. That way I could avoid cooperative threads being blocked waiting for the completion of a sync operation scheduled on a GCD queue, preventing these deadlocks altogether.
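One partial run-time aid I’m aware of (not the build-time diagnostic you’re asking for): the WWDC 2021 session “Swift concurrency: Behind the scenes” mentions the `LIBDISPATCH_COOPERATIVE_POOL_STRICT=1` environment variable, which shrinks the cooperative pool to a single thread in debug runs. Under that setting, code that blocks a cooperative thread (e.g., a `sync` call that can never complete) tends to deadlock immediately and reproducibly rather than only intermittently on some devices. A sketch of enabling it (`./MyApp` is a placeholder for your binary):

```shell
# Debug-only: run with a single-threaded cooperative pool so that code
# blocking a cooperative thread deadlocks deterministically, making the
# problem reproducible in development instead of rare in the field.
# In Xcode: Scheme → Run → Arguments → Environment Variables,
# or on the command line:
LIBDISPATCH_COOPERATIVE_POOL_STRICT=1 ./MyApp
```

This doesn’t prevent the bad pattern, but it turns a flaky field issue into a deterministic local hang you can catch in ordinary testing.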