Deadlock When Using DispatchQueue from Swift Task

One thread per core or just one?
Is that a ridiculous idea? I thought that's how it should be.

I meant one thread total. That doesn't guarantee you'll find cross-task dependencies - your writer might still happen to run before your reader, per @nsc's original example - but it does greatly increase the probability.

Currently Swift Concurrency nominally runs one thread per core, although it looks like it's more complicated than that because of performance vs efficiency cores (and task QoS…?).

In an ideal system one thread per core is optimal - if no thread ever actually blocks (synchronously) nor idles when there's work to do, you have maximal CPU utilisation without any overheads from thread context-switching. Which is surely why Swift Concurrency tries to do that.

Server processes, in particular, are sometimes written this way (not JVM ones, of course, but efficient ones in C/C++ etc). Every CPU core is actually dedicated exclusively to a single thread (bar one or two cores that run "everything else" - the regular OS daemons etc) to guarantee no perturbations or scheduler overheads. The threads use cooperative multi-tasking to keep busy, much like Swift Concurrency.

(it gets wilder when you factor in NUMA effects and do geographical thread placement, but that's way out of scope here)

Outside the server realm, it's not nearly so clean. Most processes - of which there are hundreds to thousands on modern iDevices and Macs - run many threads and have no idea what other apps are doing; they all fight naively for CPU residency (and other hardware resources).

So the "1 thread per core" is in a sense naive - an optimistic upper bound only - although realistically, what else can the Concurrency runtime do?

Swift Concurrency is great because it lets you write regular "threads" code while efficiently interoperating with the prevalent callback "events" code of Apple platforms. However, the fixed-width executor for tasks and default actors isn't very useful for many desktop and mobile apps.

Too many operations are unsafe to run in a non-overcommitting, cooperative system. This includes blocking system calls (which includes all disk IO on Darwin) and performance intensive jobs which starve other jobs (Task.yield isn't useful for existing synchronous code).

To solve this problem, we dispatch these operations onto other threads and then await the results via continuations, or pass the results via shared memory protected by os_unfair_lock. However, after doing this, so little code runs on the cooperative executor that we question why it exists.
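As a rough sketch of the "dispatch it elsewhere and await the result via a continuation" approach described above: `runOffPool` and `slowChecksum` are hypothetical names invented for illustration, and the global utility queue is only an assumption about where you would want the work to land.

```swift
import Dispatch

// Hypothetical helper: submits blocking, synchronous work to a GCD global
// queue and suspends (rather than blocks) the calling task until it finishes.
func runOffPool<T: Sendable>(_ work: @escaping @Sendable () -> T) async -> T {
    await withCheckedContinuation { continuation in
        DispatchQueue.global(qos: .utility).async {
            // Runs as an ordinary Dispatch job rather than as a Swift
            // concurrency job, so it may block or run for a long time.
            continuation.resume(returning: work())
        }
    }
}

// Stand-in for a blocking or CPU-heavy synchronous call.
func slowChecksum(_ bytes: [UInt8]) -> Int {
    bytes.reduce(0) { $0 &+ Int($1) }
}

let checksum = await runOffPool { slowChecksum([1, 2, 3, 4]) }
print(checksum)
```

The continuation must be resumed exactly once; the checked variant traps on a double resume and logs if it is never resumed, which is why it is used here instead of the unsafe one.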

What Swift Concurrency needs is the ability to customize the executor for each task. A Task.detached(executor: .pthread) (or a Task whose jobs are submitted to the overcommit GCD queues) would be very useful and provide the performance isolation of preemptive multitasking in a seamless manner.

I’m trying to understand the “future work” rule here.

@John_McCall wrote:

You should never block work in Swift concurrency on work that isn’t already actively running on a thread (“future work”).

Does this concern apply only to concurrent queues with barriers, or can it also apply to a serial DispatchQueue?

For example, suppose a serial queue already has jobs 1, 2, and 3 enqueued, and a Swift Task then calls:

queue.sync {
    // job 4
}

At that point, job 4 is blocked waiting for jobs 1–3. If job 3 has not started yet because it is waiting for job 2, then is the task technically blocked on “future work”?

In other words, is any queue.sync from a Swift concurrency context potentially problematic for this reason, regardless of whether the queue is serial or concurrent, although the barrier/concurrent-queue case makes the problem much easier to reproduce?

You can absolutely deadlock Swift's cooperative task executor by trying to synchronously run a job on a serial DispatchQueue.

Most DispatchQueues are serviced by the same underlying thread pool as the cooperative pool; jobs just run with a flag saying that they're allowed to block on future work. This means that, if Dispatch notices that the queues are building up, it has to assume that some of those jobs might be deadlocked and proactively spin up new threads to try to unblock them. Since Swift's jobs do not run with this flag, if you block on a queue, that queue may end up being starved. (DispatchQueue.sync is not allowed to run arbitrary queues synchronously in order to unblock itself, because this could cause a lot of other problems.)

Since DispatchQueue.main has a dedicated thread, it would be okay to synchronously block on the main queue as a special case if nothing in the program ever synchronously blocks on a dispatch queue from the main queue. But that is not a reasonable assumption to make, and so you should not do that, either.

You should only block on queues if you're sure that you're currently running on a non-cooperative thread.
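One way to read that last rule, as a minimal sketch only: if you really must call `sync`, do it from a thread you created yourself and let the task suspend on a continuation in the meantime. `readValue(on:)` and the queue label are hypothetical, and spawning a thread per call is assumed to be acceptable purely for the sake of illustration.

```swift
import Foundation

// Hypothetical helper: the only thread that ever waits inside `sync` is a
// dedicated Thread created here, never a cooperative-pool thread.
func readValue(on queue: DispatchQueue) async -> Int {
    await withCheckedContinuation { continuation in
        Thread.detachNewThread {
            // Blocking here parks our own thread; the calling task is
            // merely suspended while it waits for the continuation.
            let value = queue.sync { 42 }
            continuation.resume(returning: value)
        }
    }
}

// Hypothetical serial queue standing in for a legacy GCD-based subsystem.
let legacyQueue = DispatchQueue(label: "example.legacy")
let legacyValue = await readValue(on: legacyQueue)
print(legacyValue)
```

Creating a thread per call is expensive, so in real code a long-lived worker thread would probably be a better fit; the point is only that the blocking wait happens somewhere other than the cooperative pool.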

Thanks for the clear explanation, @John_McCall.

Two follow-up questions to make sure I'm understanding correctly:

1. Is it correct to say that any call to serialQueue.sync {} from within a Swift concurrency Task is a potential deadlock?

2. Is the correct workaround for this to replace queue.sync with queue.async inside a continuation, so the cooperative thread suspends rather than blocks?

For example, instead of:

// Unsafe: blocks a cooperative thread
let result = someSerialQueue.sync {
    doWork()
}

Use:

// Safe: cooperative thread suspends, work runs on the queue's own thread
let result = await withCheckedContinuation { continuation in
    someSerialQueue.async {
        let value = doWork()
        continuation.resume(returning: value)
    }
}

Is that the recommended pattern for calling into legacy GCD-based subsystems from a Swift Task?

FWIW, copy-pasting LLM output into the forums without reading it is rude. (I'm assuming if you had read it, you would have noticed that you should delete this part)

No, this is not the case. The issue is mixing sync and async on the same thread-pool-backed[1] queue. If you only ever use sync on a particular queue, then it's the same as an inefficient Mutex, which is safe because you're never waiting for async work.

Thankfully, most uses of mixed sync and async on a serial queue are due to people misunderstanding the characteristics of the system, and can be replaced with something simpler and more efficient. For example, the widespread "async+barrier for setting an instance variable, sync for getting it" pattern accomplishes very little other than drastically slowing your program down[2] and making it harder to understand.


  1. the main queue in a Darwin app is not pool-backed

  2. relative to simpler alternatives like a Mutex
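For illustration, here is roughly what the "something simpler" could look like for the getter/setter case: a plain lock around the stored value instead of a queue. This is a sketch under assumptions, not the poster's code; `Settings` is a hypothetical type, and `NSLock` stands in for whichever mutex you prefer.

```swift
import Foundation

// Hypothetical example type. The lock guards every access to `_name`, which
// is what justifies the `@unchecked Sendable` annotation.
final class Settings: @unchecked Sendable {
    private let lock = NSLock()
    private var _name = "default"

    var name: String {
        get {
            lock.lock()
            defer { lock.unlock() }
            return _name
        }
        set {
            lock.lock()
            defer { lock.unlock() }
            _name = newValue
        }
    }
}

let settings = Settings()
settings.name = "updated"
print(settings.name)
```

The critical sections are tiny and never wait on other work, so holding the lock is short and bounded, unlike a `sync` that sits behind previously enqueued async jobs.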

Thanks for the response, @David_Smith — and apologies for the stray LLM line.

Just to make sure I'm understanding correctly:

1. If a particular serial queue only ever has sync calls and never gets any async calls on it, is it safe to call serialQueue.sync {} from within a Swift concurrency context (Task)?

2. If a serial queue does receive a mix of both sync and async calls, is the correct fix to replace the sync call (when in a Swift concurrency context) with an async call inside a continuation? So the outer Swift concurrency task suspends rather than blocks.

Yes, but unless you specifically need FIFO ordering of waiters, there are likely more efficient ways to do it.

That's a possible fix, but there may situationally be better options. And of course simply swapping sync to async will change the behavior of your program, which you may need to adjust for.

Thanks @David_Smith again for the response.
I do need FIFO ordering in my case.

Earlier @John_McCall said "You should only block on queues if you're sure that you're currently running on a non-cooperative thread."

But if you are in a Swift concurrency context (Task), then you are on a 'cooperative thread', which means the above advice translates to "Never block on a queue if you are inside Swift concurrency."

But as per @David_Smith, there is a very narrow (and somewhat risky) exception to this: when you are using a serial queue that only receives sync() calls, you can get away with blocking. Is that because, when you make a sync call on a serial queue, the caller's thread does the job? (This is an optimisation, I guess, because GCD thinks, "the calling thread is blocked and has nothing better to do, so why not use that thread rather than looking for another?") So even if we block on a serial queue by calling sync on it, and even if there are 3 jobs in front of us in the queue ("future work that isn't running yet"), we are fine, because all those jobs will run on their callers' threads and so never need to find a thread. Is that right?
So the "Don't block on future work" rule becomes "Don't block on future work that might need a new thread allocated"?
And I feel that trying to take advantage of this 'exception' is risky for a couple of reasons.

  1. It is the developer's responsibility to ensure that the queue never receives an async job (because an async job needs a thread to service it, as this time the caller's thread can't do the job), and the compiler is not going to help enforce this rule. If someone later calls queue.async, the compiler doesn't stop them, but our safety is gone, right?
  2. We are relying on an optimisation - an implementation detail - of the serial queue. This is far from a guarantee.

For these reasons, would it be safer to go with the blanket "never block on any queue inside a Swift concurrency context (Task)" rule, as @John_McCall says above?

That's correct. There's an alternative call, dispatch_async_and_wait, which doesn't do this optimization.

My personal position for many years has been "use queues for async, use locks for sync", which naturally arrives at the same conclusion as you have here. But, it doesn't do FIFO waiting.

"Always be async" is also a valid position, just one that many people find difficult in practice.