Deadlock When Using DispatchQueue from Swift Task

I don't think that is relevant for the deadlock, but you can force the barrier to be dispatched first by adding a sleep like I mentioned above.

The general rule is that you should never block work in Swift concurrency on work that isn’t already actively running on a thread (“future work”). That is being violated here because the barrier block is not actively running — it’s been enqueued, but it cannot clear the barrier without running, which it needs a thread to do.

The specific implementation reason this blows up today is that both Swift concurrency and Dispatch’s queues are serviced by same underlying pool of threads — Swift concurrency’s jobs just promise not to block on future work. So when Dispatch is deciding whether to over-commit, i.e. to create an extra thread in order to process the barrier block, it sees that all the threads are tied up by Swift concurrency jobs, which promise to not be blocked on future work and therefore can be assumed to eventually terminate without extra threads being created. Therefore, it doesn’t make an extra thread, and since you are blocking on future work, you’re deadlocking.

The upshot is that you cannot think about queues as their own isolated thread pools. A queue can still be starved by a refusal to over-commit, even though it itself is over-committing, if you’re violating other rules in the system.

16 Likes

Thanks @John_McCall. That makes sense. It wasn't obvious to me that they share a common pool.
Since I dug into this pretty deep, it is really the kernel that is deciding not to spawn a thread, when Dispatch is calling _pthread_workqueue_addthreads, right?

  * frame #0: 0x00000001a8247f04 libsystem_pthread.dylib`_pthread_workqueue_addthreads
    frame #1: 0x00000001a80a931c libdispatch.dylib`_dispatch_root_queue_poke_slow + 312
    frame #2: 0x00000001bc4c1058 libswiftDispatch.dylib`_swift_dispatch_barrier_async + 52
    frame #3: 0x00000001bc4bf8ac libswiftDispatch.dylib`__C.OS_dispatch_queue.async(group: Swift.Optional<__C.OS_dispatch_group>, qos: Dispatch.DispatchQoS, flags: Dispatch.DispatchWorkItemFlags, execute: @convention(block) () -> ()) -> () + 564
    frame #4: 0x0000000100c3bf24 DispatchTests`Subsystem.write(id=0, self=0x0000600000240a40) at DeadlockTest.swift:26:15
    frame #5: 0x0000000100c3b048 DispatchTests`Subsystem.performWork(id=0, self=0x0000600000240a40) at DeadlockTest.swift:20:22

I’ll have to tag in @rokhinip to answer that.

1 Like

The general rule is that you should never block work in Swift concurrency on work that isn’t already actively running on a thread (“future work”).

I'm surprised this could work at all - I thought it was categorically forbidden to block in a Task… are you saying it's explicitly permitted to block under that "active work" condition, or is it just a happenstance of the current Concurrency implementation?

More generally, can one realistically rely on that constraint; how can one be sure the unblocker is actively running, without introducing race conditions?

I pasted your code in a project on mine, and ran the test repeatedly (1000 times). It completed without any deadlock.

Were you on ARM or x86? I haven't looked into it properly, but anecdotally it seems like ARM Macs are way more likely to run into queue / task exhaustion issues, than x86 Macs. I wonder if they're simply configured differently re. over-commit limits, or similar.

(certainly the kernel's thread scheduler is more complicated and IMO worse on ARM Macs, since it will leave cores completely idle even if there's numerous runnable threads; it will only utilise "efficiency" cores in limited circumstances)

Typical traditional mutex use is probably the most obvious example here. If we’re blocking on a mutex, it’s because some other thread acquired the mutex, and that thread will presumably release it in short order without needing to schedule future work. That’s all fine; Swift’s concurrency system doesn’t need to know about any of that. (It may or may not be an optimal use of thread resources to block threads on a mutex this way, but that’s a separate level of problem.)

3 Likes

Consider code like this:

class ThreadsafeCounter {
  var protectedCounter = OSAllocatedUnfairLock(initialState: 0)
  mutating func increment() {
    protectedCounter.withLock { $0 += 1 }
  }
}

increment will block other threads while a thread is in withLock, but no new threads will need to be created or unblocked to unblock them: the thread holding the lock is the one that will do the work, and it already exists and is not blocked. So the worst that will happen in this case is that all but one thread in the thread pool will very briefly wait, reducing your concurrency a little.

You might expect that if the thread holding the lock was very low priority this could be a problem (since instead of waiting briefly they might wait a long time while other high priority work on the system ran), but both DispatchQueue and OSAllocatedUnfairLock know how to avoid that by boosting the priority of the lock-owning thread to the maximum priority of the waiting threads.

What isn't safe is something like this:

class DeadlockProneCounter {
  var protectedCounter = OSAllocatedUnfairLock(initialState: 0)
  mutating func increment() {
    protectedCounter.withLock { state in
      let asyncIncrementer = Task { state += 1 }
      //THIS DOESN'T ACTUALLY EXIST, FOR THIS REASON
      asyncIncrementer.synchronouslyWait()
    }
  }
}

Here's the scenario where it fails, imagining that we're running on a 2 core device for simplicity:

Thread 1: enters increment() and takes lock
Thread 2: enters increment() and begins waiting for lock
Thread 1: spawns a Task, then waits for it
Thread ???: runs the Task… wait, we already have 2 threads on our 2 cores, so there isn't another thread available. All 2 of our 2 threads are waiting.
Threads 1 and 2: wait forever

3 Likes

Emphasis on "presumably" there, though?

Possibly what you mean is that practically these are just rare edge cases?

By that token, I suppose if you replace 'mutex' with 'spinlock' then I'd be more convinced. But even then, nothing's technically stopping deadlock from happening, when you have finite concurrency and block a thread. When you have third-party code and multiple libraries layering atop each other, things devolve into unknown territory quickly.

With GCD deadlock is also possible in much the same ways, because GCD also has a hard cap on thread pool size. But there are at least two differences:

  • The GCD documentation is very clear that blocking inside a dispatch queue thread is dangerous:

    When designing tasks for concurrent execution, do not call methods that block the current thread of execution. When a task scheduled by a concurrent dispatch queue blocks a thread, the system creates additional threads to run other queued concurrent tasks. If too many tasks block, the system may run out of threads for your app.

  • GCD concurrent queues support over-committing the CPU cores, by an order of magnitude or more. I believe it used to be 64 threads per root queue (the four QoS classes) but it looks like it's now 255 per root queue.

So GCD is way more tolerant of "bad" code.

Deadlocking due to the thread exhaustion is rare (in my experience) with GCD, but all too easy with Swift Concurrency.

I don't understand why Swift Concurrency won't over-commit the CPU cores… a little too zealous ideologically, perhaps? I get that it all works fine if everything is Concurrency-savvy, but alas we don't live in that world. Yet, at least.

Yes. Practically speaking, it is an edge case because that code pattern (unlocking a mutex across a thread-pool enqueue) is vanishingly unlikely to be correct even removing any considerations specific to Swift concurrency. First, mutex libraries generally require releases to occur on the same thread as the acquire, which usually cannot be guaranteed in any sort of thread pool system that abstracts over exact thread assignments. Second, scheduling a job to do the release later necessarily involves indefinitely blocking the mutex, which is never a good idea even if that job is guaranteed to eventually run.

We've talked about this quite a bit, actually. You can't have thread pools that only over-commit a little bit — once you open the door to code patterns that rely on over-commit, you have to over-commit all the time, and you can never stop over-committing. If you queue up a thousand jobs, and they're all slow to finish for some reason, the system has to keep adding threads up to its hard limit in case the next thread is the one that unblocks all the earlier ones. You cannot expect the thread pool to magically know that it doesn't need new threads for some work but does for other work.

We do not require the entire system to know about Swift concurrency. We require the code written to work in Swift concurrency to not rely on over-commit. If the original example was a non-Swift-concurrency-aware library that kicked off a bunch of jobs itself, the fact that it coexisted in a process with Swift concurrency would not be a problem. The issue is that jobs are being started with Swift concurrency but are synchronizing with a mechanism that's effectively a condition variable. That code is therefore already being rewritten, and it's not too much to expect that rewrite to use appropriate patterns.

1 Like

There are sensible arguments to be made both for "try to save people from themselves" and "fail fast and do the right thing", with both approaches having notable up and down sides. Postel's Law is not as obviously correct as it seems at a glance; engineering is full of tradeoffs.

My personal hope (and expectation) is that in the medium term being a bit stricter here will lead to both a more efficient system (way fewer unnecessary threads) and a more robust one (fewer edge case bugs where it hits the limit, because the limit is now obvious rather than obscure), but reasonable people could disagree with me.

3 Likes

True, but also practically-speaking even finite over-commit is helpful, just like practically-speaking one can do some blocking in Tasks, as you pointed out, even though broadly-speaking it's unsafe to do so.

It's not even so much about disagreeing; I get that there's no unequivocal answer. I respect the position taken here. I get what is being attempted. It's just frustrating, you know, when one hits deadlock issues with Tasks. It feels like a little leeway, like GCD offers, would go a long way, pragmatically-speaking.

Perhaps the way to improve this situation is laterally, through better APIs and libraries for useful patterns like producer-consumers, back-pressure in general, etc. Last I was really playing in this area, a few months ago, there was surprisingly little there for Swift Concurrency. e.g. is there an equivalent to Dispatch's apply yet? Something like that could help with @nsc's scenario.

I did see some things recently, like [Pitch] New APIs for Async[Throwing]Stream with backpressure support, that are promising.

Aside from AsyncStream.makeStream, there haven't been any additions to the built in async tools since the original proposal (IIRC). There is swift-async-algorithms, which has some bits of functionality, but that's focused on collection operations, not on fundamental async tools like apply, queueing, serial execution, or predictable testing.

These are fundamentally different, though. The some blocking that's safe to do in Tasks has specific local properties that make it provably safe. The some over-commit you're talking about only helps if the finite limit is high enough that, globally, all simultaneous uses of dangerous patterns happen to never reach it.

I broadly agree, pragmatically. I've been a bit cavalier with terminology, but what I've meant when I've talked about 'blocking' is mostly the subset of cases which block awaiting something from another task. The ones that can outright deadlock. Although I still think it's concerning if CPU cycles go to waste even if forward progress is guaranteed.

And spinlocks and mutexes and so forth can be used safely (in the forward progress sense), but in that respect I'm worried less about technicalities or principles and more practicalities of mixing code from various libraries, authors, and eras, and the potential for poor interactions. Like in the original poster's situation of mixing Swift Concurrency and GCD.

Hmm, I think that's the crux of our disagreement, then. If my app used to deadlock, but now with threads + 1 it doesn't, I call that helping. Even if your app still needs yet more threads and still deadlocks. Even if perhaps my app now working hides a technically incorrect use of Swift Concurrency (or GCD similarly). Not because I approve of the misuse but just because in practice it can be hard to even know the misuse is occurring.

Maybe there can be some runtime detection & warning of blocked executor threads? Some kind of "Task XYZ has been blocking a thread for more than 100ms" heuristic?

It'd be quite different if it were a compiler error, for example. I'm all for deterministic, reliable compile-time diagnostics. The problem here is that there's no guarantees I can detect the problem - my app might only deadlock in some situations, that I don't happen to see in my own use, but my users do. Could be as simple as that my Mac Studio has twenty cores but my user's MacBook Air has four.

In fact if the objective of the strict limit were to force these errors to the surface then the Swift Concurrency thread pool should have only one thread! :laughing:

(I'm being a little facetious, of course, but maybe that should actually be a developer 'debug' option in Xcode? :thinking:)

Anyway, we don't have to agree - this is subjective and relies on predicting the future and guessing how all the second and third and forth order effects will ultimately sum up. I'm only elaborating in case it's helpful to think about.

Well, but this would immediately become a semantic guarantee that the scheduler would have to satisfy forever or else we break your program. Devices wouldn't be able to, say, be more conservative about spawning threads when trying to conserve power, because that could cause deadlock. And programs would have pretty severe portability problems between different hardware profiles. There's a reason why systems with soft limits here tend to all end up with arbitrary over-commit.

1 Like

This is already the case in the iOS simulator, or was last I checked. And it does lead to deadlock bugs.

One thread per core or just one?
Is that a ridiculous idea? I thought that's how it should be.

I meant one thread total. That doesn't guarantee you'll find cross-task dependencies - your writer might still happen to run before your reader, per @nsc's original example - but it does greatly increase the probability.

Currently Swift Concurrency nominally runs one thread per core, although it looks like it's more complicated than that because of performance vs efficiency cores (and task QoS…?).

In an ideal system one thread per core is optimal - if no thread ever actually blocks (synchronously) nor idles when there's work to do, you have maximal CPU utilisation without any overheads from thread context-switching. Which is surely why Swift Concurrency tries to do that.

Server processes, in particular, are sometimes written this way (not JVM ones, of course, but efficient ones in C/C++ etc). Every CPU core is actually dedicated exclusively to a single thread (bar one or two cores that run "everything else" - the regular OS daemons etc) to guarantee no perturbations or scheduler overheads. The threads use cooperative multi-tasking to keep busy, much like Swift Concurrency.

(it gets wilder when you factor in NUMA effects and do geographical thread placement, but that's way out of scope here)

Outside the server realm, it's not nearly so clean. Most processes - of which there are hundreds to thousands on modern iDevices and Macs - run many threads and have no idea what other apps are doing; they all fight naively for CPU residency (and other hardware resources).

So the "1 thread per core" is in a sense naive - an optimistic upper bound only - although realistically what else can the Concurrency runtime do.

Swift Concurrency is great because it lets you write regular "threads" code while efficiently interoperating with the prevalent callback "events" code of Apple platforms. However, the fixed-width executor for tasks and default actors isn't very useful for many desktop and mobile apps.

Too many operations are unsafe to run in a non-overcommitting, cooperative system. This includes blocking system calls (which includes all disk IO on Darwin) and performance intensive jobs which starve other jobs (Task.yield isn't useful for existing synchronous code).

To solve this problem, we dispatch these operations onto other threads and then await the results via continuations, or pass the results via shared memory protected by os_unfair_lock. However, after doing this, so little code runs on the cooperative executor that we question why it exists.

What Swift Concurrency needs is the ability to customize the executor for each task. A Task.detached(executor: .pthread) (or a Task whose jobs are submitted to the overcommit GCD queues) would be very useful and provide the performance isolation of preemptive multitasking in a seamless manner.

1 Like