Concurrency cooperative queue crash with iOS 15/16

I recently began migrating an app to Swift concurrency because it had a lot of crashes caused by improper use of DispatchQueues. It's been fantastic to use, but after moving the most-used part of the app over, Crashlytics has been reporting a fair number of crashes on the cooperative dispatch queue:

Crashed: com.apple.root.user-initiated-qos.cooperative 
EXC_BREAKPOINT  0x0000000106444c98

0  Core                           0x3a6e2c SomeRepository.updateData(_:_:_:) + 10828 (SomeRepository.swift:10828)
1  libswift_Concurrency.dylib     0x29f94 swift::runJobInEstablishedExecutorContext(swift::Job*) + 132
2  libswift_Concurrency.dylib     0x2a938 swift_job_runImpl(swift::Job*, swift::ExecutorRef) + 68
3  libdispatch.dylib              0x13b94 _dispatch_root_queue_drain + 340
4  libdispatch.dylib              0x1439c _dispatch_worker_thread2 + 172
5  libsystem_pthread.dylib        0x1dd4 _pthread_wqthread + 224
6  libsystem_pthread.dylib        0x193c start_wqthread + 8

There are other threads available in the crash but I don't imagine they have any relevant information. Here they are:

Crash threads
com.apple.main-thread
0  AttributeGraph                 0xd2a4 AG::Graph::UpdateStack::~UpdateStack() + 132
1  AttributeGraph                 0x38c8 AG::Graph::update_attribute(AG::data::ptr<AG::Node>, unsigned int) + 464
2  AttributeGraph                 0x2794 AG::Subgraph::update(unsigned int) + 856
3  SwiftUI                        0x15324 GraphHost.flushTransactions() + 420
4  SwiftUI                        0xad0360 closure #1 in closure #1 in closure #1 in GraphHost.asyncTransaction<A>(_:mutation:style:mayDeferUpdate:) + 20
5  SwiftUI                        0x8894 partial apply for closure #1 in ViewGraphDelegate.updateGraph<A>(body:) + 20
6  SwiftUI                        0x12c5c closure #1 in ViewRendererHost.updateViewGraph<A>(body:) + 88
7  SwiftUI                        0xc9b0 ViewRendererHost.updateViewGraph<A>(body:) + 92
8  SwiftUI                        0x65c4 ViewGraphDelegate.updateGraph<A>(body:) + 56
9  SwiftUI                        0x58a4 closure #1 in GraphHost.init(data:) + 120
10 SwiftUI                        0xad10cc partial apply for closure #1 in closure #1 in GraphHost.asyncTransaction<A>(_:mutation:style:mayDeferUpdate:) + 24
11 SwiftUI                        0xedea0 thunk for @escaping @callee_guaranteed () -> () + 20
12 SwiftUI                        0x4094 static NSRunLoop.flushObservers() + 140
13 SwiftUI                        0x410c closure #1 in closure #1 in static NSRunLoop.addObserver(_:) + 36
14 SwiftUI                        0xd3898 specialized thunk for @callee_guaranteed () -> (@error @owned Error) + 20
15 libswiftObjectiveC.dylib       0xbd0 autoreleasepool<A>(invoking:) + 56
16 SwiftUI                        0x3fc0 closure #1 in static NSRunLoop.addObserver(_:) + 48
17 SwiftUI                        0x4204 @objc closure #1 in static NSRunLoop.addObserver(_:) + 52
18 CoreFoundation                 0x3e83c __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 32
19 CoreFoundation                 0xfa74 __CFRunLoopDoObservers + 616
20 CoreFoundation                 0xaffc __CFRunLoopRun + 1012
21 CoreFoundation                 0x1e250 CFRunLoopRunSpecific + 572
22 GraphicsServices               0x1988 GSEventRunModal + 160
23 UIKitCore                      0x4e5a94 -[UIApplication _run] + 1080
24 UIKitCore                      0x27efd4 UIApplicationMain + 336
25 Tels                           0x7840 main + 10 (AppDelegate.swift:10)
26 ???                            0x1047204d0 (Missing)

com.apple.uikit.eventfetch-thread
0  libsystem_kernel.dylib         0xaac mach_msg_trap + 8
1  libsystem_kernel.dylib         0x107c mach_msg + 72
2  CoreFoundation                 0x6d88 __CFRunLoopServiceMachPort + 368
3  CoreFoundation                 0xb090 __CFRunLoopRun + 1160
4  CoreFoundation                 0x1e250 CFRunLoopRunSpecific + 572
5  Foundation                     0x17eec -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 232
6  Foundation                     0x57000 -[NSRunLoop(NSRunLoop) runUntilDate:] + 88
7  UIKitCore                      0x464f00 -[UIEventFetcher threadMain] + 512
8  Foundation                     0x64bfc __NSThread__start__ + 792
9  libsystem_pthread.dylib        0x3348 _pthread_start + 116
10 libsystem_pthread.dylib        0x1948 thread_start + 8

com.google.firebase.crashlytics.MachExceptionServer
0  Core                           0x78b4cc FIRCLSProcessRecordAllThreads + 393 (FIRCLSProcess.c:393)
1  Core                           0x78b8ac FIRCLSProcessRecordAllThreads + 424 (FIRCLSProcess.c:424)
2  Core                           0x798dc4 FIRCLSHandler + 34 (FIRCLSHandler.m:34)
3  Core                           0x7995e4 FIRCLSMachExceptionServer + 521 (FIRCLSMachException.c:521)
4  libsystem_pthread.dylib        0x3348 _pthread_start + 116
5  libsystem_pthread.dylib        0x1948 thread_start + 8

com.apple.NSURLConnectionLoader
0  libsystem_kernel.dylib         0xaac mach_msg_trap + 8
1  libsystem_kernel.dylib         0x107c mach_msg + 72
2  CoreFoundation                 0x6d88 __CFRunLoopServiceMachPort + 368
3  CoreFoundation                 0xb090 __CFRunLoopRun + 1160
4  CoreFoundation                 0x1e250 CFRunLoopRunSpecific + 572
5  CFNetwork                      0x247aa8 _CFURLStorageSessionDisableCache + 50420
6  Foundation                     0x64bfc __NSThread__start__ + 792
7  libsystem_pthread.dylib        0x3348 _pthread_start + 116
8  libsystem_pthread.dylib        0x1948 thread_start + 8

AVAudioSession Notify Thread
0  libsystem_kernel.dylib         0xaac mach_msg_trap + 8
1  libsystem_kernel.dylib         0x107c mach_msg + 72
2  CoreFoundation                 0x6d88 __CFRunLoopServiceMachPort + 368
3  CoreFoundation                 0xb090 __CFRunLoopRun + 1160
4  CoreFoundation                 0x1e250 CFRunLoopRunSpecific + 572
5  AudioSession                   0x6478 CADeprecated::GenericRunLoopThread::Entry(void*) + 156
6  AudioSession                   0xf7c8 CADeprecated::CAPThread::Entry(CADeprecated::CAPThread*) + 88
7  libsystem_pthread.dylib        0x3348 _pthread_start + 116
8  libsystem_pthread.dylib        0x1948 thread_start + 8

Thread #1
0  libsystem_kernel.dylib         0x1014 __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x1e5c _pthread_wqthread + 360
2  libsystem_pthread.dylib        0x193c start_wqthread + 8

Thread #2
0  libsystem_kernel.dylib         0x1014 __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x1e5c _pthread_wqthread + 360
2  libsystem_pthread.dylib        0x193c start_wqthread + 8

Thread #3
0  libsystem_kernel.dylib         0x1014 __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x1e5c _pthread_wqthread + 360
2  libsystem_pthread.dylib        0x193c start_wqthread + 8

Thread #4
0  libsystem_kernel.dylib         0x1014 __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x1e5c _pthread_wqthread + 360
2  libsystem_pthread.dylib        0x193c start_wqthread + 8

Thread #5
0  libsystem_kernel.dylib         0x1014 __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x1e5c _pthread_wqthread + 360
2  libsystem_pthread.dylib        0x193c start_wqthread + 8

Thread #6
0  libsystem_kernel.dylib         0x1014 __workq_kernreturn + 8
1  libsystem_pthread.dylib        0x1e5c _pthread_wqthread + 360
2  libsystem_pthread.dylib        0x193c start_wqthread + 8

Thread #7
0  libsystem_kernel.dylib         0xaac mach_msg_trap + 8
1  libsystem_kernel.dylib         0x107c mach_msg + 72
2  CoreFoundation                 0x6d88 __CFRunLoopServiceMachPort + 368
3  CoreFoundation                 0xb090 __CFRunLoopRun + 1160
4  CoreFoundation                 0x1e250 CFRunLoopRunSpecific + 572
5  Foundation                     0x17eec -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 232
6  Foundation                     0x18660 -[NSRunLoop(NSRunLoop) run] + 88
7  SwiftUI                        0xa6000 static DisplayLink.asyncThread(arg:) + 592
8  SwiftUI                        0xa42a4 @objc static DisplayLink.asyncThread(arg:) + 96
9  Foundation                     0x64bfc __NSThread__start__ + 792
10 libsystem_pthread.dylib        0x3348 _pthread_start + 116
11 libsystem_pthread.dylib        0x1948 thread_start + 8

This seems to affect mostly iOS 15 (15.6.1, 15.5.0, 15.7.0, 15.6.0, 15.3.1), but a small number of these crashes have happened on iOS 16 (16.0.0, 16.0.3). The app also supports iOS 14, though there hasn't been a concurrency crash on iOS 14 yet. The OS breakdown for users of the app is:

  • iOS 16 - 22.5%
  • iOS 15 - 72.2%
  • iOS 14 - 5.2%

Seeing that ~89% of these crashes occur on iOS 15 devices and the other ~11% on iOS 16 devices, the split loosely tracks the larger share of iOS 15 users, though it could also mean that the iOS 16 runtime is less vulnerable to this issue.

I've been unable to reproduce this and debugging in the simulator doesn't get me anywhere (using ASAN/TSAN or LIBDISPATCH_COOPERATIVE_POOL_STRICT=1).

Looking through other threads about Swift crashes, I found one about TaskGroups and a known race condition in the concurrency runtime that has no fix for back-deployed runtimes. The function that is crashing here makes no use of TaskGroup or even async let. It's a single Task, created in SwiftUI, that calls a single function that can make many web requests. I'm not sure this could even be diagnosed as the known race condition, both because there is so little stack trace information and because a few of these crashes happened on iOS 16 with the Swift 5.7 concurrency runtime.

My questions:

  • Has anyone else run into this issue?
  • Are there any debugging techniques for Swift concurrency that I have not tried?
  • I realize that a minimal stack trace with no MRE is not much to go on. Is there any way I could gather more information about what's causing the crash?
  • Are there any other known concurrency issues that this could be?
  • Is it something else entirely?

If there's anything plainly obvious that I'm missing, I apologize - I'm not much of an expert on Swift/iOS yet. Any information would be greatly appreciated - thanks!!

I am also facing the same issue:

Crashed: com.apple.root.user-initiated-qos.cooperative
0  Mobile Reality Capture         0x367cc $defer #1 () in closure #1 in ScanViewModel.wiredDownload(of:from:) + 4370638796 (<compiler-generated>:4370638796)
1  Mobile Reality Capture         0x35b28 closure #1 in ScanViewModel.wiredDownload(of:from:) + 461 (ScanViewModel.swift:461)
2  libswift_Concurrency.dylib     0x40f6c swift::runJobInEstablishedExecutorContext(swift::Job*) + 420
3  libswift_Concurrency.dylib     0x41e78 swift_job_runImpl(swift::Job*, swift::ExecutorRef) + 72
4  libdispatch.dylib              0x15a6c _dispatch_root_queue_drain + 396
5  libdispatch.dylib              0x16284 _dispatch_worker_thread2 + 164
6  libsystem_pthread.dylib        0xdbc _pthread_wqthread + 228
7  libsystem_pthread.dylib        0xb98 start_wqthread + 8

Async functions pretty much always run on the cooperative queue: the only time they don't is when they're running on a specialized actor like @MainActor that has a custom executor. And they're expected to always have a very shallow C stack: they're broken down into funclets that tail-call each other specifically to ensure that they can return to the concurrency runtime simply by doing a C return. (In fact, if there are ever frames for two different async funclets on the C stack at once, that's a serious bug that can lead to C stack exhaustion.)

It is therefore expected that the C stack will look approximately like this in basically every async function:

  • The top of the stack will be an async funclet.
  • That funclet's return address will be a function in the Concurrency runtime that runs a specific async work item, usually the task that the funclet is running as part of.
  • That function's return address (maybe after a few functions) will be a function in Dispatch that drains a dispatch queue and runs individual items.
  • That function's return address (maybe after a few functions) will be a function in pthreads that runs as the main body of a thread.
  • That function will be the bottom of the stack, because the OS will have set the thread up to start running that specific function when it created the thread.

If the async funclet calls something synchronous, those frames will be on top of the funclet's frame. But if it calls something asynchronous, or if it returns to its asynchronous caller, that's just a tail-call that changes the funclet on the top of the stack but leaves everything else intact. This is what I mean when I say that the stack stays very shallow.
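If you want to see this shape for yourself, one rough way is to capture the call stack from inside an async function and print it. This is only a sketch: `captureStack`, `innerWork`, `outerWork`, and `printAsyncStack` are hypothetical names, and the exact frames you get will depend on the OS, toolchain, and optimization level.

```swift
import Foundation

// Synchronous helper that captures the current thread's call stack.
// Keeping the Foundation call in a plain synchronous function avoids
// touching Thread APIs directly from an async context.
func captureStack() -> [String] {
    Thread.callStackSymbols
}

// Hypothetical leaf async function: records the stack it sees when it runs.
func innerWork() async -> [String] {
    // Suspend once so the rest of the function resumes as its own job.
    await Task.yield()
    return captureStack()
}

// Hypothetical async caller. Calling innerWork() is a tail-call between
// funclets, so outerWork is not expected to show up as a separate C frame
// underneath innerWork.
func outerWork() async -> [String] {
    await innerWork()
}

// Prints one line per C stack frame seen inside innerWork().
func printAsyncStack() async {
    for frame in await outerWork() {
        print(frame)
    }
}
```

Calling `await printAsyncStack()` from a Task should print a short list: the synchronous helper on top, then a funclet of `innerWork`, then Concurrency runtime, Dispatch, and pthread frames, much like the traces earlier in this thread.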

You're seeing a crash in an async funclet in (IIUC) your own code. So, on the one hand, it is possible that this crash is resulting from a bug in the concurrency runtime where we ended up resuming a task in some broken configuration that led eventually to a crash. On the other hand, if you had an async function that just crashed for reasons entirely of its own — like, it did something dangerous with an unsafe pointer — the C stack trace will still look basically like this, because it always looks like this. So the first place to start is to look at where it's immediately crashing to see why it's crashing and if that context at all suggests what's going on. And if that doesn't make any sense — if it looks like it's crashing because it's accessing memory that's been completely corrupted or deallocated — then maybe it's a runtime bug.

This was very informative, thanks for the details!

Just for my own understanding, what is a "C stack"? Is that just short for call stack, or is that referring to some interop with a C language runtime (or something else entirely)?

Many compiled languages, especially ones with C interop like Swift has, use the same stack memory as C, managed in essentially the same way. Among other reasons, modern CPUs kinda assume something along those lines is happening with e.g. how return address predictors work.

You are right, the issue ended up being a force unwrap that rested on some incorrect assumptions about why it was supposedly safe. I removed the force unwrap a week or two after my post and these crashes stopped being reported. I feel a bit foolish for not checking my own code more thoroughly first. Since I wasn't able to reproduce it, the stack trace had some of the async internals below my own code, and the line number wasn't accurate, I figured it might be related to async and posted here as a Hail Mary.
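For anyone who lands here with a similar trace, this is a minimal sketch of the general pattern, not my actual code. The names `Item`, `ItemError`, `displayName(for:)`, and `collectNames(from:)` are made up for illustration. A force unwrap of nil traps, which shows up as EXC_BREAKPOINT with the async function on top of the runtime frames, so the idea is to turn that trap into an error the caller can handle.

```swift
import Foundation

// Hypothetical model: `name` is optional because the data source may omit it.
struct Item {
    let id: Int
    let name: String?
}

// Hypothetical error describing the case the force unwrap used to hide.
enum ItemError: Error, Equatable {
    case missingName(id: Int)
}

// Crashing form, shown only as a comment: `item.name!` traps when name is nil.
// func displayName(for item: Item) -> String { item.name! }

// Safer form: throw instead of trapping when the assumption does not hold.
func displayName(for item: Item) throws -> String {
    guard let name = item.name else {
        throw ItemError.missingName(id: item.id)
    }
    return name
}

// Hypothetical async caller: skips and logs bad items instead of crashing.
func collectNames(from items: [Item]) async -> [String] {
    var names: [String] = []
    for item in items {
        do {
            names.append(try displayName(for: item))
        } catch {
            print("Skipping item: \(error)")
        }
    }
    return names
}
```

Logging the skipped item also gives you something to look for in the field, which a bare trap with an inaccurate line number does not.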

Thanks for the insight! I enjoyed reading your thorough explanation. Apologies for not updating this thread after I resolved my issue months ago, I forgot I posted this here until the emails from replies started rolling in.
