Debugging a Concurrency Crash

Since updating to Xcode 14.3/Swift 5.8, an app I work on has been hitting a concurrency crash that is very difficult to debug. My best guess is that an invalid job is being resumed, but I haven't been able to track down which job that is or where it's being enqueued from. In Xcode, the debugger shows an invalid stack address for the crashed thread, with no backtrace.

We are using underscored attributes (e.g. @_unsafeInheritExecutor, @_inheritActorContext) for various correctness reasons, so it may well be that our use of them produces invalid code that triggers a runtime crash. Since we lack good alternatives to those attributes in many cases, I'm mainly trying to track down the issue at this point before trying to work around them.

The issue was also present in Swift 5.7, but only in debug mode. It doesn't seem to be a race condition – it repros consistently with LIBDISPATCH_COOPERATIVE_POOL_STRICT=1.

I've tried building a custom Swift toolchain with SWIFT_TASK_DEBUG_LOG enabled but haven't been able to get any output; my assumption is the OS concurrency runtime is being used instead and I haven't been able to override it.

Any advice on how to track this down would be appreciated.

The crash log varies, but typically looks like:

Code Type:             X86-64 (Native)
Parent Process:        launchd [1]
User ID:               501

Date/Time:             2023-04-03 13:01:49.1528 +1200
OS Version:            macOS 13.3 (22E252)
Report Version:        12
Bridge OS Version:     7.4 (20P4252)
Anonymous UUID:        3A7FC9C4-0015-B179-3B95-9B7A37ED5B97

Sleep/Wake UUID:       8E109BB7-819E-4649-B899-BF51CEBA2075

Time Awake Since Boot: 400000 seconds

System Integrity Protection: enabled

Crashed Thread:        18  Dispatch queue: com.apple.root.default-qos.cooperative

Exception Type:        EXC_BAD_ACCESS (SIGBUS)
Exception Codes:       KERN_PROTECTION_FAILURE at 0x00007fd5cb773750
Exception Codes:       0x0000000000000002, 0x00007fd5cb773750

Termination Reason:    Namespace SIGNAL, Code 10 Bus error: 10
Terminating Process:   exc handler [9694]

VM Region Info: 0x7fd5cb773750 is in 0x7fd5cb700000-0x7fd5cb800000;  bytes after start: 472912  bytes before end: 575663
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      MALLOC_MEDIUM (reserved) 7fd5c7800000-7fd5c8000000 [ 8192K] rw-/rwx SM=NUL  ...(unallocated)
      GAP OF 0x3700000 BYTES
--->  MALLOC_TINY              7fd5cb700000-7fd5cb800000 [ 1024K] rw-/rwx SM=PRV  
      MALLOC_SMALL             7fd5cb800000-7fd5cc000000 [ 8192K] rw-/rwx SM=PRV  

Thread 0::  Dispatch queue: com.apple.main-thread
0   SwiftUI                       	    0x7ff91cdc3531 0x7ff91bf6c000 + 15037745
1   SwiftUI                       	    0x7ff91cdc46f4 0x7ff91bf6c000 + 15042292
2   SwiftUI                       	    0x7ff91d2e1bb6 0x7ff91bf6c000 + 20405174
3   SwiftUI                       	    0x7ff91d058637 0x7ff91bf6c000 + 17745463
4   SwiftUI                       	    0x7ff91c7f399c 0x7ff91bf6c000 + 8944028
5   SwiftUI                       	    0x7ff91cf0ac25 0x7ff91bf6c000 + 16378917
6   SwiftUI                       	    0x7ff91cf0a116 0x7ff91bf6c000 + 16376086
7   SwiftUI                       	    0x7ff91c927b31 0x7ff91bf6c000 + 10206001
8   SwiftUI                       	    0x7ff91cf0ac25 0x7ff91bf6c000 + 16378917
9   SwiftUI                       	    0x7ff91cf0a116 0x7ff91bf6c000 + 16376086
10  SwiftUI                       	    0x7ff91c7f39bb 0x7ff91bf6c000 + 8944059
11  SwiftUI                       	    0x7ff91cf0ac25 0x7ff91bf6c000 + 16378917
12  SwiftUI                       	    0x7ff91cf0a116 0x7ff91bf6c000 + 16376086
13  SwiftUI                       	    0x7ff91c7f39bb 0x7ff91bf6c000 + 8944059
14  SwiftUI                       	    0x7ff91cf0ac25 0x7ff91bf6c000 + 16378917
15  SwiftUI                       	    0x7ff91cf0a116 0x7ff91bf6c000 + 16376086
16  SwiftUI                       	    0x7ff91c637405 0x7ff91bf6c000 + 7123973
17  SwiftUI                       	    0x7ff91c636fb2 0x7ff91bf6c000 + 7122866
18  SwiftUI                       	    0x7ff91d03a711 0x7ff91bf6c000 + 17622801
19  SwiftUI                       	    0x7ff91d03a8a2 0x7ff91bf6c000 + 17623202
20  AppKit                        	    0x7ff814e8c2cf -[NSView hitTest:] + 403
21  AppKit                        	    0x7ff814e8be37 -[NSThemeFrame _hitTest:ignoringResizeRegion:] + 135
22  AppKit                        	    0x7ff8153a33ba -[_NSTrackingAreaAKManager setCursorForMouseLocation:] + 897
23  AppKit                        	    0x7ff8153a2b0e -[_NSTrackingAreaAKManager displayCycleUpdateStructuralRegions] + 339
24  AppKit                        	    0x7ff814dbb07e __NSWindowGetDisplayCycleObserverForUpdateStructuralRegions_block_invoke + 390
25  AppKit                        	    0x7ff814db5d1c NSDisplayCycleObserverInvoke + 142
26  AppKit                        	    0x7ff814db594c NSDisplayCycleFlush + 878
27  QuartzCore                    	    0x7ff8194fd5c6 CA::Transaction::run_commit_handlers(CATransactionPhase) + 98
28  QuartzCore                    	    0x7ff8194fc0be CA::Transaction::commit() + 372
29  AppKit                        	    0x7ff814e5314f __62+[CATransaction(NSCATransaction) NS_setFlushesWithDisplayLink]_block_invoke + 285
30  AppKit                        	    0x7ff815665ea8 ___NSRunLoopObserverCreateWithHandler_block_invoke + 41
31  CoreFoundation                	    0x7ff811c63584 __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 23
32  CoreFoundation                	    0x7ff811c634ab __CFRunLoopDoObservers + 482
33  CoreFoundation                	    0x7ff811c62a36 __CFRunLoopRun + 859
34  CoreFoundation                	    0x7ff811c62071 CFRunLoopRunSpecific + 560
35  HIToolbox                     	    0x7ff81b6cafcd RunCurrentEventLoopInMode + 292
36  HIToolbox                     	    0x7ff81b6cadde ReceiveNextEventCommon + 657
37  HIToolbox                     	    0x7ff81b6cab38 _BlockUntilNextEventMatchingListInModeWithFilter + 64
38  AppKit                        	    0x7ff814cf47a0 _DPSNextEvent + 858
39  AppKit                        	    0x7ff814cf364a -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1214
40  AppKit                        	    0x7ff814ce5cb8 -[NSApplication run] + 586
41  AppKit                        	    0x7ff814cb9ed2 NSApplicationMain + 817
42  Application                   	       0x100b4d099 main + 9 (AppDelegate.swift:27)
43  dyld                          	    0x7ff81182e41f start + 1903

Thread 1:
0   dyld                          	    0x7ff811875332 DyldSharedCache::forEachImageTextSegment(void (unsigned long long, unsigned long long, unsigned char const*, char const*, bool&) block_pointer) const + 118
1   dyld                          	    0x7ff81186033c dyld4::APIs::findImageMappedAt(void const*, dyld3::MachOLoaded const**, bool*, char const**, void const**, unsigned long long*, unsigned char*) + 398
2   dyld                          	    0x7ff811862515 dyld4::APIs::_dyld_images_for_addresses(unsigned int, void const**, dyld_image_uuid_offset*) + 183
3   libsystem_c.dylib             	    0x7ff811a454a1 backtrace_image_offsets + 77
4   libsystem_trace.dylib         	    0x7ff811911ea4 os_log_backtrace_create_from_pcs + 99
5   libsystem_trace.dylib         	    0x7ff811911dfb os_log_backtrace_create_from_return_address + 122
6   libsystem_trace.dylib         	    0x7ff81190c944 _os_log_impl_flatten_and_send + 8477
7   libsystem_trace.dylib         	    0x7ff81191bdc0 __os_signpost_emit_impl + 216
8   libsystem_trace.dylib         	    0x7ff81190e39d _os_signpost_emit_with_name_impl + 21
9   libswift_Concurrency.dylib    	    0x7ffc16f6e480 swift_task_enqueueGlobal + 160
10  libswift_Concurrency.dylib    	    0x7ffc16f6fdf1 swift_task_create_commonImpl(unsigned long, swift::TaskOptionRecord*, swift::TargetMetadata<swift::InProcess> const*, void (swift::AsyncContext* swift_async_context) swiftasynccall*, void*, unsigned long) + 1633
11  Substrate       	       0x109ffc04f specialized Task<>.init(priority:operation:) + 154 [inlined]
12  Substrate       	       0x109ffc04f MetalFunctionCache.function(for:) + 1007 (MetalCaches.swift:31)
13  Substrate       	       0x10a0110c1 MTLRenderPipelineDescriptor.init(_:functionCache:) + 1 (MetalDescriptors.swift:321)
14  Substrate       	       0x109ffd121 closure #1 in MetalRenderPipelineCache.subscript.getter + 1 (MetalCaches.swift:78)
15  Substrate       	       0x10a002231 partial apply for closure #1 in MetalRenderPipelineCache.subscript.getter + 1

Thread 2:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 3:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 4:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 5:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 6:
0   libsystem_malloc.dylib        	    0x7ff8119b9ff0 nanov2_calloc + 392
1   CoreFoundation                	    0x7ff811beb251 _CFRuntimeCreateInstance + 395
2   CoreFoundation                	    0x7ff811bea981 __CFStringCreateImmutableFunnel3 + 2145
3   CoreFoundation                	    0x7ff811bf7b2d CFStringCreateWithBytes + 27
4   Foundation                    	    0x7ff812afcc7b +[NSString stringWithCString:encoding:] + 69
5   Metal                         	    0x7ff81b23c9e1 DeserialContext::stringFromSerializedData() + 77
6   Metal                         	    0x7ff81b2ababe deserializeArguments(id<MTLDeviceSPI>, DeserialContext&, bool, MTLBindingInternal**&, std::__1::unordered_map<unsigned int, MTLStructTypeInternal*, MTLBindingInternal**&::hash<unsigned int>, MTLBindingInternal**&::equal_to<unsigned int>, MTLBindingInternal**&::allocator<MTLBindingInternal**&::pair<unsigned int const, MTLStructTypeInternal>>>&, unsigned int, bool) + 288
7   Metal                         	    0x7ff81b2ab973 MTLArgumentListReader::deserializeArguments(id<MTLDeviceSPI>, DeserialContext&, bool) + 81
8   Metal                         	    0x7ff81b2ad000 MTLFragmentReflectionReader::deserialize(id<MTLDeviceSPI>, NSObject<OS_dispatch_data>*) + 176
9   Metal                         	    0x7ff81b305ede -[MTLFunctionReflectionInternal initWithDevice:reflectionData:functionType:options:] + 416
10  Metal                         	    0x7ff81b310d02 __53-[_MTLFunction reflectionWithOptions:binaryArchives:]_block_invoke + 74
11  Metal                         	    0x7ff81b316543 -[MTLCompiler compileFunctionRequestInternal:frameworkLinking:linkDataSize:reflectionOnly:completionHandler:] + 1733
12  Metal                         	    0x7ff81b31b364 -[MTLCompiler reflectionWithFunction:options:sync:pipelineLibrary:binaryArchives:completionHandler:] + 380
13  Metal                         	    0x7ff81b31b1c6 -[MTLCompiler reflectionWithFunction:options:sync:binaryArchives:completionHandler:] + 27
14  Metal                         	    0x7ff81b310c71 -[_MTLFunction reflectionWithOptions:binaryArchives:] + 196
15  Metal                         	    0x7ff81b310fcf -[_MTLFunction newArgumentEncoderWithBufferIndex:reflection:binaryArchives:] + 45
16  Substrate       	       0x10a01cceb specialized static MetalPipelineReflection.fillCaches(function:argument:stages:bindingPathCache:reflectionCache:argumentEncoders:) + 7323 (MetalPipelineReflection.swift:201)
17  Substrate       	       0x10a01d127 static MetalPipelineReflection.fillCaches(function:argument:stages:bindingPathCache:reflectionCache:argumentEncoders:) + 26 [inlined]
18  Substrate       	       0x10a01d127 closure #2 in MetalPipelineReflection.init(threadExecutionWidth:vertexFunction:fragmentFunction:renderState:renderReflection:) + 61 (MetalPipelineReflection.swift:97) [inlined]
19  Substrate       	       0x10a01d127 specialized Sequence.forEach(_:) + 61 [inlined]
20  Substrate       	       0x10a01d127 specialized MetalPipelineReflection.__allocating_init(threadExecutionWidth:vertexFunction:fragmentFunction:renderState:renderReflection:) + 855 (MetalPipelineReflection.swift:96)
21  Substrate       	       0x109ffd2b4 MetalPipelineReflection.__allocating_init(threadExecutionWidth:vertexFunction:fragmentFunction:renderState:renderReflection:) + 26 [inlined]
22  Substrate       	       0x109ffd2b4 closure #1 in MetalRenderPipelineCache.subscript.getter + 292 (MetalCaches.swift:86)
23  Substrate       	       0x10a002231 partial apply for closure #1 in MetalRenderPipelineCache.subscript.getter + 1

Thread 7:
0   libsystem_kernel.dylib        	    0x7ff811b495b2 mach_msg2_trap + 10
1   libsystem_kernel.dylib        	    0x7ff811b5772d mach_msg2_internal + 78
2   libsystem_kernel.dylib        	    0x7ff811b505e4 mach_msg_overwrite + 692
3   libsystem_kernel.dylib        	    0x7ff811b4989a mach_msg + 19
4   Application                   	       0x10126ad85 exception_server_thread + 154 (PLCrashMachExceptionServer.m:674)
5   libsystem_pthread.dylib       	    0x7ff811b881d3 _pthread_start + 125
6   libsystem_pthread.dylib       	    0x7ff811b83bd3 thread_start + 15

Thread 8:: com.apple.NSEventThread
0   libsystem_kernel.dylib        	    0x7ff811b495b2 mach_msg2_trap + 10
1   libsystem_kernel.dylib        	    0x7ff811b5772d mach_msg2_internal + 78
2   libsystem_kernel.dylib        	    0x7ff811b505e4 mach_msg_overwrite + 692
3   libsystem_kernel.dylib        	    0x7ff811b4989a mach_msg + 19
4   CoreFoundation                	    0x7ff811c641af __CFRunLoopServiceMachPort + 145
5   CoreFoundation                	    0x7ff811c62c30 __CFRunLoopRun + 1365
6   CoreFoundation                	    0x7ff811c62071 CFRunLoopRunSpecific + 560
7   AppKit                        	    0x7ff814e54909 _NSEventThread + 132
8   libsystem_pthread.dylib       	    0x7ff811b881d3 _pthread_start + 125
9   libsystem_pthread.dylib       	    0x7ff811b83bd3 thread_start + 15

Thread 9:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 10:: com.apple.NSURLConnectionLoader
0   libsystem_kernel.dylib        	    0x7ff811b495b2 mach_msg2_trap + 10
1   libsystem_kernel.dylib        	    0x7ff811b5772d mach_msg2_internal + 78
2   libsystem_kernel.dylib        	    0x7ff811b505e4 mach_msg_overwrite + 692
3   libsystem_kernel.dylib        	    0x7ff811b4989a mach_msg + 19
4   CoreFoundation                	    0x7ff811c641af __CFRunLoopServiceMachPort + 145
5   CoreFoundation                	    0x7ff811c62c30 __CFRunLoopRun + 1365
6   CoreFoundation                	    0x7ff811c62071 CFRunLoopRunSpecific + 560
7   CFNetwork                     	    0x7ff8165decba 0x7ff8163a0000 + 2354362
8   Foundation                    	    0x7ff812ae67b3 __NSThread__start__ + 1009
9   libsystem_pthread.dylib       	    0x7ff811b881d3 _pthread_start + 125
10  libsystem_pthread.dylib       	    0x7ff811b83bd3 thread_start + 15

Thread 11:: CVDisplayLink
0   libsystem_kernel.dylib        	    0x7ff811b4c0ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff811b8878d _pthread_cond_wait + 1295
2   CoreVideo                     	    0x7ff819c9a1cc CVDisplayLink::waitUntil(unsigned long long) + 370
3   CoreVideo                     	    0x7ff819c9914c CVDisplayLink::runIOThread() + 526
4   libsystem_pthread.dylib       	    0x7ff811b881d3 _pthread_start + 125
5   libsystem_pthread.dylib       	    0x7ff811b83bd3 thread_start + 15

Thread 12:: com.apple.CFStream.LegacyThread
0   libsystem_kernel.dylib        	    0x7ff811b495b2 mach_msg2_trap + 10
1   libsystem_kernel.dylib        	    0x7ff811b5772d mach_msg2_internal + 78
2   libsystem_kernel.dylib        	    0x7ff811b505e4 mach_msg_overwrite + 692
3   libsystem_kernel.dylib        	    0x7ff811b4989a mach_msg + 19
4   CoreFoundation                	    0x7ff811c641af __CFRunLoopServiceMachPort + 145
5   CoreFoundation                	    0x7ff811c62c30 __CFRunLoopRun + 1365
6   CoreFoundation                	    0x7ff811c62071 CFRunLoopRunSpecific + 560
7   CoreFoundation                	    0x7ff811cdb58f _legacyStreamRunLoop_workThread + 251
8   libsystem_pthread.dylib       	    0x7ff811b881d3 _pthread_start + 125
9   libsystem_pthread.dylib       	    0x7ff811b83bd3 thread_start + 15

Thread 13:
0   libsystem_kernel.dylib        	    0x7ff811b4ae62 __read_nocancel + 10
1   libsystem_c.dylib             	    0x7ff811a3cd19 _sread + 16
2   libsystem_c.dylib             	    0x7ff811a3ccc8 __srefill1 + 24
3   libsystem_c.dylib             	    0x7ff811a628f9 _fseeko + 836
4   libsystem_c.dylib             	    0x7ff811a44de2 fseek + 74
5   Metal                         	    0x7ff81b2321ee LibraryWithFile::setPosition(unsigned long long) + 28
6   Metal                         	    0x7ff81b3066aa invocation function for block in MTLLibraryDataWithArchive::validateBitCode(unsigned long, unsigned long, NSData const*, MTLUINT256_t const&) + 55
7   libdispatch.dylib             	    0x7ff8119e7033 _dispatch_client_callout + 8
8   libdispatch.dylib             	    0x7ff8119f4ba1 _dispatch_lane_barrier_sync_invoke_and_complete + 60
9   Metal                         	    0x7ff81b306652 MTLLibraryDataWithArchive::validateBitCode(unsigned long, unsigned long, NSData const*, MTLUINT256_t const&) + 124
10  Metal                         	    0x7ff81b2422a0 MTLCompilerFunctionRequest::serializedRequest(unsigned int, char**) + 1162
11  Metal                         	    0x7ff81b2416b2 XPCCompilerConnection::BuildRequestInternal(MTLCompilerRequest*, char const*, NSObject<OS_dispatch_data>*, int, bool, void (unsigned int, void const*, unsigned long, char const*) block_pointer) + 206
12  Metal                         	    0x7ff81b2c4591 invocation function for block in XPCCompilerConnection::BuildRequest(MTLCompilerRequest*, char const*, NSObject<OS_dispatch_data>*, int, bool, void (unsigned int, void const*, unsigned long, char const*) block_pointer) + 61
13  libdispatch.dylib             	    0x7ff8119e7033 _dispatch_client_callout + 8
14  libdispatch.dylib             	    0x7ff8119f4ba1 _dispatch_lane_barrier_sync_invoke_and_complete + 60
15  Metal                         	    0x7ff81b24158e XPCCompilerConnection::BuildRequest(MTLCompilerRequest*, char const*, NSObject<OS_dispatch_data>*, int, bool, void (unsigned int, void const*, unsigned long, char const*) block_pointer) + 116
16  Metal                         	    0x7ff81b241373 MTLCompilerConnectionManagerPrivate::buildRequest(unsigned int, MTLCompilerRequest*, bool, void (MTLCompilerError, NSObject<OS_dispatch_data>*, char const*) block_pointer) + 467
17  Metal                         	    0x7ff81b317d64 -[MTLCompiler compileFunctionRequestInternal:frameworkLinking:linkDataSize:reflectionOnly:completionHandler:] + 7910
18  Metal                         	    0x7ff81b32597a -[MTLCompiler computeVariantEntryWithDescriptor:options:serializedComputeDataDescriptor:asyncCompile:pipelineCache:destinationBinaryArchive:computeProgram:kernelDriverCompileTimeData:compileTimeStatistics:] + 1865
19  Metal                         	    0x7ff81b326afa -[MTLCompiler newComputePipelineStateWithDescriptorInternal:options:pipelineCache:destinationBinaryArchive:reflection:error:completionHandler:] + 350
20  Metal                         	    0x7ff81b23d6b2 -[MTLCompiler newComputePipelineStateWithDescriptor:options:reflection:error:completionHandler:] + 127
21  Metal                         	    0x7ff81b2bc912 -[_MTLDevice newComputePipelineStateWithDescriptor:options:reflection:error:] + 76
22  Substrate       	       0x109ffeced closure #1 in MetalComputePipelineCache.subscript.getter + 461 (MetalCaches.swift:173)
23  Substrate       	       0x10a002211 partial apply for closure #1 in MetalComputePipelineCache.subscript.getter + 1

Thread 14:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 15:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 16:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 17:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 18 Crashed::  Dispatch queue: com.apple.root.default-qos.cooperative
0   ???                           	    0x7fd5cb773750 ???

Thread 19:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 20:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 21:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 22:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0

Thread 23:: CVDisplayLink
0   libsystem_kernel.dylib        	    0x7ff811b4c0ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff811b8878d _pthread_cond_wait + 1295
2   CoreVideo                     	    0x7ff819c9a1cc CVDisplayLink::waitUntil(unsigned long long) + 370
3   CoreVideo                     	    0x7ff819c9914c CVDisplayLink::runIOThread() + 526
4   libsystem_pthread.dylib       	    0x7ff811b881d3 _pthread_start + 125
5   libsystem_pthread.dylib       	    0x7ff811b83bd3 thread_start + 15

Thread 24:
0   libsystem_pthread.dylib       	    0x7ff811b83bb0 start_wqthread + 0


Thread 18 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x0000000000000000  rcx: 0x00007fd56c00b040  rdx: 0x0000000000000082
  rdi: 0x00007fd5cb773750  rsi: 0x00007fd56c00b040  rbp: 0x0000000000000000  rsp: 0x0000700010666ec0
   r8: 0x00007ff8534d02b0   r9: 0x0000000000000082  r10: 0x0000000104195000  r11: 0xffff9000f3b2f23c
  r12: 0x00007ff85525f480  r13: 0x0000000008000000  r14: 0x00007ff85525f480  r15: 0x00007ffc51f2b610
  rip: 0x00007fd5cb773750  rfl: 0x0000000000010206  cr2: 0x00007fd5cb773750
  
Logical CPU:     6
Error Code:      0x00000015 (invalid protections for user instruction read)
Trap Number:     14

The issue appears to be Intel only; the crash doesn't occur on Apple Silicon.
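
As a reading aid for the thread state above: on x86-64, trap number 14 is a page fault and the error code is a bit field. The sketch below decodes 0x15 using the standard x86 page-fault bit layout; the function and flag names are illustrative, not something the crash reporter provides. Decoded, it describes a user-mode instruction fetch from a page that is mapped but not executable, which fits rip and cr2 both being 0x00007fd5cb773750 and the MALLOC_TINY region being mapped rw-.

```python
# Standard x86 page-fault error-code bits (trap 14); the names are illustrative.
PF_BITS = {
    0x01: "protection_violation",  # set: page present, but access not permitted
    0x02: "write",                 # set: write access; clear: read or fetch
    0x04: "user_mode",             # set: fault raised while in user mode
    0x08: "reserved_bit",          # set: reserved bit found in a page-table entry
    0x10: "instruction_fetch",     # set: fault raised by an instruction fetch
}

def decode_page_fault_error(code):
    """Return the names of the flags set in an x86 page-fault error code."""
    return [name for bit, name in PF_BITS.items() if code & bit]

# Values copied from the crash log above.
ERROR_CODE = 0x00000015
RIP = 0x00007FD5CB773750
CR2 = 0x00007FD5CB773750  # faulting address

flags = decode_page_fault_error(ERROR_CODE)
print(flags)
print("fault is on the instruction pointer itself:", RIP == CR2)
```

In other words, the CPU was asked to execute from heap memory, so the interesting question is what loaded that address into rip rather than what the bytes at that address are.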

The crash address lies in memory allocated by malloc within swift_task_create_commonImpl. For a different instance of the crash from the one above, malloc_history shows:

VM_ALLOC 0x14c926000-0x14c926fff [size=4096]:  
0x10fdfc927 (...) specialized RenderGraph.init(inflightFrameCount:transientBufferCapacity:transientTextureCapacity:transientArgumentBufferArrayCapacity:)  RenderGraph.swift:804 
0x10fdfc5bc (...) specialized RenderGraphContextImpl.init(backend:inflightFrameCount:transientRegistryIndex:)  RenderGraphContext.swift:55 
0x10fbb840e (...) TaskStream.init(priority:)  TaskStream.swift:30 
0x7ffc0a6e99ff (libswift_Concurrency.dylib) swift_task_create_commonImpl(unsigned long, swift::TaskOptionRecord*, swift::TargetMetadata<swift::InProcess> const*, void (swift::AsyncContext* swift_async_context) swiftasynccall*, void*, unsigned long) 
0x7ff805150149 (libsystem_malloc.dylib) _malloc_zone_malloc_instrumented_or_legacy | 0x10be385b1 (libgmalloc.dylib) GuardMalloc_mallocInternal 
0x7ff8052c50df (libsystem_kernel.dylib) vm_allocate 
0x7ff8052c5160 (libsystem_kernel.dylib) mach_vm_allocate 

ALLOC 0x14c926ee0-0x14c926fff [size=288]:  
0x10fdfc927 (...) specialized RenderGraph.init(inflightFrameCount:transientBufferCapacity:transientTextureCapacity:transientArgumentBufferArrayCapacity:)  RenderGraph.swift:804 
0x10fdfc5bc (com.livesurface.LiveSurfaceSDK) specialized RenderGraphContextImpl.init(backend:inflightFrameCount:transientRegistryIndex:)  RenderGraphContext.swift:55 
0x10fbb840e (...) TaskStream.init(priority:)  TaskStream.swift:30 
0x7ffc0a6e99ff (libswift_Concurrency.dylib) swift_task_create_commonImpl(unsigned long, swift::TaskOptionRecord*, swift::TargetMetadata<swift::InProcess> const*, void (swift::AsyncContext* swift_async_context) swiftasynccall*, void*, unsigned long)
0x7ff8051501c8 (libsystem_malloc.dylib) _malloc_zone_malloc_instrumented_or_legacy 

I managed to get the app running against a concurrency runtime with SWIFT_TASK_DEBUG_LOG enabled; the output before the crash looks something like this:

...
[0x70000b312000] [<swift-source>/Concurrency/Task.cpp:299](completeFuture) waking task 0x7facfe269eb0 from future of task 0x7facfe65d7f0
[0x70000b312000] [<swift-source>/Concurrency/TaskPrivate.h:741](flagAsAndEnqueueOnExecutor) 0x7facfe269eb0->flagAsAndEnqueueOnExecutor()
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1791](swift_task_enqueueImpl) enqueue job 0x7facfe269eb0 on executor 0x0
[0x70000b312000] [<swift-source>/Concurrency/VoucherSupport.h:46](leave) [0x70000b311e88] Restoring original voucher 0x0
[0x70000b312000] [<swift-source>/Concurrency/VoucherSupport.h:36](VoucherManager) [0x70000b311e88] Constructing VoucherManager
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1559](swift_job_runImpl) job 0x7facfe269eb0
[0x70000b312000] [<swift-source>/Concurrency/TaskPrivate.h:676](flagAsRunning) 0x7facfe269eb0->flagAsRunning()
[0x70000b312000] [<swift-source>/Concurrency/VoucherSupport.h:65](swapToJob) [0x70000b311e88] Swapping jobs to 0x7facfe269eb0
[0x70000b312000] [<swift-source>/Concurrency/VoucherSupport.h:73](swapToJob) [0x70000b311e88] Swapping jobs to 0x7facfe269eb0, adopting voucher 0x0
[0x70000b312000] [<swift-source>/Concurrency/VoucherSupport.h:91](swapToJob) [0x70000b311e88] Saved original voucher 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x6000020640a0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1385](tryLock) Thread attempting to jump onto 0x6000020640a0, as drainer = 0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1087](traceActorStateTransition) Actor 0x6000020640a0 transitioned from 0 to 0x2 (traceActorStateTransition)
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1761](swift_task_switchImpl) switch succeeded, task 0x7facfe269eb0 assumed thread for executor 0x6000020640a0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1656](giveUpThreadForSwitch) Giving up current generic executor 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x6000020640a0 to 0x6000020640a0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x6000020640a0 to 0x6000020640a0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x6000020640a0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1761](swift_task_switchImpl) switch succeeded, task 0x7facfe269eb0 assumed thread for executor 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1468](unlock) Try unlock-ing actor 0x6000020640a0 with forceUnlock = 1
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1087](traceActorStateTransition) Actor 0x6000020640a0 transitioned from 0x2 to 0 (traceActorStateTransition)
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1531](unlock) Actor 0x6000020640a0 is idle now
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x0 to 0x6000020640a0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1385](tryLock) Thread attempting to jump onto 0x6000020640a0, as drainer = 0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1087](traceActorStateTransition) Actor 0x6000020640a0 transitioned from 0 to 0x2 (traceActorStateTransition)
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1761](swift_task_switchImpl) switch succeeded, task 0x7facfe269eb0 assumed thread for executor 0x6000020640a0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1656](giveUpThreadForSwitch) Giving up current generic executor 0x0
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1739](swift_task_switchImpl) Task 0x7facfe269eb0 trying to switch from executor 0x6000020640a0 to 0x6000020640a0
[0x70000b312000] [<swift-source>/Concurrency/Task.cpp:742](swift_task_create_commonImpl) Creating a detached task from 0x7facfe269eb0
[0x70000b312000] [<swift-source>/Concurrency/Task.cpp:805](swift_task_create_commonImpl) Task's base priority = 0x15
[0x70000b312000] [<swift-source>/Concurrency/Task.cpp:852](swift_task_create_commonImpl) allocate task 0x7facfd784b10, parent = 0x0, slab 512
[0x70000b312000] [<swift-source>/Concurrency/Task.cpp:932](swift_task_create_commonImpl) creating task 0x7facfd784b10 ID 684 with parent 0x0 at base pri 21
[0x70000b312000] [<swift-source>/Concurrency/TaskPrivate.h:741](flagAsAndEnqueueOnExecutor) 0x7facfd784b10->flagAsAndEnqueueOnExecutor()
[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1791](swift_task_enqueueImpl) enqueue job 0x7facfd784b10 on executor 0x0
[0x70000b312000] [<swift-source>/Concurrency/Task.cpp:133](waitFuture) task 0x7facfe269eb0 waiting on task 0x7facfd784b10, going to sleep
[0x70000b312000] [<swift-source>/Concurrency/TaskPrivate.h:791](flagAsSuspended) 0x7facfe269eb0->flagAsSuspended()
[0x70000b312000] [<swift-source>/Concurrency/VoucherSupport.h:103](restoreVoucher) [0x70000b311e88] Restoring voucher on task 0x7facfe269eb0
[0x70000b312000] [<swift-source>/Concurrency/TaskStatus.cpp:565](swift_task_escalateImpl) Escalating 0x7facfd784b10 to 0x15 priority
[0x70000b312000] [<swift-source>/Concurrency/TaskStatus.cpp:573](swift_task_escalateImpl) Task is already at 0x15 priority

before crashing with stack frame 0 = 0x7facfe269eb0, stack frame 1 = 0x70000b311ed0 (with jo 0x7facfe269eb0 as the first instruction from stack frame 1).

The assembly at the crashed stack frame is:

addb   %bl, 0x111(%rsi,%rbp)
addb   %al, (%rbx)
addb   %al, (%rax)
addb   %al, (%rax,%rax)
addb   %al, (%rax)
addb   %al, (%rax)
addb   %al, (%rax)
addb   %al, (%rax)
addb   %al, (%rax)
addl   %eax, (%rax)
addb   %al, (%rax)
addb   %al, (%rax)
addb   %al, (%rax)

The task that gets created and waited on (0x7facfd784b10) is created on line 111 of the gist "MetalStateCaches for concurrency debugging" on GitHub. When the crash occurs (which is only after this function has been called a few times), the created task appears never to be switched to.
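
One way to check the "never switched to" observation across a whole log, rather than by eye, is to pair each enqueue line with a later swift_job_runImpl line. This is a minimal sketch that assumes the log lines keep the format shown in the excerpt above; the helper name jobs_never_run is made up for illustration.

```python
import re

# Patterns based on the SWIFT_TASK_DEBUG_LOG lines quoted above (assumed stable).
ENQUEUE = re.compile(r"\(swift_task_enqueueImpl\) enqueue job (0x[0-9a-f]+)")
RUN = re.compile(r"\(swift_job_runImpl\) job (0x[0-9a-f]+)")

def jobs_never_run(lines):
    """Map each job whose latest enqueue was never followed by a run
    to the (1-based) line number of that enqueue."""
    pending = {}
    for number, line in enumerate(lines, start=1):
        enqueued = ENQUEUE.search(line)
        if enqueued:
            pending[enqueued.group(1)] = number
            continue
        ran = RUN.search(line)
        if ran:
            pending.pop(ran.group(1), None)
    return pending

# Three lines copied from the excerpt above.
sample = [
    "[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1791](swift_task_enqueueImpl) enqueue job 0x7facfe269eb0 on executor 0x0",
    "[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1559](swift_job_runImpl) job 0x7facfe269eb0",
    "[0x70000b312000] [<swift-source>/Concurrency/Actor.cpp:1791](swift_task_enqueueImpl) enqueue job 0x7facfd784b10 on executor 0x0",
]

print(jobs_never_run(sample))
```

Run over the full log (for example by passing an open file object), any job left in the result at the point of the crash was enqueued but never picked up, which narrows down where to set breakpoints.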

Since this is going down something of a rabbit-hole, for anyone who has experience in this area and happens to have a moment: would you have any advice on how I might further track this down or what appears to be happening here? Any help would be much appreciated.

This all would be good to report as an issue on GitHub so it doesn’t get lost and has a chance of getting looked at. Mind opening an issue?

I've filed Concurrency-Related Crash · Issue #64966 · apple/swift · GitHub. I was initially a little hesitant to because we're using some technically unsupported attributes (e.g. @_unsafeInheritExecutor), but as far as I can tell the crash doesn't seem to be connected to their use.

That assembly corresponds to a bunch of mostly zero bytes; it’s not actually code. The process is crashing because it did a wild jump, which suggests that memory got corrupted somehow.
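
To make that concrete: on x86-64 the byte pair 00 00 decodes as addb %al, (%rax), so a run of that instruction is what zeroed memory looks like when a disassembler is pointed at it. The sketch below re-encodes the listing above by hand into 32 bytes and counts the zeros. The byte values are a reconstruction from the standard x86 instruction encoding, not a memory dump from the crashed process, so treat them as an assumption; zeroByteCount is a hypothetical helper name.

```swift
// Bytes re-encoded by hand from the disassembly listing (an assumption, not a dump).
let bytes: [UInt8] = [
    0x00, 0x9c, 0x2e, 0x11, 0x01, 0x00, 0x00, // addb %bl, 0x111(%rsi,%rbp)
    0x00, 0x03,                               // addb %al, (%rbx)
    0x00, 0x00,                               // addb %al, (%rax)
    0x00, 0x04, 0x00,                         // addb %al, (%rax,%rax)
    0x00, 0x00, 0x00, 0x00, 0x00,             // addb %al, (%rax), five times:
    0x00, 0x00, 0x00, 0x00, 0x00,             //   ten zero bytes in total
    0x01, 0x00,                               // addl %eax, (%rax)
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00,       // addb %al, (%rax), three times
]

/// Counts how many bytes in the buffer are zero.
func zeroByteCount(_ bytes: [UInt8]) -> Int {
    bytes.filter { $0 == 0 }.count
}

print("\(zeroByteCount(bytes)) of \(bytes.count) bytes are zero")
```

Under that reconstruction, 25 of the 32 bytes are zero, which is what data looks like rather than compiled code.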

Thanks; that's useful to have confirmed. The address of that assembly is consistently the start of the malloc block allocated for the waiting task, so my assumption is that it corresponds to the hopefully-intact task header. The assembly of the calling stack frame (frame 1 in the backtrace) looks similar; for the sake of completeness, it's the following for the latest run:

->  0x7000082a2ed0: jo     0x7000082a2f01
    0x7000082a2ed2: subb   (%rax), %cl
    0x7000082a2ed4: addb   %dh, (%rax)
    0x7000082a2ed7: addb   %bl, 0x10672b7(%rdi)
    0x7000082a2edd: addb   %al, (%rax)
    0x7000082a2edf: addb   %al, (%rax)
    0x7000082a2ee1: addb   %al, (%rax)
    0x7000082a2ee3: addb   %al, (%rax)
    0x7000082a2ee5: addb   %al, (%rax)
    0x7000082a2ee7: addb   %al, (%rax)
    0x7000082a2ee9: addb   %al, (%rax)
    0x7000082a2eeb: addb   %al, (%rax)
    0x7000082a2eed: addb   %al, (%rax)

vmmap shows that 0x7000082a2ed0 is stack memory for a thread (vmmap uses different thread numbers than Xcode so it's hard to tell which thread it corresponds to), whereas 0x7000082a2f01 is task memory. Both addresses shift between runs.

In terms of memory corruption, none of ASAN, Malloc Scribble, Malloc Guard Edges, or Guard Malloc picks anything up (although that obviously doesn't rule it out).

Is there a particular function that would be responsible for the jump that I could try to get more debugging information from (e.g. try to isolate the jump address)? I'm guessing we're not looking at a regular function call due to the lack of backtraces.

Yeah, usually to debug a wild jump you have to trace forward from a point where control flow was still a-okay. Stack memory is not mapped executable, so there’s no way it was actually running any of that; either the debugger is confused about frames (not unlikely) or the caller’s frame has been smashed.

In case it's useful to anyone, I made a screen recording stepping through the disassembly in the debugger up to the point of the crash, starting from the swift::AsyncTask::waitFuture call in line 119 of MetalStateCaches for concurrency debugging · GitHub ( _ = try await renderStateTask.value). The actual crash occurs in swift::runJobInEstablishedExecutorContext; the return address appears to be invalid. I haven't had a chance to look in any more depth than that yet but intend to do so next week.

I've managed to narrow down the problem, although I don't have an isolated test case; the bug seems to be fairly brittle. The issue appears to be memory corruption related to captures in an async closure (where I'm capturing by value a number of variables that were vars in the enclosing scope). Concretely, on x86_64 but not ARM, this crashes at the next await call that suspends the current task:

renderGraph.addDrawCallbackPass(renderTarget: renderTarget, 
                                reflection: ResolveAccumulatedTargetPassReflection.self, 
                                { [uvToNDC, imageUVToSceneUV, filmicToneMapUniforms, vignetteUniforms, colorGradedSceneRegionBackgroundColor] encoder in
    ...
    await Task.yield() // crash
    ...
})

Moving uvToNDC and imageUVToSceneUV into a tuple works around it (doesn't crash):

let arguments = (uvToNDC, imageUVToSceneUV)
renderGraph.addDrawCallbackPass(renderTarget: renderTarget, 
                                reflection: ResolveAccumulatedTargetPassReflection.self, 
                                { [arguments, filmicToneMapUniforms, vignetteUniforms, colorGradedSceneRegionBackgroundColor] encoder in
    let (uvToNDC, imageUVToSceneUV) = arguments
    await Task.yield() // no crash
    ...
})

as does inserting a print statement for imageUVToSceneUV specifically:

renderGraph.addDrawCallbackPass(renderTarget: renderTarget, 
                                reflection: ResolveAccumulatedTargetPassReflection.self, 
                                { [uvToNDC, imageUVToSceneUV, filmicToneMapUniforms, vignetteUniforms, colorGradedSceneRegionBackgroundColor] encoder in
    print(imageUVToSceneUV)
    await Task.yield() // no crash
    ...
})

but moving the print statement to after the first use of imageUVToSceneUV in the function does crash:

renderGraph.addDrawCallbackPass(renderTarget: renderTarget, 
                                reflection: ResolveAccumulatedTargetPassReflection.self, 
                                { [uvToNDC, imageUVToSceneUV, filmicToneMapUniforms, vignetteUniforms, colorGradedSceneRegionBackgroundColor] encoder in
    ...
    encoder.set1.uniforms = .init(clearColor: SIMD4(SIMD3(clearColor), 0.0),
                                  backgroundColor: SIMD4(colorGradedSceneRegionBackgroundColor),
                                  uvToSceneUVScale: SIMD2(imageUVToSceneUV.scale),
                                  uvToSceneUVOffset: SIMD2(imageUVToSceneUV.offset),
                                  uvToBlitTexture0UV: multiFrameAccumulationWeight < 1.0 ? previewBufferUVScale : .one,
                                  highlightColorMultiply: SIMD3(highlightColorMult),
                                  highlightColorAdditive: SIMD3(highlightColorAdd),
                                  toneMapParams: filmicToneMapUniforms,
                                  vignetteParams: vignetteUniforms)
    print(imageUVToSceneUV)
    await Task.yield() // crash
    ...
})

uvToNDC and imageUVToSceneUV are RectTransform<Float>s, which are composed of two SIMD2<Float>s each; addDrawCallbackPass is defined at Substrate/RenderGraph.swift at 6f25f5b6606278e705144b3664a4fdfe111b3a53 · troughton/Substrate · GitHub.
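
For anyone wanting to try isolating this, the shape of the tuple workaround can be sketched in a self-contained form. Everything here is a hypothetical stand-in: RectTransform only mirrors the two-SIMD2 layout described above, runPass replaces addDrawCallbackPass, and the values are made up. This reduced form is not known to reproduce the crash; it only shows the capture pattern.

```swift
// Hypothetical stand-in mirroring the layout described above: two SIMD2 values.
struct RectTransform<Scalar: SIMDScalar> {
    var scale: SIMD2<Scalar>
    var offset: SIMD2<Scalar>
}

// Hypothetical stand-in for addDrawCallbackPass: it simply awaits the closure.
func runPass(_ body: () async -> SIMD2<Float>) async -> SIMD2<Float> {
    await body()
}

func buildAndRunPass() async -> SIMD2<Float> {
    // vars in the enclosing scope, mutated before being captured by value
    var uvToNDC = RectTransform<Float>(scale: SIMD2(1, 1), offset: SIMD2(0, 0))
    var imageUVToSceneUV = RectTransform<Float>(scale: SIMD2(4, 4), offset: SIMD2(0, 0))
    uvToNDC.offset = SIMD2(-1, -1)
    imageUVToSceneUV.scale *= 0.5

    // Workaround shape: bundle both transforms into one tuple and capture that.
    let arguments = (uvToNDC, imageUVToSceneUV)
    return await runPass { [arguments] in
        let (uvToNDC, imageUVToSceneUV) = arguments
        await Task.yield() // suspension point after the captures are unpacked
        return uvToNDC.offset + imageUVToSceneUV.scale
    }
}
```

Calling it from an async context (let result = await buildAndRunPass()) should yield SIMD2<Float>(1, 1) when the captured values survive the suspension intact.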

I can’t provide the source, but I may be able to provide isolated SIL/LLVM IR.

I’ve also added this information to the GitHub issue, and can shift discussion there if that’s best.