I want to raise the visibility of withCheckedContinuation crashes on Xcode 16 RC · Issue #75952 · swiftlang/swift · GitHub, which is a major, 100% reproducible crash for Catalyst users when using any of the global with* concurrency functions except withUnsafeContinuation (apparently). This affects withCheckedContinuation, withTaskCancellationHandler (current and deprecated forms), and withTaskGroup (and throwing variants). So far the impact includes the RevenueCat and Alamofire frameworks, and any app that includes them and has been rebuilt using Xcode 16. (There may be some debug vs. release shenanigans as well.)
This is a known issue that was addressed on the release/6.0.1 branch here: [6.0.1] SILOptimizer: Allow inlining of transparent functions in `@backDeployed` thunks by tshortli · Pull Request #76218 · swiftlang/swift · GitHub. The affected binaries are Designed for iPhone/iPad apps built with debug optimizations and deployed to macOS Sonoma or earlier.
This issue is documented in the Xcode 16 release notes (search 134793410).
Ah, thanks! Hopefully that makes it into an Xcode release soon.
FWIW I see this crash on iOS 18 and Sequoia, on TestFlight builds with optimizations enabled, using the withThrowingTaskGroup and withCheckedThrowingContinuation APIs. The app is a Designed for iPad Catalyst app.
Were you building with the Xcode 16.1 beta? As I understand it there are two different bugs here that crash in a similar way: issues in Xcode 16 with Catalyst apps where some back deployed calls are broken and issues in Xcode betas (including Xcode 16.1 betas I think) where the code produced doesn't match the ABI of the system.
No, it was many versions of 16.0. I just installed the 16.1 beta yesterday, but I've been seeing this issue for over a month. I just thought it was my fault somehow.
Guess we won't really know until they ship an Xcode version with Swift 6.0.1, which hopefully isn't just Xcode 16.1.
@Jon_Shier Wait, why? Is it not allowed to upload apps with Swift toolchains that are not shipped directly via Xcode releases?
Nope, plus most of those don't even work for building non-macOS apps.
@tshortli Is it possible that this can also occur with -Os or -Oz? I'm seeing a similar crash which also involves back deployment, but it's occurring on iOS 18 and 18.1, on various iPhones:
0 App back deployment thunk for Swift.withTaskCancellationHandler<A>(operation: () async throws -> A, onCancel: @Sendable () -> (), isolation: isolated Swift.Actor?) async throws -> A (in c67ad3b4-e9c7-32f7-acba-e2d1120a1dd3) + 100 at <compiler-generated>:0
1 App back deployment thunk for Swift.withTaskCancellationHandler<A>(operation: () async throws -> A, onCancel: @Sendable () -> (), isolation: isolated Swift.Actor?) async throws -> A (in c67ad3b4-e9c7-32f7-acba-e2d1120a1dd3) + 88 at <compiler-generated>:0
2 libswift_Concurrency.dylib swift::runJobInEstablishedExecutorContext(swift::Job*) (in 70c0165c-c3ba-3236-84d7-c145346c6ebd) + 252
No, if you're seeing a crash on iPhones the root cause of that crash is not related to the fix I linked to.
It's been understandably difficult for folks to avoid conflating these issues since they have similar looking symptoms, but there are two completely unrelated, known issues that result in crashes in the back deployment thunks of APIs in the _Concurrency library:
1. A miscompile caused iOS apps compiled without optimization running on macOS Sonoma or earlier to misevaluate availability checks, resulting in attempts to invoke functions that aren't present at runtime. This issue cannot occur in iOS apps running on an iOS runtime.
2. The ABI of various back deployed _Concurrency library APIs changed (intentionally) over the course of the Xcode 16 beta. Apps using those APIs and compiled against newer SDKs are now incompatible with the OS runtimes of earlier betas and will crash.
If you check the OS versions associated with the crash reports and confirm that they all come from devices that are still running the older betas, then you are seeing (2). Those devices should be updated to the shipping software to avoid this issue.
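For anyone triaging a pile of crash reports against this advice, here is a minimal sketch of how the check could be automated. It assumes the report has an "OS Version:" line with the build in parentheses (as in the reports posted in this thread), and it relies on a heuristic that is my assumption, not something stated above: pre-release OS builds usually end in a lowercase letter. The function names are hypothetical.

```swift
/// Extracts the build identifier from a crash report line such as
/// "OS Version: iOS 18.1 (22B5007p)". Returns nil if no "(...)" is found.
func osBuild(fromCrashReportLine line: String) -> String? {
    guard let open = line.lastIndex(of: "("),
          let close = line.lastIndex(of: ")"),
          open < close else { return nil }
    let build = line[line.index(after: open)..<close]
    return build.isEmpty ? nil : String(build)
}

/// Heuristic only (an assumption): treats a build identifier that ends in a
/// lowercase letter as a likely beta/seed build. Not guaranteed to be right.
func looksLikeBetaBuild(_ build: String) -> Bool {
    guard let last = build.last else { return false }
    return last.isLetter && last.isLowercase
}
```

If every report in the bucket comes back as a likely beta build, that points at the ABI change rather than the macOS miscompile. The build strings should still be confirmed against the actual list of beta builds, since the trailing-letter rule is only a rule of thumb.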
We've been receiving crash reports with nearly identical stack traces from devices running beta versions of iOS 18.0 and 18.1.
The crash consistently occurs after app launch and the first call to a global function that uses withTaskCancellationHandler(operation:onCancel:isolation:). I'm surprised that the stack trace for the crash on iOS 18.1 also indicates the back-deployed thunk. I thought it made sense for older versions to show this crash but not for the newer one.
So far, we haven't been able to reproduce it ourselves unless we build the app for Mac (Designed for iPhone).
OS Version: iOS 18.1 (22B5007p)
Report Version: 104
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: SEGV_NOOP at 0x0000000000000004
Crashed Thread: 6
Application Specific Information:
Exception 1, Code 1, Subcode 4 >
KERN_INVALID_ADDRESS at 0x4.
Thread 6 Crashed:
0 NeoCore 0x106278cb8 $ss27withTaskCancellationHandler9operation8onCancel9isolationxxyYaKXE_yyYbXEScA_pSgYitYaKlFTwb
1 libswift_Concurrency.dylib 0x33c9c88d0 swift::runJobInEstablishedExecutorContext
Thanks so much for getting back so quickly. We had initially dismissed this explanation because of the 18.1 users hitting it and because there was a decently high volume, but we revisited after this message and this does appear to be the reason.
We tried switching to the old, deprecated function, but based on my cursory first look it seems like that is marked @_emitIntoClient and just calls the new one, so it's basically the same as calling the new one in the first place. Is there any workaround for this, or are we stuck either waiting for our users to upgrade or downgrading to Xcode 15 until things have stabilized?
One thing I'm curious about: can we work around this by copy-pasting the impl of withTaskCancellationHandler into our app and calling that instead?
I'm seeing that it depends on two pieces of API from the runtime (swift_task_addCancellationHandler and swift_task_removeCancellationHandler) which, based on the availability, haven't changed since Swift 5.1, so it seems safe to do this.
Example:
public func withTaskCancellationHandler<T>(
  operation: () async throws -> T,
  onCancel handler: @Sendable () -> Void,
  isolation: isolated (any Actor)? = #isolation
) async rethrows -> T {
  // unconditionally add the cancellation record to the task.
  // if the task was already cancelled, it will be executed right away.
  let record = _taskAddCancellationHandler(handler: handler)
  defer { _taskRemoveCancellationHandler(record: record) }
  return try await operation()
}

@usableFromInline
@_silgen_name("swift_task_addCancellationHandler")
func _taskAddCancellationHandler(handler: () -> Void) -> UnsafeRawPointer /*CancellationNotificationStatusRecord*/

@usableFromInline
@_silgen_name("swift_task_removeCancellationHandler")
func _taskRemoveCancellationHandler(
  record: UnsafeRawPointer /*CancellationNotificationStatusRecord*/
)
We'd probably add some code to gate running this or the "real" impl depending on whether we were on a beta or not.
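A rough sketch of what that gate might look like, assuming the decision is made from an explicit list of known-bad OS builds. The type name, the method name, and the contents of the list are hypothetical; the single entry is just the build from the crash report earlier in the thread, used for illustration. How the running OS build string is obtained is left to the caller.

```swift
/// Decides whether to route calls through the app's local copy of
/// withTaskCancellationHandler instead of the standard library's version.
struct CancellationHandlerGate {
    /// OS builds believed to ship the older ABI. Hypothetical list: it has
    /// to be filled in from the builds actually seen in crash reports.
    let affectedBuilds: Set<String>

    /// Returns true only for an exact match against a known-bad build.
    /// An unknown build (nil) falls through to the standard library path.
    func shouldUseLocalCopy(osBuild: String?) -> Bool {
        guard let osBuild else { return false }
        return affectedBuilds.contains(osBuild)
    }
}

// Illustrative entry only, taken from the iOS 18.1 crash report above.
let gate = CancellationHandlerGate(affectedBuilds: ["22B5007p"])
```

Matching exact builds rather than "anything that looks like a beta" keeps the workaround as narrow as possible, so shipping OS versions always take the standard library path.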
Yeah, it's unintuitive but some of the iOS 18.1 betas also have the older ABI for these functions, too.
Yes, I haven't tried it but I'd expect this approach to successfully work around the issue in principle. Be aware, though, that using @_silgen_name outside of the standard library adds some risk to your project since it's an unofficial feature that could change in the future. If you do choose to use this workaround I would make it as temporary as you can.
Thanks again for a quick reply!
Yeah, the two risks I'm aware of are that this might not be available on prior stdlib versions or that y'all will remove it in a future version.
The prior stdlib versions thing I think is covered by the annotation marking it as being added in 5.1. The future ones though definitely are cause for concern. Would it crash on launch if they're not present, or only if we hit a code path that calls them? Or would it be better to use a different mechanism like getting the function pointers with dlsym so that if they're ever removed we can seamlessly fall back to the old version? (Our current plan was to gate this on the specific build numbers that we know to be no good.)
The declarations are ABI since they are referenced from inlinable code, so they can't be removed in the future. The risk with @_silgen_name is specifically that the compiler's handling of the attribute at build time may change in the future, not that the binary you build with this technique could be made invalid by future changes to the standard library.
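Given that answer, a dlsym lookup shouldn't be necessary for these two entry points. For anyone who still wants a belt-and-braces presence check along the lines asked about above, here is a minimal sketch. It only tests whether a symbol can be found in the current process and does not attempt to call through the returned pointer. The helper name is hypothetical.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Returns true if `name` resolves to a symbol in the running process
/// (the main program and the libraries loaded with it).
func isSymbolPresent(_ name: String) -> Bool {
    // Passing nil asks for a handle to the main program's global scope.
    guard let handle = dlopen(nil, RTLD_NOW) else { return false }
    defer { dlclose(handle) }
    return dlsym(handle, name) != nil
}

// The two runtime entry points named earlier in the thread.
let runtimeEntryPoints = [
    "swift_task_addCancellationHandler",
    "swift_task_removeCancellationHandler",
]
let canUseLocalCopy = runtimeEntryPoints.allSatisfy(isSymbolPresent)
```

Whether the lookup finds the two entry points depends on the concurrency runtime being loaded in the process, so the result is best treated as one extra input to the gate rather than the only one.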
I think that release note is a bit misleading, as for us the crash happens when users are on iOS 18, not macOS.
Okay, so the risk is that either
1. future builds fail with a compiler error of some sort, or
2. future builds succeed, but crash if we call this function, b/c the behavior of @_silgen_name has changed
We'll probably go ahead if this is all we have to worry about. (1) is totally fine; presumably we can adopt the replacement with little trouble, and this might be a nonissue by then. (2) I believe we can work around by feature flagging/gating our use of these functions and unit testing our usage to catch any potential issues, as long as it's not breaking on launch or as soon as we enter a function that might happen to call one of these functions.
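For the unit testing part, here is a minimal sketch of the kind of check that could sit in a test target. It is written against the standard library's withTaskCancellationHandler(operation:onCancel:), and the same body could be pointed at a local copy instead. The Flag helper and the function name are hypothetical.

```swift
import Foundation

/// Tiny thread-safe flag so the @Sendable onCancel closure can record a hit.
final class Flag: @unchecked Sendable {
    private let lock = NSLock()
    private var value = false
    func set() { lock.lock(); value = true; lock.unlock() }
    var isSet: Bool { lock.lock(); defer { lock.unlock() }; return value }
}

/// Starts a task that waits inside withTaskCancellationHandler, cancels it,
/// and reports whether the onCancel handler ran.
func cancellationHandlerFires() async -> Bool {
    let flag = Flag()
    let task = Task {
        await withTaskCancellationHandler {
            do {
                // Long sleep; cancellation makes it throw and return early.
                try await Task.sleep(nanoseconds: 5_000_000_000)
            } catch {
                // Cancellation is the expected way out of the sleep.
            }
        } onCancel: {
            flag.set()
        }
    }
    task.cancel()
    await task.value
    return flag.isSet
}
```

A test like this only catches a change in behavior when it runs, so it would need to run on each new toolchain before shipping to be useful for risk (2).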