[Concurrency] Completion handler Obj-C bridging interactions with build systems

I'm currently experiencing a very odd issue in an internal app at my workplace: adding a new module breaks an async func that was bridged from an Obj-C completion handler, even though no code in the module is actually run. All of the relevant code is compiled with the Swift 6.2.3 compiler in Swift 6 mode, and the crash only occurs when a specific dependency is built with Bazel; xcodebuild builds of the dependency work fine. Recently it started crashing 100% deterministically. Here is the relevant part of the stack trace:

#0	0x00000001003a9a0c in UnsafeContinuation.resume<>(returning:) ()
#1	0x00000001003a261c in _resumeUnsafeContinuation<()>(_:_:) ()
#2	0x00000001003a2510 in @objc completion handler block implementation for @escaping @callee_unowned @convention(block) @Sendable () -> () with result type () ()
#3	0x0000000112f811e4 in WTF::Detail::CallableWrapper<WTF::CompletionHandler<void (IPC::Connection*, IPC::Decoder*)> IPC::Connection::makeAsyncReplyCompletionHandler<Messages::WebCookieManager::SetCookie, WTF::CompletionHandler<void ()>>(WTF::CompletionHandler<void ()>&&, WTF::ThreadLikeAssertion)::'lambda'(IPC::Connection*, IPC::Decoder*), void, IPC::Connection*, IPC::Decoder*>::call ()
#4	0x0000000112db68d8 in WTF::Detail::CallableWrapper<WebKit::AuxiliaryProcessProxy::sendMessage(WTF::UniqueRef<IPC::Encoder>&&, WTF::OptionSet<IPC::SendOption, (WTF::ConcurrencyTag)0>, std::__1::optional<IPC::ConnectionAsyncReplyHandler>, WebKit::AuxiliaryProcessProxy::ShouldStartProcessThrottlerActivity)::$_1, void, IPC::Connection*, IPC::Decoder*>::call ()
#5	0x00000001133464f8 in IPC::Connection::dispatchMessage ()
#6	0x000000011336149c in WTF::Detail::CallableWrapper<IPC::Connection::enqueueIncomingMessage(WTF::UniqueRef<IPC::Decoder>)::$_2, void>::call ()
#7	0x000000010d671e98 in WTF::RunLoop::performWork ()
#8	0x000000010d673628 in WTF::RunLoop::performWork ()
#9	0x000000010cb5b3a4 in __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ ()

As you can see, it's a Swift continuation that's crashing, but not in our code: it's inside a WebKit function that was originally written with a completion handler and bridged to async. This one, specifically.

I isolated the reproduction to a single line of code in an open-source dependency: a call to this function in SafariServices.

This function is never called in my program. Its presence in the binary is enough to break setCookies. When I delete it, the crash simply vanishes.

My current theory is that clearWebsiteData() calls setCookie internally, and that this somehow interferes with Swift's ability to generate stubs for setCookie. (As far as I'm aware, this code is not part of the WebKit open-source project, so I don't think I can confirm this myself.) This theory only makes sense if the compiler generates stubs for the completion handler functions, but I haven't looked into whether that's how this Swift feature is implemented.

I also wondered whether getting SafariServices linked into my program might mess with WebKit in a +load or +initialize method, but as far as I can tell, merely importing SafariServices is not enough to reproduce the crash.

Are there any build flags we should be looking at that could explain this? Any pointers on how the compiler is implemented that might be helpful to debug this?

Sounds like this: Swift 6.1 runtime crash when calling @objc async protocol method in target with mixed Swift 5 and Swift 6 dependencies · Issue #81846 · swiftlang/swift · GitHub

Are you sure that all of your code is being compiled in Swift 6 mode, including any dependencies that are either statically linked or inlined into your binary? The issue (when I've seen it) happens because Swift 5 and Swift 6 code both generate a symbol with the same name for the completion handler stub, but the ABI of that function changes depending on the language mode. The linker looks at all the copies of the symbol across the inputs to your binary and just picks one, which will be wrong for some of the code in your binary.
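To make that failure mode concrete, here is a toy Swift model of it. It is not how the linker works internally: `ObjectFile`, `ResumeKind`, and the symbol name `setCookieCompletionStub` are all hypothetical, and "first definition wins" is an assumption standing in for "the linker just picks one". The only point is that when two inputs define the same name, exactly one copy survives, and whoever compiled against the other variant gets the wrong one.

```swift
// Toy model of duplicate-symbol coalescing; not the real linker or mangling.

/// How a hypothetical completion-handler stub resumes its continuation.
enum ResumeKind: String {
    case checked    // the variant Swift 6 mode code expects
    case unchecked  // the variant Swift 5 mode code expects
}

/// A hypothetical object file: symbol name -> the stub variant it contains.
struct ObjectFile {
    let name: String
    let definitions: [String: ResumeKind]
}

/// Keeps one definition per symbol name. First-wins is an assumption;
/// what matters is that only one copy survives.
func link(_ objects: [ObjectFile]) -> [String: ResumeKind] {
    var chosen: [String: ResumeKind] = [:]
    for object in objects {
        for (symbol, kind) in object.definitions where chosen[symbol] == nil {
            chosen[symbol] = kind
        }
    }
    return chosen
}

/// Lists every object file whose own stub variant lost to a different one.
func mismatches(in objects: [ObjectFile], linked: [String: ResumeKind]) -> [String] {
    var result: [String] = []
    for object in objects {
        for (symbol, expected) in object.definitions.sorted(by: { $0.key < $1.key }) {
            if let actual = linked[symbol], actual != expected {
                result.append("\(object.name) expects \(expected.rawValue) \(symbol), got \(actual.rawValue)")
            }
        }
    }
    return result
}

let stub = "setCookieCompletionStub" // hypothetical shared symbol name
let objects = [
    ObjectFile(name: "Swift5Dependency.o", definitions: [stub: .unchecked]),
    ObjectFile(name: "App.o", definitions: [stub: .checked]),
]
let linked = link(objects)
for line in mismatches(in: objects, linked: linked) {
    print(line) // App.o expects checked setCookieCompletionStub, got unchecked
}
```

Swapping the order of `objects` just moves the breakage to the other side, which matches how adding or removing an unrelated input can make the crash appear or vanish.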

We've had success working around it by forcing checked continuations to be enabled for everyone regardless of language mode (-checked-async-objc-bridging=on); since we build (nearly) everything from source we can guarantee that (nearly) everything gets compatible symbols. But note the last comment in that issue for situations that still cause problems.


Thanks Tony. That was indeed the cause. Ironic that the original draft of my post read "The entire app is compiled with Swift 6," which should be the case. Time to write the linter we've been putting off.

We're currently discussing turning on the Swift 6 default of this compiler flag (-checked-async-objc-bridging=on) for Swift 5 libs to make sure this doesn't happen in the future. It seems like if we go the other way, we'll run into the back-deployment issue you mentioned more and more as time goes on (assuming that Apple continues to ship completion-handler-based APIs that are back-deployed).

Curious if y'all had any issues when you turned this flag on. Our plan is to find all the symbols that match "objc completion handler block implementation" in the binary, and then make sure that when we run the code that produced them it doesn't break with the async bridging flag on.
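A rough sketch of the "find all the symbols" step, assuming it is fed the text of `nm` run on the binary and piped through `swift demangle`, one symbol per line. The function name and the sample lines are made up for illustration; only the marker text is taken from the stack trace above. Each distinct signature it returns is one bridged shape to exercise with the flag on.

```swift
import Foundation

/// Returns the distinct signatures of compiler-emitted completion-handler
/// stubs found in demangled symbol-table text (one symbol per line).
func completionHandlerStubs(inDemangledSymbols text: String) -> [String] {
    let marker = "@objc completion handler block implementation for "
    var signatures = Set<String>()
    for line in text.split(separator: "\n") {
        guard let range = line.range(of: marker) else { continue }
        // Everything after the marker describes the block's type and result.
        let signature = line[range.upperBound...].trimmingCharacters(in: .whitespaces)
        signatures.insert(signature)
    }
    return signatures.sorted()
}

// Made-up sample in the shape of `nm` output after demangling.
let sample = """
    0000000100003a10 t @objc completion handler block implementation for @escaping @callee_unowned @convention(block) @Sendable () -> () with result type ()
    0000000100004b20 t @objc completion handler block implementation for @escaping @callee_unowned @convention(block) @Sendable () -> () with result type ()
    0000000100005c30 T _main
    """
for signature in completionHandlerStubs(inDemangledSymbols: sample) {
    print(signature)
}
```

Deduplicating by signature keeps the list short, since many call sites can share the same stub shape; mapping each one back to the API that produced it still has to be done by hand.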

Turning checked continuations on fixed all but one of our issues: it's actually what caused the breakage in the AVFoundation case, because AVFoundation's Swift overlay is compiled in Swift 5 mode (you can see this if you look at its .swiftinterface file in the SDK). So the SIL for the back-deployed method that gets copied from the .swiftmodule into your application uses unchecked continuations, but then it gets linked against the checked completion block, causing the crash. At that point, though, we still wanted the safety of checked continuations, and a workaround existed (just don't use the async version of that API), so we left it.
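For the "just don't use the async version" workaround, one option is to call the completion-handler form directly and write the async wrapper yourself, so the continuation code is compiled in your own module rather than coming from a compiler-synthesized bridging stub. This is a sketch under assumptions: `LegacyStore` and `save(_:completionHandler:)` are hypothetical stand-ins for an imported Obj-C API (written in pure Swift here so it runs anywhere), and it has not been checked against the specific AVFoundation method.

```swift
// Hypothetical stand-in for an Obj-C class with a completion-handler method.
final class LegacyStore: Sendable {
    func save(_ value: String, completionHandler: @escaping @Sendable (Bool) -> Void) {
        // Pretend the work finishes later on some other thread.
        Task.detached { completionHandler(!value.isEmpty) }
    }
}

extension LegacyStore {
    /// Hand-written async wrapper using a checked continuation, instead of
    /// the async variant the compiler would import for an Obj-C method.
    func saveChecked(_ value: String) async -> Bool {
        await withCheckedContinuation { (continuation: CheckedContinuation<Bool, Never>) in
            save(value) { ok in
                // A checked continuation traps if resumed twice and logs if never resumed.
                continuation.resume(returning: ok)
            }
        }
    }
}

let ok = await LegacyStore().saveChecked("cookie")
print(ok) // true
```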

It's entirely possible that Apple could release a future SDK with one framework built in Swift 5 mode and a second framework built in Swift 6 mode, both containing back-deployed wrappers of Obj-C completion block methods. In that case, there really isn't a good workaround; it needs to be fixed in the compiler.

Turning checked continuations on fixed all but one of our issues: it's actually what caused the breakage in the AVFoundation case

Yeah, the plan to find all the symbols and verify them after we turn the flag on comes from the warning at the end of your original post.

you can see this if you look at its .swiftinterface file in the SDK

Ah, so it sounds like we could detect this potential breakage without even running the app. Maybe we could even run a check whenever we update Xcode or our base SDK to make sure we aren't linking two incompatible libraries.
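A minimal sketch of that check, assuming each .swiftinterface records its language mode as `-swift-version N` on its `// swift-module-flags:` header line. `FrameworkA` and `FrameworkB` are hypothetical modules with trimmed-down headers; a real check would read the files from the SDK and would still need to work out which modules actually contain back-deployed completion-handler wrappers.

```swift
/// Extracts the language mode from a .swiftinterface's
/// `// swift-module-flags:` header line, if present.
func swiftLanguageMode(ofInterface text: String) -> String? {
    for line in text.split(separator: "\n") {
        guard line.hasPrefix("// swift-module-flags:") else { continue }
        let tokens = line.split(separator: " ")
        if let index = tokens.firstIndex(of: "-swift-version"), index + 1 < tokens.count {
            return String(tokens[index + 1])
        }
    }
    return nil
}

// Hypothetical interface headers, trimmed to the relevant lines.
let interfaces: [String: String] = [
    "FrameworkA": """
        // swift-interface-format-version: 1.0
        // swift-module-flags: -enable-library-evolution -swift-version 5 -module-name FrameworkA
        """,
    "FrameworkB": """
        // swift-interface-format-version: 1.0
        // swift-module-flags: -enable-library-evolution -swift-version 6 -module-name FrameworkB
        """,
]

// Group modules by language mode; more than one group is worth a closer look.
var modulesByMode: [String: [String]] = [:]
for (module, text) in interfaces {
    modulesByMode[swiftLanguageMode(ofInterface: text) ?? "unknown", default: []].append(module)
}
if modulesByMode.count > 1 {
    for (mode, modules) in modulesByMode.sorted(by: { $0.key < $1.key }) {
        print("Swift \(mode): \(modules.sorted().joined(separator: ", "))")
    }
}
```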

Oh, that's unfortunate. It should be possible to change the mangling to incorporate whether the continuation-resuming callback is checked or unchecked, though: since these symbols are emitted with shared linkage, they are not part of the ABI.
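A toy illustration of why folding the checked/unchecked choice into the name helps: once the two variants have different names, coalescing by name keeps both, and each caller resolves to the one it was compiled against. The `_Checked`/`_Unchecked` suffixes, `stubSymbol`, and `setCookieCompletionStub` are invented for this sketch and are not the real Swift mangling.

```swift
/// Which continuation flavor a completion-handler stub resumes.
enum ResumeKind {
    case checked, unchecked
}

/// Hypothetical mangling scheme: encode the resume kind in the symbol name.
func stubSymbol(base: String, kind: ResumeKind) -> String {
    switch kind {
    case .checked: return base + "_Checked"
    case .unchecked: return base + "_Unchecked"
    }
}

/// Coalesces duplicates by name, keeping the first definition for each name.
func link(_ definitions: [(symbol: String, kind: ResumeKind)]) -> [String: ResumeKind] {
    var chosen: [String: ResumeKind] = [:]
    for definition in definitions where chosen[definition.symbol] == nil {
        chosen[definition.symbol] = definition.kind
    }
    return chosen
}

let base = "setCookieCompletionStub" // hypothetical
// A Swift 5 mode object emits the unchecked stub, a Swift 6 mode object the checked one.
let inputs: [(symbol: String, kind: ResumeKind)] = [
    (stubSymbol(base: base, kind: .unchecked), .unchecked),
    (stubSymbol(base: base, kind: .checked), .checked),
]
let linked = link(inputs)
print(linked.count) // 2: both variants survive

// With today's behavior (one shared name), only one variant survives.
let collided = link(inputs.map { (symbol: base, kind: $0.kind) })
print(collided.count) // 1
```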


It would be great if this change were made.

I think it would come too late to spare us from turning on the flag Tony mentioned and verifying that we didn't regress, but at least nobody else would ever have to deal with this, and we wouldn't have to re-verify every time a third-party dependency or Apple library happens to use a different Swift version than we do.

since these symbols are emitted with shared linkage, they are not part of the ABI

Out of curiosity, would that be as simple as making a hypothetical Swift 6.2.x use a suffix other than TZ in the mangled name, with maintainers remembering to add new suffixes whenever the internals of the generated code change? Or would it take a more involved, future-proof change, like including the Swift version in symbols that aren't part of the ABI?