The problem
By default, Swift interprets Objective-C functions with block completion handlers as async APIs. Suppose that we have an Objective-C function with a signature like this:
- (void) fooWithBar: (NSString* __nonnull) bar
         completion: (void (^)(NSString* __nonnull)) completion;
Swift will import this as:
func foo(bar: String) async -> String
If you call this API from a Swift async function, Swift will generate essentially this code pattern:
let result = withUnsafeContinuation { continuation in
    object.foo(bar: arg._bridgeToObjectiveC(), completion: { result in
        continuation.resume(returning: String._unconditionallyBridgeFromObjectiveC(result))
    })
}
If you implement this signature from Swift, Swift will generate a "thunk" with essentially this code pattern:
@objc func foo(bar bridgedArg: NSString, completion: @escaping @convention(block) (NSString) -> Void) {
    let arg = String._unconditionallyBridgeFromObjectiveC(bridgedArg)
    // The native async implementation runs in a new, unstructured task.
    Task {
        let result = await self.foo(bar: arg)
        completion(result._bridgeToObjectiveC())
    }
}
Unfortunately, this means that, when a call using the first pattern ends up calling a thunk generated from the second pattern, we end up dynamically losing the original Task structure, despite both sides being implemented in Swift. There are two main problems with this:
- We break the dynamic structure of the original task. Effectively, this part of the task becomes an unstructured subtask, but worse: not only can we not propagate cancellation to the new task, but because we end up blocking on it through a continuation, we can't even propagate priority changes.
- We incur some significant dynamic overheads. For one, we have to allocate and set up a new, independent task, including making a new async stack instead of working on the calling task's stack. But also, enqueuing the new task and resuming the old task are both asynchronous operations, so we end up with a context switch on both sides.
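The loss of task structure can be observed without any Objective-C involved. The following is a minimal sketch, not the compiler's actual output: `nativeFoo`, `fooThunk`, `callThroughThunk`, and `demoCancellationLoss` are hypothetical names that imitate the two code patterns by hand in pure Swift. The outer task waits until it has been cancelled, then calls the async implementation once directly and once through the completion-handler thunk.

```swift
// Stand-in for the native Swift async implementation (hypothetical name).
// It also reports whether the task it runs on observes cancellation.
func nativeFoo(bar: String) async -> (value: String, sawCancellation: Bool) {
    (value: "foo(\(bar))", sawCancellation: Task.isCancelled)
}

// Hand-written analogue of the generated thunk: a completion-handler
// signature whose body runs in a new, unstructured task.
func fooThunk(bar: String, completion: @escaping @Sendable (String, Bool) -> Void) {
    Task {
        let result = await nativeFoo(bar: bar)
        completion(result.value, result.sawCancellation)
    }
}

// Hand-written analogue of the generated call site: suspend on a
// continuation that the completion handler resumes.
func callThroughThunk(bar: String) async -> (value: String, sawCancellation: Bool) {
    await withCheckedContinuation { continuation in
        fooThunk(bar: bar) { value, sawCancellation in
            continuation.resume(returning: (value: value, sawCancellation: sawCancellation))
        }
    }
}

// Runs both call paths from a task that is already cancelled.
func demoCancellationLoss() async -> (direct: Bool, viaThunk: Bool) {
    let outer = Task { () -> (direct: Bool, viaThunk: Bool) in
        // Wait until the cancellation below has definitely been delivered.
        while !Task.isCancelled { await Task.yield() }
        let direct = await nativeFoo(bar: "a").sawCancellation
        let viaThunk = await callThroughThunk(bar: "a").sawCancellation
        return (direct: direct, viaThunk: viaThunk)
    }
    outer.cancel()
    return await outer.value
}
```

The direct call runs on the cancelled task and sees the cancellation. The call through the thunk runs in a fresh task that knows nothing about it.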
This pattern of Swift calling Swift through an Objective-C API can happen in several ways. Even within a single module, it can happen if the call goes through an Objective-C protocol with an async requirement. But it can also happen if a framework that was previously implemented in Objective-C changes to a Swift implementation, which is something we expect to see more and more of.
Proposed code pattern
Now, we cannot propagate async task structure just because we happen to be running on a task. It's very important here that the caller is actually blocked waiting for this call to finish; without that, trying to make the async operation of the callee happen on the current task would actually introduce concurrency within the task, which would deeply corrupt the task structures. Instead, we need to "handshake" somehow across the call to establish that the caller and callee are cooperating. I'll explain how we can do that later. First, how do we actually want this to execute?
Let's break down the code pattern:
1. The caller has to bridge the arguments.
2. The caller has to enter a continuation.
3. The caller has to create a block to serve as a completion handler.
4. The caller has to call the Objective-C entrypoint.
5. The caller has to await its continuation.
6. The callee has to unbridge the arguments.
7. The callee has to create a closure to run as a task.
8. The callee has to create the task.
9. The task function has to make the native async call.
10. The task function has to bridge the result.
11. The task function has to call the completion handler.
12. The completion handler has to unbridge the result.
13. The completion handler has to resume the continuation.
In an ideal world, the handshake would allow us to perform the call directly, skipping everything here except the native async call. That is unrealistic; the handshake has to be triggered within the callee based on information passed to it by the caller. So at minimum, everything the caller does has to actually happen. Trying to skip other steps dynamically would greatly increase the code-size and complexity of these code sequences, which are fairly common.
A simpler approach is to leave most of the control and data flow alone, including the bridging, and just let the runtime intervene in specific places:
- In step 2, the caller will call a new runtime function that enters a "foreign continuation". This contains a native continuation, but also has space to track the success of the handshake and to potentially store the task function.
- In step 8, the callee will call a new runtime function that tries to perform the handshake. If the handshake succeeds, as part of the handshake, the runtime function will be able to find the foreign continuation. The runtime function will mark the success of the handshake in the foreign continuation and store the task function there. Otherwise, the runtime function will just start a new task to run the task function.
- In step 5, the caller will call a new runtime function that awaits the foreign continuation. This function will check if the handshake succeeded. If so, it will retrieve the task function and start it running, setting it up so it will return to the resumption point in the caller. Otherwise, it will flag that the handshake failed and await the native continuation.
- In step 13, the completion handler will call a new runtime function that resumes the foreign continuation. If the handshake succeeded, this will flag that the continuation has been resumed. Otherwise, it will resume the native continuation.
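The four interventions above amount to a small state machine on the foreign continuation. The following is a rough, synchronous model of it, only meant to illustrate the agreement between caller and callee: `ForeignContinuationModel` and its methods are hypothetical names, not proposed runtime entry points. A plain closure stands in for the async task function, and a lock stands in for whatever atomic operation the runtime would really use.

```swift
import Foundation

/// Hypothetical model of a foreign continuation; not a real runtime type.
final class ForeignContinuationModel<Value> {
    /// Stands in for the callee's task function (really an async closure).
    typealias TaskFunction = () -> Value

    /// What the completion handler should do in step 13.
    enum ResumeAction { case markResumed, resumeNativeContinuation }

    private enum Handshake {
        case undecided
        case succeeded(TaskFunction)
        case failed
    }

    private let lock = NSLock()
    private var handshake: Handshake = .undecided

    /// Step 8 (callee): try to defer the task function to the caller.
    /// On false, the callee has to start a new task itself.
    func tryHandshake(_ taskFunction: @escaping TaskFunction) -> Bool {
        lock.lock(); defer { lock.unlock() }
        guard case .undecided = handshake else { return false }
        handshake = .succeeded(taskFunction)
        return true
    }

    /// Step 5 (caller): returns the deferred task function to run on the
    /// caller's task, or nil if the caller must await the native continuation.
    /// A nil result also closes the window for any later handshake.
    func beginAwait() -> TaskFunction? {
        lock.lock(); defer { lock.unlock() }
        switch handshake {
        case .succeeded(let taskFunction):
            return taskFunction
        case .undecided:
            handshake = .failed
            return nil
        case .failed:
            return nil
        }
    }

    /// Step 13 (completion handler): decide how to deliver the result.
    func resume() -> ResumeAction {
        lock.lock(); defer { lock.unlock() }
        if case .succeeded = handshake { return .markResumed }
        return .resumeNativeContinuation
    }
}
```

Because both `tryHandshake` and `beginAwait` move the state out of `undecided` under the same lock, exactly one of them wins. Either the caller runs the task function itself, or the callee starts a new task and the caller awaits the native continuation.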
Note that it is important for the execution of the task function to get deferred back to the caller so that we don't accumulate a C stack frame. (The runtime function to await a continuation is an async funclet that is always tail-called, so this doesn't accumulate anything.)
Note that manipulation of the foreign continuation can be surprising, even concurrent, if there's an intermediate function between the caller and callee that forwards the continuation in a surprising way. Whether the handshake actually occurs in such a situation is unspecified. It is okay for the handshake to occur in this situation; the correctness of the handshake protocol relies only on the caller and callee atomically agreeing whether to defer execution back to the caller. To an intermediate function, deferring execution effectively behaves as if the main body of the callee (the task function) was just scheduled to run concurrently, which for an async function is of course allowed. Note that this is another reason why deferring execution back to the caller is critical: the callee may not actually be running in the context of the caller's task at all. (There may be some priority-inversion risk here, but it's unsolvable.)
This protocol should require emitting more or less the same amount of code as today. The runtime functions will be new, so back-deployment support will require some additional code size when targeting old runtimes. (The back-deployed implementations will probably just unconditionally fail to handshake, which I think is okay.)
Proposed handshake
There's really only one way to perform the handshake: it has to be somehow recorded in the completion handler block. Fortunately, blocks have a very flexible layout with a lot of space for new metadata in the block descriptor. We've talked about wanting to recognize several different kinds of blocks, so my suggestion is this:
- There is a bit in the block flags saying that there's one or more Block_info objects in the block descriptor.
- The Block_info objects go at the end of the block descriptor in order of increasing kind.
- Each Block_info is a variable-width object that starts with a size_t of flags:
  - Bits 0-14 are the kind.
  - Bit 15 indicates whether this is the last Block_info (0) or followed by another (1).
  - Bits 16-N are reserved for the specific kind of Block_info.
- Block_info kind 0 means "this is a Swift continuation block". The reserved storage holds the offset (in sizeof(void*) units) of a pointer to the foreign continuation within the block object.
- Block_info kind 1 means "this block synchronously delegates to another block". The reserved storage holds the offset (in sizeof(void*) units) of the delegate block pointer within the block object.
Since Block_info objects are variable-width, the runtime must interpret each object to get to the next one. This is fine when looking for a specific kind because any particular OS should know about the first k kinds and won't be looking for a kind beyond them.
Step 3 should create a block with a continuation-block Block_info. We don't currently have plans to emit synchronous-delegate Block_infos by default, but it might become interesting in the future; at any rate, the runtime should make an effort to look through it when trying to make the handshake.
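To make the flags layout concrete, here is a small sketch of how a Block_info flags word could be packed and searched. `BlockInfoFlags` and `payload(ofKind:in:)` are hypothetical names. The sketch also assumes that kinds 0 and 1 each occupy exactly one size_t-sized word with their offset stored in bits 16 and up, so a run of Block_info objects can be modeled as an array of words; the real layout would live in the block descriptor.

```swift
/// Hypothetical view of the size_t flags word at the start of a Block_info.
struct BlockInfoFlags: Equatable {
    var rawValue: UInt

    static let kindMask: UInt = 0x7FFF    // bits 0-14
    static let hasNextBit: UInt = 0x8000  // bit 15

    init(rawValue: UInt) { self.rawValue = rawValue }

    init(kind: UInt, hasNext: Bool, payload: UInt) {
        precondition(kind <= Self.kindMask, "kind must fit in 15 bits")
        rawValue = kind | (hasNext ? Self.hasNextBit : 0) | (payload << 16)
    }

    /// Bits 0-14: the kind of this Block_info.
    var kind: UInt { rawValue & Self.kindMask }
    /// Bit 15: set when another Block_info follows this one.
    var hasNext: Bool { (rawValue & Self.hasNextBit) != 0 }
    /// Bits 16-N: kind-specific storage. For kinds 0 and 1 this is an
    /// offset into the block object, in pointer-sized units.
    var payload: UInt { rawValue >> 16 }
}

/// Looks for a Block_info of the wanted kind. Because the objects are sorted
/// by increasing kind, the search can stop at the first larger kind, so it
/// never has to skip over an object whose width it does not know.
func payload(ofKind wanted: UInt, in infos: [BlockInfoFlags]) -> UInt? {
    for info in infos {
        if info.kind == wanted { return info.payload }
        if info.kind > wanted || !info.hasNext { return nil }
    }
    return nil
}
```

For a continuation block, multiplying the returned payload by `MemoryLayout<UnsafeRawPointer>.size` would give the byte offset of the foreign-continuation pointer within the block object.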
Final notes
The handshake runtime function (step 8) will need to be passed the completion handler block. It would be a nice micro-optimization to then pass the completion handler to the task function, because that would allow the copy of the block (normally a required step when capturing a block in an escaping function, but arguably unnecessary in the handshake case) to be performed by the runtime, reducing code size.
In this design, the closure object for the task function has to be allocated separately. It'd be nice if it could be allocated locally on the task, but I think that would add a lot of complexity and code size. We should at least make an effort to make it callee-owned, though.