Round-tripping Swift async tasks through Objective-C interfaces

John_McCall · November 15, 2024, 1:57am

The problem

By default, Swift interprets Objective-C functions with block completion handlers as async APIs. Suppose that we have an Objective-C function with a signature like this:

- (void) fooWithBar: (NSString* __nonnull) bar
         completion: (void (^)(NSString* __nonnull)) completion;

Swift will import this as:

func foo(bar: String) async -> String

If you call this API from a Swift async function, Swift will generate essentially this code pattern:

let result = await withUnsafeContinuation { continuation in
  object.foo(bar: arg._bridgeToObjectiveC(), completion: { result in
    continuation.resume(returning: String._unconditionallyBridgeFromObjectiveC(result))
  })
}

If you implement this signature from Swift, Swift will generate a "thunk" with essentially this code pattern:

@objc func foo(bar bridgedArg: NSString, completion: @escaping @convention(block) (NSString) -> Void) {
  let arg = String._unconditionallyBridgeFromObjectiveC(bridgedArg)
  Task {
    let result = await self.foo(bar: arg)
    completion(result._bridgeToObjectiveC())
  }
}

Unfortunately, this means that when a call using the first pattern reaches a thunk generated from the second pattern, we dynamically lose the original Task structure, even though both sides are implemented in Swift. There are two main problems with this:

We break the dynamic structure of the original task. Effectively, this part of the task becomes an unstructured subtask, but worse: not only can we not propagate cancellation to the new task, but because we end up blocking on it through a continuation, we can't even propagate priority changes.
We incur some significant dynamic overheads. For one, we have to allocate and set up a new, independent task, including making a new async stack instead of working on the calling task's stack. But also, enqueuing the new task and resuming the old task are both asynchronous operations, so we end up with a context switch on both sides.
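
The cancellation half of the first problem can be observed with a small stand-alone sketch. It does not involve Objective-C at all: `completionHandlerStyle` and `observeCancellation` are hypothetical names that only imitate the thunk's `Task { ... }` and the caller's continuation. Assuming the usual rule that an unstructured `Task` does not inherit cancellation from the task that created it, the calling task should see itself as cancelled while the thunk's task does not.

```swift
// Stand-in for the generated thunk: it starts an unstructured task and
// reports, through the completion handler, whether that task is cancelled.
func completionHandlerStyle(_ completion: @escaping @Sendable (Bool) -> Void) {
  Task { completion(Task.isCancelled) }
}

// Stand-in for the calling side: cancel the calling task, then call through
// the completion-handler API and wait on a continuation.
func observeCancellation() async -> (caller: Bool, thunkTask: Bool) {
  let caller = Task { () -> (Bool, Bool) in
    // Cancel the task we are currently running on.
    withUnsafeCurrentTask { task in
      if let task = task { task.cancel() }
    }
    let callerCancelled = Task.isCancelled
    let thunkCancelled = await withUnsafeContinuation { (c: UnsafeContinuation<Bool, Never>) in
      completionHandlerStyle { c.resume(returning: $0) }
    }
    return (callerCancelled, thunkCancelled)
  }
  let (callerCancelled, thunkCancelled) = await caller.value
  return (caller: callerCancelled, thunkTask: thunkCancelled)
}
```

Calling `await observeCancellation()` from an async context returns both flags, which makes the lost cancellation visible without any timing-dependent code.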

This pattern of Swift calling Swift through an Objective-C API can happen in several ways. Even within a single module, it can happen if the call goes through an Objective-C protocol with an async requirement. But it can also happen if a framework that was previously implemented in Objective-C changes to a Swift implementation, which is something we expect to see more and more of.

Proposed code pattern

Now, we cannot propagate async task structure just because we happen to be running on a task. It's very important here that the caller is actually blocked waiting for this call to finish; without that, trying to make the async operation of the callee happen on the current task would introduce concurrency within the task, which would deeply corrupt the task structures. Instead, we need to "handshake" somehow across the call to establish that the caller and callee are cooperating. I'll explain how we can do that later. First, how do we actually want this to execute?

Let's break down the code pattern:

1. The caller has to bridge the arguments.
2. The caller has to enter a continuation.
3. The caller has to create a block to serve as a completion handler.
4. The caller has to call the Objective-C entrypoint.
5. The caller has to await its continuation.
6. The callee has to unbridge the arguments.
7. The callee has to create a closure to run as a task.
8. The callee has to create the task.
9. The task function has to make the native async call.
10. The task function has to bridge the result.
11. The task function has to call the completion handler.
12. The completion handler has to unbridge the result.
13. The completion handler has to resume the continuation.

In an ideal world, the handshake would allow us to perform the call directly, skipping everything here except the native async call. That is unrealistic; the handshake has to be triggered within the callee based on information passed to it by the caller. So at minimum, everything the caller does has to actually happen. Trying to skip other steps dynamically would greatly increase the code-size and complexity of these code sequences, which are fairly common.

A simpler approach is to leave most of the control and data flow alone, including the bridging, and just let the runtime intervene in specific places:

In step 2, the caller will call a new runtime function that enters a "foreign continuation". This contains a native continuation, but also has space to track the success of the handshake and to potentially store the task function.
In step 8, the callee will call a new runtime function that tries to perform the handshake. If the handshake succeeds, the runtime function will be able to find the foreign continuation as part of the handshake. It will then mark the success of the handshake in the foreign continuation and store the task function there. Otherwise, the runtime function will just start a new task to run the task function.
In step 5, the caller will call a new runtime function that awaits the foreign continuation. This function will check if the handshake succeeded. If so, it will retrieve the task function and start it running, setting it up so it will return to the resumption point in the caller. Otherwise, it will flag that the handshake failed and await the native continuation.
In step 13, the completion handler will call a new runtime function that resumes the foreign continuation. If the handshake succeeded, this will flag that the continuation has been resumed. Otherwise, it will resume the native continuation.
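
The four intervention points can be modelled as a tiny state machine. This is only a sketch of the decisions described above, not the runtime implementation: `ForeignContinuationModel` and its members are hypothetical names, the lock stands in for whatever atomics a real continuation would use, and the two `Bool` flags stand in for "the continuation was flagged as resumed" and "the native continuation was resumed".

```swift
import Foundation

// Hypothetical model of a foreign continuation. None of these names exist in
// the Swift runtime; they only mirror the four intervention points above.
final class ForeignContinuationModel {
  private enum Handshake { case pending, succeeded, failed }

  private let lock = NSLock()
  private var handshake = Handshake.pending
  private var taskFunction: (() -> Void)?
  private(set) var flaggedResumed = false   // step 13, handshake succeeded
  private(set) var nativeResumed = false    // step 13, no handshake

  // Step 8 (callee): try to hand the task function to the caller.
  // Returns false if the callee must start a new task instead.
  func tryHandshake(taskFunction: @escaping () -> Void) -> Bool {
    lock.lock(); defer { lock.unlock() }
    guard handshake == .pending else { return false }
    handshake = .succeeded
    self.taskFunction = taskFunction
    return true
  }

  // Step 5 (caller): returns the deferred task function to run on the
  // caller's task, or nil if the caller must await the native continuation.
  func beginAwait() -> (() -> Void)? {
    lock.lock(); defer { lock.unlock() }
    if handshake == .succeeded {
      let function = taskFunction
      taskFunction = nil
      return function
    }
    handshake = .failed   // a later tryHandshake will now return false
    return nil
  }

  // Step 13 (completion handler): resume whichever side is waiting.
  func resume() {
    lock.lock(); defer { lock.unlock() }
    if handshake == .succeeded {
      flaggedResumed = true
    } else {
      nativeResumed = true
    }
  }
}
```

The point of the model is that `tryHandshake` and `beginAwait` decide under the same lock, so the caller and callee always agree on whether the task function is deferred back to the caller or run on a new task.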

Note that it is important for the execution of the task function to get deferred back to the caller so that we don't accumulate a C stack frame. (The runtime function to await a continuation is an async funclet that is always tail-called, so this doesn't accumulate anything.)

Note that manipulation of the foreign continuation can be surprising, even concurrent, if there's an intermediate function between the caller and callee that forwards the continuation in a surprising way. Whether the handshake actually occurs in such a situation is unspecified. It is okay for the handshake to occur in this situation; the correctness of the handshake protocol relies only on the caller and callee atomically agreeing whether to defer execution back to the caller. To an intermediate function, deferring execution effectively behaves as if the main body of the callee (the task function) was just scheduled to run concurrently, which for an async function is of course allowed. Note that this is another reason why deferring execution back to the caller is critical: the callee may not actually be running in the context of the caller's task at all. (There may be some priority-inversion risk here, but it's unsolvable.)

This protocol should require emitting more or less the same amount of code as today. The runtime functions will be new, so back-deployment support will require some additional code size when targeting old runtimes. (The back-deployed implementations will probably just unconditionally fail to handshake, which I think is okay.)

Proposed handshake

There's really only one way to perform the handshake: it has to be somehow recorded in the completion handler block. Fortunately, blocks have a very flexible layout with a lot of space for new metadata in the block descriptor. We've talked about wanting to recognize several different kinds of blocks, so my suggestion is this:

There is a bit in the block flags saying that there's one or more Block_info objects in the block descriptor.
The Block_info objects go at the end of the block descriptor in order of increasing kind.
Each Block_info is a variable-width object that starts with a size_t of flags:
- Bits 0-14 are the kind.
- Bit 15 indicates whether this is the last Block_info (0) or followed by another (1).
- Bits 16-N are reserved for the specific kind of Block_info.
Block_info kind 0 means "this is a Swift continuation block". The reserved storage holds the offset (in sizeof(void*) units) of a pointer to the foreign continuation within the block object.
Block_info kind 1 means "this block synchronously delegates to another block". The reserved storage holds the offset (in sizeof(void*) units) of the delegate block pointer within the block object.
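
To make the bit layout concrete, here is a sketch of packing and unpacking a single Block_info flags word. It models `size_t` as Swift's `UInt`, and it assumes that the kind-specific "reserved storage" for kinds 0 and 1 is simply bits 16 and up of the same word; `BlockInfoFlags` and its member names are hypothetical.

```swift
// Hypothetical view of one Block_info flags word (size_t modelled as UInt).
struct BlockInfoFlags: Equatable {
  var rawValue: UInt

  static let continuationKind: UInt = 0          // "Swift continuation block"
  static let synchronousDelegateKind: UInt = 1   // "delegates to another block"

  init(rawValue: UInt) { self.rawValue = rawValue }

  init(kind: UInt, hasNext: Bool, payload: UInt) {
    precondition(kind <= 0x7FFF, "kind must fit in bits 0-14")
    rawValue = kind | (hasNext ? 1 << 15 : 0) | (payload << 16)
  }

  // Bits 0-14: the kind.
  var kind: UInt { rawValue & 0x7FFF }
  // Bit 15: 1 if another Block_info follows, 0 if this is the last one.
  var hasNext: Bool { rawValue & (1 << 15) != 0 }
  // Bits 16 and up: kind-specific storage. For kinds 0 and 1 this is assumed
  // to be an offset into the block object, in pointer-sized units.
  var payload: UInt { rawValue >> 16 }
}
```

For example, a synchronous-delegate entry with the delegate pointer at word offset 4, followed by another entry, would be `BlockInfoFlags(kind: 1, hasNext: true, payload: 4)`.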

Since Block_info objects are variable-width, the runtime must interpret each object to get to the next. This is fine when looking for a specific kind because any particular OS should know about the first k kinds and won't be looking for a kind beyond them.
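
A search for a particular kind might look roughly like the following. This is a sketch under assumptions: the Block_info area is modelled as an array of `UInt` words rather than raw descriptor memory, kinds 0 and 1 are assumed to occupy exactly one word each, and `blockInfoWordCount` and `findBlockInfoPayload` are hypothetical helpers.

```swift
// Width, in words, of a Block_info of the given kind, or nil if this
// (hypothetical) runtime does not know the kind and so cannot skip over it.
func blockInfoWordCount(kind: UInt) -> Int? {
  switch kind {
  case 0, 1: return 1   // assumption: both known kinds fit in the flags word
  default: return nil
  }
}

// Returns the kind-specific payload (bits 16 and up) of the first Block_info
// with the requested kind, or nil if there is none.
func findBlockInfoPayload(kind target: UInt, in words: [UInt]) -> UInt? {
  var index = 0
  while index < words.count {
    let flags = words[index]
    let kind = flags & 0x7FFF
    if kind == target { return flags >> 16 }
    // Entries are sorted by increasing kind, so we can stop early.
    if kind > target { return nil }
    let hasNext = flags & (1 << 15) != 0
    guard hasNext, let width = blockInfoWordCount(kind: kind) else { return nil }
    index += width
  }
  return nil
}
```

Because entries are sorted by kind, the walk only ever has to step over kinds smaller than the one it wants, which are exactly the kinds the running OS is expected to understand.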

Step 3 should create a block with a continuation-block Block_info. We don't currently have plans to emit synchronous-delegate Block_infos by default, but it might become interesting in the future; at any rate, the runtime should make an effort to look through it when trying to make the handshake.

Final notes

The handshake runtime function (step 8) will need to be passed the completion handler block. It would be a nice micro-optimization to then pass the completion handler to the task function, because that would allow the copy of the block (normally a required step when capturing a block in an escaping function, but arguably unnecessary in the handshake case) to be performed by the runtime, reducing code size.

In this design, the closure object for the task function has to be allocated separately. It'd be nice if it could be allocated locally on the task, but I think that would add a lot of complexity and code size. We should at least make an effort to make it callee-owned, though.

ole · November 18, 2024, 5:57pm

~~Quick question for my understanding: there's an await missing in the let result = self.foo(bar: arg) line, correct?~~

Edit: has been fixed above.

John_McCall · November 18, 2024, 6:02pm

Yes, that's right. I'll fix the original post.

Nickolas_Pohilets · November 18, 2024, 7:54pm

Can we use the ObjC runtime to let the Swift caller discover the Swift callee? This would bypass all of steps 1-13 and could be used for non-async cases as well.

Something like this:

let result: String
if let swiftImpl = objc_getSwiftImpl(object, #selector(fooWithBar:completion:)) {
    result = await unsafeBitCast(swiftImpl, to: (@convention(thin) (AnyObject, String) async -> String).self)(object, arg)
} else {
    result = await withUnsafeContinuation { continuation in
        object.foo(bar: arg._bridgeToObjectiveC(), completion: { result in
            continuation.resume(returning: String._unconditionallyBridgeFromObjectiveC(result))
        })
    }
}

EDIT:
But this would not solve the case where there are intermediate ObjC calls. If supporting that case is indeed a goal, then the handshake mechanism is still necessary.

Thread-local variables could also be used for this. Using thread-locals would eliminate the possibility of a concurrent handshake, but would trivially support nested continuation blocks, without the need for complicated analysis like "this block synchronously delegates to another block".

John_McCall · November 18, 2024, 8:12pm

I actually talked about that in the post. The concurrency runtime already does track that a task is running, but we cannot treat all Objective-C async calls that happen while a task is running as if they were blocking the task because that could effectively create concurrency within the task. A handshake through the completion handler is the best way I can think of to tie these things together — it ties the non-concurrency of the task to the completion handler's intrinsic called-once nature, which all code can be assumed to be cooperating with. The callee knows through the handshake that calling the completion handler is the unique event that will unblock the caller and the task it's running on, and that makes the callee effectively part of the task.

Nickolas_Pohilets · November 18, 2024, 8:54pm

@John_McCall, were you responding to my post? If so, then there seems to be some misunderstanding.

The objc_getSwiftImpl() idea lets us skip the handshake only in simple cases where no Objective-C code is involved at all:

@objc protocol Foo {
    func foo(bar: String) async -> String
}

class Impl: Foo {
    func foo(bar: String) async -> String {
        return bar + bar
    }
}

func useIt(foo: any Foo) async {
    print(await foo.foo(bar: "hello"))
}

If there is actual ObjC code involved, then I completely agree that the handshake is necessary.

How does the runtime function from step 8 obtain the foreign continuation? Do I understand correctly that it reads it from the completion block?

If so, then until Block_info kind 1 is implemented, it will work only as long as the original block is passed to the callee. If there is intermediate ObjC code that passes a different completion block (wrapping the original one) to the Swift callee, then the foreign continuation will not be found.

@interface Wrapper : NSObject <Foo>
@property(nonatomic, readonly) id<Foo> base;
- (instancetype)initWithBase:(id<Foo>)base;
@end

@implementation Wrapper

- (void)fooWithBar:(NSString *)bar completion:(void(^)(NSString *result))completion {
    NSLog(@"Begin");
    [self.base fooWithBar:bar completion:^(NSString *result){ // <- This block has no Block_info, right?
        completion(result);    
        NSLog(@"End");
    }];
}
@end

John_McCall · November 18, 2024, 10:19pm

Yes, that's understood — wrapping the completion handler will break the chain unless the wrapper ends up with metadata that allows the handshake to occur through it. I think in the long run that's a viable approach, though; at worst, it would just require some simple explicit attribute on the block.

I'll confess that I didn't really consider the approach of doing a separate method lookup. I think the code size, memory, and speed costs would be prohibitive, though. If we want to optimize calls in general to bypass bridging and other ObjC overheads, static techniques seem like the way to go — just declare that a Swift interface is available in OS version x.y and automatically call the best one dynamically available.

The handshake approach also works for non-ObjC async calls, which seems like a useful future direction when we're talking about libraries gradually being reimplemented in Swift.

Nickolas_Pohilets · November 18, 2024, 10:35pm

Using thread-locals to propagate the foreign continuation does not have this problem.

The caller creates a foreign continuation and stores it in a thread-local value. Then it calls the ObjC method. Then it resets the thread-local value.

At step 8, the runtime tries to find the foreign continuation in the thread-local value.

So the foreign continuation is available to all Objective-C code running within that scope, and only on the current thread. The depth of the stack between the Swift caller and the Swift callee does not matter. The Wrapper example above will work. And the implementation does not need atomic operations.
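
The scoping being described could be sketched like this, using Foundation's per-thread dictionary as the thread-local storage. Everything here is illustrative: `ContinuationToken` stands in for the foreign continuation, and `foreignContinuationKey`, `withThreadLocalContinuation` and `currentForeignContinuation` are hypothetical names, not runtime functions.

```swift
import Foundation

// Stand-in for a foreign continuation; only its identity matters here.
final class ContinuationToken {}

// Hypothetical key; nothing in the Swift runtime uses it.
let foreignContinuationKey = "example.foreignContinuation"

// What "step 8" would do under this scheme: look at the current thread only.
func currentForeignContinuation() -> ContinuationToken? {
  Thread.current.threadDictionary[foreignContinuationKey] as? ContinuationToken
}

// What the caller would do: publish the continuation for the duration of the
// synchronous part of the ObjC call, then reset it.
func withThreadLocalContinuation<R>(_ token: ContinuationToken, _ body: () -> R) -> R {
  let locals = Thread.current.threadDictionary
  locals[foreignContinuationKey] = token
  defer { locals.removeObject(forKey: foreignContinuationKey) }
  return body()
}
```

Any code that runs synchronously inside `body`, at any stack depth, finds the token through `currentForeignContinuation()`. Work that only runs after `body` has returned, or on another thread, does not.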

John_McCall · November 18, 2024, 10:45pm

You cannot make the assumption that an arbitrary ObjC async call initiated by an ObjC async function will finish before the ObjC async function calls its completion handler. The handshake is necessary because it ties this specifically to the completion handler and its future execution.

Continuations already must use atomics internally. I think we can avoid needing additional atomics.

Nickolas_Pohilets · November 19, 2024, 10:49am

Yes, indeed. Using thread-locals to pass the foreign continuation only helps to avoid atomics in steps 2 and 8, and in the first check in steps 5 and 13 ("if the handshake succeeded").

In the case of a successful handshake, after running the task function we still need to wait for the completion block to finish. So in step 5 we need to await the native continuation on all code paths. Right?

This implies that in step 13 we should resume the native continuation both when the handshake was successful and when it was not. So we don't really need to check the handshake status in step 13.