I apologize for posting this before I post the concurrency proposal, but I think these questions can be usefully discussed now.
It seems certain that Swift will add `async`/`await`. One question, then, is how to represent these functions in SIL. For that, we need to discuss the ways that `async` functions are different from normal functions:
The most important difference is that `async` functions are expected to be broken up into multiple partial functions in a coroutine-style transformation. But we explicitly don't want to model this in SIL, because it would interfere with optimizations that we very much want to do. For example, we should be able to do normal copy/destroy optimizations within `async` functions by analyzing things from the perspective of the `async` function. We maintain the normal function structure until we do coroutine splitting, which doesn't happen until IRGen; that means SIL doesn't need to care about it at all.
The only subtlety here is the possibility of well-defined interleaving of side-effects at the function's potential suspension points, but that's not fundamentally different from the possibility of well-defined interleaving of side-effects during calls the function makes. We just need to make sure that all potential suspension points are treated as conservatively as calls. Since suspension points are generally calls, that shouldn't be a problem.
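To make the interleaving concern concrete, here's a minimal Swift sketch (the `Counter` type and both functions are invented for illustration): shared state read before and after an `await` can differ, exactly as it could across a call to unknown code, which is why a suspension point has to be treated as conservatively as a call.

```swift
// Invented example: shared state observed across a suspension point.
final class Counter: @unchecked Sendable {
    var value = 0
}

// Any code that runs while we're suspended may mutate the counter.
func tick(_ c: Counter) async {
    c.value += 1
}

func observe(_ c: Counter) async -> (before: Int, after: Int) {
    let before = c.value
    await tick(c)        // potential suspension point: treat like an opaque call
    let after = c.value  // not provably equal to `before`
    return (before, after)
}
```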
`async` functions can also call other `async` functions. At the SIL level, we (mostly) don't need to treat these differently from calls to synchronous functions, because by default these calls are synchronous from the perspective of the calling function, which means they can be adequately modeled with ordinary `apply` instructions, `begin_apply` (should we decide to support that), etc. And that's really good, because the last thing we want to do is introduce yet another orthogonal axis of function application.
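As a surface-level illustration (both function names are invented), a call from one `async` function to another reads as ordinary sequential code; the suspension machinery is invisible to the caller, which is why an ordinary apply models it well.

```swift
// Invented example: async-to-async calls look like ordinary calls.
func double(_ x: Int) async -> Int {
    x * 2
}

func pipeline(_ x: Int) async -> Int {
    // From this function's perspective these are plain, sequential calls.
    let a = await double(x)
    let b = await double(a)
    return b
}
```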
`async` functions are implicitly parameterized by an (optional) actor that they have to run on. This actor reference is carried dynamically by first-class `async` function values. For calls that aren't to first-class values, we need to derive it from the context of the call somehow, in some function-specific way. What that probably means for SIL is that we need something more than `function_ref` to get a reference to an `async` function. We then treat the actor reference as something we can derive from an `async` function value in SIL, and in IRGen we just track the actor reference as part of the lowered value.
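A hedged surface-level sketch of what that lowered value would carry (the `Logger` actor and the closure are invented): a first-class `async` function value whose body must run on a particular actor, where the actor reference travels with the value rather than being derivable from the callee's symbol alone.

```swift
// Invented example: a first-class async function value dynamically tied
// to an actor.
actor Logger {
    private var lines: [String] = []
    func log(_ s: String) { lines.append(s) }
    func count() -> Int { lines.count }
}

let logger = Logger()

// Calls through `record` must hop to `logger`; in the lowering described
// above, that actor reference is tracked as part of the function value.
let record: (String) async -> Void = { s in
    await logger.log(s)
}
```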
Actor switching optimization
We want to be able to optimize `async` functions intelligently so that we don't enqueue work unnecessarily onto actors. For example, if an `async` function that's semantically tied to an actor starts by making a call to a different actor, we want calls to that function to just initiate that call without switching actors first. I think the right idea here is probably to (1) make sure that we can easily recover from SIL when code needs to be running on an actor, and then (2) represent that in IRGen in a way that coroutine splitting can recover and turn into the right form to be used dynamically. If (1) is possible, we don't need extra support from SIL here. I don't know conclusively whether (1) is possible.
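To make the motivating pattern concrete (both actor names are invented), here's the shape of a function the optimization would target: `logMetric` is isolated to `Metrics`, but its first action is a call to a different actor, so an optimized caller could initiate that call without first hopping onto `Metrics`.

```swift
// Invented example: an actor-isolated function whose first action is a
// cross-actor call.
actor Storage {
    private var entries: [String] = []
    func record(_ entry: String) { entries.append(entry) }
    func count() -> Int { entries.count }
}

actor Metrics {
    let storage = Storage()

    func logMetric(_ name: String) async {
        // The first thing we do is call a *different* actor; an optimized
        // implementation needn't switch onto `Metrics` before doing this.
        await storage.record(name)
    }
}
```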
Actor function constraints
We're going to have to infer when functions are constrained to run on a specific actor because of the things they do. It's possible that this is doable entirely in the type-checker, but it's also possible that we'll need to do it as a SIL analysis (if it's interprocedural / data-flow dependent?). So maybe this combines with the point above about optimization to make it a harder requirement that we represent actor dependencies explicitly in SIL.
I think task creation, cancellation info, etc. should all be pretty straightforward to embed with intrinsics, but maybe we'll need to be more intelligent about nesting.
`async` functions need to be able to cleanly interoperate with functions that use callbacks. The current design proposes doing this with some sort of `withUnsafeContinuation` library function, with a prototype like this:

```swift
func withUnsafeContinuation<T>(operation: (UnsafeContinuation<T>) -> ()) async -> T
```

`operation` is a synchronous function that's allowed to do whatever it wants with the continuation value as long as it eventually resumes it exactly once. Resuming the continuation value logically transfers control to the "continuation point", i.e. the point of returning from `withUnsafeContinuation`. `operation` is also supposed to return, or else it ties up the thread uselessly.
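Here's a hedged sketch of the interop pattern described above, using the shape of the API Swift eventually shipped (`withUnsafeContinuation(_:)` with a `resume(returning:)` method); the callback-based `fetchValue` is an invented stand-in for any callback API.

```swift
import Dispatch

// Invented stand-in for a callback-based API.
func fetchValue(completion: @escaping (Int) -> Void) {
    DispatchQueue.global().async {
        completion(42)
    }
}

// Wrapping it as an async function: the closure below plays the role of
// `operation`. It must resume the continuation exactly once, which
// logically transfers control to the continuation point (the return from
// withUnsafeContinuation).
func fetchValue() async -> Int {
    await withUnsafeContinuation { continuation in
        fetchValue { value in
            continuation.resume(returning: value)
        }
    }
}
```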
I was going around in circles about how to represent this in SIL without completely blocking optimization. The problem is that somewhere within `operation` we have control flow to the continuation point, and the start of that is basically opaque: we have no idea when we might trigger the continuation to run. But then I realized that there's a second major problem here: if resuming the continuation immediately starts running code associated with the continuation point, we're actually potentially running that concurrently with `operation`, which means we might be doing terrible things to captured local variables. So we really need to block resumption of the continuation until we know we've returned out of `operation`. But hey, if we do that, then from the perspective of the `async` function the control flow is totally synchronous:
- We start running `operation`. `operation` might kick off arbitrary asynchronous work, but so might any other call we make.
- We return out of `operation`.
- We block waiting for the continuation to get called.
- We get resumed at the continuation point.
So we just need the representation to:

- tie the start of `operation` to the continuation point,
- allow us to insert the code necessary to block resuming the continuation until `operation` returns,
- reflect that the continuation point has arbitrary side-effects like a call, and
- make sure we don't do `async` stuff in the middle, which would really mess us up.
So it can be something like:

```
(%token, %continuation) = begin_async_with_continuation $T
apply %operation(%continuation) : $(UnsafeContinuation<T>) -> ()  // totally inlinable!
%result = end_async_with_continuation %token
```
And then we structurally disallow suspension points after `begin_async_with_continuation` but before `end_async_with_continuation`. If somehow the `end_async_with_continuation` becomes unreachable, that's fine. I mean, it probably means the user's code is really broken, but we just need to recognize it and generate a bogus continuation, and that still semantically makes sense: the rule is that the continuation can't start running until `operation` returns, and hey, for some reason it didn't return.