SIL representations for async functions

I apologize for posting this before I post the concurrency proposal, but I think these questions can be usefully discussed now.

It seems certain that Swift will add async/await. One question, then, is how to represent these functions in SIL. For that, we need to discuss the ways that async functions are different from normal functions:

Function splitting

The most important difference is that async functions are expected to be broken up into multiple partial functions in a coroutine-style transformation. But we explicitly don't want to model this in SIL because it would interfere with optimizations that we very much want to do. For example, we should be able to do normal copy/destroy optimizations within async functions by analyzing things from the perspective of the async function. We maintain the normal function structure until we do coroutine splitting, which means all the way through IRGen — which means SIL doesn't need to care about it at all.

The only subtlety here is the possibility of well-defined interleaving of side-effects at the function's potential suspension points, but that's not fundamentally different from the possibility of well-defined interleaving of side-effects during calls the function makes. We just need to make sure that all potential suspension points are treated as conservatively as calls. Since suspension points are generally calls, that shouldn't be a problem.

async calls

async functions can also call other async functions. At the SIL level, we (mostly) don't need to treat these differently from calls to synchronous functions, because by default these calls are synchronous from the perspective of the function, which means they can be adequately modeled with ordinary apply, try_apply, begin_apply (should we decide to support that), etc. And that's really good, because the last thing we want to do is introduce yet another orthogonal axis of function application.

Actor references

async functions are implicitly parameterized by an (optional) actor that they have to run on. This actor reference is carried dynamically by first-class async function values. For calls that aren't to first-class values, we need to derive it from the context of the call somehow, in some function-specific way. What that probably means for SIL is that we need something more than function_ref to get a reference to an async function. We then treat the actor reference as something we can derive from an async function value in SIL, and in IRGen we just track the actor reference as part of the lowered value.

Actor switching optimization

We want to be able to optimize async functions intelligently so that we don't enqueue work unnecessarily onto actors. For example, if an async function that's semantically tied to an actor starts by making a call to a different actor, we want calls to that function to just initiate that call without switching actors. I think the right idea here is probably just to (1) make sure that we can easily recover from SIL when code needs to be running on an actor and then (2) represent that in IRGen in a way that coroutine splitting can recover and turn into the right form to be used dynamically. If (1) is possible, we don't need extra support from SIL here. I don't know if it's conclusive whether (1) is possible.

Actor function constraints

We're going to have to infer when functions are constrained to run on a specific actor because of the things they do. It's possible that this is doable entirely in the type-checker, but it's also possible that we'll need to do it as a SIL analysis (if it's interprocedural / data-flow dependent?). So maybe this combines with the point above about optimization to make it a harder requirement that we represent actor dependencies explicitly in SIL.

Task management

I think task creation, cancellation info, etc. should all be pretty straightforward to embed with intrinsics, but maybe we'll need to be more intelligent about nesting.

Accessing continuations

async functions need to be able to cleanly interoperate with functions that use callbacks. The current design proposes doing this with some sort of withUnsafeContinuation library function, with a prototype like this:

func withUnsafeContinuation<T>(operation: (UnsafeContinuation<T>) -> ()) async -> T

Note that operation is a synchronous function that's allowed to do whatever it wants to the continuation value as long as it eventually resumes it exactly once. Resuming the continuation value logically transfers control to the "continuation point", i.e. the point of returning from withUnsafeContinuation. Generally, operation is also supposed to return, or else it ties up the thread uselessly.

I was going around in circles about how to represent this in SIL without completely blocking optimization. The problem is that somewhere within operation we have control flow to the continuation point, and the start of that is basically opaque: we have no idea when we might trigger the continuation to run. But then I realized that there's a second major problem here: if resuming the continuation immediately starts running code associated with the continuation point, we're actually potentially running that concurrently with operation, which means we might be doing terrible things to captured local variables. So we really need to block resumption of the continuation until we know we've returned out of operation. But hey, if we do that, then from the perspective of the async function the control flow is totally synchronous:

  1. We start running operation.
  2. operation might kick off arbitrary asynchronous work, but so might any other call we make.
  3. We return out of operation.
  4. We block waiting for the continuation to get called.
  5. We get resumed at the continuation point.
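To make that ordering concrete, here is a minimal sketch in plain Swift of a gate that holds back the continuation point until operation has returned. Everything in it is hypothetical and for illustration only: `ContinuationGate`, `resume(returning:)`, and `operationReturned(then:)` are made-up names, and this is not how the runtime or IRGen would actually implement it.

```swift
import Foundation

// Hypothetical model (not the Swift runtime): the code at the continuation
// point may only run once `operation` has returned AND the continuation
// has been resumed, whichever of the two happens last.
final class ContinuationGate<T> {
    private enum State {
        case running                 // operation still running, not yet resumed
        case resumedEarly(T)         // resumed while operation was still running
        case awaiting((T) -> Void)   // operation returned, waiting for a resume
        case finished
    }
    private var state = State.running
    private let lock = NSLock()

    // Called by whoever holds the continuation; must happen exactly once.
    func resume(returning value: T) {
        lock.lock()
        switch state {
        case .running:
            state = .resumedEarly(value)   // park the value; do not run yet
            lock.unlock()
        case .awaiting(let continuationPoint):
            state = .finished
            lock.unlock()
            continuationPoint(value)
        case .resumedEarly, .finished:
            lock.unlock()
            preconditionFailure("continuation resumed more than once")
        }
    }

    // Called once operation has returned (step 3), passing the code that
    // lives at the continuation point (step 5).
    func operationReturned(then continuationPoint: @escaping (T) -> Void) {
        lock.lock()
        switch state {
        case .running:
            state = .awaiting(continuationPoint)   // step 4: wait for resume
            lock.unlock()
        case .resumedEarly(let value):
            state = .finished
            lock.unlock()
            continuationPoint(value)
        case .awaiting, .finished:
            lock.unlock()
            preconditionFailure("operation returned more than once")
        }
    }
}

// Usage: operation resumes the continuation before it returns, yet the
// continuation point still runs only after operation is done.
var log: [String] = []
let gate = ContinuationGate<Int>()
log.append("operation start")
gate.resume(returning: 42)
log.append("operation end")
gate.operationReturned { value in log.append("continued with \(value)") }
// log is now ["operation start", "operation end", "continued with 42"]
```

The point of the model is only that an early resume gets parked rather than run, so captured locals are never touched concurrently with operation.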

So we just need the representation to:

  • tie the start of operation to the continuation point,
  • allow us to insert the code necessary to block resuming the continuation until operation returns,
  • reflect that the continuation point has arbitrary side-effects like a call, and
  • make sure we don't do async stuff in the middle, which would really mess us up.

So it can be something like:

  (%token, %continuation) = begin_async_with_continuation $T
  apply %operation(%continuation) : $(UnsafeContinuation<T>) -> () // totally inlinable!
  %result = end_async_with_continuation %token

And then we structurally disallow suspension points, return, and throw after begin_async_with_continuation but before end_async_with_continuation. If somehow the end_async_with_continuation becomes unreachable, that's fine — I mean, it probably means the user's code is really broken, but we just need to recognize it and generate a bogus continuation, and that still semantically makes sense because the rule is that the continuation can't start running until operation returns, and hey, for some reason it didn't return.
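As a rough illustration of that structural rule, here is a toy checker over a straight-line instruction list. It's a sketch under loud assumptions: `ToyInst`, `VerifyError`, and `verifyStraightLine` are hypothetical, real SIL has a CFG rather than a flat list, and rejecting a nested begin is my own assumption rather than something stated above.

```swift
// Hypothetical toy instruction set; only the distinctions the rule cares about.
enum ToyInst: Equatable {
    case beginAsyncWithContinuation
    case endAsyncWithContinuation
    case apply(isAsync: Bool)   // an async apply is a potential suspension point
    case returnInst
    case throwInst
    case other
}

enum VerifyError: Error, Equatable {
    case nestedBegin(at: Int)          // assumption: regions do not nest
    case endWithoutBegin(at: Int)
    case illegalInsideRegion(at: Int)  // suspension point, return, or throw
}

// Returns nil if the list obeys the rule, otherwise the first violation.
func verifyStraightLine(_ insts: [ToyInst]) -> VerifyError? {
    var inRegion = false
    for (index, inst) in insts.enumerated() {
        switch inst {
        case .beginAsyncWithContinuation:
            if inRegion { return .nestedBegin(at: index) }
            inRegion = true
        case .endAsyncWithContinuation:
            if !inRegion { return .endWithoutBegin(at: index) }
            inRegion = false
        case .apply(let isAsync):
            if inRegion && isAsync { return .illegalInsideRegion(at: index) }
        case .returnInst, .throwInst:
            if inRegion { return .illegalInsideRegion(at: index) }
        case .other:
            break
        }
    }
    // A region that is never closed is accepted on purpose: the end may
    // simply be unreachable, which the rule above tolerates.
    return nil
}

// A synchronous apply inside the region is fine.
let ok = verifyStraightLine([
    .beginAsyncWithContinuation, .apply(isAsync: false),
    .endAsyncWithContinuation, .returnInst,
])
// An async apply inside the region is rejected at index 1.
let bad = verifyStraightLine([
    .beginAsyncWithContinuation, .apply(isAsync: true),
    .endAsyncWithContinuation,
])
```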

I've been thinking about this part of the semantics a lot (more than I'd like to admit): whether the continuation code runs "right at the resume" (#1) or "after the resume plus the return from withUnsafeContinuation" (#2).

What concerns me about #2 is that it uses the actor that calls withUnsafeContinuation (well, someone has to block and wait), and so assumes that the actor is available. As far as I'm aware of the designs in this direction, it fails at the first call, since you return control to the caller of beginAsync (borrowed from Chris' design) at the call to withUnsafeContinuation. Is this still the case for the current (potential) design, or is a new actor supplied at the call to beginAsync (which will necessarily be shared across the entire async block)?

Where is the current design on this? I searched the GitHub repo for withUnsafeContinuation and didn't get any hits.

There’s no beginAsync in the current design. What you’re describing is not an issue.

I called myself out for that at the start of the post. It’ll be “soon”. I figured people could probably figure out the relevant points enough to follow along if they really had input to make about SIL design.

Again, this is a SIL / implementation design thread.

Yeah, I was having a little trouble understanding what a use site of withUnsafeContinuation would look like, and was hoping to see where it's introduced. I think I get it though:

let result = await withUnsafeContinuation { continuation in
  DispatchQueue.global().async {
    continuation(computeValue())
  }
}

And ideally that call to DispatchQueue.async gets inlined into the containing function, since it's entirely synchronous. And then both new instructions are necessary, since we need to make the continuation, then use it in some way with normal existing SIL instructions, then wait for the continuation to be resumed. Using a separate %token and %continuation allows us to guarantee that the token is consumed exactly once, and having a special end_async_with_continuation shows where to split the function.

Although you mentioned that we don't want to split SIL functions, I wonder if it would actually be beneficial to do the splitting as a SIL->SIL pass, which would allow us to write SIL optimizations on the split form as well. But then we'd lose out on LLVM's coroutine support.

You got it exactly.

Splitting in SIL would also require SIL to have unscoped versions of all the SIL instructions that influence optimization/IRGen, e.g. begin_access.

Thanks for working through this, John. I've tried to integrate this design into the SIL reference documentation here:

The more I think about it, the more I think active executor requirements should be explicit rather than contextual in SIL. So in the AST, we would say that a particular function — on a function-wide basis — might have a constraint to run on a particular actor, but in SIL we would have an instruction that represents a potential suspension point, i.e. by forcing execution to switch to a particular actor. We would emit that instruction before running any actor-restricted code, either eagerly in SILGen or maybe retroactively in a SIL pass. That would allow us to freely inline and optimize across suspension points, and it would let us explicitly reason about suspensions that we can eliminate.
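Here's a minimal sketch of the kind of reasoning an explicit instruction would enable: dropping a switch when we already know we're on the target actor. It's a toy over straight-line code, not a real SIL pass. `ToyOp` and `removeRedundantHops` are hypothetical names, actors are identified by plain strings, and I'm assuming (conservatively) that we know nothing about the current executor after an async call returns.

```swift
// Hypothetical toy operations, just enough to show the idea.
enum ToyOp: Equatable {
    case hop(to: String)     // potential suspension: switch to the named actor
    case work(String)        // synchronous code; never changes the executor
    case asyncCall(String)   // assumption: executor is unknown afterwards
}

// Removes hops to the actor we are already known to be running on.
func removeRedundantHops(_ ops: [ToyOp]) -> [ToyOp] {
    var current: String? = nil   // nil means "executor unknown"
    var result: [ToyOp] = []
    for op in ops {
        switch op {
        case .hop(let target):
            if current == target { continue }   // already there: drop the hop
            current = target
            result.append(op)
        case .work:
            result.append(op)
        case .asyncCall:
            current = nil                       // be conservative
            result.append(op)
        }
    }
    return result
}

let optimized = removeRedundantHops([
    .hop(to: "A"), .work("x"),
    .hop(to: "A"),               // redundant: still on A
    .asyncCall("f"),
    .hop(to: "A"),               // kept: executor unknown after the call
])
// optimized is [.hop(to: "A"), .work("x"), .asyncCall("f"), .hop(to: "A")]
```

With only a function-wide constraint there would be nothing to delete; making each switch its own instruction is what gives the optimizer something to remove.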

We'd still want to proactively establish structural actor-correctness, so that we can verify statically that all actor-restricted operations are being executed on the right actor. As part of that, we'd want to be able to say explicitly that a function is known to be running on a particular actor on entry, or required to be running on that actor on exit.

This makes a lot of sense to me.

-Chris
