SE-0300: Continuations for interfacing async tasks with synchronous code

Joe_Groff · January 21, 2021, 8:19pm

The checks (both traps and warnings) should happen at all optimization levels. I can make this explicit in the proposal.

The queue from which resume is invoked is irrelevant, since it only transitions the task out of the suspended state; tasks already know what executor they're associated with, and will be scheduled on whatever queue/runloop/thread/other scheduling mechanism their current executor uses. If an API takes a queue argument on which to run its completion handler, it's probably best to pass the queue that the code dispatching the completion handler will run on, if you know it. In cases when we know a task's work can be safely enqueued on a specific dispatch queue, to allow the queue hops to be minimized, we could perhaps provide some API on Task to get that queue for the current task, and have an unsafeResumeImmediately variant of the API on *Continuation that immediately resumes execution of the task on the current thread, relying on the assumption that the code invoking resume was already executed by the correct queue. This would be a very sharp knife, though, and using it incorrectly could lead to really subtle problems that'd be tough to diagnose, so I think it'd be good to take a wait-and-see approach to see if queue-hopping becomes a bottleneck in practice. Despite the API recommendations, nearly all of the completion-handler-based APIs in Apple's frameworks do not take a queue argument, and there's not a lot of consistency in how completion handlers are invoked, so very few Apple APIs would be able to take advantage of such an optimization today. And as more code adopts Swift's native tasks and async functions, the Swift compiler and runtime's own optimizations for avoiding expensive queue hops will hopefully reduce overall overhead in time.

As currently implemented, UnsafeContinuation is a plain pointer to the task structure to be resumed, whereas CheckedContinuation allocates a class instance to hold the task pointer, so compared to UnsafeContinuation, it'll have ARC operations applied to it when copied around, and will use atomic operations to "take" the task pointer from the instance the first time it's resumed so that the double-resume protection is thread safe. When we have move-only types, we should be able to make a safe Continuation type that has to be resume-d in order to dispose of it; before then, I don't see an obvious way to avoid the overhead in the language today. However, even with move-only types, since a lot of the value of this API is interop with existing C and ObjC APIs, I wonder whether a move-only continuation type would be very practical to use, or just force the use of unsafe escape hatches down a layer.
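To make the "take the pointer exactly once" idea concrete, here is a minimal sketch of the pattern. It is an illustration only, not the standard library's implementation: the class name `ResumeOnce` is hypothetical, it wraps a plain closure instead of a task pointer, and it uses an `NSLock` where the post describes atomic operations. It also returns `false` on a second resume, whereas the proposal's checked continuation would trap.

```swift
import Foundation

// Hypothetical illustration of double-resume protection.
// Not the real CheckedContinuation: it wraps a closure rather than a task
// pointer and uses a lock rather than atomic operations.
final class ResumeOnce<T> {
    private let lock = NSLock()
    private var handler: ((T) -> Void)?

    init(_ handler: @escaping (T) -> Void) {
        self.handler = handler
    }

    /// Returns true if this call performed the resume,
    /// false if the handler had already been taken.
    @discardableResult
    func resume(returning value: T) -> Bool {
        // "Take" the handler under the lock so only one caller can win.
        lock.lock()
        let taken = handler
        handler = nil
        lock.unlock()

        guard let taken = taken else { return false }
        taken(value)
        return true
    }
}
```

Because the wrapper is a class instance, copying a reference to it incurs retain/release traffic, which is the ARC overhead mentioned above.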

I can reword the proposal to make this clearer. The distinction that we're trying to make is between resume happening after the with*Continuation block has finished executing and the task has suspended, and resume happening while the with*Continuation block is still executing. An example of the latter would be:

await withUnsafeContinuation { c in
  if condition {
    // Suspend the task and resume it later
    doSomethingAsynchronously(completion: { c.resume(returning: ()) })
  } else {
    // Resume the task immediately
    c.resume(returning: ())

    // According to rule 2, we can't do anything after `resume` if we run
    // it immediately
  }
}

I'll amend the text to make this clearer.

The error propagation from with*Continuation is either-or; if you specified a more specific Error type, it would have no type system impact on what the caller sees when withUnsafeThrowingContinuation throws. Task.Handle's error parameterization seems a bit suspect to me for this reason too; in practice it can only ever be Never or Error.
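A small sketch of what "either-or" means in practice, written against the API names that later shipped in Swift 5.5 (`withCheckedThrowingContinuation`, `resume(with:)`), which may differ from the snapshot discussed here. `FetchError`, `legacyFetch`, and `fetch` are hypothetical names for illustration. Even though the callback reports a specific `FetchError`, the async wrapper can only surface it as an untyped `Error`.

```swift
struct FetchError: Error {}

// Hypothetical callback-based API with a specific failure type.
func legacyFetch(succeed: Bool,
                 completion: @escaping (Result<String, FetchError>) -> Void) {
    completion(succeed ? .success("payload") : .failure(FetchError()))
}

// The async wrapper is either non-throwing or throws plain `Error`;
// the specific failure type is not visible in its signature.
func fetch(succeed: Bool) async throws -> String {
    try await withCheckedThrowingContinuation { (c: CheckedContinuation<String, Error>) in
        legacyFetch(succeed: succeed) { result in
            // resume(with:) accepts a Result with any Error-conforming failure.
            c.resume(with: result)
        }
    }
}
```

A caller that cares about the concrete error has to recover it with a cast, for example `catch let error as FetchError`.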

It's a leak, and potentially a deadlock if other work relies on the abandoned task finishing, but it's not a memory safety violation. If the task is never resumed, it'll sit around forever holding onto whatever resources it was using when it suspended. My bigger concern about making leaking a task trap is the lack of determinism in where that trap happens; one could miss a trap because naive -Onone ARC put off destroying the checked continuation wrapper until the process ends for other reasons, and then end up shipping a -O build that cuts the lifetime shorter and exposes the trap.

Jon_Shier · January 21, 2021, 8:45pm

Joe_Groff:

The queue from which resume is invoked is irrelevant, since it only transitions the task out of the suspended state; tasks already know what executor they're associated with, and will be scheduled on whatever queue/runloop/thread/other scheduling mechanism their current executor uses. If an API takes a queue argument on which to run its completion handler, it's probably best to pass the queue that the code dispatching the completion handler will run on, if you know it. In cases when we know a task's work can be safely enqueued on a specific dispatch queue, to allow the queue hops to be minimized, we could perhaps provide some API on Task to get that queue for the current task, and have an unsafeResumeImmediately variant of the API on *Continuation that immediately resumes execution of the task on the current thread, relying on the assumption that the code invoking resume was already executed by the correct queue. This would be a very sharp knife, though, and using it incorrectly could lead to really subtle problems that'd be tough to diagnose, so I think it'd be good to take a wait-and-see approach to see if queue-hopping becomes a bottleneck in practice. Despite the API recommendations, nearly all of the completion-handler-based APIs in Apple's frameworks do not take a queue argument, and there's not a lot of consistency in how completion handlers are invoked, so very few Apple APIs would be able to take advantage of such an optimization today. And as more code adopts Swift's native tasks and async functions, the Swift compiler and runtime's own optimizations for avoiding expensive queue hops will hopefully reduce overall overhead in time.

Based on other replies up thread, especially Ben_Cohen's, the queue on which resume is invoked is anything but irrelevant, as it determines the queue on which the resumed code is executed. Or is the phrase "absent anything else like actors being involved" doing the hard work there, because there's really no way to execute async code without being in a context that would guarantee execution resumes on the original queue? So is this real behavior (which would be an issue with this proposal), or is it a behavior that won't exist in the full concurrency story? That is, to use the current snapshot as an example, where does the code execution resume in the example below?

runAsyncAndBlock {
    // On runAsyncAndBlock's backing queue.
    let string = await makeNetworkCall() // Wrapper which resumes a continuation on URLSession's arbitrary completion queue.
    print(string) // What queue is this on? runAsyncAndBlock's backing queue, or URLSession's completion queue?
}

And despite Apple's frameworks not usually following the practice, largely due to age or the friction of passing DispatchQueues in Obj-C, it has certainly been the recommendation by Apple engineers on social media and in WWDC labs, as well as the community at large, to pass queues for completion handlers. So when people bring up best practice, we don't just mean the practices visible in Apple's SDKs but those the Swift community has been using over the last several years.

Joe_Groff · January 21, 2021, 8:48pm

resume does not immediately resume the task, it only makes it ready to be resumed again. Its executor/actor will resume the task in a context appropriate for its current state. I'll make this clearer in the proposal text. In your example, runAsyncAndBlock creates a task that takes over the main thread; after makeNetworkCall suspends the task, it ought to resume on the main thread once its continuation is resumed.
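As a rough sketch of that behavior, the example below uses the API names that later shipped in Swift 5.5 (`withCheckedContinuation`, actors) and not the `runAsyncAndBlock` snapshot from the question. `legacyNetworkCall` and `Logger` are hypothetical stand-ins. The completion handler runs on a global dispatch queue, yet the code after the `await` runs back on the actor's executor, so it can touch actor state directly.

```swift
import Dispatch

// Hypothetical callback API that calls back on an arbitrary queue.
func legacyNetworkCall(completion: @escaping @Sendable (String) -> Void) {
    DispatchQueue.global().async { completion("response") }
}

// Async wrapper: resume is called from the global queue, but it only
// marks the suspended task as ready to be scheduled again.
func makeNetworkCall() async -> String {
    await withCheckedContinuation { continuation in
        legacyNetworkCall { string in
            continuation.resume(returning: string)
        }
    }
}

actor Logger {
    private(set) var lines: [String] = []

    func fetchAndRecord() async -> Int {
        let string = await makeNetworkCall()
        // Execution continues on this actor's executor, not on the queue
        // that called resume, so mutating actor state here is safe.
        lines.append(string)
        return lines.count
    }
}
```

The design consequence is that a wrapper author does not need to hop queues before calling `resume`; the task's executor decides where the continuation actually runs.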

Jon_Shier · January 21, 2021, 8:52pm

Ah yes, that does clear it up, thanks. With that being the case, queue hopping becomes an optimization problem I'm happy to leave for later.

michelf · January 21, 2021, 10:37pm

Joe_Groff:

await withUnsafeContinuation { c in
  if condition {
    // Suspend the task and resume it later
    doSomethingAsynchronously(completion: { c.resume(returning: ()) })
  } else {
    // Resume the task immediately
    c.resume(returning: ())

    // According to rule 2, we can't do anything after `resume` if we run
    // it immediately
  }
}

The second comment in this example really bugs me. Does "can't do anything" imply no defer blocks? No ARC-triggered deinit? What happens if you do something like print("hi") after the c.resume line?

Joe_Groff · January 22, 2021, 2:45am

After asking around, it sounds like this second restriction might not be needed. Certainly the current implementation shouldn't depend on it. I'll remove this.

Lantua · January 22, 2021, 3:34am

The first invariant:

Either the resume function must only be called exactly-once after the operation function passed into withUnsafeContinuation has finished executing, on every execution path through the program,

sounds tricky to comply with. Even if I do:

withUnsafeContinuation { operation in
  DispatchQueue.global().async {
    operation.resume(...)
  }

  // Some clean up
}

There's no guarantee that the block passed to DispatchQueue.async will be executed after operation has ended. Even if we don't have anything after DispatchQueue.async, it may still be executed while we're unwinding the stack, however unlikely it is.

Joe_Groff · January 22, 2021, 5:18pm

I don't think this restriction is really necessary, and I asked the other authors of the proposal and we can't recall why that restriction was there. I'll remove it.

Lantua · January 22, 2021, 5:29pm

Does this imply that the continuation is not run on any executor, or at the very least, on a different one from the caller of withContinuation?

Joe_Groff · January 22, 2021, 5:31pm

resume only transitions the task back to a schedulable state, and then returns control back to its caller. The task will resume execution when its executor from the time it was suspended reschedules it.

Lantua · January 24, 2021, 7:03pm

So we could say that resume enqueues the next partial task and immediately returns. I'm all for simple rules, so removing the restriction is a welcome change.

What about withUnsafeContinuation itself? Does it enqueue operation on a different (unspecified) executor somewhere and immediately suspend the task, or does it execute the operation and suspend the task once operation completes? I think it's part of the semantics that should be documented. At the very least, we can decide whether to recommend against performing long-running tasks inside the continuation (though that should be done using Task.runDetached anyway).

Joe_Groff · January 24, 2021, 8:50pm

The withUnsafeContinuation operation runs immediately in the about-to-be-suspended task's current context.
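The ordering can be observed with a small sketch. It assumes the API as later shipped in Swift 5.5 (`withCheckedContinuation`); the function name `immediateResumeOrder` is hypothetical. The operation closure runs synchronously in the calling task, code after `resume` inside the closure still runs, and the task only continues past the `await` once the closure has returned.

```swift
func immediateResumeOrder() async -> [String] {
    var events: [String] = []
    events.append("before")

    let value: Int = await withCheckedContinuation { c in
        // Runs right away, in the context of the task that is about to suspend.
        events.append("inside operation")
        c.resume(returning: 42)
        // Work after resume is allowed; the closure runs to completion.
        events.append("after resume")
    }

    events.append("resumed with \(value)")
    return events
}
```

Calling it yields the events in source order: "before", "inside operation", "after resume", "resumed with 42". Since the operation runs inline on the calling task, long-running work inside it delays that task's suspension.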

Ben_Cohen · February 2, 2021, 4:50am

Review Conclusion

Thanks to everyone who participated in this review. Given a number of clarifications of behavior emerged during discussion with the proposal authors, the core team has decided to run a second review of the clarified proposal before a final review decision.