[Pitch #2] Structured Concurrency

curt · January 3, 2021, 6:51pm

Thanks for the updated pitch. I really appreciate the example showing the desugaring of async let into task groups.

`withCancellationHandler`

I’d find it helpful to have an example and more exposition around withCancellationHandler as well. (@anandabits and @John_McCall had an exchange on this up-thread, but I’m still not completely following.) My understanding is that the closure bound to the operation parameter must always execute to completion. I’m guessing that the closure runs on the same task from which Task.withCancellationHandler was called — that is, there is no implicit runDetached here. On what task does the closure bound to the handler parameter run? The docs say it runs immediately, but if it’s also run on the current task, it seems like the handler closure must wait until the operation closure reaches a suspension point. Is the handler closure implicitly runDetached? Am I missing something?

Some minor errata

In “`async let` as sugar…”:

func makeDinnerTaskGroup() async throws -> Meal {
  withTaskGroup(resultType: DinnerChildTask.self) { group in

The resultType argument should be DinnerChild.self.

In this example in the detailed design of “Task handles”:

func eat(mealHandle: Task.Handle<Meal, Error>) {
  let meal = try await mealHandle()
  meal.eat() // yum
}

This was mentioned up-thread: the eat function should be marked as throws. It was also noted that the initialization of meal needs to call .get() on mealHandle. However, the presented syntax makes me wonder whether Handle might be made dynamically callable, invoking the get() method. Perhaps that’s too clever.

In the detailed design of “Detached tasks”:

static func runDetached<T>(
    priority: Priority = .default,
    operation: @escaping () async -> T
  ) -> Handle<T>

The return type should be Handle<T, Never>.

In the detailed design of “Adding tasks to a group”:

The prelude to the declaration of the add method says “ResultType generic parameter”. I think this was renamed to be the TaskResult generic parameter.

anandabits · January 3, 2021, 6:58pm

This is the same question I asked. @John_McCall said it runs immediately and not in the context of any task. The handler must therefore be thread safe as I understand things.
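
As a rough illustration of why the handler must be thread safe, here is a minimal sketch. It assumes the spelling that later shipped in Swift 5.5, withTaskCancellationHandler(operation:onCancel:), rather than the Task.withCancellationHandler of this pitch; CancelFlag and countUntilCancelled are hypothetical names invented for the example. The operation runs on the calling task, while the handler may fire from whatever thread requests cancellation, so the two only communicate through lock-protected state.

```swift
import Foundation

// Hypothetical helper: a flag guarded by a lock, so the cancellation
// handler can flip it from any thread.
final class CancelFlag: @unchecked Sendable {
    private let lock = NSLock()
    private var cancelled = false

    func set() {
        lock.lock(); defer { lock.unlock() }
        cancelled = true
    }

    var isSet: Bool {
        lock.lock(); defer { lock.unlock() }
        return cancelled
    }
}

// Hypothetical operation: does up to `limit` units of work and checks the
// flag between units. Returns how many units were completed.
func countUntilCancelled(limit: Int, flag: CancelFlag) async -> Int {
    await withTaskCancellationHandler {
        // Runs on the calling task; no detached task is created here.
        var done = 0
        while done < limit && !flag.isSet {
            done += 1
            await Task.yield() // a suspension point between units of work
        }
        return done
    } onCancel: {
        // May run concurrently with the operation, hence the lock.
        flag.set()
    }
}
```

If the surrounding task is never cancelled, the handler does not run and the loop simply finishes at the limit.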

michelf · January 3, 2021, 7:15pm

isCancelled could return false when invoked outside of a task. Task-local values could return their default values. The problem is the task context is passed as a hidden parameter to the function and normal synchronous functions don't have that. There must be something in the function signature signaling the presence of this hidden parameter. async does this, in addition to allowing suspension points.

The question is what to do when you need that context but don't need suspension points. Is this "async but without suspension points" or is it "synchronous but with a task context"? I feel the latter formalizes the presence of a task context, but it's all a question of perception.

It occurs to me we could avoid this dilemma if the hidden parameter was less hidden. Instead of Task.isCancelled we could be calling task.isCancelled, where task is the hidden parameter made visible (similar to self). We wouldn't need a special attribute to pass the task context as a hidden parameter if it could be passed as a non-hidden parameter or as self. I guess the main difficulty is this task value must not escape; you'd need some compiler magic for that.
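
For comparison, the first idea above (reads that fall back to defaults outside a task) can be sketched with the API that later shipped in Swift 5.5, which is an assumption relative to this pitch: there, Task.isCancelled is a plain static property, and a @TaskLocal value returns its default when no binding is in effect. Trace, currentTraceName, and traceName(whileBoundTo:) are hypothetical names.

```swift
// Hypothetical namespace holding one task-local value with a default.
enum Trace {
    @TaskLocal static var name: String = "none"
}

// A plain synchronous function: no `async`, no explicit task parameter.
func currentTraceName() -> String {
    Trace.name
}

// Also synchronous: binds the value only for the duration of the closure.
func traceName(whileBoundTo name: String) -> String {
    Trace.$name.withValue(name) {
        currentTraceName()
    }
}
```

This avoids the hidden-parameter question by looking the context up at runtime instead of passing it through the function signature.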

xwu · January 3, 2021, 7:41pm

Well put. A distinction, though, is that the latter doesn't currently have a spelling in the language, whereas no async function is guaranteed to have at least one suspension point, so async without suspension points is just plain async. This allows the former to be presented to users as simply a refinement of an existing feature, whereas the latter requires, as you say, reference to a task context explicitly.

jayton · January 3, 2021, 9:26pm

If you believe that, your preferred solution should be to have no annotation at all. The point of having an annotation is that “async with guaranteed zero suspension points” is not just plain async.

xwu · January 3, 2021, 9:49pm

Let me clarify: it is, by that view, to async as a square is to a rectangle. This is why I proposed a spelling such as async(nonsuspending) instead of an annotation via attribute, emphasizing that it is just async.

It is also, I hasten to add, why I mentioned earlier that a full feature should also come with a withoutActuallySuspending (in the same vein as withoutActuallyEscaping) such that such non-suspending functions which are labeled as plain async rather than as async(nonsuspending) (or whatever we decide to call this) can be used in all the same places.

michelf · January 3, 2021, 9:56pm

I see it like this:

a synchronous function is one that guarantees having no suspension points
an asynchronous function is one that may have zero or more suspension points

async with "a guarantee of no suspension points" is pretty much a contradiction; it means the function is synchronous, which is the opposite of the asynchronous that async stands for. That what remains after the two cancel each other out is a hidden task context is the least conspicuous part.

I see async functions having an implicit task context parameter as an implementation detail. Yes, it's important, but the word async doesn't really tell you about it and thus this detail will likely fly under the radar of most people.

The actual existing feature is the synchronous function. Calling async/await an existing feature is stretching things a bit. async is only in pre-release on the main branch, hidden behind a flag. In practice the first phase of the concurrency rollout will just be one big feature, not a chain of incremental proposals.

And this supposedly "existing" feature isn't even entirely frozen yet. The core team said the async/await proposal could be revised if reviews of the other concurrency proposals highlight reasons to do so.

xwu · January 3, 2021, 10:01pm

I hate to repeat myself, as I presume you've already followed the preceding messages, but the point I'm making is that any implementation of async/await necessarily allows the user to spell a nonsuspending function that requires a task context as async. This cannot be banned because the compiler cannot distinguish a function that reserves the right to be suspending at some later point in its evolution (but not in its currently implemented form) and is therefore async from a function that semantically can never suspend but is nonetheless declared async.

As you put it: "an asynchronous function is one that may have zero or more suspension points." This necessarily includes functions that must have zero suspension points.

This is the sense in which I mean that a spelling already exists for this feature. In other words Task.isCancelled() is already implementable with the building blocks that we have, and it is by labeling it as async. By contrast, it is not possible to write a synchronous function in Swift--either currently or with the set of features proposed thus far--with a task context.

Therefore, particularly if we buy the argument that such functions are "esoteric," I think it's important to be able to spell the lack of suspension points in a way that's a refinement of the already-and-necessarily-possible-but-not-ideal async and not to have to invent a new attribute out of whole cloth.

Or, perhaps approaching this differently, I see the meaning of async as "a function that can only be used from an asynchronous context." That, after all, is the only guarantee. I see the suspension points, rather, as an implementation detail. In fact, by design, it is a detail of the function's implementation how many suspension points there are (if any)--something not at all knowable by the presence of async.

You are right, though, in that it simply represents two different ways of viewing "what async means." My argument is that this way of viewing it is a simpler model for the user, because it is not possible in general to reason about the number of suspension points but it is very much salient as to whether something is called from an async context or not.
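
The claim that an async function may contain zero suspension points can be shown with a small sketch; the function names are hypothetical. The body of currentAnswer has no await, yet callers must still write await, because the signature is all they can see.

```swift
// Declared async, but the body contains no `await`, so it has no
// suspension points of its own.
func currentAnswer() async -> Int {
    42
}

// The call site still requires `await`; the caller cannot tell from the
// signature whether a suspension will actually happen.
func answerPlusOne() async -> Int {
    await currentAnswer() + 1
}
```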

michelf · January 3, 2021, 10:22pm

I concur we're diverging on what's important in async: you think it's the hidden task context while I think it's the actual asynchronous execution through suspension points. Getting rid of the context thus makes no sense from your perspective. And getting rid of the suspension points makes no sense for me.

As I suggested earlier, we could sidestep this debate by having the task context become an implicit local variable like self. Synchronous functions could just reference the task context like any other variable, with some magic restrictions to prevent it from escaping. That would remove the need for "half-async" functions.

That's probably more work to implement though.

bjhomer · January 4, 2021, 1:28am

Would this hidden task variable be available to all functions? Or only async ones? It seems like there may very well be synchronous functions which still want to access task state. For example:

// How does this function know it can access a Task?
func printDebugTaskState() {
    print("Task \(Task.local(\.name)) state:")
    print("  cancelled: \(Task.isCancelled)")
    print("  itemsProcessed: \(task.local(\.finishedCount))")
}

How does this function gain access to the task? Does every function have access to an implicit task variable? If not, then there must be some annotation on the function that says "This function requires an async context."

I think many users (myself included) would assume an async function is one that is asynchronous (or may be asynchronous, depending on runtime conditions), which means that the caller must use await.

Because we want to be able to call synchronous functions that require an asynchronous context, we are requiring users to understand one of the following two situations:

1. @instantaneous func foo() async:
Async APIs are available in an async context, which is any function marked async. To actually call an async function, you may need to use await if the function could be asynchronous, as noted by the absence of @instantaneous.

Complexity: async doesn't actually mean "asynchronous", so it doesn't tell you whether you need to use await.

2. @asyncRequired func foo():
Async APIs are available in an async context, which is any function marked async or any function with the @asyncRequired annotation.

Complexity: There are two ways to declare a function that can be called asynchronously: async, and @asyncRequired.

Between these two options, the first feels more complex to me. It muddies the relationship between async and await. I'm in favor of something like @asyncRequired.

BigSur · January 4, 2021, 2:26am

I'm in favor of @task .. async or ..async(task), which means the @instantaneous async pattern.

michelf · January 4, 2021, 2:28am

bjhomer:

Would this hidden task variable be available to all functions? Or only async ones? It seems like there may very well be synchronous functions which still want to access task state. For example:
// How does this function know it can access a Task?
func printDebugTaskState() {
    print("Task \(Task.local(\.name)) state:")
    print("  cancelled: \(Task.isCancelled)")
    print("  itemsProcessed: \(task.local(\.finishedCount))")
}
How does this function gain access to the task? Does every function have access to an implicit task variable? If not, then there must be some annotation on the function that says "This function requires an async context."

My idea is that async functions would get an implicit task local variable, similar to self. Synchronous functions would not. But an async function can pass its task explicitly to a synchronous function. Your example above could become:

func printDebugTaskState(_ task: Task) {
    print("Task \(task.local(\.name)) state:")
    print("  cancelled: \(task.isCancelled)")
    print("  itemsProcessed: \(task.local(\.finishedCount))")
}

or:

extension Task {
    func printDebugTaskState() {
        print("Task \(self.local(\.name)) state:")
        print("  cancelled: \(self.isCancelled)")
        print("  itemsProcessed: \(self.local(\.finishedCount))")
    }
}

and would be called like this:

func test() async {
    printDebugTaskState(task) // calling first version
    task.printDebugTaskState() // calling second version

    if task.isCancelled { ... }
    print(task.local(\.finishedCount))
}

So in short: asynchronous functions always have a task implicitly (like methods have a self). Synchronous functions don't, but can receive a task as an explicit parameter (or as self).

The thing I'm unsure about is how to make the task unable to escape. I guess values of type Task could "magically" be disallowed from being passed to @escaping closures, or assigned to any variable that isn't a local variable, or assigned anywhere that discards static type information which could be used to. Perhaps it could be purely a runtime error instead of these complex rules (with warnings for detectable improper usage). I'm uncertain about this part actually, and it could be the downfall of this idea. (Edit: would making task not Sharable be enough to prevent it from escaping?)

But I think this design makes Task and async less mysterious. Instead of having a static Task.isCancelled that magically gets the value of the current task from who knows where, now it's conceptually contained in this task variable you get implicitly inside an async function. The Task API feels less global and more local when not using static functions. It's also natural to pass the task to a synchronous function when necessary.

xwu · January 4, 2021, 3:22am

bjhomer:

Because we want to be able to call synchronous functions that require an asynchronous context, we are requiring users to understand one of the following two situations:

1. @instantaneous func foo() async:
Async APIs are available in an async context, which is any function marked async. To actually call an async function, you may need to use await if the function could be asynchronous, as noted by the absence of @instantaneous.
Complexity: async doesn't actually mean "asynchronous", so it doesn't tell you whether you need to use await.

2. @asyncRequired func foo():
Async APIs are available in an async context, which is any function marked async or any function with the @asyncRequired annotation.
Complexity: There are two ways to declare a function that can be called asynchronously: async, and @asyncRequired.

Between these two options, the first feels more complex to me. It muddies the relationship between async and await. I'm in favor of something like @asyncRequired.

As noted above, I would propose something that's conceptually closer to (1) but spelled like neither of these. Instead, something like:

func foo() async(nonsuspending)
// or
func foo() async(nonawaiting)
// or
func foo() async(instant)

async answers the question you ask below:

... while what's in the parentheses answers the concern you name:

... and meanwhile, we don't encounter the following problem:

bjhomer · January 4, 2021, 3:36am

I’m not entirely opposed to those spellings, but it still feels confusing that a synchronous function is spelled async(_____).

xwu · January 4, 2021, 4:03pm

But is it a “synchronous” function when it requires an asynchronous context? I think of it as an asynchronous function for that reason, as discussed above.

bjhomer · January 4, 2021, 4:06pm

It doesn't feel like an asynchronous function to me when it always returns immediately with a guarantee of never suspending. It behaves exactly like any other normal (non-async) function; the only difference is that it has access to certain Task APIs which are also synchronous. Yes, it requires an asynchronous context, but only because we can't actually pass around a Task in the normal way.

xwu · January 4, 2021, 4:27pm

Well, if you could specify a task other than the current one, then calling the function would potentially require a suspension point. Either way you slice it, the asynchronous context is key to the function.

Lantua · January 4, 2021, 4:30pm

I think we can bikeshed the actual syntax later. The more important question would be whether to have task cancellation APIs wait for this @instantaneous, to use async in the meantime, or to forgo @instantaneous altogether.

This pitch is about Structured Concurrency, after all.

I'd love it if we could figure out how to expand task-local values to non-async functions, but the semantics would be quite tricky around DispatchQueue.async. Not to mention the implementability.

Perhaps we could treat sync functions as part of the main Task, but then cancellation becomes much more onerous around sync functions.

Nevin · January 4, 2021, 5:26pm

Wait, we’re still talking about isCancelled, right? Why would that require a suspension point when called from a different task?

Any code from anywhere that has a reference to a task, should be able to ask that task, “Are you currently cancelled?” and get an immediate response, synchronously.

michelf · January 4, 2021, 6:27pm

I don't think this particular naming is any good. A context is neither synchronous nor asynchronous.

Anyway, what we are talking about is the context accompanying an asynchronous function. This context allows the function to resume after a suspension point and also contains data related to the current task.

I note the async/await proposal doesn't give this context a consistent name other than "context". I've been calling it "task context" here in this thread, but that probably should only mean the part of this context referring to the current task; not the data to resume after suspension. This is also the only part of the context that needs to be passed to a synchronous function wanting to deal with the task API; data to resume after suspension is clearly not useful when you know you won't suspend.

It'd be nice if we could establish clearer terminology to discuss these things.

That's Handle, which is different from Task. They're two sides of the same coin: one is for use inside the task while the other is for external use. Both have isCancelled (although one is a function while the other is a property, one is static while the other is not).
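
Querying a task from the outside can be sketched with the API that eventually shipped in Swift 5.5, where, as far as I know, the separate Handle type was folded into Task<Success, Failure> itself; that is an assumption relative to this pitch, and cancelAndInspect is a hypothetical name. The instance-level isCancelled and cancel() are both synchronous, so no await is needed to use them.

```swift
// Hypothetical helper: starts a long-running task, then asks it from the
// outside whether it is cancelled, before and after requesting cancellation.
func cancelAndInspect() -> (before: Bool, after: Bool) {
    let handle = Task<Void, Never> {
        // Long sleep so the task is still alive while it is inspected;
        // cancellation makes the sleep throw, which is ignored here.
        _ = try? await Task.sleep(nanoseconds: 60_000_000_000)
    }
    let before = handle.isCancelled // synchronous read from outside the task
    handle.cancel()                 // synchronous, sets the cancellation flag
    return (before, handle.isCancelled)
}
```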