[Concurrency] Structured concurrency

I was thinking about something like this:

extension Task.Handle {
  func set(callback: (Value)->())
}

... Though what you said made me realize that I can just do this:

Task.runDetached {
  let value = await doSomething()
  callback(value)
}

or even this:

let dispatchGroup = ...
dispatchGroup.enter()
var value: Value?
Task.runDetached {
  value = await doSomething()
  dispatchGroup.leave()
}
dispatchGroup.wait()

which allows us to synchronously wait on the async task.
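As a rough sketch of that blocking bridge, written against the concurrency API that later shipped in Swift 5.5 (Task.detached rather than the draft's Task.runDetached): doSomething, Box, and blockingDoSomething are illustrative names only, and a semaphore stands in for the dispatch group. Blocking a thread like this is discouraged, and it can deadlock if the async work needs the very thread that is blocked.

```swift
import Dispatch

// Hypothetical async function standing in for the thread's doSomething().
func doSomething() async -> Int {
    return 42
}

// Illustrative mutable box. It is @unchecked Sendable because the
// semaphore below orders the write before the read.
final class Box<Value>: @unchecked Sendable {
    var value: Value?
}

// Synchronously waits for the async result. A sketch, not a recommendation.
func blockingDoSomething() -> Int {
    let semaphore = DispatchSemaphore(value: 0)
    let box = Box<Int>()
    Task.detached {
        box.value = await doSomething()
        semaphore.signal()
    }
    semaphore.wait()   // blocks the calling thread until the task signals
    return box.value!
}
```

The detached task runs on the shared concurrent pool, so the calling thread is free to block on the semaphore as long as doSomething() never needs that thread.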

How about just:

Task<Int>.withSubtasks { task in
    ...
}

Just to make sure we're on the same page about what a Task.Handle is...

We specifically do not want such public operations on a Task.Handle, this would conflate the consuming and producing parts of such API.

Task.Handle is a "future", and is not a "promise". It is the "read side", not the "write side", of the contract. Cancellation through the handle is a notable exception here, but cancellation is a slightly different beast than just succeed/fail. Those two "sides of the same coin" (read/write) must not be exposed on the same type, as it gets incredibly confusing who is responsible for completing it. If both APIs are available on the same type and a value of that type gets passed somewhere, the receiver cannot tell: am I expected to complete it or am I expected to wait on it?

Thus, Task.Handle is only the "read side" or a "future", but not a "promise."

The task APIs, on purpose, do not expose a "promise-like" API for normal day to day usage as they intend to push towards structured calls and such usage:

Again, takes some getting used to when coming from heavy future/promise using libraries, but it makes sure we have one primary way of doing async.

That's good, it does mean we do not have yet another shape to express these things -- it is always some task that runs and completes (and it can await inside etc). It never is "randomly complete a promise from the outside", except the UnsafeContinuation API (that's the "promise" shape API actually, but we strongly encourage not using it unless one has to, see the proposal for why it is error prone) which has unsafe in the name for a reason :slight_smile:

Long story short:

  • Task.Handle is like a "future."
    • It is the "read side" of a task, enabling waiting on results, but not completing them;
  • No explicit "promise" style API is offered for normal usage; instead, asynchronous work should be done by asynchronous functions and, if necessary, by starting new tasks
    • This is less error prone as one cannot "forget to complete" such async functions.
  • Unsafe "promise" style API is available via Task.withUnsafeContinuation as last-resort, if unable to express something using the safer patterns.
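The read-side/write-side split summarized above can be sketched against the API that later shipped in Swift 5.5. There the handle became Task<Success, Failure> with a value property, and a checked variant of the continuation API (withCheckedContinuation) sits alongside the unsafe one. The functions compute, startCompute, legacyFetch, and fetch are hypothetical names used only for illustration.

```swift
import Dispatch

// Hypothetical async work; callers only ever see its result.
func compute() async -> Int {
    return 6 * 7
}

// Read side: the caller gets a handle it can await, but cannot complete.
func startCompute() -> Task<Int, Never> {
    return Task.detached { await compute() }
}

// Hypothetical completion-handler API standing in for legacy code.
func legacyFetch(completion: @escaping @Sendable (Int) -> Void) {
    DispatchQueue.global().async { completion(7) }
}

// Write side, confined to one place: the continuation must be resumed
// exactly once, which is why this shape is kept as a last resort.
func fetch() async -> Int {
    return await withCheckedContinuation { continuation in
        legacyFetch { value in
            continuation.resume(returning: value)
        }
    }
}
```

Code holding the Task returned by startCompute() can only wait on it or cancel it; the only "promise-like" completion happens inside fetch(), next to the legacy call it wraps.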

We currently renamed those APIs to: Task.withGroup { group in ... } and the type being Task.Group. The team is comfortable with the pun that a "task group" is formed to perform some work and does it "together", as those child tasks do in a task group.

Please refer to the full proposal text for the latest wording which reflects the code on the main branch.

Sounds much better

Is it technically possible to have try handle.getByBlocking() as an escape hatch out of an async context, allowing us to wrap async functions inside synchronous funcs?

Thanks,
Chéyo

You can do that as is:

Whether we want to make it any easier for something that's very much discouraged, :woman_shrugging:.

await can only appear in an async context I believe.

I'm not sure what you mean. await is in runDetached argument block, which is async, and runDetached itself synchronously returns.

First of all: I really like this group of proposals, especially the simplicity of the async let syntax. Thank you to everyone who has been involved.

I have a question regarding this example:

To me, this contradicts the following sentence from the proposal:

A task group guarantees that it will await for all tasks that were added to it before it returns.

To me, this sounds like your example would wait for both tasks before returning, which is not what you want. You would have to call nursery.cancelAll() (or group.cancelAll() with the new naming) at the end to get the behavior you want.

I think this behavior is pretty unintuitive and can lead to many implementation errors, as demonstrated by @anandabits's example. I think cancelling all tasks that have not been awaited would be better, but not optimal. Another solution would be to await all tasks by default but switch to cancelling if group.next() has been called at least once. Or could the compiler somehow enforce all tasks to be explicitly awaited or cancelled? Maybe someone has a better idea.

Right yeah... we're in the middle of figuring out what the default behavior should be; some of the writeup had assumed cancelling while some of it did waiting on the returning edge.

Other implementations of this concept opt for awaiting all outstanding work, and cancelling on the group failing which is what we're closing in on as the default; Yes the above example would need a cancelAll() then.

For every example where cancelling outstanding work makes sense, there are other examples that make sense with implicit waiting. I'm kind of hoping we leave this up as a configuration parameter when starting a task group (or maybe Task.Group.awaitAll { ... } or Task.Group.cancelPending { ... }?).

(Currently the proposal is assuming waiting).

Exact shapes not yet decided and we'll only get a commitment here once we've had a chance to implement and use the APIs a little bit more, hopefully very soon.

Customizing the global thread pool is quite interesting actually -- specifically, reusing the global thread pool that one already has in an existing large system. I see you mentioned this before:

Do you mean modifying the code in the Swift runtime or the standard library? If so, one would have to build & use a custom toolchain, which is quite unfortunate. It would push this use case into unsupported territory, make it non-testable in the Swift CI, and provide no guarantees against accidentally or intentionally creating stronger ties with Dispatch in future that would break it.

Allowing the global pool to be replaced at load time might be interesting, especially for testing. We can explore that. I don’t think that making your own runDetached, TaskGroup, and so on would be a sensible way to try to replace the pool — that wouldn’t catch any implicit uses, such as when resuming tasks (if they don’t need to be resumed on a specific actor), or any uses in libraries or the runtime. If there’s a good reason to have two global pools split in such an ad hoc way, I can’t see it.

The default actor scheduler really wants to be integrated with the system scheduler, though. For example, you want to be able to signal a thread that it’s been processing the same actor for a long time, and it’s time to let some other actor use the thread. We can make that customizable, too — I think you’d need to, if you wanted a testable scheduler — but it is inevitable that those ties will grow over time.

My colleagues at Google (most notably @gribozavr, @saeta, and @pschuh) and I have these questions and suggestions:


Motivation -- Errors thrown from initializers of async variables

func chopVegetables() async throws -> [Vegetable] { ... }
// ...
  async let veggies = try chopVegetables()
// ...
  ... await try veggies ...

  • If chopVegetables() throws an error before the first suspension, does the error get propagated from the initializer, or from await try veggies?
  • If errors only get propagated from await try veggies, we probably shouldn't have a try on the call to chopVegetables(), as it won't be annotating a possible control flow branch.
  • Is it better to shift all errors to be propagated from await try veggies? Otherwise, one call to a throwing async function can create two error propagation points.
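For comparison, here is a small sketch of how this behaves in the async let that later shipped (Swift 5.5), where the declaration line takes no try and the keyword order is try await. The error type, the fail parameter, and describeDinner are hypothetical additions for illustration; chopVegetables here is a stand-in, not the proposal's function.

```swift
struct KitchenError: Error {}

// Hypothetical stand-in for the proposal's chopVegetables().
func chopVegetables(fail: Bool) async throws -> [String] {
    if fail { throw KitchenError() }   // thrown before any suspension
    return ["carrot", "onion"]
}

func describeDinner(fail: Bool) async -> String {
    // Starts the child task; nothing is thrown on this line.
    async let veggies = chopVegetables(fail: fail)
    do {
        // The error, if any, surfaces here, at the point of use.
        let chopped = try await veggies
        return "chopped \(chopped.count) vegetables"
    } catch {
        return "failed at the await site"
    }
}
```

Even though the error is thrown before the callee's first suspension, the caller only observes it where veggies is awaited, so there is a single propagation point.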

Child tasks

  • It would make this section easier for me to understand if you introduced async let as a “non-escaping future” at the beginning.
  • “a child task does not persist beyond the scope in which is was created”—did we think about the alternative where the child task could persist beyond the scope in which it is created, but not the function? It seems like some of the uses for task groups might be subsumed if we did that.

Child tasks with Task Groups

  • The problem being solved is important, but—acknowledging my weak technical argument here—the design of the library components to support it feels clumsy and inelegant to me. I'm suspicious every time I see a “with…” function these days. It seems like there are many other possible choices for APIs to support this functionality. For example, why not just create a local TaskGroup instance, given that the type is already magically noncopyable?
  • This appears to be a higher-level API on top of detached tasks. I would rather see us start with the lower-level components and have a period of experimentation to build library components before they become baked.
  • Presumably(?) it’s possible to interleave calls to add and next rather than adding all the tasks at the beginning and waiting for all of them at the end? Either way, it would be good to be explicit about that possibility.
  • Can I create an async let that adds something to the nursery, or calls next() on the nursery?

Detailed design — Tasks

  • “A task runs one function at a time”— this is potentially confusing because of function calls. Isn’t the outer function still running? I suggest leaving out this particular phrase, or maybe substitute “call stack” for “function.”

Detailed design — Child tasks

  • “a function that creates a child task must wait for it to end before returning”— this is less restrictive than the scope restriction of async let. Is that intentional?
  • “the features of this design that apply to an entire task tree, such as cancellation, … don’t automatically propagate upwards in the task hierarchy,”—I can't quite imagine what this would mean for a feature such as cancellation. Up/down seems like an orthogonal consideration to the lifetime of child tasks, but the language used in this section makes it sound like they are connected. Can you clear that up?

Detailed design — Executors

  • “Swift provides a default executor implementation, but both actor classes and global actors can suppress this and provide their own implementation”— This leaves a lot to the imagination. Can you tell us what the executor API looks like? In particular, we want to evaluate whether the API is consistent and compatible with our internal concurrency primitives.

Detailed design — Task priorities

  • “Detached tasks do not inherit priority (or any other information) because they semantically do not have a parent task”— That doesn't seem like enough of a reason. They could inherit information from the task that launches them, for example. I’m less interested in what behavior is implied by parent/child relationships, and more in having the optimal behavior, and understanding why it’s optimal. Any thoughts?
  • “If a task is running on behalf of an actor”—My understanding of the model is that tasks may cross between actors, so what does it mean for a task to be running "on behalf of" an actor? Does that have something to do with where the task started, or does it just mean "on" an actor?
  • “This does not affect child tasks or Task.currentPriority(); it is a property of the thread running the task, not the task itself”—Does that mean it’s not visible to the programmer (except possibly through platform-specific system APIs)? If so, perhaps say so.

Detailed design — Cancellation

  • “A task can be cancelled asynchronously by any context that has a reference to a task”: this left me with the strong impression that Task was going to be a class. That caused a lot of confusion for me, because there’s no declaration. Eventually I think(?) I divined that it’s just a pseudo-namespace enum, and I think(?) you mean Task.Handle, not reference.
  • “A flag is set in the task which marks it as having been cancelled; once this flag is set, it is never cleared. Operations running synchronously as part of the task can check this flag and are conventionally expected to throw a CancellationError.”—it seems like this check-and-throw idiom should be encapsulated in a library facility.
  • “Any cancellation handlers which have been registered on the task are immediately run”—AFAICT this is the only mention of cancellation handlers in the proposals. Seems like they need to be described in detail or removed?
  • “async let onion = try chop(Onion())”: it's a little odd that the try doesn't go along with the async syntactically, and annoying that try must be used both here and at the point where the value of onion is first(?) demanded. The declaration site never throws as far as I know.
  • Do we need to try and await every time we mention onion after this declaration, or just at the first use, or…?
  • “guard await !Task.isCancelled() else”: why is checking a single local bit annotated with await?

Detailed design — Child tasks with async let

  • “The initializer of the async let is considered to be enclosed by an implicit await expression.”—What does this sentence mean? An obvious (but seemingly incorrect) interpretation is that the code is compiled as if the following was written instead:

    async let result = await try fetchHTTPContent(of: url)
    

    But that means a completely different thing -- the caller is suspended at await until the callee returns. So it can't mean that. What does it mean?

  • “if any of the values initialized by this closure throws, all other left-hand side to-be-initialized variables must also be considered as it they had thrown that error.”— It’s not obvious that these are the implied semantics, since the execution is async and no throwing happens in this function until the variables are awaited. Can you offer some rationale?

  • “error: did not await 'result' along this path.” Isn't making this a hard error as opposed to a warning somewhat inconsistent with the rest of the language?

  • “The requirement to await a variable from each async let along all (non-throwing) paths ensures that child tasks aren't being created and implicitly cancelled during the normal course of execution”: This seems like an overstatement. IIUC it only ensures that one such task from each let isn’t handled that way.

Detailed design — Child tasks with Nurseries

  • It seems like the proposal is in an inconsistent state w.r.t. the choice of “nursery” or “task group”?

  • “/// Individual tasks throwing results in their corresponding try group.next() call throwing,”: this sentence is confusing; can you clear up what it’s supposed to say?

  • “and the task group enforces awaiting for all tasks before it returns”—I think(?) “it” should be replaced with “withGroup,” since the task group appears to be a value.

  • Is there no way to decide we’ve got the result we want and cancel the remaining tasks? Think searching for any occurrence of a word in a bunch of documents.

  • “@unmoveable”: Making a special-case pinned type for this purpose seems like overkill. There are certainly quite a few places in the standard library where we’d want a similar thing for safety reasons. Is making this thing a copyable type (e.g. a reference type) going to lead to serious abuse?

  • “Task.Group.add/addWithHandle”—It seems unnecessary to provide two APIs that do the same thing, but differ in return value (void vs. something). Why not only provide add(), return a handle from it, and mark it with @discardableResult?

  • About the naming of "Task.Group.addWithHandle". "With" in function names typically refers to parameters. (Even though the "with" preposition is considered semantically weak and is discouraged.) I think using "with" to refer to the return value is confusing. If we must have two separate functions, add() and addWithHandle(), WDYT about addReturningHandle()?

Detailed Design -- Child Tasks with Nurseries -- Task.Group.add is async for back pressure, but is it effective?

Task.Group.add is marked async. ktoso explained "So what this code is saying is that “adding can suspend”, e.g. because a nursery could be configured to “never have more than 1,000” outstanding tasks, or similar." ([Concurrency] Structured concurrency - #96 by ktoso)

Naive code using task groups (including code presented in the proposal!) will often consist of two loops: the first loop is adding tasks with add(), and the second loop is consuming results with next(). The second loop does not start consuming results until the first loop is done adding child tasks.

Say the code is attempting to add a million tasks to a group. It would be suspended on the add() call, but would not yet reach the next() call to actually consume the results. Therefore, while suspension in add() will help to control concurrency, it won't help with peak memory usage. The application would need to accumulate a million task results in memory before it starts consuming them.

Maybe a better approach for back pressure is to let the caller immediately know that they can't add more tasks now, and instead should focus on consuming the results? If the caller can't yet process partial results it should explicitly declare a local array and store partial results there, making the peak memory consumption explicit in the code.
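One way to picture the interleaving of add and next is the sketch below, written against the shipped Swift 5.5 API (withTaskGroup and addTask, which is not async there), not the draft's Task.Group.add. The function name, the squaring workload, and the maxInFlight parameter are hypothetical. The caller drains one result before adding another child once the limit is reached, so at most maxInFlight tasks are outstanding at any time.

```swift
// Sums the squares of `inputs`, keeping at most `maxInFlight`
// child tasks outstanding. Illustrative workload only.
func sumOfSquares(_ inputs: [Int], maxInFlight: Int) async -> Int {
    return await withTaskGroup(of: Int.self, returning: Int.self) { group in
        var total = 0
        var inFlight = 0
        for input in inputs {
            // At the limit: consume one result before adding more work.
            if inFlight >= maxInFlight, let result = await group.next() {
                total += result
                inFlight -= 1
            }
            group.addTask { input * input }
            inFlight += 1
        }
        // Drain whatever is still running.
        for await result in group {
            total += result
        }
        return total
    }
}
```

Because results are folded into total as they arrive, peak memory is bounded by the in-flight limit instead of by the number of inputs.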

Detailed Design -- Task groups: throwing and cancellation

  • The section starts with "Worth pointing out here is that adding a task to a task group could fail because the task group could have been cancelled when we were about to add more tasks to it." But the example that follows does not demonstrate that case. All task.add calls finish by the time (2) will allow an error to escape into the group. So it is still not clear how to construct a piece of code where it is possible to call task.add after unwinding through the group has started.

Detailed design — Task Groups: Parent task cancellation

  • “So far we did not yet discuss the cancellation of task groups”—? the previous section discusses them
  • “ This task is the parent task of the task group, and as such the cancelation will be propagated to it”—The antecedent is unclear here. Presumably “it” refers to the task group?

Detailed design — Detached Tasks

  • Declarations of proposed APIs should have doc comments. Please tell us what a Task.Handle is and what it does.
  • The type parameter “Success” to Task.Handle seems poorly named. Seems to me it’s the result type of the task; how about “Result”?
  • Why is Task.Handle a nested type? Is it important to emphasize that we have a handle and not the task itself? If yes, what is the difference? If not, can we use Task instead to represent a reference to a task? What are the non-static members of Task? (the proposal does not show any)
  • Is there a way to chain multiple tasks together to avoid blocking on a call to “get”?

Detailed design — Low-level code and integrating with legacy APIs with UnsafeContinuation

  • What do these functions do? It’s truly not obvious. Doc comments please.
  • The “init(...)” is mysterious until you notice “private,” which took me several readings. Please explain this stuff better.
  • “the resume function must be called exactly at the end of the operation function's execution”—It’s hard to understand what this actually means. Are we supposed to use defer to call it?
  • Given the vagueness of everything that comes before in this section, a “purposefully convoluted” example is pretty hard on the reader. Can we maybe start with something really simple.
  • Is this API ready for prime-time? Seems a little sketchy still.

Detailed design — Voluntary suspension

  • “The function does not check for cancellation automatically, so if one wanted to check for exceeding a deadline this would have be done manually before sleeping the task.”—wouldn’t one want to check after sleep ends as well? What’s the motivation for not including cancellation in this API?

I think the motivation for separating Task from Task.Handle is that the latter is generic. The separate types allow for both Task.isCancelled() and Task.Handle.get().

My impression after reading everything was that Task is probably supposed to be a pseudo-submodule (an enum with no cases). But that's just a guess; it really would be better if the proposals were explicit about this.

I would have assumed the former, but...

... this seems reasonable to me. It's easier to reason about if the errors are always propagated at the await try site.

The model where a child task doesn't persist beyond the scope in which it is created provides useful structure that helps one reason about programs. I think you're going to need to bring some very specific, compelling examples if you want to change that.

This might be nicely ergonomic. I think the Task APIs in general need a lot more refinement before they are ready.

Detached tasks aren't child tasks: they aren't finished when you exit your current scope, don't inherit the priority of the current task, and their handles can be returned outside of the current scope for future query. They need a separate API.

Yes, you can, and we should show it. A few more examples involving task groups would be helpful here.

No. Do you have a specific use case in mind?

Yes. It's tied to the lifetime of the task group.

If you cancel a task, it cancels all of that task's child tasks (and the child tasks of those tasks, etc.), "down" the task tree. It does not cancel any parent tasks "up" the tree, nor any sibling tasks.
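A minimal sketch of that downward propagation, using the names that later shipped in Swift 5.5 (Task.detached, withTaskGroup, Task.sleep(nanoseconds:)); childObservedCancellation is a hypothetical helper written only for this illustration.

```swift
// Returns true if the child task saw the cancellation of its parent.
func childObservedCancellation() async -> Bool {
    let parent: Task<Bool, Never> = Task.detached {
        await withTaskGroup(of: Bool.self, returning: Bool.self) { group in
            group.addTask {
                do {
                    // Long sleep; Task.sleep throws once the task is cancelled.
                    try await Task.sleep(nanoseconds: 5_000_000_000)
                    return false
                } catch {
                    return true
                }
            }
            return await group.next() ?? false
        }
    }
    parent.cancel()   // cancels the parent and, through the group, the child
    return await parent.value
}
```

Only the parent is cancelled explicitly. The child never receives a direct cancel call, yet its sleep is interrupted; the function that called parent.cancel() is itself unaffected.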

Detached tasks can be launched from synchronous code, which doesn't have a task to inherit from. If you are in asynchronous code and want to launch a task with the same priority as the current one, go for it: you can get the current task's priority with Task.currentPriority().

It means "on".

Task.Handle is the way you reference a task.

It is: Task.checkCancellation.
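A short sketch of the check-and-throw idiom using Task.checkCancellation() as it later shipped in Swift 5.5, where it is a static throwing function that throws CancellationError when the current task is cancelled. The summing loop and the name sumChecked are hypothetical.

```swift
// Adds up `values`, checking for cancellation before each step.
func sumChecked(_ values: [Int]) async throws -> Int {
    var total = 0
    for value in values {
        // Throws CancellationError if the current task has been cancelled.
        try Task.checkCancellation()
        total += value
    }
    return total
}
```

Callers see cancellation as an ordinary thrown error, so the usual try/catch and propagation rules apply with no extra flag checking at each call site.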

Yes, these need detail.

This is the same question you had above, and I agree: if we don't allow the declaration to throw at all, the try shouldn't be required there.

Every time.

Task.isCancelled() is async, because it can only be called from asynchronous code, and therefore it needs await. We've considered whether we should add some kind of @instantaneous attribute for functions like this that need to be async but will never suspend (something that can be checked in the body), but it's a small syntactic optimization so we haven't put it into any proposal.

It means you don't have to write the await if you call asynchronous functions in the initializer. In other words, the same thing you're suggesting for try.

The initializer expression is what is evaluated, and that can throw (in this case). Destructuring happens after, so if you want to use any of the variables, you need to await try on them.

I think you're going to need to explain that question a bit more. The intent here is to prevent mistakes where the user fires off a bunch of concurrent work that isn't needed. Such work would be better off scoped more narrowly.

See the explanation above. This fits the implementation model.

It used "nursery", we're moving to task group. It is entirely likely that we missed a few cases in a mass-rename.

It's this API in the document:

    /// Cancel all the remaining tasks in the task group.
    /// Any results, including errors thrown, are discarded.
    public mutating func cancelAll() { ... } 
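The "search a bunch of documents and stop at the first hit" case could look roughly like the sketch below, written against the shipped Swift 5.5 API (withTaskGroup, addTask, cancelAll). The function name and the whitespace-based word matching are hypothetical simplifications.

```swift
// Returns true as soon as any document contains `word` as a
// space-separated token; remaining child tasks are cancelled.
func anyDocumentContains(_ word: String, in documents: [String]) async -> Bool {
    return await withTaskGroup(of: Bool.self, returning: Bool.self) { group in
        for document in documents {
            group.addTask {
                document.split(separator: " ").contains { $0 == word }
            }
        }
        for await found in group {
            if found {
                group.cancelAll()   // we have our answer; stop the rest
                return true
            }
        }
        return false
    }
}
```

Cancellation is cooperative, so children that are already running only stop early if they check for it; the group still waits for them before returning.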

Probably not more prone to abuse than, say, withUnsafeBytes. We'll remove it; it's noise at this point in the process and can be added more generally any time later.

Yeah, seems reasonable to me.

I think we should just have add ;)

Maybe? I think the concern is less about code flooding the local nursery (which would be easy to debug/fix) and more about the overall schedule preventing new tasks from getting scheduled so that other work could complete first.

As noted previously, the Task API needs a lot of design still. I don't really want to do it all in one giant reply to a post with ~50 questions in it.

It's not blocking, it's suspending. Chain tasks together using await and get.
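As a rough illustration of chaining, here is a sketch using the handle type that later shipped in Swift 5.5, where get() became the value property on Task. The function startPipeline and the numbers are hypothetical.

```swift
// Hypothetical two-stage pipeline: the second task awaits the first.
func startPipeline() -> Task<Int, Never> {
    let first: Task<Int, Never> = Task.detached { 20 }
    return Task.detached {
        // Suspends (does not block a thread) until `first` finishes.
        let base = await first.value
        return base + 22
    }
}
```

The second task simply awaits the first one's result, so no thread sits idle while the first stage runs.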

These are very, very low-level implementation details for dealing with completion-handler code. Yes, they'll get better explanations in time.

... sure ...

And, what, throw CancellationError? We could, or we could provide a combined API, but that doesn't seem fundamental to sleep() and some clients might want the notions separated.

Doug

Reading through this proposal makes me very excited for async/await in Swift. I think it really addresses some of the shortcomings I've seen in JS. The approach to cancellation especially looks really excellent. In JS, there's no good way to make an async function cancellable, which is especially problematic when web requests start to back up and time out. On the other hand, I've grown quite frustrated with Combine, which can cancel at any point.

The one part of this that doesn't really jell with me is the nursery concept (even the name seems really odd). The syntax is so odd that I can't see myself using it. Although perhaps it will just be a building block for higher-level APIs like asyncMap.

Thanks for the feedback!

You may want to prefer reading the proposal https://github.com/DougGregor/swift-evolution/blob/structured-concurrency/proposals/nnnn-structured-concurrency.md which has been updated with minor adjustments since the original post. Most notably, nurseries are "Task Groups".

(Any chance to make the first post just a link to the proposal to avoid people reading old wordings? @Douglas_Gregor @John_McCall? We should do a "v2.0" soon anyway, but until then lessening confusion might be nice...).

And yes, they're the most powerful yet low level of all the offered tools. It is quite likely that people won't interact with them much directly but instead rely on APIs which use them internally for their implementations.

I envisioned a detached task API that allows a priority (including the current task priority) to be explicitly specified, and thought of the proposed TaskGroup as a class whose instance holds an array of child detached tasks, accessible only to the TaskGroup instance, and whose deinit would cancel the child tasks. That would make detached tasks a fundamental component that can be ratified by the community and rolled out, giving us time to develop various takes on the task group idea without worrying about having the perfect API right out of the gate.

Edit: I omitted that one needs some way to await the next task completion in a collection of detached tasks, to implement the next() method.
