SE-0304 (3rd review): Structured Concurrency

Chris_Lattner3 · June 2, 2021, 4:55am

This is an "obviously critical" proposal for the Swift Concurrency direction, but I feel that this version of the proposal is "one step forward, one step back" vs the previous proposal, which I was pretty enthusiastic about. I have put a lot of time and energy into this and neighboring proposals.

While I agree that these are related, the async let proposal is a syntactic sugar proposal for a narrow part of what this proposal covers. It still has serious issues that need to gel, so I recommend that we get the foundation right without over indexing on a sugar proposal that needs more iteration. Thank you for mentioning this though, because one of the threads of discussion from that should absolutely be pulled in here (below).

Here are detailed thoughts below:

The move to embracing unstructured tasks and the support for creating tasks from synchronous code are both really great. That said, I'm concerned with this direction:

let dinnerHandle = Task {
  try await makeDinner()
}
let dinner = try await dinnerHandle.value

This is a subtle but really problematic conflation of two different ideas: a task is an independent process/thread/task that does some computation. While we often specify these as functions and functions have a return value, these often yield many values during their computation and have side effects. This is why we have things like AsyncSequence in proposal, while generators are a thing in other languages, etc.

The problem with this API direction is that it is incorrectly conflating Tasks with Futures by using Task as a standin for a future/asynchronously-completed value. Outside the simple cases, Tasks can produce streams of values, and can return multiple asynchronously completed values as independent futures. This becomes important when you start composing async tasks out of multiple other async tasks which may be detached - and detached tasks emerge extremely quickly when you branch out to IPC and RPC situations.
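To make the "streams of values" point concrete, here is a minimal sketch of one unit of concurrent work that yields several values over time rather than a single result. It assumes the AsyncStream type from the standard library; makeCourses and collectCourses are hypothetical names, not anything from the proposal.

```swift
// Hypothetical producer: a single piece of concurrent work that yields
// several values over time instead of one final result.
func makeCourses() -> AsyncStream<String> {
    AsyncStream { continuation in
        let producer = Task {
            for course in ["soup", "main", "dessert"] {
                continuation.yield(course)
            }
            continuation.finish()
        }
        // Stop producing if the consumer stops listening.
        continuation.onTermination = { _ in producer.cancel() }
    }
}

// Consumer: iterates the values as they arrive and collects them.
func collectCourses() async -> [String] {
    var served: [String] = []
    for await course in makeCourses() {
        served.append(course)
    }
    return served
}
```

A single Success value on the task handle has no natural place for the intermediate values here, which is the gap the paragraph above is describing.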

Furthermore, Actors need to interact with all the same sort of functionality, so it feels that this is a feature best modeled as a new library feature that is orthogonal to structured concurrency and actors, not something that is "part of" the structured concurrency proposal.

The only rationale I see for this is in the changelog, which says:

collapse Task.Handle<Success, Failure> into Task<Success, Failure>. This is the most-used type in the Task API and should have the shortest name.

To be clear, I super endorse giving Task.Handle a better name. I am arguing that it shouldn't be conflated with Task. I am suggesting that we should introduce a new top-level word like Future, since this concept will crosscut structured concurrency and actors. This issue was also raised on the "async let" proposal thread.

The biggest change in this proposal is moving from the unifying word spawn (an active verb) to the word async (an adjective) when creating new tasks. While the /semantics/ of this operation are good (and I think pretty well nailed down by this point) this /naming/ move is a big step backward, for several reasons. The most important of which is:

Async isn't a technically correct word for this operation. The fundamental concept of an async {} child computation or a group.async {} child computation is that it happens independently and typically in parallel with the current task. However, the word async in the Swift language means "potentially suspends". It does NOT mean "happens concurrently". Conflating these two is a huge problem to me, and I think this will make it much more difficult to teach and learn Swift concurrency.

This issue is also raised in the async let thread discussion, observing that async is our second effect and that we should learn from precedent of our first effect (throws):

There are strong reasons why error handling has multiple "words in the lexicon": throws for the effect, try for the marker, Result for the "handle" type when erasing to an uneffectful function, and do/catch when introducing a new catch-processing region. I think that all these things are substantially different and are worth different "words" to clarify them.
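As a reminder of how those four "words" divide the work for the first effect, here is a small sketch. KitchenError, chop, chopResult and describe are made-up names for illustration only.

```swift
enum KitchenError: Error { case outOfStock }

// `throws` is the effect: it appears in the function's type.
func chop(_ item: String) throws -> String {
    if item.isEmpty { throw KitchenError.outOfStock }
    return "chopped \(item)"
}

// `Result` is the "handle" type: it erases the effect so the outcome can be
// returned from a function that is not itself marked `throws`.
func chopResult(_ item: String) -> Result<String, Error> {
    Result { try chop(item) }
}

// `do`/`catch` introduces the region in which errors are handled.
func describe(_ item: String) -> String {
    do {
        return try chop(item)   // `try` marks the call that may throw
    } catch {
        return "failed: \(error)"
    }
}
```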

In the case of Structured Concurrency, the former version of this proposal had a stronger design: it used the word async for the effect, it used await for the marker, it used "TBD" for the future abstraction (this is the juicy center of the async let proposal that we haven't gotten to yet), and it used spawn as the equivalent of do/catch that introduces a new independent concurrent region.
The group.async {} and top level async {} operations create a new Task and start it executing in parallel. However, the word async is an adjective, not an active/imperative verb. This directly contradicts the guidance in the published Swift API Design Guidelines, which say we should use an imperative verb here.
The rename creates weird APIs that don't make sense: if you aren't a "Swift Concurrency Expert", what would you expect asyncUnlessCancelled to do?
This proposal fractures the "attached" and "detached" world. Where it proposes the spelling async {..} for attached tasks, it proposes Task.detached {..} for detached tasks. We want people to use attached tasks where possible, but we shouldn't fragment the API this way. If we go with the term spawn {..} then the natural term is spawnDetached {..}, which would pull these things together into a unifying framework, make the difference clear, and slightly nudge programmers towards attached tasks.
Version 3 of the proposal continues to use this verb pervasively to explain itself, e.g.:

group.async spawns a child task in the task group to execute the given operation function concurrently.

As well as sections like "Spawning TaskGroup child tasks". If people will continue to think about this operation as "spawning" something, then we should just embrace that, particularly without rationale for a change.
Beyond the problems with renaming this operation to async there is no motivation for doing so - spawn was discussed extensively in revision #1 of the proposal and we agreed that it had a lot of prior art and is an active verb that successfully conveys "creating a new thing" concisely.

To recap: the move to the non-verb "async" for this operation is a big step back.
We don't need to stick with the word spawn, but if there is a problem with it, it would be better to air that problem so we can solve it. Moving to overloading an adjective effect modifier isn't a step forward.

The proposal suggests different spellings for the TaskGroup case vs the global case: group.async {...} vs Task {...}, which both inherit metadata. The family also includes Task.detached {} and asyncUnlessCancelled {}.

I think it would be much more uniform to go with spawn {}, group.spawn { }, spawnDetached {}, and spawnUnlessCancelled {} as discussed in the previous round of the proposal.
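For readers who want to see the four operations side by side, here is a sketch that compiles on current toolchains. Note the assumption: it uses the spellings those toolchains accept, group.addTask and group.addTaskUnlessCancelled, in place of the group.async and asyncUnlessCancelled names under review in this thread. square and taskFamily are hypothetical names.

```swift
// Hypothetical async work used by every variant below.
func square(_ x: Int) async -> Int { x * x }

func taskFamily() async -> (grouped: [Int], unstructured: Int, detached: Int) {
    // Child tasks scoped to a task group.
    let grouped = await withTaskGroup(of: Int.self) { group -> [Int] in
        group.addTask { await square(2) }
        // Adds the child only if the group has not been cancelled.
        _ = group.addTaskUnlessCancelled { await square(3) }
        var results: [Int] = []
        for await value in group { results.append(value) }
        return results.sorted()   // completion order is not guaranteed
    }
    // Unstructured task that inherits priority and task-local values.
    let unstructured = await Task { await square(4) }.value
    // Detached task that inherits nothing from its creator.
    let detached = await Task.detached { await square(5) }.value
    return (grouped, unstructured, detached)
}
```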

Per the above point, this modeling:

struct Task<Success: Sendable, Failure: Error>

Seems wrong. In generality, a task can return multiple different results that have different lifetimes (consider a Task talking to a name server and a computation server independently) and tying their lifetime together seems unnecessary and limiting. It seems better to decouple "spawning" the task from "constructing the object", which allows providing more expressive APIs without sacrificing ease of use.

The move to change the withTaskCancellationHandler is a great move. Changing the onCancel member to be second will lead to more consistent and fluent APIs. A+
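A short sketch of what the reordered call looks like, with the operation first and onCancel as the trailing second closure. cancellableWork and cancelAndWait are hypothetical names; the polling loop is only there to keep the example small.

```swift
// The operation closure comes first; `onCancel` reads as a trailing label.
func cancellableWork() async -> String {
    await withTaskCancellationHandler {
        // Cooperative loop: keep yielding until cancellation is observed.
        while !Task.isCancelled {
            await Task.yield()
        }
        return "stopped after cancellation"
    } onCancel: {
        // Runs when the task is cancelled, possibly on another thread.
        print("cancel requested")
    }
}

// Starts the work in an unstructured task, cancels it, and waits for it.
func cancelAndWait() async -> String {
    let work = Task { await cancellableWork() }
    work.cancel()
    return await work.value
}
```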

Agreeing with the discussion upthread, the design of the sleep API seems like it would benefit from further discussion. It crosscuts actors, and it seems like it could be split out to a subsequent library discussion. This proposal would be easier to read if it were focused on the mechanics of spawning and interoperating with tasks, independent of the values those tasks create (AsyncSequence, futures, etc) and the things they may want to do (sleep, open files, etc).

-Chris

jayton · June 2, 2021, 8:26am

The proposal doesn’t include async {}.

Ben_Cohen · June 2, 2021, 2:23pm

Right, spawn { }, async { } and Task { } are all different spellings of the same thing. The current proposal has settled on Task { }.

Chris_Lattner3 · June 2, 2021, 7:51pm

Thanks for pointing that out, I'll edit the comment above!

So it has group.async {..} for the TaskGroup scoped case, and Task {} for the global case? That isn't very consistent. This is also inconsistent with asyncUnlessCancelled and Task.detached {}.

Thank you for the clarification though, incorporated in the comment above!

-Chris

xAlien95 · June 4, 2021, 1:21am

I remember someone already mentioned that having "child" may not be appropriate (due to children in the common sense outliving their parents in general). Could the term subtask be considered instead? The sub/super relation is already present in the language. From the mathematical standpoint alone, it is reminiscent of sub/supersets, which is appropriate in this context: a subset cannot overrun its superset, just as a subtask cannot overrun its supertask.
It's also apt from the common parlance point of view: if you mark a task consisting of various subtasks as completed, that means that you've generally completed/handled all said subtasks.
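The "cannot overrun" property can be seen in a small sketch: the group body below never waits for its children explicitly, yet they have all finished by the time withTaskGroup returns. Counter and runSubtasks are hypothetical names, and addTask is the spelling accepted by current toolchains.

```swift
// Counts how many subtasks have finished.
actor Counter {
    private(set) var finished = 0
    func markFinished() { finished += 1 }
}

func runSubtasks() async -> Int {
    let counter = Counter()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<3 {
            group.addTask {
                await Task.yield()            // give up the thread once
                await counter.markFinished()
            }
        }
        // No explicit wait here: the group does not return until every
        // child task it contains has completed.
    }
    return await counter.finished
}
```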

ktoso · June 4, 2021, 1:24am

"Child", "parent" (and sometimes "leaf") nodes are incredibly common vocabulary in threading/concurrency concepts (processes, actors), and also just plain old tree data-structured which is exactly what is modeled by these here. I really don't think revisiting the names of child task and parent task is necessary. They are vocabulary only and not API per se, and that's IMHO totally fine.

xwu · June 4, 2021, 4:25am

Overall, I find this iteration of the proposal to be the best yet. The problem is certainly important and the API has evolved nicely. I do not have as much hands-on experience with concurrency features in other languages as I'd like, but I've read about a fair number of them, and I've thought about this proposal carefully through its various iterations, hasty though this review will be in the writing of it.

Now, to the details:

I'm glad I procrastinated in writing, because @Chris_Lattner3 has pointed out several issues in a more articulate way than I could, which I wanted to speak on as well:

I am glad that there is striving for consistency, but in standardizing on async we've got some odd phraseology because "async" is fundamentally an adjective or adverb, and that paints us into awkward situations. If we are to strive for consistency, I think it's important that some of the task-group-based spawning APIs and non-task-group-based spawning APIs be more harmonized too.
This proposal is clearly avoiding an API with the term "future," but every iteration of this proposal has had a future-like type. I understand that the whole point of this structured concurrency idea is to avoid a future that can be passed around willy-nilly, but it does feel somewhat like we've now got a type-that-shall-not-be-named, and the overall design seems to be under strain in order to accommodate that. (Sticking to this analogy, in this version of structured concurrency we find that Task is now Professor Quirrell, with the type-that-shall-not-be-named on the back of its head.) I think @Chris_Lattner3's exploration of the issue is a persuasive one.

As to the naming of things--

I think it ought not to be rejected out of hand the objection that Swift is establishing a design where "children" must not outlive their "parents." I do not think this is frivolous. Sure, the term "child" is incredibly common in many technical contexts, but in most cases the thing termed a "child" does not have a lifetime that is constrained to be shorter than that of its parent as a desideratum. This is just an incredibly sad way to phrase something that doesn't need to be expressed with such emotional valence, particularly since it's actually incredibly exciting that we're going to be able to use this property to improve the correctness of the code we write: we must remember that we are speaking to human beings about this feature.
It seems we have settled on a design in which one task has many jobs. In the ordinary world, typically a person has one job but many tasks. Can we find another way to describe a quantum of schedulable work that might more intuitively describe its relationship to a task?

And finally...

I raised this in an earlier review or pitch feedback, but the point has not been addressed by way of explanation or correction. Standard U.S. English spelling is "canceled," and Swift standard library APIs have always adhered to this Websterian convention (for example: isSignalingNaN, not isSignallingNaN). I just checked again, and Apple still has a style guide, which says just as it has for decades*:

canceled (v.), canceling (v.), cancellation (n.)
Use one l for the verb cancel—for example canceled, canceling. Use two l’s for the noun cancellation.

If there's a rationale for deviating from this, the authors should explain why so that the community and core team can evaluate the reason. Otherwise, we'll inevitably have the scenario where first-party documentation for the API will read something like: "isCancelled—A Boolean value indicating whether the current task is canceled." And we'll run into clashes where one moment we're cancelling and the next moment we're signaling.

* FWIW, it’s not just dictionaries and technical documentation that adhere to this rule. Consider, for example, this educational dialogue from the hit 2000s TV series The OC, season 2 episode 3 (penultimate scene):

RYAN: Oh, well, um, next time, don't spell “canceling” with two l's. Yeah, that's wrong. You wanna—you wanna fix that?

LINDSAY: I—I was using the Canadian spelling.

RYAN (Canadian accent): Oh, you were usin’ the Canadian spelling, eh?

jayton · June 4, 2021, 2:39pm

I’m concerned that the spelling Task { ... } is too convenient. It lends itself to thinking of Task { ... } as the “simple” case and group.async as an “advanced” case, as seen here:

In its simplest form, you can start concurrent work by creating a new Task object and passing it the operation you want to run.

For more complex work, you should create task groups instead – collections of tasks that work together to produce a finished value.

With a spelling as simple and attractive as Task { ... }, it seems hard not to present things this way, unless you’re deeply invested in advocating a structured-first approach. If we want structured concurrency to be the go-to choice, I think there needs to be at least a slight road bump here.

(I have a feeling “why not just use Task { ... } everywhere?” will be the new “why not use [weak self] everywhere?”)

Ben_Cohen · June 4, 2021, 4:34pm

While I agree there's a risk of Task over-use, "let's make this common need awkward to use so people don't use it incorrectly" is generally not a good solution. Rather, it's better to make doing the right thing in those circumstances easy too, which is what the async let proposal is for.

JJJ · June 4, 2021, 4:58pm

+1 for "redefining" the use of Future to mean something that will not outlive its source

Judgeman777 · June 8, 2021, 4:44am

So, I may be wrong, but the Platform State of the Union just talked about how great Structured Concurrency is… but it hasn’t even been accepted or implemented into Swift 5.5. What am I missing? Was a decision announced on this proposal?

John_McCall · June 8, 2021, 5:32am

This is the third round of review. The basic design of the proposal has been broadly accepted by the community, and we're now debating a few largely superficial details. Those details are important, but no matter how they're decided, it's no longer in question that Swift will incorporate some form of structured concurrency around tasks.

We currently expect that Swift 5.5 will provide whatever design is accepted here. Indeed, the underlying implementation is already in place, and it's mostly just the API design that's changing.

Douglas_Gregor · June 10, 2021, 6:00pm

Yes, and there's some rationale over in that thread about centralizing around async for structured concurrency. Yes, async is an adjective/adverb in English, but Dispatch has set a very strong precedent for using async to initiate new asynchronous work.

We can certainly clean this up.

A task has a single starting point and returns a single value, which might be a value or a thrown error. Any asynchronous calls the task does along the way don't create new tasks, they're just part of the same task.
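As a sketch of that model, a throwing task below ends in exactly one of two ways: a value or a thrown error. BurntDinner, makeDinner and dinnerOutcomes are hypothetical names used only for illustration.

```swift
struct BurntDinner: Error {}

// Hypothetical work that either produces a value or throws.
func makeDinner(burn: Bool) throws -> String {
    if burn { throw BurntDinner() }
    return "dinner"
}

func dinnerOutcomes() async -> (served: String, failed: Bool) {
    // A throwing closure gives a Task whose Failure type is Error.
    let good = Task { try makeDinner(burn: false) }
    let bad = Task { try makeDinner(burn: true) }

    // Each task resolves to one outcome: a value...
    let served = (try? await good.value) ?? "nothing"

    // ...or a thrown error, surfaced when the value is awaited.
    var failed = false
    do {
        _ = try await bad.value
    } catch {
        failed = error is BurntDinner
    }
    return (served, failed)
}
```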

It could be separated out, but it's only worthwhile if we think there's going to be significant revision. Doing sleep really well requires a type to describe time properly, which we don't yet have and is a big undertaking in and of itself. Yet Task.sleep is an important operation, hence my desire to give it the slightly-uglier name Task.sleep(nanoseconds:) and leave the time-type design (and nicer name Task.sleep(_:)) for later.
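For illustration, this is roughly what the "slightly-uglier" spelling looks like at a call site. The conversion helper is a hypothetical convenience, not part of the proposal; the sketch assumes the throwing form of Task.sleep(nanoseconds:) found in released toolchains.

```swift
// Hypothetical helper: without a time type, callers convert units by hand.
func nanoseconds(fromMilliseconds ms: UInt64) -> UInt64 {
    ms * 1_000_000
}

// Suspends the current task for about 50 ms without blocking a thread.
func pauseBriefly() async throws {
    try await Task.sleep(nanoseconds: nanoseconds(fromMilliseconds: 50))
}
```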

Doug

Chris_Lattner3 · June 10, 2021, 9:24pm

Ok, but dispatch and its APIs will be effectively gone (replaced by this new thing) from the nomenclature of Swift in a few years. This isn't an industry term of art that you're aligning with. I don't see how this is very strong rationale; it seems like we should fix the mistake of the past.

Also, it doesn't align with other uses of async in Swift, which is a very big deal as pointed out by many on this thread. async means "this is suspendable" not "create a new task".
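A small sketch of that distinction: the function below is async, so it may suspend, but awaiting it twice creates no new task and runs the calls strictly one after the other. Log, step and runSequentially are hypothetical names.

```swift
// Records the order in which things happen.
actor Log {
    private(set) var entries: [String] = []
    func append(_ entry: String) { entries.append(entry) }
}

// `async` here only says "this function may suspend".
func step(_ name: String, log: Log) async {
    await log.append("start \(name)")
    await Task.yield()                 // an actual suspension point
    await log.append("end \(name)")
}

// Two awaited calls run in order, inside the one current task.
func runSequentially() async -> [String] {
    let log = Log()
    await step("a", log: log)
    await step("b", log: log)
    return await log.entries
}
```

Even with a suspension in the middle of each step, "b" never starts before "a" has ended, because nothing here asked for a new task.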

Detach, task groups and the proposed 'async let' thing (however it is spelled) all create new tasks, which all produce asynchronous results. It is entirely reasonable to want to describe a function that returns multiple results that are resolvable at non-deterministic times with respect to each other (e.g. they are coming from two different remote machines). In other systems you typically spell this with (Future<..>, Future<..>) (where the outer parens are a tuple).

Sure, agreed.

-Chris

Douglas_Gregor · June 11, 2021, 7:30pm

I like your optimism, and while I don't expect that the timeline will be so short: point taken.

So, you can take the structured approach if you want both values resolved together:

async let a = thing1()
async let b = thing2()
return await (a, b)

or you can use an unstructured approach if you want to make them separately resolvable:

let aTask = Task { await thing1() }
let bTask = Task { await thing2() }
return (aTask, bTask)

The async is keeping you in the structured world. An async let is tracking the task that you'll need to await to get the value. The naming is emphasizing the split between structured and unstructured.
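The two snippets above can be filled out into a compilable sketch. thing1 and thing2 are stand-ins with made-up return values, and the two wrapper function names are hypothetical.

```swift
// Hypothetical stand-ins for the two pieces of work.
func thing1() async -> Int { 1 }
func thing2() async -> String { "two" }

// Structured: both child tasks are awaited before this function returns.
func resolvedTogether() async -> (Int, String) {
    async let a = thing1()
    async let b = thing2()
    return await (a, b)
}

// Unstructured: the caller receives two handles and can await each one
// whenever it likes, independently of the other.
func resolvedSeparately() -> (Task<Int, Never>, Task<String, Never>) {
    let aTask = Task { await thing1() }
    let bTask = Task { await thing2() }
    return (aTask, bTask)
}
```

Note that resolvedSeparately is not itself async: it only starts the tasks and hands back their handles.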

Doug

Alejandro_Martinez · June 12, 2021, 6:10pm

Maybe this is a bit off topic, but Joe recommended raising it on the forums:

Right now sleep doesn’t check at all for cancellation, making some use cases of structured concurrency wait longer than needed.

Would it be possible to at least implement some eager cancellation before this big undertaking for a time type?

benlings · June 12, 2021, 8:36pm

I agree there should be a way to make sleep() cancellable. Were you thinking that it should check at the beginning, or return early?

Alejandro_Martinez · June 12, 2021, 9:18pm

IMO it should return early, as soon as possible. Somewhat like when URLSession's new async functions get cancelled and stop early. Of course I have no clue how complex that is to implement ^^’
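For comparison with the beta behavior described above, here is a sketch of the "return early" behavior, assuming a released toolchain in which Task.sleep(nanoseconds:) throws CancellationError once the task is cancelled. That is not what the build discussed in this thread did. napEndedEarly is a hypothetical name.

```swift
// Returns true if a long sleep was cut short by cancellation.
func napEndedEarly() async -> Bool {
    let napper = Task {
        // Ask for a ten-second sleep; cancellation should end it sooner.
        try await Task.sleep(nanoseconds: 10_000_000_000)
    }
    napper.cancel()
    do {
        try await napper.value
        return false                       // slept for the full duration
    } catch {
        return error is CancellationError  // woke up because of cancel()
    }
}
```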

Zhu_Shengqi · June 13, 2021, 1:52pm

Although it may be a little late to join the discussion, I found the following code (from a WWDC session) rather confusing:

async { await self.healthKitController.save(drink: drink) }

As other community members have pointed out, the first async keyword here has a totally different meaning from the async in the async/await proposal. I believe the code above will be common when people adapt existing code bases, which may cause further misunderstanding when we discuss things in terms of async and await.

I understand async may be an easy term to memorize and use, but clarity should not be sacrificed when choosing a keyword which will be used in common concurrent scenarios. When talking about async I'd prefer the single case that functions have the capability to suspend.

Some may argue that we use DispatchQueue.async already, but it's a library feature instead of language feature. I suppose we should have a much higher bar for naming in language features.

bzamayo · June 13, 2021, 2:53pm

FWIW, the latest version of the proposal changes the spelling of async { ... } to Task { ... }, but this didn't make it for the beta 1 seed build.
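With that spelling, the WWDC line quoted earlier would look roughly like the sketch below. DrinkStore and logDrink are hypothetical stand-ins for the session's HealthKit controller, which isn't shown in this thread.

```swift
// Hypothetical stand-in for the controller in the WWDC sample.
actor DrinkStore {
    private(set) var saved: [String] = []
    func save(drink: String) { saved.append(drink) }
}

// A synchronous function: it cannot `await`, so it creates a task to do
// the asynchronous work and returns immediately with the task's handle.
func logDrink(_ drink: String, to store: DrinkStore) -> Task<Void, Never> {
    Task { await store.save(drink: drink) }
}
```

Here Task { } names the thing being created, while await stays the only marker for suspension, so the two meanings no longer share the word async.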