[Concurrency] Structured concurrency

Okay, so something that transparently presents itself as a collection of values but actually represents a bunch of work which hasn’t finished yet, and so arbitrary fetches from it might have to suspend?

I see two basic use-cases:

In the simple case, all you want to do is to wait for all results to complete, so that you can return a collection of all of them. FWIW, when I've used a DispatchGroup to get this kind of concurrency, it's almost always this simple scenario — I just add things to the group then wait on the whole group. In the simple case, I'd expect to be able to do something like await myCollection to wait for all the child tasks.
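For reference, a minimal sketch of that DispatchGroup pattern (the `chopAll` name and `Int` payloads are made up for illustration; `chop` stands in for the real per-item work):

```swift
import Dispatch
import Foundation

// The simple scenario: enqueue every item onto a concurrent queue,
// enter the group per item, then wait on the whole group at the end.
func chopAll(_ inputs: [Int], chop: @escaping (Int) -> Int) -> [Int] {
    let group = DispatchGroup()
    let queue = DispatchQueue(label: "chopping", attributes: .concurrent)
    let lock = NSLock()
    var results = [Int?](repeating: nil, count: inputs.count)

    for (i, input) in inputs.enumerated() {
        group.enter()
        queue.async {
            let value = chop(input)
            lock.lock()
            results[i] = value      // each slot is written exactly once
            lock.unlock()
            group.leave()
        }
    }
    group.wait()                    // block until every leave() has happened
    return results.map { $0! }
}
```

This is exactly the shape that `await myCollection` would replace: all the bookkeeping (group, lock, optional slots) exists only to express "wait for everything".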

In more advanced cases, you may want to wait for specific results, or you may want to refer to incomplete child tasks to cancel them or change their priority, etc. In this case, I think it might be acceptable to refer to something like myCollection.nursery to get access to incomplete child tasks.

IMO, the simpler ergonomics (of not having to mention a nursery explicitly) would be better used for the simple case (of referring only to the complete set of results), since I'm guessing this is by far the most common scenario.

Edit: The big win here would be the ability to get concurrency by writing things like:

func chopVegetables(inputs: [Vegetable]) async -> [Vegetable] {
    return await inputs.map(chopped)
}

That may not be quite enough syntax inside the function to be unambiguous, but that's the rough idea.

Edit 2: It occurs to me the above could be achieved by the mechanism @John_McCall mentioned earlier: a library function — an async map in this case. That works but changes the semantics in an undesirable way. In slightly more complicated cases:

func chopVegetables(inputs: [Vegetable]) async -> [Vegetable] {
   let outputs = … some collection of things based on `inputs` …
   return await outputs // wouldn't work … or …
   return await outputs.await() // oddly messy … or …
   return await outputs.map{$0} // unobvious workaround
}

It would be better (I claim) if it were the collection that was async here, rather than the function called on it.

Given that method signatures of async functions in this proposal have the same return type as sync methods, I wondered whether the task/promise wrapping would be something that was fully hidden from the user, unlike in C# or JavaScript/TypeScript etc. where the return type is explicitly wrapped. But now I think I understand that it's not so much a wrapping of the return type as a different dimension to it?

The nursery example with the veggies seems to bear this out - I don't know if it's the mutation of the array that makes it so, but it seems a bit awkward and boilerplate-heavy. I was hoping you could do something like this, inspired by the C# syntax, to achieve the same thing...

func chopVegetables() async throws -> [Vegetable] {
  let veggies = gatherRawVeggies()

  let choppingTasks = veggies.map { v in
    v.chopped() // note: no await
  }

  return await choppingTasks.awaitAll()
}

Would this sort of thing be possible, or even a good idea? If I did something like the above map {...}, would I have a [Task.Handle<Vegetable>] to play with?
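For what it's worth, an `awaitAll()` like that could presumably be written as an extension. This is only a sketch against the proposal's draft `Task.Handle` API; the `awaitAll` name and the `await try` spelling are my assumptions, nothing here is confirmed:

```swift
// Sketch only -- assumes the proposal's draft Task.Handle<T> type.
extension Sequence {
    /// Await every handle in order, collecting all results.
    func awaitAll<T>() async throws -> [T] where Element == Task.Handle<T> {
        var results: [T] = []
        for handle in self {
            results.append(await try handle.get())
        }
        return results
    }
}
```

Awaiting the handles in order wouldn't serialize the work: the child tasks already run concurrently, so this loop only controls the order results are collected in.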

How do you spell the throwing/nonthrowing version of this future? It seems that they should be different static types, or parameterized on throwability somehow.

-Chris

With @anandabits's Generalized supertype constraints you could theoretically hack this into the type system with a generic parameter T: () throws -> () (which would, I think, only be inhabitable by () -> () and () throws -> ()).

Surely they’re all throwing, because cancellation is modeled as an Error.

Completely missed it, sorry! Looks like the proposal is exactly what I had in mind.

Eagerly (or not) throwing Task.Handle.get() on cancellation

The Task.Handle.get() design is intentionally always throwing.

There are two ways to design this API and they differ in execution semantics in face of cancellation:

  • eagerly throwing handles on cancellation -- as future libraries would,
  • lazily throwing handles on cancellation -- only if the task code itself decides to handle the cancellation.

We take the first approach. The exact semantics differ in this way:

Current semantics from proposal: Eagerly throwing task handles

Observation: unlike structured concurrency, task handles give up control of the structure of tasks, and things can be cancelled on a specific task object by anyone (other tasks/actors/threads). Such cancellation is not possible with the structured concurrency APIs or Task in general; one always has to be a task to set cancellation (or deadlines).

As such, handles are always "external world" to the normal case. Consider the following example:

// Existing and proposed semantics:
// struct Handle<Success> { 
//     func get() async throws -> Success { ... }
// }

let h: Task.Handle<String> = Task.runDetached { <code> } 

{ // (1) some other actor "B" did get the handle and performs:
    h.cancel()
    // sets the `isCancelled` flag in the Task that h points to.
}

await try h.get() // (2) throws Task.CancellationError, 
// regardless of <code> checking for cancellation or not

// --------------------
// The <code> may not ever be checking for cancellation and run to completion
// anyway.

Here, we showcase that cancellation is "eager" with handles: we can stop wasting time waiting for a result we intentionally cancelled. This is useful for minimizing useless scheduling, waiting, and other work on a task that was intentionally, externally cancelled.

For what it's worth, since Task.Handle is really like a future, these are also the semantics that futures exhibit in every other implementation I'm aware of. A cancelled future will not accumulate callbacks; it eagerly rejects them, and results are dropped even if produced.

Note that such "best-effort cancellation, but don't get the result anyway" is also how all-for-one failure works in nurseries. We of course always set isCancelled = true; however, even if tasks don't check for it, we would not get their results. So the concept of "attempt to cancel, but it may still run" is not inherently tied to a task throwing eagerly or not -- both this API and nurseries work the same way semantically.

Purely re-throwing Task.Handles

What Chris alludes to can be expressed, however it has different tradeoffs.

// struct Handle<Success, Failure: Error> { 
//     func get() async throws -> Success { ... }
// }
// extension Handle where Failure == Never { 
//     func get() async -> Success { ... }
// }

We had this API shape at some point as well. The difference goes beyond just how the API looks, though; it changes execution semantics.

Note that the Failure is never really used in the API, except for knowing whether it is Never or not.

Consider the above example, but written in the second style:

// the <code> task never throws:
// neither cancellation errors nor any other errors
let h: Task.Handle<String, Never> = Task.runDetached { <code> } 

{ 
    h.cancel()
}

await h.get() // has to wait until <code> completes
// cancellation is not visible to the handle; 
// multiple awaits can keep being enqueued to the task until it completes.

In a way, this means that we have unbounded growth of the number of waiters on this already-cancelled task handle. This is somewhat sub-optimal: we are working with handles, which are about "outside world" control, and we absolutely know this was intentionally cancelled and the result is no longer wanted; yet we will get the results anyway.

On the flip side, technically we can always declare the handle as throwing:

// the <code> task may throw:
// cancellation errors, or any other errors
let h: Task.Handle<String, Error> = Task.runDetached { <code> } 

await try h.get() // MAY throw Error

We can't really enforce what specific error will be thrown, because throws are not typed. So this is just to enable the throwing codepath and the get() throws overload. This is generally fine; we want to allow users to write try Task.checkCancellation() in their <code>, after all.

To be really honest, it would be optimal if Swift had panics or "soft faults" which such a handle could isolate; I'm not in love with errors for this, and embracing a let-it-crash mindset would be nicer, but we have Errors and that's what we use to model these things.


Analysis:

This tradeoff is more about the execution semantics of get() than about the shape of the handle type itself.

Generally, I believe the "eager" case is useful; however, one can argue the same way that the "lazy" way is useful too -- perhaps my work handles cancellation nicely and returns a partial, best-effort-computed result which I'd like to get then.

I'm not sure if we want to offer both? It would have to be a get() and a getOrThrowIfCancelled() (terrible spelling on purpose, we could figure out some better name).
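For illustration only (the second name is deliberately a placeholder, and none of this is part of the proposal), offering both semantics might look like:

```swift
// Hypothetical shape, not part of the proposal:
extension Task.Handle {
    /// Lazy: always waits for the task's own result, even once cancelled.
    func get() async throws -> Success { ... }

    /// Eager: throws Task.CancellationError as soon as the
    /// handle observes that the task was cancelled.
    func getOrThrowIfCancelled() async throws -> Success { ... }
}
```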

Open to feedback and what people think about the different semantics.

Thank you for the helpful response @ktoso, that really clarifies things. I haven't really digested the Task API proposed here. Do you expect cancellation to be reliable and widely usable?

It seems that the behavior described above could lead to resource leaks, e.g.:

  let x = malloc(42)
  ...
  await somethingInnocuous()
   ...
  free(x)

However, I suppose the counter argument is that the same issue of non-local exits from scope exists with thrown errors in general, this is the reason we have the try marker, and why we have things like defer.

Does a canceled task correctly execute defer'd cleanups in the frames it unwinds through?

-Chris

I'm glad the proposal addresses your questions @Paulo_Faria. :slight_smile:

While we're here, I'd like to point out something I'm very excited about, and that we've been working towards for a while in the server-side Swift ecosystem.

Extra: Distributed Deadlines

I don't want to sidetrack the discussion too much from the concurrency proposal itself, so I've folded away the details of the distributed deadlines; feel free to read if interested.

It makes use of the instrumentation ecosystem that we worked on with @slashmo in his Swift Tracing Google Summer of Code this year.

:bulb: This is not part of the proposal but will be a fantastic use-case for the deadline and task semantics proposed.

The deadline model proposed for the language here fits in perfectly with distributed computing and best-effort, deadline-based cancellation of distributed requests or calls as well, it turns out (!). (The moment you work with actors, you realize everything is just a small distributed system :wink:).

With swift-tracing (which we prototyped as summer of code with @slashmo: GitHub - slashmo/gsoc-swift-tracing: WIP collection of Swift libraries enabling Tracing on Swift (Server) systems.) we intentionally designed it to not just be about Tracer types, but also any Instrument.

Now, libraries like Async HTTPClient are going to be instrumented using swift-tracing instruments. These instruments perform the task of:

  • "inject metadata from current baggage context / task context into outgoing request"
  • "extract any known metadata into baggage context / task context from incoming request"

So, we can trivially implement an instrument that carries deadlines:

public struct SwiftDeadlinePropagationInstrument: Instrument {
    public func inject<Carrier, Inject>(
        _ baggage: Baggage, into carrier: inout Carrier, using injector: Inject
    ) async where Inject: Injector, Carrier == Inject.Carrier {
        let deadline = await Task.currentDeadline()
        let deadlineRepr = "\(deadline)"  // TODO: properly serialize rather than just string repr
        injector(deadlineRepr, forKey: "X-Swift-Deadline", into: carrier)
    } // HTTP request gained `X-Swift-Deadline: ...` header

    public func extract<Carrier, Extract>(
        _ carrier: Carrier, into baggage: inout Baggage, using extractor: Extract
    ) async where Extract: Extractor, Carrier == Extract.Carrier {
        if let deadlineRepr = extractor(key: "X-Swift-Deadline", from: carrier),
           let deadline: Deadline = try? parse(deadlineRepr) {
            baggage.deadline = deadline // TODO assign only if earlier than existing deadline
        }
    } // then HTTPClient does Task.withDeadline(baggage.deadline) { ... user code ... }
}

But that's just the internals.

What this means is this:

// client code
let response = await try Task.withDeadline(in: .seconds(2)) { 
  await try http.get("http://example.com/hello")
}

and a server receiving such request:

// some fictional HTTP server:
handle("/hello") { // automatically has deadline set (!)

  // if this were to exceed the deadline it could throw!
  let work = await try makeSomeWork() 

  return response(work)
}

So we're able to best-effort* cancel unnecessary remote work without any extra work! As long as libraries we use are instrumented using Swift Tracing and users opt-into the deadline propagation instrument -- deadline propagation works even across nodes.

This pattern is well known and proven in the Go ecosystem: gRPC and Deadlines | gRPC and we'll be able to adopt it without code-noise relying on the Task acting as our context propagation.

  • (yeah clock drift is a thing, so instruments can add some wiggle room time to the deadlines)

Glad this helps. I'll ponder some more if offering both APIs is viable or not.

Yes, cancellation is useful / reliable and should be used by APIs which want to allow cancellation.

Do note though that it is cooperative; we're not forcefully killing tasks (we can't, really), so a task still runs, but it can check for cancellation and prevent expensive work from running if the results would not be useful anyway. You can imagine this being tremendously useful in many places:

  • prediction APIs, predict "fast but crappy" and "slow but better" so if users already clicked on something from fast prediction, there's no need to keep processing the slow path
  • general deadlines, no need to even start working on something if it's past a deadline already and the result will be discarded (this spans also multiple nodes, see my collapsed note about distributed deadline in the previous comment)
  • purposefully "racing" tasks -- the typical example is "get first response, cancel all other ones" etc.

Not in any different way than today's Swift.

Please note that cancellation does not introduce new codepaths, it simply allows returning early by checking the Task.isCancelled() flag and then returning some partial result (or placeholder) or throwing.

To your example specifically: the way you wrote it, there cannot be any "other control flow":

  let x = malloc(42)
  await somethingInnocuous()
  free(x) // ok

The free must and will be invoked -- await does not change the available control flow paths at all. Cancellation can be handled by somethingInnocuous(), however it must do so by returning something, not by throwing. (This makes sense for APIs which are "fetch image or return placeholder image".)

If we said that somethingInnocuous() does handle cancellation by throwing, the code then becomes:

  let x = malloc(42)
  ...
  await try somethingInnocuous()
   ...
  free(x) // bad

In which case it is exactly as bad as normal Swift code -- the free really should have been in a defer {}, or one should handle the errors thrown by await try somethingInnocuous().

So cancellation is not really anything new control flow wise. Async or not, the following snippets are correct and good/safe Swift style:

  let x = malloc(42)
  defer { free(x) } // good
  somethingInnocuous()

  let x = malloc(42)
  defer { free(x) } // still good
  try somethingInnocuous()

  let x = malloc(42)
  defer { free(x) } // still good
  await somethingInnocuous()

  let x = malloc(42)
  defer { free(x) } // still good
  await try somethingInnocuous()

To specifically answer the question asked:

Yes - because cancellation is not introducing any new control flow -- it is just a chance to check for it and throw or return before or during performing some work:

func somethingInnocuous() async throws { 
  defer { print("returning") } // this follows normal Swift rules

  guard await !Task.isCancelled() else { return } // bail out by returning
  // Sidenote, we'll try to see if we can spell this check nicer...

  // or just
  await try Task.checkCancellation() // throws Task.CancellationError if is cancelled
  someVeryHeavyComputing()
}

Hope this helps.

The effect of cancellation within the cancelled task is fully cooperative and synchronous. That is, cancellation has no effect at all unless something checks for cancellation. Conventionally, most functions that check for cancellation report it by throwing CancellationError(); accordingly, they must be throwing functions, and calls to them must be decorated with some form of try. As a result, cancellation introduces no additional control-flow paths within asynchronous functions; you can always look at a function and see the places where cancellation can occur. As with any other thrown error, defer blocks can be used to clean up effectively after cancellation.

Did you consider a design where tasks are automatically cancelled at each suspension point by default? This is how Scala’s Zio library works and it is a design that has worked well for them. It ensures unnecessary work is skipped when there is no demand for a task’s result.

Zio supports an "uninterruptible" operator for cases where this default is problematic. Given the general design of this proposal for Swift, I think we would need to burn a keyword to specify a block as not implicitly cancelled.

If we went in this direction, implicit cancellation could behave by throwing a standard library CancellationError at the next suspension point when cancelled, thus making async imply throws and await imply try in all cases (as was suggested elsewhere). If Swift ever supports typed errors, a TaskError protocol would allow async functions to support typed errors while still modeling the cancellation case.

protocol TaskError {
    init(cancelled: CancellationError)
}

Race

I'm hoping one of the authors can comment on whether this code would work as expected:

/// Runs `first` and `second` concurrently, returning the result of the first task to complete.
/// The loser of the race is cancelled.
func race<T>(_ first: () async -> T, _ second: () async -> T) async throws -> T {
    Task.withNursery(resultType: T.self) { nursery in
        nursery.add(first)
        nursery.add(second)

        guard let result = await try nursery.next() else {
            // we only get here if the parent task was cancelled
            throw CancellationError()
        }

        return result
    }
}

Detached tasks

Similarly, I'm wondering if the following code is a valid implementation technique for cases where an unbounded async operation (such as a network request) should not extend the lifetime of an object, but instead where the lifetime of the object should place an upper bound on the lifetime of the task. If there is a better way to express this semantics in the current proposal I would like to understand it.

protocol Cancellable {
    func cancel()
}
extension Task.Handle: Cancellable {}

class SomeClass {
    private var pendingTasks: [UUID: Cancellable] = [:]
    
    deinit { 
        for task in pendingTasks.values {
            task.cancel()
        }
    }

    func startTask(_ task: () async -> String) { 
        let id = UUID()
        let handle = Task.runDetached(task)
        // I was surprised to find that I needed two calls to `Task.runDetached`
        // but am not sure how else to call back to `self` when the task finishes and `self`
        // is still around.
        Task.runDetached { [unowned self] in
             let result = await Result { try handle.get() }

             // assuming `self` is an actor, am I back in the correct context after the previous `await`?
             self.handleTaskResult(result)
        }
        pendingTasks[id] = handle
    }

    func handleTaskResult(_ result: Result<String, Error>) { ... }
}

Cancellation checking

Shouldn't this be await try Task.checkCancellation()? If not, why does isCancelled introduce a suspension point and not checkCancellation?

That's a typo (at least according to current proposal / status quo).

As I discussed somewhere though: We are considering introducing an annotation for "must be called from an async context but will never suspend" which such check is -- it will never switch execution context. Thus really try Task.checkCancellation() is what we'd be able to write here.


The race example you posted seems correct "semantically". Syntax-wise, you've omitted many await try calls though.

I see. If you do that, would it make sense to also include a checkCancellationAsync for the case where a long-running computation wants to suspend when it checks for cancellation?

Yes, except that cancellation of the parent task wouldn’t cause next to return nil.

Thanks, can you clarify when next would return nil? That part of the API wasn’t clear to me so I was guessing.

When there are no more tasks to await on.
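So a typical draining loop keeps calling next() until it returns nil. A sketch against the proposal's draft nursery API (names and the await try spelling are taken from the proposal, and may not be final):

```swift
// Sketch only -- assumes the draft Task.withNursery API from the proposal.
func gatherAll(_ jobs: [() async throws -> Int]) async throws -> [Int] {
    await try Task.withNursery(resultType: Int.self) { nursery in
        for job in jobs {
            await try nursery.add(operation: job)
        }
        var results: [Int] = []
        // next() yields each completed child's result;
        // nil means every child task has been consumed.
        while let value = await try nursery.next() {
            results.append(value)
        }
        return results
    }
}
```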

Sorry, I might have missed something. Could someone elaborate on how workload control works? With a nursery, we add every task (I guess) via the add(_:) method, and then begin to wait for results via next(). Does that mean the tasks are not actually run before the first call to next()? Because if I do not call next(), this piece of code

for i in rawVeggies.indices {
  await try nursery.add { 
    (i, veggies[i].chopped())
  }
}

is the same as this piece of code

for i in rawVeggies.indices {
  async let task = { 
    (i, veggies[i].chopped())
  }
  // Store the task and await on it later.
}

Child tasks start running immediately.

Is that to say there is no concurrency control even with the nursery? For example, I could launch thousands of tasks and they would immediately run in parallel without being scheduled.