[Concurrency] Structured concurrency

Completely missed it, sorry! Looks like the proposal is exactly what I had in mind.

1 Like

Eagerly (or not) throwing Task.Handle.get() on cancellation

The Task.Handle(.get()) design intentionally is always throwing.

There are two ways to design this API and they differ in execution semantics in face of cancellation:

  • eagerly throwing handles on cancellation -- as future libraries would,
  • lazily throwing handles on cancellation -- only if the task code itself decides to handle the cancellation.

We take the first approach. The exact semantics differ in this way:

Current semantics from proposal: Eagerly throwing task handles

Observation: unlike structured concurrency, task handles give up control of the structure of tasks, and things can be cancelled on the specific task object by anyone (other tasks/actors/threads). Such cancellation is not possible with the structured concurrency APIs or Task in general, one always has be be a task to set cancellation (or deadlines).

As such, handles are always "external world" to the normal case; Consider the following example:

// Existing and proposed semantics:
// struct Handle<Success> { 
//     func get() async throws -> Success { ... }
// }

let h: Task.Handle<String> = Task.runDetached { <code> } 

{ // (1) some other actor "B" did get the handle and performs:
    h.cancel()
    // sets the `isCancelled` flag in the Task that h points to.
}

await try h.get() // (2) throws Task.CancellationError, 
// regardless of <code> checking for cancellation or not

// --------------------
// The <code> may not ever be checking for cancellation and run to completion
// anyway.

Here, we showcase that cancellation is "eager" with handles, this is because we can stop wasting time waiting for a result we intentionally cancelled. This is useful in being able to minimize useless scheduling, waiting, and generally more work with a task that is intentionally, externally, cancelled.

For what it's worth since Task.Handle is really like a future; this is also semantics that Futures would exhibit in any other implementation I'm aware of. A cancelled future will not accumulate callbacks, and eagerly rejects them, results are dropped even if produced.

Note that such best effort cancellation but don't get the result anyway is also how all-for-one failure works in nurseries. We of course always set isCancelled = true, however even if tasks don't check for it -- we would not get their results. So the concept of attempt to cancel, but it may still run is not inherently tied to a task throwing eagerly or not -- both this API and nurseries work the same semantically about it.

Purely re-throwing Task.Handles

What Chris alludes to can be expressed, however it has different tradeoffs.

// struct Handle<Success, Failure: Error> { 
//     func get() async throws -> Success { ... }
// }
// extension Handle where Failure == Never { 
//     func get() async -> Success { ... }
// }

We had this API shape at some point as well. The difference goes beyond just the API looks though, it changes execution semantics.

Note that the Failure is never really used in the API except for knowing if it's a never or not.

Consider the above example, but written in the 2) style:

// the <code> task never throws
// either cancellation errors, or any other errors
let h: Task.Handle<String, Never> = Task.runDetached { <code> } 

{ 
    h.cancel()
}

await h.get() // has to wait until <code> completes
// cancellation is not visible to the handle; 
// multiple awaits can keep being enqueued to the task until it completes.

In a way, this means that we have unbounded growth of the number of waiters on this already cancelled task handle. This is somewhat sub-optimal; we are working with handles which are about "outside world control" and we absolutely know this was intentionally cancelled and don't want the result anymore; yet we will get the results anyway.

On the flip side, technically we can await

// the <code> task never throws
// either cancellation errors, or any other errors
let h: Task.Handle<String, Error> = Task.runDetached { <code> } 

await try h.get() // MAY throw Error

We can't really enforce what specific error will be thrown, because throws are not typed. So this is just to enable the throwing codepath and get() throws overload. This is generally fine, we want to allow users to write try Task.checkCancellation() after all in their <code>.

To be really honest, it would be optimal if Swift had panics or "soft faults" which such handle could isolate; I'm not in love with errors for this and embracing a let-it-crash mindset would be nicer, but we have Errors and that's what we use to model these things.


Analysis:

This tradeoff is more about execution and semantics of the get() rather than the handle type shape itself.

Generally, I believe the "eager" case is useful, however one can argue the same way that the "lazy" way is useful too -- perhaps my work handles cancellation nicely and returns a partially best-effort-computed result which I'd like to get then.

I'm not sure if we want to offer both? It would have to be a get() and a getOrThrowIfCancelled() (terrible spelling on purpose, we could figure out some better name).

Open to feedback and what people think about the different semantics.

Thank you for the helpful response @ktoso, that really clarifies things. I haven't really digested the Task API proposed here. Do you expect cancelation to be reliable and widely usable?

It seems that the behavior described above could lead to resource leaks, e.g.:

  let x = malloc(42)
  ...
  await somethingInnocuous()
   ...
  free(x)

However, I suppose the counter argument is that the same issue of non-local exits from scope exists with thrown errors in general, this is the reason we have the try marker, and why we have things like defer.

Does a canceled task correctly execute deferd cleanups from the frame it unwinds through?

-Chris

2 Likes

I'm glad the proposal addresses your questions @Paulo_Faria. :slight_smile:

While we're here I'd like to point out something I'm very excited about and we've been working towards for a while in the server side swift ecosystem.

Extra: Distributed Deadlines

I don't want to side track the discussion too much from the concurrency proposal itself, so I've folded away the details of the distributed deadlines, feel free to read if interested.

It makes use of the instrumentation ecosystem that we worked on with @slashmo in his Swift Tracing Google Summer of Code this year.

:bulb: This is not part of the proposal but will be a fantastic use-case for the deadline and task semantics proposed.

Click here to expand details about Distributed Deadlines, e.g. across HTTP calls etc.

The deadline model proposed for the language here fits in perfectly with distributed computing and best effort deadline based cancellations of distributed requests or calls as well it turns out (!). (The moment you work with actors you realize everything is just a small distributed system :wink:).

With swift-tracing (which we prototyped as summer of code with @slashmo: https://github.com/slashmo/gsoc-swift-tracing) we intentionally designed it to not just be about Tracer types, but also any Instrument.

Now, libraries like Async HTTPClient are going to be instrumented using swift-tracing instruments. These instruments perform the task of:

  • "inject metadata from current baggage context / task context into outgoing request"
  • "extract any known metadata into baggage context / task context from incoming request"

So, we can trivially implement an instrument that carries deadlines:

public struct SwiftDeadlinePropagationInstrument: Instrument {
    public func inject<Carrier, Inject>(
        _ baggage: Baggage, into carrier: inout Carrier, using injector: Inject
    ) async where Inject: Injector, Carrier == Inject.Carrier {
        let deadline = await Task.currentDeadline()
        let deadlineRepr = "\(deadline)"  // TODO: properly serialize rather than just string repr
        injector(deadlineRepr, forKey: "X-Swift-Deadline", into: carrier)
    } // HTTP request gained `X-Swift-Deadline: ...` header

    public func extract<Carrier, Extract>(
        _ carrier: Carrier, into baggage: inout Baggage, using extractor: Extract
    ) async where Extract: Extractor, Carrier == Extract.Carrier {
        if let deadlineRepr = extractor(key: "X-Swift-Deadline", from: carrier),
           let deadline: Deadline = try? parse(deadlineRepr) {
            baggage.deadline = deadline // TODO assign only if earlier than existing deadline
        }
    } // then HTTPClient does Task.withDeadline(baggage.deadline) { ... user code ... }
}

But that's just the internals.

What this means is this:

// client code
let response = await try Task.withDeadline(in: .seconds(2)) { 
  await try http.get("http://example.com/hello")
}

and a server receiving such request:

// some fictional HTTP server:
handle("/hello") { // automatically has deadline set (!)

  // if this were to exceed the deadline it could throw!
  let work = await try makeSomeWork() 

  return response(work)
}

So we're able to best-effort* cancel unnecessary remote work without any extra work! As long as libraries we use are instrumented using Swift Tracing and users opt-into the deadline propagation instrument -- deadline propagation works even across nodes.

This pattern is well known and proven in the Go ecosystem: https://grpc.io/blog/deadlines/ and we'll be able to adopt it without code-noise relying on the Task acting as our context propagation.

  • (yeah clock drift is a thing, so instruments can add some wiggle room time to the deadlines)
5 Likes

Glad this helps. I'll ponder some more if offering both APIs is viable or not.

Yes, cancellation is useful / reliable and should be used by APIs which want to allow cancellation.

Do note though that it is cooperative we're not forcefully killing tasks (we can't really) so a task still runs, but it can check for cancellation and prevent expensive work from running if the results would not be useful anyway. You can imagine this being tremendously useful in many places:

  • prediction APIs, predict "fast but crappy" and "slow but better" so if users already clicked on something from fast prediction, there's no need to keep processing the slow path
  • general deadlines, no need to even start working on something if it's past a deadline already and the result will be discarded (this spans also multiple nodes, see my collapsed note about distributed deadline in the previous comment)
  • purposefully "racing" tasks -- the typical example is "get first response, cancel all other ones" etc.

Not in any different way than today's Swift.

Please note that cancellation does not introduce new codepaths, it simply allows returning early by checking the Task.isCancelled() flag and then returning some partial result (or placeholder) or throwing.

To your example specifically, the way you wrote it there can not be any "other control flow":

  let x = malloc(42)
  await somethingInnocuous()
  free(x) // ok

The free must and will be invoked -- await does not change the available control flow paths at all. Cancellation can be handled by somethingInnocuous() however it must do so by returning something, not by throwing. (This makes sense for APIs which are "fetch image or return placeholder image ").

If we said that somethingInnocuous() does handle cancellation by throwing the code then becomes:

  let x = malloc(42)
  ...
  await try somethingInnocuous()
   ...
  free(x) // bad

In which case it is exactly "as bad as" like normal Swift code -- the free really should have been in a defer {} or one should handle the errors thrown by await try somethingInnocuous().

So cancellation is not really anything new control flow wise. Async or not, the following snippets are correct and good/safe Swift style:

  let x = malloc(42)
  defer { free(x) } // good
  somethingInnocuous()
  let x = malloc(42)
  defer { free(x) } // still good
  try somethingInnocuous()
  let x = malloc(42)
  defer { free(x) } // still good
  await somethingInnocuous()
  let x = malloc(42)
  defer { free(x) } // still good
  await try somethingInnocuous()

To specifically answer the question asked:

Yes - because cancellation is not introducing any new control flow -- it is just a chance to check for it and throw or return before or during performing some work:

func somethingInnocuous() async throws { 
  defer { print("returning") } // this follows normal Swift rules

  guard await !Task.isCancelled() else { return } // bail out by returning
  // Sidenote, we'll try to see if we can spell this check nicer...

  // or just
  await try Task.checkCancellation() // throws Task.CancellationError if is cancelled
  someVeryHeavyComputing()
}

Hope this helps.

The effect of cancellation within the cancelled task is fully cooperative and synchronous. That is, cancellation has no effect at all unless something checks for cancellation. Conventionally, most functions that check for cancellation report it by throwing CancellationError() ; accordingly, they must be throwing functions, and calls to them must be decorated with some form of try . As a result, cancellation introduces no additional control-flow paths within asynchronous functions; you can always look at a function and see the places where cancellation can occur. As with any other thrown error, defer blocks can be used to clean up effectively after cancellation.

Did you consider a design where tasks are automatically cancelled at each suspension point by default? This is how Scala’s Zio library works and it is a design that has worked well for them. It ensures unnecessary work is skipped when there is no demand for a task’s result.

Zio supports an “uninterruptible” operator for cases where this default is problematic. Given the general design of this proposal for Swift I think we would need to burn a keyword to specify an block as not implicitly cancelled.

If we went in this direction, the implicit cancellation could be behave by throwing a standard library CancellationError at the next suspension point when cancelled, thus making async imply throws and await imply try in all cases (as was suggested elsewhere). If Swift ever supports typed errors, a TaskError protocol would allow async functions to support typed errors while still modeling the cancellation case.

protocol TaskError {
    init(cancelled: CancellationError)
}

Race

I'm hoping one of the authors can comment on whether this code would work as expected:

/// Runs `first` and `second` concurrently, returning the result of the first task to complete.
/// The loser of the race is cancelled.
func race<T>(_ first: () async -> T, _ second: () async -> T) async -> T {
Task.withNursery(resultType: T.self) { nursery in
    nursery.add(first)
    nursery.add(second)

   guard let result = await try nursery.next() else {
        // we only get here if the parent task was cancelled
        throw CancellationError()
   }

   return result

Detached tasks

Similarly, I'm wondering if the following code is a valid implementation technique for cases where an unbounded async operation (such as a network request) should not extend the lifetime of an object, but instead where the lifetime of the object should place an upper bound on the lifetime of the task. If there is a better way to express this semantics in the current proposal I would like to understand it.

protocol Cancellable {
    func cancel()
}
extension Task.Handle: Cancellable {}

class SomeClass {
    private var pendingTasks: [UUID: Cancellable] = [:
    
    deinit { 
        for task in pendingTasks.values {
            task.cancel()
        }
    }

    func startTask(_ task: () async -> String) { 
        let id = UUID()
        let handle = Task.runDetached(task)
        // I was surprised to find that I needed two calls to `Task.runDetached`
        // but am not sure how else to call back to `self` when the task finishes and `self`
        // is still around.
        Task.runDetached { [unowned self] in
             let result = await Result { try handle.get() }

             // assuming `self` is an actor, am I back in the correct context after the previous `await`?
             self.handleResult(result)
        }
        pendingTasks[id] = handle
    }

    func handleTaskResult(_ result: Result<String, Error>) { ... }
}

Cancellation checking

Shouldn't this be await try Task.checkCancellation()? If not, why does isCancelled introduce a suspension point and not checkCancellation?

2 Likes

That's a typo (at least according to current proposal / status quo).

As I discussed somewhere though: We are considering introducing an annotation for "must be called from an async context but will never suspend" which such check is -- it will never switch execution context. Thus really try Task.checkCancellation() is what we'd be able to write here.


The race example you posted seems correct "semantically". Syntax wise you've omitted many await try calls though.

I see. If you do that, would it make sense to also include checkCancellationAsync for the case where a long running computation wants to suspend when it checks cancellation?

Yes, except that cancellation of the parent task wouldn’t cause next to return nil.

Thanks, can you clarify when next would return nil? That part of the API wasn’t clear to me so I was guessing.

When there are no more tasks to await on.

Sorry, I might have missed something. Could someone elaborate on how workload controlling work? With nursery, we add every Task (I guess) via add(:_) method, and then begin to wait for results via next(). Does that meaning the task is not actually run before the first call to next()? Because if I do not call next(), this piece of code

for i in rawVeggies.indices {
  await try nursery.add { 
    (i, veggies[i].chopped())
  }
}

is the same as this piece of code

for i in rawVeggies.indices {
  async let task = { 
    (i, veggies[i].chopped())
  }
  // Store the task and await on it later.
}

Child tasks start running immediately.

Is that to say there is no concurrency control even with the nursery? For example, I can launch thousands of tasks and they immediately run parallelly without being scheduled.

There is a goal that creating a child task via a nursery will be somewhat responsive to system load, which is why it can suspend. We’ll have to see how successful that is.

2 Likes

I'm hoping that most of the needs for nurseries can be captured in standard library functions, like concurrent variants of forEach, map, filter, reduce, etc.

I imagine these would cover most of users' needs, so they won't need to use nurseries directly.

5 Likes

Can't you accomplish the same exact thing using Combine?

I.e.:

func chopVegetables() -> AnyPublisher<[Vegetable], VegetableChoppingError>

etc.

Why do we need yet another API that does the same thing? I noticed your API does not strongly type errors like Combine does, which is something that we'd ideally want. (I guess it could be accomplished using the Result type like:

func chopVegetables() async -> Result<[Vegetable], VegetableChoppingError>

But I am curious why you think we need a new API for async rather than just using Combine or equivalent things like ReactiveSwift, RxSwift etc.

PS—Maybe an alternate proposal would be, "Open source Combine and add it to Swift." :D

While frameworks such as Rx and constructs such as futures can be used to cover a common subset of scenarios, they have different design goals.

Rx, and Combine, provide a way to react to streams of events. They are not task-oriented. Futures, and parts of the concurrency proposals, are intended to deal with tasks. Inherently one-off processes that return a result. Yes, the process may be initiated multiple times, but the results of each are independent. They don't form a time-dependent stream to which some handler will react to each event in turn.

Of course you have one-off streams, such as Just, and you have dependent operations in a task framework. And that increases the subset of scenarios where either could be used. However, the goals and focus are still distinct.

7 Likes

A lot of great stuff in this proposal. There is a lot to digest, so I think for now I'll focus on questions and comments about Deadline and leave some other things for later.

  • The proposal uses a comparison on a Deadline (Task.currentDeadline().remaining > duration), but it's not clear what protocols Deadline actually conforms with (or what it's full API is).
  • One of the challenges with deadlines as a timeout in practice is dealing with the different ideas of how clocks advance. Sometimes the desire is monotonically increasing, sometimes it is wall clock time (which of course can change). Sometimes it stops when a computer is asleep, sometimes it does not. Given the ambition of Swift to be appropriate for low level and high level programming, how will this stdlib deadline type allow for expression of these different kinds of times?
  • Providing API like .seconds(X) or .minutes(Y) is usually great, but we have to be careful not to go too far up this scale -- .days(Z) could be actively wrong in the case of daylight saving transitions.
  • On Darwin, the currency types for the concepts in the proposal are Date and TimeInterval. The proposal even shows an API which takes a Duration as an argument, but this is not a value that a developer will likely have on hand. It would have to be converted from one of the other time types we have, which is unfortunate.
  • Are any math operations available on Deadline, like addition or subtraction? Do we have the right numeric protocols already to express a timeline without also allowing for a potentially nonsense operation like multiplying two deadlines?
  • We went through an effort a few years ago to make sure all time-related API in the Darwin SDKs came with a tolerance parameter. It is really important to get additional context from a developer about how critical/exact the time value needs to be. This allows the scheduler to group timeouts together and wake up the CPU less frequently.
10 Likes

Also, if I wanted to unit test the behaviour of my code when a deadline is passed, how do I do that?

5 Likes
Terms of Service

Privacy Policy

Cookie Policy