[Pitch #2] Structured Concurrency

ktoso · January 9, 2021, 12:22pm

Douglas_Gregor:

This has been bothering me a lot with the Task design. Continuation is but one example where we are inside synchronous code but we might want to check whether the task we're running in has been cancelled. Other cases might be, e.g., some long-running synchronous operation where we might want to be able to check for cancellation:
func updateRecords(_ records: [Record], isCancelled: () -> Bool) {
  for record in records {
    if isCancelled() { return }
    record.update(...)
  }
}
That's just not possible in the design here, because you can only check cancellation if you have a Task.Handle (you won't for child tasks in the design we have) or you are in asynchronous code. That seems like a big hole to me.

Absolutely agreed we need a way to support these.

But I don't fully agree that "Task can be a value" solves this automatically all the way. We cannot just rely on "please pass the Task object to sync functions" since that only works if one controls the full invocation chain to be able to pass such Task value, and often one may not be able to do this), more on this below.

So the TL;DR; is that all comes back to the need of static func unsafeCurrent() -> UnsafeCurrentTask?, but let's dive in and explore why.

To be able to implement the simple read operations on tasks we have two options:

implement as async functions and thus be guaranteed to have a Task
implement as sync functions, using unsafeCurrent() and have good default values for the read values

What I mean by the second point is, we could implement isCanceled as follows:

extension Task { 
  static var isCanceled: Bool { // NOT ASYNC!
    Task.unsafeCurrent()?.isCanceled ?? false
  }
}

Where unsafeCurrent() is func unsafeCurrent() -> UnsafeCurrentTask? which contains all read and write operations, but one must be super careful to never invoke those from any other task than that current task (thus the Unsafe prefix; I believe @John_McCall was okey with this spelling of it).

So this impl makes a lot of sense for isCanceled specifically: if we're not in a task, there's no cancellation to be set, so we should never react to it, so the default is to return false. Thanks to this rather than static func isCanceled() async -> Bool we could express it as static var isCanceled: Bool and implement it correctly.

Deadlines in the future can also be implemented like this would be "infinite" if no task is available, so we'd be able to keep that API consistent in the future:

extension Task { 
  static var deadline: Task.Deadline { 
    Task.unsafeCurrent()?.deadline ?? Task.Deadline.never
  }
}

Thus usages of that specific call become non awaiting and usable from non-async functions:

if Task.isCanceled { ... }

The same about checkCancellation():

extension Task { 
  static func checkCancellation() throws { 
    if self.isCanceled { throw ... } 
  }
}

If we wanted to also offer the task as a value, this is fine, and we can offer:

extension Task { 
  var isCanceled: Bool { ... }
}

In general I think it's nice to allow current but I don't think it necessarily means removing the static functions to be honest... Specifically since the number of APIs that are okey as member functions on Task are less than the ones that work on the current task implicitly (analysis below).

All discussions about something in a async functions all rely on the existence of unsafeCurrent, which relies on the ABI of setting tasks into a thread-local by all swift operations which I believe is our plan anyway right?

Long story short, to properly solve this, we'll need unsafeCurrent IMHO, and it also solves task local values, cancelation, deadlines and general access to a task from a non-async function. Coming back to your original example, it'll look like this:

// func updateRecords(_ records: [Record], isCancelled: () -> Bool) {
//  for record in records {
//    if isCancelled() { return }
//    record.update(...)
//  }
// }

func updateRecords(_ records: [Record]) {
  for record in records {
    if Task.isCanceled() { return }
    record.update(...)
  }
}

Okey let's talk about the Task as value then...

I think that's fine but it is not a complete answer to the problem at hand -- saying that out loud right away, just to not get caught up that "that's it" At least, not without the unsafeCurrent() version available as well.

Douglas_Gregor:

You would be able to get the the task instance from the currently-running asynchronous function:
struct Task {
  static func current() async -> Task
}
but that would effectively be the only "instantaneous" async operation one would deal with. (emphasis mine - ktoso)

I want to confirm what you mean here. Is it:

(1) to apply compiler magic that Task.current() needs no await
- rather than introduce the @instantaneous allowed to be put on random functions
or (2) that thanks to moving things to instance functions, it becomes less terrible to look at,
- and the Task can be passed to some sync function, and that sync function can then also check cancellation on the task via the instance function.

(1) helps a bit... since indeed we can then write:

if Task.current().isCancelled { ... } // (1) because "current" is magical / instant

but (2) does not really solve things; So I want to confirm if we mean (1) or (2) here.

Specifically, (2) breaks down if something is being called "through frameworks" and those happen to be a sync function; And I'd want to be able to check if I'm cancelled or not, but the framework invoked me through a sync function... So I'm stuck, and I cannot fix that other framework because I don't own it.

For completeness, option (3) is the introduction of the attribute and have it available to others...

(3) introduce @instantaneous (or other spelling) and allow async functions to implement this.

Alternatively...

Consider basing those APIs a lot around synchronous and unsafeCurrent based implementations, just as I've shown above with the isCanceled. I think we can do great with this and propose a complete set of functions for the API a little bit below.

Okey, so by this reasoning a task local's bind operation withLocal is the same category of function -- as it needs to be on a task, and it needs to maintain structure, so it remains: static func withLocal(...).

However, a task local's read operation local(\.key) can be successfully implemented as not async if we had Task.unsafeCurrent. This is because we can return the TaskLocalKey.defaultValue() if no task is available, or a task is available but the value is not set.

So by inspecting this rule...

"APIs that can only make sense on the current task because they need to maintain structure [...] would stay as static async methods"

So we'd arrive at:

extension Task { 
 static func withLocal<Key, BodyResult>(
   _ key: KeyPath<TaskLocalValues, Key>,
   boundTo value: Key.Value,
   body: @escaping () async -> BodyResult
 ) async -> BodyResult { ... } // `reasync` even, if it existed
}

extension Task { 
 static func local<Key>(_ keyPath: KeyPath<TaskLocalValues, Key>)
   -> Key.Value where Key: TaskLocalKey { ... }

Note that it is not async, similar to isCanceled because we can implement it correctly with Task.unsafeCurrent() -- we return Key.defaultValue() if either no task is available, or the key is not bound.

Both remain static... I think this abides to the rule you just formulated there, because they really only must be invoked on "current", and may never be as instance functions on task from another task, because of how they strongly depend on the structure of task lifetimes (and we don't want synchronization on those accesses either).

Okey... so summing up all those thoughts–and riding on the existence of unsafeCurrent–I think we could provide a great consistent API that ends up like this, @Douglas_Gregor:

Task :: static async functions

Getting the task instance

static func current() async -> Task
- Would this one then get the magic @instantaneous logic? If yes then I'm on board with this (and getting unsafeCurrent)

Task local value binding

static func withLocal(_:boundTo:body:) async rethrows -> BodyResult

Creating tasks

static func withGroup(...) async (throws) -> GroupResult

Cancellation handlers since these strongly relate to structure of tasks too

static func withCancellationHandler(handler:operation:) async rethrows

Voluntary Suspension

statif func yield() async
any other "wait 1 second" or similar if we wanted those

Task :: static sync functions

Getting unsafe task, must not be shared nor accessed from any other task; returns nil if no task in current scope (e.g. we're called from some random pthread)

static var unsafeCurrent: UnsafeCurrentTask { get }

Querying cancelation, regardless if in a task or not

static var isCanceled: Bool { get }
- implementation as explained above
static func checkCancellation() throws
static var priority: Task.Priority? { Task.unsafeCurrent?.priority }
- this one is a bit weird, since we do not have a good default to use for when not in a Task I think... so we can either:
  - not have this function at all as a static, and only have it on instances
  - have this static as well, but it is optional here...
- I'd be happy with not having this at all.

Note that these isCanceled static versions are NOT async and this is on purpose to allow checking from synchronous loops after all.

Creating detached tasks

static func runDetached(...) -> Task.Handle

Task local reading is synchronous, and returns default value if key not bound, or no task is available

func local(_:) -> Key.Value // can return Key.defaultValue

Task :: instance functions

var isCanceled: Bool { get }
func checkCancellation() throws
var priority: Task.Priority { get }
in the future also deadline here fits well.

Those are fine and possible to implement by access also from other tasks so they can be on the instance -- task locals cannot, so they remain as static.

Task locals MUST NOT be accessible through this API, as a Task could be passed around and accessed by other tasks, that's not safe.

UnsafeCurrentTask :: instance functions

This must only be invoked from the current task, passing it around, storing it, or any access from another task or thread is illegal and may crash hard.

var isCanceled: Bool { get }
func checkCancellation() throws
var priority: Task.Priority { get }
in the future also deadline here fits well.
func local(_:) -> Key.Value -- this allows for task locals outside of async functions, excellent.

This seems pretty good to me... but all these API shapes ride on the existence of unsafeCurrent... hopefully we could get that? as the sync/async and static/instance placement of functions look pretty good (assuming the above outline is what you had in mind).

Thanks for digging into this Doug,

// PS: Yeah I'm spelling it as isCanceled since we got some feedback for consistence that we should move to this... pending a PR to adjust it should happen soon I guess. Long story, but we reached out to confirm what spellings APIs should use.

PPS: I'm not sure if it should be static func isCanceled() -> Bool or static var isCanceled: Bool -- either way it is O(1) however (!) it does imply hitting a thread-local to get access to the Task... so it's a bit heavier... Perhaps it makes sense to keep those as ()?