[Pitch] Task Cancellation Shields

I'll use Task.isCancelled to refer to the static property, task.isCancelled for the instance property.

Unless I'm mistaken, the intention is for task.isCancelled to always be true once the task has been cancelled.
I think that's the only reasonable behavior, the instance property should always reflect if the task was cancelled or not and not depend on cancellation shields at all.

  • We always need to be able to test if a specific task was irrevocably cancelled, which we can do with task.isCancelled.

  • We always need to be able to determine if we should take the non-cancelled or cancelled execution paths in the current task which Task.isCancelled tells us.

These are different properties with different purposes and mixing their behavior would only make things more confusing in my opinion.

I think the core issue is that these are, while related, very different properties but as they share the same name, it's easy to expect them to behave the same even though they do not and are not interchangeable.

Without cancellation shields, you could use Task.isCancelled and currentTask.isCancelled interchangeably (but probably never should have) but that would no longer be the case with cancellation shields in the mix.

I don't think this is a huge issue in practice, I don't expect there is much use of withUnsafeCurrentTask to get the current task or passing the task handle into the task via a separate reference just to use task.isCancelled when you could just have used Task.isCancelled in the first place.

But it's something that needs to be very clear in the documentation if this proposal is accepted to state very explicitly that these properties are not interchangeable.
In retrospect, these properties should perhaps have had different names to make them more distinctive and avoid this source of confusion.
Renaming and deprecating one of them is perhaps an option but I'm not sure if going that far is warranted, I still think it's unlikely to be misused in practice.


Why `task.isCancelled` taking cancellation shields into account is IMO not a good option

Changing task.isCancelled to return the same cancellation-shield aware value as Task.isCancelled would IMHO be a bad idea for the following reasons:

  • We would still need to add a new way to check if the task has been cancelled (ignoring cancellation shields).

  • Within a task task.isCancelled would return the same value as Task.isCancelled, while nicely symmetric it would ultimately be redundant.

  • task.isCancelled would be mostly meaningless outside of the task as we would be observing how the task moves in and out of cancellation shields. I don't really see a use for that capability in the first place and it would not really be useful as it would be stale immediately anyway.

  • Alternatively, task.isCancelled could return Task.isCancelled when used for the current task but the cancellation status ignoring cancellation shields otherwise. But now the instance property becomes context sensitive which is IMO just as confusing.

2 Likes

Yes! In my mind the choice to use the parent for querying the isCancelled state is an indication that you are interested in the real, actual cancellation state of that specific Task. OTOH, Task.isCancelled is a question about the 'current execution environment', which someone up the stack might have decided to 'mask' with a cancellation shield. (Put another way, if you manage to have a handle to your own actual Task object, you are in some sense empowered to inspect the 'real' state of that task; when your only access is via the current execution environment, your caller in some sense 'owns' that environment and is empowered to present any view it wants to you.)

  • task.isCancelled would be mostly meaningless outside of the task as we would be observing how the task moves in and out of cancellation shields. I don't really see a use for that capability in the first place and it would not really be useful as it would be stale immediately anyway.

If we had task.isCancelled behave 'the same' as Task.isCancelled (which I don't think I'd want), I would expect 'the same' to be a structural relationship rather than a dynamic one. I.e. if task is not the current task, then task.isCancelled always reflects the actual cancellation state regardless of the current state of task's execution (inside or outside a cancellation shield). OTOH if task is the current task and is called within a cancellation shield, then it will reflect false just like Task.isCancelled.

I am... somewhat ambivalent about whether UnsafeCurrentTask ought to reflect the Task instance state or the execution environment state. On the one hand, it is a separate type (rather than generating a Task handle) and so I don't think it needs to match the Task instance's isCancelled value; and it is inherently a query about the current execution environment, so I think it's perfectly justifiable to have it match the statu Task.isCancelled value (i.e., respect shields). OTOH, it presents an instance-based API, and so it almost feels like you have a proper Task handle, so it might be surprising if the value differed.

Either way, the API says "unsafe" and I think there's justifiable reasons for either choice—as long as its documented I don't have many qualms whichever we pick.

1 Like

i like this characterization, and i think it may finally offer me the 'out' to much of my discomfort with the proposed API that i've been looking for – namely that the isCancelled queries could sometimes be 'lying'. but if the model is that there's really 2 'things' to be interrogated – a specific Task handle, and the 'ambient environment'/'execution context' – then i think this tension can be resolved. my intuition is that i'd prefer all the instance-based queries to return the 'true' cancellation state value, and the static API to allow the contextual overrides, though i'll have to think about it more.

1 Like

My intuition was the same in that it'd return true in both cases. And I even agree it's what you want most of the time. But then it becomes an even more oddly shaped "shield" — it doesn't protect the code from all the effects of cancellation, nor from cancellation propagating upwards: just from the effects of cancellation as observed through one of the two (currently identical) APIs to query task cancellation status (but not the other), and from task propagation to child tasks.

I've always been under the impression that Task.isCancelled is just a more convenient way of calling task.isCancelled without needing to do the implicitly unrwapped reference dance @jamieQ did in his post:

var task: Task<Void, Never>!
task = Task { 
    // I can use task here...
}

I could be wrong, of course, but SE-0304 had this to say about Task.isCancelled (emphasis mine):

It is possible to query for cancellation from within a synchronous task, e.g. while iterating over a loop and wanting to check if we should abort its execution by using the static Task.isCancelled property:

[...]

This works the same as its instance counterpart, except that if invoked from a context that has no Task available, e.g. if invoked from outside of Swift's concurrency model (e.g. directly from a pthread) a default value is returned.

Regardless of whether it was the original intent or not, no amount of documentation now is going to make developers think of Task.isCancelled and task.isCancelled as two completely different things.

I agree that if one thinks of Task.isCancelled as some sort of "environmental" or "contextual" property about cancellation, rather than as a shortcut to a specific task instance's isCancelled property, it does simplify the mental model immensely.

But for as long as they share the name, I can't shake the feeling that the API is now 'lying' to me to make shields work.

Pragmatically, if the goal of task cancellation shields is to be able to ensure cleanup always succeeds regardless of task cancellation state, UnsafeCurrentTask ought to return the "shielded" cancellation state. Otherwise, the cleanup code wouldn't be protected if it —for whatever reason— was checking for withUnsafeCurrentTask { $0.isCancelled } instead of Task.isCancelled.

1 Like

My mental model is that by grabbing the Task handle itself and poking at it from within the task itself you are inherently taking a 'top down' or 'external' view of the task, the same as the task owner could. After all, (ignoring unsafe options) a task cannot get at its own handle without the cooperation of the task owner. Yes this, may result in some oddities, but such oddities are a result of the inherently recursive nature of a Task inspecting itself.

I don't think there is any way to get around the fact that a cancellation shield does necessarily entail some amount of 'lying'—after all, as has been explained elsewhere in the thread, the Task is really, actually cancelled, and yet Task.isCancelled returns false.

I think the question of "who owns the execution environment of downstack code" is an important part of my mental model here, and I think the scope-based, caller-owning answer that this proposal presents makes sense. Consider the most-reduced version of the scenario that is causing consternation:

// @MainActor
// A
var parent: Task!
let task = Task { @MainActor in
  // B
  withTaskCancellationShield {
    // C
    parent.isCancelled // ?
    Task.isCancelled // false
  } 
}
parent = task
parent.cancel()

The concern as I understant it is that the scope at C might be confused if parent.isCancelled returned false and Task.isCancelled returned true, even though C 'knows' that parent is the current task itself. C only 'knows' this due to cooperation by virtue of close cooperation between A and C to install the current task is installed in parent and left there. But we have scope B as an intermediary with A and C! And B could do anything to violate the contract between A and C. It could install a cancellation shield (as it does here), or it could decide to spawn an entirely new task and run C within it—which would (potentially) result in the same observable behavior where Task.isCancelled reports false though parent.isCancelled reports true.

So, either A and C must also be in cahoots with B in order to preserve the task structure they desire, or else A and C do not control B in which case assuming that parent.isCancelled == Task.isCancelled is a fundamentally flawed assumption.

2 Likes

After following this thread it’s still not totally clear to me why protected cleanup can’t be done in a child task, and we provide some new mechanism for shielding an entire task from cancellation propagation. IIUC if this were possible it would solve all of these misgivings about having API that “lies”. I think that this post from @ktoso is the one that speaks most directly to the question, but some parts of the answer weren’t clear to me. You said that using a child task “breaks ordering”, which I understand to mean that a normally spawned child task will not synchronously start at its point of creation. You do mention Task.immediate later in the post, but I couldn’t tell what your stance was on that as the answer to the ordering issue. We’ve already blessed the idea of providing task creation APIs that have this special behavior of starting synchronously, so why is that approach not a viable answer to the ordering problem here? I think that not all cleanup code would require this property - surely sometimes the only requirement is that it runs eventually, in which case a child task would be fine. Am I missing something?

1 Like

Thinking about this a bit more I guess this would make it impossible for a synchronous function to protect its cleanup - is that perhaps then the primary reason why a child task approach can’t work?

In my opinion, there are three reasons why requiring a child task to do cleanup is not going to work:

  1. The resource might be non-Sendable or not disconnected so you can't send it into a child task to run the clean up
  2. Synchronous functions should also be allowed to protect sections from being cancelled
  3. Child tasks incur overhead which isn't really needed here

Hm. That's a good model, thanks. I agree that it'd make the most sense to behave that way.

I see that the API ‘lying’ is a logical consequence of introducing task cancellation shields. But is there no other solution to the problem of task cancellation in cleanup code? Or, putting it another way, if keeping the semantics of Task.isCancelled the same was a requirement, would this be an unsolvable problem?

If the promise of Task.immediate is that it will start a task synchronously up to the first suspension point, it follows that if you provide it a closure with no suspension points, the entire task will be executed synchronously.

So if I’m not missing anything, synchronous code that needs to protect its cleanup from cancellation can already do this today, and it’d be correct as long as there’s no suspension point in the operation[1]:

// Synchronous context…
Task.immediate {
    assert(!Task.isCancelled())
    resource.cleanup() 
}

In fairness, it would be extremely easy for someone to make cleanup() an async function in the future, add an await inside the Task.immediate, and fail to realize that doing so introduces an ordering issue.

I can't think of a way to make it less fragile though.

Well...

If only the compiler could “tell” that calling the immediate task’s value isn’t really an asynchronous operation when the closure provided is synchronous[1:1], you could do:

// Synchronous context…
Task.immediate {
    assert(!Task.isCancelled())
    resource.cleanup() 
}.value

And be protected from someone making cleanup() asynchronous in the future.

Sadly that’s not possible, because immediate returns a Task<Success, Failure>, so the information of whether the operation was async or not is lost by the time you call .value.


  1. And it's in the right isolation domain. ↩︎ ↩︎

1 Like

presumably if you wanted to do this (maybe some already are), you'd expose a wrapper that only allows a synchronous parameter. maybe something like this:

func performSyncCleanup<Success, Failure>(
  body: () throws(Failure) -> Success
) throws(Failure) -> Success {
  let result = withoutActuallyEscaping(body) { body in
    // Have to use a makeshift 'out' parameter because we can't
    // await the Task
    var result: Result<Success, Failure>?
    Task.immediate {
      result = Result(catching: body)
    }
    return result!
  }

  return try result.get()
}

haven’t tested enough to know if there’s isolation crossing issues in that formulation, but I think that’s the gist.

2 Likes

This post’s primary goal is to be thought-provoking:

——

This is likely not an actionable line of thinking because it differs too much from the status quo, but it makes too much sense to me not to write it down and share it. I share this here because the problem being addressed in this thread is precisely the problem that my proposed line of thinking aims to architect out of existence.

The Three Laws of Functions

I posit that functions are guided by three laws, similar in structure to Asimov's Three Laws of Robotics:

  1. A function must never do anything "incorrect".
  2. A function must move directly towards its particular goal/finish line.
  3. A function must minimize its usage of the available resources.

And, like Asimov's laws (and really like any set of laws should be), these laws have a relative order which governs which one to obey if two of them come into conflict. For example, the First Law trumps the Second Law, meaning that a function must stop moving towards its goal if doing so would require doing something "incorrect" - in Swift we generally throw an error in such case.

Swift Defaults to Safety

In a recently resurrected thread I read this comment from @xwu:

It seems to me that Swift has generally continued to stick to this philosophy since the time of Xiaodi's comment. It also seems to me like the right default in the context of my proposed laws - safety is fundamentally more important than performance (not to diminish the importance of performance too much - it's important enough to make it into the three laws after all).

What is it to cancel a function?

I propose that to "cancel a function" is to set its "particular goal" (see the Second Law) to nil. It no longer has any finish line to aim at. Thus, the Second Law becomes inert, and all that is left are the First and Third laws, meaning that it is still of utmost importance that the function not do anything "incorrect", and then within that constraint the function should use resources minimally. For many functions, simply stopping in the middle is not "incorrect", and is also the minimal usage of resources. For these functions, cancellation should trigger an early exit. For other functions, there may be some cleanup that is necessary to stay within the bounds of "correctness", and even though this violates the Third Law we still do the cleanup, because the First Law is more important.

What is it to cancel a task?

A task is essentially a handle for an in-progress function. Under Swift Concurrency, to cancel a task is to cancel the top-level function that it represents (where cancellation has the semantics I described in the previous paragraph) AND to individually cancel all in-progress subcomponents of the top-level function. Using a cooking example like in one of the classic WWDC videos: if our task is to "make salad", then to cancel is to say that we no longer need to get to the finish line of having a prepared salad, and additionally to tell the "chop carrots" child task that we no longer need to get to the finish line of chopping carrots.

How does this relate to the pitch at hand?

The issue at hand in this thread is that sometimes it is very much incorrect for the system to tell a subcomponent of a function that it no longer has to reach its finish line, just because the parent no longer has to reach its finish line. In the general case, this is not a sound logical leap. One would have to leverage specific knowledge of the entire call tree (including descendants) of a function to determine that the function losing its finish line is equivalent to every single subcomponent losing its finish line. In our cooking example, the problem is telling the "move knife to a safer location" function that it no longer needs to finish, just because we no longer need a salad.

Conclusion

Swift Concurrency's automatic task cancellation propagation seems to me to break with the guiding principle of safety-first that Swift seems follow pretty much everywhere else. It is a behavior that is always performant, but sometimes incorrect, which is a reversal of the First and Third laws. If we were still debating how Swift Concurrency should be implemented I would be arguing that we should seriously consider having task cancellation not propagate automatically, and then find ways to make opting into performant code easy in proportion to how difficult it is to introduce incorrect behavior (like we do everywhere else).

I understand that it might be impossible at this point to switch to not propagating cancellation by default, and I also haven't thought very far in terms of what those opt-in mechanisms would ideally look like. One thing that seems clear to me is that turning on propagation for an entire portion of the task tree should either be impossible or be a very salted tool since, again, one would need to have full knowledge and correct analysis of the implementation of every sub-function in order to use this tool correctly. The much more easily reached tool should be for creating one or many child tasks to which cancellation is propagated (but not propagating to the children of those children), since that is something that the given function can reason about locally.

The only thing that makes me wonder if such a change could conceivably be made is something along the lines of this comment by @allevato from the same ancient thread:

If I'm not mistaken that automatic cancellation propagation is strictly a performance improvement (albeit an incredibly important one), then it seems conceivable to me that under a new language mode (e.g. Swift 7) and managing to come up with a low-effort migration story, we could change this behavior, since no code would become incorrect, just slower.

Please feel free to correct or criticize both the content and sheer size of this post. I'm not proposing in full earnest that we make such a change, I'm just curious what others take on my line of thinking is, and maybe it will stir up some new ideas or consensus in relation to this pitch.

1 Like

Normally speaking, “cancellation propagation” refers to propagating the cancellation to child tasks. But in your way of framing things, simply calling a function would have to be called propagation, because cleanup code that needs to be shielded from the cancellation generally sits at the same level as the code doing the cancellable work.

To effectively reverse things in a way that guaranties cleanup code is being shielded by default, you’d need to make all function calls shielded by default. Then the burden is on annotating the calls you need to be unshielded. It could be done at the expression level like this:

// this call is cancellable
let result = try cancellable longCall()

// this call is NOT cancellable (implicitly shielded)
let result = try longCall()

This would make code that is cancellable stand out, and your cleanup code would become obviously incorrect if labeled cancellable. But it’d be quite a departure from the current model.

2 Likes

Ah yes of course, thank you… if my argument were taken seriously then it would require more than just not propagating cancellation to child tasks in order for Swift to “default to safety”

that statement is only true for pure functions without side effects, though