[Pitch] Task Executor Preference

ktoso · November 1, 2023, 1:34pm

Hello everyone,
We'd like to share a proposal that we've been working towards for a while now.

Read the full proposal here: SE-NNNN: Task Executor Preference.
(In case of small typos, please comment on the pull request)

The proposal introduces task executors which enable a structured task hierarchy to have a "preference" where all of its tasks should execute.

In short, task executors allow setting an executor that will be used by the task. The task is enqueued on ("starts on") that executor, and attempts to run on it whenever possible. This includes running nonisolated async functions on the preferred executor rather than the global one, and even running default actors on it.

The execution semantics of asynchronous code are as follows:

Currently the decision where an async function or closure is going to execute is binary:

// `func` execution semantics before this proposal

[ func / closure ] - /* where should it execute? */
                                 |
                           +--------------+          +==========================+
                   +- no - | is isolated? | - yes -> | default (actor) executor |
                   |       +--------------+          +==========================+
                   |
                   |                                 +==========================+
                   +-------------------------------> | on global conc. executor |
                                                     +==========================+

This proposal introduces a way to control hopping off to the global concurrent pool for nonisolated functions and closures. This is expressed as task executor preference and is sticky to the task and entire structured task hierarchy created from a task with a specified preference. This changes the current decision diagram to the following:

// `func` execution semantics with this proposal

[ func / closure ] - /* where should it execute? */
                               |
                     +--------------+          +===========================+
           +-------- | is isolated? | - yes -> | actor has unownedExecutor |
           |         +--------------+          +===========================+
           |                                       |                |      
           |                                      yes               no
           |                                       |                |
           |                                       v                v
           |                  +=======================+    /* task executor preference? */
           |                  | on specified executor |        |                   |
           |                  +=======================+       yes                  no
           |                                                   |                   |
           |                                                   |                   v
           |                                                   |    +==========================+
           |                                                   |    | default (actor) executor |
           |                                                   v    +==========================+
           v                                   +==============================+
/* task executor preference? */ ---- yes ----> | on Task's preferred executor |
           |                                   +==============================+
           no
           |
           v
  +===============================+
  | on global concurrent executor |
  +===============================+
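
The second diagram can also be read as a small pure function. The following is only a sketch of the decision logic: the type and function names (`ExecutionTarget`, `Isolation`, `executionTarget(for:taskHasPreference:)`) are made up for illustration and are not part of the proposal or the runtime.

```swift
/// Where a function or closure ends up running, per the diagram above.
/// (Hypothetical names, used only to model the decision.)
enum ExecutionTarget: Equatable {
    case actorSpecifiedExecutor    // the actor provides its own unownedExecutor
    case taskPreferredExecutor     // the task's executor preference
    case defaultActorExecutor      // default actor, no task preference
    case globalConcurrentExecutor  // nonisolated, no task preference
}

/// Isolation of the function or closure being run.
enum Isolation {
    case nonisolated
    case actor(hasCustomExecutor: Bool)
}

func executionTarget(for isolation: Isolation, taskHasPreference: Bool) -> ExecutionTarget {
    switch isolation {
    case .actor(let hasCustomExecutor):
        if hasCustomExecutor {
            // An actor with a custom executor always wins over the preference.
            return .actorSpecifiedExecutor
        }
        // Default actors follow the task's preference when there is one.
        return taskHasPreference ? .taskPreferredExecutor : .defaultActorExecutor
    case .nonisolated:
        // Nonisolated code only hops to the global pool without a preference.
        return taskHasPreference ? .taskPreferredExecutor : .globalConcurrentExecutor
    }
}
```

Without a preference, the function reduces to the three outcomes that exist today, so the preference only adds the "preferred executor" branch for nonisolated code and default actors.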

This allows specialized applications to optimize context-switching, by e.g. using an event loop (e.g. from NIO), as the task's executor. Tasks using such executor minimize context switching and can yield better performance in such very specific applications. This can also be used to isolate blocking IO tasks to specific executors dedicated to such work.

It should be noted that you should have a deeper understanding of context switching and blocking in your application before attempting to use task executors to address them, as they can also have negative effects -- e.g. if an actor is forced to change task executors consistently back and forth, when normally it could keep draining its queue more efficiently without changing executors.

For more background, you may want to read:

SE-0338: Clarify the Execution of Non-Actor-Isolated Async Functions, which defined that nonisolated async functions always execute on the global pool, rather than dangerously hanging onto the calling actor's executor.
SE-0392: Custom Actor Executors, which was the first step towards customizing Swift's concurrency runtime semantics by providing custom executors for actors.

Implementation of this proposal is still in progress; we'll share when it is ready for you to give it a spin.

wadetregaskis · November 1, 2023, 4:33pm

I went in a bit sceptical, but the proposal is well-written and has convinced me this is a fruitful direction.

Explicitly using the global concurrent executor

If a library really wants to ensure that hops to the global concurrent executor are made by e.g. such task group, they should use group.addTask(on: nil) to override the inherited task executor preference.

Using nil as a magic constant in this way seems a bit, well, magical. What if instead there were some suitable static member on TaskExecutor, e.g. globalConcurrentExecutor?

group.addTask(on: .globalConcurrentExecutor)

More verbose but also much clearer and more intuitive to the reader.
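
To make the two spellings concrete, here is a rough sketch. Everything in it is a hypothetical stand-in (`SketchTaskExecutor`, `addTaskOptional(on:)`, `addTaskExplicit(on:)`), not the proposal's actual API. It only shows that a static member would let the parameter be non-optional, while `nil` reads as "no preference".

```swift
// All names here are hypothetical stand-ins, not the proposal's API.
protocol SketchTaskExecutor {
    var label: String { get }
}

struct SketchGlobalConcurrentExecutor: SketchTaskExecutor {
    let label = "global concurrent"
}

// Enables the leading-dot spelling `.globalConcurrentExecutor` at call sites.
extension SketchTaskExecutor where Self == SketchGlobalConcurrentExecutor {
    static var globalConcurrentExecutor: SketchGlobalConcurrentExecutor {
        SketchGlobalConcurrentExecutor()
    }
}

// Spelling A: optional parameter, where nil means "no preference".
func addTaskOptional(on executor: (any SketchTaskExecutor)?) -> String {
    executor?.label ?? "no preference"
}

// Spelling B: non-optional parameter, the global executor is named explicitly.
func addTaskExplicit(on executor: some SketchTaskExecutor) -> String {
    executor.label
}

let viaNil = addTaskOptional(on: nil)
let viaStaticMember = addTaskExplicit(on: .globalConcurrentExecutor)
```

In this sketch the two calls describe different things: the first says "no preference", the second names a specific executor.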

Serial executors (and parallels to GCD)

Since serial executors are executors, they can also be used with this API. However since serial executors are predominantly used by actors, in tandem with actor isolation — there is a better way to run tasks on a specific actor, and therefore its serial executor.

I was already thinking, up to this point in the proposal, that this was sounding a lot like reimplementing GCD in Swift Concurrency. The above point kinda emphasises it.

I'm not sure if that's good or bad, but it seems worth addressing that more directly in the proposal. e.g. to what degree is that or isn't that the objective, what are the remaining distinctions (if any) after this proposal is implemented, etc.

And spin-off questions like can / how can you use a GCD queue as a TaskExecutor (or conversely use GCD APIs to enqueue work on TaskExecutors)?

Main task executor

For example, over-hanging on the MainActor's executor is one of the main reasons earlier Swift versions moved to make nonisolated asynchronous functions always hop off their calling execution context; and this proposal brings back this behavior for specific executors.

Ironically, this makes me ponder if there should be a ".mainActor" TaskExecutor so that you can dynamically (or "manually") tie a task to the main thread (as opposed to using static declarations of @MainActor). And I see that this is in fact part of the proposal (albeit buried at the end).

I suspect that's a better way to put things on the main thread than having lots of MainActor.run { … } and similar constructs scattered about. As the proposal notes, "hacks" like { @MainActor in … } have unnecessary performance costs too.

Although this is covered under "Future directions", it seems to be saying that this will already work with this proposal? If so, maybe it should be moved out of "Future directions", and also I suggest a compiler diagnostic be added [as part of this proposal] for that { @MainActor in … } pattern, with a FixIt to use the more efficient form.

SwiftUI

I know it's outside the purview of SEPs, as a proprietary Apple framework, but I think it's instructive to consider whether the SwiftUI task view modifier should have a task(on: TaskExecutor) variant added?

Generally the default - of tying such tasks to the main thread - is appropriate, but I have found myself occasionally wanting to put things on other threads, and having to put await Task.detached { … } around such things is fine but seems inelegant in light of this proposal.

Is there any reason SwiftUI (and similar frameworks) would not want to follow this on: TaskExecutor pattern?

Blocking inside Tasks

…IO systems which willingly perform blocking operations and need to perform them off the global concurrency pool.

This seems like burying the lede. If I understand it correctly, the ability to use a custom TaskExecutor means you can finally oversubscribe CPU cores (by providing your own thread pools with arbitrary or even unbounded numbers of threads)? So you can (to a degree) safely use blocking code inside Tasks?

I'm a big fan of this - I don't buy into the "the whole world should be async so deadlock from blocking becomes moot" plan, at least from a will-we-ever-actually-get-there perspective - but it seems expressly at odds with how many Swift team members feel Structured Concurrency should work [as a matter of principle].

Progressive disclosure

I like the emphasis on progressive disclosure, and I think it's the right counter-balance to having "expert" controls.

A lot of what's being enabled in this proposal is [in essence] the ability to "regress" back to manually managing execution on specific threads (which may, out of scope of this proposal, even be tied to specific cores etc). Very powerful, but it must be underscored that most Swift users should not need to be aware of nor utilising this, most of the time.

Which seems to be exactly the proposal's attitude. I'm just emphasising its importance.

Is it fair to say that the mental flowchart is intended to be something like:

Is my program slow? If no, break.
Separate things into detached Tasks (appropriately). If no longer slow, break.
Fine-tune thread (and core) assignment via manual TaskExecutor control (appropriately).

AsyncSequence

The AsyncSequence example is a particularly important one. I've been bitten by exactly that performance pitfall quite a few times, including in seemingly trivial code that's just async iterating over something (e.g. lines of a file) from the main thread. It'll be good to at least have a way to fix that, albeit manually.

I still feel like AsyncSequence should just work fast by default, though. This feels more like a bandaid than a cure.

Thread pools by any other name…?

The proposal points out more than once that conceptually TaskExecutor is essentially delineating a thread pool. I can see that there's some symmetry, of sorts, with a name like TaskExecutor, but then given the apparent need to frequently explain what that really means… should it just be called ThreadPool?

Conceptually I see some merit in distinguishing between isolation domains and thread pools, because they're potentially orthogonal. Actors have isolation requirements which might be implemented by executing them only on a specific thread, but not necessarily. Etc.

Parse error

Thanks to the improvements to treating @SomeGlobalActor isolation proposed in SE-NNNN: Improved control over closure actor isolation we would be able to that a Task may prefer to run on a specific global actor’s executor, and shall be isolated to that actor.

I'm not sure what the above is trying to say…?

ktoso · November 1, 2023, 9:50pm

Thanks for the feedback, I'm cleaning up the proposal a bit based on that -- thanks!

I added one cleanup about the serial executors that's important: Task executor preference by ktoso · Pull Request #2187 · apple/swift-evolution · GitHub

wadetregaskis:

Using nil as a magic constant in this way seems a bit, well, magical. What if instead there were some suitable static member on TaskExecutor, e.g. globalConcurrentExecutor?
group.addTask(on: .globalConcurrentExecutor)
More verbose but also much clearer and more intuitive to the reader.

Yeah it's something to discuss. The "nil" makes sense because it really means "no preference" and not "the global executor preference", although arguably the resulting behavior is the same.

I'll have to check more if there isn't something wrt. actor execution that would make claiming that we "prefer" the global pool misleading here.

Whoops sorry that's a leftover from prior versions; seems I missed this one section while updating it recently.

Initially we thought to reuse the Executor protocol, and at that time this sentence was true. This turned out to not work well with actors, so we introduced the TaskExecutor protocol.

As written, the proposal does not allow you to just throw a serial executor into a task preference. It has to be an executor that implements TaskExecutor. The default actor executor does not implement it (nor is it exposed as a type at all), so this would not be possible for actors' default executors.

Similar to the above comment; this won't work unless we expose it as a task executor which we didn't plan at this point yet.

It's definitely worth thinking about whether it should be allowed to be used like this. A big reason for making nonisolated async methods always hop off the actor was methods being too sticky to the main actor... so we'd be worried about re-introducing this issue, although at least this time it is within the user's control.

It probably would be nice to allow; but with warnings that it can be a foot-gun, as the proposal explains in the "not a golden hammer" section. But yes it could be reasonable to consider this.

Yes; there's separate work happening to address the async sequence issues, also because of the Sendable violations that sharing the Iterator technically causes today when used from inside an actor.

This proposal does not aim to solve this issue, however as you said it already has a useful impact on it. We envision a more static (vs. "dynamic" as this proposal does) solution to the AsyncSequence issue.

Yeah open discussion on naming here. Please feel free to consider the TaskExecutor a stand-in name until we find something better. Something similar to or just thread pool might be a good candidate.

I'll reword this a bit -- the point is being able to express that a closure is isolated to the actor parameter: Task(...) { isolated to that actor } if we were to allow passing actors as task executors. Today we don't allow a default actor's executor as task executor, so that's two discussions to be had there though.

(Updated wording: Task executor preference by ktoso · Pull Request #2187 · apple/swift-evolution · GitHub)

John_McCall · November 1, 2023, 9:53pm

A thread pool would be a reasonable concrete way to provide a task executor, but it doesn't seem like the right name for the abstract concept.

I wonder if TaskExecutor is also not quite the right name, though. I mean, it seems like a good name for the relationship between a task and its default executor, but maybe it's not a good name for the kind of thing that a task's task executor is.

ktoso · November 1, 2023, 11:59pm

Yeah it's not a great name, I'll keep thinking about what we could call it -- open to ideas!

When we talked about them we kept calling them a thread source, but that's just another way to say thread pool... Let's think some more.

And also what semantics we need it to guarantee -- but it seems it can be pretty free, as the isolation is guaranteed by the serial executor of an actor after all

wadetregaskis · November 2, 2023, 2:08am

Yeah, it makes sense to think about it that way too.

One advantage of not using nil, however, is that then the parameters like this don't have to be optional. That makes them harder to misuse.

I think the big difference is the user control; that it has to be done explicitly. Avoiding the main thread is absolutely the right default for most applications (i.e. interactive applications), but it's not always the best approach. e.g. servers, non-interactive [phases of] command line tools, etc.

tclementdev · November 2, 2023, 10:55am

So there is no way to have nonisolated async functions simply inherit the current actor executor, is that correct? If so, that seems like a missed opportunity, wouldn't it be interesting to be able to write reusable async code that one could reuse as part of different actor execution contexts?

ktoso · November 2, 2023, 11:13am

@John_McCall has been thinking about something explicitly for actors that would inherit isolation contexts like that. It’d be either an attribute or just passing isolated members and “defaulting them with the caller’s isolation #isolation” or similar. This proposal isn’t that though, right.

Not sure when that proposal will be ready to be reviewed but a draft is in a pull request here: Improved closure actor isolation by rjmccall · Pull Request #2174 · apple/swift-evolution · GitHub (my understanding is that it may still change quite a bit).

tclementdev · November 2, 2023, 1:46pm

I see, this seems to require passing an isolated actor as a parameter which isn't really what I had in mind. What I had in mind is a way to isolate usage of an instance of an unmarked type to the current actor.

johannesweiss · November 2, 2023, 2:25pm

Thanks @ktoso et al, I very much support this pitch and have been involved in discussions (providing use cases etc). So it's probably not surprising that I believe this solves the performance issues that Swift Concurrency and any I/O system (such as, but not restricted to, SwiftNIO) have today.

Today, most async functions get pulled onto the global default executor, which forces thread switches as you can't sensibly do I/O there. So if you have a high-performance system that needs to avoid the thread hops, the only option you have today is to take over the entire global default executor and make it run on a more capable and powerful system. As an example, this complete takeover can be done with SwiftNIO as outlined in this PR. That of course works but feels heavy-handed. It would be much nicer to retain the regular Swift Concurrency thread pool, hop once to perform I/O and then stay on the I/O system's executor until some other actor forces us to leave it.

I believe that with the implementation of this pitch the need to completely take over the global default executor is pretty much gone (at least for the vast, vast, vast majority of use cases that I can think of). That's wonderful, let's do it!

The only additional request (and I do think it's an important one) is to add the possibility to do Task(on: <current executor preference>). Why would I want a Task(...) { ... } and enter unstructured concurrency land?
I don't. But for resource tear downs to work on a cancelled task it is often required to pull the try await Task { try await runTeardown() }.value hack to be sure that runTeardown() actually works. That's often required because a lot of async code refuses to perform work on an already-cancelled task (uses try Task.checkCancellation or guard !Task.isCancelled or calls an API that does so). Therefore, to ensure that try await runTeardown() actually works, I need it to run with a "cancellation shield" (Trio terminology) and because Swift doesn't have that (yet), we use the structured-but-unstructured-looking try await Task { try await runTeardown() }.value hack.

Now, I of course don't want this try await Task { ... }.value to hop to the global default executor and then for it to immediately hop back to the I/O executors just to run the tear down. So I need to be able to just have it inherit my task executor preference. Possible APIs could be Task(inheritTaskExecutorPreference: Bool = false) or Task(on: Task.currentExecutorPreference) or so.
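
For anyone who hasn't run into the teardown problem, here is a minimal sketch of it using only today's `Task` API. The body of `runTeardown()` and the helper `demoTeardown()` are made up for illustration. The point is that the unstructured `Task { }` does not inherit the cancelled state, so the teardown gets to run. Under this pitch that inner task would also not inherit the executor preference, which is the hop I'd like to avoid.

```swift
// Hypothetical teardown that, like a lot of async code, refuses to do
// any work on an already-cancelled task.
func runTeardown() async throws -> String {
    try Task.checkCancellation()
    return "torn down"
}

// Runs the teardown twice from inside a cancelled task: once directly,
// and once through the `Task { ... }.value` hack.
func demoTeardown() async -> (direct: Bool, shielded: Bool) {
    let task = Task { () -> (Bool, Bool) in
        // Spin until this task has been cancelled.
        while !Task.isCancelled { await Task.yield() }

        // Direct call: throws CancellationError, so no teardown happens.
        let direct = (try? await runTeardown()) != nil

        // The hack: a new unstructured task is not cancelled, so the
        // teardown runs. It acts as a poor man's "cancellation shield".
        let shielded = (try? await Task { try await runTeardown() }.value) != nil

        return (direct, shielded)
    }
    task.cancel()
    let result = await task.value
    return (direct: result.0, shielded: result.1)
}
```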

Joe_Groff · November 2, 2023, 3:31pm

Somewhat related, I wonder if maybe SE-0338 went a bit too far in always forcing nonisolated async code to switch off an actor executor after every async call. That behavior is critical for the @MainActor, since developers really need to know both what code is definitely running on the main actor and what code is definitely not running on the main actor, but for the average actor, it's maybe not worth the overhead, and we could do a cheaper check whether there is any higher-priority work waiting for the same actor but otherwise keep running on the same executor after an actor call without as ill effects. That doesn't fully obviate the need for custom task executors, but might mitigate the performance problems with the default policy, allowing developers to avoid having to adopt them just to get a performance boost.

tclementdev · November 2, 2023, 4:21pm

Personally I wish the decision in SE-0338 was made to inherit the current actor, even for the main actor. In my experience, this is always what developers expect and developers are usually shocked when they realize execution jumped off to a background thread just by calling an async function. The lack of compiler warnings/errors when this happens in an unsafe manner also made this the main source of bugs/crashes in the projects I worked on in the last few years (e.g. non-sendable self crossing actor domains).

It would be confusing if the behavior was made different between the main actor and other actors, especially if there is no new explicit syntax. Now that SE-0338 has set things the way they are, it's not obvious to me what new piece of syntax would do a good job to change this behavior and inherit the execution context.

Joe_Groff · November 2, 2023, 4:49pm

My hope would be that, as Sendable adoption increases and our concurrency type checking gets more airtight, then that particular class of bug should become impossible. In theory at least, when working only with Sendable values in a nonisolated context, it shouldn't matter where you execute the code, and access to non-Sendable values should require going through the proper synchronization.

tclementdev · November 2, 2023, 5:01pm

I think this one is going to produce a decent amount of developer sweat once the warning/error goes live

Yes, so that's the latter one that I wish this pitch could address: using non-sendable values in isolated contexts. If we can have context inheritance, then separate instances of the same type are safe to use when each is confined to an execution context.

Jon_Shier · November 2, 2023, 5:05pm

I believe there's a pitch (which I can't find at the moment) about concurrency context scoping that solves this.

tclementdev · November 2, 2023, 5:08pm

If you're referring to the region scoping one, I don't think it addresses this. I think region scoping is about the compiler becoming aware of safe patterns. But this one here is not safe: calling a non-annotated async function jumps off to a different executor.

sliemeobn · November 3, 2023, 8:51am

Overall I love this, thank you for driving swift concurrency forward!

I have to say when reading through the proposal I found myself surprised by this

* **Do not** inherit task executor preference
  * Unstructured tasks: `Task {}` and `Task.detached {}`
  * methods on actors which **do** use a custom executor (including e.g. the `MainActor`)

and my gut reaction was - since task-locals and actor contexts are inherited when using Task {..} - why would the executor preference be "cleared away" on the new task?

And then I read this:

So, my question is: Why isn't the default for Task { } to inherit the executor preference as well?

Could you explain the reasoning for this a bit more?

stephencelis · November 3, 2023, 4:07pm

This looks promising!

Does task executor preference address some of the issues raised in this thread?

More specifically, do these tools provide a more structured way of testing async code in a serialized way without having to resort to overriding the swift_task_enqueueGlobal_hook?

ktoso · November 4, 2023, 5:02am

You could use it to stick tasks, including all their child tasks, to an executor that is backed by a single thread. So yeah, it could help a bit in that sense. I don’t really believe that forcing everything onto a single thread, as all those attempts do, is a sufficient or desirable way of testing a system that will actually be parallel in production: you might be missing all kinds of interesting interactions in testing that will appear in production. But yeah, this can help achieve such a testing approach, and it definitely has its place, like testing specific small isolated parts of a system.

tcldr · November 6, 2023, 11:45am

Interesting proposal with obvious benefits.

One question I have is whether or not there’s a way for a Task with an executor preference to avoid the same actor hop to execute the Task’s body closure, if it’s initiated from an isolation context with the same specified executor, i.e. conditionally promote the semantics of the Task closure to ~~non-escaping~~ that of executing its body immediately and continuing only upon first suspension/completion of the Task.

I imagine this would be a tough ask, but the reason I ask is that this is one of the most useful properties of Rx derivative libraries such as Combine.

Often I’d find myself subscribing to Combine publishers within the initializer of some object for which I’d like the first value retrieved by the time initialisation is finished.

Without this property it’s difficult to use AsyncSequence for many of the use cases where Combine really shined.