Benefits of cooperative multitasking

Probably offtopic here (though the topic is "Benefits of cooperative multitasking"), but BTW the classic Macintosh under System 7 was an adorable little machine with cooperative multitasking. It had a single-core processor and a single-threaded operating system, but it had multitasking.

This doesn't seem technically accurate, in detail.

  • The "thread pool" in which an async function runs asynchronously is "selected" by the executor responsible for its isolation domain. If the async function being called is isolated to MainActor, it runs on the main thread, not the thread pool associated with the global concurrent executor. Or, if the called function is isolated to a domain belonging to a custom executor, it executes on whatever threads that executor is written to use.

  • In the scenario you describe, if the async function is awaited, there is no parallel execution. The caller's task is suspended during the call, so there's no parallel or even concurrent execution as a result.

    In order to get parallel execution, you'd need to actually create a new Task, and await the function in the new task.
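
A minimal sketch of that difference (the function names and delays are illustrative, not from the thread):

```swift
// Hypothetical async work -- the name and delay are made up for illustration.
func doWork(_ id: Int) async -> Int {
    try? await Task.sleep(nanoseconds: 10_000_000)  // simulate some work
    return id * 2
}

// Awaiting directly: the caller's task suspends during each call,
// so the calls run strictly one after the other.
func sequentially() async -> [Int] {
    let a = await doWork(1)
    let b = await doWork(2)
    return [a, b]
}

// Creating new Tasks and awaiting them is what allows the calls to
// run in parallel; the caller only suspends where the results are needed.
func inParallel() async -> [Int] {
    let t1 = Task { await doWork(1) }
    let t2 = Task { await doWork(2) }
    return await [t1.value, t2.value]
}
```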

3 Likes

This is not the way to understand this.

On all of our supported platforms, Swift co-exists with C and is built on top of an underlying C runtime. In particular, all work done by Swift is actually running on some C thread. C threads are typically scheduled preemptively, usually directly by the kernel, and there is always a potential for there to be threads in a process that are not running Swift code. Thus, at a low level, all Swift work is scheduled preemptively.

C threads are not a scalable resource. If you try to make a very large number of C threads in a process, you will quickly exhaust resources (mostly memory) in both the kernel and in user space, and subsequent attempts to create threads will fail. Swift concurrency is therefore designed around the principle that it is bad practice to hold on to a thread for an indefinite period if you're not currently doing any work (i.e. a task wants to block on some arbitrary condition). Instead, the thread should be released to do useful work; when the task becomes unblocked, it can request to be scheduled back onto a thread.

Swift, by default, schedules tasks onto a strictly-limited pool of threads, and it only ever interrupts a task at a dynamic "suspension point", such as when a function calls or returns to a function with different actor isolation. This can be seen as a kind of cooperative scheduling: Swift expects code to cooperate by following the principle above, on pain of global effects that would not be possible under strict preemptive scheduling, up to and including thread starvation. But it is not cooperative scheduling in the way that term has traditionally been used (e.g. in macOS classic), because the work is still executing on a thread which can both be preempted for and run in parallel with other threads in the same process.
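
A tiny illustration of that principle (the durations are arbitrary): blocking a cooperative-pool thread versus suspending the task so the thread can do other work.

```swift
import Foundation

// BAD: Thread.sleep blocks the underlying pool thread for the whole
// duration -- no other task can use that thread in the meantime.
func blockingWait() async {
    Thread.sleep(forTimeInterval: 0.05)
}

// GOOD: Task.sleep suspends only this task; the thread is released
// back to the pool, and the task is rescheduled when the timer fires.
func suspendingWait() async throws {
    try await Task.sleep(nanoseconds: 50_000_000)
}
```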

Regardless of the details about scheduling, tasks and actors always run sequentially "internally", and you do not need to worry about concurrency or parallelism within a single task or actor. Only one thread can ever be running with a particular actor isolation at a time, and if the actor "migrates" between threads, those threads will synchronize with each other to create a total order of the events on the actor. Similarly, if a task needs to change threads, those threads will synchronize to create a total order of the events on the task.
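
That serialization guarantee is what makes code like the following safe without any locks (a sketch; the Counter type is illustrative):

```swift
// Only one thread at a time can run with Counter's isolation, and any
// thread "migrations" synchronize, so `value` never races even though
// a thousand tasks hit it concurrently.
actor Counter {
    private var value = 0

    func increment() {
        value += 1   // no lock needed: actor isolation serializes this
    }

    var current: Int { value }
}

let counter = Counter()
await withTaskGroup(of: Void.self) { group in
    for _ in 0..<1_000 {
        group.addTask { await counter.increment() }
    }
}
let total = await counter.current   // exactly 1_000, never less
```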

21 Likes

aaaahhhh as in, even if I am staying in Swift 5 language mode, I AM using Swift 6, just in the Swift 5 language mode

Damn, this didn't come to my mind... This clears absolutely everything up, sorry for being a dummy

3 Likes

Yeah, it is… somewhat confusing naming that the "Swift 6.0 compiler" is importantly different in certain ways from the "Swift 6 language mode". But the Swift 6.0 compiler does indeed support the Swift 5 language mode!

2 Likes

Dispatch has nothing to do with Swift 5, and you can still use it if you want; nothing is preventing you from using GCD if that's what you intend.

I was referring to Strict Concurrency Checking in Swift 6 mode

1 Like

Is it fair to say that all these points are marked with the await keyword? I am especially wondering about a Continuation's yield, as that is not an async method.
So, in a theoretical "only 1 thread" system, where I have the continuation in Task A and another Task B awaits an AsyncStream that is "driven" by the continuation, when I yield on it, Task A would still continue to run (and potentially fill up the stream with multiple yields) and Task B would only ever be able to process the stream elements once A suspends (awaits somewhere in its execution), right? Because then the system could finally let B run on its single thread?
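
Roughly that scenario can be sketched like this (the names are illustrative; on a multi-threaded pool the interleaving is less strict, but the distinction between the continuation's yield and Task.yield() holds):

```swift
// Producer and consumer connected by an AsyncStream. The default
// buffering policy is .unbounded, so repeated yields simply queue up.
let (stream, continuation) = AsyncStream.makeStream(of: Int.self)

let consumerB = Task {
    var received: [Int] = []
    for await value in stream {   // suspends while the stream is empty
        received.append(value)
    }
    return received
}

let producerA = Task {
    for i in 1...3 {
        continuation.yield(i)     // synchronous: does NOT suspend this task
    }
    await Task.yield()            // an explicit suspension point
    continuation.finish()         // ends consumerB's for-await loop
}

await producerA.value
let values = await consumerB.value   // [1, 2, 3]
```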

By the way, thanks for clearing that up and being precise, because I believe I had/have understood the system quite well by now, but the comment about it being both preemptive and cooperative had confused me. :smiley:
If I got it right, one could say that while Swift concurrency builds on top of threads (which are preemptively scheduled, of course), it is itself providing cooperatively (or "cooperatively-like") scheduled tasks, which are not "affected" by the underlying threads being preemptively scheduled.

The naming with AsyncStream is a little confusing in this context.

AsyncStream doesn't actually yield the current Task when you call yield (hence there's no need for an await, as there's no suspension), but if there is a suspended Task waiting on the stream (as in your example) it calls resume on that Task's pending UnsafeContinuation.

So, in this case you'd need to call yield on the AsyncStream and then Task.yield() to allow the suspended Task to resume.

2 Likes

I assume you meant to write "yield on the continuation and then Task.yield()", right? I'm not being pedantic here, just paranoid I might misunderstand... :sweat_smile:

This is basically what I meant and suspected: The continuation you get to "fill" an AsyncStream does not suspend the current task (the one you fill from, so to say) when you call yield on it. That task is only suspended when an await is reached (so if you don't have another async call sometime soon, you might want to manually call Task.yield()).

I don't think that's a problem, btw, I love AsyncStream, it's a great way to communicate state between two unstructured tasks (e.g. when downloading things and updating the UI). Usually you have an await close to any yield calls on the continuation anyway, I guess.

So we can be sure that suspension points in concurrency are always properly marked; there are no "secret" methods that, despite being synchronous, somehow yield control.

My hypothetical example of a concurrency runtime that is constrained to only one thread is then just that: a hypothetical example. And even within such a system, if you ensure your Tasks yield often enough, the concurrent scheduling would still work well, I assume.

2 Likes

Yes, I could have been clearer!

It's a bit confusing, as there's overlapping terms:

There's AsyncStream's continuation (AsyncStream.Continuation), and then there's the Task continuation that AsyncStream uses under the hood in its implementation (UnsafeContinuation).

Then there's the synchronous yield instance method on AsyncStream.Continuation and the static asynchronous method yield on Task.

So I should have said: "in this case you'd need to call yield on AsyncStream's continuation and then call await Task.yield() to allow the suspended Task to resume."

2 Likes

Yes, but with some caveats.

There are potential suspension points on entry/exit edges between async functions because of the possibility for isolation change, and those are not explicit in the source of the function. However, since every call to an async function has to be awaited, you can think of that await as also covering those points.

A somewhat more egregious case is that there's an implicit suspension point when an async let goes out of scope. Normally, that suspension point does not actually suspend because you've already awaited the async let. But there can be paths out of the scope that haven't passed through that await (e.g. throwing an error), and the containing function still has to wait for the subtask to finish before those paths can continue. In the async let proposal, we just decided that this was an acceptable deviation from the general rule that there's a visible await.
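
A sketch of such a path (the names and error type are made up): if `mightThrow` throws, control leaves the scope without reaching the visible `await`, yet the function still implicitly waits for the child task behind `value` before the error propagates.

```swift
struct ExampleError: Error {}

func slowValue() async -> Int {
    try? await Task.sleep(nanoseconds: 50_000_000)
    return 42
}

func mightThrow(_ fail: Bool) throws -> Int {
    if fail { throw ExampleError() }
    return 1
}

func combined(fail: Bool) async throws -> Int {
    async let value = slowValue()
    let x = try mightThrow(fail)  // throwing here skips the await below,
                                  // but `value`'s child task is still
                                  // (implicitly) awaited at scope exit
    return await x + value        // the visible await on the normal path
}
```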

Like resuming a CheckedContinuation, yielding into an AsyncStream's Continuation is not a potential suspension point: it unblocks any tasks waiting on the stream and allows them to be scheduled again, but the current task continues running without interruption.
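
That is why the continuation can be handed to fully synchronous code, e.g. a delegate callback (the EventSource type here is illustrative, not a real API):

```swift
final class EventSource {
    let events: AsyncStream<String>
    private let continuation: AsyncStream<String>.Continuation

    init() {
        let (stream, continuation) = AsyncStream.makeStream(of: String.self)
        self.events = stream
        self.continuation = continuation
    }

    // A plain synchronous method: yielding is not a suspension point,
    // so this never needs to be async or awaited. It merely makes any
    // task suspended on `events` schedulable again.
    func fire(_ event: String) {
        continuation.yield(event)
    }

    func close() {
        continuation.finish()
    }
}
```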

9 Likes

Being able to support huge numbers of tasks has nothing to do with being stackless. You can do it with stackful coroutines too (and it's by no means obvious that stackless is better). It doesn't even have to do with being cooperative. You can do it with a thread pool like the one backing GCD, where each task is a function.

Cooperative multitasking allows you to maintain logically linear code written as though it blocks to await long-running operations like I/O, without actually blocking a thread; the task is suspended and the thread resumes another suspended task that is not waiting on some long-running operation. Also, yielding explicitly from a long-running task can be used to decrease latency of other tasks which would otherwise not be able to run until a currently-running task completes.

2 Likes

It does if you don't want to write your code explicitly as a state machine, which, as I observed previously, most people don't. (Your example of GCD with functions is one where you are manually decomposing your code into a state machine.)

Stackful coroutines don't scale any better than threads; you've got the exact same problem — they all need a stack, and modern code assumes large, contiguous stacks.

Nested event loops at least share the thread stack; the problem there is it's easy to set yourself up for unbounded stack growth, plus you lose control over the order in which tasks resume processing when they suspend. That creates the potential for interesting deadlocks too — you can't have a task wait on a task (or a resource controlled by a task) that's already suspended on the same thread, since the latter can't be resumed.

Stackless is definitely better. Your thread stacks only need to be large enough to run the task with the largest between-suspension stack requirement, since when suspended it isn't using the stack, and you don't constrain the order of resumption.

4 Likes

Thank you a lot for the explanation!

async let is indeed an interesting case in this context, but I agree with the offered solution and think that the result is very elegant, in fact.
I guess in practice that stays more or less invisible, especially since in most cases "leaving the scope" actually means leaving the function (which would return to something that was awaited anyway). Even if not (let's assume the scope is an if let or the like), it makes sense that the async let has to be awaited somehow (even if implicitly), so I don't think that can lead to a nasty surprise for any developer.

I'm glad the design in the end allows us to reason along the lines of "the await denotes the cooperation markers", at least it has so far allowed me to grasp where my program flow goes and what's potentially concurrently running very well.

1 Like

No code transformation is required with stackful coroutines to get to millions of tasks; we have successfully run the skynet benchmark with stackful coroutines, with competitive performance.

You're only manually transforming your code if your tasks (before transformation) would otherwise need to block (much more like a continuation-passing transformation than a state machine decomposition). But not all tasks are like that: parallel sorting offers a simple example.

At the risk of repeating myself, fundamentally, cooperative multitasking is about avoiding a code transformation while at the same time avoiding blocking threads in problems that are like that. It only has an indirect relationship to scaling up, because blocking threads becomes totally impractical when you have a huge number of tasks. But we always want to avoid blocking threads (and thread explosion) no matter how many tasks are spawned.

There's evidence that stackful coroutines are more scalable than you say, enough to convince people who are much smarter than me about concurrency mechanisms. I refer you to the work of Lucian Radu Teodorescu (see the section on stack usage here). You can argue with him in Hylo's discussions if you don't agree, and in fact we'd be grateful for some debate about that if you have the time.

Sure, if you don't need to block (or explicitly yield control in a co-operative system), then no transformation is necessary.

That's an interesting link, thanks, but if I read it correctly, Teodorescu isn't using a large number of stackful coroutines. What he's doing is using stackful coroutines on top of a thread pool to build something much like Swift Concurrency, and the way he's doing this actually limits the number of stackful coroutines to (a bit more than, because of his await implementation) one per thread in the thread pool.

This avoids the fundamental issue here, which is that stackful coroutines that are executing need a stack[1]. If you have a million stackful coroutines, all executing, each with a 1MB stack, you have exactly the same problem as having a million threads, all executing, each with a 1MB stack.


  1. Unless you've transformed them into some stackless form through some code transformation, but then we're really back into stackless things. ↩︎

When that happens, at any given time nearly all tasks are suspended. In that case suspension can “unmount” the part of the stack that's actually being used via memcpy, leaving the full capacity of that core's stack available for the task that's being resumed.

These scenarios are very likely I/O bound and the cost of stack mounting/unmounting ought to disappear in the noise. I won't take credit for this insight—I read it here. As the poster writes, all concurrency schemes have overhead; with stackless coroutines every function call that can contain a suspension point (every await in Swift) incurs overhead, and when there are lots more of those than actual suspensions, the cost of unmount/remount on suspend/resume can be lower. Tradeoffs.