[Concurrency] thread-per-core architecture considered?

This is really a question for the core team who are busy doing this interesting work on concurrency (or anyone with insight, I guess).

One interesting model for some applications is the thread-per-core one. I just wanted to understand whether supporting it was considered during the design of executors / tasks? I must admit it is sometimes a bit hard to understand which implementations the current proposals would make possible.

1 Like

Our goal is to back task execution as much as possible with a global thread pool that's capped at around one thread per core, yes.

One of the deep problems that Apple's Dispatch library has to deal with is that C/C++/ObjC code frequently blocks threads on future work, whether with dispatch_sync (which has to block until the queue has processed everything previously submitted to it) or with something like a condition variable (which blocks until an unknown thread signals the condition, which might happen in a submitted-but-not-yet-running job). As a result, in order to ensure progress, Dispatch must occasionally over-commit threads if it seems like progress is not being made, and this can lead to explosions of threads. In principle, this could've been solved by banning this kind of blocking on Dispatch threads, but the weight of active practice made that too difficult initially, and now they're stuck with it. We want to avoid that for Swift tasks, which is why we've been fairly careful not to introduce any APIs that require this kind of thread-blocking, and we will likewise document that you shouldn't ever use C APIs which allow it from a Swift async function.
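To make the anti-pattern concrete, here is a minimal sketch (the function and names are hypothetical, not from the proposals) of the kind of thread-blocking that shouldn't happen in a Swift async function: the caller blocks its pool thread on a semaphore that can only be signalled once some other piece of submitted work gets a thread of its own. On a pool capped at roughly one thread per core, this either stalls or forces the pool to over-commit.

```swift
import Dispatch

// Hypothetical illustration of the blocking anti-pattern described above.
// Do NOT do this from a Swift async function.
func waitForBackgroundWork() async {
    let semaphore = DispatchSemaphore(value: 0)

    Task.detached {
        // ... some work ...
        semaphore.signal()   // this task needs a free pool thread before it can signal
    }

    semaphore.wait()         // blocks the current pool thread until that happens
}
```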

17 Likes

Thank you, that is excellent news (having seen a few dispatch thread explosions).

Backing with ncpus threads is great; it would be awesome if the related step of optionally pinning one or more of them were possible for some (server-side) use cases, without having to replace the executor.

I can't express it succinctly, but it also feels like there might be room for some interesting optimizations for actors/tasks with such a setup, as there would be no need for memory barriers between entities executing on the same core.

Anyway, thanks for the exciting work being done here, it’s almost feeling like Christmas ;-)

Yeah very much so -- with custom executors we'll be able to implement various specific strategies.

One which has been useful in Akka is a PinnedDispatcher (well, PinnedExecutor in our world in Swift), which is pinned to a specific thread (or just a "specific thread executor" :wink:). One could pin such a thread to a specific CPU core if one wanted to.

This does not make much sense on Apple devices, as it'd lose out on the efficient use of performance cores only when needed, etc. However, this indeed is a thing that can be useful for server systems, where e.g. "all work related to X" is put on the same executor=thread=core. This also should play rather well with Swift's "actor hop avoidance", where we can avoid hops if we know both actors are on the exact same, serial, and willing-to-switch executor.
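For illustration, here is a minimal sketch of what such a pinned executor could look like. It uses the SerialExecutor / ExecutorJob API that custom executors eventually shipped with (the draft discussed in this thread may differ in details), and the PinnedSerialExecutor name is made up; actual CPU-core affinity would need a platform-specific call such as pthread_setaffinity_np and is left out.

```swift
import Foundation

// A sketch of a "pinned" executor: every job runs on one dedicated Thread.
// Names are illustrative; pinning that thread to a CPU core would need a
// platform-specific call (e.g. pthread_setaffinity_np on Linux), omitted here.
final class PinnedSerialExecutor: SerialExecutor, @unchecked Sendable {
    private let condition = NSCondition()
    private var jobs: [UnownedJob] = []

    init(name: String) {
        let thread = Thread { self.run() }
        thread.name = name
        thread.start()
    }

    func enqueue(_ job: consuming ExecutorJob) {
        condition.lock()
        jobs.append(UnownedJob(job))
        condition.signal()
        condition.unlock()
    }

    func asUnownedSerialExecutor() -> UnownedSerialExecutor {
        UnownedSerialExecutor(ordinary: self)
    }

    private func run() {
        while true {
            condition.lock()
            while jobs.isEmpty { condition.wait() }   // simplest wait strategy: block until work arrives
            let job = jobs.removeFirst()
            condition.unlock()
            job.runSynchronously(on: asUnownedSerialExecutor())
        }
    }
}
```

The blocking wait in `run()` is the simplest possible strategy; it is also the obvious place where a different wait strategy could be plugged in.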

8 Likes

@ktoso (Konrad Malawski) wrote on April 16:

This does not make much sense on Apple devices, as it'd lose out on the efficient use of performance cores only when needed, etc. However, this indeed is a thing that can be useful for server systems, where e.g. "all work related to X" is put on the same executor=thread=core.

I would say that on a MacBook or iMac running some high-performance console port, being able to pin execution of certain tasks to the same or nearby CPU cores might be an optimisation a game developer may want to carry across. Great news to have it on Apple systems too, thanks :)!

1 Like

Thanks @ktoso, that is also great news. Agreed, it doesn't make much sense on (current) Apple devices, but a typical deployment for us would be a minimum of 32 cores (non-HT), usually quite a bit more - so there's plenty of room for getting creative (with no real requirement to care about power efficiency).

This also should play rather well with Swift's "actor hop avoidance", where we can avoid hops if we know both actors are on the exact same, serial, and willing-to-switch executor.

That also sounds great if it works: fundamentally, there are many use cases of logically partitioned code (actors) that want to run on the same executor, are willing to switch, etc. It would just be super nice to be able to do it (while pinned and avoiding "hops") with a high-level mental model, basically out of the box. It seems then that implementing a serialExecutor for actors that returns a pinned executor, and doing that carefully for the various actor instances, will fundamentally give the flexibility needed (instead of relying on the default serial executor).
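As a sketch of that mental model (reusing the hypothetical PinnedSerialExecutor from earlier in the thread and the `unownedExecutor` customization point from the proposal), two logically separate actors can share one pinned executor, so calls between them need no hop:

```swift
// Hypothetical: both actors share one pinned executor, so they stay logically
// separate while all their work runs on the same dedicated (pinnable) thread.
let tradingExecutor = PinnedSerialExecutor(name: "trading")

actor Quotes {
    nonisolated var unownedExecutor: UnownedSerialExecutor {
        tradingExecutor.asUnownedSerialExecutor()
    }
}

actor Orders {
    nonisolated var unownedExecutor: UnownedSerialExecutor {
        tradingExecutor.asUnownedSerialExecutor()
    }
}
```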

So, fundamentally, this addresses many of the usability concerns I've had; there is just one final, related thing I'm curious about:

What are the possibilities of tweaking the wait strategy an executor uses when waiting for work, and how would that work? Take the following example: if I have a custom serial executor implemented that returns a pinned executor, then presumably, if a call to it requires "switching" while it is busy, the call would be enqueued on the executor and run once the executor is no longer busy.
But how would it look when calling "cross executors", where two actor instances have serialExecutors that return two different pinned ones?

Here comes the question: presumably a default implementation would block waiting on a queue in such a case (or along those lines at least), but for a PinnedExecutor in a latency-sensitive environment without power considerations, just (properly) busy-looping on our own CPU would be of interest, to cut down thread-wakeup latency for the first event. From reading @John_McCall's draft of custom-executors, it seems this would also be possible if that comes into play?
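For illustration, a wait-strategy tweak of the kind being asked about could look like this inside the hypothetical PinnedSerialExecutor sketched earlier: spin on the job queue for a bounded number of iterations before falling back to a blocking wait, trading CPU for lower wakeup latency. Whether the final custom-executors design exposes such a knob is exactly the open question here.

```swift
// Hypothetical replacement for the blocking wait inside PinnedSerialExecutor.run():
// poll for a while before sleeping, to shave off thread-wakeup latency.
private func nextJob() -> UnownedJob {
    // Spin phase: bounded busy-wait on our (pinned) core.
    for _ in 0..<100_000 {
        condition.lock()
        if !jobs.isEmpty {
            let job = jobs.removeFirst()
            condition.unlock()
            return job
        }
        condition.unlock()
    }
    // Fallback: block, so an idle executor doesn't burn the core forever.
    condition.lock()
    while jobs.isEmpty { condition.wait() }
    let job = jobs.removeFirst()
    condition.unlock()
    return job
}
```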

Fundamentally, we've seen a handful of issues that are critical to us, and they all seem to be sorted (either built in, or with hooks provided for addressing them):

  • Excessive thread count due to blocking
  • Context switching (and locking) overhead
  • Thread wakeup latency

Overall, very happy where all of this seems to go (including structured concurrency, actors, async/await, custom executors, ...), really awesome.

2 Likes

Yeah, exactly. This allows us to keep "well, logically those things are different actors" while also keeping "yeah, but in practice I want all of those to run on the exact same executor as [something else]". This is exactly what comes up in event-loop based networking stacks -- you want to keep processing on the loop that handled the request as much as possible, and only hop off it if you're going to block, etc. Specifically, this should allow us to model SwiftNIO patterns which today are expressed as "please don't forget to hopTo(EventLoop)" by instead passing the right executors to actors (and/or running tasks on such an executor, which happens to be the EventLoop).
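A sketch of what that could look like, assuming a hypothetical adapter that exposes a SwiftNIO EventLoop as a SerialExecutor (NIO's real bridging, if and when it exists, may look quite different):

```swift
import NIOCore

// Hypothetical adapter: all jobs enqueued on this executor run on the given
// SwiftNIO EventLoop, so actors using it never leave "their" loop.
final class EventLoopExecutor: SerialExecutor, @unchecked Sendable {
    private let loop: EventLoop
    init(_ loop: EventLoop) { self.loop = loop }

    func enqueue(_ job: consuming ExecutorJob) {
        let unownedJob = UnownedJob(job)
        let executor = asUnownedSerialExecutor()
        loop.execute {                       // EventLoop.execute runs the closure on the loop's thread
            unownedJob.runSynchronously(on: executor)
        }
    }

    func asUnownedSerialExecutor() -> UnownedSerialExecutor {
        UnownedSerialExecutor(ordinary: self)
    }
}
```

An actor handling a request could then return this executor from `unownedExecutor`, expressing "stay on the loop" without any manual hopTo(EventLoop).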

You'd have to "hop" to the target executor, i.e. the call on the destination has to execute on the destination. Think of it this way: if an actor has some specific executor set, its calls must run on that executor. The "dance" between executors is roughly alluded to in https://github.com/rjmccall/swift-evolution/blob/custom-executors/proposals/0000-custom-executors.md#actor-executors in the "canGiveUpThread" etc. calls.

// Side note: it feels like you're working on finance systems...? At least over the last few threads there's been a theme of patterns I recognize from my previous work... :wink: I'd be curious if you could share anything (perhaps privately) about your usage of Swift?

3 Likes