Support custom executors in Swift concurrency

David_Smith · February 5, 2021, 12:20am

I had missed this implication in my initial reading, and it is extremely cool, so thank you for calling it out specifically.

Jason_Gregori · February 5, 2021, 1:22am

I'm at a loss. Can you give some examples of what this looks like in practice? Walk me through some switching maybe?

Jason_Gregori · February 8, 2021, 7:24pm

Do switching Executors still bring their own thread to execute jobs and only sometimes borrow other threads temporarily to do the switching?

John_McCall · February 8, 2021, 9:15pm

You could write one that did, but the default one for actors does not own a consistent thread and just enqueues work onto the default concurrent executor when borrowing fails.

dfunckt · February 12, 2021, 8:45am

I've only briefly read the pitch and there's a lot of context around it to get the complete picture but one thing that stands out to me is the signature of enqueue(_:):

  /// Enqueue a job on this executor to run asynchronously.
  func enqueue(_ job: UnownedJobRef)

I would expect this to be throwing or otherwise being able to signal to the caller that it can't take any more jobs. In my opinion, executors are the most appropriate level at which backpressure is enforced and propagated upwards.

At a practical level, the API as it is means executors can only use unbounded queues internally (which also means at least one memory allocation for each enqueued job) but there are use cases where bounded executors make more sense (and can be more performant due to their use of bounded queues which can be allocated once).

ktoso · February 12, 2021, 8:49am

Backpressure on submitting tasks is intended to be solved by task groups — there they can suspend an add if the group decides you’re submitting too much work. There at least we have some notion that this group is running amok, while detecting this globally seems a much fuzzier and hard to define problem.

Streams (i.e. AsyncSequence conforming publishers and similar to them), can handle flow control in a stream native way — which usually is asynchronous demand signalling much like reactive-streams and Combine whose backpressure is inspired by it work.

I don’t think and have never seen a system where a global scheduler decides “you’re submitting too much work” because there isn’t much notion about “you” — how would it decide, except some arbitrary number of pending tasks in the entire global job queue it has internally, and would that be really helpful or actionable in any way?

dfunckt · February 12, 2021, 9:12am

An executor has its internal queue of pending jobs to base any decisions on and queue length is ultimately decided by the developer of the program based on the constraints of problem they need to solve. That is generally true for local executors (ie. that are instantiated at specific parts of a program that perform a specific function), but I agree it does not really apply to a global executor.

An executor could also base its decision on some sort of user/application condition (eg. that it must just drain its queue and shut down), or as a response to system conditions (eg. too many open file descriptors at the time).

Refusing to accept a job is also actionable; this refusal is propagated all the way up to the part of the program that handles the event that triggered the job (eg. a user initiated action, or a client connecting to the server).

dfunckt · February 12, 2021, 9:46am

On a related note, I'm reading about the default global executor that is based on a fixed-width pool of threads. This indeed solves the thread explosion problem, but reopens up the initial problem that GCD apparently tried to solve by the spawning of new threads: it will now be possible to enqueue an arbitrarily high number of jobs, which is going to affect responsiveness. I think that a fixed-width pool should go hand in hand with the ability to refuse new jobs to keep load under control.

In any case, I'm raising this concern for your consideration -- I totally appreciate the work the team has done, it is truly brilliant.

Jason_Gregori · February 16, 2021, 1:17am

I understand why the system will switch an Executor to a thread (so it can continue the task's work without hopping threads), can you explain how the system decides when it's a good idea/ok to switch an Executor to a thread? Does it only do it when it knows the Executor has no other jobs enqueued? How do you ensure the correct job is run on the thread?

And for giving up a thread, does the system ask the Executor to give up the thread once the job is complete? What happens if an Executor does not give up the thread? Does the system ask again after every job completes?

John_McCall · February 16, 2021, 1:40am

I understand your point about wanting to provide back-pressure by not accepting arbitrarily many jobs, but it becomes extremely hard to write a reliable system when something as low-level as a function call to an actor – or even returning from a function call! — can potentially not just suspend but actually dynamically fail. I think back-pressure has to be addressed at a higher level, with an understanding that it's not always going to be a perfect solution and that things will occasionally back up a bit before the need for back-pressure is recognized. I would love to get your thoughts about how best to adjust to that, probably in the structured concurrency thread.

dfunckt · February 18, 2021, 4:16pm

Thanks for sharing your thoughts @John_McCall.

I agree with you that a system that throws exceptions all over the place (or has the potential to) is ultimately hard to deal with. I also agree that the executor is the wrong place to lay the responsibility of providing back-pressure, but I think with the current design we're implicitly doing just that.

The executor being unable to accept further jobs is a strong signal that things are getting out of hand, especially as executors will typically sit at the boundary of an application with the operating system. At that level, the most sensible option when that happens is to crash (which may be fine, but I think we should be explicit about it).

Please excuse my ignorance as I haven't invested the time needed to fully understand the concurrency proposals/pitches, so I'm not certain who calls into the executor (the task group?), but I'm wondering if there could be a feedback loop between the executor and its caller which ultimately dealt with back-pressure or errors thrown from the executor. That is, have the executor's caller use this information and, maybe combined with the programmer's intent, optimally choose what to do: retry, bubble upwards or crash.

Circling back to your first point, I think things get hard to deal with only if we choose to always bubble upwards, which was my initial proposal. But I'm hopeful that it doesn't have to be that way and seeing how these pitches embrace cancellation, maybe we can come up with an API to deal with this as well.

John_McCall · February 19, 2021, 5:17am

Tasks will need to run on different executors in ways that we can’t necessarily understand statically. My hope — which admittedly at this point is little more than a sketch — is that adding a new task to a task group will provide an opportunity to recognize back-pressure, so that e.g. if there are far too many tasks being added to some queue, or too many of any other kind of operation, we can flag that up the task hierarchy, and any operations that want to pay attention (such as task groups) can respond by waiting for conditions to clear before continuing. The chief problem with this idea is that it’s quite possible to flood the system with work before we recognize the bottleneck, of course.

But actually just refusing to do work feels like an unprogrammable model.

Chris_Lattner3 · February 22, 2021, 5:03am

This proposal is looking really great John, I love the clear problem description and motivation. Thank you for writing this up.

I only have a few minor comments/suggestions:

The Ref types are themselves value types, but they hold references to actors and executors, both of which must be reference types. It’s because of that (conceptual, not physical) indirection that I thought the Ref suffix would be clarifying.

I think we typically use the word "Pointer" for that. This was what we decided when designing UnsafeBufferPointer ages ago.

The type of the property must be convertible to Actor .

Not related to this proposal I suppose, but should the Actor protocol be named AnyActor?

serialExecutor will be synthesized as public final

How does this affect resilience and evolution of the actor? Making this final seems to be a different default, preventing subclasses from controlling their executor nature.

In "Explicit scheduling", the closures should be marked @concurrent.

The proposed ways for actors to opt in to custom executors are brittle, in the sense that a typo or some similar error could accidentally leave the actor using the default executor.

Why not tie this into protocol conformance? Someone making an explicit statement (by conforming to a "has custom executor" protocol would make it clear if they mess up the implementation.

-Chris

John_McCall · February 22, 2021, 7:35pm

UnsafeBufferPointer is still fundamentally pointer-ish, it's just a pointer-and-bounds. I've actually been thinking about just taking the thread's advice and removing the suffix, so that this becomes UnownedJob and so forth.

I don't think we've ever added an Any prefix to a protocol name before. That's always for something existential-like.

I think I described the resilience impact: it's inlinable only if the class is frozen. Otherwise callers outside the module will call it in a resilient way, which I think would allow it to be changed to something overridable in the future. I could be misremembering the impact of final on evolution, though; maybe it should just be treated as non-open outside the defining module.

Allowing actor subclasses to override the default executor would introduce a pretty major abstraction burden for something that doesn't seem very valuable.

Thanks, yeah, I'll fix that.

That still allows the error of forgetting the marker conformance. In general, we've been moving away from this kind of marker attribute when it doesn't express an interesting property of the type.

kiel · February 23, 2021, 1:12pm

Would I use a custom executor for unit testing? e.g.: writing two tests for tasks A and B completing in opposite orders?

Am I right in feeling this is similar to a Scheduler in reactive frameworks?

ktoso · February 23, 2021, 1:20pm

You could definitely write some form of test executor, e.g. single threaded or whatnot. Executors which change ordering have been looked into in the jvm ecosystem a bit but it turns out to be hard to make ont that fuzzes ordering but does not go “too crazy” with it.

In practice in akka we had a single threaded executor (there they’re called dispatchers, e.g. “calling thread dispatcher), but a “fuzzing executor” is less useful than it sounds in reality.

What you actually end up doing for fuzzing actor systems is intercepting at one level higher — at the actor level. There was some fun work back in 2015 by Colin Scott who instrumented Akka and fuzzed my akka-raft (my pet project which I hacked together in a few weeks, so it was known to have quite a few bugs; this was much before my time working on akka itself for many years after that). There’s a fun write up about the fuzzing techniques the paper employed and real bugs it discovered, available here: Fuzzing Raft for Fun and Publication - Rest for the Wicked and the full paper is available here: https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-scott.pdf

So yeah, test executors yes. But in reality “messing around with order” in an actively useful way is usually done on a level higher, not on the scheduler infra itself — based on my experience with actor runtimes at least.

To answer the naming question: yeah the naming differs based on ecosystem, but it’s usually called “executor” or “dispatcher” or “scheduler” depending what word the ecosystem likes to use for us it’s executors, in Rx it’s schedulers, in akka it’s Dispatchers… etc.

frameworklabs · February 27, 2021, 8:26am

Could you shed some light on how timers would work with Actors and Executors here.

Say I want a method in my actor to be called periodically. Will there be something like a TimerActor which signals my actor regularly or will the executor provide something here?

ktoso · February 27, 2021, 8:51am

That's a good distinction I think we made in Akka, where there's the Dispatchers (what we call Executors) and Scheduler (other systems call this Timers).

A Timer is simply something that "calls something at some point in time", the what that actually causes and where it executes is none of its concerns. E.g. "trigger this actor every 1 minute, to clean some caches" -- it may perform this for plenty arbitrary actors, however "where" they actually execute is none of it's concern.

As such, timers are completely separate from executors. You can totally use existing dispatch, nio, or other mechanisms to periodically, or on a delay, call an actor. None of these have any say with regards to what the execution semantics of given actor are.

We are not proposing any timer mechanisms currently.

Chris_Lattner3 · February 28, 2021, 5:18am

I was thinking about AnyObject, "The protocol to which all classes implicitly conform". The corresponding analog for actors seems like it should be called AnyActor.

-Chris

CTMacUser · March 1, 2021, 11:58pm

Would AnyActor be a subtype of AnyObject?