[Pitch 3] Custom Main and Global Executors

I really want to see a correct long-term solution! I’ve lost a lot of sleep over this and implemented both solutions. The problem is that hosting Chromium on the same main UI thread as Swift really relies on Chromium driving the message pump.
Making this work well is critical for Dia (and Arc), so I hope you understand.

Edit: This is only on Windows.

1 Like

I understand that the development process here has been driven by trying to fish out all of the places in the concurrency library that have special scheduling behavior. However, the result is that this is a really broad proposal that has to generalize a lot of individual features of the concurrency library. And we know that some of these things, like having a way to schedule something on a timer, are things we want to significantly generalize to accommodate things like async I/O libraries. I'm concerned that we're trying to deliver way too much in one proposal.

RunLoopExecutor

Is stop really an operation that all RunLoopExecutors provide? I get that it's a useful API that many executors will provide, but I don't know how this would be used generically. Also, presumably a RunLoopExecutor could involve multiple threads; does stop() just stop the executor on the current thread, or does it stop it on all threads? I guess I'm wondering how this is supposed to be used generically, because if it can only really be used with an exact executor implementation, maybe it should just be API on that concrete executor type.

I have a similar question about run(until:), especially because it's documented to not be usable unless you're sure it's there. How does a client know that they can call this without knowing the exact implementation type? And if they know the exact implementation, why do they need the generic entrypoint? Are there guarantees about the default main executor type?
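
To make the question concrete, here is the rough shape I understand the protocol to have; the signatures below are my assumptions based on the operations named in this thread, not quoted from the pitch:

```swift
// Hypothetical sketch of the pitched protocol; signatures are assumed
// from the operations discussed in this thread.
protocol RunLoopExecutor: Executor {
  /// Run the run loop until something calls `stop()`.
  func run() throws

  /// Run the run loop until `condition` returns true. Documented as only
  /// safe to call when you know the concrete executor supports it.
  func run(until condition: () -> Bool) throws

  /// Stop a `run()` or `run(until:)` that is in progress.
  func stop()
}
```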

ExecutorFactory

I feel like it's weird to have a whole protocol devoted to what's ultimately a very one-off process initialization step. If it's going to be this ad hoc anyway, why tie the two requirements together into a single protocol vs., say, just looking for global or static member variables with those specific names? And then maybe we don't need to expose PlatformExecutorFactory at all, unless you're imagining that people will want to decorate the platform factories (which, I dunno, seems extremely inadvisable to me as a long-term approach).
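
As I understand it, the pitched protocol bundles the two customization points into one type, roughly like this (the requirement names here are my assumptions based on this discussion):

```swift
// Hypothetical sketch of the pitched factory protocol; requirement names
// and types are assumptions, not quoted from the pitch.
protocol ExecutorFactory {
  /// Installed as the process's main executor during startup.
  static var mainExecutor: any MainExecutor { get }

  /// Used as the default executor for the global concurrent pool.
  static var defaultExecutor: any TaskExecutor { get }
}
```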

Task executor properties

Task.currentExecutor feels very similar to APIs that have specifically been a problem in the past on Darwin, because it assumes that there's a canonical answer. I'm also worried that it encourages code to be semantically sensitive specifically to the current executor when we really want to encourage APIs that are parameterized by isolation. I don't think "it's useful for Task.sleep" is a good enough justification for exposing this. (Also I think parameterization by isolation is probably all that Task.sleep actually needs, since it can only be called from a task anyway.)

Also, what is the "task executor for the current thread"? Is it a new concept that this can be discovered dynamically somehow?

Exposing implementation details

I think we probably need to document the layout of the buffer provided by withUnsafeExecutorPrivateData. Size and alignment are both 2 * sizeof(void*).
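
For example, the documentation could pin that down along these lines (a sketch; `job` is an `ExecutorJob` in scope, and the assertion is my reading of the stated layout, not pitch text):

```swift
// Sketch: the private-data buffer is two pointers in size and alignment.
job.withUnsafeExecutorPrivateData { buffer in
  assert(buffer.count == 2 * MemoryLayout<UnsafeRawPointer>.size)
  // An executor could stash, e.g., an intrusive list link plus a flag
  // word here without any separate allocation.
}
```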

I feel like exposing the job kind is premature; we're not providing any way to use that information besides the guarantee that a task has an allocator. You can specifically test for the presence of an allocator already.

I wonder if most clients of the allocator API are going to want DTRT APIs that allocate on the job allocator if available and otherwise use malloc.
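
Something like the following, say (a hypothetical convenience; the `allocator` property comes from the pitch as I read it, but `allocate(capacity:)` returning an optional is my assumption, per the next point):

```swift
// Hypothetical "do the right thing" helper: use the job-local allocator
// when the job has one, otherwise fall back to the global allocator.
extension ExecutorJob {
  func allocateStorage(byteCount: Int,
                       alignment: Int) -> UnsafeMutableRawBufferPointer {
    if let allocator = self.allocator,
       let buffer = allocator.allocate(capacity: byteCount) {
      // Task jobs have a job-local allocator we can use directly.
      return buffer
    }
    // Non-task jobs: fall back to the standard allocator.
    return .allocate(byteCount: byteCount, alignment: alignment)
  }
}
```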

UnsafeMutableBufferPointer.allocate doesn't return optional itself, so I'm not sure why these APIs do.

SchedulingExecutor

We know we're going to want to generalize this to a more arbitrary event system. It's not the end of the world to have this as a special case, I guess.

Why have the enqueue API that takes a duration as a customization point on the protocol? Don't we always want the implementation layer to be working with instants and not constantly mapping back and forth, since we know that introduces slop every time it happens?

Why are these methods on a new protocol rather than on Executor with default implementations? Having to check for a SchedulingExecutor is a dynamic cast, which is a pretty non-trivial operation — maybe not in comparison to a sleep, but still, I'd hate to pay it unnecessarily.

8 Likes

RunLoopExecutor

I think they probably all should, but you're right that there are certainly exceptions (Dispatch being the primary example; DispatchMainExecutor presently calls fatalError if you try to stop it; maybe it should call exit() instead?). I guess, as with run(until:), perhaps the argument is that we shouldn't make this part of the RunLoopExecutor protocol — but the flipside of that is that almost every kind of run loop executor besides Dispatch can easily support both of these.

We also would like to be able to implement the existing runtime entry points on top of this protocol — there is already a swift_task_donateThreadToGlobalExecutorUntilImpl entry point. We don't have one for stop, mind.

No. RunLoopExecutor is not intended to have multiple threads. It's intended for cases where there is a single thread that is executing some kind of run loop, hence the name.

ExecutorFactory

Looking for global or static variables seems worse than looking for a global type, or one defined inside the @main struct. The point of having the protocol is to express the requirements for that type.

It is a bit odd, for sure, but we did consider a number of other options (command line arguments to the compiler and magic variables were both on the list).

Task executor properties

The justification for having these is that there are times where code would like to ensure that something runs on the same executor, to avoid having to hop executors (which might in the general case involve a thread hop and therefore a context switch). Task.sleep is certainly an example where this was happening previously, but it isn't the only situation where being able to actively choose to do something with the current executor is useful.

I'm fairly certain @FranzBusch has expressed an interest in being able to get hold of the current executor in the context of Swift NIO as well.

This isn't a new concept — it's how the task executor tracking is implemented. That is, there is a thread-local notion of the current task executor. Maybe we should word this differently for users of Swift, as opposed to people who know how the runtime works under the covers?

I get where you're coming from on the front of preferring to talk about isolation… maybe we should emphasise that that is normally going to be the right way to do things, and that these properties are exposed for low-level situations where you want very specific behaviour?

Exposing implementation details

That seems like a good idea.

I think this is a hang-over from a previous version, where you couldn't ask if there was an allocator directly, and had to do it by checking the job kind. We don't really need to expose this outside of the runtime.

They don't, at least not in the actual implementation. Perhaps I've got an error in the pitch document; I will check that and update it.

SchedulingExecutor

Some executors actually directly support durations rather than (or as well as) instants. Windows thread pools are one such example, and I had also had a request from some of the embedded systems folks I talked to, who didn't see why they should have to compute an instant when they had timers that worked using durations.

In addition, we need to be able to implement the existing functions in the Swift runtime on top of this interface, and one of those explicitly takes a duration rather than an instant. We could, of course, handle that entirely internally rather than exposing this as API, but in doing so we would then prevent executor implementations that can themselves deal with durations from doing the most efficient thing.
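
Concretely, the kind of implementation I have in mind looks something like this (a sketch; the `SchedulingExecutor` requirement signatures here are assumptions based on this discussion, and `DownCounterExecutor` is a hypothetical type):

```swift
// Hypothetical executor backed by a timer that counts down a duration,
// as on the embedded systems mentioned above.
final class DownCounterExecutor: SchedulingExecutor {
  func enqueue(_ job: consuming ExecutorJob) {
    // Queue the job for immediate execution (elided).
  }

  // Duration-based entry point: maps directly onto the hardware timer,
  // with no clock read needed to convert to an instant first.
  func enqueue<C: Clock>(_ job: consuming ExecutorJob,
                         after delay: C.Duration,
                         tolerance: C.Duration?,
                         clock: C) {
    // e.g. program the down-counter with `delay`, enqueue on expiry.
  }

  // Instant-based entry point, expressed in terms of the duration one
  // with a single clock read.
  func enqueue<C: Clock>(_ job: consuming ExecutorJob,
                         at instant: C.Instant,
                         tolerance: C.Duration?,
                         clock: C) {
    enqueue(job, after: clock.now.duration(to: instant),
            tolerance: tolerance, clock: clock)
  }
}
```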

Because not all executors can necessarily schedule future work.

One might argue that the default implementation could hand off to the Clock, which should figure out what to do, but since we don't know what executor we're running on, ContinuousClock and SuspendingClock currently check if the executor they've been handed can schedule jobs and hand off to it; if it can't, they create a trampoline job for the executor they've been given, then schedule a job using the Task.currentSchedulingExecutor (if there isn't one, that's a fatal error). That would create a bit of a circularity problem…

Agreed, which is why there is an asSchedulingExecutor method on Executor, the idea being that executors that do support scheduling can implement that method and directly return themselves, which avoids having to use the dynamic casting code in the runtime.
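
In other words, something like this (a sketch; `MyTimerExecutor` is a hypothetical conforming type, and the default implementation shown is my assumption about how the fallback would look):

```swift
// An executor that supports scheduling returns itself directly, so the
// runtime never needs a dynamic cast to discover the capability.
extension MyTimerExecutor {
  func asSchedulingExecutor() -> (any SchedulingExecutor)? {
    return self  // statically known to conform; no cast at runtime
  }
}

// Assumed default on Executor, falling back to a dynamic cast:
extension Executor {
  func asSchedulingExecutor() -> (any SchedulingExecutor)? {
    return self as? any SchedulingExecutor
  }
}
```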

3 Likes

Okay, it sure sounds to me like either Dispatch shouldn't implement this protocol or you need to remove these two optional protocol requirements. I do not understand the point of having "requirements" that clients can never safely use without additional information that cannot be learned through the protocol.

Should it derive from SerialExecutor, then, if it guarantees serial execution? As it stands, the novel operation that the executor is supporting is thread donation, which is not tied to single-threaded execution.

Also, stopping the executor from another thread is inherently racy unless that's somehow synchronized with the execution thread. You can't know which invocation of the executor you're stopping.

Command line arguments are discussed briefly in the proposal, but magic variables aren't discussed at all. Certainly magic variables still have the characteristic of being specified by the program. The struct approach is just giving access to a pair of magic variables anyway.

I have several concerns about the struct. One is that, in order to allow selective override of one of these executors, you're forced to override both. So this approach only works for overriding these two things; if we find we want other things to be customizable about program start, we'll need to invent yet another new mechanism. And that ties in with the second concern, which is that in order to allow overriders to delegate back to the platform implementation for executors they don't want to override, you have to make the platform implementations still available in the process via PlatformExecutorFactory. So aren't we, e.g., still required to link Dispatch on Linux (and initialize it on first use) unless we can prove that this facility is completely unused?

Right, so I'm specifically asking why the tools the language already provides for isolation inheritance (iso: isolated (any Actor)? = #isolation and nonisolated(nonsending)) aren't good enough here. I know these tools are relatively new, so they didn't exist in their current form when you started this design, but I think we ought to consider the language as it stands now.

Hopping executors is a common concern for async functions, but async functions don't just hop arbitrarily — they hop when either the task's preferred executor changes (always an explicit operation, obviously) or isolation changes, so if you just inherit isolation, there's no hop. And the formal isolation of a function is a concept we already have in the language and which doesn't have the same foundational problems as trying to return a single current executor in a synchronous context.
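
To spell that out: with the language as it stands, an API can simply inherit its caller's isolation via an isolated default argument, and no hop ever occurs. This uses only current Swift (`sleepInPlace` is a hypothetical name):

```swift
// This function is isolated to whatever its caller is isolated to, so
// suspension and resumption stay on the caller's executor; no hop.
func sleepInPlace(
  for duration: Duration,
  isolation: isolated (any Actor)? = #isolation
) async throws {
  try await Task.sleep(for: duration)
}
```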

Outside of that context of not preserving the computation of an async function, I'm extremely skeptical about just having random synchronous functions try to schedule stuff asynchronously onto the current executor. The Darwin SDK, for example, has a ton of APIs that behave differently when invoked on the main actor, and I think that's something we want to get away from, not specifically add features to encourage.

I'm worried about this turning into what's essentially a contract that we need to preserve either an instant or a duration through every API that works with them, instead of treating the acceptance of a duration as a convenience to be promptly translated into an instant.

No, but the system can certainly provide a default implementation that schedules future work onto a system executor to schedule immediate work back on self, and I don't know what a client is ever going to do that's better than that. The fact that you're talking about high-level APIs fatal-erroring if the task happens to not be running on a scheduling executor, as if that's not just an acceptable answer but somehow to be preferred to the additional hop, seems like a real problem that ought to demand reconsideration.

I'm thinking in part of the future extension to support scheduling based on arbitrary events. No custom executor is likely to ever support the full spectrum of what an event looks like, because that could come from an arbitrary combination of system and user-defined events. Imagine, say, asking a job to be run on the first of either an I/O event or a new value appearing in an AsyncStream. One of these is a kernel signal, the other is a userspace event; what executor is going to handle this intrinsically? Instead, the executor is going to recognize specific events that it can handle directly, and otherwise it's going to have to delegate to a system implementation which will coordinate the different event sources as efficiently as it can.

2 Likes

Right now there is essentially no way to check if the current task is running on the executor that a developer might expect, while we do provide ways to check for the current isolation.

However, actor isolation isn't the only kind of isolation. There are multiple isolation regions in our concurrency model:

  1. Actor based isolation
  2. Task isolation
  3. Custom isolation regions established by constructs such as Mutex

Not everything can or wants to be modeled with an actor. We have many cases across the ecosystem where something needs to run on a specific thread and task executor preferences are used to ensure that. This is commonly modeled like this:

final class MyThreadedResource {
  private let executor: any TaskExecutor

  init(executor: any TaskExecutor) {
    self.executor = executor
  }

  func interact() async {
    // Check the invariant that we're running on the expected executor.
    assert(Task.currentExecutor === self.executor)
    // Interact with the underlying resource
  }
}

I understand the argument you are making that, with the latest language features, this is already enforced at compile time, but I think we should still allow users to check this at runtime.

So to this point, I don’t expect developers to use the Task.currentExecutor to schedule new tasks onto it but rather to check their invariants. FWIW, even without this API developers are capable of mimicking this by doing Task(taskExecutorPreference: Task.preferredExecutor) { ... }. Task.init inherits the surrounding actor context, and passing along the preferred executor is currently the only other piece of information that influences scheduling.

Okay. I can absolutely support having an API for querying whether code is currently running on a given task executor. That is the shape we've given the corresponding API on actors.

1 Like

That's a fair point. I've re-factored things such that the Dispatch executor doesn't implement RunLoopExecutor, but instead a new more restrictive ThreadDonationExecutor. That does model what Dispatch gets up to rather better.

[Talking about RunLoopExecutor]:

Yes, that seems sensible.

Indeed, doing that only makes sense if there's only a single thread in run(), which is the expectation for RunLoopExecutor, but you're right that in principle someone might have a ThreadDonationExecutor that supported multiple threads.

I think they might have been discussed in the original (internal) document that I was handed to start with, but didn't make it into the proposal text as it stands. I agree that the struct+protocol approach is notionally equivalent to magic variables, though I think it is better in a few respects:

  • The protocol lets us enforce the types of the variables cleanly and in a way that Swift users naturally understand (versus hard-coding the types into the compiler). I do appreciate that we are still teaching the compiler about the protocol.
  • Exposing a type rather than two variables seems less likely to cause symbol clashes.
  • It's very likely that in most cases users will want to replace both executors at once. Replacing only one of the two seems unlikely to be common in practice.

I've tried to address that in an update to the proposal by providing default implementations for ExecutorFactory that use the platform executor.

I don't think what we're doing here prohibits us from providing some other mechanism in future (and maybe deprecating this one) if we come up with something better.

Not in the long term, no; we have to provide some default executor, but it doesn't have to be Dispatch-based. We need to keep Dispatch for the present in any case because a lot of packages rely on being able to make use of Dispatch.main.

I agree that would be undesirable, but I don't think this is likely to be a big issue here, because the enqueue() methods are hard to use outside of the runtime (since you can't make an ExecutorJob object yourself). The only way someone could try to do this is using an executor preference, and I think the isolation tools are going to be a better fit for most use-cases. I've added some wording to the proposal directing people to isolation-based approaches.

The main use-case for these properties is debugging; they are useful as an implementation detail within the runtime, and might conceivably be useful occasionally in some other contexts (e.g. if you have a program with multiple RunLoopExecutors, as might be the case in a fancy Windows program, and want to trigger nested run loop execution from some function, or want to cause the run loop to stop).

Understood, but the counter-point is that reading a clock in order to turn a duration into an instant (or vice-versa) is not necessarily a cheap operation, and there will certainly be cases where the underlying system is able to work specifically with a duration or specifically with an instant. A good example here might be an embedded system with a hardware down-counter that triggers an interrupt or sets a flag when it reaches zero. There's no point trying to calculate an instant in order to use such a thing.

I also think there are potentially subtle semantic differences in some cases (e.g. there might be a specification for percentage overrun for a duration, and maybe an absolute overrun for an instant).

I don't think higher layer APIs necessarily need to preserve durations or instants all the way down. I just think that at this lowest layer, we should provide the option.

2 Likes

Well, by "which invocation" I actually meant which time somebody tried to donate the thread, which seems like a problem even with a serial executor. The whole point of stop() is that a thread might get repeatedly donated, right? We're not actually imagining that somebody's going to donate the thread, get stopped, and then leave all the jobs to rot forever.

It seems very reasonable for a job currently running on the executor to stop it, but I don't know how somebody ever uses this correctly from a different thread entirely.

This is quite surprising to me, actually. I mean, I get the "I just don't want to use Dispatch" use case, but in the future world where non-Darwin platforms don't default to using Dispatch, I would expect it to be somewhat likely that someone would want to customize the default thread pool and very unlikely that someone would have any real problem with Swift's default main executor. Perhaps I misunderstand the use cases, though.

Well, I guess what I really mean is that, whatever the default implementation is, the fact that it's always exposed by the Concurrency library as PlatformExecutorFactory makes it a whole-program problem to avoid linking it, whereas otherwise we could just decide that when building the main executable based on the executors it specifies.

Certainly my thinking here is shaped by my expectation that this is not a permanent constraint.

This is not at all a reason to expose them as public API outside of the runtime.

If you want a function to render the current executor as a String, I would be much happier with that than something that just returns the current executor.

Okay, I can accept this.

1 Like

Agreed, calling stop() from another thread entirely would be unusual, and yes, not only might a thread be repeatedly donated, but even in some cases you might nest thread donation.

I see what you mean. I've mostly been thinking about cases like NIO (or hypothetical cases like a libuv-based thing), where the library you're using wants to take over both the executors, but yes, if you were going to customise one or other of them it'd make sense that it was the thread pool one.

In any event, adding default implementations for the protocol should solve that problem.

I'm not entirely sure about this part; the compiler won't refer to the DefaultExecutorFactory type in the _Concurrency library unless it fails to find one in the user's main module. There might be issues relating to Swift's linkage model that mean that having the implementation in the _Concurrency library itself, rather than in some other module, is a problem? Are you proposing that we move the default implementations to a separate module?

Oh, interesting. That I hadn't considered. Why would we want to expose ways of creating an ExecutorJob? (Aside from debugging/testing, obviously.)

Were you thinking this might be useful for I/O support or something?

Returning the current executor is potentially useful though. Consider for instance the case where you had a function that wants to do something with the current RunLoopExecutor; with this exposed it can do e.g.

(Task.currentExecutor as! RunLoopExecutor).run(until: { n > 17 })

or similar. Plainly that code is going to expect to be run on a RunLoopExecutor and not on some random TaskExecutor, but it would work on any RunLoopExecutor. Similar use-cases might exist for specific executors — maybe some of them might have their own additional methods that you might wish to call, or additional information you might wish to query, and without Task.currentExecutor you'd need to build some way to track which instance of which executor you're using.

Right now, you can't make ExecutorJobs outside of the runtime, so this honestly doesn't seem like much of a hazard, and even if that changed we certainly could deal with it by documentation — for instance explicitly stating that enqueue() should not be used in lieu of task isolation.

It also occurs to me that us not providing this feature doesn't really stop people from getting at this information — it just makes it harder. If they're determined to abuse things, they'll read the executor directly from the TLS data themselves anyway.

Yes, I agree.

Right, but this is public API, so presumably any random piece of the program can use PlatformExecutorFactory whenever it likes.

Creating jobs is often useful in executor implementations, which it should eventually be possible to do in Swift. But also, fundamentally executors are just a job-running mechanism, and while we want to encourage programmers to use high-level facilities like Tasks, there’s no reason you shouldn’t be able to construct a job from a closure and ask your favorite executor to run it.

Okay, this is all still completely abstract. I would really like some concrete use case before we make this API public, because, again, I have specific concerns about it being misused as presented.

I've been thinking a lot about this and I strongly believe we should expose the currentExecutor API. If we don't, then there's no easy way for a program that uses its own custom executor to invoke custom methods on the appropriate instance of that executor, since it doesn't know what the current executor is.

I did briefly explore the idea of creating a semi-opaque type that we could use as the return type for currentExecutor that would actively refuse to provide a direct Executor or SerialExecutor reference, but that could supply references of other types. That would address your concern, I think, but:

  • It doesn't seem worth restricting preferredExecutor in that way — after all, the program already has a reference to the preferred executor it set, so could easily call methods on that directly.
  • It turns out that implementing currentExecutor is easier for users to do than I thought — I had thought it would involve either poking about in TLS or reimplementing executor tracking, but @FranzBusch pointed out that you can get an UnownedSerialExecutor from the current isolation.
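
For reference, that workaround needs only existing API, namely `#isolation` and `Actor.unownedExecutor` (the wrapper function below is a hypothetical name):

```swift
// Current Swift: recover the serial executor of the enclosing isolation
// without touching TLS or reimplementing executor tracking.
func currentUnownedSerialExecutor(
  isolation: isolated (any Actor)? = #isolation
) -> UnownedSerialExecutor? {
  isolation?.unownedExecutor
}
```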

In terms of a concrete use-case, it's common on many GUI systems to implement mouse-tracking with a nested event loop… most of them can also admit a state-machine approach with event handlers instead, but that approach can be less performant, particularly on slower devices, can also allow other things to run that maybe shouldn't, and is significantly more complicated to implement. The exact specifics do vary from platform to platform… on Darwin we might do something like

let mouseEvents: NSEvent.EventTypeMask = [.leftMouseUp, .mouseEntered, .mouseMoved, .mouseExited]
while let event = window.nextEvent(matching: mouseEvents, until: .distantFuture, inMode: .tracking, dequeue: true) {
  // Process `event`
  ...
}

while on Windows it might look more like

var msg = MSG()
while GetMessage(&msg, hWnd, WM_MOUSEFIRST, WM_MOUSELAST) {
  // Process `msg`
  ...
}

Anyway, the point is that we might well want to provide a mechanism by which the Swift Concurrency executor (which is normally the thing that runs the underlying event loop) can be used to do this kind of thing; it would necessarily be system specific, since we aren't proposing writing a cross-platform UI library, but it might look something like:

let executor = Task.currentExecutor as! WindowsExecutor
while let msg = executor.getMessage(hWnd, WM_MOUSEFIRST, WM_MOUSELAST) {
  // Process `msg`
}

and that would allow Swift Concurrency to continue running code while waiting for the appropriate messages.

Similarly on Darwin, there are situations where you want to run the run loop in a specific run loop mode; this is platform specific, and we aren't providing a method for this at present, but one might imagine doing something like

let executor = Task.currentExecutor as! DarwinRunLoopExecutor
executor.run(mode: .eventTracking)

and then later on have an event handler do

let executor = Task.currentExecutor as! DarwinRunLoopExecutor
executor.stop()

to get that run to return.

2 Likes

I don't see what you're getting at with mouse tracking. On Apple platforms, your example would just be a @MainActor function, and the same idea applies on Windows even if it's somewhat more complex: while different windows can be processed by different threads, a Windows GUI library for Swift would surely still want to model those threads as a kind of actor, and so you'd still just write this function using the existing tools for actor isolation (e.g. declaring or asserting that it's isolated to the appropriate WindowActor) rather than dropping down to the executor level.

Again, this does not seem like something that arbitrary code is going to do; it's something you should only do when you already know that you're in a specific execution context, in which case I don't know why the runtime's tracking is required in order to recover the information.

1 Like

So the idea is that we don't need Task.currentExecutor because we could instead do something like

guard let isolation = #isolation else {
  fatalError("Not isolated")
}

let executor = isolation.serialExecutor as! WindowsExecutor
while let msg = executor.getMessage(hWnd, WM_MOUSEFIRST, WM_MOUSELAST) {
  // Process `msg`
}

in an actor-isolated context, and so we don't actually need Task.currentExecutor?

That doesn't seem unreasonable, though we'd still want the executor to handle the detail of spinning the event loop, because otherwise any Swift Concurrency work that's expecting to run on the same thread will stop while inside the loop.

I guess that also applies to the run(mode:) example too.

Fine. In that case, Task.preferredExecutor I think should remain public, since I think that's still a useful thing to have, but the other two can be non-public.

Just to expand on this. It must remain public. There is code that relies on being able to extract the preferred executor of the current task and then create a new unstructured task and set the preference there as well.

Right. I am not a Windows GUI programming expert, and I’m sure some of these details vary between the different APIs. But from what I do know, the threading model seems to expect that data races are avoided by only accessing UI objects on the thread that owns their window’s message queue. A good integration of that into Swift should aim to rule out those data races. Now, maybe that can just be done by making all UI objects non-Sendable, in which case no actors need be involved. However, I suspect that’d be too strict for normal, idiomatic uses, and a more expressive model would be to make them Sendable but prevent accesses from outside the owning thread, as we do with actors and global-actor-isolated objects. The natural model for that would be to have some kind of message queue actor and, via some new language feature, associate GUI objects with that actor, such that things like methods on those objects would be implicitly isolated to the message queue by default — basically like actors do, but allowing objects to delegate their actor-ness to other objects.

Once you accept that such an actor should exist, I think it becomes clear that programmers should be mostly interacting with it rather than directly with the underlying executor.

4 Likes

On Windows, UI objects are typically owned by a particular thread, but it's possible to interact with them from any other thread in the process. For vanilla Win32 things, generally speaking that means using SendMessage(), which will call the target window's WNDPROC directly if the window is owned by the same thread, but will enqueue it on the target thread's message queue otherwise, and won't return until that thread has processed that message (which happens from GetMessage(), PeekMessage() et al, in the target thread's message pump… those functions won't return messages queued by SendMessage()).

Some controls have an API associated with them, but it's normally just a wrapper around SendMessage().

(This is very similar to actors, in fact; individual calls are atomic, but a sequence of them is not.)

For more modern COM-based APIs, the story is a bit more complicated, but provided marshalling support exists for the interface you're using (it normally will for Microsoft-provided interfaces), the COM runtime can create a proxy object for you that allows you to talk to an object that is owned by another thread, assuming that object doesn't itself support free-threaded accesses.

1 Like

For those following this thread, I've split out the Clock related work into a separate proposal titled Delayed Enqueuing for Executors, which is a prerequisite for this proposal as otherwise we can't implement the Concurrency runtime's C API for cases where custom executors are in use.

7 Likes