[Pitch] Custom Actor Executors

Hello everyone,
I'd like to share the first pitch exploring the realms of custom executors in Swift.

This proposal focuses on the minimum viable pieces to make actor executors configurable by library authors or end-users. It is based on the early draft of custom executors by @John_McCall which we had shared way back when Swift Concurrency was first introduced, but we never got to formalizing the ideas it introduced.

Over the last few years, we discovered what works and what needs more thought. This proposal focuses on standardizing the basic SerialExecutor mechanisms and how actors can customize where they want to run tasks. We also acknowledge future work areas that were previously pitched; we don't intend to tackle them all at once in this proposal, so please refer to the Future Directions section to learn more. Interestingly, some parts of this API were silently introduced and baked into the ABI of Swift concurrency -- where applicable, we explain those relationships and how we make the proposed APIs fit those existing requirements.

This pitch also relates a little bit to the recently pitched unsafeAssumeMainActor idea, because it opens up the realm of various actors sharing the same serial executor and introduces assertions based on that.


You can read the full pitch text on Swift Evolution, and here's a quick table of contents to get a feel for what it discusses:

In case the text has any typos, or small mistakes, please comment on the PR directly, so that the pitch thread isn't interrupted with many typo fixups :+1:

edit: fixed PR link to go directly to PR

53 Likes

Very excited to see this pitched, thanks!

Will need some time to process, but a couple of comments on the future directions:

Executor Switching

Sounds interesting; we've done similar things in the past for low-latency optimizations in certain cases, with great results. In our case we switched between thread stealing (running work inline) and enqueuing it for another thread to run: as long as we could keep up we stole the thread, and when we couldn't we'd take the cost of running two threads to scale better. Not sure if this would be exactly applicable here though: would canGiveUpThread be evaluated continuously as needed, or how would it work?

Specifying Task executors

Perhaps providing both suggested semantics would be reasonable?

Task(startingOn: someExecutor) { ... }
and
Task(runningOn: someExecutor) { ... } (always hops back after await)

2 Likes

I'm still working my way through the proposal, but before I forget:


Under the heading "Jobs" in the detailed design:

A Job is a representation of a chunk of work that an executor should execute. For example, a Task effectively consists of a series of jobs that are enqueued onto executors, in order to run them. The name "job" was selected because we do not want to constrain this API to just "partial tasks", or tie them too closely to tasks, even though the most common kind of job created by Swift concurrency is a "partial task".

Whenever Swift concurrency needs to execute some piece of work, it enqueues an UnownedJob on the specific executor the job should be executed on. The UnownedJob type is an opaque wrapper around Swift's low-level representation of such a job. It cannot be meaningfully inspected or copied, and must never be executed more than once.

I'm a bit confused because the following code description is for Job not UnownedJob, and the definition of UnownedJob doesn't actually come until later.


The description of Executor.enqueue() makes it sound like UnownedJob is something deprecated, but the later discussion seems to suggest that it's not. It's sort of odd if UnownedJob isn't deprecated, but the way that you use it (enqueue) is. Or am I missing something here?


Also in the "Jobs" section:

Eventually, an executor will want to actually run a job. It may do so right away when it is enqueued, or later on some different thread; this is entirely left up to the executor to decide. Running a job is done by calling the runJobSynchronously method which is provided on the SerialExecutor protocol.

and:

Executors are required to follow certain ordering rules when executing their jobs:

  • The call to SerialExecutor.runJobSynchronously(_:) must happen-after the call to enqueue(_:).

At first I thought this meant that runJobSynchronously would run a previously enqueued job, but I'm confused by the fact that both enqueue and runJobSynchronously have a job parameter, so they apparently refer to different jobs. The code description of SerialExecutor.runJobSynchronously(_:) doesn't seem to have anything to do with a previously enqueued job.

What is the significance of "happen-after" then, in the above quote? Or is enqueue really just runJobAsynchronouslyAtAPossiblyLaterStartTime?

1 Like

This is a great idea; I'm glad these APIs are becoming official.

It's important to continue supporting UnownedJob because it can be cast to UnsafeMutableRawPointer for use as the context pointer passed to dispatch_async_f and other C-based workloops.

1 Like

Nothing useful to add other than this is extremely exciting! :tada:

1 Like

I assume the backwards-deployment story is “cannot be backwards-deployed” because of the new protocols and such. If that’s the case, what will happen if someone sets up a whole actor network with custom executors and tries to run it on an existing runtime? Do we think the necessary availability guards are enough of a tip-off that people won’t try to do that?

(Alternately, if it can be backwards-deployed at all, then “neat” and also “how will that work for the introduction of new protocols and requirements?”)

1 Like

Looks like a solid pitch to me! Just one question/concern/addition:

Currently, if you annotate two functions as @MainActor and one calls the other, it doesn't need to await the result (assuming the function isn't explicitly marked as async). What happens if I adopt the same SerialExecutor in two actors? Would Swift be able to check that it's running on the same SerialExecutor, or would I still need to mark my calls between these two pieces of code with await?

4 Likes

I think @John_McCall answered exactly this in another discussion thread yesterday:


Isn't that what @_alwaysEmitIntoClient is for, or does it not work on whole types or protocols?

2 Likes

It only works for functions.

2 Likes

The proposed ways for actors to opt in to custom executors are brittle, in the sense that a typo or some similar error could accidentally leave the actor using the default executor. This could be fully mitigated by requiring actors to explicitly opt in to using the default executor; however, that would be an unacceptable burden on the common case. Short of that, it would be possible to have a modifier that marks a declaration as having special significance, and then complain if the compiler doesn't recognize that significance. However, there are a number of existing features that use a name-sensitive design like this, such as dynamic member lookup (SE-0195). A "special significance" modifier should be designed and considered more holistically.

There was a pitch about this, a few months ago: Pre-Pitch: Explicit protocol fulfilment with the 'conformance' keyword
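To make the brittleness described in the quoted passage concrete, here is a rough illustration (the Cache actor and the misspelling are made up): the typo compiles cleanly, but the requirement isn't fulfilled and the actor silently keeps its default executor.

actor Cache {
  // Intended to fulfill the Actor protocol's `unownedExecutor` requirement,
  // but the misspelling makes this an unrelated property that nothing ever
  // calls, so Cache silently keeps using the default executor.
  nonisolated var unownedExecuter: UnownedSerialExecutor { // note the typo
    MainActor.sharedUnownedExecutor
  }
}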

1 Like

I'll update the doc to stick to using Job more in the proposal text, but for what it's worth: they're the same, except for the ownership/safety. Job is move-only, consumed by running it, and therefore generally safe; UnownedJob's lifetime is not managed, and it is unsafe to access after it has been "consumed" (e.g. by runJobSynchronously(theJob)).

That is what is happening here. We have an existing API today that was never documented, but exists, accepting an UnownedJob: Executor.enqueue(UnownedJob). This API is becoming deprecated in favor of the version accepting the move-only Job: Executor.enqueue(Job).

At the same time, current limitations of the early version of move-only types (which are being pitched, but have not reached a formal review yet) can be limiting enough that users may need to resort to using UnownedJob. The type is therefore not deprecated, and should be treated as an escape hatch that you may need when a job is used in a generic context (e.g., as the proposal mentions, you cannot store Job in an array, but you can store UnownedJob). Therefore, the entry point is deprecated (we lead you towards the safe API), but the escape hatch is not.
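To make that concrete, here's a rough sketch of what a custom executor could look like under this pitch; FIFOSerialExecutor is just an example name, it uses the pitched Job/UnownedJob/runJobSynchronously spellings, and it won't compile on today's toolchains:

final class FIFOSerialExecutor: SerialExecutor {
  // Job is move-only, so it cannot be stored in an Array; we drop down to
  // the UnownedJob escape hatch for storage and take responsibility for
  // never running a stored job more than once.
  // (A real executor would also need synchronization around `pending`;
  // that is omitted in this sketch.)
  private var pending: [UnownedJob] = []

  // The new, preferred entry point accepting the move-only Job.
  func enqueue(_ job: __owned Job) {
    pending.append(UnownedJob(job))
  }

  // Called by whatever thread this executor drains its queue on.
  func drain() {
    while !pending.isEmpty {
      let job = pending.removeFirst()
      runJobSynchronously(job) // consumes the job; it must never run again
    }
  }

  func asUnownedSerialExecutor() -> UnownedSerialExecutor {
    UnownedSerialExecutor(ordinary: self)
  }
}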

Hope this clarifies why the enqueue method is deprecated, but the type is not.

This is the method to "run a job", synchronously, immediately when and where this method is invoked:

// in a SerialExecutor:
self.runJobSynchronously(job)

Happens-before and happens-after are the typical terminology used in concurrent systems to express ordering guarantees; here we just mean that an enqueue's effects must be visible to the run. If there is any synchronization necessary to make this happen, the executor must take care of that.

No; it is as the name implies "run this job now, synchronously, on the current thread".
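To illustrate where that happens-after could come from, here's a rough sketch (DispatchBackedExecutor is just an example name, and it uses the runJobSynchronously/UnownedJob spellings from the pitch): an executor backed by a DispatchQueue gets the required synchronization from the queue itself.

import Dispatch

final class DispatchBackedExecutor: SerialExecutor {
  private let queue = DispatchQueue(label: "example.serial")

  func enqueue(_ job: UnownedJob) {
    queue.async {
      // This runs strictly after the enqueue above, on the queue's thread;
      // "synchronously" only means "right here, right now, on whichever
      // thread makes this call", not "at enqueue time".
      self.runJobSynchronously(job)
    }
  }

  func asUnownedSerialExecutor() -> UnownedSerialExecutor {
    UnownedSerialExecutor(ordinary: self)
  }
}

In other words, the rule relates a given job's enqueue to that same job's run; it is not about ordering between two different jobs.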

The proposal includes several existing (!) types that are @available(5.1) because they've already been back-deployed ever since Swift concurrency itself was back-deployed. We never formalized those types through an SE review, so we're doing that now, and while doing so, adding the new APIs.

AFAICS back-deployed things will continue to work as they have until today, and some of the APIs we can probably back-deploy -- for example, there never was an official API to "run a job" before this proposal, but the runtime function to do so obviously existed back then (it is what Swift itself uses to run jobs), so I think we can try to back-deploy the runJobSynchronously method. That way, old implementations using only UnownedJob could at least run a job using an official API rather than an underscored one.

I don't have a full picture of what we can and cannot do though with this.

@DevAndArtist is right that @John_McCall just explained this elsewhere actually: `nonisolated lazy let` on an actor - #13 by John_McCall

You'd have to write await, but it wouldn't actually suspend if they shared the same serial executor. To get this guarantee into the type system, we'd need what is discussed in the proposal's Future Directions: DelegateActor property section:

The previous pitch of custom executors included a concept of a delegateActor, which allowed an actor to declare a var delegateActor: Actor { get } property that lets a given actor execute on the same executor as another actor instance. At the same time, this would provide enough information to the compiler at compile time that both actors can be assumed to be within the same isolation domain, and awaits between those actors could be skipped (!). A property that with custom executors only holds dynamically (as in, "runtime behavior") would this way be reinforced statically by the compiler and type system.
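For a rough idea of the shape (this is a future direction, not part of this proposal; Coordinator and Worker are made-up names):

actor Coordinator {
  // Owns, or is bound to, some custom serial executor.
}

actor Worker {
  let coordinator: Coordinator

  // The previously-pitched hook: Worker would run on coordinator's executor,
  // and the compiler could treat both actors as a single isolation domain,
  // eliding awaits between them.
  var delegateActor: Actor { coordinator }

  init(coordinator: Coordinator) {
    self.coordinator = coordinator
  }
}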

3 Likes

I'd like to try not to dive too deep into not-yet-completely-designed future directions in this thread, but a few quick comments:

We're not sure about how this would be best exposed. It might be a tricky tradeoff dance and may need various ways to opt into this "sticky" behavior. Perhaps it is a property of an executor, rather than just how one passes it somewhere?

We have not thought enough about this space yet, but it is clear that a "sticky executor" (where we always hop back to it, rather than to the global pool) would definitely be of interest for things like event-loop based systems. But then again, wouldn't that just be a Task on an actor that has that specific executor? :thinking:
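For what it's worth, that idea might look roughly like this with the APIs proposed here; EventLoopExecutor is a made-up stand-in for an event-loop backed SerialExecutor:

final class EventLoopExecutor: SerialExecutor, @unchecked Sendable {
  func enqueue(_ job: UnownedJob) {
    // hand the job over to the event loop (omitted in this sketch)
  }

  func asUnownedSerialExecutor() -> UnownedSerialExecutor {
    UnownedSerialExecutor(ordinary: self)
  }
}

actor Connection {
  private let eventLoopExecutor: EventLoopExecutor

  // The proposal's customization point: this actor always runs on the event loop.
  nonisolated var unownedExecutor: UnownedSerialExecutor {
    eventLoopExecutor.asUnownedSerialExecutor()
  }

  init(executor: EventLoopExecutor) {
    self.eventLoopExecutor = executor
  }

  func start() {
    // An unstructured Task created here inherits the actor's isolation, so its
    // body keeps hopping back to the event-loop executor after every await,
    // which is most of what a "sticky" task would provide.
    Task { await self.pump() }
  }

  func pump() async {
    // event handling (omitted)
  }
}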

We've not thought it through in depth yet, so I'd like to be careful and not promise anything - I don't know if, when or how we'd surface these semantics.

3 Likes

How does “enqueue” imply synchronicity?

2 Likes

Executors are required to follow certain ordering rules when executing their jobs:

  • The call to SerialExecutor.runJobSynchronously(_:) must happen-after the call to enqueue(_:).

I get most of what you say, but I'm still confused about this. What is ordered here? The execution?

If the runJobSynchronously(_:) must happen-after the call to enqueue(_:), then aren't you saying that the enqueued job executes before the synchronously run job? How is that "immediately"?

Let me try saying this another way, since I'm presumably just misunderstanding you:

Since the custom executor can choose to run an enqueue()-ed job at a time it chooses, then there seem to be 3 possible outcomes when a runJobSynchronously(_:) is subsequently called:

  1. The enqueued job has finished executing already. In that case, the synchronous job can run "now", and it will therefore happen-after.

  2. The enqueued job is still executing. In that case the synchronous job has to wait for the enqueued job to finish. The synchronous job would still happen-after, but not "now", only "next".

  3. The enqueued job hasn't started executing. The synchronous job can run "now", and therefore it will happen-before the enqueued job, not after.

Is my understanding flawed here?

I think Quincey’s confusion suggests a good argument for why this API shouldn’t be on SerialExecutor. This function is a necessary function within the internal implementation of an executor, but putting it on SerialExecutor means it will present as a public function of every conforming type, which in this case means at least every default actor. I wouldn’t want someone to see this method on an actor and think that it’s essentially dispatch_sync.

3 Likes

Yeah it seems the move of that method from a free (undocumented) func onto the executor made it more confusing than helpful.

We can put it directly on the job instead, like this: Job.runSynchronously(some SerialExecutor) which should cause less confusion I hope.

Edit: I just confirmed that I had simply missed that __consuming already works, so yep, we can express it on Job, which is probably the best place :+1:

3 Likes

This looks awesome! I anticipate the day when NIO EventLoopFutures are no more. :slightly_smiling_face:

Regarding this code snippet…

@available(SwiftStdlib 5.9, *)
extension Job {
  // TODO: A JobPriority technically is the same in value as a TaskPriority,
  //       but it feels wrong to expose "Task..." named APIs on Job which 
  //       may be not only tasks. 
  //
  // TODO: Alternatively, we could typealias `Priority = TaskPriority` here
  public struct Priority {
    public typealias RawValue = UInt8
    public var rawValue: RawValue

    /// Convert this ``UnownedJob/Priority`` to a ``TaskPriority``.
    public var asTaskPriority: TaskPriority? { ... }
    
    public var description: String { ... }
  }
}

…wouldn’t it make more sense to type-alias TaskPriority to Job.Priority? As is stated, the point of declaring Job.Priority separately instead of directly referencing TaskPriority is to decouple jobs from tasks from a naming standpoint since jobs can, in theory, generalize beyond tasks. Why not let Job.Priority be the “source of truth”, so to speak, instead of TaskPriority?

typealias TaskPriority = Job.Priority

…instead of…

extension Job {
    typealias Priority = TaskPriority
}
1 Like

I don’t think we want to emphasize Job to general users of Swift Concurrency. I can very easily imagine people thinking they should be making their own Jobs to run specific functions, which is how you interact with concurrency libraries in a lot of other systems, including Dispatch and Java.

2 Likes

This is great; async/await does not play well with frameworks that rely on thread locality. I imagine both Core Data and Realm can benefit from this. I'm also curious about the backporting story here, especially for people who have already been relying on the underscored functions to implement their own executors. Another question I have is about the stickiness of where a task is executed. For instance, if we wanted to adapt Core Data's managed object context to execute jobs, the only way I've come up with to always hop back to a specific actor's executor is to pass an isolated parameter between function calls.

ex.
In the case of Core Data, managed objects are only valid inside a managed object context's perform block. This makes it really useful to hop back to the managed object context's queue after every suspension point. In the example below, is there an easier way to express this currently?


actor DB {

    private let _unownedExecutor: CoreDataExecutor
    ....

    func transaction<T>(resultType: T.Type = T.self, body: @escaping (isolated DB) async throws -> T) async throws -> T {
    }
}

func registerUser() async throws {
    try await db.transaction { context in
        await networkService.makeReq() // scheduled on global executor
        // hop back to the db's executor
        try repo.insert(context: context) // continue on db's executor because we pass in an isolated DB instance
    }
}

I recognize this is outside of the current pitch, but I would be in favour of having task executors be sticky to the executor they were started on, or at least having that option in the API. This would also open up the possibility of using task locals to inject database connections (managed object contexts) instead of needing to explicitly pass an isolated instance around.
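Purely to illustrate the task-local part of that idea, a rough sketch where DatabaseConnection and these helper functions are made-up stand-ins for a managed object context and transaction helpers:

final class DatabaseConnection: @unchecked Sendable {
  // wraps the real context/connection
}

enum Database {
  @TaskLocal static var connection: DatabaseConnection?
}

func registerUser() async throws {
  try await Database.$connection.withValue(DatabaseConnection()) {
    // Anything called from here, however deep, can read Database.connection
    // instead of receiving an isolated DB parameter explicitly.
    try await insertUser()
  }
}

func insertUser() async throws {
  guard let connection = Database.connection else {
    return // not inside a transaction
  }
  _ = connection // perform the insert using the injected connection
}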

There are several different ways to handle a "model object"-style database that I can see, depending on the answers to a pair of questions:

  • Can the database be concurrently modified from outside of your program?
  • Does the database need to process separable transactions concurrently?

If the answer to both is "no", then a lot of complexity disappears because suddenly transactions can't really fail. This might apply if, say, you're just using the database for persistence. I tend to think that this is the only scenario where you should consider modeling the database as an actor, because actor operations also can't "fail", at least not at the most basic level. (As a programmer, of course, you can define a higher level of "failure" where e.g. an actor method checks some preconditions, discovers they're no longer true, and exits early. But this doesn't happen implicitly, whereas it's pretty intrinsic to concurrent databases.)

In this case, you'll be using the actor's normal scheduling to manage transactionality. Normally, Swift's actors are reentrant, which means any await will break up the transaction. I'm not sure you can fix that at the executor level — the information just isn't there to tell you whether a suspending task has finished executing or not. To run an async operation as an atomic transaction, you'd have to have a function like your transaction() that can register the current task and block anything else from running. But it's at least a fair question whether this is something you should even try to do: if transactions can suspend to await arbitrary async operations, that can leave the database blocked for a very long time, or even lead to deadlock.

Regardless, you should make model objects non-Sendable so that you can't escape them from the actor, and then you can just manage them as internal state of the actor. And you can store all sorts of other useful state directly on the actor object, since it's all guarded by the same exclusive executor.

By modeling the database as an actor and using actor isolation as the transactional tool, you strongly encourage transactions that are more complex than a single closure to be written as extensions on the actor type. I think that's fine, though — it doesn't violate encapsulation in either direction to do that in a private extension, and it makes it very clear which parts of your code are database operations and which aren't.
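A rough sketch of that shape, with made-up names: non-Sendable model objects kept as internal actor state, and database operations written in extensions on the actor so they share its isolation.

import Foundation

final class PersonModel { // deliberately NOT Sendable: it should never escape the actor
  var name: String
  init(name: String) { self.name = name }
}

actor PersistenceStore {
  private var people: [UUID: PersonModel] = [:]
}

// "Transactions" beyond a single closure live in extensions on the actor;
// every method here runs on the store's exclusive executor, so it can freely
// touch the non-Sendable model objects.
extension PersistenceStore {
  func insertPerson(named name: String) -> UUID {
    let id = UUID()
    people[id] = PersonModel(name: name)
    return id
  }

  func rename(_ id: UUID, to newName: String) {
    people[id]?.name = newName
  }
}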

Once you start answering "yes" to either of the questions above, I think an actor becomes a less appropriate model for the database, because transaction failure becomes an unavoidable concern. Swift isn't going to roll back your changes to the actor's isolated properties when the database operation fails. In this case, you still want your model objects to be non-Sendable, but you'll probably be creating them fresh for each transaction. That's still a kind of data isolation, but there's no additional isolation that you're relying on from an actor.

If you've got a distributed or high-performance database, I personally am fairly skeptical of using a model object approach at all. In this case, I think you need to think much more carefully about transactions and conflict resolution, and being able to write naive code that silently turns into an unnecessarily sweeping transaction is probably an anti-pattern.

2 Likes