Interlocking in a Hierarchy of Actors

michelf · December 10, 2020, 2:14pm

I think it's a bit of a problem that actors can't talk to each other synchronously. I'm trying to find a model that would allow synchronous calls from one actor to another while avoiding deadlocks and limiting blocking. So this is the idea I've come up with. I'll be keeping an updated version of this document here.

A Hierarchy of Actors

This is an attempt to define an Actor system where some actors can make synchronous calls to others. It tries to keep blocking on the margins — where it is more tolerable — and to not allow anything that could cause deadlocks.

There are two basic rules to add to the current actor pitch:

Actors can have a parent actor chosen when initializing the actor.
A child actor can synchronously call methods of its parent actor.

A typical application could look like this. The Main thread actor has child task actors, which can have subtasks:

       Main
  +-----^-----+
  |           |
Task A      Task B
          +---^----+
          |        |
      Subtask C  Subtask D

`@interlockable`

A hierarchy is created by giving the @interlockable attribute to a let property of an actor class that is used as the parent:

actor class ChildTask {
   @interlockable let parent: SomeParentTask // also an actor class
}

The type of the property must be an actor class, but it can be optional, and it can be weak or unowned. It must also be a let to guarantee it'll never change after initialization; this also ensures there'll be no cycle. You can only use @interlockable inside an actor class.

`interlock`

The @interlockable property is blessed by the compiler to permit special calls. Since the parent is an actor, you normally have to use:

    await parent.method()

to call anything on it. You should still use it most of the time to avoid blocking. But since this property is @interlockable, you can also use:

    interlock parent.method()

which will "lock" self during the call, preventing new partial tasks of this actor from being interleaved. This avoids the creation of a suspension point and ensures the state of our ChildTask will not change during the call. Because there's no suspension point allowed, interlock may only be used to call synchronous methods on the parent actor. It is also not possible to have both interlock and await in the same expression.

Since interlock enables synchronous access, you can use it to mutate properties and subscripts of the parent, or pass them as inout to other functions. This is allowed:

    interlock parent.someCounter += 1
    interlock self.mutatingMethod(&parent.someCounter)

The parent of an actor becomes a place where children can share common state.

When to Use

interlock should be used sparingly when you need to avoid a suspension point. For instance, an actor could purchase an item from its parent like this:

func buyFromParent(item: Item) async throws {
    let price = await parent.price(for: item)
    guard funds >= price else {
        throw SomeError.insufficientFunds
    }
    interlock try parent.purchase(item, at: price) // no suspension point here
    funds -= price // skipped when the purchase fails
}

Because interlock guarantees no interleaving of other partial tasks on this actor, you can deduct the funds after a successful purchase. Without interlock, you'd have to code more defensively so the last two lines can handle the case where multiple calls to buyFromParent are in flight:

func buyFromParent(item: Item) async throws {
    let price = await parent.price(for: item)
    guard funds >= price else {
        throw SomeError.insufficientFunds
    }
    funds -= price // take the funds here so interleaved tasks know they are not available
    do {
        try await parent.purchase(item, at: price) // suspension point here
    } catch {
        funds += price // put back the money in the funds
        throw error // report the failure to the caller, as the interlock version does
    }
}

While this defensive coding protects an invariant where funds >= 0, it can lead to racy results. If your actor has funds to purchase only one item but tries to purchase two, and if the first purchase throws for some reason, the second purchase might also fail due to insufficient funds if by chance it is executed in the time window where the funds have been temporarily removed. This race does not exist with interlock.
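The race window can be replayed step by step. This is only a single-threaded sketch of one possible interleaving, not real actor code: `ChildState`, `reserve`, `refund`, and `replayInterleaving` are hypothetical names, and the price of 10 and the parent's rejection are assumptions made for illustration.

```swift
// Hypothetical stand-in for the child actor's state, replayed on one thread.
struct ChildState {
    var funds: Int

    // Mirrors the guard and `funds -= price` that run before the suspension point.
    mutating func reserve(_ price: Int) -> Bool {
        guard funds >= price else { return false }
        funds -= price
        return true
    }

    // Mirrors the `catch` branch that puts the money back.
    mutating func refund(_ price: Int) {
        funds += price
    }
}

func replayInterleaving() -> (outcomes: [String], finalFunds: Int) {
    var child = ChildState(funds: 10)
    let price = 10
    var outcomes: [String] = []

    // Call 1 reserves the funds, then suspends while the parent works.
    let firstReserved = child.reserve(price)

    // Call 2 is interleaved during that suspension and finds the funds gone.
    let secondReserved = child.reserve(price)
    outcomes.append(secondReserved ? "call 2: sent to parent" : "call 2: insufficientFunds")

    // Call 1 resumes; assume the parent rejected the purchase.
    if firstReserved { child.refund(price) }
    outcomes.append("call 1: purchase failed, funds restored")

    return (outcomes: outcomes, finalFunds: child.funds)
}
```

In this replay both calls fail even though the funds end up untouched, which is the racy result described above.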

Blocking-Free Where it Counts the Most

In a UI application, blocking the main thread will cause the UI to become unresponsive, so blocking should be avoided. Since the Main actor sits at the top of the hierarchy, it has no parent and can never lock itself while calling another actor.

Deadlock-Free

Locking is often known to cause deadlocks. This occurs when creating a cycle: A blocks waiting for B while B blocks waiting for A; both are waiting on each other and will wait forever. The cycle can have more participants, but the result is the same. Deadlocks often depend on timing and can be hard to reproduce.

With the interlockable hierarchy there is no cycle possible. An actor can only interlock with its parent, and the parent can interlock with its own parent, but a parent can never interlock with a child or another arbitrary actor. If we were to allow parents to interlock with their children, cycles could be formed. So we're not allowing that.

This is also why async methods cannot be called within interlock. A cycle would be formed if a child is locked while calling a parent and that parent decides to call an async method on the same child: it would wait forever for the child to unlock.

Inconsequential Suspension Points

Note that when we say interlock locks the actor, it must only suspend running partial tasks of this particular actor. If multiple actors are running on the same thread or queue, the executor must be able to continue running their partial tasks. An interlocking implementation that blocks the thread is problematic unless you can guarantee only one actor is using the thread. How this is implemented is decided by the executor.

One way this could be implemented would be this equivalence:

    interlock parent.method()
    // same thing as:
    self.suspendRunningPartialTasks = true
    await parent.method()
    self.suspendRunningPartialTasks = false

where the executor honors the suspendRunningPartialTasks flag of an actor by not running its queued partial tasks.

With this implementation, interlock might still create a suspension point, but this suspension point has no consequence on the actor state since it does not allow interleaving. From the actor's point of view, it's as if there was no suspension point.
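To make the flag idea concrete, here is a toy, single-threaded sketch of an executor that skips queued partial tasks of a suspended actor. `ToyActor`, `ToyExecutor`, and `demoInterlock` are hypothetical names invented for this illustration; they are not part of any actor proposal, and a real executor would be far more involved.

```swift
// Hypothetical actor stand-in: only carries the flag from the equivalence above.
final class ToyActor {
    let name: String
    var suspendRunningPartialTasks = false
    init(name: String) { self.name = name }
}

// Hypothetical executor shared by several actors on one thread.
final class ToyExecutor {
    private var queue: [(actor: ToyActor, work: () -> Void)] = []

    func enqueue(on actor: ToyActor, _ work: @escaping () -> Void) {
        queue.append((actor: actor, work: work))
    }

    // Runs the first queued partial task whose actor is not suspended.
    // Returns false when nothing is runnable.
    func runNext() -> Bool {
        guard let index = queue.firstIndex(where: { !$0.actor.suspendRunningPartialTasks }) else {
            return false
        }
        let job = queue.remove(at: index)
        job.work()
        return true
    }

    func drain() {
        while runNext() {}
    }
}

func demoInterlock() -> [String] {
    let executor = ToyExecutor()
    let parent = ToyActor(name: "parent")
    let child = ToyActor(name: "child")
    var trace: [String] = []

    // The child "interlocks": it sets its flag, then sends the call to its parent.
    child.suspendRunningPartialTasks = true
    executor.enqueue(on: child) { trace.append("child: unrelated partial task") }
    executor.enqueue(on: parent) {
        trace.append("parent: method()")
        child.suspendRunningPartialTasks = false // the call has ended, the child unlocks
    }

    executor.drain()
    return trace
}
```

The child's unrelated partial task was queued first, yet it only runs after the parent's method has completed. The thread itself is never blocked, so other actors on the same executor keep making progress.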

`init` & `deinit`

It is not clear to me at the moment if the initializer and deinitializer of an actor class are running within the actor's execution context or not. Assuming they run from the actor's execution context, the usual rules for interlock apply.

However, if they do not run from the actor's execution context, then we may need to prevent an interlock during init and deinit which will necessitate a new set of rules.

Complex Interlockable Graphs

To prevent deadlocks, all you need is to ensure there is no locking cycle. A tree is obviously cycle-free, but more arbitrary graphs can be free of cycles too. Something like this:

       Main
  +-----^-----+
  |           |
Task A      Task B      Database
          +---^----+ +-----^
          |        | |
      Subtask C  Subtask D

Or like this (since this is a directed graph):

       Main
  +-----^-----+------------+ 
  |           |            |  
Task A      Task B      Database
          +---^----+ +-----^                    
          |        | |
      Subtask C  Subtask D

Actors are guaranteed to be free of cycles because @interlockable forces the property to be a let. You can't express a cycle with only let properties.
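As a sanity check on the second diagram, the interlock relationships can be written down as a directed graph and searched for cycles. This is only an illustrative sketch: the dictionary encoding and the `hasCycle` helper are assumptions of mine, and under the pitch the compiler would not need to run any such check since let properties already rule cycles out.

```swift
// Edges point from an actor to the actors it may interlock with
// (the second diagram above, where Database also has Main as a parent).
let interlockEdges: [String: [String]] = [
    "Main": [],
    "Task A": ["Main"],
    "Task B": ["Main"],
    "Database": ["Main"],
    "Subtask C": ["Task B"],
    "Subtask D": ["Task B", "Database"],
]

// Depth-first search: reaching a node that is still being explored means a cycle.
func hasCycle(_ graph: [String: [String]]) -> Bool {
    enum Mark { case inProgress, done }
    var marks: [String: Mark] = [:]

    func visit(_ node: String) -> Bool {
        if let mark = marks[node] {
            return mark == .inProgress
        }
        marks[node] = .inProgress
        for next in graph[node] ?? [] {
            if visit(next) { return true }
        }
        marks[node] = .done
        return false
    }

    for node in graph.keys {
        if visit(node) { return true }
    }
    return false
}
```

With the edges above, `hasCycle(interlockEdges)` is false. Adding an edge from Main back to Subtask D, which a let-only hierarchy cannot express, makes it true.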

This latest graph is interesting though: Subtask D has two paths to Main. Because it's hard to guarantee a locking order in those circumstances, and also to avoid the need for recursive locks, interlock does not permit synchronous access to both parents at the same time. This is allowed inside Subtask D:

    let record = interlock databaseActor.records[1234]
    interlock taskBActor.fetchedRecords.append(record)
    // one interlocked actor at a time

And this is not:

    interlock taskBActor.fetchedRecords.append(databaseActor.records[1234])
    // error: can only interlock with one actor at a time

Alternatives Considered

No `interlock` keyword

We could allow synchronous access to the parent directly with no keyword:

    parent.someCounter += 1

This looks like a synchronous call, and that's effectively what it is. But it also hides the cost that our actor is locking itself while calling another actor.

`wait` instead of `interlock`

This reads well and somewhat mimics the well known wait() multithreading primitive:

    wait parent.someCounter += 1

But it's too close to await for comfort in regard to meaning, pronunciation, and spelling. wait is also a commonly used identifier so it'd create confusion and ambiguities.

Get rid of `@interlockable` as it's not actually needed!

Did I mention you can't create a cycle with only let references?

We could get rid of the @interlockable attribute and allow interlock to work with any let property of an actor.

ktoso · December 10, 2020, 2:39pm

Hi there,
thanks for taking the time to write this up.

I think this is attacking the problem of reentrance "backwards", assuming that reentrance is a must, and that we need to turn the model on its head and break through the most basic requirement of actor communication -- that it is done by asynchronous message passes.

The proposed model also complicates things a lot when there are actors in play which you absolutely cannot invoke without suspending, so we'd need even more special cases around this interlock, to allow it on local-only but no other actors...

This proposal is not deadlock free.

The parent does not need to "interlock" with anyone, it simply needs to message a child in order to complete a request from it -- and the child will never be able to reply, because it is "interlocked with its parent", thus the parent never gets a reply, thus the child never unlocks.

child: interlock parent.greet() // greet me
parent: greet() runs eventually
  - parent.greet: await child.name() // what's your name?
<deadlock: child cannot answer, since it is non-reentrantly interlocked.>

I would rather encourage the following way of thinking about actors and how to build "privileged" pairs of them, such that they can avoid the executor "hop":

technically everything is async messages
"it just so happens that"
- a) async calls to self execute directly; don't even have to give up the thread (this is part of today's proposals already)
- b) if and only if, the executor of the sender and the recipient are exactly the same, and effectively 'single threaded' (we need some better name for this), then the message to such parent may be executed directly.
  - We can guarantee this is safe because such executor gives us the guarantee that only one of those actors may ever execute at the same time; as such, we are not breaking the model by immediately donating "our" thread to the recipient; we can do a back-and-forth without ever giving up the thread this way.

Note that we never "break the model" here, it's all async messages, but "just so happens that" some of them actually execute immediately, but that's hidden internally, rather than exposing and breaking the interaction model and adding another interaction style.

It is not necessary to structure trees of actors to get the benefit of "I specifically know that other actor is on the exact same event loop as I am, and as such I can abuse this a little to get immediate calls through it".

This would require a change to the concept of executors we have today, but to be fair executors today have not been specified very well yet, so this is a pretty small change boiling down to:

actors are not executors, but they have executors
- the vast majority of actors simply have the default executor; this is all the same as in today's proposals
today's enqueue(PartialAsyncTask) function of the Actor protocol to be replaced by an executor instance; we can of course specialize an executor for "we know this is the global default executor"
today's enqueue(PartialAsyncTask) moves onto an Executor protocol.
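A rough sketch of how "actors have executors" could enable hop avoidance is given below. Every name here (`SketchExecutor`, `SerialLoopExecutor`, `SketchActor`, `send`, `isSerial`, `hops`) is a hypothetical illustration, not the API of any proposal; closures stand in for partial tasks, and the same-executor check is my reading of rule (b) above.

```swift
// Hypothetical executor protocol: `enqueue` lives here instead of on the actor.
protocol SketchExecutor: AnyObject {
    var isSerial: Bool { get } // true when only one partial task can ever run at a time
    func enqueue(_ partialTask: @escaping () -> Void)
}

// Hypothetical single-threaded event loop.
final class SerialLoopExecutor: SketchExecutor {
    let isSerial = true
    private var pending: [() -> Void] = []
    private(set) var hops = 0

    func enqueue(_ partialTask: @escaping () -> Void) {
        hops += 1
        pending.append(partialTask)
    }

    func run() {
        while !pending.isEmpty {
            let task = pending.removeFirst()
            task()
        }
    }
}

// Hypothetical actor stand-in: it is not an executor, it has one.
final class SketchActor {
    let executor: SketchExecutor
    init(executor: SketchExecutor) { self.executor = executor }

    // Every message is conceptually async; it just so happens that some run in place.
    func send(to recipient: SketchActor, _ message: @escaping () -> Void) {
        if recipient.executor === executor && executor.isSerial {
            message() // same serial executor: run directly, no hop
        } else {
            recipient.executor.enqueue(message) // different executor: regular hop
        }
    }
}

func demoHopAvoidance() -> (sameLoopHops: Int, otherLoopHops: Int, log: [String]) {
    let loopA = SerialLoopExecutor()
    let loopB = SerialLoopExecutor()
    let sender = SketchActor(executor: loopA)
    let neighbor = SketchActor(executor: loopA)
    let remote = SketchActor(executor: loopB)
    var log: [String] = []

    sender.send(to: neighbor) { log.append("neighbor ran in place") }
    sender.send(to: remote) { log.append("remote ran after a hop") }
    loopB.run()

    return (sameLoopHops: loopA.hops, otherLoopHops: loopB.hops, log: log)
}
```

The caller writes the same `send` in both cases; whether a hop happens is decided internally by comparing executors, which is the point of keeping the interaction model purely asynchronous.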

This will enable:

the just explained "actor hopping avoidance"
- which e.g. even runtimes like Swift NIO could benefit from
specialized actor pairs which are closely related to each other and want to for some reason avoid the "hops"; these happen super rarely to be honest in my experience, but I can confirm the pattern exists in the wild out there; usually when the executor is a known "single thread" (like main, or a specific dedicated EventLoop)
other runtimes, which do not have multiple threads available (e.g. wasm) to simply implement an Executor that will be used by actors on that runtime, rather than having to somehow get that functionality into the enqueue function of each and every actor (though the mechanism how we "select default executor" would still be some configuration I suppose).

Not sure if I expressed the idea clearly enough, I'm hoping to put together a more detailed version of this at some point. It is important because it enables single threaded runtimes as well as event loop based runtimes to adopt actors without unnecessary actor executor hopping penalties which is of utmost importance for high performance libraries such as NIO.

anandabits · December 10, 2020, 3:20pm

I think you expressed it quite well. It looks like this might address some of the use cases where I thought I would need to use global actors. I'm looking forward to seeing this fleshed out, especially the bit about how an actor specifies its executor.

michelf · December 10, 2020, 4:29pm

I thought I had covered this by saying:

Because there's no suspension point allowed, interlock may only be used to call synchronous methods on the parent actor. It is also not possible to have both interlock and await in the same expression.

When saying you can only call synchronous methods on the parent actor, I meant you can't call async methods. Since only async methods can await, I wonder how it can end up awaiting on the child as you suggest.

This model can work. But I fear "it looks async but it just so happens it isn't" will end up training people to assume it isn't async when it says it is. await loses its meaning if it cries wolf too often for no reason. If there was a way to drop the await in such cases that'd make the model more interesting.

This might be good enough though if the goal is just to make things more optimized. But if your logic depends on await not awaiting, it makes the code contorted as you're relying on await lying to you.

If an actor has only async functions, then you can't call any of them with interlock.

michelf · December 11, 2020, 2:15am

I had to think a bit about this interpretation of my pitch. What actually could break the actor model? I don't think this pitch breaks actors as a model, but it all depends on the answer to a simple question:

Can an actor temporarily suspend the execution of partial tasks and still be an actor?

To me the answer is yes: an actor that sometimes blocks is still an actor. It may result in poor scheduling, but I don't think an actor has to be perfectly scheduled to be an actor.

Blocking the execution of partial tasks on the current actor is the only actual effect of interlock. The rest of the pitch is only consequences of that plus a set of rules to prevent deadlocks.

That we can call any non-async method of another actor, including those with inout parameters, is a consequence of the current actor no longer running partial tasks until the call has ended. The call can still be sent asynchronously to the other end, but it'll be synchronous from our end (because we're blocked), which is why inout works.

That we can't call async methods with interlock is a deadlock prevention rule. We could easily add unprotected_interlock to allow calling async methods while still blocking the current actor.

On the practical side, the ability for an actor to block itself in some circumstances allows smaller actors to compose well with each other, in contrast with big monolithic actors. Let's analyse the buyFromParent example from the pitch that describes an issue with the non-blocking implementation.

In the absence of a way to block execution of partial tasks on the actor, someone wanting to fix this will face two choices: merge the child actor so it lives entirely within its parent (it can then synchronously make a purchase), or implement a custom locking mechanism that'll most likely block the thread. The blocking-the-thread choice is probably the worst thing you can do (what if the parent uses the same thread?), but unfortunately it's also the easy fix. Merging everything in a single actor makes things less composable.

With this pitch, the easiest fix is to replace one await with interlock in a single place. Everywhere else the child actor can continue to use await and run concurrently with its parent actor. It's also good to know interlock won't cause a deadlock.

There's also the solution you hinted at: have both actors share a thread/queue so they always run serially. The first drawback is that everything will run serially for those two actors whereas with interlock you only force serialization for some operations that actually require it. The second drawback is you'll have to write await at those places where actually awaiting would be a bug. interlock expresses the intent of "no-awaiting-here" better than await will ever do, and the "no-awaiting-here" behavior is guaranteed.

This remains a valuable optimization technique to increase performance in certain situations. But I'd be wary of writing actor code that can break sporadically once await starts to actually mean await.

Will people then use this to abuse actors, treating them as some sort of mutex to protect shared state? They will, obviously!

It's very convenient to have a locking system compatible with actors which can't cause deadlocks and has some compiler support to protect against inappropriate sharing. You'd be a fool to not use those tools if it works for your use case.

I think it'd be a shame if because we're giving the name "actor" to our state isolation unit it couldn't be used in any other way to protect state in a concurrent system.

John_McCall · December 11, 2020, 2:59am

I've been toying with a closely-related idea of allowing an actor to declare an actor to which it delegates. The actor functions on the child would always run on the delegate actor's executor, and so they would always be able to make synchronous calls on the delegate. It's a way to allow a single logical actor to be split across several objects, most of which could come and go dynamically as the code sees fit.

The delegate-actor reference would have to be a non-optional strong reference for this to have reasonable semantics.

The main advantage of your "interlocked" actors is that different "child" actors can operate independently, but that seems to come with a lot of extra semantic problems and probably a lot of implementation complexity, when probably it's more sensible for closely-related actors like this to simply be scheduled on the same underlying executor.

michelf · December 12, 2020, 5:15pm

That's actually quite close to my pitch... I'll try to simplify it in that direction then.

We could simplify like this:

actor class AnActor {
   @interlockable let other: OtherActor
}

there's only one @interlockable property allowed
it must be set at initialization time (it's a let)
- this sets it to use the same executor as other (or one that'll run serially with it)
you use the keyword interlock to make a synchronous call to other

That sounds easier to implement.

It has the downside of serializing everything with the other actor, but...

We could add a flag to a PartialTask to tell whether it contains an interlock or not. PartialTasks with no interlock could be allowed to run concurrently as long as they belong to different actors. It'd be up to the executor to do something with that flag... or not and always run tasks serially.
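One way to read that flag is as a simple scheduling predicate. The sketch below is an assumption-laden illustration: `SketchPartialTask`, `actorID`, `containsInterlock`, and `mayRunConcurrently` are hypothetical names, and the rule is deliberately conservative (a task containing an interlock is never overlapped with a task of another actor on the shared executor).

```swift
// Hypothetical model of a partial task carrying the proposed flag.
struct SketchPartialTask {
    let actorID: Int            // which actor the task belongs to
    let containsInterlock: Bool // set by the compiler in this sketch's assumption
}

// Conservative rule: two tasks may overlap only if they belong to different
// actors and neither of them contains an interlock.
func mayRunConcurrently(_ a: SketchPartialTask, _ b: SketchPartialTask) -> Bool {
    guard a.actorID != b.actorID else { return false } // an actor never overlaps with itself
    return !a.containsInterlock && !b.containsInterlock
}
```

An executor that ignores the flag and always answers false would still be correct; it would simply run everything serially.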

John_McCall · December 12, 2020, 6:21pm

Well, if you think the “child” actors running concurrently is an essential aspect, then my design doesn’t really work for you. That’s okay! That’s why I’m still just toying with it: it feels like a cohesive idea that fits well with our ideas about data isolation, but I don’t have a good sense of whether it’s useful.

michelf · December 12, 2020, 7:58pm

I think the most essential aspect is that there's an indication in the code where two actors depend on this synchronous interaction (something different from async), and it should be validated by the compiler at compile time.

Once you have that, the compiler has all that's needed to tell the executor whether a partial task depends on this synchronous access and can run concurrently, so why not?

There's a tradeoff with allowing concurrency though (which I failed to mention in my pitch).

If you allow concurrency, the two actors must still exchange data with a proper data isolation protocol (ActorSendable or similar) even for synchronous calls. Whereas if the two actors always run serially, they can share whatever they want with no isolation constraints.

We could make that a choice in this system:

actor class ChildActor1 {
   @interlockable(concurrent) let other: OtherActor
}
actor class ChildActor2 {
   @interlockable(serial) let other: OtherActor
}

And with the latter the compiler would never create a partial task with the flag saying it can run concurrently. What you get in exchange is you don't have to make your arguments ActorSendable.

At the very least, it would be useful for refactoring a big monolithic actor into smaller ones without having to address all the issues in a single step. What I just described creates a four-step ladder:

everything in one big actor
smaller actors with some synchronous interlock (but still always executed serially)
smaller actors with some synchronous interlock and data isolation (some concurrency allowed)
independent asynchronous actors with no interlock (full concurrency allowed)

You can climb the ladder one step at a time and you can stop whenever things are good enough.