[Concurrency] Actors & actor isolation

BigSur · October 31, 2020, 10:08am

I feel that using actor as a single keyword, instead of actor class, is more intuitive, and it is also shorter and simpler to write.

actor itself is class based: a restricted, concurrency-oriented, isolated class that plays the Actor role. It's unnecessary to combine two consecutive keywords for one simple thing, especially since there is NO actor struct/enum at all.

Furthermore, we could also simplify the attribute-based @UIActor class annotation to a uiactor keyword.

So overall, transform the complicated actor class and @UIActor class into actor and uiactor respectively (likewise, there's NO @UIActor struct/enum either); actor for all background scenarios and uiactor for the UI main thread only.

Any thoughts..?

Pampel · October 31, 2020, 12:01pm

I guess this is the requirement around actor code not having suspension points between dependent reads and writes of the actor's state. E.g. this would be an unsafe version of the transfer function:

extension BankAccount {
  func transfer(amount: Double, to other: BankAccount) async throws {
    assert(amount > 0)
    
    if amount > balance {
      throw BankError.insufficientFunds
    }
    
    await other.deposit(amount: amount) // racy
    
    balance = balance - amount
  }
}

IANACE so I have no idea what's possible in terms of the compiler providing warnings or errors, or what patterns or structures can be used to encourage or enforce well formed code.
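One way to avoid the race in the transfer example above is to debit before the first suspension point, so the check and the write cannot be separated by an await. The following is a minimal sketch, assuming the `actor` syntax that later shipped in Swift 5.5 rather than the `actor class` spelling under discussion; the `deposit(amount:)` body and the initializer are assumptions, since the thread does not show them.

```swift
enum BankError: Error {
  case insufficientFunds
}

actor BankAccount {
  private(set) var balance: Double

  init(balance: Double) {
    self.balance = balance
  }

  // Synchronous, so it runs to completion without interleaving.
  func deposit(amount: Double) {
    precondition(amount > 0)
    balance += amount
  }

  func transfer(amount: Double, to other: BankAccount) async throws {
    precondition(amount > 0)
    guard amount <= balance else {
      throw BankError.insufficientFunds
    }
    // Check and debit happen back to back, with no await in between.
    balance -= amount
    // Suspension point: other calls may run on this actor here, but they
    // already see the reduced balance.
    await other.deposit(amount: amount)
  }
}
```

The ordering only works this simply because `deposit(amount:)` cannot fail; if it could throw, the debit would need to be rolled back in the error path.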

gparker42 · October 31, 2020, 1:50pm

Absolutely. As it stands the proposal is over-promising and under-delivering on data safety.

This proposal promises that actors protect their state against data races. It omits the disclaimer that "data race" is narrowly-defined: a single access to a single field is protected from concurrent access to that same field. These actors provide only weak protection against other data-corrupting race conditions because of the threat of re-entrancy at any await.

The proposal then uses a bank account example. It again fails to mention that the traditional bank account bugs go beyond data races narrowly-defined. It almost completely glosses over the other subtle things the code is doing to be safe from race conditions.

The BankAccount.transfer(amount:, to:) seems to be the only place where race conditions other than low-level data races are mentioned: a comment indirectly mentions the requirement that there be no suspension points between the statements that check and edit the balance. Even this comment is unclear: it doesn't say "await" anywhere and doesn't describe why there can't be any.

The BankAccount.close(distributingTo:) example is also more fragile than it lets on. Why does it use Task.runDetached? Because if it didn't there would be a race due to the await; it would then be possible to double-spend by closing an account and simultaneously withdrawing from it.

IMO actors should not be re-entrant by default. Fewer deadlocks but more data-corrupting race conditions is a bad trade-off. If you do allow actor re-entrancy (by default or not), you also need mechanisms to re-establish data safety other than "don't use await".
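The re-entrancy hazard described above can be reproduced deterministically. This is a sketch under assumptions: it uses the `actor` syntax that later shipped in Swift 5.5, and `ReentrantAccount`, `drain()` and the `authorize` hook are hypothetical names standing in for any awaited call that ends up calling back into the same actor.

```swift
actor ReentrantAccount {
  private(set) var balance = 100

  func drain() {
    balance = 0
  }

  // `authorize` is a hypothetical async hook standing in for any awaited call.
  func withdraw(_ amount: Int, authorize: @Sendable () async -> Void) async -> Bool {
    guard balance >= amount else { return false }
    // Suspension point: the actor is free to run other calls here.
    await authorize()
    // The guard above may no longer hold.
    balance -= amount
    return true
  }
}

func reentrancyDemo() async -> (succeeded: Bool, balance: Int) {
  let account = ReentrantAccount()
  let succeeded = await account.withdraw(80) {
    // Runs while withdraw(_:authorize:) is suspended.
    await account.drain()
  }
  return (succeeded, await account.balance)
}
```

No field is ever accessed concurrently, so there is no data race in the narrow sense, yet the withdrawal succeeds and leaves the balance negative. With a non-re-entrant actor the same call would deadlock instead, which is the trade-off being debated.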

Lantua · October 31, 2020, 1:52pm

It would actually be that a single sync function is protected from running concurrently, which is still a pretty narrow margin.

michelf · October 31, 2020, 3:33pm

I'm not too sure whether being re-entrant by default is good or bad. It's good for reading operations, and it's an easy pitfall for any write operation. Perhaps writing to the actor's memory (when mixed with await) should require some sort of explicit choice between "exclusive" vs. "concurrent" (or "interleaved"), but that'd be quite a burden.

frameworklabs · October 31, 2020, 4:59pm

I also think that actors should not be re-entrant by default. That would be the safer behavior out of the box. Of course, now you have the problems with deadlocks. But I see it as somewhat equivalent to the issue of retain cycles - devs have to think about the dependencies in their code, which is always a helpful thing.

Chris_Lattner3 · October 31, 2020, 5:16pm

I'm also a bit concerned about this - I think we'll need implementation and usage experience in practice to see if it works. However, there are good reasons why the proposed direction is theoretically better - not only does it define away deadlocks, it composes to large scale designs better. You won't run into problems where API evolution introduces deadlocks, and the communication pattern isn't implicitly a part of the API.

This composability seems like a real win, but I agree that this is a big bet in the current proposal. In my opinion, this is the most researchy/unproven part of the actor model proposal, but could be a huge breakthrough if it works well.

-Chris

John_McCall · October 31, 2020, 6:02pm

Introducing a new actor does not, in itself, add any new concurrency. In explicit work-queue systems like Dispatch, as well as many other actor systems, adding a queue/actor can have this effect because the default interaction with a queue is to add something to it to run asynchronously, which creates a new path of computation. But in Swift actors don't normally initiate work themselves, they simply act in response to requests by tasks, and tasks wait for their requests to complete by default. So the places that can "go wide" are the places that can create new tasks, not the places that can create actors.

This also creates an extremely important optimization opportunity, because the most basic operation in scheduling is not an always-asynchronous "dispatch" but a synchronous-to-the-task "switch". My intention is that the scheduler will attempt to follow the task across actor boundaries. That is, if a task calls a function on a different executor, and both the new and current executors are cooperative, then the scheduler will attempt to synchronously switch to processing the new executor. Only if there's contention — if the new executor is an already-running actor — will we actually suspend computation.

More broadly, we also hope to avoid some of the thread-explosion problems that can plague systems like Dispatch when many tasks are runnable at once. Since Swift will not need to block threads except on currently-running work (e.g. when waiting on a mutex), the scheduler does not need to spawn new threads in the hopes of unblocking the queue. We do not intend to offer or support any primitives (like traditional condition variables) that permit this kind of thread-blocking on unstructured future work; if you use them from an actor, you will be inviting deadlock.

tclementdev · October 31, 2020, 6:06pm

I've been there. We had written a heavily async program with actor-like entities (many classes with their own internal private queues used in every method). We ran into those re-entrancy/interleaving issues. Since everything was async in our program, our only solution was to suspend the internal queues while running our functions. We actually wrote a new queue interface that provided a completion handler to the work items so they could later indicate completion and resume the queue. We ended up using that in a lot of places because there was no way to be sure when interleaved calls might happen. Deadlocks were also a problem to some extent; I remember we had to implement timers to try and detect operations that would never be completed.

Which is why I think we're missing the real problem here: actors should probably be used very carefully at the right places and not others, not everything should be made into an actor and into an async interface. The biggest issue is that there's no easy way to tell how to do this other than careful considerations of the program concurrency. I'm afraid actors will not encourage developers to do the right thing here, quite the opposite. They are shiny objects that developers will want to use everywhere.

Lantua · October 31, 2020, 6:16pm

I think that sync function Actor is a pretty good encapsulation. That's where the intuitive guarantee lies (there will be no overlap, and each sync function will run to completion without interleaves).

What if we allow only sync functions on actors, but they must be called asynchronously from outside of that actor? So the sync part stays the same.

actor class X {
  func foo() { }
}

extension X {
  func bar() {
    foo() // ok
  }
}

Async methods become invalid for actor classes, and calling from outside of actor-isolated closures/functions makes the function async:

func foo(x: X) async {
  await x.foo()
}

To handle closure, we might need to split non-escaping into three types:

(unknown) () -> ()
@actorIsolated () -> ()
@actorIndependent () -> ()

with @actorIndependent being a subtype of @actorIsolated, and the actor-isolated are restricted to not overlap each other. The tricky part is how (unknown isolation) interacts with actor-related closures. The current behaviour seems to be close to allowing conversion to-and-from (unknown) closures, which trades safety for convenience (there will be some cases that are actually not actor isolated).

Pampel · October 31, 2020, 6:23pm

I'm not sure this would work. Given that a call to an actor has to be async, wouldn't that mean actors couldn't call methods on other actors?

Enforcing no re-entrancy would solve this problem intuitively without constraints on synchronicity within an actor. Of course, that would have other issues.

Lantua · October 31, 2020, 6:34pm

That is true. We need async on Actors for composability. It could become a paradigm that uses all sync functions as atomic functions (so back to status quo).

It would be interesting if actors can have container-child relationships, where the container actor can call children's functions. That'd be quite close to true encapsulation... but that's quite a tangent from here.

It'll most likely be intuitive, but I think it's also quite close to simply synchronizing everything, which explains the intuition.

tclementdev · October 31, 2020, 6:49pm

Yet actors force interfaces to go full async, which will have major consequences on program designs as callers awaiting will need to be turned async themselves. I've been there before, I've seen async slowly taking over my program by such contamination, but in the end it made little sense. I'm not even sure actors can be made to solve this problem, because the way I see it, it's a program design problem that needs the programmer to pause and think about what makes sense, and actors are not doing that; they're trying to provide a solution that doesn't involve the developer thinking about the design of the program. I hope I'm wrong and you can pull it off, I'm just dubious.

Chris_Lattner3 · October 31, 2020, 6:50pm

I agree that this is a critical problem and that solving it is a huge opportunity and will be a huge contribution for Swift concurrency. Unfortunately, "phase 1" of the plan here doesn't solve this. It is introducing a memory unsafe actor model akin to Akka actors. This is a very useful step in that it provides a design pattern to help structure concurrent code, but is not far enough IMO. Also, taking a half step here will introduce serious problems with getting to the memory isolation and race safety.

As I mentioned in the roadmap thread, we can pretty easily fix this. I will try to get an outline of this together today or tomorrow to share with the community. UPDATE: it's in this thread.

Here is a detailed review of this draft of the proposal. I include a few large topics that need detailed discussion on their own, then a number of smaller points at the end. I'm really thrilled to see the progress in this area!

`actor class` vs `actor`

Much of the discussion upthread is about actor class vs actor. I tend to agree with people that actors are primal enough to be worth burning a keyword on, here is some rationale:

The documentation and diagnostics will inevitably all talk about "actors" and not "actor classes", so it makes sense to align the language with this.
Actors can't subclass classes and vice versa; they are a "different thing".
They are "another kind" of reference type in Swift (along with classes, functions, unsafe pointers, etc).

At the very least, I would recommend capturing some of the tradeoffs in alternatives considered section. On the flip side, calling them actors means we would have to survey all of the places we use classes and reconsider them, e.g. class methods, how to rationalize actors subclassing NSObject (see below), etc. I think it is reasonable for actors to not have static members and class methods though, as the whole idea is to get rid of global state.

Separating access control from async for cross-actor reference validity checks

"Synchronous functions in Swift are not amenable to being placed on a queue to be executed later. Therefore, synchronous instance methods of actor classes are actor-isolated and, therefore, not available from outside the actor instance." ... "It should be noted that actor isolation adds a new dimension, separate from access control, to the decision making process whether or not one is allowed to invoke a specific function on an actor. " <== Please let's not do this!

I mentioned this to John previously, but it seems better to keep access control orthogonal to cross-actor reference issues. The proposed design will end up producing a lot of async wrappers for sync functions just to allow those sync functions to be called across actor boundaries. There is no need for this boilerplate:

actor class BankAccount {
   // ... state ...

   // This method is useful both within and from outside the actor.
   public func computeThing() -> Int {
      ...
   }

   // I need to manually write a wrapper, and now I have a naming problem.  :-(
   public func computeThingForOthersToUse() async -> Int {
     return computeThing()
   }
}

Instead, I'd recommend making the model be that cross-actor calls are defended by access control like normal, and a cross-actor call to a sync function is implicitly async (thus requiring an await at the call site):

   // some other actor can call the sync function, because it is public!
   await use(myBankAccount.computeThing())

The compiler would synthesize the thunk just like it does reabstraction thunks. This provides a more consistent programming model (not making our access control situation more complicated) and eliminates a significant source of boilerplate. Similarly (as part of the base async proposal), it should be possible to fulfill an async requirement in a protocol with a normal sync method implementation.

This realigns the async modifier on actor methods to be about the behavior of the method, not about whether it can be called by other actors, which is what access control is about.
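To make the suggestion concrete, here is a minimal sketch of a synchronous actor method that is called without `await` inside the actor and with `await` from outside, and that also fulfills an async protocol requirement. It is written in the `actor` syntax that later shipped in Swift 5.5, where cross-actor calls to synchronous methods behave this way; `ThingComputing`, `Vault`, `describe()` and `useVault()` are hypothetical names, not taken from the proposal.

```swift
protocol ThingComputing {
  func computeThing() async -> Int
}

actor Vault: ThingComputing {
  private var total = 21

  // Plain synchronous method; it also satisfies the async requirement above.
  func computeThing() -> Int {
    total * 2
  }

  func describe() -> String {
    // Inside the actor: an ordinary synchronous call, no await.
    "thing = \(computeThing())"
  }
}

func useVault() async -> (Int, String) {
  let vault = Vault()
  // Outside the actor: the same methods are implicitly async at the call site.
  let thing = await vault.computeThing()
  let text = await vault.describe()
  return (thing, text)
}
```

No hand-written async wrapper is needed, and intra-actor callers never pay for an `await` they do not need.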

Your deposit(amount:) example is a great illustration of the problem here: there is nothing about its behavior or implementation that requires it to be internally suspendable. Declaring it as async means that any intra-actor calls will have to await it for no reason.

Furthermore, doing this removes a significant amount of complexity elsewhere in the proposal: access to cross-actor state (whether it be let or var) is gated simply by access control. Any cross-actor access would be correctly async, and synchronization in the most trivial cases allowed by the proposal would be optimized out by the compiler using the as-if rule. This keeps the programmer model simple and consistent.

More related points in the "let" section next:

Cross actor `let` property access

I am very concerned about allowing direct cross-actor access to let properties, because we don't have the ability to support computed let properties. Allowing this will harm our API evolution of properties: we currently allow things to freely move from let properties to vars with public getters, but this will break that. I don't think that "let-ness" is resilient across library boundaries at all right now (for good reason).

Furthermore, as you mention, reference types completely break the actor memory safety guarantees here, the entire stated purpose of this proposal. :-) You don't want cross-actor uses of this thing to have access to data you're mutating within the reference type. You need something like the reference type proposal (which I'm hoping to work on) to gate this.

I feel like you're trying to walk an awkward line here, and I don't think it will work well: actors are supposed to be islands that can only be "talked to" asynchronously. The "let's and @actorIndependent things can be talked to synchronously" breaks the contract and muddles the water.

Overall, I would recommend subsetting this out of the initial proposal and discussing it as a later extension. It isn't core to the programming model, and introduces a lot of issues.

Global actors

On global actors in the detailed design section, I don't understand the writing and what is being conveyed here. There are both small and large examples of this. Some larger questions:

What does "The custom attribute type may be generic. " mean? Does this mean that @globalActor struct X<T> { is allowed? If so, the semantics are that there is one instance of the actor for each dynamic instantiation of the type T, right? I think that this is required because shared will be instanced multiple times.

This is a very powerful capability: is there a use case for it? If not, I'd recommend subsetting it out of the initial version of the proposal, it can always be added later.
I don't understand what this means: "Two global actor attributes identify the same global actor if they identify the same type." Don't they have to be lexically identical attributes if that is the case?
There are some implied semantics of a declaration being marked as a global actor, but I'm not sure what they are.
The whole discussion of "propagation" of the global actor attribute is vague and I find it to be confusing.

I would recommend splitting this whole discussion of global actors out to its own sub-proposal. The issues involved are complicated and could use its own motivation, examples, and exploration to develop it, and this is additive on top of the base actor model. To be clear, I'm not saying that we should adopt actors without solving this proposal, I just think that it would be easiest to review and discuss it as a separate thing.
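For readers following the discussion, this is a minimal sketch of a non-generic global actor in the form that later shipped in Swift 5.5, where the attribute type supplies a `shared` instance and declarations opt in by attribute. `LedgerActor`, `Ledger` and `globalActorDemo()` are hypothetical names, and the sketch does not address the generic case questioned above.

```swift
@globalActor
actor LedgerActor {
  // The single instance that every @LedgerActor declaration is isolated to.
  static let shared = LedgerActor()
}

// All members of this class are isolated to LedgerActor.shared.
@LedgerActor
final class Ledger {
  private var entries: [String] = []

  func record(_ entry: String) -> Int {
    entries.append(entry)
    return entries.count
  }
}

func globalActorDemo() async -> Int {
  // Code outside the global actor has to await to reach isolated members.
  let ledger = await Ledger()
  _ = await ledger.record("open account")
  return await ledger.record("deposit")
}
```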

Other

Some more minor comments and questions:

Writing/framing nitpick: "The primary difference is that actor classes protect their state from data races." --> I don't think this is the primary difference between actors and classes. The primary difference is that actors have a task/queue associated with them, and they are used as a design pattern in concurrent programs. Actors are not guaranteed to protect state, e.g. in the face of unsafe pointers.
The behavior with escaping closures and actor self makes sense to me.
The "Escaping reference types" section is really troubling as I mentioned at the top. I don't think that this proposal can stand alone without a solution to this problem.
Actor isolation also needs a solution for global state like global variables and static members of classes. I don't think the "proposed solution" section or "detailed design" touches on this at all.
Another writing issue: The discussion of @globalActor and @UIActor in the "proposed solution" section is too vague for me to understand it.
"As a special exception described in the complementary proposal Concurrency Interoperability with Objective-C, an actor class may inherit from NSObject." --> It isn't clear to me why this is needed. Isn't it enough to mark the actor as @objc? I thought all @objc things already inherit from NSObject?
As I mentioned above, I think that the way you are conflating access control with cross-actor references is confusing and problematic. This @actorIndependent attribute is another example of this. I think this whole topic needs further consideration. Flipping the behavior as mentioned above seems like it would simplify the proposal significantly, by relying on our (already overly powerful) existing access control mechanisms.
Shouldn't the closure parameter to run be @escaping? If not, you can trivially violate the actor safety properties due to the self capture rules described earlier in the proposal.
On enqueue(partialTask:): I love that this is user definable. Why can't it be marked final? This seems like it should only be defined on root actors though. I'd love to see a longer exploration of this topic on its own, because just this single method has a huge set of tradeoffs that are worth exploring.
"Non-actor classes can conform to the Actor protocol, and are not subject to the restrictions above. This allows existing classes to work with some Actor-specific APIs, but does not bring any of the advantages of actor classes (e.g., actor isolation) to them." Ok, out of curiosity, why is this important? I can see the utility of having an actor protocol that unifies all the actors, but I don't see why it is useful for normal classes to conform. I also don't see any harm, just curious what the utility is.
As I mentioned a couple times above, I would rather not have @actorIndependent at all, I'd rather that cross-actor accesses be gated by normal access control, and any cross-actor reference just being async. This seems like it will lead to a simpler model, less boilerplate, and less language complexity.
I also don't think there is any great need to have actors be able to provide non-async protocol requirements. This seems directly counter to the approach of actors. Such a need can be handled with simple struct wrappers, which seems like it would factor the language complexity better.
I don't understand what is being conveyed in the "Overrides" section. An example would be very helpful.

Overall, I'm very very excited to see the progress on this. This is going to transform the face of Swift programming at large!

-Chris

JJJ · October 31, 2020, 6:54pm

My first thought was that developers can manage this interleaving just fine. And it could actually be a nice challenge. But I fear bugs due to interleaving can be harder to detect and debug than deadlocks.

func f() async {
  guard ...something depending on the actor's state
  ...work relying on the guard
  g() // Call to ordinary sync function
  ...work relying on the guard
}

then some future maintainer makes g() async and fixes all call sites in a big sweep, so this becomes

func f() async {
  guard ...something depending on the actor's state
  ...work relying on the guard
  await g() // Call to async function
  ...work relying on the guard **ASYNC FAILURE**
}

This potential bug can be elusive, and potentially hard to debug.

michelf · October 31, 2020, 9:06pm

I don't think this is worse than a completion callback though. Consider this:

func f() async {
  guard ...something depending on the actor's state
  ...work relying on the guard
  g() { // Call to function doing async work, calling closure on completion
    ...work relying on the guard **ASYNC FAILURE**
  }
}

With the callback you don't really know from looking at the code whether it is synchronous or not. With await it's clear there's a suspension point there. So in theory, if you can get accustomed to await, things become easier to read than what we have now.

I agree this potential bug can be elusive and potentially hard to debug. It's not really a new thing however. The function should recheck its invariants after each await, or each callback, if it still depends on them.

There's no solution to this if you want things to stay asynchronous. If you block other tasks from running on the current actor, g() simply becomes a synchronous call.
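The advice to recheck invariants after each await could look like the sketch below. It assumes the `actor` syntax that later shipped in Swift 5.5; `Inventory`, `remove(_:)`, `ship(_:confirm:)` and the `confirm` hook are hypothetical names, with `confirm` standing in for any awaited call such as `g()`.

```swift
actor Inventory {
  private(set) var stock: Int

  init(stock: Int) {
    self.stock = stock
  }

  func remove(_ count: Int) {
    stock = max(0, stock - count)
  }

  // `confirm` is a hypothetical async hook standing in for any awaited call.
  func ship(_ count: Int, confirm: @Sendable () async -> Void) async -> Bool {
    guard stock >= count else { return false }
    await confirm()
    // Other calls may have run on this actor during the await, so the
    // invariant is checked again before it is relied on.
    guard stock >= count else { return false }
    stock -= count
    return true
  }
}

func recheckDemo() async -> (shipped: Bool, stock: Int) {
  let inventory = Inventory(stock: 5)
  let shipped = await inventory.ship(3) {
    // Interleaves with ship(_:confirm:) while it is suspended.
    await inventory.remove(4)
  }
  return (shipped, await inventory.stock)
}
```

The second guard turns the interleaving into a clean failure instead of a negative stock count, at the cost of the caller having to handle a request that passed the first check and still failed.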

JJJ · October 31, 2020, 9:46pm

Yes, g() becomes a synchronous call as seen from the actor; not otherwise. And that's the whole point of async/await in my opinion: you map concurrent code to a simplified sequential view where it's easy for humans to reason about what happens. The actor isolation proposed here is sort of the missing link that makes async/await work as it should.

It would be awesome if both modes were supported initially, though. Either by having a more verbose way to introduce a suspension point where reentrancy is allowed, or by somehow marking an atomic block of code where async calls will not allow reentrancy.
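An opt-in atomic block of this kind can be approximated in library code with an asynchronous lock built on continuations. The sketch below is one possible shape, not a proposed language feature; it assumes the `actor` syntax and `withCheckedContinuation` API that later shipped in Swift 5.5, and `AsyncGate`, `Tally` and `gateDemo()` are hypothetical names.

```swift
// A FIFO async lock: waiters are suspended, no thread is blocked.
actor AsyncGate {
  private var isHeld = false
  private var waiters: [CheckedContinuation<Void, Never>] = []

  func acquire() async {
    if !isHeld {
      isHeld = true
      return
    }
    await withCheckedContinuation { continuation in
      waiters.append(continuation)
    }
  }

  func release() {
    if waiters.isEmpty {
      isHeld = false
    } else {
      // Hand the gate directly to the next waiter; it stays held.
      waiters.removeFirst().resume()
    }
  }
}

actor Tally {
  private let gate = AsyncGate()
  private(set) var count = 0

  func slowIncrement() async {
    await gate.acquire()
    let seen = count
    await Task.yield()  // suspension point inside the critical section
    count = seen + 1
    await gate.release()
  }
}

func gateDemo() async -> Int {
  let tally = Tally()
  await withTaskGroup(of: Void.self) { group in
    for _ in 0..<50 {
      group.addTask { await tally.slowIncrement() }
    }
  }
  return await tally.count
}
```

The gate restores the sequential view across the await, and it brings the deadlock risk back with it: if the guarded section ever awaits something that calls `slowIncrement()` again, that call waits forever.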

Karl · November 1, 2020, 4:59am

FYI class constraints were merged with AnyObject in SE-0156, and AnyObject is now the preferred spelling.

frameworklabs · November 1, 2020, 11:04am

Thinking about this interleaving issue at suspension points - maybe an await could throw, or somehow indicate to the caller, when an interleaved partial task has changed shared state, so that the caller can refetch state or change its computation?

Lantua · November 1, 2020, 2:15pm

That sounds complicated. If we need to check for mutation after every partial task, we might as well assume that they always mutate. And there's no distinction between an incomplete mutation (that needs guarding) and a completed one, is there?

It may be useful to have an explicit barrier, but that is no different from synchronously waiting for the async task, which should already be provided in some form.

+1. The touchEnded example also has a similar problem:

@UIActor func touchEnded(...) { ... }

@UIActor func touchEndedAsync(...) async {
  touchEnded()
}

Having both versions is of little use. You have essentially only one choice in any scenario (different actor, same sync actor, etc). It's all busy work here even if we use the same name* for both sync and async functions. While touchEnded is an event handler and probably isn't called directly, UI-related functions like updateUserUI are also a good case.

* Overloading doesn't help here. The compiler could misinterpret the call as calling the async version, which it currently does, causing an infinite loop.