[Concurrency] Actors & actor isolation

Douglas_Gregor · October 30, 2020, 5:33pm

Hello all,

One of the main parts of the Swift Concurrency model is actors, which provide a mechanism for isolating state in concurrent programs to eliminate data races. This proposal introduces actors and actor isolation into Swift. It is part of the Swift Concurrency effort detailed in the roadmap thread.

This post has a subset of the proposal. The latest, full version of the proposal is always available here.

Doug

Actors

Introduction

The actor model involves entities called actors. Each actor can perform local computation based on its own state, send messages to other actors, and act on messages received from other actors. Actors run independently and cannot access the state of other actors, which makes the model a powerful abstraction for managing concurrency in applications. The actor model has been implemented in a number of programming languages, such as Erlang and Pony, as well as various libraries like Akka (on the JVM) and Orleans (on the .NET CLR).

This proposal introduces a design for actors in Swift, providing a model for building concurrent programs that are simple to reason about and are safer from data races.

Swift-evolution thread: Discussion thread topic for that proposal

Motivation

One of the more difficult problems in developing concurrent programs is dealing with data races. A data race occurs when the same data in memory is accessed by two concurrently-executing threads, at least one of which is writing to that memory. When this happens, the program may behave erratically, including spurious crashes or program errors due to corrupted internal state.

Data races are notoriously hard to reproduce and debug, because they often depend on two threads getting scheduled in a particular way.
Tools such as ThreadSanitizer help, but they are necessarily reactive (as opposed to proactive): they help find existing bugs, but cannot help prevent them.

Actors provide a model for building concurrent programs that are free of data races. They do so through data isolation: each actor protects its own instance data, ensuring that only a single thread will access that data at a given time. Actors shift the way of thinking about concurrency from raw threading to actors and put focus on actors "owning" their local state. This proposal provides a basic isolation model that protects the value-type state of an actor from data races. A full actor isolation model, which protects other state (such as reference types), is left as future work.

Proposed solution

Actor classes

This proposal introduces actor classes into Swift. An actor class is a form of class that protects access to its mutable state, and is introduced with "actor class":

actor class BankAccount {
  private let ownerName: String
  private var balance: Double
}

Actor classes behave like classes in most respects: they can inherit (from other actor classes) and have methods, properties, and subscripts. They can be extended, conform to protocols, be generic, and be used with generics.

The primary difference is that actor classes protect their state from data races. This is enforced statically by the Swift compiler through a set of limitations on the way in which actors and their members can be used, collectively called actor isolation.
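A minimal sketch of those points, written with the `actor` keyword that later shipped in Swift 5.5 in place of the pitched `actor class`, and assuming Swift 5.7+ so that top-level `await` is allowed. The `Account` protocol and the `depositToAll(_:amount:)` helper are hypothetical names used only for illustration: the actor conforms to a protocol and is passed to a generic function like any other type, while its mutable `balance` is only ever touched from code running on the actor.

```swift
// Hypothetical protocol with async requirements, so an actor can conform to it.
protocol Account {
    func deposit(amount: Double) async
    func currentBalance() async -> Double
}

// `actor` is the spelling that shipped in Swift 5.5 for the pitched `actor class`.
actor BankAccount: Account {
    let ownerName: String
    private var balance: Double

    init(ownerName: String, balance: Double) {
        self.ownerName = ownerName
        self.balance = balance
    }

    // Actor-isolated: runs on this actor, so it may mutate `balance` directly.
    func deposit(amount: Double) async {
        balance = balance + amount
    }

    func currentBalance() async -> Double {
        balance
    }
}

// Hypothetical generic helper: actors can be used as generic arguments.
func depositToAll<A: Account>(_ accounts: [A], amount: Double) async {
    for account in accounts {
        await account.deposit(amount: amount)
    }
}

let alice = BankAccount(ownerName: "Alice", balance: 10)
let bob = BankAccount(ownerName: "Bob", balance: 20)
await depositToAll([alice, bob], amount: 5)
```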

Actor isolation

Actor isolation is how actors protect their mutable state. For actor classes, the primary mechanism for this protection is by only allowing their stored instance properties to be accessed directly on self. For example, here is a method that attempts to transfer money from one account to another:

extension BankAccount {
  enum BankError: Error {
    case insufficientFunds
  }
  
  func transfer(amount: Double, to other: BankAccount) throws {
    if amount > balance {
      throw BankError.insufficientFunds
    }

    print("Transferring \(amount) from \(ownerName) to \(other.ownerName)")

    balance = balance - amount
    other.balance = other.balance + amount  // error: actor-isolated property 'balance' can only be referenced on 'self'
  }
}

If BankAccount were a normal class, the transfer(amount:to:) method would be well-formed, but would be subject to data races in concurrent code without an external locking mechanism. With actor classes, the attempt to reference other.balance triggers a compiler error, because balance may only be referenced on self.

As noted in the error message, balance is actor-isolated, meaning that it can only be accessed from within the specific actor it is tied to or "isolated by". In this case, it's the instance of BankAccount referenced by self. Stored properties, computed properties, subscripts, and synchronous instance methods (like transfer(amount:to:)) in an actor class are all actor-isolated by default.

On the other hand, the reference to other.ownerName is allowed, because ownerName is immutable (defined by let). Once initialized, it is never written, so there can be no data races in accessing it. ownerName is called actor-independent, because it can be freely used from any actor. Constants introduced with let are actor-independent by default; there is also an attribute @actorIndependent (described in a later section) to specify that a particular declaration is actor-independent.
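To make the `let`/`var` split concrete, here is a small sketch, again assuming the shipped `actor` spelling and Swift 5.7+ top-level `await`. The `currentBalance()` accessor is a hypothetical addition. Note that in released Swift the await-free read of a `let` is permitted for `Sendable` types referenced from within the same module; that is a detail of the shipped design, not of this pitch.

```swift
actor BankAccount {
    // Immutable, value-typed state: never written after init, so no race is possible.
    let ownerName: String
    // Mutable state: actor-isolated.
    private var balance: Double

    init(ownerName: String, balance: Double) {
        self.ownerName = ownerName
        self.balance = balance
    }

    func currentBalance() async -> Double {
        balance
    }
}

let account = BankAccount(ownerName: "Alice", balance: 100)
let name = account.ownerName                // no `await`: immutable `let`
let funds = await account.currentBalance()  // mutable state: go through the actor
print(name, funds)                          // prints "Alice 100.0"
```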

Note: Constants defined by let are only truly immutable when the type is a value type or some kind of immutable reference type. A let that refers to a mutable reference type (such as a non-actor class type) would be unsafe based on the rules discussed so far. These issues are discussed in a later section on "Escaping reference types".

Compile-time actor-isolation checking, as shown above, ensures that code outside of the actor does not interfere with the actor's mutable state.

Each asynchronous function invocation on an actor is turned into a partial task representing that invocation, which is enqueued on the actor's queue. This queue, along with an exclusive task executor bound to the actor, functions as a synchronization boundary between the actor and any of its external callers. For example, if we wanted to make a deposit to a given bank account, account, we could make a call to a method deposit(amount:), and that call would be placed on the queue. The executor would pull tasks from the queue one by one, ensuring that an actor is never running concurrently on multiple threads, and would eventually process the deposit.
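The serialization described above can be observed with a sketch like the following, which uses the shipped `actor` keyword and the standard `withTaskGroup(of:body:)` API (Swift 5.7+ assumed for top-level `await`; `depositConcurrently(times:)` is a hypothetical helper). A thousand child tasks call deposit(amount:) at once; because each call is enqueued on the actor and executed one at a time, no update is lost.

```swift
actor BankAccount {
    private var balance: Double = 0

    func deposit(amount: Double) async {
        precondition(amount >= 0)
        balance = balance + amount
    }

    func currentBalance() async -> Double {
        balance
    }
}

// Issue many deposits from concurrently running child tasks. Each call is
// enqueued on the actor and runs exclusively, so the read-modify-write of
// `balance` never interleaves with another deposit.
func depositConcurrently(times: Int) async -> Double {
    let account = BankAccount()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<times {
            group.addTask {
                await account.deposit(amount: 1)
            }
        }
    }
    return await account.currentBalance()
}

let total = await depositConcurrently(times: 1_000)
print(total) // prints "1000.0"
```

With a plain class and no lock, the same loop could lose updates when two deposits read the old balance at the same time.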

Synchronous functions in Swift are not amenable to being placed on a queue to be executed later. Therefore, synchronous instance methods of actor classes are actor-isolated and not available from outside the actor instance. For example:

extension BankAccount {
  func depositSynchronously(amount: Double) {
    assert(amount >= 0)
    balance = balance + amount
  }
}

func printMoney(accounts: [BankAccount], amount: Double) {
  for account in accounts {
    account.depositSynchronously(amount: amount) // error: actor-isolated instance method 'depositSynchronously(amount:)' can only be referenced inside the actor
  }
}

It should be noted that actor isolation adds a new dimension, separate from access control, to the question of whether one is allowed to invoke a specific function on an actor. Specifically, synchronous functions may only be invoked by the specific actor instance itself, and not even by any other instance of the same actor class.

All interactions with an actor (other than the special cased access to constants) must be performed asynchronously (semantically one may think about this as the actor model's messaging to and from the actor). Asynchronous functions provide a mechanism that is suitable for describing such operations, and are explained in depth in the complementary async/await proposal. We can make the deposit(amount:) instance method async, and thereby make it accessible to other actors (as well as non-actor code):

extension BankAccount {
  func deposit(amount: Double) async {
    assert(amount >= 0)
    balance = balance + amount
  }
}

Now, the call to this method (which now must be adorned with await) is well-formed:

await account.deposit(amount: amount)

Semantically, the call to deposit(amount:) is placed on the queue for the actor account, so that it will execute on that actor. If that actor is busy executing a task, then the caller will be suspended until the actor is available, so that other work can continue. See the section on asynchronous calls in the async/await proposal for more detail on the calling sequence.

Rationale: by only allowing asynchronous instance methods of actor classes to be invoked from outside the actor, we ensure that all synchronous methods are already inside the actor when they are called. This eliminates the need for any queuing or synchronization within the synchronous code, making such code more efficient and simpler to write.
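A sketch of that split, using the shipped `actor` spelling (Swift 5.7+ for top-level `await`; `currentBalance()` is a hypothetical accessor). The synchronous helper contains no locking because, under the pitch's rules, it is only reachable from code already running on the actor; outside code goes through the asynchronous deposit(amount:).

```swift
actor BankAccount {
    private var balance: Double = 0

    // Synchronous and actor-isolated: only called from code already on this
    // actor, so it needs no queueing or locking of its own.
    func depositSynchronously(amount: Double) {
        precondition(amount >= 0)
        balance = balance + amount
    }

    // Asynchronous entry point for other actors and non-actor code.
    func deposit(amount: Double) async {
        depositSynchronously(amount: amount)   // already on the actor: plain call
    }

    func currentBalance() async -> Double {
        balance
    }
}

func printMoney(accounts: [BankAccount], amount: Double) async {
    for account in accounts {
        await account.deposit(amount: amount)  // enqueued on each account's actor
    }
}

let accounts = [BankAccount(), BankAccount()]
await printMoney(accounts: accounts, amount: 25)
```

Released Swift later relaxed the rule so that outside code may also call depositSynchronously(amount:) by writing `await`, which hops onto the actor first (the "implicit hop" idea raised later in this thread); the sketch sticks to the pitch's stricter rule.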

We can now properly implement a transfer of funds from one account to another:

extension BankAccount {
  func transfer(amount: Double, to other: BankAccount) async throws {
    assert(amount > 0)
    
    if amount > balance {
      throw BankError.insufficientFunds
    }

    print("Transferring \(amount) from \(ownerName) to \(other.ownerName)")

    // Safe: this operation is the only one that has access to the actor's local
    // state right now, and there have not been any suspension points between
    // the place where we checked for sufficient funds and here.
    balance = balance - amount
    
    // Safe: the deposit operation is queued on the `other` actor, at which 
    // point it will update the other account's balance.    
    await other.deposit(amount: amount)
  }
}
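For reference, a self-contained version of the transfer above, written with the shipped `actor` keyword and assuming Swift 5.7+ top-level `await`. BankError is moved to the top level and currentBalance() is a hypothetical accessor for checking results. The only suspension point in transfer(amount:to:) is the final deposit on the other actor, so the funds check and the debit cannot be interleaved with another task on the same account.

```swift
enum BankError: Error {
    case insufficientFunds
}

actor BankAccount {
    let ownerName: String
    private var balance: Double

    init(ownerName: String, balance: Double) {
        self.ownerName = ownerName
        self.balance = balance
    }

    func deposit(amount: Double) async {
        precondition(amount >= 0)
        balance = balance + amount
    }

    func transfer(amount: Double, to other: BankAccount) async throws {
        precondition(amount > 0)
        if amount > balance {
            throw BankError.insufficientFunds
        }
        print("Transferring \(amount) from \(ownerName) to \(other.ownerName)")
        // No suspension point between the check above and this write.
        balance = balance - amount
        // Hop to the other actor; this is the only suspension point.
        await other.deposit(amount: amount)
    }

    func currentBalance() async -> Double {
        balance
    }
}

let alice = BankAccount(ownerName: "Alice", balance: 100)
let bob = BankAccount(ownerName: "Bob", balance: 0)

do {
    try await alice.transfer(amount: 30, to: bob)   // succeeds
    try await alice.transfer(amount: 500, to: bob)  // throws: only 70 left
} catch BankError.insufficientFunds {
    print("Transfer rejected: insufficient funds")
} catch {
    print("Unexpected error: \(error)")
}
```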

Closures and local functions

The restriction that (non-async) actor-isolated declarations may only be accessed on self works only so long as we can ensure that the code in which self is valid is executing non-concurrently on the actor. For methods on the actor class, this is established by the rules described above: async function calls are serialized via the actor's queue, and non-async calls are only allowed when we know that we are already executing (non-concurrently) on the actor.

However, self can also be captured by closures and local functions. Should those closures and local functions have access to actor-isolated state on the captured self? Consider an example where we want to close out a bank account and distribute the balance amongst a set of accounts:

extension BankAccount {
  func close(distributingTo accounts: [BankAccount]) async {
    let transferAmount = balance / Double(accounts.count)

    accounts.forEach { account in 
      balance = balance - transferAmount             // is this safe?
      Task.runDetached {
        await account.deposit(amount: transferAmount)
      }  
    }
    
    await thief.deposit(amount: balance)
  }
}

The closure is accessing (and modifying) balance, which is part of the self actor's isolated state. Once the closure is formed and passed off to a function (in this case, Sequence.forEach), we no longer have control over when and how the closure is executed. However, we "know" that forEach is a synchronous function that invokes the closure on successive elements in the sequence. It is not concurrent, and the code above would be safe.

If, on the other hand, we used a hypothetical parallel for-each, we would have a data race when the closure executes concurrently on different elements:

accounts.parallelForEach { account in 
  self.balance = self.balance - transferAmount    // DATA RACE!
  await account.deposit(amount: transferAmount)
}

In this proposal, a closure that is non-escaping is considered to be isolated within the actor, while a closure that is escaping is considered to be outside of the actor. This is based on a notion of when closures can be executed concurrently: to execute a particular closure on a different thread, one will have to escape the closure out of its current thread to run it on another thread. The rules that prevent a non-escaping closure from escaping therefore also prevent it from being executed concurrently.

Based on the above, parallelForEach would need its closure parameter to be @escaping. The first example (with forEach) is well-formed, because the closure is actor-isolated and can access self.balance. The second example (with parallelForEach) will be rejected with an error:

error: actor-isolated property 'balance' is unsafe to reference in code that may execute concurrently
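A runnable variant of close(distributingTo:) that respects these rules, sketched with the shipped `actor` keyword and the standard `withTaskGroup(of:body:)` API (Swift 5.7+ assumed; the undefined thief account from the earlier example is dropped). The non-escaping forEach closure may touch balance, while the escaping child-task closures capture only the target account and the precomputed amount.

```swift
actor BankAccount {
    private var balance: Double

    init(balance: Double) {
        self.balance = balance
    }

    func deposit(amount: Double) async {
        balance = balance + amount
    }

    func currentBalance() async -> Double {
        balance
    }

    func close(distributingTo accounts: [BankAccount]) async {
        guard !accounts.isEmpty else { return }
        let transferAmount = balance / Double(accounts.count)

        // Non-escaping closure: forEach runs it synchronously on this actor,
        // so touching `balance` here is allowed.
        accounts.forEach { _ in
            balance = balance - transferAmount
        }

        // Escaping closures: child tasks may run concurrently, so they capture
        // only `account` and `transferAmount`, never `self.balance`.
        await withTaskGroup(of: Void.self) { group in
            for account in accounts {
                group.addTask {
                    await account.deposit(amount: transferAmount)
                }
            }
        }
    }
}

let closing = BankAccount(balance: 90)
let recipients = [BankAccount(balance: 0), BankAccount(balance: 0), BankAccount(balance: 0)]
await closing.close(distributingTo: recipients)
```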

Note that the same restrictions apply to partial applications of non-async actor-isolated functions. Given a function like this:

extension BankAccount {
  func synchronous() { }
}

The expression self.synchronous is well-formed only if it is the direct argument to a function whose corresponding parameter is non-escaping. Otherwise, it is ill-formed because it could escape outside of the actor's context.
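A sketch of the allowed case, assuming the shipped `actor` keyword (whose released rules keep this same restriction on partial applications). The Ledger type and its methods are hypothetical. Here self.record is passed directly to forEach, whose parameter is non-escaping, so the function value cannot outlive the call or leave the actor.

```swift
actor Ledger {
    private var entries: [Double] = []

    // A synchronous, actor-isolated method.
    func record(_ amount: Double) {
        entries.append(amount)
    }

    func recordAll(_ amounts: [Double]) async {
        // `self.record` is a partial application of an actor-isolated method.
        // It is the direct argument to `forEach`, whose parameter is
        // non-escaping, so it never leaves the actor's context.
        amounts.forEach(self.record)
    }

    func count() async -> Int {
        entries.count
    }
}

let ledger = Ledger()
await ledger.recordAll([10, 20, 30])
```

Storing self.record in a local constant, or passing it to an @escaping parameter, is the ill-formed case described above.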

inout parameters

Actor-isolated stored properties can be passed into synchronous functions via inout parameters, but it is ill-formed to pass them to asynchronous functions via inout parameters. For example:

func modifiesSynchronously(_: inout Double) { }
func modifiesAsynchronously(_: inout Double) async { }

extension BankAccount {
  func wildcardBalance() async {
    modifiesSynchronously(&balance)        // okay
    await modifiesAsynchronously(&balance) // error: actor-isolated property 'balance' cannot be passed 'inout' to an asynchronous function
  }
}

This restriction prevents exclusivity violations where the modification of the actor-isolated balance is initiated by passing it as inout to a call that is then suspended, and another task executed on the same actor then fails with an exclusivity violation in trying to access balance itself.
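One workaround, sketched here with the shipped `actor` keyword (Swift 5.7+ assumed), is to copy the property into a local variable, pass the local as inout across the suspension, and write the result back once the code is running on the actor again. The function bodies and the starting balance are made up for illustration.

```swift
func modifiesSynchronously(_ value: inout Double) {
    value += 1
}

func modifiesAsynchronously(_ value: inout Double) async {
    value *= 2
}

actor BankAccount {
    private var balance: Double = 10

    func wildcardBalance() async {
        // Fine: no suspension can occur while `balance` is being modified.
        modifiesSynchronously(&balance)

        // `&balance` cannot be passed here, so work on a local copy across
        // the suspension and write the result back afterwards.
        var copy = balance
        await modifiesAsynchronously(&copy)
        balance = copy
    }

    func currentBalance() async -> Double {
        balance
    }
}

let account = BankAccount()
await account.wildcardBalance()
```

The write-back can overwrite a change that another task made to balance while this one was suspended, so the pattern only fits when that interleaving is acceptable.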

Escaping reference types

The rules concerning actor isolation ensure that accesses to an actor class's stored properties cannot occur concurrently, eliminating data races unless unsafe code has subverted the model.

However, the actor isolation rules presented in this proposal are only sufficient for value types. With a value type, any copy of the value produces a completely independent instance. Modifications to that independent instance cannot affect the original, and vice versa. Therefore, one can pass a copy of an actor-isolated stored property to another actor, or even write it into a global variable, and the actor will maintain its isolation because the copy is distinct.

Reference types break the isolation model, because mutations to a "copy" of a value of reference type can affect the original, and vice versa. Let's introduce another stored property into our bank account to describe recent transactions, and make Transaction a reference type (a class):

class Transaction {
  var amount: Double
  var dateOccurred: Date
  init(amount: Double, dateOccurred: Date) {
    self.amount = amount
    self.dateOccurred = dateOccurred
  }
}

actor class BankAccount {
  // ...
  private var transactions: [Transaction]
}

The transactions stored property is actor-isolated, so it cannot be modified directly. Moreover, arrays are themselves value types when they contain value types. But the transactions stored in the array are reference types. The moment one of the instances of Transaction from the transactions array escapes the actor's context, data isolation is lost. For example, here's a function that retrieves the most recent transaction:

extension BankAccount {
  func mostRecentTransaction() async -> Transaction? {   // UNSAFE! Transaction is a reference type
    return transactions.min { $0.dateOccurred > $1.dateOccurred } 
  }
}

A client of this API gets a reference to the transaction inside the given bank account, e.g.,

guard let transaction = await account.mostRecentTransaction() else {
  return
}

At this point, the client can both modify the actor-isolated state by directly modifying the fields of transaction, as well as see any changes that the actor has made to the transaction. These operations may execute concurrently with code running on the actor, causing race conditions.
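Within this proposal's value-type-only guarantees, one way to avoid the leak is to make Transaction a struct, so the caller always receives an independent copy. A sketch under that assumption, using the shipped `actor` keyword and Swift 5.7+ top-level `await`; the record(_:) method and the sample dates are hypothetical.

```swift
import Foundation

// A value type: every copy handed out by the actor is independent.
struct Transaction {
    var amount: Double
    var dateOccurred: Date
}

actor BankAccount {
    private var transactions: [Transaction] = []

    func record(_ transaction: Transaction) async {
        transactions.append(transaction)
    }

    // Same result as the `min` with a reversed comparison used above.
    func mostRecentTransaction() async -> Transaction? {
        transactions.max { $0.dateOccurred < $1.dateOccurred }
    }
}

let account = BankAccount()
await account.record(Transaction(amount: 50, dateOccurred: Date(timeIntervalSince1970: 0)))
await account.record(Transaction(amount: 75, dateOccurred: Date(timeIntervalSince1970: 60)))

if var latest = await account.mostRecentTransaction() {
    latest.amount = 0   // modifies the caller's copy only
    print("Local copy amount: \(latest.amount)")
}
```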

Not all examples of "escaping" reference types are quite as straightforward as this one. Reference types can be stored within structs, enums, and in collections such as arrays and dictionaries, so we cannot look only at whether the type or its generic arguments are classes. The reference type might also be hidden in code not visible to the user, e.g.,

public struct LooksLikeAValueType {
  private var transaction: Transaction  // not semantically a value type
}

Generics further complicate the matter: some types, like the standard library collections, act like value types when their generic arguments are value types. An actor class might be generic, in which case its ability to maintain isolation depends on its generic argument:

actor class GenericActor<T> {
  private var array: [T]
  func first() async -> T? { 
    return array.first
  }
}

With this type, GenericActor<Int> maintains actor isolation but GenericActor<Transaction> does not.
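Later Swift releases addressed this with the Sendable protocol from a separate proposal (SE-0302). A sketch assuming that protocol and the shipped `actor` keyword (Swift 5.7+ for top-level `await`) constrains the generic parameter so that only types considered safe to share can be stored and handed out:

```swift
actor GenericActor<T: Sendable> {
    private var array: [T]

    init(array: [T]) {
        self.array = array
    }

    func first() async -> T? {
        array.first
    }
}

let ints = GenericActor(array: [1, 2, 3])   // Int is Sendable: accepted
if let first = await ints.first() {
    print(first)
}
// A mutable class such as the Transaction class above does not conform to
// Sendable, so GenericActor<Transaction> would be diagnosed by the compiler.
```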

There are solutions to these problems. However, the scope of the solutions is large enough that they deserve their own separate proposals. Therefore, this proposal only provides basic actor isolation for data race safety with value types.

Global actors

What we’ve described as actor isolation is one part of a larger problem of data isolation. It is important that all memory be protected from data races, not just memory directly associated with an instance of an actor class. Global actors allow code and state anywhere to be actor-isolated to a specific singleton actor. This extends the actor isolation rules out to annotated global variables, global functions, and members of any type or extension thereof. For example, global actors allow the important concepts of "Main Thread" or "UI Thread" to be expressed in terms of actors without having to capture everything into a single class.

Global actors provide a way to annotate arbitrary declarations (properties, subscripts, functions, etc.) as being part of a process-wide singleton actor. A global actor is described by a type that has been annotated with the @globalActor attribute:

@globalActor
struct UIActor {
  /* details below */
}

Such types can then be used to annotate particular declarations that are isolated to the actor. For example, a handler for a touch event on a touchscreen device:

@UIActor
func touchesEnded(_ touches: Set<UITouch>, with event: UIEvent?) {
  // ...
}

A declaration with an attribute indicating a global actor type is actor-isolated to that global actor. The global actor type has its own queue that is used to perform any access to mutable state that is also actor-isolated with that same global actor.

Global actors are implicitly singletons, i.e., there is always exactly one instance of a global actor in a given process. This is in contrast to actor classes, of which zero, one, or many instances can exist at any given time.
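A sketch of a global actor as it ended up in released Swift, where the type marked @globalActor is an actor that exposes a static shared instance (the pitch's struct-based form is left as "details below"). The UIActor name follows the pitch; TouchState, the simplified touchesEnded(), and simulateTouches(_:) are hypothetical, and real UI code would normally use the standard @MainActor. Swift 5.7+ is assumed for top-level `await`.

```swift
// A custom global actor: one shared instance for the whole process.
@globalActor
actor UIActor {
    static let shared = UIActor()
}

// Mutable state isolated to the global actor rather than to an instance.
enum TouchState {
    @UIActor static var touchCount = 0
}

// Isolated to UIActor: runs on that actor's executor.
@UIActor
func touchesEnded() {
    TouchState.touchCount += 1   // same global actor: no `await` needed
}

func simulateTouches(_ count: Int) async -> Int {
    for _ in 0..<count {
        await touchesEnded()     // hop onto UIActor from outside
    }
    return await TouchState.touchCount
}

let touches = await simulateTouches(3)
print(touches) // prints "3"
```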

Detailed design

Available in the full proposal

schuett · October 30, 2020, 6:27pm

I have written a lot of Erlang code (GitHub - scalaris-team/scalaris: Scalaris, a distributed, transactional key-value store). What is unique about Erlang: actors (that is, user-space processes) can fail. This is not possible in other languages or libraries.

Jumhyn · October 30, 2020, 6:41pm

Super excited to see this reach the pitch stage. Enormous congratulations to all those who have been working on this behind the scenes. From a first pass, this all seems to be pretty intuitive to me and (in tandem with async/await) definitely addresses many of my personal pain-points with concurrency in Swift. Following are some of my initial questions about this proposal's specifics!

Should there be a requirement that non-async methods on an actor class be private? We'd obviously still need the special diagnostic for when such methods are referenced on non-self instances from within the actor itself, but it might help to solidify the model in the minds of users that synchronous methods on actors are not appropriate as API surface.

This section purports to address closures and local functions, but I was left not entirely clear on what the rules are for local functions. Are they always treated as escaping closures, or no?

What about passing actor-isolated properties to inout parameters on async methods on the actor class itself? I think this would have to be disallowed, since from within that method the actor class would not have the necessary context to keep the parameter from escaping to an asynchronous context—is that correct?

The terminology in this section is confusing me. My understanding is that "value type" and "reference type" are simple descriptions of what sort of declaration defines the type in question (struct/enum and class, respectively). This section seems to be using those terms to refer to concepts that I usually see referred to as "value semantics" and "reference semantics."

This sentence in particular:

Seems to contradict the TSPL section on "value types":

All structures and enumerations are value types in Swift.

Overall, I can't wait to see these features make their way through the evolution process. Thanks again to everyone involved in the effort so far!

ETA: I've also opened a PR with some minor spelling/punctuation/wording adjustments to the proposal text.

John_McCall · October 30, 2020, 6:56pm

It's possible that we may want to allow synchronous methods on an actor to be used from outside the actor in certain ways. For example, we could allow an implicit hop over to the actor to call a synchronous method, as long as you're in an async context and you await it. We may also be able to infer that a closure is meant to be an actor function for a particular non-self actor. There's exploration to be done here to see what we can reasonably do.

Local functions are something of a weak point in Swift even without actors. I believe the current rule is that they're treated as escaping unless they capture something that cannot escape, like an inout parameter. Arguably it ought to be based on how they're used.

Joe_Groff · October 30, 2020, 6:58pm

SILGen does in fact emit closures over local functions based on how they're used, but I think our type checking still treats them as always escaping.

Douglas_Gregor · October 30, 2020, 7:38pm

No, because actor isolation checking is orthogonal to normal access control. It's reasonable to extend an actor class in another module (say, by adding a new async method), and make use of public synchronous APIs in that new method.

The same problem that's described in the pitch applies here: when you call an async method, you might suspend. This can even happen if you're calling the method on self if, e.g., someone else has enqueued a higher-priority task on that actor.

Hmm. I think of value types as "types that have value semantics" and consider the TSPL's use to be incorrect. However, I can see how this proposal is confusing in that regard... and can move toward "value semantics" terminology.

Doug

Lantua · October 30, 2020, 7:45pm

Wouldn't we need a separate axis for that? In particular, how would it work with DispatchQueue.concurrentPerform, which is concurrent (and so racing) but non-escaping?

Douglas_Gregor · October 30, 2020, 7:52pm

We might need to introduce yet another kind of function type for "concurrent, non-escaping". There's more design to do here, and we need to be mindful of the amount of complexity it might introduce.

Doug

Lantua · October 30, 2020, 7:54pm

Ok, so it's probably better to be defensive here, i.e., the current proposal.

John_McCall · October 30, 2020, 7:59pm

Right. We've thought through some of the consequences here, but we don't want to over-complicate the first set of proposals, which can stand on their own as real progress even without a complete thread-safety story.

Lantua · October 30, 2020, 8:26pm

Actor.enqueue(partialTask:) is a protocol requirement, while Actor.run(operation:) is protocol extension.

Shouldn't it be the other way around, or both be extensions? AFAICT, PartialAsyncTask is completely opaque, so overriding enqueue wouldn't provide any benefit.

OTOH, I can see an actor delegating run onto another (dynamic) actor.

The enqueue(partialTask:) requirement is special in that it can only be provided in the primary actor class declaration (not an extension), and cannot be final .

Could someone explain why it is required to be non-final? I'm not sure how this would differ from no one overriding it. Maybe this is related to the previous question?

Should global actor be a protocol instead of a type annotation? It doesn't seem to require complex requirements, like property wrappers do, and we already need to define another protocol (Actor) anyway.

Or maybe we can even have it as a single global declaration:

// top-level
@globalActor let uiActor = SomeActorInstance

We may be able to have actor imply class, but I think this is about a good balance.

John_McCall · October 30, 2020, 8:38pm

Enqueuing a partial task is the primitive, low-level operation that we want all executors to provide. The partial task fully encapsulates the unit of work that needs to be scheduled without any added overhead. Wrapping that up as an ordinary first-class function value would introduce a lot of overhead: partial tasks are one-shot and self-consuming, function values are not. It would also be semantically problematic because a call to an async function always happens as part of a task, but an executor itself is not a task; the async function would have to ignore its given context and introduce the appropriate task context.

John_McCall · October 30, 2020, 8:43pm

Ideally, I would like global actors to be global (singleton) instances of an implicit corresponding class, so that you wrote something like:

public global actor UIActor {
  func enqueue(...) {}
}

and the identifier UIActor would (in most contexts) resolve as a reference to the singleton instance of the UIActor class. I think that is much cleaner. It is also, however, a whole feature in and of itself.

Lantua · October 30, 2020, 8:59pm

PartialAsyncTask needs to stay as is for performance reason, got it.

What about execute being a protocol requirement rather than a separate extension, along with the other non-final shenanigans? It feels like this is more appropriate to be in an extension:

extension Actor {
  func execute(...)
}

or even global function:

func execute(partialTask: ..., using actor: ...)

And run seems like it should be overridable:

protocol Actor {
  func run(...)
}

John_McCall · October 30, 2020, 9:04pm

enqueue needs to be the customization point, or else the general enqueue has to forward to something.

The details of turning the function passed to run into a partial task and enqueuing it are not actually interesting to customize in an executor implementation.

beccadax · October 30, 2020, 9:05pm

I'm pretty sure we do need another value here for "actor-confined"; withoutActuallyEscaping(_:do:)'s documentation explicitly discusses using it to run non-escaping closures concurrently, and changing our minds about that being legal seems tantamount to a source break.

wear_here · October 30, 2020, 9:35pm

Copying a discussion from the roadmap thread per suggestion from @Lantua:

If these accesses are disallowed, from the "First Phase: Basic Actor Isolation" example in the roadmap thread:

    // error: an actor cannot access another's mutable state
    otherActor.mutableArray += ["not allowed"]

    // error: either reading or writing
    print(other.mutableArray.first)

What is the meaning of mutableArray being declared internal?

I see how this proposal adds an axis beyond access control—mutableArray is restricted even beyond what private would signify:

synchronous functions may only be invoked by the specific actor instance itself, and not even by any other instance of the same actor class.

(my emphasis)

What I am wondering is if these axes are orthogonal: is it at all meaningful that mutableArray is internal here? I'm not sure that access control modifiers really matter at all for actor state. Perhaps they would matter if the state was annotated @actorIndependent? I wonder if the language or the developer tools might clarify this at all.

Jumhyn · October 30, 2020, 9:44pm

At the very least, access control modifiers will affect the visibility of declarations from extensions, which are actor-isolated.

Lantua · October 30, 2020, 9:46pm

AFAICT, public actor-dependent and private actor-independent both make sense:

public actor class X {
  public var dependent: ...
  @actorIndependent fileprivate var independent: ...
}

/// Same File
func foo(x: X) {
  x.dependent // error: actor isolation
  x.independent // ok
}

/// Separate module
extension X {
  func bar() {
    self.dependent // ok
    self.independent // error: access control
  }
}

Looks pretty orthogonal to me.

wear_here · October 30, 2020, 7:44pm

If these accesses are disallowed, from the "First Phase: Basic Actor Isolation" example:

    // error: an actor cannot access another's mutable state
    otherActor.mutableArray += ["not allowed"]

    // error: either reading or writing
    print(other.mutableArray.first)

What is the meaning of MyActor having declared access to mutableArray to be internal? Is mutable actor state automatically considered private?