Swift Concurrency Roadmap

Ben_Cohen · October 30, 2020, 5:18pm

A PR with this document can be found here

Swift Concurrency Roadmap

Our goal is to make concurrent programming in Swift convenient, efficient, and safe.

This document outlines a number of proposed additions and changes to the language to achieve this, through the implementation of asynchronous functions and actors. These additions will be proposed separately, but they will in many cases depend on each other. This document serves to join them together. Unlike a manifesto, which might describe multiple possible directions and in some cases unlikely directions, this document describes a single intended plan for tackling concurrency in Swift.

The end state of these changes will:

make asynchronous programming convenient and clear at the point of use,
provide a standard set of language tools and techniques that Swift developers can follow,
improve the performance of asynchronous code through better knowledge at compile time, and
eliminate data races and deadlocks in the same way Swift eliminates memory unsafety.

The introduction of these features will span multiple Swift releases. Features will be introduced in broadly two phases. The first phase introduces the async syntax and actor types; this will allow users to organize their code around actors in a way that will reduce, but not eliminate, data races. The second phase will enforce full actor isolation, eliminating data races, along with number of features to allow efficient and ergonomic interoperation of actors needed to make that isolation practical.

As a roadmap, this document does not go into the same level of detail as will those proposals. It also discusses features for the second phase, but detailed proposals for this area will wait until after the first phase is better defined.

There are a number of other related topics not covered in this document, such as asynchronous streams, parallel for loops, and distributed actors. Many of these features complement what is described in this roadmap and may be introduced at any point.

Motivating Example

The basic patterns that we encourage for concurrency today are good: we tell people to protect their data with queues instead of locks, and to return the results of slow operations with asynchronous callbacks instead of blocking a thread.

But doing these things manually is awful and error-prone. Consider the following snippet of code that demonstrates these patterns:

internal func refreshPlayers(completion: (() -> Void)? = nil) {
    refreshQueue.async {
        self.gameSession.allPlayers { players in
            self.players = players.map(\.nickname)
            completion?()
        }
    }
}

There are 3 observations we can make about this code:

There’s too much ceremony. This function is fundamentally just calling a function, transforming the result, and assigning it to a property. But there’s so much extra work involved with the queue and the completion handlers that it’s hard to see that.
This extra ceremony makes it easy to introduce bugs. There is a direct assignment to the self.players property in the completion handler. What thread is that on? It is unclear. This is a potential data race: the callback may need to dispatch back to the right queue before it does the assignment. Maybe this is handled by allPlayers, but we cannot reason locally about whether this code is thread-safe.
This code is needlessly inefficient. Several function objects need to be separately allocated. References like self which are used by these functions must be copied into them, requiring extra reference-count operations. The functions may run multiple times or not at all, often preventing the compiler from avoiding these copies.

Furthermore, these problems are inescapably tied together. Asynchronous callbacks are always eventually run exactly once, which means they cannot participate in a permanent reference cycle. Since Swift doesn't know this, it requires self to be explicit in the closures. Some programmers respond to this by reflexively adding [weak self], increasing both the runtime overhead and the ceremony of the callback, since it must now handle the possibility of self being nil. Frequently such functions immediately return when self is nil, making it harder to reason about their correctness since arbitrary amounts of code may have been skipped.

So the patterns shown here are good, but expressing them in Swift loses important structure and creates problems. The solution is to bring those patterns into the language. That will reduce boilerplate and let the language make the patterns safe, eliminating bugs and giving programmers the confidence to use concurrency more pervasively. It will also give us a chance to improve the performance of concurrent code.

Here is the above code rewritten using our proposed new syntax:

internal func refreshPlayers() async {
  players = await gameSession.allPlayers().map(\.nickname)
}

Things to note about this example:

refreshPlayers is now an async function.
allPlayers is also an async function, and returns its result instead of passing it into a completion handler.
Because of this, we can use expression composition to call the map function directly on the returned value.
The await keyword appears before the expression calling allPlayers, indicating that the refreshPlayers function can become suspended at this point.
await works similarly to try, in that it need only appear once at the start of an expression that can suspend, not directly in front of every call that can suspend within that expression.
The explicit self. has been eliminated from property accesses, because no escaping closures capturing self are required.
The accesses to the properties allPlayers and players now cannot have data races.

To understand how the the last point is achieved we must step out a layer and look at how queues are used to protect state.

The original code was a method on a class that used refreshQueue to protect its internal state:

class PlayerRefreshController {
  var players: [String] = []
  var gameSession: GameSession
  var refreshQueue = DispatchQueue(label: "PlayerRefresh")
    
  func refreshPlayers(completion: (() -> Void)? = nil) { 
    ... 
  }
}

This is a common pattern: a class with a private queue and some properties that should only be accessed on the queue. We replace this manual queue management with an actor class:

actor class PlayerRefreshController {
  var players: [String] = []
  var gameSession: GameSession

  func refreshPlayers() async { ... }
}

Things to note about this example:

Declaring a class to be an actor is similar to giving a class a private queue and synchronizing all access to its private state through that queue.
Because this synchronization is now understood by the compiler, you cannot forget to use the queue to protect state: the compiler will ensure that you are running on the queue in the class's methods, and it will prevent you from accessing the state outside those methods.
Because the compiler is responsible for doing this, it can be smarter about optimizing away synchronization, like when a method starts by calling an async function on a different actor.

Having this static relationship between the actor and its functions and properties lets us enforce the isolation of data to an actor and define away data races. We statically know whether we’re in a context that can safely access an actor’s properties, and if not, the compiler handles switching into that context.

Above we've shown an actor class, where you’ve got a tightly-encapsulated set of properties and code. But the way we do UI programming today often spreads code across a large number of classes that you’re supposed to use from a single main thread. That main thread is still a kind of actor — it’s what we call a global actor.

You can mark classes and functions as being tied to that actor with an attribute. The compiler will let you reference this class from anywhere, but to actually call this method, you need to be on the UI actor. So, if it was appropriate for all the actions of PlayerRefreshController to be performed on the global UI actor, we would represent it like this:

@UIActor
class PlayerRefreshController {
  var players: [String] = []
  var gameSession: GameSession
    
  func refreshPlayers() async {  ...  }
}

Proposals for the First Phase

We will be pitching the following proposals in the coming weeks in support of the first phase:

async/await introduces a coroutine-based async/await model to Swift. Functions will opt into to being async, and can then await the results of other async functions, allowing asynchronous code to be expressed in a more natural "straight-line" form. Pitch here, proposal here.
Task API and Structured Concurrency introduces the concept of a task to the standard library. This will cover APIs to create detached tasks, task "nurseries" for dynamically creating child tasks, and the mechanics for cancellation and prioritization of tasks. It also introduces scope-based mechanisms for awaiting values from multiple child tasks, based on the principles of structured concurrency. Pitch here, proposal here.
Actors & Actor Isolation describes the actor model, which provides state isolation for concurrent programs. This provides the foundation for actor isolation, the mechanism through which potential for data races will be eliminated. This first phase proposal will introduce partial actor isolation, leaving full isolation enforcement to a subsequent proposal. Pitch here, proposal here.
Concurrency Interoperability with Objective-C introduces automated bridging between Swift's concurrency features (e.g., async functions) and the convention-based expression of asynchronous functions in Objective-C. This will allow existing asynchronous Objective-C APIs to be immediately usable with Swift's concurrency model by providing an alternate Swift translation of the API into an async function, alongside the callback-based version. Pitch here, proposal here.
Async handlers introduces the ability to declare a synchronous actor function as an asynchronous handler. These functions behave externally like a synchronous function, but internally are handled like an asynchronous function. This allows traditional “notification” methods, such as those on UITableViewDelegate, to perform asynchronous operations without cumbersome setup.

Actor Isolation and The Second Phase

The goal is for Swift to prevent data races on mutable state by default. The system through which this is achieved is called actor isolation, both because actors are centrally important to how the system works and because the system is centrally concerned with preventing actor-protected state from being accessed outside the actor. However, actor isolation restricts code even when actors aren’t directly involved, when it is necessary for the correctness of the system under concurrency.

We intend to introduce the features described in this roadmap in two phases: first by introducing the ability to create async functions and actors; and second, by enforcing full actor isolation.

The basic idea of actor isolation is similar to the idea of exclusive access to memory, and builds upon it. Swift’s concurrency design aims to provide an easy-to-use and composable approach to safe concurrency by starting from the natural isolation of actors and then using ownership as a complementary tool.

The problem of actor isolation reduces to the problem of ensuring that all ordinary mutable memory is accessed only by a particular actor or task. This in turn reduces to an analysis of how the memory is accessed and who can access it in the first place. We can categorize memory into a few groups:

Properties of an actor will be protected by the actor.
Immutable memory (such as a let constant), local memory (such as a local variable that’s never captured), and value component memory (such as a properties of a struct, or an enum case), are already protected from data races.
Unsafe memory (such as an arbitrary allocation referenced by an UnsafeMutablePointer) is associated with an unsafe abstraction. It’s actively undesirable to try to force these abstractions to be used safely, because these abstractions are meant to be usable to bypass safe language rules when necessary. Instead, we have to trust the programmer to use these correctly.
Global memory (such as a global or static variable) can in principle be accessed by any code anywhere, so is subject to data races.
Class component memory can also be accessed from any code that hold a reference to the class. This means that while the reference to the class may be protected by an actor, passing that reference between actors exposes its properties to data races. This also includes references to classes held within value types, when these are passed between actors.

The goal of full actor isolation is to ensure that these last two categorizations are protected by default.

First Phase: Basic Actor Isolation

The first phase introduces safety benefits. Users will be able to protect global variables with global actors, and to protect class members by converting them to actor classes. Frameworks that require access on a particular queue can define a global actor and default protocols to it.

A number of important cases of actor isolation will be enforced in this stage:

actor class MyActor {
  let immutable: String = "42"
  var mutableArray: [String] = []

  func synchronousFunction() {
    mutableArray += ["syncFunction called"]
  }
}

extension MyActor {

  func asyncFunction(other: MyActor) async {
    // allowed: an actor can access its internal state, even in an extension
    self.mutableArray += ["asyncFunction called"]
  
    // allowed: immutable memory can be accessed from outside the actor
    print(other.immutable)

    // error: an actor cannot access another's mutable state
    otherActor.mutableArray += ["not allowed"]

    // error: either reading or writing
    print(other.mutableArray.first)
    
    // allowed: async functions can call async functions on other actors
    await other.asyncFunction(otherActor: self)
    
    // error: only asynchronous functions can be called from outside the actor
    other.synchronousFunction()    
  }
}

These enforcements are non source breaking because actors and async functions are a new feature.

Second Phase: Full Actor Isolation

Even after the introduction of actors, there will still exist the possibility for race conditions, through global variables and values of reference type:

var racyGlobal: [String] = []

@MyGlobalActor
var safeGlobal: [String] = []

class PlainOldClass {
  var unprotectedState: String = []
}

actor class RacyActor {
  let immutableClassReference: PlainOldClass

  func racyFunction(other: RacyActor) async {
    // protected: global variable protected by a global actor
    safeGlobal += ["Safe access"]
  
    // unprotected: global variable not in an actor
    racyGlobal += ["Racy access"]
    
    // unprotected: racyProperty is immutable, but it is a reference type
    // so it allows access to unprotected shared mutable type
    other.takeClass(immutableClassReference)
  }
  
  func takeClass(_ plainClass: PlainOldClass) {
    plainClass.unprotectedState += ["Racy access"]  
  }
}

In the first phase, we intend to preserve Swift's current default behavior: global variables and class component memory are not protected from data races. "Actor unsafe" is therefore the default for this memory. Because this is the default of current Swift, enabling this first phase is non-source breaking.

In the second phase, the introduction of further features will provide a full set of tools for working with fully isolated actors. The most important of these is the ability to restrict a type to be "actor local". When a type is marked actor local, the compiler will prevent it from being passed between actors. Instead, the reference would need to be somehow cloned/unshared before being passed across a boundary.

This in turn will allow the defaults to be changed:

Global variables will be required to be protected by a global actor, or marked "actor unsafe".
Classes (and types that contain class references) will change from being from being "actor unsafe" by default to being "actor local".

This change in default will necessitate a source break, and will need to be gated by a language mode. Code that touches a mutable global variable or shares a class reference across actor boundaries fundamentally cannot be shown to be safe from data races, and will need to change to ensure that it (and code written in the future) is safe from data races. It is hoped that this source break will not be onerous:

it is expected that use of global variables should be sparing, and most global variables can be protected by a global actor;
as long as classes are not shared across actor boundaries, "actor local" annotation should not affect code within an actor;
where references must be passed across boundaries, the language should make this obvious and the solutions simple;
by further encouraging and simplifying the use of value types, the need to share classes across actor boundaries should be reduced;
the time between the two phases will give users an opportunity to factor their code into actors and async functions, preparing it for full isolation.

Unlike the pitches for the first phase, which will be pitched and proposed, the language features needed for the second phase will initially be raised for discussion on the evolution discussion section of the Swift forums. One of the main drivers for the two-phase approach is that of wanting to give Swift users time to become accustomed to async functions and actors before moving to a full isolation model. It is expected that the experience of porting large working code bases to actors and async functions will inform the functionality needed to enforce full actor isolation. This feedback should inform the discussion of features in the second phase.

Features expected to be discussed as part of a second phase include:

introduction of an actorlocal restriction on types;
compiler support for guaranteed correct "copy on write" types via a mutableIfUnique class type;
attributes to opt-out of actor isolation, for example where thread safety is being handled through some other means.

Glossary of Concepts

These are the basic concepts that will be used throughout the design, briefly defined here.

A synchronous function is the kind of function that Swift programmers are already used to: it runs to completion on a single thread, with no interleaving code except for any synchronous functions it calls.
A thread refers to the underlying platform’s concept of threads. Platforms vary, but tend to share basic characteristics: true concurrency requires creating a platform thread, but creating and running platform threads is expensive. C function calls, and ordinary synchronous Swift functions, require the use of a platform thread.
An asynchronous function is a new kind of function which does not have to run to completion without interruption. An interruption causes the function to be suspended. The point at which an async function may give up its thread is a suspension point.
A task is an operation that runs asynchronously. All asynchronous functions run as part of some task. When an asynchronous function calls another asynchronous function, the call is still part of the the same task, even if the call has to change actors. A task is the analogue of a thread for asynchronous functions.
An async function can create a child task. Child tasks inherit some of the structure of their parent task, including its priority, but can run concurrently with it. However, this concurrency is bounded: a function that creates a child task must wait for it to end before returning.
A program wishes to initiate independent concurrent work that can outlast its spawning context uses a detached task rather than a bounded child task.
A partial task is a unit of schedulable work. When the currently-executing function in a task is suspended, that is the end of the partial task, and a new partial task is created to continue the overall task’s work.
An executor is a service which accepts the submission of partial tasks and arranges for some thread to run them. An async function that is currently running always knows the executor on which it's running. An executor is called exclusive if the partial tasks submitted to it will never be run concurrently.
An actor is an independent part of the program which can run code. It can only run one piece of code at a time — that is, it acts as an exclusive executor — but the code it runs can execute concurrently with the code run by other actors.
An actor can have protected state which can only be accessed by that actor. The system through which this is achieved is called actor isolation. The long term goal is for Swift to guarantee actor isolation by default.
An actor class is a reference type, each instance of which is a separate actor. Its protected state is its instance properties, and its actor functions are its instance methods.
A global actor is a global object. Its protected state and actor functions may be spread across many different types. They can be marked with an actor-specific attribute, which Swift can infer in many cases.

jdmcd · October 30, 2020, 5:53pm

This is really great to see, thank you for your hard work (and the hard work of others!) on this roadmap.

I'm not sure if this is answerable or even known, but is there an estimate of the language version the first phase is targeting, and if so, the calendar timeline for that Swift version? Again - understandable if that's not public or answerable :)

Lantua · October 30, 2020, 6:07pm

Some (potential) typos:

Should this be marked with await?

Should this be actor class RacyActor?

var.

Ben_Cohen · October 30, 2020, 6:25pm

Thanks @lantua! And we’ve already implemented the typechecker for these features so I have no excuse :)

Typo and similar feedback is best directed to the PR.

Lantua · October 30, 2020, 6:25pm

Ah, ok. Will do next time!

John_McCall · October 30, 2020, 9:49pm

4 posts were merged into an existing topic: [Concurrency] Actors & actor isolation

John_McCall · October 30, 2020, 9:45pm

15 posts were merged into an existing topic: [Concurrency] Asynchronous functions

Karl · October 30, 2020, 8:02pm

Very cool!

Has there been any thought about actors as an interface for heterogenous computing? The idea of having actor-isolated functions running on actor-isolated data seems like it should scale to modelling some GPU kernel running on data that must be copied to/from the GPU's own memory.

trs · October 30, 2020, 8:58pm

Looks great and very exciting! I am especially happy that your plan includes synchronous concurrency with nurseries!

John_McCall · October 30, 2020, 9:39pm

I hope people don't mind too much, but I've moved (and will continue to move) some discussions that seemed topic-specific over to the dedicated thread for those topics. Please feel free to post general questions and feedback here, and if we think they should be moved, we'll move 'em.

Lantua · October 30, 2020, 9:51pm

I think concurrency tag would be useful once these pitches start drifting apart.

John_McCall · October 30, 2020, 9:52pm

I'm marking them now, thanks!

John_McCall · October 30, 2020, 9:59pm

I think it would be very natural to express that as an async API. Whether it's a good fit for an actor is an interesting question. To me, something can be logically be an "actor" — that is, it fills a similar role as a language-supported actor — without necessarily being a good fit for the language-supported actor features. The difference between a well-encapsulated actor class and an ordinary class with primarily async methods is very little, basically just the semantics of extension methods (i.e. whether they logically execute on the actor's executor or not).

Doug · October 30, 2020, 10:05pm

Wow, this looks great, I very much look forward to using it all in the future

Robuske · October 30, 2020, 10:15pm

The roadmap states that:

But (sometimes) we have multiple releases per year. Is there at least an idea of how long the first phase will take to be done? Months, years (decades )?

asdf · October 30, 2020, 10:28pm

will it need to implement ownership and borrowing?

Lantua · October 30, 2020, 10:32pm

You mean the entire async/await features, actors, and all the shebangs? Apparently not, judging from all the pitches.

tclementdev · October 30, 2020, 10:33pm

The main problem I see with this is that it seems to encourage anti-patterns that have the potential to do much more harm than good. This document seems already to be recommending anti-patterns like using queues instead of locks to protect shared state and using private queues to protect access to properties (which is what actors are). The maintainer of the libdispatch at Apple has repeatedly explained that developers should not do this. I have compiled together a page that sums up all this, including many references at the bottom of the document:

gist.github.com

https://gist.github.com/tclementdev/6af616354912b0347cdf6db159c37057

libdispatch-efficiency-tips.md

# libdispatch efficiency tips

The libdispatch is one of the most misused API due to the way it was presented to us when it was introduced and for many years after that, and due to the confusing documentation and API. This page is a compilation of important things to know if you're going to use this library. Many references are available at the end of this document pointing to comments from Apple's very own libdispatch maintainer (Pierre Habouzit).

My take-aways are:

* You should create very few, long-lived, well-defined queues. These queues should be seen as execution contexts in your program (gui, background work, ...) that benefit from executing in parallel. An important thing to note is that if these queues are all active at once, you will get as many threads running. In most apps, you probably do not need to create more than 3 or 4 queues.

* Go serial first, and as you find performance bottle necks, measure why, and if concurrency helps, apply with care, always validating under system pressure. Reuse queues by default and add more if there's some measurable benefit to it. Do not attempt to go wide by default.

This file has been truncated. show original

Multithreading is a complex thing, developers need to think very hard about it, about threads and how to use them wisely in their programs. The libdispatch maintainer explained that going wide by default is a mistake. Developers need to start serially and only apply concurrency when needed and with great care. My worry is that developers will misuse this just as they are misusing the libdispatch, they will make actors out of everything, have way too many queues, dispatch way too small execution blocks and get terrible performance out of it.

Lantua · October 30, 2020, 10:41pm

Do you have some examples in mind? It's hard to tell right now if your recommendation is incongruent with the design as things like prefer long queue aren't exactly applicable.

Fabio_Kaminski · October 30, 2020, 10:44pm

Theres also the question of control...

The inner implementation will implement a thread pool? if yes, with how many threads it will start?
If no, so it will dispatch a new thread for each time?

What about thread affinity and resources that need to assure they run in the same thread it runs before?

The actors pattern is cool and all but in my experience at least they are no the best pattern for every multithreading scenario. So forcing this directly in the language, of course it will make it easy for most people, but as i 've said before if they were built with the current language constructions it would be even better..

Edit: to make it clear.. ? for instance point to optional<> what about an async<> than a sugar could be used of course, but async<> could than be optmized for different cases.. IDK, just an idea..