Preventing Data Races in the Swift Concurrency Model

Perhaps it was a bad example. I'm not too surprised read-write locks aren't a big gain for user defaults now that I think about it. I guess all you have to do under the lock for a read operation is a dictionary lookup, copying the value pointer, and then incrementing its reference count. All in all a pretty quick operation unlikely to create much contention.

The catch is that if two readers are waiting for an upgrade, one of the two has to give in, otherwise it's a deadlock. I'd advise against doing upgrades in general: either take a writer lock in advance if you might need one, or cancel what you were doing with the read lock and redo the whole thing with a writer lock once you end up needing to write. Upgrading is basically the latter but with a fast path in case the upgrade succeeds.
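
To make that concrete, here is a minimal sketch of the "start over with the writer lock" pattern, using pthread_rwlock directly (the cache type and all names are illustrative, not anything from the pitch):

import Foundation

// Avoids read->write upgrades: if the read path misses, release the reader
// lock and redo the whole operation under the writer lock, re-checking the
// state because another writer may have gotten there first.
final class RWCache {
    private var storage: [String: Int] = [:]
    private var lock = pthread_rwlock_t()

    init() { pthread_rwlock_init(&lock, nil) }
    deinit { pthread_rwlock_destroy(&lock) }

    func value(for key: String, orCompute compute: () -> Int) -> Int {
        // Fast path: many readers can hold the lock simultaneously.
        pthread_rwlock_rdlock(&lock)
        if let cached = storage[key] {
            pthread_rwlock_unlock(&lock)
            return cached
        }
        pthread_rwlock_unlock(&lock)

        // Slow path: take the writer lock from scratch instead of upgrading.
        pthread_rwlock_wrlock(&lock)
        defer { pthread_rwlock_unlock(&lock) }
        if let cached = storage[key] { return cached } // another writer won
        let fresh = compute()
        storage[key] = fresh
        return fresh
    }
}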

You indeed need to have a very high reader to writer ratio for this to be worth it. And those readers also need to access the lock simultaneously often enough that it'd create contention otherwise.

Right. I might be a bit overenthusiastic about read-write locks.

And I'm having this silly thought right now: you don't necessarily have to decide about the type of lock in advance. You could have an executor pick the most appropriate strategy at runtime based on contention statistics. Assuming the compiler can tag those partial tasks well enough as reads or writes.

1 Like

This would definitely be a neat trick 🙂

Yes, I think you're right about that. I end up talking about parameters/results/variables/etc. when it's really about the values. Then let vs. var is about whether the value at that memory location can change.

Yes, that's a good generalization!

Right, and maybe-some-time-in-the-future we get some kinds of types that are uniquely-referenced so we know they aren't ever actually shared.

I brought this up over in the protocol-based actor isolation pitch. I think we want a special kind of marker protocol that has no runtime footprint, and have an idea of how it might work.

Doug

1 Like

Beyond being a neat trick, I wonder if there is a genuine performance need to vary locking characteristics at runtime. On machines with low core counts, like iOS devices, it’s hard to beat a fast unfair lock as a default locking primitive (IOW don’t bet against os_unfair_lock). On big iron that often has more cores than threads you may need some fairness at least some of the time to avoid starvation.

1 Like

Yeah that’s very likely. Scaling Swift to very wide machines like that is going to present a whole new world of optimization requirements in general (eg refcounting becomes much trickier to do quickly as parallelism increases).

1 Like

The important part is you can pick a strategy easily without having to rewrite your code. If the compiler can determine what is a write and what is a read, then changing strategy can be either a runtime switch (with some overhead) or a one-liner somewhere in the code.

My first idea was actually to have the executor produce statistics and offer recommendations about the strategy to use. Perhaps this could also be part of profile guided optimization.

I'm sure the iPhone and the Mac Pro run a lot of the same code. If you know the target hardware and can pick the right strategy, that's great. But often you don't know the target hardware in advance.

I think most of the code that is worth tuning for the runtime architecture is code from the standard library and the runtime (AFAIK, ref counting is performed by runtime functions, not by code generated by the compiler). As long as you can optimise this code without breaking the ABI, you should be able to run the same apps on iOS and on a Mac Pro without too much overhead.

1 Like

Catching up after the holidays:

Awesome, I'm super thrilled to see more focus being put on making the concurrency model safer out of the box for Swift programs. Defining away major classes of concurrency problems is going to be a huge boon.

I think that this is a great step forward, I have several strong +1s and several strong concerns, including those that are explained in the Actor Isolation for Global State document that I wrote back in November. I realize that the discussion is spread across many different threads and places, so I'll try to summarize my position and the concerns I have in one place here.


Naming bikeshed for ActorSendable: +1

The protocol-based actor isolation pitch proposes the use of a protocol (ActorSendable) to describe types that are safe to transfer across actor boundaries. It is a promising model; we're using the term "shareable" here because the problem isn't limited to crossing actor boundaries.

I'm really happy to see you embracing something like the ActorSendable idea. I agree with your premise that we should find a better name than ActorSendable for this concept, given that it should apply to structured concurrency as well. I'm not sure that Sharable is descriptive enough though. Would you be open to a longer name like ConcurrentlySharable or something else to scope this and make it more obvious that this is a specific kind of sharing?


Actor properties being implicitly async from outside: +1

I just wanted to +1 and send love to:

Actors... protects its stored instance properties. An actor class is a synchronized type, and it maintains that synchronization by serializing access to its stored instance properties: any use of an actor instance member must either be on self or must be performed asynchronously.

It makes perfect sense for references to actor properties to be subject to access control, and if they are cross-actor, they are made implicitly async.

The non- self accesses will be scheduled using a serial executor, which ensures that only one thread will execute code on a given actor instance at any one time. Therefore, the model prevents data races on actor instance properties.

Just to check my understanding, this just means that it is scheduled on the same executor queue thingy that the actor is using, ensuring the accesses are safe right?
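
For reference, here is a small sketch of how I read that part of the model, written with the actor syntax that eventually shipped rather than the pitch's actor class (the types are made up):

actor BankAccount {
    var balance: Double = 0

    func deposit(_ amount: Double) {
        balance += amount   // on self: plain synchronous access
    }
}

func report(on account: BankAccount) async {
    // Cross-actor access to a stored property is implicitly async: the read
    // is scheduled on the account's serial executor, so it can never race
    // with the actor's own code.
    let current = await account.balance
    print("balance:", current)
}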


Non-shareable types + Closures: +1

Agreed on this section, just one nit pick:

In general, the problem here is one of escape analysis: we need to determine whether a given non-shareable value can "escape" out of the current context and into a context where it might be used concurrently. The suggestions above are likely not to account for all of the ways in which a value of non-shareable type can escape.

This isn't really about escape analysis. This is just applying the ActorSendable type restrictions to the capture list of the closure -- the compiler already forms the capture list, so it isn't a new sort of escape analysis. As mentioned in my comments below, you only apply these restrictions to concurrent closures, so compatibility with existing code shouldn't be affected.


Massive source breakage: -100

The proposal has several aspects that make it massively source breaking, and alternate designs have been proposed that don't do that. The biggest problem is the requirement that all global variables (and static stored properties) be marked with a global actor. This requires annotating existing code (breaking any of them) and then requires any use of them to be async or ascribed to the same global actor - breaking all the clients. The detailed problems with this proposal are explained in this section of a previous whitepaper, quoting from it:

Beyond being source incompatible, changing a global to be ascribed to a global actor won't be an ABI compatible move for a package or framework.

  • All global and static var stored properties must be protected by a global actor

Again, this is not the only model, there are multiple alternate models in the main proposal and in the alternatives considered section.

A lot of existing Swift code would be ill-formed under the rules above, even if it has been carefully created to be free of data races, because the tools to communicate the concurrency behavior don’t exist yet. The Swift Concurrency effort introduces those tools, but we can’t refactor the world overnight.

I remain very concerned about the idea of a two phase rollout here. All of the carrot is in the first stage, and the stick is in the second stage. People will adopt the first and never adopt the second, and the world will have the idea that Swift's concurrency story is memory unsafe.

Beyond the phasing issues, there is no reason for the source compatibility in the first place - we should just roll out a safe and mostly source compatible model in one round.


Ascribing special behavior to let declarations: -1

The writeup above ascribes special behavior to let declarations that var declarations don't have, making them implicitly sharable in situations where vars are not. This is problematic for several reasons:

  1. They break our API evolution and resilience model: it is important to Swift evolution that we can turn stored properties into computed properties (and vice versa) without affecting API (see the sketch after this list). However, Swift doesn’t have computed let properties. This is a major reason why Swift doesn't have other "let-only" behavior in its design - we want everything to tie nicely into the resilience model.
  2. As described in your writeup, the actual model is that "only certain kinds of lets" are safe to share because of the ActorSendable story. This leads to a model that is difficult to explain.
  3. The mainline actor proposal (at least the last draft I read) proposed allowing let declarations to be accessed across actor boundaries without doing an await. While this can be convenient for some (very) narrow cases of "initialization only data", this breaks the simple model of "all communication with an actor is async", and isn't very useful.
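
To make point 1 concrete, here is the kind of stored-to-computed evolution the resilience model protects (illustrative types only); note there is no let-flavored equivalent of the second version:

// Library version 1: a stored property.
public struct Temperature {
    public var celsius: Double
    public init(celsius: Double) { self.celsius = celsius }
}

// Library version 2: the same public API, now computed from different
// storage. Clients are unaffected. This move is impossible for a let,
// because Swift has no computed let properties. (Named Temperature2 only
// so both versions compile side by side here.)
public struct Temperature2 {
    public var fahrenheit: Double
    public var celsius: Double {
        get { (fahrenheit - 32) * 5 / 9 }
        set { fahrenheit = newValue * 9 / 5 + 32 }
    }
    public init(celsius: Double) { self.fahrenheit = celsius * 9 / 5 + 32 }
}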

In short, I'm not sure why initialization-only data is worth special casing like this: it's not very expressive to only allow cross-actor/task sharing of init-only data, and there are alternate (much more general) ways to allow actors to provide sync requirements, including vars and function requirements. I can elaborate if you'd like.

If your concern is about local variables + structured concurrency, then the right solution (IMO) is for concurrent closures to default to capturing values by-copy instead of by-reference.
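
Capture lists already give that by-copy behavior explicitly today; a tiny illustration of the difference (this is the existing spelling, not the proposed default):

func captureDemo() {
    var counter = 0

    // By-reference capture (today's default): the closure observes later
    // changes to 'counter', which is exactly what makes it unsafe to run
    // concurrently.
    let byReference = { print(counter) }

    // By-copy capture via a capture list: the closure takes its own copy of
    // the value at creation time, the default I'd want for concurrent
    // closures.
    let byCopy = { [counter] in print(counter) }

    counter = 10
    byReference()   // prints 10
    byCopy()        // prints 0
}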


Escaping closures being implicitly concurrent: -1

I'm very concerned about this:

We propose a specific heuristic to determine whether a given closure will execute concurrently, which is based on an observation: To call a closure on another thread, one must escape the value of the closure to another thread. ... Therefore, we consider a closure to execute concurrently if it is an escaping closure.

While it is true that only escaping closures can introduce unstructured concurrency (and thus affect captured values) it isn't true that all escaping closures are concurrent. This proposal is similar in my mind to conflating async with error handling, two orthogonal things that happen to go together often.

I see the situation either as:

  1. a tower, where you have non-escaping, then escaping, then escaping+concurrent closures, or
  2. a matrix where escapingness is orthogonal to concurrency, allowing "non-escaping concurrent closures" like you'd want in a parallel map.

This is why I think we should have an explicit @concurrent attribute to model this behavior, either as the top of the tower, or the orthogonal axis from @escaping.

Rationale: the proposed design is going to be significantly source breaking if you start imposing new ActorSendable restrictions on captured values in existing escaping closures. Using a new attribute only imposes those restrictions on new code. Furthermore, concurrent closures really don't want by-ref captures, they want by-value captures by default. This is a pretty huge behavior change, which should be opt-in.

A local function is always assumed to execute concurrently, because they are always assumed to escape.

Why?


Global actor markers on global functions

A global or static stored property annotated with a global actor is protected by the global actor, so it will always be accessed from only a single thread at a time.

Ok, makes sense, but I wasn't aware that you were advocating that global functions could also be marked as being ascribed to a global actor:

This is a major part of the "global actor" design that I haven't seen explained fully. While such a model composes, I'm very concerned about two things (on top of the compatibility/progressive disclosure problems described above):

  1. Attribute explosion: I'm very concerned that this will cause an explosion of attributes on a lot of declarations, similar to "const propagation" problems in C++.
  2. Type compatibility: "main actor global functions" can't be safely compatible with their apparent function type. The closest safe type for bumpGlobalCounter above is () async -> () or @sharable () -> (), but not () -> () (sketched below). This incompatibility will restrict adoption of global actor annotations, since existing higher order functions won't be async aware. This exacerbates the source compatibility issues.
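
A sketch of point 2, using @MainActor as a stand-in for a pitched global actor (bumpGlobalCounter and runTwice are illustrative names):

@MainActor var globalCounter = 0

@MainActor func bumpGlobalCounter() {
    globalCounter += 1
}

// An existing higher-order function that knows nothing about actors.
func runTwice(_ work: () -> Void) {
    work()
    work()
}

func example() {
    // Rejected: converting '@MainActor () -> ()' to '() -> Void' would lose
    // the global-actor isolation, so the annotated function can't be passed
    // to existing higher-order APIs.
    // runTwice(bumpGlobalCounter)
}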

While there is a space for "truly global" state, it should be rare in practice.


Playgrounds and Scripts and progressive disclosure of complexity: -1

The proposed design for global annotation marking breaks all playgrounds and scripts (which pervasively use globals). While you could default all globals and functions in a script/playground to a global actor, such a change will be surprising and incompatible. It will be incompatible because the type of a function/var on a global actor is different than normal functions, so you won't be able to pass them to higher order functions defined in existing packages/frameworks.


Overall, this is a very promising step forward in the concurrency design. I'm really thrilled to see it, but I still think more progress is needed.

-Chris

10 Likes

I have the same feeling about this. This brings lots of new attributes into Swift's already long list of attributes.

I suggest using

  • actor(main) => @mainActor
  • actor(global) => @globalActor
  • actor(local) <==> actor
  • actor(<name>) => @someGlobalActor

local/global/main would be reserved built-in keywords for actor scenarios.

1 Like

I'm not sure what Chris means by "explosion of attributes". Is it a concern about the number of new attributes, or the fact that lots of declarations everywhere will now have to be marked with attributes?

Renaming the attributes will only solve the first concern, not the fact that too many declarations will require attributes.

I'm concerned about the latter - similar to how marking one parameter const in a C++ program means you may need to go spread const everywhere.

2 Likes

Hi Doug,

In an attempt to hill climb on a solution here, I think it would make sense to start by talking about the closure part of this design. Are you opposed to having a @concurrent attribute on function types, which is orthogonal to but works similar to @escaping?

The semantics of a @concurrent closure would be that it is subject to ConcurrentlySharable checking (both for captures and arguments); I think it would default to capturing by value instead of by reference, and we could even make such closures default to async.

If we go this direction, it has a number of advantages because it is a new additive feature that is opt-in. For example:

  1. It allows modeling the full cross product of "concurrent but not escaping", "escaping but not concurrent", "neither", and "both".
  2. It would eliminate one of the major sources of source breakage in this direction, because you wouldn't impose sharability checks on existing code.
  3. It implies that local functions would only be concurrent if declared as a @concurrent func foo() marker.
  4. It works great with actors and structured concurrency alike, and would be a unifying theory across the two.

What do you think about this?

-Chris

Yes, I'm open to longer names to make things more clear here. In my experimental implementation of this idea I ended up going with ConcurrentValue, because the fundamental property here is that values of the type can be used concurrently. I think @Joe_Groff's formulation as being about reachability to unsynchronized shared mutable state is the thing to capture.

Yes.

Good point, thanks.

I disagree fairly strongly with the main direction of that paper, as I noted in my response. Implicitly turning global variables into per-actor-instance variables is a huge semantic change, and I don't think it's going in the right direction. Technically not "source breaking", because your code will still compile, but it will behave very strangely the moment anyone introduces an actor that touches a global---even for correctly-synchronized global variables, such as a singleton that handles synchronization internally (it would no longer be a singleton in your proposed model). At least with a source break, you know that semantics have changed.

Do you want to debate that point further, or are there other models you want to discuss?

This is incorrect. You can evolve a let into an @actorIndependent var without breaking API or ABI.
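
For illustration, the shape of that evolution, spelled here with nonisolated (the keyword that eventually shipped) standing in for the pitched @actorIndependent; the types are made up:

// Version 1: an immutable stored property, readable synchronously from
// outside the actor because it can never change.
actor Session {
    let identifier: String
    init(identifier: String) { self.identifier = identifier }
}

// Version 2: the same member is now computed, backed by different storage.
// Callers see the same API. (Named Session2 only so both versions compile
// side by side here.)
actor Session2 {
    private nonisolated let storedIdentifier: String
    nonisolated var identifier: String { storedIdentifier }
    init(identifier: String) { self.storedIdentifier = identifier }
}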

In this model, "only certain kinds of async functions" can be called from another actor because of the ActorSendable/ConcurrentValue story. Either the whole model has this problem or it's not a real problem; there is no argument specific to let here.

I've been writing code against this model, and it is so very, very useful in practice. This current pitch on Preventing Data Races takes away the special case from actors and makes all lets with ActorSendable/ConcurrentValue/Shareable/whatever types accessible concurrently.

I truly don't understand where you're coming from with this line of reasoning. Immutability is perhaps the simplest and most pervasive tool we have for making concurrent code safe from data races. Swift has nudged programmers toward using let because immutability makes code simpler to reason about, and it also makes it safe to use concurrently when working with ActorSendable/ConcurrentValue types---something Swift has also nudged programmers toward with its pervasive value semantics. Why would we not embrace this wholeheartedly in our concurrency design?

Nor are all nonescaping closures non-concurrent. Per your hill-climbing question:

I'm not opposed to having a @concurrent attribute on function types. I think we'll need them regardless.

The prototype and my pitch take the approach of having new code---code that is within an actor or async code---opt in to the checking based on the "escaping closures are assumed concurrent" notion. Having @concurrent types be completely opt-in leaves large holes in the checking model that lets us diagnose data races in code that uses actors with existing APIs. Most examples are more subtle, but let's be blatant:

actor class MyActor {
  var count: Int

  func f(array: [Int]) {
    runLater {
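      // 'self' is captured by an escaping closure that may later run on
      // another thread, racing with the actor's own accesses to 'count'.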
      self.count = self.count + 1
    } 
  }
}

Where we got runLater from some library:

func runLater(_ body: @escaping () -> Void) {
  // toss this on a timer somewhere to run
}

My proposed scheme and the prototype will complain about this because escaping implies concurrent by default. But it allows non-escaping closures to just work:

extension MyActor {
  func silly(array: [Int]) {
    array.forEach { i in count += i } // okay because closure is non-escaping, so it's in the actor's domain
  }
}

With @concurrent being a purely opt-in feature, we'll either silently admit the data race in the first example, or have to reject the second example.

Having escaping imply @concurrent by default doesn't mean that escaping and concurrent aren't orthogonal; rather, the defaults are lined up and you have different spellings for the exceptional cases:

  • @escaping () -> Void: escaping, concurrent
  • @escaping @nonconcurrent () -> Void: escaping, non-concurrent (expected to be very, very rare)
  • () -> Void: non-escaping, non-concurrent
  • @concurrent () -> Void: non-escaping, concurrent (parallel/concurrent algorithms would use this)

The first and third are the most common cases, and with the approach I'm describing, they don't have to change at all (neither ABI nor API). But we can still address the uncommon cases to fill out the matrix.

Because it took me longer to figure out that this can be checked directly.

It would still be implicit, because most closures will become @concurrent implicitly based on context. That's probably okay in practice.

Doug

3 Likes

Cool, ConcurrentValue sounds great to me!

Yes I absolutely do, but I think there are several other parts of the model that are worth converging on first (which also have nothing to do with global variable semantics), so I suggest we focus effort there and come back to this topic.

Ok, the confusion is that @actorIndependent hasn't come up in this particular thread.

To recap the other threads, I have not seen clear motivation on the design of @actorIndependent. My understanding is that it has something to do with wanting actors to conform to protocols with sync requirements. If that is the case, then we need to discuss the topic and possible solutions, because there are better (in my opinion) ways to solve that problem.

However, it is entirely possible I've misread the goals here (again, because I haven't seen the motivation). In any case, we need to discuss this further.

I'd love to know more about the use cases, because it has simply been presented as "the design" without motivation in the proposal. I offered to write up a doc exploring the issue but you requested that I not do that. I gave a short version in response. I haven't had a chance to get back to that thread, so it is on me I guess, but I'd prefer to see actual rationale in a doc that explains @actorIndependent so I'm not trying to understand and discuss a moving target.

In any case, the entire design of actors + sync requirements has alternate designs that haven't been evaluated (in the forums). I don't see it as a foregone conclusion that @actorIndependent is a good design, and if it doesn't stick, then the let resilience / API migration issue is a very important concern. I suggest that we split this issue out to its own discussion thread.

I'm a huge fan of immutability and immutable types, even immutable reference semantic types (that's one of the motivations for the ConcurrentValue proposal). However that isn't what this proposal does or is about. This concept is about allowing actors - which are inherently bags of mutable state that are accessed asynchronously - to have narrow initialization semantics with special case sync access behavior.

This muddies the water by complicating the model, and isn't motivated anywhere that I've seen. Arguing that this is motivated by "immutability being good" doesn't seem sufficient: while we can agree that immutability is good :-) we also need to look at the effect of the design decisions on the resultant programming model because there are many factors to weigh.

This is fantastic news. I'd love to see this get nailed down, I think it will help clarify a big chunk of the global concurrency model.

I don't understand this, can you please explain more? There are two designs possible here (the "matrix" and the "tower" approach). I don't have a strong opinion either way, but I'll use the matrix to explain my understanding of how this works.

In the matrix design, @concurrent is a function type attribute that is orthogonal to @escaping and you can have the full cross product of attributes. This means that you get a natural subtype relationship as well: @concurrent () -> () can implicitly convert to () -> () just like @escaping () -> () implicitly converts to () -> ().
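
A tiny sketch of the existing analogous conversion; since @concurrent itself is still hypothetical, its half is shown only in a comment:

func takesNonescaping(_ f: () -> Void) { f() }

func takesEscaping(_ f: @escaping () -> Void) {
    // An @escaping function value converts implicitly to a non-escaping
    // parameter type; the reverse direction is what's forbidden.
    takesNonescaping(f)
}

// By the same logic, a (hypothetical) @concurrent () -> Void would convert
// implicitly to () -> Void:
// func takesConcurrent(_ f: @concurrent () -> Void) { takesNonescaping(f) }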

Given a design where @concurrent is a new opt-in feature, you get no data race and don't reject the second example. The first example fails to build when you implement runLater with an actor:

func runLater(_ body: @escaping () -> Void) {
  // toss this on a timer somewhere to run
  otherActor.enqueueAfterTime(1.0, body)
  // ^ error, cannot pass non-concurrent closure across actor boundaries because it isn't a `ConcurrentValue`.
}

or with structured concurrency:

func runLater(_ body: @escaping () -> Void) {
  // toss this on a timer somewhere to run
  taskGroup.spawn { stuff(body) }
  // ^ error, cannot capture 'body' in a concurrent closure because non-concurrent closures aren't `ConcurrentValue`s.
}

This is because spawn (or async let, or whatever) takes a concurrent closure, and they require ConcurrentValue conformance of anything they capture and pass/return as argument/results.

In your second example:

extension MyActor {
  func silly(array: [Int]) {
    array.forEach { i in count += i } // okay because closure is non-escaping, so it's in the actor's domain
  }
}

This is totally fine. forEach doesn't take a concurrent closure, so it doesn't have ConcurrentValue checks etc. It is just local synchronous code, protected by the actor as you'd expect. There's no change to type checking for existing code.

I don't have time to read your large patch, can you please explain what behavior you are proposing now?

The obvious and simple thing is to require the @concurrent attribute on the local function to keep it consistent with closures: the decl attribute would change the type of the local function.

Trying to infer this syntactically might be possible, but it seems like it will cause problems with non-trivial cases. That said, I don't know what your proposal is and shouldn't speculate.

Yeah totally agree, it should work just like the existing @escaping logic, where "the escaping bit" (and thus, "the concurrent bit") is inferred and checked based on the type inferred by the closure if there is context. This is very common since you're often passing the closure to a function.

As a concrete example, in parallelMap(stuff, {...}) the closure is inferred to be @concurrent, so it gets ConcurrentValue checks applied to it.
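
Sketched out, that signature might look something like the following, with @Sendable (the attribute that eventually shipped in this role) standing in for @concurrent; parallelMap is a hypothetical API and its body is just a serial stand-in:

extension Array {
    // The closure is marked as usable concurrently, so ConcurrentValue/
    // Sendable checks apply to whatever it captures, even though the
    // closure itself need not escape.
    func parallelMap<T>(_ transform: @Sendable (Element) -> T) -> [T] {
        // A real implementation would fan the work out across tasks; a
        // plain serial map keeps the sketch self-contained.
        map(transform)
    }
}

let doubled = [1, 2, 3].parallelMap { $0 * 2 }   // closure checked as @Sendable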

-Chris

4 Likes

A very common use case of unprotected lets is, say, a server app that loads its configuration at startup and lets everyone read it. Ideally it would enforce immutability of this data and also allow unprotected access.

One caveat here is that this data should be synchronized once before other threads can read it. If tasks that access this immutable data start after its creation then afaik it's guaranteed to be synchronized.

Another caveat, like Doug already mentioned, is unprotected let object references that allow modification.

Not sure how the above can be resolved, but other than that immutable unprotected data seems like a very useful thing.

Right - I am very +1 on immutability and immutable types. Just make something like that a class or struct with a bunch of let properties, and make it be a ConcurrentValue and you're done.
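
Something along these lines, with Sendable standing in for ConcurrentValue and the config type made up:

// An immutable configuration loaded once at startup. Every stored property
// is a let of a value type, so instances can be handed to any task or
// actor freely; no queue, lock, or actor wrapper is needed.
struct ServerConfig: Sendable {
    let host: String
    let port: Int
    let routes: [String: String]
}

let config = ServerConfig(host: "localhost", port: 8080, routes: ["/": "index"])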

My point is that there is no reason to make such a bag of immutable state an actor itself - actors are about protecting mutable state from concurrent accesses. If the state is immutable, then you don't need actors / queues / async etc, they can directly share it without wrapping it in an actor.

-Chris

1 Like

Yes but if an app wants to be 100% correct it will take the "everything is an Actor" model as a basis, I presume. Therefore all data should belong somewhere, either MainActor or any Actor.

I might have a case against this. Immutable state still needs synchronization guarantees. In a lower-level multithreaded world this can be achieved by ensuring that shared immutable data is created before the other threads are created, since each pthread_create() issues a sync.

I'd be most comfortable if say Swift's Actor constructors issued an implicit sync at the end of their execution (similar to what pthread_create() does) and let me access immutable data within them, unprotected. Because without this I'd be limited to defining e.g. my server app's configuration in global non-actor vars and somehow ensure their non-laziness.

Am I asking too much, or is there a way to achieve this that I'm missing?

No, this is a common misunderstanding, akin to saying "classes should be eschewed, real swift programmers only use structs and protocols". While it could be possible to use actors for everything, that wouldn't be a very efficient way to solve all concurrency problems. Some things are really best solved by shared mutable state with synchronization (e.g. concurrent hash tables) or by shared immutable types. The model already supports both of those.

Immutable state doesn't need synchronization by definition: nothing mutates it, so you can't have read/mutate races. Read/read races aren't concurrency problems. I think you may be concerned that the pointer to a partially initialized instance might escape before it is fully set up?

The important observation here is that Swift already guarantees that the self pointer for a class/struct/etc cannot escape until all of the members of the type are initialized. If all of the members are immutable, then this means that getting a pointer to an instance guarantees it is fully initialized and (since it is immutable) you know it won't/can't change.
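
A small illustration of that definite-initialization guarantee (types made up):

final class LoadedConfig {
    let values: [String: String]

    init(values: [String: String]) {
        // print(self)   // rejected: 'self' can't be used before all stored
        //               // properties are initialized
        self.values = values
        // After every stored property is set, 'self' may be used and may
        // escape; any thread that later sees this reference sees a fully
        // initialized, immutable object.
        Registry.publish(self)
    }
}

enum Registry {
    static func publish(_ config: LoadedConfig) { /* hand off to other threads */ }
}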

This isn't an accident: this has been a key design point of the initialization model from the beginning. This is a key aspect of the "designated initialization model" and why there are different rules before and after the "super init" call.

-Chris

1 Like

No, it's a different problem. Immutable data, too, requires synchronization though only once. When you write something to memory on a multicore system it's not guaranteed to become immediately visible to all cores.

However you know it's immutable and you don't want to waste cycles by syncing it every time you read it. So what you usually do in lower-level multithreading is: you ensure your immutable data is initialized before the threads that will be reading it are created, since thread creation issues a sync and therefore it will be safe to read from those threads, unprotected.

Now putting the pieces together, what I was proposing: if say there's a guarantee Actor issues a sync at the end of its constructor (not familiar with the internals yet, but there's some chance it's what actually happens) it means any immutable data defined in the actor is safe to read unprotected.

If on the other hand, actors don't expose their immutable data then what options do I have with say my scenario with the server config structures that will be initialized once and then read a lot? Think of the gigantic configs that Apache or nginx load at startup and then use all the time.

You are suggesting that I create unprotected globals (also ensure they are not lazy!), then what is the guarantee that they are synced and safe to read? There's a good chance some synchronization will happen somewhere before this data is read, but then a rare chance it won't be.

Of course exposing actor lets should be considered very carefully anyway, but on the surface it seems like a good idea that will allow me to follow the "everything is an actor" model where it would be easier to provide a formal proof of correctness of the program, and have immutable unprotected data with faster access, too.

Oh, but that is mostly true.

1 Like