Preventing Data Races in the Swift Concurrency Model

Perhaps it was a bad example. I'm not too surprised read-write locks aren't a big gain for user defaults now that I think about it. I guess all you have to do under the lock for a read operation is a dictionary lookup, copying the value pointer, and then incrementing its reference count. All in all a pretty quick operation unlikely to create much contention.

The catch is that if two readers are waiting for an upgrade, one of the two has to give in, otherwise it's a deadlock. I'd advise against doing upgrades in general: either take a writer lock in advance if you might need one, or cancel what you were doing with the read lock and redo the whole thing with a writer lock once you end up needing to write. Upgrading is basically the latter but with a fast path in case the upgrade succeeds.
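
To make that concrete, here is a minimal sketch of the "start over with the writer lock" pattern, using pthread_rwlock directly (the cache type and all names are illustrative, not anything from the pitch):

import Foundation

// Avoids read->write upgrades: if the read path misses, release the reader
// lock and redo the whole operation under the writer lock, re-checking the
// state because another writer may have gotten there first.
final class RWCache {
    private var storage: [String: Int] = [:]
    private var lock = pthread_rwlock_t()

    init() { pthread_rwlock_init(&lock, nil) }
    deinit { pthread_rwlock_destroy(&lock) }

    func value(for key: String, orCompute compute: () -> Int) -> Int {
        // Fast path: many readers can hold the lock simultaneously.
        pthread_rwlock_rdlock(&lock)
        if let cached = storage[key] {
            pthread_rwlock_unlock(&lock)
            return cached
        }
        pthread_rwlock_unlock(&lock)

        // Slow path: take the writer lock from scratch instead of upgrading.
        pthread_rwlock_wrlock(&lock)
        defer { pthread_rwlock_unlock(&lock) }
        if let cached = storage[key] { return cached } // another writer won
        let fresh = compute()
        storage[key] = fresh
        return fresh
    }
}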

You indeed need to have a very high reader to writer ratio for this to be worth it. And those readers also need to access the lock simultaneously often enough that it'd create contention otherwise.

Right. I might be a bit overenthusiastic about read-write locks.

And I'm having this silly thought right now: you don't necessarily have to decide about the type of lock in advance. You could have an executor pick the most appropriate strategy at runtime based on contention statistics. Assuming the compiler can tag those partial tasks well enough as reads or writes.

1 Like

This would definitely be a neat trick 🙂

Yes, I think you're right about that. I end up talking about parameters/results/variables/etc. when it's really about the values. Then let vs. var is about whether the value at that memory location can change.

Yes, that's a good generalization!

Right, and maybe-some-time-in-the-future we get some kinds of types that are uniquely-referenced so we know they aren't ever actually shared.

I brought this up over in the protocol-based actor isolation pitch. I think we want a special kind of marker protocol that has no runtime footprint, and have an idea of how it might work.

Doug

1 Like

Beyond being a neat trick, I wonder if there is a genuine performance need to vary locking characteristics at runtime. On machines with low core counts, like iOS devices, it’s hard to beat a fast unfair lock as a default locking primitive (IOW don’t bet against os_unfair_lock). On big iron that often has more cores than threads you may need some fairness at least some of the time to avoid starvation.

1 Like

Yeah that’s very likely. Scaling Swift to very wide machines like that is going to present a whole new world of optimization requirements in general (eg refcounting becomes much trickier to do quickly as parallelism increases).

1 Like

The important part is you can pick a strategy easily without having to rewrite your code. If the compiler can determine what is a write and what is a read, then changing strategy can be either a runtime switch (with some overhead) or a one-liner somewhere in the code.

My first idea was actually to have the executor produce statistics and offer recommendations about the strategy to use. Perhaps this could also be part of profile guided optimization.

I'm sure the iPhone and the Mac Pro run a lot of the same code. If you know the target hardware and can pick the right strategy, that's great. But often you don't know the target hardware in advance.

I think most of the code that is worth tuning for the runtime architecture is code from the standard library and the runtime (AFAIK, ref counting is performed by runtime functions, not by code generated by the compiler). As long as you can optimise this code without breaking the ABI, you should be able to run the same apps on iOS and on a Mac Pro without too much overhead.

1 Like

Catching up after the holidays:

Awesome, I'm super thrilled to see more focus being put on making the concurrency model safer out of the box for Swift programs. Defining away major classes of concurrency problems is going to be a huge boon.

I think that this is a great step forward, I have several strong +1s and several strong concerns, including those that are explained in the Actor Isolation for Global State document that I wrote back in November. I realize that the discussion is spread across many different threads and places, so I'll try to summarize my position and the concerns I have in one place here.


Naming bikeshed for ActorSendable: +1

The protocol-based actor isolation pitch proposes the use of a protocol (ActorSendable) to describe types that are safe to transfer across actor boundaries. It is a promising model; we're using the term "shareable" here because the problem isn't limited to crossing actor boundaries.

I'm really happy to see you embracing something like the ActorSendable idea. I agree with your premise that we should find a better name than ActorSendable for this concept, given that it should apply to structured concurrency as well. I'm not sure that Sharable is descriptive enough though. Would you be open to a longer name like ConcurrentlySharable or something else to scope this and make it more obvious that this is a specific kind of sharing?


Actor properties being implicitly async from outside: +1

I just wanted to +1 and send love to:

Actors... protects its stored instance properties. An actor class is a synchronized type, and it maintains that synchronization by serializing access to its stored instance properties: any use of an actor instance member must either be on self or must be performed asynchronously.

It makes perfect sense for references to actor properties to be subject to access control, and if they are cross-actor, they are made implicitly async.

The non- self accesses will be scheduled using a serial executor, which ensures that only one thread will execute code on a given actor instance at any one time. Therefore, the model prevents data races on actor instance properties.

Just to check my understanding, this just means that it is scheduled on the same executor queue thingy that the actor is using, ensuring the accesses are safe right?
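
For reference, here is a small sketch of how I read that part of the model, written with the actor syntax that eventually shipped rather than the pitch's actor class (the types are made up):

actor BankAccount {
    var balance: Double = 0

    func deposit(_ amount: Double) {
        balance += amount   // on self: plain synchronous access
    }
}

func report(on account: BankAccount) async {
    // Cross-actor access to a stored property is implicitly async: the read
    // is scheduled on the account's serial executor, so it can never race
    // with the actor's own code.
    let current = await account.balance
    print("balance:", current)
}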


Non-shareable types + Closures: +1

Agreed on this section, just one nit pick:

In general, the problem here is one of escape analysis: we need to determine whether a given non-shareable value can "escape" out of the current context and into a context where it might be used concurrently. The suggestions above are likely not to account for all of the ways in which a value of non-shareable type can escape.

This isn't really about escape analysis. This is just applying the ActorSendable type restrictions to the capture list of the closure -- the compiler already forms the capture list, so it isn't a new sort of escape analysis. As mentioned in my comments below, you only apply these restrictions to concurrent closures, so compatibility with existing code shouldn't be affected.


Massive source breakage: -100

The proposal has several aspects that make it massively source breaking, and alternate designs have been proposed that don't do that. The biggest problem is the requirement that all global variables (and static stored properties) be marked with a global actor. This requires annotating existing code (breaking any of them) and then requires any use of them to be async or ascribed to the same global actor - breaking all the clients. The detailed problems with this proposal are explained in this section of a previous whitepaper, quoting from it:

Beyond being source incompatible, changing a global to be ascribed to a global actor won't be an ABI compatible move for a package or framework.

  • All global and static var stored properties must be protected by a global actor

Again, this is not the only model, there are multiple alternate models in the main proposal and in the alternatives considered section.

A lot of existing Swift code would be ill-formed under the rules above, even if it has been carefully created to be free of data races, because the tools to communicate the concurrency behavior don’t exist yet. The Swift Concurrency effort introduces those tools, but we can’t refactor the world overnight.

I remain very concerned about the idea of a two phase rollout here. All of the carrot is in the first stage, and the stick is in the second stage. People will adopt the first and never adopt the second, and the world will have the idea that Swift's concurrency story is memory unsafe.

Beyond the phasing issues, there is no reason for the source compatibility in the first place - we should just roll out a safe and mostly source compatible model in one round.


Ascribing special behavior to let declarations: -1

The writeup above ascribes special behavior to let declarations that var declarations don't have, making them implicitly sharable in situations where vars are not. This is problematic for several reasons:

  1. They break our API evolution and resilience model: it is important to Swift evolution that we can turn stored properties into computed properties (and vice versa) without affecting API (see the sketch after this list). However, Swift doesn’t have computed let properties. This is a major reason why Swift doesn't have other "let-only" behavior in its design - we want everything to tie nicely into the resilience model.
  2. As described in your writeup, the actual model is that "only certain kinds of lets" are safe to share because of the ActorSendable story. This leads to a model that is difficult to explain.
  3. The mainline actor proposal (at least the last draft I read) proposed allowing let declarations to be accessed across actor boundaries without doing an await. While this can be convenient for some (very) narrow cases of "initialization only data", this breaks the simple model of "all communication with an actor is async", and isn't very useful.
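
To make point 1 concrete, here is the kind of stored-to-computed evolution the resilience model protects (illustrative types only); note there is no let-flavored equivalent of the second version:

// Library version 1: a stored property.
public struct Temperature {
    public var celsius: Double
    public init(celsius: Double) { self.celsius = celsius }
}

// Library version 2: the same public API, now computed from different
// storage. Clients are unaffected. This move is impossible for a let,
// because Swift has no computed let properties. (Named Temperature2 only
// so both versions compile side by side here.)
public struct Temperature2 {
    public var fahrenheit: Double
    public var celsius: Double {
        get { (fahrenheit - 32) * 5 / 9 }
        set { fahrenheit = newValue * 9 / 5 + 32 }
    }
    public init(celsius: Double) { self.fahrenheit = celsius * 9 / 5 + 32 }
}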

In short, I'm not sure why initialization-only data is worth special casing like this: it's not very expressive to only allow cross-actor/task sharing of init-only data, and there are alternate (much more general) ways to allow actors to provide sync requirements, including vars and function requirements. I can elaborate if you'd like.

If your concern is about local variables + structured concurrency, then the right solution (IMO) is for concurrent closures to default to capturing values by-copy instead of by-reference.
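
Capture lists already give that by-copy behavior explicitly today; a tiny illustration of the difference (this is the existing spelling, not the proposed default):

func captureDemo() {
    var counter = 0

    // By-reference capture (today's default): the closure observes later
    // changes to 'counter', which is exactly what makes it unsafe to run
    // concurrently.
    let byReference = { print(counter) }

    // By-copy capture via a capture list: the closure takes its own copy of
    // the value at creation time, the default I'd want for concurrent
    // closures.
    let byCopy = { [counter] in print(counter) }

    counter = 10
    byReference()   // prints 10
    byCopy()        // prints 0
}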


Escaping closures being implicitly concurrent: -1

I'm very concerned about this:

We propose a specific heuristic to determine whether a given closure will execute concurrently, which is based on an observation: To call a closure on another thread, one must escape the value of the closure to another thread. ... Therefore, we consider a closure to execute concurrently if it is an escaping closure.

While it is true that only escaping closures can introduce unstructured concurrency (and thus affect captured values) it isn't true that all escaping closures are concurrent. This proposal is similar in my mind to conflating async with error handling, two orthogonal things that happen to go together often.

I see the situation either as:

  1. a tower, where you have non-escaping, then escaping, then escaping+concurrent closures, or
  2. a matrix where escapingness is orthogonal to concurrency, allowing "non-escaping concurrent closures" like you'd want in a parallel map.

This is why I think we should have an explicit @concurrent attribute to model this behavior, either as the top of the tower, or the orthogonal axis from @escaping.

Rationale: the proposed design is going to be significantly source breaking if you start imposing new ActorSendable restrictions on captured values in existing escaping closures. Using a new attribute only imposes those restrictions on new code. Furthermore, concurrent closures really don't want by-ref captures, they want by-value captures by default. This is a pretty huge behavior change, which should be opt-in.

A local function is always assumed to execute concurrently, because they are always assumed to escape.

Why?


Global actor markers on global functions

A global or static stored property annotated with a global actor is protected by the global actor, so it will always be accessed from only a single thread at a time.

Ok, makes sense, but I wasn't aware that you were advocating that global functions could also be marked as being ascribed to a global actor:

This is a major part of the "global actor" design that I haven't seen explained fully. While such a model composes, I'm very concerned about two things (on top of the compatibility/progressive disclosure problems described above):

  1. Attribute explosion: I'm very concerned that this will cause an explosion of attributes on a lot of declarations, similar to "const propagation" problems in C++.
  2. Type compatibility: "main actor global functions" can't be safely compatible with their apparent function type. The closest safe type for bumpGlobalCounter above is () async -> () or @sharable () -> (), but not () -> () (sketched below). This incompatibility will restrict adoption of global actor annotations, since existing higher order functions won't be async aware. This exacerbates the source compatibility issues.
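
A sketch of point 2, using @MainActor as a stand-in for a pitched global actor (bumpGlobalCounter and runTwice are illustrative names):

@MainActor var globalCounter = 0

@MainActor func bumpGlobalCounter() {
    globalCounter += 1
}

// An existing higher-order function that knows nothing about actors.
func runTwice(_ work: () -> Void) {
    work()
    work()
}

func example() {
    // Rejected: converting '@MainActor () -> ()' to '() -> Void' would lose
    // the global-actor isolation, so the annotated function can't be passed
    // to existing higher-order APIs.
    // runTwice(bumpGlobalCounter)
}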

While there is a space for "truly global" state, it should be rare in practice.


Playgrounds and Scripts and progressive disclosure of complexity: -1

The proposed design for global annotation marking breaks all playgrounds and scripts (which pervasively use globals). While you could default all globals and functions in a script/playground to a global actor, such a change will be surprising and incompatible. It will be incompatible because the type of a function/var on a global actor is different than normal functions, so you won't be able to pass them to higher order functions defined in existing packages/frameworks.


Overall, this is a very promising step forward in the concurrency design. I'm really thrilled to see it, but I still think more progress is needed.

-Chris

10 Likes

I have the same feeling about this. This brings lots of new attributes into Swift's already long list of attributes.

I suggest using

  • actor(main) => @mainActor
  • actor(global) => @globalActor
  • actor(local) <==> actor
  • actor(<name>) => @someGlobalActor

local/global/main would be reserved built-in keywords for actor scenarios.

1 Like

I'm not sure what Chris means by "explosion of attributes". Is it a concern about the number of new attributes, or the fact that lots of declarations everywhere will now have to be marked with attributes?

Renaming the attributes will only solve the first concern, not the fact that too many declarations will require attributes.

I'm concerned about the latter - similar to how marking one parameter const in a C++ program means you may need to go spread const everywhere.

2 Likes

Hi Doug,

In an attempt to hill climb on a solution here, I think it would make sense to start by talking about the closure part of this design. Are you opposed to having a @concurrent attribute on function types, which is orthogonal to but works similar to @escaping?

The semantics of a @concurrent closure would be that it is subject to ConcurrentlySharable checking (both for captures and arguments); I think it would default to capturing by value instead of by reference, and we could even make such closures default to async.

If we go this direction, it has a number of advantages because it is a new additive feature that is opt-in. For example:

  1. It allows modeling the full cross product of "concurrent but not escaping", "escaping but not concurrent", "neither", and "both".
  2. It would eliminate one of the major sources of source breakage in this direction, because you wouldn't impose sharability checks on existing code.
  3. It implies that local functions would only be concurrent if declared as a @concurrent func foo() marker.
  4. It works great with actors and structured concurrency alike, and would be a unifying theory across the two.

What do you think about this?

-Chris

Yes, I'm open to longer names to make things more clear here. In my experimental implementation of this idea I ended up going with ConcurrentValue, because the fundamental property here is that values of the type can be used concurrently. I think @Joe_Groff's formulation as being about reachability to unsynchronized shared mutable state is the thing to capture.

Yes.

Good point, thanks.

I disagree fairly strongly with the main direction of that paper, as I noted in my response. Implicitly turning global variables into per-actor-instance variables is a huge semantic change, and I don't think it's going in the right direction. Technically not "source breaking", because your code will still compile, but it will behave very strangely the moment anyone introduces an actor that touches a global---even for correctly-synchronized global variables, such as a singleton that handles synchronization internally (it would no longer be a singleton in your proposed model). At least with a source break, you know that semantics have changed.

Do you want to debate that point further, or are there other models you want to discuss?

This is incorrect. You can evolve a let into an @actorIndependent var without breaking API or ABI.
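
For illustration, the shape of that evolution, spelled here with nonisolated (the keyword that eventually shipped) standing in for the pitched @actorIndependent; the types are made up:

// Version 1: an immutable stored property, readable synchronously from
// outside the actor because it can never change.
actor Session {
    let identifier: String
    init(identifier: String) { self.identifier = identifier }
}

// Version 2: the same member is now computed, backed by different storage.
// Callers see the same API. (Named Session2 only so both versions compile
// side by side here.)
actor Session2 {
    private nonisolated let storedIdentifier: String
    nonisolated var identifier: String { storedIdentifier }
    init(identifier: String) { self.storedIdentifier = identifier }
}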

In this model, "only certain kinds of async functions" can be called from another actor because of the ActorSendable/ConcurrentValue story. Either the whole model has this problem or it's not a real problem; there is no argument specific to let here.

I've been writing code against this model, and it is so very, very useful in practice. This current pitch on Preventing Data Races takes away the special case from actors and makes all lets with ActorSendable/ConcurrentValue/Shareable/whatever types accessible concurrently.

I truly don't understand where you're coming from with this line of reasoning. Immutability is perhaps the simplest and most pervasive tool we have for making concurrent code safe from data races. Swift has nudged programmers toward using let because immutability makes code simpler to reason about, and it also makes it safe to use concurrently when working with ActorSendable/ConcurrentValue types---something Swift has also nudged programmers toward with its pervasive value semantics. Why would we not embrace this wholeheartedly in our concurrency design?

Nor are all nonescaping closures non-concurrent. Per your hill-climbing question:

I'm not opposed to having a @concurrent attribute on function types. I think we'll need them regardless.

The prototype and my pitch take the approach of having new code---code that is within an actor or async code---opt in to the checking based on the "escaping closures are assumed concurrent" notion. Having @concurrent types be completely opt-in leaves large holes in the checking model that lets us diagnose data races in code that uses actors with existing APIs. Most examples are more subtle, but let's be blatant:

actor class MyActor {
  var count: Int

  func f(array: [Int]) {
    runLater {
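      // 'self' is captured by an escaping closure that may later run on
      // another thread, racing with the actor's own accesses to 'count'.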
      self.count = self.count + 1
    } 
  }
}

Where we got runLater from some library:

func runLater(_ body: @escaping () -> Void) {
  // toss this on a timer somewhere to run
}

My proposed scheme and the prototype will complain about this because escaping implies concurrent by default. But it allows non-escaping closures to just work:

extension MyActor {
  func silly(array: [Int]) {
    array.forEach { i in count += i } // okay because closure is non-escaping, so it's in the actor's domain
  }
}

With @concurrent being a purely opt-in feature, we'll either silently admit the data race in the first example, or have to reject the second example.

Having escaping imply @concurrent by default doesn't mean that escaping and concurrent aren't orthogonal; rather, the defaults are lined up and you have different spellings for the exceptional cases:

  • @escaping () -> Void: escaping, concurrent
  • @escaping @nonconcurrent () -> Void: escaping, non-concurrent (expected to be very, very rare)
  • () -> Void: non-escaping, non-concurrent
  • @concurrent () -> Void: non-escaping, concurrent (parallel/concurrent algorithms would use this)

The first and third are the most common cases, and with the approach I'm describing, they don't have to change at all (neither ABI nor API). But we can still address the uncommon cases to fill out the matrix.

Because it took me longer to figure out that this can be checked directly.

It would still be implicit, because most closures will become @concurrent implicitly based on context. That's probably okay in practice.

Doug

3 Likes

Cool, ConcurrentValue sounds great to me!

Yes I absolutely do, but I think there are several other parts of the model that are worth converging on first (which also have nothing to do with global variable semantics), so I suggest we focus effort there and come back to this topic.

Ok, the confusion is that @actorIndependent hasn't come up in this particular thread.

To recap the other threads, I have not seen clear motivation on the design of @actorIndependent. My understanding is that it has something to do with wanting actors to conform to protocols with sync requirements. If that is the case, then we need to discuss the topic and possible solutions, because there are better (in my opinion) ways to solve that problem.

However, it is entirely possible I've misread the goals here (again, because I haven't seen the motivation). In any case, we need to discuss this further.

I'd love to know more about the use cases, because it has simply been presented as "the design" without motivation in the proposal. I offered to write up a doc exploring the issue but you requested that I not do that. I gave a short version in response. I haven't had a chance to get back to that thread, so it is on me I guess, but I'd prefer to see actual rationale in a doc that explains @actorIndependent so I'm not trying to understand and discuss a moving target.

In any case, the entire design of actors + sync requirements has alternate designs that haven't been evaluated (in the forums). I don't see it as a foregone conclusion that @actorIndependent is a good design, and if it doesn't stick, then the let resilience / API migration issue is a very important concern. I suggest that we split this issue out to its own discussion thread.

I'm a huge fan of immutability and immutable types, even immutable reference semantic types (that's one of the motivations for the ConcurrentValue proposal). However that isn't what this proposal does or is about. This concept is about allowing actors - which are inherently bags of mutable state that are accessed asynchronously - to have narrow initialization semantics with special case sync access behavior.

This muddies the water by complicating the model, and isn't motivated anywhere that I've seen. Arguing that this is motivated by "immutability being good" doesn't seem sufficient: while we can agree that immutability is good :-) we also need to look at the effect of the design decisions on the resultant programming model because there are many factors to weigh.

This is fantastic news. I'd love to see this get nailed down, I think it will help clarify a big chunk of the global concurrency model.

I don't understand this, can you please explain more? There are two designs possible here (the "matrix" and the "tower" approach). I don't have a strong opinion either way, but I'll use the matrix to explain my understanding of how this works.

In the matrix design, @concurrent is a function type attribute that is orthogonal to @escaping and you can have the full cross product of attributes. This means that you get a natural subtype relationship as well: @concurrent () -> () can implicitly convert to () -> () just like @escaping () -> () implicitly converts to () -> ().
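
A tiny sketch of the existing analogous conversion; since @concurrent itself is still hypothetical, its half is shown only in a comment:

func takesNonescaping(_ f: () -> Void) { f() }

func takesEscaping(_ f: @escaping () -> Void) {
    // An @escaping function value converts implicitly to a non-escaping
    // parameter type; the reverse direction is what's forbidden.
    takesNonescaping(f)
}

// By the same logic, a (hypothetical) @concurrent () -> Void would convert
// implicitly to () -> Void:
// func takesConcurrent(_ f: @concurrent () -> Void) { takesNonescaping(f) }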

Given a design where @concurrent is a new opt-in feature, you get no data race and don't reject the second example. The first example fails to build when you implement runLater with an actor:

func runLater(_ body: @escaping () -> Void) {
  // toss this on a timer somewhere to run
  otherActor.enqueueAfterTime(1.0, body)
  // ^ error, cannot pass non-concurrent closure across actor boundaries because it isn't a `ConcurrentValue`.
}

or with structured concurrency:

func runLater(_ body: @escaping () -> Void) {
  // toss this on a timer somewhere to run
  taskGroup.spawn { stuff(body) }
  // ^ error, cannot capture 'body' in a concurrent closure because non-concurrent closures aren't `ConcurrentValue`s.
}

This is because spawn (or async let, or whatever) takes a concurrent closure, and they require ConcurrentValue conformance of anything they capture and pass/return as argument/results.

In your second example:

extension MyActor {
  func silly(array: [Int]) {
    array.forEach { i in count += i } // okay because closure is non-escaping, so it's in the actor's domain
  }
}

This is totally fine. forEach doesn't take a concurrent closure, so it doesn't have ConcurrentValue checks etc. It is just local synchronous code, protected by the actor as you'd expect. There's no change to type checking for existing code.

I don't have time to read your large patch, can you please explain what behavior you are proposing now?

The obvious and simple thing is to require the @concurrent attribute on the local function to keep it consistent with closures: the decl attribute would change the type of the local function.

Trying to infer this syntactically might be possible, but it seems like it will cause problems with non-trivial cases. That said, I don't know what your proposal is and shouldn't speculate.

Yeah totally agree, it should work just like the existing @escaping logic, where "the escaping bit" (and thus, "the concurrent bit") is inferred and checked based on the type inferred by the closure if there is context. This is very common since you're often passing the closure to a function.

As a concrete example, in parallelMap(stuff, {...}) the closure is inferred to be @concurrent, so it gets ConcurrentValue checks applied to it.
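
Sketched out, that signature might look something like the following, with @Sendable (the attribute that eventually shipped in this role) standing in for @concurrent; parallelMap is a hypothetical API and its body is just a serial stand-in:

extension Array {
    // The closure is marked as usable concurrently, so ConcurrentValue/
    // Sendable checks apply to whatever it captures, even though the
    // closure itself need not escape.
    func parallelMap<T>(_ transform: @Sendable (Element) -> T) -> [T] {
        // A real implementation would fan the work out across tasks; a
        // plain serial map keeps the sketch self-contained.
        map(transform)
    }
}

let doubled = [1, 2, 3].parallelMap { $0 * 2 }   // closure checked as @Sendable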

-Chris

4 Likes

A very common use case of unprotected lets is, say, a server app that loads its configuration at startup and lets everyone read it. Ideally it would enforce immutability of this data and also allow unprotected access.

One caveat here is that this data should be synchronized once before other threads can read it. If tasks that access this immutable data start after its creation then afaik it's guaranteed to be synchronized.

Another caveat, like Doug already mentioned, is unprotected let object references that allow modification.

Not sure how the above can be resolved, but other than that immutable unprotected data seems like a very useful thing.

Right - I am very +1 on immutability and immutable types. Just make something like that a class or struct with a bunch of let properties, and make it be a ConcurrentValue and you're done.
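
Something along these lines, with Sendable standing in for ConcurrentValue and the config type made up:

// An immutable configuration loaded once at startup. Every stored property
// is a let of a value type, so instances can be handed to any task or
// actor freely; no queue, lock, or actor wrapper is needed.
struct ServerConfig: Sendable {
    let host: String
    let port: Int
    let routes: [String: String]
}

let config = ServerConfig(host: "localhost", port: 8080, routes: ["/": "index"])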

My point is that there is no reason to make such a bag of immutable state an actor itself - actors are about protecting mutable state from concurrent accesses. If the state is immutable, then you don't need actors / queues / async etc, they can directly share it without wrapping it in an actor.

-Chris

1 Like

Yes but if an app wants to be 100% correct it will take the "everything is an Actor" model as a basis, I presume. Therefore all data should belong somewhere, either MainActor or any Actor.

I might have a case against this. Immutable state still needs synchronization guarantees. In a lower-level multithreaded world this can be achieved by ensuring that shared immutable data is created before the other threads are created, since each pthread_create() issues a sync.

I'd be most comfortable if say Swift's Actor constructors issued an implicit sync at the end of their execution (similar to what pthread_create() does) and let me access immutable data within them, unprotected. Because without this I'd be limited to defining e.g. my server app's configuration in global non-actor vars and somehow ensure their non-laziness.

Am I asking too much, or is there a way to achieve this that I'm missing?

No, this is a common misunderstanding, akin to saying "classes should be eschewed, real swift programmers only use structs and protocols". While it could be possible to use actors for everything, that wouldn't be a very efficient way to solve all concurrency problems. Some things are really best solved by shared mutable state with synchronization (e.g. concurrent hash tables) or by shared immutable types. The model already supports both of those.

Immutable state doesn't need synchronization by definition: nothing mutates it, so you can't have read/mutate races. Read/read races aren't concurrency problems. I think you may be concerned that the pointer to a partially initialized instance might escape before it is fully set up?

The important observation here is that Swift already guarantees that the self pointer for a class/struct/etc cannot escape until all of the members of the type are initialized. If all of the members are immutable, then this means that getting a pointer to an instance guarantees it is fully initialized and (since it is immutable) you know it won't/can't change.
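
A small illustration of that definite-initialization guarantee (types made up):

final class LoadedConfig {
    let values: [String: String]

    init(values: [String: String]) {
        // print(self)   // rejected: 'self' can't be used before all stored
        //               // properties are initialized
        self.values = values
        // After every stored property is set, 'self' may be used and may
        // escape; any thread that later sees this reference sees a fully
        // initialized, immutable object.
        Registry.publish(self)
    }
}

enum Registry {
    static func publish(_ config: LoadedConfig) { /* hand off to other threads */ }
}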

This isn't an accident: this has been a key design point of the initialization model from the beginning. This is a key aspect of the "designated initialization model" and why there are different rules before and after the "super init" call.

-Chris

1 Like

No, it's a different problem. Immutable data, too, requires synchronization though only once. When you write something to memory on a multicore system it's not guaranteed to become immediately visible to all cores.

However you know it's immutable and you don't want to waste cycles by syncing it every time you read it. So what you usually do in lower-level multithreading is: you ensure your immutable data is initialized before the threads that will be reading it are created, since thread creation issues a sync and therefore it will be safe to read from those threads, unprotected.

Now putting the pieces together, what I was proposing: if say there's a guarantee Actor issues a sync at the end of its constructor (not familiar with the internals yet, but there's some chance it's what actually happens) it means any immutable data defined in the actor is safe to read unprotected.

If on the other hand, actors don't expose their immutable data then what options do I have with say my scenario with the server config structures that will be initialized once and then read a lot? Think of the gigantic configs that Apache or nginx load at startup and then use all the time.

You are suggesting that I create unprotected globals (also ensure they are not lazy!), then what is the guarantee that they are synced and safe to read? There's a good chance some synchronization will happen somewhere before this data is read, but then a rare chance it won't be.

Of course exposing actor lets should be considered very carefully anyway, but on the surface it seems like a good idea that will allow me to follow the "everything is an actor" model where it would be easier to provide a formal proof of correctness of the program, and have immutable unprotected data with faster access, too.

Oh, but that is mostly true.

1 Like