Swift 6.2 hidden actor race condition

The following program compiles and runs using the Xcode Swift 6.2 compiler in Swift 6 mode with no warnings or errors:

import Foundation

class Stateful {
  var x: Int

  init() { x = 0 }
}

class Entry {
  let id: Int
  var s: Stateful

  init(_ id: Int, s: Stateful) {
    self.id = id
    self.s = s
  }

  func asyncWork() async {
    s.x += 1
    print("Task \(id): x=\(s.x)")
    for _ in 1...500 { }
    print("Task \(id): x=\(s.x)")
  }
}

@MainActor class MyClass {
  var entries: [Entry] = []

  func tenTasks() async {
    let s = Stateful()
    entries = (0...9).map { Entry($0, s: s) }
    let totalCount = entries.count
    await withTaskGroup(of: Void.self) { group in
      var i = 0
      while i < totalCount {
        let entry = entries[i]
        group.addTask { await entry.asyncWork() }
        i += 1
      }
      for await _ in group { }
    }
  }
}

print("Begin")
await MyClass().tenTasks()
print("End")

Yet it has an obvious race condition on x, with its value changing from the point of view of a task without the task yielding or modifying it. Here’s one output:

Begin
Task 0: x=1
Task 1: x=2
Task 2: x=3
Task 3: x=4
Task 0: x=4
Task 5: x=5
Task 4: x=6
Task 6: x=7
Task 7: x=8
Task 1: x=8
Task 9: x=9
Task 8: x=10
Task 2: x=10
Task 5: x=10
Task 3: x=10
Task 6: x=10
Task 4: x=10
Task 7: x=10
Task 9: x=10
Task 8: x=10
End

Were I to change the line

entries = (0...9).map { Entry($0, s: s) }

to

let entries = (0...9).map { Entry($0, s: s) }

then the compiler correctly diagnoses the problem:

Passing closure as a 'sending' parameter risks causing data races between main actor-isolated code and concurrent execution of the closure

but the program as originally listed races without generating any errors or warnings. Is this intentional?

I’m running Swift 6.2 on the current release of Xcode 26 with the language mode set to Swift 6 and “nonisolated(nonsending) By Default” set to YES.


This definitely looks like a bug to me.

My manual RBI analysis of your code is as follows.
Because tenTasks is MainActor-isolated, the value bound by let entry should also be MainActor-isolated. The closure that captures entry -- { await entry.asyncWork() } -- therefore cannot be in a disconnected region.

  • if it is in a MainActor-isolated region, then the compiler should reject the entry.asyncWork() call, because a non-Sendable value (entry) is passed to a nonisolated async function (Entry.asyncWork).
  • if it is in a task-isolated region, then the compiler should reject the addTask call, just the same as when you change entries = xxx into let entries = ....

agreed, this appears to be a bug in region-based isolation[1]. it doesn't seem to be new, however, as i see the same behavior going back to the 6.0 compiler, nor does the upcoming feature appear to be relevant.

i tried to reduce it a bit, and my suspicion is that region merging isn't working, or behaves differently, when the actor-isolated state is aliased through a collection subscript rather than a method. e.g. if you change the line let entry = entries[i] to access the element through a method like this:

extension [Entry] {
    func entryAtIndex(_ idx: Int) -> Entry { self[idx] }
}

// ...

let entry = entries.entryAtIndex(i)

then the analysis identifies the aliasing correctly and diagnoses the invalid send:

<source>:38:15: error: passing closure as a 'sending' parameter risks causing data races between main actor-isolated code and concurrent execution of the closure
36 |         // let entry = entries[i]
37 |         let entry = entries.entryAtIndex(i)
38 |         group.addTask { await entry.asyncWork() }
   |               |                     `- note: closure captures 'entry' which is accessible to main actor-isolated code
   |               `- error: passing closure as a 'sending' parameter risks causing data races between main actor-isolated code and concurrent execution of the closure
39 |         i += 1
40 |       }

here's a reduction that shows there's a hole in RBI that allows sending actor-isolated non-sendable state[2]:

nonisolated func send<T>(_ value: sending T) {}

open class Entry {}

@MainActor class MyClass {
  var entries: [Entry] = []

  func invalidSend() async {
    let entry = Entry()
    self.entries = [entry]
    let alias = self.entries[0]
    // change to this access pattern and you get an error:
    // let alias = self.entries.first!
    send(alias) // sending `alias` but it's accessible to main-actor-isolated code
  }
}

FYI @hborla @Michael_Gottesman


  1. you can report it here, or i might do it if i remember... ↩︎

  2. related godbot ↩︎


Thank you! Opened a swiftlang bug for this.


Yes, it looks like a bug to fail to produce the appropriate warning.

A few observations, though:

  1. When we post Swift 6.2 questions, the “nonisolated(nonsending) By Default” build setting is good to include, but, in this case, the salient build setting is “Default Actor Isolation”. Now, we can reasonably guess that your “Default Actor Isolation” was “nonisolated” (not “MainActor”), but I might suggest being explicit about that in your bug report. (Or, as I have in my examples below, just be explicit/redundant about the isolation of the types in the code snippet, and it eliminates any question about this “Default Actor Isolation” build setting.)

  2. If you forgive me, your example output does not actually manifest any race. The fact that x changes between the two print statements is immaterial; this can simply happen with shared reference types in multithreaded code, whether they are thread-safe/Sendable or not.

    To manifest races, you generally need far more than 10 iterations. For example, here is a non-thread-safe example that illustrates your point:

    import Foundation
    
    nonisolated class Stateful {
        private var x = 0
    
        func increment() {
            x += 1
        }
    
        var value: Int {
            x
        }
    }
    
    nonisolated class Entry {
        let id: Int
        let s: Stateful
    
        init(_ id: Int, s: Stateful) {
            self.id = id
            self.s = s
        }
    
        func asyncWork() async {
            s.increment()
        }
    }
    
    @MainActor class MyClass {
        var entries: [Entry] = []
        let count = 1_000_000
    
        func manyTasks() async {
            let s = Stateful()
            entries = (0 ..< count).map { Entry($0, s: s) }
    
            await withTaskGroup(of: Void.self) { group in
                for i in entries.indices {
                    let entry = entries[i]
                    group.addTask {
                        await entry.asyncWork()
                    }
                }
    
                await group.waitForAll()
    
                let finalResult = s.value
                if finalResult != count {
                    print("final count was \(finalResult), not \(count)!!!")
                } else {
                    print("final count was \(finalResult); no race was manifested")
                }
            }
        }
    }
    

    That actually manifests the race you were looking for:

    final count was 997897, not 1000000!!!
    

    And, FWIW, we can see that if we make Stateful and Entry both Sendable, then it works fine:

    import Foundation
    import Synchronization
    
    nonisolated final class Stateful: Sendable {
        private let x = Mutex(0)
    
        func increment() {
            x.withLock { $0 += 1 }
        }
    
        var value: Int {
            x.withLock { $0 }
        }
    }
    
    nonisolated final class Entry: Sendable {
        let id: Int
        let s: Stateful
    
        init(_ id: Int, s: Stateful) {
            self.id = id
            self.s = s
        }
    
        func asyncWork() async {
            s.increment()
        }
    }
    

None of this is intended to take away from your broader point, namely that the compiler is failing to generate the appropriate warnings/errors. These are just a few observations you might consider when composing your bug report.


This is probably the same as #83121. Both regressed (failed to produce diagnostics) since 6.0.

📖 pedantic aside...

to be fair, none of the examples' output alone is sufficient to conclude that there are races; that requires an understanding of the program and its execution model, in addition to the output.

the original example program has no explicit synchronization mechanism governing access to the mutable state[1]. given that the asyncWork() method has a synchronous implementation that increments the mutable value and then reads & prints it twice, we can conclude that if the print statements for different iterations interleave, there must be concurrent access to the mutable data.

the modified example you provided that uses an explicit Mutex for synchronization is a good demonstration of how the program's output alone is insufficient to conclude very much about the presence of races. e.g. if we changed the increment() method to no longer be atomic, and split the read and write into two separate lock accesses, it would contain 'race conditions', but not any 'data races'; the printed output could indicate a 'race' in the same way as the implementation without a mutex, but all memory accesses would still be 'safe'.
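
for concreteness, here's a minimal sketch of that split variant, assuming the Mutex-based Stateful from the reply above (this snippet is mine, not from that reply):

func increment() {
    // each access holds the lock, so no individual memory access is a data race...
    let current = x.withLock { $0 }
    // ...but another task can increment between the read above and the write
    // below, so this can store a stale value -- a lost update (a race
    // condition) with no data race anywhere.
    x.withLock { $0 = current + 1 }
}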

we do have access to a tool specifically designed to detect runtime data races: thread sanitizer. using it to demonstrate the presence of races may be more convenient and provide more compelling evidence in reports of these issues than bespoke printing & invariant checks (granted, not sure if it can be enabled in a Playground though).
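
for example, assuming a SwiftPM package or a standalone source file, the sanitizer can be enabled with the standard toolchain flags (in Xcode it's the Thread Sanitizer checkbox in the scheme's Run > Diagnostics settings):

swift build --sanitize=thread    # or: swift test --sanitize=thread
swiftc -sanitize=thread main.swift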

i think the underlying issue here may be distinct, since the sample code in the issue you referenced was diagnosed in the 6.0 compiler, and this one seemingly has never been diagnosed by any compiler version.


  1. well, other than actor-isolation, which it demonstrates is subverted ↩︎

The specific kind of race condition the code in the original post exhibits is often called a data race. I’m not aware of any formal semantics for Swift concurrency, so I’m using nomenclature and definitions from other modern languages, which generally have converged on a similar set of specifications for pragmatic reasons I’ll get to. I’ve done a fair amount of work using and defining such specifications in other widely used languages.

Data Races

At a high level a data race is usually defined as something like:

Two non-atomic accesses to the same memory location by two parallel streams of execution, neither of which occurs before the other in a happens-before graph, where at least one of the accesses is a write.

That definition uses a lot of terms of art I’m not going to get into the details of at the moment:

  • What is meant by a “memory location” from the point of view of a high-level program is an interesting and fairly deep question, but an Int generally counts as one.
  • What counts as a “parallel stream of execution” also varies. Threads are one example, as are tasks that can run in parallel. Tasks on the same actor are not parallel streams of execution for this definition, because the only way to switch from one to the other is for the first one to explicitly yield by, for example, awaiting something.
  • What counts as “atomic” varies. In Swift, values of type Atomic<Int> are atomic; ordinary Ints are not (see the sketch below).
  • The happens-before graph is constructed out of all of the synchronization constructs in the language used by a particular execution of a program. Common synchronization constructs that add edges to the graph include release-acquire edges, mutex edges, and synchronization provided by various system libraries. In a nondeterministic language a program typically can have many possible executions, each with its own happens-before graph.

There are many details I'm glossing over here, but that's the high-level view.
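
To make the “atomic” bullet concrete, here is a minimal sketch using the standard Synchronization module (the Counter type and its member names are mine, purely for illustration): incrementing a plain Int from parallel tasks is a data race, while the same access pattern through Atomic<Int> is not, because each access is a single atomic operation.

import Synchronization

final class Counter: Sendable {
    // Atomic storage: concurrent accesses from parallel tasks are defined.
    // A plain `var hits = 0` used the same way would be a data race.
    let hits = Atomic<Int>(0)

    func record() {
        // Relaxed ordering gives atomicity but adds no happens-before edges,
        // which is enough for a standalone counter.
        hits.wrappingAdd(1, ordering: .relaxed)
    }

    var value: Int {
        hits.load(ordering: .relaxed)
    }
}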

Example

The program in the original post is sufficient to demonstrate a data race, at least with the plausible assumption that print doesn't affect the value of x, nor does it synchronize accesses to it. The tasks are running in parallel, and a single task can see multiple values of x without writing to it, yielding to the other tasks, or synchronizing with anything that can change x.

Motivation

A language could resolve data races by, for example, defining the language so that all memory accesses in a program appear to occur in some sequentially consistent order. Languages used to do that, but it’s problematic on modern computers because it interferes with both compiler optimizations and modern multicore processor memory models. It’s possible for core 1 to perceive the writes to x and y in one order while core 2 perceives them in the opposite order. In general there is no sequentially consistent order for all such memory operations. We could impose one, but then the program would run much slower (we’d need to use sequentially consistent instructions on the target chip architecture), so we typically only do it for sequentially consistent atomics and other such synchronization primitives.
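
As a hedged illustration of the point about sequential consistency (the Flags type and storeBufferingOnce function below are mine, purely for illustration): in the classic store-buffering litmus test, two tasks each store to one variable and then load the other. With relaxed atomic orderings, both loads are permitted to return 0, an outcome no single global interleaving of the four operations could produce; making all four accesses sequentially consistent forbids it, at the cost of stronger instructions.

import Synchronization

final class Flags: Sendable {
    let x = Atomic<Int>(0)
    let y = Atomic<Int>(0)
}

// Runs one round of the store-buffering test. With .relaxed orderings the
// result (0, 0) is allowed by the memory model (though any given run may or
// may not exhibit it); with .sequentiallyConsistent on all four accesses it
// is forbidden.
func storeBufferingOnce() async -> (Int, Int) {
    let flags = Flags()
    async let r1 = { () -> Int in
        flags.x.store(1, ordering: .relaxed)
        return flags.y.load(ordering: .relaxed)
    }()
    async let r2 = { () -> Int in
        flags.y.store(1, ordering: .relaxed)
        return flags.x.load(ordering: .relaxed)
    }()
    return await (r1, r2)
}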

The other place this comes up is in compiler optimizations. We want to define a language’s memory model so that common optimizations are allowed by the language’s semantics. For example, an optimizer might want to turn

var x
…
let a = x
let b = x

into:

var x
…
let a = x
let b = a

This is a valid optimization as long as the compiler can assume that the program is free of data races.

A more useful scenario is turning:

var x
…
while … {
  let a = x
  …
}

into:

var x
…
let a = x
while … {
  …
}

where nothing else in the loop modifies x, synchronizes with other tasks that can modify x, or gives up control to let them run, for example, yielding an actor that manages access to x.
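
As a hedged sketch of why that caveat matters (the types below are illustrative, not from this thread): a busy-wait on a plain stored property is exactly the shape of loop this transformation breaks when another thread is the writer, whereas an atomic flag is reloaded on every iteration.

import Synchronization

// Racy version: `done` is plain storage with no synchronization. Because the
// compiler may assume there are no data races, it may hoist the read of
// `done` out of the loop, so the loop never observes another thread setting
// the flag.
final class SpinningWorker {
    var done = false

    func spinUntilDone() {
        while !done { }   // may be compiled as: if !done { while true { } }
    }
}

// Race-free version: the atomic flag is reloaded each iteration, and the
// acquiring load synchronizes with the releasing store in finish().
final class AtomicSpinningWorker: Sendable {
    let done = Atomic<Bool>(false)

    func spinUntilDone() {
        while !done.load(ordering: .acquiring) { }
    }

    func finish() {
        done.store(true, ordering: .releasing)
    }
}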

These are valid optimizations for programs without data races. Data races can make them invalid, so we usually don't want to allow data races in a language. Some languages, such as C++, just turn them into undefined behavior and the compiler can optimize assuming that they don't happen (which itself sometimes leads to hilarious consequences for the unwary). Swift 6 seems to use a combination of compiler and run-time enforcement to prevent them, at least in code that doesn't escape into unsafe constructs. Java got the semantics badly wrong and had to make significant revisions to the definition of the language to fix them. Even C++ made mistakes in the definition which were found and corrected only recently via formal analysis.


Yep.

That is a good definition of a “data race”. A data race is the mutation of the contents at a memory address while that memory is being simultaneously accessed from another thread. The Swift 6 compiler is trying to protect us from these data races (but failed to do so here).

And, yes, the original question’s code snippet has a data race about which the compiler failed to warn us.

Nope.

The fact that the value of x printed after the spinning for loop differs from the value printed before it is not evidence of a “data race”. It is a manifestation of a high-level race condition: a race resulting from the timing of individual accesses, not from two unsynchronized low-level accesses happening at the same time.

In fact, it’s quite easy to alter your original example such that it is entirely free of any data races, but still generates the exact same sort of output. (See my example below.)

I belabor this distinction because the Swift 6 errors identify potential data races only; they will not identify high-level race conditions. Sure, we use Swift actors to give ourselves the right level of abstraction for tackling these high-level race conditions, but that is not what the Swift 6 language mode errors are designed to identify. They only tackle “data races”.

Again, you have successfully identified a bug (because, regardless of whatever high-level, logical race conditions are illustrated by the output, the original code snippet also suffered from a low-level data race about which the Swift 6.2 compiler failed to warn us). But the sample output in the original question is not evidence of a “data race”.


FWIW, here is an example, entirely free of any data races, which exhibits the same basic output as the original example of the question.

import Foundation
import Synchronization

nonisolated final class Stateful: Sendable {
    private let _x = Mutex(0)

    var x: Int {
        _x.withLock { $0 }
    }

    func increment() {
        _x.withLock { $0 += 1 }
    }
}

nonisolated final class Entry: Sendable {
    let id: Int
    let s: Stateful

    init(_ id: Int, s: Stateful) {
        self.id = id
        self.s = s
    }

    func asyncWork() async {
        s.increment()
        print("  Start task \(id): x=\(s.x)")
        for _ in 1...5_000 { }
        print("    End task \(id): x=\(s.x)")
    }
}

@MainActor class MyClass {
    func tenTasks() async {
        let s = Stateful()
        let entries = (0...9).map { Entry($0, s: s) }
        await withTaskGroup(of: Void.self) { group in
            for index in entries.indices {
                let entry = entries[index]
                group.addTask { await entry.asyncWork() }
            }
        }
    }
}

print("Begin")
await MyClass().tenTasks()
print("End")

That produces:

Begin
  Start task 0: x=1
  Start task 1: x=2
  Start task 2: x=3
  Start task 3: x=4
  Start task 4: x=5
  Start task 5: x=6
  Start task 6: x=7
  Start task 7: x=8
  Start task 8: x=9
  Start task 9: x=10
    End task 0: x=10
    End task 1: x=10
    End task 2: x=10
    End task 4: x=10
    End task 5: x=10
    End task 9: x=10
    End task 3: x=10
    End task 6: x=10
    End task 8: x=10
    End task 7: x=10
End

We must acknowledge that when dealing with actors, we generally don’t have to worry about properties mutating behind our backs like this. The vast majority of the time, this only rears its ugly head when dealing with await suspension points in our code (because actors are reentrant). But shared reference objects that employ their own low-level synchronizations to achieve their thread-safety can exhibit the above behavior. Judicious design can mitigate these scenarios, but this is beyond the scope of this question.
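
For example (a hedged sketch; the Budget actor is hypothetical, not from this thread): actor-isolated state cannot change out from under synchronous actor-isolated code, but it can change across an await, because other work can run on the actor while we are suspended.

actor Budget {
    var remaining = 10

    func spend() async {
        guard remaining > 0 else { return }
        // Suspension point: while this call is suspended, other spend() calls
        // can run on this actor (actors are reentrant)...
        try? await Task.sleep(for: .milliseconds(1))
        // ...so `remaining` may have changed since the guard above, and this
        // decrement can drive it below zero -- a logical race, not a data race.
        remaining -= 1
    }
}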

The point is that the Swift 6 compiler warnings/errors can help us mitigate “data races” (the compiler bug you identified notwithstanding), but catching and fixing logical races rests solely on our shoulders.