That, if I remember correctly, is one of the reasons why Swift doesn't have auto-synthesized async setters for actor properties.
It is possible to write code with the same issues as the dispatch-queue version, but the ergonomics of actors nudge you in the right direction. Here you had to write your own getters/setters, which feels 'alien' enough in Swift to hopefully make you reflect a bit on the code you're writing.
Note that the actual code could be more complicated than just:
await state.set(state.get() + 1)
to justify having get and set separately, for example:
await state.set(transformer.transform(state.get()))
However, even if you repackage the code to not have those get/set explicitly:
func update(_ update: (Int) async -> Int) async {
    await someLogStatement()
    state = await update(state)
    await someLogStatement()
}
with this usage:
await state.update(transformer.transform)
the underlying high-level data race is still there:
Full illustrative data-race example without getters/setters:
import Foundation

actor State {
    var state = 0

    func update(_ update: (Int) async -> Int) async {
        await someLogStatement()
        state = await update(state)
        await someLogStatement()
    }
}

actor Transformer {
    func transform(_ value: Int) async -> Int {
        await someLogStatement()
        return value + 1
    }
}

let state = State()
let transformer = Transformer()

Task {
    // data race
    await state.update(transformer.transform)
}

Task {
    // data race
    await state.update(transformer.transform)
}

Task {
    try await Task.sleep(nanoseconds: 100_000_000)
    let result = await state.state
    print("resulting value is \(result)")
}

func someLogStatement() async {
    try! await Task.sleep(nanoseconds: .random(in: 0 ..< 1_000_000))
}

RunLoop.main.run(until: .distantFuture)
"which feels 'alien' enough in Swift" ... "to hopefully make you reflect a bit"
Frankly, I don't think this is enough... It should be a much stronger nudge, like a compile-time error/warning, or at least a runtime trap/crash.
Ideally it should be: "if it compiles, it works correctly". In other words, code that contains data races should be impossible to write, be they low-level or high-level data races or anything else.
Maybe the weight of await has gotten lost over time, but these are crucial points and the major, important difference from the queue-based example.
Lots of awaits is what signals you to be careful with the logic and draws your attention. They are used too liberally, as if they mean nothing, while they are markers for the developer.
For example, this instantly makes me question why a simple state update inside an actor requires that many async calls. I can suspect logic errors here almost immediately.
To be fair, to become suspicious about such places you have to be bitten by reentrancy once or twice, but then you quickly learn that every await adds complexity, and you start reducing and avoiding them.
But see how many tweaks and complications you have to consciously make to reach this point, each of which is considered an anti-pattern for actors.
In contrast, with the queue version it is much harder to ever suspect that something is wrong.
TBH you'll still need to await logging with locks too, and then it's just easier to write an actor with some reentrancy checking inside.
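As a hedged sketch of what such a reentrancy check might look like (the `GuardedState` name and the trap-on-reentrancy policy are my own illustration, not from this thread):

```swift
// A minimal sketch of a manual reentrancy guard inside an actor.
// Trapping on reentrancy is illustrative; real code might instead
// queue the update or throw an error.
actor GuardedState {
    private var state = 0
    private var isUpdating = false

    var value: Int { state }

    func update(_ transform: @Sendable (Int) async -> Int) async {
        // If another task re-enters update() while we are suspended in
        // `transform`, this precondition trips instead of silently
        // losing an update.
        precondition(!isUpdating, "reentrant update detected")
        isUpdating = true
        defer { isUpdating = false }
        state = await transform(state)
    }
}
```

With a guard like this, two racing `update` calls of the kind shown in the surrounding examples would trap at runtime rather than silently drop an increment.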
Also, as a side note, I think for actors it's better to pass another actor instead of a closure, e.g.:
actor State {
    var state = 0

    func update(using transformer: Transformer) async {
        await someLogStatement()
        state = await transformer.transform(state)
        await someLogStatement()
    }
}

actor Transformer {
    func transform(_ value: Int) async -> Int {
        await someLogStatement()
        return value + 1
    }
}
Returning back to the original question: is the problem only in big chunks making copies? What if we can split the data into smaller chunks (events, diffs, change sets, ...) and compose it in AppState afterwards?
How many is "that many"? Is just one too many already?
func update(using transformer: Transformer) async {
    state = await transformer.transform(state)
}
Note that this still data-races...
A simplified version of the data race:
import Foundation

actor State {
    var state = 0

    func update(using transformer: Transformer) async {
        state = await transformer.transform(state)
    }
}

actor Transformer {
    func transform(_ value: Int) async -> Int {
        await someLogStatement()
        return value + 1
    }
}

let state = State()
let transformer = Transformer()

Task {
    // data race
    await state.update(using: transformer)
}

Task {
    // data race
    await state.update(using: transformer)
}

Task {
    try await Task.sleep(nanoseconds: 100_000_000)
    let result = await state.state
    print("resulting value is \(result)")
}

func someLogStatement() async {
    try! await Task.sleep(nanoseconds: 0)
}

RunLoop.main.run(until: .distantFuture)
await here is unavoidable if the transformer is another actor. So, where is the anti-pattern here? That I merely split the transformer into another actor? And more importantly, could the compiler warn me about this?
YMMV, but for me, the presence of low-level data races (which "occasionally" manifest as crashes or runtime traps) is what actually nudges me to address those issues. Paradoxically, eliminating low-level data races can be a disservice. The mere presence of await markers is too subtle: I could spend days reviewing a PR with a handful of awaits in it and still miss high-level data races like the one above. Bear in mind, the example above is very simple compared to what I've witnessed in practice.
IMHO it's not an anti-pattern in particular, but await and throws are basically effects in Swift and should be used thoughtfully.
The anti-patterns are in the previous examples: the getters/setters and the update closure. Still, every await is a critical point in the program and requires attention.
Clearly there are people capable of doing so by merely looking at code in review, but quite often it's just random crashes with unclear stack traces, which leaves you guessing at the nature of these low-level issues. When you're left with high-level issues only, you can inspect the critical points in the app.
Yes, but now it is much simpler to reason about: you have all the effects of the mutating state in one place, instead of scattered across the codebase. Local reasoning is much simpler to analyze in the end.
But that's at least somewhat better than getting all green tick marks during compilation and then having the app sometimes not work correctly?
Honestly, would you question that line quoted above during a PR review (with, say, 30 other changes) and you didn't know upfront there's a high-level data race there? If so what would be your PR review comment for that line?
And how would you fix it?
I'd also like to find some common ground. Could we agree that "if it compiles - it's free of high-level data races" is a great thing to have at the end of the day? (Keeping aside for a moment the question of whether this is possible to achieve short-term or not). Or do you think that manually auditing await points is enough?
Returning back to the original question: is the problem only in big chunks making copies? What if we can split the data into smaller chunks (events, diffs, change sets, ...) and compose it in AppState afterwards?
That means maintaining two copies of data and exchanging diff messages to sync it?
Honestly, would you question that line quoted above during a PR review (with, say, 30 other changes) and you didn't know upfront there's a high-level data race there? If so what would be your PR review comment for that line?
And how would you fix it?
There are higher chances that I would question that line compared to the queue version, at least.
First of all, I wonder why there is a need for two actors, or a need for actors at all. I get that this is example code; nevertheless, that's the first question I ask in such cases (of myself or others).
Then, if the use of actors or async calls is unavoidable there, the next question depends on the nature of the state processing. If it is intended to be a sequential change of state in a queue-like manner, then there is obviously a need for an internal queue of some sort. Also, decouple the use of the actor's state value from the async points to get a clear picture of the state I'm working with and how suspension points affect it: with a value type it can be a local variable; with reference types I'd add checks in between awaits (if a copy of such an object is too expensive or impossible for some reason).
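For a value type, the "checks in between awaits" idea could be sketched as an optimistic compare-and-retry update (a hedged illustration; `CASState` and the retry policy are assumptions of mine, not from the post):

```swift
// A hedged sketch: re-read the state after the suspension point and
// only commit if it hasn't changed in the meantime, retrying otherwise
// (an optimistic, compare-and-swap-style update).
actor CASState {
    var state = 0

    /// Retries the transform until no other task has changed `state`
    /// across the suspension point.
    func update(_ transform: @Sendable (Int) async -> Int) async {
        while true {
            let seen = state                      // value-type local copy
            let proposed = await transform(seen)  // suspension point
            if state == seen {                    // nothing changed meanwhile
                state = proposed
                return
            }
            // state changed while we were suspended: retry with the fresh value
        }
    }
}
```

This trades the lost update for possibly re-running the transform, which is only acceptable when the transform is safe to repeat.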
Could we agree that "if it compiles - it's free of high-level data races" is a great thing to have at the end of the day?
That's a great thing; I just don't believe it is ever possible for programming languages to reach that point, not only in the short term. I'm not sure it is possible even without shared mutable state and with an appropriate functional language.
Or do you think that manually auditing await points is enough?
I think it is much better than what we had before. I guess there is room for improvement, but not to the point where it will be impossible to write incorrect concurrent code.
I'd also like to find some common ground. Could we agree that "if it compiles - it's free of high-level data races" is a great thing to have at the end of the day? (Keeping aside for a moment the question of whether this is possible to achieve short-term or not). Or do you think that manually auditing await points is enough?
I don't think it's possible for the compiler to remove these high level data races, short term or not. There's nothing inherently wrong with reading a value asynchronously, modifying it in some way, and writing it back asynchronously.
Of course, if you want that operation to be an atomic update, and go on to write code assuming that the method will update a value atomically, then the assumption will be broken and your code will perform unexpectedly. But how would the compiler know that you assumed such behavior if you never wrote that invariant down anywhere?[1]
Ultimately it's up to the programmer to write an API with the right transactionality, as what is "valid" or not will depend on domain knowledge about the code.
This invariant can be written quite elegantly in Swift: an "atomic" operation on an actor's protected state can be written as a synchronous function in that actor. And now the compiler will check that invariant for you. ↩︎
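A hedged sketch of the footnote's point (the `Counter` name and method signatures are illustrative): because the update method contains no await, the whole read-modify-write executes without suspension, so concurrent increments cannot be lost.

```swift
// The read-modify-write is a synchronous method on the actor, so it
// contains no suspension points and cannot interleave with other tasks.
actor Counter {
    private var state = 0

    // Synchronous within the actor: atomic with respect to other tasks.
    func update(_ transform: @Sendable (Int) -> Int) {
        state = transform(state)
    }

    var value: Int { state }
}

// Usage: two awaited increments can never lose an update.
let counter = Counter()
await counter.update { $0 + 1 }
await counter.update { $0 + 1 }
print(await counter.value) // prints 2
```

The price is that the transform itself must be synchronous; as soon as it needs to await something, you are back in high-level data-race territory.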
There are higher chances that I would question that line compared to the queue version, at least.
First of all, I wonder why there is a need for two actors, or a need for actors at all. I get that this is example code; nevertheless, that's the first question I ask in such cases (of myself or others).
Then, if the use of actors or async calls is unavoidable there, the next question depends on the nature of the state processing. If it is intended to be a sequential change of state in a queue-like manner, then there is obviously a need for an internal queue of some sort. Also, decouple the use of the actor's state value from the async points to get a clear picture of the state I'm working with and how suspension points affect it: with a value type it can be a local variable; with reference types I'd add checks in between awaits (if a copy of such an object is too expensive or impossible for some reason).
Great stuff. I'd just like to see those questions, reasons, and suggestions in the form of compile-time errors and warnings (with a possibility to opt out) (†).
Example:
func update1(using transform: (Int) async -> Int) async {
    let stateCopy = state
    let newState = await transform(stateCopy)
    state = newState
    // actually... same warning as below.
}

func update2(state: inout Int, using transform: (Int) async -> Int) async {
    state = await transform(state)
    // "Warning: the update of state could be lost! If you know what you are doing,
    // do this and that to opt out of this warning; otherwise fix your code."
}
That's a great thing; I just don't believe it is ever possible for programming languages to reach that point, not only in the short term. I'm not sure it is possible even without shared mutable state and with an appropriate functional language.
I don't think it's possible for the compiler to remove these high level data races, short term or not.
I don't think high-level data races are possible in pure functional programming languages that do not have shared mutable state. I'm not saying this would be the best approach (or better than (†)), but one of the approaches to make a subset of Swift safe in those regards would be to limit the safe portion of the language to being pure functional (e.g. have "safe" coloured pure functions that could only call other safe functions).
But how would the compiler know that you assumed such behavior if you never wrote that invariant down anywhere?
We could use exactly the same line of reasoning for the version above where the variables were protected with queue.sync... It's only fair to highlight that Swift actors are not immune to high-level races when we dismiss the above queue-synced version on the grounds that it is prone to having high-level races.
I don't think high level data races are possible in pure functional programming languages that do not have shared mutable state.
Actors are essentially shared mutable state any way you slice it. Once you can have a 'counter' actor that responds to get/set messages to update an integer value internal to the actor, you can have a high-level data race, and there isn't much a type system can do to avoid it, except for limiting the actor model itself in some way.
This doesn't require shared mutable state at the language level, because you can imagine implementing an actor as a recursive function that calls a blocking receive() primitive, processes the message, and then recursively calls itself again with updated 'state'.
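That idea can be sketched in Swift with an AsyncStream as the mailbox (a hedged illustration; `CounterMessage` and `runCounter` are my names, and a loop is used rather than recursion since Swift does not guarantee tail-call optimization):

```swift
// The "actor" is a plain loop over a mailbox; its state is a local
// variable that nothing else can read or write.
enum CounterMessage {
    case get(CheckedContinuation<Int, Never>)
    case set(Int)
}

func runCounter(mailbox: AsyncStream<CounterMessage>) async {
    var state = 0 // not shared: only this loop touches it
    for await message in mailbox {
        switch message {
        case .get(let continuation):
            continuation.resume(returning: state)
        case .set(let newValue):
            state = newValue
        }
    }
}
```

Note that a client doing a `get` followed by a `set` as two separate messages still has exactly the high-level race discussed above, even though nothing here is shared mutable state at the language level.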
Returning back to the original question: is the problem only in big chunks making copies? What if we can split the data into smaller chunks (events, diffs, change sets, ...) and compose it in AppState afterwards?
The core problem is having a large amount of non-Sendable data that is generated in a time-consuming operation off the main thread that later needs to be viewed and edited in the UI (which runs on the main thread) and then later still processed some more in another time-consuming operation off the main thread, and so on.
For small amounts of data, it's no big deal to make (Sendable) copies when passing between actors. But for large amounts of data, making copies can eat up a lot of memory, and trying not to make copies (e.g., by accessing the data asynchronously while it stays inside a non-main actor) can be tedious, risky (data races!) and doesn't mesh well with what SwiftUI (and AppKit, for that matter) expects when building UIs.
Yes, you can also make (Sendable) copies of smaller subsets of the data to try to avoid ever having a complete 2x RAM footprint for your data, but that's just adding complexity and the chance for bugs. (It's also what I'm currently doing in my real app, so I know it "works." But it seems like a cleaner solution should be possible…)
Actors are essentially shared mutable state any way you slice it. Once you can have a 'counter' actor that responds to get/set messages to update an integer value internal to the actor, you can have a high-level data race, and there isn't much a type system can do to avoid it, except for limiting the actor model itself in some way.
Yep, I am thinking along the lines of some limitation. It would be semi-equivalent to an error/warning in this synchronous and otherwise correct code:
var x: Int
x = 1
x = 2 // error/warning wanted: the previous update was lost
print(x)
This doesn't require shared mutable state at the language level, because you can imagine implementing an actor as a recursive function that calls a blocking receive() primitive, processes the message, and then recursively calls itself again with updated 'state'.
But that would just stack overflow eventually with enough updates?
But that would just stack overflow eventually with enough updates?
Functional languages typically have guaranteed tail call optimization.
Functional languages typically have guaranteed tail call optimization.
But how to replicate the above high-level race in that case? Is that possible?
That means maintaining two copies of data and exchanging diff messages to sync it?
It's up to the implementation; it could be in one place, I guess.
Yes, you can also make (Sendable) copies of smaller subsets of the data to try to avoid ever having a complete 2x RAM footprint for your data, but that's just adding complexity and the chance for bugs. (It's also what I'm currently doing in my real app, so I know it "works." But it seems like a cleaner solution should be possible…)
Ah, got it. AFAIK big messages are also a problem for other actor implementations (and having a dedicated store is one of the solutions), but I agree it would be nice to have some way for local actors not to copy big chunks if there are some guarantees.
I don't think high level data races are possible in pure functional programming languages that do not have shared mutable state.
From my very little experience with Haskell, you land on MVar or some other abstraction anyway when touching concurrency. BTW, in Erlang a process is basically something like that: a recursive loop that passes its own non-shared state between iterations (of course with some BEAM magic on top). As I remember, something like (I haven't touched it in a while):
loop(Cache) ->
    receive
        {From, N} ->
            {Result, NewCache} = fib(N, Cache),
            From ! {self(), Result},
            loop(NewCache);
        stop ->
            ok
    end.
Still, while processes are free from data races, there is another concurrency problem: you can deadlock them. AFAIK that's why the Swift implementation is reentrant.
Functional languages typically have guaranteed tail call optimization.
Ah, I would be glad to have better TCO control in Swift.
BTW, why isn't Mutex a fit for you here? Given that you guarantee to access it from one place at a time, and process large amounts of data, the overhead from using a mutex should be negligible. I agree that given your constraints it is a bit redundant, but I don't see harm either.
I'd also like to find some common ground. Could we agree that "if it compiles - it's free of high-level data races" is a great thing to have at the end of the day? (Keeping aside for a moment the question of whether this is possible to achieve short-term or not). Or do you think that manually auditing await points is enough?
I am certainly not an expert, but I am not aware of any general-purpose language that guarantees high-level data-race safety. I would like to see an example before we go down that road. There are many issues remaining around the ergonomics of structured concurrency, including disconnecting as brought up here. At best, this would add another, tougher constraint to the design of the language. It would be an interesting research project, but as a goal, even in the long term, it does not seem wise.
actor State {
    var state = 0

    func update(using transformer: Transformer) async {
        state = await transformer.transform(state)
    }
}

actor Transformer {
    func transform(_ value: Int) async -> Int {
        await someLogStatement()
        return value + 1
    }
}
I agree with @Andropov that the actor property state being overwritten with a stale value returned by transform(_:) isn't necessarily a data race. It depends on the application's requirements, and the compiler has no knowledge to determine them. I believe if one translated the code to Erlang it would have the same issue. However, it's easy in Erlang to wait for specific messages, which effectively blocks all other messages. In Swift it might look like one of the following, I think:
func transform(_ value: Int) async(blocking) -> Int {
    // All async function calls in the body are blocking.
}
or
// All async function calls in the expression are blocking.
await(blocking) state = transformer.transform(state)
This doesn't need to block the current thread. It just blocks the actor, which means saving all async frames on the heap and putting further function calls on this actor into an internal queue until the actor receives the return value of transform(_:). I understand this probably breaks Swift concurrency's forward-progress contract, but people have put a lot of effort (e.g. internal semaphores or queues) into achieving this in practice even though Swift doesn't support it.
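One way to approximate that "block the actor, not the thread" behavior today is to chain updates on a stored task, so each read-modify-write finishes before the next begins. A hedged sketch (the `SerializedState` name and the `previousUpdate` chaining are my own illustration, not a real blocking-await feature):

```swift
// Serialize async updates by chaining each one on the previous, so a
// suspended read-modify-write completes before the next one starts.
actor SerializedState {
    var state = 0
    private var previousUpdate: Task<Void, Never>?

    func update(_ transform: @escaping @Sendable (Int) async -> Int) {
        // The Task inherits this actor's isolation; the capture list
        // snapshots the previous link of the chain.
        previousUpdate = Task { [previous = previousUpdate] in
            await previous?.value            // wait for the prior update
            self.state = await transform(self.state)
        }
    }

    // Waits for all enqueued updates before reading.
    func finalValue() async -> Int {
        await previousUpdate?.value
        return state
    }
}
```

Note that other methods on the actor can still run while an update is suspended (reentrancy is not removed); only the updates themselves are ordered with respect to one another.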
EDIT: @tera, if we all agree your example is a race condition, then I think the rule to detect it is simple: if an async function takes an actor's property as its parameter and updates the actor's property with its return value, it's a race condition.