That, if I remember correctly, is one of the reasons why Swift doesn't have auto-synthesized async setters for actor properties.
It is possible to write code with the same issues as the dispatch-queue version, but the ergonomics of actors nudge you in the right direction. Here you had to write your own getters/setters, which feels 'alien' enough in Swift to hopefully make you reflect a bit on the code you're writing.
Note that the actual code could be more complicated than just:
await state.set(state.get() + 1)
to justify having get and set separately, for example:
await state.set(transformer.transform(state.get()))
However, even if you repackage the code to not have those get/set explicitly:
func update(_ update: (Int) async -> Int) async {
    await someLogStatement()
    state = await update(state)
    await someLogStatement()
}
with this usage:
await state.update(transformer.transform)
the underlying high-level data race is still there:
Full illustrative data-race example without getters/setters:
import Foundation

actor State {
    var state = 0

    func update(_ update: (Int) async -> Int) async {
        await someLogStatement()
        state = await update(state)
        await someLogStatement()
    }
}

actor Transformer {
    func transform(_ value: Int) async -> Int {
        await someLogStatement()
        return value + 1
    }
}

let state = State()
let transformer = Transformer()

Task {
    // data race
    await state.update(transformer.transform)
}

Task {
    // data race
    await state.update(transformer.transform)
}

Task {
    try await Task.sleep(nanoseconds: 100_000_000)
    let result = await state.state
    print("resulting value is \(result)")
}

func someLogStatement() async {
    try! await Task.sleep(nanoseconds: .random(in: 0 ..< 1_000_000))
}

RunLoop.main.run(until: .distantFuture)
"which feels 'alien' enough in Swift" ... "to hopefully make you reflect a bit"
Frankly, I don't think this is enough... It should be a much stronger nudge, like a compile-time error/warning, or at least a runtime trap/crash.
Ideally it should be: "if it compiles, it works correctly". In other words, code that contains data races should be impossible to write, be they low-level or high-level data races or anything else.
Maybe the weight of await has gotten lost over time, but these are crucial points and the major, important difference from the queue-based example.
Lots of awaits is what signals you to be careful with the logic and draws your attention. They are used too liberally, as if they mean nothing, while they are markers for the developer.
For example, this instantly makes me question why a simple state update inside an actor requires that many async calls. I can suspect logic errors here almost immediately.
To be fair, to become suspicious about such places you have to be bitten by reentrancy once or twice, but then you quickly learn that every await adds complexity, and you start reducing and avoiding them.
But see how many tweaks and complications you have to consciously make to reach this point, each of which is considered an anti-pattern for actors.
In contrast, with the queue version it is much harder to ever suspect that something is wrong.
TBH you'll still need to await logging with locks too, and then it's just easier to write an actor with some reentrancy checking inside.
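As a hedged sketch of what such a reentrancy check might look like (the `GuardedState` name and the trap-on-reentrancy policy are my own illustration, not from this thread):

```swift
// A minimal sketch of a manual reentrancy guard inside an actor.
// Trapping on reentrancy is illustrative; real code might instead
// queue the update or throw an error.
actor GuardedState {
    private var state = 0
    private var isUpdating = false

    var value: Int { state }

    func update(_ transform: @Sendable (Int) async -> Int) async {
        // If another task re-enters update() while we are suspended in
        // `transform`, this precondition trips instead of silently
        // losing an update.
        precondition(!isUpdating, "reentrant update detected")
        isUpdating = true
        defer { isUpdating = false }
        state = await transform(state)
    }
}
```

With a guard like this, two racing `update` calls of the kind shown in the surrounding examples would trap at runtime rather than silently drop an increment.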
Also, as a side note, I think for actors it's better to pass another actor instead of a closure, e.g.:
actor State {
    var state = 0

    func update(using transformer: Transformer) async {
        await someLogStatement()
        state = await transformer.transform(state)
        await someLogStatement()
    }
}

actor Transformer {
    func transform(_ value: Int) async -> Int {
        await someLogStatement()
        return value + 1
    }
}
Returning back to the original question: is the problem only in big chunks making copies? What if we can split the data into smaller chunks (events, diffs, change sets, ...) and compose it in AppState afterwards?
How many is "that many"? Is just one too many already?
func update(using transformer: Transformer) async {
    state = await transformer.transform(state)
}
Note that this still data-races...
A simplified version of the data race:
import Foundation

actor State {
    var state = 0

    func update(using transformer: Transformer) async {
        state = await transformer.transform(state)
    }
}

actor Transformer {
    func transform(_ value: Int) async -> Int {
        await someLogStatement()
        return value + 1
    }
}

let state = State()
let transformer = Transformer()

Task {
    // data race
    await state.update(using: transformer)
}

Task {
    // data race
    await state.update(using: transformer)
}

Task {
    try await Task.sleep(nanoseconds: 100_000_000)
    let result = await state.state
    print("resulting value is \(result)")
}

func someLogStatement() async {
    try! await Task.sleep(nanoseconds: 0)
}

RunLoop.main.run(until: .distantFuture)
await here is unavoidable if the transformer is another actor. So, where is the anti-pattern here? That I merely split the transformer into another actor? And more importantly, could the compiler warn me about this?
YMMV, but for me, the presence of low-level data races (which "occasionally" manifest as crashes or runtime traps) is what actually nudges me to address those issues. Paradoxically, eliminating low-level data races can be a disservice. The mere presence of await markers is too subtle: I could spend days reviewing a PR with a handful of awaits in it and still miss high-level data races like the one above. Bear in mind, the example above is very simple compared to what I've witnessed in practice.
IMHO it's not an anti-pattern in particular, but await and throws are basically effects in Swift and should be used thoughtfully.
The anti-patterns are in the previous examples: the getters/setters and the update closure. Still, every await is a critical point in the program and requires attention.
Clearly there are people capable of doing so by merely looking at code in review, but quite often it's just random crashes with unclear stack traces, which leaves you guessing at the nature of these low-level issues. When you're left with high-level issues only, you can inspect the critical points in the app.
Yes, but now it is much simpler to reason about: you have all the effects of the mutating state in one place, instead of scattered across the codebase. Local reasoning is much simpler to analyze in the end.
But that's at least somewhat better than getting all green tick marks during compilation and then having the app sometimes not work correctly?
Honestly, would you question that line quoted above during a PR review (with, say, 30 other changes) and you didn't know upfront there's a high-level data race there? If so what would be your PR review comment for that line?
And how would you fix it?
I'd also like to find some common ground. Could we agree that "if it compiles - it's free of high-level data races" is a great thing to have at the end of the day? (Keeping aside for a moment the question of whether this is possible to achieve short-term or not). Or do you think that manually auditing await points is enough?
Returning back to the original question: is the problem only in big chunks making copies? What if we can split the data into smaller chunks (events, diffs, change sets, ...) and compose it in AppState afterwards?
That means maintaining two copies of data and exchanging diff messages to sync it?
Honestly, would you question that line quoted above during a PR review (with, say, 30 other changes) and you didn't know upfront there's a high-level data race there? If so what would be your PR review comment for that line?
And how would you fix it?
There are higher chances that I would question that line compared to the queue version, at least.
First of all, I wonder why there is a need for two actors, or a need for actors at all. I get that this is example code; nevertheless, that's the first question I ask in such cases (of myself or others).
Then, if the use of actors or async calls is unavoidable there, the next question depends on the nature of the state processing. If it is intended to be a sequential change of state in a queue-like manner, then there is obviously a need for an internal queue of some sort. Also, decouple the use of the actor's state value from the async points to get a clear picture of the state I'm working with and how suspension points affect it: with a value type it can be a local variable; with reference types I'd add checks in between awaits (if a copy of such an object is too expensive or impossible for some reason).
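For a value type, the "checks in between awaits" idea could be sketched as an optimistic compare-and-retry update (a hedged illustration; `CASState` and the retry policy are assumptions of mine, not from the post):

```swift
// A hedged sketch: re-read the state after the suspension point and
// only commit if it hasn't changed in the meantime, retrying otherwise
// (an optimistic, compare-and-swap-style update).
actor CASState {
    var state = 0

    /// Retries the transform until no other task has changed `state`
    /// across the suspension point.
    func update(_ transform: @Sendable (Int) async -> Int) async {
        while true {
            let seen = state                      // value-type local copy
            let proposed = await transform(seen)  // suspension point
            if state == seen {                    // nothing changed meanwhile
                state = proposed
                return
            }
            // state changed while we were suspended: retry with the fresh value
        }
    }
}
```

This trades the lost update for possibly re-running the transform, which is only acceptable when the transform is safe to repeat.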
Could we agree that "if it compiles - it's free of high-level data races" is a great thing to have at the end of the day?
That's a great thing; I just don't believe it is ever possible for programming languages to reach that point, not only in the short term. I'm not sure it is possible even without shared mutable state and with an appropriate functional language.
Or do you think that manually auditing await points is enough?
I think it is much better than what we had before. I guess there is room for improvement, but not to the point where it will be impossible to write incorrect concurrent code.
I'd also like to find some common ground. Could we agree that "if it compiles - it's free of high-level data races" is a great thing to have at the end of the day? (Keeping aside for a moment the question of whether this is possible to achieve short-term or not). Or do you think that manually auditing await points is enough?
I don't think it's possible for the compiler to remove these high level data races, short term or not. There's nothing inherently wrong with reading a value asynchronously, modifying it in some way, and writing it back asynchronously.
Of course, if you want that operation to be an atomic update, and go on to write code assuming that the method will update a value atomically, then the assumption will be broken and your code will perform unexpectedly. But how would the compiler know that you assumed such behavior if you never wrote that invariant down anywhere?[1]
Ultimately it's up to the programmer to write an API with the right transactionality, as what is "valid" or not will depend on domain knowledge about the code.
This invariant can be written quite elegantly in Swift: an "atomic" operation on an actor's protected state can be written as a synchronous function in that actor. And now the compiler will check that invariant for you. ↩︎
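A hedged sketch of the footnote's point (the `Counter` name and method signatures are illustrative): because the update method contains no await, the whole read-modify-write executes without suspension, so concurrent increments cannot be lost.

```swift
// The read-modify-write is a synchronous method on the actor, so it
// contains no suspension points and cannot interleave with other tasks.
actor Counter {
    private var state = 0

    // Synchronous within the actor: atomic with respect to other tasks.
    func update(_ transform: @Sendable (Int) -> Int) {
        state = transform(state)
    }

    var value: Int { state }
}

// Usage: two awaited increments can never lose an update.
let counter = Counter()
await counter.update { $0 + 1 }
await counter.update { $0 + 1 }
print(await counter.value) // prints 2
```

The price is that the transform itself must be synchronous; as soon as it needs to await something, you are back in high-level data-race territory.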
There are higher chances that I would question that line compared to the queue version, at least.
First of all, I wonder why there is a need for two actors, or a need for actors at all. I get that this is example code; nevertheless, that's the first question I ask in such cases (of myself or others).
Then, if the use of actors or async calls is unavoidable there, the next question depends on the nature of the state processing. If it is intended to be a sequential change of state in a queue-like manner, then there is obviously a need for an internal queue of some sort. Also, decouple the use of the actor's state value from the async points to get a clear picture of the state I'm working with and how suspension points affect it: with a value type it can be a local variable; with reference types I'd add checks in between awaits (if a copy of such an object is too expensive or impossible for some reason).
Great stuff. I'd just like to see those questions, reasons, and suggestions in the form of compile-time errors and warnings (with a possibility to opt out) (†).
Example:
func update1(using transform: (Int) async -> Int) async {
    let stateCopy = state
    let newState = await transform(stateCopy)
    state = newState
    // actually... same warning as below.
}

func update2(state: inout Int, using transform: (Int) async -> Int) async {
    state = await transform(state)
    // "Warning: the update of state could be lost! If you know what you are doing,
    // do this and that to opt out of this warning; otherwise fix your code."
}
That's a great thing; I just don't believe it is ever possible for programming languages to reach that point, not only in the short term. I'm not sure it is possible even without shared mutable state and with an appropriate functional language.
I don't think it's possible for the compiler to remove these high level data races, short term or not.
I don't think high-level data races are possible in pure functional programming languages that do not have shared mutable state. I'm not saying this would be the best approach (or better than (†)), but one of the approaches to make a subset of Swift safe in those regards would be to limit the safe portion of the language to being pure functional (e.g. have "safe" coloured pure functions that could only call other safe functions).
But how would the compiler know that you assumed such behavior if you never wrote that invariant down anywhere?
We could use exactly the same line of reasoning for the version above where the variables were protected with queue.sync... It's only fair to highlight that Swift actors are not immune to high-level races when we dismiss the above queue-synced version on the grounds that it is prone to having high-level races.
I don't think high level data races are possible in pure functional programming languages that do not have shared mutable state.
Actors are essentially shared mutable state any way you slice it. Once you can have a 'counter' actor that responds to get/set messages to update an integer value internal to the actor, you can have a high-level data race, and there isn't much a type system can do to avoid it, except for limiting the actor model itself in some way.
This doesn't require shared mutable state at the language level, because you can imagine implementing an actor as a recursive function that calls a blocking receive() primitive, processes the message, and then recursively calls itself again with updated 'state'.
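That idea can be sketched in Swift with an AsyncStream as the mailbox (a hedged illustration; `CounterMessage` and `runCounter` are my names, and a loop is used rather than recursion since Swift does not guarantee tail-call optimization):

```swift
// The "actor" is a plain loop over a mailbox; its state is a local
// variable that nothing else can read or write.
enum CounterMessage {
    case get(CheckedContinuation<Int, Never>)
    case set(Int)
}

func runCounter(mailbox: AsyncStream<CounterMessage>) async {
    var state = 0 // not shared: only this loop touches it
    for await message in mailbox {
        switch message {
        case .get(let continuation):
            continuation.resume(returning: state)
        case .set(let newValue):
            state = newValue
        }
    }
}
```

Note that a client doing a `get` followed by a `set` as two separate messages still has exactly the high-level race discussed above, even though nothing here is shared mutable state at the language level.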
Returning back to the original question: is the problem only in big chunks making copies? What if we can split the data into smaller chunks (events, diffs, change sets, ...) and compose it in AppState afterwards?
The core problem is having a large amount of non-Sendable data that is generated in a time-consuming operation off the main thread that later needs to be viewed and edited in the UI (which runs on the main thread) and then later still processed some more in another time-consuming operation off the main thread, and so on.
For small amounts of data, it's no big deal to make (Sendable) copies when passing between actors. But for large amounts of data, making copies can eat up a lot of memory, and trying not to make copies (e.g., by accessing the data asynchronously while it stays inside a non-main actor) can be tedious, risky (data races!) and doesn't mesh well with what SwiftUI (and AppKit, for that matter) expects when building UIs.
Yes, you can also make (Sendable) copies of smaller subsets of the data to try to avoid ever having a complete 2x RAM footprint for your data, but that's just adding complexity and the chance for bugs. (It's also what I'm currently doing in my real app, so I know it "works." But it seems like a cleaner solution should be possible…)
Actors are essentially shared mutable state any way you slice it. Once you can have a 'counter' actor that responds to get/set messages to update an integer value internal to the actor, you can have a high-level data race, and there isn't much a type system can do to avoid it, except for limiting the actor model itself in some way.
Yep, I am thinking along the lines of some limitation. It would be semi-equivalent to an error/warning in this synchronous and otherwise correct code:
var x: Int
x = 1
x = 2 // error/warning wanted: the previous update was lost
print(x)
This doesn't require shared mutable state at the language level, because you can imagine implementing an actor as a recursive function that calls a blocking receive() primitive, processes the message, and then recursively calls itself again with updated 'state'.
But that would just stack overflow eventually with enough updates?
But that would just stack overflow eventually with enough updates?
Functional languages typically have guaranteed tail call optimization.
Functional languages typically have guaranteed tail call optimization.
But how to replicate the above high-level race in that case? Is that possible?
That means maintaining two copies of data and exchanging diff messages to sync it?
It's up to the implementation; it could be in one place, I guess.
Yes, you can also make (Sendable) copies of smaller subsets of the data to try to avoid ever having a complete 2x RAM footprint for your data, but that's just adding complexity and the chance for bugs. (It's also what I'm currently doing in my real app, so I know it "works." But it seems like a cleaner solution should be possible…)
Ah, got it. AFAIK big messages are also a problem for other actor implementations (and having a dedicated store is one of the solutions), but I agree it would be nice to have some way for local actors not to copy big chunks if there are some guarantees.
I don't think high level data races are possible in pure functional programming languages that do not have shared mutable state.
From my very little experience with Haskell, you land on MVar or some other abstraction anyway when touching concurrency. BTW, in Erlang a process is basically something like that: a recursive loop that passes its own non-shared state between iterations (of course with some BEAM magic on top). As I remember, something like (I haven't touched it in a while):
loop(Cache) ->
    receive
        {From, N} ->
            {Result, NewCache} = fib(N, Cache),
            From ! {self(), Result},
            loop(NewCache);
        stop ->
            ok
    end.
Still, while processes are free from data races, there is another concurrency problem: you can deadlock them. AFAIK that's why the Swift implementation is reentrant.
Functional languages typically have guaranteed tail call optimization.
Ah, I would be glad to have better TCO control in Swift.
BTW, why isn't Mutex a fit for you here? Given that you guarantee to access it from one place at a time, and process large amounts of data, the overhead from using a mutex should be negligible. I agree that given your constraints it is a bit redundant, but I don't see harm either.
I'd also like to find some common ground. Could we agree that "if it compiles - it's free of high-level data races" is a great thing to have at the end of the day? (Keeping aside for a moment the question of whether this is possible to achieve short-term or not). Or do you think that manually auditing await points is enough?
I am certainly not an expert, but I am not aware of any general-purpose language that guarantees high-level data-race safety. I would like to see an example before we go down that road. There are many issues remaining around the ergonomics of structured concurrency, including disconnecting as brought up here. At best, this would add another, tougher constraint to the design of the language. It would be an interesting research project, but as a goal, even in the long term, it does not seem wise.
actor State {
    var state = 0

    func update(using transformer: Transformer) async {
        state = await transformer.transform(state)
    }
}

actor Transformer {
    func transform(_ value: Int) async -> Int {
        await someLogStatement()
        return value + 1
    }
}
I agree with @Andropov that the actor property state being overwritten with a stale value returned by transform(_:) isn't necessarily a data race. It depends on the application's requirements, and the compiler has no knowledge to determine them. I believe if one translated the code to Erlang it would have the same issue. However, it's easy in Erlang to wait for specific messages, which effectively blocks all other messages. In Swift it might look like one of the following, I think:
func transform(_ value: Int) async(blocking) -> Int {
    // All async function calls in the body are blocking.
}
or
// All async function calls in the expression are blocking.
await(blocking) state = transformer.transform(state)
This doesn't need to block the current thread. It just blocks the actor, which means saving all async frames on the heap and putting further function calls on this actor into an internal queue until the actor receives the return value of transform(_:). I understand this probably breaks Swift concurrency's forward-progress contract, but people have put a lot of effort (e.g. internal semaphores or queues) into achieving this in practice even though Swift doesn't support it.
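One way to approximate that "block the actor, not the thread" behavior today is to chain updates on a stored task, so each read-modify-write finishes before the next begins. A hedged sketch (the `SerializedState` name and the `previousUpdate` chaining are my own illustration, not a real blocking-await feature):

```swift
// Serialize async updates by chaining each one on the previous, so a
// suspended read-modify-write completes before the next one starts.
actor SerializedState {
    var state = 0
    private var previousUpdate: Task<Void, Never>?

    func update(_ transform: @escaping @Sendable (Int) async -> Int) {
        // The Task inherits this actor's isolation; the capture list
        // snapshots the previous link of the chain.
        previousUpdate = Task { [previous = previousUpdate] in
            await previous?.value            // wait for the prior update
            self.state = await transform(self.state)
        }
    }

    // Waits for all enqueued updates before reading.
    func finalValue() async -> Int {
        await previousUpdate?.value
        return state
    }
}
```

Note that other methods on the actor can still run while an update is suspended (reentrancy is not removed); only the updates themselves are ordered with respect to one another.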
EDIT: @tera, if we all agree your example is a race condition, then I think the rule to detect it is simple: if an async function takes an actor's property as its parameter and updates the actor's property with its return value, it's a race condition.