Actor Races

Between the TaskQueue and the reentrant actor, isn't there some room for non-reentrant actors?

It looks to me like non-reentrant actors would make it trivial to implement a TaskQueue, while only providing a TaskQueue would make people like me bypass the actor concept entirely and implement a poor man's version of a non-reentrant actor using TaskQueues (which would be a bit of a shame?)...

One of the biggest issues I personally have with the actor model is its non-determinism. Enqueueing something into the TaskQueue gives zero guarantees about which addOperation call wins when multiple calls happen at the same time. While two tasks could await the chance to enqueue an operation, there is no guarantee that the task that came first will win. We really need strong ordering guarantees for a FIFO-like TaskQueue here.
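A minimal sketch of the ordering problem being described. The `Recorder` actor here is a hypothetical stand-in for the TaskQueue under discussion; the point is only that the order in which the two calls land is unspecified, regardless of the order in which the Tasks were created:

```swift
// Two tasks race to call into an actor; which call lands first is
// unspecified. `Recorder` stands in for the hypothetical TaskQueue.
actor Recorder {
    var order: [String] = []
    func addOperation(_ name: String) { order.append(name) }
}

let recorder = Recorder()
let first = Task { await recorder.addOperation("A") }
let second = Task { await recorder.addOperation("B") }
_ = await first.value
_ = await second.value

let order = await recorder.order
// `order` is either ["A", "B"] or ["B", "A"]; nothing guarantees FIFO
// with respect to Task creation order.
print(order)
```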


One way to look at a Swift actor is that it’s a class-like thing with an implicit mutex around its operations. In this view, an “atomic” or “non-reentrant” actor method is equivalent to taking a mutex in a method that makes an async call, and releasing it in a callback¹… which is the sort of thing that sends experienced thread wranglers running for a panic button.

It would be preferable if we could find an idiom for these use cases that isn’t a differently-shaped foot cannon.

¹ With the distinction that it doesn’t matter if the continuation happens on a different thread


I assume the design documents about actors already went through all the tradeoffs involved in the various flavors of actor systems, but although I'm absolutely not an expert, I had the feeling non-reentrant actors do exist in other systems (either by locking, or by implementing some sort of mailbox to buffer calls) and make local actor state invariants much, much easier to reason about.

EDIT: I may have read your message wrong. You may have been talking about what a Swift-specific implementation of non-reentrant actors would look like in the current state of the language, in its most straightforward form. In which case you may be 100% right, as I have no idea how the current system is implemented under the hood.

This is true, but it comes at the cost of making non-local interactions harder to reason about, hence the foot cannon description above. I'm glad we're discussing improving this situation but I agree with @jayton that we should be wary of just trading one problem for another.


I see what you mean. However, one could easily argue that, at least in the general case, it is impossible to have a good global understanding of a system whose components' local state is hard to figure out: if each actor's local state is hard to make sense of because of reentrancy, then how would one expect to make sense of a system with tens or hundreds of actors?


Indeed, the "right answer" is not especially obvious, since all the obvious ones have clear tradeoffs. I'm very curious to see what ends up being designed for this.

To be perfectly honest, I don't think there's any design which is going to make this easy. Programmers working in concurrent systems need to learn to think transactionally, which is already a stretch, because there are inevitably going to be ways to compose transactions together non-transactionally. Swift can stop you from writing actor.foo = await actor.foo + 1, and maybe it can advise you that await actor.setFoo(await actor.foo + 1) still looks really questionable, but it can't actually force you to add a proper incrementFoo() method or whatever makes sense transactionally for your situation, and that's always going to be the biggest problem here.
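To make the distinction concrete, here is a sketch of the pattern being described: a read-modify-write split across two actor accesses is not atomic even though each access is data-race free, while an incrementFoo() method that does the whole operation inside the actor is. The Counter actor and its method names are illustrative, not from any real API:

```swift
actor Counter {
    private var foo = 0

    func getFoo() -> Int { foo }
    func setFoo(_ value: Int) { foo = value }

    // The "proper incrementFoo() method": the read and the write happen in
    // one partial task, so no other caller can interleave between them.
    func incrementFoo() { foo += 1 }
}

let counter = Counter()

// Questionable: another task could run between the get and the set,
// so increments can be lost.
//     await counter.setFoo(await counter.getFoo() + 1)

// Transactional: 1000 concurrent increments always land exactly once each.
await withTaskGroup(of: Void.self) { group in
    for _ in 0..<1000 {
        group.addTask { await counter.incrementFoo() }
    }
}
let final = await counter.getFoo()
```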

Composing actor operations by preventing interleaving during async actor functions seems like it's just trading one problem for another, because now every async actor operation is a potentially unbounded serialization point, and if scalability is important, that can easily be just as wrong as the potential non-atomicity, and you have to chase all of those points down and rework the code. I've seen so many towers of awful workarounds based on recursive locks and careful unlock/relock regimens. Eventually you're back in the situation that Swift puts you in: calls to peer actor operations have to be treated as re-entrant because they might need to unblock actor progress — either they're written that way currently or they will be in the future.

I agree with Adrian's point that the more pressing issue is not having a way to maintain FIFO-ness except by basically switching to channels with AsyncSequence.
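For readers unfamiliar with the channel workaround mentioned here, a minimal sketch: producers push work into an AsyncStream, and a single consumer task drains it in order. AsyncStream delivers elements in the order they were yielded, which restores FIFO at the cost of an extra indirection. (AsyncStream.makeStream requires Swift 5.9; the closure-based initializer works on earlier versions.)

```swift
// A single consumer drains the stream strictly in yield order.
let (stream, continuation) = AsyncStream.makeStream(of: Int.self)

let consumer = Task {
    var received: [Int] = []
    for await value in stream {
        received.append(value)   // processed in the order yielded
    }
    return received
}

for i in 1...5 {
    continuation.yield(i)        // yield is synchronous, so ordering is under our control
}
continuation.finish()

let received = await consumer.value
```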


Well, that’s basically all mutable properties touched by a task, since the compiler doesn’t know whether some piece of data is shared or not.

In my opinion, actors will be very common in most applications. I think it is rare that you can just give immutable input to a task and let it work in its own little world without it needing to coordinate with some state on the outside.


Not necessarily. My vote would be that this is opt-in (hence my use of an "atomic" keyword). By default, it would work just as it does now. A developer could choose to make very specific operations opt in. Yes, they would have to understand the repercussions of that. It would be nice if Swift could detect reentrancy at run time, and at least raise an error for it.

Doesn't this have the same issues? If a running task submits to the same queue, presumably it will deadlock. In this sense, a queue is exactly the same as an actor.


I wonder if a "problem" (once again, absolutely not an expert here, so no offense intended) with the current design isn't that the communication medium between two actors isn't represented anywhere.

The tradeoff between blocking and non-blocking reminds me of channels in Go. When establishing a channel between two sequential processes, one can decide whether a call will block, and if not, how large the buffer storing pending calls is allowed to grow. I know Go uses CSP and not actors per se, but it makes me wonder whether @atomic vs. reentrant, etc., wouldn't rather be properties of the layer "under" actors, and whether, once that layer becomes visible, it wouldn't open the path to even more interesting customisation properties.
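Swift does expose one knob resembling Go's channel buffer size: AsyncStream's bufferingPolicy, chosen when the stream is built. A sketch (the values and policy here are arbitrary, for illustration); note one key difference from Go: yield never blocks the producer, so a bounded policy drops values instead of applying backpressure:

```swift
// With .bufferingNewest(2), a slow (here: absent) consumer sees only the
// latest two values; older ones are silently dropped rather than blocking
// the producer, unlike a full Go channel.
let (stream, continuation) = AsyncStream.makeStream(
    of: Int.self,
    bufferingPolicy: .bufferingNewest(2)
)

for i in 1...5 {
    continuation.yield(i)    // 1, 2, 3 are dropped once the buffer holds two values
}
continuation.finish()

var kept: [Int] = []
for await value in stream {
    kept.append(value)
}
// kept == [4, 5]
```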


That’s it. That’s the whole game right there.

I too see the appeal of having a TaskQueue-like something-or-other in the standard library. The thing is, the generally non-blocking nature of Swift’s concurrency model is a huge and hard-won strength. Task queues and atomic methods are both tempting abstractions, but they reintroduce all the problems of thread pool exhaustion and performance-degrading blocking and deadlock that Swift’s concurrency model is assiduously designed to avoid.

I agree with John: the problem here isn’t Swift’s concurrency implementation, or a lack of features; the problem is that coarse atomicity is a fundamentally problematic approach to concurrency. The problem isn’t in the language or the library; it’s in our heads. Transactional thinking is hard. (I suspect that anybody who takes on this problem is going to find themselves reinventing the last 50 years of work on databases.)

I would be in favor of Swift finding an ecosystem of language features + library features + idioms — the last of those probably being the most important — that encourage transactional programming patterns without going “full relational DB.”

One of the problems I see in the current model is that there are potential transaction boundaries at every await, and those are hard to spot. I proposed a couple of ideas about that upthread. (I’d be curious in particular to hear thoughts on whether (1) in that message is a dead end.) Neither of those ideas solves the larger transactionality problems John talks about in his message, but they might at least help surface the problems.


Coming back to this concrete example:

I'd like to suggest (again, but in more detail this time) that the problem with this is not limited to the concurrency in the system - i.e. that the problem is not that you made two requests, but they ran out-of-order due to task scheduling. The problem expressed by this example is inherent to most networking operations; perhaps the user is in the middle of a handover between the cellular and WiFi networks, or some data needs to be retransmitted due to packet loss - even if you place scheduling restrictions on each task, so requests are guaranteed to be made in the order [A, B], the responses may arrive in the order [B, A], and so your database overwrites fresher data with stale data.

Serial queueing is the most conservative, most extreme attempt to solve this problem. It performs one network request at a time, and waits for its response to arrive before making another request, so there is no risk of responses arriving out of order. However, it also goes against all advice I've ever seen related to making network requests - it means more load for servers, as each client's connection lasts longer or each client keeps disconnecting and reconnecting, and it is difficult to take advantage of multiplexed protocols such as HTTP2/3; for mobile devices, it means keeping power-hungry radios alive or at higher power states for longer. It's also incredibly slow - imagine if when you visited a website, every resource on the page was loaded in a serial queue. Page load times would be orders of magnitude slower.

But at least the data evolves in a predictable way now, right? Ah, well... I said that with a serial queue, responses arrive in order. That is not the same as saying there is no risk of overwriting fresh data with stale data. Even if the system makes requests in the order [A, B], and even if the responses arrive in that order [A, B], a later response might still not necessarily give you a more recent version of the data, due to effects such as caching by the remote systems which process the request - in other words, even if you serialise everything on the client side, you still might overwrite fresh data with stale data. Even if we're willing to pay the performance and power costs for all of that, we still can't guarantee order.

So even serialisation cannot entirely solve this problem.

I get that this is just an example, but what I'm trying to show here is that for many kinds of systems, trying to impose order locally is just a losing battle. In my experience, it is better to accept the system as it is rather than fight it.

So how do you deal with data that can arrive out-of-order, or where you might unexpectedly see past values? IMO, the best place to tackle that is at the data model layer.

The freshness of the data isn't a property of how or when you made the request - in general they are completely unrelated. It's a kind of metadata, and the information you have and how you process that tends to be highly application-specific.
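A hypothetical sketch of tackling this at the data model layer: freshness is tracked as metadata on the record itself, so an out-of-order response is harmless. The version numbers here stand in for whatever the application actually has (ETags, timestamps, sequence IDs); the types are invented for illustration:

```swift
struct Record {
    var value: String
    var version: Int
}

struct Store {
    private(set) var record: Record?

    // Apply a response only if it is newer than what we already have.
    mutating func apply(_ incoming: Record) {
        guard let current = record else {
            record = incoming
            return
        }
        if incoming.version > current.version {
            record = incoming
        }
        // else: stale response, discard
    }
}

var store = Store()
store.apply(Record(value: "fresh", version: 2))   // response B arrives first
store.apply(Record(value: "stale", version: 1))   // response A arrives late
// store.record?.value == "fresh": the stale write was ignored
```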


The problems I have with this argument are two fold:

  1. I have been building systems from asynchronous operation queues for years. They work great, and are extremely stable and predictable. If parts of the sequence are better handled in parallel, you can of course do those bits concurrently for performance. But the overall guiding process is still serial.
  2. I'm honestly wondering what the purpose of actors is at all, if they can't give any guarantees about shared data validity. I had thought that was one of the main purposes of actors. Otherwise, we could just use async/await functions in a class. If you just copy things into local variables anyway, I'm not really seeing the difference.

Actors guarantee that partial tasks (i.e. the regions between awaits/returns) on the actor happen sequentially with respect to each other. Classes make no such guarantee.
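A small sketch of what "partial tasks" means in practice. Within the actor, the region between suspension points runs without interleaving, but at each await other calls may run; each call's "start" always precedes its own "end", yet the two calls' regions may interleave with each other. The Journal actor is invented for illustration:

```swift
actor Journal {
    var events: [String] = []

    func run(id: Int) async {
        events.append("start \(id)")   // one partial task...
        await Task.yield()             // ...suspension point: other work may interleave...
        events.append("end \(id)")     // ...another partial task
    }
}

let journal = Journal()
await withTaskGroup(of: Void.self) { group in
    group.addTask { await journal.run(id: 1) }
    group.addTask { await journal.run(id: 2) }
}
let events = await journal.events
// e.g. ["start 1", "start 2", "end 1", "end 2"] — interleaved, but each
// id's start is guaranteed to precede its own end.
```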

You are perhaps confusing data races with data validity. A “data race” refers to two parallel operations accessing the same data at the same time in ways that cause undefined behavior. Unless you use the explicitly unsafe operations, Swift 6 eliminates data races at compile time: the places where one execution path can see another’s work are always well-defined. That would not be possible with the approach you describe.

However, “well-defined” does not mean “logically correct with respect to the design goals of the software.”

An analogy: Swift also offers memory safety. The language promises that you can’t cause buffer overruns and dangling pointers (unless, again, you use explicitly unsafe operations). This does not mean, however, that any algorithm you implement with arrays or reference types is guaranteed to give correct results. It simply means that the behavior of the abstractions you’re using is well-defined.


Classes make no such guarantee.
An async function works exactly the same in a class as in an actor. Awaits are used in the same way. And given the advice to make local copies of the data, the two work almost the same.

It is true that you have to be very careful when you make the copies, and when you return the result to update the shared data. You do need something to guarantee there is no race at those moments. Most people just use the main thread.

From my perspective, given the advice is to copy any shared data before making changes inside an operation, to avoid races, I am seeing very little utility. Avoiding the types of races I mentioned above is very challenging without any form of high level serialization. Interleaving is just as difficult to deal with as multithreading.


Generalize that to assign different subsets of the data to separate “main threads,” and poof! you have actors.

Actors aren’t “you don’t have to think anymore” magic. They’re just codifying exactly the common sorts of patterns you are describing, so that you don’t have to keep rolling your own.

And the language is adding a layer of safety: when you say “you have to be very careful when you make the copies, and when you return the result to update the shared data,” well, thanks to Sendable and Swift’s static association of specific variables to specific concurrency islands, Swift now takes care of part of that carefulness for you! Same patterns, more or less, but less burden.

But it’s a lot easier to deal with when you’re not building on top of footgun abstractions! This is akin to arguing that we might as well code in C, with its raw memory access, because “algorithm correctness is just as difficult to deal with as manual memory safety.” I mean, that’s true, but wouldn’t you rather just have one of those two difficult problems instead of both at once?

“The lava is perfectly safe, just don’t fall in it” is never a software principle that holds up well.


Enqueueing... I agree. This has been a big problem for me and my usage of actors.


Insofar I've been running into this, I think I can summarize the problem as follows:

I want my serial calls to an actor to start running in the actor in the same order I made them from that serial source.

If the execution of those operations later interleaves, that's an actor implementation problem, and one that should be solvable within the confines of the actor. The problem is that when I go from one serial context to another, I don't have an ordering guarantee.


That's what I take from reading this thread, too.

If a programmer uses await calls within an actor, then it's the programmer's responsibility to ensure they don't create race conditions.

In the case where an actor doesn't have any async functions or spawned Tasks, its contained state is easy to reason about locally, as there is guaranteed to be only one operation running on the actor at any one time. That seems fair.

But in the case where we'd like to use an Actor to protect some resource where order of operations must be executed serially, like a database, we're stuck.

Intuitively, one might think we could get around this by using a database that had blocking equivalents for its async calls – that way, no further Task could execute on the actor until the current database operation completes. Blocking the task. That could work.

Except, the Swift team states that a Swift Task is not supposed to block (async worker threads must always make forward progress).

Is that really the case? So, if we can't block on a Task, how do we do this?

It seems we shouldn't use actors/Tasks at all for accessing something like a database. Is that the recommendation?