Actors 101

Please share here in this topic what you think is essential to know for employing actors wisely.

Here is one, transactionality, which I discovered today, posted by @nkbelov.

I will update this topic whenever I discover more. :slight_smile:


Here is more...

Check assumptions after an await

Remember that await is a potential suspension point.
If your code gets suspended, the program and world will move on before your code gets resumed.

Any assumptions you've made about global state, clocks, timers, or your actor will need to be checked after the await.

Details

From: Protect mutable state with Swift actors - WWDC 2021

Synchronous code on the actor always runs to completion without being interrupted.
So we can reason about synchronous code sequentially, without needing to consider the effects of concurrency on our actor state.
We have stressed that our synchronous code runs uninterrupted, but actors often interact with each other or with other asynchronous code in the system.

Let's take a few minutes to talk about asynchronous code and actors.

But first, we need a better example.

Here we are building an image downloader actor.
It is responsible for downloading an image from another service.
It also stores downloaded images in a cache to avoid downloading the same image multiple times.

The logical flow is straightforward: check the cache, download the image, then record the image in the cache before returning.
Because we are in an actor, this code is free from low-level data races; any number of images can be downloaded concurrently.

The actor's synchronization mechanisms guarantee that only one task can execute code that accesses the cache instance property at a time, so there is no way that the cache can be corrupted.

That said, the await keyword here is communicating something very important.
Whenever an await occurs, it means that the function can be suspended at this point.
It gives up its CPU so other code in the program can execute, which affects the overall program state.
At the point where your function resumes, the overall program state will have changed.
It is important to ensure that you haven't made assumptions about that state prior to the await that may not hold after the await.

Imagine we have two different concurrent tasks trying to fetch the same image at the same time.
The first sees that there is no cache entry, proceeds to start downloading the image from the server, and then gets suspended because the download will take a while.

While the first task is downloading the image, a new image might be deployed to the server under the same URL.
Now, a second concurrent task tries to fetch the image under that URL.
It also sees no cache entry because the first download has not finished yet, then starts a second download of the image.
It also gets suspended while its download completes.
After a while, one of the downloads -- let's assume it's the first -- will complete and its task will resume execution on the actor.
It populates the cache and returns the resulting image of a cat.

Now the second task has its download complete, so it wakes up.
It overwrites the same entry in the cache with the image of the sad cat that it got.
So even though the cache was already populated with an image, we now get a different image for the same URL.

That's a bit of a surprise.

We expected that once we cache an image, we always get that same image back for the same URL, so our user interface remains consistent, at least until we go and manually clear the cache.
But here, the cached image changed unexpectedly.
We don't have any low-level data races, but because we carried assumptions about state across an await, we ended up with a potential bug.

The fix here is to check our assumptions after the await.
If there's already an entry in the cache when we resume, we keep that original version and throw away the new one.
A better solution would be to avoid redundant downloads entirely.
[See the code in the next section]

Actor reentrancy prevents deadlocks and guarantees forward progress, but it requires you to check your assumptions across each await.

To design well for reentrancy, perform mutation of actor state within synchronous code.
Ideally, do it within a synchronous function so all state changes are well-encapsulated.

State changes can involve temporarily putting our actor into an inconsistent state.
Make sure to restore consistency before an await.
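For example, here is a minimal sketch (with made-up names) of keeping mutation in a synchronous method and re-checking assumptions after the await before calling it:

actor ScoreBoard {
    private var bestScores: [String: Int] = [:]

    func submit(score: Int, for player: String) async {
        // Suspension point: the world can change while we wait.
        let isValid = await validate(score: score)
        guard isValid else { return }
        // Mutation happens in synchronous code after the await,
        // so it runs to completion and can't be interleaved.
        record(score: score, for: player)
    }

    // All state changes are encapsulated in this synchronous function.
    private func record(score: Int, for player: String) {
        bestScores[player] = max(bestScores[player] ?? 0, score)
    }

    private func validate(score: Int) async -> Bool {
        score >= 0   // stand-in for a real asynchronous check
    }
}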

And remember that await is a potential suspension point.
If your code gets suspended, the program and world will move on before your code gets resumed.

Any assumptions you've made about global state, clocks, timers, or your actor will need to be checked after the await.

Code

From: Protect mutable state with Swift actors - WWDC 2021

One solution
import Foundation

actor ImageDownloader {
    private var cache: [URL: Image] = [:]

    func image(from url: URL) async throws -> Image? {
        if let cached = cache[url] {
            return cached
        }

        let image = try await downloadImage(from: url)

        // Replace the image only if it is still missing from the cache.
        cache[url] = cache[url, default: image]
        return cache[url]
    }
}

// Dummies
struct Image {}

func downloadImage(from url: URL) async throws -> Image {
    Image()
}
A better solution
import Foundation

actor ImageDownloader {
    private enum CacheEntry {
        // Either the finished image, or the in-flight download task that
        // later callers can await instead of starting a second download.
        case inProgress(Task<Image, Error>)
        case ready(Image)
    }

    private var cache: [URL: CacheEntry] = [:]

    func image(from url: URL) async throws -> Image? {
        if let cached = cache[url] {
            switch cached {
            case .ready(let image):
                return image
            case .inProgress(let task):
                return try await task.value
            }
        }

        // Start the download and record the in-progress task in the cache
        // before the first await, so concurrent callers can join it.
        let task = Task {
            try await downloadImage(from: url)
        }

        cache[url] = .inProgress(task)

        do {
            let image = try await task.value
            cache[url] = .ready(image)
            return image
        } catch {
            cache[url] = nil
            throw error
        }
    }
}

// Dummies
struct Image {}

func downloadImage(from url: URL) async throws -> Image {
    Image()
}
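A hedged usage sketch (the URL is a placeholder): with the task-based cache, two concurrent requests for the same URL share a single download and get the same image back.

func warmCache() async throws {
    let downloader = ImageDownloader()
    let url = URL(string: "https://example.com/cat.png")!   // placeholder URL

    // Both requests hit the same actor; the second finds the .inProgress
    // entry and awaits the first download instead of starting another one.
    async let first = downloader.image(from: url)
    async let second = downloader.image(from: url)
    let (a, b) = try await (first, second)
    print(a != nil, b != nil)
}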
3 Likes

Start with a non-actor interface and see if it works for you.

Sometimes eliminating state reduces a piece of functionality to a single entry point, meaning that you don't need an actor; you just need a standalone async function. nkbelov's comment is a great illustration of that, i.e. if

func loadImage(at: URL, addingTags: [String], saveToDisk: Bool) async throws

is all you have, then declare it as a static function and there will be no need for an actor at all. In fact, network calls backed by URLRequest are almost always purely functional and don't require any state or actors (unless you use local caching, but that's a different story).

In general, the functional approach is a lot more concurrency-friendly, i.e. it moves your state to the stack.
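A minimal sketch of that idea, with a hypothetical Profile type and endpoint: no shared mutable state means no actor, just an async function whose intermediate state lives on the stack.

import Foundation

struct Profile: Decodable {
    let name: String
}

// Purely functional: input in, output out, nothing retained between calls,
// so concurrent callers cannot interfere with each other.
func loadProfile(from url: URL) async throws -> Profile {
    let (data, _) = try await URLSession.shared.data(from: url)
    return try JSONDecoder().decode(Profile.self, from: data)
}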


I'm still not entirely sure I know exactly when to use actors in client apps. At first glance, you use them when you have state that can be accessed from more than one thread of execution.

Most of the time, though, you don't have additional execution threads in your client app unless either (1) you create them because you want something to happen in parallel with your UI, or (2) the OS forces you to, usually via hardware-related APIs such as the camera or audio.

For example, I've been struggling with this - [Concurrency] Actors and Audio Units - while trying to rewrite my audio engine for Swift 5.10/6, and it's still not clear whether I should use actors and how. I'll probably come back to the forum with some new questions.

1 Like

Actors bring a new programming model/paradigm to Swift, and that's quite often missed. I made the same wrong call at first and mostly tried to avoid actors, while in fact you probably want to try the opposite. The actor model is really similar to OOP in the way that it treats everything as an actor, akin to "everything is an object". So with actors you actually model everything as an actor or as part of one; there are mostly no opt-outs from that.

If you think about how Swift Concurrency is designed, most of your code is part of an actor, because in order to do asynchronous work you have to be isolated (or Sendable, but even in that case the code will more likely than not end up running inside an actor). That makes the use of actors in concurrent code just about inevitable: even if you've made your types thread-safe, say, using a mutex, they will still be isolated to an actor.

Just as it makes little sense not to use objects in OO languages, not using actors in the actor model would also be odd. Currently I try to make more use of actors and global actors: the latter let you design subsystems isolated to the same global actor but spread across a number of types that logically belong to the subsystem you are isolating.
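A rough sketch of that last point, with made-up names: a custom global actor lets several types share one isolation domain for a subsystem.

// A custom global actor for a hypothetical audio subsystem.
@globalActor
actor AudioSubsystem {
    static let shared = AudioSubsystem()
}

@AudioSubsystem
final class Mixer {
    var volume: Float = 1.0
}

@AudioSubsystem
final class Recorder {
    let mixer: Mixer

    init(mixer: Mixer) {
        self.mixer = mixer
    }

    // Same isolation domain, so this is a plain synchronous call:
    // no await, no actor hop.
    func duck() {
        mixer.volume = 0.2
    }
}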

1 Like

That's exactly what I call an actor trap. You may easily end up with code that sends messages instead of directly calling functions, with no benefit whatsoever and a performance penalty too. Actors are only needed where there's true parallelism; you don't want to inject phony parallelism where there's none.

5 Likes

Actors are a model for concurrency, not just parallelism. That's an extreme narrowing of the concept.

2 Likes

What is concurrency if not a synonym of parallelism?

1 Like

Parallelism is a subset of concurrency. Concurrent execution isn't necessarily parallel.

2 Likes

Some examples would be great because I don't understand what you are saying.

The simplest illustrative example is a single-core environment: it is possible to have concurrency there, yet no job will ever execute in parallel, because there aren't enough resources to run all scheduled jobs simultaneously. More broadly, a system can have hundreds of tasks scheduled for execution, and even make progress on each of them, but not on all of them at the same time, making their execution concurrent but not parallel.
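As a rough Swift sketch of the same distinction (hypothetical functions): both child tasks below are confined to the main actor, so they never run in parallel, yet they interleave at every suspension point and both make progress, which is concurrency.

@MainActor
func work(_ name: String) async -> Int {
    var total = 0
    for step in 1...3 {
        total += step
        print("task \(name), step \(step)")
        await Task.yield()   // suspension point: the other task can run here
    }
    return total
}

@MainActor
func demo() async {
    async let a = work("A")
    async let b = work("B")
    let totals = await (a, b)   // typically prints interleaved A/B steps, then (6, 6)
    print(totals)
}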

I like this visual illustration; while it is not 100% accurate on every nuance of a real system, it gives a pretty good general notion of what's happening:

Concurrency vs Parallelism

3 Likes

I understand that, although I'd argue that from a high-level perspective you usually don't care how many cores you have; the underlying system will take care of using them efficiently (or not).

But we digress from your main thesis that everything can be (should be?) an actor in the same way that everything can be an object in OOP. In my opinion, it's a trap in which you may end up with inefficient implementations where simple calls become async ones for no good reason. In my original comment above I was arguing that true concurrency emerges only where you create tasks that can be or should be executed in parallel with others, or where the OS forces it on you when dealing with hardware.

1 Like

But that’s what actors model all about. It exactly states that “everything is an actor”. Yes, you still need to consider how it might affect overall execution flow and whether there would be unnecessary jumps back and forth between isolation domains, if we talk about Swift, yet this is a design detail on how to use this model efficiently.

Asynchronous calls between actors are not necessarily inefficient. There may be domains in which actors prove inefficient, but those are specific cases, not the general rule. They don't necessarily add significant overhead, especially if you don't need nanosecond-level performance (or even millisecond-level).

On the other hand, designing in terms of actors and their isolation boundaries is often effective, both in terms of structure and performance.

I'm not sure what to understand by "true concurrency", but unless you directly control the execution environment and schedule jobs with precise logic (meaning you write your own runtime that enforces such rules, which is not the case for most language concurrency systems), you don't control the manner in which a job is executed: it can run in parallel, it can run on the same thread, the options are limitless. You can't and shouldn't tell the difference, since that's an implementation detail.

You do have a certain amount of control via await. You may have a program that runs everything serialized even though your code is sprinkled with async/await and actors, so it looks like there's concurrency when there is none. Concurrency is always hard, and this is why I'm strongly against "actors are like objects in OOP, use them whenever you can". But I'm repeating myself.

This is my personal opinion, and it could be wrong in some aspects, so please write if something is off. But I think that in order to understand actors, it's better to look a bit into history, and that's quite important for the summary at the end. It's not enough to think just from the client or single-machine point of view; it's better to have perspective from different angles.

Overall, actors are sometimes hard to crack because of two simple things: we're

  1. used to the deterministic nature of imperative programming, where every statement guarantees an outcome;
  2. forgetting what computation actually is.

For the first point, in reality you can achieve these guarantees only on a single machine with one thread [1]. Adding one more thread already gives you a headache with data races, locks, etc. Because, going to the second point: computation is simply state change. It's just hidden from you in imperative languages, since every statement can change some state without explicitly telling you [2]. So in order to have multiple parallel statements with some guarantees, you need to be careful with those state changes.

Why do you need multiple threads, though? The topic of multitasking, concurrency and parallelism is actually quite old (its roots go back to the 50s), but it should be looked at from two perspectives:

  • People tried to improve performance and economy by running multiple tasks on a single machine; e.g. Dijkstra was bothered by the problems described earlier :top:, pushed for structured programming, and came up with the concept of the semaphore.
  • There was a need for computer networks, and for a way to communicate and execute work between them. Message passing was one of the natural concurrency models in these systems.

I won't touch on the first point, but will focus on the second, as you can see it relates to the topic. Computers still compute, but everything is distributed now, and this brings lots of pain points. What if one computer fails? A network disconnection? Meanwhile you still need consistency and reliability. There were different approaches and models at the time, including objects and the actor model, but IMHO we should discuss not Hewitt's definition of the actor model, since he was trying to create a model of computation overall, but rather Erlang and its processes.

Erlang started as a research project at Ericsson's labs to come up with a reliable distributed system. The company had exactly such computer networks, in the form of switches, and the system had to be reliable enough to handle a massive number of calls. And for this system, rather than focusing on guarantees, they focused on what exactly is not guaranteed:

  1. One node is not enough.
  2. A node can fail.
  3. A message can fail.

And so on. [3] So they came up with processes: lightweight abstractions with isolated state that can only be changed by messages. The team was influenced by things like Prolog and Smalltalk, and later realised they had basically rediscovered the actor model, as you can see, but rather than focusing on just state isolation, they also focused on errors. Fault tolerance also helped them build a concurrency-first language, and they later realised that concurrency and fault tolerance go together.

I won't go into details, but for anyone interested I suggest checking Joe Armstrong's thesis [3] or other resources about Erlang and how they achieved reliable systems with this approach.


OK, these ideas work fine in the context of a distributed system, but one can ask: with Swift we're usually building single iOS/macOS apps, so shouldn't regular Dijkstra semaphores work? And the answer, as usual, is: it depends. Yes, mutexes and semaphores are helpful. Especially if your app is not fully async, I would actually suggest using them first. But as discussed, actors are not only about data races, but also about handling errors, especially in a concurrency context.

I think it's not about what to start with first, or how to write functions; it's all about the right mindset. As soon as you have concurrency, several pieces of state and changes to them, and you already feel that something could fail (basically having a throw somewhere), it's probably a good idea to introduce actors to wrap that logic. The language gives you good isolation tools for that.

This is not something new, and the reliability topic is actually emphasised in the Swift Concurrency Manifesto [4]. Note that it discusses a reliable actor, which never landed, but we now have the Distributed module [5], which I really suggest checking out (especially with the Cluster System [6]).

So, long story short, I think actors are a great abstraction for isolating computation in the presence of failures.


Now, talking about all actors vs. only when needed, I think @vns has the right insight about actors: in the actor model everything should be an actor, as it was with objects in Smalltalk, for example [7], so the comparison with OOP is correct. On the other hand, Swift's implementation (and Erlang's) is more specific, and the language is general purpose, so of course not everything will be an actor. But when you already have actors and you need additional logic, the best solution in most cases is actually to add more actors. I remember struggling with something in distributed actors, and @ktoso just suggested adding more actors, which worked... well. :slightly_smiling_face: Especially since you can easily combine distributed with regular actors.


  1. This sequential machine/one-thread model is important though, as it's basically the only way we can model computation (essentially a Turing machine).
  2. In this regard, in contrast to the imperative approach, it's interesting to learn Haskell with its State# RealWorld and I/O monad. It gives good insight into computation.
  3. Making reliable distributed systems in the presence of software errors
  4. Swift Concurrency Manifesto. Part 3: Reliability through fault isolation
  5. Distributed | Apple Developer Documentation
  6. GitHub - apple/swift-distributed-actors: Peer-to-peer cluster implementation for Swift Distributed Actors
  7. Hewitt and Kay actually co-influenced each other's ideas.
3 Likes

I think it's not so much that everything should be an actor, but rather that everything should be in an actor. There's no problem in having few actors (and I'd argue it's even better to limit the number of awaits in a program). Also note that, relative to what they replace, the runtime cost of local actors (vs. synchronous calls) is much higher than that of distributed actors (vs. network calls), so you'd probably want to adopt different strategies for the two.

2 Likes

I'm referring to the model itself, and in academia it's literally "everything is an actor". Even in the Swift Concurrency Manifesto it's stated as one of the challenges.

Not sure I got the message, tbh, can you expand? :thinking: Jumping between distributed and local actors is quite a powerful Swift feature for distributed systems, imho.

"everything is an actor" might make more sense for distributed actors where the runtime cost is quite low compared to networking cost/latency, whereas the cost is high compared to synchronous function calls. Networking is also inherently asynchronous, but actor reentrancy is much more of a problem for apps development where you might better benefit from fewer bigger actors.

1 Like

At the same time, to me it is odd to discourage the use of actors in a concurrency system that is based on actors. More thoughtful state isolation, so that suspension points don't dominate calls, and the use of global actors (they are great for building isolated subsystems IMO) can give the app much more benefit. In client-side apps especially, I don't think the additional cost of actors has any impact on app performance in most cases.

Note that I don't suggest artificially pushing for more actors without any reasonable goal or design, just that introducing one more actor into the system shouldn't be discouraged, and the use of actors shouldn't be advised against or limited.

I'm not sure what makes you think that, because the concurrency system is based on actors, there should be no discussion about how many actors might be appropriate, and that too many is not a thing. You could have said the same about dispatch queues; some people certainly advocated for using as many queues as you wanted, and we now know that was the wrong call. I'm speaking from very practical observation, for example that a good number of people are already getting burned by reentrancy. Let's be a bit careful here and not be blinded by the shiny new tech.

2 Likes

There should be a discussion; it's just that currently it sounds more like "don't use them except in a few places", with the alternative framed as "use tons of them". I advocate for neither of these options. I think we would be better off using actors much more liberally and trying to navigate this new paradigm better, rather than restricting them to the cases we deem suitable based on previous experience in a completely different model. Avoid the extremes in both directions. Still, if the concurrency model of the language is based on something, that something should be one of the building blocks of programs that use concurrency.

Yes, because there was a clear, observable problem there. And I am not saying that abusing anything wouldn't cause problems; more likely it would. We need a balance. But since for many in Swift (myself included) actors are a new concept in day-to-day programming, we will be better off exploring them, not avoiding them.

Reentrancy is a thing to be mindful of, but I'd argue that this is not a problem of actors, but a more general and fundamental issue of concurrent code. One can bump into problems similar to actor reentrancy in many other constructs and languages. We are hit by it only because we have vast experience using libdispatch, are aware of the common pitfalls there and how to design around them, while with the new concurrency we have much less practical knowledge and haven't gotten used to its structured way (which is exactly why reentrancy puzzles many: we look at sequential code but aren't used to treating await as a critical point yet).

From my experience so far, you only need to create actors when you want to protect shared mutable state. There are many actors pre-created for you that you consume, without being aware of them, when awaiting. The unit of concurrency is the task.
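For instance, a minimal sketch of that rule of thumb, with a made-up metrics counter: the actor exists only because the counts are shared mutable state touched from many tasks.

actor MetricsCounter {
    private var counts: [String: Int] = [:]

    func increment(_ event: String) {
        counts[event, default: 0] += 1
    }

    func snapshot() -> [String: Int] {
        counts
    }
}

// Many tasks record events concurrently; the actor serializes access.
func track(with counter: MetricsCounter) async {
    await withTaskGroup(of: Void.self) { group in
        for i in 0..<100 {
            group.addTask {
                await counter.increment(i.isMultiple(of: 2) ? "even" : "odd")
            }
        }
    }
    print(await counter.snapshot())   // ["even": 50, "odd": 50]
}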

3 Likes