Overhead of using actors at scale?

We are working on a large scale app (>100 engineers), with many subsystems.
We are trying to think of a path to migrate from a mostly GCD-based architecture, to a Swift Concurrency architecture.

One thing we are unsure about is whether there is any anticipated overhead from using a large number of actors (say, hundreds of them). This makes it hard to decide whether we want to encourage engineers to create their own actors, or whether we want to introduce a few global actors and annotate classes with them.

/// Option 1
actor Blah { ... }
actor Foo { ... }
actor Biz { ... }

/// Option 2
@BlahSystemActor class Blah { ... }
@FooSystemActor class Foo { ... }
@BlahSystemActor class Biz { ... }

Are there any known pitfalls or recommendations we should be aware of?

1 Like

Should be fine. The system will still only create one thread per core, regardless of how many actors you have.

6 Likes

Elaborating a bit, one thing to consider is which things you actually want to be asynchronous. One thing I see a lot of people do is decide that because they're "supposed to"[1] use actors now, they should use actors for protecting even trivial operations, and then they find that they have lots of unnecessary overhead and it's convoluted to use. Simply protecting bits of state is often best done with a Mutex (or on older OSs, an OSAllocatedUnfairLock).

This has nothing to do with the number of actors though, more with how they're used.

Using Mutex/UnfairLock also has performance benefits over several of the common patterns for protecting state with GCD:

  • the very common "concurrent queue + sync for reads + barrier async for writes" approach typically takes several orders of magnitude longer than the operation it's protecting, so it is actually much slower than simpler approaches. It only possibly makes sense if the work being done for writes is more than, say, a millisecond, and even then I would encourage trying a simpler approach first and measuring carefully.
  • using a serial queue + sync is much better than the first bullet point, but still has two downsides vs Mutex. One is that Swift tends to copy blocks to the heap when bridging out to C APIs like libdispatch, and the other is that serial queues are FIFO, which forces unnecessary thread switches under contention.
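To make the comparison concrete, here's a minimal sketch of both approaches (the type names are made up; `Mutex` assumes the Synchronization module from Swift 6 era toolchains and recent OSs, with `OSAllocatedUnfairLock` filling the same role on older OSs):

```swift
import Dispatch
import Synchronization

// The "concurrent queue + sync reads + barrier async writes" pattern
// from the first bullet. `ReaderWriterBox` is a made-up name.
final class ReaderWriterBox: @unchecked Sendable {
    private let queue = DispatchQueue(label: "box", attributes: .concurrent)
    private var _value = 0

    var value: Int {
        queue.sync { _value }                                // read
    }
    func increment() {
        queue.async(flags: .barrier) { self._value += 1 }    // write
    }
}

// The same state behind a Mutex: no blocks copied to the heap,
// no FIFO queue, and writes stay synchronous too.
final class MutexBox: Sendable {
    private let _value = Mutex(0)

    var value: Int { _value.withLock { $0 } }
    func increment() { _value.withLock { $0 += 1 } }
}
```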

TLDR: actors are great for the things you actually want to be async, but not being async is still valuable and appropriate in many situations


  1. I strongly encourage people to avoid doing things simply because they believe they're "supposed to", but beyond that, the Swift team hasn't said that ↩︎

17 Likes

Without knowing how you were using GCD it's hard to make a recommendation for how to use Swift concurrency. Actors and GCD queues are not equivalent, and as @David_Smith alluded to, there are faster and simpler solutions if all you need is thread-safety. If you can provide some examples of your current usage we can make more concrete recommendations.

1 Like

Also, I don't think there is a big difference between "having many actors" and "having a few global ones, but then having many types annotated with them". At least, as said, the number of threads is not affected by that (unless you specifically give your global actors custom executors that manage their own threads somehow, I guess).

I'm not sure if it's a valid thing to say, but I have always considered a type annotated with a global actor to be a kind of "poor man's actor" in itself. As I understand it, such a type, while remaining its own "thing", uses the same concurrency mechanics an actor uses to isolate its state. So if context switches are what hurt you performance-wise, it should be more or less the same as just having many actors? (Perhaps somebody more knowledgeable can confirm or deny my understanding here.)

1 Like

Thanks everyone for the responses! This helps a lot :raising_hands:

We use the pattern that @David_Smith mentioned (concurrent queue with sync for reads and barrier async for writes), and we hit all of deadlocks, starving the pool, and badf00d (the magic three!), so we're trying to find a better approach...

In terms of usage - sounds like the split is "keep some state thread safe" and "genuinely async work". We have a lot of both - and the genuinely async work will have to call into thread-safe state a lot as well. If we adopt the recommendation of actors (global or unique) for async work, and locks for thread-safe state - can that break the contract of "forward progress"? Or is it still ok since eventually the lock will be obtained no matter the size of the queue?

Locks don't generally break forward progress, and modern lock APIs, like Mutex, ensure the lock is minimally held by scoping access with a closure rather than explicit lock and unlock calls. Locks are only dangerous if you lock on one thread and unlock on another. My usual pattern is to put any state I need to be thread-safe in a struct stored within a lock container like Mutex.

import Synchronization

struct MutableState {
  var one = ""
  var two = 0
}

let state = Mutex(MutableState())

state.withLock {
  $0.one = "a"
}

That way you ensure minimal lock time and update all of your state atomically, as withLock modifies the value with inout. This basic pattern can serve both sync and async APIs without the need for actors.
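As a sketch of that sync-plus-async claim (all names here are hypothetical, and `fetchToken` stands in for a real network call):

```swift
import Synchronization

// One Mutex-protected struct serving both sync and async callers.
struct SessionState {
    var token: String?
    var requestCount = 0
}

final class SessionStore: Sendable {
    private let state = Mutex(SessionState())

    // Synchronous API: callable from any thread or any actor.
    func recordRequest() -> Int {
        state.withLock {
            $0.requestCount += 1
            return $0.requestCount
        }
    }

    // Async API: the await happens entirely outside the lock.
    func refreshToken() async {
        let token = await fetchToken()
        state.withLock { $0.token = token }
    }

    // Stand-in for a real network call.
    private func fetchToken() async -> String { "fresh-token" }
}
```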

Personally, I only ever use actors around state that either never needs sync access (since doing so is awkward) or is itself synchronous but should be interacted with asynchronously, like the keychain.

2 Likes

The key thing for the contract is: no additional async work needs to begin running to unblock things (because if all the threads are blocked, there's nowhere for it to run). Locks don't violate that because the thread that holds the lock can just do its thing then unlock.

Holding a lock for a long time may be a performance issue ("keep critical sections short" has been standard advice for decades), but it won't be a correctness issue.

7 Likes

Keep in mind that both of these statements are actually only true if you always take locks in the same order.
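A sketch of that ordering discipline, with a made-up two-account example: if one thread locks A then B while another locks B then A, each can end up holding one lock and waiting forever for the other. The fix is a single global order that every code path follows.

```swift
import Synchronization

let accountA = Mutex(100)
let accountB = Mutex(100)

// Every path that needs both locks acquires them in the same
// order (here: A before B), regardless of the direction of the work.
func transferAToB(_ amount: Int) {
    accountA.withLock { a in
        accountB.withLock { b in
            a -= amount
            b += amount
        }
    }
}

func transferBToA(_ amount: Int) {
    // Still A before B, even though the money flows the other way.
    accountA.withLock { a in
        accountB.withLock { b in
            b -= amount
            a += amount
        }
    }
}
```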

8 Likes

Hi David, are there any official docs, best practices, or example use cases (actors, async/await, Mutex, etc.) that developers can reference? I'm asking because I haven't seen best practices like this mentioned in any official documentation.

4 Likes

It would be a bit easier to answer the question with a definition of the term "large scale app" - scale itself could apply to different things.

As a small note actors are very lightweight and quite performant by all tests I’ve seen and done (only mutex being a bit faster), while giving a lot out of the box, but of course it all depends™️.

To be fair, we didn't have a built-in Mutex until last year. :slightly_smiling_face:
And IMHO the actors-vs-Mutex topic is not about the simplicity or triviality of state protection, but more about the level of abstraction.

1 Like

Locks can break “forward progress” if they are held over an await. This is related to what Jon said—

—but it may not have been obvious. It might not even be wrong to hold a particular lock over an await, but the advice becomes more situational at that point (and I would love to see someone more experienced than me with async spell out what that advice would be!).

3 Likes

Not built in, but the general approach has been possible, in a slightly less efficient form, for many years.

Yes, one of the other advantages of a Mutex-like API is that the synchronous closure prevents directly awaiting while the lock is held. You can kick async work off, perhaps enclosed in a Task, but you can't await directly.
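A minimal sketch of that shape, with made-up names: the decision is made synchronously under the lock, and the async work is thrown out into a Task.

```swift
import Synchronization

final class TokenCache: Sendable {
    private let cached = Mutex<String?>(nil)

    func token() -> String? {
        // The critical section is synchronous and short.
        let current = cached.withLock { $0 }
        if current == nil {
            // Can't await inside withLock; kick the work out to a Task.
            Task { await self.refresh() }
        }
        return current
    }

    private func refresh() async {
        let fresh = "token"                 // stand-in for real async work
        cached.withLock { $0 = fresh }
    }
}
```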

3 Likes

It’s more of an answer to decision making, now it’s easier to recommend different solutions. :slightly_smiling_face:

2 Likes

Locks must not be held over an await. For almost all available locks, it's illegal to release the lock on a thread different to the acquiring thread. And awaiting something may and will switch threads which makes it illegal even if you could live with the lock held for long periods of time. (There are some very very narrow exceptions to this rule that I don't think are worth elaborating on here.)

(Mutex fixes this by not exposing lock()/unlock() directly and withLock { ... } not taking async closures.)

6 Likes

I think there is still a significant difference between many actors vs. a few global actors both semantically and in terms of performance.

Suppose you have (1) classes A and B isolated to a global actor, vs. (2) actors A and B.

First, classes A and B can call each other's non-async methods without await, i.e. fully synchronously.

Second, even for async methods the requirements for cross-calls are more relaxed in terms of sendability etc.
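As a sketch of the first point, with a hypothetical global actor and made-up types:

```swift
@globalActor actor ImageSystem {
    static let shared = ImageSystem()
}

@ImageSystem final class DiskCache {
    func cachedPath(for key: String) -> String? { nil }   // placeholder
}

@ImageSystem final class ImageLoader {
    let cache = DiskCache()

    // Same isolation domain, so this is a plain synchronous call:
    // no await, no suspension point, no Sendable requirement on `key`.
    func load(_ key: String) -> String? {
        cache.cachedPath(for: key)
    }
}

// With `actor DiskCache` instead, the call above would become
// `await cache.cachedPath(for: key)`, a potential suspension point.
```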

I tend to plan the overall architecture of my app so that I have a certain number of global actors that is close to the number of CPU cores on some high-end device, or doesn't significantly exceed it. Each global actor has a certain semantic domain, e.g. one is for network calls, another is for image caching, yet another one is for e.g. audio file streaming and so on.

Scenario 2, which is "everything is an actor", is probably easier to reason about, as in, it doesn't require a lot of higher-level planning, but it certainly has the performance overhead of a greater number of unnecessary cross-actor async calls.

I may be wrong of course, but Scenario 1 seems like a better strategy to me at the moment.

And yes, absolutely agreed on simpler locks where you just need to isolate a relatively simple data structure: you reduce the number of awaits in your code even more.

1 Like

I realize I expressed that badly, but I agree with you.

What I tried to say is roughly this: It's not the number of actors that causes performance issues, it's the context switches between them. So when you have "many" and they call each other a lot (which is, admittedly, highly likely when you have "many"), you get problems.
Just reducing their number by making them global and then annotating lots of types accordingly may solve performance issues, but it also might not: If you slice your global actors in such a way that you still switch from one context to another a lot, you still have performance issues.

So the easy thing to measure ("How many actors do I have?") doesn't really tell you anything certain about performance. It's a little bit like code coverage and code quality...

Now that I think about it, perhaps my saying "the number of actors doesn't tell you much" is overly restrictive; there may well be a correlation. However, I'd hate for people, especially developers new to the concurrency model Swift pushes, to just take away "don't use many actors", or, worse, to avoid them altogether. :smiley:

That's a good approach, and I think in many, many cases you might even need fewer actors than the target platform has cores. Planning "work domains" and isolating them with global actors is good advice. And if you need some state that two or more domains have to share, you may find a regular, non-global actor a good fit.

To be fair, this is not always the case. For example loading and uncompressing a lot of images at once can be and should be parallelized as much as possible, though that would require some non-trivial work with a detached task pool (we once had a very good discussion about this on the forum: Swift doesn't provide an out-of-the-box solution for fixed-width task groups).
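For what it's worth, the usual workaround is to hand-roll the fixed width on top of withTaskGroup: seed the group with `width` tasks, then add one more each time one finishes. A sketch, with `decode` as a stand-in for the real image work:

```swift
func decodeAll(_ urls: [String], width: Int) async -> [String] {
    await withTaskGroup(of: String.self) { group in
        var results: [String] = []
        var pending = urls.makeIterator()

        // Start with at most `width` concurrent tasks.
        for _ in 0..<width {
            guard let url = pending.next() else { break }
            group.addTask { await decode(url) }
        }
        // Every completion frees a slot for the next item.
        while let decoded = await group.next() {
            results.append(decoded)
            if let url = pending.next() {
                group.addTask { await decode(url) }
            }
        }
        return results
    }
}

func decode(_ url: String) async -> String { "decoded:\(url)" }
```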

However with audio or video, because you are probably going to uncompress and play only one audio or video at a time, a dedicated global actor would do the job fine.

So in any case reducing this problem to one trivial idea is not helpful, but I'm more leaning towards global actors when their work domain is clear, and when there are no benefits in parallelizing some work any further.

1 Like

A benefit of option 2 is that all the parts tagged with the same global actor can interact synchronously. This is actually a huge benefit; awaits make it easy to introduce logic bugs (one must keep in mind that every await introduces a new 'loading' state in the program).

1 Like

Sorry, to take a step back...

A lot of the discussion per the OP request seems focused on performance and scaling.

But the situation as I read it may have issues around correctness, which is the primary concern.

In any migration of a code base that's large (either in LOC/modules/complexity or in number of developers), or in a code base that's behaving in unanticipated ways, it's often tempting but risky to use the migration as a time to re-write to get things right.

Swift concurrency is very promising in this respect because it has compile-time guarantees. Unfortunately it is still evolving; the team is diligently (1) chasing down situations where the compiler is too conservative, (2) offering more features to avoid context switches, etc.

It's also easy to get wrong. New readers regularly complain that the picture is not clear and distinct, and experts still correct each other. Worse, there are many similar concurrent/parallel features in many languages, so people unknowingly carry invalid assumptions and have excess confidence.

For me the two things that prompted this longer aside are behaviors in Swift concurrency that trip people up when coming from other contexts:

  1. Actors are re-entrant. Async calls within the actor can interleave, breaking (invalid) assumptions about serial execution.
  2. Mutex lock is not recursive. A component using a lock to protect its state can deadlock when it calls its own functions that take the same lock.

This is as designed and documented, but users still trip on them.
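Minimal sketches of both pitfalls, with made-up types (the `audit` call just stands in for any await):

```swift
// Pitfall 1: actor reentrancy. State checked before an await may be
// stale afterwards, because other calls into the actor can interleave
// at the suspension point.
actor Account {
    private(set) var balance = 100

    func withdraw(_ amount: Int) async -> Bool {
        guard balance >= amount else { return false }
        await audit(amount)     // other withdraw() calls can run here
        // Re-check: `balance` may have changed during the await.
        guard balance >= amount else { return false }
        balance -= amount
        return true
    }

    private func audit(_ amount: Int) async { /* stand-in */ }
}

// Pitfall 2: Mutex is not recursive. A method that takes the lock and
// then calls another method taking the same lock will deadlock:
//
//   func reset()    { state.withLock { $0 = .init() } }
//   func resetAll() { state.withLock { _ in reset() } }   // deadlock
```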

I would think that if you were to decide to do a rewrite while migrating a large code base with many developers, the target semantics and implementation would need to be well-formed and battle-tested for your purposes. So one step there would be to write a small application using all the concurrency features you need, to validate that it (1) works as you expect (esp. is testable) and (2) ordinary developers understand it, not just in theory but in how to recognize and solve problems that can arise.

That app can also be a test-bed for performance and scaling, to prove to yourself that there's no problematic overhead to using actors and no possibility of deadlock. This test-bed can push you into writing benchmarks of realistic workloads and scenarios that you can then apply to the real app. That validates the target state.

On the source state, one high-level question to ask is why there is any shared state at all outside of a final resolved result - i.e., whether message-passing, eventually-consistent behavior with stale values flying around can be correct. It's prompted by the situation where async work has to depend on shared state:

To the extent you can avoid that, you might side-step entire worlds of trouble. Often this is a change you can make before migrating (here to a new concurrency model), one that might make the code more correct. If you first get to the correct behavior (writ large and reliably tested), then it can really simplify the migration, because you can expect the exact same behavior (esp. if you've written the performance tests), and investigate whenever that's not true.

Sorry to raise the broader issue - I imagine your team has already done all this, but didn't want the clarity of the forum discussion to make readers think a risky jump is easy.

1 Like