Designing with concurrency in mind - actors vs. classes with a global actor isolation?

Joseph_Heck · March 9, 2024, 1:34am

I'm porting some code, a new-ish project, and since it's basically "fresh to Swift", I'm trying to tackle it using Actors and concurrency foremost. As I'm going through it, the original code that I'm porting from has state spread throughout a lot of different objects (loosely classes/reference types) with fairly isolated purposes to keep the code relatively straightforward to read.

As I'm porting, I started leaning into making each of those reference types that held its own state its own Actor, but as I'm getting into porting more of the middle of this setup, the number of actors that I have spread around is feeling weird to me, and I don't know why - other than this is a completely new task, and I'm not clear on the tradeoffs I'm making.

In some places, I've resolved to merge some classes together to have more unified locations for where state is updated, but I'm also wondering if I'm trying to tackle this wholly incorrectly and these could just be final classes, annotated to be pinned to a global actor for the execution/isolation context.
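For concreteness, the two shapes being weighed here could be sketched roughly as follows. This is only an illustration: `PeerTableActor`, `RepoActor`, `PeerTable`, `DocumentIndex`, and `demo` are hypothetical names, not types from the project being ported.

```swift
// Option A: each stateful type is its own actor, so each one is a
// separate isolation domain and every call from outside needs `await`.
actor PeerTableActor {
    private var peers: Set<String> = []
    func add(_ peer: String) { peers.insert(peer) }
    var count: Int { peers.count }
}

// Option B: one global actor (hypothetical name), with plain final
// classes pinned to it. All of them share a single isolation domain.
@globalActor
actor RepoActor {
    static let shared = RepoActor()
}

@RepoActor
final class PeerTable {
    private var peers: Set<String> = []
    init() {}
    func add(_ peer: String) { peers.insert(peer) }
    var count: Int { peers.count }
}

@RepoActor
final class DocumentIndex {
    private let peerTable: PeerTable
    private(set) var documentIDs: [String] = []

    init(peerTable: PeerTable) { self.peerTable = peerTable }

    // PeerTable is isolated to the same global actor, so this is an
    // ordinary synchronous call with no `await`.
    func register(documentID: String, from peer: String) {
        documentIDs.append(documentID)
        peerTable.add(peer)
    }
}

// Everything inside this function runs in RepoActor's isolation domain;
// callers outside that domain reach it with a single `await demo()`.
@RepoActor
func demo() -> (documents: Int, peers: Int) {
    let table = PeerTable()
    let index = DocumentIndex(peerTable: table)
    index.register(documentID: "doc-1", from: "peer-a")
    return (index.documentIDs.count, table.count)
}
```

In option B the classes can call each other synchronously, at the cost of all of their work being serialized on one actor. In option A each type can make progress independently, but every interaction between them is an `await`.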

Does that even make sense? I'm sure I'm balancing tradeoffs, but I haven't really used concurrency "in anger" enough yet to understand what the tradeoffs are - and I'm not even entirely sure how to look for them. I'm guessing context hops between actors are the biggest thing I'm trading off against.

Anyone stumbled down this path and come away with any rules of thumb to share, or can share how they've looked at similar problems and measured any relevant tradeoffs? Useful instruments, benchmarks they leaned on, etc?

Thanks,

joe

vns · March 9, 2024, 8:29am

Not sure what you mean by "state spread"? Is it shared state, so it has to be protected in a multithreaded environment? Or is it just pieces of state, isolated to certain classes? An example would help to understand.

I believe that needing a lot of actors signals that the design needs re-evaluation. Where there is a clear need for isolation and for concurrent access to an object, actors are fine. But if there is no shared mutable state, I doubt they are necessary.

Joseph_Heck · March 9, 2024, 4:59pm

Several classes with bits of mutable state in them, that could potentially be consolidated - along with the responsibility of the classes. In this case, there's a sort of "top level" class that represents a repository, and vends/manages instances of a reference-based class. Underneath, it has a storage subsystem for writing to persistence (filesystems, or remote back ends), a network subsystem (which contains a collection of 0 or more network transports over which to sync), and a synchronizer class that manages the sync state of what's been replicated into the documents.

Repo
  CollectionSynchronizer
     [DocSynchronizer] (one per document being managed)
  StorageSubsystem
     StorageProvider (that does the writing and reading)
  NetworkSubsystem
     [NetworkProvider] (one per transport that syncs the documents)

The whole setup tracks a collection of documents, network peers, the state of those peers, and a marker that represents the versions of the documents that the peers have.

As an example of the shared state, an instance of a DocSynchronizer is tracking available peers as they appear and disappear from the network, sending and receiving messages from them. The Network subsystem also knows about said peers, since it's what's collecting them from the one or more network providers that are accepting or making connections. The collection synchronizer passes information about peers being added or removed down to each of the DocSynchronizers.

If you want to dig through some code to see the mess at hand, you're welcome to poke around at https://github.com/automerge/MeetingNotes/tree/automerge-repo-package/Packages/automerge-repo/Sources/AutomergeRepo/. All of the above types are mostly stubs as I'm trying to sort out how I'd like to put this together, at the moment heavily influenced by the original code and function that I'm porting.

I suspect that I could have a single (or a few) actors - a top level actor that was responsible for knowing the state of all the documents, the network transports, the active peers in play, and their reported remote states - but again, I'm not entirely clear on what the tradeoffs are (other than denser code complexity and tighter coupling) and how using multiple actors affects the system in terms of performance (context switches, etc) or memory overhead compared to a single actor (or a few). I'm hoping to learn if those tradeoffs are measurable, large or small, and even how to measure them or see them for myself.

vns · March 9, 2024, 6:37pm

Oh, I've wanted to get to know CRDT for a while now!

So the Repo is a facade to the whole system, which inevitably means it is going to be accessed concurrently. The synchronizers are synchronous (heh), so isolating them with the repo would be enough. The two subsystems are asynchronous, and even though they are isolated on the Repo actor, they have to be protected by being Sendable; having them as actors as well seems right too.

With provider protocols, I believe they need to be Sendable:

public protocol StorageProvider: Sendable {
    ...
}

With concurrency checks set to "Complete", you should currently get warnings about passing non-Sendable types between isolation domains. The InMemoryStorage, on the other hand, can be simplified to a struct with sync methods on it and still conform to Sendable, because it will not involve any async calls and therefore no jumps to another executor.
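As a rough sketch of that idea, assuming a hypothetical `BlobStore` protocol with synchronous requirements (the real StorageProvider requirements are not shown in this thread, so nothing here reflects its actual API): a struct whose stored properties are all Sendable can itself be declared Sendable, and an actor that owns it provides the isolation for any mutation.

```swift
// Hypothetical protocol; synchronous requirements only.
protocol BlobStore: Sendable {
    func load(id: String) -> [UInt8]?
    mutating func save(id: String, data: [UInt8])
}

// A value type whose only stored property is a dictionary of Sendable
// values, so the compiler accepts the Sendable conformance.
struct InMemoryBlobStore: BlobStore {
    private var blobs: [String: [UInt8]] = [:]

    init() {}

    func load(id: String) -> [UInt8]? { blobs[id] }

    mutating func save(id: String, data: [UInt8]) { blobs[id] = data }
}

// The owning actor (hypothetical) keeps the one authoritative copy of
// the struct and serializes all access to it.
actor RepoSketch {
    private var store: InMemoryBlobStore

    init(store: InMemoryBlobStore) { self.store = store }

    func save(id: String, data: [UInt8]) { store.save(id: id, data: data) }

    func load(id: String) -> [UInt8]? { store.load(id: id) }
}
```

One caveat with this shape: because the store is a value type, handing it to another owner copies it, and the copies then diverge. It only behaves like shared storage while a single actor holds it.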

I have been using a similar structure of components for a year now, and it seems to have been working fine. One thing that might be beneficial here (I am still exploring it) is to introduce a custom executor and share it among these actors, if it makes sense for them to share an executor and reduce the number of hops between them.
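A minimal sketch of what sharing an executor could look like, using the custom actor executor API from SE-0392 (Swift 5.9; on Apple platforms it also needs a macOS 14 / iOS 17 deployment target). `SharedQueueExecutor`, `StorageStub`, and `NetworkStub` are hypothetical names, and whether the runtime actually skips a switch between two actors on the same executor is something to verify by measuring rather than assume.

```swift
import Dispatch

// Hypothetical executor that runs every enqueued job on one serial queue.
final class SharedQueueExecutor: SerialExecutor {
    private let queue = DispatchQueue(label: "repo.shared-executor")

    func enqueue(_ job: consuming ExecutorJob) {
        // Convert to an unowned job so it can be captured by the closure.
        let unownedJob = UnownedJob(job)
        let executor = asUnownedSerialExecutor()
        queue.async {
            unownedJob.runSynchronously(on: executor)
        }
    }

    func asUnownedSerialExecutor() -> UnownedSerialExecutor {
        UnownedSerialExecutor(ordinary: self)
    }
}

// Two stand-in actors that both run on the same executor instance.
actor StorageStub {
    private let executor: SharedQueueExecutor
    private var writes = 0

    init(executor: SharedQueueExecutor) { self.executor = executor }

    // Tells the runtime to schedule this actor's jobs on the shared executor.
    nonisolated var unownedExecutor: UnownedSerialExecutor {
        executor.asUnownedSerialExecutor()
    }

    func recordWrite() -> Int {
        writes += 1
        return writes
    }
}

actor NetworkStub {
    private let executor: SharedQueueExecutor
    private let storage: StorageStub

    init(executor: SharedQueueExecutor, storage: StorageStub) {
        self.executor = executor
        self.storage = storage
    }

    nonisolated var unownedExecutor: UnownedSerialExecutor {
        executor.asUnownedSerialExecutor()
    }

    // Still a cross-actor call that needs `await`, but both actors are
    // served by the same serial queue.
    func receive() async -> Int {
        await storage.recordWrite()
    }
}
```

The two actors remain separate isolation domains as far as the compiler is concerned, so the `await` stays; the only thing that changes is where their jobs are scheduled.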

I will definitely take another look to explore more in this demo project.

vns · March 9, 2024, 6:47pm

I highly recommend enabling strict concurrency checking in the Xcode project settings; it is currently off. The compiler will then make suggestions in the form of warnings or errors about unsafe code.

It is also a good approach to isolate all SwiftUI views on the main actor:

@MainActor
struct PeerSyncView: View {
    // ...
}

SwiftUI made the somewhat strange design decision to isolate only the body property on the main actor, which often leads to "surprises", so isolating the whole view works out to be beneficial.

I see more complications with correctly isolating the Network framework than with this Repo architecture. Apple hasn't made it ready yet, and the compiler will not be happy about it.

Joseph_Heck · March 9, 2024, 8:22pm

That's kind of the core question behind this thread - how do you see the hops and know if that's an issue or not? Is this something that you can infer by correctly using Instruments, something a CLI tool can measure, or something I can craftily measure to know that executor hops are notable in performance vs. not?

vns · March 9, 2024, 10:25pm

Every await is a potential suspension point and a potential hop to another executor. Potential, because the hop might not happen if the call is already running on the correct executor. For instance, if you call another async method/function isolated to an actor from that same actor, there (more likely) won't be a switch. If you call actor B from actor A, there (more likely) will be a change of executor. And all nonisolated async methods/functions will hop to the generic executor. The "more likely" qualifiers are there because of custom executors for actors, unsafe attributes, etc. - yet these are rare cases that require additional code.
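The three cases above can be sketched like this. `Counter`, `EventLog`, and `summarize` are hypothetical names, and the comments describe the expected behavior under the default executors, not a guarantee.

```swift
actor Counter {
    private var value = 0

    func increment() async -> Int {
        value += 1
        return value
    }

    // Same-actor call: we are already on Counter's executor, so these
    // awaits are not expected to switch executors.
    func incrementTwice() async -> Int {
        _ = await increment()
        return await increment()
    }
}

actor EventLog {
    private var lines: [String] = []

    // Cross-actor call: the await on `counter.increment()` is expected to
    // hop to Counter's executor and then back to EventLog's.
    func logNext(from counter: Counter) async -> Int {
        let value = await counter.increment()
        lines.append("value \(value)")
        return lines.count
    }
}

// Nonisolated async function: its body runs on the generic executor, and
// the call into Counter is expected to hop to that actor and back.
func summarize(_ counter: Counter) async -> Int {
    await counter.incrementTwice()
}
```

Counting the awaits that cross an actor boundary in a hot path gives a rough upper bound on the hops involved; whether they matter is still a question for measurement.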

You can inspect the SIL to get a more precise understanding by compiling the code with the -emit-sil option (I haven't tried it with Xcode though). There will be clear hop_to_executor SIL instructions, yet - if I recall correctly - such an instruction does not mean a hop actually happens, since it also checks the current executor.

On the performance side I have no experience, because so far I haven't had serious performance issues or requirements related to Swift Concurrency and executor switches. I would probably not worry about it until there are performance issues. As for Instruments, I have never had a good experience using them, but there is a concurrency instrument and probably a WWDC session about using it.

Also, I can recommend this article - How async/await works internally in Swift. I found that it describes the key points pretty accurately. And it is not too outdated, meaning it covers most of the latest changes. There have been a lot of discussions about concurrency on the forum with many insights; the downside is that they are fragmented across various topics and time, but they are worth exploring anyway.