Making actors non-reentrant

I don't think I have a good example in a vacuum (for now?), and I'll have to preface with a disclaimer that this all might be terribly wrong; the following is just a framework I'm currently thinking in, and it might easily get corrected by someone more experienced.

But I can start with a counterexample: it might be typical for an imaginary todo app to have a class TodoManager that loads, parses, stores, retrieves, filters etc. some todo items. There's typically a lot of business logic inside, with some mostly constant handles to the local SQLite DB, observers, publishers and such, and people would attempt to just transform this into an actor.

First, I strongly opine that the business logic part should straight up be written as a pure procedural global (or static) function, like this:

func loadRemoteTodos(ids: [String],
                     writeTo: some DatabaseHandle
) async throws { }

for two reasons:

  • It becomes a non-isolated function, which is actually the correct semantics: the function only touches the explicitly passed DB at some point, but otherwise the operation has no business being serialised w.r.t. some other logic.
  • It is now a "mini concurrency domain" of its own: you will typically then only have to reason about the ordering of other calls to this function, but not any adjacent ones.

If one has correctly figured out the transactionality of this function and others like it, there's much less cognitive burden, because it already sits in the global concurrency domain and thus makes no implicit guarantees and hides no stateful information.
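For concreteness, here's a minimal sketch of how this could flesh out; DatabaseHandle, Todo and the endpoint URL are all made up for illustration:

import Foundation

// Hypothetical minimal protocol for the only concurrency-sensitive
// dependency the function touches.
protocol DatabaseHandle: Sendable {
    func write(_ todos: [Todo]) async throws
}

struct Todo: Codable, Sendable {
    let id: String
    let title: String
}

// The nonisolated business-logic function from above, fleshed out a bit.
func loadRemoteTodos(ids: [String], writeTo db: some DatabaseHandle) async throws {
    let url = URL(string: "https://example.com/todos")!  // placeholder endpoint
    let (data, _) = try await URLSession.shared.data(from: url)
    let todos = try JSONDecoder().decode([Todo].self, from: data)
    try await db.write(todos.filter { ids.contains($0.id) })
}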

If I were to describe when to use actors in one sentence, it would be something like "only for compact, stateful data types that benefit from being highly concurrent", but I think it's easier to reason in terms of when not to use actors, similar to how this document does it:

  • Is there even an underlying reason for it to vend an async API (i.e. non-blocking I/O, networking, custom scheduling, computation offload)? If not, you (very likely) don't need an actor because you don't have an innate source of asynchrony.
  • Do you actually intend to allow your callers to suspend? If not, you don't need an actor, you need a leaf-level mutex (see the sketch after this list).
  • Are you relying on strict FIFO execution order? If not, you don't need an actor, you need a (threadsafe) queue.
  • Does it matter if jobs get reordered due to the underlying task's priority? If it does, you again need a FIFO queue, not an actor.
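As a sketch of that leaf-level-mutex option, assuming Swift 6's Synchronization module (the type and names are illustrative):

import Synchronization  // Swift 6's Mutex (recent Apple/Linux SDKs)

// No suspension points: callers block briefly on the lock instead of
// awaiting, which is exactly what a leaf-level mutex is for.
final class HitCounter: Sendable {
    private let counts = Mutex<[String: Int]>([:])

    func increment(_ key: String) {
        counts.withLock { $0[key, default: 0] += 1 }
    }

    func count(for key: String) -> Int {
        counts.withLock { $0[key] ?? 0 }
    }
}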

And so on. Benefitting from being highly concurrent is the key, because that's what actors are designed for, while most problems in the typical app-programming space aren't, and so people end up losing the battle of bending a tool to fit a problem it doesn't suit.

Good examples of such highly concurrent data types are message queues, database handles, network ports, caches etc. It's easy to imagine them being generic; I also vaguely remember that actors were indeed originally conceived as entities that manage network connections.

Distancing oneself from the OOP paradigm helps with the part where one tries to assign a bunch of functionality to a singular "object", whereby the choices are pretty much a struct, a class or an actor, with only the latter being threadsafe by default, which is where people immediately get misguided. When the majority of functionality is implemented as standalone functions (which, again, is oftentimes even more semantically correct), there's no such decision paralysis anymore.


I'm also a big proponent of "any function that can be static should be static", however it gets more complicated in SwiftUI apps that talk to some backend system.

Any non-trivial real-world SwiftUI app forces you to step into the multithreading field where you end up choosing between:

  1. Explicitly cornering your entire app into MainActor (and thus ending up running on a single CPU core)
  2. Using actors
  3. Using lower-level "old-school" synchronisation

... or struggling with finding some reasonable combination of the three.

The reasons why you end up with this choice include:

  • Caching of network requests in memory and on disk
  • Maintaining an access token that's used in all network requests and should be periodically refreshed in a transparent manner, i.e. without disturbing the higher layers of your app
  • Ensuring that any non-trivial computation doesn't happen on the UI thread, for example uncompressing images that are downloaded from the network

As another example, there's also a paradigm that I use in all my apps that I call "multiplexing": combining multiple calls to the same endpoint into a single call and returning the result to all callers.
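In sketch form the idea looks roughly like this (not AsyncMux's actual API; all names here are made up). Concurrent callers asking for the same key share a single in-flight task:

actor Multiplexer<Key: Hashable & Sendable, Value: Sendable> {
    private var inFlight: [Key: Task<Value, Error>] = [:]

    func request(
        _ key: Key,
        fetch: @escaping @Sendable (Key) async throws -> Value
    ) async throws -> Value {
        if let existing = inFlight[key] {
            return try await existing.value  // join the call already in flight
        }
        let task = Task { try await fetch(key) }
        inFlight[key] = task
        defer { inFlight[key] = nil }
        return try await task.value
    }
}

A second caller arriving while the first is still suspended simply awaits the same task's value instead of triggering another network call.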

This, and possibly some other more app-specific things, will force you to choose between staying on the MainActor, using old-school synchronization, or going full structured-concurrency nuts :slight_smile: with actors and whatnot.

Where exactly you step into multithreading in SwiftUI is generally wherever you use either the .task { } view modifier, or Task { } in places where you need to perform something asynchronous (typically a network call) but SwiftUI doesn't support asynchronicity, for example a Button action.
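For illustration, a sketch of both entry points (fetchTodos is a stand-in for a real network call):

import SwiftUI

struct TodosView: View {
    @State private var todos: [String] = []

    var body: some View {
        List(todos, id: \.self) { Text($0) }
            // structured: the task is tied to the view's lifetime
            .task { todos = (try? await fetchTodos()) ?? [] }

        // a Button action is synchronous, so asynchrony needs an unstructured Task
        Button("Reload") {
            Task { todos = (try? await fetchTodos()) ?? [] }
        }
    }
}

func fetchTodos() async throws -> [String] { [] }  // hypothetical network call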

Here's a library that I created that provides memory and disk caching of network objects; it currently uses a combination of actors and a global actor: GitHub - crontab/AsyncMux: Asynchronous caching and multiplexing layer for modern Swift client apps

Can you come up with a better solution to the problems this library is trying to solve? Would be happy to hear any ideas.

The trouble with having mostly standalone (and, apparently, non-isolated) functions is that it leads to the inevitable requirement that everything you operate on be Sendable. Arguably, there is no shared mutable state in such a design, so this should be easily achievable, at least in theory.

In practice, you can quite rarely design apps in such a manner entirely. It's possible for smaller apps, but as functionality increases, you'll find yourself either struggling to express a feature in that manner, or it will have performance implications. If modeling an app in such terms were so much better, we'd all be using purely functional languages by now, I suppose.

We definitely need a better thinking framework for actors (and global isolation as well). I am not really happy with the need to throw @MainActor everywhere on Apple platforms; not everything has to have isolation at the declaration level, as long as it has some isolation at the usage site (region-based isolation helped with this partially, so non-isolated types with async methods are no longer completely useless). I'm just as unhappy with throwing Sendable everywhere, and I observe that many currently lean towards "let's make it Sendable, so we won't have a headache dealing with this later", even if that means greatly reducing compile-time checks due to the need for unsafe and unchecked constructs. It may be that such issues won't be issues in a different framework of thinking, but so far I couldn't find one (not counting the purely functional approach, which simply eliminates mutations entirely, making all of this obsolete).

I actually think this is a good requirement, and in my experience it only gets in the way when there are just too many objects overall, so if it gets clunky, it might be a sign that the app has too many stateful parts that don't have a good reason to exist in the first place.

Again, my line of thinking is the following: the innate reason for asynchrony for most apps is async I/O (basically, it's just networking and disk reads/writes), and so the majority of your actors should just be wrappers around this functionality. Besides these, the only types that really have to be Sendable are whatever data structures you pass to these functions as arguments, but they should mostly be plain old structs, where sendability is mostly trivial.

I'd love to see a counter-example, because I can only imagine something like SQLiteHandle, NetworkingSession and a few similar objects covering 95% of all async I/O needs of an app, so the total count of Sendable types that aren't plain immutable structs should be just around ten-ish, and that volume should be manageable. I can imagine architectures like VIPER or whatnot requiring hundreds of objects, but I have lots of thoughts on why that approach is plain wrong overall, and its clunkiness with concurrency is just one of its symptoms.

There's a series of quite iconic language-agnostic videos on this topic, particularly https://youtu.be/QM1iUe6IofM and https://youtu.be/0iyB0_qPvWk.

Also, region-based isolation and sending will soon lift a lot of these requirements when it comes to argument passing, albeit mostly in cases where plain old structs would've worked already.

I think that's a really important insight!
Thinking back on it now, I am pretty sure I just avoided this trap (mostly) by being overly wary of tinkering with a "new thing". Or to put it differently: the years of playing tabletop games and having a DM ask me "Do you really, really want to do this?" (imagine a look of complete shock at how I could send my character off to certain doom) finally paid off!

When teaching/explaining structured concurrency, we all need to emphasize that actors are not "just thread-safe objects". As you said, they are more about data structures, or rather: they isolate state, not execution, in a way.


Any complex enough app will have a lot, won't it? Once you pass the threshold of a "to-do list" app and get into the realm of something that supports a variety of features, that's inevitable.

The way I see it, we are developing a Sendable-types obsession: once you've defined some protocol to have a Sendable requirement, it is with you all the way. If that protocol is somewhere in the core, it affects everything. Recently I was reviewing the core of a project for Sendable conformances and discovered that one of the widely used ones had such a requirement for nothing: removing it allowed me to drop several @unchecked Sendable in implementations.

Yes, at the transport level, where there is no state, just messaging, that's easy to handle. But that doesn't mean most of your app has to be Sendable. For most objects and operations, being usable from anywhere is an unnecessary requirement. Yet this requirement for sendability currently spreads @MainActor annotations all over the place.

Abstraction is the way to handle complexity. Putting aside VIPER (clearly not my favourite thing), there is no way an app that goes a bit beyond the basic "network - cache - UI" pattern won't have a significantly large codebase with hundreds of objects. The question is how you organize and treat these objects. You are then more likely to want to define isolation regions, with actors, in which you'll operate.

Reiterating the earlier discussion in the thread, I don't think we should treat actors as data structures akin to a dictionary. I agree that describing them as something different from everything else, instead of thinking of them as "thread-safe classes", is definitely the better paradigm. But actors are intended to hold logic and operate on state, not be just bags. Therefore, limiting actors to just I/O is limiting their power.

I don't think I can concur on any of your points, as I think pretty much the opposite:

  • protocols should rarely, if ever, declare Sendable super requirement
  • data structures, on the other hand, should default to Sendable wherever possible (Rust derives Send + Sync automatically for any struct whose fields allow it); generics should just do where T: Sendable
  • ergo, most of the non-UI parts of an app can (and should) be Sendable without much trouble
  • the complexity of functionality doesn't have to grow proportionally with the number of classes (each of which then poses the task of managing concurrency), and this is where I most strongly disagree:

This is just the OOP way of structuring an app, and my original point was specifically addressing the fact that actors are a poor concurrency abstraction for these kinds of code structures, which is why people often start to fight them and look for ways to make them non-reentrant etc. You can make very good abstractions without defaulting to OOP-style classes, and then the @MainActor problem goes away as collateral.

There are definitely ways to build a complex app with a minimal number (again, 10–20 or so) of actors/objects, where each governs a "data domain" (DB, cache, filesystem etc.) instead of a vertical feature (as is typically done). These are not "just bags" and do hold some logic (e.g. the cache will have some eviction algorithm and such), but this logic can be made minimal enough to just enforce the required transactionality and be done with it.

That seems overly broad. Any protocol abstracting values which must pass across concurrency boundaries must be Sendable. For instance, Alamofire's ResponseSerializer protocol abstracts serializing the response and data for HTTP requests. By its nature it must start where the user creates it (an arbitrary domain) and then be run on Alamofire's serialization queue. Anything passed into a concurrent system must do the same. URLSession's various delegate protocols now inherit from Sendable as well, since you provide the delegate to URLSession from an arbitrary domain and it then interacts with it on the delegate queue.

Sure, it's a fair requirement when there's a well-justified technical necessity, especially in library code; what I mostly meant is that a Sendable constraint pretty much declares that "this type will be accessed concurrently from many different places at the same time", which I find hard to justify as a default, and arguably typical app code shouldn't juggle concurrency like this all the time.

Also, if I correctly understand the semantics you've described, this one-off create-and-pass-elsewhere is actually safe to do with non-Sendable types, and this is where sending arguments come in (in other words, the compiler-mandated Sendable requirement for such usage is/was just a type system limitation).
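A sketch of what that looks like with Swift 6's sending (the types are made up):

// A deliberately non-Sendable class.
final class ResponseBuffer {
    var bytes: [UInt8] = []
}

// `sending` lets a non-Sendable value cross into another concurrency
// domain, as long as the compiler can prove the caller no longer uses it.
func process(_ buffer: sending ResponseBuffer) async {
    buffer.bytes.append(0)  // safe: this domain now owns the buffer
}

func makeAndHandOff() async {
    let buffer = ResponseBuffer()
    await process(buffer)  // ok: `buffer` isn't touched afterwards
}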


Yes, sending is probably a replacement for some of these uses. Unfortunately it's limited to Swift 6+, so it isn't available to solve these problems for anyone who needs to support older versions. Even then, it's really only suitable if the value is only accessed from a single domain, which may not be possible to guarantee in concurrent systems.

Here is a contradiction with your previous suggestions: there's no way to have a bunch of non-isolated functions, aim to have everything Sendable, and avoid restrictions on protocols. DatabaseHandle from earlier is more likely to have a Sendable requirement, or to be designed so that the parameter can be marked sending; either way, complexity is lifted up in order to satisfy that.

I don't think protocols should expose such requirements either, at least without a great need and an intended design in mind. But they would have to if the majority of the app is Sendable, even if you don't need them to be.

Yes, it is the other way around :slight_smile: As complexity grows, or to phrase it better, as features are added to the app, new things will pop up as a result of having those features.

Why do you think the need for non-reentrant behaviour is battling against actors? Caching, for example, is one of the major cases where re-entrancy is a significant detail: if there are several requests to an expensive resource, you want to load it once, therefore you have to handle re-entrancy. That's not unique to actors, though.
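For reference, the usual way to get load-it-once behaviour out of a reentrant actor is to keep the in-flight Task as part of its state, the same pattern as the multiplexing sketch earlier (names are illustrative):

import Foundation

actor ResourceCache {
    private var cached: [URL: Data] = [:]
    private var loads: [URL: Task<Data, Error>] = [:]

    func data(for url: URL) async throws -> Data {
        if let data = cached[url] { return data }
        // A re-entrant caller arriving mid-load joins the existing task
        // instead of kicking off a second download.
        if let inFlight = loads[url] { return try await inFlight.value }
        let task = Task { try await URLSession.shared.data(from: url).0 }
        loads[url] = task
        defer { loads[url] = nil }
        let data = try await task.value
        cached[url] = data
        return data
    }
}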

How broad, let's say in a messaging app, would the DB domain be in such a case? Or would it be possible to handle the complexity of the many different logical domains that exist in any modern messaging app within only one DB domain? Objects, procedures, or other abstractions: the complexity has to live somewhere.

But why, continuing the messaging example, shouldn't ChatRoom be an object (or even an actor)?

Yes, this would be one of those rare exceptions :upside_down_face:. I perhaps should have worded my statement differently; the way I read your previous post, this part:

suggested to me that defining too many protocols as Sendable is the problem. Of course, if the semantics of the protocol are that it explicitly models a type that by design is supposed to be used from multiple concurrent parts, then it's a fair requirement (in contrast to adding the constraint just to satisfy the type checker).

I just heavily suspect this to be a symptom of incorrect design, and I've done this many times myself before re-thinking the ways I use actors in my code.


The chat room is actually a good example, as I'm currently working on an app where it was implemented exactly as an actor, and this design is pretty flawed; we have many very hard-to-trace concurrency bugs, and I'm looking forward to eventually rewriting it in the way I'll describe shortly.

The major issues are:

  • a chat room has no business being its own concurrency domain (esp. being concurrent with respect to other chat entities or rooms, as it's not the only chat-related class),
  • modeling a chat room as an object is just not a good abstraction, because data-wise it's just a String ID, and all the other data it might need (message cache, room name, picture etc.) is required by many other parts (e.g. message history search), so it can't just "own" this data (or rather, it can, but this encapsulation just doesn't work well).

Instead, I would design the room-related functionality as follows (all functions are global/static):

func getLastMessages(
    roomId: String,
    count: Int,
    cache: ChatCache,
    chatServer: ChatServerHandle
) async throws -> [Message]

func send(message: String, to roomId: String, server: ChatServerHandle) async throws

func searchMessages(
    containing: String,
    in rooms: Set<String>,
    cache: ChatCache,
    chatServer: ChatServerHandle
) async throws -> [Message]

// ... etc.

where ChatCache is really the only actor in the whole chat system. The cache is both an in-memory and an on-disk cache of user avatars, message history etc.; it benefits from being highly concurrent and has clear transactional semantics.
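A rough shape for it under these assumptions (Message stands in for the real model type; the disk layer and eviction are elided):

struct Message: Sendable {  // stand-in for the real model type
    let id: String
    let text: String
}

actor ChatCache {
    private var messagesByRoom: [String: [Message]] = [:]

    func lastMessages(roomId: String, count: Int) -> [Message] {
        Array((messagesByRoom[roomId] ?? []).suffix(count))
    }

    func store(_ messages: [Message], roomId: String) {
        messagesByRoom[roomId, default: []].append(contentsOf: messages)
        // the eviction policy and disk write-back would live here
    }
}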

I assume that servers (or rather, entities that talk to a server) are also usually designed as classes/actors, but I wouldn't do it this way: such an entity is pretty much stateless after having been configured once, so there's no reason to serialize or otherwise impose additional semantics on calls to it. I'd just do the following:

struct ChatServerHandle {
    let host: String
    let token: String
    // Deduplication state behind a lock. Note: with Swift 6's noncopyable
    // Synchronization.Mutex this struct would itself have to be ~Copyable,
    // so in practice a reference-type lock wrapper is the simpler choice.
    let inFlightRequests: Mutex<[String: Task<Void, Never>]>
}

(the last field is just an example of the structure of the data in there; I'd have a prettier deduplication machinery for it in reality).


i’ve learned the same lessons in my own code bases. the thing i haven’t figured out is code organization. when following OOP, there was a simple rule: one type per file. the rule had its drawbacks to be sure, but it saved a lot of time answering the question “where does this code live?”


As long as moving things around is possible without too much pain (e.g., a good test suite will allow even aggressive refactorings), organization isn't the biggest concern, is it?


I won't question the global-function style, as it's subjective, but from this example it looks like ChatCache then becomes shared state. Is it just a singleton? How does this work on several nodes? What happens if getLastMessages constantly fails because ChatCache died? Why should getLastMessages be async, when encapsulated in a per-Room actor it could be synchronous?

Yes.

We're talking about client (iOS app) architecture, so the cache is just a glorified wrapper around an in-memory dictionary, the filesystem and/or an SQLite connection, whatever fits best. It's not a distributed remote node (however, I don't see why the design would have to be different if it were).

Because it performs async I/O (talking to the remote server) in case the requested number is larger than what's currently held locally, so it would have to be async regardless.

I honestly just roll with larger files until the functionality becomes clearly unrelated.

That is, instead of predefining some files and then wondering where to stick a new function, I'd just have Chat.swift for the longest time, and only when I see that there's a clearly distinct group of methods (e.g. those that specifically prepare rendering data) will I force them into a separate file, namespaced to an empty enum perhaps.
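In sketch form, with a made-up Message type and function:

import Foundation

struct Message { var text: String }  // stand-in for the app's model type

// A clearly distinct group of methods, pushed out of Chat.swift and
// namespaced in a caseless enum so it can't be instantiated.
enum ChatRendering {
    static func previewLine(for message: Message) -> String {
        String(message.text.prefix(80)).trimmingCharacters(in: .whitespaces)
    }
}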

It's not that scalable and not fault-tolerant for a distributed system; the latter is essentially why the actor model was developed as a concurrency model in the first place.

Ah, so it's called once when joining a chat (or paginated)?

in a gestalt single-owner code base, this isn’t too bad - i do major reshufflings in my open source Swift libraries all the time - but it creates a lot of busywork for teams.

Right; in team settings, with dependencies between efforts/branches, that often gets in the way.