How to create an object graph with serialized access using Swift concurrency?

Kai_Br · January 31, 2022, 9:03am

Note: I also posted this on Apple Developer Forums (How to create an object graph with… | Apple Developer Forums).

Let's assume a model which consists of a graph of mutable reference types. Access to such an object graph must be serialized to avoid data races. Core Data is a good example of such a setup: the managed object context defines the serialized access using a dispatch queue (main or private queue), and all access from other queues to the managed objects of the context must go through the context’s perform methods to ensure proper isolation.

How do I create such a setup using Swift concurrency? A global actor would work, but would force all instances of the model to the same serialized context. This is not desirable: you may need one instance to run in the main actor for use by the UI and another instance to run in a background queue for e.g. an export operation.
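To make the limitation concrete, here is a minimal sketch of the global actor variant. The names ModelActor, Entity, and buildGraph are made up for illustration; only the @globalActor mechanism itself comes from SE-0316.

```swift
// Hypothetical global actor; every type annotated with it shares
// the one serialized context provided by `ModelActor.shared`.
@globalActor
actor ModelActor {
    static let shared = ModelActor()
}

// A plain class whose state is isolated to the global actor.
@ModelActor
final class Entity {
    var name: String
    var children: [Entity] = []
    init(name: String) { self.name = name }
}

// Runs on ModelActor, so it can touch Entity state synchronously.
@ModelActor
func buildGraph() -> Int {
    let root = Entity(name: "root")
    root.children.append(Entity(name: "child"))
    return root.children.count
}

// Callers outside the actor have to hop onto it with `await`.
func buildGraphExample() async -> Int {
    await buildGraph()
}
```

Every Entity, in every graph, ends up on the same ModelActor.shared. There is no way to say "this graph on the main actor, that one in the background" without duplicating the types.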

With Swift 5.5 it seems to be possible to use an actor as a kind of serial dispatch queue: route all outside access through this actor, then dispatch back to the objects of the graph. If all objects in the graph have access to that actor (typically via a context-style concept like the one used in Core Data), any object can create actor-isolated tasks as needed by delegating to the actor.
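A rough sketch of what I mean, with hypothetical names (GraphContext, Node, and perform are not from any framework): the actor owns the graph and exposes a perform-style entry point that runs a synchronous closure on its executor.

```swift
/// A node in the object graph: an ordinary mutable class, not an actor.
final class Node {
    var name: String
    var children: [Node] = []
    init(name: String) { self.name = name }
}

/// The actor plays the role of the serial queue that owns one graph.
/// Several independent GraphContext instances can exist side by side.
actor GraphContext {
    private let root = Node(name: "root")

    /// Runs a synchronous closure on the actor's executor,
    /// similar in spirit to Core Data's perform methods.
    func perform<T: Sendable>(_ body: @Sendable (Node) throws -> T) rethrows -> T {
        try body(root)
    }
}

func performExample() async -> Int {
    let context = GraphContext()
    // All access from outside is routed through the actor.
    await context.perform { root in
        root.children.append(Node(name: "child"))
    }
    return await context.perform { root in root.children.count }
}
```

Because the closure is synchronous, it cannot suspend, so it runs to completion on the actor without interleaving. This sketch does not cover asynchronous functions on Node.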

This obviously works with all synchronous functions in the object graph, but it is not so clear with asynchronous functions. The recently accepted SE-0338 "Clarify the Execution of Non-Actor-Isolated Async Functions" (swift-evolution/0338-clarify-execution-non-actor-async.md at main · apple/swift-evolution · GitHub) says that it is currently not guaranteed that a non-isolated asynchronous function called from an actor will always stay on the executor of this actor. And it suggests that in the future it will be guaranteed that such functions immediately leave the actor’s executor. If I understand this correctly, this means that such functions become concurrent with the actor. Please correct me if I’m wrong.

Note that asynchronous execution in model graphs is often needed, e.g. for file access.

Is there any way to implement this very common pattern with Swift concurrency?

Avi · January 31, 2022, 9:24am

I think the first step is to identify what it is you want to protect. Core Data requires the programmer only modify managed objects using the context from which they were obtained. It has facilities for detecting misuse at runtime, but can't enforce this at compile time.

An actor would allow you to protect the graph itself, if this was maintained entirely within the actor. However, if the links to other nodes are within the managed objects, the actor can't help. I don't think this is functionally different from Core Data, where the objects themselves can be modified from anywhere.

When custom executors are implemented, it should be possible to have a context and its managed objects share an executor. By making the objects themselves actors, you gain serialization across the entire object graph. However, you would still have to deal with reentrancy.

Kai_Br · January 31, 2022, 9:37am

While compile time checking is of course preferred, we would be happy with runtime only checking for now.

Does "maintained entirely within the actor" mean that you can’t use other classes and their instances to build the graph? If so, this would be too hard a restriction.

So such a custom executor would effectively act like a serial dispatch queue in GCD? That sounds promising, but of course does not help now.

So to use a common custom executor, all objects would have to be actors? That would be a serious restriction due to the loss of inheritance.

Sure, we are aware of that.

Karl · January 31, 2022, 9:51am

I think a global actor would be most appropriate for your use-case. As you note, that would disallow instances of the same model type from being isolated to different global actors. The global actors proposal suggests a future direction that would alleviate this: Global actor-constrained generic parameters.

This would allow you to write something like the following:

@ObjectGraphActor
class MyEntity<ObjectGraphActor: GlobalActor> {
 ...
}

As that is still a future direction, it hasn't been suggested how you would use that, but presumably the generic parameter could be inferred by a regular global actor attribute. So I'd expect something like the following:

@MainActor
let mainActorEntity: MyEntity = ...

@BackgroundActor
let backgroundActorEntity: MyEntity = ...

@MainActor
func getValueFromBackgroundEntity() {
  // We're on the MainActor, so the compiler knows we need to 'await'
  // to access things from BackgroundActor-isolated instances

  let someValueBG = await backgroundActorEntity.someValue

  // But not for MainActor-isolated instances.
 
  let someValueMain = mainActorEntity.someValue
}

Does that sound like what you're looking for?

Kai_Br · January 31, 2022, 10:06am

Hi Karl,

thanks a lot. Yes, instance-specific "global" actors as you sketched them seem to be what we are looking for. So it’s probably global/main actor for now - fortunately we do not currently really need the background aspect. We just hesitate to sprinkle code with @MainActor which needs local serialization only.

I’d assume that this is a very common need in app architectures, by the way.

Avi · January 31, 2022, 10:08am

Yes and no. Core Data allows creating objects and then inserting them into a context. The restriction on modification is on an instance originally obtained from the context. You could do the same with your code. The actor which maintains the graph could adopt objects, and even sub-graphs, created elsewhere.

As custom executors don't exist right now, this can't be answered definitively. It's entirely possible that there will be a mechanism to insert work onto a custom executor from a non-actor context.

Kai_Br · January 31, 2022, 10:20am

Ok, I understand that.
But if those other objects contain async functions, execution will become concurrent with the actor even if said async functions are called from within the actor, won’t it? At least after implementing SE-0338, if I understand it correctly.

Fair enough. Maybe this question can serve as an example of why such a mechanism would be desirable.

Avi · January 31, 2022, 10:24am

I think so. That's akin to the CD rule that adopted objects can't be modified outside of the context. You'd have to enforce it yourself, as Structured Concurrency doesn't have any mechanisms, AFAIK, to detect misuse.

John_McCall · February 1, 2022, 9:03pm

The right design for architectures like Core Data is probably that managed objects ought to be non-Sendable and only available on the managed context thread. That does create a minor usability problem, though, where you can't easily maintain a reference to the managed object which you can pass back to the actor. We've had some conversations about the idea of actor "outposts": basically, handles to actor-isolated non-Sendable state that you can safely pass around outside the actor.

Karl · February 2, 2022, 6:38am

How would that differ from global actors? It seems like broadly the same concept - actor-isolated state which doesn't live inside of an actor instance.

However, when the data that needs to be isolated is scattered across a program, or is representing some bit of state that exists outside of the program, bringing all of that code and data into a single actor instance might be impractical (say, in a large program) or even impossible (when interacting with a system where those assumptions are pervasive).

SE-0316

John_McCall · February 2, 2022, 8:04am

Yes, it's strongly analogous to what you get by having a global-actor-qualified class type: you get a Sendable value that can be shared between actors, but to actually use it you have to be running on the correct actor.

But to be a global actor, the actor actually has to be a global singleton, which e.g. NSManagedObjectContext is not. In Core Data, any particular managed object is isolated to a specific instance of NSManagedObjectContext, and if you try to use it from the context thread for the wrong instance, that's just as racy as using it from a completely unrelated thread. So it's not good enough to use the global-actor language approach where we tie the internal isolation of a value to an actor type; we need to tie its internal isolation to a specific actor value.

I assume that the object graph that Kai has in mind is not inherently singleton; there might be several such graphs in the process, each independent from the others.

Kai_Br · February 2, 2022, 8:42am

This is exactly right. Fortunately, in our current project we do not depend on this, so we can continue with a global actor. But it feels kinda wrong for an object graph which isn’t conceptually a singleton.

As an example of the need for different graph instances running concurrently with each other: iOS kills an app if the main thread is tied up synchronously for too long. In a different project, loading our Core Data model with possible migration steps could take too long, so we have to load it on a background queue, and then re-instantiate the same model on the main queue for use by the UI. Works perfectly with Core Data. Of course, async programming would solve this particular problem without using a background queue, but Core Data is very synchronous.

I am not sure I understand this correctly. Several questions:

What does "managed context thread" mean in the context of Swift Concurrency? I thought threads are no longer a concept in the programming model, are they?
How would async code work in this idea? Assume a managed object which needs to access a file, e.g. via func getData() async throws. Wouldn't execution of this function (which does not live in an actor) become concurrent with the managed object context? SE-0338 says this currently happens in circumstances that are rare and unclear to me, and is guaranteed once the proposal is implemented.
I may misunderstand it, but the "minor usability problem" does not look minor to me. Having references to members of an object graph (managed objects) all over the place is very common in our architectures.

Karl · February 2, 2022, 9:07am

Hmm, that's interesting. So it would be isolated to any actor (maybe the one we're on, maybe not; we wouldn't be able to tell at compile-time, although we could avoid hops at runtime if the actors match). Kind of like the existential to global actor's generics.

I guess we'd also want to pin types to a specific global actor, so you can have both a @MainActor MyEntity for your UI logic and an @UnknownGlobalActor MyEntity for when you need a separate context.

Very interesting, but I'll leave it at that so as not to derail the thread

John_McCall · February 2, 2022, 5:02pm

NSManagedObjectContext is actually very much like an actor: it protects a large amount of state by restricting its use to a dedicated serial executor, which you can enqueue work on with perform(_:). Internally, that executor is a dedicated thread, but you’re right that this isn’t really part of the programming model.

A fairly direct async-ification of NSManagedObjectContext’s API would be to have an async perform which took a non-async closure that it promised was run on the isolated executor. But of course this doesn’t communicate the isolation relationships; a more complete async-ification would be to make it an actor.

Yes, you’re absolutely right; this is a common and significant problem with using this kind of architecture, and I didn’t mean to be dismissive about it. But in the absence of language support, this is the way to solve it: make your object types non-Sendable, make some Sendable wrapper that knows what the associated actor is, and allow the wrapper to be unwrapped if you’re dynamically on the right executor.
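As a rough illustration of that last suggestion (not an established API; ManagedObject, ObjectContext, and Handle are hypothetical names), a handle can carry the object together with its owning actor and only hand the object out when the caller is isolated to that same actor instance. The sketch uses an isolated parameter plus an identity check as the dynamic test.

```swift
/// A graph object; deliberately not Sendable.
final class ManagedObject {
    var title = ""
}

/// Sendable wrapper: safe to pass around, but only usable on its own context.
/// `@unchecked` is an assumption we uphold manually via `unwrap(on:)`.
struct Handle: @unchecked Sendable {
    private let object: ManagedObject
    let context: ObjectContext

    fileprivate init(object: ManagedObject, context: ObjectContext) {
        self.object = object
        self.context = context
    }

    /// Only callable while isolated to some ObjectContext;
    /// traps if that is not the context this handle belongs to.
    func unwrap(on current: isolated ObjectContext) -> ManagedObject {
        precondition(current === context, "handle used on the wrong context")
        return object
    }
}

actor ObjectContext {
    /// Creates an object owned by this context and returns a handle to it.
    func makeObject() -> Handle {
        Handle(object: ManagedObject(), context: self)
    }

    func setTitle(_ title: String, of handle: Handle) {
        handle.unwrap(on: self).title = title
    }

    func title(of handle: Handle) -> String {
        handle.unwrap(on: self).title
    }
}

func handleExample() async -> String {
    let context = ObjectContext()
    let handle = await context.makeObject()   // the handle may cross actor boundaries
    await context.setTitle("Hello", of: handle)
    return await context.title(of: handle)
}
```

Passing the handle to a different ObjectContext instance compiles but trips the precondition at runtime, which is the same kind of runtime-only check Core Data offers.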

Kai_Br · February 3, 2022, 2:16pm

To be honest I am not sure that I correctly understand all suggestions you are making - Swift concurrency is a complex beast.

Let me phrase my central question differently: If I understand SE-0338 correctly, any async function outside of an actor (global or normal) will become concurrent with everything else after the first suspension point when this proposal is implemented.

Is this true?

If so, I can’t see how to write asynchronous code outside of actors. Even async functions on the same (non-actor) object may run concurrently with each other and thus can’t safely access any data.

What we would need for our architecture is a means to run asynchronous code in an object graph on a single serial executor. Without this, we seem to be limited to either global actors (which can manage an object graph) or normal actors, which are limited to a single object.

Is this really how it is? Or will be, with SE-0338?