[Pitch] Distributed Actors

Thanks for the writeups, Doug!

As we've worked through this before, I very much agree with all of this and am more than happy to take the proposal to the next level with it! :100: Unless others have much feedback here, I think I'll work this into the second version of the proposal as we work through implementing some of this in the coming weeks.


For additional context, a few comments:

The DistributedFunction type

Yes, that is perfect! :slight_smile:

And we'd have that function(...) crash hard if misused, to prevent any weird misuse. It is the transport developer's responsibility to make quadruple sure that things are invoked on the right actors and with the right types.

We'll have to work out what exactly it has to contain to support a few other advanced features (for tracing, for example, we'd want the "#function" name so a trace span could use it as a nice default name). By lucky accident, there was recently a talk by Reuben from the Orleans team describing their source generator (they use source gen), and it has basically exactly that shape as well (including the pretty names for debugging/introspection). That was a nice accidental confirmation of our design, as I only saw it after we came up with this :smiley: (Though to be fair, the requirements for such systems are all the same, so yeah, similar solutions arise).
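To make the "crash hard if misused" idea concrete, here is a minimal sketch. It is not the proposed DistributedFunction type: the name DistributedFunctionSketch, its fields, and the Greeter class are all made up for illustration. It stores a readable name (the kind of thing a trace span could use as a default) and a type-erased closure that traps when a transport hands it the wrong target or argument type.

```swift
// Hypothetical sketch; not the actual DistributedFunction design.
struct DistributedFunctionSketch {
    /// Readable name, e.g. usable as a default trace span name.
    let name: String
    /// Type-erased entry point that a transport would call.
    private let invoke: (Any, Any) -> Any

    init<Target, Argument, Result>(
        name: String,
        _ body: @escaping (Target, Argument) -> Result
    ) {
        self.name = name
        self.invoke = { target, argument in
            // Trap instead of failing silently: a mismatch is a transport bug.
            guard let typedTarget = target as? Target else {
                fatalError("\(name): wrong target type \(type(of: target))")
            }
            guard let typedArgument = argument as? Argument else {
                fatalError("\(name): wrong argument type \(type(of: argument))")
            }
            return body(typedTarget, typedArgument)
        }
    }

    func callAsFunction(on target: Any, with argument: Any) -> Any {
        invoke(target, argument)
    }
}

// Illustrative target type.
final class Greeter {
    func greet(name: String) -> String { "Hello, \(name)!" }
}

let greet = DistributedFunctionSketch(name: "greet(name:)") { (greeter: Greeter, name: String) in
    greeter.greet(name: name)
}
let reply = greet(on: Greeter(), with: "Doug")
```

Passing anything other than a Greeter and a String to `greet` would hit `fatalError`, which is the behaviour described above: the transport, not the end user, is expected to get the types right.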

Transport or SerializationRequirement typealiases

You mentioned that it's perhaps left for another discussion, but I'll quickly say that, for many reasons, this is exactly what we'd want and were aiming for eventually anyway:

protocol DistributedActor {
  associatedtype ActorTransportType: ActorTransport
}

This has a small rippling effect on some of the other synthesis (type of transport accepted by initializer), but that's pretty simple and something we wanted anyway.

In practice I think we'd default to something like CodableTransport. By the way, perhaps it's time to rename this to ActorMessageTransport, with a CodableMessageTransport that just refines it with the typealias SerializedRequirementType = Codable :thinking:
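As a rough sketch of how these pieces might fit together, assuming protocol shapes that are not part of any final API: the associated type is spelled SerializationRequirement here (following the heading above), and DistributedActorSketch, InMemoryTransport, and Greeter are hypothetical names.

```swift
// Hypothetical shapes for illustration; not the proposal's final API.
protocol ActorMessageTransport {
    /// What every distributed function parameter and result must conform to.
    associatedtype SerializationRequirement
}

/// Refinement that pins the requirement to Codable.
protocol CodableMessageTransport: ActorMessageTransport
    where SerializationRequirement == Codable {}

protocol DistributedActorSketch {
    associatedtype ActorTransportType: ActorMessageTransport
    /// The initializer accepts the concrete transport type, not an existential.
    init(transport: ActorTransportType)
}

struct InMemoryTransport: CodableMessageTransport {
    // Spelled out for clarity; it matches the refinement's constraint.
    typealias SerializationRequirement = Codable
}

struct Greeter: DistributedActorSketch {
    let transport: InMemoryTransport

    // ActorTransportType is inferred as InMemoryTransport from this initializer.
    init(transport: InMemoryTransport) {
        self.transport = transport
    }
}
```

The "rippling effect" on the initializer shows up in `init(transport:)`: its parameter type follows the actor's ActorTransportType rather than being a type-erased transport.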

Overall looking great and I think it'll work well.

I am not too concerned about the limitation on extensions -- this is fine IMHO. What matters most is that a protocol itself does not have to specify this, and that we can validate that a specific type is provided either at the concrete actor or at the time of resolving. Either way, these are static compile-time checks :+1:

In any case, this all sounds great and I'll work those into the next revision of the proposal :+1:

At an implementation level, I was thinking I'd use the mangled name here, since it has pretty much all of the information we need and the implementation guarantees uniqueness already. We could expose that to the user in a more structured manner so they could get, e.g., the actor type name, function name (greet(string:)), module name, and so on.

Yeah, I didn't want to pre-suppose, but I like this direction. It also means we could eliminate the existential AnyActorTransport storage from distributed actors to make them smaller and more concrete. De-existentialing the interfaces to distributed actors seems like a generally good thing.

Doug

I agree, it’s almost always best to avoid existentials when possible.

How would someone switch an existing code base to a different transport/serialization? Say we start with Codable, and then decide to switch to ProtoBuf for performance reasons? Also, is it going to be possible to have different transport and serialization for the same remote actor based on where the remote actor happens to be located? (Say a different process on the same physical machine vs a remote machine)

Wouldn’t you just rewrite the impacted code? I don’t see any reasonable way around that.

In theory, you could write some sort of adaptor that decodes one and encodes the other, but there’d be little benefit in doing so.

I am aware of these options. The reason I am asking is to explore the design space further and see if we can do better at reducing the dependency on a concrete transport and serialization format at the source code level. Maybe something like some Transport and some Serializable, if you will. Something that would keep the ultimate generated code as concrete and statically optimized as possible, while accommodating the cases that I mentioned.

If you want to switch a given distributed actor to a different transport, you would replace your

typealias ActorTransportType = <whatever transport you are using>

with a type alias to a different transport. You can centralize that definition in your code base, perhaps, or we could introduce some special defaulting rule. For example, imagine there were a type alias like this visible:

typealias DefaultActorTransportType = MyFavoriteActorTransportType

Then, any distributed actor that doesn't explicitly define an ActorTransportType would get the visible default. Perhaps libraries that define transports could also provide a public DefaultActorTransportType so you wouldn't have to do so yourself in most cases: import the library that has a transport, and your distributed actors use that transport by default.
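A minimal sketch of that defaulting idea, using only features Swift already has: an associated-type default stands in for the "visible DefaultActorTransportType" rule, which does not exist today. All names here (ActorTransport, wireFormat, CodableTransport, ProtoBufTransport, DistributedActorSketch, Greeter, Telemetry) are hypothetical.

```swift
// Hypothetical sketch: an associated-type default approximates the
// "visible DefaultActorTransportType" rule described above.
protocol ActorTransport {
    static var wireFormat: String { get }
}

struct CodableTransport: ActorTransport {
    static let wireFormat = "json"
}

struct ProtoBufTransport: ActorTransport {
    static let wireFormat = "protobuf"
}

// The one place to edit when switching the whole code base.
typealias DefaultActorTransportType = CodableTransport

protocol DistributedActorSketch {
    associatedtype ActorTransportType: ActorTransport = DefaultActorTransportType
}

extension DistributedActorSketch {
    static var wireFormat: String { ActorTransportType.wireFormat }
}

// Picks up the default transport.
struct Greeter: DistributedActorSketch {}

// Opts out of the default explicitly.
struct Telemetry: DistributedActorSketch {
    typealias ActorTransportType = ProtoBufTransport
}
```

With this arrangement, pointing DefaultActorTransportType at another transport changes Greeter (and everything else relying on the default) in one edit, while Telemetry keeps its explicit choice.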

Changing transports statically would mean changing the typealias(es). If you want to change the transport dynamically, we could define a type-erased transport like, say, AnyActorTransport<SerializationType>, and you can choose dynamically.
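For the dynamic case, a type-erased wrapper could look roughly like the following. This is only a sketch of ordinary closure-based type erasure: the ActorTransport protocol shape, the `send` method, and the two concrete transports are assumptions made for the example, not anything defined in the pitch.

```swift
// Hypothetical sketch of type erasure; names and shapes are assumptions.
protocol ActorTransport {
    associatedtype SerializationType
    func send(_ message: SerializationType) -> String
}

struct AnyActorTransport<SerializationType>: ActorTransport {
    private let _send: (SerializationType) -> String

    // Wraps any transport that uses the same serialization type.
    init<T: ActorTransport>(_ base: T) where T.SerializationType == SerializationType {
        _send = { message in base.send(message) }
    }

    func send(_ message: SerializationType) -> String {
        _send(message)
    }
}

struct SameHostTransport: ActorTransport {
    func send(_ message: String) -> String { "same-host: \(message)" }
}

struct NetworkTransport: ActorTransport {
    func send(_ message: String) -> String { "network: \(message)" }
}

// Chosen at run time, e.g. depending on where the remote actor lives.
func makeTransport(isSameHost: Bool) -> AnyActorTransport<String> {
    if isSameHost {
        return AnyActorTransport(SameHostTransport())
    } else {
        return AnyActorTransport(NetworkTransport())
    }
}
```

Note that the serialization type stays a static generic parameter even here; only the transport behind it is picked dynamically.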

In the design I proposed for abstracting away Codable, we could not abstract away the serialization mechanism quite so easily, because we need a static type against which we can check all distributed functions. See the end of my post on this design, where we talk about needing the same-type constraints when extending DistributedActor with new distributed functions.

Doug

I am a little worried to see history repeating itself — especially as Distributed Objects are not even mentioned in the whole text.

Distributed Objects also started as cool and promising, but ended as a bunch of keywords which most developers don't even know.

However, despite not referencing it, this try seems to address some issues of the failed approach, so let's hope for more luck this time.

As someone who used DO (NeXTSTEP) as well as PDO (2.x, 3.x and 4.x on both HP-UX & Solaris...) extensively for a few years in the mid-90s, when it was relevant tech, I can understand that concern, although I don't share it (yet). The major problems with DO/PDO were robustness and performance (the "machd" emulation on Solaris did give us major pain); the actual productivity and ease of use were quite OK. Seeing that hooks seem to be in place for transport implementations and message serialisation (or lack thereof) is promising, I think.

I definitely think there is room for a great solution in this design space, which could greatly accelerate building distributed Swift-on-the-server solutions. I just hope it will get enough time to bake to ensure all the pieces of the design are there.

Also, one use case that I've encountered is where you have a distributed system that basically has a number of clusters running on different sites and they all together form a supercluster.

Just mentioning this use case as it may have implications for actor resolving (you might prefer locality of reference) and other things.

Because every distributed actor conforms to DistributedActor protocol, would it be sufficient to just have:

actor Greeter: DistributedActor {
}

Like for other protocols where we do compiler-level code synthesis (Codable / Equatable etc.)?

This is really cool, a nicely conceived vision, and I have nothing useful to say about the details that others haven’t said better.

I have to ask the big picture question that’s been nagging at me since I first heard rumors of this work: What’s the driving use case here? Hasn’t the industry shifted pretty hard away from stateful RPC since the “distributed objects” craze of the 80s? This is definitely one hell of an improvement over Stateful Session Beans or CORBA (shudder). But does the world need a better CORBA?

I phrase these questions rhetorically, but I do mean them seriously! What’s the motivation here? What’s the application space?

I take it this is meant for tightly connected systems (LAN clusters, XPC, and the like), and not for the kind of “loose interop over the Internet” stuff that sank SOAP… (Or should people actually attempt to use this, say, as a wrapper for REST APIs, as they will doubtless attempt to do?)

A clearer sense of what’s driving it and how it’s going to be used would help focus feedback.

Yes, I also see it that way. It is the next logical step for a highly concurrent (single) actor system to scale beyond a single host. It is also important in separating subsystems into individual processes for security purposes. (XPC stuff).

I think you have the right reasoning here: the microchip industry is moving towards custom accelerators (such as the Mac Pro's Afterburner card, which is now in the M1 Pro / Max), which are all asynchronous and much easier to work with when modelled as distributed actors (accelerators are mentioned in @Chris_Lattner3's Concurrency Manifesto). However, I don't see why Swift's distributed actors working for XPC-like systems means that they couldn't also work for larger-scale "Internet stuff", especially in the cloud (think microservices or game lobbies).

I definitely don't think that the two are mutually exclusive. The genius of the distributed actor design is that the transport is completely abstracted away into some ActorTransport, meaning that you can have a really nice Swifty API for both XPC-like systems and large-scale worldwide distributed systems.

I’m coming into this late, and without much expertise in this kind of server space, especially not compared to Konrad and others’ years working with Akka. But…I’m still not convinced this needs to be a language feature.

A story from the pre-1.0 days: we were wondering whether constructing new objects should have special syntax (new UIView(…)), or just look like a normal function call. We ended up going with plain old call syntax, as everyone knows, and one of the primary reasons was that any method can allocate and return objects, whether they’re initializers, factory methods, or just plain old instance methods on another object. Initializers are special in a number of ways, but calling them isn’t one of them. That’s how I feel about distributed: as soon as you wrap the method in a helper, it just looks like a regular async method to the rest of the program. To me, that neutralizes the idea that we’re marking the potentially-more-expensive methods to call.

But like initializers, the implementation of distributed methods is different. So that isn’t a reason not to do this on its own. I just can’t shake the feeling that while from inside this is a nicely integrated way to do distributed computing, from outside it’s a restricted dialect of Swift with a lot of magic in it.

I know there’s experience from Akka about a library-only solution, but it feels like this feature could be implemented with a code generation tool rather than compiler integration. From the perspective of other code, a distributed actor is something like this:

enum FooActor<Transport: ActorTransport> {
  case local(LocalFooActor)
  case remote(ProxyFooActor<Transport>)

  func doTheThing(_ params: FooParams) async throws -> FooOutput { … }
}

Actually using a representation like this (though probably not this exactly) has a handful of benefits, particularly that the escape hatch for local access is actually typed now. You could do this generation using the compiler, but you could also do it using SourceKit—though it’s true that error handling would be worse if, say, a parameter wasn’t Codable.
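To illustrate what such generated code might look like when filled in, here is a hedged sketch. Everything beyond the enum's overall shape is invented for the example: the ActorTransport protocol with its `call` method, LoopbackTransport, String standing in for FooParams and FooOutput, and the Sendable annotations.

```swift
// Hypothetical expansion of the enum idea; every name here is illustrative.
protocol ActorTransport: Sendable {
    func call(_ method: String, _ payload: String) async throws -> String
}

// Stand-in transport that just echoes what it would have sent.
struct LoopbackTransport: ActorTransport {
    func call(_ method: String, _ payload: String) async throws -> String {
        "remote \(method)(\(payload))"
    }
}

actor LocalFooActor {
    func doTheThing(_ params: String) -> String {
        "local doTheThing(\(params))"
    }
}

struct ProxyFooActor<Transport: ActorTransport>: Sendable {
    let transport: Transport

    func doTheThing(_ params: String) async throws -> String {
        try await transport.call("doTheThing", params)
    }
}

enum FooActor<Transport: ActorTransport>: Sendable {
    case local(LocalFooActor)
    case remote(ProxyFooActor<Transport>)

    // Callers see one async throws method regardless of location.
    func doTheThing(_ params: String) async throws -> String {
        switch self {
        case .local(let target):
            return await target.doTheThing(params)
        case .remote(let proxy):
            return try await proxy.doTheThing(params)
        }
    }
}
```

A caller that pattern-matches on `.local` gets a typed LocalFooActor back, which is the "typed escape hatch" benefit mentioned above.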

I guess I just don’t think this is worth putting into the language. That’s a big step, with compatibility implications of years. With a separate library and code generation tool, we can still make breaking-change releases, and they aren’t tied to other updates to the language. And I don’t think you’d actually lose much expressive power at all that way.

Right, clearly one can. My concern is about whether one should. Developers reached a pretty strong consensus between the mid-80s and the mid-00s that RPC in general and stateful objects in particular are a really problematic model for widely distributed systems, and industry seems to be continuing even now in that “coarse-grained state transfer via stateless request” direction (REST → GraphQL, for example, is such a step) rather than hungering for something with the shape of this proposal. One could use HTTP as a transport for distributed actors, but…if that were a design goal of this proposal, then I have serious concerns about its current design.

OTOH, your custom accelerator example makes tons of sense to me the way this proposal is shaped, as do some OS services, along with the previously mentioned examples of XPC and LAN clusters. So it would be nice to know from the proposal authors what particular problem spaces they’re intending to target.

Without being qualified to judge the necessity of making this a language feature instead of a library, I’ll add my concurrence with Jordan’s general philosophy here: Swift should, when possible, prefer adding language features to make a thing possible over adding that thing to the language directly.

I thought SwiftUI was an excellent model for this approach: property wrappers and function builders are finding excellent uses well beyond SwiftUI, even though SwiftUI was the star client for them.

If distributed actors do in fact need new features in the language itself, it’s perhaps worth a design pass to consider whether smaller, more composable language features might serve those unmet needs.

+1 for breaking language features down as much as possible. That also reduces the need for future language changes to add new functionality.

Just want to confirm my understanding of the section “Transporting Errors”: it seems it should be possible to implement a transport that returns from the await when, e.g., a message has been enqueued to the network (but can throw if that fails for some reason), and there is no requirement to await any reply from the remote end (for distributed actor methods that aren’t throwing and return Void).

Specifically, this allows for streaming-style calls of distributed actor methods that are one-way, without requiring round trips. So it’ll be up to the transport how to handle this, afaict (the other end of the spectrum could be a transport that round-trips, so we don’t return until we know a message was delivered).

Yeah, that's correct. I know other proposal co-authors are a bit nervous about such "does not wait" calls, but I think they are tremendously useful, especially in distribution, where one sometimes really does not want to wait for replies and just wants to shoot messages over on a best-effort basis, getting acks back asynchronously, e.g. in batches. It could perhaps be done not with Void but with some "DontWaitVoid" return type or something silly... (that the transport understands) :thinking:
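A minimal sketch of the "return once enqueued" end of that spectrum, assuming a made-up OneWayTransport protocol; Envelope, TransportClosed, and EnqueueOnlyTransport are likewise hypothetical and not part of the proposal.

```swift
// Hypothetical sketch of a transport whose await ends at local enqueue.
struct Envelope: Sendable {
    let method: String
    let payload: String
}

protocol OneWayTransport: Sendable {
    /// Returns once the transport considers the message handed off;
    /// throws if the hand-off itself fails.
    func send(_ envelope: Envelope) async throws
}

struct TransportClosed: Error {}

/// Returns as soon as the message is queued locally; it never waits
/// for a reply from the remote side.
actor EnqueueOnlyTransport: OneWayTransport {
    private(set) var outbox: [Envelope] = []
    private var isOpen = true

    func close() {
        isOpen = false
    }

    func send(_ envelope: Envelope) async throws {
        guard isOpen else { throw TransportClosed() }
        // Acks, if any, would be handled elsewhere, e.g. in batches.
        outbox.append(envelope)
    }
}
```

A round-tripping transport would conform to the same protocol but suspend inside `send` until the remote side confirms delivery, so the choice stays entirely within the transport.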


I would love to have a real uni-directional call capability in the language and it's popping up here and there already, like in Kavon's initializer work: SE-0327: On Actors and Initialization (under review now), there is an assumption that:

// NOTE: Task.detached is _not_ an exact substitute for this.
// It is expected that Custom Executors will provide a capability
// that implements this function, which atomically enqueues a paused task
// on the target actor before returning.
func spawnAndEnqueueTask<A: AnyActor>(_ a: A, _ f: () -> Void) { ... }

would exist. Such an operation is what allows for such a construct at the language level... I don't know if we'll ever get such a "send" or not, but it has certainly come up a few times and I'd personally advocate for it; it solves a lot of potential "high level" races. This is similar in the sense that it makes a new task, but it is known that we'll never wait for its result. If we knew we were making such a call, perhaps we could automatically do the "don't reply"... This is WILD SPECULATION though, not something we've planned or designed so far.

I think what you ask for would be explicit in the types for now. And I agree there are good use-cases for uni-directional calls (ha, like Akka's good ol' "tell (fire and forget), don't ask (request reply)" motto :wink:).
