[Pitch] Distributed Actors

Ideally gRPC could be implemented as a distributed actor transport.

agree, that would be great! Although as noted earlier, in addition to the semantics of streaming, and keeping connections open, we would also need the storage on the wire be the protobufs stuff most people use in that case, instead of Codable. Hence if we ripped out our internal GRPC stuff, we would still prefer streaming functionality, even if we didn't get full GRPC compatibility. Personally I think it would be "cool" if protobuf support was moved into Swift, and you at least had an option via some kind of protocol markup to use protobuf storage..

I've thought about this a bit more, and I no longer think this is a good idea. The distributed tag indicates what the surface area of the distributed actor that cannot be local, and I think it's key to Resolving DistributedActor bound protocols.

I've also been thinking about source generation or, rather, how to remove it.

ActorTransport-based remote calls

We can extend the ActorTransport protocol with the ability to perform a remote call, such that each distributed function can be compiled into a remote call through the transport, eliminating the need for code generation on the sender side. For the receiver side, each distributed function will be recorded in compiled metadata that can be queried to match a remote call to its type information and the actual function to call.

ActorTransport can be extended with a remote call with a signature like the following:

protocol ActorTransport {
  // ... existing requirements ...

  /// Perform a remote call to the function.
  ///
  /// - parameters:
  ///   - function: the identifier of the function, assigned by the implementation and used as a lookup key for the
  ///     runtime lookup of distributed functions.
  ///   - actor: identity for the actor that we're calling.
  ///   - arguments: the set of arguments provided to the call.
  ///   - resultType: the type of the result
  ///
  /// - returns: the result of the remote call
  func remoteCall<Result: Codable>(to function: String, on actor: DistributedActor, arguments: [Codable], returning resultType: Result.Type) async throws -> Result
}

When a distributed function is compiled, the compiler creates a distributed "thunk" that will either call the actual function (if the actor is local) or form a remote call invocation (if the actor is remote). For example, given:

distributed actor Greeter {
  distributed func greet(name: String) -> String { ... }
}

the compiler will produce a distributed thunk that looks like this:

extension Greeter {
  nonisolated func greet_$thunk(name: String) async throws -> String {
    if isRemoteActor(self) {
      // Remote actor, perform a remote call
      return try await actorTransport.remoteCall(to: "unique string for 'greet'", on: self, arguments: [name], returning: String.self)
    } else {
      // Local actor, just perform the call locally
      return await greet(name: name)
    }
  }
}

The transport will have to implement remoteCall(to:on:arguments:returning:), and is responsible for forming the message envelope containing the function identifier and arguments and sending it to the remote host, then suspending until a reply is received or some failure has been encountered.

Responding to remote calls

When a message envelope for a remote call is received, it will need to be matched to a local operation that can be called. The function identifier is the key to accessing this information, which will be provided by the distributed actor runtime through the following API:

struct DistributedFunction {
  /// The type of the distributed actor on which this function can be invoked.
  let actorType: DistributedActor.Type
  
  /// The types of the parameters to the function.
  let parameterTypes: [Codable.Type]
  
  /// The result type of the function.
  let resultType: Codable.Type
  
  /// The error type of the function, which will currently be either `Error` or `Never`.
  let errorType: Error.Type
  
  /// The actual (local) function that can be invoked given the distributed actor (whose type is `actorType`), arguments
  /// (whose types correspond to those in `parameterTypes`) and producing either a result (of type `resultType`) or throwing
  /// an error (of type `errorType`).
  let function: (DistributedActor, [Codable]) async throws -> Codable
  
  /// Try to resolve the given function name into a distributed function, or produce `nil` if the name cannot be resolved.
  init?(resolving function: String)
}

The function property of DistributedFunction is most interesting, because it is a type-erased value that allows calling the local function without any specific knowledge of the types that the function types. The other properties (actorType, parameterTypes, etc.) describe the types that function expects to work with. Together, they provide enough information to decode a message.

The initializer will perform a lookup into runtime metadata that maps from the string to these operations. The compiler will be responsible for emitting suitable metadata, which includes a type-erased "thunk" to call the local function. For the greet operation in the prior section, that would look something like this:

func Greeter_greet_$erased(actor: DistributedActor, arguments: [Codable]) async throws -> Codable
  let concreteActor = actor as! Greeter   // known to be local
  let concreteName = arguments[0] as! String
  return await concreteActor.greet(name: concreteName)
}

The design here is not unlike what we do with protocol conformances: the compiler emits information about the conformances declared within your module into a special section in the binary, and the runtime looks into that section when it needs to determine whether a given type conforms to a protocol dynamically (e.g., to evaluate value as? SomeProtocol). The approach allows, for example, a module different from the one that defined a distributed actor to provide a distributed function on that distributed actor, and we'll still be able to find it.

Doug

11 Likes

I've thought more about this, too, and I believe I understand the shape of the solution. The basic idea is for the ActorTransport to gain knowledge of the serialization constraints that it places on the parameter and result types of distributed functions using that transport. This could default to Codable (which is shorthand for Encodable & Decodable), but for example a protobuf implementation could instead specify SwiftProtobuf.Message. For example, ActorTransport would gain an associated type like the following:

protocol ActorTransport {
  associatedtype SerializedRequirementType = Codable
}

A protobuf transport could specify SwiftProtobuf.Message here, such that any distributed actors using the protobuf transport serialize data through its message protocol. Now, this does require that distributed actors know how they serialize data, so at the very least we need DistributedActor to know the serialization requirement that applies to its distributed functions:

protocol DistributedActor {
  associatedtype SerializationRequirementType = Codable
  
  static func resolve<Identity, ActorTransportType>(id identity: Identity, using transport: ActorTransportType) throws -> Self?
    where Identity: ActorIdentity, ActorTransportType: ActorTransport, 
          ActorTransportType.SerializationRequirementType == SerializationRequirementType
}

In truth, a distributed actor is likely to want to know more about its transport than that, so we might just want to jump to having the transport be part of the DistributedActor protocol:

protocol DistributedActor {
  associatedtype ActorTransportType: ActorTransport
  
  static func resolve<Identity>(id identity: Identity, using transport: ActorTransportType) throws -> Self?
    where Identity: ActorIdentity
}

I'll leave that for a separate discussion, and assume the first version of DistributedActor. Given that, let's reconsider our Greeter distributed actor from before:

distributed actor Greeter {
  distributed func greet(name: String) -> String { ... }
}

The SerializationRequirement of Greeter will default to Codable. Therefore, the parameter and result types for the greet operation must satisfy the requirements of Codable. If instead we had a protobuf-based greeter, like the following:

distributed actor ProtobufGreeter {
  typealias SerializationRequirement = SwiftProtobuf.Message
  
  distributed func greet(name: String) -> String { ... }
}

Now, the parameter and result types for the greet operation (both String) would have to conform to the SwiftProtobuf.Message protocol, and all distributed operations on ProtobufGreeter would use protobuf messages. This affects how the code is type-checked and the shape of the distributed thunks that can perform the actual remote procedure calls (whether using source generation or my approach to eliminating source generation), but not the fundamental way that distributed actors behave.

This approach does introduce restrictions on distributed functions that aren't on a concrete actor type. For example, we could not define a distributed function in an extension of DistributedActor because there is no way to ensure type safety:

extension DistributedActor {
  distributed func getPoint(index: Int) -> Point { ... } // should Int and Point conform to Codable? SwiftProtobuf.Message?
}

To address this, we would have to require distributed functions on protocol extensions to specify their serialization requirement type explicitly:

extension DistributedActor where SerializationRequirementType == Codable {
  distributed func getPoint(index: Int) -> Point { ... } // Int and Point must conform to Codable
}

This entire design requires SerializationRequirementType to be dealt with fairly concretely. One cannot, for example, parameterize a distributed actor over its serialization requirement:

distributed actor SneakyGreeter<SR> {
  typealias SerializationRequirementType = SR // error: we cannot express that SR is an existential type that can be a generic requirement 

  distributed func greet(name: String) -> String { ... } // cannot guarantee that `String` conforms to `SR`
}

While one could imagine advanced type system features that would allow the above to be expressible, such a change would be far out of scope as part of distributed actors. I think we should accept that as a limitation of this particular approach, and it could be lifted in the future if the type systems eventually supports it.

This can layer on top of my approach to eliminating source generation by effectively deleting all of the Codable mentions from DistributedFunction, remoteCall, etc. Instead, use Any or no requirements at all, and left it to the type checker + transport to ensure that a distributed function cannot get called with the wrong serialization requirements.

Doug

8 Likes

Thanks for the writeups, Doug!

As we've worked through this before I very much agree with all of this and more than happy to take the proposal to the next level with this! :100: Unless others will have much feedback here, I think I'll work this into our second version of the proposal as we work through implementing some of this in the coming weeks.


For additional context a few comments:

The DistributedFunction type

Yes, that is perfect! :slight_smile:

And we'd have that function(...) crash hard if misused, in order to prevent any weird misuse. It is the transport developers responsibility to make quadruple sure things are being invoked on the right actors and with the right types.

We'll have to work out what exactly it has to contain to support a few other advanced features (like for tracing we'd want to have the "#function" name so a trace span could use that as a nice default for the name of the trace). By lucky accident, recently there was a talk by the Reuben from the Orleans team, describing their source generator (they use source gen), and it basically is exactly that shape as well (including the pretty names for debugging/introspection), so that was a nice accidental confirmation of our design -- as I've seen it only after we came up with this actually :smiley: (Though to be fair, the requirements for such systems are all the same, so yeah, similar solutions arise).

Transport or SerializationRequirement typealiases

You mentioned that it's perhaps left for another discussion, but I'll quickly say that for many reasons, this is exactly what we'd want anyway, and were aiming for eventually anyway:

protocol DistributedActor {
  associatedtype ActorTransportType: ActorTransport
}

This has a small rippling effect on some of the other synthesis (type of transport accepted by initializer), but that's pretty simple and something we wanted anyway.

In practice I think we'd default to something like CodableTransport -- by the way, perhaps time to rename this as ActorMessageTransport and then CodableMessageTransport which just refines it with the typealias SerializedRequirementType = Codable :thinking:

Overall looking great and I think it'll work well.

I am not too concerned about the limitation on extensions -- this is fine IMHO. What matters the most is that a protocol itself does not have to specify this, and we can validate there is a specific type provided either at the concrete actor or at the time of resolving. Either being static compile time checks :+1:

In any case, this all sounds great and I'll work those into the next revision of the proposal :+1:

8 Likes

At an implementation level, I was thinking I'd use the mangled name here, since it has pretty much all of the information we need and the implementation guarantees uniqueness already. We could expose that to the user in a more structured manner so they could get, e.g., the actor type name, function name (greet(string:)), module name, and so on.

Yeah, I didn't want to pre-suppose, but I like this direction. It also means we could eliminate the existential AnyActorTransport storage from distributed actors to make them smaller and more concrete. De-existentialing the interfaces to distributed actors seems like a generally good thing.

Doug

2 Likes

I agree, it’s almost always best to avoid existentials when possible.

How would someone switch an existing code base to a different transport/serialization? Say we start with Codable, and then decide to switch to ProtoBuf for performance reasons? Also, is it going to be possible to have different transport and serialization for the same remote actor based on where the remote actor happens to be located? (Say a different process on the same physical machine vs a remote machine)

1 Like

Wouldn’t you just rewrite the impacted code? I don’t see any reasonable way around that.

In theory, you could write some sort of adaptor that decodes one and encodes the other, but there’d be little benefit in doing so.

I am aware of these options. The reason I am asking is further exploring the design space to see if we can do better to reduce dependency on concrete transport and serialization format at the source code level. Maybe something like some Transport and some Serializable, if you will. Something that would keep the ultimate generated code as concrete and statically optimized as possible, while accommodating for the cases that I mentioned.

2 Likes
Terms of Service

Privacy Policy

Cookie Policy