Why doesn’t `DistributedActor` refine `Identifiable`? It obviously meets the requirements.
It does.
public protocol DistributedActor: Sendable, Codable, Identifiable, Hashable {
See: swift-evolution/nnnn-distributed-actors.md at distributed-revised · ktoso/swift-evolution · GitHub
Ah, never mind then. I missed that somehow, sorry.
Putting aside attempts at constructive criticism for a moment, I must say this is one of the most impressive proposed features I’ve ever seen. It’s the product of an immense amount of work since Swift’s inception, and it seems to rule out a staggering array of logic and coupling errors without sacrificing versatility or clarity.
I'm incredibly impressed by this pitch – I saw the development of distributed actors throughout the past couple of months by peeking at some of your git branches, but this is truly impressive, so congratulations.
I'm curious about this:
Once we are confident in the semantics and language features required to get rid of the source generation step, we will be able to synthesize the appropriate `Envelope<Message>` and represent every `distributed func` as a well-defined codable `Message` type, which transports then would simply use their configured coders on.
What language features are required? IIUC, code synthesis is nothing new to Swift, as Codable relies on it extensively. Why do we need to wait for more metaprogramming-related language features to land? Can't this synthesis behaviour be baked into the compiler, like so much of the distributed actors feature already is?
Thanks for the kind words
We're actively working on seeing what we can do to get rid of the source generation step -- you can assume this bit might change quite a bit.
We wanted to get the pitch out there so we can begin gathering feedback. I'm pretty hopeful we can get rid of the source-gen step eventually with enough sneaky tricks and thunks.
In general the tricky parts are:
- we got some bytes, we managed to look up the actor, we even managed to decode the "message" (assuming we have some known representation for it), but how do we apply this message?
  - so we need some "function handle that can be looked up from some serializable id and invoked"
  - and how do we do this without forcing Codable and some specific representation?
- the generation of messages: we would not want to "lock in" forever a specific message representation, say "all funcs have a case in a huge enum", since it may hit limitations or problems for specific transports or just in future evolution of the feature.
  - so we need message representations to be opaque if we were to synthesize them
The good news is that we have some ideas for both of those problems. They would affect the design a bit of course and we'll post a bigger update once we have something to share.
In the meantime it is useful to keep providing feedback about the general feature, thanks in advance!
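To make the "function handle that can be looked up from some serializable id and invoked" idea concrete, here is a minimal sketch in plain Swift. It is not how the pitch or the compiler implements this; `HandlerRegistry`, `InvocationHandler`, `UnknownTarget`, and the `"Greeter.greet"` identifier are all hypothetical names made up for illustration. The registry only ever sees opaque bytes, so the message representation stays a private matter of the closures passed in.

```swift
import Foundation

// Hypothetical names throughout; none of this is API from the pitch.

/// A type-erased handler: opaque request bytes in, opaque reply bytes out.
typealias InvocationHandler = (Data) throws -> Data

/// Thrown when no handler is registered under the requested identifier.
struct UnknownTarget: Error {
    let id: String
}

struct HandlerRegistry {
    private var handlers: [String: InvocationHandler] = [:]

    init() {}

    /// Registers a typed function under a serializable identifier.
    /// The caller supplies decode/encode, so the registry never learns
    /// what the wire representation looks like.
    mutating func register<Args, Reply>(
        _ id: String,
        decode: @escaping (Data) throws -> Args,
        encode: @escaping (Reply) throws -> Data,
        body: @escaping (Args) throws -> Reply
    ) {
        let handler: InvocationHandler = { payload in
            let args = try decode(payload)
            return try encode(try body(args))
        }
        handlers[id] = handler
    }

    /// Looks up the handler by identifier and applies it to the payload.
    func invoke(_ id: String, payload: Data) throws -> Data {
        guard let handler = handlers[id] else { throw UnknownTarget(id: id) }
        return try handler(payload)
    }
}

var registry = HandlerRegistry()
registry.register(
    "Greeter.greet",
    decode: { (data: Data) -> String in String(decoding: data, as: UTF8.self) },
    encode: { (reply: String) -> Data in Data(reply.utf8) },
    body: { (name: String) -> String in "Hello \(name)!" }
)

// "Receiving side": only an identifier and some bytes are known here.
let replyBytes = try? registry.invoke("Greeter.greet", payload: Data("Swift".utf8))
let replyText = replyBytes.map { String(decoding: $0, as: UTF8.self) }
print(replyText ?? "no handler")
```

The sketch uses UTF-8 strings as the payload purely to stay short; any encoding could be swapped in through the `decode` and `encode` closures without touching the registry.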
I really don't find sufficient motivation in this feature for burdening Swift with new keywords, new magic dispatch rules, new blessed special cases for synthesis, and new declarations and restrictions that are inexpressible in normal user code. Is this truly a new concept in the language itself, a peer with structs and protocols?
I would urge that this feature be reconsidered as a library-based solution, and the initial effort directed towards enabling that library with general facilities in Swift that are accessible to all language users.
This is a very cool proposal. Swift's actors were "miniaturized" from distributed actors to fit within a process, and this proposal layers distributed actors back on top in a natural way. I have a couple of design-related comments and questions.
Properties
The section on properties describes two restrictions on properties in distributed actors:
- They cannot be `distributed`, so they cannot be accessed from outside the actor.
- They cannot be `nonisolated`.
These two restrictions are not equal, though.
The first is a matter of policy: the proposal doesn't allow properties to be accessed from outside the actor because reading properties from a remote actor is slow, and you should probably collect your property accesses into a distributed method instead. However, the same could be said of normal actors: you probably should not read two properties from a normal actor in a row, because some other code could run on the actor in between your two reads and you would get inconsistent results. The `await` is there in the code to indicate that there's a potential delay here. So, I'd prefer to be consistent with non-distributed actors here and allow property reads from outside the actor, rather than arbitrarily slice out this feature.
The second is more fundamental: one cannot have a `nonisolated let` property in a distributed actor because doing so would require the `let` value to be replicated wherever there is a reference to the distributed actor. Aside from the redundant storage, this means that when you resolve an actor address for a remote actor, you would have to communicate with that remote actor to get the replicated data, therefore requiring remote actor resolution to be `async`. I would prefer for the proposal to expand on the reasons why `nonisolated let` is a fundamental difference for distributed actors, and make this the only property-based restriction.
Distributed methods
Methods on a distributed actor need to be marked as `distributed` to be used from outside the actor. To me, this feels like an implementation limitation (that code generators need to know about all of the distributed methods) that has crept into the language design.
I would prefer that we not require `distributed` on functions to call them from outside the actor. That way, their semantics line up as closely with non-distributed actors as is possible. With non-distributed actors, you can write a method on an extension of an actor and call it from the outside as `async`:
actor MyActor { }
extension MyActor {
    func f() { }
}
func g(ma: MyActor) async {
    await ma.f() // we can call f() as async when we're outside the actor
}
Distributed actors necessarily need to have calls from outside the actor be throwing, because transports can fail, which is explained well in the proposal. That would imply that we should be able to do this:
distributed actor MyDistributedActor { }
extension MyDistributedActor {
    func f() { }
}
func g(ma: MyDistributedActor) async throws {
    try await ma.f() // we can call f() as async when we're outside the actor
}
The proposal also requires that we mark `f` as being `distributed`, but this is unfortunate, because the function `g` could be defined in a different module:
// module A
public distributed actor MyDistributedActor { }
extension MyDistributedActor {
    public func f() { }
}

// module B
import A
func g(ma: MyDistributedActor) async throws {
    try await ma.f() // can't do this because f() wasn't marked 'distributed'
}
Unfortunately, our author of module B is stuck: `MyDistributedActor.f()` wasn't originally marked as `distributed`, and that cannot be fixed without updating module A.
The requirement that `DistributedActor`-inheriting protocols only have `distributed` and `nonisolated` members is another consequence of requiring `distributed` on functions. If we didn't need `distributed` on functions, `DistributedActor`-inheriting protocols could follow the same rules as `Actor`-inheriting protocols, with the one necessary change that calls outside the actor are both `async` and `throws`.
I think we can lift the implementation limitation that requires `distributed` to be provided ahead of time by using a different approach, that I think would also work across module boundaries. If so, I would prefer to remove `distributed func` (and the `distributed var` I implied with my other comments above) from the language entirely, aligning the mental model of distributed actors much more closely with that of non-distributed actors.
`Codable` requirement
Related to the comments above about `distributed func`, I suspect it's possible to drop the `Codable` requirement from the proposal by using some kind of ad hoc protocol to interact with the transport. It's probably worth spinning off a separate discussion about the implementability of this, though, and keep more focused on semantics here.
withLocalDistributedActor
I love how well this API worked out, even though you clearly don't want folks to actually use it ;). I think you need to make `T: Sendable` for this to work, however, because when you're running locally there will be no `Codable` round-trip.
`Equatable` and `Hashable` conformances
I don't quite know if you need these, because I think it depends a bit on the notion of identity that matters. Actors (whether distributed or not) are reference types that conform to `AnyObject`, so one can compare their identity (with `===`) and use `ObjectIdentifier` to get a `Hashable` and `Equatable` type. Can a given process create more than one remote actor instance for the same actor? If so, then identity as defined by `===` will differ from equality as defined by `==`, which feels (to me) like a semantic trap. On the other hand, ensuring uniqueness for remote actor instances means you probably have a big old lock in the actor runtime around actor resolution.
What are the semantics we want here? And if object identity and equality would always be the same, should we leave off the `Equatable` and `Hashable` conformances entirely?
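The `===` versus `==` trap can be illustrated with a small sketch. It assumes a process is allowed to hold two proxy instances for the same remote actor, which is exactly the open question above. `RemoteRef` and its `address` property are hypothetical; a plain class stands in for a remote actor reference so the example runs without any distributed actor support.

```swift
// Hypothetical stand-in for a remote actor proxy (not a type from the pitch).
final class RemoteRef: Hashable {
    let address: String

    init(address: String) {
        self.address = address
    }

    // Equality and hashing are defined by the address, not by object identity.
    static func == (lhs: RemoteRef, rhs: RemoteRef) -> Bool {
        lhs.address == rhs.address
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(address)
    }
}

// Two separately resolved proxies that point at the same remote actor.
let first = RemoteRef(address: "node-1/greeter")
let second = RemoteRef(address: "node-1/greeter")

print(first == second)                                       // true: same address
print(first === second)                                      // false: distinct objects
print(ObjectIdentifier(first) == ObjectIdentifier(second))   // false
print(Set([first, second]).count)                            // 1: the set deduplicates by address
```

If resolution guaranteed a single instance per address, the two notions would coincide and the explicit conformances would add nothing beyond what `ObjectIdentifier` already gives.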
If we do keep the `Equatable` and `Hashable` conformances, I think this paragraph needs an update to reflect SE-0309.
The property uses a type-erased struct to store the identity, because otherwise `Hashable`'s `Self` type requirements would prevent using `DistributedActor` bound protocols as existentials, which is a crucial use-case that we are going to implement in the near future.
Separating client and server implementations
The future work section on resolving `DistributedActor` bound protocols talks about using a distributed actor protocol where the client and server have different implementations, and gives a protocol as an example:
protocol Greeter: DistributedActor {
    distributed func greet(name: String) throws -> String
}
I don't think `Greeter` benefits from being considered a distributed actor: you're not likely to get much use out of code synthesis for the message send implied by a call, when (e.g.) the server is implemented in some other language. Instead, I think distributed actors should focus in on the case where you are sharing the code across all of the cluster nodes/processes/etc. If someone would like to separate the client from the server, that can be done with a normal protocol that has `async throws` operations on it:
protocol Greeter {
    func greet(name: String) async throws -> String
}
Now, this absolutely can be implemented by a distributed actor:
distributed actor MyGreeter: Greeter {
    func greet(name: String) -> String {
        "Hello \(name)!"
    }
}
but it can also be implemented separately on client and server, perhaps with bespoke implementations to match some existing protocol (gRPC or whatever). I have no doubt that some class of those implementations could be autogenerated from the protocol definition, and I would hope that some of the things we learn from working on distributed thunk synthesis can help there, but I think it's important not to view these protocols as distributed actors.
That came out a lot longer than expected. All that said, I think we're on a trajectory to build something great for distributed computing.
Doug
Another reflection:
What are the thoughts with regard to protocol compatibility between a “client” and a remote service? Would distributed actors be limited to homogeneously compiled “compatible” systems, or is it envisioned to have some sort of protocol compatibility checks and support for evolution of the data transferred?
Many protocols allow backwards compatible evolution to decouple service/client - will this be possible with distributed actors?
I agree that this sort of complexity is an extremely slippery slope, and should be approached extraordinarily carefully.
Like with SwiftUI, the language features necessary to make this work should be as broad and versatile as possible. I am personally somewhat skeptical about `distributed` as a keyword: it seems too specific.
I think the best path forward is making the new language concepts as orthogonal as possible, to enable functionality beyond the immediate proposal. Off the top of my head: I think some of these concepts, particularly the throwing behavior, could be broadened to fit delegate patterns in general.
I know people are skeptical about typed errors, but I think that they open up a lot of extremely useful safety features. Like with `Result`, nothing would stop you from throwing `Error`, and I think that could be an implicit thrown type in the same way that `Void` is implicitly returned. In terms of type signature, meanwhile, you could say non-throwing types throw `Never`.
The ability to opt into a stricter API contract would be very helpful though, especially for non-public code. I am often tempted to avoid throwing entirely in favor of using `Result`, purely for that additional safety.
As for public typed throws, many options come to mind:
- A number of people have proposed adding an enumeration attribute that requires `@unknown` handling even when frozen. Adding new cases to such an enumeration would be additive rather than breaking, which may be fitting for thrown errors.
- An extremely common pattern consists of wrapping thrown errors in an associated case. This allows fine-grained control over exposed type information, hiding implementation details that should not be part of the API contract.
- Errors do not need to be enumerations, common as that is: developers may choose to throw `struct`s for particularly dynamic errors.
I recognize that typed throws are beyond the scope of the pitch, but I do think it is preferable to introducing bespoke requirements. Remember, it would always be possible to cast to `Error`: at worst, you end up with the current state of affairs.
Finally, I firmly believe that everything in documentation should be considered part of the API contract. If you document any thrown errors (and you should), changing which errors you throw (and even which reasons you throw them for) is likely already an additive or breaking change. Typed errors would merely make the compiler acknowledge that.
A lot of Swift’s language complexity already comes from not having typed errors, and I do not wish to add more.
public func map<T>(
    _ transform: (Element) throws -> T
) rethrows -> [T]
could become syntactic sugar for
public func map<E, T>(
    _ transform: (Element) throws E -> T
) throws E -> [T]
I appreciate the input, but this pitch/proposal isn’t where typed throws are going to happen. Let’s please focus on the topics at hand; it already is a very large and interesting surface area by itself. Nothing in the design prevents us from adopting typed throws IF they were to happen in a future Swift release.
Hope this makes sense, thanks!
Sorry, you’re right. I do think this proposal should avoid burying too much in the language, though. It may even be best to make the implementation dependent on such a feature and delay it accordingly. Or just drop all `Error`-related requirements for now.
There are no `Error`-related requirements in this proposal.
The “how to transfer errors” section is merely suggestions to transport authors given the status quo of the language.
Errors thrown by the underlying transport due to connection or messaging problems must conform to the `ActorTransportError` protocol.
This is a suggestion as well. We can make that a “should”. Nothing breaks or stops working if a transport violates this recommendation.
“Underlying” here means things like network errors: if an actor transport encounters network issues and fails a call because of that, it’s good for users to be able to tell that kind of error apart from an error thrown by a remote actor that is alive and actually replying.
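As a rough sketch of how a caller could tell the two kinds of failure apart today, the example below catches on protocol conformance. The `ActorTransportError` protocol declared here is a local stand-in with the same name as the one quoted above, not the pitch's actual declaration, and `ConnectionLost`, `GreetingRejected`, `Outcome`, and `classify` are hypothetical names. Note that the distinction is only checked at run time by the `catch is` pattern; nothing in the function signature promises it.

```swift
// Local stand-in for the protocol named in the pitch (assumption: a marker
// protocol refining Error is enough for this illustration).
protocol ActorTransportError: Error {}

// Hypothetical error from a transport, e.g. the connection dropped.
struct ConnectionLost: ActorTransportError {}

// Hypothetical error thrown by the remote actor's own logic.
struct GreetingRejected: Error {}

enum Outcome: Equatable {
    case reply(String)
    case transportFailure
    case remoteFailure
}

/// Runs a call and classifies its result by the kind of error it threw.
func classify(_ call: () throws -> String) -> Outcome {
    do {
        return .reply(try call())
    } catch is ActorTransportError {
        // The transport failed; the remote actor may never have seen the call.
        return .transportFailure
    } catch {
        // The remote actor was reachable and threw on its own.
        return .remoteFailure
    }
}

let delivered = classify { "Hello" }
let dropped = classify { throw ConnectionLost() }
let rejected = classify { throw GreetingRejected() }
print(delivered, dropped, rejected)
```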
I agree completely, but I think that knowledge is impossible without typed throws. That’s more or less the entire point of typed throws.
I have encountered this precise issue repeatedly in my own code, and I’ve had to settle for throwing methods that return a `Result`. It’s all rather tedious, and I look forward to a day where I can use a distributed actor that employs associated types to throw specific errors for a given combination of transport and method.
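For readers unfamiliar with the workaround, a throwing method that returns a `Result` might look roughly like the sketch below. All names (`GreeterClient`, `TransportUnavailable`, `GreetError`, the `connected` flag) are hypothetical and exist only to show the shape: the untyped `throws` channel carries transport failures, while the typed `Result` failure carries errors from the callee itself.

```swift
// Hypothetical transport-level failure, surfaced through untyped `throws`.
struct TransportUnavailable: Error {}

// Hypothetical method-level failure, surfaced through a typed Result.
enum GreetError: Error, Equatable {
    case emptyName
}

struct GreeterClient {
    // Stand-in for real connection state.
    var connected: Bool

    /// Throws if the call could not be delivered at all; otherwise returns
    /// the callee's typed success or failure.
    func greet(name: String) throws -> Result<String, GreetError> {
        guard connected else { throw TransportUnavailable() }
        guard !name.isEmpty else { return .failure(.emptyName) }
        return .success("Hello \(name)!")
    }
}

let online = GreeterClient(connected: true)
let offline = GreeterClient(connected: false)

let greeting = try? online.greet(name: "Swift")   // delivered, succeeded
let refusal = try? online.greet(name: "")         // delivered, callee failed
let undelivered = try? offline.greet(name: "Swift") // never delivered
```

Every call site has to unwrap two layers, which is the tedium described above.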
I have two requests to make, if it's possible of course:
1. If possible, reuse some well-known binary encoding like protobuf, even if not using proto's IDL and going all the way with Swift for the IDL (which is a good idea anyway). Having a well-known, widespread binary payload is pretty good for interfacing with other systems, even outside of the Swift world, like gRPC.
2. If 1 is not the path taken, please document the custom binary protocol that goes on the wire, and make the interface that generates it available for us to call, so we can generate the binary payload "by hand" if needed. With this we will be able to interface with and reuse it, playing well with other technologies that might want to interface with it.
In short, the payload that is going on the wire should be well-known, or well documented if it's a custom protocol, so that interfaces with other tech and with other languages are possible.
There is no protocol, it’s just Codable and a transport. I’m pretty sure you could have a transport act as an adapter between the two quite easily.
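A minimal sketch of the "Codable and a transport" idea, assuming the transport is free to pick its own coder: the message type only promises `Codable`, and a pluggable format decides what bytes go on the wire. `WireFormat`, `JSONWireFormat`, and `GreetMessage` are hypothetical names, and JSON (via Foundation's `JSONEncoder` and `JSONDecoder`) is used here only because it ships with Foundation; a protobuf-based format would need a third-party library and its own adapter.

```swift
import Foundation

// Hypothetical message type; the only requirement is Codable.
struct GreetMessage: Codable, Equatable {
    var name: String
}

// Hypothetical abstraction over "whatever the transport puts on the wire".
protocol WireFormat {
    func encode<T: Encodable>(_ value: T) throws -> Data
    func decode<T: Decodable>(_ type: T.Type, from data: Data) throws -> T
}

// One possible format, backed by Foundation's JSON coders.
struct JSONWireFormat: WireFormat {
    func encode<T: Encodable>(_ value: T) throws -> Data {
        try JSONEncoder().encode(value)
    }

    func decode<T: Decodable>(_ type: T.Type, from data: Data) throws -> T {
        try JSONDecoder().decode(type, from: data)
    }
}

let format = JSONWireFormat()
let sent = GreetMessage(name: "Swift")

// Sender side: message to bytes. Receiver side: bytes back to a message.
let bytes = try? format.encode(sent)
let received = bytes.flatMap { try? format.decode(GreetMessage.self, from: $0) }
print(received?.name ?? "decoding failed")
```

Swapping the wire encoding then means providing another `WireFormat` conformance, without touching the message types.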