SE-0344: Distributed Actor Runtime

ktoso · March 9, 2022, 1:36pm

fullName is a specific term and refers to the hello(param:anotherParam:) format, the same as what #function would give you. I will clarify this in the proposal though, as it wasn't spelled out explicitly.
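As a quick illustration of that format, here is a minimal sketch of what #function reports for a method with two labeled parameters. The Greeter type is made up for the example; only the hello(param:anotherParam:) shape comes from the post above.

```swift
// Hypothetical type, used only to show the name format.
struct Greeter {
    // Returns the name of the enclosing declaration as reported by #function.
    func hello(param: Int, anotherParam: String) -> String {
        return #function
    }
}

let fullName = Greeter().hello(param: 1, anotherParam: "x")
print(fullName) // prints "hello(param:anotherParam:)"
```

Note that the parameter types do not appear in the string, only the base name and the argument labels.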

Again though, there must be some shared scheme to identify targets. I said previously that stable names are specifically a thing we are interested in: giving users the ability to lock in a name for identification and evolution. I don't believe that tons of powerful customization is needed here. Having worked on wire format evolution and compatibility in Akka for many years, I found that often all you need is a stable name, really; this also helps with compression, as "frequent" messages can use short identification.
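To make the stable name idea concrete, here is a rough sketch of how a locked-in identifier could survive a rename. Everything in it (StableNameTable, register, resolve, the short name "h") is hypothetical and not part of the proposal; it only illustrates the general technique of mapping a stable wire identifier to whatever the current full name is.

```swift
// Hypothetical sketch: none of these names come from the proposal.
struct StableNameTable {
    // Maps a locked-in wire identifier to the current full name of the target.
    private var fullNames: [String: String] = [:]

    mutating func register(stableName: String, for fullName: String) {
        fullNames[stableName] = fullName
    }

    // Resolves a wire identifier. Identifiers without a registered stable
    // name are assumed to already be full names and are passed through.
    func resolve(_ identifier: String) -> String {
        return fullNames[identifier] ?? identifier
    }
}

var table = StableNameTable()
// An earlier version exposed hello(param:anotherParam:) under the short name "h".
table.register(stableName: "h", for: "hello(param:anotherParam:)")
// After a rename, the same wire identifier points at the new full name.
table.register(stableName: "h", for: "greet(param:anotherParam:)")

let resolved = table.resolve("h")
let passthrough = table.resolve("other(x:)")
print(resolved, passthrough)
```

Peers that only ever send "h" keep working across the rename, and the short identifier is also cheaper on the wire than the full name.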

We'll expand the discussion of these things and will consider pulling in stable names earlier, but I'm not quite ready to commit to a design right now; I will need some time to think it through. I think we will also have to make executeDistributedTarget take a RemoteCallTarget, which we'd allow to have different representations that the Swift runtime can support... I agree we should dive deeper into this, and we will.

Happy to hear more concrete examples and use cases you have in mind though. We specifically have a use case where fullNames will be sufficient, and such a system implementation would not actually allow overloads by types.

We also need to think about metadata storage and availability. We probably shouldn't emit all kinds of metadata to be looked up by... so that's another thing to consider. For the storage of the accessor pointers, the mangled names (in the current or an improved mangling) are probably the best option, because we can derive the other representations from them during lookup (and cache the results).
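A rough sketch of that storage idea, under heavy assumptions: TargetIndex, the derive closure, and the Int standing in for an accessor pointer are all hypothetical, and the "m1"/"m2"/"m3" keys are placeholders rather than real mangled names. The point is only that one table keyed by mangled name is stored, and other representations are derived on first lookup and cached.

```swift
// Hypothetical sketch; the types and the derive closure are assumptions.
final class TargetIndex {
    // Primary storage: one entry per target, keyed by mangled name.
    // An Int stands in for the accessor pointer.
    private let accessorsByMangledName: [String: Int]
    // Derives another representation (e.g. a full name) from a mangled name.
    private let derive: (String) -> String
    // Reverse table from derived name to mangled names, built on first use.
    private var mangledNamesByDerivedName: [String: [String]] = [:]
    private var cacheBuilt = false
    private(set) var deriveCallCount = 0

    init(accessorsByMangledName: [String: Int], derive: @escaping (String) -> String) {
        self.accessorsByMangledName = accessorsByMangledName
        self.derive = derive
    }

    // Direct lookup by mangled name needs no derivation.
    func accessor(mangledName: String) -> Int? {
        return accessorsByMangledName[mangledName]
    }

    // Lookup by a derived name builds the reverse table once, then reuses it.
    func accessors(derivedName: String) -> [Int] {
        if !cacheBuilt {
            for mangled in accessorsByMangledName.keys {
                deriveCallCount += 1
                let derived = derive(mangled)
                mangledNamesByDerivedName[derived, default: []].append(mangled)
            }
            cacheBuilt = true
        }
        let mangledNames = mangledNamesByDerivedName[derivedName] ?? []
        return mangledNames.compactMap { accessorsByMangledName[$0] }.sorted()
    }
}

// Placeholder keys, not real Swift mangled names.
let index = TargetIndex(
    accessorsByMangledName: ["m1": 1, "m2": 2, "m3": 3],
    derive: { mangled in
        // Stand-in for demangling; "m1" and "m2" model overloads sharing a full name.
        mangled == "m3" ? "bye()" : "hello(param:)"
    }
)

let hello = index.accessors(derivedName: "hello(param:)")
let bye = index.accessors(derivedName: "bye()")
print(hello, bye, index.deriveCallCount)
```

Because a full name carries no type information, two overloads can map to the same derived name, which is why the lookup returns a list here; a system that forbids overloads by type would always get at most one entry.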

I guess what I'm pushing back on though is "we need a complete evolution story right now"; I don't think that's the case. We just need to make sure we can roll one out in upcoming proposals.

Joe_Groff · March 10, 2022, 5:00pm

Thanks for the discussion, everyone! Based on this first round of review, the core team has decided to return the proposal for revision. We recognize that there is still some discussion going on in this thread, so you all are welcome to continue those threads here until the next round of review is ready. We anticipate some of these discussions continuing into the next review.

masters3d · March 16, 2022, 1:34pm

Is there a general idea of what’s to come in the medium term? I am curious about a full-fledged error resiliency story.

The Swift standard library uses runtime traps, which imo are more like uncatchable exceptions; the traps exit the running program without any facilities to recover (unlike Rust and Go, which have some facilities to catch some types of panics). I would love to know how the Distributed Actor runtime could provide some form of lifecycle for distributed actors, or does this perhaps have to be done outside the runtime?

ktoso · March 16, 2022, 1:52pm

That is somewhat of a completely different topic...

The distributed actor work by itself does not introduce any "unwinding" facilities or other mechanisms similar to "panics". This is not the focus of the proposals under discussion right now.

It very much remains an area I'd personally, as a server developer, be interested in improving in the future, but we don't have specifics to share here. You can see me describing some of the ideas for "soft faults" all the way back in 2019 here: [stdlib] Cleanup callback for fatal Swift errors - #4 by ktoso. It is also aligned with our interest in improving the backtrace experience and, of course, with the ability to strongly isolate state which distributed actors bring to the table. So maybe there's potential here in the future, but right now we're not actively looking into this.

It is true that distributed actors provide a very strong isolation mechanism, one whose guarantees could be built upon to provide some form of fault tolerance.

Today Swift does not offer "panics" or unwind mechanisms, which would be necessary to implement this in-process. But you could achieve resilience by having a process transparently spawn child processes, in which "child (distributed) actors" are spawned and communicated with from the main process. Those child actors could then crash their entire process independently, without impact on the others. So that is one way one could use distributed actors: to isolate crashes thanks to process boundaries. I had prototypes of this approach, so it is absolutely doable, but it remains to be seen whether it is interesting to the community.

Joe_Groff · March 16, 2022, 5:13pm

Yeah, distributed actors provide the right amount of isolation between the actor instance and the rest of the program to provide a semantic foundation for containing fatal errors. This is probably well in the realm of "future directions" but one could imagine the compiler generating distributed actor code in such a way that it runs in-process with an isolated heap and some amount of unwind info to let us handle fatal error traps coming from distributed actor code by unwinding its execution and blowing away its heap without disturbing the rest of the process.

Joe_Groff · March 22, 2022, 6:46pm

Sorry for not linking the two threads here, but here is the second review thread.