Runtime extensibility via distributed actors

pepi · May 1, 2024, 10:56am

Hi

As developers of a Swift CLI, we need to provide extensibility to our users so that they can extend workflows with their business logic. In the past, we achieved this using system processes and standard input and output pipelines, as Xcode does with Swift Macros and plugin executables, but standard pipelines are pretty limiting.

I brought up this limitation in a recent conference talk, where I referenced go-plugin as an example of how extensibility can be built in a compiled programming language. @Alex-Ozun suggested looking into distributed actors as a foundation upon which we could build something like swift-plugin, for Swift developers interested in providing third-party extensibility in their Swift apps.

Would people in this group be interested in exploring this with Swift Actors? If so, I can put together a proposal for what a solution could look like and what the responsibilities of a package and toolchain similar to go-plugin would be (e.g., should we take care of building, signing, and distributing those executables?). I'd appreciate suggestions for the next steps here, and I'd be super happy to explore this in collaboration with people from the community or Apple. We could dog-food the system with the CLI I'm responsible for maintaining or, why not, with Apple's Swift Macros implementation.

Thanks

ktoso · May 9, 2024, 7:45am

Hi Pedro,
this is exactly the kind of thing distributed actors are designed for.

I believe we'll have a rising number of cross-process use-cases which we'll want to support with a system like this. The recently posted "crash tests in swift testing" would also benefit from this, as would other use-cases.

It would be greatly beneficial if we can collaborate on a single "ProcessSystem (: DistributedActorSystem)" which simply isolates some work using spawned processes and allows monitoring their failure, and on top of that we'll build the specific use-cases, be it in swift-testing or what you just explained for general swift CLIs etc.
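To make the shape concrete, here is a minimal sketch of a plugin declared as a distributed actor. No "ProcessSystem" exists yet, so the sketch uses the Distributed module's LocalTestingDistributedActorSystem as a stand-in; the GreeterPlugin name and its greet method are hypothetical and only there for illustration.

```swift
import Distributed

// Hypothetical plugin: the name and method are illustrative only.
distributed actor GreeterPlugin {
    // Stand-in system; a future ProcessSystem would take its place here.
    typealias ActorSystem = LocalTestingDistributedActorSystem

    init(actorSystem: ActorSystem) {
        self.actorSystem = actorSystem
    }

    distributed func greet(name: String) -> String {
        "Hello, \(name)!"
    }
}

let system = LocalTestingDistributedActorSystem()
let plugin = GreeterPlugin(actorSystem: system)

// Calls from outside the actor are implicitly async and throwing,
// because with a real system they might cross a process boundary.
let reply = try await plugin.greet(name: "Pedro")
print(reply) // Hello, Pedro!
```

The call site would presumably stay the same if the stand-in were replaced by a system that spawns processes; only the typealias and the system construction would change.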

I'm planning to provide more documentation on implementing distributed actor systems soon, but in general taking a stab at implementing DistributedActorSystem where the remoteCall is implemented by piping messages to a child process is a good start!
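As a rough illustration of "piping messages to a child process", the sketch below uses Foundation's Process and Pipe to send one message over stdin and read the reply from stdout. It is not an actor system: the roundTrip helper is hypothetical, the message format is made up, and /bin/cat merely stands in for a plugin executable that would decode the invocation and answer it.

```swift
import Foundation

/// Sends one message to a freshly spawned child process over stdin
/// and returns whatever the child writes to stdout.
/// Hypothetical helper; /bin/cat stands in for a real plugin executable.
func roundTrip(_ message: String, executable: String = "/bin/cat") throws -> String {
    let child = Process()
    child.executableURL = URL(fileURLWithPath: executable)

    let toChild = Pipe()
    let fromChild = Pipe()
    child.standardInput = toChild
    child.standardOutput = fromChild

    try child.run()

    // Write the "invocation", then close stdin so the child sees end-of-input.
    toChild.fileHandleForWriting.write(Data(message.utf8))
    try toChild.fileHandleForWriting.close()

    let replyData = fromChild.fileHandleForReading.readDataToEndOfFile()
    child.waitUntilExit()
    return String(decoding: replyData, as: UTF8.self)
}

let echoed = try roundTrip("greet:Pedro")
print(echoed) // greet:Pedro
```

A remoteCall implementation would likely keep the child alive across calls and frame each message (for example with a length prefix) instead of spawning one process per call as this sketch does.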

I'll try to make a PoC soon as well, but cannot promise timelines -- if you'd like to have a head start on this that'd be awesome. In general you can imagine the usage being something like this:

Please don't hesitate to ping me with questions about distributed actors!

I am wondering if we should do a very basic ProcessSystem that ONLY does spawning processes and monitoring for failure, along with delivering remote calls to them. And on top of that build the ProcessPlugin mini library. This way the plugins can reuse the process stuff from the actor system, and other applications which want distributed actors in processes can as well, even if they're not necessarily the "plugin style".

mlienert · May 13, 2024, 11:16am

Hello!

I have a POC somewhere doing exactly what you describe @ktoso

I had planned to revisit it following SE-0428 (Resolve DistributedActor protocols), as it greatly simplifies how actors are resolved in those scenarios, but I haven't had the time yet.

The code is in a very rough state as I was discovering how to implement a DistributedActorSystem at the same time but it worked pretty well. Most notably the management of failure (either from the main app or the plugins) is not ideal at all.

I also tested making the plugin a WebAssembly binary to get better isolation/security, but haven't yet managed to make it work from within an embedded wasm runtime (of course it worked by spawning an external runtime).

ktoso · May 13, 2024, 11:32am

That's very cool to hear you kicked off a PoC

Even if in very rough state, I wonder if this is something you could OSS and folks interested in this thread could collaborate and help polish it up? WDYT?

mlienert · May 14, 2024, 8:31am

I will have a look at the code tomorrow and see if it's worth it

I may try to adapt it to use SE-0428 which may simplify a few things.

ktoso · May 14, 2024, 8:32am

Great! Don't be shy to reach out, happy to help, even if the code is chaos -- we can figure out the shapes of things and work towards a reusable system. There are more than enough interested folks that I'm sure some collaboration would be useful!

jaleel · May 14, 2024, 8:32am

Would add that this WWDC video was also helpful for me to understand, with the TicTacFish sample.

A bit off topic, but I see several ideas floating around here (like Wasm+DA), so it really would be nice to collaborate. So as not to bloat this thread: there is a dedicated #swift-distributed-actors channel in the Swift Open Source Slack where we can continue the discussion. Or we can find something different that suits better.

ktoso · May 14, 2024, 8:34am

Since it can be difficult to find the open source swift slack sometimes, here's a fresh link: Slack

There's a distributed channel there, feel free to ping me (@ktoso) or @jaleel there; he has played around with DA quite a bunch already :)

mlienert · May 14, 2024, 8:44am

I was not aware of this Slack, I will join too

mlienert · May 16, 2024, 5:17pm

Hello,

I just pushed my code to GitHub after some cleanup.

ktoso · May 17, 2024, 7:57am

Very nice! That's pretty good, roughly the shape I was envisioning

We can polish this up with an initial handshake, and ofc by making use of Swift 6's @Resolvable, which was really made for exactly such situations -- I see you have it in there commented out, I assume just waiting for a stable Swift 6.0 to drop? Might be nice to just require 6 tbh, and see if we can find any issues while we work on it.

Next up it'd be interesting to explore process monitoring, so that if the process dies the parent can handle restarts and inform watchers (in the parent process) of actors in the child processes that those actors have terminated. The pattern to look at for this would be something like the LifecycleWatch from the cluster -- so we "watch" an actor, but the monitoring is handled by checking the process ("node" kind of).
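A very small sketch of the monitoring idea, assuming a hypothetical ProcessMonitor type that is not part of any existing library: watchers register a callback and are told when a child process has exited and with which status. For brevity it blocks on waitUntilExit; a real system would more likely observe termination asynchronously and decide there whether to restart the child.

```swift
import Foundation

/// Hypothetical monitor: runs a child process and tells watchers how it ended.
final class ProcessMonitor {
    typealias Watcher = (_ pid: Int32, _ exitStatus: Int32) -> Void
    private var watchers: [Watcher] = []

    /// Registers a callback to be invoked after a watched process exits.
    func watch(_ watcher: @escaping Watcher) {
        watchers.append(watcher)
    }

    /// Spawns the child, blocks until it exits, then notifies every watcher.
    func runAndWatch(_ executable: String, arguments: [String]) throws {
        let child = Process()
        child.executableURL = URL(fileURLWithPath: executable)
        child.arguments = arguments
        try child.run()
        let pid = child.processIdentifier
        child.waitUntilExit()
        for watcher in watchers {
            watcher(pid, child.terminationStatus)
        }
    }
}

let monitor = ProcessMonitor()
var observedStatuses: [Int32] = []
monitor.watch { _, status in observedStatuses.append(status) }

// The shell command stands in for a plugin process that dies with status 3.
try monitor.runAndWatch("/bin/sh", arguments: ["-c", "exit 3"])
print(observedStatuses) // [3]
```

In an actor system, the watcher callback would be the place to mark every actor hosted by that pid as terminated, which is roughly the role LifecycleWatch plays for nodes in the cluster.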

Really great start and I'll want to give it a deeper look soon
Would be great if we can get a group together polishing this up and making it into a great small library for various "plugins" etc

jaleel · May 17, 2024, 12:43pm

uh, nice, that was a quick and nice implementation!

I wonder though if we can avoid creating a SubprocessActorSystem per process and somehow have one system + plugin, which wraps executables into actors. Not sure how to implement this yet, though.

mlienert · May 20, 2024, 7:20am

As far as I understand how it works, we have to create a SubprocessActorSystem per process as it's the component that is performing the distributed calls serialization and the inter-process communication.

ktoso · May 20, 2024, 7:44am

An actor system is generally expected to be managing many actors -- there's no need to have multiple ones.

The actor's ID is how the system knows "where" to send a message whenever the remoteCall is invoked. Notice the remoteCall's parameters: on actor:target:invocation:throwing:returning: so we get the actor the call is made on, and can use actor.id to get that identifier; based on it, we identify which node/process to send the call to.
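The routing idea can be sketched without the Distributed module at all. In the snippet below every name (PluginActorID, CallRouter, deliver) is hypothetical, and strings stand in for encoded invocations; the only point is that a single system can pick the destination process from the recipient's id.

```swift
/// Hypothetical identifier: which process hosts the actor, and which actor inside it.
struct PluginActorID: Hashable {
    let processID: Int32
    let localID: Int
}

/// Hypothetical single system that multiplexes calls over many child processes.
struct CallRouter {
    enum RoutingError: Error {
        case unknownProcess(Int32)
    }

    /// One delivery function per child process, keyed by pid.
    var connections: [Int32: (String) -> String] = [:]

    /// Plays the role of remoteCall: the recipient's id decides where the message goes.
    func deliver(_ message: String, to id: PluginActorID) throws -> String {
        guard let send = connections[id.processID] else {
            throw RoutingError.unknownProcess(id.processID)
        }
        return send("actor \(id.localID): \(message)")
    }
}

var router = CallRouter()
// The closures stand in for pipes to two different child processes.
router.connections[101] = { "[pid 101] " + $0 }
router.connections[202] = { "[pid 202] " + $0 }

let routed = try router.deliver("greet", to: PluginActorID(processID: 202, localID: 0))
print(routed) // [pid 202] actor 0: greet
```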

(Technically you can have a system which is one system per one actor, but it's a bit weird; I would recommend sticking to the usual way of a system being a collection of managed actors. And that's how other systems work today as well.)

mlienert · May 20, 2024, 12:06pm

I think the question was around having an actor system per process instead of per actor.

SubprocessActorSystem can handle several distributed actors per process. The Actor ID is the process pid + a local actor id (which is starting at 0 and incremented for every new actor created).
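An identifier of that shape could look roughly like the sketch below. The type names are hypothetical and not taken from the actual repository; the ID is made Codable on the assumption that it has to travel between processes.

```swift
import Foundation

/// Hypothetical ID: the hosting process plus a per-process counter.
struct ExampleActorID: Hashable, Codable, Sendable {
    let pid: Int32
    let localID: Int
}

/// Hands out IDs for one process, starting at 0 and incrementing per new actor.
struct ActorIDAllocator {
    let pid: Int32
    private var nextLocalID = 0

    init(pid: Int32) {
        self.pid = pid
    }

    mutating func next() -> ExampleActorID {
        defer { nextLocalID += 1 }
        return ExampleActorID(pid: pid, localID: nextLocalID)
    }
}

var allocator = ActorIDAllocator(pid: ProcessInfo.processInfo.processIdentifier)
let first = allocator.next()
let second = allocator.next()

// Round-trip through JSON to show the ID can cross a process boundary.
let wire = try JSONEncoder().encode(second)
let decoded = try JSONDecoder().decode(ExampleActorID.self, from: wire)
print(first.localID, second.localID, decoded == second) // 0 1 true
```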

ktoso · May 20, 2024, 9:59pm

Ah I see what you're saying. It mostly depends what capabilities we'd like to expose, and I'm slightly leaning to just have one system manage things tbh -- it gets rid of "you used the wrong system!" runtime errors that might come up otherwise, but yeah it comes down to moving around where the 1:n is.

I think having a system manage multiple processes may be nicer (?) since you'd have a setup phase where you configure if an actor is created "just once" and all resolves get the same one, or if multiple instances/processes of the same actor should be created etc. The same setup phase could also hold configuration for extra things like "how" the process is managed.

I guess I'm thinking about this the same way as I would about a connection pool, http client or cluster... I've seen some IPC systems adopt Distributed and they'd also opt in for the multiplexing behavior. That is to say that the "where" is more of a configuration change, and less of a code change "which actor system is being passed around"

It's probably worth trying some API examples though and see what feels better.
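As one possible API experiment for such a setup phase, here is a sketch in which InstancePolicy and PluginRegistry are purely hypothetical names and an integer stands in for the actor (or process) that a real system would hand back from a resolve.

```swift
/// Hypothetical policy: how many instances back a given plugin name.
enum InstancePolicy {
    case shared      // created once; every resolve gets the same instance
    case perResolve  // every resolve creates a fresh instance/process
}

/// Hypothetical registry standing in for the setup phase of a single system.
struct PluginRegistry {
    private var policies: [String: InstancePolicy] = [:]
    private var sharedInstances: [String: Int] = [:]
    private var nextInstance = 0

    mutating func register(_ name: String, policy: InstancePolicy) {
        policies[name] = policy
    }

    /// Returns an instance number; a real system would return an actor reference.
    mutating func resolve(_ name: String) -> Int? {
        guard let policy = policies[name] else { return nil }
        if policy == .shared, let existing = sharedInstances[name] {
            return existing
        }
        let instance = nextInstance
        nextInstance += 1
        if policy == .shared {
            sharedInstances[name] = instance
        }
        return instance
    }
}

var registry = PluginRegistry()
registry.register("linter", policy: .shared)
registry.register("formatter", policy: .perResolve)

let linterA = registry.resolve("linter")
let linterB = registry.resolve("linter")
let formatterA = registry.resolve("formatter")
let formatterB = registry.resolve("formatter")
print(linterA == linterB, formatterA == formatterB) // true false
```

With this shape, switching a plugin from one shared process to one process per resolve is a configuration change at registration time, and the resolving code stays untouched.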

mlienert · May 21, 2024, 7:46am

Yes, I was leaning towards having one system implementation and abstracting over the "transport" (i.e. the channel the messages pass through) to allow supporting launching subprocesses, hosting a wasm runtime, using IPC, ...

If we talk in terms of network topology, the DistributedCluster is a mesh and here we want to create a star, I suppose.
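A transport abstraction along those lines might be sketched as follows. PluginTransport, LoopbackTransport, and PluginConnection are hypothetical names; only an in-memory transport is shown, on the assumption that subprocess, Wasm, or IPC transports would conform to the same protocol.

```swift
import Foundation

/// Hypothetical transport abstraction: one request in, one reply out.
protocol PluginTransport: Sendable {
    func send(_ request: Data) async throws -> Data
}

/// In-memory transport, handy for tests; it answers with a local handler.
struct LoopbackTransport: PluginTransport {
    let handler: @Sendable (Data) -> Data

    func send(_ request: Data) async throws -> Data {
        handler(request)
    }
}

/// The system side only sees the protocol, so other transports
/// could be swapped in without touching call sites.
struct PluginConnection: Sendable {
    let transport: any PluginTransport

    func call(_ method: String) async throws -> String {
        let reply = try await transport.send(Data(method.utf8))
        return String(decoding: reply, as: UTF8.self)
    }
}

// The handler reverses the bytes so the reply is visibly different from the request.
let connection = PluginConnection(
    transport: LoopbackTransport(handler: { request in Data(request.reversed()) })
)
let answer = try await connection.call("ping")
print(answer) // gnip
```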

jaleel · May 21, 2024, 10:46am

I'm just wondering if we can have this extend workflows feature as @pepi suggested with this setup...

I can't see the whole picture yet. I know that your demo is for educational purposes, but it's a good starting point. Can you add a bit more info to the readme and open a discussion on GitHub, so that others can potentially also help with next steps?

ktoso · May 22, 2024, 3:26pm

Star or a tree I suppose, child processes could also have child processes, but yes! I think we're not aiming to support a mesh in this actor system

Looking forward to figuring out how we can get together a mini community to polish this up as a full fledged project!

mlienert · May 22, 2024, 7:07pm

True! I didn't take into account the fact that child processes could have other child processes. I will look at my implementation to see if I can make this possible.

As for progress, I successfully implemented plugins in Wasm and hosted them with WasmKit inside the host application. They can be used alongside child-process plugins.

Pretty happy about this achievement, as it's the first time I've used swiftwasm and WasmKit together. But I had to implement polling from the wasm guest to the host, and was not able to implement something more Concurrency-friendly.