Yeah you wouldn't want to go into distributed refcounting territory, that's quite a headache and not actually employed by any of the widely known distributed actor systems (akka, orleans, distributed erlang and their various ports and things they inspired). As such, the swift distributed cluster library also doesn't do distributed refcounting, instead you're tasked to figure out what suits your usage patterns, and handle it yourself.
You may be wondering if that's just punting the responsibility of a very hard problem to end users, and the answer is that not really because the usage-patterns driving distributed actor systems usually would actually not really benefit from such transparent counting IMHO anyway.
For example, the most popular and powerful (I really like linking Hoop's talk from 2014 to drive that point across, since he focuses nicely on the business and productivity side of what this programming model enabled them to do: https://www.youtube.com/watch?v=I91ZU8tEJkU&t=800s) way of using distributed actors is in a model where you "just assume the remote instance is always where", and leave it up to the cluster to either find the existing instance, or activate a fresh instance for given id (and potentially re-hydrate it with prior state). This pattern is called "virtual actors" (paper) as coined by the Orleans project, or cluster sharding + persistence by Akka, and we've not yet gotten to baking it into the distributed cluster library in Swift, but it'd certainly be a great addition (issue here) (as IMHO this is the most useful distributed actors pattern out there)
Imagine this example: you implemented a chess or card game using a distributed actor cluster. The "game instance" (state of the game) is stored inside a distributed actor in the server side cluster. There are two clients playing as opponents, maybe there's some spectators too. Now, the player's connections drop, but the game is still in a valid state and ongoing... Here's two ways to think about what to do:
- this is actual connection failures, so a cluster (like we offer) offers detecting those and you could indeed stop that game, but it's a business level decision of "yeah, they're gone, let's kill this game, maybe persist the state before we do so" etc;
- alternatively, and what akka and orleans systems are great at and you could build with our cluster system as well actually (!), the game can keep going, notice that remotes failed, kick off a timer that until they don't come back and make a move within 5 minutes... you'll actually kill that game instance.
So, IMHO, with distribution rarely is as simple as just shutting things down based on refcounts. Rather, often times the lifecycles are more interesting than that. I'm sure there are cases where this simple counting may be desirable, but I'd suggest not doubling down on it and either adopt an explicit way of communicating "usage".
Maybe also worth noting, what often is employed by all those distributed runtimes is "death pacts" or "lifecycle monitoring", which boils down to "if that actor [a, on some node] dies, I should died too" styles of behaviors. In the distributed actor cluster this is exposed as the LifecycleWatch
protocol which allows you to be notified about failures of others, but the keeping an instance alive is still bound to local refcounting - so e.g. you'd have some "KeepAlive" container and could remove yourself when you see noone will be using you anymore etc.
Specifically, in our cluster's case, simplified code would end up looking like this:
distributed actor Player { ... } // those exist on other nodes
distributed actor CardGame {
func startGame(player one: Player, player two: Player) async {
// store the players, start the game etc...
watchTermination(of: one)
watchTermination(of: two)
}
func terminated(actor id: ActorID) async {
print("Oh no! The player \(id) is dead! What should we do...")
// decide what to do... (kill the game, wait for player?)
}
}
If you really wanted to, you could build some "i'm using you / i've stopped using you" messages but it'd be done on the high level protocol, and not by propagating the raw refcounting traffic itself.
I think these lifecycle discussions are best done on concrete examples actually, so if you have a specific distributed actor system (transport) and use-case in mind, I'd be happy to help design how I'd deal with your requirements there 