Announcing new WebSocketActors library for client/server applications

samalone · December 5, 2023, 2:03pm

I think it's time to publicize my new WebSocketActors library for creating client/server applications using Swift distributed actors.

The library is based on Apple's TicTacFish sample code, but adds features like:

Simultaneous connections to multiple servers & clients
Automatic reconnection after network failures
Server calls to client actors (server push)
Logging with SwiftLog

I'm happy to answer questions and eager to get feedback. The documentation isn't finished, but already includes some short articles with tips on using distributed actors. I'll try to respond here, but the best places to reach me are on GitHub and Mastodon.

ktoso · December 8, 2023, 8:41am

This is very very cool, congratulations!

I skimmed just a bit of the project quickly, and had some questions that might help us inform and improve the distributed language feature.

Assigning IDs

I believe you need to do the

system.makeLocalActor(id: .greeter) {
  Greeter(actorSystem: system)
}

dance because you want to provide a "well known" ID that clients can get. This is a common pattern and need, and I've been thinking about how we can help actor system authors out here.

We'd basically need a way to "pass in" extra metadata into the actor creation, such that assignID can use a passed-in ID rather than being forced to generate one. Did I spot that right, and do you think such a capability would help, so that we could avoid these tricks with closures?
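For anyone following along, here is a minimal sketch of how a closure-based "dance" like this can hand an ID hint to assignID through a task-local value. All of the types here (SketchActorID, SketchIDHint, SketchSystem, SketchGreeter) are hypothetical stand-ins that don't use the Distributed module, so this isn't the library's actual implementation, only an illustration of the mechanism.

```swift
// Hypothetical stand-in for an actor ID; not the WebSocketActors type.
struct SketchActorID: Hashable, Sendable {
    let id: String

    static func random() -> SketchActorID {
        SketchActorID(id: "random-\(UInt64.random(in: 0 ..< .max))")
    }
}

// Task-local slot that carries the "well known" ID into assignID.
enum SketchIDHint {
    @TaskLocal static var current: SketchActorID?
}

// Hypothetical stand-in for an actor system.
final class SketchSystem {
    // Uses the hint if one is bound for the current scope, else generates an ID.
    func assignID() -> SketchActorID {
        SketchIDHint.current ?? .random()
    }

    // Binds the hint only for the duration of the factory closure.
    func makeLocalActor<T>(id: SketchActorID, _ factory: () throws -> T) rethrows -> T {
        try SketchIDHint.$current.withValue(id, operation: factory)
    }
}

// Plays the role of a distributed actor whose init asks the system for an ID.
final class SketchGreeter {
    let id: SketchActorID

    init(system: SketchSystem) {
        self.id = system.assignID()
    }
}

let sketchSystem = SketchSystem()
let wellKnown = sketchSystem.makeLocalActor(id: SketchActorID(id: "greeter")) {
    SketchGreeter(system: sketchSystem)
}
let anonymous = SketchGreeter(system: sketchSystem)
```

In the sketch, wellKnown ends up with the "greeter" ID while anonymous gets a generated one, because the hint is only visible inside the closure. That scoping is the reason the closure is needed at all.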

Resolving protocols

I suspect your use cases are such that you'd like to have a server expose a distributed actor to "clients".

The current design is really tailored to symmetric nodes, so all nodes have the same code -- such as a cluster.

I'm very interested in lifting this limitation, so that we could avoid the awkward dance that the documentation currently needs to point out.

Ideally, we'd be able to (API not final, still working on a draft):

public protocol WeatherStationRegistry: DistributedActor 
where ActorSystem == WebSocketActorSystem {
    distributed func getStationsNear(
        latitude: Double,
        longitude: Double,
        radius: Double) -> [WeatherStation]
}

and then be able to call (any WeatherStationRegistry).resolve(...) on the client, without knowing the exact implementation.

You'd be able to put "the API" into a shared module that client and server both depend on -- but only the server would actually have an implementation for it.

System IDs

I think that's an okay approach, with the client not having to care about identity.
Though I do wonder if it wouldn't be nicer to have the ActorSystem initializer itself start in a specific "mode", rather than having to call runServer later on.
Did you have some thoughts on that and decide to go with the method after all?

It seems like we would NOT want a client-side system to call runServer if it has a random ID, after all -- or did you have use cases for it?

I was surprised though that the ActorIdentity uses NodeIdentity which is only a string -- rather than have the Node identity be (protocol, host, port). Have you considered using a triple like that for node identity -- this way a single client system can manage connections to multiple servers if it needed to do so.

Generally it is preferred to have "one actor system that manages connections" style wise, rather than "many actor systems, PER connection". That's how cluster and some other implementations work and I'd recommend trying that approach

Very cool overall, and I hope we can improve the language feature as well as library and get some great use out of it!

As a fun side project you could even consider supporting swift-distributed-tracing in it!

samalone · December 9, 2023, 4:54pm

Thank you for taking the time to get in touch. I would never have tackled this project without your encouragement!

I've only been thinking about distributed actors for about a month, but here are some responses to your questions:

Assigning IDs

Using a closure to assign the ID isn't a significant burden, but it does make distributed actors seem less like an integral part of the language. I would prefer for distributed actors to have an implicit base class so you could write:

distributed actor Person {
    typealias ActorSystem = WebSocketActorSystem
    
    var name: String
    
    init(name: String, actorSystem: ActorSystem, id: ActorIdentity = .random()) {
        self.name = name
        super.init(actorSystem: actorSystem, id: id)
    }
}

The super init would still call actorSystem.assignID to let the actor system adjust the actor ID, but the id hint would be passed in explicitly instead of through a @TaskLocal variable. Then local distributed actors could be constructed just like other actors:

extension ActorIdentity {
    static let alice = ActorIdentity(id: "Alice")
}

let actorSystem = WebSocketActorSystem()
let alice = Person(name: "Alice", actorSystem: actorSystem, id: .alice)

I think this would be simpler and clearer for both actor system authors and users.

Resolving protocols

Yes, having a way to separate interface from implementation would be nice. I would slightly prefer declaring a distributed protocol where all of the functions are implicitly distributed over having to mark each function signature as distributed individually. I'm not sure that it makes sense to mix distributed and non-distributed requirements in the same protocol, and marking the entire protocol as distributed might allow the compiler to place similar constraints on the protocol as it does on distributed actors.

I've considered creating a compiler macro that generates the implementation protocol and delegation automatically. That would save the user some boilerplate, though it may not be worth the trouble if distributed protocols are on the roadmap.

System IDs (actually Node IDs)

Let me first clarify that the ID you can pass in when you create a WebSocketActorSystem is not the ID of the actor system, it is the ID of the local node. While you can specify a fixed node ID for your server to simplify routing and reduce the need for a receptionist, this is not a requirement of my actor system. You can connect to multiple servers as long as they have distinct node IDs, either by assigning them different fixed IDs ("auth-server" vs. "api-server"), or by giving them random UUIDs (just as clients have).

I rejected using (protocol, host, port) to identify servers because in practice, the server will often be running inside a Kubernetes cluster behind an NGINX proxy server and won't know its own global address. A server might be listening on ws://0.0.0.0:80/ but be on the internet as wss://propercourse.app:443/api. The URL to reach the server is more of a route than it is an identity. I also needed clients to have a node ID, and mobile clients don't have a fixed address.

Being able to call both runServer and connectClient on the same node allows layering of services. My API server may itself be a client of a weather server, and the actor system supports this.

Note that connectClient returns a Manager instance that allows closing the connection. I intend to add a Manager method to retrieve the remote node ID, which will facilitate routing when the server's node ID isn't known in advance.

Given all of this, I think my actor system actually does support "one actor system that manages connections" style wise, rather than "many actor systems, PER connection", as you suggest.

samalone · December 11, 2023, 2:14pm

Here are a couple more thoughts on writing an actor system.

Task safety in the actor system

One of the actor system's main jobs is coordinating async operations on independent tasks, and I was quite surprised to find that the sample code in TicTacFish was using NSLocks to do this rather than being an actor.

As I worked on the code, I delegated several responsibilities to actors. Still, a handful of DistributedActorSystem methods are not async, which prevents the actor system from being an actor or using actors in their implementation.

If I could implement an actor system using an actor, the code would be a little cleaner, and I would have more confidence that the actor system itself was concurrency-safe.

Exchanging Node IDs between nodes

My actor system relies heavily on using Node IDs to record where distributed actors are located. These IDs are the dictionary keys to map IDs to network channels, and they provide a persistent ID if network channels are lost and re-established.
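As a rough illustration of that idea (hypothetical types, not the library's real ones), actor locations can be keyed by node ID, so that a reconnect only swaps the channel stored under the same key:

```swift
// Hypothetical stand-ins for illustration; not the WebSocketActors types.
struct SketchNodeID: Hashable {
    let id: String
}

struct SketchRemoteActorID: Hashable {
    let node: SketchNodeID   // where the actor lives
    let id: String           // which actor on that node
}

// Stand-in for a network channel; a real one would wrap a WebSocket connection.
struct SketchChannel: Equatable {
    let connectionNumber: Int
}

struct SketchChannelTable {
    private var channels: [SketchNodeID: SketchChannel] = [:]

    // Called when a connection to `node` is (re)established.
    mutating func connected(_ node: SketchNodeID, over channel: SketchChannel) {
        channels[node] = channel
    }

    // Called when the connection to `node` is lost.
    mutating func disconnected(_ node: SketchNodeID) {
        channels[node] = nil
    }

    // Finds the channel to use for a remote actor, if its node is reachable.
    func channel(for target: SketchRemoteActorID) -> SketchChannel? {
        channels[target.node]
    }
}

var table = SketchChannelTable()
let apiServer = SketchNodeID(id: "api-server")
let remoteGreeter = SketchRemoteActorID(node: apiServer, id: "greeter")

table.connected(apiServer, over: SketchChannel(connectionNumber: 1))
let beforeDrop = table.channel(for: remoteGreeter)

table.disconnected(apiServer)
let whileDown = table.channel(for: remoteGreeter)

table.connected(apiServer, over: SketchChannel(connectionNumber: 2))
let afterReconnect = table.channel(for: remoteGreeter)
```

The actor ID never changes across the drop and reconnect. Only the channel stored under the node ID does.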

To support this, I need the client and server to know each other's Node IDs as soon as a connection is established. I considered using distributed actors to implement this exchange, but ran into trouble bootstrapping the system recursively like that.

My solution was to use HTTP headers to embed the Node IDs in the initial HTTP exchange before the WebSockets are established. It took me a lot of digging in the guts of NIO to figure out how to achieve this, but I eventually figured it out.

If you look at the bottom of my NodeIdentity.swift file, you'll see the HTTPHeaders extension that sets the headers. Showing the callers of this extension will reveal the code that performs the Node ID exchange.
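To give a feel for the shape of that exchange without any NIO machinery, here is a simplified sketch that uses a plain dictionary in place of NIO's HTTPHeaders. The header name and the function names are made up for this example and are not the ones the library uses.

```swift
// Hypothetical header name, for illustration only.
enum SketchHeaderNames {
    static let nodeID = "X-Sketch-Node-ID"
}

enum SketchHandshakeError: Error {
    case missingNodeID
}

// Returns a copy of `headers` with the sender's node ID added.
// Each side does this on its half of the HTTP upgrade exchange.
// Note: real HTTP header names are case-insensitive; this plain dictionary is not.
func addingNodeID(_ nodeID: String, to headers: [String: String]) -> [String: String] {
    var headers = headers
    headers[SketchHeaderNames.nodeID] = nodeID
    return headers
}

// Reads the peer's node ID from the headers it sent, or throws if it is absent.
func peerNodeID(in headers: [String: String]) throws -> String {
    guard let id = headers[SketchHeaderNames.nodeID], !id.isEmpty else {
        throw SketchHandshakeError.missingNodeID
    }
    return id
}

// Client side: the upgrade request carries the client's node ID.
let upgradeRequest = addingNodeID("client-1234", to: ["Upgrade": "websocket"])

// Server side: learn the client's node ID, then answer with its own.
let clientID = try peerNodeID(in: upgradeRequest)
let upgradeResponse = addingNodeID("api-server", to: ["Upgrade": "websocket"])

// Client side again: learn the server's node ID from the response.
let serverID = try peerNodeID(in: upgradeResponse)
```

Both sides know each other's node ID before the first WebSocket frame is sent, so no distributed actor call is needed to bootstrap the connection.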

ktoso · December 12, 2023, 4:38am

Implementing the actor system as an actor won't be possible in general, because we need to allow initializing distributed actors from synchronous code. If we made the actor system an actor, all of the method calls the compiler synthesizes in actor init and deinit would be asynchronous, and therefore it would:

force all distributed actor initializers to be async (potentially something we could work with, though I'm unsure it would fit: the compiler would have to know the exact actor system and know that it is an actor, which may make composition and "swapping" actor systems difficult),
force the deinit to be async (!), which currently isn't possible. The alternative would be deinit { Task { system.resignID() } }, which invites logic races on reusing IDs -- you'd think you had released an ID, but the resign had not run yet, etc.

So no, on this point I do not think we have much of a choice. It should be noted that relatively few developers write actor systems, but a lot use them, so putting some burden on the system developer is preferable to complicating the user experience or compositional capabilities of actors.

As far as developer confidence, I recommend trying a pattern like this:

internal struct LockedValueBox<Value> {
    
    @usableFromInline
    let _storage: LockStorage<Value>

    /// Initialize the `Value`.
    @inlinable
    init(_ value: Value) {
        self._storage = .create(value: value)
    }

    /// Access the `Value`, allowing mutation of it.
    @inlinable
    func withLockedValue<T>(_ mutate: (inout Value) throws -> T) rethrows -> T {
        return try self._storage.withLockedValue(mutate)
    }
}
extension LockedValueBox: Sendable where Value: Sendable {}

With this pattern, the compiler can help you notice whether a piece of state is protected or not, which it can't do with raw locks. I didn't do this in the fish sample app in order to keep it simple, and it of course still needs caution, as all locks do, but it helps a bit.
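The snippet above relies on a LockStorage type that isn't shown here. As a rough, self-contained approximation of the same idea, a box can be built on Foundation's NSLock. SketchLockedBox is a name invented for this example, and this version is a class rather than a struct.

```swift
import Foundation

// Simplified stand-in for the LockedValueBox pattern, built on NSLock.
// The value is private, so the only way to touch it is through the lock.
final class SketchLockedBox<Value: Sendable>: @unchecked Sendable {
    private let lock = NSLock()
    private var value: Value

    init(_ value: Value) {
        self.value = value
    }

    // Runs `body` with exclusive access to the value and returns its result.
    func withLockedValue<T>(_ body: (inout Value) throws -> T) rethrows -> T {
        lock.lock()
        defer { lock.unlock() }
        return try body(&value)
    }
}

// Example: state that a synchronous method such as assignID or resignID
// might need to mutate without being able to await an actor.
let reservedIDs = SketchLockedBox<Set<String>>([])

let firstInsert = reservedIDs.withLockedValue { $0.insert("greeter").inserted }
let secondInsert = reservedIDs.withLockedValue { $0.insert("greeter").inserted }
let released = reservedIDs.withLockedValue { $0.remove("greeter") != nil }
```

Because the set lives inside the box, there is no way to forget the lock around one of the accesses. That is the main advantage over keeping a bare NSLock next to a mutable property.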

On exchanging Node IDs: yeah, good observation, and the implementation looks fine, I think.

The issue appears in every distributed system and is generally solved by having at least one "known" identifier. In the cluster implementation there is exactly one distributed actor instance per system that is "well-known", and it is the "receptionist". Actors in the cluster find out about other actors by asking their receptionist about known actors -- and the receptionists on each node talk to each other using a diff synchronization algorithm...

Having said that, I think your solution is good for the websocket system -- it's very much expected to tie into the underlying transport capabilities for things like that.

ktoso · December 12, 2023, 4:39am

Thank you for the other messages btw, I've been thinking through them and I'm going to see what we can do in the language to help here -- assigning IDs externally is definitely on the list.

It will not be done as a superclass. However, we should be able to use a plain old self.id = ... assignment and have the compiler notice this and NOT issue an assignID() call, but rather inform the system that an ID was externally assigned -- maybe via assignedID(id:) or something like that, or rather via an assignID overload with options that explain what happened, etc.