Guidance with distributed actors

zaneenders · May 4, 2025, 8:14am

Hi,

I have been playing around with the swift-distributed-actors and swift-cluster-membership packages a bit and having fun so far, just had a few questions if anyone could give me a poke in the right direction.

Connecting two machines over public IP

I am having trouble getting the clusters to work with public IP addresses. Something like the code below works for two local addresses like 192.168.0.69.

import DistributedCluster

// Configure this node with the address and port it listens on.
var settings = ClusterSystemSettings(name: systemName, host: "local address?", port: 8080, tls: nil)
settings.autoLeaderElection = .none
settings.logging.logLevel = .debug
let system = await ClusterSystem(systemName, settings: settings)
Task {
    // Give the cluster a second to start before joining the other node.
    try? await Task.sleep(for: .seconds(1))
    let cluster = Cluster.Endpoint(host: "other ip", port: 8080)
    system.cluster.join(endpoint: cluster)
}
let machine = Machine(actorSystem: system, name: "zane")

But after I set up port forwarding on two routers and tried to connect the systems in a cluster, I ran into a targetHandshakeAddressMismatch error. If I put in the public IP addresses, I get a Cannot assign requested address (errno: 99), because my public IP isn't mapped locally, I guess (still learning a lot about networking). I have tried 0.0.0.0 and 127.0.0.1, but it looks like it's doing a String == comparison to check. Does this problem go away if I put this on AWS or something? I'm trying to avoid the cloud because I'm broke, but I have a couple of ~15-year-old computers I can try to orchestrate. I have some domain names and TLS certificates I can use after this step.
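A minimal sketch of how such a mismatch could arise behind port forwarding, assuming the handshake really does compare the address the peer dialed against the address the node was configured with. The `Endpoint` type, `HandshakeResult`, and `checkHandshake` below are hypothetical names for illustration, not the library's API, and 203.0.113.7 is a documentation-range placeholder for a public IP.

```swift
// Hypothetical model of an address check during a cluster handshake.
// This is NOT the swift-distributed-actors implementation.
struct Endpoint: Equatable {
    var host: String
    var port: Int
}

enum HandshakeResult: Equatable {
    case accepted
    case rejectedAddressMismatch(expected: Endpoint, got: Endpoint)
}

/// Accepts the handshake only when the endpoint the peer dialed is
/// exactly the endpoint this node believes it has (plain equality).
func checkHandshake(selfEndpoint: Endpoint, targetedEndpoint: Endpoint) -> HandshakeResult {
    if selfEndpoint == targetedEndpoint {
        return .accepted
    }
    return .rejectedAddressMismatch(expected: selfEndpoint, got: targetedEndpoint)
}

// The node is configured with its LAN address, because that is the only
// address it can actually bind to.
let lanNode = Endpoint(host: "192.168.0.69", port: 8080)
// A remote peer reaches it through the router's public address instead.
let dialedViaRouter = Endpoint(host: "203.0.113.7", port: 8080)

let overLAN = checkHandshake(selfEndpoint: lanNode, targetedEndpoint: lanNode)
let overInternet = checkHandshake(selfEndpoint: lanNode, targetedEndpoint: dialedViaRouter)
print(overLAN)
print(overInternet)
```

Under that assumption, the two failures are two sides of the same problem: the node cannot bind to the public IP because no local interface owns it (that is what errno 99, EADDRNOTAVAIL on Linux, reports), and if it binds to the LAN address, remote peers address it by a name it does not recognize as its own.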

An aside, I noticed that swift-cluster-membership is using UDP, but swift-distributed-actors is using TCP. How come?

Raft

My only experience with distributed systems/clusters is Raft. I'm about 2 days into implementing Raft using distributed actors here.

I was originally thinking of hooking into actorSystem.cluster.events and publishing the current leader, but it doesn't look like there are any public APIs to publish cluster events right now.

For now, I figured I should focus on getting a complete working implementation before trying to mix in Swim. Is an implementation of Raft appropriate to use for distributed actors? Why Swim as the first ActorSystem? I have only taken one class on Distributed Systems, so still learning a lot, but I found it fascinating.
Long term, I was thinking of a Swift version of ZooKeeper or something.

Some context, just graduated and still looking for my first software engineering job, and I have a lot to learn!

Any guidance is greatly appreciated.

Thanks,
Zane

jaleel · May 7, 2025, 9:49am

Hey! Nice to see some interest in the distributed actors topic in Swift!

Regarding the issue: I remember it working for me, will double-check and get back to you. For remote nodes it's probably better to use some sort of discovery. The easiest would be clusterd; I'll try to push so it can land soon.

Think you can find some free stuff like render (but I haven't touched it in a while).

You mean Raft implemented using distributed actors? Why not, sounds interesting. Though never implemented one myself, so take with a grain of salt. Remember seeing something similar from @ktoso with Akka.
But tbh I have a feeling it could be done without the cluster system.

Think Raft and Swim have different purposes, imho Swim works great for this kind of stuff—simple and effective.

What exactly do you have in mind? I'm sometimes thinking of something like Zookeeper, but when implementing apps using distributed actors it's a bit hard to justify, you already have things out of the box.

Nice to see anyway, and don't worry: I felt the same several years ago, even with years of iOS experience. Distributed systems is not an easy topic and I still feel like I need to learn a lot!

zaneenders · May 8, 2025, 5:22am

Swift is my favorite language right now, and distributed actors is the feature I'm most excited about.

Thanks, I might check it out. Currently, I'm having fun learning about networking, Linux/Docker, what should live inside and outside the DistributedActorSystem, and where the limits might be. So I'm hacking my own networking stack together using NIO, TLS, and my own hacky JSON "RPC" calls between nodes. Currently I've bumped into this error:

SSL error: handshakeFailed(NIOSSL.BoringSSLError.sslError([Error: 268435703 error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER at /.build/checkouts/swift-nio-ssl/Sources/CNIOBoringSSL/ssl/tls_record.cc:125]))

Oh, wait, that's cool! I will take a look. Thanks!

I guess I don't really wanna rebuild ZooKeeper.
I have a lot of ideas about how I might want to interact with the multiple computers I deal with. But getting at least 3 to agree on a log is half of step one in my mind. I'm gonna cross that bridge one working component at a time. I also just think it's really cool to have multiple computers agree on something. I figure I can branch out on what they are agreeing on after that.
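For a sense of what "at least 3 agree on a log" means in Raft terms, here is a minimal sketch of the majority arithmetic. The function names are hypothetical, and it leaves out the rest of Raft's commit rule (the entry at that index must also be from the leader's current term before it counts as committed).

```swift
/// Smallest number of nodes that forms a strict majority of the cluster.
func quorumSize(clusterSize: Int) -> Int {
    clusterSize / 2 + 1
}

/// Highest log index that is stored on at least a majority of nodes.
/// `matchIndexes` holds one entry per node, the leader included.
func highestMajorityIndex(matchIndexes: [Int]) -> Int {
    precondition(!matchIndexes.isEmpty, "need at least one node")
    // Sort descending: the element at position (quorum - 1) is reached
    // by at least `quorum` nodes.
    let sorted = matchIndexes.sorted(by: >)
    return sorted[quorumSize(clusterSize: sorted.count) - 1]
}

// Three nodes: two must agree, so one node may be down or lagging.
let threeNodeQuorum = quorumSize(clusterSize: 3)
// Leader has entries up to index 5, followers up to 3 and 1.
let replicated = highestMajorityIndex(matchIndexes: [5, 3, 1])
print(threeNodeQuorum, replicated)
```

With three nodes the quorum is two, which is why three is the smallest cluster that can keep making progress with one node unavailable.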

I also think this language feature might enable a lot of flexibility, not having to strictly depend on RPC semantics. I briefly recall someone telling me, or a paper I skimmed saying, that Google's servers spend roughly 15% of their time on RPC calls, but don't quote me on that.

Anyways, I can share what I learn, if anyone is interested.

ktoso · May 8, 2025, 2:04pm

It's a pretty broken toy implementation I did more than a decade ago, but yeah you can skim it for fun

Would be really good to have a Raft impl indeed, it'd likely replace membership of the current cluster system. It's a difficult task though, so make sure to set your expectations right -- can be a fun project, but a serious impl is a lot of work (on correctness).

I'm not entirely clear on the error, maybe NIO folks can remind us here what that might be @FranzBusch @lukasa.

johannesweiss · May 8, 2025, 7:58pm

BoringSSL's WRONG_VERSION_NUMBER almost always means that one peer is speaking plain text and the other one tries TLS. The TLS protocol communicates a version number and if those bytes don't match the expected values it'll throw that error.

Often that's a client connecting and trying to establish TLS but the server thinking it's plaintext HTTP (or so). Then it replies with HTTP 400 Bad Request or similar. But those bytes of course aren't a valid TLS packet and indeed contain the wrong bytes where the version number is supposed to be (the two bytes right after the content-type byte, IIRC).
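A rough sketch of that check, assuming the standard TLS record layout (one content-type byte followed by a two-byte version). `looksLikeTLSRecord` is a hypothetical helper for eyeballing captured bytes, not something NIO or BoringSSL exposes, and it is deliberately looser than a real TLS parser.

```swift
/// Returns true when the first three bytes look like a TLS record header:
/// a known content type followed by a 3.x record version.
func looksLikeTLSRecord(_ bytes: [UInt8]) -> Bool {
    guard bytes.count >= 3 else { return false }
    let contentType = bytes[0]
    let majorVersion = bytes[1]
    let minorVersion = bytes[2]
    // 20 = change_cipher_spec, 21 = alert, 22 = handshake, 23 = application_data
    let knownContentType = (20...23).contains(contentType)
    // Record versions are 0x03 0x00 through 0x03 0x04 (SSL 3.0 up to TLS 1.3).
    let plausibleVersion = majorVersion == 3 && minorVersion <= 4
    return knownContentType && plausibleVersion
}

// Start of a TLS handshake record: type 0x16, version 0x0301, then length.
let tlsPrefix: [UInt8] = [0x16, 0x03, 0x01, 0x02, 0x00]
// What a plaintext HTTP server would send back instead.
let httpReply = Array("HTTP/1.1 400 Bad Request\r\n".utf8)

print(looksLikeTLSRecord(tlsPrefix))
print(looksLikeTLSRecord(httpReply))
```

If the bytes arriving on the TLS side start with something like `HTTP/` or `{`, one end of the connection is most likely missing its TLS handler.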