Swift Raft implementation

Hi folks,
As an exploration exercise, I am looking into server-side Swift and trying Swift outside the app world. I have decided to implement the Raft consensus algorithm (https://raft.github.io/raft.pdf). Raft is easy to understand but still provides many puzzles to solve.

As a reference implementation, I chose the C++ LogCabin project. The protocol itself is generic, though, and can be used by other distributed applications.

So far I have finished the vote logic and started on the distributed log. I have some questions that I hope you can help me with.

  • Is there a good example of working with the file system outside of the Foundation module?
  • Are there any best practices for IO with NIO?

Here is my current progress: GitHub - makadaw/swift-raft: Raft protocol implementation in Swift
If anyone is interested in helping, I will be glad to discuss.
Thanks!

6 Likes

Maybe Swift System or TSCBasic.FileSystem?

2 Likes

cc @ktoso @drexin

1 Like

Very cool to see this work kicked off @makadaw!

A Raft implementation for Swift is very interesting to us. I'll watch the repo and perhaps can make some time to help out a bit. I worked on an implementation a few (many) years ago and made tons of small mistakes in it along the way -- the protocol is seemingly "simple" but there's a ton of subtle things to get wrong in it :slight_smile: Happy to help get it correct and efficient.


Suggestion: consider using actors right away.

I would really encourage expressing the protocol implementation in terms of actors. You can download a snapshot toolchain here: Swift.org - Download Swift, and enable actors by passing -enable-experimental-concurrency. You can enable the flag for your package by following: Can "-Xfrontend -enable-experimental-concurrency" be enabled in a Package? - #3 by Diggory

Since you say this is a project to explore and exercise, let's not stick to the pre-actors world, as I assume there's no necessity for doing so? :slight_smile: Modeling these protocols as actors is absolutely going to be the future IMHO.

I would suggest focusing on the core algorithm and building a large test suite confirming its correctness first, and we'll get it distributed soon enough :slight_smile: A strong test suite exercising the protocol expressed as actors could be invaluable :star_struck: If you need any help getting things set up, let me know -- for now you'd need to use runAsyncAndBlock { ... } in the XCTests, but this will not be necessary eventually.
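
To make the suggestion a bit more concrete, here is a minimal sketch of vote handling expressed as an actor, assuming a toolchain where actors are available. The names (`VoteState`, `RaftNode`, `handleVoteRequest`) are made up for illustration and are not taken from the swift-raft repository, and the log up-to-date check from the Raft paper is reduced to a Bool parameter to keep things short.

```swift
// Hypothetical names throughout; not taken from the swift-raft repository.
typealias Term = UInt64
typealias NodeID = String

/// Pure vote bookkeeping, kept as a value type so it can be tested synchronously.
struct VoteState {
    private(set) var currentTerm: Term = 0
    private(set) var votedFor: NodeID? = nil

    /// Decides whether to grant a vote, following the RequestVote receiver rules
    /// of the Raft paper. The log comparison is assumed to be done by the caller.
    mutating func handleVoteRequest(term: Term,
                                    candidate: NodeID,
                                    candidateLogIsUpToDate: Bool) -> Bool {
        // Reject candidates from an older term.
        if term < currentTerm { return false }
        // Seeing a newer term moves us forward and clears any earlier vote.
        if term > currentTerm {
            currentTerm = term
            votedFor = nil
        }
        // Grant at most one vote per term, and only to an up-to-date candidate.
        guard candidateLogIsUpToDate,
              votedFor == nil || votedFor == candidate else { return false }
        votedFor = candidate
        return true
    }
}

/// The actor serialises access to the state, so concurrent requests cannot
/// both be granted a vote in the same term.
actor RaftNode {
    private var votes = VoteState()

    func requestVote(term: Term,
                     candidate: NodeID,
                     candidateLogIsUpToDate: Bool) -> Bool {
        votes.handleVoteRequest(term: term,
                                candidate: candidate,
                                candidateLogIsUpToDate: candidateLogIsUpToDate)
    }
}
```

Keeping the decision logic in a plain struct means most of the test suite can run synchronously, with the actor only adding isolation on top.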


Thanks a lot for not pulling in Foundation here! That's a great decision for a library such as this.

API-wise I think there are two options. One is https://apple.github.io/swift-nio/docs/current/NIO/Structs/NonBlockingFileIO.html, but its API is a bit limited (on purpose, kind of, I guess), so perhaps you can get away with it. Otherwise I agree that the "right" API to use here is Swift System, although it is multi-platform, not cross-platform, so we're likely to need some small shim over it to issue "append to log" commands etc. But this should be easy enough to build, as the number of operations you'd need for the log is not too crazy.

Eventually I'd hope Swift System or some other "async" APIs for IO to be provided by Swift to fit into the new async/await world, but we're not there yet. Until then all those calls will be blocking, so you'll need to put them "somewhere" in order not to block the event loop. For that purpose you could use a dedicated "IO" thread pool, by creating an instance of https://apple.github.io/swift-nio/docs/current/NIO/Classes/NIOThreadPool.html and using it everywhere you need to perform this IO work.

Even with Actors this is true, so actors would be doing all the raft protocol work on their dispatchers, and for the log interactions we'd do the operations on the NIOThreadPool.

There are good ways to integrate EventLoopFuture with async/await so you can await on an ELF (EventLoopFuture) -- let me know if you get to that point and I'm happy to help, but it boils down to copy-pasting this snippet: https://github.com/apple/swift-nio/pull/1701

NIO of course is an excellent choice for the networking, but we should try to keep distributed protocol implementations somewhat independent of NIO. You could have a look at how we built such distributed protocol implementations in GitHub - apple/swift-cluster-membership: Distributed Membership Protocol implementations in Swift

There we implemented SWIM in a way that does not expose the fact it uses NIO in the "core protocol" implementation. That amount of separation may be too annoying here, but I think it would already be fantastic if we modelled peers as actors with async functions on them, and implemented the messaging they do there with NIO as a start (so a peer would take some "transport" object that has NIO inside and send messages through that etc.).
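
Roughly what I mean by a peer taking a "transport" object, as a sketch only. Every name here (`Transport`, `Heartbeat`, `RecordingTransport`, `Peer`) is hypothetical and not an existing API; a NIO-backed type would conform to the same protocol, while tests use an in-memory one.

```swift
// All names here are hypothetical, for illustration only.
struct Heartbeat: Sendable, Equatable {
    let term: UInt64
    let leader: String
}

/// What the protocol core needs from the network. A NIO-backed type would
/// conform to this in a separate target.
protocol Transport: Sendable {
    func send(_ message: Heartbeat, to peer: String) async throws
}

/// In-memory transport for tests: records messages instead of sending them.
actor RecordingTransport: Transport {
    private(set) var sent: [(peer: String, message: Heartbeat)] = []

    func send(_ message: Heartbeat, to peer: String) async throws {
        sent.append((peer: peer, message: message))
    }
}

/// The peer knows nothing about NIO; it only talks to `Transport`.
actor Peer {
    let id: String
    private let peers: [String]
    private let transport: any Transport
    private var term: UInt64 = 1

    init(id: String, peers: [String], transport: any Transport) {
        self.id = id
        self.peers = peers
        self.transport = transport
    }

    /// Sends one heartbeat to every other known peer, in order.
    func broadcastHeartbeat() async throws {
        let message = Heartbeat(term: term, leader: id)
        for peer in peers {
            try await transport.send(message, to: peer)
        }
    }
}
```

With this shape the core target has no NIO import at all, and swapping the transport in tests is a one-line change.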


Overall, this is very exciting and I'd love to help out in some way so you can be successful with this implementation! :slight_smile:

4 Likes

Swift.System looks like what I need, thanks!

1 Like

Yes, the protocol is simpler than others, but still not trivial :slight_smile: I will appreciate the help.

This is actually a great suggestion. I will take a look at how to move to actors and also extract the NIO part into a separate target.
Tests are a must in such a project. I started with the implementation to see that it actually works :) And if I move the protocol outside of NIO, it will be easy to model events and state.

Thanks. Since all file IO will live in the log, it should be easy to convert it to an actor and do all threading inside the implementation.

Sure, I started using NIO directly mostly because it was easy, as I used the Swift-gRPC server/client. I have seen the SWIM implementation and I think it's possible to do that kind of separation, especially if all async operations are modelled with actors.


Thank you for such a detailed answer!

1 Like

Swift System is fairly new and there are still large swaths of API we have yet to develop: API Roadmap. I'd be very interested to know your experience using System and what you find lacking in it. We have a subforum here.

1 Like

Sure, give me some time to play with the library. As a first note, I don't know if this is a known issue, but Swift crashes when building a project with a Swift System dependency (snapshot toolchain, Swift System from master, not 0.0.1). Trying to understand why.

There was an assert that got fixed in SILGen: Fix assertion failure when accessing a property with a __consuming getter [5.4] by slavapestov · Pull Request #35897 · apple/swift · GitHub, but I don't know if that has gotten incorporated into a recent snapshot.

This commit is the latest commit that doesn't hit that error, but it doesn't have all the FilePath convenience operations on it. You can work around the compiler error by manually removing __consuming from the computed var here.

Cool, I will try the workaround to move forward. Bisecting led me to exactly this commit.

This is awesome. Thanks for your work, it's on our roadmap here too. I'll have our cluster guys review as well.

Hi @ktoso, I have made a PR, Extract consensus logic into separate target by makadaw · Pull Request #1 · makadaw/swift-raft · GitHub, to separate the consensus logic from NIO.
I hope I have understood you correctly. Could you please take a look and see if it makes sense?

1 Like

Fantastic, pardon the delay — a bit of a crazy week this week for me but I’ll give it a look soon!

Was able to have a look in the evening, this is looking like the right direction! :+1:
I'll add some more in-line detailed comments tomorrow, but nothing big AFAICS.

You're making good type choices wrt. time etc. by the way :+1:

1 Like

Thanks! I would appreciate the comments. In the meantime, I am looking into the file API for the file log.

Hi folks. I'm still working on the implementation in my free time. I will post updates about the progress here.

I have started to use Swift System for FS access. @Michael_Ilseman, after switching to the Swift snapshot from 2021-02-16, master compiled without any problems. The package provides almost everything that I need, and all the missing parts are on the roadmap. Temporarily, I have implemented those methods with FileManager (I need directory operations and access to the cache folder for tests). I found an issue with the group/others permissions on file creation and raised a ticket on GitHub.

Moved the Log type into the SwiftRaft target and integrated it into the actor: Add basic support for log by makadaw · Pull Request #5 · makadaw/swift-raft · GitHub

Wrote a simple append-entries routine that covers the heartbeat case: Add append message actor by makadaw · Pull Request #6 · makadaw/swift-raft · GitHub
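
For readers following along, the receiver side of AppendEntries as described in the Raft paper can be sketched roughly like this. It is an illustration with hypothetical names (`LogEntry`, `Follower`, `appendEntries`), not the code from the PR, and it keeps the log in memory only. A heartbeat is simply a call with an empty `entries` array.

```swift
// Hypothetical sketch of the AppendEntries receiver rules; not the PR's code.
struct LogEntry: Equatable {
    let term: UInt64
    let command: String
}

struct Follower {
    private(set) var currentTerm: UInt64 = 0
    private(set) var log: [LogEntry] = []   // log[0] holds Raft index 1
    private(set) var commitIndex = 0

    /// Returns true if the request was accepted.
    mutating func appendEntries(term: UInt64,
                                prevLogIndex: Int,
                                prevLogTerm: UInt64,
                                entries: [LogEntry],
                                leaderCommit: Int) -> Bool {
        // 1. Reject requests from a stale leader.
        guard term >= currentTerm else { return false }
        currentTerm = term

        // 2. The entry just before the new ones must exist and match.
        if prevLogIndex > 0 {
            guard prevLogIndex <= log.count,
                  log[prevLogIndex - 1].term == prevLogTerm else { return false }
        }

        // 3. Drop a conflicting suffix, then append whatever is missing.
        for (offset, entry) in entries.enumerated() {
            let index = prevLogIndex + offset + 1   // 1-based Raft index
            if index <= log.count {
                if log[index - 1].term == entry.term { continue }
                log.removeSubrange((index - 1)...)
            }
            log.append(entry)
        }

        // 4. Advance the commit index, never past the last entry covered by
        //    this request and never backwards.
        let lastNewIndex = prevLogIndex + entries.count
        commitIndex = max(commitIndex, min(leaderCommit, lastNewIndex))
        return true
    }
}
```

Written as a value type like this, each rule can be covered by a small synchronous test before the routine is wired into an actor.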

@ktoso, I would appreciate a second opinion. And if someone has time to help, I have created a few issues.

Next I will focus on log append messages.

3 Likes

Awesome, I'm watching the repo and will be reviewing the PRs :slight_smile:

I think you'll also love what Jepsen has just released: https://twitter.com/jepsen_io/status/1366777900075151368

An interactive workbench for any language to poke around and test distributed systems algorithms, including simulated latencies, message loss and visualizations of message exchanges :slight_smile: It would be most fantastic to find a way to use the workbench with this Raft implementation. I ticketified that: Consider using the jepsen workbench: maelstorm with swift-raft · Issue #7 · makadaw/swift-raft · GitHub

2 Likes

Following up on this, we just merged a workaround so you can use compiler toolchains that previously hit an erroneous assertion.

Could you share a link to the missing System features that you have had to implement yourself?

1 Like

Nice. Also, the snapshot toolchain (from the beginning of March) has worked without problems :+1:

Sure
https://github.com/makadaw/swift-raft/blob/main/Sources/RaftNIO/Log/FilePath%2BRaft.swift
I did a quick, incomplete implementation so I could keep using FilePath everywhere.

4 Likes

Hi all. In the last few weeks I haven't had a lot of time to work on the project, but I did make a few changes.

I have written an integration with the Maelstrom tool (GitHub - jepsen-io/maelstrom: A workbench for writing toy implementations of distributed systems.) so I can use Jepsen tests with my implementation.

Separated the protocol logic from time. It is now two separate modules: one owns only the logic, and the other integrates with the NIO event loop to control time.
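
As a rough illustration of that split (hypothetical names, not the actual module API): the logic never reads a clock itself, and the driver, for example a task scheduled on an event loop, passes the current time in.

```swift
// Hypothetical sketch: time is an input to the logic, never read by it.
struct ElectionLogic {
    enum Action: Equatable { case wait, startElection }

    let timeoutMillis: UInt64
    private(set) var lastHeartbeatMillis: UInt64

    init(timeoutMillis: UInt64, nowMillis: UInt64) {
        self.timeoutMillis = timeoutMillis
        self.lastHeartbeatMillis = nowMillis
    }

    /// Called when a valid heartbeat arrives; resets the election timer.
    mutating func heartbeatReceived(nowMillis: UInt64) {
        lastHeartbeatMillis = nowMillis
    }

    /// Called periodically by the driver with the current time.
    func tick(nowMillis: UInt64) -> Action {
        nowMillis >= lastHeartbeatMillis + timeoutMillis ? .startElection : .wait
    }
}
```

Tests can then step time by hand instead of sleeping, which keeps timeout behaviour deterministic.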
The protocol logic is a pure actor implementation, which is then glued to the NIO event loop in

5 Likes

Hi all, I am still working on the project :slight_smile:

I have fixed a few problems with the Maelstrom protocol; now it works fast and steady.
Did several migrations to new snapshots and fixed an availability problem to get the app running.
Adopted Deque from the fresh Swift Collections module, which works like a charm.

Next month I plan to finally start work on the replicated state machine.

5 Likes