New Approach to Data Concurrency with Forked

drewmccormack · December 14, 2024, 11:55am

I've just launched Forked, a new open source Swift framework for managing data concurrency, both on device and across networks.

Forked is Git for Shared Data in Swift

You can think of Forked as being like Git for Swift structs. It provides a Sendable data type, ForkedResource, which manages concurrent versions of a value (e.g. a struct). As with Git, you can set up branches (known as forks) and update them independently, then merge forks to combine results as needed.

The Decentralized App

Forked turns the common wisdom about data concurrency on its head. Rather than force serial access to shared data, as with locks, queues, and actors, Forked embraces data concurrency and sees it as a natural part of any app. With Forked, even a single process is an intrinsically decentralized system.

In an app, the UI may be doing something independent of the sync engine, which in turn has no knowledge of the background importer or web service downloader. Even the UI may be partitioned, with editing contexts working on staged data not yet committed to disk.

These subsystems demand a clear and simple way to share data and reconcile conflicts. As a developer, you should not lose sleep over questions like “What will happen if sync data arrives right when the user finishes editing?” or “Is there a possible race condition here which can arise but is very difficult for me to understand and test?”

With Forked, you assign a fork to each competing interest, and allow that subsystem to update its own copy of the data, and merge in changes from other forks when convenient. You never lose or clobber (accidentally write over) data.

Advanced Merging

One part of the Git comparison has been glossed over to this point: How do you handle merge conflicts? With a source control system, the developer themselves decides how to resolve conflicts. How does Forked automate that process?

If you wish, you can come up with your own merging algorithm, and add conformance to the Mergeable protocol. This protocol supports 3-way merging: it is passed two conflicting versions of the value, as well as the common ancestor value from when the two diverged. Using this information, you can diff with the common ancestor to see what changed, and decide how it should be resolved.
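As a rough illustration of the 3-way merging idea described above, here is a minimal sketch. The protocol `ThreeWayMergeable`, its method signature, and the `Title` type are hypothetical stand-ins invented for this example; they are not Forked's actual Mergeable declaration, so check the package docs for the real requirement.

```swift
// Hypothetical stand-in for a 3-way merge requirement (NOT Forked's real protocol).
protocol ThreeWayMergeable: Equatable {
    /// `self` is treated as the dominant version, `other` as the subordinate one.
    func merged(with other: Self, commonAncestor: Self) -> Self
}

// Hypothetical value type used only for this example.
struct Title: ThreeWayMergeable {
    var text: String

    func merged(with other: Title, commonAncestor: Title) -> Title {
        // This side is unchanged since the forks diverged: take the other side.
        if self == commonAncestor { return other }
        // Only this side changed, or both changed: keep the dominant value.
        return self
    }
}

let ancestor = Title(text: "Draft")
let uiVersion = Title(text: "Draft")     // untouched by the UI
let syncVersion = Title(text: "Final")   // changed by sync
let mergedTitle = uiVersion.merged(with: syncVersion, commonAncestor: ancestor)
```

Without the common ancestor, the merge could not tell "the UI kept the old value" apart from "the UI deliberately set this value", which is why the third input matters.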

For most developers, though, it is better to use the merging support provided in the package ForkedMerge. It includes various powerful merging algorithms, including some using the latest Conflict-free Replicated Data Types (CRDTs), which usually produce a merged result that seems more natural to people.

Data Model with Structs

Even easier than using the algorithms in ForkedMerge is defining a data model using ForkedModel.

Think of this as SwiftData lite, with value types. You use Swift macros to define the merging policies of the properties in your struct. Here is a simple example:

@ForkedModel
struct Forker: Identifiable, Codable, Hashable {
    var id: UUID = .init()
    var firstName: String = ""
    var lastName: String = ""
    var company: String = ""
    var birthday: Date?
    var email: String = ""
    var category: ForkerCategory?
    var color: ForkerColor?
    @Merged var balance: Balance = .init()
    @Merged var notes: String = ""
    @Merged var tags: Set<String> = []
}

Without going into too much detail, the properties with no @Merged attribute get merged atomically, in a property-wise fashion. The ones with @Merged are either using a custom Mergeable implementation, or one of the standard merge algorithms provided for String, Array, Set, and Dictionary.
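To make "merged atomically, in a property-wise fashion" a little more concrete, here is a hand-written sketch of what such a merge could look like. The `Contact` type and both helper functions are hypothetical and written only for illustration; this is not the code the macro generates.

```swift
// Hypothetical model used only for this sketch.
struct Contact: Equatable {
    var firstName: String
    var email: String
}

/// Resolves a single property as an indivisible unit: take whichever side changed
/// relative to the ancestor; if both changed, the dominant value wins outright.
func mergeAtomically<T: Equatable>(dominant: T, subordinate: T, ancestor: T) -> T {
    dominant == ancestor ? subordinate : dominant
}

/// Applies the atomic rule to each property independently.
func mergePropertyWise(dominant: Contact, subordinate: Contact, ancestor: Contact) -> Contact {
    Contact(
        firstName: mergeAtomically(dominant: dominant.firstName,
                                   subordinate: subordinate.firstName,
                                   ancestor: ancestor.firstName),
        email: mergeAtomically(dominant: dominant.email,
                               subordinate: subordinate.email,
                               ancestor: ancestor.email)
    )
}

let base = Contact(firstName: "Ann", email: "ann@old.example")
var edited = base
edited.firstName = "Anne"            // changed in one fork
var synced = base
synced.email = "ann@new.example"     // changed in another fork
let combined = mergePropertyWise(dominant: edited, subordinate: synced, ancestor: base)
```

Because each property is resolved on its own, edits to different properties in different forks both survive, while two edits to the same property are never blended together.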

CloudKit Support

One of the advantages of developing a data model incorporating merging from the very beginning is that when it comes time to add sync support, you don’t have to change anything.

The ForkedCloudKit package can sync a ForkedResource via CloudKit with other devices to form a truly decentralized, local-first app. This can be integrated into an existing Forked app in about 10 lines of code.

Learn More

The launch announcement is on AppDecentral.

The Forked repository on GitHub also includes extensive docs and sample apps.

ktoso · December 16, 2024, 4:54am

Always happy to see more CRDT impls out there! I've not had time to dive deep into how you handle the merges yet, but wanted to share my excitement about more libraries of this kind appearing in Swift.

drewmccormack · December 16, 2024, 8:27am

Thanks for the encouragement. The usage of CRDTs is a bit unusual. They are used basically as merge algorithms for 3-way merging. Forked makes a CRDT (e.g. a text-editing CRDT), diffs the changes between the common ancestor and one fork, and applies those to the CRDT. It then gets the diffs from the other fork, and applies those to the CRDT. The final value is taken from the CRDT and used, with the CRDT thrown away.
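The diff-and-apply flow described here can be sketched with a much simpler structure than a text CRDT. The example below uses a plain Set as the temporary working value; it is a deliberately simplified stand-in to show the shape of the process, not a real CRDT and not Forked's implementation. The function name `mergeByReplayingDiffs` is made up for this sketch.

```swift
/// Starts from the common ancestor, computes each fork's diff against that ancestor,
/// and replays both diffs onto a temporary working value that is then discarded.
func mergeByReplayingDiffs<T: Hashable>(_ forkA: Set<T>, _ forkB: Set<T>, ancestor: Set<T>) -> Set<T> {
    var working = ancestor                           // temporary merge structure
    for fork in [forkA, forkB] {
        let added = fork.subtracting(ancestor)       // diff: elements this fork inserted
        let removed = ancestor.subtracting(fork)     // diff: elements this fork deleted
        working.formUnion(added)
        working.subtract(removed)
    }
    return working                                   // only the plain value survives
}

let ancestorTags: Set<String> = ["swift", "git"]
let forkATags: Set<String> = ["swift", "git", "crdt"]   // added "crdt"
let forkBTags: Set<String> = ["swift"]                  // removed "git"
let mergedTags = mergeByReplayingDiffs(forkATags, forkBTags, ancestor: ancestorTags)
```

The caller only ever sees ordinary Set values going in and coming out; whatever bookkeeping the merge needs lives inside the function for the duration of the call.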

This doesn't give you all the advantages of a CRDT, in that it isn't recording times of actual keystrokes, but I find it still works pretty well as a merge algorithm. Better than most, and certainly better than discarding one of the conflicting fork's data altogether.

One advantage of this approach, where the CRDT is used temporarily and discarded, is that the memory usage of the type doesn't grow over time. And another advantage is that if there is a bug in the CRDT, or a change to implementation, it should not matter: the type is not embedded in your data, so the risk of having to properly migrate it etc is not there. The data type you work with is literally just your struct. No magic in the data at all.

numist · December 16, 2024, 6:50pm

This is a really nice API, great work Drew.

I guess the tradeoff here is that it reintroduces the possibility for "wrong answers"[1] because it's operating on diffs created at merge-time rather than a history of changes captured at edit-time. That's probably the right tradeoff for most applications, especially if your diff quality is good.

CRDTs are great at Collection-likes (including String), but scalar conflict-free types remain difficult to express in a friendly API because they require operator restrictions (e.g. numbers under only exponentiation, only * and /, or only + and -). Curious if you have any thoughts/aspirations on the topic?

[1] Assuming the CRDT is maintaining its conflict-free nature here and "picking a winner" based on the diff's application.

drewmccormack · December 16, 2024, 7:33pm

Yes, there is a tradeoff, but the CRDT merge is still typically better than most other merges. E.g. it won’t interleave letters in words entered concurrently. You can play with the text merging in the Forked Model sample, and see it generally does a good job.

Scalars can be handled by certain CRDTs. Eg a simple counter keeps subcounts for each peer, ie a dictionary of subcounts, and sums them to produce a result.
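A counter of the kind described above might look like the following sketch. It only supports increments (a grow-only counter), and the `PeerCounter` type and its method names are hypothetical; nothing like it is claimed to exist in Forked.

```swift
// Hypothetical grow-only counter: one subcount per peer, summed to give the value.
struct PeerCounter: Equatable {
    private(set) var subcounts: [String: Int] = [:]

    /// The visible value is the sum of every peer's subcount.
    var value: Int { subcounts.values.reduce(0, +) }

    /// Each peer only ever increases its own entry.
    mutating func increment(by amount: Int = 1, peer: String) {
        precondition(amount >= 0, "this sketch only supports increments")
        subcounts[peer, default: 0] += amount
    }

    /// Merging keeps the larger subcount per peer, so repeated or reordered
    /// merges give the same result and no increment is counted twice.
    func merged(with other: PeerCounter) -> PeerCounter {
        var result = self
        result.subcounts.merge(other.subcounts) { max($0, $1) }
        return result
    }
}

var phone = PeerCounter()
phone.increment(by: 2, peer: "phone")
var laptop = PeerCounter()
laptop.increment(by: 3, peer: "laptop")
let mergedCounter = phone.merged(with: laptop)
```

Note that the dictionary of subcounts has to be stored with the data, which is the kind of embedded bookkeeping the ancestor-based approach avoids.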

I haven’t added that though, because in Forked, where you always have a common ancestor, it is pretty trivial to make a little struct which is Mergeable and accumulates a scalar. The sample code includes examples (AccumulatingInt, Balance).
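For comparison, here is a sketch of how an accumulating scalar can be merged when a common ancestor is available. The `AccumulatingCount` type and its method signature are hypothetical and written for this example; they are not the types from the Forked sample code, which should be consulted for the real versions.

```swift
// Hypothetical accumulating scalar; the stored data is just the number itself.
struct AccumulatingCount: Equatable {
    var value: Int = 0

    /// Each fork's change is its delta from the common ancestor, so the merged value is
    /// ancestor + (self - ancestor) + (other - ancestor), which simplifies to the line below.
    func merged(with other: AccumulatingCount, commonAncestor: AccumulatingCount) -> AccumulatingCount {
        AccumulatingCount(value: value + other.value - commonAncestor.value)
    }
}

let startingCount = AccumulatingCount(value: 10)
let afterDeposit = AccumulatingCount(value: 13)      // one fork added 3
let afterWithdrawal = AccumulatingCount(value: 8)    // another fork subtracted 2
let mergedCount = afterDeposit.merged(with: afterWithdrawal, commonAncestor: startingCount)
```

Unlike the per-peer counter, this handles decrements as well as increments and needs no per-peer state, at the cost of requiring the ancestor at merge time.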

It might be an idea to try to build some type of scalar accumulator in. Have to think about how best to do that in the API.