Introducing Automerge: enable collaborative, asynchronous syncing for your data structures

Joseph_Heck · October 21, 2023, 5:40pm

The Automerge-swift library enables a collaborative, multi-user editing experience akin to Apple's Notes app or Google Docs, or fully offline (local-first) editing with later seamless syncing of those updates. Using Automerge, you can treat cloud services as a convenience rather than a critical black-box linchpin that your app's functionality depends on. Read through the 5-Minute Quick Start in the API documentation to get a quick sense of how to use the library.

Automerge-swift brings the popular open-source CRDT library Automerge to Apple platforms (iOS, macOS, and Mac Catalyst). Automerge was originally written in JavaScript; the Automerge team later rewrote the core in Rust for improved performance and cross-platform use. Using this core library, Automerge provides cross-platform and cross-language binary compatibility for its documents, allowing you to sync between browser or server-side apps and native iOS and macOS apps. The 0.5.2 release matches the latest Rust core library, which corresponds to the JavaScript Automerge 2.x releases.

The Swift library brings support for reading or writing any Codable data structure with an Automerge document, and provides a few reference types (AutomergeText and Counter) that update an Automerge document directly for more performant use with SwiftUI controls.
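As a minimal sketch of the Codable support described above, the following round-trips a struct through an Automerge document using `AutomergeEncoder` and `AutomergeDecoder`. The `MeetingNote` type and its fields are hypothetical, made up for illustration; check the API documentation for the exact initializer options in the release you use.

```swift
import Automerge

// Hypothetical model type, not part of the library.
struct MeetingNote: Codable, Equatable {
    var title: String
    var attendees: [String]
}

let doc = Document()

// Write the Codable value into the Automerge document.
let encoder = AutomergeEncoder(doc: doc)
try encoder.encode(MeetingNote(title: "Kickoff", attendees: ["Ana", "Ben"]))

// Read it back out of the same document.
let decoder = AutomergeDecoder(doc: doc)
let restored = try decoder.decode(MeetingNote.self)
print(restored.title)
```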

Along with the library, the project provides an open-source cross-platform (iOS, macOS) demo app, MeetingNotes, with app-level documentation and a walk-through of how the app uses Automerge to support a SwiftUI document-based app. The README in the MeetingNotes GitHub repository includes links to videos showing interactive sync over a Bonjour network, as well as offline editing with later sync and integration of those edits.

jaleel · October 25, 2023, 11:38am

Interesting, nice implementation! thx for sharing!

drewmccormack · October 28, 2023, 10:09am

This looks really interesting. Great job!

It seems to be mostly a document-based format, which would mean uploading a whole store for a sync. Is there an option to use it as an operation-based CRDT, where the store syncs incrementally? E.g., in a shoebox app?

Is there support for CloudKit? Or at this point is it a question of uploading the whole document on each change?

Joseph_Heck · October 28, 2023, 3:58pm

Hey @drewmccormack

It is a document-based format, but within the document you can grab the individual changes to sync as a delta-style update, and the sync mechanism that's built in is specifically built to do that, minimizing the size of the sync messages so that you don't have to replicate the entirety of the document to get to a consistently updated state.
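A rough sketch of that built-in sync exchange, with two documents in one process standing in for two peers. In a real app each message would travel over whatever transport you choose; here they are handed across directly. The document contents are made up for illustration.

```swift
import Automerge
import Foundation

// Two peers; only alice has content to begin with.
let alice = Document()
try alice.put(obj: ObjId.ROOT, key: "title", value: .String("Agenda"))
let bob = Document()

// Each side tracks what it believes the other side already has.
let aliceState = SyncState()
let bobState = SyncState()

// Exchange messages until neither side has anything left to send.
var quiet = false
while !quiet {
    quiet = true
    if let message = alice.generateSyncMessage(state: aliceState) {
        try bob.receiveSyncMessage(state: bobState, message: message)
        quiet = false
    }
    if let message = bob.generateSyncMessage(state: bobState) {
        try alice.receiveSyncMessage(state: aliceState, message: message)
        quiet = false
    }
}
```

Each `message` is an opaque `Data` value, so the same loop applies whether the bytes move over Bonjour, a WebSocket, or something else.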

Automerge-swift itself is network and filesystem agnostic - so there's no built-in, "you have to use" mechanism for transferring those sync messages. The API exposes them as blobs of bytes, and you can transfer them as you like. That does mean there needs to be another instance of Automerge running on the far side of whatever sync connection is happening, so in the case of iCloud/CloudKit, there's no Automerge receptor to sync and store your documents, but in the case of peers (for example, iOS apps), each instance can share with another.

For using CloudKit directly, you could store and manage the changes within records, but I'm not certain there are explicit gains there over using the file-oriented storage and syncing scenarios. The example app for this library (MeetingNotes) takes the file-oriented approach, and also illustrates a simple Bonjour-based network connection over which the app replicates data. I'd encourage you to look closely at the sample app, and I'm happy to answer questions or brainstorm things. The Automerge community at large is likewise pretty open to chatting - there's a Slack community link at the bottom of the Automerge.org web page.

Automerge as a project is actively expanding its "batteries built-in" strategy. As part of that, it just released updated related libraries (automerge-repo, in JavaScript) to host a sync server with support for multiple documents. That work was released quite recently and primarily uses a WebSocket network connection. Full integration with it is future work on the Swift library side of things: I'll be exploring and implementing the details first in the sample app, with the plan to extract the relevant parts and dependencies into an additional library to use with Automerge-swift.

drewmccormack · October 30, 2023, 2:49am

That's great to know. Makes it a lot more attractive.

Yeah, that is the case I was thinking of. E.g., a single app running on different devices (iPad and iPhone), each uploading deltas to CloudKit and downloading deltas from other devices. If I understand your response, this approach is supported, correct? Or do you really need to have a server in the cloud à la Git?

I was thinking more like uploading each delta as a separate record. Would probably need some management of these records on the client, but should be possible. Do you think this approach would be possible?

One other question I had was whether it is a CRDT that grows perpetually, or is there some internal cleanup that can coalesce "ancient" changes? This continual growth is one of the Achilles' heels of CRDTs for app adoption, I think, but it would certainly be possible to compress old history, accepting the possibility that there may be documents that are so far out of sync they can't be merged, and the app would have to pick one or the other.

Thanks for the tips! This is a direction I would love to see Apple moving. SwiftData is frankly just Core Data with lipstick on. (Core Data was first released for Mac OS X 10.4 Tiger, long before we lived in a decentralized app world.)

Joseph_Heck · October 30, 2023, 5:04pm

If you were confident that all the deltas were available, you could absolutely re-assemble them and use that directly. I suspect that would be a lot of additional overhead, but it absolutely should work. The individual "changes" aren't inspectable (intentionally) in any reasonable fashion outside of the parsing that Automerge does, so it's hard to know if you "have them all" or not - and if you miss a critical "middle change" that others depend on, I'm uncertain what the behavior would be.
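A hedged sketch of the delta-per-record idea, using `encodeChangesSince(heads:)` and `applyEncodedChanges(encoded:)`. The `recordStore` array is a hypothetical stand-in for CloudKit records, and the bookkeeping (remembering which heads were already uploaded) is an assumption about how an app might manage this, not something the library does for you.

```swift
import Automerge
import Foundation

// The phone creates the document and makes a first edit.
let phone = Document()
try phone.put(obj: ObjId.ROOT, key: "title", value: .String("Groceries"))

// The iPad starts from a full copy of the saved document.
let pad = try Document(phone.save())

// Remember the heads that the other device is known to have.
var uploadedHeads = phone.heads()

// A later edit on the phone becomes one delta "record".
try phone.put(obj: ObjId.ROOT, key: "done", value: .Boolean(true))
let delta = try phone.encodeChangesSince(heads: uploadedHeads)
uploadedHeads = phone.heads()

// Hypothetical stand-in for per-delta records in CloudKit.
var recordStore: [Data] = []
recordStore.append(delta)

// The iPad downloads and applies every record it has not yet seen.
for record in recordStore {
    try pad.applyEncodedChanges(encoded: record)
}
```

This only demonstrates the case where every delta arrives; what happens when an intermediate record is missing is exactly the open question raised above.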

This CRDT library does grow with all changes, yes - perpetually. When you wipe out the changes, you also wipe out the ability for any earlier copy to seamlessly merge with the document again. That said, you can "reset" the history yourself if there are good spots to do so for your logic - it's a matter of replicating the current state into a new Automerge document, and away you go. If you're using Automerge for "collaborative sessions" that don't need a long-term history to cleanly merge, this is a perfect way to handle the situation, and you can even use Automerge's document format as a sort of "ephemeral-while-there's-an-active-session" mechanism, and keep your overall app's logic using its own separate persistence.
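One way the "reset" could look, as a sketch: decode the current state out of the old document and encode it into a brand-new one. The `Notes` type is hypothetical, and going through Codable is just one assumed route for copying state across; the new document shares no history with the old one, so earlier copies will not merge cleanly into it.

```swift
import Automerge

// Hypothetical app model, not part of the library.
struct Notes: Codable, Equatable {
    var title: String
    var body: String
}

// An existing document that has accumulated some edit history.
let old = Document()
let oldEncoder = AutomergeEncoder(doc: old)
try oldEncoder.encode(Notes(title: "Draft", body: "v1"))
try oldEncoder.encode(Notes(title: "Draft", body: "v2"))

// Capture only the current state, dropping how it got there.
let current = try AutomergeDecoder(doc: old).decode(Notes.self)

// Start a fresh document that contains just that state.
let fresh = Document()
try AutomergeEncoder(doc: fresh).encode(current)
```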

drewmccormack · October 31, 2023, 7:36am

That's really useful, thanks. Pretty much in line with what I had expected.

Is the expectation that a single document can be fully loaded into memory? I'm thinking of a case like a Core Data store, which could be very large, and needs partial loading.

I guess given it is a JSON-like format, and goes into a single file, the assumption would be that the whole store is loaded atomically. If it were backed by, eg, a SQLite DB, I could imagine it could partially load data versions.

Joseph_Heck · October 31, 2023, 4:25pm

For Automerge, yes, it's all in memory at once. The CRDT data structures that provide the merging need everything available to process updates, and since Automerge is intentionally agnostic of network and persistence, it's all held in memory. If you push a huge (multi-GiB) collection into it, you're likely to get your app OOM-killed - the data itself is expanded by the metadata and historical updates around it that provide those seamless merges, which can be a significant increase in memory consumed over the end-result data.

This is another place where "resetting history" (in the case of intentionally ephemeral use cases) works very well with Automerge, and the serialized/persistence forms are carefully run-length encoded with an eye towards keeping that expansion of data minimal, so it's not as large of an expansion when saved on disk. When in memory, it's larger.