Taking GRDB beyond local

For a long time, when using Swift, I have used a home-grown SQLite solution. It is a simple approach that uses the JSON abilities of SQLite to behave as a simple, fast noSQL storage. However, with changing needs and with the stringent requirements in Swift 6 I am strongly exploring a move to GRDB. Core Data feels archaic and makes decoupling hard, Couchbase and Firebase are proprietary and make me dependent on a third party, Realm is abandoned, SQLite.swift is ... less than I had before. Whether I like GRDB or not (and I hope I will): it really feels like it is the only game in town. I know this sounds a bit like it is my last pick, but what I'm really trying to say is that there is a very strong motivation to move to GRDB.

GRDB is great for local storage and I know this is its goal. Nevertheless, for my own needs, that's not enough. I develop on both iOS and Android (Room, so also SQLite). I want devices to sync, so there will be some server (Rust, some SQL database) or ad-hoc communication in the middle. I don't want to be tied to a third-party. And security is always a concern. This brings me to three requirements that I am missing. Two of them are "easy", one is hard. Each requirement leads to a number of questions which I have thought about, but I really would appreciate insides from those with extensive GRDB experience or those that contribute to GRDB how to best handle these.

Database security
There's no way around SQLCipher and, having worked with it in the past, I do feel the pain. But I did not know about Apple's data protection. This gives rise to two questions (yes, I want GRDB 7):

  • is there a guide on how to (manually) include SQLCipher in a build alongside GRDB? Is it simply following the source integration outlined here: SQLCipher Community Edition - Adding Full Database Encryption for SQLite to iOS and macOS | Zetetic ? Will GRDB 7 still have the API to use a SQLCipher database or am I on my own?
  • is there a worked-out example of using GRDB alongside Apple's data protection? I'm using SwiftUI, so the database setup would happen right at the struct ...: App. This gives very little opportunity to work with the APIs such as applicationProtectedDataDidBecomeAvailable(_:). And I don't immediately have an idea how to handle a situation where access isn't available.

Knowing when the database changed
For any kind of syncing it is going to be important that I know when the database changed. I haven't worked out the whens and wheres, but the basic idea is that a database change triggers a synchronisation request. This can obviously fail for various reasons, and it should also try to synchronise using a schedule too, but these would be the entry point for any synchronisation. How to synchronise is not a concern for GRDB, but sync will require some extra data on top of what is stored through GRDB. At its simplest, it requires an extra isSynced column. This leads to two questions:

  • what is the best way to detect database changes for such a low-level (background) task? Triggers? Or any of the Database observations tools: GRDB Documentation – Swift Package Index ? Please note that this ties in with the next requirement, as only some observations will be of interest.
  • I want to build tools on top of GRDB instead of within GRDB if this is at all possible, akin to e.g. GRDPQuery. What is the best way to build an extension on top of GRDB that enforces additional columns?

I am aware of Harmony (GitHub - aaronpearce/Harmony: Harmony provides CloudKit sync for GRDB and only GRDB.) but I haven't checked the code in any great detail. This may (or may not) be a good starting point.

Handling conflicts
I don't want the server to handle conflicts, when each device can do it itself. This naturally brings me to Conflict-Free Replicated Data Types (CRDT). And this is its own can of worms. Since this is related to syncing it comes with some very similar questions.

Handling CRDT requires extra database fields. For example, a simple and often sufficient CRDT is the "last write wins". If I have a struct with different properties, some of which can change, then I can assume that the latest write is correct. I can handle this by keeping track of an audit table that records changes to the changeable properties. For example, if I have a Movie struct, with a title and director, I can assume these will never change. However, a property isWatched and rating are expected to change. This change can happen on multiple devices and they could be offline when the edit happens. This requires some extra fields such as a logical clock value to know which is first, and a flag whether the change has synced. I keep saying fields, but since there can be many we are really talking about an extra table here. The questions are as follows:

  • I can implement this database structure manually. Even better would be if some simple property wrapper or protocol did the work for me. This is similar to the question before but, really, this time it feels more like property wrappers are creating an extra table instead of a protocol adding some extra rows. Can it be done and where do I best look for it?
  • Implementing CRDT is hard and part of the problem is that they tend to grow quickly. For example, assume an undecided user who is offline. He/she sets the rating to 3, then to 5, then to 4. There is no use in syncing all these values as the last write wins. Since none of it synced yet, I can trivially collapse this into the last value. How can I handle this? To be precise, this would be an insert trigger on an extra (CRDT) table that itself can delete (and maybe insert?) rows. Are triggers the way to do this? The ultimate goal would be to collapse it into only a single (last) row where the rating is set to 4.

I know this is a lot. Many thanks for any insights on how best to tackle these problems!

1 Like

I don't have answers for you in terms of GRDB and database fields, but I am extremely familiar with CRDTs and their complexities - and tradeoffs. (I maintain the swift overlay for Automerge, as well as have a CRDT library that's basic pieces that can be picked up and used at a low level).

The really vicious part of CRDTs are sequences, and in particular - handling preservation - and updating/merging changes to - ordering within a sequence. For what you're describing, that doesn't sound like it's really needed, so you may be able to get away with using simpler CRDT techniques - in a database table, or maintained externally.

The tradeoff that leads to CRDTs growing in size like Japanese knotweed is maintaining history to allow for consistent merging to get the "CRDT" effect. For what you've described, a "last write wins" mechanism may well suit exactly what you're after. In terms of low-level CRDT bits, you really only need to know if the change that you want to merge happened before or after the change you have already recorded - and only apply that if it does. A simple vector clock (say, an integer that increments on each change) could do the trick - and to keep the memory down, you delete any previous changes and only keep the highest vector clock value.

A fair number of CRDTs tend to keep around the history so that they can go "back in time" if needed, and that's what bloats the size - but it's also critical to reasonable results in merging deltas from different changes when you have sequences or ordering involved.

As long as you trust the changes (and that's really a big "IF" - but permissions and security are a whole nother stew), and you track a simple vector-clock value, you'll have a good sense of how to merge things. It sounds like what you're after may be an ORMap, and you're welcome to go poking into CRDT/Sources/CRDT/GenericCRDTs/ORMap.swift at main · heckj/CRDT · GitHub to see how that wrangles the logic for maintaining consistency, but doesn't tend to grow too far without bounds. (Note: it still does grow, especially where you consider the action of removing a key from the map - that ends up being more of a "you never remove the keys, only the values" kind of scenario).

I can't help you with the GRDB logic to enable the specifics you're after, but I'm happy to talk you through the details of the logic for ORMap, either here or directly (msg on the forums, slack, or whatever) if that would help.

3 Likes

Hello @kim.bauters,

Thanks for your interest in GRDB!

I want devices to sync, so there will be some server (Rust, some SQL database) or ad-hoc communication in the middle. I don't want to be tied to a third-party.

At first glance, this matches my own experience. I've worked on GRDB apps that perform sync through a remote server with a custom synchronization protocol.

is there a guide on how to (manually) include SQLCipher in a build alongside GRDB

The GRDB repository was able to link with SQLCipher through CocoaPods, but CocoaPods development has ended. There is no available SPM package for SQLCipher (sqlcipher/sqlcipher#371. I opened a discussion in the GRDB GitHub repository so that people can share their solutions: Questions regarding the future of GRDB + SQLCipher.

is there a worked-out example of using GRDB alongside Apple's data protection?

The recommended technique is to store the database in a dedicated directory, and to apply the protection level to this directory. When a database is protected, an app that runs in the background on a locked device won’t be able to read or write from it. Instead, it will catch DatabaseError with code SQLITE_IOERR (10) “disk I/O error”, or SQLITE_AUTH (23) “not authorized” (I've seen both). The app can catch those errors and react accordingly (such as waiting for applicationProtectedDataDidBecomeAvailable notifications).

When necessary, apps can run two databases (one protected, one unprotected).

What is the best way to detect database changes? [...] Or any of the Database observations tools? [...] Handling conflicts

A "synchronization operation" is usually performed in three steps:

  1. Fetch the "sync payload" from the current state of the database (i.e. the data that the server should be aware of)
  2. Wait for the server response.
  3. Handle the server response (write whatever has to be written in the database).

The app usually wants to perform a sync operation at will. For example:

  1. When the app boots.
  2. When the app becomes active.
  3. When the app receives a remote notification and needs to prefetch values.
  4. When the user hits a "Sync now" button, or performs a pull-to-refresh.
  5. When the database was modified.
  6. etc.

GRDB database observation definitely helps for (5). I use ValueObservation for this task.

The nature of the sync payload depends on your synchronization protocol, and if you're using an operation-based CRDT, or a state-based CRDT (as described by the Wikipedia article Conflict-free replicated data type). I personally only have experience with state-based CRDTs.

A general advice applies: make sure all writes are performed by a dedicated type, so that this type can enforce whatever is needed for the sync to work reliably. This type hides a private DatabaseWriter. See AppDatabase in the demo app for the basic setup.

Some people use triggers to record changes.

In the apps I've worked on, I do not use triggers, but each write increments a Lamport timestamp. I can later find all find all records that have a timestamp greater than the timestamp of the last successful synchronization: these are the records I need to send to the server.

Example

For example, here is some code I can write. When the user modifies the notes attached to a "player", I want to make sure those notes are included in the next sync payload:

extension PlayerStore {
    /// Updates the notes attached to a player
    func updatePlayer(_ id: PlayerID, notes: String) async throws {
        try await writer.write { db in
            // Bump the Lamport timestamp so that we can mark
            // changes that must be included in the next sync payload.
            try SyncClock.tick(db) { timestamp in
                // Updates notes, and the player edition date for conflict handling
                var player = try Player.find(db, id: id)
                try player.updateChangesForEdition(db) {
                    $0.notes = notes
                }
                // Add the player to the next sync payload.
                try PlayerSync(id: id, timestamp: timestamp).update(db)
            }
        }
    }
}

The SyncClock record is stored in a Single-Row Table which tracks the Lamport timestamp of the last change (incremented on each change), and of the last successful sync (updated on each server response).

updateChangesForEdition was inspired by Record Timestamps and Transaction Date.

A synchronized player is stored in two tables: one for player data (Player), and one that tracks its relationship to sync (PlayerSync). I use two tables in order to implement "tombstone" (i.e. the memory of a deleted player): a PlayerSync without a matching Player.

@Joseph_Heck describes other techniques.

Two last questions you should ask yourself:

  • Can sync operations be performed concurrently? If not, they must be serialized.
  • What should happen when the database is modified after a sync payload was sent to the server but before the server response is received? In my own apps, I discard the response in this case: the next sync operation will send the new data.

I am aware of Harmony

Harmony targets CloudKit. Surely easier than building a custom sync protocol and server ;-)

I don't want the server to handle conflicts, when each device can do it itself.

That's not wrong, but that's debatable as well. In the apps I'be been working on, where all sync is performed through the server (there's no device-to-device sync), I find it extremely soothing to think that conflicts are handled by the server, because that's a single point of failure, I don't have to wait for Apple review before I can ship a fix, and I can fix corrupted data on a device with a well-crafted server response.

For example, a simple and often sufficient CRDT is the "last write wins".

If your CRDT relies on Date comparisons, get inspiration from Record Timestamps and Transaction Date.


In summary, GRDB does not ship with any built-in sync mechanism, but ships with the tools that make it possible to implement one. It was possible to build Harmony, and I was able to implement reliable sync protocols as well.

As you say, the most difficult part is to design a sync protocol that handles conflict and guarantees convergence. Do not hesitate spending time on this task. There exist general-purpose CRDTs, but defining a custom one, that suits the specific needs of your app, is possible as well. Lamport timestamps and Vector clocks are your friends, if only because they help limiting the size of the sync payloads, and do not suffer from the imprecision of Date.

2 Likes

Many thanks for your reply. I read through Joseph Gentle's "5000x faster CRDTs" with great interest and watched Martin Kleppmann' talks on CRDT, among others. I don't foresee a need for sequences—and if I did I would rely on tools such as Automerge (though cross-platform support remains tricky for CRDT libraries in general). I am familiar with the CRD types apart from ... ORSet and ORMap. I tried to wrap my head around them with the minimalistic explanation on Wikipedia and started reading "An Optimized Conflict-free Replicated Set", but didn't get to finish it. I would love to talk through the concept and details of these, especially since you are referring to the same paper in your code. I'm happy to do this through Slack as it feels rather off-topic for this forum. I will send you an email to continue this chat!

Many thanks for your in-depth reply. There is a lot to unravel so let's go.

I've worked on GRDB apps that perform sync through a remote server with a custom synchronization protocol.

Do you have any insights on best practices? I'm confident in Swift and SwiftUI and have been working with them since their respective version 1, but syncing is a topic that I will be implementing for the first time.

The GRDB repository was able to link with SQLCipher through CocoaPods, but CocoaPods development has ended.

My apologies, I should have made this clearer. I am aware of #1495 . My question was a more general "now what?". I could use the DuckDuckGo repository, but I assume it will take some time to bring that to GRDB 7 if at all. Their process to include SQLCipher is clear but elaborate. Other than that there do not seem to be any solutions. Working with SQLCipher itself is also not without its problems and it is easy to make mistakes or run into problems. It is the same question time and again: is it still worth to put all the time into getting SQLCipher up and running? And this gets me to the next one.

The recommended technique is to store the database in a dedicated directory, and to apply the protection level to this directory. When a database is protected, an app that runs in the background on a locked device won’t be able to read or write from it. Instead, it will catch DatabaseError with code SQLITE_IOERR (10) “disk I/O error”, or SQLITE_AUTH (23) “not authorized” (I've seen both). The app can catch those errors and react accordingly (such as waiting for applicationProtectedDataDidBecomeAvailable notifications).

When necessary, apps can run two databases (one protected, one unprotected).

Another apology: I read the GRDB documentation in detail so I am aware of the implications. It's more about insights on what to do then. I don't immediately see how two databases could work. I thought about this solution too. On the one hand, when working with CRDT, I could use the unprotected database to at least store the changes. On the other hand, why? If the original data is protected in the first database, then obviously you can't make changes to begin with. It feels like the only approach that is user-friendly is to keep the loading screen up until the protected data becomes available. That becomes a bit of a fight with the SwiftUI system and sending it the update at the right time. A singleton somewhere to track these flags would work, I guess, yet then you're fighting with Swift 6 again :slight_smile: . And round and round we go.

The nature of the sync payload depends on your synchronization protocol, and if you're using an operation-based CRDT, or a state-based CRDT (as described by the Wikipedia article Conflict-free replicated data type). I personally only have experience with state-based CRDTs.

Indeed, I'm using state-based CRDT.

See AppDatabase in the demo app for the basic setup.

I'm working through all the demo code today! Good to know I can find some help in there too.

In the apps I've worked on, I do not use triggers, but each write increments a Lamport timestamp. I can later find all find all records that have a timestamp greater than the timestamp of the last successful synchronization: these are the records I need to send to the server.

Thank you for the code sample. On clocks and Swift 6: is a single-row table the best approach? I was playing around with a singleton actor to keep it ticking along nicely across threads.

Otherwise your approach is indeed in line with what I have been thinking. Dealing with CRDT and synchronization requires some extra bookkeeping, and I would like to know if it is possible to hide this away through protocols that automagically add a couple of rows or property wrappers that flag which columns to include in a table that lives alongside the main table for the type. I can definitely hand-code these types, but it would be even nicer if I can build a library that generalises this problem for myself and others to use.

Harmony targets CloudKit. Surely easier than building a custom sync protocol and server ;-)

Indeed, it is more a source of inspiration, or an alternative client for iOS-only syncs. My hands are tied on the custom server if I want Android syncs. In some ways the Android platform is nicer in that Room and flows work together much better than any first-party tools on iOS (see the bottom of this post for some more details). It's frankly a bit baffling how Core Data and SwiftUI put up such a fight, especially if you don't want strong coupling. And having lived through Swift 1/2 and SwiftUI 1/2 I really cannot handle SwiftData 1.

That's not wrong, but that's debatable as well. In the apps I'be been working on, where all sync is performed through the server (there's no device-to-device sync), I find it extremely soothing to think that conflicts are handled by the server, because that's a single point of failure, I don't have to wait for Apple review before I can ship a fix, and I can fix corrupted data on a device with a well-crafted server response.

Those are obviously good points. I do wonder how you set this up with CRDTs. Is it the server that does the merges? Or is it more of a mixed effort where both server and devices can do these tasks? Is it the server that handles the tombstones and their cleanup? On my end the entire architecture is still up for debate. It is for a brand new app that may or may not be a success, so I do need to keep costs down. Handling conflicts (or preventing them) on device keeps the server easier. Being able to sync through a free-ish solution such as iCloud helps further but obviously increases complexity and essentially becomes equivalent in handling to a device-to-device solution as you lose that central server.

GRDB does not ship with any built-in sync mechanism, but ships with the tools that make it possible to implement one. It was possible to build Harmony, and I was able to implement reliable sync protocols as well.

I hope I can take this a step further and provide some (simple) CRDT equivalents that can more easily sync with any database, rather than being tied to iCloud or a single application. The goal might be too lofty, but I feel it is definitely worth exploring.

[T]he most difficult part is to design a sync protocol that handles conflict and guarantees convergence. Do not hesitate spending time on this task. There exist general-purpose CRDTs, but defining a custom one, that suits the specific needs of your app, is possible as well. Lamport timestamps and Vector clocks are your friends, if only because they help limiting the size of the sync payloads, and do not suffer from the imprecision of Date .

The basics of the app I'm developing are there and the research is done. The easiest step now is to simply drop in Firebase and be at the mercy of Google and their graveyard. The solution with working out syncing myself and implementing things such as CRDT is the solution that will help me sleep at night :grin:. Development is halted for now until I find a suitable solution to storage and syncing.

My trip down this rabbit hole started with #240 and the struggles this brings on the SwiftUI platform. It is somewhat surprising how hard it is to implement the Repository pattern well in Swift/SwiftUI in such a way that (a) SwiftUI views properly update and (b) the pattern is ignorant to whether I am using @AppStorage or GRDB or anything else. #240 is obviously an old issue, but it still highlights the challenges. I agree with the // GOOD PRACTICE suggestion you made there, yet that obviously violates the pattern in multiple ways. Tools like GRDBQuery help, and you can move the Database to the implementation of the protocol, but you still end up in a type fight trying to then squeeze in an approach solely based on @AppStorage or directly on UserDefaults.

The silver standard would be a flow with Room and Combine in Kotlin. A flow is somewhat like an ObservableObject and changes as the source change (but a flow is always asynchronous). Room is akin to GDRB and can provide such a flow (like GRDBQuery. And Combine is like SwiftUI in how it is a declarative UI. The nice thing is that I can grab a flow from the database or convert a 'struct' into a flow using flowOf. This can greatly simplify development for the purpose of a #Preview in SwiftUI or a unit test in Swift. Or, as I was doing until now, rely on a simple @AppStorage to not worry about the details of storage and solve storage and syncing later. Alas, no such luck due to those Repository pattern struggles, and as you correctly say I have to sit down and spend all the time I need to think about how to handle storage and syncing well. And the Android side is only a silver standard, not gold. All the data arrives delayed because of the asynchronous nature, so those few milliseconds have to be handled carefully or you end up in weird situations. A completedBootstrapping that you default to false and becomes true a milliseconds later? Sure enough Combine has already popped that screen up. So not perfect, just a bit better :upside_down_face: .

The GRDB library makes it possible to implement a wide range of techniques needed by applications. That's why I make it, and that's why I make it available. So that I, and other people, can work and ship solutions with the help of a reliable tool.

Now, many of your questions fall out of the scope of the free support. It is your app, and nobody knows what it needs better than you do. You do not have to be afraid of making mistakes. It's just a database ;-)

[SQLCipher] My question was a more general "now what?"

You have all the available information, I did not hide anything.

the struggles this brings on the SwiftUI platform.

Data and synchronization exist, regardless of the user interface. For example, you do not need a SwiftUI view to be able to test the result of actions performed by the user, and their impact on sync.

I suggest strictly separating this layer from the user interface.

SwiftUI usually needs to passively read from the database, or passively observe the database, or perform some database modifications through a dedicated service that controls the possible writes. The GRDB and GRDBQuery demo apps are all based on this architecture.

A flow is somewhat like an ObservableObject and changes as the source change (but a flow is always asynchronous). Room is akin to GDRB and can provide such a flow.

If you ask the general question "how to have SwiftUI views update from a flow of external events", then it is a little bit too general, and I'm sorry but there isn't one single way to do it. That's due to the age of Apple SDKs. And that's why GRDB ValueObservation supports all several techniques (Combine, async sequence, etc). On the read/observation side, GRDBQuery completely hides this awful complexity from the app (but like SwiftData, its @Query property wrapper makes it difficult to test views that rely on it).

1 Like

Haha, no worries, many thanks for all your insights. As you say many of these questions are quite general. I find it often useful to hear about and explore solutions from others before diving in too deep. Hopefully I will be able to contribute one way or the other further down the line!