Advice for failable record initialisation

remcopoelstra · June 9, 2020, 6:15pm

Hi,

I need some advice on when to use a failable record initialiser.
I use SQLite as the document format for my application. This means that the user has direct access to the database and it's easy for the user to modify it. Or a malicious person could craft a database that would cause harm (but I don't know how though). I do however expect these circumstances to be very rare.
Would this warrant the implementation of failable record initialisers?

Thanks in advance.

Kind regards,

Remco Poelstra

gwendal.roue · June 9, 2020, 9:04pm

Hello @remcopoelstra,

GRDB does not provide built-in support for failable record initializers because the library has always focused on trustable databases, based on the premise that an SQLite managed by an app is not some rogue JSON downloaded from the internet, and that this premise provides some benefits.

Now, especially on macOS, where the user has access to the file system, I must admit that this premise shows some crack. And in the same time, the expected benefits are less clear as the library has matured over the years. But more analysis and work remains to be done: maybe we'll get a failable record initializer one day.

What is untrusted data, by the way?

Altered database schema? Missing tables, missing columns, modified relational constraints, foreign keys, column checks? This kind of attack can be checked when your app opens the database, by comparing the content of the sqlite_master database with an expected snapshot.
Altered database values? SQLite has a very weak type system, so you can basically assume that any column of any row can contain any kind of value (null, integer, double, text, blob). Unwanted nulls should be prevented with NOT NULL constraints. Other unwanted values can be avoided with CHECK constraints, but this is a real chore and surely has performance implications.

There may be other nasty alterations, I don't know yet.

Today, dealing with untrusted databases with GRDB requires more work on the application side. Precisely speaking, only converting raw database values into Swift value types and record requires more work. Other fatal errors reveal a misuse and require a change in the code.

Decoding values throws a fatal error when the database contains unexpected values, such as Could not decode database value 256 into Int8.

The fundamental GRDB defense against failed conversion is the DatabaseValue type, and the snippet below. It lets you handle null, valid values, and invalid values, without any fatal error, ever:

let dbValue: DatabaseValue = row[0]
if dbValue.isNull {
    // Handle NULL
} else if let int8 = Int8.fromDatabaseValue(dbValue) {
    // Handle valid Int8
} else {
    // Handle invalid Int8
}

The bad news is that absolutely no high-level construct of GRDB builds on top of it

That's enough for a first reply :-)

remcopoelstra · June 10, 2020, 7:17am

Maybe we should ponder what 'rogue JSON' actually means and what its problems actually are.
If the only consequence of someone altering the database file is that my application crashes, then I think I'm fine with that. Maybe an advanced user tries to automate some operations by performing them directly on the database, but he should actually use the script interface for that.
There is little danger in decoding rows as no completely different types/classes are returned while decoding that might present a security thread, which, I believe, is all the fuss about the NSSecureCoding protocol.
So I think I'll just leave it at the standard initialisers.

Regards,

Remco Poelstra

gwendal.roue · June 10, 2020, 7:53am

Thank you for your feedback.

Maybe we should ponder what 'rogue JSON' actually means and what its problems actually are.

You're right, that wasn't very clear. In my personal experience, there are always times where servers do not honor their contracts, little nasty servers - so production apps really have to defend themselves.

If the only consequence of someone altering the database file is that my application crashes, then I think I'm fine with that.

This has been my position as well so far, because of my bias: I mostly develop iOS apps myself where database tampering has a fairly low probability.

Developers who need to safely handle untrusted databases with GRDB have to go low-level. This may reveal painful and verbose. But this is possible, because I have always made sure, from day one until today, that the raw SQLite is exposed, for any deemed purpose. You know: The Obvious, the Easy, and the Possible.

After all, you can always replace:

// May fatal error when decoding Player
extension Player: FetchableRecord {
    init(row: Row) { ... }
}

let request = Player.filter(level: .good)
let players = try request.fetchAll(db) // [Player]

With:

// Can never fatal error when decoding Player
extension Player {
    init(row: Row) throws { ... }
}

// Fetch rows and convert to Player:
let request = Player.filter(level: .good)
let players = try Row.fetchAll(db, request).map(Player.init(row:))

// More performance with a database cursor:
let playerCursor = try Row.fetchCursor(db, request).map(Player.init(row:)
let players = try Array(playerCursor)

So I think I'll just leave it at the standard initializers.

Maybe one day one user will really insist, and we'll look at failable records straight in the eyes . Meanwhile, happy GRDB, Remco!

gwendal.roue · June 10, 2020, 6:32pm

Since GRDB 5 is currently in beta, we have a window for experimenting with failable value and record initializers. I'll be able to check the amount of breakage, as well as eventual performance regressions (I don't currently know what is the runtime cost of Swift error handling). I want to preserve really fast paths for requests such as Int.fetchCursor(db, sql: "..."). GRDB is also an efficient low-level SQLite library.

remcopoelstra · June 11, 2020, 6:11am

Hi,

I've been thinking about this, but I wonder whether there is more to it than failable initialiser. As you already mentioned there is so much that can go wrong with the database, like the schema being altered.
And in my situation I have polymorphic records, so simple checks on columns are not really useful. Some state is also cross table (such as the 'image' table references the 'layer' table, but the 'element' table reference both 'image' and 'layer'). It's quite easy to now have an element refer to an image on another layer. The only way to catch such situations is to check each and every value.

gwendal.roue · June 11, 2020, 7:03am

Unless I'm mistaken, what we are talking here is making sure that database tampering only yields catchable errors from GRDB.

For example, a deleted column should throw an error instead of crashing with "missing column" fatal errors. Same for values that become out of range, dates that become ill-formed, etc.

Of course, database tampering may induce logic errors inside your application code. Database constraints help avoid many of them, but not all. Your polymorphic records are such an tricky example. Preventing such problem remains an application concern: GRDB can't help here.

remcopoelstra · June 11, 2020, 2:43pm

True, but wouldn't it make more sense then to make a verification system that puts the database back into a state where non-failable initialisers can be used again? This might be in conjunction with the database migrator, which at least knows which schema is wanted.
Otherwise only half the problem is tackled.

gwendal.roue · June 11, 2020, 2:48pm

Well, this sounds pretty. Since these "verification" and "healing" processes have a lot of meanings, this strictly belongs to the application side. The app can use GRBD as a tool to this end.