SQLiteNIO

SQLiteNIO: Client for SQLite built on NIO.

The Vapor Core team would like to pitch its SQLiteNIO library to the SSWG incubation process. This provides a client for interacting with a local SQLite database.

Motivation

There are several (very good) Swift implementations of SQLite in use. However, in order to be useful in a server-side Swift ecosystem, an implementation should be built on top of SwiftNIO to avoid blocking the event loop when interacting with the database. SQLiteNIO uses a NIOThreadPool to hand off interactions with SQLite to a separate thread to avoid blocking the current event loop.

Additionally SQLite is popular in several use cases, from local development, testing and certain deployment scenarios. The SSWG should have an incubated client for these use cases.

Propsed solution

Vapor's SQLiteNIO library is already released and in use, with about 1500 unique cloners a week. The library currently relies on the system providing the underlying SQLite client. The solution has been fairly well tested and put through it paces in the 8 months since its 1.0.0 release.

A detailed implementation (and usage guide in the README) would be provided if the community though the package was useful and decided to proceed with a proposal. We are slightly constrained in that as we are already released, breaking changes would need to address significant issues to be warranted.

One change that should definitely be discussed is the use of the C library. The SSWG guidelines say packages should prefer a Swift native implementation instead of wrapping C where appropriate. We believe that in this case, using the C library is warranted as implementing a native Swift driver is a significant undertaking, but we're happy to hear other arguments. One discussion point however is the use of relying on system libraries vs vending the C library ourselves.

Vending the C library would add some additional work on our end in order to integrate it and properly namespace it and would also increase compilation times for those needing it. However it would make the library significantly more portable and remove the need for ensuring libsqlite3-dev or similar is installed on the system.

We look forward to your feedback!

11 Likes
  • SQLite support is definitely useful for our ecosystem, so +1 for moving forward
  • wrapping the C code in this case is a must I think. Contrary to most DBs, SQLite doesn't use a server and a wire protocol, the whole database system is implemented as a whole and directly uses the on-disk files. So re-implementing that in Swift would be incredibly hard and also very bug prone. So +1 on wrapping C.
  • Unless there are good arguments against, I'd be in favour of vendoring the SQLite Amalgamation. I just downloaded a copy and ran this command:
    for f in *.c *.h; do gsed -ri 's/sqlite3_/foobar_sqlite3_/g' "$f"; done
    
    which successfully name-spaced SQLite as foobar_sqlite3_ which means it won't clash. You could maybe make it use c_sqlite_nio_ as the prefix instead of sqlite3_ and it won't clash anymore.
7 Likes

Agree. SQLite is a special case because it involves heavy low-level operations which is unsuitable and unnecessary to reimplement in Swift. Also, it’s hard to keep a custom implementation of SQLite up to date, which is not so good for the ecosystem.

It would be safer not to rename the SQLite symbols, because renaming makes it easier to link multiple copies of SQLite into a single app, and linking multiple copies of SQLite into a single app is a known way of corrupting your database. Here's §2.2.1 of How To Corrupt An SQLite Database File:

2.2.1. Multiple copies of SQLite linked into the same application

As pointed out in the previous paragraph, SQLite takes steps to work around the quirks of POSIX advisory locking. Part of that work-around involves keeping a global list (mutex protected) of open SQLite database files. But, if multiple copies of SQLite are linked into the same application, then there will be multiple instances of this global list. Database connections opened using one copy of the SQLite library will be unaware of database connections opened using the other copy, and will be unable to work around the POSIX advisory locking quirks. A close() operation on one connection might unknowingly clear the locks on a different database connection, leading to database corruption.

The scenario above sounds far-fetched. But the SQLite developers are aware of at least one commercial product that was released with exactly this bug. The vendor came to the SQLite developers seeking help in tracking down some infrequent database corruption issues they were seeing on Linux and Mac. The problem was eventually traced to the fact that the application was linking against two separate copies of SQLite. The solution was to change the application build procedures to link against just one copy of SQLite instead of two.

6 Likes

This wouldn't be an issue though under the assumption that all instances of SQLite run through SQLiteNIO right? So even if you had several dependencies using SQLiteNIO, they would all use the same namespaced instance. The bug described is still possible but the opposite of not-namespacing and vending SQLite would see far more issues with namespace collisions

If you assume that all instances of SQLite are the one provided by SQLiteNIO, then there's still no need for renaming because the assumption means there are no symbol conflicts.

1 Like

But there's still the risk of clashing with the system installed copy of libsqlite-dev right?

I think we can't assume that nobody else isn't using another version of SQLite (possibly libsqlite-dev) in the same process and therefore we need namespacing.

But I think we can assume that no user would ask two libraries to open the same sqlite database from two libraries. If they're doing that, they're asking for trouble and in the very same way the could then also open the same database from multiple processes, we can't protect against it (and SQLite's global list of open DBs is merely a workaround that may protect some users in very specific cases).

It is safe to open the same database from multiple processes, if POSIX advisory locks work. It is not safe to open the same database twice in the same application with separate linked copies of SQLite, if POSIX advisory locks work. By “safe”, I mean doing so won't corrupt your database. Whether your app logic handles it is a separate concern.

I don't know that there's a good solution here (though maybe just using libsqlite-dev is one), but this is a real risk of vendoring SQLite that should be considered, even if it can't be solved perfectly.

I'm supportive of this pitch and I agree that this is a fine use case for wrapping C given the complexity of re-implementing in Swift.

On the vendoring vs. system library note, one positive I would note from my own experience vendoring a C library is that, in addition to portability and a better install experience for users, vendoring gives you control over exactly which version of the library a user is relying on, which is great from a maintenance perspective.

This might not be as much of a concern in this particular case - I don't know how often new libsqlite releases go out - but in the past when the MongoDB driver relied on a system install of the C driver, we often ran into cases where users had a too-old version of the library installed and got cryptic build errors about missing symbols. (I posted about this a while back and opened SR-9542; on that ticket my colleague @mbroadst notes that a pre-build script could alleviate this, so perhaps the proposal for extensible build tools in SPM could provide a workaround for such issues.)

3 Likes

I don't know much about SQLite's implementation but can you expand what exact scenario works fine with two processes (one linking SQLite driver A, the other one linking driver B) but doesn't work in one process (and two copies of SQLite, driver A and driver B)?

EDIT: Okay, I just see in SQLite's documentation that they use POSIX record locks (fcntl) which indeed are associated with a process and not the actually working BSD locks (flock). Both are advisory but the fcntl one holds the lock on the process and not on the file descriptor for some reason...

I still don't think this matters because somebody would again need to ask two different SQLite drivers to open the same database file in the same process for this to fail. I don't think it's reasonable trying to protect people against failure modes like that. Much like we can't protect them from the very same failure mode if they put their SQLite DB on an NFS mount, then the locks won't work and the DB will still get corrupted.

+1 would be great to have a SQLite client that is compatible with the server API and incubated along side the other database drivers

1 Like

I'm working on a SwiftBlocksUI server which uses SQLite and I'm not entirely sure this makes sense. By that I mean the NIO related abstraction, a common (synchronous) Swift API to SQLite would make sense.

Why. If you write an endpoint backed by SQLite, you do queries and custom work, and usually within a single transaction. Instead of hopping back and forth between the API event loop and the the bg thread, what you usually want is do most of the work within the SQLite TX on the background thread.
That is (somewhat, not really because DBs rarely scale to NIO levels) different to socket API based databases, which may block waiting for results.

2 Likes

Right, but you (the user) are responsible for creating and using that background thread correctly. And there are plenty of use cases where you might want to do standard queries without a transaction. We're not stopping that with this library.

(It's also worth noting that the higher level SQLiteKit supports transactions in a way that's easy to integrate.

It's not really about transactions, but about all the hopping which I think is (probably) wrong.

you (the user) are responsible for creating and using that background thread correctly

That's a fair point. So maybe we need to find a way to make that thing easier (synchronous endpoints within NIO).

All the async stuff is really nice when proxing to other services, but as soon as you hit a database (and that's not restricted to SQLite, you don't want to hit PG with 1k connections either), you usually don't want to be async.

For a SQLite library for Swift I wonder if coupling it to a NIO foundation makes as much sense, when if it was base purely on the upcoming Async/Await language features you could support SQLite things like iOS much more easily. The SwiftNIO package itself adds more than 50MB to the iOS app size and NIO is not generally used on the client side of things. However if the package was based on Async/Await language features you could use it with SwiftNIO Server Side and say something like GCD on client side.

Where did you get these figures from? As a comparison: SwiftNIO's example web server weighs in at 4.3MB (in release mode; debug mode is 6.1MB) for the whole binary. So I very much doubt it would add 50 MBs to an iOS app.

If you need just plain HTTP and JSON, then yes, SwiftNIO is very much overkill (because HTTP (through URLSession), JSON, and much much more is already covered by the SDK). But SwiftNIO is used for example in gRPC Swift on both the client and server side.

SQLiteNIO is focused on providing a SQLite driver for the server, any iOS compatibility comes second to that. Additionally, changing it to be purely async/await based will be several months at least, and make breaking changes to the package that the Vapor Core team can't accept at this point.

1 Like

Connection pooling is out of scope and higher level for this package (it gets complicated with SQLite, but for Postgres, it's in PostgresKit rather than PostgresNIO). And we need to clarify what we might by "you usually don't want to be async". From a simplistic point of view, it's definitely async because it's making a request to a service that doesn't return straight away and we can't block the current event loop.

Pooling is related to that, but only loosely. It just avoids the re-connect overhead.

With PG we are drifting off topic a little, because the original issue is the thread hopping when used w/ SQLite specifically (an issue you don't have w/ PG).

But unless an endpoint runs just super simple, non-transactional queries against PG, you'd want to bundle them up in a synchronous batch as well (and a very limited set, because PG connections are very expensive server side).
Stalling PG connections while you wait on other async input is significantly worse than even blocking the event loop :slight_smile: That's what I mean by "you usually don't want to be async".

E.g. you'd never want to do something like:

let con = async pg.grabConnection()
let info = async http.fetch("weather.org/abc")
sync con.insertInfo(info)

but rather sth like:

let info = async http.fetch("weather.org/abc")
pg.grabConnection.work { con in
  // synchronous code
}

Which brings us back to the SQLite thing, which needs the same "sync" setup for other reasons (so that the work block doesn't have to hop off the bg thread).