SQLiteNIO

SQLiteNIO: Client for SQLite built on NIO.

The Vapor Core team would like to pitch its SQLiteNIO library to the SSWG incubation process. This provides a client for interacting with a local SQLite database.

Motivation

There are several (very good) Swift implementations of SQLite in use. However, in order to be useful in a server-side Swift ecosystem, an implementation should be built on top of SwiftNIO to avoid blocking the event loop when interacting with the database. SQLiteNIO uses a NIOThreadPool to hand off interactions with SQLite to a separate thread to avoid blocking the current event loop.

Additionally SQLite is popular in several use cases, from local development, testing and certain deployment scenarios. The SSWG should have an incubated client for these use cases.

Propsed solution

Vapor's SQLiteNIO library is already released and in use, with about 1500 unique cloners a week. The library currently relies on the system providing the underlying SQLite client. The solution has been fairly well tested and put through it paces in the 8 months since its 1.0.0 release.

A detailed implementation (and usage guide in the README) would be provided if the community though the package was useful and decided to proceed with a proposal. We are slightly constrained in that as we are already released, breaking changes would need to address significant issues to be warranted.

One change that should definitely be discussed is the use of the C library. The SSWG guidelines say packages should prefer a Swift native implementation instead of wrapping C where appropriate. We believe that in this case, using the C library is warranted as implementing a native Swift driver is a significant undertaking, but we're happy to hear other arguments. One discussion point however is the use of relying on system libraries vs vending the C library ourselves.

Vending the C library would add some additional work on our end in order to integrate it and properly namespace it and would also increase compilation times for those needing it. However it would make the library significantly more portable and remove the need for ensuring libsqlite3-dev or similar is installed on the system.

We look forward to your feedback!

11 Likes
  • SQLite support is definitely useful for our ecosystem, so +1 for moving forward
  • wrapping the C code in this case is a must I think. Contrary to most DBs, SQLite doesn't use a server and a wire protocol, the whole database system is implemented as a whole and directly uses the on-disk files. So re-implementing that in Swift would be incredibly hard and also very bug prone. So +1 on wrapping C.
  • Unless there are good arguments against, I'd be in favour of vendoring the SQLite Amalgamation. I just downloaded a copy and ran this command:
    for f in *.c *.h; do gsed -ri 's/sqlite3_/foobar_sqlite3_/g' "$f"; done
    
    which successfully name-spaced SQLite as foobar_sqlite3_ which means it won't clash. You could maybe make it use c_sqlite_nio_ as the prefix instead of sqlite3_ and it won't clash anymore.
7 Likes

Agree. SQLite is a special case because it involves heavy low-level operations which is unsuitable and unnecessary to reimplement in Swift. Also, it’s hard to keep a custom implementation of SQLite up to date, which is not so good for the ecosystem.

It would be safer not to rename the SQLite symbols, because renaming makes it easier to link multiple copies of SQLite into a single app, and linking multiple copies of SQLite into a single app is a known way of corrupting your database. Here's §2.2.1 of How To Corrupt An SQLite Database File:

2.2.1. Multiple copies of SQLite linked into the same application

As pointed out in the previous paragraph, SQLite takes steps to work around the quirks of POSIX advisory locking. Part of that work-around involves keeping a global list (mutex protected) of open SQLite database files. But, if multiple copies of SQLite are linked into the same application, then there will be multiple instances of this global list. Database connections opened using one copy of the SQLite library will be unaware of database connections opened using the other copy, and will be unable to work around the POSIX advisory locking quirks. A close() operation on one connection might unknowingly clear the locks on a different database connection, leading to database corruption.

The scenario above sounds far-fetched. But the SQLite developers are aware of at least one commercial product that was released with exactly this bug. The vendor came to the SQLite developers seeking help in tracking down some infrequent database corruption issues they were seeing on Linux and Mac. The problem was eventually traced to the fact that the application was linking against two separate copies of SQLite. The solution was to change the application build procedures to link against just one copy of SQLite instead of two.

5 Likes

This wouldn't be an issue though under the assumption that all instances of SQLite run through SQLiteNIO right? So even if you had several dependencies using SQLiteNIO, they would all use the same namespaced instance. The bug described is still possible but the opposite of not-namespacing and vending SQLite would see far more issues with namespace collisions

If you assume that all instances of SQLite are the one provided by SQLiteNIO, then there's still no need for renaming because the assumption means there are no symbol conflicts.

But there's still the risk of clashing with the system installed copy of libsqlite-dev right?

I think we can't assume that nobody else isn't using another version of SQLite (possibly libsqlite-dev) in the same process and therefore we need namespacing.

But I think we can assume that no user would ask two libraries to open the same sqlite database from two libraries. If they're doing that, they're asking for trouble and in the very same way the could then also open the same database from multiple processes, we can't protect against it (and SQLite's global list of open DBs is merely a workaround that may protect some users in very specific cases).

It is safe to open the same database from multiple processes, if POSIX advisory locks work. It is not safe to open the same database twice in the same application with separate linked copies of SQLite, if POSIX advisory locks work. By “safe”, I mean doing so won't corrupt your database. Whether your app logic handles it is a separate concern.

I don't know that there's a good solution here (though maybe just using libsqlite-dev is one), but this is a real risk of vendoring SQLite that should be considered, even if it can't be solved perfectly.

I'm supportive of this pitch and I agree that this is a fine use case for wrapping C given the complexity of re-implementing in Swift.

On the vendoring vs. system library note, one positive I would note from my own experience vendoring a C library is that, in addition to portability and a better install experience for users, vendoring gives you control over exactly which version of the library a user is relying on, which is great from a maintenance perspective.

This might not be as much of a concern in this particular case - I don't know how often new libsqlite releases go out - but in the past when the MongoDB driver relied on a system install of the C driver, we often ran into cases where users had a too-old version of the library installed and got cryptic build errors about missing symbols. (I posted about this a while back and opened SR-9542; on that ticket my colleague @mbroadst notes that a pre-build script could alleviate this, so perhaps the proposal for extensible build tools in SPM could provide a workaround for such issues.)

3 Likes

I don't know much about SQLite's implementation but can you expand what exact scenario works fine with two processes (one linking SQLite driver A, the other one linking driver B) but doesn't work in one process (and two copies of SQLite, driver A and driver B)?

EDIT: Okay, I just see in SQLite's documentation that they use POSIX record locks (fcntl) which indeed are associated with a process and not the actually working BSD locks (flock). Both are advisory but the fcntl one holds the lock on the process and not on the file descriptor for some reason...

I still don't think this matters because somebody would again need to ask two different SQLite drivers to open the same database file in the same process for this to fail. I don't think it's reasonable trying to protect people against failure modes like that. Much like we can't protect them from the very same failure mode if they put their SQLite DB on an NFS mount, then the locks won't work and the DB will still get corrupted.

+1 would be great to have a SQLite client that is compatible with the server API and incubated along side the other database drivers

1 Like

Right, but you (the user) are responsible for creating and using that background thread correctly. And there are plenty of use cases where you might want to do standard queries without a transaction. We're not stopping that with this library.

(It's also worth noting that the higher level SQLiteKit supports transactions in a way that's easy to integrate.

For a SQLite library for Swift I wonder if coupling it to a NIO foundation makes as much sense, when if it was base purely on the upcoming Async/Await language features you could support SQLite things like iOS much more easily. The SwiftNIO package itself adds more than 50MB to the iOS app size and NIO is not generally used on the client side of things. However if the package was based on Async/Await language features you could use it with SwiftNIO Server Side and say something like GCD on client side.

Where did you get these figures from? As a comparison: SwiftNIO's example web server weighs in at 4.3MB (in release mode; debug mode is 6.1MB) for the whole binary. So I very much doubt it would add 50 MBs to an iOS app.

If you need just plain HTTP and JSON, then yes, SwiftNIO is very much overkill (because HTTP (through URLSession), JSON, and much much more is already covered by the SDK). But SwiftNIO is used for example in gRPC Swift on both the client and server side.

SQLiteNIO is focused on providing a SQLite driver for the server, any iOS compatibility comes second to that. Additionally, changing it to be purely async/await based will be several months at least, and make breaking changes to the package that the Vapor Core team can't accept at this point.

1 Like

Connection pooling is out of scope and higher level for this package (it gets complicated with SQLite, but for Postgres, it's in PostgresKit rather than PostgresNIO). And we need to clarify what we might by "you usually don't want to be async". From a simplistic point of view, it's definitely async because it's making a request to a service that doesn't return straight away and we can't block the current event loop.

Right, I think we're diverging quite far from this pitch now, right?

I think we're confusing async vs. sync APIs, async vs. sync I/O, etc with batching. IIRC, what Helge suggests is that there should be an API that allows you to batch multiple commands such that the DB connection isn't held by the client longer than absolutely necessary. And this applies irregardless of the API or the I/O being asynchronous or synchronous.

It doesn't matter whether "do some Swift work" is sync or async, what matters is that it's guaranteed to be "fast enough". Or did you want to imply with "nothing async" that it's not meant to do any I/O because I/O can usually take an arbitrarily long time?

If you did

db.fetch()
synchronouslyParse100MBOfJSON()
db.update()
synchronouslyTypecheckASwiftProgram()

then we have synchronous code but still, we'd be holding onto the DB connection for far too long.

I assume that you're now talking about SQLite, right? In PostgresNIO's case, there are no background threads and no thread switches, the I/O happens straight from the EventLoop, using NIO.

In PostgresNIO's case, there is not necessarily thread hopping, if you want to avoid it you can run the connection to PostgreSQL on the same EL as you are (AFAIK Vapor always co-locates the Channel to Postgres with the Channel that delivers the HTTP request/response). But even if you have thread hopping, that likely won't be an issue because the sending, processing, and receiving the query/result is usually a lot slower than a few thread hops. Sure, not thread hopping is better than thread hopping but if we're waiting for the DB to reply to us, then the thread hop likely won't be the thing that kills us.

If we can batch queries, then that's even better because we can send them off at the same time rather than waiting for the result of the previous query.

Lastly (I don't think this matters for this debate at all), SwiftNIO executes future callbacks immediately, so you won't suffer any delays because of the EventLoop. So let's say you had this code:

let el = group.next()
el.execute {
    let promise = el.makePromise(of: Void.self)
    
    for i in 0..<1000 {
        el.execute { print("\(i)") }
    }
    
    promise.futureResult.whenComplete { _ in
        print("future callback")
    }
    
    promise.succeed(())
}

What you'll see printed is

future callback
0
1
2
3
4
[...]
999

I think providing a way to make SQLite and NIO play nice is valuable. However, I do not think the API in the package is the right abstraction.

Some examples:

  • The API doesn't make it possible to efficiently insert multiple records (reusing the same prepared statement, possibly not even changing all of the bound parameters after a step).
  • The Statement type queries a bunch of information from SQLite that might not be needed, and then copies it around with every Row.
  • There are no ways to conditionally read some columns from query results.
  • You can't customize flags the connection is opened with.
  • You can't customize the busy handler, which currently just waits forever (I think this can lead to deadlocks, but I'm not quite sure right now).

Now, don't get me wrong: I'm not worried about all that functionality not being available in the package right now. What I do worry about is that the architecture doesn't seem to be designed in a way that allows for functionality like that to be slotted in later. I think this stems from the fact that the package tries to be both a swifty SQLite interface as well as a NIO adapter at the same time.

One thing that sets SQLite apart from other SQL databases (in my experience at least; I haven't worked as much with others as I have with SQLite) is that there is a bunch of functionality in the C interface that can and should be employed to both improve performance and enable advanced use-cases.

For someone looking to do more than just fire off a few non-performance-sensitive queries, looking to use even one bit of the C interface, the package as it exists right now is, sadly, completely useless: Even if going down to the C interface would be acceptable (which is awful to use directly thanks to the pervasive use of opaque types), the package doesn't provide access to the underlying handles.


To address these shortcomings, I would propose an alternative architecture:

First, a target that is a very thin (and as little overhead as possible) wrapper to make the C API a tad more usable in Swift. This should really be not much more than an apinotes overlay could provide: Make functions into methods, replace OpaquePointer occurrences with proper types, and possibly convert result codes into thrown errors, if that can be done without too much overhead.

This target could then be used as base for all sorts of SQLite packages, related to NIO or not (crucially, this should work on all platforms that both SQLite and Swift support, even if NIO does not).

Second, a target that provides integration with NIO. From my experience using SQLite in concert with Dispatch and OperationQueue so far, it would probably for the moment suffice to have a database-queue type that has a single method func run<T>(_ block: (Connection) -> T) -> EventLoopFuture<T>, but I think it might be possible to extend and keep the existing interface around (although maybe with deprecations sprinkled generously over it) to avoid a breaking change as well.

Any convenience APIs not directly related to NIO can then be implemented (either in this or in other packages) on top of the wrapper, which would make them usable both with the NIO adapter as well as without, enabling more code reuse across the ecosystem.


Wrt. linking SQLite statically, I think that is an interesting enhancement, especially if in the future SPM gained a feature-flag system that would allow easy customization of which parts of the amalgamation are included. I do think keeping the ability to use the system version is important though, so as long as only one is possible (or is there a SPM feature that would allow the user to choose?) the system library is probably the way to go.

When it comes to renaming the symbols to avoid collisions, I do not think that is a good idea. For one, it would make using the system provided SQLite more difficult to impossible, and it would also mean that to use SQLite extensions written in C (both those provided by the SQLite project as well as third-party), one would have to fiddle around with the sources. Just 's/sqlite3_/foobar_sqlite3_/g' also won't cut it, because there are at least constants (and maybe macros?) to deal with, as well as internal functions, types and constants that do not follow the sqlite3_* naming scheme.

The SQLite C interface also doesn't change very often and (afaik) only in backwards-compatible ways, so I'm not too worried there are issues to be solved here.

5 Likes