[pitch] SwiftNIO based PostgreSQL client

tanner0101 · November 15, 2018, 7:09pm

NIOPostgres: Client for PostgreSQL server built on NIO.

This module, NIOPostgres, would include a client for connecting to, authorizing, and querying a PostgreSQL server. This includes channel handlers for encoding / decoding PostgreSQL packets, a handler for enabling request / response interaction, and a Codable-based support for parsing / serializing PostgreSQL's binary data format.

Motivation

Most Swift implementations of a PostgreSQL client are based on the libpq C library which handles transport internally. Building a library directly on top of PostgreSQL's wire protocol using SwiftNIO should yield a more reliable, maintainable, and performant interface for PostgreSQL databases.

Propsed solution

A proof of concept implementation is available at vapor-community/nio-postgres. This package has the following features implemented:

Connection & Authorization
Optional TLS negotation
Simple Query
Extended Query (parameterized binds)
Encoder for converting single-value Encodable types to query paramters
Decoder for converting PostgreSQL data to single-value Decodable types
Decoder for converting PostgreSQL rows to keyed Decodable types
Unit and performance tests

This package is based on the core implementation of vapor/postgresql. Basically, just the part that builds directly on SwiftNIO. As such, NIOPostgres's only dependencies are swift-nio and swift-nio-ssl.

Detailed solution

This section goes into some of the details of this solution and decisions that were made.

Connection-based Request / Response

This module centers around a PostgresConnection type that enables a request / response interaction with the connected PostgreSQL database.

Simple Query

Below is working example of connecting to a remote server with TLS, and making a simple query.

let elg = MultiThreadedEventLoopGroup(numberOfThreads: 1)

// open a new connection, adding PostgreSQL packet handlers
let conn = try PostgresConnection.connect(
    to: SocketAddress.newAddressResolving(host: "my.postgresql.domain", port: 5432),
    on: elg.next()
).wait()
print(conn) // PostgresConnection

// ask server if it supports TLS, upgrading if so
let upgraded = try conn.requestTLS(using: .forClient(certificateVerification: .none)).wait()
guard upgraded else {
	fatalError("Remote server does not support TLS")
}

// authenticate, fetching startup information
try conn.authenticate(username: "vapor", database: "vapor", password: "secret").wait()

// perform a simple (text) query
let rows = try conn.simpleQuery("SELECT version()").wait()
print(rows) // [PostgresRow]

// decode a single-value `Decodable` from the row
let version = try rows[0].decode(String.self, at: "version")
print(version) // PostgreSQL 9.1 ... 

// cleanup
try conn.close().wait()
try elg.syncShutdownGracefully()

Extended Query

Extended queries are similar to simple queries, except they allow for values to be parameterized.

let conn: PostgresConnection ...

// performs an extended query
let binds: PostgresBinds = ["hello"]
let rows = try conn.query("SELECT $1::TEXT as foo", binds).wait()

// decode a single-value `Decodable` from the row
let foo = try rows[0].decode(String.self, at: "foo")
print(foo) // hello

PostgresMessage

This module's PostgresMessage enum is the basis of its communication with the connected server. It represents all possible packets that could be sent to or received from the client.

An inbound ByteToMessageDecoder and outbound MessageToByteEncoder are used to translate raw data from SwiftNIO into these messages.

PostgresConnection

This class uses an inbound ChannelHandler to send and receive PostgresMessage via a request / response interface. This handler supports pipelining multiple requests by storing an array of promises that will be fulfilled later.

Table Names

PostgreSQL does not include table names in its wire protocol data. Only table OIDs are provided. Without string table identifiers, it is difficult to disambiguate between duplicate column names in joined queries.

To help remedy this issue, this package provides a method loadTableNames() on PostgresConnection. This method executes and stores the result of the following query:

SELECT oid, relname FROM pg_class

Once loaded, this information can be used to translate table name to table OID using a hash map. This enables PostgresRow to decode columns / entities from a specific table.

// SELECT * FROM user, orders
// this query results in two columns named "id"

// a row from the previous query
let row: PostgresRow ...

let user = row.decode(User.self, table: "users")
let id = row.decode(UUID.self, at: "id", table: "users")

CMD5

MD5 hashing is required for PostgreSQL's authentication flow. This module follows NIO's approach and embeds a private C-based implementation rather than relying on external Swift crypto libraries.

Error codes

A Swifty interface to PostgreSQL's supported error codes is provided. This makes it possible to detect errors like syntax errors.

let conn: PostgresConnection ...

do {
    _ = try conn.simpleQuery("SELECT &").wait()
} catch let error as PostgresError {
	switch error.code {
	case .syntax_error: ...
	default: ...
	}
}

pvieito · November 17, 2018, 10:59pm

This seems like I great addition to the Swift ecosystem! However I have two concerns:

First, it would be great to pitch before any database connector library a common table/dataframe library (à la Pandas in the Python world) so that all of them could rely on.
Second, while this is great for server-side tools already relying on NIO, I think it could be great also for a lot more uses where the user should not know the underlying use of NIO threading. Could the API be tweaked for these cases avoiding the requirement of more "complex" concepts like the MultiThreadedEventLoopGroup, for example defaulting to MultiThreadedEventLoopGroup(numberOfThreads: 1) as in the example?

rnantes · November 18, 2018, 8:06pm

I think this would be very useful. A lightweight and standard way to interact with PostgreSQL databases without the need to import heavy frameworks is a great idea. Additionally, a standard Redis client similar to this would also be useful.

johannesweiss · November 19, 2018, 6:38pm

A PostgreSQL driver looks like a really great thing to have and I think what you propose looks great!

I think supporting clients that aren't NIO apps without forcing them to import NIO is a good idea. Initially I was planning on making quite a short post here but now it's actually a bit long, apologies...

I thought about how an API could look that supports both NIO and non-NIO users fully without compromising on one of them. Below there's an outline on how a Postgres library could look, taking @tanner0101's API as the basis.

There's a fundamental difference between NIO users and non-NIO users. NIO users will know what EventLoop/EventLoopGroup they want to run their connections on to achieve best performance. Non-NIO users on the other hand shouldn't need to know what an EventLoop(Group) even is. To rectify this fundamental difference I'd propose to have an enum Executor where a user can choose where they want to run the network IO on.

So in the example below, NIO users would choose .eventLoopGroup(myAlreadyExistingNIOEventLoopGroup) and all others can just do .spawnThreads. To have a place to store the executor I'd use a class called Postgres which would also hold the connect method.

(disclaimer: all code here has never been compiled so there will be mistakes here, this is just meant as an example)

public enum Executor {
    case spawnThreads
    case eventLoopGroup(EventLoopGroup)
}

public class Postgres {
    private let executor: Executor
    private let group: EventLoopGroup
    private let running = Atomic<Bool>(value: true)

    public init(executor: Executor) {
        switch executor {
            case .spawnThreads:
                self.group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
            case .eventLoopGroup(let group):
                self.group = group
        }
    }

    public static func connect(host: String,
                               port: Int) -> EventLoopFuture<FooConnection> {
        /* actual implementation */
    }

    public func terminate() throws {
        guard self.running.exchange(value: false) else {
            return
        }
        switch executor {
            case .spawnThreads:
                try self.group.syncShutdownGracefully()
            case .eventLoopGroup:
                ()
        }
    }

    deinit {
        assert(!self.running.load(), "Postgres not terminated")
    }
}

So what's this odd terminate method doing there? There might be resources (like the threads spawned) attached to a Postgres object so we need to have a place where we free those resources up again. As soon as we have an SSWG logging library this might also be a place to store a logger.
This Postgres object shouldn't be a burden at all, just initialise it in main.swift (and terminate it in a defer block right after). It's really just a handle to the library itself which will also work well with dependency injection mechanisms.

For now though, this library isn't usable without using NIO's futures but that's really easy to fix and should IMHO also be part of the library:

public extension Postgres { /* queue + callback support */
    public static func connect(host: String,
                               port: Int,
                               queue: DispatchQueue,
                               _ body: @escaping (Result<FooConnection, Error>) -> Void) {
        self.connect(host: host, port: port).then { value in
            queue.async {
                body(.success(value))
            }
        }.whenFailure { error in
            queue.async {
                body(.error(error))
            }
        }
    }
}

I know that's quite a bit of boilerplate for the library author to write...

Next, we obviously need some useful operations:

public class PostgresConnection {
    func simpleQuery(_ query: String) -> EventLoopFuture<[PostgresRow]> { ... }
    /* more operations */
}

public extension PostgresConnection { /* queue + callback support */
    public static func simpleQuery(_ query: String,
                                   queue: DispatchQueue,
                                   _ body: @escaping (Result<FooConnection, Error>) -> Void) {
        self.simpleQuery(query).then { value in
            queue.async {
                body(.success(value))
            }
        }.whenFailure { error in
            queue.async {
                body(.error(error))
            }
        }
    }
}

If we give the library this shape, we can use it in the NIO way:

import NIO

let group = MultiThreadedEventLoopGroup(numberOfThreads: System.coreCount)

let pg = Postgres(executor: .eventLoopGroup(group)
defer {
    try! pg.terminate()
}

func giveMeSomeRows() -> EventLoopFuture<[PostgresRow]> {
    let result = pg.connect(host: "localhost", port: 5432).then { conn in
        conn.simpleQuery("SELECT * FROM foo;")
    }
    result.whenComplete
        conn.close()
    }
    return result
}

do {
    let rows = try giveMeSomeRows().wait()
    print("OK: \(rows)")
} catch {
    print("ERROR: \(error)")
}

but also without NIO in the usual Swift way by having completion blocks. So the example from above without any NIOisms would look like the following (assuming Result lands):

import Dispatch

let pg = Postgres(executor: .spawnThreads)
defer {
    try! pg.terminate()
}

func giveMeSomeRows(queue: DispatchQueue, _ body: Result<[PostgresRow], Error>) {
    pg.connect(host: "localhost", port: 5432, queue: .global) { maybeConnection in
        switch maybeConnection {
        case .success(let conn):
            conn.simpleQuery("SELECT * FROM foo;", queue: .global) { maybeRows in
                queue.async {
                    body(maybeRows)
                }
                conn.close() // technically close can probably also return an error but let's ignore that here...
            }
        }
    case .error(let error):
        queue.async {
            body(maybeRows)
        }
    }
}

let dispatchGroup = DispatchGroup()
giveMeSomeRows { maybeRows in
    switch maybeRows {
    case .success(let rows):
        print("OK: \(rows)")
    case .failure(let error
        print("ERROR: \(error)")
    }
    dispatchGroup.signal()
}
dispatchGroup.wait()

or in case Result does not land (so everything would take the unfortunate @escaping (Type?, Error?) -> Void closures):

import Dispatch

let pg = Postgres(executor: .spawnThreads)
defer {
    try! pg.terminate()
}

func giveMeSomeRows(queue: DispatchQueue, _ body: Result<[PostgresRow], Error>) {
    pg.connect(host: "localhost", port: 5432, queue: .global) { (conn, error) in
        if let conn = conn {
            conn.simpleQuery("SELECT * FROM foo;", queue: .global) { (rows, error) in
                queue.async {
                    body(rows, error)
                }
                conn.close() // technically close can probably also return an error but let's ignore that here...
            }
        }
    } else {
        queue.async {
            body(error!)
        }
    }
}

let dispatchGroup = DispatchGroup()
giveMeSomeRows { (rows, error) in
    if rows = rows {
        print("OK: \(rows)")
    } else {
        print("ERROR: \(error!)")
    }
    dispatchGroup.signal()
}
dispatchGroup.wait()

Please let me know what you think.

pvieito · November 20, 2018, 6:00pm

Seems great to me! Having this dual API would allow this type of libraries to have a more broad user base while enhancing the Swift ecosystem as a whole.

tanner0101 · November 21, 2018, 3:26pm

Thanks for the detailed response! I agree it would be nice for this library to support both NIO and non-NIO APIs.

It would seem all of the non-NIO functionality can be built on top of the APIs as already described in this pitch. Because of this, I wonder if it might make sense to expose the non-NIO API entirely through a new wrapper type (like your Postgres class). Maybe calling it something like: DispatchPostgresConnection, CallbackPostgresConnection, or SimplePostgresConnection?

The underlying implementation would consist of simply combining a PostgresConnection and EventLoopGroup and exposing non-Future-based methods for interacting with the connection.

public class SimplePostgresConnection {
    private let connection: PostgresConnection
    private let group: EventLoopGroup
    // any other properties needed for thread-safety

    public init() {
        // create event loop group and connection
    }

    public func connect(...) { 
        // call connect on underlying connection, translating future to callbacks
    }
    
    // mirror simpleQuery and query methods, converting futures to callbacks

    public func terminate() {
        // do event loop group cleanup
    }
}

johannesweiss · November 21, 2018, 5:43pm

Yes, I think that would work well! There's one small concern I have: Let's imagine the app is written in a non-NIO style but it uses a library that is written in NIO. The app would then get a SimplePostgresConnection but the library might prefer a NIO-style PostgresConnection. For a database library (like the one you propose) all this is probably not that relevant because you typically don't hand database connections around to other libraries all that often but for other libraries that might be relevant.
But generally I think it would be cool to share the same type I think. And as you said, the non-NIO APIs are really just wrappers on top of the NIO-style API, that also why I created them mostly as extensions.

Let me know what you think.

tanner0101 · November 21, 2018, 6:02pm

That's a good point. The SimplePostgresConnection could (and probably should) expose its PostgresConnection as a public var, in case you need to convert for an API.

For this specific example, I wonder how likely it would be that someone using the non-NIO version of Postgres would need to interface with a NIO-version library? That seems like a complex enough case to merit just using NIO-based APIs for both libraries. I may be wrong here though.

My main concern with having the non-NIO methods directly alongside the NIO methods is that it adds (possibly confusing) bloat:

For NIO users:

There is a terminate() method that should be called but doesn't really do anything.
Auto-complete has extraneous methods which don't support futures.

For non-NIO users:

Auto-complete has extraneous methods which don't support callbacks.

I'm interested to hear your additional thoughts.

tkrajacic · November 21, 2018, 7:02pm

You can also have a separate package nio-postgres-dispatch (strawman name) that contains the wrappers, and hence if you don't import that you don't have those autocompletion issues when you just use the NIO flavored version at least.

Would be nice if there was a way to hide the NIO symbols if the library is just a dependency downstream and not top level.

pvieito · November 21, 2018, 8:08pm

I think having two structs one wrapping the other could be a good idea but maybe that naming is a bit NIO-centric, what about:

PostgresConnection: the standard Swift API wrapper
NIOPostgresConnection: the low-level NIO API (or even, nested PostgresConnection.NIO?)

johannesweiss · November 22, 2018, 8:35am

I think explicitly managing the lifecycle for scarce resources (ie. everything that isn't memory/CPU which are done automatically by ARC/the OS) is quite important and I would introduce it early in an API design. I don't think it's unreasonable for users to terminate Postgres. Therefore I would argue that in the NIO world, we should still have users call the terminate() method even if it might not do something.
In the future, we might want to implement some connection pooling and then the terminate() method could terminate the connection pool.

That's a fair point. We could in theory solve that with a module which holds the non-NIO extensions maybe?

johannesweiss · November 22, 2018, 8:42am

Extra packages put (I think) too much burden on the package maintainers but a separate module might do it.

Maybe we could have a NIOPostgres module which gives you all the NIO APIs only and another module Postgres which does @_exported import NIOPostgres and therefore would give you both styles of API. Then most people would just import Postgres and get both but if you really only want the NIO stuff, you'd do import NIOPostgres.

tkrajacic · November 22, 2018, 8:46am

Absolutely! That's a lot better. I completely forgot about @_exported!

Helge_Hess1 · December 28, 2018, 3:00pm

Having a basic NIO PostgreSQL lib would be great, I also considered approaching this. There are essentially three implementations I know of: the Vapor one, then there was another GCD(?) based one on GitHub from which I don't remember the name (was the Vapor one derived from that?), and finally I also have an (unreleased) partial protocol implementation for Noze.io. I think it is fair to say that the Vapor one is tested best.

What I very much dislike about the Vapor implementation is that it tightly couples with Codable, I wouldn't want that - at least not at this level.

I would love to see un-opinionated PostgreSQL decoder/encoder channels in the spirit of NIOIRC or NIORedis (or libpq for that matter). They would yield raw PostgreSQL packets w/o copying of data.
For example (if I remember right) the PG protocol returns rows as a single framed packet. Lets assume a NIO channel receives a ByteBuffer. Lets say that buffer is 500 bytes and contains like 80 pkey rows.
What I'd like the decoder to read-emit is a small PostgreSQLRowPacket struct which refers to the ByteBuffer slice for that row, and provides ways to extract the desired base types by index (and potentially column name, I think PG also transmits a row schema which could be shared between those rows).

Arbitrary Coding approaches could then efficiently built on top of this (and likely once for different DB client libs). I guess it would be nice to have a basic "batteries included" builtin client library, but that is Step 2 IMO.

That also again outlines the potential need for two things which are not specific to PG at all but affect pretty much any client library:

Connection Setup Strategies (retries, etc)
Connection Pooling

It would be really good to have standard NIO implementations for this.

There's a fundamental difference between NIO users and non-NIO users. ... event loops ...

IMO the approach in my IRC client is reasonable. Simple and deals with both scenarios in reasonable ways.

Mordil · December 30, 2018, 3:58pm

Helge_Hess1:

What I very much dislike about the Vapor implementation is that it tightly couples with Codable, I wouldn't want that - at least not at this level .

I would love to see un-opinionated PostgreSQL decoder/encoder channels in the spirit of NIOIRC or NIORedis (or libpq for that matter). They would yield raw PostgreSQL packets w/o copying of data.
For example (if I remember right) the PG protocol returns rows as a single framed packet. Lets assume a NIO channel receives a ByteBuffer. Lets say that buffer is 500 bytes and contains like 80 pkey rows.
What I'd like the decoder to read-emit is a small PostgreSQLRowPacket struct which refers to the ByteBuffer slice for that row, and provides ways to extract the desired base types by index (and potentially column name, I think PG also transmits a row schema which could be shared between those rows)

Arbitrary Coding approaches could then efficiently built on top of this (and likely once for different DB client libs). I guess it would be nice to have a basic "batteries included" builtin client library, but that is Step 2 IMO.

I'm probably missing something here, but in what way does the current implementation not provide these capabilities?

Queries provide a PostgresRow, which in turn have methods to access underlying PostgresData that have the fundamental byteBuffer property.

I was following Tanner's pitch and started working on a proposal for a SwiftNIO Redis client, and after working on that I've come to the same conclusion. Perhaps we should have another pitch for a "Database Driver API" library, akin to the Logging proposal?

Helge_Hess1 · December 30, 2018, 4:21pm

Hm, fair point. Looks like PostgresMessage.InboundHandler is pretty close to what I'm looking for!
Can we just have that and put the Codable stuff in a different module? Just a clean module yielding and taking those packets?

(BTW: I think the PostgresMessage enum might be a little big to be passed around in NIOAny's, @johannesweiss ?)

Perhaps we should have another pitch for a "Database Driver API" library

I don't think this is database specific at all either, it is also the same for IRC or IMAP4 or HTTP, in short: essentially any client.

A "database driver API" might be the thing which holds the types to make database protocol libs do the Codable thing.

Helge_Hess1 · January 2, 2019, 11:36am

Although unfortunate I think it may be worthwhile to have separate NIOPostgresClient and NIOPostgresClientTLS modules.

Why? The basic NIOPostgres package would be just the raw decoder and essentially a really tiny module doing the hard work. Even a NIOPostgresClient module would be rather small, especially if we can get pooling and connection setup into a more general module.

It would be a really a waste having to pull down the whole nio-ssl just for that. Particularly because many people won't connect to PG using TLS.
(and many (presumably almost all) projects are running a proxy for frontend TLS, so they might not link nio-ssl at all)

tanner0101 · January 8, 2019, 10:39pm

This sounds similar to Vapor's DatabaseKit package. That includes some core Database fundamentals like a ConnectionPool.

I agree. The plan is that Vapor's higher level PostgresKit package will combine NIOPostgres and DatabaseKit to create a high-level, easy to use Postgres client.

It should be possible to move the Codable stuff to that higher layer. I put it in NIOPostgres originally because it allows for the details of PostgresMessage to be kept internal. But it sounds like we might want to make that public anyway.

Helge_Hess1 · January 9, 2019, 1:38pm

If you want, I could have a look on isolating the pure protocol parts (similar to what I did with the very plain NIOIRC vs the more opinionated IRC Client module).
If that's not being done here, I would probably do it for my own purposes anyways

tanner0101 · January 9, 2019, 7:49pm

I see PostgresKit being the right place to put the Codable stuff. I do think NIOPostgres should still take care of converting PostgresData to Swift types like String and Int, but that can we done with a simple protocol like PostgresDataConvertible. Then, in PostgresKit, Coders can be built on top on top of the convertible protocol.

I will take a stab at this soon (hopefully this week), but I am always happy for PRs :)