[pitch] SwiftNIO based PostgreSQL client


(Tanner) #1

NIOPostgres: Client for PostgreSQL server built on NIO.

This module, NIOPostgres, would include a client for connecting to, authorizing, and querying a PostgreSQL server. This includes channel handlers for encoding / decoding PostgreSQL packets, a handler for enabling request / response interaction, and a Codable-based support for parsing / serializing PostgreSQL's binary data format.

Motivation

Most Swift implementations of a PostgreSQL client are based on the libpq C library which handles transport internally. Building a library directly on top of PostgreSQL's wire protocol using SwiftNIO should yield a more reliable, maintainable, and performant interface for PostgreSQL databases.

Propsed solution

A proof of concept implementation is available at vapor-community/nio-postgres. This package has the following features implemented:

  • Connection & Authorization
  • Optional TLS negotation
  • Simple Query
  • Extended Query (parameterized binds)
  • Encoder for converting single-value Encodable types to query paramters
  • Decoder for converting PostgreSQL data to single-value Decodable types
  • Decoder for converting PostgreSQL rows to keyed Decodable types
  • Unit and performance tests

This package is based on the core implementation of vapor/postgresql. Basically, just the part that builds directly on SwiftNIO. As such, NIOPostgres's only dependencies are swift-nio and swift-nio-ssl.

Detailed solution

This section goes into some of the details of this solution and decisions that were made.

Connection-based Request / Response

This module centers around a PostgresConnection type that enables a request / response interaction with the connected PostgreSQL database.

Simple Query

Below is working example of connecting to a remote server with TLS, and making a simple query.

let elg = MultiThreadedEventLoopGroup(numberOfThreads: 1)

// open a new connection, adding PostgreSQL packet handlers
let conn = try PostgresConnection.connect(
    to: SocketAddress.newAddressResolving(host: "my.postgresql.domain", port: 5432),
    on: elg.next()
).wait()
print(conn) // PostgresConnection

// ask server if it supports TLS, upgrading if so
let upgraded = try conn.requestTLS(using: .forClient(certificateVerification: .none)).wait()
guard upgraded else {
	fatalError("Remote server does not support TLS")
}

// authenticate, fetching startup information
try conn.authenticate(username: "vapor", database: "vapor", password: "secret").wait()

// perform a simple (text) query
let rows = try conn.simpleQuery("SELECT version()").wait()
print(rows) // [PostgresRow]

// decode a single-value `Decodable` from the row
let version = try rows[0].decode(String.self, at: "version")
print(version) // PostgreSQL 9.1 ... 

// cleanup
try conn.close().wait()
try elg.syncShutdownGracefully()

Extended Query

Extended queries are similar to simple queries, except they allow for values to be parameterized.

let conn: PostgresConnection ...

// performs an extended query
let binds: PostgresBinds = ["hello"]
let rows = try conn.query("SELECT $1::TEXT as foo", binds).wait()

// decode a single-value `Decodable` from the row
let foo = try rows[0].decode(String.self, at: "foo")
print(foo) // hello

PostgresMessage

This module's PostgresMessage enum is the basis of its communication with the connected server. It represents all possible packets that could be sent to or received from the client.

An inbound ByteToMessageDecoder and outbound MessageToByteEncoder are used to translate raw data from SwiftNIO into these messages.

PostgresConnection

This class uses an inbound ChannelHandler to send and receive PostgresMessage via a request / response interface. This handler supports pipelining multiple requests by storing an array of promises that will be fulfilled later.

Table Names

PostgreSQL does not include table names in its wire protocol data. Only table OIDs are provided. Without string table identifiers, it is difficult to disambiguate between duplicate column names in joined queries.

To help remedy this issue, this package provides a method loadTableNames() on PostgresConnection. This method executes and stores the result of the following query:

SELECT oid, relname FROM pg_class

Once loaded, this information can be used to translate table name to table OID using a hash map. This enables PostgresRow to decode columns / entities from a specific table.

// SELECT * FROM user, orders
// this query results in two columns named "id"

// a row from the previous query
let row: PostgresRow ...

let user = row.decode(User.self, table: "users")
let id = row.decode(UUID.self, at: "id", table: "users")

CMD5

MD5 hashing is required for PostgreSQL's authentication flow. This module follows NIO's approach and embeds a private C-based implementation rather than relying on external Swift crypto libraries.

Error codes

A Swifty interface to PostgreSQL's supported error codes is provided. This makes it possible to detect errors like syntax errors.

let conn: PostgresConnection ...

do {
    _ = try conn.simpleQuery("SELECT &").wait()
} catch let error as PostgresError {
	switch error.code {
	case .syntax_error: ...
	default: ...
	}
}

(Pedro José Pereira Vieito) #2

This seems like I great addition to the Swift ecosystem! However I have two concerns:

  • First, it would be great to pitch before any database connector library a common table/dataframe library (à la Pandas in the Python world) so that all of them could rely on.
  • Second, while this is great for server-side tools already relying on NIO, I think it could be great also for a lot more uses where the user should not know the underlying use of NIO threading. Could the API be tweaked for these cases avoiding the requirement of more "complex" concepts like the MultiThreadedEventLoopGroup, for example defaulting to MultiThreadedEventLoopGroup(numberOfThreads: 1) as in the example?

(Reid Nantes) #3

I think this would be very helpful. A lightweight and standard way to interact with PostgreSQL databases without the need to import heavy frameworks is very useful. I think a standard Redis client similar to this would also be useful.


(Johannes Weiss) #4

A PostgreSQL driver looks like a really great thing to have and I think what you propose looks great!

I think supporting clients that aren't NIO apps without forcing them to import NIO is a good idea. Initially I was planning on making quite a short post here but now it's actually a bit long, apologies...

I thought about how an API could look that supports both NIO and non-NIO users fully without compromising on one of them. Below there's an outline on how a Postgres library could look, taking @tanner0101's API as the basis.

There's a fundamental difference between NIO users and non-NIO users. NIO users will know what EventLoop/EventLoopGroup they want to run their connections on to achieve best performance. Non-NIO users on the other hand shouldn't need to know what an EventLoop(Group) even is. To rectify this fundamental difference I'd propose to have an enum Executor where a user can choose where they want to run the network IO on.

So in the example below, NIO users would choose .eventLoopGroup(myAlreadyExistingNIOEventLoopGroup) and all others can just do .spawnThreads. To have a place to store the executor I'd use a class called Postgres which would also hold the connect method.

(disclaimer: all code here has never been compiled so there will be mistakes here, this is just meant as an example)

public enum Executor {
    case spawnThreads
    case eventLoopGroup(EventLoopGroup)
}

public class Postgres {
    private let executor: Executor
    private let group: EventLoopGroup
    private let running = Atomic<Bool>(value: true)

    public init(executor: Executor) {
        switch executor {
            case .spawnThreads:
                self.group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
            case .eventLoopGroup(let group):
                self.group = group
        }
    }

    public static func connect(host: String,
                               port: Int) -> EventLoopFuture<FooConnection> {
        /* actual implementation */
    }

    public func terminate() throws {
        guard self.running.exchange(value: false) else {
            return
        }
        switch executor {
            case .spawnThreads:
                try self.group.syncShutdownGracefully()
            case .eventLoopGroup:
                ()
        }
    }

    deinit {
        assert(!self.running.load(), "Postgres not terminated")
    }
}

So what's this odd terminate method doing there? There might be resources (like the threads spawned) attached to a Postgres object so we need to have a place where we free those resources up again. As soon as we have an SSWG logging library this might also be a place to store a logger.
This Postgres object shouldn't be a burden at all, just initialise it in main.swift (and terminate it in a defer block right after). It's really just a handle to the library itself which will also work well with dependency injection mechanisms.

For now though, this library isn't usable without using NIO's futures but that's really easy to fix and should IMHO also be part of the library:

public extension Postgres { /* queue + callback support */
    public static func connect(host: String,
                               port: Int,
                               queue: DispatchQueue,
                               _ body: @escaping (Result<FooConnection, Error>) -> Void) {
        self.connect(host: host, port: port).then { value in
            queue.async {
                body(.success(value))
            }
        }.whenFailure { error in
            queue.async {
                body(.error(error))
            }
        }
    }
}

I know that's quite a bit of boilerplate for the library author to write...

Next, we obviously need some useful operations:

public class PostgresConnection {
    func simpleQuery(_ query: String) -> EventLoopFuture<[PostgresRow]> { ... }
    /* more operations */
}

public extension PostgresConnection { /* queue + callback support */
    public static func simpleQuery(_ query: String,
                                   queue: DispatchQueue,
                                   _ body: @escaping (Result<FooConnection, Error>) -> Void) {
        self.simpleQuery(query).then { value in
            queue.async {
                body(.success(value))
            }
        }.whenFailure { error in
            queue.async {
                body(.error(error))
            }
        }
    }
}

If we give the library this shape, we can use it in the NIO way:

import NIO

let group = MultiThreadedEventLoopGroup(numberOfThreads: System.coreCount)

let pg = Postgres(executor: .eventLoopGroup(group)
defer {
    try! pg.terminate()
}

func giveMeSomeRows() -> EventLoopFuture<[PostgresRow]> {
    let result = pg.connect(host: "localhost", port: 5432).then { conn in
        conn.simpleQuery("SELECT * FROM foo;")
    }
    result.whenComplete
        conn.close()
    }
    return result
}

do {
    let rows = try giveMeSomeRows().wait()
    print("OK: \(rows)")
} catch {
    print("ERROR: \(error)")
}

but also without NIO in the usual Swift way by having completion blocks. So the example from above without any NIOisms would look like the following (assuming Result lands):

import Dispatch

let pg = Postgres(executor: .spawnThreads)
defer {
    try! pg.terminate()
}

func giveMeSomeRows(queue: DispatchQueue, _ body: Result<[PostgresRow], Error>) {
    pg.connect(host: "localhost", port: 5432, queue: .global) { maybeConnection in
        switch maybeConnection {
        case .success(let conn):
            conn.simpleQuery("SELECT * FROM foo;", queue: .global) { maybeRows in
                queue.async {
                    body(maybeRows)
                }
                conn.close() // technically close can probably also return an error but let's ignore that here...
            }
        }
    case .error(let error):
        queue.async {
            body(maybeRows)
        }
    }
}

let dispatchGroup = DispatchGroup()
giveMeSomeRows { maybeRows in
    switch maybeRows {
    case .success(let rows):
        print("OK: \(rows)")
    case .failure(let error
        print("ERROR: \(error)")
    }
    dispatchGroup.signal()
}
dispatchGroup.wait()

or in case Result does not land (so everything would take the unfortunate @escaping (Type?, Error?) -> Void closures):

import Dispatch

let pg = Postgres(executor: .spawnThreads)
defer {
    try! pg.terminate()
}

func giveMeSomeRows(queue: DispatchQueue, _ body: Result<[PostgresRow], Error>) {
    pg.connect(host: "localhost", port: 5432, queue: .global) { (conn, error) in
        if let conn = conn {
            conn.simpleQuery("SELECT * FROM foo;", queue: .global) { (rows, error) in
                queue.async {
                    body(rows, error)
                }
                conn.close() // technically close can probably also return an error but let's ignore that here...
            }
        }
    } else {
        queue.async {
            body(error!)
        }
    }
}

let dispatchGroup = DispatchGroup()
giveMeSomeRows { (rows, error) in
    if rows = rows {
        print("OK: \(rows)")
    } else {
        print("ERROR: \(error!)")
    }
    dispatchGroup.signal()
}
dispatchGroup.wait()

Please let me know what you think.


(Pedro José Pereira Vieito) #5

Seems great to me! Having this dual API would allow this type of libraries to have a more broad user base while enhancing the Swift ecosystem as a whole.


(Tanner) #6

Thanks for the detailed response! I agree it would be nice for this library to support both NIO and non-NIO APIs.

It would seem all of the non-NIO functionality can be built on top of the APIs as already described in this pitch. Because of this, I wonder if it might make sense to expose the non-NIO API entirely through a new wrapper type (like your Postgres class). Maybe calling it something like: DispatchPostgresConnection, CallbackPostgresConnection, or SimplePostgresConnection?

The underlying implementation would consist of simply combining a PostgresConnection and EventLoopGroup and exposing non-Future-based methods for interacting with the connection.

public class SimplePostgresConnection {
    private let connection: PostgresConnection
    private let group: EventLoopGroup
    // any other properties needed for thread-safety

    public init() {
        // create event loop group and connection
    }

    public func connect(...) { 
        // call connect on underlying connection, translating future to callbacks
    }
    
    // mirror simpleQuery and query methods, converting futures to callbacks

    public func terminate() {
        // do event loop group cleanup
    }
}

(Johannes Weiss) #7

Yes, I think that would work well! There's one small concern I have: Let's imagine the app is written in a non-NIO style but it uses a library that is written in NIO. The app would then get a SimplePostgresConnection but the library might prefer a NIO-style PostgresConnection. For a database library (like the one you propose) all this is probably not that relevant because you typically don't hand database connections around to other libraries all that often but for other libraries that might be relevant.
But generally I think it would be cool to share the same type I think. And as you said, the non-NIO APIs are really just wrappers on top of the NIO-style API, that also why I created them mostly as extensions.

Let me know what you think.


(Tanner) #8

That's a good point. The SimplePostgresConnection could (and probably should) expose its PostgresConnection as a public var, in case you need to convert for an API.

For this specific example, I wonder how likely it would be that someone using the non-NIO version of Postgres would need to interface with a NIO-version library? That seems like a complex enough case to merit just using NIO-based APIs for both libraries. I may be wrong here though.

My main concern with having the non-NIO methods directly alongside the NIO methods is that it adds (possibly confusing) bloat:

For NIO users:

  • There is a terminate() method that should be called but doesn't really do anything.
  • Auto-complete has extraneous methods which don't support futures.

For non-NIO users:

  • Auto-complete has extraneous methods which don't support callbacks.

I'm interested to hear your additional thoughts.


(Thomas Krajacic) #9

You can also have a separate package nio-postgres-dispatch (strawman name) that contains the wrappers, and hence if you don't import that you don't have those autocompletion issues when you just use the NIO flavored version at least.

Would be nice if there was a way to hide the NIO symbols if the library is just a dependency downstream and not top level.


(Pedro José Pereira Vieito) #10

I think having two structs one wrapping the other could be a good idea but maybe that naming is a bit NIO-centric, what about:

  • PostgresConnection: the standard Swift API wrapper
  • NIOPostgresConnection: the low-level NIO API (or even, nested PostgresConnection.NIO?)

(Johannes Weiss) #11

I think explicitly managing the lifecycle for scarce resources (ie. everything that isn't memory/CPU which are done automatically by ARC/the OS) is quite important and I would introduce it early in an API design. I don't think it's unreasonable for users to terminate Postgres. Therefore I would argue that in the NIO world, we should still have users call the terminate() method even if it might not do something.
In the future, we might want to implement some connection pooling and then the terminate() method could terminate the connection pool.

That's a fair point. We could in theory solve that with a module which holds the non-NIO extensions maybe?


(Johannes Weiss) #12

Extra packages put (I think) too much burden on the package maintainers but a separate module might do it.

Maybe we could have a NIOPostgres module which gives you all the NIO APIs only and another module Postgres which does @_exported import NIOPostgres and therefore would give you both styles of API. Then most people would just import Postgres and get both but if you really only want the NIO stuff, you'd do import NIOPostgres.


(Thomas Krajacic) #13

Absolutely! That's a lot better. I completely forgot about @_exported!