How do connection pools handle broken connections?

crontab · May 3, 2022, 12:33pm

Looking at Async-Kit, as well as MySQL-Kit and similar frameworks, I'm trying to understand how connection pools handle connections that are broken without the client's knowledge - I'm not sure if there's a specific term for this.

Let's say my MySQL server drops all connections e.g. due to a sudden reboot, or just one of the connections is dropped due to a timeout, and as a result of that the SwiftNIO side has one or more broken connections in the pool but doesn't know about it yet. A client then pulls the next connection out of the pool, tries to talk to the server and gets an error.

This is not a very pleasant situation for my SwiftNIO server to be in: I should either handle possible errors at every step where I communicate with my DB server (and get another connection out of the pool?), or I should bail and return an error to my client app. Neither is a very nice thing to do.

So how are connections kept alive in a connection pool? Or is there a common approach to this problem?

lukasa · May 3, 2022, 3:14pm

You've hit on one of the core issues with connection pools! The important insight to have here is that a connection pool is incapable of solving this problem with full generality. In particular, as there is always some delay between a connection being "leased" from a pool and being used, it's possible for a connection to be valid while it was in the pool but become invalid between the point of lease and the point of use. This makes it impossible to solve the problem perfectly.

The best approach, then, is to support retries. Essentially, be willing to have a connection pool vend you a silently-broken connection and try again. This requires care in your programming, as you need to know that you can safely retry, but it's the best possible option.
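A minimal sketch of the retry idea, kept synchronous and free of SwiftNIO so it stands alone. `ToyConnection`, `ToyPool`, `ConnectionBroken` and `withRetry` are hypothetical names made up for illustration; they are not part of Async-Kit or MySQL-Kit, and a real version would work with `EventLoopFuture` values instead.

```swift
// Hypothetical toy types for illustration only; not the Async-Kit API.
struct ConnectionBroken: Error {}

final class ToyConnection {
    var isBroken: Bool
    init(isBroken: Bool) { self.isBroken = isBroken }

    // Stand-in for talking to the server; fails if the socket is dead.
    func query(_ sql: String) throws -> String {
        if isBroken { throw ConnectionBroken() }
        return "ok: \(sql)"
    }
}

final class ToyPool {
    private var idle: [ToyConnection]
    private(set) var opened = 0

    init(idle: [ToyConnection]) { self.idle = idle }

    // Hands out an idle connection if there is one, otherwise opens a fresh one.
    func lease() -> ToyConnection {
        if !idle.isEmpty { return idle.removeFirst() }
        opened += 1
        return ToyConnection(isBroken: false)
    }

    // Puts a healthy connection back into the pool.
    func release(_ connection: ToyConnection) { idle.append(connection) }
}

// Runs `work` on a leased connection. A connection that turns out to be broken
// is dropped and the work is retried on another one, up to `maxAttempts` times.
// Only safe if `work` is idempotent or known not to have reached the server.
func withRetry<T>(on pool: ToyPool, maxAttempts: Int = 3,
                  _ work: (ToyConnection) throws -> T) throws -> T {
    for _ in 0..<maxAttempts {
        let connection = pool.lease()
        do {
            let result = try work(connection)
            pool.release(connection)
            return result
        } catch is ConnectionBroken {
            continue                      // discard the broken connection, try again
        } catch {
            pool.release(connection)      // unrelated error: connection is still usable
            throw error
        }
    }
    throw ConnectionBroken()
}
```

With two stale connections sitting in the pool, a call such as `try withRetry(on: pool) { try $0.query("SELECT 1") }` would burn through both and succeed on a freshly opened third. The cap on attempts is there so that a server that is really down produces an error instead of an endless loop.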

Another option is to take advantage of protocol-level keepalives to try to keep connections alive. This is useful in cases where you have a very small number of connections you know you want, e.g. to a database server, but it's less useful in things like HTTP connection pools where you may have connections to a wide range of servers that will never be used again.

crontab · May 3, 2022, 4:52pm

Got it, thanks!

In that case, couldn't the connection pool improve the chances of handing out a "good" resource by testing it right before it's handed out? Let's say a concrete ConnectionPoolSource provides an additional method for pinging the server, some minimal/empty request to at least ensure the socket hasn't timed out, e.g.:

public protocol ConnectionPoolSource {
    // ...
    func ping(connection: Connection) -> EventLoopFuture<Void>
}

Which of course means some overhead, but this might work well for wire protocols where automatic keep-alive is not defined. (In fact I'm curious to know if MySQL and PostgreSQL have any kind of keep-alive, or if not how they time out exactly.)
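A rough sketch of what ping-on-lease could look like, again synchronous and without SwiftNIO so that it is self-contained. `PingableConnection` and `ValidatingPool` are hypothetical names, and `ping()` here just returns a flag where a real source would do a minimal round trip and return an `EventLoopFuture<Void>`.

```swift
// Hypothetical stand-ins for illustration; not the Async-Kit API.
final class PingableConnection {
    let id: Int
    var isAlive: Bool
    init(id: Int, isAlive: Bool) { self.id = id; self.isAlive = isAlive }

    // Stand-in for a minimal round trip to the server (e.g. an empty request).
    func ping() -> Bool { isAlive }
}

final class ValidatingPool {
    private var idle: [PingableConnection]
    private var nextID: Int
    private(set) var discarded = 0

    init(idle: [PingableConnection]) {
        self.idle = idle
        self.nextID = (idle.map { $0.id }.max() ?? 0) + 1
    }

    // Pings each idle connection before handing it out and drops dead ones.
    // Falls back to opening a new connection when no idle one answers.
    func lease() -> PingableConnection {
        while !idle.isEmpty {
            let candidate = idle.removeFirst()
            if candidate.ping() { return candidate }
            discarded += 1
        }
        let fresh = PingableConnection(id: nextID, isAlive: true)
        nextID += 1
        return fresh
    }
}
```

Every lease pays for one extra round trip in this design, and a connection can still die between the ping and the first real query.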

Of course things can happen, for example the DB server may go down while some request handler is preparing an HTTP response on the SwiftNIO side. In which case I'd say it's not the request handler's job to wait and retry - it may potentially make the code impenetrably complex. Just tell the client there was a DB hiccup, it's fine.

But at least the trivial timeout situations would be ruled out, don't you think?

lukasa · May 4, 2022, 8:11am

Yes, though the latency increase of doing this is a bit painful. The more traditional pattern is regular keepalives during the lifetime of the pool.
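For comparison, a minimal sketch of the periodic-keepalive pattern: instead of pinging on every lease, a background sweep pings whatever is idle and evicts connections that do not answer. `IdleConnection` and `KeepalivePool` are hypothetical names, and the timer that would call `sweep()` is left out; in SwiftNIO something like `EventLoop.scheduleRepeatedTask` could drive it.

```swift
// Hypothetical stand-ins for illustration; not the Async-Kit API.
final class IdleConnection {
    let id: Int
    var isAlive: Bool
    init(id: Int, isAlive: Bool) { self.id = id; self.isAlive = isAlive }

    // Stand-in for a protocol-level keepalive round trip.
    func ping() -> Bool { isAlive }
}

final class KeepalivePool {
    private(set) var idle: [IdleConnection]

    init(idle: [IdleConnection]) { self.idle = idle }

    // Pings every idle connection once and evicts those that do not answer.
    // Intended to be called from a repeating timer; returns the number evicted.
    @discardableResult
    func sweep() -> Int {
        let before = idle.count
        idle.removeAll(where: { !$0.ping() })
        return before - idle.count
    }
}
```

The lease path stays fast because the cost moves to the timer, but each sweep sends one ping per idle connection whether or not anyone needs it.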

The reason this is not done with DBs is very simple: it scales up the load on the database. Database connections are usually cheap to recreate and they tend not to silently fail (as they usually flow within a single data center), so it's often best to let those fall idle.