Using Swift for a simple TCP proxy?

due to high AWS Network Load Balancer costs, i am interested in setting up my own network load “balancer”, which really just needs to be a proxy that forwards TCP traffic to EC2 nodes that lack public IP addresses.

the proxy wouldn’t be public-facing, and would only be used to relay SSH traffic.

would SwiftNIO be the right tool for this task, and if so, are there good examples of a SwiftNIO TCP proxy to look at?

Yes, it would.

If you're doing a truly simple L4 load balancer, what you'll want to do is take the CONNECT proxy example and then remove almost all the code.

The HTTP handlers are no longer needed, and almost all of the functionality of the ConnectHandler can be removed as well. In your channelInitializer you can make an outbound connection to your backend (using a ClientBootstrap) and then use GlueHandler to glue the two pipelines together. If you line your futures up appropriately, all the backpressure will work correctly, and you won't start serving the inbound connection until the outbound connection is ready to go.
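
In sketch form, that looks something like this (GlueHandler and its matchedPair() constructor are from the connect-proxy example; backendHost and the ports are placeholders):

import NIOCore
import NIOPosix

let group = MultiThreadedEventLoopGroup(numberOfThreads: System.coreCount)
let backendHost = "10.0.0.5" // placeholder backend address

let server = try ServerBootstrap(group: group)
    .childChannelInitializer { inbound in
        let (localGlue, remoteGlue) = GlueHandler.matchedPair()
        // Connect to the backend on the *same* event loop as the inbound
        // channel, so the two glued channels never race across threads.
        let outbound = ClientBootstrap(group: inbound.eventLoop)
            .channelInitializer { $0.pipeline.addHandler(remoteGlue) }
            .connect(host: backendHost, port: 22)
        // Complete the initializer only once the outbound connection is up,
        // so the inbound side is not serviced before the backend is ready.
        return inbound.pipeline.addHandler(localGlue).and(outbound).map { _ in }
    }
    .bind(host: "0.0.0.0", port: 2222)
    .wait()

try server.closeFuture.wait()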


i tried to create a simple proxy that just listens on port 8001 and connects to google.com.

import NIOCore
import NIOPosix

@main
enum Main
{
    public static
    func main() async throws
    {
        let executor:MultiThreadedEventLoopGroup = .init(numberOfThreads: 1)
        let bootstrap:ServerBootstrap = .init(group: executor)
            .serverChannelOption(ChannelOptions.socket(.init(SOL_SOCKET), SO_REUSEADDR),
                value: 1)
            .childChannelOption(ChannelOptions.socket(.init(SOL_SOCKET), SO_REUSEADDR),
                value: 1)
            .childChannelInitializer
        {
            (incoming:any Channel) in

            let bridge:(GlueHandler, GlueHandler) = GlueHandler.bridge()

            let bootstrap:ClientBootstrap = .init(group: executor)
                .connectTimeout(.seconds(3))
                .channelInitializer
            {
                $0.pipeline.addHandler(bridge.1)
            }

            let forward:EventLoopFuture<any Channel> = bootstrap.connect(
                host: "google.com",
                port: 443)

            return incoming.pipeline.addHandler(bridge.0)
                .and(forward)
                .map
            {
                _ in
                print("Connected to google.com")
            }
        }

        let channel:any Channel = try await bootstrap.bind(host: "127.0.0.1", port: 8001).get()

        print("Listening...")

        try await channel.closeFuture.get()
    }
}

here is the glue handler, which is basically the same as the one in the example repo:

import NIOCore

final
class GlueHandler
{
    private
    var partner:GlueHandler?

    private
    var context:ChannelHandlerContext?

    private
    var readPending:Bool

    private
    init()
    {
        self.readPending = false
    }
}
extension GlueHandler
{
    static
    func bridge() -> (GlueHandler, GlueHandler)
    {
        let bridge:(GlueHandler, GlueHandler) = (.init(), .init())

        bridge.0.partner = bridge.1
        bridge.1.partner = bridge.0

        return bridge
    }
}
extension GlueHandler
{
    private
    func partnerWrite(_ data:NIOAny)
    {
        self.context?.write(data, promise: nil)
    }

    private
    func partnerFlush()
    {
        self.context?.flush()
    }

    private
    func partnerWriteEOF()
    {
        self.context?.close(mode: .output, promise: nil)
    }

    private
    func partnerCloseFull()
    {
        self.context?.close(promise: nil)
    }

    private
    func partnerBecameWritable()
    {
        if  self.readPending
        {
            self.readPending = false
            self.context?.read()
        }
    }

    private
    var partnerWritable:Bool
    {
        self.context?.channel.isWritable ?? false
    }
}

extension GlueHandler:ChannelDuplexHandler
{
    typealias InboundIn = NIOAny
    typealias OutboundIn = NIOAny
    typealias OutboundOut = NIOAny

    func handlerAdded(context:ChannelHandlerContext)
    {
        self.context = context
    }

    func handlerRemoved(context:ChannelHandlerContext)
    {
        self.context = nil
        self.partner = nil
    }

    func channelRead(context:ChannelHandlerContext, data:NIOAny)
    {
        self.partner?.partnerWrite(data)
    }

    func channelReadComplete(context:ChannelHandlerContext)
    {
        self.partner?.partnerFlush()
    }

    func channelInactive(context:ChannelHandlerContext)
    {
        self.partner?.partnerCloseFull()
    }

    func userInboundEventTriggered(context:ChannelHandlerContext, event:Any)
    {
        if  case ChannelEvent.inputClosed = event
        {
            self.partner?.partnerWriteEOF()
        }
    }

    func errorCaught(context:ChannelHandlerContext, error:any Error)
    {
        self.partner?.partnerCloseFull()
    }

    func channelWritabilityChanged(context:ChannelHandlerContext)
    {
        if  context.channel.isWritable
        {
            self.partner?.partnerBecameWritable()
        }
    }

    func read(context:ChannelHandlerContext)
    {
        if  case true? = self.partner?.partnerWritable
        {
            context.read()
        }
        else
        {
            self.readPending = true
        }
    }
}

however, when i run it and navigate to https://127.0.0.1:8001/ in a browser, the page fails to load, although the proxy seems to have successfully connected to its target:

Listening...
Connected to google.com

not sure what i am doing wrong here.

i also get a Sendable error when passing one of the GlueHandlers to the channelInitializer closure, and if i configure numberOfThreads to something other than 1, i get a precondition failure in GlueHandler due to self.partner?.partnerWritable being called on the wrong event loop.

Let's start here. Yes, this is by design. GlueHandler, as written, expects the two Channels it glues together to be on the same event loop. Right now you're achieving this by only having one event loop, but you're getting into slightly tricky territory.

The way this is achieved in connect-proxy is to use context.eventLoop when the ClientBootstrap is created, instead of reaching out to a general EventLoopGroup. This ensures that you will only get connections created on the same event loop you're on. As for the Sendable warnings: once you've met that requirement, you can use NIOLoopBound to move the value, which will enforce this invariant at runtime and silence the warning.
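
Roughly, inside your childChannelInitializer (using the incoming and bridge names from your code):

// Sketch: bind the outgoing handler to the inbound channel's event loop.
// Both NIOLoopBound's initializer and its .value accessor check at runtime
// that they are used on that event loop.
let outgoing = NIOLoopBound(bridge.1, eventLoop: incoming.eventLoop)

let bootstrap = ClientBootstrap(group: incoming.eventLoop) // not the shared group
    .channelInitializer {
        // This initializer runs on the same event loop, so unwrapping the
        // NIOLoopBound here passes its runtime check.
        $0.pipeline.addHandler(outgoing.value)
    }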

As for why the connection fails: you might be running into issues with HTTP/3. If you use something that doesn't do an HTTP/3 negotiation, such as curl, you'll find the connection works just fine, though you're almost immediately redirected to a new hostname due to SNI issues. The existence of HTTP/3, though, means that the client learns about alternative ways to reach the server, including over QUIC, and it may attempt to migrate to those.
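
For the record, you can rule that out from the command line by forcing HTTP/1.1, which sidesteps any Alt-Svc/QUIC upgrade attempt (-k is needed because the certificate won't match 127.0.0.1):

$ curl -k --http1.1 https://127.0.0.1:8001/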

i am not sure if this is it. first, i can confirm that a very basic setup that uses HTTP only (no HTTPS) works locally:

starting the target on 127.0.0.1:8002:

$ while true; do { echo -e "HTTP/1.1 400 Bad Request\n\n{\"foo\": \"bar\"}"; } | nc -l 8002; done

connecting through the proxy on 127.0.0.1:8001:

$ curl -v 127.0.0.1:8001
*   Trying 127.0.0.1:8001...
* Connected to 127.0.0.1 (127.0.0.1) port 8001
> GET / HTTP/1.1
> Host: 127.0.0.1:8001
> User-Agent: curl/8.3.0
> Accept: */*
> 
< HTTP/1.1 400 Bad Request
* no chunk, no close, no size. Assume close to signal end
< 
{"foo": "bar"}
* Recv failure: Connection reset by peer
* Closing connection
curl: (56) Recv failure: Connection reset by peer

but for some reason i simply cannot load anything from an external site.

let forward:EventLoopFuture<any Channel> = bootstrap.connect(
    host: "google.com",
    port: 443)
$ curl -v https://127.0.0.1:8001
*   Trying 127.0.0.1:8001...
* Connected to 127.0.0.1 (127.0.0.1) port 8001
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/pki/tls/certs/ca-bundle.crt
*  CApath: none

i wondered if this was a problem with HTTPS, so i swapped out google.com:443 for example.com:80, but that did not work either. however, i am able to connect via the raw IPv4 address. could there be a DNS problem here?

Are you adding -k to your curl flags? I have no issues running the exact code you posted here with curl and -k.

that wouldn’t explain why i cannot connect to example.com:80 but i can connect to 93.184.216.34:80, which is its current IP address. i added the flag anyway; it had no effect.

How have you installed the proxy? A DNS issue seems quite unlikely, especially as you've found the actual connections do work. Any VPNs or anything else installed?

i have not installed it anywhere; it is running on my local machine in a docker container. i am also running curl from inside the same devcontainer.

there also don’t seem to be any obvious issues with TLS. when connecting to the IPv4 address, i get the expected results with or without the -k option. it just does not work when i substitute a domain name for the IPv4 address.

update: i went ahead and deployed this to an EC2 instance anyway, and it seems to work fine there. so i am just going to chalk this up to some quirk of my ISP’s configuration.

back to the actual proxy implementation: how should i be breaking the reference cycle between the two GlueHandlers in the event that the setup fails?

handlerRemoved does this automatically.

i know the handlerRemoved implementation does this. i was asking how best to handle the scenario where one of the prerequisite steps fails and the GlueHandlers never get added to a pipeline in the first place.

right now, i am doing this:

{
    (incoming:any Channel) in

    let incomingHandler:GlueHandler
    let outgoingHandler:NIOLoopBound<GlueHandler>

    (incomingHandler, outgoingHandler) = GlueHandler.bridge(on: incoming.eventLoop)

    let bootstrap:ClientBootstrap = .init(group: incoming.eventLoop)
        .connectTimeout(.seconds(3))
        .channelInitializer
    {
        $0.pipeline.addHandler(outgoingHandler.value)
    }

    let host:String = "example.com" 

    let future:EventLoopFuture<Void> = incoming.pipeline.addHandler(incomingHandler)
        .and(bootstrap.connect(host: host, port: 443))
        .map
    {
        _ in print("Connected to \(host)")
    }

    //  Break reference cycle.
    future.whenFailure
    {
        _ in outgoingHandler.value.unlink()
    }

    return future
}
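
(bridge(on:) and unlink() are small helpers i added to the GlueHandler shown earlier; roughly the following, living in the same file so they can reach the private partner property:)

extension GlueHandler
{
    //  Sketch: variant of `bridge()` that pins the outgoing handler to the
    //  given event loop. Must be called on that event loop, as NIOLoopBound's
    //  initializer checks this at runtime.
    static
    func bridge(on eventLoop:any EventLoop) -> (GlueHandler, NIOLoopBound<GlueHandler>)
    {
        let bridge:(GlueHandler, GlueHandler) = Self.bridge()
        return (bridge.0, .init(bridge.1, eventLoop: eventLoop))
    }

    //  Sketch: manually breaks the partner reference cycle when the handlers
    //  never make it into a pipeline.
    func unlink()
    {
        self.partner?.partner = nil
        self.partner = nil
    }
}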

I don't see any prerequisite step that can fail. Unless your and method is very different from NIO's built-in ones, the call to connect will always fire, which will always add the outgoing handler to the pipeline. (ClientBootstrap runs its channelInitializer before it attempts the connection, so even if the connect fails afterwards, the channel is torn down and handlerRemoved breaks the cycle.)

so, i’ve had this running on our cluster for about a week now, and it seems to be pretty stable for maintenance workloads. it is saving us a hilarious amount of money, as the IPv4 addresses on AWS actually cost more than the EC2 nodes themselves for many architectures :joy:

of course, once you go IPv6-only, you start noticing all the services you interface with that are still IPv4-only. so i am now interested in using the proxy for some higher-volume, continuous data transfers.

as the proxy is running on the same host that serves real human users, i want to put some safeguards in place so that high-volume downloads don’t displace all other traffic going through that host. ideally, i would like to limit the proxy to some percentage of the total network bandwidth available on the host.

is this something that makes sense? if so, what is the best way to accomplish it?