Capture packets using man-in-the-middle (MITM) proxy

@ceepee, I need your help capturing packets. Can you share how you did this?

Can you explain how you decompress the data here, in SwiftNIO?
Thanks for your time.

You can check the Content-Encoding response header to decide which decoder to use (brotli/gzip etc)
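
A minimal sketch of that header check, in plain Swift with no NIO dependency. The `BodyEncoding` enum and the `bodyEncoding(forContentEncoding:)` function are hypothetical names used only for illustration, and the decompression itself (zlib, a Brotli library, and so on) is deliberately left out:

```swift
/// Hypothetical enum naming the decoders a proxy might support.
enum BodyEncoding: Equatable {
    case identity            // body is not compressed
    case gzip
    case deflate
    case brotli
    case unsupported(String) // a coding this sketch does not know about
}

/// Maps a Content-Encoding header value to a decoder choice.
/// Assumption: a single coding; stacked codings such as "gzip, br" are not handled.
func bodyEncoding(forContentEncoding value: String?) -> BodyEncoding {
    guard let raw = value else { return .identity }
    // Content codings are case-insensitive; drop spaces and lowercase the token.
    let token = raw.split(separator: " ").joined().lowercased()
    switch token {
    case "", "identity": return .identity
    case "gzip", "x-gzip": return .gzip
    case "deflate": return .deflate
    case "br": return .brotli
    default: return .unsupported(token)
    }
}
```

You would call this with the header value taken from the response head, then feed the buffered body to whichever decompressor matches.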

Can you show me what your InboundIn, InboundOut, OutboundIn, and OutboundOut are in the glue handler?
Thanks

Sure:
InboundIn = HTTPServerRequestPart
InboundOut = HTTPClientRequestPart
OutboundIn = HTTPClientResponsePart
OutboundOut = HTTPServerResponsePart

What I do is create a pair of GlueHandlers, one for the client side (iOS <> Proxy) and one for the server side (Proxy <> Remote server).

Something like:

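class GlueHandler {
 let isServer: Bool
 let endpoint: HTTPRequestHead
 // Optional, because the two handlers can only be linked once both exist.
 var peer: GlueHandler?

 init(server: Bool, endpoint: HTTPRequestHead) {
  self.isServer = server
  self.endpoint = endpoint
 }

 class func createPair(endpoint: HTTPRequestHead) -> (GlueHandler, GlueHandler) {
  let server = GlueHandler(server: true, endpoint: endpoint)
  let client = GlueHandler(server: false, endpoint: endpoint)

  // Point each handler at its partner.
  server.peer = client
  client.peer = server

  return (server, client)
 }

}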

Then you can add it in your ConnectHandler like:

  let (server, client) = GlueHandler.createPair(endpoint: self.endpoint)
  context.channel.pipeline.addHandler(server, name: "server-glue")
  .and(peerChannel.pipeline.addHandler(client, name: "client-glue")).whenComplete { result in
       let _ = context.channel.pipeline.addHandler(ErrorHandler())
   }

Now in your GlueHandler's channelRead you can decide how to decode the data:

extension GlueHandler: ChannelDuplexHandler {
    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        if self.isServer {
            // Request parts arriving from the client (iOS <> Proxy side).
            switch self.unwrapInboundIn(data) {
            // .... handle other cases like head etc
            case .body(let body):
                print("Server received: ", body)
                self.peer?.partnerWrite(self.wrapInboundOut(.body(.byteBuffer(body))))
            }
        } else {
            // Response parts arriving from the remote server (Proxy <> Remote server side).
            switch self.unwrapOutboundIn(data) {
            // ....
            case .body(let body):
                print("Client received: ", body)
                self.peer?.partnerWrite(self.wrapOutboundOut(.body(.byteBuffer(body))))
            }
        }
    }
}

In the channelRead function above you can use instance variables in your GlueHandler to store the headers and body, so that in channelReadComplete you can use the Content-Encoding and Content-Type headers to decide how to decode the body.

This is my new GlueHandler:

final class GlueHandler {
    private var partner: GlueHandler?

    private var context: ChannelHandlerContext?

    private var pendingRead: Bool = false
    private var isServer = false
    private var endPoint: HTTPRequestHead
    
    private var clientBuffer = [ByteBuffer]()
    private var serverBuffer = [ByteBuffer]()
    
    private init(isServer: Bool, endPoint: HTTPRequestHead) {
        self.isServer = isServer
        self.endPoint = endPoint
    }
    class func matchedPair(endpoint: HTTPRequestHead) -> (GlueHandler, GlueHandler) {
        let server = GlueHandler(isServer: true, endPoint: endpoint)
        let client = GlueHandler(isServer: false, endPoint: endpoint)
        
        server.partner = client
        client.partner = server
        
        return (server, client)
    }
}
extension GlueHandler: ChannelInboundHandler {
    typealias InboundIn = HTTPServerRequestPart
    typealias InboundOut = HTTPClientRequestPart
    typealias OutboundIn = HTTPClientResponsePart
    typealias OutboundOut = HTTPServerResponsePart
    
    // channelRead(context: ChannelHandlerContext, data: NIOAny)
    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        if self.isServer {
            switch self.unwrapInboundIn(data) {
            case .body(let body):
                vpnClientMessage(message: "Server received: \(body)")
                self.partner?.partnerWrite(self.wrapInboundOut(.body(.byteBuffer(body))))
            case .head(let head):
                vpnClientMessage(message: "Server received: \(head)")
            case .end(let end):
                vpnClientMessage(message: "Server received: \(String(describing: end))")
            }
        } else {
            switch self.unwrapOutboundIn(data) {
            case .body(let body):
                vpnClientMessage(message: "Client received: \(body)")
                self.partner?.partnerWrite(self.wrapInboundOut(.body(.byteBuffer(body))))
            case .head(let head):
                vpnClientMessage(message: "Client received: \(head)")
            case .end(let end):
                vpnClientMessage(message: "Client received: \(String(describing: end))")
            }
        }
    }
}

and my ClientBootstrap looks like this in ConnectHandler:

let channelFuture = ClientBootstrap(group: context.eventLoop)
            .connect(host: String(host), port: port)

and I am getting this error:

NIO-ELT-0-#1 (6): Fatal error: tried to decode as type HTTPPart<HTTPRequestHead, ByteBuffer> but found IOData with contents ioData(IOData { ByteBuffer { readerIndex: 0, writerIndex: 517, readableBytes: 517, capacity: 2048, storageCapacity: 2048, slice: _ByteBufferSlice { 0..<2048 }, storage: 0x0000000108025200 (2048 bytes) } })

On which line of code are you getting this error?

At self.unwrapInboundIn(data), in the switch statement in GlueHandler's channelRead.

My local server pipeline is:
& the pipeline to connect to remote server is:

Are you creating two servers in the packet tunnel?
Or one in the packet tunnel and another in ConnectHandler, or are they the same?
To me they don't look like they are the same.

Hi @ceepee, is it possible to share your project? If not, is there a way I can get in touch with you? I also want to do the same thing.

@lukasa I've read all your stuff about Swift NIO on the forums that I could find, especially in regards to the Connect Proxy (and am a huge fan). Like others, I'm still trying to wrap my head around it, (and pretty dumb with networking overall) so I was hoping you could maybe explain the state machine implementation that is using upgradeState? I'm confused at the point where we expect to see awaitingConnection in channelRead in the ConnectHandler because it says //we've seen .end, this must not be HTTP anymore and yet, in the connectSucceeded function it says //upgrade complete. Can you please shed some light on how the values of upgradeState will affect the connection and what times they will be read? Would be really grateful.
Also, does it make a difference if I add addHTTPClientHandlers to the connectTo method's ClientBootstrap?

Sure! Considering only the success path for a moment, the state machine graph looks like this:

We enter beganConnecting when we read the first .head message. The upgrade is complete when two things happen: we have to have received the .end associated with that first message, and the TCP connection has to succeed.

That leads to this diamond pattern in the state machine. If we channelRead .end before the connect succeeds, we enter awaitingConnection. If the connect succeeds before we read .end, we enter awaitingEnd. In each case, we are waiting for the other event to happen.
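
The diamond described above can be sketched as a small transition function. This is a simplified illustration, not the actual ConnectHandler code: the `UpgradeEvent` enum and `nextState(from:on:)` are hypothetical names, the failure path is omitted, and the real states also carry associated data (such as buffered reads) that is left out here.

```swift
/// Simplified upgrade states, success path only.
enum UpgradeState: Equatable {
    case idle
    case beganConnecting
    case awaitingEnd          // connect succeeded first, still waiting for .end
    case awaitingConnection   // .end was read first, still waiting for the connect
    case upgradeComplete
}

/// Hypothetical event names for the things the handler observes.
enum UpgradeEvent {
    case readHead
    case readEnd
    case connectSucceeded
}

func nextState(from state: UpgradeState, on event: UpgradeEvent) -> UpgradeState {
    switch (state, event) {
    case (.idle, .readHead):
        return .beganConnecting
    case (.beganConnecting, .readEnd):
        return .awaitingConnection
    case (.beganConnecting, .connectSucceeded):
        return .awaitingEnd
    case (.awaitingConnection, .connectSucceeded), (.awaitingEnd, .readEnd):
        // Whichever event arrives second completes the upgrade.
        return .upgradeComplete
    default:
        // A real handler would treat other combinations as errors;
        // this sketch simply stays in the current state.
        return state
    }
}
```

Both orderings of `.readEnd` and `.connectSucceeded` end in `.upgradeComplete`; they only differ in which intermediate state they pass through.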

Thus, to your detailed question:

When we're in awaitingConnection we've read the .end of the HTTP message that triggered our upgrade. Because this is a CONNECT request, any subsequent bytes are presumably part of the TCP stream the client wants to send to the other server. We don't know what format those are, so we can't read them! Hence "this must not be HTTP anymore". But we can't leave the channel pipeline yet, because we're still waiting for the TCP connection to succeed.

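In connectSucceeded, we move to upgrade complete because the thing we were waiting for has happened!

As for adding addHTTPClientHandlers to the connectTo method's ClientBootstrap: it sure does make a difference, and you shouldn't do that.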

CONNECT requests that the proxy set up a TCP connection to another peer. What happens next is not well defined: the client might try to do HTTP, or it might try to do TLS, or SMTP, or any number of other things. You can't safely add HTTP handlers there because you don't know what that next protocol would be.

That's an amazing answer + diagram! Thank you so much!
I don't want to ask more, feels like I'm imposing but about the addHTTPClientHandlers I forgot to mention, for my use case, I am only going to service HTTPS requests through this proxy. All others should fail.

I tried adding NIOSSLServerHandler in the glue method right before the localGlue and peerGlue are joined. I found that if I don't have the addHTTPClientHandlers, the partnerWrite in the GlueHandler somehow seems to be writing to a NIOSSL handler. And then it crashes because that handler apparently expected ByteBuffer, not IOData. I have the backtrace to prove it. That's why I asked about it.
Sorry for the rain of questions!
I think the thing that confuses me most after the state machine is just what the pipeline looks like before and after the glue method glues the connection and removes the current extraneous(?) ConnectHandler. In the ConnectHandler channelRead, the .awaitingEnd calls glue itself, and glue is also called in the connectSucceeded. Will both these events happen at a given request?

Again, sorry to keep asking!

Also, I read somewhere else it was written that the CONNECT proxy only works for HTTPS requests?

For CONNECT, that's moot: you can't see what happens next. An HTTPS request running through a CONNECT proxy is invisible to that proxy.

It helps a bit to understand what is happening with CONNECT. A client sends a request like this:

CONNECT example.com:443 HTTP/1.1
Host: example.com:443

This asks the server to establish a TCP connection to example.com on port 443. When that connection is established, the server responds with:

HTTP/1.1 200 OK
Server: myserver

At this point, this connection can no longer be used for HTTP. When the server sends its 200 response, it is telling the client that from this point forward the server is no longer processing the bytes the client sends it. Instead, it is forwarding them on to example.com. In essence, the server has become a dumb pipe, just forwarding bytes back and forth.
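
To make the request side concrete, here is a small sketch of pulling the host and port out of a CONNECT request target such as `example.com:443`, which is what the proxy needs before it can open the outbound TCP connection. `parseConnectTarget` is a hypothetical helper name, not part of NIO, and the validation is intentionally basic.

```swift
/// Hypothetical helper: splits a CONNECT request target ("host:port") into its parts.
/// Returns nil when there is no host or no valid port.
/// Bracketed IPv6 literals keep their brackets in the returned host.
func parseConnectTarget(_ target: String) -> (host: String, port: Int)? {
    // Split on the last colon so targets such as "[::1]:8443" still work.
    guard let colon = target.lastIndex(of: ":") else { return nil }
    let host = String(target[..<colon])
    let portText = target[target.index(after: colon)...]
    guard !host.isEmpty,
          let port = Int(portText),
          (1...65535).contains(port) else { return nil }
    return (host, port)
}
```

The resulting pair is what you would pass on to something like `ClientBootstrap.connect(host:port:)`.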

This behaviour is required by the HTTP specifications. Quoting RFC 9110 (HTTP Semantics) § 9.3.6 (Connect):

The CONNECT method requests that the recipient establish a tunnel to the destination origin server identified by the request target and, if successful, thereafter restrict its behavior to blind forwarding of data, in both directions, until the tunnel is closed.

If you're only serving HTTPS requests through the proxy, what will happen next is that the client will send a TLS CLIENT_HELLO message, beginning a TLS handshake. This handshake is (by design) resistant to being intercepted. You cannot see what is going on inside that TLS tunnel, and so will not be able to decode any of the data, unless you are attempting to build a man-in-the-middle proxy.

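No, glue will only be called once. Both events happen for a given request, but only whichever one arrives second completes the upgrade and calls glue.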

Before, we have a standard plaintext HTTP server pipeline followed by the ConnectHandler:

┌────────────────────────┐  ┌───────────────────────┐ ┌───────────────────────┐
│                        │  │                       │ │                       │
│  HTTP Request Decoder  │  │ HTTP Response Encoder │ │    Connect Handler    │
│                        │  │                       │ │                       │
└────────────────────────┘  └───────────────────────┘ └───────────────────────┘

After, all of the HTTP handlers are removed, and we have two Channels that have been "glued" together by use of a pair of GlueHandlers:

                               │

┌───────────────────────┐      │      ┌───────────────────────┐
│                       │             │                       │
│     Glue Handler      │      │      │     Glue Handler      │
│                       │             │                       │
└───────────────────────┘      │      └───────────────────────┘

                               │

Thank you, that helps a lot! And yes, I am attempting to build a man-in-the-middle proxy for debugging purposes, so that's why I need NIOSSL to take care of the decryption. I have to see what's happening inside the tunnel, like others in this forum. Any advice you might have for that? For now, my best effort was to add SSL handlers in the glue and connectTo methods, same as the others. And I have a choppy certs-on-the-fly mechanism in place too. RootCAs added and everything. uncleanShutdown is something I'm seeing though.

Thank you so much for all your help so far!

If you've got the root CA stuff sorted out, then your goal is to add a NIOSSLServerHandler to the Channel that contained the ConnectHandler, and a NIOSSLClientHandler to the newly created channel. You can do that in func glue by changing:

context.channel.pipeline.addHandler(localGlue).and(peerChannel.pipeline.addHandler(peerGlue))

To

// Initializer arguments for the NIOSSL handlers are omitted here for brevity.
context.channel.pipeline.addHandlers([
    NIOSSLServerHandler(),
    localGlue,
]).and(peerChannel.pipeline.addHandlers([
    NIOSSLClientHandler(),
    peerGlue,
]))

Notice that the glue handlers are always last: that's where they need to be to get things to line up appropriately.

This should get you MITM. You can then insert your debugging handlers between the NIOSSLServerHandler and localGlue, as needed.


Thanks a tremendous bunch! This helps so much!

I was trying to use EmbeddedChannel to write unit tests for this code, especially the ConnectHandler, but I don't think that's possible with the way it is written now? Any suggestions? I believe network utilization or relying on external servers is bad practice but can we write unit tests for the Connect-Proxy without it?

Yeah, you'd need to factor out the bootstrap logic to have a generalised connection factory, such that in tests you can mock it out with something that returns an EmbeddedChannel.
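
A rough sketch of what such a factory could look like, assuming the swift-nio package (NIOCore, NIOPosix, NIOEmbedded) is available. The `ConnectionFactory` protocol and both conforming types are hypothetical names for illustration; they are not part of NIO or of the Connect-Proxy example.

```swift
import NIOCore
import NIOPosix
import NIOEmbedded

/// Hypothetical abstraction over "open a channel to host:port".
protocol ConnectionFactory {
    func connect(host: String, port: Int, on eventLoop: EventLoop) -> EventLoopFuture<Channel>
}

/// Production implementation: a real TCP connection via ClientBootstrap.
struct BootstrapConnectionFactory: ConnectionFactory {
    func connect(host: String, port: Int, on eventLoop: EventLoop) -> EventLoopFuture<Channel> {
        ClientBootstrap(group: eventLoop).connect(host: host, port: port)
    }
}

/// Test implementation: hands back a pre-built EmbeddedChannel, so no network is involved.
struct EmbeddedConnectionFactory: ConnectionFactory {
    let peer: EmbeddedChannel

    func connect(host: String, port: Int, on eventLoop: EventLoop) -> EventLoopFuture<Channel> {
        eventLoop.makeSucceededFuture(peer as Channel)
    }
}
```

The ConnectHandler would then take a `ConnectionFactory` in its initializer instead of building a ClientBootstrap itself, and a test could pass in an `EmbeddedConnectionFactory` and inspect what gets written to `peer`.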

I see. I thought it could be done as it was and I might just be missing something. Back to the drawing board...
Thanks for your help, all your materials are so great!