Intercepting and editing data of HTTP requests

I built a proxy app that uses an HTTPS MITM proxy built with SwiftNIO to log HTTP traffic.

I've been thinking of allowing the user to intercept (i.e. catch and edit) HTTP requests and responses as they are transmitted through the proxy, but am really not sure how to do this.

I began writing a ChannelHandler that collects the HTTPRequestHead, ByteBuffer and HTTPHeaders trailer, then presents them in the app UI and awaits edited data to come back. I can probably block read events while a message has been intercepted.

But it would only allow editing the structured HTTP message contents, whereas I really would like the user to be able to "break" the HTTP message format if needed. That is, whatever changes the user made to the intercepted message data would get sent as is to the network connection. The user would be presented a text view or hex editor to freely edit the message data.

Now, if I add a ChannelHandler to the pipeline after HTTPRequestEncoder, I would get unstructured data and could write different data back in after the user's edits. But in this case, how do I know where one HTTP request ends and the next begins? Any suggestions how to implement this?

i’m not sure what you mean by this. do you mean dropping down to the TCP level?

welcome to the fascinating world of network protocol specifications! i’ve never implemented HTTP (as there is already a SwiftNIO module for that), but i’ve implemented other protocols. this might be helpful

Say the user wanted to test for HTTP parsing issues in the remote server component. Inserting double headers, unexpected white space, etc.

So I’d like whatever bytes the user edited to the intercepted message to get sent out, without additional HTTP encoding steps in between, which would most probably just normalize the data or discard it without sending.

Yes, I guess I really would need to keep track of HTTP message lengths, if I want to specifically bypass SwiftNIO’s HTTP message parsing after my interception step. I’m just trying to make sure there’s no easier way to do this that I just haven’t found out yet.
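
A minimal sketch of what tracking message lengths on raw bytes could look like, assuming requests are framed by Content-Length only (no chunked transfer encoding). The function name firstMessageLength is hypothetical and not part of SwiftNIO; it works on a plain byte array rather than a ByteBuffer to stay self-contained.

```swift
import Foundation

/// Returns the total length in bytes (head plus body) of the first HTTP/1.1
/// request found at the start of `bytes`. Returns nil if the head is not
/// complete yet, the body has not fully arrived, or Content-Length is malformed.
/// Assumption: only Content-Length framing is handled, not chunked encoding.
func firstMessageLength(in bytes: [UInt8]) -> Int? {
    let separator: [UInt8] = [13, 10, 13, 10] // CRLF CRLF ends the head
    guard bytes.count >= separator.count else { return nil }

    // Locate the end of the head on the raw bytes.
    var bodyStart: Int? = nil
    for i in 0...(bytes.count - separator.count) {
        if bytes[i..<(i + separator.count)].elementsEqual(separator) {
            bodyStart = i + separator.count
            break
        }
    }
    guard let start = bodyStart else { return nil }

    // Scan the header lines for Content-Length (skipping the request line).
    let headText = String(decoding: bytes[..<start], as: UTF8.self)
    var contentLength = 0 // no Content-Length header means no body here
    for line in headText.components(separatedBy: "\r\n").dropFirst() {
        guard let colon = line.firstIndex(of: ":") else { continue }
        let name = line[..<colon].trimmingCharacters(in: .whitespaces).lowercased()
        guard name == "content-length" else { continue }
        let value = line[line.index(after: colon)...].trimmingCharacters(in: .whitespaces)
        guard let n = Int(value), n >= 0 else { return nil }
        contentLength = n // if the header repeats, the last value wins in this sketch
    }

    let total = start + contentLength
    return bytes.count >= total ? total : nil
}
```

Note that the boundary has to be computed on the bytes as they were before the user's edit; once the user is free to break the format, the edited bytes can no longer be trusted for framing.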

you would probably have to do this at the TCP level. i don’t know if the default NIOHTTP pipeline is one handler or a sequence of handlers (i’ve never looked closely enough), but if it’s a sequence of handlers you might be able to still reuse some of the earlier stages.

for what it’s worth, some protocol library authors spend a lot of time testing these exact conditions, and hating themselves for spending so much time on something that 99.9% of clients will never rely on, and will feel thankful that their work did not go to waste. some others might hate you. a few might be so disillusioned with open-sourcing their work that they might give you some tips on how to hack a SwiftNIO-based server :)

My thinking is to offer the flexibility to use this tool for testing server endpoints for security issues, in addition to more typical HTTP related work.

Sounds like you're looking for fuzzing.

IIUC, you typically wouldn't do that over the network - you'd create a small application which consumes bytes as if they came from the network, and process them as normal. Fuzzing takes a long time; it's very compute intensive, and needs to be done continuously as the code changes. It's also much more effective if you build with instrumentation so it can be guided by code coverage.

NIO is fuzzed by Google's OSS-Fuzz project. You can see the small application it uses here.

But the App looks very cool! Congrats.

Thanks for the suggestion! :slightly_smiling_face:

But I’m not so much looking to do full on fuzzing. More a manual tool, perhaps for lightweight pentesting etc. The primary purpose would be web service API testing. I’ve already got the editor view working, which lets you freely edit HTTP requests and shoot them over network connections to the server. And I’d like to have the same free form editing experience for intercepted messages, modifying them on the fly.

My current plan is to convert the SwiftNIO HTTP message parts to Data, present that to the user, and convert the edited Data back to message parts for SwiftNIO. Of course this encode-back-to-HTTP step can fail, but maybe I need to live with that for now.

And just to be clear, I’m not looking to fuzz or test SwiftNIO itself in any way. It serves as an HTTP proxy for my tool, and SwiftNIO has proven to be very fast and powerful for that purpose :+1:

To allow the user to free-form edit the (whole) request means that we can -- for this project -- assume that you can collect a whole request in memory (ie. no streaming). After the edit, it'll be sent off in one go. Is my assumption correct?

If correct, I would suggest the following: Make use of the fact that NIO is modular: NIOHTTP1 is in no way more "special" than other code, it just uses the public APIs of the NIO module (which is actually a collection of modules but that's unnecessary detail). Just copy NIO's HTTPRequestEncoder into your own project and call it MyProjectOneShotHTTPRequestEncoder or something. In that handler, instead of sending off each bit separately as it's received through the pipeline, accumulate it all into a ByteBuffer. Once you receive the .end(...), write that into the ByteBuffer and then send the ByteBuffer in one chunk.

Once you did that, the next handler in the pipeline can assume to get exactly one ByteBuffer per request (so it's still framed like HTTP). You can then surface that data in the UI and once the user's done editing, you just pass the edited ByteBuffer off.
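
The accumulation logic described above can be sketched roughly as follows. This is not NIO code: RequestPart is a simplified stand-in for NIO's HTTPClientRequestPart, [UInt8] stands in for ByteBuffer, and OneShotRequestAccumulator is a hypothetical name. In a real handler the same logic would presumably live in the write(context:data:promise:) method of a ChannelOutboundHandler. Trailers and HTTP versions other than 1.1 are left out for brevity.

```swift
/// Simplified stand-in for NIO's HTTPClientRequestPart (assumption for illustration).
enum RequestPart {
    case head(method: String, uri: String, headers: [(name: String, value: String)])
    case body([UInt8])
    case end
}

/// Collects the parts of one request and hands back the serialized bytes
/// only when the request is complete.
struct OneShotRequestAccumulator {
    private var pending: [UInt8]? = nil

    /// Feed one part. Returns nil until `.end` arrives, then returns the
    /// whole request as a single byte array and resets for the next request.
    mutating func write(_ part: RequestPart) -> [UInt8]? {
        switch part {
        case .head(let method, let uri, let headers):
            // Serialize the request line and headers, as an encoder would.
            var text = "\(method) \(uri) HTTP/1.1\r\n"
            for header in headers {
                text += "\(header.name): \(header.value)\r\n"
            }
            text += "\r\n"
            pending = Array(text.utf8)
            return nil
        case .body(let chunk):
            // Append instead of forwarding each chunk downstream.
            pending?.append(contentsOf: chunk)
            return nil
        case .end:
            let complete = pending
            pending = nil
            return complete
        }
    }
}
```

The byte array returned on .end is the unit that would be shown in the editor; whatever comes back from the user would then be written to the channel as is.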

Thanks so much for the reply @johannesweiss !

Yes, while the app does have an iOS client too, I'm planning to implement interception in the Mac app only, so memory usage should not be an issue.

I was already looking at HTTPRequestEncoder's implementation, but thought there must be some magic involved in it, which I cannot replicate.

Your idea sounds excellent, I'll try that right away! I can easily replace NIO's HTTPRequestEncoder with my modified implementation as I add it to the channel :+1:

Memory is one consideration. The other consideration is that HTTP can do full bi-directional streaming, and each direction may take an arbitrary amount of time (potentially forever) to finish. For example, the client could continuously be sending chunks of JSON indicating, say, its current location (with a \n or so to allow framing). So you may want to consider only allowing the user to use this feature if you have a content-length and it's of reasonable size (say less than 5 MB or so). When continuously streaming something like the current location, the client can't know the content-length ahead of time, so it will choose transfer-encoding: chunked.

Not at all. It's just one file you have to copy & edit. Btw, this is true for anything that's not in NIOCore, NIOTransportServices, NIOPosix and maybe one or two other modules I'm forgetting about. Most NIO stuff is just using the very same APIs that any other NIO user is using.
IMHO, that's one of the great strengths of NIO: You can do anything that NIO's HTTP, HTTP/2, SSH, WebSocket, SOCKS, ... implementations can do (it's just more work for you).

That's exactly the idea. NIO's default implementations still try to be usable for almost any use-case but there will always be some stuff that's outside of what can reasonably be supported. And I think your use-case is one of those: NIO's default implementations need to cover the full spectrum of the spec and therefore need to do streaming and must not accumulate a potentially unbounded amount of memory ever. That's fundamentally incompatible with what you want to do. And your use-case is totally relatable. Most HTTP client requests aren't arbitrarily long streams, they're often just a bit of JSON, a protobuf or maybe a small file. In restricted cases there's nothing wrong with accumulating that, letting the user edit that and then sending it off in one go.

If -- for testing purposes -- you wanted to have an example where this would not work, check out this repo: weissi/HTTPBidiStreamingExamples. It implements two servers (one pure NIO, one Vapor) and two clients (one on AsyncHTTPClient, one on URLSession) and does bi-directional streaming. Concretely, the client sends 1!, then the server responds with 1!. After the client sees 1! back from the server, it goes to 2!, the server sends the 2! back and the client goes to 3!... until 10!. All in just one HTTP/1.1 transaction (ie. 1 request, 1 response). Your proxy should not allow the user to edit these requests and you can easily notice such requests because (when using HTTP/1.1) they'll have transfer-encoding: chunked set.


Good point, something I had in the back of my mind already but had not decided what to do about yet.

I'll restrict this interception function to only those requests that have a Content-Length header and with some reasonable value (although I can easily allow several megabytes here). It is also easy to make it clear to the user that only non-streaming messages are intercepted for editing :+1:
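
A minimal sketch of such a restriction, assuming headers are available as name/value pairs (as they are when iterating NIO's HTTPHeaders). The function name isInterceptable and the 5 MB default are illustrative choices, not anything prescribed by NIO.

```swift
import Foundation

/// Decides whether a message may be held back for free-form editing.
/// Only messages with a valid Content-Length within `maxBodyBytes` and
/// without any Transfer-Encoding header qualify; everything else is
/// treated as potentially streaming and passed through untouched.
func isInterceptable(
    headers: [(name: String, value: String)],
    maxBodyBytes: Int = 5 * 1024 * 1024
) -> Bool {
    var contentLength: Int? = nil
    for header in headers {
        switch header.name.lowercased() {
        case "transfer-encoding":
            // Chunked (or any other transfer coding) means the total size is unknown.
            return false
        case "content-length":
            let trimmed = header.value.trimmingCharacters(in: .whitespaces)
            guard let n = Int(trimmed), n >= 0 else { return false }
            contentLength = n
        default:
            break
        }
    }
    guard let length = contentLength else { return false }
    return length <= maxBodyBytes
}
```

As written, this also rejects body-less requests that carry no Content-Length header (a typical GET); whether to treat those as interceptable is a separate decision.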

I definitely need to study the ChannelHandlers that come with NIO in more detail and learn to customise their behavior.


i really wish a lot more of these techniques were written down somewhere, sometimes it feels like server side swift is entirely oral tradition…