How to help fortify swift-nio against DoS?

with much thanks to @FranzBusch and @Jon_Shier , swiftinit was able to survive another DDoS attack attempted around 11 UTC today:

i credit this to 1) dropping HTTP requests when the queue has a backlog and 2) outsourcing a lot of the less-dynamic endpoints to Cloudfront.

i don’t know if anyone (besides maybe whoever is conducting these attacks) is following these threads that closely, but for those who are, you might notice that network throughput is going up instead of down during the initial phases of the attacks, which is different from how they progressed in the past.

still, there seems to be a tipping point beyond which the NIO server can’t keep up and the network congestion prevents it from even chipping away at the backlog since it can no longer communicate with mongod. and the host again spirals into a 100% CPU doom loop.

i suspect the bottleneck is now in SwiftNIO and it would be great if i could get more visibility into what is actually getting clogged in the server. (are we creating too many EventLoopPromises? reading too much data from channelRead?) but it’s almost impossible to export any sort of metrics while an attack is ongoing, since logging happens over the network, and the network is congested. and you can’t really know if your logging system is working under these sorts of conditions until 12–24 hours pass and the next attack comes.

the first step to patching whatever vulnerabilities remain would be to figure out what part of the NIO channel is under stress. but i have little experience collecting the kinds of perf traces we need under the conditions where it matters in order to investigate this. so guidance would be greatly appreciated.

12 Likes

Very interesting question!

Would it be possible to simulate this behavior in a controlled environment? In some form of load/stress test?

KR Maarten

Would it be possible to do some custom local logging, or at least postpone writing logs over the network? E.g. allocate a big memory buffer (say 20MB), write detailed information about what's happening into it, and wrap around when you reach the end (i.e. use it in a ring buffer manner). Once you detect the attack or another critical condition, put that buffer aside and either write its contents to a local file, or wait and write it over the network when that becomes possible.
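
A minimal sketch of that ring-buffer idea, with made-up names and no thread safety (a real version would need a lock or event-loop confinement):

// Sketch only: an in-memory ring buffer for log lines. Writes never touch
// the network or disk until you explicitly take a snapshot.
final class RingBufferLog {
    private var buffer: [UInt8]
    private var head = 0        // next write position
    private var wrapped = false // true once old data has been overwritten

    init(capacity: Int = 20 * 1024 * 1024) {
        self.buffer = [UInt8](repeating: 0, count: capacity)
    }

    func write(_ message: String) {
        for byte in (message + "\n").utf8 {
            self.buffer[self.head] = byte
            self.head = (self.head + 1) % self.buffer.count
            if self.head == 0 { self.wrapped = true }
        }
    }

    // Put the buffer aside, oldest bytes first, ready to dump to a local file.
    func snapshot() -> [UInt8] {
        self.wrapped
            ? Array(self.buffer[self.head...] + self.buffer[..<self.head])
            : Array(self.buffer[..<self.head])
    }
}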

we have been pondering this as well, ideally we would have a lab cluster built to simulate a DDoS attack and profile the server performance under controlled circumstances. but this is quite expensive to set up and we would like to exhaust all of our options for mocking an attack locally before renting a cluster for research.

one problem we have right now is that we don’t have a handy way to detect if an attack is even taking place - from the server’s perspective everything is functioning normally (it is still receiving requests and sending responses over the network), it is only from the perspective of the outside world that the server appears unresponsive. this is essentially the same “backpressure” problem discussed in the other thread.

swift-nio-http2 gives us the tools to apply backpressure at the stream level, but not at the connection level.

by the way, for those keeping score, swiftinit repelled three more attacks overnight!

based on aggregate logs, it appears the attacker impersonated a whitelisted bingbot user-agent during the first wave, and human-like user-agents during the second two waves.

2 Likes

Can't you check if CPU usage / packets sent in/out are over a certain threshold? E.g. flag an attack if average CPU usage over a ten-second period is > 20%. Perhaps even undo the other mitigations so that this condition can trigger reliably.
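
A minimal sketch of such a check, using the 1-minute load average as a stand-in for "CPU usage over a period" (the function name and threshold are invented):

#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

// Illustrative heuristic: treat a sustained load average above `threshold`
// as a sign that an attack may be in progress.
func looksOverloaded(threshold: Double) -> Bool {
    var loads = [Double](repeating: 0, count: 3)
    guard getloadavg(&loads, 3) > 0 else { return false }
    return loads[0] > threshold // loads[0] is the 1-minute average
}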

1 Like

it’s the communication with the NIO channel handlers that’s the tricky part. the channel handlers live in their own hermetic EventLoop world where you can’t make blocking system calls or await on async functions.

indeed one of the most frustrating things about swift-nio is that the channel handlers have no idea what is happening in the outside world because they can never block on anything.

1 Like

naturally, i was solving the wrong problem. as it turns out, at some point in the past few upgrade cycles, print started fully buffering its output to journalctl, where previously it was line-buffered. adding an fflush makes the logs reappear. it is apparently a known issue. hopefully i will get some useful log output during the next attack.
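
for reference, a sketch of the workaround (assuming logs go to stdout):

#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

print("request accepted")   // may sit in stdio’s buffer under journald
fflush(stdout)              // force the buffered output through to journalctl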

3 Likes

I assume you are talking about swiftinit.org? Can I assume this is a non-profit service to the Swift community as a whole? In that case I can imagine that budgeting for a lab environment can be challenging.

I would put some specialized software and maybe even hardware in front of my server applications. This prevents the actual applications from having to deal with DoS attacks and leaves it to tools that specialize in this area.

1 Like

This isn't true: you can absolutely delay inbound connection acceptance. You do this by putting channel handlers in the server channel, which can be done using serverChannelInitializer.

Exerting backpressure here the regular NIO way (delaying read) will slow down connection acceptance. This will also work with NIOAsyncChannel by delaying reading new connections from the server channel.

3 Likes

Exactly and just to add a little colour because that's not widely known: The server channel reads the Channels that you accept. So with HTTP/2 you'll have three layers of channel:

  • layer 1: the server channel. It reads the accepted connections (i.e. its reads are Channel objects)
  • layer 2: the TCP stream connection channels (one per TCP connection). They read the bytes that come from the remote peer (ByteBuffer) which then get decoded by the HTTP/2 handlers
  • layer 3: the HTTP/2 stream channels (one per HTTP/2 stream and per TCP connection): They read the HTTP/2 frames (HTTP2Frame).

That hierarchy is also represented using the Channel.parent property. Layer 3's .parent will be the layer 2 channel. And layer 2's .parent will be the layer 1 channel.

All of them are back-pressured, and in every case you delay outbound read() calls to put on the brakes and exert backpressure. If you don't want to put the brakes on everybody but would rather filter (say) by IP address and only reject those that open more than 100 connections per second, you can do that by closing the accepted channel directly in the server channel.
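
A sketch of that last idea (handler name and threshold invented; a real version would also reset the counts on a timer):

import NIOCore

// Illustrative only: drop accepted channels from IPs that exceed a budget.
final class AcceptFilterHandler: ChannelInboundHandler {
    typealias InboundIn = any Channel
    typealias InboundOut = any Channel

    private var acceptsPerIP: [String: Int] = [:]

    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        let child = self.unwrapInboundIn(data)
        let ip = child.remoteAddress?.ipAddress ?? "unknown"
        self.acceptsPerIP[ip, default: 0] += 1

        if self.acceptsPerIP[ip, default: 0] > 100 {
            // Over budget: close the connection instead of handing it on.
            child.close(promise: nil)
            return
        }
        context.fireChannelRead(data)
    }
}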

FWIW, if you exert backpressure on layer 1 (the server channel) then you'll fill up the operating system's listen backlog, and when that's full the OS will stop accepting new connections. The size of the listen backlog is configured using the ChannelOptions.backlog option on the server channel (.serverChannelOption(ChannelOptions.backlog, value: 256)). The OSes don't guarantee to follow your value exactly; treat it more like a suggestion. But just like SwiftNIO, the OSes only have buffers of a fixed size. So whilst you might not be able to control the exact limit (in precise bytes/connections), the key bit of information is that everything is backpressured and limited in the OSes as well as in SwiftNIO. So you will be able to build a system that is resilient to attacks that try to exhaust your resources.


Finally @taylorswift thank you so much for asking these questions on the forums where everybody can see them. A lot of times they get asked in DMs in various places. This kind of information should be widely available.

14 Likes

Indeed!

yes that’s it! the site operates on a shoestring budget; people often assume it is a bigger operation than it really is based on the number of pages it serves.

eventually my vision for the site is for it to become ad-supported, at least in part, perhaps with a “supporter” option to opt-out of ads, similar to MDN but for the swift community. but we can’t really place ads while the site is still experiencing chronic reliability issues due to these attacks.

er, is there an example i can follow? i have a feeling what i am imagining (only one person being able to connect to the site at any given time) is not what you are suggesting.

can you explain what “delay outbound read” means? this seems to contradict the “thou shalt not block the event loop” principle.

by the way, what is read? channels have write, and the channel handlers have channelRead, but i have never heard of read by itself.

this might be worth a separate thread, but how can a channel handler access some shared state like an IP rate limit table? the channel handlers live in different concurrency domains, and they cannot await on any shared state.

2 Likes

can you explain what “delay outbound read” means? this seems to contradict the “thou shalt not block the event loop” principle.

Yes, don't block the event loop, ever. I meant delay as in "scheduling later".

What I meant by "delaying the outbound read" is that you'd have a ChannelOutboundHandler which does not immediately forward the read event. Forwarding read means calling context.read(). If you don't implement read at all you get the default implementation, which does forward immediately. Instead of forwarding immediately, the handler would figure out whether the system has too much load (say > 10k connections) and, if so, stop forwarding the read until we're in safe territory again.

Here are two explanations of back pressure in NIO:

Both of the explainers explain how to exert backpressure on regular TCP channels (where you read ByteBuffers) but the concepts are exactly the same for the server channels which read accepted connections. Essentially, delay the outbound read call to when you're ready.
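
A minimal sketch of that shape (handler name and load check invented for illustration):

import NIOCore

// Gate the outbound read event on a load check, replaying it later.
final class ReadGateHandler: ChannelOutboundHandler {
    typealias OutboundIn = Never
    typealias OutboundOut = Never

    private var readPending = false

    // Stand-in for a real load metric (connection count, queue depth, ...).
    var overloaded = false

    func read(context: ChannelHandlerContext) {
        if self.overloaded {
            self.readPending = true // brake: don't forward the read yet
        } else {
            context.read()          // forward: NIO will read when data arrives
        }
    }

    // Call this on the event loop once load drops, to replay the delayed read.
    func resumeReading(context: ChannelHandlerContext) {
        if self.readPending, !self.overloaded {
            self.readPending = false
            context.read()
        }
    }
}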

by the way, what is read? channels have write, and the channel handlers have channelRead, but i have never heard of read by itself.

Ah, for NIO to write bytes to the network you trigger the outbound (meaning towards the network) write/flush operations that you know. Similarly, to make NIO read some bytes you need to trigger the outbound read operation. That will make NIO read a fixed amount of bytes from the network when they arrive. And once those bytes have been read, NIO will then call channelRead to hand the read data to you (in a server channel the read "data" would be an accepted connection, aka a new Channel).

Now, why haven't you come across the outbound read before? By default, NIO enables the autoRead property, which means that NIO will trigger one read to start with and, whenever channelReadComplete has been triggered, automatically fire another read. So if you don't have any outbound/duplex channel handlers that implement read and delay it, you'll essentially always read.

But this is also covered by the explainers linked above.

A while ago I also created this diagram which might be helpful:

this might be worth a separate thread, but how can a channel handler access some shared state like an IP rate limit table? the channel handlers live in different concurrency domains, and they cannot await on any shared state.

That is a good question but the answer is very boring: If I were you, I'd stick with the default, which means you accept from one event loop. That means you have just one server channel which "reads" (accepts) all the incoming TCP connections. So you can pop a single channel handler into the server channel which regulates the acceptance of incoming TCP connections (those are the Channels you already know). That way, you're in the one concurrency domain where you'd need your IP tables; you'd even be in one single ChannelHandler instance. Easy. But even if you wanted to switch to accepting connections from multiple EventLoops at the same time (which NIO supports) this wouldn't be an issue; you'd just need to arrange for synchronisation using locks/atomics/...
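
In the multi-loop case, a rough sketch of a lock-protected table using NIOConcurrencyHelpers (the table shape and limits here are made up):

import NIOConcurrencyHelpers

// A shared rate table that any event loop may touch without blocking for long.
let requestsPerIP = NIOLockedValueBox<[String: Int]>([:])

// From a stream-level handler, on whichever event loop it runs:
requestsPerIP.withLockedValue { $0["203.0.113.7", default: 0] += 1 }

// From the server channel handler, deciding whether to keep accepting:
let overLimit = requestsPerIP.withLockedValue { ($0["203.0.113.7"] ?? 0) > 100 }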

Examples that might help:

  • NIO's ServerQuiescingHelper demonstrates how to use a serverChannelInitializer as well as how to capture a bunch of state. It helps quiesce a server, waiting for all in-flight requests to terminate so you can restart without losing connections. Usage example
  • The backpressure-file-io example project, which goes to great lengths to explain how you could (without any helpers) create a fully backpressured file upload, from NIO into the file system. It explains NIO backpressure as well as using a state machine for this kind of work.
  • A github search for func read(context: ChannelHandlerContext) brings up a bunch of examples. Particularly NIO's SSH and HTTP/2 implementations are probably interesting because they also handle multiplexing.

3 Likes

thanks, that explains read. it never occurred to me that read is a noun that propagates from the back of the channel to the front, like a write.

still, i am confused as to how this generalizes to reads that return any Channels themselves.

  1. if i add a channel handler to the “tail” of the root server channel that receives any Channels through channelRead, what does it do with those channels?

  2. what even is the “tail” of a server channel? how do the child channels fit into this picture? if the last channel handler in the root pipeline reads channels, then i assume it must broadcast those channels to… the child channels themselves? that doesn’t make any sense! i do not understand this forking/splitting concept, i have only seen examples where channel handlers have at most one channel handler directly after them.

to use an analogy, the model i was taught is that a channel pipeline is like a train. each car in the train has a car before it and a car after it, but the engine doesn’t have a car before it, and the caboose doesn’t have a car after it. things can percolate from the front of the train to the back or from the back of the train to the front, and each car in the train can buffer, reorder, drop, forward etc. things it intercepts.

this mental model doesn’t make sense for trains that branch into sub-trains, or trains that produce child trains, and i don’t know what a more correct model would be.

that’s not what i’m asking. something like a rate limit table gets written to from many different connections, which have channel handlers that live in other concurrency domains.

for example, i might have an HTTP/2 stream handler that increments the table like:

if  case HTTPPart.head = self.unwrapInboundIn(data)
{
    sharedTable[self.remoteIP] += 1
}

and then i might have a root channel-level handler that reads the table and rejects the connection if the associated IP has made more than N requests.

but how would it write to sharedTable, if the table is read from a different channel handler?

1 Like

okay, so i have no idea if i did this right, but it doesn’t seem to crash when i test it, so here’s a really basic server channel handler that tries to limit the number of active connections to 50:

handler:

extension HTTP
{
    final
    class ServerConnectionHandler
    {
        private
        var connections:Int

        init()
        {
            self.connections = 0
        }
    }
}
extension HTTP.ServerConnectionHandler
{
    private static
    var capacity:Int { 50 }
}

channelRead:

extension HTTP.ServerConnectionHandler:ChannelInboundHandler
{
    typealias InboundOut = any Channel
    typealias InboundIn = any Channel

    func channelRead(context:ChannelHandlerContext, data:NIOAny)
    {
        defer
        {
            context.fireChannelRead(data)
        }

        let channel:any Channel = self.unwrapInboundIn(data)

        //  If we have room for more connections, enqueue another read.
        self.connections += 1
        if  self.connections < Self.capacity
        {
            context.read()
        }

        channel.closeFuture.whenComplete
        {
            _ in

            //  Are we even on the right thread???
            if  context.eventLoop.inEventLoop
            {
                //  We now have room for at least one more connection,
                //  so enqueue a read.
                self.connections -= 1
                context.read()
                return
            }

            context.eventLoop.execute
            {
                self.connections -= 1
                context.read()
            }
        }
    }
}

backpressure:

extension HTTP.ServerConnectionHandler:ChannelOutboundHandler
{
    typealias OutboundIn = Never
    typealias OutboundOut = Never

    func read(context:ChannelHandlerContext)
    {
        //  Don’t bother buffering a pending read if we’re already at capacity,
        //  since the only way a slot can open up is if a connection closes, and
        //  that will trigger a read on its own.
        if  self.connections < Self.capacity
        {
            context.read()
        }
    }
}

i added this to the ServerBootstrap:

    .serverChannelOption(ChannelOptions.autoRead, value: false)
    .serverChannelInitializer
{
    $0.pipeline.addHandler(HTTP.ServerConnectionHandler.init())
}

and apparently i also had to manually kickstart the server channel with:

let channel:any Channel = try await bootstrap.bind(
    host: binding.address,
    port: binding.port).get()

channel.read()

it seemed to work as expected when i set capacity { 1 }.

i still don’t know why context.fireChannelRead(data) is necessary, or who could possibly be listening for it, since this is the last handler in the server channel pipeline.

1 Like

It might feel that way, but it isn't. NIO has two "secret" channel handlers: the HeadChannelHandler and the TailChannelHandler. These two handlers bridge the ChannelPipeline back into the Channel object: they capture the events that fall "off the end" of the pipeline and pass them into the channel. The Head handler is always first, the Tail handler is always last.

In this instance, for the ServerSocketChannel (which is the server channel in this case), when channelRead gets to the tail channel handler it is passed, eventually, to ServerSocketChannel.channelRead0. This ultimately activates the accepted child channel.

1 Like

Oh, TIL I should not just drop events even if I am the "last" channel handler.

2 Likes

In general we don't do interesting things with these events on the tail channel handler: ServerSocketChannel is the only real counter-example. I increasingly think that was a wart we should have removed, because as @taylorswift has noticed this behaviour is quite confusing. A better approach might have been having a specific channel handler doing this child channel dance, as we do in the multiplexed protocols.

2 Likes

Apologies for not having seen your response in the other thread. Cory and Johannes have provided great insights here already that we should ideally distill into some documentation somewhere. To add to their great insights, I want to expand on one of the examples I gave you in the other thread, where I suggested you use the new async bootstrap APIs that we offer.

try await withThrowingDiscardingTaskGroup { group in
    for try await connectionChannel in serverChannel.inboundStream {
        group.addTask {
            do {
                for try await inboundData in connectionChannel.inboundStream {
                    // Let's echo back all inbound data
                    try await connectionChannel.outboundWriter.write(inboundData)
                }
            } catch {
                // Handle errors
            }
        }
    }
}

In the above example you can do the same as you did in the ChannelHandler that you added in the server channel initializer, but in Swift Concurrency. You can use a width-constrained task group to achieve this.

try await withThrowingTaskGroup(of: Void.self) { group in
    var iterator = serverChannel.inboundStream.makeAsyncIterator()

    // Fill the group with up to 10 concurrent connection handlers.
    for _ in 0..<10 {
        guard let connectionChannel = try await iterator.next() else { return }
        group.addTask {
            await handleConnection(connectionChannel)
        }
    }

    // Whenever a child task finishes, accept and handle the next connection.
    while try await group.next() != nil {
        guard let connectionChannel = try await iterator.next() else { break }
        group.addTask {
            await handleConnection(connectionChannel)
        }
    }
}

func handleConnection(_ connectionChannel: NIOAsyncChannel<ByteBuffer, ByteBuffer>) async { ... }

The above code now limits the task group to 10 concurrent child tasks, and whenever one child task finishes it pops off the next connection and handles it in a new child task. There are still a couple more connections accepted and buffered beyond that; those can be customised via the bind method's serverBackPressureStrategy parameter and the NIO backlog.
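
For completeness, a sketch of where those two knobs live (treat this as an outline; the exact parameter spellings have shifted between NIO releases):

let serverChannel = try await ServerBootstrap(group: group)
    // The OS-level listen backlog (a suggestion to the OS, as discussed above).
    .serverChannelOption(ChannelOptions.backlog, value: 256)
    .bind(
        host: "0.0.0.0",
        port: 8080,
        // How many accepted connections NIO buffers ahead of your accept loop.
        serverBackPressureStrategy: .init(lowWatermark: 4, highWatermark: 16)
    ) { channel in
        channel.eventLoop.makeCompletedFuture {
            try NIOAsyncChannel<ByteBuffer, ByteBuffer>(wrappingChannelSynchronously: channel)
        }
    }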

6 Likes