How to help fortify swift-nio against DoS?

naturally, i was solving the wrong problem. as it turns out, at some point in the past few upgrade cycles, print started fully-buffering its output to journalctl. previously it was line-buffering. adding an fflush makes the logs reappear. it is apparently a known issue. hopefully i will get some useful log output during the next attack.

3 Likes

I assume you are talking about swiftinit.org ? Can I assume this is a non-profit service to the Swift community as a whole? In that case I can imagine that budgeting for a lab environment can be challenging.

I would put some specialized software and maybe even hardware in front of my server applications. This prevents the actual applications from having to deal with DoS attacks and leaves it to tools that specialize in this area.

1 Like

This isn't true: you can absolutely delay inbound connection acceptance. You do this by putting channel handlers in the server channel, which can be done using serverChannelInitializer.

Exerting backpressure here the regular NIO way (delaying read) will slow down connection acceptance. This will also work with NIOAsyncChannel by delaying reading new connections from the server connection.
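In bootstrap terms that might look like the following sketch. `ConnectionThrottleHandler` is a hypothetical handler name (not a real NIO type); the rest follows `ServerBootstrap`'s API:

```swift
// Sketch only: ConnectionThrottleHandler is a made-up handler name.
// Handlers added via serverChannelInitializer go into the *server*
// channel, whose reads are accepted child Channels, not ByteBuffers.
let bootstrap = ServerBootstrap(group: group)
    .serverChannelInitializer { serverChannel in
        serverChannel.pipeline.addHandler(ConnectionThrottleHandler())
    }
    .childChannelInitializer { channel in
        // Per-connection handlers go here as usual.
        channel.pipeline.configureHTTPServerPipeline()
    }
```

The throttling handler itself would implement `read(context:)` and decide whether to forward `context.read()` immediately or hold it back.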

3 Likes

Exactly and just to add a little colour because that's not widely known: The server channel reads the Channels that you accept. So with HTTP/2 you'll have three layers of channel:

  • layer 1: the server channel. It reads the accepted connections (i.e. its reads are Channel objects)
  • layer 2: the TCP stream connection channels (one per TCP connection). They read the bytes that come from the remote peer (ByteBuffer) which then get decoded by the HTTP/2 handlers
  • layer 3: the HTTP/2 stream channels (one per HTTP/2 stream and per TCP connection): They read the HTTP/2 frames (HTTP2Frame).

That hierarchy is also represented using the Channel.parent property. Layer 3's .parent will be the layer 2 channel. And layer 2's .parent will be the layer 1.

All of them are back-pressured and in every case you delay outbound read() calls from happening to put on the brakes and exert backpressure. If you wanted to not put on the brakes on everybody but filter (say) by IP address and only ignore those that do more than 100 connections per second you can do that by closing the accepted channel straight in the server channel.

FWIW, if you exert backpressure on layer 1 (the server channel) then you'll fill up the operating system's listen backlog. And when that's full, the OS will stop accepting new connections. The size of the listen backlog is configured using the ChannelOptions.backlog option on the server channel (.serverChannelOption(ChannelOptions.backlog, value: 256)). The OSes don't guarantee to follow your value exactly; treat it more like a suggestion. But just like SwiftNIO, the OSes only have buffers of a fixed size. So whilst you might not be able to control the exact limit (in precise bytes/connections), the key bit of information is that everything is backpressured and limited in the OS as well as in SwiftNIO. So you will be able to build a system that is resilient to attacks that try to exhaust your resources.
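The "close the accepted channel straight in the server channel" idea mentioned above might look like the following sketch. It assumes SwiftNIO's server-channel pipeline; the handler name and the limit of 100 are illustrative, and a real rate limiter would track a time window rather than a raw count:

```swift
import NIOCore

// Sketch: in the server channel, inbound reads are accepted child
// Channels. AcceptFilterHandler and the limit are made up.
final class AcceptFilterHandler: ChannelInboundHandler {
    typealias InboundIn = any Channel
    typealias InboundOut = any Channel

    // One handler instance on one event loop: plain state is fine here.
    private var acceptedPerIP: [String: Int] = [:]
    private let limit = 100

    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        let child = self.unwrapInboundIn(data)
        let ip = child.remoteAddress?.ipAddress ?? "unknown"
        acceptedPerIP[ip, default: 0] += 1

        if acceptedPerIP[ip, default: 0] > limit {
            // Over the limit: refuse by closing the accepted channel
            // instead of forwarding it down the pipeline.
            child.close(promise: nil)
        } else {
            context.fireChannelRead(data)
        }
    }
}
```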


Finally @taylorswift thank you so much for asking these questions on the forums where everybody can see them. A lot of times they get asked in DMs in various places. This kind of information should be widely available.

14 Likes

Indeed!

yes that’s it! the site operates on a shoestring budget; people often assume it is a bigger operation than it really is based on the number of pages it serves.

eventually my vision for the site is for it to become ad-supported, at least in part, perhaps with a “supporter” option to opt-out of ads, similar to MDN but for the swift community. but we can’t really place ads while the site is still experiencing chronic reliability issues due to these attacks.

er, is there an example i can follow? i have a feeling what i am imagining (only one person being able to connect to the site at any given time) is not what you are suggesting.

can you explain what “delay outbound read” means? this seems to contradict the “thou shall not block the event loop” principle.

by the way, what is read? channels have write, and the channel handlers have channelRead, but i have never heard of read by itself.

this might be worth a separate thread, but how can a channel handler access some shared state like an IP rate limit table? the channel handlers live in different concurrency domains, and they cannot await on any shared state.

2 Likes

can you explain what “delay outbound read” means? this seems to contradict the “thou shall not block the event loop” principle.

Yes, don't block the event loop, ever. I meant delay as in "scheduling later".

What I meant by "delaying the outbound read" is that you'd have a ChannelOutboundHandler which does not immediately forward the read event. Forwarding read means calling context.read(). If you don't implement read at all, you get the default implementation, which does forward immediately. Instead of forwarding immediately, your handler would figure out whether the system is under too much load (say > 10k connections) and, if so, stop forwarding the read until we're in safe territory again.
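The logic above can be modelled without any NIO types at all. In this pure-Swift sketch, `ReadGate` and its method names are made up for illustration: `read()` stands in for a ChannelOutboundHandler's `read(context:)`, and "forwarding" stands in for calling `context.read()`:

```swift
// Simplified model of a backpressure gate; not actual SwiftNIO types.
final class ReadGate {
    private let capacity: Int
    private(set) var connections = 0
    private var readPending = false
    private(set) var forwardedReads = 0

    init(capacity: Int) { self.capacity = capacity }

    // An outbound `read` arrives from later handlers: forward it only
    // if we are below capacity, otherwise remember that one is pending.
    func read() {
        if connections < capacity {
            forwardedReads += 1
        } else {
            readPending = true
        }
    }

    // A new connection was accepted (channelRead fired).
    func connectionAccepted() { connections += 1 }

    // A connection closed: a slot opened up, so replay a pending read.
    func connectionClosed() {
        connections -= 1
        if readPending, connections < capacity {
            readPending = false
            forwardedReads += 1
        }
    }
}
```

With capacity 1: the first `read()` is forwarded; after `connectionAccepted()` a second `read()` is held back; `connectionClosed()` then replays it.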

Here are two explanations of back pressure in NIO:

Both explainers show how to exert backpressure on regular TCP channels (where you read ByteBuffers), but the concepts are exactly the same for server channels, which read accepted connections. Essentially, delay the outbound read call until you're ready.

by the way, what is read ? channels have write , and the channel handlers have channelRead , but i have never heard of read by itself.

Ah, for NIO to write bytes to the network you trigger the outbound (meaning towards the network) write/flush operations that you know. Similarly, to make NIO read some bytes you need to trigger the outbound read operation. That will make NIO read a fixed amount of bytes from the network when they arrive. And once those bytes have been read, NIO will then call channelRead to hand the read data to you (in a server channel the read "data" would be an accepted connection, i.e. a new Channel).

Now, why have you not come across the outbound read before? By default, NIO enables the autoRead option, which means that NIO will trigger one read to start with and, whenever channelReadComplete has been triggered, it automatically fires another read. So if you don't have any outbound/duplex channel handlers that implement read and delay it, you'll essentially always be reading.

But this is also covered by the explainers linked above.

A while ago I also created this diagram which might be helpful:

this might be worth a separate thread, but how can a channel handler access some shared state like an IP rate limit table? the channel handlers live in different concurrency domains, and they cannot await on any shared state.

That is a good question but the answer is very boring: If I were you, I'd stick with the default, which means you accept from one event loop. That means you have just one server channel which "reads" (accepts) all the incoming TCP connections. So you can pop a single channel handler into the server channel which regulates the acceptance of incoming TCP connections (those are the Channels you already know). That way, you're in the one concurrency domain where you need your IP table; you'd even be in one single ChannelHandler instance. Easy. But even if you wanted to switch to accepting connections from multiple EventLoops at the same time (which NIO supports), this wouldn't be an issue either; you'd just need to arrange for synchronisation using locks/atomics/...
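For the multi-EventLoop case, the locks/atomics route can be as simple as a reference type guarding the table. This is a minimal sketch assuming Foundation's NSLock to stay dependency-free; in SwiftNIO you'd more likely reach for NIOLockedValueBox from NIOConcurrencyHelpers, and the type and method names here are made up:

```swift
import Foundation

// Sketch of shared state accessed from handlers on different event
// loops. Handlers must never `await` on shared state, but taking a
// lock for a quick increment or read is fine on an event loop.
final class RateTable: @unchecked Sendable {
    private let lock = NSLock()
    private var requests: [String: Int] = [:]

    // Called from, e.g., a per-connection handler on any event loop.
    func increment(_ ip: String) {
        lock.lock()
        defer { lock.unlock() }
        requests[ip, default: 0] += 1
    }

    // Called from the server-channel handler deciding whether to accept.
    func count(for ip: String) -> Int {
        lock.lock()
        defer { lock.unlock() }
        return requests[ip] ?? 0
    }
}
```

A single `RateTable` instance would be created up front and captured by both the server-channel handler and the per-connection handlers.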

Examples that might help:

  • NIO's ServerQuiescingHelper demonstrates how to use a serverChannelInitializer as well as how to capture a bunch of state. It helps quiesce a server by waiting for all in-flight requests to terminate, so you can restart without losing connections. Usage example
  • The backpressure-file-io example project, which goes to great lengths to explain how you could (without any helpers) create a fully backpressured file upload, from NIO into the file system. It explains NIO backpressure as well as using a state machine for this kind of work.
  • A github search for func read(context: ChannelHandlerContext) brings up a bunch. Particularly NIO's SSH and HTTP/2 implementations are probably interesting because they also handle multiplexing
3 Likes

thanks, that explains read; it never occurred to me that read is a noun that propagates from the back of the channel to the front, like a write.

still, i am confused as to how this generalizes to reads that return any Channels themselves.

  1. if i add a channel handler to the “tail” of the root server channel that receives any Channels through channelRead, what does it do with those channels?

  2. what even is the “tail” of a server channel? how do the child channels fit into this picture? if the last channel handler in the root pipeline reads channels, then i assume it must broadcast those channels to… the child channels themselves? that doesn’t make any sense! i do not understand this forking/splitting concept, i have only seen examples where channel handlers have at most one channel handler directly after them.

to use an analogy, the model i was taught is that a channel pipeline is like a train. each car in the train has a car before it and a car after it, but the engine doesn’t have a car before it, and the caboose doesn’t have a car after it. things can percolate from the front of the train to the back or from the back of the train to the front, and each car in the train can buffer, reorder, drop, forward etc. things it intercepts.

this mental model doesn’t make sense for trains that branch into sub-trains, or trains that produce child trains, and i don’t know what a more correct model would be.

that’s not what i’m asking. something like a rate limit table gets written-to from many different connections, which have channel handlers that live in other concurrency domains.

for example, i might have an HTTP/2 stream handler that increments the table like:

if  case HTTPPart.head = self.unwrapInboundIn(data)
{
    sharedTable[self.remoteIP] += 1
}

and then i might have a root channel-level handler that reads the table and rejects the connection if the associated IP has made more than N requests.

but how would it write to sharedTable, if the table is read from a different channel handler?

1 Like

okay, so i have no idea if i did this right, but it doesn’t seem to crash when i test it, so here’s a really basic server channel handler that tries to limit the number of active connections to 50:

handler:

extension HTTP
{
    final
    class ServerConnectionHandler
    {
        private
        var connections:Int

        init()
        {
            self.connections = 0
        }
    }
}
extension HTTP.ServerConnectionHandler
{
    private static
    var capacity:Int { 50 }
}

channelRead:

extension HTTP.ServerConnectionHandler:ChannelInboundHandler
{
    typealias InboundOut = any Channel
    typealias InboundIn = any Channel

    func channelRead(context:ChannelHandlerContext, data:NIOAny)
    {
        defer
        {
            context.fireChannelRead(data)
        }

        let channel:any Channel = self.unwrapInboundIn(data)

        //  If we have room for more connections, enqueue another read.
        self.connections += 1
        if  self.connections < Self.capacity
        {
            context.read()
        }

        channel.closeFuture.whenComplete
        {
            _ in

            //  Are we even on the right thread???
            if  context.eventLoop.inEventLoop
            {
                //  We now have room for at least one more connection,
                //  so enqueue a read.
                self.connections -= 1
                context.read()
                return
            }

            context.eventLoop.execute
            {
                self.connections -= 1
                context.read()
            }
        }
    }
}

backpressure:

extension HTTP.ServerConnectionHandler:ChannelOutboundHandler
{
    typealias OutboundIn = Never
    typealias OutboundOut = Never

    func read(context:ChannelHandlerContext)
    {
        //  Don’t bother buffering a pending read if we’re already at capacity,
        //  since the only way a slot can open up is if a connection closes, and
        //  that will trigger a read on its own.
        if  self.connections < Self.capacity
        {
            context.read()
        }
    }
}

i added this to the ServerBootstrap:

    .serverChannelOption(ChannelOptions.autoRead, value: false)
    .serverChannelInitializer
{
    $0.pipeline.addHandler(HTTP.ServerConnectionHandler.init())
}

and apparently i also had to manually kickstart the server channel with:

let channel:any Channel = try await bootstrap.bind(
    host: binding.address,
    port: binding.port).get()

channel.read()

it seemed to work as expected when i set capacity { 1 }.

i still don’t know why context.fireChannelRead(data) is necessary, or who could possibly be listening for it, since this is the last handler in the server channel pipeline.

1 Like

It might feel that way, but it isn't. NIO has two "secret" channel handlers: the HeadChannelHandler and the TailChannelHandler. These two handlers bridge the ChannelPipeline back into the Channel object: they capture the events that fall "off the end" of the pipeline and pass them into the channel. The Head handler is always first, the Tail handler is always last.

In this instance, for the ServerSocketChannel (which is the server channel in this case), when channelRead gets to the tail channel handler it is passed, eventually, to ServerSocketChannel.channelRead0. This is what ultimately activates the accepted child channel.

1 Like

Oh, TIL I should not just drop events even if I am the "last" channel handler.

2 Likes

In general we don't do interesting things with these events on the tail channel handler: ServerSocketChannel is the only real counter-example. I increasingly think that was a wart we should have removed, because as @taylorswift has noticed this behaviour is quite confusing. A better approach might have been having a specific channel handler doing this child channel dance, as we do in the multiplexed protocols.

2 Likes

Apologies for not having seen your response in the other thread. Cory and Johannes have provided great insights here already that we should ideally distill into some documentation somewhere. To add to their great insights, I want to expand on one of the examples I gave you in the other thread, where I suggested you use the new async bootstrap APIs that we offer.

try await withThrowingDiscardingTaskGroup { group in
    for try await connectionChannel in serverChannel.inboundStream {
        group.addTask {
            do {
                for try await inboundData in connectionChannel.inboundStream {
                    // Let's echo back all inbound data
                    try await connectionChannel.outboundWriter.write(inboundData)
                }
            } catch {
                // Handle errors
            }
        }
    }
}

In the above example you can do the same as you did in the ChannelHandler that you added in the server channel initializer, but in Swift Concurrency. You can use a width-constrained task group to achieve this.

try await withThrowingTaskGroup(of: Void.self) { group in
    var iterator = serverChannel.inboundStream.makeAsyncIterator()

    // Start by accepting and handling up to 10 connections concurrently.
    for _ in 0..<10 {
        guard let connectionChannel = try await iterator.next() else { return }
        group.addTask {
            await handleConnection(connectionChannel)
        }
    }

    // Whenever a child task finishes, pop off the next connection.
    while try await group.next() != nil {
        guard let connectionChannel = try await iterator.next() else { break }
        group.addTask {
            await handleConnection(connectionChannel)
        }
    }
}

func handleConnection(_ connectionChannel: NIOAsyncChannel<ByteBuffer, ByteBuffer>) async { ... }

The above code limits the task group to 10 concurrent child tasks; whenever one child task finishes, it pops off the next connection and handles it in a new child task. There are still a couple more connections accepted and buffered beyond that; those can be customised via the bind method's serverBackPressureStrategy parameter and the NIO backlog.

6 Likes

hi Franz, undoubtedly this is a massive improvement over writing state machines in channel handlers.

there is a really mundane issue with this solution, which is the same one i mentioned before: these APIs aren’t visible in any released version of SwiftNIO! (also, the configureAsyncHTTPServerPipeline you suggested in the other thread seems to have vanished from the main branch too.)

the DocC documentation doesn’t have any record of a type called NIOAsyncChannel. swiftinit only picks it up because i configured it to do so, but it doesn’t know where the symbol comes from, because lib/SymbolGraphGen ignores @_spis.

none of these issues are your fault, of course. this is a lib/SymbolGraphGen problem. it affects many libraries¹ and i raised it a few months ago over at the swift-docc slack, but it seems to have fallen out of the 90-day retention window, so i’ve gone ahead and filed a proper issue in its place.

anyway, this is not a criticism of SwiftNIO, just a remark that these new features are not quite as accessible as one might assume.


[1] this is also an issue when working with raw syntax in SwiftSyntax. it appears that some package authors have also begun working around this problem by just manually labeling things “SPI” without using the attribute.

EDIT:

it looks like the AsyncChannel interfaces were removed three weeks ago in order to patch CVE-2023-44487, but were never put back. so it is not possible to use them even if you want to be an early adopter.

1 Like

I know they aren't but I want to recommend them anyway because they are the future. We are working on making them stable API currently. We already have everything landed in NIO and are preparing a release. The rest of the packages will follow shortly afterwards.

2 Likes

in the meantime, could you at least revert 2140160e95f7d27e8b6a90c11e8cfbc7f69ff4f0 from swift-nio-http2, so we do not need to downgrade to an insecure version of swift-nio-http2 to get these APIs from the main branch? :slight_smile:

The revert is unlikely to be needed due to the freshly tagged new releases: New SwiftNIO async APIs.

3 Likes

@Max_Desiatov thanks for sharing it here. You beat me to it ;)

great!

fyi, doc generation for swift-nio is now hitting the same 5.9.1 compiler crash that was breaking swift-certificates last month. i could only fix the issue by manually blacklisting the _NIODataStructures module; no clue what changed in the module to trigger the compiler crash.

again, not NIO’s fault. unsurprisingly, it is (yet another!) lib/SymbolGraphGen bug. please, if anyone has any influence at Apple, this part of the compiler badly needs some attention.

1 Like