SwiftNIO async APIs: case study

over the weekend, i ported the swiftinit server to the new async interfaces, and as a result i’ve gained a lot of real-world experience using these APIs in production, which i’m now writing up as a case study in the hopes it will inform future work on this library.

since it doesn’t fit neatly into a category, i also want to say the new APIs are a joy to use, and despite some early bumps in the road, my experience with them has been overwhelmingly positive.

no performance regression

one major concern i had before adopting the new interfaces was that moving logic out of channel handlers and into the async/await world would cause a performance regression.

this concern was unfounded. as you can see from this CPU plot of the swiftinit production server, there has been no noticeable performance regression after deploying the update, which is demarcated by the discontinuity in the graph.

if you operate a NIO server that currently uses channel handlers, i strongly recommend porting it to use the new async API.

improved security

by itself the new API doesn’t introduce any new security features. but at a higher level, there are a lot of good security ideas that are hard to express as state machines, and so were never practical to add to a channel handler-based server implementation.

the async/await API is a lot better for security, because you can easily enforce security procedures imperatively using NIOAsyncChannelInboundStream.AsyncIterator, so you don’t end up shipping insecure servers because good ideas are stuck in the implementation backlog.
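
to illustrate what i mean by imperative enforcement, here is a minimal sketch. it is written against any AsyncSequence rather than the real NIOAsyncChannelInboundStream (which is itself an AsyncSequence), and it uses an AsyncStream as a stand-in so it runs without NIO. the byte-count limit and the names collectBody, makeChunks, and RequestPolicyError are hypothetical, and this is not what swiftinit actually does.

```swift
/// Hypothetical error thrown when a client breaks a request policy.
enum RequestPolicyError: Error {
    case bodyTooLarge(limit: Int)
}

/// Reads body chunks from any async sequence, enforcing a size limit
/// imperatively: the check runs in order, right where the data arrives.
func collectBody<Chunks: AsyncSequence>(
    from chunks: Chunks,
    limit: Int
) async throws -> [UInt8] where Chunks.Element == [UInt8] {
    var body: [UInt8] = []
    var iterator = chunks.makeAsyncIterator()
    while let chunk = try await iterator.next() {
        // Reject as soon as the limit would be crossed; no state enum needed.
        guard body.count + chunk.count <= limit else {
            throw RequestPolicyError.bodyTooLarge(limit: limit)
        }
        body.append(contentsOf: chunk)
    }
    return body
}

/// Stand-in for an inbound stream: yields the given chunks, then finishes.
func makeChunks(_ chunks: [[UInt8]]) -> AsyncStream<[UInt8]> {
    AsyncStream { continuation in
        for chunk in chunks { continuation.yield(chunk) }
        continuation.finish()
    }
}
```

the point is that the policy lives in straight-line code at the place the data arrives, instead of being spread across channelRead callbacks and a state machine.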

more atomics

the natural pattern for synchronization in channel handlers is to schedule things on event loops and assume the mutable state of the channel handler is protected that way, like an almost-actor.

i find that the new API is unintentionally pushing me away from scheduling tasks, and pushing me towards synchronizing things using atomics. this is because the legacy APIs had a lot of grandfathered @preconcurrency, @unchecked Sendable, etc., and the analogous modern constructs, like actors, make it a lot more obvious how often you are suspending.
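
a minimal sketch of the trade-off, assuming the Swift 6 Synchronization module (Atomic is available on Linux and on macOS 15 and later). RequestStats and ActorRequestStats are hypothetical names, and i am not claiming this is how swiftinit tracks anything. the atomic version can be called synchronously from anywhere, while every call into the actor version is an await, i.e. a potential suspension point.

```swift
import Synchronization

/// Hypothetical counter shared across connection tasks, synchronized with an atomic.
/// Callers never suspend; each update is a single atomic read-modify-write.
final class RequestStats: Sendable {
    private let rejected = Atomic<Int>(0)

    func recordRejection() {
        // Relaxed ordering is enough for a standalone counter.
        _ = rejected.wrappingAdd(1, ordering: .relaxed)
    }

    var rejections: Int { rejected.load(ordering: .relaxed) }
}

/// The same counter as an actor: every access from outside is an `await`,
/// which makes each potential suspension point visible at the call site.
actor ActorRequestStats {
    private var rejected = 0

    func recordRejection() { rejected += 1 }

    var rejections: Int { rejected }
}
```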

i don’t know yet if this is a good thing or a bad thing.

HTTP/1.1 is actively harmful

unlike the old interface, the new interface has separate paths for HTTP/1.1 and HTTP/2 connections. common abstractions are available for the new API, but they weren’t released until a couple of days after the core feature set shipped in swift-nio and swift-nio-http2. a serendipitous consequence of this was that we had an opportunity to collect a lot of data over the weekend about which protocols swiftinit users connect with.

barbies

a barbie is a human user or a well-cloaked robot that doesn’t display immediately-detectable signs of automation. barbie is a term i invented to make it easier to understand the site’s usage patterns. the term is useful because many barbies aren’t actually humans and do display telltale signs of automation (such as requesting the same page at the same time each day), but nevertheless make it through swiftinit’s initial filters.

barbies often browse with caching enabled, and rarely encounter errors.

barbies request the site in a variety of languages, and almost always use HTTP/2. this is plausible, because many barbies are humans, and humans upgrade their browsers aggressively compared to robots.

bratz

bratz are cloaked robots that impersonate human users but exhibit behaviors that get instantly flagged by swiftinit’s filters. most commercial analytics software (e.g. Cloudfront) classifies them as human, but there are services¹ that specialize in detecting them.

bratz often exhibit malicious behaviors, and usually appear in waves when the site is being DoS’d. they frequently encounter errors because they probe the site for endpoints that don’t exist, or request pages deep in documentation bundles that were delisted months ago and couldn’t possibly be discovered by a client that doesn’t maintain a database of historical URLs.

bratz rarely use caching, and almost always request the site in english.

in the roughly 48 hours since we deployed the update, only a handful of bratz used HTTP/2.

search engines

search engines are benign crawlers like googlebot or bingbot, or one of the dozens of minor search engine crawlers. the category also includes malicious scripts that impersonate these crawlers in order to exploit a site’s whitelist.

from our data, about half of these crawlers are using HTTP/2. but i suspect that a lot of the ones using HTTP/1.1 are not real search engine crawlers. according to Cloudflare, googlebot and friends are perfectly capable of using HTTP/2, and my suspicion is that googlebot only falls back on HTTP/1.1 when a site does not support HTTP/2 or newer.

this is an area that requires more investigation.

research bots

research bots are crawlers like ahrefs that do not declare any relationship to a search engine, but also do not try to impersonate anyone else. research bots are considered harmful (but not malicious), and many sites such as wikipedia attempt to blacklist them in robots.txt.

research bots are an enormous drain on the swiftinit server, both in terms of CPU and bandwidth.

research bots overwhelmingly use HTTP/1.1.

implications for SwiftNIO

a lot of work in SwiftNIO has been devoted to dual-booting HTTP/1.1 and HTTP/2 on the same server. this effort has both a human cost, from having to design common abstractions, and a machine cost, from the cycles needed to translate HTTP/2 requests into a common format.

in my opinion this effort is misplaced, and at swiftinit, we are seriously considering deprecating HTTP/1.1 entirely and making the site HTTP/2-only. at worst HTTP/1.1 is a security hazard, and at best it is a hilarious waste of resources since it facilitates spam and automated scraping.

EDIT: see update below


[1] not an ad! i don’t work for them.

First, thanks for sharing all of this! I am very glad to hear that you find the new async interfaces easy to use and that they make it more natural to implement your business logic. This is exactly what we have been striving for.

I am curious where and how you use atomics here. Could you provide a few examples, and maybe the reasons why you chose atomics over something like an actor?

While I love all the information you have provided here, I would caution against this statement. Whether HTTP/1.1 should be used really depends on the application and use case. We can't make assumptions here, and HTTP/1.1 is still commonly used in lots of places, so we want to make that experience as easy as possible.

On the same topic, we want to go one step further for HTTP: the SSWG is working on a general-purpose HTTP server. This server is going to build on top of the new NIO interfaces and add more configuration options and even easier usability, while still aiming for high throughput and low latency. We are currently in the API modelling phase, and I hope that in the next few months we will have something more concrete to share.

What an amazing write-up!

I would definitely understand if you switched to HTTP/2 only. Of course the bratz and research bots will eventually adapt, but that will take time. In the meantime, you can focus on other measures and enjoy some DoS-free time.

let’s move this part of the discussion to the thread “How to reap idle HTTP/2 connections in SwiftNIO?” (post #5 by taylorswift).

i think the situation is different for servers and clients. there are some services like AWS S3 which are HTTP/1.1 only, so you need an HTTP/1.1 client to be able to interact with them. but i don’t think there are many (any?) legitimate reasons¹ to support HTTP/1.1 in server mode, nor is there usually a reason to dual-boot both protocols in client mode; there should be two separate interfaces, and users should pick one based on what the remote service supports.

for what it’s worth, HTTP/2 over TCP isn’t exactly a “modern” stack either; it is slowly being supplanted by QUIC (“HTTP/3”). but last year the SSWG indicated they were deprioritizing QUIC.


[1] i suppose maybe if you were running a honeypot specifically to collect data about malicious scripts and crawlers, an HTTP/1.1 server would be useful. but i’m not convinced abstracting over both versions simultaneously is valuable.

(Just to clarify - the SSWG has not deprioritised QUIC support. There was a pitch for a library that added HTTP/3 but it was fairly PoC and needed a lot of work before libraries and apps could start adopting it. If the work is done then we're more than happy to take a look. We also don't control the roadmap for SwiftNIO - that's the SwiftNIO team)

just an update: i tried changing the TLSConfiguration so that it only offers HTTP/2.

- configuration.applicationProtocols = ["h2", "http/1.1"]
+ configuration.applicationProtocols = ["h2"]

but this was not enough, because scripts and crawlers are still connecting and requesting pages over HTTP/1.1.

i assume to really shut off the flow of HTTP/1.1 traffic, i need to stop using configureAsyncHTTPServerPipeline and use something like configureAsyncHTTP2Pipeline. but this function has a completely different, and seemingly much lower-level API than the combined one. is there a middle ground between the two APIs?
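
here is my understanding of why the ALPN change alone didn’t work, as a toy model (not NIOSSL’s actual code). ALPN only chooses among protocols the client offers. a client that sends no ALPN extension at all (and, i suspect, one that offers only http/1.1) ends up with no negotiated protocol, and as far as i can tell the combined pipeline treats that as the HTTP/1.1 fallback case. negotiate(serverProtocols:clientOffer:), PipelineChoice, and pipeline(for:) are hypothetical, and the server-preference ordering is an assumption.

```swift
/// Toy model of ALPN: the server picks the first protocol in its own
/// preference list that the client also offered, or nothing at all.
func negotiate(serverProtocols: [String], clientOffer: [String]) -> String? {
    serverProtocols.first { clientOffer.contains($0) }
}

enum PipelineChoice: Equatable {
    case http2
    case http1Fallback
}

/// Models the combined server pipeline: only a negotiated "h2" selects
/// HTTP/2; every other outcome, including no negotiation, gets HTTP/1.1.
func pipeline(for negotiated: String?) -> PipelineChoice {
    negotiated == "h2" ? .http2 : .http1Fallback
}
```

in this model, trimming the ALPN list only changes what well-behaved clients negotiate; the fallback branch is still reachable by anything that doesn’t negotiate h2.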

I mean, you could just reply with 426 Upgrade Required:

HTTP/1.1 426 Upgrade Required
Upgrade: HTTP/2.0
Connection: Upgrade
Content-Length: [...]
Content-Type: text/plain

This service requires use of the HTTP/2 protocol

to any HTTP/1 request if you really only want to support H2.

but by the time it sends that response, haven’t all the upgrade handlers already been removed?

Sorry, my example was probably not quite right; I was thinking of Connection: close. The idea was to tell the other side that it needs a different protocol, and then close the connection. Now I'm not so sure this is a great idea, because the response says "upgrade required" but doesn't actually allow an upgrade; the client would have to reconnect with a better protocol instead...

Maybe 505 HTTP Version Not Supported?
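
Combining the two suggestions, here is a minimal sketch of the raw bytes such a reply might contain: a 505 status plus Connection: close, so the client is told to come back with a different protocol rather than being offered an upgrade on the same connection. The helper name and the exact headers are illustrative; in a NIO pipeline you would write an HTTPResponseHead and body parts instead of building a string.

```swift
/// Hypothetical helper: builds a raw HTTP/1.1 "505 HTTP Version Not Supported"
/// response that tells the client the connection will be closed.
func httpVersionNotSupported(
    message: String = "This service requires use of the HTTP/2 protocol\n"
) -> String {
    // Content-Length counts bytes, so measure the UTF-8 encoding, not Characters.
    let bodyLength = message.utf8.count
    let head = [
        "HTTP/1.1 505 HTTP Version Not Supported",
        "Connection: close",
        "Content-Type: text/plain; charset=utf-8",
        "Content-Length: \(bodyLength)",
    ]
    // Header lines end in CRLF, and an empty line separates head from body.
    return head.joined(separator: "\r\n") + "\r\n\r\n" + message
}
```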

so just to follow up, i spent a hilarious amount of time this week trying to phase out HTTP/1.1 specifically, and make a dent in the spam in general, and i’ve concluded that we cannot avoid dual-booting HTTP/1.1, even though statistically speaking, there is almost no legitimate usage (<2%) of the older protocol.

the main blocker is Googlebot. after cross-checking requests against the public IP whitelists for well-known search crawlers, it appears that my earlier hypothesis, that HTTP/1.1 Googlebot user agent strings were fraudulent, was wrong.

although Googlebot (and its main HTTP/1.1 counterpart, Yandexbot) makes up less than two percent of HTTP/1.1 traffic in total, nothing i tried could make Googlebot start crawling our site over HTTP/2. given how important a player Google is in this space, this means HTTP/1.1 is here to stay.
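
for anyone wanting to do the same cross-check, here is a minimal sketch of matching a client address against published IP ranges. it handles IPv4 only, and the prefixes in the test are documentation ranges from RFC 5737, standing in for the real lists that crawler operators publish. parseIPv4, IPv4Prefix, and isListedCrawlerAddress are hypothetical names, not swiftinit’s actual code.

```swift
/// Parses dotted-quad IPv4 text into a 32-bit integer, or nil if malformed.
func parseIPv4(_ text: Substring) -> UInt32? {
    let parts = text.split(separator: ".", omittingEmptySubsequences: false)
    guard parts.count == 4 else { return nil }
    var value: UInt32 = 0
    for part in parts {
        guard let octet = UInt8(part) else { return nil }
        value = value << 8 | UInt32(octet)
    }
    return value
}

/// An IPv4 CIDR prefix such as "192.0.2.0/24".
struct IPv4Prefix {
    let network: UInt32
    let length: Int

    init?(_ cidr: String) {
        let pieces = cidr.split(separator: "/")
        guard pieces.count == 2,
              let address = parseIPv4(pieces[0]),
              let length = Int(pieces[1]),
              (0...32).contains(length) else { return nil }
        self.length = length
        // Normalize away any host bits so comparisons are exact.
        self.network = address & IPv4Prefix.mask(length)
    }

    /// Netmask with the top `length` bits set.
    static func mask(_ length: Int) -> UInt32 {
        length == 0 ? 0 : UInt32.max << UInt32(32 - length)
    }

    func contains(_ address: UInt32) -> Bool {
        address & IPv4Prefix.mask(length) == network
    }
}

/// True if `ip` falls inside any of the published crawler prefixes.
func isListedCrawlerAddress(_ ip: String, in prefixes: [IPv4Prefix]) -> Bool {
    guard let address = parseIPv4(Substring(ip)) else { return false }
    return prefixes.contains { $0.contains(address) }
}
```

a real deployment would also need IPv6 support and a way to refresh the lists, which this sketch leaves out.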

for what it is worth, Bingbot does not have this problem and crawls exclusively over HTTP/2. in fact, well over three-quarters of HTTP/2 traffic is legitimate.

i think that we cannot stem the flow of invalid traffic at the network protocol level; this needs to be done in a more sophisticated fashion using other techniques. so, the current approach of the SwiftNIO libraries is the correct one.

I'm curious whether you contacted Google about this, e.g. through GoogleBot Report? It seems like a bug if it refuses to use HTTP/2 when it's available and HTTP/1 isn't.

people who think themselves far more important than i have been lobbying for HTTP/2 for a long time, so i doubt it would make a difference.

it makes sense from a pure implementation perspective for a crawler to use HTTP/1.1 instead of HTTP/2 when possible. it is easier and far better supported in terms of library availability. this is true even in the NIO ecosystem itself, where HTTP/1.1 is supported “out of the box”, while HTTP/2 still requires adding a dependency on a separate package (swift-nio-http2).
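
concretely, a SwiftPM manifest for a client that speaks both protocols needs two package dependencies, one more than an HTTP/1.1-only client. the package and target names are placeholders, and the version floors are only illustrative (the async APIs discussed earlier in this thread require much newer releases).

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "ExampleCrawler",  // placeholder name
    dependencies: [
        // NIOHTTP1 ships inside swift-nio itself...
        .package(url: "https://github.com/apple/swift-nio.git", from: "2.0.0"),
        // ...while HTTP/2 support lives in a separate package.
        .package(url: "https://github.com/apple/swift-nio-http2.git", from: "1.0.0"),
    ],
    targets: [
        .executableTarget(
            name: "ExampleCrawler",
            dependencies: [
                .product(name: "NIOHTTP1", package: "swift-nio"),
                .product(name: "NIOHTTP2", package: "swift-nio-http2"),
            ]
        ),
    ]
)
```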