HTTP benchmarking tool written in Swift

lovetodream · May 17, 2024, 5:21pm

Hello everyone,
inspired by the discussion in this thread, I've hacked together a small HTTP benchmarking tool using NIO during the last few evenings. It was a great chance to try out the AsyncChannel stuff recently added to NIO, which I really enjoyed; I like it a lot more than the channel handler stuff.
I have no prior experience with benchmarking tools, so this is a first for me. There are very likely many optimisations one can make and incorrect assumptions in the code.

As of now it only supports HTTP/1, but I'd like to add HTTP/2 and TLS/HTTPS in the coming days.
I've been following the whole discussion in said thread with a lot of excitement. Due to this I wanted to share the code pretty much immediately after I got something together to get feedback from the community. Maybe there are some people who want to contribute. Anyways, I'd love to hear feedback and tips on how I can improve the tool, also things I need to consider when writing benchmarking tools.

Currently I'm creating x connections to the target server, then I send a lot of requests over those x connections and measure the time the requests take. Is this a valid approach, or should I create a new connection for every request and measure the TCP + HTTP time?
At the moment I'm measuring the following way:
The first request in each connection contains the TCP (is this relevant?) + HTTP duration. All subsequent requests measure only the HTTP duration.
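
One way to keep the two kinds of measurement apart is to bucket the first sample of every connection separately from the rest. A minimal sketch, assuming each connection records its request durations in order; the function names are hypothetical and not taken from ohje:

```swift
/// Splits per-connection samples (in seconds) into the first request of each
/// connection, which also paid for connection setup, and all later requests.
func splitFirstRequests(_ connections: [[Double]]) -> (first: [Double], rest: [Double]) {
    var first: [Double] = []
    var rest: [Double] = []
    for samples in connections {
        if let head = samples.first {
            first.append(head)
            rest.append(contentsOf: samples.dropFirst())
        }
    }
    return (first, rest)
}

/// Arithmetic mean, or nil when there are no samples.
func mean(_ values: [Double]) -> Double? {
    values.isEmpty ? nil : values.reduce(0, +) / Double(values.count)
}
```

Reporting the two buckets separately would show how much of the first-request time is connection setup rather than HTTP.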

Here's the code: GitHub - lovetodream/ohje

Thanks everyone!

wadetregaskis · May 17, 2024, 7:47pm

That's fantastic - thanks for jumping in and creating this.

This could be the beginnings of a great community project, if others can also jump in and contribute. I'll be taking a look at it myself, for sure.

Getting it out there early is good on its own merits, too - I'm sure there'll be plenty of suggestions and wisdom shared on the best ways to flesh out the benchmark tool.

It is a valid way to do it, but not the only way. Different types of web servers - as in, for different use-cases - have different needs, so this method isn't valid for all of them. No one method is - which is why my immediate suggestion is to support multiple rate and connection methods (which are orthogonal, so you have NxM total modes). For rates:

- Constant concurrency. This is how wrk works. A fixed number of connections all issuing back-to-back requests. It sounds like this is what you've done first. It's good for imposing a consistent load on a server, where you're basically just testing its throughput capacity.
- Constant request rate. This is how wrk2 works. You issue requests at a fixed frequency, irrespective of how many are already in flight. This is good for testing "thundering herd" behaviour (such as getting Slashdotted) and is arguably the best methodology (generally) for latency testing.
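
The difference between the two rate modes shows up in when requests are sent. A minimal sketch, not code from wrk, wrk2 or ohje; the function names are hypothetical and times are in seconds from the start of the run:

```swift
/// Constant request rate: send times are fixed up front and do not depend
/// on how long earlier responses take.
func constantRateSendTimes(requestsPerSecond: Int, durationSeconds: Int) -> [Double] {
    precondition(requestsPerSecond > 0 && durationSeconds >= 0)
    let interval = 1.0 / Double(requestsPerSecond)
    return (0..<(requestsPerSecond * durationSeconds)).map { Double($0) * interval }
}

/// Constant concurrency, seen from one connection: each request is sent as
/// soon as the previous response has arrived.
func backToBackSendTimes(latencies: [Double]) -> [Double] {
    var sendTimes: [Double] = []
    var now = 0.0
    for latency in latencies {
        sendTimes.append(now)
        now += latency
    }
    return sendTimes
}
```

With response latencies of 0.5 s, 0.25 s and 1 s, the back-to-back connection sends at 0 s, 0.5 s and 0.75 s, so a slower server directly lowers the offered load. The constant-rate schedule stays the same whatever the server does.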

For connection methods:

- New TCP connection for every request. This is a good test for a lot of large-client-low-chattiness applications, where clients don't send requests frequently enough to justify keeping connections open. It's also a decent approximation for thundering herd scenarios.
- HTTP reuse. This is a good test for chatty applications, where clients largely stay connected indefinitely and send requests frequently. Note that you should also report connection losses in this mode, as that can be important to behaviour & performance, and reveal overload problems with the server or networking in-between.

Really the above two are just the two extremes of reusing a connection for a configurable N requests, before forcing it closed and creating a new one. A small (but greater than one) value for N is a good approximation for typical web traffic, where browsers load a page which requires multiple requests which are done with limited TCP connection parallelism.
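
That single knob could be modelled roughly as below. This is only a sketch under the assumption that the tool counts requests per connection; the type and its members are hypothetical:

```swift
/// Reuse policy with one setting: how many requests a connection may carry.
/// 1 means a new TCP connection per request; nil means never force a close.
struct ConnectionReusePolicy {
    let maxRequestsPerConnection: Int?

    /// True once a connection has carried its allowed number of requests.
    func shouldReconnect(afterRequestsSent sent: Int) -> Bool {
        guard let limit = maxRequestsPerConnection else { return false }
        return sent >= limit
    }

    /// Connections one worker opens to send `total` requests, ignoring
    /// connections that are lost and have to be re-established.
    func connectionsNeeded(forTotalRequests total: Int) -> Int {
        guard total > 0 else { return 0 }
        guard let limit = maxRequestsPerConnection else { return 1 }
        return (total + limit - 1) / limit
    }
}
```

For example, a limit of 6 turns 20 requests into 4 connections, while the two extremes give 20 connections and 1 connection respectively.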

odmir · May 18, 2024, 2:00am

This is very exciting, thank you for doing it!

lovetodream · May 18, 2024, 9:02am

Thanks for the writeup, there are a lot of helpful things in it already.

This is exactly what I wanted to achieve.

Yes, that's indeed the way I'm doing it right now. Supporting both "rate-modes" seems like the best way to go forward. Thanks for the explanation, I couldn't grasp the difference from looking at the code of wrk and wrk2 (I'm not very good with C though).

This is what I've done initially, but I faced errors in certain conditions:
When running the tool on 4 threads with 10 simultaneous connections for 10 seconds, I receive the following error on macOS (M1 Max) starting around 3/4 of the runtime:

NIOConnectionError(host: "localhost", port: 8080, dnsAError: nil, dnsAAAAError: nil, connectionErrors: [NIOPosix.SingleConnectionFailure(target: [IPv6]localhost/::1:8080, error: connection reset (error set): Connection refused (errno: 61)), NIOPosix.SingleConnectionFailure(target: [IPv4]localhost/127.0.0.1:8080, error: connect(descriptor:addr:size:): Can't assign requested address (errno: 49))])

When running on a dedicated linux machine under the same conditions I receive this error:

connect(descriptor:addr:size:): Cannot assign requested address (errno: 99)

~~It seems like the same issue, maybe the program ran out of ephemeral ports?~~
While writing this post, I realised I could set SO_REUSEADDR on the channel, which resolved the issue at first glance. But when running the tool again, the error reappears. Maybe I'm not releasing the sockets properly or I have to set another channel option. (Edit: just checked with netstat, the sockets are in TIME_WAIT state.)

Due to this, I changed to connection reuse, but I'd love to make this configurable in some way to get a good middle ground, as you suggested.

wadetregaskis · May 18, 2024, 3:20pm

It definitely should be possible to use a huge number of transient sockets even under default configurations of Linux & macOS… other than needing to raise the file descriptor ulimit potentially, I think you should be good until you hit the ephemeral port limit which is ~16Ki on most systems (on macOS it's controlled via the four net.inet.ip.portrange.{,hi}{first,last} sysctls - whether you use the 'low' or 'high' range is a setsockopt configuration option, I believe, though I don't know what the default is).
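
Rough arithmetic for when a connection-per-request run would hit that limit, assuming every closed connection holds its local port in TIME_WAIT and nothing lets ports be reused early. The 16,384 figure is the ~16Ki above; the 60-second TIME_WAIT duration is an assumption and differs between systems, and the function names are hypothetical:

```swift
/// Highest new-connection rate that can be sustained indefinitely if each
/// port is unavailable for `timeWaitSeconds` after its connection closes.
func maxSustainedConnectionsPerSecond(ephemeralPorts: Int, timeWaitSeconds: Double) -> Double {
    Double(ephemeralPorts) / timeWaitSeconds
}

/// Seconds until the port pool runs dry at a given rate, or nil if ports
/// leave TIME_WAIT before the pool can be used up.
func secondsUntilExhaustion(ephemeralPorts: Int,
                            connectionsPerSecond: Double,
                            timeWaitSeconds: Double) -> Double? {
    precondition(connectionsPerSecond > 0)
    let seconds = Double(ephemeralPorts) / connectionsPerSecond
    return seconds < timeWaitSeconds ? seconds : nil
}
```

Under these assumptions the sustainable rate is about 273 new connections per second, and at 2,048 new connections per second the pool would be exhausted after 8 seconds.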

SO_REUSEADDR is definitely what you need to ensure you don't exhaust the ephemeral ports early. So I'm not sure why setting that didn't address this for you, given your low connection rates. Were those sockets in TIME_WAIT left over from previous runs, before you were setting SO_REUSEADDR?

lovetodream · May 18, 2024, 4:59pm

Seems like they've indeed been leftovers, I can't reproduce the errors anymore.

wadetregaskis · May 19, 2024, 9:34pm

Related to this, there's apparently a serverperfmode kernel boot argument that makes a bunch of tweaks to improve macOS's suitability for server roles. It only applies on x86 Macs, though. Nonetheless, it gives some hints about which sysctls might be interesting to look at.