'Standard' vapor website drops 1.5% of requests, even at concurrency of 100!

wadetregaskis · May 5, 2024, 6:09pm

That's maybe not always true.

It depends on the request rate (and probably on request & response sizes, whether you're running any Lua scripts within wrk, etc), but basically a single wrk thread can hit the limits of one CPU core, given enough activity.

But even before that, a lone wrk thread can't turn around a connection infinitely fast, so it can end up delaying requests that are otherwise ready to go even if it's not fully utilising its CPU core.

Just using --threads $(sysctl -n hw.ncpu) seems like the best, most robust default. I've not seen any negative side-effects of that, in my testing [so far].
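As a minimal sketch of that default (not the exact command used for these runs), the snippet below picks a thread count from the logical core count and builds a wrk command line. The `cpu_count` helper, the URL and the 30s duration are assumptions for illustration. Two details worth noting: `sysctl` needs `-n` to print just the number rather than `hw.ncpu: 20`, and wrk refuses to start if the connection count is lower than the thread count.

```shell
# Print the number of logical CPU cores.
# Linux usually has nproc; macOS exposes hw.ncpu via sysctl.
cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu
  else
    getconf _NPROCESSORS_ONLN
  fi
}

threads=$(cpu_count)
connections=100   # the concurrency from the thread title

# wrk rejects connections < threads, so cap the thread count.
if [ "$connections" -lt "$threads" ]; then
  threads=$connections
fi

# Assumed target: a server listening on localhost port 8080.
wrk_cmd="wrk --threads $threads --connections $connections --duration 30s http://127.0.0.1:8080/"
echo "$wrk_cmd"
```

Running the printed command (or replacing `echo "$wrk_cmd"` with `$wrk_cmd`) starts the actual load test.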

Sent via PM (not that the patch is secret, just not sure folks want it here on the main thread - and it's quite trivial and exactly as described previously).

I'm also just running everything on the one host (wrk doesn't use that much CPU for this benchmark, and not having to deal with actual network hardware eliminates some variability).

Keep in mind, also, that the iMac Pros contain slow Xeons, so those 20 cores aren't what you'd think. Those Xeons are about eight years old now and are thermally limited in the iMac Pro below their design limits - I think they're actually under-clocked by a turbo level or two, as well - so they're pretty crappy "server" CPUs, even by Xeon standards.

But nonetheless, I wouldn't expect modally different behaviour from your system - just some variation in throughput numbers.

It would of course be best to do the official benchmark runs on a modern server (and IMO that means something like a Graviton 4 or an Ampere Altra). 2013 and 2016 Xeons are ancient in every important respect - architecture, design, process node, etc.

Oh, and also: I didn't use Docker. I'm just doing swift run -c release, because it's way simpler and I don't really want to install Docker and its entourage.

I don't think having INFO level enabled is wrong - every web server I've ever encountered in production has had essentially that enabled; i.e. at least one log line for every HTTP request received. In fact Vapor logs relatively little vs e.g. Apache - just a UUID!

But directing that logging anywhere but a file is definitely unrealistic.
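A minimal sketch of sending the server's output to a file instead of a terminal, assuming the server writes its log lines to stdout/stderr. `run_logged` and `vapor.log` are hypothetical names, not something from the benchmark setup.

```shell
# Run a command with stdout and stderr appended to a log file.
# Usage: run_logged <log-file> <command> [args...]
run_logged() {
  log_file=$1
  shift
  "$@" >>"$log_file" 2>&1
}

# Example (not run here): run_logged vapor.log swift run -c release
```

Appending (`>>`) rather than truncating keeps the logs from earlier runs, which is closer to how a long-lived production log file behaves.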

Yes - it's important to remember that irrespective of what optimisations or configuration changes might be proposed, the original benchmark was still useful. "Fairness" in benchmarks is overrated - what matters is whether they reveal interesting or useful things.

Could be that the iMac Pro's CPU isn't as powerful as you're thinking (I think that's at least part of it, for reasons mentioned above). Could be Swift on Linux compiles differently. Could very well be differences in Linux kernel vs macOS kernel performance. Etc.

Oh, and to be clear: that's 20 logical cores, given HyperThreading. Only 10 physical cores.

I didn't record it for every benchmark run, but I did occasionally check memory usage and I never saw it being more than 9 MiB. By server standards that's a rounding error (I've worked with job management systems that literally don't support resolutions finer than gigabytes).
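For anyone wanting to record memory more systematically than an occasional check, here is a rough sketch that samples a process's resident set size with `ps` and keeps the largest value seen. The function names are hypothetical, and sampling can miss short spikes between samples, so treat the result as a lower bound on the true peak.

```shell
# Print the resident set size of a process in KiB (empty if it has exited).
rss_kib() {
  ps -o rss= -p "$1" 2>/dev/null | tr -d ' '
}

# Sample RSS a fixed number of times and print the largest value seen.
# Usage: peak_rss_kib <pid> [samples] [interval-seconds]
peak_rss_kib() {
  pid=$1
  samples=${2:-10}
  interval=${3:-1}
  peak=0
  i=0
  while [ "$i" -lt "$samples" ]; do
    cur=$(rss_kib "$pid")
    if [ -n "$cur" ] && [ "$cur" -gt "$peak" ]; then
      peak=$cur
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "$peak"
}

# Example (not run here): peak_rss_kib "$(pgrep -n App)" 30 1
# "App" is an assumed process name for the server binary.
```

Dividing the result by 1024 gives MiB, for comparison with the figure above.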

And I wouldn't expect any more - it's basically small-N threads (number of CPU cores) doing an iterative calculation on a pair of relatively small memory allocations. There's probably a lot of memory traffic, not just actual reads & writes but meta-traffic like alloc/free & retain/releases etc. But the high water mark should be very low and very close to the average water mark.

In Swift. Obviously in GC'd languages all bets are off.

But I would like to see you run the numbers limiting Java & JavaScript to the same amount of memory Swift needs. I'm pretty sure they won't even launch with that "little".

(More seriously - varying Java heap size is an important and insightful test, as there tends to be a quadratic relationship between heap size and performance in Java, and GC'd languages generally - and servers don't in fact have infinite RAM, so sometimes you don't get to pick your ideal spot on that curve.)
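A heap-size sweep along those lines could be sketched as below. It only prints the commands; `heap_sweep_cmds`, `server.jar` and the particular sizes are hypothetical placeholders, and each printed command would need to be paired with its own benchmark run.

```shell
# Print one java command per heap size (in MiB).
# Usage: heap_sweep_cmds <jar> <size-mib>...
heap_sweep_cmds() {
  jar=$1
  shift
  for mb in "$@"; do
    # -Xms sets the initial heap and -Xmx the maximum;
    # setting both to the same value pins the heap size.
    echo "java -Xms${mb}m -Xmx${mb}m -jar $jar"
  done
}

heap_sweep_cmds server.jar 16 64 256 1024
```

Plotting throughput against heap size from such a sweep would show where on the curve a given memory budget lands.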

Will Swift use hardware offloading for TLS? None of Apple's platforms have ever had or supported that (to my knowledge) - not counting some ISA specialisations to accelerate certain operations on the CPU cores - so I wouldn't be surprised if Swift doesn't. (Unless Vapor w/ TLS is ultimately using BoringSSL or somesuch, which might still use it.)

In any case, I wouldn't assume that the cost of TLS is equivalent across all these languages and server stacks. Maybe they all just e.g. use OpenSSL underneath, but maybe not.