'Standard' Vapor website drops 1.5% of requests, even at concurrency of 100!

swift-nio-ssl (which is used by Vapor) has no support for hardware offload for TLS, and frankly I'm not aware of any hardware offloading of TLS that's done for performance. TLS is very fast. As an example, here's the crypto involved in a TLS 1.3 handshake:

  • Digest calculation over the handshake messages, usually 4kB or less.
  • One key exchange, usually ECDHE which involves generating a random scalar and performing two point multiplications.
  • One signature (either signing or verifying depending on role), again usually an EC signature. Implies another digest calculation and point multiplication.
  • HKDF generation of a number of derived secrets. All digest calculations.
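
To make the last bullet concrete, here is a minimal sketch of plain HKDF (RFC 5869) over SHA-256 using only Python's standard library. It is not what swift-nio-ssl or BoringSSL actually run, and TLS 1.3 wraps the expand step in its own HKDF-Expand-Label encoding, which is left out here. The input secret and the labels are placeholder values I made up for illustration. The point is only that every step is an HMAC, which is a handful of digest calculations.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): one HMAC over the input keying material."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869 default: a string of zero bytes
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): one HMAC per HASH_LEN bytes of output."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholder input, standing in for the (EC)DHE shared secret.
shared_secret = bytes(range(32))
prk = hkdf_extract(b"", shared_secret)

# Hypothetical labels, not the real TLS 1.3 label encoding.
client_secret = hkdf_expand(prk, b"example client traffic", 32)
server_secret = hkdf_expand(prk, b"example server traffic", 32)
print(client_secret.hex())
print(server_secret.hex())
```

Deriving a 32-byte secret costs one HMAC for the extract and one for the expand, so the whole key schedule is a few dozen hash invocations at most.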

That's it. The in-band cryptography is then all bulk symmetric through an AEAD, usually AES-GCM but sometimes ChaCha20-Poly1305.

The important heuristic there is that these are all super fast on even slightly modern application processors. Offloading these to a hardware card would almost never produce improved performance, because the cost of shuffling the data to and from the card would utterly dominate the cost of the computation being done. Back in 2010 Adam Langley offered a rough calculus of 1500 handshakes/sec/core. That number not only assumes 2010-era server hardware, but also involves RSA 1024 (far slower than EC) and RC4 without hardware acceleration (compare AES, which has hardware acceleration in all modern CPUs). Nowadays we are well past the point where the crypto for TLS handshakes is lost in the noise of the rest of the protocol stack.
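
If you want a feel for the scale, a rough sketch like the following times SHA-256 over a 4 kB buffer and compares it to the per-handshake budget implied by the 2010 figure of 1500 handshakes/sec/core. The buffer contents and iteration count are arbitrary choices of mine, and Python's timing loop adds overhead, so treat the result as an upper bound on the digest cost rather than a benchmark.

```python
import hashlib
import time

HANDSHAKE_TRANSCRIPT_BYTES = 4096   # "usually 4kB or less"
LANGLEY_HANDSHAKES_PER_SEC = 1500   # the 2010 per-core figure quoted above
ITERATIONS = 20_000                 # arbitrary; enough to smooth out noise

transcript = bytes(HANDSHAKE_TRANSCRIPT_BYTES)

start = time.perf_counter()
for _ in range(ITERATIONS):
    hashlib.sha256(transcript).digest()
elapsed = time.perf_counter() - start

digest_us = elapsed / ITERATIONS * 1e6
# Time available per handshake if one core does 1500 of them per second.
budget_us = 1e6 / LANGLEY_HANDSHAKES_PER_SEC

print(f"SHA-256 over 4 kB: {digest_us:.2f} us per digest")
print(f"2010-era budget:   {budget_us:.0f} us per handshake")
print(f"digest share:      {digest_us / budget_us:.1%} of that budget")
```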

The reason to do crypto offload isn't performance, it's security. Keeping a private key inaccessible from main memory is a valuable thing to do, so you may choose to do that. But this will never make your handshake faster; it'll only make it slower. (Sidebar: also, this only has an effect on the signing operation, as it'll only be the server's private key in the hardware security component. So only one of these operations gets slower, but it does get a lot slower.)

(Sidebar to my sidebar: how much slower? Using YubiHSM2 as a good example of a publicly documented HSM, Yubico quotes ~73ms for ECDSA-P256-SHA256. Whereas, openssl speed ecdsap256 on my M1 Max with OpenSSL 3.2.1 says 59353.7 signing operations per second, or ~16µs. That makes YubiHSM2 about 4,500x slower than doing the signing operation on the CPU. Worse still, typically HSMs are single-threaded, so you have locked your handshake rate down hard.)
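
The arithmetic behind that comparison, as a quick sketch using only the two figures quoted above. The handshake ceiling assumes the HSM handles one signature at a time and that each handshake needs exactly one signature, which is my simplification. Without rounding the per-signature time, the slowdown comes out a little over 4,300x, the same ballpark as the figure above.

```python
HSM_SIGN_SECONDS = 73e-3        # Yubico's quoted ~73 ms for ECDSA-P256-SHA256
CPU_SIGNS_PER_SECOND = 59353.7  # `openssl speed ecdsap256` figure quoted above

cpu_sign_seconds = 1 / CPU_SIGNS_PER_SECOND
slowdown = HSM_SIGN_SECONDS / cpu_sign_seconds

# Assumption: one signature at a time on the HSM, one signature per handshake.
hsm_handshakes_per_second = 1 / HSM_SIGN_SECONDS

print(f"CPU signing:  {cpu_sign_seconds * 1e6:.1f} us per signature")
print(f"HSM slowdown: {slowdown:,.0f}x")
print(f"HSM ceiling:  {hsm_handshakes_per_second:.1f} handshakes/sec")
```

Under those assumptions the HSM caps the whole server at roughly 14 new handshakes per second, no matter how many cores sit behind it.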

It is: swift-nio-ssl is a BoringSSL wrapper. However, we have not exposed much in the way of API to use hardware secure elements, except for hopping through Swift code first using NIOSSLCustomPrivateKey.

Many stacks use BoringSSL under the covers. However, for these four that's likely not true. I'd expect PHP to use OpenSSL via httpd. Node.js uses OpenSSL as well, I believe. As for Java, it depends: Java has a builtin implementation, but Netty uses BoringSSL.