'Standard' Vapor website drops 1.5% of requests, even at a concurrency of 100!

Update: I'm not sure now if wrk is actually the problem.

The problem is wrk; it is very buggy.

  1. It naively splits connections across threads using integer division (connections / threads per thread), so you're not necessarily getting the number of connections you actually requested (e.g. 100 connections across 26 threads gives ⌊100 / 26⌋ = 3 per thread, or just 78 in total).
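To make that arithmetic concrete, here's a minimal sketch (my code, not wrk's; the function names are mine) of the lossy split, alongside a fairer alternative that hands out the remainder:

```c
#include <assert.h>

/* What wrk effectively does: each thread gets connections / threads
   sockets, so the remainder is silently dropped. */
int effective_connections(int connections, int threads) {
    int per_thread = connections / threads;   /* integer division truncates */
    return per_thread * threads;              /* what you actually get */
}

/* A fairer split: give the first (connections % threads) threads
   one extra connection each, so the total comes out exact. */
int fair_connections_for_thread(int connections, int threads, int i) {
    return connections / threads + (i < connections % threads ? 1 : 0);
}
```

With 100 connections and 26 threads, effective_connections gives 78, while the fair variant assigns the 100 % 26 = 22 leftover connections to the first 22 threads, totalling exactly 100.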

  2. It doesn't even check for socket errors, so it doesn't notice when it runs out of file descriptors, and just silently fails (running with fewer connections than instructed).

    Once I fixed that and added some logging, I finally saw the EMFILE errors that are expected when the file descriptor limit is hit (not that this was a big issue in any of my posted results, as it at worst means my 2,560-connection runs were actually more like 2,540-connection runs).
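The failure mode is easy to reproduce in isolation. This sketch (my code, not wrk's) lowers the process's file-descriptor limit and then opens sockets until socket() fails, yielding exactly the EMFILE that wrk was swallowing:

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>

/* Lower the fd limit, then open sockets until socket() fails.
   The resulting errno is EMFILE - the error wrk never surfaced.
   The fds are deliberately leaked here; a real tool would track
   and close them. */
int open_until_emfile(void) {
    struct rlimit rl = { .rlim_cur = 32, .rlim_max = 32 };
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    while (socket(AF_INET, SOCK_STREAM, 0) >= 0)
        ;  /* keep opening until the limit is hit */
    return errno;
}
```

A load generator that ignores this return value just runs with however many sockets it managed to open, which is precisely the "fewer connections than instructed" behaviour described above.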

  3. It doesn't actually wait for the sockets to open before trying to use them, resulting in spurious errors and surprisingly varied, essentially undefined behaviour. This, specifically, is what was causing the slow- and no-ramp behaviour.

    It's a kinda comical error, too:

    flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    if (connect(fd, addr->ai_addr, addr->ai_addrlen) == -1) {
        if (errno != EINPROGRESS) goto error;

    …else blindly assume the socket is ready and try to use it. Note also the complete lack of error checking for the fcntl calls (a common theme throughout the wrk code).

    Clearly the author encountered EINPROGRESS, but for unfathomable reasons decided to just hard-code an ignore of it. :man_facepalming:

    In retrospect, this was already hinted at in what I'd observed & written about - that the read & write errors only happen during the ramp period. Turns out that's not completely correct - there are still some after the ramp, but several orders of magnitude fewer (and I don't yet know what's causing those lingering read/write errors).

    The proper fix is probably to remember that the socket isn't connected yet and react appropriately once it is, moving on to other sockets in the meantime. But even as a quick hack, moving the O_NONBLOCK enablement to after connect essentially fixes it for the purposes of this benchmark. Technically it does now require that all connections within a given wrk thread open before it'll send any requests, but that doesn't seem to cause a significant start-up delay even with many thousands of connections (and Vapor's listen queue left at the default 256).

diff --git a/src/net.c b/src/net.c
index 75916f7..10764fe 100644
--- a/src/net.c
+++ b/src/net.c
@@ -2,6 +2,7 @@
 #include <errno.h>
 #include <unistd.h>
+#include <string.h>
 #include <sys/ioctl.h>
 #include "net.h"
@@ -17,6 +18,7 @@ status sock_close(connection *c) {
 status sock_read(connection *c, size_t *n) {
     ssize_t r = read(c->fd, c->buf, sizeof(c->buf));
     *n = (size_t) r;
+    if (0 > r) { printf("read failed: %d (%s)\n", errno, strerror(errno)); }
     return r >= 0 ? OK : ERROR;
@@ -25,7 +27,7 @@ status sock_write(connection *c, char *buf, size_t len, size_t *n) {
     if ((r = write(c->fd, buf, len)) == -1) {
         switch (errno) {
             case EAGAIN: return RETRY;
-            default:     return ERROR;
+            default:     printf("write failed: %d (%s)\n", errno, strerror(errno)); return ERROR;
     *n = (size_t) r;
@@ -35,5 +37,6 @@ status sock_write(connection *c, char *buf, size_t len, size_t *n) {
 size_t sock_readable(connection *c) {
     int n, rc;
     rc = ioctl(c->fd, FIONREAD, &n);
+    if (0 > rc) { printf("ioctl(…, FIONREAD, …) failed: %d (%s)\n", errno, strerror(errno)); }
     return rc == -1 ? 0 : n;
diff --git a/src/wrk.c b/src/wrk.c
index 51f46f7..8f3d93f 100644
--- a/src/wrk.c
+++ b/src/wrk.c
@@ -240,13 +240,21 @@ static int connect_socket(thread *thread, connection *c) {
     fd = socket(addr->ai_family, addr->ai_socktype, addr->ai_protocol);
-    flags = fcntl(fd, F_GETFL, 0);
-    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
+    if (0 > fd) {
+        printf("socket failed: %d (%s)\n", errno, strerror(errno));
+        goto error;
+    }
     if (connect(fd, addr->ai_addr, addr->ai_addrlen) == -1) {
-        if (errno != EINPROGRESS) goto error;
+        if (errno != EINPROGRESS) {
+            printf("connect failed: %d (%s)\n", errno, strerror(errno));
+            goto error;
+        }
+    flags = fcntl(fd, F_GETFL, 0);
+    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
     flags = 1;
     setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flags, sizeof(flags));
@@ -257,6 +265,8 @@ static int connect_socket(thread *thread, connection *c) {
         return fd;
+    printf("Fell off the end of the connect_socket function.\n");
@@ -406,7 +416,7 @@ static void socket_writeable(aeEventLoop *loop, int fd, void *data, int mask) {
     switch (sock.write(c, buf, len, &n)) {
         case OK:    break;
-        case ERROR: goto error;
+        case ERROR: /*printf("sock.write failed\n");*/ goto error;
         case RETRY: return;
@@ -430,12 +440,19 @@ static void socket_readable(aeEventLoop *loop, int fd, void *data, int mask) {
     do {
         switch (sock.read(c, &n)) {
             case OK:    break;
-            case ERROR: goto error;
+            case ERROR: /*printf("sock.read failed\n");*/ goto error;
             case RETRY: return;
-        if (http_parser_execute(&c->parser, &parser_settings, c->buf, n) != n) goto error;
-        if (n == 0 && !http_body_is_final(&c->parser)) goto error;
+        if (http_parser_execute(&c->parser, &parser_settings, c->buf, n) != n) {
+            printf("http_parser_execute failed\n");
+            goto error;
+        }
+        if (n == 0 && !http_body_is_final(&c->parser)) {
+            printf("http_body_is_final failed\n");
+            goto error;
+        }
         c->thread->bytes += n;
     } while (n == RECVBUF && sock.readable(c) > 0);

With a fixed (modified) version of wrk, I now see reliable performance and no timeouts even with 10,000 connections (I didn't test beyond that, but it's clearly now following the expected curve of increasing latency and slightly decreasing throughput).

Though those lingering few read/write errors still bother me. I might look into them later.

Orthogonally, a few other miscellaneous observations:

  • unsafeTryInstallSingletonPosixEventLoopGroupAsConcurrencyGlobalExecutor doesn't work; it just makes the server crash at startup. I believe this is expected, until Vapor is fixed.
  • Removing all async code from the server itself (which was really just its main & configure methods) makes no difference. Presumably this is expected; I assume how Vapor actually runs internally is unaffected by whether you use run or execute.
  • Many (all?) of Vapor's objects have no meaningful descriptions, which is annoying for debugging, as I basically had to construct descriptions manually in order to figure out even the most rudimentary things, like the default settings for various configuration options.