'Standard' vapor website drops 1.5% of requests, even at concurrency of 100!

Digging into this deeper, I'm now not sure wrk is the root problem.

I investigated wrk's implementation further, and it does in fact properly wait¹ (asynchronously via kevent) for the connecting socket to establish (specifically, to become readable or writable). It usually successfully writes to it and then receives a kevent telling it the socket has closed (EV_EOF is set). It tries to read the socket anyway - which is correct; it could have data still in it from before the close - but always comes up empty in this case.

Looking at the syscalls using System Trace in Instruments, it genuinely looks like wrk is doing everything correctly.

And yet on the server side, it seems like basically the same story - it looks like there's a kevent signalling the socket is readable, but when NIO reads from it the response is zero, which NIO takes as meaning EOF.

Callstack on the server's close call
Symbol Source location
close
@nonobjc close(_:) …/serverbench/vapor/swiftbench/:0
static Posix.close(descriptor:) …/serverbench/vapor/swiftbench/:0
static NIOBSDSocket.close(socket:) …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/BSDSocketAPIPosix.swift:46
BaseSocket.close() …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/BaseSocket.swift:342
protocol witness for BaseSocketProtocol.close() in conformance BaseSocket …/serverbench/vapor/swiftbench/:0
BaseSocketChannel.close0(error:mode:promise:) …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/BaseSocketChannel.swift:862
specialized BaseSocketChannel.readable0()
specialized BaseSocketChannel.readEOF0() …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/BaseSocketChannel.swift:1042
specialized BaseSocketChannel.readEOF() …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/BaseSocketChannel.swift:1028
specialized protocol witness for SelectableChannel.readEOF() in conformance BaseSocketChannel …/serverbench/vapor/swiftbench/:1019
specialized SelectableEventLoop.handleEvent(_:channel:) …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:401
specialized SelectableEventLoop.handleEvent(_:channel:) …/serverbench/vapor/swiftbench/:0
partial apply for closure #2 in closure #2 in SelectableEventLoop.run() …/serverbench/vapor/swiftbench/:0
specialized Selector.whenReady0(strategy:onLoopBegin:_:) …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/SelectorKqueue.swift:258
specialized Selector.whenReady(strategy:onLoopBegin:_:) …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/SelectorGeneric.swift:288
closure #2 in SelectableEventLoop.run() …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:470
specialized closure #1 in withAutoReleasePool(_:) …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:26
specialized autoreleasepool(invoking:) …/serverbench/vapor/swiftbench/:0
specialized withAutoReleasePool(_:) …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:25
SelectableEventLoop.run() …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:469
specialized static MultiThreadedEventLoopGroup.runTheLoop(thread:parentGroup:canEventLoopBeShutdownIndividually:selectorFactory:initializer:_:) …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/MultiThreadedEventLoopGroup.swift:93
specialized static MultiThreadedEventLoopGroup.runTheLoop(thread:parentGroup:canEventLoopBeShutdownIndividually:selectorFactory:initializer:_:) …/serverbench/vapor/swiftbench/:0
closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:parentGroup:selectorFactory:initializer:) …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/MultiThreadedEventLoopGroup.swift:111
partial apply for closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:parentGroup:selectorFactory:initializer:) …/serverbench/vapor/swiftbench/:0
thunk for @escaping @callee_guaranteed (@guaranteed NIOThread) -> () …/serverbench/vapor/swiftbench/:0
closure #1 in static ThreadOpsPosix.run(handle:args:detachThread:) …/serverbench/vapor/swiftbench/.build/checkouts/swift-nio/Sources/NIOPosix/ThreadPosix.swift:116
_pthread_start
thread_start

It appears that the connection is being closed by a third party, which I guess means the macOS kernel here? Though I have no idea what its motivation would be.

It's not completely clear if the socket is actually closed, before either the client or the server explicitly close it anyway as part of their clean-up. Perhaps the return value of zero from read on the server side does not imply the socket is closed; perhaps NIO is being presumptuous?

The slow- & no-ramp starts are a direct consequence of this - wrk spends all its time trying to open its connections and keeps getting ECONNRESET on them (mostly on reads, sometimes on writes / connects).

I don't know why opening all the connections first, before sending anything on them, avoids this problem. For a long while I suspected a logic error in the kevent code in either wrk or the server, but if it's there I can't see it. I also cast about for race conditions or erroneous sharing of kevent queues between threads, but couldn't find anything suspect on the wrk side, and couldn't really penetrate the very complicated NIO code.

I'd also very much like to know if this same behaviour occurs on other platforms, e.g. Linux, as that might narrow it down to at least userspace vs kernelspace.


¹ In my defence, it does it a bit obtusely and the code is generally hard to follow, utterly undocumented, and missing most error checking. So I'm a little embarassed that I perhaps mistakenly or prematurely accused wrk, but not that embarassed.

3 Likes