At the start of my long-running server on Linux I create these threading mechanisms:
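```swift
import Dispatch
import NIO  // MultiThreadedEventLoopGroup, NIOThreadPool, System.coreCount

self.eventLoopGroup = MultiThreadedEventLoopGroup(numberOfThreads: System.coreCount / 4)
self.blockingThreadPool = NIOThreadPool(numberOfThreads: System.coreCount / 2)
// One serial queue per slot; each connection is later assigned one of these.
for i in 0..<(System.coreCount / 2) {
    let queue = DispatchQueue(label: "serial-queue-\(i)")
    self.serialQueues.append(queue)
}
```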
`System.coreCount` happens to be 16, so that's 4 `EventLoopGroup` threads, 8 `NIOThreadPool` threads, and 8 serial `DispatchQueue`s.
The `EventLoopGroup` connects and manages ~500 WebSocket connections. Blocking tasks that can be performed concurrently, such as JSON encoding and storing in a database, are sent to the `NIOThreadPool`. Each of the ~500 connections is randomly assigned one of the 8 serial `DispatchQueue`s, and blocking tasks that must be performed serially, such as JSON decoding and checksum validation, are performed on that connection's assigned queue: `self.serialDispatchQueue.async {}`. This is because the checksum validation of data received at timestep t+1 relies on the data received at timestep t, so they must be done in the order received.
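A minimal sketch of the per-connection serial-queue pattern described above, using only Dispatch so it runs without NIO. The names `ConnectionState` and `chainedChecksum` are hypothetical, and the FNV-1a-style fold is a stand-in for the real CryptoSwift checksum; the point is only that each step depends on the previous state, so one connection's frames must be processed in arrival order. A serial queue runs its `async` blocks one at a time in submission order, so connections that share a queue still see their own frames in order, even though frames from different connections interleave.

```swift
import Dispatch

// Hypothetical stand-in for the real checksum: each step folds the new payload
// into the previous state, so step t+1 cannot be computed before step t.
func chainedChecksum(previous: UInt64, payload: [UInt8]) -> UInt64 {
    var hash = previous
    for byte in payload {
        hash = (hash ^ UInt64(byte)) &* 0x100000001b3  // FNV-1a style mixing
    }
    return hash
}

// Hypothetical per-connection state. All mutation happens on `queue`,
// which is one of the shared serial queues; that queue is the synchronization.
final class ConnectionState: @unchecked Sendable {
    let queue: DispatchQueue
    private(set) var checksum: UInt64 = 0xcbf29ce484222325  // FNV offset basis
    private(set) var processedCount = 0

    init(queue: DispatchQueue) { self.queue = queue }

    // Called for each frame in arrival order. Because `queue` is serial and
    // async submissions run FIFO, frames are validated in the order received.
    func handle(frame: [UInt8], group: DispatchGroup) {
        group.enter()
        queue.async {
            self.checksum = chainedChecksum(previous: self.checksum, payload: frame)
            self.processedCount += 1
            group.leave()
        }
    }
}

let serialQueues = (0..<8).map { DispatchQueue(label: "serial-queue-\($0)") }

// Random assignment, as in the post: many connections share each queue.
let connections = (0..<500).map { _ in
    ConnectionState(queue: serialQueues.randomElement()!)
}

// Submit 10 frames per connection from a single thread, in timestep order.
let group = DispatchGroup()
for t in 0..<10 {
    for connection in connections {
        connection.handle(frame: [UInt8(t)], group: group)
    }
}
group.wait()
```

The `DispatchGroup` exists only so the sketch can wait for all work to finish before inspecting the results; in the server, the event loop would simply keep reading frames.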
Here is a `perf` flame graph captured over 8 hours of the above server running, built with `swift build -c release -Xswiftc -cross-module-optimization`:
In the bottom left-hand corner you can see the 4 `EventLoopGroup` threads: `NIO-ELT-0-#<0, 1, 2, 3>`. On the bottom right are the 8 `NIOThreadPool` threads: `TP-#<0, 1, 2, 3, 4, 5, 6, 7>`. A couple of questions arise:
- The 8 `NIOThreadPool` threads are all nearly identical, while the 4 `EventLoopGroup` threads are very uneven. You can see `NIO-ELT-0-#1` is barely visible, as it was in only 2% of samples, while `NIO-ELT-0-#2` was in 43% of samples. #0 is in 22% of samples and #3 is in 15%. Why might this be? #0, #2, and #3 all appear to be calling the same functions, while #1 doesn't appear to contain any of the calls to my code that are in the other three threads. Is it perhaps expected that one of the threads is reserved for certain activity?
- Everything above `_dispatch_call_block_and_release` is what's occurring via the `serialDispatchQueue.async {}` calls. You can see the `PureSwiftJSON` decoding and `CryptoSwift` checksum calculations occurring here. These calls are initiated by the `EventLoopGroup`, but I believe it's safe to say the work is not actually consuming time on the `EventLoopGroup`'s threads? It would be a serious issue if it were. The work on the `DispatchQueue`s is confusing because it doesn't show up under specific thread names like the `EventLoopGroup` and `NIOThreadPool` tasks do.
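Regarding the second question, here is a minimal Dispatch-only sketch (no NIO) of what `async` does to the calling thread: it enqueues the closure and returns, and the closure later runs on a libdispatch worker thread. The main thread stands in for an event-loop thread, and `simulatedDecodeAndValidate` and `Observation` are hypothetical placeholders for the JSON decoding and checksum work. Under that assumption, the expensive part is not charged to the thread that called `async`, which is consistent with `perf` showing it beneath `_dispatch_call_block_and_release` on unnamed worker threads.

```swift
import Dispatch
import Foundation

// Hypothetical stand-in for JSON decoding + checksum validation.
func simulatedDecodeAndValidate() -> Int {
    var total = 0
    for i in 0..<1_000_000 { total &+= i }
    return total
}

// Hypothetical box for results; written only inside the async block and
// read only after the semaphore is signalled.
final class Observation: @unchecked Sendable {
    var ranOnMainThread = true
    var result = 0
}

let serialQueue = DispatchQueue(label: "serial-queue-0")
let finished = DispatchSemaphore(value: 0)
let observation = Observation()

// The main thread plays the role of an event-loop thread: it only enqueues.
// async returns immediately without running the closure on this thread.
serialQueue.async {
    // Executed by a libdispatch worker thread, not by the caller.
    observation.ranOnMainThread = Thread.isMainThread
    observation.result = simulatedDecodeAndValidate()
    finished.signal()
}

// A real event loop would keep servicing other sockets; this sketch just
// waits so the observation can be inspected.
finished.wait()
print("closure ran on the calling thread: \(observation.ranOnMainThread)")
```

This reasoning applies to `async` only. `DispatchQueue.sync` may, as a documented performance optimization, run the block on the calling thread, so a `sync` call issued from an event loop would put that work on the event-loop thread itself.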
The instructions for creating this graph on Linux came from here.