I have a long-running server that combines a WebSocket client with BigQuery requests. It sometimes gets into a bad state, and I wonder whether my current hypothesis is plausible:
The setup is Ubuntu 18.04 with 16 cores. The app has a single MultiThreadedEventLoopGroup(numberOfThreads: System.coreCount * 2).
At start, it connects to 400 different WebSocket endpoints on the same server. As it receives WebSocket messages, it processes the data and then inserts it into BigQuery. There are times when the WebSocket server restarts or goes down for maintenance, causing all 400 of my connections to die. Furthermore, during this maintenance time, my app's attempts to reconnect will all fail with
WebSocket connection error connectTimeout(NIO.TimeAmount(nanoseconds: 10000000000)). There is a 1-second wait between reconnect attempts. If I wait long enough, my app successfully reconnects to all 400 endpoints once the server is back up. What surprises me is that, during this time, most or all of my BigQuery requests start failing too:
RESTRequest.swift:handleHTTPRequest(request:requestAttribution:httpClient:logger:logSuccess:retryCount:):79 : connectTimeout(NIO.TimeAmount(nanoseconds: 10000000000)) with request: Request(method: NIOHTTP1.HTTPMethod.POST, url: https://www.googleapis.com/bigquery/v2/projects/myproject/datasets/myDataset/tables/myTable/insertAll, scheme: "https", host: "www.googleapis.com", headers: [("Authorization", "Bearer <token>), ("Content-Encoding", "gzip")], body: Optional(AsyncHTTPClient.HTTPClient.Body(length: Optional(28197), stream: (Function))), redirectState: nil, kind: AsyncHTTPClient.HTTPClient.Request.Kind.host)"
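For a sense of scale, here is a rough back-of-the-envelope estimate of the reconnect load during an outage. It is only a sketch under assumptions that are not stated above: each endpoint retries independently, every attempt runs the full 10-second timeout before failing, and connections are spread evenly across the 32 event loops (16 cores * 2). The variable names are illustrative only.

```swift
// Numbers taken from the description above.
let endpoints = 400                 // WebSocket endpoints being reconnected
let connectTimeoutSeconds = 10.0    // from connectTimeout(nanoseconds: 10000000000)
let retryDelaySeconds = 1.0         // wait between reconnect attempts
let eventLoops = 16 * 2             // System.coreCount * 2 on a 16-core machine

// Assumption: one retry cycle per endpoint is a full timeout plus the wait.
let cycleSeconds = connectTimeoutSeconds + retryDelaySeconds

// New connection attempts started per second across all endpoints.
let attemptsPerSecond = Double(endpoints) / cycleSeconds

// Average number of connection attempts in flight at any moment.
let pendingConnects = Double(endpoints) * connectTimeoutSeconds / cycleSeconds

// Average in-flight attempts per event loop, assuming an even spread.
let pendingPerLoop = pendingConnects / Double(eventLoops)

print("attempts started per second:", attemptsPerSecond)
print("connects pending on average:", pendingConnects)
print("pending connects per event loop:", pendingPerLoop)
```

Under those assumptions, that works out to roughly 364 connection attempts pending at any moment, or about 11 per event loop.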
Is it possible that this is because all of the EventLoops are blocking on the WebSocket connection timeouts? Perhaps I should split the BigQuery requests and the WebSocket connections onto two separate