I want to fork this conversation off slightly because it's important: backpressure is a vital component in defending against DoS. To get a feel for how, consider backpressure not as limiting the number of bytes in flight, but instead as limiting the rate at which you accept new work.
Both a network and a NIO application can be thought of as a pipeline of logical units joined by communication channels. At a very high level, each network appliance is a logical unit, and the network connections between them are the links.
At a lower level, each of these appliances is itself a series of logical units joined by links. Your webapp is exactly this: a series of logical types and tasks joined by async sequences, or channel handlers joined by a pipeline.
In this model, we don't want to think of what is sent along the links as bytes, but instead think of it as work. Each packet is a unit of work that must be processed, each HTTP message a unit of work, each item enqueued into an async stream a unit of work. These units of work are not all equal: some cost more RAM, others more CPU, others more network bandwidth. But they are all work.
The goal of backpressure propagation is to ensure that this entire distributed system moves at the speed of the slowest participant. That is, whichever unit is processing this work most slowly forms the bottleneck, and backpressure propagation should slow all work producers down until they only produce work at a rate that the slowest worker can handle.
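A minimal sketch of what "slowing the producers down" means in code. This is illustrative, with invented names (`BoundedBuffer`, `send`, `receive`), not how NIO implements it: the key idea is that a producer suspends whenever the buffer ahead of the slow worker is full, so it can never run faster than the consumer by more than the buffer capacity.

```swift
// Backpressure via suspension: `send` parks the producer while the buffer
// is at capacity, and `receive` wakes one parked producer per item removed.
actor BoundedBuffer<Work> {
    private var items: [Work] = []
    private var waiters: [CheckedContinuation<Void, Never>] = []
    private let capacity: Int

    init(capacity: Int) { self.capacity = capacity }

    // Producer side: suspends (does not drop) when the buffer is full.
    func send(_ work: Work) async {
        while items.count >= capacity {
            await withCheckedContinuation { waiters.append($0) }
        }
        items.append(work)
    }

    // Consumer side: removing an item frees a slot, so resume one producer.
    func receive() -> Work? {
        guard !items.isEmpty else { return nil }
        let work = items.removeFirst()
        if !waiters.isEmpty { waiters.removeFirst().resume() }
        return work
    }
}
```

Because the producer suspends rather than queueing unboundedly, the slowdown naturally propagates upstream: whatever was feeding that producer stalls in turn.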
If backpressure propagation fails, then the producers do not slow down and work "piles up" in front of the slower workers. The effect is that each work item experiences greater latency than the one before, which eventually manifests as timeouts and service unavailability (because in a distributed system an unavailable node is indistinguishable from one with arbitrarily large latency). This manifests as a complete denial of service: no user is served within reasonable timeouts.
Proper backpressure propagation limits the damage here. You approach a maximum latency, which is the product of buffer depth and the time taken to complete each work item. Beyond that point, work is dropped. This leads to partial DoS: some users cannot access the service, but others can (assuming your buffer depths are reasonable). The system is degraded, but not unavailable.
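The "bounded buffer plus drop" policy can be sketched like this (again with invented names, not a real NIO API): once the buffer is full, new work is refused up front rather than queued, which caps worst-case latency at roughly capacity × per-item service time and sheds the excess instead of letting every request time out.

```swift
// Load shedding: a fixed-capacity buffer that rejects work when full,
// bounding queueing latency at the cost of dropping some requests.
struct DroppingBuffer<Work> {
    private var items: [Work] = []
    let capacity: Int

    init(capacity: Int) { self.capacity = capacity }

    // Returns false if the item was dropped because the buffer is full.
    mutating func offer(_ work: Work) -> Bool {
        guard items.count < capacity else { return false }
        items.append(work)
        return true
    }

    mutating func poll() -> Work? {
        items.isEmpty ? nil : items.removeFirst()
    }
}
```

The users whose work is dropped see an error, but everyone whose work is accepted is served within the latency bound. That is the "degraded, but not unavailable" outcome.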
NIO contains some protections here above and beyond backpressure propagation. By design, it bounds the maximum amount of work it will do on one connection in response to I/O using `maxMessagesPerRead`: once we have issued that many reads, we stop serving that channel and let the others operate. This increases their latency, but does not starve them entirely.
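For reference, this is a channel option an application can tune when bootstrapping. A hedged sketch (the value `4` here is purely illustrative, not a recommendation):

```swift
import NIOCore
import NIOPosix

// Bound the work NIO does per connection per event-loop wakeup: after this
// many reads on one channel, the event loop moves on to the other channels.
let group = MultiThreadedEventLoopGroup(numberOfThreads: System.coreCount)
let bootstrap = ServerBootstrap(group: group)
    .childChannelOption(ChannelOptions.maxMessagesPerRead, value: 4)
    .childChannelInitializer { channel in
        // Application pipeline setup goes here.
        channel.eventLoop.makeSucceededVoidFuture()
    }
```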
The result is that when we experimented with the H2 DoS linked above we found NIO handled it fairly well. We were degraded, for sure, but still live. We shipped protections to reduce our exposure, but in general it was acceptable.
This is why I'd like to see a performance trace here, as well as the metrics @FranzBusch asked for. It's likely that somewhere there is a hole in the protection. Given the CPU and network graph you showed, it looks more likely to be a backpressure issue, because this manifests as a "bubble". But it may not be, and we can tell based on what the system is doing while it's under attack.