Thanks for the writeup; there are already a lot of helpful things in it.
This is exactly what I wanted to achieve.
Yes, that's indeed the way I'm doing it right now. Supporting both "rate-modes" seems like the best way to go forward. Thanks for the explanation; I couldn't grasp the difference from looking at the code of wrk and wrk2 (I'm not very good with C, though).
This is what I did initially, but I ran into errors under certain conditions:
When running the tool on 4 threads with 10 simultaneous connections for 10 seconds, I receive the following error on macOS (M1 Max) starting around 3/4 of the runtime:
NIOConnectionError(host: "localhost", port: 8080, dnsAError: nil, dnsAAAAError: nil, connectionErrors: [NIOPosix.SingleConnectionFailure(target: [IPv6]localhost/::1:8080, error: connection reset (error set): Connection refused (errno: 61)), NIOPosix.SingleConnectionFailure(target: [IPv4]localhost/127.0.0.1:8080, error: connect(descriptor:addr:size:): Can't assign requested address (errno: 49))])
When running on a dedicated linux machine under the same conditions I receive this error:
connect(descriptor:addr:size:): Cannot assign requested address (errno: 99)
It seems like the same issue, maybe the program ran out of ephemeral ports?
While writing this post, I realised I could set so_reuseaddr on the channel, which seemed to resolve the issue at first glance. But when I ran the tool again, the error reappeared. Maybe I'm not releasing the sockets properly, or I have to set another channel option.
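For reference, a minimal sketch of how the option can be set on a SwiftNIO `ClientBootstrap`. It assumes the SwiftNIO package is available as a dependency; the thread count, host and port are simply taken from the run described above, and the rest of the tool's pipeline is left out.

```swift
import NIOCore
import NIOPosix

// Assumption: 4 event loop threads, matching the run described above.
let group = MultiThreadedEventLoopGroup(numberOfThreads: 4)

let bootstrap = ClientBootstrap(group: group)
    // Sets SO_REUSEADDR on each client socket created by this bootstrap.
    .channelOption(ChannelOptions.socketOption(.so_reuseaddr), value: 1)

do {
    // Open one connection and close it again right away.
    let channel = try bootstrap.connect(host: "localhost", port: 8080).wait()
    try channel.close().wait()
} catch {
    print("connect failed: \(error)")
}

try group.syncShutdownGracefully()
```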
(Edit: just checked with netstat, the sockets are in TIME_WAIT state)
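As a rough sanity check on the ephemeral-port theory, here is a back-of-the-envelope sketch in Swift. The port ranges and TIME_WAIT durations are assumptions based on what I believe are the OS defaults (macOS: ports 49152 to 65535, about 30 s; Linux: ports 32768 to 60999, 60 s), and `PortBudget` is a made-up helper, not part of any library. It also assumes a single destination address, so the local port is the only part of the connection tuple that can vary.

```swift
/// Hypothetical helper: estimates how fast new connections to one destination
/// can be opened before every ephemeral port is stuck in TIME_WAIT.
struct PortBudget {
    let firstPort: Int
    let lastPort: Int
    let timeWaitSeconds: Int

    /// Number of local ports available for outgoing connections.
    var portCount: Int { lastPort - firstPort + 1 }

    /// Highest rate of new connections that can be sustained indefinitely.
    var sustainableConnectionsPerSecond: Int { portCount / timeWaitSeconds }

    /// Seconds until the ports run out at the given rate,
    /// or nil if ports leave TIME_WAIT fast enough to keep up.
    func secondsUntilExhaustion(connectionsPerSecond rate: Int) -> Double? {
        guard rate > sustainableConnectionsPerSecond else { return nil }
        return Double(portCount) / Double(rate)
    }
}

// Assumed defaults, not measured on the machines from this post.
let macOS = PortBudget(firstPort: 49152, lastPort: 65535, timeWaitSeconds: 30)
let linux = PortBudget(firstPort: 32768, lastPort: 60999, timeWaitSeconds: 60)

print(macOS.sustainableConnectionsPerSecond) // 546
print(linux.sustainableConnectionsPerSecond) // 470
```

Under these assumptions, opening a new connection per request at a couple of thousand requests per second would use up the macOS range within a 10-second run, which would be consistent with the errors only showing up late in the run.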
Due to this, I changed to connection reuse, but I'd love to make this configurable in some way to get a good middle ground, as you suggested.
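One way the configurable middle ground could look, as a minimal sketch: a policy value that tells each worker when to drop its connection and open a new one. `ConnectionPolicy` and its cases are hypothetical names for illustration, not something the tool has today.

```swift
/// Hypothetical policy describing when a worker opens a fresh connection.
enum ConnectionPolicy: Equatable {
    /// Keep one connection for the whole run.
    case reuseAlways
    /// Open a new connection for every request.
    case newPerRequest
    /// Reuse a connection for a fixed number of requests, then reconnect.
    case reuseUpTo(requests: Int)

    /// Whether the worker should reconnect after `sent` requests on the current connection.
    func shouldReconnect(afterRequests sent: Int) -> Bool {
        switch self {
        case .reuseAlways:
            return false
        case .newPerRequest:
            return sent >= 1
        case .reuseUpTo(let limit):
            return sent >= max(limit, 1)
        }
    }

    /// Number of connections one worker opens to send `total` requests.
    func connectionsNeeded(forRequests total: Int) -> Int {
        guard total > 0 else { return 0 }
        switch self {
        case .reuseAlways:
            return 1
        case .newPerRequest:
            return total
        case .reuseUpTo(let limit):
            let perConnection = max(limit, 1)
            // Round up so a partially used last connection is counted.
            return (total + perConnection - 1) / perConnection
        }
    }
}

let policy = ConnectionPolicy.reuseUpTo(requests: 100)
print(policy.connectionsNeeded(forRequests: 20_000)) // 200
```

With something like this, the number of sockets ending up in TIME_WAIT scales with the reconnect interval instead of with the request count.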