How to use NonBlockingFileIO for repeated writes

That may be too early because previous disk writes are still in progress. You need to close only once all enqueued writes have completed. And obviously, don't forget to close too if an error comes through the ChannelPipeline or one of the disk writes fails.
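One way to express that pattern is to chain every write onto the previous one's future and only close once the final future completes. Here's a sketch (the type and method names are mine, not the repo's actual code; it assumes SwiftNIO's `NonBlockingFileIO` from `NIOPosix`):

```swift
import NIOCore
import NIOPosix

// Illustrative sketch: writes are chained so they stay ordered, and the
// file handle is closed only after the last enqueued write has completed.
final class FileWriteCoordinator {
    private var lastWrite: EventLoopFuture<Void>

    init(eventLoop: EventLoop) {
        // Start with an already-succeeded future so the first write can chain.
        self.lastWrite = eventLoop.makeSucceededFuture(())
    }

    func enqueue(_ buffer: ByteBuffer,
                 fileIO: NonBlockingFileIO,
                 handle: NIOFileHandle,
                 eventLoop: EventLoop) {
        self.lastWrite = self.lastWrite.flatMap {
            fileIO.write(fileHandle: handle, buffer: buffer, eventLoop: eventLoop)
        }
    }

    /// Call this when the body is fully received *or* when an error comes
    /// through the pipeline; it closes exactly once, after all writes.
    func finish(handle: NIOFileHandle) {
        self.lastWrite.whenComplete { _ in
            try? handle.close()
        }
    }
}
```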

It's definitely not extremely unlikely but rather likely, especially if you use that proxy on a local machine.

I made the following tiny change in my program

diff --git a/backpressure-file-io-channel/Sources/BackpressureChannelToFileIO/SaveEverythingHTTPServer.swift b/backpressure-file-io-channel/Sources/BackpressureChannelToFileIO/SaveEverythingHTTPServer.swift
index da36b2a..0a72229 100644
--- a/backpressure-file-io-channel/Sources/BackpressureChannelToFileIO/SaveEverythingHTTPServer.swift
+++ b/backpressure-file-io-channel/Sources/BackpressureChannelToFileIO/SaveEverythingHTTPServer.swift
@@ -137,9 +137,9 @@ extension SaveEverythingHTTPServer: ChannelDuplexHandler {
     }
 
     func read(context: ChannelHandlerContext) {
-        if self.state.shouldWeReadMoreDataFromNetwork() {
+        //if self.state.shouldWeReadMoreDataFromNetwork() {
             context.read()
-        }
+        //}
     }

which switches off the backpressure, and then I ran a quick

curl --data-binary @/tmp/4gb_long_file http://localhost:8080/foo

These are the results:

With backpressure:

And without (after that patch above):

eventually (but very slowly because we use so much memory) going to:

So peak memory usage is 2.6 MB vs. 3 GB (note mega vs. giga). The non-backpressure version is also much slower because everything takes longer to process due to all the memory we have to acquire/release/cache-miss/possibly swap, etc.

No, SwiftNIO is a non-blocking system. If you literally block the event loop, nothing will happen anymore, i.e. other requests and any I/O (even your own writes) will stall completely, even those from unrelated requests.

In a fully synchronous system, you can indeed use blocking as a means of back-pressure. However, you will find that in a proxy you would then actually need at least two threads per request because you don't know whether you should next read from the request body from the client or the response body from the server. To make sure you don't suffer complete stalls, you'd need to do both (blocking two threads per request).

If that still sounds appealing to you, then you can actually build a fully blocking system that is basically a blocking adapter for SwiftNIO. For each Channel you'd spawn a 'buddy thread' which controls what's going on by blocking; it could make NIO mirror that by sending/suppressing the relevant events.
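The core of such a buddy thread could look something like this sketch (purely illustrative, these names are mine and not a NIO API): the event loop pushes data in and signals; the buddy thread blocks until something is available, which gives it a synchronous `read()`-style call to program against.

```swift
import Dispatch
import Foundation

// Illustrative 'buddy thread' adapter: the event loop delivers messages,
// the buddy thread blocks waiting for them. The event loop itself never
// blocks -- only the dedicated per-channel thread does.
final class BlockingAdapter<Message> {
    private let semaphore = DispatchSemaphore(value: 0)
    private let lock = NSLock()
    private var pending: [Message] = []

    /// Called from the event loop, e.g. from channelRead(context:data:).
    func deliver(_ message: Message) {
        lock.lock()
        pending.append(message)
        lock.unlock()
        semaphore.signal()
    }

    /// Called from the buddy thread: blocks until the event loop delivers.
    func blockingRead() -> Message {
        semaphore.wait()  // suspends this thread only, never the event loop
        lock.lock()
        defer { lock.unlock() }
        return pending.removeFirst()
    }
}
```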

Maybe the following diagram gives you an idea:

This will work totally fine for very simple systems that basically just do

while true {
   let thing = read()
   write(thing)
}

The key property is that you always know exactly what you need to do next: either you want to read, or you want to write what you've read before.

In the real world however, these systems are pretty rare, and a proxy is not one of them. You kind of need two of the loops above, one for the request and one for the response (they can both go on indefinitely at the same time, say if you stream both ways). That can still be solved by just burning two threads and adding a bunch of locks. But to be honest, you'll need a state machine again to manage exactly what those threads are doing. For example, if one of them hits an error, the other one now needs to be told to stop. That's actually also pretty hard in blocking systems because the other thread may be blocked (in read or write) for an indefinite amount of time. That means you can't close the file descriptor yet, so you'll need a state machine that tells you once both threads have finished. And if you want to wait for both of those threads to finish, you may actually need a third thread... I guess you get the idea :slight_smile:.

That, by the way, is the reason that UNIX's first eventing API, select, came at the same time as BSD Sockets. Back then, everything was blocking because there just weren't any high-scale network services yet. And still, they introduced an inversion-of-control, asynchronous API right then.
Yes, you can in theory keep all the synchronous APIs and use cross-thread/process UNIX signals to interrupt the other threads/processes you want to wake up, but that'll get you into an even deeper mess :smiley:. This article, which deals with pretty much this topic (it's about multiplexing terminal I/O and network I/O rather than two network channels), is actually pretty good.
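The inversion of control that select/poll buys you can be shown with a tiny standalone sketch (my illustration, plain POSIX `poll(2)`, no NIO involved):

```swift
#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

// Two pipes stand in for the request and response streams of a proxy.
var req: [Int32] = [0, 0]
var res: [Int32] = [0, 0]
pipe(&req)
pipe(&res)

// Pretend bytes arrive on the request stream at some unpredictable time.
_ = Array("hello".utf8).withUnsafeBytes { write(req[1], $0.baseAddress, $0.count) }

var fds = [pollfd(fd: req[0], events: Int16(POLLIN), revents: 0),
           pollfd(fd: res[0], events: Int16(POLLIN), revents: 0)]

// One thread, one blocking call -- but the kernel tells us *which* stream
// is ready, instead of us committing a blocked thread to each read().
let ready = poll(&fds, nfds_t(fds.count), 1000)
print("streams ready:", ready)
```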

Yeah, one thread will at least appear to work correctly. However, you will only be able to do one write (to disk) call at a time. That means you're more likely to start buffering more received body bits in memory, which means you're more likely to use crazy amounts of memory and then probably crash at some point.
