Blocking I/O and concurrency

I'm working on using Swift in an embedded device that talks to a peripheral over SPI, which only has a blocking API on Linux (at least, without writing a kernel driver).

SPI supports full duplex, which is accessible using ioctl(2) rather than read(2) or write(2); the latter are also supported, but only for half-duplex operation.
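For readers who have not used the Linux spidev interface, here is a minimal sketch of the full-duplex ioctl mentioned above. It is written in C because that is what the kernel header targets. The function name `spi_exchange`, the 8-bit word size and the device path in the comment are assumptions for illustration, not details from my project.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Clock `len` bytes out of `tx` while clocking `len` bytes into `rx`.
 * `fd` is an open spidev descriptor, e.g. from open("/dev/spidev0.0", O_RDWR)
 * (that path is an assumption). The call blocks until the transfer is done.
 * Returns a negative value with errno set on failure. */
int spi_exchange(int fd, const uint8_t *tx, uint8_t *rx,
                 uint32_t len, uint32_t speed_hz)
{
    assert(len > 0);

    struct spi_ioc_transfer xfer;
    memset(&xfer, 0, sizeof xfer);   /* fields we do not set must be zero */
    xfer.tx_buf = (uintptr_t)tx;     /* buffers are passed as 64-bit integers */
    xfer.rx_buf = (uintptr_t)rx;
    xfer.len = len;
    xfer.speed_hz = speed_hz;
    xfer.bits_per_word = 8;          /* assumption: 8-bit words */

    /* One transfer in one message: both directions happen simultaneously. */
    return ioctl(fd, SPI_IOC_MESSAGE(1), &xfer);
}
```

The whole exchange is a single blocking system call, which is why it needs to live on a thread that is allowed to block.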

The rest of my code is asynchronous, using async/await and dispatch queues depending on its vintage.

Ideally I'd like to poll SPI at, say, 30Hz in a thread and exchange information with something like an AsyncBufferedChannel (i.e. on each poll, I'd pop one item off the channel if one is available, without suspending, and push one for the response). I suppose I don't strictly need to poll, but I do expect to receive status information from the peripheral at that rate.
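To make the shape of that loop concrete, here is a rough C sketch of what I have in mind: a fixed-rate loop on its own thread that does a non-suspending pop, a blocking exchange, and a push of the result. The ring buffer, `poll_loop`, `fake_exchange` and the use of 0 as a "no command" value are all hypothetical stand-ins; in the real thing the exchange would be the SPI transfer and the rings would be bridged to something like an AsyncBufferedChannel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define RING_CAPACITY 16u   /* must be a power of two */

/* Single-producer/single-consumer ring: one thread pushes, another pops. */
typedef struct {
    uint32_t items[RING_CAPACITY];
    atomic_uint head;   /* next slot to pop, written only by the consumer */
    atomic_uint tail;   /* next slot to push, written only by the producer */
} ring_t;

bool ring_try_push(ring_t *r, uint32_t item)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY) return false;   /* full: never blocks */
    r->items[tail % RING_CAPACITY] = item;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

bool ring_try_pop(ring_t *r, uint32_t *item)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail) return false;                   /* empty: never blocks */
    *item = r->items[head % RING_CAPACITY];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Stand-in for the blocking SPI transfer (hypothetical). */
uint32_t fake_exchange(uint32_t command) { return command + 1; }

/* Run `ticks` polls at `hz` polls per second on the calling thread. */
void poll_loop(ring_t *commands, ring_t *responses, unsigned hz, unsigned ticks,
               uint32_t (*exchange)(uint32_t))
{
    assert(hz > 0);
    const long period_ns = 1000000000L / (long)hz;
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (unsigned i = 0; i < ticks; i++) {
        uint32_t command = 0;                 /* 0 = "no command" (assumption) */
        ring_try_pop(commands, &command);     /* take one item if available */
        uint32_t status = exchange(command);  /* the blocking call */
        ring_try_push(responses, status);     /* dropped if the reader fell behind */

        /* Sleep until an absolute deadline so the period does not drift. */
        next.tv_nsec += period_ns;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}
```

In practice `poll_loop` would be the body of a thread started with pthread_create and called with hz = 30.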

Any ideas how I might achieve this?

I'd rather avoid having to write a kernel driver or have a separate daemon that buffers and proxies over, say, a domain socket, but both are possibilities I guess. Thanks in advance for any assistance!

I was struggling with the same problem until I found the solution. I don't know the extent of the limitations you have to deal with, so your mileage may vary.

The gist of the solution I came up with is this:

  • Create a dedicated thread for IO with pthread_create.
  • Use only kevent (macOS, BSD) or epoll (Linux) to block the thread on a number of ongoing IO operations (which incidentally may also be about stuff like waiting for a file to be changed by another process, not just simple reads and writes).
  • If another IO operation has to begin, add it to the array of pending IO operations and send an interrupt signal to the thread with pthread_kill, in order to kick kevent / epoll awake and have the IO loop re-block the thread with the new IO operation in the array.
  • When kevent / epoll unblocks, send an asynchronous notification of an IO operation being completed, remove the IO operation from the array, so the IO loop can re-block the IO thread.
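The steps above could be sketched roughly as follows for Linux. This is an illustrative C sketch, not the actual implementation: the `io_loop_*` names, the choice of SIGUSR1 and the counters standing in for completion notifications are all assumptions. The wake signal is kept blocked on the IO thread and only unblocked inside epoll_pwait, so a kick that arrives while the thread is not waiting stays pending instead of being lost.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

#define WAKE_SIGNAL SIGUSR1  /* assumption: nothing else in the process uses it */

typedef struct {
    int epfd;
    pthread_t thread;
    atomic_bool stop;
    atomic_int completed;    /* number of operations that became ready */
    atomic_int last_fd;      /* descriptor of the most recent one */
} io_loop_t;

static void wake_handler(int signo) { (void)signo; }  /* only interrupts the wait */

static void *io_loop_main(void *arg)
{
    io_loop_t *loop = arg;
    sigset_t wait_mask;
    pthread_sigmask(SIG_SETMASK, NULL, &wait_mask);  /* current mask: signal blocked */
    sigdelset(&wait_mask, WAKE_SIGNAL);              /* deliverable only while waiting */

    while (!atomic_load(&loop->stop)) {
        struct epoll_event events[8];
        int n = epoll_pwait(loop->epfd, events, 8, -1, &wait_mask);
        if (n < 0) {
            if (errno == EINTR) continue;   /* kicked: re-check state, block again */
            break;
        }
        for (int i = 0; i < n; i++) {
            /* Operation finished: remove it from the set. A real implementation
             * would call the completion closure here instead of counting. */
            epoll_ctl(loop->epfd, EPOLL_CTL_DEL, events[i].data.fd, NULL);
            atomic_store(&loop->last_fd, events[i].data.fd);
            atomic_fetch_add(&loop->completed, 1);
        }
    }
    return NULL;
}

int io_loop_start(io_loop_t *loop)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(WAKE_SIGNAL, &sa, NULL) != 0) return -1;

    if ((loop->epfd = epoll_create1(0)) < 0) return -1;
    atomic_init(&loop->stop, false);
    atomic_init(&loop->completed, 0);
    atomic_init(&loop->last_fd, -1);

    /* Block the signal around pthread_create so the new thread inherits the block. */
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, WAKE_SIGNAL);
    pthread_sigmask(SIG_BLOCK, &block, &old);
    int rc = pthread_create(&loop->thread, NULL, io_loop_main, loop);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return rc == 0 ? 0 : -1;
}

/* Begin waiting for `fd` to become readable, then kick the IO thread. */
int io_loop_watch(io_loop_t *loop, int fd)
{
    assert(fd >= 0);
    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(loop->epfd, EPOLL_CTL_ADD, fd, &ev) != 0) return -1;
    return pthread_kill(loop->thread, WAKE_SIGNAL) == 0 ? 0 : -1;
}

void io_loop_stop(io_loop_t *loop)
{
    atomic_store(&loop->stop, true);
    pthread_kill(loop->thread, WAKE_SIGNAL);
    pthread_join(loop->thread, NULL);
    close(loop->epfd);
}
```

One caveat on the sketch: with epoll specifically, a descriptor added from another thread is picked up by a wait already in progress, so the kick in `io_loop_watch` mainly matters when the loop keeps its own array of pending operations that it has to re-read, or when kevent-style change lists are rebuilt per wait.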

Here are the advantages of this approach:

  • The implementation doesn't rely on asynchronous system calls (like aio) and can be retrofitted to work with any synchronous blocking call, as long as the limitations of that blocking call are acceptable (e.g. read only does one read operation at a time, unlike kevent or epoll).
  • The implementation does not cause signals to be sent to the process it runs in. The IO thread will never execute client code, so it's impossible for something to try and intercept the interrupt signal.
  • The implementation has constant time overhead per process, because it only ever spawns one extra thread, which it reuses perpetually.
  • The IO notifications triggered by the implementation upon completion of an IO operation are closure-based under the hood (not signal based like aio), meaning that they can't be tampered with by sending a signal to the process they run in.
  • The IO notifications can be wrapped in a continuation in order to provide a pristine asynchronous IO interface in the form of async functions.
  • The implementation is completely self-sufficient and has no third-party dependencies (as opposed to using something like DispatchIO).
  • It doesn't involve heap allocation (no classes, no actors), so it has low overhead both per process and per IO operation.

Here are the disadvantages of this solution:

  • It's tedious to implement.
  • In some cases it can cause uncontrollable feelings of self-satisfaction that can be annoying to others. :grin:

Is the array of IO operations some predefined fixed-size stack-allocated C array? Or is it an actual Swift Array type? The latter does allocate on heap for any non-trivial number of elements.

Where do the IOs themselves run in this model? For monitoring completions, yeah, this makes a lot of sense. Actually very reminiscent of how dispatch sources work.

You had me with uncontrollable feelings of self-satisfaction ;) This sounds like a great approach, I also like the fact it only uses a single thread. I guess this isn't dissimilar to what CFSocket does, except it also (as I understand from reading the code) requires descriptors to be non-blocking.

I'll update the thread with a link when implemented.

Thanks again!

Great question! It depends on your individual needs. If you're writing something like audio streaming software, then performance predictability is the most important thing of all (more so than peak performance). In that case you'd want a fixed-size buffer (either a static fixed-size C array, or a global fixed-capacity ManagedBuffer with a constant heap allocation per process). That would cap the number of concurrent IO operations, but your code would be written with that limitation in mind, to provide strict performance guarantees for those few operations that do run concurrently.
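As a rough illustration of the fixed-size option, here is a small C sketch of a pending-operation table with a hard cap and no heap allocation. The names and the cap of 8 are hypothetical, and the table is not thread-safe on its own: the submitting threads and the IO thread would need a lock (or a hand-off scheme) around it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8   /* hypothetical cap on concurrent IO operations */

typedef struct {
    int fd;        /* descriptor the operation is waiting on */
    bool in_use;
} pending_op_t;

/* Lives in static or stack storage; zero-initialize before use. */
typedef struct {
    pending_op_t slots[MAX_PENDING];
    size_t count;
} pending_table_t;

/* Claim a free slot for `fd`. Returns the slot index, or -1 when the cap is hit. */
int pending_add(pending_table_t *t, int fd)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].fd = fd;
            t->slots[i].in_use = true;
            t->count++;
            return (int)i;
        }
    }
    return -1;   /* the caller must handle "too many operations in flight" */
}

/* Release a slot once its operation has completed. */
void pending_remove(pending_table_t *t, int slot)
{
    assert(slot >= 0 && slot < MAX_PENDING);
    if (t->slots[slot].in_use) {
        t->slots[slot].in_use = false;
        t->count--;
    }
}
```

The cost of the predictability is visible in the -1 return: code that submits work has to be written to cope with the table being full.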

On the other hand, if you're developing a document-based app and/or prefer API convenience and don't need to or can't afford to spend time super-optimizing performance, then just use an Array and accept the amortized allocation cost.

The allocation-free part of my statement really meant that there is no unavoidable allocation requirement. No part of this is fundamentally unimplementable without allocation; you merely have the option to save on development cost by paying for it with extra allocation.

Assuming you use kevent / epoll (or at least select), the thread will be blocked on more than one IO operation at a time. The operating system will perform the IO operations concurrently somewhere at kernel level (most probably) and your blocking IO call will wake up once the operating system completes one of the concurrent IO operations.

So, in a way, none of the actual work is happening in your process; this scheme merely allows your process to keep tabs on what the OS is doing with minimal overhead and maximum responsiveness. The number one job of the thread is to absorb the blocking that is involved in the life cycle of the IO operations.

Yeah, pretty much. Except Dispatch and CoreFoundation are object-oriented under the hood and have to pay for the Objective-C runtime or custom object model respectively. And that's just not good enough.

Yeah, I’m trying to weed out the Foundation dependencies in my project. Wish I’d thought of it at the start, it has a large memory footprint even if it doesn’t pay the same dynamic dispatch cost on Linux.

It’s been a while since I used pthreads in anger, any thoughts on cancel vs kill for signaling?

This is not true on Darwin currently FWIW

Oh wait, is this about socket IO or regular file IO? I'm talking about regular files, sockets may be different.

Terminating a thread with a signal is very dangerous for any language other than C, because any compiler-generated cleanup code (like decrementing reference count or running a destructor) is going to be skipped, leading to potential invariant breakage or memory corruption.

On the other hand, pthread_cancel gives you a chance to handle the cancellation gracefully (e.g. by registering a cancellation handler in your thread), so your thread can gently retire instead of being brutally murdered.
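For reference, this is roughly what a cancellation handler looks like in plain C. The function names are hypothetical, and the handler just sets a flag where real code would close descriptors or free buffers. The sketch only covers the pthreads side; whether cancellation unwinds safely through Swift frames is a separate question it does not address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Cleanup handler: runs on the cancelled thread before it terminates. */
static void mark_cleaned_up(void *arg)
{
    atomic_store((atomic_bool *)arg, true);  /* stand-in for real cleanup */
}

/* Thread body that blocks forever; `arg` points to an atomic_bool. */
void *blocking_worker(void *arg)
{
    pthread_cleanup_push(mark_cleaned_up, arg);
    for (;;)
        pause();             /* a cancellation point: the request is acted on here */
    pthread_cleanup_pop(0);  /* never reached, but push/pop must pair up lexically */
    return NULL;
}

/* Request cancellation and wait for the thread to finish.
 * Returns true if the thread ended because it was cancelled. */
bool cancel_and_join(pthread_t thread)
{
    void *result = NULL;
    if (pthread_cancel(thread) != 0) return false;
    if (pthread_join(thread, &result) != 0) return false;
    return result == PTHREAD_CANCELED;
}
```

With the default deferred cancellation type, the request only takes effect at a cancellation point such as pause(), read() or epoll_wait(), which is what gives the handler a well-defined moment to run.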

My favorite option is to just put a condition in the IO loop (e.g. while isRunning) and terminate the thread by setting isRunning to false and kicking the blocking operation awake. That way the thread will be neither terminated nor cancelled; it will simply return of its own accord.

Actually, this works for both equally well. You can be waiting on file IO and socket IO through a single blocking call.

I didn't really dig deep on where exactly the OS performs the actual IO, I figured it's way too implementation-specific and ultimately inconsequential.

But good to know that file IO happens in userland on Darwin. :slightly_smiling_face:

Specifically, EVFILT_READ doesn't actually read anything. It tells you when the thing is readable. The read() call will do the actual reading. So no, not inconsequential…

As long as it tells you exactly how much data is available for reading and you make sure your IO is buffered, the read will end up being a simple memcpy anyway. I guess it's a single memcpy away from an option like aio that just writes the data into your buffer. For hyper-performance-critical code that may be a big enough difference, yeah. If that's the case, just use aio with the blocking version of the "notification mechanism" and you're back to not caring about the underlying IO mechanism again.

It's a memcpy IFF the UBC is populated, but you can't guarantee that's the case unless you pre-read like dispatch_io does, and then whichever thread is doing that is paying for that time. EVFILT_READ will happily return "readable" for files that read() still needs to go out to the physical disk to get the data for.

Thanks for clarifying! This is going to be super useful to keep in mind for having a complete understanding of performance implications of your IO.

Before making any performance promises, this would need to be rigorously benchmarked per platform, because it's extremely implementation-dependent and is very unlikely to be officially promised by the OS (or even documented for that matter).

This thread is a goldmine of useful information, thank you all so much.

I like your overall strategy except for this:

and send an interrupt signal to the thread with pthread_kill

I avoid signals like the plague. IMO it’s better to create a pipe and add the read end to your polling set. Folks can then write to the other end to unblock your event loop so that it ‘sees’ the new pending I/O operations.
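A minimal C sketch of the pipe approach, with hypothetical `waker_loop_*` names. It uses poll() for brevity; the same read end could be registered with epoll or kevent instead. The loop here only counts wake-ups, where a real one would re-read its list of pending operations.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

typedef struct {
    int wake_rd, wake_wr;   /* the two ends of the wake-up pipe */
    pthread_t thread;
    atomic_bool running;
    atomic_int wakeups;     /* how many times the loop noticed a kick */
} waker_loop_t;

static void *waker_loop_main(void *arg)
{
    waker_loop_t *loop = arg;
    while (atomic_load(&loop->running)) {
        /* A real loop would list its IO descriptors after the wake pipe. */
        struct pollfd fds[1] = { { .fd = loop->wake_rd, .events = POLLIN } };
        if (poll(fds, 1, -1) < 0) {
            if (errno == EINTR) continue;
            break;
        }
        if (fds[0].revents & POLLIN) {
            char buf[64];
            while (read(loop->wake_rd, buf, sizeof buf) > 0) {}  /* drain kicks */
            atomic_fetch_add(&loop->wakeups, 1);
            /* ...pick up newly pending IO operations here... */
        }
    }
    return NULL;
}

int waker_loop_start(waker_loop_t *loop)
{
    int p[2];
    if (pipe(p) != 0) return -1;
    /* Non-blocking on both ends: draining and kicking must never stall. */
    fcntl(p[0], F_SETFL, fcntl(p[0], F_GETFL) | O_NONBLOCK);
    fcntl(p[1], F_SETFL, fcntl(p[1], F_GETFL) | O_NONBLOCK);
    loop->wake_rd = p[0];
    loop->wake_wr = p[1];
    atomic_init(&loop->running, true);
    atomic_init(&loop->wakeups, 0);
    return pthread_create(&loop->thread, NULL, waker_loop_main, loop) == 0 ? 0 : -1;
}

/* Callable from any thread. A full pipe means a kick is already pending. */
bool waker_loop_kick(waker_loop_t *loop)
{
    char byte = 1;
    ssize_t n = write(loop->wake_wr, &byte, 1);
    return n == 1 || (n < 0 && errno == EAGAIN);
}

void waker_loop_stop(waker_loop_t *loop)
{
    assert(atomic_load(&loop->running));
    atomic_store(&loop->running, false);   /* the loop exits on its next pass */
    waker_loop_kick(loop);
    pthread_join(loop->thread, NULL);
    close(loop->wake_rd);
    close(loop->wake_wr);
}
```

The same kick doubles as the shutdown mechanism: set the flag, write a byte, and the thread returns on its own.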

Or use a datagram Unix domain socket pair as the queue of pending I/O operations (-:
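A sketch of the socket-pair variant in C, again with hypothetical names and a made-up request layout. Each datagram carries one whole request, so the receiving end serves as both the queue and the wake-up descriptor in the polling set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one pending IO operation. */
typedef struct {
    int32_t fd;        /* descriptor to watch */
    uint32_t events;   /* what to wait for */
} io_request_t;

typedef struct {
    int submit_fd;     /* written by any thread that starts an operation */
    int receive_fd;    /* added to the IO thread's polling set */
} request_queue_t;

int request_queue_open(request_queue_t *q)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0) return -1;
    q->submit_fd = sv[0];
    q->receive_fd = sv[1];
    return 0;
}

/* One datagram per request: message boundaries are preserved. */
bool request_queue_submit(const request_queue_t *q, const io_request_t *req)
{
    assert(req != NULL);
    return send(q->submit_fd, req, sizeof *req, 0) == (ssize_t)sizeof *req;
}

/* Non-blocking: returns false when no request is queued. */
bool request_queue_try_receive(const request_queue_t *q, io_request_t *req)
{
    return recv(q->receive_fd, req, sizeof *req, MSG_DONTWAIT)
           == (ssize_t)sizeof *req;
}

void request_queue_close(request_queue_t *q)
{
    close(q->submit_fd);
    close(q->receive_fd);
}
```

When `receive_fd` polls readable, the IO thread calls `request_queue_try_receive` until it returns false and then goes back to waiting.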

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

Right, CF uses a domain socket pair by the looks of it. Not sure it’s going to help me if I’m in the middle of a blocking I/O operation, though? I think for that either I need to asynchronously interrupt the operation, or just have a thread per socket and read/write a circular buffer (at least with SPI I can do both simultaneously).