Blocking I/O and concurrency

As this is Linux-only, I would also consider using io_uring so you don't have to signal the other thread at all; then you can perform the async I/O inline as needed and just reap the results (also using io_uring, of course) from the dedicated thread.

2 Likes

Ah that’s a good idea. I keep confusing the signaling with the actual I/O!

An additional wrinkle with SPI is that, at least with the peripheral I'm using, there are also two GPIO pins: one indicates whether data is available for reading, and the other whether its read buffer has room (i.e. whether it's safe for the controller to write).

Now, as far as I can tell, it's safe to use dispatch_source to monitor these (as we're just looking for edge transitions rather than reading any data). This is orthogonal to the state of the controller kernel’s SPI buffer of course.

So for reading I could:

  • start a reaper thread per SPI port that waits on a condition variable or semaphore
  • monitor GPIO DATA_AVAILABLE pin using a dispatch_source (each GPIO line presents an event FD)
  • signal the reaper thread whenever the GPIO data available pin is high
  • perform blocking read(), invoking previously registered handler with dispatch_async() to return read data (or could I even use a Task here?)
  • continue reading until GPIO data available is low
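Those steps could be sketched roughly like this. To keep the sketch self-contained a pipe stands in for the GPIO event FD, and the actual blocking SPI read() is elided; all names here are made up:

```swift
import Dispatch
import Foundation

// A pipe stands in for the GPIO DATA_AVAILABLE event FD so this sketch
// runs anywhere; on real hardware it would be the event FD exposed by
// the GPIO character device.
let gpioEdge = Pipe()
let dataAvailable = DispatchSemaphore(value: 0)

// Monitor the "GPIO" FD with a dispatch read source and signal the
// reaper thread on each edge event.
let source = DispatchSource.makeReadSource(
    fileDescriptor: gpioEdge.fileHandleForReading.fileDescriptor)
source.setEventHandler {
    _ = try? gpioEdge.fileHandleForReading.read(upToCount: 1)  // drain the edge
    dataAvailable.signal()
}
source.resume()

// Reaper thread: wait for the signal, then do the blocking SPI read()
// and hand the result back with dispatch_async (both elided here).
let done = DispatchSemaphore(value: 0)
Thread.detachNewThread {
    dataAvailable.wait()
    // read(spiFD, &buf, chunkSize) would go here, looping while the
    // DATA_AVAILABLE pin stays high.
    print("reaper woke")
    done.signal()
}

// Simulate a rising edge on DATA_AVAILABLE.
gpioEdge.fileHandleForWriting.write(Data([1]))
done.wait()
```

The dispatch source only tells the reaper *when* to read; the blocking read() itself never runs on a dispatch queue.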

Doing reads in small enough chunks to check for the GPIO pin is going to have a pretty high syscall overhead (subject to SPI clock speed).

For writing, I guess I just need a thread-safe queue. And I guess thread-safe means lock-free, because I'll want to enqueue data to be sent from async Swift code. And avoid writing when the corresponding GPIO pin is not high.
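For illustration, a minimal mutex-protected version of such a queue might look like this; it is not lock-free (a lock-free MPSC queue would be the optimisation if the lock ever shows up in profiles), and the type names are made up:

```swift
import Foundation

// Minimal thread-safe FIFO for outgoing SPI frames. A mutex-protected
// array, not lock-free; shown only to pin down the interface.
final class WriteQueue {
    private var frames: [[UInt8]] = []
    private let lock = NSLock()

    func enqueue(_ frame: [UInt8]) {
        lock.lock(); defer { lock.unlock() }
        frames.append(frame)
    }

    // Called by the writer thread; returns nil when nothing is pending.
    func dequeue() -> [UInt8]? {
        lock.lock(); defer { lock.unlock() }
        return frames.isEmpty ? nil : frames.removeFirst()
    }
}

let q = WriteQueue()
q.enqueue([0xDE, 0xAD])
q.enqueue([0xBE, 0xEF])
print(q.dequeue()!)        // [222, 173]
print(q.dequeue()!)        // [190, 239]
print(q.dequeue() as Any)  // nil
```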

Bonus marks would be to use the full duplex ioctl() to read and write simultaneously.

Intuitively this approach has a concurrency abstraction-mixing code smell. I suppose the real solution might be to write a kernel driver that presented a non-blocking, message-based interface, and use that with one of the existing Swift asynchronous I/O packages. That might be above my pay grade though. ;)

Basically io_uring provides you with such a non-blocking interface, and I think a single thread should be able to service all of the SPI ports as well as the GPIO pins, avoiding multiple threads?

You can monitor the FDs for the GPIO lines with io_uring poll requests (IORING_OP_POLL_ADD) to get notifications for those, so there's no need for any dispatch_source.

For reading, you would then, from the io_uring reaper thread, schedule a non-blocking read I/O when you get signalled that there is data available on the GPIO pin. After you've reaped the read completion event, you just register the GPIO again to get notified of further updates to that FD, and rinse and repeat; this is similar to how we reap io_uring events in SwiftNIO.

If you use registered buffers for these reads, you don't even need to read in small chunks; you will simply get a notification when the buffer has been filled by the kernel.

For writing, I would use a semaphore (signalled by the reaper thread when it gets the GPIO notification that writes are possible) and just let any other thread schedule a write (setting the semaphore back to blocked, assuming only a single write is possible at a time and that the GPIO must signal again before the next write is OK; depends on what those semantics are?).

This approach means you don't have to queue anything for writers (just pace the writes using the semaphore); a single thread is sufficient, and there's no real need to pace the write chunk size either.
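The pacing idea above can be sketched with a DispatchSemaphore; the GPIO notification and the actual SPI write() are simulated, and all names are invented:

```swift
import Dispatch
import Foundation

// Pacing writes with a semaphore: the reaper thread signals writeOK
// when the GPIO "write possible" notification arrives; any writer
// thread waits on it before issuing its write.
let writeOK = DispatchSemaphore(value: 0)  // start blocked: wait for GPIO

// Writer side (any thread): one wait() per write, assuming the
// peripheral must re-assert the GPIO after each write.
let wrote = DispatchSemaphore(value: 0)
Thread.detachNewThread {
    writeOK.wait()
    // write(spiFD, frame, frame.count) would go here.
    print("frame written")
    wrote.signal()
}

// Reaper side: on the GPIO "buffer has room" notification, allow
// exactly one write through.
writeOK.signal()
wrote.wait()
```

Because the semaphore starts at zero and each GPIO notification adds exactly one permit, writers naturally block until the peripheral says it can accept data.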

It's platform-specific, so not for everyone (and you want a reasonably fresh kernel), but it is really a great kernel interface for stuff like this.

I'm not really into the details of using ioctl practically, but related you might also be interested in:

I like this option a lot more! Thanks for the great suggestion! My only reservation was about performance overhead, but that may be negligible and completely unprovable without benchmarking. On the other hand, using something like a pipe to pass the new file descriptors will help avoid the overhead of synchronization between threads by having the IO thread maintain its own IO operation array.

I totally agree with your opinion on signals, they're a horrible mess that came from an age where people didn't know any better.

Why wouldn't it help? All you need to do is make sure you use a multi-IO blocking call like kevent / epoll or select and make sure your new IO control pipe or socket is in the array at all times. You can then kick the blocking call awake by simply writing into the pipe/socket.
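That control-pipe trick is easy to demonstrate. In this self-contained sketch a plain pipe plays the role of the IO control FD, and the "other thread" writes the wakeup byte before poll() blocks (the byte just sits in the pipe, so the call returns immediately):

```swift
#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

// Create the IO control pipe whose read end lives in every poll() set.
var fds = [Int32](repeating: 0, count: 2)
pipe(&fds)
let (controlRead, controlWrite) = (fds[0], fds[1])

// "Another thread" kicks the blocked poll by writing a single byte.
var kick: UInt8 = 1
write(controlWrite, &kick, 1)

// The IO thread blocks in poll() with the control FD always present
// (a real loop would include the device FDs alongside it).
var pollSet = [pollfd(fd: controlRead, events: Int16(POLLIN), revents: 0)]
let ready = poll(&pollSet, 1, 1000)  // 1 s timeout; returns at once here

print(ready)                                    // 1: one FD ready
print(pollSet[0].revents & Int16(POLLIN) != 0)  // true: it was the kick
```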

I didn't know such a thing existed! Thanks for sharing! By the looks of it, this is Linux-only, which is very sad, because I really need reliable async IO on macOS.

Why wouldn't it help? All you need to do is make sure you use a multi-IO blocking call like kevent / epoll or select and make sure your new IO control pipe or socket is in the array at all times. You can then kick the blocking call awake by simply writing into the pipe/socket.

Ah, I think where I was getting confused was in thinking I would do a read() on a large buffer and interrupt it asynchronously when the GPIO pin signalled no more data was available for reading. But if I'm only reading, say, 4 bytes at a time, then I'll be returning to epoll() or select() frequently enough to check for IO control FD events.

Yes. I have a dream that Darwin will try to port over io_uring as-close-as-possible; it would be amazing to have it there too. It happened with DTrace (and very nearly with ZFS...) once upon a time, so hope springs eternal.

1 Like

I didn't know such a thing existed! Thanks for sharing! By the looks of it, this is Linux-only, which is very sad, because I really need reliable async IO on macOS.

I can deal with Linux-only (although it's nice to be able to build on macOS even if I can't test everything). I need to understand more about io_uring: is it the case that I can use it even if the underlying device does not support non-blocking I/O? I couldn't find anything online about anyone trying to use it with SPI (on Linux, at least).

That's the downside of the pipe/socket approach: it's less messy and less fragile than an interrupt signal, but it incurs more overhead and makes single-IO options impossible (which may or may not be a dealbreaker).

I think the easiest thing would be just to write a very minimalistic program and try it on your device to know for sure; it should be pretty quick. The nice thing about Swift's great C integration is that you can stay in Swift land too...

I think the easiest thing would be just to write a very minimalistic program and try it on your device to know for sure; it should be pretty quick. The nice thing about Swift's great C integration is that you can stay in Swift land too...

Yes, I'm actually using SwiftIO for its GPIO and SPI abstractions, but I've built a Linux backend for it (still very much a WIP), and am in turn building an async API on top of SwiftIO. It's a sandwich of sorts.

Given how Swift has set a new standard of performance, reliability, and ergonomics, a fully async and fully reliable file/network IO interface is sorely missing in Swift.

Given how Swift has set a new standard of performance, reliability, and ergonomics, a fully async and fully reliable file/network IO interface is sorely missing in Swift.

I'm using FlyingSocks for all the connection-oriented socket I/O in my embedded application. It's pretty neat (I'm not using the HTTP part, just the async socket I/O).

Still using CFSocket for UDP, but that's something I'll weed out eventually... (or at least replace with libdispatch)

Finally, I'm using dispatch_source and dispatch_io for non-network access (GPIO, UART, SPI). It's pretty easy to bridge to async/await. Except of course it's not going to work with SPI, for the reasons discussed in this thread.

As this is Linux-only, I would also consider using io_uring so you don't have to signal the other thread at all; then you can perform the async I/O inline as needed and just reap the results (also using io_uring, of course) from the dedicated thread.

OK, now I've done some reading 🙂

This link helped me understand the difference between completion-based and readiness-based I/O models.

If I understand it right, with a completion-based model it doesn't matter whether the underlying file object is blocking, because you're not waiting for the I/O to complete in user space. Indeed, the io_uring kernel source doesn't appear to require non-blocking I/O.

Even better, you can associate user data at submission time and retrieve it with io_uring_cqe_get_data(), which would be an elegant way to invoke a completion handler block; that can then be bridged to async/await with withCheckedContinuation.
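The bridging step can be sketched like this. Everything io_uring is elided: submitRead is a made-up stand-in for "stash the handler as SQE user data and call it when the matching CQE is reaped", faked here with an async dispatch so the sketch runs anywhere:

```swift
import Dispatch
import Foundation

// Hypothetical completion-handler read; a real io_uring backend would
// box `completion` into the SQE user data and invoke it from the
// reaper thread via io_uring_cqe_get_data().
func submitRead(completion: @escaping ([UInt8]) -> Void) {
    DispatchQueue.global().async {
        completion([0x42])  // pretend the kernel filled our buffer
    }
}

// The async/await face of the same operation.
func asyncRead() async -> [UInt8] {
    await withCheckedContinuation { continuation in
        submitRead { bytes in
            continuation.resume(returning: bytes)
        }
    }
}

// Drive it from synchronous top-level code for the sake of the sketch.
let sema = DispatchSemaphore(value: 0)
Task {
    let bytes = await asyncRead()
    print(bytes)  // [66]
    sema.signal()
}
sema.wait()
```

The continuation must be resumed exactly once, which maps neatly onto io_uring's one-CQE-per-SQE model.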

Thanks for all the help folks!

PS. Which kind of got me thinking, maybe the right solution is to add io_uring support to DispatchIO?

1 Like

That probably wouldn't be bad at all (although I am not familiar with DispatchIO); it is the approach we are (slowly but consistently) taking for SwiftNIO. In general, if a package provides an IO abstraction that could be done with io_uring on Linux, it can often be a good idea to support it, for scalability, performance, and ease of use (YMMV).

1 Like

I think the first stop will be to build a simple Swift abstraction for io_uring (well, I imagine it might be more straightforward to write it in C, to be honest). Then I will replace libdispatch with it in my SPI code and go from there.

You could just write directly to the C API and not build much of an abstraction at the lower layer; if you want to see some Swift code using the APIs, you can find some here:

as well as in this PR:

Good luck and let us know how it goes!

1 Like

Extremely preliminary start at IORingSwift. Next step will be a simple TCP and UDP echo server, then I can try my original use case (SPI).

3 Likes