Blocking I/O and concurrency

As this is Linux-only, I would also consider using io_uring so you don't have to signal the other thread at all; then you can perform the async I/O inline as needed and just reap the results (also using io_uring, of course) from the dedicated thread.

2 Likes

Ah that’s a good idea. I keep confusing the signaling with the actual I/O!

An additional wrinkle with SPI is that, at least with the peripheral I'm using, there are also two GPIO pins: one indicates whether data is available for reading, and the other whether its read buffer has room (i.e. whether it's safe for the controller to write).

Now, as far as I can tell, it's safe to use dispatch_source to monitor these (as we're just looking for edge transitions rather than reading any data). This is orthogonal to the state of the controller kernel’s SPI buffer of course.

So for reading I could:

  • start a reaper thread per SPI port that waits on a condition variable or semaphore
  • monitor GPIO DATA_AVAILABLE pin using a dispatch_source (each GPIO line presents an event FD)
  • signal the reaper thread whenever the GPIO data available pin is high
  • perform blocking read(), invoking previously registered handler with dispatch_async() to return read data (or could I even use a Task here?)
  • continue reading until GPIO data available is low
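Those steps could be sketched roughly like this. To keep the sketch self-contained a pipe stands in for the GPIO event FD, and the actual blocking SPI read() is elided; all names here are made up:

```swift
import Dispatch
import Foundation

// A pipe stands in for the GPIO DATA_AVAILABLE event FD so this sketch
// runs anywhere; on real hardware it would be the event FD exposed by
// the GPIO character device.
let gpioEdge = Pipe()
let dataAvailable = DispatchSemaphore(value: 0)

// Monitor the "GPIO" FD with a dispatch read source and signal the
// reaper thread on each edge event.
let source = DispatchSource.makeReadSource(
    fileDescriptor: gpioEdge.fileHandleForReading.fileDescriptor)
source.setEventHandler {
    _ = try? gpioEdge.fileHandleForReading.read(upToCount: 1)  // drain the edge
    dataAvailable.signal()
}
source.resume()

// Reaper thread: wait for the signal, then do the blocking SPI read()
// and hand the result back with dispatch_async (both elided here).
let done = DispatchSemaphore(value: 0)
Thread.detachNewThread {
    dataAvailable.wait()
    // read(spiFD, &buf, chunkSize) would go here, looping while the
    // DATA_AVAILABLE pin stays high.
    print("reaper woke")
    done.signal()
}

// Simulate a rising edge on DATA_AVAILABLE.
gpioEdge.fileHandleForWriting.write(Data([1]))
done.wait()
```

The dispatch source only tells the reaper *when* to read; the blocking read() itself never runs on a dispatch queue.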

Doing reads in small enough chunks to check for the GPIO pin is going to have a pretty high syscall overhead (subject to SPI clock speed).

For writing, I guess I just need a thread-safe queue. And I guess thread-safe means lock-free, because I'll want to enqueue data to be sent from async Swift code. And avoid writing when the corresponding GPIO pin is not high.
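For illustration, a minimal mutex-protected version of such a queue might look like this; it is not lock-free (a lock-free MPSC queue would be the optimisation if the lock ever shows up in profiles), and the type names are made up:

```swift
import Foundation

// Minimal thread-safe FIFO for outgoing SPI frames. A mutex-protected
// array, not lock-free; shown only to pin down the interface.
final class WriteQueue {
    private var frames: [[UInt8]] = []
    private let lock = NSLock()

    func enqueue(_ frame: [UInt8]) {
        lock.lock(); defer { lock.unlock() }
        frames.append(frame)
    }

    // Called by the writer thread; returns nil when nothing is pending.
    func dequeue() -> [UInt8]? {
        lock.lock(); defer { lock.unlock() }
        return frames.isEmpty ? nil : frames.removeFirst()
    }
}

let q = WriteQueue()
q.enqueue([0xDE, 0xAD])
q.enqueue([0xBE, 0xEF])
print(q.dequeue()!)        // [222, 173]
print(q.dequeue()!)        // [190, 239]
print(q.dequeue() as Any)  // nil
```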

Bonus marks would be to use the full duplex ioctl() to read and write simultaneously.

Intuitively this approach has a concurrency abstraction-mixing code smell. I suppose the real solution might be to write a kernel driver that presented a non-blocking, message-based interface, and use that with one of the existing Swift asynchronous I/O packages. That might be above my pay grade though. ;)

Basically io_uring provides you with such a non-blocking interface, and I think a single thread should be able to service all of the SPI ports as well as the GPIO pins, avoiding multiple threads?

You can monitor the FDs for the GPIO lines with io_uring poll requests (IORING_OP_POLL_ADD) to get notifications for those, so there's no need for any dispatch_source.

For reading, you would then, from the io_uring reaper thread, schedule a non-blocking read I/O when you get signalled that there is data available on the GPIO pin. After you've reaped the read completion event, you just register the GPIO again to get notified of further updates to that FD, and rinse and repeat; this is similar to how we reap io_uring events in SwiftNIO.

If you use registered buffers for these reads, you don't even need to read in small chunks; you will simply get a notification when the buffer has been filled by the kernel.

For writing, I would use a semaphore (signalled by the reaper thread when it gets the GPIO notification that writes are possible) and just let any other thread schedule a write (setting the semaphore back to blocked, assuming only a single write is possible at a time and that the GPIO must signal again before the next write is OK; depends on what those semantics are?).

This approach means you don't have to queue anything for writers (just pace the writes using the semaphore); a single thread is sufficient, and there's no real need to pace the write chunk size either.
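The pacing idea above can be sketched with a DispatchSemaphore; the GPIO notification and the actual SPI write() are simulated, and all names are invented:

```swift
import Dispatch
import Foundation

// Pacing writes with a semaphore: the reaper thread signals writeOK
// when the GPIO "write possible" notification arrives; any writer
// thread waits on it before issuing its write.
let writeOK = DispatchSemaphore(value: 0)  // start blocked: wait for GPIO

// Writer side (any thread): one wait() per write, assuming the
// peripheral must re-assert the GPIO after each write.
let wrote = DispatchSemaphore(value: 0)
Thread.detachNewThread {
    writeOK.wait()
    // write(spiFD, frame, frame.count) would go here.
    print("frame written")
    wrote.signal()
}

// Reaper side: on the GPIO "buffer has room" notification, allow
// exactly one write through.
writeOK.signal()
wrote.wait()
```

Because the semaphore starts at zero and each GPIO notification adds exactly one permit, writers naturally block until the peripheral says it can accept data.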

It's platform-specific, so not for everyone (and you want a reasonably fresh kernel), but it is really a great kernel interface for stuff like this.

I'm not really into the details of using ioctl practically, but related you might also be interested in:

I like this option a lot more! Thanks for the great suggestion! My only reservation was about performance overhead, but that may be negligible and completely unprovable without benchmarking. On the other hand, using something like a pipe to pass the new file descriptors will help avoid the overhead of synchronization between threads by having the IO thread maintain its own IO operation array.

I totally agree with your opinion on signals, they're a horrible mess that came from an age where people didn't know any better.

Why wouldn't it help? All you need to do is make sure you use a multi-IO blocking call like kevent / epoll or select and make sure your new IO control pipe or socket is in the array at all times. You can then kick the blocking call awake by simply writing into the pipe/socket.
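That control-pipe trick is easy to demonstrate. In this self-contained sketch a plain pipe plays the role of the IO control FD, and the "other thread" writes the wakeup byte before poll() blocks (the byte just sits in the pipe, so the call returns immediately):

```swift
#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

// Create the IO control pipe whose read end lives in every poll() set.
var fds = [Int32](repeating: 0, count: 2)
pipe(&fds)
let (controlRead, controlWrite) = (fds[0], fds[1])

// "Another thread" kicks the blocked poll by writing a single byte.
var kick: UInt8 = 1
write(controlWrite, &kick, 1)

// The IO thread blocks in poll() with the control FD always present
// (a real loop would include the device FDs alongside it).
var pollSet = [pollfd(fd: controlRead, events: Int16(POLLIN), revents: 0)]
let ready = poll(&pollSet, 1, 1000)  // 1 s timeout; returns at once here

print(ready)                                    // 1: one FD ready
print(pollSet[0].revents & Int16(POLLIN) != 0)  // true: it was the kick
```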

I didn't know such a thing existed! Thanks for sharing! By the looks of it, this is Linux-only, which is very sad, because I really need reliable async IO on macOS.

Why wouldn't it help? All you need to do is make sure you use a multi-IO blocking call like kevent / epoll or select and make sure your new IO control pipe or socket is in the array at all times. You can then kick the blocking call awake by simply writing into the pipe/socket.

Ah, I think where I was getting confused was in thinking I would do a read() on a large buffer and interrupt it asynchronously when the GPIO pin signalled no more data was available for reading. But if I'm only reading, say, 4 bytes at a time, then I'll be returning to epoll() or select() frequently enough to check for IO control FD events.

Yes. I have a dream that Darwin will try to port over io_uring as-close-as-possible; it would be amazing to have it there too. It happened with DTrace (and very nearly with ZFS...) once upon a time, so hope springs eternal.

1 Like

I didn't know such a thing existed! Thanks for sharing! By the looks of it, this is Linux-only, which is very sad, because I really need reliable async IO on macOS.

I can deal with Linux-only (although it's nice to be able to build on macOS even if I can't test everything). I need to understand more about io_uring: is it the case that I can use it even if the underlying device does not support non-blocking I/O? I couldn't find anything online about anyone trying to use it with SPI (on Linux, at least).

That's the downside of the pipe/socket approach: it's less messy and less fragile than an interrupt signal, but it incurs more overhead and makes single-IO options impossible (which may or may not be a dealbreaker).

I think the easiest thing would be just to write a very minimalistic program and try it on your device to know for sure; it should be pretty quick. The nice thing about Swift's great C integration is that you can stay in Swift land too...

I think the easiest thing would be just to write a very minimalistic program and try it on your device to know for sure; it should be pretty quick. The nice thing about Swift's great C integration is that you can stay in Swift land too...

Yes, I'm actually using SwiftIO for its GPIO and SPI abstractions, but I've built a Linux backend for it (still very much a WIP), and am in turn building an async API on top of SwiftIO. It's a sandwich of sorts.

Given how Swift has set a new standard of performance, reliability, and ergonomics, a fully async and fully reliable file/network IO interface is sorely missing in Swift.

Given how Swift has set a new standard of performance, reliability, and ergonomics, a fully async and fully reliable file/network IO interface is sorely missing in Swift.

I'm using FlyingSocks for all the connection-oriented socket I/O in my embedded application. It's pretty neat (I'm not using the HTTP part, just the async socket I/O).

Still using CFSocket for UDP, but that's something I'll weed out eventually... (or at least replace with libdispatch)

Finally, I'm using dispatch_source and dispatch_io for non-network access (GPIO, UART, SPI). It's pretty easy to bridge to async/await. Except of course it's not going to work with SPI, for the reasons discussed in this thread.

As this is Linux-only, I would also consider using io_uring so you don't have to signal the other thread at all; then you can perform the async I/O inline as needed and just reap the results (also using io_uring, of course) from the dedicated thread.

OK, now I've done some reading 🙂

This link helped me understand the difference between completion-based and readiness-based I/O models.

If I understand it right, with a completion-based model it doesn't matter whether the underlying file object is blocking, because you're not waiting for the I/O to complete in user space. Indeed, the io_uring kernel source doesn't appear to require non-blocking I/O.

Even better, you can associate user data at submission time and retrieve it with io_uring_cqe_get_data(), which would be an elegant way to invoke a completion handler block; that can then be bridged to async/await with withCheckedContinuation.
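The bridging step can be sketched like this. Everything io_uring is elided: submitRead is a made-up stand-in for "stash the handler as SQE user data and call it when the matching CQE is reaped", faked here with an async dispatch so the sketch runs anywhere:

```swift
import Dispatch
import Foundation

// Hypothetical completion-handler read; a real io_uring backend would
// box `completion` into the SQE user data and invoke it from the
// reaper thread via io_uring_cqe_get_data().
func submitRead(completion: @escaping ([UInt8]) -> Void) {
    DispatchQueue.global().async {
        completion([0x42])  // pretend the kernel filled our buffer
    }
}

// The async/await face of the same operation.
func asyncRead() async -> [UInt8] {
    await withCheckedContinuation { continuation in
        submitRead { bytes in
            continuation.resume(returning: bytes)
        }
    }
}

// Drive it from synchronous top-level code for the sake of the sketch.
let sema = DispatchSemaphore(value: 0)
Task {
    let bytes = await asyncRead()
    print(bytes)  // [66]
    sema.signal()
}
sema.wait()
```

The continuation must be resumed exactly once, which maps neatly onto io_uring's one-CQE-per-SQE model.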

Thanks for all the help folks!

PS. Which kind of got me thinking, maybe the right solution is to add io_uring support to DispatchIO?

1 Like

That probably wouldn't be bad at all (although I am not familiar with DispatchIO); it is the approach we are (slowly but consistently) taking for SwiftNIO. In general, if a package provides an IO abstraction that could be done with io_uring on Linux, it can often be a good idea to support it, for scalability, performance, and ease of use (YMMV).

1 Like

I think the first stop will be to build a simple Swift abstraction for io_uring (well, I imagine it might be more straightforward to write it in C, to be honest). Then I will replace libdispatch with it in my SPI code and go from there.

You could just write directly to the C API and not build much of an abstraction at the lower layer; if you want to see some Swift code using the APIs, you can find some here:

as well as in this PR:

Good luck and let us know how it goes!

1 Like

Extremely preliminary start at IORingSwift. Next step will be a simple TCP and UDP echo server, then I can try my original use case (SPI).

3 Likes