Socket library built with Swift Concurrency

ColemanCDA · April 4, 2022, 11:02am

Hi all, I wanted to share my library, which fills an API gap I see in the Swift software ecosystem.

This library exposes an idiomatic Swift API for interacting with POSIX sockets via an async/await interface. What makes this library unique (even to the point that Swift NIO is still using a custom socket / thread pool) is that it was built exclusively using Swift Concurrency. It doesn't use old blocking C APIs, CFSocket, DispatchIO, CFRunLoop, or GCD, and it never explicitly creates a thread outside of Swift's global cooperative thread pool to manage the sockets and polling.

The result is a Socket API that is optimized for async/await and built from the ground up. Additionally, like the System and Concurrency APIs, the Socket is represented as a struct instead of a class, greatly reducing ARC overhead.

Minimal overhead for Swift Async/Await
Minimal ARC overhead, keep state outside of Socket
Avoid thread explosion and overcommitting the system
Use actors to prevent blocking threads
Optimize polling and C / System API usage
Low energy usage and memory overhead

Helge_Hess1 · April 4, 2022, 12:24pm

There is also FlyingFox by @swhitty, with a similar idea (though I'd like the NIO parts to be separated out of the HTTP parts). Looks quite promising.

stackotter · April 4, 2022, 12:30pm

Looks great! I definitely look forward to using this instead of the C APIs.

Jean-Daniel · April 4, 2022, 1:52pm

Maybe a silly question, but if it is not using blocking APIs, how can it efficiently wait for data availability / write completion without wasting resources polling for events?

Moreover, while it is a nice library for use on Linux, on macOS, using POSIX sockets is not recommended unless you need portability.

The Network framework is a far better approach to Networking.

lukasa · April 4, 2022, 4:10pm

The answer is that it loops on poll with a manual yield time of 100ms. This is not a suitable approach for a scalable system, but works just fine as a proof of concept. It's also probably a better approach than the one used in FlyingFox, which instead blocks one of the threads in the cooperative thread pool in poll.

ColemanCDA · April 4, 2022, 6:52pm

I want to provide some context for this API. It was designed to be reasonably performant and to not purposefully waste resources, but I am not trying to compete with Swift NIO and don't plan to use this for any HTTP library / connections.

I am part of the swift-arm group and maintain a port of Swift for Armv7 Debian, Yocto and Buildroot Linux. The software I maintain uses POSIX sockets on Linux (in addition to IOCTL) for Bluetooth, Netlink (for WiFi and other drivers), and Unix sockets for IPC (Avahi/Bonjour). On a low-powered ARM device, we need a socket API that doesn't have to scale to 10k connections, performs reasonably well, and doesn't overcommit the system's resources, which are typically 2-4 cores, 64-256MB RAM, and 800-1200MHz.

Initially, it was very easy to use unsafe continuations with CFSocket or DispatchSource to bridge those legacy APIs and add async/await support. One major issue with continuations is that you cannot check Task.isCancelled properly since you are no longer running in an async context.

Right now at the company I work at, we haven't adopted Concurrency yet, and are not taking full advantage of all our cores. If we adopt Concurrency in every aspect of our Swift stack (we have Swift daemons for Bluetooth, WiFi, OTA, GPIO, etc.) and don't use this library for the underlying sockets, every other API out there, be it first party by Apple (CFSocket, DispatchIO, NIO) or third party, either blocks the thread or creates an extra thread to monitor and manage the sockets.

If you are developing server-side Linux software with Swift, this library is probably not for you (at least not in its current form), and I would keep tabs on Swift NIO. If you are developing CLIs, daemons, or GUIs on Linux, this should be the equivalent of CFSocket that is compatible with Swift Concurrency's GCD-backed global cooperative thread pool model.

If there are any obvious improvements to be made with regards to performance, feel free to contribute and provide feedback.
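
To illustrate the cancellation point, here is a minimal sketch of the async side only: a loop that runs inside a task can observe cancellation directly through Task.isCancelled. The function pollLoop and its interval are hypothetical stand-ins for a socket polling loop, not code from the library.

```swift
/// Hypothetical stand-in for a socket polling loop; it only counts iterations.
func pollLoop(intervalNanoseconds: UInt64) async -> Int {
    var iterations = 0
    // Because this runs inside a task, cancellation is directly observable.
    while !Task.isCancelled {
        iterations += 1
        // Task.sleep throws CancellationError once the task is cancelled.
        // Ignoring the error is fine here: the loop condition re-checks the flag.
        try? await Task.sleep(nanoseconds: intervalNanoseconds)
    }
    return iterations
}

// Start the loop, let it run briefly, then cancel it and wait for it to finish.
let task = Task { await pollLoop(intervalNanoseconds: 1_000_000) }
try await Task.sleep(nanoseconds: 20_000_000)
task.cancel()
let iterations = await task.value
print("loop stopped after \(iterations) iterations")
```

The loop ends shortly after cancel() is called, because both the loop condition and Task.sleep react to the cancellation.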

swhitty · April 4, 2022, 10:36pm

If no timeout is provided to poll(2) then it will block indefinitely.

swhitty · April 4, 2022, 10:40pm

Nice work @ColemanCDA

Helge_Hess1 · April 4, 2022, 10:50pm

I think that's what @lukasa is saying, you block one of the concurrency threads in poll. (though I'm not quite sure why the loop is better, it seems worse to me).

masters3d · April 4, 2022, 11:09pm

very cool. thanks for sharing!

ColemanCDA · April 5, 2022, 12:44am

It's the opposite: if 0 is provided, it returns immediately.

If timeout is neither zero nor INFTIM (-1), it specifies a maximum interval to wait for any file descriptor to become ready, in milliseconds. If timeout is INFTIM (-1), the poll blocks indefinitely. If timeout is zero, then poll() will return without blocking.
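
The zero-timeout case can be checked on a pipe. This is a minimal sketch rather than code from the library; the pipe and the variable names are only there for illustration.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Create a pipe: fds[0] is the read end, fds[1] is the write end.
var fds: [Int32] = [0, 0]
precondition(pipe(&fds) == 0, "pipe() failed")

// Ask poll() whether the read end has data, with a timeout of 0 ms.
var descriptor = pollfd(fd: fds[0], events: Int16(POLLIN), revents: 0)
let emptyResult = poll(&descriptor, 1, 0)   // nothing has been written yet

// Write one byte, then poll again with the same zero timeout.
var byte: UInt8 = 42
let written = write(fds[1], &byte, 1)
let readyResult = poll(&descriptor, 1, 0)
let isReadable = (descriptor.revents & Int16(POLLIN)) != 0

print(emptyResult, written, readyResult, isReadable)

close(fds[0])
close(fds[1])
```

The first call returns 0 without waiting because no descriptor is ready; the second returns 1 with POLLIN set in revents.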

ColemanCDA · April 5, 2022, 12:48am

Thanks! It would be cool to integrate it into your project and do some benchmarking. Even if performance is the same, it's a programmer error to block any Task thread when using Swift Concurrency, so that might be another reason.

swhitty · April 5, 2022, 12:53am

Oh you are correct.

If timeout is zero, then poll() will return without blocking.

In that case I think a continuous loop of poll() / Task.sleep() would be better to keep the scheduler free to make progress on other tasks.

Thanks

ColemanCDA · April 5, 2022, 1:07am

Since we aren't bound to a single thread, we have a global loop that continuously polls all sockets, with a .medium priority and a 0.1s interval. This is mainly for getting notifications when a socket closes or hangs up, or when new data is available.

When you read or write, we don't rely on the background polling, but poll directly inline, assuming the scheduler has resources to do so (every await is a potential suspension point; if the thread is not being used by another task, then it won't suspend). If for some reason the socket is not ready for read or write, then we sleep at a 0.01s interval with the priority of the current task. These intervals are configurable via Socket.configuration.

In the scenario of using a Bluetooth socket on Linux with a CLI (or GUI), on a CPU with 4 cores, the first thread will be used for the CLI (ArgumentParser) and the second thread would be used exclusively for the Bluetooth Host Controller Interface communication. Since you have a single long-lived socket to the Linux Bluetooth subsystem, sending commands to the HCI hardware and waiting on responses should be possible with minimal suspension, reducing the Concurrency runtime overhead. In fact, the only time I can think of the runtime "giving up" the thread would be when sleeping for responses, and even then we are talking about sending less than 256 bytes to the kernel per command. Bluetooth LE's default MTU is a mere 23 bytes, and the max is 512. In this scenario, this Socket interface would be able to issue the write and read command immediately without suspension, queuing, runloops, or switching threads. I see that as a huge benefit over even CFSocket.
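
The inline path described above (poll immediately, sleep only when the socket is not ready) might look roughly like the sketch below. It assumes a plain file descriptor and a 0.01s default interval; waitUntilReadable and PollError are hypothetical names for illustration and are not the library's actual API.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical error type for this sketch.
struct PollError: Error { let code: Int32 }

/// Hypothetical helper: waits until `fd` is readable by polling with a zero
/// timeout and sleeping between attempts. Returns how many times it slept.
func waitUntilReadable(_ fd: Int32, sleepNanoseconds: UInt64 = 10_000_000) async throws -> Int {
    var sleeps = 0
    while true {
        var descriptor = pollfd(fd: fd, events: Int16(POLLIN), revents: 0)
        let result = poll(&descriptor, 1, 0)   // timeout 0: never blocks the thread
        if result > 0 { return sleeps }        // ready: the caller can read right away
        if result < 0 && errno != EINTR { throw PollError(code: errno) }
        // Not ready: suspend so other tasks can use the thread.
        // Task.sleep throws CancellationError if the task is cancelled.
        try await Task.sleep(nanoseconds: sleepNanoseconds)
        sleeps += 1
    }
}

// Usage: a pipe that already holds data is reported ready without any sleep.
var fds: [Int32] = [0, 0]
precondition(pipe(&fds) == 0, "pipe() failed")
var byte: UInt8 = 1
let written = write(fds[1], &byte, 1)
let sleeps = try await waitUntilReadable(fds[0])
print("slept \(sleeps) times before the descriptor was readable")
```

When data is already available the function returns on the first poll, which matches the no-suspension case described above; the sleep only happens on the slow path.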

ColemanCDA · April 5, 2022, 1:26am

I want to point out that we do not "queue" the read and write operations and wait for the background polling to kick in to call the C POSIX Socket APIs. Under ideal scenarios there is no overhead and it's the same as calling the C API directly on the same thread, inlined.

I can't stress enough how revolutionary it is for the Swift Concurrency runtime to determine (at runtime), based on resource usage and queued tasks, whether a method should be run immediately or queued. It not only avoids thread explosion (which GCD can't do), but also runs async code as basically synchronous code on the same thread. This really should optimize the usage of our hardware on embedded Linux devices. Even GCD forces you to queue work on a separate thread and do context switching, even if you follow all the guidelines and are in the best-case scenario of a main UI thread and a single serial queue. And in that case you are not just context switching, but also not taking advantage of the extra cores. And if you go the concurrent queue route, or use multiple queues, there is no way to prevent potential thread explosion and overcommitting CPU resources.

Another scenario where this works extremely well is single-core Armv5 CPUs like the Allwinner F1C100s (I maintain the Armv5 and Armv6 support as well). In this scenario, on a single-core device, you are losing performance with CFSocket's background socket manager thread. With Swift Concurrency, the performance is similar to the old CFRunLoop-based code of the PowerPC era, where the networking code was async but scheduled to run on the main thread / run loop, except that if the runtime deems it possible, we don't have to wait for the next run loop iteration but can call read or write immediately, without blocking.

lukasa · April 5, 2022, 8:17am

In case it wasn't clear, my feedback was not intended as a criticism of the design of the project. There's absolutely nothing profoundly, deeply, wrong with this API as a matter of design. I was only noting that there is an inherent throughput and latency limitation on the read side of the API if you're faster than the network.

What I did want to allude to, though, is that this particular design pattern cannot scale into the problem space NIO is aiming at. That's totally fine: there are reasonable trade-offs to be made here, especially if there's a hard requirement not to park a thread.

johannesweiss · April 8, 2022, 4:55pm

FWIW, I agree with @lukasa that this isn't a suitable approach for a scalable system, but in case I missed something I opened a ticket on SwiftNIO to discuss adding a mode where it runs in a "poll loop".

I don't think a "poll loop" is compelling because it's worse in both latency & energy efficiency than just running SwiftNIO normally on an extra thread and better at nothing that I can come up with. But maybe I'm missing something.

So if you think that a "poll loop" is indeed compelling for any use case, it'd be awesome if you could comment on the above ticket. Thank you!

Just for the record: SwiftNIO could add such a "poll loop" mode, and then it could indeed share threads with Swift Concurrency or other systems, but it doesn't seem compelling until we get Custom Executors, which can solve this problem properly (and run Swift Concurrency tasks inside SwiftNIO's thread pools).