Future of Swift-NIO in light of Concurrency Roadmap

So what does it mean for us? Rest in peace EventLoopFutures? I realise that it's just a roadmap and a series of pitches, and things might (will) change a lot, but it's certainly the future, and it's unlikely that Swift-NIO would want to ignore it :slight_smile:

It will probably take not one but two major versions to adopt it. Or maybe even zero, and we just add an async API as an additional package, like swift-nio-transport-services. Then again, maybe I read the threads inattentively, because I didn't find a way to work with external async code, so I can't tell right now what converting an EventLoopFuture into an async call would look like. (Yes, the API itself is straightforward, public func getResult() async throws -> Value, but how would it work under the hood?)

8 Likes

It depends on exactly what the particular API is trying to do.

In some cases, methods will drop EventLoopFuture in favour of async/await to express the API's intent; in others they will return a Task, since cancellation may be needed.

In some cases it might be "possible" not to mark them as async or return a Task at all, and instead the type itself becomes an actor class that handles the synchronization.

One thing for sure is that SwiftNIO will be a huge case study for the impacts of the proposed syntax and semantics that will help shape the final version that lands into Swift. :smile:

and given that the Core Team has mentioned that these features will land across a few versions of Swift, we're likely to see small incremental changes across major versions, unless the SwiftNIO team decides to dedicate an entire major version to the new concurrency model (I'm a bit in favor of this, TBH)

1 Like

Hey there,
realistically, all this will take quite some time to pan out and for us to fully understand all the implications for APIs.

However, a few "simple" things that can be done early are:

The concurrency design offers Task.Handle APIs, which are effectively "futures" married to the async world. When you look at it, you realize it's not very special: it simply has a func get() async throws -> T that can be awaited on. NIO could adopt this pattern and offer an awaitable function on EventLoopFuture behind #if swift(>=6). This function has to be invoked from an async context though... so that's a bit more tricky: all NIO functions would need to become async as well, and that is harder to adopt without breaking compat.
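
For illustration, here is a minimal sketch of what such an awaitable accessor could look like, bridging the future's completion callback into a suspended task. It assumes the withCheckedThrowingContinuation API from the concurrency proposals; the getResult name is just the one floated earlier in this thread, not settled API:

import NIO

extension EventLoopFuture {
    // Suspend the calling task until the future completes, then return
    // its value or rethrow its failure.
    public func getResult() async throws -> Value {
        try await withCheckedThrowingContinuation { continuation in
            // whenComplete delivers a Result<Value, Error>, which the
            // continuation can be resumed with directly.
            self.whenComplete { result in
                continuation.resume(with: result)
            }
        }
    }
}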

It is not entirely clear how NIO would be expressed as actors; I think we'll need to learn and experiment much more to see if, when, or how it is viable to do so. Handlers are pretty similar to actors in that they isolate state and do not run concurrently. The language design offers ways to optimize away unnecessary "hops" between actors -- however, it does so by actor identity, and NIO needs more than that: it needs to do so based on executor identity. NIO will want to say "don't hop to the other executor, because that executor is the same event loop as my event loop" -- today's concurrency design performs this check only on actors ("that actor is not me, so perform a hop"); we would need to extend this to check executor equality.

In addition to that, we will have to ensure in the concurrency design that an actor can be put on a NIO EventLoop (i.e. that an EventLoop can be an Executor), since an EventLoop has specific execution semantics after all. I can say that I'm definitely looking at this topic and trying to ensure that actors are flexible enough to express other executors rather than "just" some dispatch-based one. If we had this, actors "on" EventLoops could become viable... Caveat: these are just rough ideas and not something we've planned out at all so far.

Long story short: adoption in NIO will likely require a major version bump. Other than adopting awaitability on futures, it is not yet clear how language concurrency will be adopted in NIO.

Cory and the NIO team will likely chime in with their thoughts shortly though :slight_smile:

12 Likes

Thanks for bringing this up, this is an important topic and it’s worth understanding.

Let’s start at a very high level with the top-line answer to “What does this mean for us?” The answer from the NIO core team as of the 31st of October 2020 is we don’t know yet. This proposal is very young, there is little running code, we haven’t been able to prototype and investigate.

More importantly, this is not something for the NIO team to decide by fiat. We want to land in a world where users are able to write high performance, flexible code with minimal overhead when that is necessary, but also to have an easy time writing scalable concurrent code. How these things trade off is not necessarily straightforward to reason about.

With that said, let’s dive into some more detailed thinking.


The Swift core team has signalled that they are considering a breaking language change as part of stage 2 of the concurrency work. With that signalled, I would highly prioritise adopting stage 1 incrementally, without a semver major in NIO. A language break accompanied by a NIO break is a nice natural fit.

This would mean that in the short term we’d be looking at incremental changes. The goal will be to enable users to adopt actors in NIO-based programs without needing to fundamentally rewrite much of what they have already written. Existing programs should continue to work just as they do now.

So where will these abstractions come into being?

I think there are three places to investigate, in roughly the following priority order:

  1. Allowing async calls to the Channel API, or something similar
  2. Allowing EventLoops to be Executors
  3. Allowing actor ChannelHandlers

All of these are built on a foundational point, which is “work out how to plumb Futures through the concurrency code sensibly”.

I think the first point is the most important. The vast majority of users don't write ChannelHandlers; they work on top of Channels. It would be extremely valuable if users could treat a Channel like an actor. In my mind, this will also want to include an async function you can call to read data, something the Channel API doesn't provide today.

Normally this would mean turning a Channel into an actor, but because we don’t want to have breaking changes, this will likely instead mean having a way to perform the transformation. This implies an actor class that wraps a Channel and forwards the calls as needed.
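
As a rough illustration of that wrapper idea, assuming the pitched actor syntax and the getResult() bridge sketched earlier; AsyncChannelWrapper and its methods are hypothetical names, not NIO API:

import NIO

actor AsyncChannelWrapper {
    private let channel: Channel

    init(wrapping channel: Channel) {
        self.channel = channel
    }

    // Forward the call to the wrapped Channel and suspend until the
    // returned future completes.
    func writeAndFlush(_ buffer: ByteBuffer) async throws {
        try await self.channel.writeAndFlush(buffer).getResult()
    }

    func close() async throws {
        try await self.channel.close().getResult()
    }
}

An async read would need extra plumbing on top of this (something feeding reads out of a ChannelHandler), which is exactly the API gap mentioned above.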

This kind of abstraction frees higher level libraries (such as @mordil’s RediStack) up to reinvent themselves as actor-based libraries. They would continue to provide regular, synchronous ChannelHandlers to do the low-level protocol work, but the high-level connection wrangling would become actor-based, at least optionally.

The second point seems to go hand-in-hand. NIO has a core executor concept: it's an event loop. We need to do the work to enable users to have an actor that lives on an event loop. However, as @ktoso has noted, the executor concept as it exists in the current pitch isn't very useful to us. This is because being able to enqueue on an event loop isn't itself any more useful than being able to enqueue onto a dispatch queue. What matters is that you can take advantage of that knowledge to build a big shared mutex. While we should do this regardless, for this to be actually useful in server-side programs we need to press the core team to consider allowing us to say that actors with the same executor do not need to dispatch across each other.

Finally, ChannelHandlers. The ChannelHandler API is low-level and, for the most part, entirely synchronous. The only wrinkle is that sometimes ChannelHandlers want to do things in response to writes completing, say, or the channel closing. It would be nice to give users the opportunity to write their ChannelHandler as an actor and use the source code transformation that actors provide. For the NIO 2 timeframe, we'd probably do that in a similar way to how we do ByteToMessageDecoder: users would implement a different protocol and we'd wrap it in a ChannelHandler to manage the bridging.
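
A hand-wavy sketch of that wrapping pattern, with invented names (AsyncMessageHandler and its wrapper are not NIO API) and with read ordering and error propagation deliberately elided:

import NIO

protocol AsyncMessageHandler {
    associatedtype Message
    func handle(_ message: Message) async throws
}

final class AsyncMessageHandlerWrapper<H: AsyncMessageHandler>: ChannelInboundHandler {
    typealias InboundIn = H.Message

    private let handler: H

    init(_ handler: H) {
        self.handler = handler
    }

    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        let message = self.unwrapInboundIn(data)
        // Bridge into the async world. A real implementation would have to
        // preserve read ordering and fire errors back into the pipeline.
        Task {
            try await self.handler.handle(message)
        }
    }
}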

I think doing the ChannelHandler thing is contingent on having a way to make confident assertions about what executor is being used. I don’t know how easy it’ll be to do this and I don’t know if it’ll be worth it.


Longer term, when breaking changes come along, they may warrant a more substantial rethink of the NIO API. In particular, they will challenge us to decide whether NIO's API needs reinvention in terms of these new abstractions, at least in part. Some obvious options include requiring ChannelHandlers to be actor-local types (which would solidify their current thread-safety requirements), redefining Channels as actors entirely, and so on.

How much of that we do, and how quickly, will depend on the performance story. Moving all of NIO over to an actor model requires that we do not give up too much performance. If we have to, then we’ll always want to make the actor interface an opt-in higher level abstraction that can be used in less performance-sensitive contexts.

20 Likes

Good morning everyone :wave:

To follow up here, the concurrency efforts have now gained another proposal: Custom Executors, which relates to some of the questions posed in this thread.

As in: an EventLoop could be a custom executor.
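
To make that concrete, here is a minimal sketch of the shape this could take. The SerialExecutor/UnownedJob names follow the custom executors proposal (they had not landed at the time of this post), and NIOEventLoopExecutor/ConnectionActor are hypothetical:

import NIO

final class NIOEventLoopExecutor: SerialExecutor {
    private let loop: EventLoop

    init(loop: EventLoop) {
        self.loop = loop
    }

    // Run every enqueued job on the event loop's own thread, so an actor
    // using this executor never has to leave its loop.
    func enqueue(_ job: UnownedJob) {
        self.loop.execute {
            job.runSynchronously(on: self.asUnownedSerialExecutor())
        }
    }

    func asUnownedSerialExecutor() -> UnownedSerialExecutor {
        UnownedSerialExecutor(ordinary: self)
    }
}

actor ConnectionActor {
    private let executor: NIOEventLoopExecutor

    // Point the actor's serialization at the event loop instead of the
    // global default executor.
    nonisolated var unownedExecutor: UnownedSerialExecutor {
        self.executor.asUnownedSerialExecutor()
    }

    init(loop: EventLoop) {
        self.executor = NIOEventLoopExecutor(loop: loop)
    }
}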

8 Likes

Indeed, custom executors will be really fantastic for SwiftNIO (and any other ecosystem that wants to do I/O with the OS interfaces and also benefit from async/await).

Let me quickly outline where we are:

Regarding point 1 above (async calls to the Channel API): this is now possible to play with the main snapshots available today. You'll need to apply this PR, which adds async/await support to SwiftNIO.

Regarding point 2 (allowing EventLoops to be Executors): this is the step that is not yet supported with today's snapshots but will (hopefully) be added with custom executors.

What this means is that in a program like this one (called NIOAsyncAwaitDemo)

let channel = try await makeHTTPChannel(host: "httpbin.org", port: 80)
print("OK, connected to \(channel)")

print("Sending request 1", terminator: "")
let response1 = try await channel.sendRequest(HTTPRequestHead(version: .http1_1,
                                                             method: .GET,
                                                             uri: "/base64/SGVsbG8gV29ybGQsIGZyb20gSFRUUEJpbiEgCg==",
                                                             headers: ["host": "httpbin.org"]))
print(", response:", String(buffer: response1.body ?? ByteBuffer()))

print("Sending request 2", terminator: "")
let response2 = try await channel.sendRequest(HTTPRequestHead(version: .http1_1,
                                                             method: .GET,
                                                             uri: "/get",
                                                             headers: ["host": "httpbin.org"]))
print(", response:", String(buffer: response2.body ?? ByteBuffer()))

try await channel.close()

we would need to do at least 6 thread switches:

  1. From the executor that runs the async code to NIO's I/O threads (EventLoop) to create the Channel (network connection)
  2. Back from NIO's I/O threads to the executor to report back the result (OK, or Error thrown)
  3. Back onto the I/O threads to send & receive the HTTP request/response
  4. Back onto the executor to report the result of the first HTTP request
  5. Once again, onto the I/O threads to send & receive the HTTP request/response
  6. And finally back once more to report the result of the second HTTP request

In reality, there will be more switches. With custom executors (which would mean that the EventLoop itself can become an executor) we could do the whole thing with 0 thread switches (like we can with the futures directly).

Point 3 (actor ChannelHandlers) would in theory be possible today, although it would be really slow. Before custom executors, we wouldn't just need to switch once per send/receive; we'd need to switch twice per "actor ChannelHandler" (in and out). With custom executors, this should become possible too. Whether that's sensible remains to be seen; it'll likely be a performance question, because calling an async function is cheap but still more expensive than calling a synchronous function.

tl;dr: Custom executors will allow SwiftNIO (and any other system that needs to do I/O directly) to be fast with async/await because it doesn't force us to thread-switch all the time.

13 Likes

I'm wondering: currently the only way to run an async context from a Channel*Handler is Task.runDetached {...}. But is there a way of forcing it to run in a given event loop's context? Not on the event loop itself, of course, but on the event loop's thread. I presume that would optimise the runtime a bit (until we have custom executors and proper bridging between them and NIO). Will

context.channel.eventLoop.execute {
    Task.runDetached {
        ...
    }
}

work as I expect it to? Or does runDetached ignore the current thread?

Currently that is correct, since we don't yet have the necessary facilities for NIO to participate in the async world (custom executors).

Detach is “detach from whatever context you’re in” and yes, it runs the closure on the global default executor by default.

With custom executors there will be detach(someExecutor) as well. NIO’s event loops should be able to be Executors in the future, once custom executors land.

6 Likes

I was wondering whether SwiftNIO could use the default executor implementation instead of its own implementation. I just read the following justification from Swift Concurrency Adoption Guidelines for not using the default executors:

It would be possible to make all NIO work happen on the co-operative pool, and thread-hop between each I/O operation and dispatching it onto the async/await pool, however this is not acceptable for high performance I/O: the context switch for each I/O operation is too expensive. As a result, SwiftNIO is not planning to just adopt Swift Concurrency for the ease of use it brings, because in its specific context, the context switches are not an acceptable tradeoff.

How exactly does SwiftNIO’s executor implementation solve the above performance problem better than the default executor implementation? Wouldn’t SwiftNIO still need to (lightweight) context switch for each I/O operation to not block the thread pool on I/O?

(Just trying to fill in some gaps in my knowledge.)

Hmm, Swift Concurrency Adoption Guidelines also says

I/O systems however must, at some point, block a thread waiting for more I/O events, either in an I/O syscall or in something like epoll_wait. This is how NIO works: each of the event loop threads ultimately blocks on epoll_wait. We can’t do that inside the cooperative thread pool, as to do so would starve it for other async tasks, so we’d have to do so on a different thread. As such, SwiftNIO should not be used on the cooperative threadpool, but should take ownership and full control of its threads–because it is an I/O system.

Doesn’t Dispatch do something similar? Could the Dispatch implementation of the global executor be appropriate for SwiftNIO?

This is a good question. Before I answer it, let's try to be very careful with terminology here to avoid confusion. I'm going to reserve the word "executor" to mean what it means in Swift Concurrency: an object into which Tasks are scheduled. Under this definition, SwiftNIO currently doesn't have an executor at all. What it has are EventLoops, which are somewhat different.

To explain why NIO's current approach produces better performance than the proposal of moving the NIO work into Swift Concurrency, it helps to think of the work as operating in two "zones" of execution. The first "zone" is NIO's EventLoop implementation. This is where we create our own threads, and block them in epoll_wait. In the current design, this is also the zone in which the ChannelPipeline, and all work on ChannelHandlers, operates.

The other "zone" is the Swift Concurrency zone. This is the zone in which async/await operate. This zone is further subdivided into smaller zones for individual actors, but for now that's a complication we can ignore.

Jumping between zones incurs a context-switch, which is fairly expensive. The less often we can context-switch, the better our application will perform.

The quoted text above is offering an alternative to the current design. That alternative proposes moving the ChannelPipeline and ChannelHandlers into Swift Concurrency land, across that divide. It continues to leave the epoll_wait in a background thread, as that's required by Swift Concurrency under the "must always make forward progress" doctrine.

As you correctly point out, in either case we will eventually perform a context switch. The reason to prefer the current design is that it reduces the number of those context switches, and it improves the cache locality of most of the code. That produces performance improvements.

In almost all NIO programs, the ChannelPipeline has the effect of reducing the number of channelRead calls as you walk the pipeline from the head to the tail. At the head, we will issue a number of socket read calls, each of which returns some number of datagrams or bytes on a TCP stream. These are passed through the ChannelPipeline, with each ChannelHandler processing the messages, often transforming them into higher-level semantic messages. Often messages are filtered out or joined together. At the tail of the ChannelPipeline the stream of bytes or messages has turned into a smaller number of business logic components, each of which is a single unit of work on its own.
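
To make this concrete, here is a toy decoder in that spirit: it absorbs raw reads until a complete newline-delimited message is available, so handlers later in the pipeline see fewer, higher-level channelRead calls. It uses NIO's ByteToMessageDecoder protocol, but the LineDecoder name and framing are invented for illustration:

import NIO

struct LineDecoder: ByteToMessageDecoder {
    typealias InboundOut = ByteBuffer

    mutating func decode(context: ChannelHandlerContext,
                         buffer: inout ByteBuffer) throws -> DecodingState {
        // No complete line yet: absorb this raw read, fire nothing onward.
        guard let newline = buffer.readableBytesView.firstIndex(of: UInt8(ascii: "\n")) else {
            return .needMoreData
        }
        // Slice out one complete message and skip the delimiter.
        let line = buffer.readSlice(length: newline - buffer.readerIndex)!
        buffer.moveReaderIndex(forwardBy: 1)
        context.fireChannelRead(self.wrapInboundOut(line))
        return .continue
    }
}

Wrapped as ByteToMessageHandler(LineDecoder()) and placed near the head of the pipeline, many socket reads collapse into far fewer messages by the time they reach the tail.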

By placing the divide where we do now, at the "tail" of the ChannelPipeline, we context-switch for processed messages, not raw bytes. This means we context switch only when we know we have actual work to do, and we often context switch less often, because those messages are often the product of several underlying socket reads. As an example, in HTTP/2 we filter out most of the control frames and never deliver them to the end user, so they don't need to incur a context-switch at all.

If we move the divide to the "head" of the ChannelPipeline, we have to context-switch for each socket read. This will likely be more often than if we follow the current pattern.

Additionally, we'll context-switch immediately after receiving bytes, but before processing them. This hurts cache locality. When we've just read some bytes from the network, they're highly likely to be in the processor cache (or at least the ByteBuffer that holds them). Additionally, because of the way NIO issues its reads, the ChannelPipeline is also likely to be in cache. The result of this is that we take advantage of the I/O pattern of NIO to maximise the likelihood of cache hits during the "network processing" stage.

If we context-switch on each read, we will almost certainly evict many of these bytes from cache. We have to re-establish information about what ChannelPipeline we're operating on, when we already knew it at the point of issuing the read. This will provide a drag on performance as well.

The opportunity for changing to adopt Concurrency more pervasively will come if and when Swift adopts some flavour of custom Task executor. At that point we can define a Task that will be co-located with the EventLoop, to which NIO can deliver reads without a context-switch at all. That will give us a substantial performance uplift.

Yes and no. Dispatch in general does something similar. The problem is that there is no API for us to hook into. We cannot tell the executor "Please wake this Task when there is I/O to do on this file descriptor". Without that capability, this cannot work.

16 Likes

Just to add a few extra points. In general, with libdispatch the way to be notified about I/O readiness of a file descriptor is dispatch_source_t/DispatchSource (sketched after the list below), but we can't immediately use that for Swift Concurrency for a number of reasons:

  1. Whilst Swift Concurrency's pools are currently implemented with libdispatch on all platforms, that's an implementation detail. In other words, we can't get hold of the dispatch_queue_t of the cooperative pool. And I don't think there's appetite for allowing this, because that would forever tie Swift Concurrency to libdispatch on all platforms, which is probably not a goal. I think it's deliberate that this is an implementation detail, especially the concrete type/configuration of the threads/queues/pools/...
  2. Even if we got hold of the dispatch_queue_t, I believe the current design is that the whole cooperative pool is just one dispatch_queue_t with the "width" set to the number of CPU cores (i.e. that queue fans out to ncpu threads and no more). But SwiftNIO's design requires that each Channel/ChannelPipeline has exactly one thread/queue assigned to it.
     This could be reconciled similarly to how NIOTransportServices works, by creating a DispatchQueue per EventLoop, each of which could target e.g. the cooperative pool.
  3. Performance: Switching away from using kqueue/epoll/io_uring/... and using DispatchSources for the eventing comes with a performance cost that especially on Linux was very significant last time I checked.
  4. New EventLoop(Group)s: this is probably the most minor point, but it would require new EventLoop(Group)s; not the end of the world, but also not ideal.
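
For reference, this is the libdispatch mechanism in question: a minimal sketch (not how NIO does its eventing, and with cancellation handling omitted) that asks a queue to call back when a file descriptor becomes readable:

import Dispatch

func watchForReads(fd: Int32, on queue: DispatchQueue) -> DispatchSourceRead {
    // Ask libdispatch to invoke the handler on `queue` whenever `fd`
    // has data available to read.
    let source = DispatchSource.makeReadSource(fileDescriptor: fd, queue: queue)
    source.setEventHandler {
        print("fd \(fd) is readable")
    }
    source.resume()
    return source
}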

The tl;dr is exactly what Cory says: Today, Swift Concurrency doesn't expose enough hooks for SwiftNIO to just use Swift Concurrency's pools. And neither does Swift Concurrency expose enough hooks allowing SwiftNIO to use its pools to run Swift Concurrency jobs. Therefore there are currently two different "zones" (as Cory puts it) and switching between those isn't free. And just FWIW, this will affect any system that directly does I/O, not just SwiftNIO (some more discussion here).

9 Likes

That's the thing I'm waiting for. Is there any timeline for this? (Custom executors feel like they're completely off the table currently.)

2 Likes

This question is really best answered by @John_McCall.

1 Like

There's no specific timeline for doing it, no, sorry. We're definitely interested in it, but we haven't even fleshed out the idea completely.

I'm interested in knowing what you'd like to use it for, though.

cc @ktoso

I can mention what we’d hope to be able to use it for (with the caveat that “it” is obviously not yet well defined):

We’d like the capability to group related tasks together and to have their executor (optionally) be pinned to a given cpu core - with the expectation that it would significantly decrease (or even eliminate, if possible) context/task switch overhead within this “system” of tasks and keep data locality / hot caches. This is for a fairly complex enterprise style server side solution where we have good knowledge of “what talks to what” and we believe (from previous experience) that such optimization would be valuable.

8 Likes

Thanks. Yeah, that should be possible with task executors.

1 Like

Thanks for the ping, Tom!

Yes, such a capability is definitely on the list of what we're interested in achieving with the (broadly named) custom executors. Thanks for explaining your use case; that's definitely what we have in mind, but it's good to hear more people list their requirements/expectations :+1:

Will a custom executor also be able to 'pin' further async calls to itself?

For example, as things stand today:

struct SomeThing {
  
  func performOperationAsynchronously() {
    Task.detached { @MainActor in // NOTE: MainActor concurrent context
      print("hi, from dispatch queue main!")
      await someSubRoutine() // but, by design, this will get executed on the cooperative thread pool
    }
  }
  
  private func someSubRoutine() async {
    print("hi, from the cooperative thread pool!")
  }
}

let thing = SomeThing()
thing.performOperationAsynchronously()

It might be handy to override this behaviour somehow when using a custom executor:

struct SomeThing {
  
  func performOperationAsynchronously() {
    Task.detached(using: customExecutor) { // NOTE: custom executor concurrent context
      print("hi, from my custom executor serial queue!")
      await someSubRoutine() // can we get this to inherit the custom executor's concurrency context?
    }
  }
  
  private func someSubRoutine() async {
    print("Also hi, from my custom executor serial queue!")
  }
}

let thing = SomeThing()
thing.performOperationAsynchronously()

In other words, will a custom executor be able to define how the concurrency context is inherited? This could be useful in the same scenario @hassila mentions above, i.e. you wish to serialise async calls on a single CPU core.

EDIT: Improved example

1 Like