Async/await status?

I would like to see Swift adopt, and pioneer for mainstream languages, what Céu calls “Structured Synchronous Reactive Programming” (SSRP): http://www.ceu-lang.org/

As opposed to “Functional Reactive Programming” (FRP) - with Combine as a good example - which models reactive data flows via functional transformers, SSRP models the control flows that should run in reaction to events, using structured language constructs. await is only one of several such constructs, alongside watching, every, and parallel composition.

SSRP is a synchronous reactive approach, which means that the reaction to events is logically immediate. This makes it possible to specify behavior precisely, even when parallel control flows are cancelled.
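
Swift has no such constructs today, but to make the shape concrete, here is a rough sketch (not real SSRP - Swift tasks are asynchronous, not logically instantaneous) of something like Céu's par/or expressed with a task group; watchSensor and timeout are hypothetical stand-ins:

```swift
// Rough analogy only: two "trails" run in parallel; the first to
// finish wins and the other is cancelled, like Céu's `par/or`.
func watchSensor() async {
    // imagine: await the next sensor event
}

func timeout() async {
    try? await Task.sleep(nanoseconds: 1_000_000_000) // 1 second
}

func parOr() async {
    await withTaskGroup(of: Void.self) { group in
        group.addTask { await watchSensor() }  // trail 1
        group.addTask { await timeout() }      // trail 2
        _ = await group.next()                 // first trail completes...
        group.cancelAll()                      // ...and the other is cancelled
    }
}
```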

This synchronous reactive model of computation could be restricted to special Actors, with the Actors themselves communicating in a more asynchronous fashion, as proposed by the “Globally Asynchronous, Locally Synchronous” (GALS) approach.

SSRP is especially helpful for real-time control of hardware and other time-sensitive domains. These could be domains for Swift to conquer!

18 Likes

Just to follow on from this talk, here are a few more in-depth details about how it's implemented in Rust: https://www.youtube.com/watch?v=NNwK5ZPAJCk

3 Likes

It seems that Java is going that way with "virtual threads" in Project Loom. The article below is interesting and it makes me wonder: why have the async and await keywords at all?

https://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part1.html

5 Likes

These virtual threads look pretty neat:

Key Takeaways

  • A virtual thread is a Thread — in code, at runtime, in the debugger and in the profiler.
  • A virtual thread is not a wrapper around an OS thread, but a Java entity.
  • Creating a virtual thread is cheap — have millions, and don’t pool them!
  • Blocking a virtual thread is cheap — be synchronous!
  • No language changes are needed.
  • Pluggable schedulers offer the flexibility of asynchronous programming.

Sounds like a nice approach that could work well both for Swift on iOS (or any other client-side) and server-side.

1 Like

At first read through, it does have a strong DispatchWorkItem on concurrent queues vibe I can’t shake. Hmm, maybe I’ll need to think about it some more. :thinking:

1 Like

I'm confused.

You need to wait for something to happen without wasting precious resources? Forget about callbacks or reactive stream chaining — just block.

All that virtual threads do is add another layer of virtualisation on top of the OS's. I don't understand how that eliminates the need for asynchronous programming?

1 Like

Excuse the plug, but we've had Rust's futures in Swift for a while now :)

That's a great, well-written article, but it's clear that they haven't actually settled on an implementation. Their prototype includes at least two different designs, probably a function-global frame allocation vs. a narrower continuation allocation. Whatever they do, it's going to have serious trade-offs that I haven't seen any balanced evaluations of yet — which is fair, it's still fairly early days.

You cannot have truly lightweight threads while still managing stacks in a traditional way. Traditional stacks require a substantial reservation of virtual address space plus at least one page of actual memory; even with 4KB pages (and ARM64 uses 16KB), that level of overhead means you'll struggle with hundreds of thousands of threads, much less millions. Instead, local state must either be allocated separately from the traditional stack (which means a lot of extra allocator traffic) or be migratable away from it (which makes thread-switching quite expensive, and so runs counter to the overall goals of lightweight threads). Since the JVM already has a great allocator and GC, I assume they're doing the former, but that's going to introduce a lot of new GC pressure, which is not something I'd think most heavy JVM users will be fans of.
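
To put rough, illustrative numbers on that claim (back-of-the-envelope arithmetic, not from the article):

```swift
// One page of resident stack per thread, at a million threads:
let pageSize = 16 * 1024                      // 16 KB pages (arm64)
let threads = 1_000_000
let bytes = pageSize * threads                // ≈ 16.4 billion bytes touched
print(Double(bytes) / 1_073_741_824, "GB")    // ~15.3 GB, before any real work
```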

If you don't have "colored" async functions, you have to do that for every function that isn't completely trivial. That allocated context then has to be threaded through calls and returns the same way the traditional stack pointer is. Since Swift doesn't already do that for ordinary functions, and we've declared our ABI stable on Apple platforms, this really just isn't an option, even if there were no other downsides.

16 Likes

There was a paper recently published by @kavon and John Reppy with a more in-depth measurement of the performance of different stack and continuation implementations, which might be interesting: https://kavon.farvard.in/papers/pldi20-stacks.pdf

11 Likes

Loom is indeed quite a hot topic in JVM land, and has been for quite some time... It's shaping up quite well, but the "weight" of (virtual) threads remains a bit unclear, as John alludes to.

There's an interesting circle the JVM has walked here: way back (eons ago... in 1.1) it had green threads, but those were M:1 mapped, so all java.lang.Thread instances shared the same underlying OS thread. Obviously this was limiting for real parallelism (and multi-core CPUs), so Java switched to mapping its Thread 1:1 onto OS threads. That has the nice benefit of mapping "directly" onto calls into native code etc. It's quite heavy though; 500k~1000k per Thread used to be the rough estimate, though AFAIR things have improved in JDK 11, which I've not used in anger. In any case, Loom's fibers/virtual-threads are definitely going to be "light", at least compared to present-day j.l.Thread :thinking:

Needless to say, relying on today's Thread directly is too heavy for reactive frameworks, so runtimes like Netty, Akka, Reactor, and Reactive Streams impls (anything, really) end up scheduling in user land, multiplexing many fine-grained tasks onto those heavy threads, i.e. scheduling M:N (M entities onto N real threads). All reactive or async libraries effectively do this today.

Loom is interesting since it flips the mappings around again: what libraries used to have to do because Thread is too heavy, Loom now does itself (basically it will do exactly the same thing in terms of scheduling as those reactive libs do today), mapping M "virtual" threads onto N real threads. So it's going back to green threading, but with M:N (and not M:1 like it historically had).
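
For a rough Swift-side analogy of that M:N picture (Dispatch standing in for the user-land scheduler; illustration only, not how Loom works internally):

```swift
import Dispatch

// Many fine-grained tasks (M) multiplexed onto a small pool of OS
// threads (N) by a user-land scheduler -- what reactive JVM libraries
// do by hand today, and what Loom moves into the runtime.
let pool = DispatchQueue(label: "workers", attributes: .concurrent)
let done = DispatchGroup()

for i in 0..<100_000 {              // far more tasks than OS threads
    pool.async(group: done) {
        _ = i * i                   // some small unit of work
    }
}
done.wait()
```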

I remain a bit iffy about the "weight" question for Loom... perhaps they'll figure it out somehow with VM trickery though. The nice thing about stream or actor runtimes on the JVM is that they simply "give up the thread", and when they're scheduled again they start afresh; there's no need to keep any stack around for those models to work well. So I wonder how stackful (virtual) threads will lend themselves to such lighter execution models. Yet another library scheduler on top of virtual threads sounds a bit silly -- two layers of user-land scheduling seems weird -- yet with a plain "lib concepts : virtual threads" mapping it will be interesting to see if it really is light enough... (One could argue the shape of such APIs will change dramatically though :thinking:)

// Thanks for the paper @Joe_Groff, that's a topic I'd love to learn more about, will dig into it!

3 Likes

Thanks for sharing! If anyone has questions about the paper I'm happy to answer.

Depending on how these layers are set up, they could make sense. Manticore features "nested schedulers", which consist of a user-controllable stack of schedulers: http://manticore.cs.uchicago.edu/papers/icfp08-sched.pdf

That scheduler paper is based on ideas from this (wonderful) paper: https://www.eecis.udel.edu/~cavazos/cisc879-spring2008/papers/cps-threads.pdf

9 Likes

Thanks for the links, this sounds quite interesting -- added to my queue :bookmark:

1 Like

I see they are using heap-allocated stack chunks, and they have a Continuation class that they say might become public API in later releases.

That makes sense, but does Swift effectively do the same thing in async/await code, given that it has to heap-allocate the closure (the continuation) with the captured local values?

I wonder if the two styles boil down to the same thing under the hood, after static analysis and optimization?
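
For what it's worth, here's the callback-style picture I mean (fetchValue is a made-up API): locals that live across the wait get captured in the escaping closure's context, which the compiler allocates on the heap.

```swift
import Dispatch

// Hypothetical callback API, for illustration only.
func fetchValue(_ callback: @escaping (Int) -> Void) {
    DispatchQueue.global().async { callback(14) }
}

func fetchAndProcess(completion: @escaping (Int) -> Void) {
    let multiplier = 3                  // outlives this function's frame,
    fetchValue { value in               // so it is captured in the escaping
        completion(value * multiplier)  // closure's heap-allocated context
    }
}
```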

I see the point about having to pass that context through all functions. I wonder how the Java devs are solving that. Maybe they JIT-compile two versions of each function.

Does Swift do something like that with generics? I.e., you write foo<T> and get separate runtime copies for foo<Int> and foo<Float>?

It does seem like an explicit await keyword in front of a function call fits with Swift's style, just like we have an explicit try keyword (in contrast to Java), which also marks a modified calling convention.

Yes, it will have to use some non-contiguous implementation. It can be isolated to just async functions, though, rather than impacting everything. That is why all the languages that use colored async functions do so.

That would certainly be possible if you wanted to make lightweight threads that were pinned to an OS thread take advantage of the contiguous stack. It would be a lot of extra complexity for the JIT, though.

We can, but we don't have to.
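
Roughly, and as a toy example rather than actual compiler output: the unspecialized body works through runtime type metadata, and the optimizer may emit specialized copies where it sees concrete types.

```swift
// A generic function the compiler *may* specialize per concrete type
// when optimizing; otherwise a single body runs via type metadata.
func total<T: Numeric>(_ values: [T]) -> T {
    values.reduce(.zero, +)
}

let ints = total([1, 2, 3])        // may become a specialized total<Int>
let doubles = total([1.5, 2.5])    // may become a specialized total<Double>
```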

1 Like

It seems like we could conceivably implement a growable contiguous implementation as well, since most Swift types and Swift code is not able to persist arbitrary pointers to stack data, and places where we do have references to possibly-stack data like yielded inout or borrowed values are well-scoped and could be tracked in metadata to allow fixup if the context has to be moved. Things that do produce pointers to stack data, such as non-bitwise-takable types, withUnsafePointer, and friends, are relatively rare, and we could conceivably allocate things that need stable addresses out-of-line from the main coroutine context in order to maintain the benefits of contiguity for the majority of Swift code.
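
For example, a minimal sketch of the scoping in question: pointers into locals are only valid within a delimited scope, so outside that scope the storage would be free to move.

```swift
var counter = 42
withUnsafeMutablePointer(to: &counter) { ptr in
    // `ptr` must not escape this closure; once it returns, nothing
    // holds a raw address into the frame, so the context could be moved.
    ptr.pointee += 1
}
print(counter) // 43
```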

1 Like

It’s possible, yeah. It would add a lot of implementation complexity, though, vs. something that never has to copy stacks.

That's fair. We could probably set up the ABI for coroutines in such a way that we could change the growth strategy from segmented to moving in the future. Research like @kavon's, as well as the trajectory of languages like Rust and Go that tried segmented stacks and then abandoned them, suggests that the locality wins, and the avoidance of the performance pathologies of segment boundaries, could be worth the added complexity.

3 Likes

Well, the problem then is that we'd have to get the implementation right even though we wouldn't be using it. So we'd have all the complexity and overhead and none of the assurance that it actually works.

This is a pure implementation concern, but it also means we'd have to do the lowering at a point where we still have intelligence about local variables and their data invariants, i.e. not as an LLVM pass.

The relevant passage from the article:

Whenever you run a blocking operation in, say, Spring or Hibernate, it ultimately makes use of some core-library API in the JDK — the java.* packages. The JDK controls all the interaction points between the application and the OS or the outside world, so all we need to do is to adapt them to work with virtual threads. Everything built on top of the JDK will now work with virtual threads. Specifically, we need to adapt all points in the JDK where we block; these come in two flavors: synchronization (think locks or blocking queues) and I/O. In particular, when a synchronous I/O operation is called on a virtual thread, we want to block the virtual thread, perform a non-blocking I/O operation under the covers, and set it so that when the operation completes it will unblock the virtual thread.

Blocking native code called from Java and some Java constructs will still block the underlying OS thread, which is another reason this approach might not be a good fit for Swift, where using native libraries with blocking constructs seems to me much more prevalent.

1 Like

I am not sure threads are the right approach from a developer's point of view, as multithreaded code is notoriously hard to get right. Using even more threads - because they are lightweight - will rather increase the likelihood of subtle bugs like synchronization issues and deadlocks.

I can see an alternative in the synchronous language approach, where threads - sometimes called trails in this field - proceed in lockstep through logical time but run on the same thread (or a small set of threads), according to a schedule based on the data dependencies at each logical time instant.
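
A toy sketch of that idea (my own illustration, not Céu's actual runtime): trails as closures resumed once per logical tick, in a deterministic order, on a single thread.

```swift
// Trails react to each logical instant; the runtime runs them in a
// fixed order on one thread, so reactions are deterministic.
struct SyncRuntime {
    private var trails: [(Int) -> Void] = []

    mutating func spawn(_ trail: @escaping (Int) -> Void) {
        trails.append(trail)
    }

    func tick(_ instant: Int) {
        for trail in trails { trail(instant) }
    }
}

var runtime = SyncRuntime()
runtime.spawn { t in print("trail A reacts at instant \(t)") }
runtime.spawn { t in print("trail B reacts at instant \(t)") }
for t in 0..<3 { runtime.tick(t) }
```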

1 Like