Async/await status?

Those blocking functions will present the same problem to an async/await style solution.

Indeed, but by marking things that are expected to be non-blocking with async, you give the programmer an easier mental model for avoiding blocking. Instead of digging into a library's implementation to see how it performs its operations internally (does it use whatever the Swift equivalent of the JDK would be, probably some adaptation of SwiftNIO, or does it call some C library?), you can just take care to only use constructs that are marked async and be reasonably certain you won't block.
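As a rough illustration of that mental model, here is a minimal sketch written in the async/await syntax that Swift 5.5 and later provide. All function names are hypothetical and `Task.yield()` only stands in for real non-blocking work.

```swift
// Hypothetical names; this illustrates the mental model, not a real library.

// Synchronous signature: the caller cannot tell from the type whether this
// blocks the thread (on I/O, a lock, a C call, ...) without reading its source.
func loadGreetingSync(for name: String) -> String {
    return "Hello, \(name)"
}

// Async signature: the function may suspend, and every caller has to
// acknowledge that by writing `await` at the call site.
func loadGreeting(for name: String) async -> String {
    await Task.yield() // stands in for real non-blocking work
    return "Hello, \(name)"
}

// An async caller that sticks to async APIs.
func greetAll(_ names: [String]) async -> [String] {
    var greetings: [String] = []
    for name in names {
        let greeting = await loadGreeting(for: name)
        greetings.append(greeting)
    }
    return greetings
}
```

Note that the marker is a convention rather than a guarantee: nothing in this sketch stops `loadGreeting` from calling a synchronous function that blocks, which is why dynamic detection would still be useful.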

I would hope that we could also do a better job of dynamically detecting that an async function called something that ultimately blocked.

Ah, I see what you mean. I agree.

Next thing I want to dig into is Kotlin's coroutines. They mark the function with suspend, instead of async, and it seems like they handle more interesting cases beyond async/await. ... Another piece of prior art to consider for Swift.

I think this is a great trade-off for Swift: pay the cost of fixing up the stack references only when frame movement is actually needed. Other solutions like segment-relative offsets would introduce costs in the fast path.

We discuss this pointer-into-the-stack challenge in Section 4.6 of our paper. I decided to go with a crude segment marker that indicates that no frames are movable once any continuation capture occurs, purely to simplify the implementation. We only ever have stack pointers when continuation capture happens in Manticore.

I share this concern - I haven't seen a segmented stack approach work for a language that aspires to work well in systems programming use cases. It seems better and sufficient to allow stack space to be specified up front when "threads" are created. It seems fine for the main thread to have a huge stack, and to default other tasks to small stacks - but allow them to be cranked up if needed. We can also have an explicit 'switch stack' helper call to mediate between tasks on small stacks that need to call into uncooperative libraries that burn stack space. We have precedent for such things in withoutActuallyEscaping(_:do:) etc.

John is right that you run into things like 16K page sizes, but ARM64 has a big address space to work with.

-Chris

If we can successfully implement movable contexts, and I think we can, then we ought to be able to get the best of both worlds—we can default contexts to a smaller initial capacity, and grow them like std::vectors when they need more space. Then specifying an up-front stack capacity is "just" an optimization, not necessary for correctness.
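To make the std::vector analogy concrete, here is a minimal sketch of a context that starts small and relocates itself when it runs out of room. `GrowableContext` and its members are hypothetical names, not a proposed runtime API. To keep the sketch short, frames are addressed by byte offset so that a move needs no fixups; a real implementation would instead have to update any pointers into the old storage when it moves.

```swift
// Hypothetical sketch of a movable, growable async context.
struct GrowableContext {
    private(set) var base: UnsafeMutableRawPointer
    private(set) var capacity: Int
    private(set) var used = 0
    private(set) var moves = 0 // how many times the storage was relocated

    init(initialCapacity: Int) {
        capacity = initialCapacity
        base = UnsafeMutableRawPointer.allocate(byteCount: initialCapacity, alignment: 16)
    }

    /// Reserves `size` bytes and returns their byte offset from `base`.
    mutating func push(size: Int) -> Int {
        // Capacity check: the common case is a compare and a bump.
        if used + size > capacity {
            grow(toFit: used + size)
        }
        let offset = used
        used += size
        return offset
    }

    /// Slow path: allocate a larger buffer (doubling, as std::vector
    /// implementations commonly do) and move the live bytes across.
    private mutating func grow(toFit needed: Int) {
        var newCapacity = max(capacity, 1)
        while newCapacity < needed { newCapacity *= 2 }
        let newBase = UnsafeMutableRawPointer.allocate(byteCount: newCapacity, alignment: 16)
        newBase.copyMemory(from: base, byteCount: used)
        base.deallocate()
        base = newBase
        capacity = newCapacity
        moves += 1
    }

    /// Releases the storage; the context must not be used afterwards.
    mutating func destroy() {
        base.deallocate()
    }
}
```

In this model an up-front capacity hint only avoids the slow path; a context created with too small a capacity still works, it just moves once or twice.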

Yes, I think it's mainly because of the challenges with foreign-function calls.

The System V ABI (for example) does not specify any detection of stack overflow, so the de facto convention is to use a guard page. If a fixup of stack references is needed in order to move the stack, recovering from the page fault is rather tricky since the fault can happen at arbitrary points (instead of safe-points at stack limit tests). The only example I'm aware of in the literature that can achieve safe-points everywhere is "Support for garbage collection at every instruction in a Java compiler" (ACM SIGPLAN Notices).

Swift sees an additional challenge here because (as far as I know) it is heavily reliant on ObjC libraries that have no notion of stack limit tests, and control-flow can weave between ObjC and Swift code arbitrarily.

In Manticore, when stack-switching for foreign-function calls is disabled, the segmented-stack strategy uses an approach that is backwards compatible with C. The idea is that we reserve an additional n KB of space after the m KB segment, just beyond where the segment limit's address is, to be a C-call reserved space that is followed by a guard page. Thus, C calls have between n and n+m KB of stack space to work with and no stack switching is needed.

With this approach, if the program is primarily calling Manticore functions, stack overflow is detected with a limit test and is fully recoverable. However, if it calls a C function that uses a lot of stack space, then it falls back to relying on the guard page to detect overflow, in which case it crashes instead of recovering.
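The arithmetic behind the "between n and n+m KB" figure can be sketched as follows. This is only an illustration of the layout described above, not Manticore's code; the type and property names are hypothetical and sizes are counted in whole KB for simplicity.

```swift
// Hypothetical model of the layout: [ m KB segment | n KB C reserve | guard page ]
struct SegmentLayout {
    let segmentKB: Int // m: space governed by the limit test
    let reserveKB: Int // n: extra space past the segment limit, reserved for C calls

    /// Stack space a C call can use before hitting the guard page, given how
    /// much of the segment is already in use when the call is made.
    func spaceForCCall(usedKB: Int) -> Int {
        precondition(usedKB >= 0 && usedKB <= segmentKB, "limit test keeps usage within the segment")
        return (segmentKB - usedKB) + reserveKB
    }

    /// True if a C call needing `neededKB` would run into the guard page,
    /// which is the non-recoverable case.
    func cCallHitsGuardPage(usedKB: Int, neededKB: Int) -> Bool {
        return neededKB > spaceForCCall(usedKB: usedKB)
    }
}
```

With a full segment the C call gets exactly the n KB reserve, and with an empty segment it gets n + m KB, which matches the bounds stated above.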

In our case, I think async coroutines would still have access to the regular C stack, and they could store temporary data that doesn't need to persist across a suspension there. Since C and ObjC functions would never be able to suspend the Swift async context, they could be called normally on that stack.

This still requires dynamic stack probing etc, right?

Beyond that, it isn't clear to me that all types will be movable. Many C++ types (which we aspire to interop with) are not movable - std::atomic as one random example.

You'd have to check for capacity before growing the context, but I don't think that would be an unacceptable overhead, since over time it's unlikely to occur.

Values of types that aren't movable, we could conceivably allocate out-of-line in their own allocations, to keep the rest of the context movable.
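A minimal sketch of that out-of-line idea, with hypothetical names (`Slot`, `PinnableContext`) and a Swift array standing in for the movable context storage: ordinary values live inline and move when the storage is reallocated, while a value that needs a stable address gets its own allocation and the context only holds a pointer to it.

```swift
// Hypothetical sketch: movable values inline, address-sensitive values out of line.
enum Slot {
    case inline(Int)                          // relocated along with the context
    case outOfLine(UnsafeMutablePointer<Int>) // separately allocated, address is stable
}

struct PinnableContext {
    private(set) var slots: [Slot] = []

    /// Stores a value that is fine to relocate.
    mutating func addMovable(_ value: Int) {
        slots.append(.inline(value))
    }

    /// Stores a value that must never move and returns its stable address.
    mutating func addPinned(_ value: Int) -> UnsafeMutablePointer<Int> {
        let pointer = UnsafeMutablePointer<Int>.allocate(capacity: 1)
        pointer.initialize(to: value)
        slots.append(.outOfLine(pointer))
        return pointer
    }

    /// Frees the out-of-line allocations; pinned pointers are invalid afterwards.
    mutating func destroy() {
        for case .outOfLine(let pointer) in slots {
            pointer.deinitialize(count: 1)
            pointer.deallocate()
        }
        slots.removeAll()
    }
}
```

The cost is one extra allocation and one extra indirection per pinned value, paid only by the values that need it.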

I have some concerns about movable contexts because of the following reasons:

  • it might break someone’s code that uses pointers, especially if pointers are used as integers, for example, for atomic operations in C code (this might be difficult to handle).
  • it can cause some overhead because of additional memory allocation, copying and so on.

I think it would be better to use one of the following:

  • stackful coroutines with a good reuse mechanism.
  • coroutines with stack splitting (something between stackful and stackless coroutines).

The compiler should be able to know when pointers have been formed to local values, so that only those values are given stable addresses.

Agreed, the probe failures and reallocations would be uncommon. I'm pointing out that the probes have to be unconditionally executed, slowing down all code everywhere by a little bit. This is one of the costs of segmented stack solutions in general: it violates the "you don't pay for it unless you use it" goal that systems languages generally aim to provide.
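The distinction between the probe and the reallocation can be shown with a toy model. `SimulatedStack` is a hypothetical name and the counters exist only to make the two costs visible; real probes would be a few instructions in each function prologue.

```swift
// Hypothetical toy model: every call pays for the probe, few pay for growth.
struct SimulatedStack {
    private(set) var used = 0
    private(set) var limit: Int
    private(set) var probes = 0 // limit tests executed
    private(set) var grows = 0  // slow-path reallocations

    init(limit: Int) {
        self.limit = limit
    }

    /// Models a function prologue.
    mutating func enter(frameSize: Int) {
        probes += 1 // unconditional: runs on every call
        if used + frameSize > limit {
            // Rare slow path: grow (here, at least double) the stack.
            limit = max(limit * 2, used + frameSize)
            grows += 1
        }
        used += frameSize
    }

    /// Models a function epilogue.
    mutating func leave(frameSize: Int) {
        used -= frameSize
    }
}
```

A thousand shallow calls on this model perform a thousand probes and no growth, which is the "pay even if you don't use it" cost being described.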

In addition to the probes themselves, there is also the size overhead of the additional metadata required to move things around.

I guess I don't understand the model you are considering. The Swift ABI has to be compatible with the C ABI in terms of the external contract with existing code, and intermixing between the two is critical. How do you foresee Swift code calling into C code (which doesn't do probing) working? How do calls to convention(c) closures work?

I don't think you can tie this into the Swift ABI at this point, since existing Swift ABI code doesn't do stack probing.

-Chris

Well, C functions can't be async, nor can existing Swift functions. Only async code would ever pay any of these penalties, and non-async Swift and C functions can be called on the normal stack with the normal conventions they have today. Also, within async functions, we'd only need to save things that potentially persist across a suspend point in the coroutine context. Temporaries with fixed lifetimes could remain on the system stack.
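As a sketch of which values would fall into which category under this model, consider the following function, written in the async/await syntax of Swift 5.5 and later. The function name is hypothetical, and the comments describe the model being discussed here rather than what any particular compiler is guaranteed to do.

```swift
// Hypothetical example: which locals need to survive a suspension?
func scaledSum(_ values: [Int], factor: Int) async -> Int {
    // `total` is read after the suspension point below, so in this model it
    // would have to be saved in the coroutine context.
    var total = 0
    for value in values {
        // `scaled` is dead before the function can suspend, so it is a
        // temporary with a fixed lifetime and could stay on the system stack.
        let scaled = value * factor
        total += scaled
    }
    await Task.yield() // potential suspension point
    return total
}
```

A non-async helper called from inside the loop would likewise run on the ordinary stack with the ordinary calling convention, since it can never suspend the caller.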

I do realize that there are not many possible implementations of coroutines. However, this approach seems to have some trade-offs, like limitations of their usage, which theoretically can be avoided.

I think I'm missing something - the upthread discussion was about segmented stacks and avoiding the goldilocks problem of stacks that are either too big or too small, John's point about some architectures using 16K guard pages, etc.

The async marking only seems to address the coroutine/interruption part of the problem, not the stack size/growth part of it. Am I confused here?

-Chris

Sorry, I'm taking for granted that, because of the C interop and ABI concerns John has already delineated, we aren't planning to move C code or existing Swift code off of a traditional C-like fixed preallocated system stack. Async functions would, in addition to having access to whatever C stack they're currently executing on, have access to a context object where they would store any local "stack" values that might persist across a suspend. Async functions, however, wouldn't be pinned to any particular C stack, since they could be suspended and potentially resumed on a different thread. I'm speaking only about the growth strategy we might use for that async context, not for all functions in general.

Why would the async context need to grow? In other languages I'm familiar with the context is a per-function object so its size is fixed at compile time. Is there some other strategy being considered for async/await in Swift?

In languages like C#/JavaScript/Python, where async/await is sugar for working with promise objects, when an async function calls into another one, a linked list of promises is formed, one for each "frame" in the async call stack, and each one could generally be pre-allocated to a fixed size, yeah. In the case of Swift, I think we'd want to reduce overhead by using one allocation for the whole async call stack.
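The two shapes can be contrasted with a small sketch. Both types are hypothetical and store plain `Int` locals for simplicity; they are meant to show the allocation pattern, not how any of the named languages or Swift actually lays out its frames.

```swift
// Style 1 (promise-like): at least one heap object per async frame,
// each linked back to the frame that called it.
final class LinkedFrame {
    let parent: LinkedFrame?
    var locals: [Int]

    init(parent: LinkedFrame?, localCount: Int) {
        self.parent = parent
        self.locals = Array(repeating: 0, count: localCount)
    }
}

// Style 2: one buffer shared by the whole async call stack; a frame is just
// a range within it, so a call is a bump and a return is a truncation.
struct CallStackArena {
    private(set) var storage: [Int] = []
    private(set) var frameStarts: [Int] = []

    /// Pushes a frame with `localCount` zero-initialized locals.
    mutating func pushFrame(localCount: Int) {
        frameStarts.append(storage.count)
        storage.append(contentsOf: repeatElement(0, count: localCount))
    }

    /// Pops the most recently pushed frame.
    mutating func popFrame() {
        let start = frameStarts.removeLast()
        storage.removeSubrange(start...)
    }
}
```

In the first style each frame has a size fixed when it is created, so nothing ever needs to grow; in the second, the shared buffer is what may need to grow as calls nest more deeply.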
