Efficient yielding accessor ABI

Since the non-underscored version of yielding accessors has not yet been shipped, maybe it’s not too late for this to make a difference…

Our tests of Swift’s yielding accessors in the Hylo project have shown them to have major performance problems. In particular, IIUC, the coroutines incur dynamic allocations. In the Hylo project we have worked out a compilation model that uses ordinary function calls instead of coroutines and supports non-lexical lifetimes. If this model is of any interest to the implementation team, I’m happy to share more.

Regards,
Dave

21 Likes

Hi Dave, thanks for starting a discussion about this. Although I’m far from an expert on coroutines, could you elaborate on the alternative implementation you had in mind? Also, how significant are the performance impacts you are seeing?

2 Likes

Joe tells me there is a new ABI for yielding accessors that’s in development which avoids these heap allocations, but it’s not enabled yet, so that might explain the overhead you’re seeing with the current implementation.

I’m always happy to see more discussion of implementation techniques here, so why not? ;)

11 Likes

The overhead can be pretty significant when one starts chaining projections (e.g., in expressions of the form `foo[a].bar = ham`) while crossing a resilience boundary.

Sadly I don’t have a reproducer handy and I think we fixed the most problematic use cases in the Hylo code base. What I remember from profiling data was that using `_read` or `_modify` would sometimes create needless dynamic allocations.

2 Likes

The scheme has a pretty simple basis, but well… you'll see. To set the stage, here are some notes about how Hylo differs from Swift so you can read the code.

  • In Hylo, `let` means “immutable borrow,” like what Swift calls `borrowing`.
  • Hylo uses the `let` and `inout` keywords for the equivalent of Swift’s `_read` and `_modify` accessors, respectively.
  • Hylo supports named `inout` and `let` bindings to projection results.
  • Hylo guarantees that lifetimes end at the first opportunity.
  • Pointers in Hylo are not unsafe; only access to their pointees is.

So, you can have code like:

fun f(x: Array<Int>, y: inout Array<Array<String>>) {
  // The next three lines are equivalent to y[1][x[1]] = "Hello World".
  let x1 = x[1]
  inout y1 = &y[1]
  y1[x1] = "Hello World"
  // lifetime of x1 projection ends here.
  print(y1)
  // lifetime of y1 projection ends here.
}

Array element access will be done with addressors for efficiency, but we need to support this pattern for yielding accessors, e.g. for types like Swift’s `Dictionary`, whose `_modify` has to construct an `Optional<Value>`, yield it, and then take action after the yield, based on the `Optional`'s value, to update the `Dictionary`. We call the accessor’s code before the yield its “ramp” and the code after the yield its “slide.”

The model starts with the realization that for most use cases an `inout` or `let` accessor can simply be transformed into a higher-order function, so internally you could rewrite:

subscript(j: Int): String {
  inout {
    var q: String = ramp_code(j)
    yield &q
    slide_code(q)
  }
  ...
}

into

fun subscript_inout(j: Int, access: (inout String)->Void) {
  var q: String = ramp_code(j)
  access(&q)
  slide_code(q)
}

The body of the accessor is just transplanted into the function, and the yield is replaced by the call to `access`. At the call site, you wrap the code that uses the yielded value in a closure and pass that to the `subscript_inout` function as its `access` parameter.

However, this doesn’t cover cases shaped like our first example, where the ramps and slides don’t nest: you have x1's ramp, y1's ramp, x1's slide, and finally y1's slide. To handle that, you need to change the transformation slightly:
(I hope I didn’t make too many errors in this, I didn’t pass it through a real compiler and this is the first time I’ve tried to put it in code.)

fun subscript_inout<Caller>(
  j: Int, 
  continue: InoutScopeContinuation<String, Caller>,
  caller_locals: Locals<Caller>
) {
  var q: String = ramp_code(j)

  continue(
    &q,
    fun (my_locals: SlideLocals)->Void {
      slide_code(&Locals<MyLocals>(my_locals).unsafe_pointee.q)
    },
    my_locals, // <=== synthesized pointer to local stack frame.
    caller_locals)
}

where:

/// Captured access to a function's local variables, where
/// `X` has the layout of its parameters and local variables.
type Locals<X> = MutablePointer<X>

/// Contextually defined in each function: the layout of its 
/// parameters and local variables.
type MyLocals = ...

/// Type erased `Locals<Slide>` where `Slide` is `MyLocals` for some
/// slide function.
type SlideLocals = RawPointer

/// Post-yield code of an accessor projecting `inout T`, taking that
/// accessor's type-erased `Locals`.
type InoutSlide = (SlideLocals)->Void

/// The rest of a caller up to the end of the enclosing block.
type InoutScopeContinuation<T, Caller> 
  = (projected: inout T, 
     slide: InoutSlide, slide_locals: SlideLocals,
     caller_locals: Locals<Caller>)->Void

The transformation in the caller is mostly implied by the names and comments above, but I’ll spell it out:

  • the remainder of the enclosing block containing the access is represented as an InoutScopeContinuation<String, MyLocals>
  • Since the caller knows exactly where the slide needs to execute, it calls the continuation’s slide parameter at that point, passing along the slide’s locals pointer.
  • If there is another projection inside the continuation, the transformation is repeated.
  • All the local variables stay accessible because the chain of continuations doesn’t return until the end of scope.
  • You don’t actually need to use the end of the scope in most cases; you can trivially compute where a chain of overlapping projections ends and return from continuations at that point.

That’s it; LMK if you have any questions. I’m interested to hear what @Joe_Groff may be doing in this space.

15 Likes

@Joe_Groff @slava Is any of this useful for Swift, or have you already been down this road and decided on something better?

3 Likes

@Slava_Pestov hey, gentle ping

That looks similar to the coroutine splitting transformation that we let LLVM do for us with the existing ABI.

The new ABI we're using takes advantage of the primary use of accessor coroutines still having stack discipline in a different way: the coroutine is allowed to use the callstack to store its state, and yield is implemented by "returning" but still leaving context unpopped on the callstack, avoiding the need for a predefined fixed allocation or a fallback to dynamic allocation. When the caller resumes the coroutine, it picks up with the context it left on the stack, and pops it when it makes its final return. Mechanically, the coroutine still gets split into two almost-traditional functions at the LLVM level, but with the trick that the first half gets to leave live values on the stack for the second half to pick up again.

The caller also has the ability to redirect context allocations to an async task's context when the caller is async, or, for future-proofing, it can still fall back to malloc in case we ever introduce a reified coroutine access type.

6 Likes