[Pitch] Yield-once functions & first-class coroutines

asl · January 8, 2025, 11:57pm

A recent pitch and the corresponding proposal mention that future enhancements might include yield-once functions, as functions currently do not compose well with coroutine accessors.

It seems that we really need this feature to enable automatic differentiation of coroutine accessors.

First of all, let me describe why functions do not compose well with coroutines. In fact, the situation is backwards: coroutines do not exist in the Swift AST at all. They are generated directly into SIL from coroutine accessors (_modify and _read). As a result, in the following code snippet

struct S {
  private var _x: Float

  var x: Float {
    get { _x }
    _modify { yield &_x }
  }
}

The AST type of S.x._modify is essentially (S) -> () -> ().
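To make the coroutine behavior of such an accessor visible, here is a minimal sketch that should run in current Swift. The `Logged` type and its `log` property are hypothetical names added purely for illustration; `_modify` is the same underscored accessor spelling used in the snippet above.

```swift
// Hypothetical illustration type; not part of the pitch.
struct Logged {
  private var _x: Float = 0
  // Records what the accessor does around the yield.
  private(set) var log: [String] = []

  var x: Float {
    get { _x }
    _modify {
      log.append("before yield")
      yield &_x  // the caller mutates _x in place while suspended here
      log.append("after yield")
    }
  }
}

var s = Logged()
s.x += 1  // runs the code before the yield, the mutation, then the code after
print(s.x, s.log)
```

The accessor body is split in two by the `yield`, which is exactly the shape that an ordinary function type like (S) -> () -> () cannot express.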

What does this mean? Essentially, coroutines are not first-class objects: one cannot use them where a proper AST type is required. They cannot be returned from functions, stored in tuples, or used as enum associated values, among many other things. Also, wherever some kind of AST type is required, the compiler currently has lots of special-case code for accessors.

Why do we need them for autodiff then? The automatic differentiation is essentially based on 3 components:

Differentiable protocol & corresponding conformances
Compiler transformations
All kinds of custom derivatives available from _Differentiable module (or provided by a user).

In particular, we'd like to be able to synthesize derivatives for functions that might use such standard library collections as Array or Dictionary. And here we might easily get calls to coroutine accessors as, for example, Array.subscript._modify could be called even for code as simple as a[0] *= x or even a[0] = x.

As I already mentioned above, the autodiff relies on the possibility to register custom derivatives for the operations in order for compiler transforms to work. However here we are seeing some problems.

The reverse-mode derivative of a function normally returns a pair: the function result and a so-called pullback function (the "derivative" itself). However, since Array.subscript._modify is a coroutine, its reverse-mode derivative should also be a coroutine in order to yield the corresponding value. Moreover, it should return (not yield) the pullback, which must itself be a coroutine. As one can imagine, there is currently no way to define coroutines outside of a coroutine accessor context, and this certainly calls for some compiler extensions.
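For readers less familiar with the "result plus pullback" shape, here is a hand-written sketch for an ordinary (non-coroutine) function. It uses plain Swift with no autodiff machinery; `vjpSquare` is a hypothetical name chosen for illustration.

```swift
// Hand-written reverse-mode derivative (VJP) of f(x) = x * x for Float.
// `vjpSquare` is a hypothetical name; nothing here uses the compiler's autodiff support.
func vjpSquare(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
  // The pullback captures x and maps a tangent of the result
  // back to a tangent of the argument: v -> 2 * x * v.
  return (value: x * x, pullback: { v in 2 * x * v })
}

let (value, pullback) = vjpSquare(3)
print(value)        // 9.0
print(pullback(1))  // 6.0
```

For a coroutine accessor, the `value` part would have to be yielded rather than returned, which is what an ordinary function like this one cannot do.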

Before we proceed, I must note two things:

Certainly, there is a way to work around the inability to define derivatives for coroutine accessors. See e.g. [SR-14113] Support `_read` and `_modify` accessor differentiation · Issue #54401 · swiftlang/swift · GitHub. However, this imposes a severe performance cost and also requires switching from subscripts to other functions.
We only need to define coroutines. We do not need to call them. The compiler would synthesize the calls for them in autodiff transformations. This simplifies lots of things as we do not need to think about possible ownership complications across coroutines boundaries from Swift language perspective.

I've been working on coroutine AST support and the corresponding autodiff bits for some time and will soon submit a PR with a proof-of-concept implementation. As this feature is mostly intended to be used inside the standard library, I decided to make some simplifications that could be refined further if needed:

Coroutines are introduced via a special @yield_once attribute. We already had it, but it was SIL-only, so it was "promoted" to a general attribute. It is now also possible to use this attribute on types to declare coroutine function types.
For SIL functions, yields are kept separate from results and parameters. We do not have such luxury at the AST level, and I decided not to introduce it in order not to complicate the existing code. Instead, yields are represented as yield results via a special @yields attribute (again, we already have it at the SIL level, so there is nothing new; the attribute was just promoted). Yields are then represented by a special YieldResultType AST node, so they can be represented in the compiler type system and handled accordingly. The node also records whether we are yielding a value or an address (that is, whether we have a value yield or an inout yield).
I would expect that normally functions would have either normal results or yields, but the cases when we'd need both yields and normal results would be extremely rare. Probably non-existent practically outside of autodiff context where we'd need to return pullback closure from the reverse-mode derivative. Still this corner case is supported as well, the function is supposed to "return" a tuple containing both @yield result and normal result.

Some internals & known issues worth mentioning:

Depending on the situation, we might need to deal with yields as parameters (e.g. when they need to be reabstracted as parameters in the context of a caller), as results, or separately. Some additional helpers were added (e.g. to get only the result type, the "full result" including yields, just the yields, etc.)
Some special cases for coroutine accessors were removed. More to follow
We ran out of ExtBits. For now I just took the single bit required to mark coroutines from the number-of-arguments bitfield, so the maximum number of function arguments is essentially reduced from 64k to 32k. This likely should be resolved one way or another
All coroutine accessors now have AST types. I hope that this will not affect anything ABI-related, but it is certainly possible that I missed some important cases.
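On the ExtBits point in the list above, the arithmetic behind "64k down to 32k" can be sketched as follows. The 16-bit field width is an assumption inferred from the 64k figure, and the constant names are hypothetical; this does not reproduce the compiler's actual bitfield declaration.

```swift
// Hypothetical layout: a 16-bit field holding the parameter count.
let paramBitsBefore = 16
let paramBitsAfter = paramBitsBefore - 1  // one bit repurposed as the coroutine flag

// Largest count representable in an n-bit unsigned field is 2^n - 1.
let maxParamsBefore = (1 << paramBitsBefore) - 1
let maxParamsAfter = (1 << paramBitsAfter) - 1
print(maxParamsBefore, maxParamsAfter)  // 65535 32767
```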

To conclude, here is an example of reverse-mode derivative for Array.subscript._modify that showcases the functionality:

extension Array where Element: Differentiable {
  @inlinable
  @derivative(of: subscript._modify)
  @yield_once
  mutating func _vjpModify(index: Int) -> (
    value: inout @yields Element, pullback: @yield_once (inout TangentVector) -> inout @yields Element.TangentVector
  ) {
    yield &self[index]
    @yield_once
    func pullback(_ v: inout TangentVector) -> inout @yields Element.TangentVector {
      yield &v[index]
    }
    return pullback
  }
}

This certainly is not intended for end-user consumption (mostly for standard library) and does not constitute a proper language feature. But hopefully it would provide some basic functionality that could eventually end with something usable :)

Joe_Groff · January 9, 2025, 3:30am

Can the derivative of a property or subscript be modeled as a property or subscript itself? You might be able to then attach parallel accessors to the derivative declaration to match the original without additional language features.

extension Array where Element: Differentiable {
  @derivative(of: subscript(_:))
  subscript(...) -> Element.TangentVector {
    _modify { }
  }
}

Does autodifferentiation benefit in practice from the optimization of using modify accessors? Would it be more straightforward to differentiate using the getter and setter instead?

asl · January 9, 2025, 3:37am

Unfortunately, not, or at least I do not see how:

The pullback is a closure that potentially captures some values. In my example above it is index that is captured. But in general it could capture other intermediate values produced during the normal function invocation.
A pullback is a reverse map: it takes a tangent vector of the original result and produces a tangent vector of the original parameter (in the Array case it is a map (Element.TangentVector) -> Array<Element>.TangentVector).
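Both points can be sketched in plain Swift for a simple element read. This is only an illustration under simplifying assumptions: the tangent of `[Float]` is modeled as a plain `[Float]` stand-in rather than the real `Array.TangentVector`, and `vjpElement` is a hypothetical name.

```swift
// Stand-in model: tangent of Float is Float, tangent of [Float] is [Float].
// `vjpElement` is a hypothetical, hand-written derivative of reading array[index].
func vjpElement(_ array: [Float], at index: Int)
  -> (value: Float, pullback: (Float) -> [Float]) {
  let count = array.count
  return (
    value: array[index],
    pullback: { v in
      // The closure captures `index` and `count` from the forward pass.
      var tangent = [Float](repeating: 0, count: count)
      tangent[index] = v
      return tangent
    }
  )
}

let (element, elementPullback) = vjpElement([10, 20, 30], at: 1)
print(element)             // 20.0
print(elementPullback(1))  // [0.0, 1.0, 0.0]
```

The captured state is what makes a plain accessor declaration a poor fit: the pullback is a value produced by one particular forward call, not a standalone member of the type.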

Well, it takes whatever code is produced by the compiler. The benefits of using modify accessors in autodiff are pretty much the same as using them elsewhere. Also, since we can retroactively attach derivatives to functions (including ones in the standard library), selectively inhibiting generation of modify accessors for differentiable functions likely won't work.

And we already differentiate getters and setters if calls to them are emitted, yes.

Essentially, the existing implementation tries to emulate coroutines using closures and inouts, but it is pretty expensive performance-wise: swift-differentiation/Sources/Differentiation/Array+Update.swift at main · differentiable-swift/swift-differentiation · GitHub, plus it requires source code changes (from subscript to function updated)

SlugFiller · January 9, 2025, 6:29pm

Sorry if this was already asked, or if the answer is obvious, but, instead of doing a yield-once, wouldn't it be better to return a non-copyable borrowed view of the property, and perform the post-access stuff in the deinit of the view?

I'm not sure if the current lifetime tracking can do this (i.e. borrowed view that blocks access to the parent while in scope). But if it doesn't, then making sure that it does is at least equally worth chasing, if not more, than a single-yield function with complex semantics and limitations.

asl · January 9, 2025, 6:37pm

Not sure I follow. How would we make deinit return some final value, as in the example above?

asl · January 9, 2025, 6:44pm

To make things a bit explicit: I'm not suggesting an entirely new language feature, it is much more complicated (e.g. how one would "call" a coroutine).

I'm suggesting two things:

Fix AST representation of existing coroutines (in the form of coroutine accessors) to be explicit in compiler AST
Expose a couple of attributes (that already exist at the SIL level) to allow their usage within well-controlled (standard library) code. This would allow the implementation of certain features and hopefully provide a baseline for future language development, if that is determined to be a good idea (as mentioned in the cited accessor pitches)
Use this in autodiff as a nice side effect, ensuring that the code does not rot and is covered by some in-tree usage :)

SlugFiller · January 9, 2025, 7:41pm

I'm not sure I fully follow the example. But what I'm saying is to remodel Array.subscript.modify itself, so it's no longer a coroutine. Suppose instead of being a coroutine temporarily returning an inout variable, Array.subscript.modify returned a non-copyable view, which is functionally equivalent to the inout variable, besides having clear scoping and borrowed lifetime, instead of relying on a coroutine in order to scope itself.

How would you approach differentiating such a method? i.e. How do you generally differentiate something that returns non-copyable types? Would that still require a new language feature?

Alternately, if modify was implemented like a with-style function, where the inout variable is scoped to a closure, how would you differentiate that?
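A with-style accessor of the kind described here could look roughly like the sketch below. `withElement(at:_:)` is a hypothetical name and not an existing standard library API.

```swift
extension Array {
  // Hypothetical with-style accessor: the inout access is scoped to `body`.
  mutating func withElement<R>(at index: Int, _ body: (inout Element) -> R) -> R {
    // Code placed here would run before the access...
    let result = body(&self[index])
    // ...and code placed here after it, like the two halves of a _modify.
    return result
  }
}

var numbers: [Float] = [1, 2, 3]
numbers.withElement(at: 0) { $0 *= 10 }
print(numbers)  // [10.0, 2.0, 3.0]
```

The call site has to change from subscript syntax to a method call with a closure, which is the kind of source change the question proposes to hide under the hood.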

To be clear: I'm not suggesting to change the way a modify accessor is written. Just the way it works under the hood. In a way that, after an optimization step, the resulting assembly would be roughly equivalent.

asl · January 9, 2025, 8:26pm

Ah, I see. Well, in a pullback we'd need to take a tangent vector of Element (i.e. Element.TangentVector) and produce a tangent vector of the whole Array. The potential problem I'm seeing is that we'd need to differentiate through such a view, so we'd need to represent it somehow. Likely we'd end up with similar problems as with Optional, where Optional<T>.TangentVector is not Optional<T.TangentVector>, so we'd need to special-case the conversions here and there.

I am also not sure about performance implications: my understanding is that coroutine accessors were introduced to remove extra copies. A view, however, would be a proper separate object that could incur additional overhead.

SlugFiller · January 9, 2025, 8:38pm

I'm not sure how discrete changes are modelled, but it seems logical that if an option turns from a value to nil, that this would still have a tangent that is not, itself, nil. So it stands to reason that the tangent itself would not be optional. But yeah, I can imagine how this could make things harder to compose or wrap in larger data structures.

Done correctly, it should be a zero cost abstraction. i.e. It would only incur an additional overhead in debug, but should be completely optimized away in release, to the point where the end result is the same as with using a coroutine.

In theory, at least. I'm not an expert on the compilation and optimization process. So someone with more thorough knowledge would have to chime in.