[Pitch] `buildPartialBlock` for result builders

rxwei · February 23, 2022, 4:46pm

Hi all,

While working on the RegexBuilder-based DSL with strongly-typed regex captures, we’ve come across an issue with where concatenating the component types in a result builder block as a tuple would involve a factorial explosion of buildBlock overloads and unreasonably long compilation time, and it isn't something that could be trivially resolved even if we had variadic generics.

So, I wrote a pitch to explain these limitations and propose an extension to result builders, buildPartialBlock, which allows a block to be built recursively from pairs. The primary goal of this feature is to allow generic types in a block to compose more easily with less (and sometimes far less) overloading. In addition to that, it could also be useful for avoiding type-erasing component types in some other use cases. I’m interested in hearing your thoughts!

`buildPartialBlock` for result builders

Proposal: SE-NNNN
Author: Richard Wei
Implementation: apple/swift#40799
- Available in trunk development snapshots with -Xfrontend -enable-experimental-pairwise-build-block. The required method name is buildBlock(combining:into:) which is currently outdated. See compiler tests for usage.
Review Manager: TBD
Status: Awaiting implementation

Overview

We introduce a new result builder customization point that allows components of a block to be combined pairwise.

@resultBuilder
enum Builder {
    /// Builds a partial result component from the first component.
    static func buildPartialBlock(first: Component) -> Component
    
    /// Builds a partial result component by combining an accumulated component
    /// and a new component.  
    /// - Parameter accumulated: A component representing the accumulated result
    ///   thus far. 
    /// - Parameter next: A component representing the next component after the
    ///   accumulated ones in the block.
    static func buildPartialBlock(accumulated: Component, next: Component) -> Component
}

When buildPartialBlock(first:) and buildPartialBlock(accumulated:next:) are both provided, the result builder transform will transform components in a block into a series of calls to buildPartialBlock, combining one subsequent line into the result at a time.

// Original
{
    expr1
    expr2
    expr3
}

// Transformed
// Note: `buildFinalResult` and `buildExpression` are called only when they are defined, just like how they behave today.
{
    let e1 = Builder.buildExpression(expr1)
    let e2 = Builder.buildExpression(expr2)
    let e3 = Builder.buildExpression(expr3)
    let v1 = Builder.buildPartialBlock(first: e1)
    let v2 = Builder.buildPartialBlock(accumulated: v1, next: Builder.buildExpression(e2))
    let v3 = Builder.buildPartialBlock(accumulated: v2, next: Builder.buildExpression(e3))
    return Builder.buildFinalResult(v3)
}

The primary goal of this feature is to reduce the code bloat casued by overloading buildBlock for multiple arities, allowing libraries to define builder-based generic DSLs with joy and ease.

Motivation

Among DSLs powered by result builders, it is a common pattern to combine values with generic types in a block to produce a new type that contains the generic parameters of the components. For example, ViewBuilder and SceneBuilder in SwiftUI use buildBlock to combine views and scenes without losing strong types.

extension SceneBuilder {
  static func buildBlock<Content>(Content) -> Content
  static func buildBlock<C0, C1>(_ c0: C0, _ c1: C1) -> some Scene where C0: Scene, C1: Scene
  ...
  static func buildBlock<C0, C1, C2, C3, C4, C5, C6, C7, C8, C9>(_ c0: C0, _ c1: C1, _ c2: C2, _ c3: C3, _ c4: C4, _ c5: C5, _ c6: C6, _ c7: C7, _ c8: C8, _ c9: C9) -> some Scene where C0: Scene, C1: Scene, C2: Scene, C3: Scene, C4: Scene, C5: Scene, C6: Scene, C7: Scene, C8: Scene, C9: Scene
}

Due to the lack of variadic generics, buildBlock needs to be overloaded for any supported block arity. This unfortunately increases code size, causes significant code bloat in the implementation and documentation, and it is often painful to write and maintain the boiletplate.

While this approach works for types like ViewBuilder and SceneBuilder, some builders need to define type combination rules that are far too complex to implement with overloads. One such example is RegexBuilder in Declarative String Processing.

The result builder syntax for regular expressions is designed to allow developers to easily compose regex patterns. Strongly typed captures are represented as part of the Match generic parameter in the Regex type, which has a builder-based initializer.

struct Regex<Match> {
   init(@RegexBuilder _ builder: () -> Self)
}

Recap: Regular expression capturing basics

When a regular expression does not contain any capturing groups, its Match type is Substring, which represents the whole matched portion of the input.

let noCaptures = #/a/# // => Regex<Substring>

When a regular expression contains capturing groups, i.e. (...), the Match type is extended as a tuple to also contain capture types. Capture types are tuple elements after the first element.

//                          ________________________________
//                       .0 |                           .0 |
//                  ___________________                _________
let yesCaptures = #/a(?:(b+)c(d+))+e(f)?/# // => Regex<(Substring, Substring, Substring, Substring?)>
//                      ---- ----   ---                            ---------  ---------  ----------
//                    .1 | .2 |   .3 |                              .1 |       .2 |       .3 |
//                       |    |      |                                 |          |          |
//                       |    |      |_______________________________  |  ______  |  ________|
//                       |    |                                        |          |
//                       |    |______________________________________  |  ______  |
//                       |                                             |
//                       |_____________________________________________|
//                                                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//                                                                          Capture types

Using the result builder syntax, the regular expression above becomes:

let regex = Regex {
    "a"                             // => Regex<Substring>
    oneOrMore {                     // {
        capture { oneOrMore("b") }  //     => Regex<(Substring, Substring)>
        "c"                         //     => Regex<Substring>
        capture { oneOrMore("d") }  //     => Regex<(Substring, Substring)>
    }                               // } => Regex<(Substring, Substring, Substring)>
    "e"                             // => Regex<Substring>
    optionally { capture("f") }     // => Regex<(Substring, Substring?)>
}                                   // => Regex<(Substring, Substring, Substring, Substring?)>
                                    //                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    //                               Capture types

let result = "abbcddbbcddef".firstMatch(of: regex)
// => MatchResult<(Substring, Substring, Substring, Substring?)>

RegexBuilder concatenates the capture types of all components as a flat tuple, forming a new Regex whose Match type is (Substring, CaptureType...). We can define the following RegexBuilder:

@resultBuilder
enum RegexBuilder {
    static func buildBlock() -> Regex<Substring>
    static func buildBlock<Match>(_: Regex<Match>) -> Regex<Match>
    // Overloads for non-tuples:
    static func buildBlock<WholeMatch0, WholeMatch1>(_: Regex<WholeMatch0>, _: Regex<WholeMatch1>) -> Regex<Substring>
    static func buildBlock<WholeMatch0, WholeMatch1, WholeMatch2>(_: Regex<WholeMatch0>, _: Regex<WholeMatch1>, _: Regex<WholeMatch2>) -> Regex<Substring>
    ...
    static func buildBlock<WholeMatch0, WholeMatch1, WholeMatch2, ..., WholeMatch10>(_: Regex<WholeMatch0>, _: Regex<WholeMatch1>, _: Regex<WholeMatch2>, ..., _: Regex<WholeMatch10>) -> Regex<Substring>
    // Overloads for tuples:
    static func buildBlock<W0, W1, C0>(_: Regex<(W0, C0)>, _: Regex<W1>) -> Regex<(Substring, C0)>
    static func buildBlock<W0, W1, C0>(_: Regex<W0>, _: Regex<(W1, C0)>) -> Regex<(Substring, C0)>
    static func buildBlock<W0, W1, C0, C1>(_: Regex<(W0, C0, C1)>, _: Regex<W1>) -> Regex<(Substring, C0, C1)>
    static func buildBlock<W0, W1, C0, C1>(_: Regex<(W0, C0)>, _: Regex<(W1, C1)>) -> Regex<(Substring, C0, C1)>
    static func buildBlock<W0, W1, C0, C1>(_: Regex<W0>, _: Regex<(W1, C0, C1)>) -> Regex<(Substring, C0, C1)>
    ...
    static func buildBlock<W0, W1, W2, W3, W4, C0, C1, C2, C3, C4, C5, C6>(
       _: Regex<(W0, C0, C1)>, _: Regex<(W1, C2)>, _: Regex<(W3, C3, C4, C5)>, _: Regex<(W4, C6)>
    ) -> Regex<(Substring, C0, C1, C2, C3, C4, C5, C6)>
    ...
}

Here we just need to overload for all tuple combinations for each arity... Oh my! That is an O(arity!) combinatorial explosion of buildBlock overloads; compiling these methods alone could take hours.

Proposed solution

This proposal introduces a new block-building approach similar to building heterogeneous lists. Instead of calling a single method to build an entire block wholesale, this approach recursively builds a partial block by taking one new component at a time, and thus significantly reduces the number of overloads.

We introduce a new customization point to result builders via two user-defined static methods:

@resultBuilder
enum Builder {
    static func buildPartialBlock(first: Component) -> Component
    static func buildPartialBlock(accumulated: Component, next: Component) -> Component
}

When buildPartialBlock(first:) and buildPartialBlock(accumulated:next:) are both defined, the result builder transform will turn components in a block into a series of calls to buildPartialBlock, combining components from top to bottom.

With this approach, many result builder types with overloaded buildBlock can be simplified. For example, the buildBlock overloads in SwiftUI's SceneBuilder could be simplified as the following:

extension SceneBuilder {
    static func buildPartialBlock(first: some Scene) -> some Scene 
    static func buildPartialBlock(accumulated: some Scene, next: some Scene) -> some Scene 
}

Similarly, the overloads of buildBlock in RegexBuilder can be vastly reduced from O(arity!), down to O(arity^2) overloads of buildPartialBlock(accumulated:next:). For an arity of 10, 100 overloads are trivial compared to over 3 million ones.

extension RegexBuilder {
    static func buildPartialBlock<M>(_ regex: Regex<M>) -> Regex<M>
    static func buildPartialBlock<W0, W1>(accumulated: Regex<W0>, next: Regex<W1>) -> Regex<Substring>
    static func buildPartialBlock<W0, W1, C0>(accumulated: Regex<(W0, C0)>, next: Regex<W1>) -> Regex<(Substring, C0)>
    static func buildPartialBlock<W0, W1, C0>(accumulated: Regex<W0>, next: Regex<(W1, C0)>) -> Regex<(Substring, C0)>
    static func buildPartialBlock<W0, W1, C0, C1>(accumulated: Regex<W0>, next: Regex<(W1, C0, C1)>) -> Regex<(Substring, C0, C1)>
    static func buildPartialBlock<W0, W1, C0, C1>(accumulated: Regex<(W0, C0, C1)>, next: Regex<W1>) -> Regex<(Substring, C0, C1)>
    static func buildPartialBlock<W0, W1, C0, C1>(accumulated: Regex<(W0, C0)>, next: Regex<(W1, C1)>) -> Regex<(Substring, C0, C1)>
    ...
}

Detailed design

When a type is marked with @resultBuilder, the type was previously required to have at least one static buildBlock method. With this proposal, such a type is now required to have either at least one static buildBlock method, or both buildPartialBlock(first:) and buildPartialBlock(accumulated:next:).

In the result builder transform, the compiler will look for static members buildPartialBlock(first:) and buildPartialBlock(accumulated:next:) in the builder type. If the following conditions are met:

Both methods buildPartialBlock(first:) and buildPartialBlock(accumulated:next:) exist.
The availability of the enclosing declaration is greater than or equal to the availability of buildPartialBlock(first:) and buildPartialBlock(accumulated:next:).

Then, a non-empty block will be transformed to the following:

// Original
{
    expr1
    expr2
    expr3
}

// Transformed
// Note: `buildFinalResult` and `buildExpression` are called only when they are defined, just like how they behave today.
{
    let e1 = Builder.buildExpression(expr1)
    let e2 = Builder.buildExpression(expr2)
    let e3 = Builder.buildExpression(expr3)
    let v1 = Builder.buildPartialBlock(first: e1)
    let v2 = Builder.buildPartialBlock(accumulated: v1, next: e2)
    let v3 = Builder.buildPartialBlock(accumulated: v2, next: e3)
    return Builder.buildFinalResult(v3)
}

Otherwise, the result builder transform will transform the block to call buildBlock instead as proposed in SE-0289.

Source compatibility

This proposal does not intend to introduce source-breaking changes. Although, if an existing result builder type happens to have static methods named buildPartialBlock(first:) and buildPartialBlock(accumulated:next:), the result builder transform will be creating calls to those methods instead and may cause errors depending on buildPartialBlock's type signature and implementation. Nevertheless, such cases should be extremely rare.

Effect on ABI stability

This proposal does not contain ABI changes.

Effect on API resilience

This proposal does not contain API changes.

Alternatives considered

Prefer viable `buildBlock` overloads to `buildPartialBlock`

As proposed, the result builder transform will always prefer buildPartialBlock to buildBlock when they are both defined. One could argue that making a single call to a viable overload of buildBlock would be more efficient and more customizable. However, because the result builder transform currently operates before type inference has completed, it would increase the type checking complexity to decide to call buildBlock or buildPairwiseBlock based on argument types. None of the informal requirements (buildBlock, buildOptional, etc) of result builders depend on argument types when being transformed to by the result builder transform.

Use nullary `buildPartialBlock()` to form the initial value

As proposed, the result builder transform calls unary buildPartialBlock(_:) on the first component in a block before calling buildPartialBlock(accumulated:next:) on the rest. While it is possible to require the user to define a nullary buildPartialBlock() method to form the initial result, this behavior may be suboptimal for result builders that do not intend to support empty blocks, e.g. SwiftUI's SceneBuilder. Plus, the proposed approach does allow the user to define an nullary buildBlock() to support building an empty block.

Rely on variadic generics

It can be argued that variadic generics would resolve the motivations presented. However, to achieve the concatenating behavior needed for RegexBuilder, we would need to be able to express nested type sequences, perform collection-like transformations on generic parameter packs such as dropping elements and splatting.

extension RegexBuilder {
  static func buildBlock<(W, (C...))..., R...>(_ components: Regex<(W, C...)>) -> Regex<(Substring, (R.Match.dropFirst()...).splat())>
}

Such features would greatly complicate the type system.

Alternative names

Overload `buildBlock` method name

Because the proposed feature overlaps buildBlock, one could argue for reusing buildBlock as the method base name instead of buildPartialBlock and using arguemnt labels to distinguish whether it is the pairwise version, e.g. buildBlock(partiallyAccumulated:next:) or buildBlock(combining:into:).

extension Builder {
  static func buildBlock(_: Component) -> Component
  static func buildBlock(partiallyAccumulated: Component, next: Component) -> Component
}

However, the phrase "build block" does not have a clear indication that the method is in fact building a partial block, and argument labels do not have the prominence to carry such indication.

Use `buildBlock(_:)` instead of `buildPartialBlock(first:)`

The unary base case method buildPartialBlock(first:) and buildBlock(_:) can be viewed as being functionally equivalent, so one could argue for reusing buildBlock. However, as mentioned in Overload buildBlock method name, the phrase "build block" lacks clarity.

A more important reason is that buildPartialBlock(first:), especially with its argument label first:, leaves space for a customization point where the developer can specify the direction of combination. As a future direction, we could allow buildPartialBlock(last:) to be defined instead of buildPartialBlock(first:), and in this scenario the result builder transform will first call buildPartialBlock(last:) on the last component and then call buildPartialBlock(accumulated:next:) on each preceeding component.

Different argument labels

The proposed argument labels accumulated: and next: took inspirations from some precedents in the standard library:

"accumulated" is used as an argument name of Array.reduce(into:_:).
"next" is used as an argument name of Array.reduce(_:_:)

Meanwhile, there are a number of alternative argument labels considered in the place of accumulated: and next:.

Possible replacements for accumulated::

partialResult:
existing:
upper:
_:

Possible replacements for next::

new:
_:

We believe that "accumulated" and "next" have the best overall clarity.

filip-sakel · February 23, 2022, 5:03pm

While the proposal seems pretty straightforward, I think the motivation targets a very specific use case. The initial result-builder proposal laid out a model of when to use what that method, making all build methods fit together organically; this addition feels more like an add-on rather than a missing capability.

Philippe_Hausler · February 23, 2022, 10:30pm

I just had an opportunity to use this feature in some sketch code. This works amazingly well! TBH I think the pitch undersells how powerful this really is; particularly the type of the first and the accumulation can vary - which means that if you can specify via types the only proper build sequences you can end up building result builders that enforce specific constitutions of varying components.

10s across the board. This is a cool feature.

tgrapperon · February 24, 2022, 1:55am

This new result builder method would be invaluable as soon as we have to handle heterogenous types and we want to perform some logic over them at the builder level.

In swift-parsing, some approach similar to the String processing experiment is used, with the same shortcomings. Many interesting functionalities already had to be pruned to preserve acceptable build-times/binary-size. These new methods would allow to implement some of them in a few lines of code, instead of having to generate a new set of ad-hoc overloads. It would also allow to implement fine-grained behaviors/simplifications at the builder level which could transitively transpose to the types that are using a builder to define one of their generic properties.

The "reducing" approach is simple and fundamental. I don't see how to make something else as general as this solution for combining blocks.

Overall, this will make builders even more powerful and thus fun to use.

So I'm definitely favorable to the feature, whatever it is named at the end. I'm personally pleased by the current choice.

masters3d · February 24, 2022, 2:05am

+1 I like it.

Would you consider buildBlockPartial to preserve the root name buildBlock?

ksluder · February 24, 2022, 2:46am

As worthwhile as first class regular expressions are, this addition feels like another step towards accidentally introducing a macro system into Swift. This pitch particularly reminds me of some functional programming formalizations of parsing.

I recall that during the review for result builders, the question was asked whether we should be focusing on a hygienic macro system instead. Builders were pitched as a less complicated system. We are now seeing the consequences of those simplifying limitations. Should we perhaps take the opportunity to step back and re-evaluate whether it should be individual features or forethought that drives the evolution of Swift’s metaprogramming story?

rxwei · February 24, 2022, 8:24am

I don't feel strongly about the proposed name, but I think buildPartialBlock is consistent with most existing builder methods where the base name is a verb phrase composed of "build" and a noun phrase, e.g. "build expression", "build array", "build optional" (where "optional" refers to the Optional type), "build final result". buildBlockPartial, in comparison, reads less fluent than existing builder methods since "block partial" isn't a noun phrase.

stephencelis · March 1, 2022, 6:59pm

While there isn't a ton of discussion here, we're eagerly awaiting for this to land, even if our use case is similar to the work being done with string processing.

Tyler-Keith-Thompson · March 8, 2022, 2:26am

+1 to this idea. This could be very useful for SwiftCurrent. As we build out rather complex workflows with generics.

stephencelis · March 18, 2022, 6:15pm

Just wanted to share our experience taking this for a spin in swift-parsing. This feature is great!

A few fun observations:

We were able to delete over 21K lines of generated code.

We were able to improve the arity of concatenating parsers from 6 total per block to 10+, where 10 is the number of non-Void parsers, and + is an unlimited number of Void parsers throughout.

Expand for example…

Previously, we were limited to 6 parsers total in a builder block:

let x = Parse {
  "hello"
  "world"
  Int.parser()
  "hello"
  "world"
  Int.parser()
}

Because Void outputs are automatically discarded, we could nest to get things further along, but it makes things much less readable:

let x = Parse {
  Parse {
    "hello"
    "world"
    Int.parser()
    "hello"
    "world"
  }
  Parse {
    Int.parser()
    "hello"
    "world"
  }
  Parse {
    Int.parser()
    "hello"
    "world"
  }
  Parse {
    Int.parser()
    "hello"
    "world"
  }
  Parse {
    Int.parser()
    "hello"
    "world"
  }
  Parse {
    Int.parser()
    "hello"
    "world"
  }
}

Using buildPartialBlock, we can now do this:

let x = Parse {
  "hello"
  "world"
  Int.parser()
  "hello"
  "world"
  Int.parser()
  "hello"
  "world"
  Int.parser()
  "hello"
  "world"
  Int.parser()
  "hello"
  "world"
  Int.parser()
  "hello"
  "world"
  Int.parser()
  "hello"
  "world"
  Int.parser()
  "hello"
  "world"
  Int.parser()
  "hello"
  "world"
  Int.parser()
  "hello"
  "world"
   Int.parser()
  "hello"
  "world"
}

We were able to improve the arity of alternating parsers from 10 to unlimited.
Compile times also improved significantly (from 20 seconds to <2 seconds in debug mode).

ben-cohen · March 22, 2022, 6:43pm

Hi everyone,

The review for this proposal is now open here.

iampatbrown · March 23, 2022, 7:47am

This is incredible. When I first saw +571 −21,855 I thought you were having a breakdown!

Wasn't expecting to see such a significant improvement to compile times either :)

Looking forward to this one!

technogen · May 21, 2022, 8:32pm

I'm really looking forward to having this in the language! Are there any plans to submit this proposal to review?

xwu · May 21, 2022, 8:47pm

See:

technogen · May 21, 2022, 8:48pm

I must've missed it when looking through the proposal list. Thank you!