[Pitch #2] Function builders

hisekaldma · August 19, 2020, 5:13am

Perhaps the name of the feature should capture this aspect somehow? The name function builder suggests that they let you build arbitrary functions, but that’s not what they do – they let you build tree structures using functions. If the feature was named tree builder, list builder, schema builder or something similar, the intent would be much clearer.

(Edit: I see that @davedelong made essentially the same point earlier. So ditto that.)

beccadax · August 19, 2020, 5:16am

(I started writing this on Saturday, and although I’ve tried to follow the discussion since, I probably haven’t read everything. Sorry if this duplicates anything previously discussed.)

Some comments directly on the pitch:

Variation in the DSL’s dialect

The contents of a function builder are a distinct dialect of Swift. Some features, like variable declarations, work basically as usual; others, like expression statements, are slightly modified; still others, like catch blocks, are banned.

A few constructs, like if, for, and do, are banned by default unless the function builder declares support for them. I am a bit skeptical of this last category. Function builders are sort of an inherently confusing feature: you are putting the language into a special mode where it behaves differently from usual. That’s bad enough, but these knobs introduce additional variation in capabilities, reducing user confidence in their behavior. It seems like they ought to require additional justification.

(If this seems like I’m making a mountain out of a molehill—there are only three of these, after all—it’s because I have one eye on the future evolution of function builders. I could easily imagine there being a dozen of these things controlling break and while and catch and all sorts of other language features, and I could easily imagine many function builders not supporting some of these features purely by accident—because a developer forgot to turn something on or because they were designed before a particular feature had a build function. I’d like to establish a design direction at the outset that would avert this.)

There’s at least one obvious good reason for for to be controllable: a DSL which provides its own loop-like constructs, like SwiftUI.ForEach, may want to disable for to avoid it being an attractive nuisance. I can certainly imagine DSLs which would have similar alternatives to if and would want to disable the built-in one for the same reason. I’d like to see the proposal explore this question, but I think there’s probably adequate motivation for these.

But why do? With catch blocks disabled, do’s only effect is to limit the scope of variables. I can’t imagine a reason why a function builder might want to prevent you from using it. On that basis, I think do should be supported in function builders all of the time, not just when buildDo is implemented.

(In fact, I’m not sure we really need buildDo at all. buildOptional, buildEither, and buildArray are all necessary because they reflect different patterns of control flow, but do doesn’t actually affect control flow. Instead, we could either represent do with a bare buildBlock or inline its values into the parent block’s.)

`buildDo`’s arity

Even if we keep buildDo, though, I think it should be a unary function with a buildBlock call nested inside it, rather than directly taking the values generated by its block.

Blocks can have more than one statement, so buildBlock needs to take a variable number of parameters. This means that type-preserving function builders (as opposed to ones which erase all inner types) need to provide many overloads of buildBlock with different arities (at least until variadic generics arrive). This is unfortunate, but necessary for now.
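A minimal sketch of the arity problem described above. It assumes a toolchain where the feature is spelled `@resultBuilder` (Swift 5.4 and later); `Pair`, `Triple`, `PreservingBuilder`, and `preserve` are hypothetical names made up for illustration.

```swift
// Hypothetical wrapper types that keep each child's static type.
struct Pair<A, B> { let first: A; let second: B }
struct Triple<A, B, C> { let first: A; let second: B; let third: C }

@resultBuilder
enum PreservingBuilder {
    // One overload per supported arity. A block with four statements
    // would not compile, because no four-parameter overload exists.
    static func buildBlock<A>(_ a: A) -> A { a }
    static func buildBlock<A, B>(_ a: A, _ b: B) -> Pair<A, B> {
        Pair(first: a, second: b)
    }
    static func buildBlock<A, B, C>(_ a: A, _ b: B, _ c: C) -> Triple<A, B, C> {
        Triple(first: a, second: b, third: c)
    }
}

// Hypothetical entry point that returns whatever type the builder produces.
func preserve<T>(@PreservingBuilder _ body: () -> T) -> T { body() }

let two = preserve {
    "label"
    42
}
// The static type of `two` is Pair<String, Int>: both child types survive.
```

A type-erasing builder avoids the overloads by declaring a single variadic `buildBlock`, at the cost of forcing every statement to one common type.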

But the proposed design puts buildDo in the same boat for no apparent benefit. If we instead had buildDo take one parameter and nested a call to buildBlock inside buildDo, there would be no need to complicate buildDo to support multiple arities.

This would also make buildDo more similar to buildOptional, buildEither, and buildArray, which all use buildBlock to group together the values they’re passed. I think “Groups of statements are always passed through buildBlock” would create a simpler mental model for implementors of function builders—see a scope, mentally translate it into a buildBlock.

So if buildDo does survive, I think it should take one parameter and it should be expected that a buildBlock call will produce that parameter.

`buildOptional` and `buildEither`

I don’t like the redundancy of buildOptional and buildEither.

Broadly, I think of function builders as falling into two categories:

Type-erasing function builders, which are just trying to collect values from a computation to marshal to some common type. HTMLBuilder is an example of a type-erasing function builder.
Type-preserving function builders, which are trying to capture not only the values produced by that computation, but also the domain of possible values it could produce in the form of a complicated generic type. SwiftUI.ViewBuilder is an example of a type-preserving function builder.

Type-preserving function builders are probably almost always going to want to capture the distinction between an if/else and two adjacent if statements. So they will always want to implement buildEither; buildOptional is an unwelcome complication. As far as these are concerned, we could treat an else-less if as having an empty else block, represented as B.buildEither(second: B.buildBlock()).

(If SwiftUI.ViewBuilder wants to keep using Optional for one-way branches, it could provide a second set of buildEither overloads where second: EmptyView.)

So that means buildOptional is mainly a convenience for type-erasing function builders. But it’s only going to save one method declaration over a world with no buildOptional; if we instead provided a fallback from buildEither to buildArray, that would save three declarations with no loss of functionality.

So I don’t think buildOptional carries its weight.
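To make the distinction concrete, here is a small sketch of a type-preserving builder that implements both families of methods. It assumes the `@resultBuilder` spelling (Swift 5.4 and later); `Either`, `BranchBuilder`, and `build` are hypothetical names, not part of the pitch.

```swift
// Hypothetical two-way sum type, playing the role of a conditional component.
enum Either<First, Second> {
    case first(First)
    case second(Second)
}

@resultBuilder
enum BranchBuilder {
    static func buildBlock<C>(_ component: C) -> C { component }

    // if/else: both branch types are captured in the result type.
    static func buildEither<F, S>(first component: F) -> Either<F, S> { .first(component) }
    static func buildEither<F, S>(second component: S) -> Either<F, S> { .second(component) }

    // else-less if: the missing branch is modelled as nil.
    static func buildOptional<C>(_ component: C?) -> C? { component }
}

func build<T>(@BranchBuilder _ body: () -> T) -> T { body() }

let useText = true

let twoWay = build {
    if useText { "text" } else { 42 }
}
// Static type: Either<String, Int>

let oneWay = build {
    if useText { "text" }
}
// Static type: String? (that is, Optional<String>)
```

Under the alternative described in the post, the second closure would instead go through `buildEither(second:)` with the result of an empty `buildBlock()`, so a builder like this one would not need `buildOptional` at all.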

Ad-hoc method patterns

Generally, the patterns of calls inserted by function builders feel ad-hoc. Reading the proposal alone didn’t make me feel confident that I fully understand which calls will be inserted into a function where; I’ve had to experiment to get a feel for it.

This problem is exacerbated, of course, by the fact that we can’t print ASTs to show the generated calls and declarations. In evaluating this proposal, I found it helpful to write a function builder which records the calls used to produce its result. We might want to consider including something similar in the standard library or as a public package.
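A builder that records its own calls can be sketched in a few lines. This is not the builder used for the post, only an illustration of the idea; it assumes the `@resultBuilder` spelling (Swift 5.4 and later), and `TraceBuilder` and `trace` are hypothetical names.

```swift
@resultBuilder
enum TraceBuilder {
    // Every method returns a string describing the call that produced it.
    static func buildExpression(_ value: Int) -> String {
        "expr(\(value))"
    }
    static func buildBlock(_ parts: String...) -> String {
        "block(" + parts.joined(separator: ", ") + ")"
    }
    static func buildOptional(_ part: String?) -> String {
        "optional(" + (part ?? "nil") + ")"
    }
    static func buildEither(first part: String) -> String {
        "first(" + part + ")"
    }
    static func buildEither(second part: String) -> String {
        "second(" + part + ")"
    }
    static func buildArray(_ parts: [String]) -> String {
        "array(" + parts.joined(separator: ", ") + ")"
    }
}

func trace(@TraceBuilder _ body: () -> String) -> String { body() }

let verbose = false

let recorded = trace {
    1
    if verbose { 2 }
    if verbose { 3 } else { 4 }
    for i in 5...6 { i }
}
// recorded == "block(expr(1), optional(nil), second(block(expr(4))), array(block(expr(5)), block(expr(6))))"
```

Reading the string from the inside out shows which build method wraps which: every scope becomes a `buildBlock`, and the control-flow method is applied to that block's result.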

But I think this also reflects weaknesses in how the build methods are named, how the feature is currently being taught, and ultimately in some of the decisions made by the proposal. If the various build methods were more clearly connected to control flow and differences in how often code is run—rather than to specific syntactic features—I think function builders would be easier to think about and work with.

The playground transform

As a long-term goal, I think we should think about how much we might be able to unify function builders with the playground transform. The playground transform is largely about inserting calls into code to expose the values and control flow without actually changing what it does. There are obvious differences—the playground transform applies to all code in the input file, doesn’t affect return values, and supports more code constructs—but the first two differences seem like flags that could be turned on or off, and the third seems like it could be addressed by extending function builders to support more constructs. In the long run—if not today—we might be able to reduce the size of the compiler (and get rid of some code that’s tied pretty closely to Apple implementation details).

beccadax · August 19, 2020, 5:39am

This is going to sound kind of silly when I say it to the document’s author, but I think this is a misunderstanding of the concept the proposal is describing. The primary tree being declared by the function-buildered function is the tree of values being produced—the HTML tags, for instance—not the tree of control-flow decisions and constructs used to get there. SwiftUI does actually represent control flow as nodes in the tree too, but that’s just how SwiftUI works—it’s not something the user is really meant to think about. I think most function builders will flatten those control flow decisions away.

lassejansen · August 20, 2020, 8:57am

Thanks for your reply Lantua!

Lantua:

throw is already supported and is left alone by the transformation (see Error-handling statement section in the new pitch). I believe what you're trying to achieve can be done with proper closure signature:
func p(@Builder _ block: () throws -> Result) rethrows -> Result {
  ...
}

Yes, I tried that, but got the error "Closure containing control flow statement cannot be used ...". But that may be because I used an Xcode Beta, and not the trunk compiler. Anyway, good to know that it should work. I wasn't sure if the statement from the proposal ("left alone in the transformation") implies that it should be functional.

Lantua:

I think we can adjust @anreitersimon implementation a bit.
...

  static func buildFinalResult(_ components: Component) -> ((First) -> (), (Second) -> ()) -> () {
    { compute(component: components, first: $0, second: $1) }
  }

Nice

lassejansen · August 20, 2020, 10:07am

Yes, that works, thanks. My initial idea was to hook directly into the buildExpression methods and stream the expressions out to the handler closures directly, instead of buffering them through the build{Block,...} functions and calling the closures in a loop at the end.

It's probably my expectation of a "stream" of expressions that explains the disagreement about having all of Swift syntax available in the builder closures. I haven't thought about Playgrounds until @beccadax brought them up, but it's a good analogy of my mental model for building DSLs: When evaluating a builder closure, the contents from the right pane in a Playground are handed back to me one by one during evaluation (with the difference that I only get the unbounded/"free" expressions). I think that explains why restricting any syntax elements looks like an unnecessary limitation to me.

Coming from a "combinator" API like in function builders, that essentially traces the AST of the closure, I can of course understand that supporting non-functional/imperative syntax features seems unnatural.

I think I may have a good compromise for a simplified API that is in line with the current proposal:

Keep everything as it is proposed, except:

Make buildFinalResult the only required function. It would be used to define and optionally convert the input and output types of the builder.
Use the existence of the buildBlock function to opt-in to the AST-tracing features.

If buildBlock is not defined, all syntax features (that are currently available to function builders) are enabled and the expressions are flattened and passed as an array to the buildFinalResult function.

If buildBlock is defined, all syntax features that should be available must be enabled by providing the respective function (buildEither, buildArray etc).

Different overloads of buildExpression could be implemented without triggering AST-tracing.

A simple example of an HTML DSL could look like this:

@functionBuilder
struct HTMLBuilder {
  static func buildFinalResult(_ nodes: [HTMLNode]) -> [HTMLNode] {
    return nodes
  }
}

An HTML example like in the proposal that lifts Strings to TextNodes would look like this:

@_functionBuilder
struct HTMLBuilder {

  enum Expression {
    case node( HTMLNode)
    case text( String)
  }

  static func buildExpression( _ node: HTMLNode) -> [Expression] {
    return [.node( node)]
  }

  static func buildExpression( _ text: String) -> [Expression] {
    return [.text( text)]
  }

  static func buildFinalResult(_ expressions: [Expression]) -> [HTMLNode] {
    return expressions.map {
      switch $0 {
      case .node( let node):
        return node
      case .text( let text):
        return TextNode(text: text)
      }
    }
  }
}

What do you think?

jayton · August 20, 2020, 12:18pm

Out of curiosity, why not:

@functionBuilder
struct HTMLBuilder {

  static func buildExpression( _ node: HTMLNode) -> [HTMLNode] {
    return [node]
  }

  static func buildExpression( _ text: String) -> [HTMLNode] {
    return [TextNode(text: text)]
  }

  static func buildFinalResult(_ expressions: [HTMLNode]) -> [HTMLNode] {
    return expressions
  }
}

This immediately raises the question of why buildFinalResult should be required (it isn’t in the current proposal).

lassejansen · August 20, 2020, 12:25pm

I have it from here: https://github.com/DougGregor/swift-evolution/blob/function-builders/proposals/XXXX-function-builders.md#function-building-methods

buildFinalResult(_ components: Component) -> Return is used to finalize the result produced by the outermost buildBlock call for top-level function bodies. It is only necessary if the DSL wants to distinguish Component types from Return types, e.g. if it wants builders to internally traffic in some type that it doesn't really want to expose to clients. If it isn't declared, the result of the outermost buildBlock will be used instead.

Isn't that the right one?

Yes, right, that would be even simpler.
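The quoted description of buildFinalResult can be illustrated with a builder whose internal Component type differs from what it returns to clients. This is a sketch assuming the `@resultBuilder` spelling (Swift 5.4 and later); `CSVBuilder` and `csvRow` are hypothetical names.

```swift
@resultBuilder
enum CSVBuilder {
    // Component type used internally: a list of fields.
    static func buildExpression(_ field: String) -> [String] { [field] }
    static func buildExpression(_ number: Int) -> [String] { [String(number)] }

    static func buildBlock(_ parts: [String]...) -> [String] {
        parts.flatMap { $0 }
    }

    // Applied once, to the result of the outermost buildBlock.
    // Clients only ever see the joined String, never the [String].
    static func buildFinalResult(_ fields: [String]) -> String {
        fields.joined(separator: ",")
    }
}

func csvRow(@CSVBuilder _ body: () -> String) -> String { body() }

let row = csvRow {
    "id"
    7
    "ok"
}
// row == "id,7,ok"
```

If `buildFinalResult` were removed, the closure's result type would be `[String]`, the result of the outermost `buildBlock`.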

lassejansen · August 20, 2020, 3:32pm

Maybe I misread your question. If it was "Why don't we require at least one implementation of buildExpression, instead of requiring buildFinalResult" – then yes, you are right, that would probably be the simplest API of all.

The revised simple HTMLNode example would look like this then:

@functionBuilder
struct HTMLBuilder {

  static func buildExpression( _ node: HTMLNode) -> [HTMLNode] {
    return [node]
  }
}

HTMLNode + String:

@functionBuilder
struct HTMLBuilder {

  static func buildExpression( _ node: HTMLNode) -> [HTMLNode] {
    return [node]
  }

  static func buildExpression( _ text: String) -> [HTMLNode] {
    return [TextNode(text: text)]
  }
}
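For comparison, here is roughly what a flattening HTML builder needs under the pitch as written, where every supported construct gets its own method. This is a sketch assuming the `@resultBuilder` spelling (Swift 5.4 and later); the `HTMLNode` struct and the `html` function are hypothetical stand-ins, since the thread does not define them.

```swift
// Hypothetical node type; text nodes use the tag "#text".
struct HTMLNode: Equatable {
    let tag: String
    let text: String
}

@resultBuilder
enum HTMLBuilder {
    // Lift both nodes and plain strings into the common component type.
    static func buildExpression(_ node: HTMLNode) -> [HTMLNode] { [node] }
    static func buildExpression(_ text: String) -> [HTMLNode] {
        [HTMLNode(tag: "#text", text: text)]
    }

    // Each construct is opted into separately, and each one just flattens.
    static func buildBlock(_ parts: [HTMLNode]...) -> [HTMLNode] { parts.flatMap { $0 } }
    static func buildOptional(_ part: [HTMLNode]?) -> [HTMLNode] { part ?? [] }
    static func buildEither(first part: [HTMLNode]) -> [HTMLNode] { part }
    static func buildEither(second part: [HTMLNode]) -> [HTMLNode] { part }
    static func buildArray(_ parts: [[HTMLNode]]) -> [HTMLNode] { parts.flatMap { $0 } }
}

func html(@HTMLBuilder _ body: () -> [HTMLNode]) -> [HTMLNode] { body() }

let showFooter = false

let nodes = html {
    HTMLNode(tag: "h1", text: "Title")
    "plain text"
    for item in ["a", "b"] {
        HTMLNode(tag: "li", text: item)
    }
    if showFooter {
        HTMLNode(tag: "footer", text: "bye")
    }
}
// nodes.map { $0.tag } == ["h1", "#text", "li", "li"]
```

The five flattening methods are the boilerplate that the simplified API in this post is trying to remove.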

Douglas_Gregor · August 20, 2020, 5:56pm

This is, in fact, the motivation. I'll update the proposal accordingly.

Thinking about it further, I agree with you. buildDo is really quite out of step with the rest of the builder functions---it's not needed to aggregate values (buildBlock does that), and it's specific to one rarely-used statement kind. I'll remove it outright.

I can see this as an implementation fallback---if you've implemented the more-general buildEither(first:) and buildEither(second:) but not buildOptional(_:), we can do the above transformation. That makes function builders slightly more convenient to implement. I don't like striking buildOptional entirely, because I don't want to push more type-based overloading into the builders like you mention here:

buildArray enables the for..in loop. While you could use it for selection statements (if/else/switch) as a fallback---heck, you could use it instead of buildBlock as a fallback---it feels wrong to conflate these. buildEither is about selection, buildArray is about looping, buildBlock is the basic aggregation.

If we're trying to minimize the effort needed for a non-type-preserving function builder, I think the "simple" function builder protocol is a better approach than fine-tuning the fallback mechanisms for the language proposal.

It feels like a refactoring action: show the translation of a transformed function into one that has explicit calls into the function builder and an explicit return with the result. I agree that this would help a lot.

That the playground transform happens up at the language level, rather than in a lower-level pass, has been a significant source of issues. If we were to change the playground transform, it wouldn't be toward integrating more in the language---it would be to move it later in the compilation pipeline, after semantic analysis, so we could be 100% sure it didn't affect the semantics of the program. Perhaps we can draw some inspiration from the idea of the playground transform, but function builders won't be a replacement for it, no matter how expressive they become.

Doug

John_McCall · August 20, 2020, 6:16pm

The idea behind buildDo was that it might combine well with attributes to allow site-specific customization of building. But we haven't added that, and there's no real reason to vs. just using some sort of DSL-specific combinator. So I'm fine with just dropping it.

Palle · August 21, 2020, 3:50pm

I am still convinced that the buildBlock functions that take a fixed amount of parameters are a bad design choice for the following reasons:

Without variadic generics, these are either limited to homogeneous types or a fixed number of arguments. For example in SwiftUI, every function builder closure can have 10 subviews at max. In some projects of mine, I have found this limitation to be annoying, especially when building things other than views (e.g. a Layer builder for neural network layers)
Related to the first: Without variadic generics, there will be overload hell
Even with variadic generics, their expressiveness is more limited than other approaches.
They can only be type checked in one big block, not at each line in code.

My opinion therefore is that function builders should behave more like a state machine. They should be in a state, consume an expression and produce a new state. This goes beyond the stateful function builder future direction in the proposal. IMO, function builders should be completely based on this approach.

In code, this could look like the following:

@functionBuilder
enum ViewBuilder {
    static func initialState() -> EmptyView {
        EmptyView()
    }
    static func next<State: View, Input: View>(state: State, input: Input) -> TupleView<(State, Input)> {
        ...
    }
}

Methods like buildIf would pre-process inputs, such that they can then be consumed by next.

Then, a function builder closure would work the following way:

VStack {
    Text("Hello")
    Text("World")
}

VStack {
    let s0 = ViewBuilder.initialState()
    let s1 = ViewBuilder.next(state: s0, input: ViewBuilder.buildExpression(Text("Hello")))
    let s2 = ViewBuilder.next(state: s1, input: ViewBuilder.buildExpression(Text("World")))
    return s2
}

This would make function builders much more flexible in the following ways:

The 10 view limit will be gone.
Type checking happens at each step when an input is consumed.
With the state machine approach, function builders can express more complex constraints. For example, generic constraints could be added such that a function builder parses a regular or context free language by modelling a pushdown automaton:

protocol ParenthesisState {}
struct EmptyState: ParenthesisState {}
struct NestedState<Child: ParenthesisState>: ParenthesisState {}

protocol Symbol {}
struct OpenParenthesis: Symbol {}
struct CloseParenthesis: Symbol {}

@functionBuilder
enum ParenthesisBuilder {
    static func initialState() -> EmptyState
    static func next<State: ParenthesisState>(state: State, input: OpenParenthesis) -> NestedState<State>
    static func next<Substate: ParenthesisState>(state: NestedState<Substate>, input: CloseParenthesis) -> Substate
}

This would then yield the following:

// return type specifies that PDA accepts with empty stack
@ParenthesisBuilder var expression: EmptyState {
    // let s0 = ParenthesisBuilder.initialState()
    OpenParenthesis() 
    // let s1: NestedState<EmptyState>
    OpenParenthesis()
    // let s2: NestedState<NestedState<EmptyState>>
    CloseParenthesis()
    // let s3: NestedState<EmptyState>
    CloseParenthesis()
    // let s4: EmptyState
    // state type matches specified return type → automaton is in accepting state
}

This builder would for example ensure that every OpenParenthesis() has a matching CloseParenthesis().

This is simply not possible with the buildBlock based function builders and would also not be possible if we take variadic generics into account.
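The type-level bookkeeping in this example can be tried out today by writing the fold calls by hand, without any builder attribute. This is only a sketch: `initialState` and `next` are the hypothetical methods from this post and have no compiler support, the `init()` requirement is an addition made here so that states can be constructed, and `ParenthesisFold` is a made-up name.

```swift
protocol ParenthesisState { init() }
struct EmptyState: ParenthesisState {}
struct NestedState<Child: ParenthesisState>: ParenthesisState {}

struct OpenParenthesis {}
struct CloseParenthesis {}

enum ParenthesisFold {
    static func initialState() -> EmptyState { EmptyState() }

    // Opening a parenthesis pushes one level onto the type-level stack.
    static func next<State: ParenthesisState>(
        state: State, input: OpenParenthesis
    ) -> NestedState<State> {
        NestedState()
    }

    // Closing is only defined for nested states, so it pops one level.
    static func next<Substate: ParenthesisState>(
        state: NestedState<Substate>, input: CloseParenthesis
    ) -> Substate {
        Substate()
    }
}

// Hand-written equivalent of the calls the proposed transform would insert.
let s0 = ParenthesisFold.initialState()
let s1 = ParenthesisFold.next(state: s0, input: OpenParenthesis())
let s2 = ParenthesisFold.next(state: s1, input: OpenParenthesis())
let s3 = ParenthesisFold.next(state: s2, input: CloseParenthesis())
// The annotation plays the role of the declared return type.
let s4: EmptyState = ParenthesisFold.next(state: s3, input: CloseParenthesis())
```

Calling `next(state: s0, input: CloseParenthesis())` does not compile, because no overload accepts an `EmptyState` together with a `CloseParenthesis`.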

Lantua · August 21, 2020, 4:45pm

I have questions about the build* synthesis. So we have a few default implementations:

for buildExpression when Expression == Component
for buildFinalResult when Component == Result
for buildLimitedAvailability (using buildBlock)
for buildDo (using buildBlock), should we add it later

Given that we can have multiple Component, is one buildExpression generated for each Component? Does the compiler still synthesize these functions even if the type has explicit buildExpression but not of the matching Component type? The same with buildFinalResult.

Douglas_Gregor · August 21, 2020, 4:49pm

(3) isn't actually true. The type inference model described in the proposal effectively type-checks each argument to buildBlock independently (they are separate statements in the transformed function). Under the hood, this is accomplished via one-way constraints.

(1) isn't something we should contort new features around. I, too, want variadic generics yesterday, and the 10-view limit in SwiftUI is annoying. But it's temporary: once we get variadic generics, the limit goes away. SwiftUI was intentionally architected so it could take advantage of variadic generics here without breaking the ABI.

Beyond that, it's been widely agreed on this thread that most function builders aren't type-preserving: they don't need the parameters of buildBlock to have different types. That means they fall into the case where normal variadic function parameters work fine. So this problem, while highly visible because of SwiftUI, is unlikely to be widespread across many different function builders.

I agree with (2), that the state-machine formation you describe is more expressive. It's a different transformation entirely, that feeds the results from prior statements into later statements. It's a functional fold operation, whereas the current transformation is more like map. To make the case that your design is better than the proposed one, you need use cases to support the additional expressivity--ones that are clear, and cannot be expressed with the current proposal. You also need a mental model that people can use to understand how the DSL applies. And the hard part of the argument is the subjective one: that the use cases you've provided are important enough, numerous enough, and understandable enough to justify the change to the model that you're proposing.

Personally, I am biased against this change, because I think the resulting model is harder to understand, and that the use cases don't outweigh the additional complexity.

Doug

Douglas_Gregor · August 21, 2020, 5:02pm

There is no synthesis of default implementations; we either form calls to these build functions (if they exist), or we don't. Let's take a little SwiftUI example from the proposal:

@ViewBuilder var body: some View {
  ScrollView {
    if #available(iOS 14.0, *) {
      LazyVStack(...)
    } else {
      Stack(...)
    }
  }
}

If you implemented all of buildFinalResult, buildExpression, and buildLimitedAvailability, you'd have translated code that looks like this:

var body: some View {
  let v0 = ViewBuilder.buildExpression(ScrollView {
    let v2 /*inferred type */
    if #available(iOS 14.0, *) {
      let v3 = ViewBuilder.buildExpression(LazyVStack(...))
      let v4 = ViewBuilder.buildBlock(v3)
      let v5 = ViewBuilder.buildLimitedAvailability(v4)
      v2 = ViewBuilder.buildEither(first: v5)
    } else {
      let v6 = ViewBuilder.buildExpression(Stack(...))
      let v7 = ViewBuilder.buildBlock(v6)
      v2 = ViewBuilder.buildEither(second: v7)
    }
    let v8 = ViewBuilder.buildBlock(v2)
    return ViewBuilder.buildFinalResult(v8)
  })
  let v1 = ViewBuilder.buildBlock(v0)
  return ViewBuilder.buildFinalResult(v1)
}

For any one of those you don't implement, just drop the call and reference the variable, e.g., without buildExpression, v0 gets initialized directly with the ScrollView instance:

let v0 = ScrollView {

Without buildLimitedAvailability, v5 just becomes v4 (in practice, we wouldn't create v5 at all):

let v5 = v4

And without buildFinalResult, you just return the variable without the call, e.g., the final return:

return v1
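The translation can be checked against a small builder whose methods record their own names. This is a sketch rather than compiler output: it assumes the `@resultBuilder` spelling (Swift 5.4 and later), the availability versions are arbitrary, and `StepBuilder`, `viaBuilder`, and `handTranslated` are hypothetical names.

```swift
@resultBuilder
enum StepBuilder {
    static func buildExpression(_ s: String) -> String { "expr(" + s + ")" }
    static func buildBlock(_ parts: String...) -> String {
        "block(" + parts.joined(separator: ", ") + ")"
    }
    static func buildEither(first part: String) -> String { "first(" + part + ")" }
    static func buildEither(second part: String) -> String { "second(" + part + ")" }
    static func buildLimitedAvailability(_ part: String) -> String { "limited(" + part + ")" }
    static func buildFinalResult(_ part: String) -> String { "final(" + part + ")" }
}

func viaBuilder(@StepBuilder _ body: () -> String) -> String { body() }

// The compiler applies the transform to this closure.
let transformed = viaBuilder {
    if #available(macOS 11.0, iOS 14.0, *) {
        "lazy"
    } else {
        "eager"
    }
}

// The same body written out by hand, following the translation in this post.
func handTranslated() -> String {
    let v0: String
    if #available(macOS 11.0, iOS 14.0, *) {
        let v1 = StepBuilder.buildExpression("lazy")
        let v2 = StepBuilder.buildBlock(v1)
        let v3 = StepBuilder.buildLimitedAvailability(v2)
        v0 = StepBuilder.buildEither(first: v3)
    } else {
        let v4 = StepBuilder.buildExpression("eager")
        let v5 = StepBuilder.buildBlock(v4)
        v0 = StepBuilder.buildEither(second: v5)
    }
    let v6 = StepBuilder.buildBlock(v0)
    return StepBuilder.buildFinalResult(v6)
}
```

Comparing `transformed` with `handTranslated()` is a quick way to confirm that a hand translation matches what the compiler generated; deleting one of the optional methods from `StepBuilder` and the matching call from the hand translation should keep the two in agreement.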

Doug

Lantua · August 21, 2020, 5:08pm

I see, thanks, that makes more sense. What would happen if the return type matches no buildFinalResult (and some are declared), but coincidentally matches the Component from the block? A similar situation can also happen with buildExpression.

I think the transformation should be forced to use buildFinalResult if at least one is declared. So it would reject a Component return type if the API author defines buildFinalResult, but not buildFinalResult(_: Component) -> Component. A Component return type likely wouldn't be expected in such a scenario. Same with buildExpression. I haven't thought about buildLimitedAvailability enough to be sure, though.

Douglas_Gregor · August 21, 2020, 5:16pm

You'll get an error. If a buildFinalResult is declared, we'll form a call to it; if that call fails to type check, it's an error in the program, whether the mistake is in the function builder itself (wrong signature, etc.) or in the function body (used the DSL wrong and the type checker caught it). Same thing with any other function. buildExpression is perhaps the most interesting, because you can use it to say what values are allowed in your DSL. SwiftUI's ViewBuilder, for example, requires all values to conform to View.

Doug

Lantua · August 21, 2020, 5:19pm

Technically that could be done at buildBlock, right? Or do you mean that it allows for fast bail-out since it'd fail at the per-statement type-checking? That's another interesting technique that's only allowed by the buildExpression type checking rule.

Douglas_Gregor · August 21, 2020, 5:27pm

You can do this at buildBlock, but buildExpression is more powerful because you can affect how the expression is type-checked.
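One way to see buildExpression influencing type checking is through literals. In this sketch (assuming the `@resultBuilder` spelling of Swift 5.4 and later; `SampleBuilder` and `samples` are hypothetical names), each expression statement becomes the argument of a `buildExpression` call, so an integer literal is checked against the `Double` parameter.

```swift
@resultBuilder
enum SampleBuilder {
    // The parameter type is the contextual type for each expression statement.
    static func buildExpression(_ value: Double) -> [Double] { [value] }

    static func buildBlock(_ parts: [Double]...) -> [Double] {
        parts.flatMap { $0 }
    }
}

func samples(@SampleBuilder _ body: () -> [Double]) -> [Double] { body() }

let values = samples {
    1      // integer literal, type-checked as Double via buildExpression
    2.5
}
// values == [1.0, 2.5]
```

The same mechanism restricts what a DSL accepts: an expression of a type that no `buildExpression` overload takes, such as a `String` here, is rejected at that statement.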

Doug

Palle · August 21, 2020, 5:42pm

I did not know the details of the type checking system with forward constraints, so my point (4) can be disregarded to some extent. Is this forward type checking specific to function builders, or is it a general Swift feature? If it is the former, a fold-based approach would eliminate the need for this kind of special implementation: making each step a separate expression that follows the previous one would enforce this without any special type checking logic.

Also, besides type checking, the enforcement of runtime constraints could be easier to follow in the debugger. If the buildBlock does some checks that require multiple views and that check fails, the assertion error would occur in the buildBlock function with no indication where something went wrong.
With the fold approach, these checks could occur in each next call, thereby failing closer to the cause of the issue.

The fold-based approach would not necessarily be more complex. In most use cases, the builder would be provided by library authors anyways and not enforce any special constraints, thereby behaving like the buildBlock approach without the 10 view limit. If no type system juggling (like in the PDA example) takes place, we would simply generate a state that describes all the views that have been captured so far.
A function builder would only have to have three or four methods instead of ten or more for all the different overloads of buildBlock.

The stronger expressivity of the fold-based approach could be useful in the following situations:

A builder could enforce limits on what elements can be added after another element has been added. For example, it could enforce having only one NavigationView in a view builder or only one <head> in an HTML builder.
In an HTML table, it could enforce a constraint such that every row has the same number of cells.
In a layer builder for neural networks, they could enforce a constraint such that the output type of a layer matches the input type of the next one without a limit on the number of layers. (If Variadic generics can express something like (LayerType...).(x-1).Output == (LayerType...).x.Input, this would also work with them).
Diagnostics could be improved: By specifying an @available attribute for some overloads of the next function, a warning or error can be emitted for a specific element of a function builder closure (like "X should not be used in this place").

Also, regarding variadic generics: I don't believe it would be a great design choice to limit the expressiveness of an existing feature like function builders based on a future feature that could still be very far off, might be rejected in the evolution process, or might be blocked by implementation problems.

Douglas_Gregor · August 21, 2020, 6:12pm

We added it to make function builder type checking perform adequately, but it's a general implementation mechanism that's fairly important for closures in general. For example, I'm using it to explore multi-statement closure type inference.

That is incorrect. Your approach still depends on one-way constraints to avoid propagating information "backwards" through the closure. Without them, you'll have the same problem Swift 5.1 had, which was fixed in Swift 5.1.1 with one-way constraints.

Yes, but there are other ways to express this structural property without embedding it in the type system, e.g., multiple trailing closures to separate the different bits.

I expect that variadic generics can do this with buildBlock as it is today.

Embedding these constraints deep in overloaded buildNext functions in the type checker is unlikely to improve diagnostics.

It's not a limitation, it's boilerplate. Yes, it's annoying to write 10 different versions of buildBlock, but it isn't a problem with expressiveness.

Doug
