Advice for implementing Closures As Structs

I'm working on implementation for swift-evolution/NNNN-closures-as-structs.md at master · nickolas-pohilets/swift-evolution · GitHub. I'm pretty new to the compiler codebase, and still feeling a bit lost, but slowly making my way through.

I would appreciate some feedback on my implementation plan so far and some help with filling in the gaps.

Proposal describes being able to satisfy potential any protocol requirement with closure body. I'm not sure this is solvable in general case, so for now I've limited my scope only to a single callAsFunction requirement.

Implementation consists of two parts:

  1. Detect if closure should be transformed at all, and if it should - what protocols it should conform to, which protocol requirement should be satisfied by the closure body, and types for associated types used in the signature of the protocol requirement.
  2. Do the AST re-writing.

First part

This is mostly about constraint solver.

I'm planning to introduce a new type, something like ClosureAsStructType. This type should be used as a placeholder for the type of the structure created. If closure got assigned this type after type-checking - that means it should be re-writen. After re-writing this type will disappear. But depending on where re-writing will happen, it may escape from the constraint solver. Do I understand correctly that in order to be UNCHECKED_TYPE it should not escape from constraint solver and be allocated in the constraint solver area?

Current logic always assigns FunctionType to a closure. I'm planning to assign a fresh type variable to it, and create a disjunction of two constraints equality constraints - either to FunctionType, or to a ClosureAsStructType. Or should I keep FunctionType assigned to a closure, and assign type variable to something else?

I suspect assigning a fresh type variable to every closure may cause a performance regression. But for now I'm concerned about getting something working, and will return to performance optimisations later.

I'm planning to update the score calculation system, so that overloads where closure literal matches with function type should win against overloads where closure get-s re-written. That's mostly for backward compatibility and can be reversed later. If we ever make function types true protocols, should be prioritise them differently compared to hand-written protocols? Or should we prioritise between generic vs existential types? Looks like currently existentials win for regular protocols (but I suspect there might be nuances), which is consistent with current decision to prefer function types to closure-as-struct.

When processing protocol conformance constraint against a ClosureAsStructType, constraint is always solved (unless protocol requires a class), but matched protocol is recorded for the future processing. I should be able to backtrack matched protocols, so looks like it should be stored in the ConstraintSystem and reverted by the SolverScope.

I don't know yet how to handle protocols that don't have suitable requirement or where requirement is ambiguous. Probably they should be silently ignored if there are other overloads available.

To validate protocol requirements, I need to know that I've seen all the protocol requirements. How can I ensure that in the constraint solver?

Also, I need to match FunctionType of the closure with the type of the protocol requirement. I think I can handle this by creating a bunch of constraints of existing kinds when processing protocol requirements. But maybe this should be a new kind of constraint between ClosureAsStructType and FunctionType.

I need to keep track of association between ClosureAsStructType and related FunctionType of the closure. Can I store reference to FunctionType inside the ClosureAsStructType , or should I use locators or some other mechanism?

I don't yet fully understand how SolverStep's work. I'll keep reading the code, but maybe there are any other sources I could check? https://github.com/apple/swift/blob/master/docs/TypeChecker.rst does not seem to cover this.

Second part

First of all, what would be the place to do the re-writing? Should it happen when applying constraint solver solution or later?

I'll need to re-write closure expression into something which is an expression, but may contain function declaration inside. Any alternatives to wrapping all this into an immediately invoked closure? Can I even construct an AST for closure with return type that is declared inside the closure body? Alternatively I can sacrifice locality of the re-writing and put all the declarations for anonymous structs in the top of the parent scope and re-write closure expressions into sole structure instance construction.

Re-writing calls to super may a bit challenging. Will SuperRefExpr work fine, if it's Self is not an actual self of the struct, but a property in the struct?

I was able to make some progress, and currently my challenge is to make sure that I've seen all the protocol conformance constraints.

I need to validate that there is exactly one protocol requirement that should be implemented by the closure body, and create a new constraint that binds signature of that requirement with function type of the closure.

Alternatively, I can try to create constraints for function type as I encounter protocols one by one, and have some additional validation that at least one matching protocol requirement was found before producing a solution. Looks like the later should go into the ComponentStep. I don't like this approach much, as it spreads logic over the codebase.

@Douglas_Gregor, @xedin, could you advise something?

After reading the description and associated draft of the proposal it seems like https://github.com/apple/swift/pull/28837 is going to make it easier for you to do what you want because it would delay constraint generation until contextual type is available (e.g. Predicate from your example).

I don't understand exactly how you are planing to do witness matching but I guess some of the existing declaration checking logic could be re-used for that. Also, to limit performance impact, it might make sense to add a special declaration attribute so the implicit transformation is only attempted on the protocols that have expected structure.

How would it work with overloading? In the following example, I cannot know contextual type without considering closure body:

protocol P { func callAsFunction(_ x: Int) -> Int }
protocol Q { func callAsFunction(_ x; String) -> String }
f(_ x: P) {}
f(_ x: Q) {}
f(_ x: (Int, Bool) -> String) {}

f { $0 + 2 } // should match first overload, no ambiguity

Not really. Because declaration does not exist at the moment of type checking. Instead, type checking tells me if declaration should be created at all.


Looks like I need to create a new kind of constraint to maintain adjacency between ClosureAsStructType and related FunctionType. Otherwise my constraint graph falls apart into two components and fails to type-check.

Currently, given

protocol Bar {
	func callAsFunction(_ x: Int) -> Int
}
func f<T: Bar & Equatable>(_ x: T) {}
func test() {
	f { $0 } //(z)
}

I have the following starting constraint system:

Score: 0 0 0 0 0 0 0 0 0 0 0 0
Type Variables:
  $T0 [lvalue allowed] [noescape allowed] as ($T1) -> () @ locator@0x126ad1a00 [DeclRef@/Users/npohilets/mini.swift:47:2]
  $T1 potentially_incomplete involves_type_vars #defaultable_bindings=1 bindings={<<unresolvedtype>>} @ locator@0x126ad1a50 [DeclRef@/Users/npohilets/mini.swift:47:2 -> generic parameter 'T']
  $T2 subtype_of_existential involves_type_vars bindings={} @ locator@0x126ad1c28 [Closure@/Users/npohilets/mini.swift:47:4 -> closure result]
  $T3 [inout allowed] [noescape allowed] subtype_of_existential involves_type_vars bindings={} @ locator@0x126ad1c80 [Closure@/Users/npohilets/mini.swift:47:4 -> tuple element #0]
  $T4 [lvalue allowed] [noescape allowed] potentially_incomplete subtype_of_existential involves_type_vars bindings={} @ locator@0x126ad1c80 [Closure@/Users/npohilets/mini.swift:47:4 -> tuple element #0]
  $T5 [noescape allowed] subtype_of_existential involves_type_vars bindings={} @ locator@0x126ad1de0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure-as-struct disjunction choice]
  $T6 [lvalue allowed] [noescape allowed] equivalent to $T4 @ locator@0x126ad1f98 [DeclRef@/Users/npohilets/mini.swift:47:6]
  $T7 [noescape allowed] as () @ locator@0x126ad2080 [Call@/Users/npohilets/mini.swift:47:2 -> function result]

Active Constraints:
  $T1 conforms to Equatable [[locator@0x126ad1ac0 [DeclRef@/Users/npohilets/mini.swift:47:2 -> opened generic -> type parameter requirement #0 (conformance)]]];
  $T1 conforms to Bar [[locator@0x126ad1b50 [DeclRef@/Users/npohilets/mini.swift:47:2 -> opened generic -> type parameter requirement #1 (conformance)]]];
  $T3 bind param $T4 [[locator@0x126ad1c80 [Closure@/Users/npohilets/mini.swift:47:4 -> tuple element #0]]];
  disjunction (remembered) [[locator@0x126ad1de0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure-as-struct disjunction choice]]]:$T5 bind ($T3) -> $T2 [[locator@0x126ad1de0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure-as-struct disjunction choice]]]; or $T5 bind closure-as-struct at /Users/npohilets/mini.swift:47:4 [[locator@0x126ad1de0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure-as-struct disjunction choice]]];

Inactive Constraints:
  $T4 conv $T2 [[locator@0x126ad1c28 [Closure@/Users/npohilets/mini.swift:47:4 -> closure result]]];
  $T5 arg conv $T1 [[locator@0x126ad2148 [Call@/Users/npohilets/mini.swift:47:2 -> apply argument -> comparing call argument #0 to parameter #0]]];
Resolved overloads:
  selected overload set choice f: $T0 == ($T1) -> ()
  selected overload set choice $0: $T6 == $T4

Where:

  • $T1 is a type of the opened generic parameter T
  • $T5 - type of the closure expression which is either ($T3) -> T2 or closure-as-struct.

First alternative in disjunction tries to bind $T5 to ($T3) -> T2 and fails as expected.
But when attempting second alternative, $T2 and $T3 end up being disconnected from $T5.

I need a way to express that in the second alternative ($T3) -> T2 matches with the type of the callAsFunction protocol requirement. I think it should start as a constraint between closure-as-struct and ($T3) -> T2, which could be simplified into ($T3) -> T2 := (Int) -> Int after processing closure-as-struct conforms to Bar.

The https://github.com/apple/swift/blob/master/docs/TypeChecker.rst says:

It is expected that the constraints themselves will be relatively stable, while the solver will evolve over time to improve performance and diagnostics.

@xedin, do you think this is a valid case for introducing a constraint kind?

In your example for each overload there is going to be argument type provided as contextual for the closure to match against: (P) -> Void, (Q) -> Void and (Int, Bool) - > String. Body is re-opened for each new contextual type.

I need to think about this some more but I'd prefer we didn't introduce any new types or constraint kinds to the constraint system especially since it would lead to even more disjunctions...

In the example with Bar & Equatable before opening the body of the closure we'd already know that closure has a shape of (<arg>) -> <result> and contextual type is ($T) -> Void where $T conforms to Bar and Equatable.

Maybe easiest solution would be to use function type from callAsFunction requirement of Bar as a contextual type for closure body. If that doesn't fail it would be good enough indication that closure matches that requirement, but Equatable conformance I'm not sure how to handle...

At this stage Equatable just needs to be recorded somewhere. If type-checking will indicate that re-writing should happen, then generated anonymous structure will declare conformance to Equatable and Bar. And only after re-writing it will be checked if declared conformance is actually implemented. All the requirements but single callAsFunction need to be implemented by protocol extensions or synthesized by the compiler.

Ok, that makes sense, although not ideal because it means that solver would produce potentially invalid solution(s) if something can't be synthesized. Regardless of that, I think a good strategy here would be to
modify matchCallArguments to replace Bar with its opened callAsFunction requirement if the argument is a closure (I have added TypeVariable::Implementation::isClosureType() for that) and record that such conversion has happened, every other parameter requirement has to be ignored. That way when solution is formed it would have necessary information for rewriting to happen.

I've checked out the latest master. For the following example:

protocol Bar {
	func callAsFunction(_ x: Int) -> Int
}
func f<T: Bar & Equatable>(_ x: T) {}

func test() {
	f { $0 }
}

Contextual type does not get inferred, instead DefaultClosureType is used as a source of binding for closure type variable. So looks like, disjunction between closure-as-struct type and function type cannot be avoided. To be precise, now it should be a disjunction between closure-as-struct type and type variable constrained to function type using DefaultClosureType constraint.

I'll try to see if I can use ValueMember to bind closure-as-struct type and function type (again using that variable) to avoid introducing new constraint kinds.

It's not because CSBindings doesn't know how to infer it. I think to make this work you have to teach getPotentialBindings how to infer types in such situations.

How should inferred type even look like in this situation? This should be something capturing protocol requirements.

It would probably have to do something similar to what simplifyDynamicCallableApplicableFnConstraint does today where it would detect that parameter is a "callable" nominal type and infer the function type from callAsFunction requirement (simplifyDynamicCallableApplicableFnConstraint forms a member constraint but that is just a nuance). Type variable opened to replace generic parameter has information about its requirements so it should be easy to retrieve Bar and do inference in getPotentialBindings.

Actually I meant simplifyApplicableFnConstraint it detects that "function type" is callable via - desugar2->isCallableNominalType(DC) and adds an implicit call to the callAsFunction member.

In case of generic function, to obtain a contextual type we would need to reverse effect of openFunctionType(). And I'm still not sure about the form of the type. If we consider a complicated case of:

func f<T, U>(x: U, y: T) where T: Proto, T: BaseClass, T == DerivedClass<U> {}
f(x: 1, y: { $0 }) // should fail to type-check

What should be the contextual type of closure be in this case? Should it be GenericTypeParamType, OpaqueTypeArchetypeType or ProtocolCompositionType? None seem to fit well.

Meanwhile, I'm still investigating a disjunction-based solution. I've tried to use ValueMember constraint to link ClosureAsStructType to function signature. But constraint graph keeps falling into multiple components. My understanding is that's because ClosureAsStructType is a concrete type, not a type variable and it does not make a vertex in constraint graph.

And also conjunction constraint was removed in remove ConstraintKind::Conjunction, it is unused and unnecessary - after · apple/swift@f6ce68a · GitHub, so there is no efficient way to group binding to ClosureAsStructType and ValueMember to function type together.

I still need ClosureAsStructType to record the result, but looks like it should not be part of the original disjunction, but be supplied as a binding when a suitable protocol requirement is found. Then, I don't need conjunction, and can just use a ValueMember constraint (or maybe a special kind of constraint).

For the given example:

protocol Bar {
	func callAsFunction(_ x: Int) -> Int
}
func f<T: Bar & Equatable>(_ x: T) {}

func test() {
	f { $0 }
}

Initial constraint system would look something like this (edited by hand, locators may be incorrect):

Score: 0 0 0 0 0 0 0 0 0 0 0 0
Type Variables:
  $T0 [lvalue allowed] [noescape allowed] as ($T1) -> () @ locator@0x130809e00 [DeclRef@/Users/npohilets/mini.swift:47:2]
  $T1 potentially_incomplete involves_type_vars #defaultable_bindings=1 bindings={<<unresolvedtype>>} @ locator@0x130809e50 [DeclRef@/Users/npohilets/mini.swift:47:2 -> generic parameter 'T']
  $T2 [noescape allowed] bindings={} @ locator@0x13080a028 [Closure@/Users/npohilets/mini.swift:47:4] @ locator@0x13080a1e0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure-as-struct disjunction choice]
  $T3 [inout allowed] [noescape allowed] subtype_of_existential bindings={} @ locator@0x13080a070 [Closure@/Users/npohilets/mini.swift:47:4 -> tuple element #0]
  $T4 subtype_of_existential bindings={} @ locator@0x13080a0c0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure result]
  $T5 [noescape allowed] as () @ locator@0x13080a440 [Call@/Users/npohilets/mini.swift:47:2 -> function result]

Active Constraints:
  $T1 conforms to Equatable [[locator@0x130809ec0 [DeclRef@/Users/npohilets/mini.swift:47:2 -> opened generic -> type parameter requirement #0 (conformance)]]];
  $T1 conforms to Bar [[locator@0x130809f50 [DeclRef@/Users/npohilets/mini.swift:47:2 -> opened generic -> type parameter requirement #1 (conformance)]]];

Inactive Constraints:
  disjunction (remembered) [[locator@0x13080a1e0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure-as-struct disjunction choice]]]:$T2 closure can default to ($T3) -> $T4 [[locator@0x13080a028 [Closure@/Users/npohilets/mini.swift:47:4]]]; or $T2[.callAsFunction: value] == ($T3) -> $T4 [[locator@0x13080a2a0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure-as-struct protocol requirement]]];
  $T2 arg conv $T1 [[locator@0x13080a508 [Call@/Users/npohilets/mini.swift:47:2 -> apply argument -> comparing call argument #0 to parameter #0]]];
Resolved overloads:
  selected overload set choice f: $T0 == ($T1) -> ()


Opened types:
  locator@0x130809e00 [DeclRef@/Users/npohilets/mini.swift:47:2] opens τ_0_0 -> $T1

Which after attempting second alternative in disjunction, should boil down to:

Score: 0 0 0 0 0 0 0 0 0 0 0 0
Type Variables:
  $T0 [lvalue allowed] [noescape allowed] as ($T1) -> () @ locator@0x1258afe00 [DeclRef@/Users/npohilets/mini.swift:47:2]
  $T1 potentially_incomplete bindings={(supertypes of) closure-as-struct at /Users/npohilets/mini.swift:47:4} @ locator@0x1258afe50 [DeclRef@/Users/npohilets/mini.swift:47:2 -> generic parameter 'T']
  $T2 [noescape allowed] bindings={($T3) -> $T4} @ locator@0x1258b0028 [Closure@/Users/npohilets/mini.swift:47:4]
  $T3 [inout allowed] [noescape allowed] subtype_of_existential bindings={} @ locator@0x1258b0070 [Closure@/Users/npohilets/mini.swift:47:4 -> tuple element #0]
  $T4 subtype_of_existential bindings={} @ locator@0x1258b00c0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure result]
  $T5 [noescape allowed] as () @ locator@0x1258b0440 [Call@/Users/npohilets/mini.swift:47:2 -> function result]

Active Constraints:
  $T2 arg conv $T1 [[locator@0x1258b0508 [Call@/Users/npohilets/mini.swift:47:2 -> apply argument -> comparing call argument #0 to parameter #0]]];
  $T1 conforms to Bar [[locator@0x1258aff50 [DeclRef@/Users/npohilets/mini.swift:47:2 -> opened generic -> type parameter requirement #1 (conformance)]]];
  $T1 conforms to Equatable [[locator@0x1258afec0 [DeclRef@/Users/npohilets/mini.swift:47:2 -> opened generic -> type parameter requirement #0 (conformance)]]];
  T2[.callAsFunction: value] == ($T3) -> $T4 [[locator@0x1258b02a0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure-as-struct protocol requirement]]];

Resolved overloads:
  selected overload set choice f: $T0 == ($T1) -> ()

Disjunction choices:
  locator@0x1258b01e0 [Closure@/Users/npohilets/mini.swift:47:4 -> closure-as-struct disjunction choice] is #1

Opened types:
  locator@0x1258afe00 [DeclRef@/Users/npohilets/mini.swift:47:2] opens τ_0_0 -> $T1

Suggested binding should already contain a protocol requirement. So probably $T1 conforms to Bar should be the binding source, suggesting to bind $T1 to ClosureAsStructType capturing FuncDecl for func callAsFunction(_ x: Int) -> Int.

The last idea seemed to work so far. Now I'm working on re-writing and then will return to type-checker for handling more cases and performance optimisation.

Currently, I'm creating a struct declaration at the file level and re-writing closure expression into constructor call. Using file level simplifies implementation - no need to handle different scopes. And also enables anonymous structs to be generic.

I'm concerned that modifying FileUnit while type-checking it may lead to some problems. But I haven't looked into that yet.

Currently all the structs are named as $_closure_as_struct_N, where N is a number starting from zero. But would be nice to encode somehow the name of the function where closure is created in the name of struct, to have more meaningful stack traces.

Should I use some closure-related source locations (probably location of the opening brace) as a location for struct elements, of just keep them all empty?

My current challenge is to replace ClosureAsStructType used inside other types with the actual struct type. Looks like one of the simplifyType() methods is the right tool to do that. For the purpose of constraint solving ClosureAsStructType is a concrete type, which is part of the solution. So probably I need Solution::simplifyType(), but not the ConstraintSystem::simplifyType(), right?

While applying the solution, I need to keep track which structure type was associated with each ClosureAsStructType. Can I store this in the Solution itself and modify solution object or solution object is assumed to be immutable at this stage?

And finally, I still need to convert closure body into body of the structure method. I will need to replace all references to captured variables with references to properties and move all the decls to new DeclContext. Can I update existing AST of closure, or do I need to recursively clone it with modifications?