[Pitch] Enable multi-statement closure parameter/result type inference

xedin · October 6, 2021, 12:52am

Authors: Pavel Yaskevich
Implementation: [TypeChecker] Incremental multi-statement closure type-checking (disabled by default) by xedin · Pull Request #38577 · apple/swift · GitHub

Introduction

I propose to improve inference behavior of multi-statement closures by enabling parameter and result type inference from the closure body. This will make type inference less surprising for developers, and remove the existing behavior cliff where adding one more expression or statement to a closure could result in a compilation failure.

Motivation

Multi-statement closures, unlike their single-statement counterparts, currently cannot propagate information (e.g. parameter and result types) from their body back to the enclosing context, because they are type-checked separately from the expression containing the closure. Information in such closures flows strictly in one direction: from the enclosing context into the body, and statement by statement from the top to the bottom of the closure.

Currently, adding a new expression or statement to a single-statement closure can lead to surprising results.

Let’s consider the following example:

func map<T: BinaryInteger>(fn: (Int) -> T) -> T {
  return fn(42)
}

func doSomething<U: BinaryInteger>(_: U) -> Void { /* processing */ }

let _ = map {
  doSomething($0)
}

Because single-statement closures are type-checked together with the enclosing context, it’s possible to infer a type for the generic parameter T based on the result type of a call to doSomething. The behavior is completely different if doSomething is not the only call in the body of the map closure:

let _ = map {
  logger.info("About to call 'doSomething($0)'")
  doSomething($0)
}

This closure is considered a multi-statement closure, and the current type inference behavior differs from the previous example. The body of a multi-statement closure is type-checked after the call to map has been fully resolved, which means that the type for generic parameter T could either (a) not be determined at all (because it depends on the body of the closure), or (b) be inferred as Void, a type that all single-expression closures can default to.

Neither (a) nor (b) is what a developer would expect here, because looking at the code it’s unclear (a) why the result type of the closure couldn’t be determined, given that doSomething($0) is a valid call, or (b) where Void came from and why a default type was applied. Compiler diagnostics suffer too: because the body of the map closure is type-checked separately, it’s impossible to pinpoint the precise source of an error, which leads either to misdiagnosing the issue (e.g. ‘Void’ does not conform to ‘BinaryInteger’) or to a fallback diagnostic asking to specify the type of the closure explicitly.

Proposed solution

I propose to allow multi-statement closures to propagate information inferred from the body back to the expression they are contained in. This would unify their semantics with those of their single-statement counterparts, remove the distinction from the language, remove artificial restrictions on parameter and result type inference, and help developers write better and more expressive code.

Detailed design

Multi-statement closure inference would have the following semantics:

Parameter and result type inference:
- Contextual type of the closure is the primary source of type information for its parameters and return type.
- If there is no contextual information, the first reference to an anonymous parameter defines its type for the duration of the context. If subsequent statements imply a different type, the situation is considered an ambiguity and reported as an error.
- Just like with anonymous parameters, the first return statement defines the result type of the closure (inference is done only if there is no contextual information). If the type-checker encounters return statements that produce different types, the situation is considered an error.
- inout inference from the body of a closure and back-propagation to the context is not supported: inout must be explicit on the parameter declaration or in the contextual type. This preserves the status quo, so it has no source compatibility implications, and it is the least surprising for developers, albeit inconsistent with parameter type inference in single-statement closures. Please refer to the Future Directions section for the possibility of unifying this behavior.
- Void is a default result type for a closure without any explicit return statements.
  - Single-expression closures are allowed to default to Void if the enclosing context requires it, e.g. let _: (Int) -> Void = { $0 }, unless the return is explicit: let _: (Int) -> Void = { return $0 } results in an error.
The body of a closure is type-checked just like before: information flows one way, from the first statement to the last, without any backward propagation of information between statements.
- This is important because in this model closure inference is consistent with other kinds of declarations (functions, subscripts, getters, etc.), where the first type inferred for a declaration becomes its de facto type;
- Type-checking of a closure that contains a single expression remains unchanged in this proposal.
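A minimal sketch of these rules, assuming a compiler that implements the proposed semantics; heavyHalves and countCalls are hypothetical names and the sample weights are arbitrary. In the first closure $0 comes from the contextual type of map, and the result type comes from the first return statement; the second closure has no return statements, so its result type defaults to Void.

```swift
// Parameter type from context, result type from the first `return`.
func heavyHalves(_ weights: [UInt]) -> [UInt] {
  // No type annotation on `halves`, so the closure's result type is not
  // known from context and has to come from its body.
  let halves = weights.map {
    if $0 > 10 {
      return $0 / 2  // first return: $0 is UInt, so the closure returns UInt
    }
    return 0         // the literal is checked against UInt, not defaulted to Int
  }
  return halves
}

// No `return` statements: the closure's result type defaults to Void.
func countCalls() -> Int {
  var calls = 0
  let bump = {
    calls += 1
    calls += 1
  }
  bump()
  return calls
}

print(heavyHalves([3, 12, 8]))  // [0, 6, 0]
print(countCalls())             // 2
```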

Let’s go back to our example from Motivation section.

func map<T: BinaryInteger>(fn: (Int) -> T) -> T {
  return fn(42)
}

func doSomething<U: BinaryInteger>(_: U) -> Void { /* processing */ }

let _ = map {
  logger.info("About to call 'doSomething($0)'")
  doSomething($0)
}

According to the new unified semantics, it is possible to correctly type-check the closure argument to map and infer:

- Anonymous parameter $0 to be Int, based on the contextual type (Int) -> T of map's fn parameter;
- Result type of the closure to be Void, because there are no explicit return statements in the body.

Let’s consider another example that would be supported under the new semantic rules:

struct Box {
  let weight: UInt
}

func heavier_than(boxies: [Box], min: UInt) -> [UInt] {
  let result = boxies.map {
    if $0.weight > min {
       return $0.weight
    }

    return 0
  }

  return result
}

Currently, because multi-statement closures are type-checked separately from the expression, it’s impossible to determine the result type of the closure. Things are made even worse because diagnostics can’t provide any useful guidance either, and can only suggest specifying the type of the closure explicitly.

Under the new semantic rules, result would be type-checked to have type [UInt]: the first return statement, return $0.weight, is inferred as UInt, and that type information is first propagated to the integer literal 0 in the next return statement (because information flows from the first statement to the last), and afterwards back to the enclosing expression.

This could be extrapolated to a more complex expression, for example:

struct Box {
  let weight: UInt
}

func precisely_between(boxies: [Box], min: UInt, max: UInt) -> [UInt] {
  let result = boxies.map {
    if $0.weight > min {
       return $0.weight
    }

    return 0
  }.filter {
    $0 < max
  }

  return result
}

Using multi-statement closures (or simply closures) becomes less cumbersome because there is no longer a need to constantly specify explicit closure types, which can be quite large, e.g. when there are multiple parameters or a complex tuple result type.
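To make the tuple case concrete, here is a small sketch assuming a compiler that implements the proposed rules; bounds(of:) is a hypothetical helper. The immediately-invoked closure has no `() -> (min: Int, max: Int)` annotation, and its only return statement defines the labeled tuple type.

```swift
// Multi-statement closure returning a labeled tuple, with no explicit
// closure type. Assumes `values` is non-empty.
func bounds(of values: [Int]) -> (min: Int, max: Int) {
  let result = {
    var lo = values[0]
    var hi = values[0]
    for v in values.dropFirst() {
      lo = Swift.min(lo, v)
      hi = Swift.max(hi, v)
    }
    return (min: lo, max: hi)  // the only return defines the tuple type
  }()
  return result
}

let b = bounds(of: [4, -2, 9])
print(b.min, b.max)  // -2 9
```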

Type-Checker Performance Impact

There have been attempts to improve this behavior over the years, the latest being Multi-statement closure type inference by DougGregor · Pull Request #32223 · apple/swift · GitHub by Doug Gregor. All of them ran into technical difficulties related to internal limitations of the type-checker: multi-statement closures, just like function/subscript/setter bodies, tend to be composed of a substantial number of statements and cannot be type-checked as efficiently as single-expression closures are, which leads to “expression too complex” errors. The new implementation resolves these issues by taking a more incremental approach.

Source compatibility

All aspects of single-statement closure type-checking are preserved as-is by this proposal, so there is no source compatibility impact for them.

There is at least one situation where type-checker behavior differs between single- and multi-statement closures, because multi-statement closures do not support parameter type inference from the body. Some expressions that currently fail to type-check (due to ambiguity) with a single-statement closure argument, but type-check with a multi-statement one, would become ambiguous regardless of which kind of closure is used.

Let’s consider a call to an overloaded function test that expects a closure argument:

func test<PtrTy, R>(_: (UnsafePointer<PtrTy>) -> R) -> R { ... }
func test<ResultTy>(_: (UnsafeRawBufferPointer) -> ResultTy) -> ResultTy { ... }

let _: Int = test { ptr in
  return Int(ptr[0]) << 2
}

Currently the call to test is ambiguous because it’s possible to infer from the body of the (single-statement) closure that PtrTy is Int, so both overloads of test produce a valid solution. The situation is different if we introduce a new, possibly completely unrelated, statement to this closure, e.g.:

func test<PtrTy, R>(_: (UnsafePointer<PtrTy>) -> R) -> R { ... }
func test<ResultTy>(_: (UnsafeRawBufferPointer) -> ResultTy) -> ResultTy { ... }

let _: Int = test { ptr in
  print(ptr) // <-- shouldn't affect semantics of the body
  return Int(ptr[0]) << 2
}

This new call to test type-checks because, under current language rules, the body of this multi-statement closure doesn’t participate in type-checking the call, so there is no way to infer PtrTy for the first overload choice. Under the proposed rules the type-checker would be able to determine that PtrTy is Int based on the return statement, just as it did in the previous single-statement closure example, which means the call to test becomes ambiguous, just as it was before print was introduced.

There are a couple of ways to mitigate situations like this:

- Add a special ranking rule to the type-checker that preserves current behavior: when the argument is a multi-statement closure, reject solutions where the parameter's function type has unresolved (completely or partially) parameter and/or result types, in favor of an overload choice with a “less generic” type, e.g. one that only has an unresolved result type.
- Ask users to supply an explicit parameter and/or result type that unambiguously determines the overload, e.g. UnsafePointer<Int> or UnsafeRawBufferPointer in our example.
- Don’t do any parameter and/or result type inference from the body of the closure. This is exactly how multi-statement closures are type-checked under existing rules, which is too restrictive.
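A sketch of the second mitigation. Both overload bodies are hypothetical stand-ins for the elided { ... } implementations: the first is never called here, and the second reads from a small sample byte array. The explicit UnsafeRawBufferPointer annotation rules out the UnsafePointer<PtrTy> overload, so adding print cannot make the call ambiguous.

```swift
// Hypothetical stand-in: never called in this sketch.
func test<PtrTy, R>(_ body: (UnsafePointer<PtrTy>) -> R) -> R {
  fatalError("not exercised in this sketch")
}

// Hypothetical stand-in: hands `body` the bytes of a small sample array.
func test<ResultTy>(_ body: (UnsafeRawBufferPointer) -> ResultTy) -> ResultTy {
  let bytes: [UInt8] = [5, 6, 7]
  return bytes.withUnsafeBytes(body)
}

// Explicit parameter and result types select the second overload.
let value: Int = test { (ptr: UnsafeRawBufferPointer) -> Int in
  print(ptr.count)
  return Int(ptr[0]) << 2
}
print(value)  // 20
```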

Future Directions

`inout` inference without a contextual type

There is an inconsistency between single- and multi-statement closures - inout inference from the body of a multi-statement closure and its back-propagation is unsupported and requires explicit parameter annotation e.g. { (x: inout Int) -> Void in ... } .

Currently inout is allowed to be inferred:

- From the contextual type, for anonymous and name-only parameters, e.g. [1, 2].reduce(into: 0) { $0 += $1 }. In this case inout is passed down to the body from the contextual type (inout Result, Self.Element) -> Void, so inference only happens one way.
- For single-statement closures, it’s possible to infer inout for the external parameter type (the type visible to the expression the closure is associated with) based on the parameter's use in the body: assignment, operators, or argument positions with an explicit &.

The back-propagation behavior (the second bullet) is inconsistent across different kinds of closures: it works only for single-statement closures. This is confusing because there are no visual clues to help developers reason about the behavior, although it is easily fixed by providing an explicit closure type.
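To illustrate the two ways inout can reach a closure, a small sketch with arbitrary sample values (addTwice is a hypothetical name): in the first closure inout comes down from the contextual type of reduce(into:_:), while the closure stored in addTwice has no context, so its multi-statement body needs the explicit inout annotation.

```swift
// inout flows down from reduce(into:_:)'s contextual type,
// (inout Result, Element) -> Void, so $0 can be mutated directly.
let sum = [1, 2].reduce(into: 0) { $0 += $1 }

// No contextual type: inout must be spelled out on the parameter.
let addTwice = { (x: inout Int, y: Int) -> Void in
  x += y
  x += y
}

var total = 1
addTwice(&total, 3)
print(sum, total)  // 3 7
```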

To make incremental progress, I think it’s reasonable to split the inout changes out of this proposal because their source compatibility impact is uncertain (it might be too great for such a change to be reasonable for the language); unifying this behavior between single- and multi-statement closures could be a future direction for Swift 6. Splitting it out allows us to improve closure ergonomics without source compatibility impact, and to take advantage of the new implementation to improve result builder and code completion performance and reliability.

Type inference across `return` statements in the body of a closure

It’s common for an early guard statement to return nil, which doesn’t supply enough type information to be useful for inference under the proposed rules:

func test<T>(_: () -> T?) { ... }

test {
  guard let x = doSomething() else {
    return nil // there is not enough information to infer `T` from `nil`
  }

  ...
}

The only way to get this closure to type-check is to supply an explicit type, e.g. () -> Int? in ... . To improve this situation, the type-checker could allow type inference across return statements in the body of a closure. That would mean the actual result type would be a join of all of the types encountered in return statements, which would be semantically unique in the language.
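A sketch of that explicit-type workaround. doSomething() is a hypothetical stand-in for the elided helper, and test is given a T? return value here (unlike the original signature) only so the outcome can be checked.

```swift
// Hypothetical helper standing in for the elided doSomething().
func doSomething() -> Int? { 21 }

// Variant of `test` that returns the closure's result so it can be checked.
func test<T>(_ body: () -> T?) -> T? {
  body()
}

// The explicit `() -> Int? in` annotation supplies the type that
// `return nil` alone cannot.
let result = test { () -> Int? in
  guard let x = doSomething() else {
    return nil
  }
  return x * 2
}
print(result ?? 0)  // 42
```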

Effect on ABI stability

No ABI impact since only type-checker handling of closures is going to change but outcomes should not.

Effect on API resilience

This is not an API-level change and would not impact resilience.

Alternatives considered

Keep current behavior.

Acknowledgments

Holly Borla - for helping with content and wording of this proposal.

ktoso · October 6, 2021, 2:38am

This is fantastic, thanks @xedin!

We've had a number of APIs that were held back a little bit by this limitation, e.g. in swift aws lambda runtime. So this is a very exciting and welcome change.

xwu · October 6, 2021, 2:52am

It’s fantastic to see forward movement on this initiative. A few questions come to mind:

First, if I recall, there was an earlier draft implementation by someone else (sorry, their name isn’t coming to mind) which looked like it was getting in shape but never quite made it as far as this.

Can you speak to what differences in design and implementation are reflected in your effort as compared to the last? Was there anything that changed in terms of the approach that made your implementation progress further? Any differences in the fine details which may give readers a better sense of alternative approaches and their pros and cons?

Second, considering this example:

func heavier_than(boxies: [Box], min: UInt) -> [UInt] {
  let result = boxies.map {
    if $0.weight > min {
       return $0.weight
    }
    return 0
  }

  return result
}

If, instead of return 0 at the end of the closure, the author chooses to guard with an early exit at the beginning of the closure: guard ... else { return 0 }, would that cause a compile time error because that 0 in the first return statement would then be inferred as having default literal type Int?

If so—as it seems from the stated rules—it seems to me that there are several scenarios such as this where literals interact poorly with other type inference rules and can lead to surprising results (we still haven’t solved the issue with integer literals and heterogeneous comparison operators in generic contexts). It would not be ideal to have a minor refactoring such as using an early exit (which Swift explicitly encourages with guard statements) cause unintended changes in type inference. Is there any way for literals’ inferred type to be in a sort of “purgatory” in scenarios such as this without being a form of “back-propagation”?

Third, can you expand a little on why you feel leaving the difference in inout behavior between single and multiple statement closures in place is a suitable “resting place” for the design of Swift? The text states that it’s “confusing”—which raises the question: if it is admittedly confusing, then why is it appropriate to leave fixing the confusion to a future version of Swift instead of an obligatory part of the current proposal?

hborla · October 6, 2021, 3:11am

I believe the previous implementation attempt you're thinking of is this PR from Doug: Multi-statement closure type inference by DougGregor · Pull Request #32223 · apple/swift · GitHub

The differences are in the implementation approach. The original implementation used the approach that result builders currently use, which is to generate constraints for the entire closure body upfront (after the contextual type for the closure has been resolved). Pavel's approach generates constraints for statements in the closure body incrementally, after the previous statement has been solved, which helps the constraint system scale with larger closures. We'd like to migrate result builders to also use this new incremental constraint generation infrastructure, particularly to improve diagnostic performance in result builders. Without incremental constraint generation, large closures can lead to "expression too complex" errors due to the constraint system exceeding its memory thresholds.

xedin · October 6, 2021, 3:31am

Yes, that’s unfortunate, but it's existing behavior. I made that example a bit awkward because result is not necessary there: it’s better to write just return boxies.map { … } in the body, which would propagate UInt down into the closure and should enable the guard refactoring you are talking about. I wanted to show that result would still be a [UInt] without a contextual type…
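A sketch of the refactoring described above, reusing Box and heavier_than from the pitch (the sample boxes are arbitrary): returning the map call directly gives it the contextual type [UInt], so UInt flows down into the closure and the early guard's literal 0 is checked as UInt rather than defaulted to Int.

```swift
struct Box {
  let weight: UInt
}

func heavier_than(boxies: [Box], min: UInt) -> [UInt] {
  // The function's return type provides the contextual type [UInt],
  // so the closure's result type is already UInt when its body is checked.
  return boxies.map {
    guard $0.weight > min else { return 0 }
    return $0.weight
  }
}

print(heavier_than(boxies: [Box(weight: 3), Box(weight: 9)], min: 5))  // [0, 9]
```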

That’s what I wanted to do originally, but I opted for this phased approach instead because it allows us to improve the ergonomics of closures without source impact while also improving the result builder implementation and code completion performance/results. I want to follow up and unify the behavior for single- and multi-statement closures for Swift 6.

jrose · October 6, 2021, 4:18am

It’s cool to see this finally taking form! But inferring parameter types from their use within the closure seems weird to me. Wouldn’t the use site have useful type information? That is, does the following example compile, and if so what’s the rule that allows it to do so?

func map2<T: BinaryInteger>(fn: (UInt) -> T) -> T {
  return fn(42)
}

func doSomething<U: BinaryInteger>(_: U) -> U { /* processing */ }

let _ = map2 {
  let result = doSomething($0)
  return result
}

xedin · October 6, 2021, 5:23am

It would only fall back to inferring the type of an untyped parameter from the body if there is no contextual information. In your example $0 would be inferred as UInt, which would then get propagated to result through doSomething.
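A runnable version of the example above, assuming a compiler that implements the proposed rules. The body of doSomething is hypothetical (the original elides it); here it simply returns its input. $0 gets UInt from map2's contextual parameter type, so result is UInt and the first return makes T equal to UInt.

```swift
func map2<T: BinaryInteger>(fn: (UInt) -> T) -> T {
  return fn(42)
}

// Hypothetical body: just returns its input.
func doSomething<U: BinaryInteger>(_ value: U) -> U {
  return value
}

// $0 is UInt from context; `result` is UInt; the first `return` fixes T.
let r = map2 {
  let result = doSomething($0)
  return result
}
print(r, type(of: r))  // 42 UInt
```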

xedin · October 6, 2021, 5:24am

I realize now that I need to clarify that point in the proposal to mention that inference only happens without contextual information...

stevapple · October 6, 2021, 6:16am

This is extremely useful for the following use case:

let decoder = {
    let decoder = JSONDecoder()
    … // do some config
    return decoder
}()

Currently this code won’t compile: you need to explicitly specify : JSONDecoder or () -> JSONDecoder in, but such an annotation is rather redundant.
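A filled-in sketch of this pattern, assuming a compiler that implements the proposed rules. The snake_case key strategy and the User type are arbitrary stand-ins for the elided "some config", and the inner constant is renamed to avoid shadowing the outer one.

```swift
import Foundation

// Result type inferred from `return configured`: no `: JSONDecoder`
// or `() -> JSONDecoder in` annotation.
let decoder = {
  let configured = JSONDecoder()
  configured.keyDecodingStrategy = .convertFromSnakeCase
  return configured
}()

struct User: Decodable {
  let firstName: String
}

let user = try! decoder.decode(User.self, from: Data(#"{"first_name": "Ada"}"#.utf8))
print(user.firstName)  // Ada
```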

ExFalsoQuodlibet · October 6, 2021, 7:14am

It's wonderful to finally see this! One of the major annoyances I have with Swift is the verbosity of associated values extraction from enums in a transformation pipeline. For example:

enum Foo {
  case bar(String)
  case baz(Int)
}

let foos: [Foo] = [
  .bar("yello"),
  .baz(42)
]

// doesn't compile
// Generic parameter 'ElementOfResult' could not be inferred
let numbers = foos.compactMap {
  if case .baz(let value) = $0 {
    return value
  } else {
    return nil
  }
}

To make the example work, you need to specify the return type of the closure, which also forbids using an anonymous closure argument (thus, redundancy strikes twice). An option to solve this would be to have the if ... else block resolve as a single expression, but I think it's more interesting and scalable (not to mention more "Swifty") to have exactly this: closure type inference in the case of multiple statements. If I understand the pitch correctly, this problem would be solved by it.
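A sketch of how this example would read under the proposed rules, assuming a compiler that implements them: the first return (an Int) fixes compactMap's ElementOfResult to Int, and return nil is then checked against Int?.

```swift
enum Foo {
  case bar(String)
  case baz(Int)
}

let foos: [Foo] = [.bar("yello"), .baz(42)]

// First `return value` (Int) fixes ElementOfResult to Int;
// `return nil` is then checked against Int?.
let numbers = foos.compactMap {
  if case .baz(let value) = $0 {
    return value
  } else {
    return nil
  }
}
print(numbers)  // [42]
```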

DaveZ · October 6, 2021, 2:14pm

Interesting! Nice work. After this proposal is implemented, what is stopping normal functions from having multi-statement type inference (other than policy)?

xwu · October 6, 2021, 3:28pm

This is fantastic and useful information, I think, for the detailed design portion of the text. Please include it so we can have it for posterity!

Awkward examples are fine if they serve a didactic purpose! I think these corner cases are worth calling out, including where it aligns with existing behavior and how as a possible future direction (or even as part of this present work) it might be ameliorated—or conversely, why it cannot be improved due to whatever theoretical or practical limitations. Please include this and any other tricky scenarios you think users might encounter!

Could you detail this some more in the text? I think the community reviewing this should be empowered to evaluate the pros and cons of the two approaches (change behavior for single statement closures now versus later)—or even decide that the confusion isn’t really an issue at all and that it doesn’t ever have to be changed.

hborla · October 6, 2021, 4:20pm

Hmm, I disagree that constraint system implementation details belong in proposals. While this information is interesting and useful for those who have an understanding of the constraint system, most people reading this proposal don't, and you certainly don't need to understand the constraint system in order to understand the type inference semantics laid out in this proposal. The implementation approach is detailed in Pavel's PR description, though, and that is the best place for that information to live in my opinion.

soroush · October 6, 2021, 4:33pm

xedin:

Let’s consider the following example:
func map<T: BinaryInteger>(fn: (Int) -> T) -> T {
  return fn(42)
}

func doSomething<U: BinaryInteger>(_: U) -> U { /* processing */ }

let _ = map {
  doSomething($0)
}
Because single-statement closures are type-checked together with enclosing context, it’s possible to infer a type for generic parameter T based on the result type of a call to doSomething . The behavior is completely different if doSomething is not the only call in the body of the map closure:
let _ = map {
  logger.info("About to call 'doSomething(\$0)'")
  doSomething($0)
}

I was initially very confused by this example. In the first code block, the underscored-out value (let _) would have a type of [U], if it was assigning to anything. The second code block, however, subtly changes the type of the underscored-out value to [Void]. I thought the proposal was for removing the return type (which it is, yes), but also removing the return keyword (which it isn't).

Given that the two abilities of single line functions are to a) remove the explicit return type (-> U) and b) remove the return keyword, adding only one of these abilities to multi-line closures seems like a half measure.

Here's a concrete example of how not removing both restrictions makes a worse world. Let's take an example from some NIO code:

database.fetchUser(id)
    .flatMap({ user in
        database.fetchChildren(for: user)
    })

And change it to this:

database.fetchUser(id)
    .flatMap({ user in
        print(user)
        database.fetchChildren(for: user)
    })

Now it doesn't compile anymore. This is something I do often in NIO, to inspect a value as it's coming down the pike. But now the closure returns Void (because there’s no return statement), which gives an unused result warning, and an error because there is no flatMap overload that allows you to return Void.
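A small sketch of the situation above, assuming a compiler that implements the proposed rules. Optional.flatMap stands in for NIO's future-based flatMap, and fetchChildren(for:) is a hypothetical helper. Once print(user) makes the closure multi-statement, the explicit return is what keeps the closure from being inferred as returning Void.

```swift
// Hypothetical helper standing in for database.fetchChildren(for:).
func fetchChildren(for user: String) -> [String]? {
  return user == "ada" ? ["child-1"] : nil
}

let maybeUser: String? = "ada"

// Multi-statement closure: without `return`, the body would have no
// return statements and the closure's result type would default to Void.
let children = maybeUser.flatMap { user in
  print(user)
  return fetchChildren(for: user)
}
print(children ?? [])  // ["child-1"]
```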

I know people are split on whether SE-0255 (Implicit returns from single-expression functions) was a good idea, but given that it does exist in the language, and we're removing the restriction that requires a return type, I think it makes sense to remove the explicit return keyword restriction from multi-line functions as well.

gregtitus · October 6, 2021, 4:45pm

I agree that the example is confusing, and I think that it draws too much attention to the "I added a second line to my one line closure" use case, whereas I think this feature is more useful in the "I'm already writing a complicated closure" case. (Because the potential return types that we'll now get to omit tend to be more complicated for more complicated code.)

My strong preference is that Swift not become a language where the return value of a block is the last expression in that block. But either way, that is an entirely separate feature request from the ability to omit the return type at the beginning in more cases.

gwendal.roue · October 6, 2021, 4:49pm

I'm not sure a proposal that ties the two changes would be accepted as is. We can make two proposals, so this is what we should do.

Now I agree that this proposal sheds a new light on return.

But omitting return brings a lot of side effects. For example, I'd expect it to make people ask that if statements become expressions:

... { (b: Bool) in // inferred to return Int
    print(b)
    if b { 1 } else { 0 }
}

It's a whole other topic. We can work on multi-line closures independently.

soroush · October 6, 2021, 4:50pm

Agreed. I don't expect too many people to agree with me on the issue of omitting return statements from multiline closures, but I think at the very least the example should be changed to something that is more clear about what benefit this proposal provides.

xedin · October 6, 2021, 4:59pm

I concur on this point. I'd effectively have to explain how the constraint solver works to give enough background information for the changes to make sense...

Jumhyn · October 6, 2021, 5:00pm

Yeah, I agree, and wonder if we should go even further—need the precise semantics of the inference (based on first use of arguments/first return statement) be explicitly specified at all? Would we need a further proposal if we come up with a better heuristic for determining the type of the closure?

I guess I'm just curious about how much of type inference falls under Swift evolution versus miscellaneous "improvements and bug fixes."

Perhaps it could be a warning if a multi-statement closure is inferred to have result type Void? We already have similar warnings elsewhere IIRC.

xedin · October 6, 2021, 5:00pm

What if I either make doSomething return Void instead of U or add an explicit contextual type to a let binding in both cases?