Merging Closure into Protocols

aetherealtech · April 27, 2024, 2:00pm

This is an idea that’s been in my mind for years. I’m curious if the language team has ever considered it too.

I find it useful to imagine that, under the hood, function types are syntactic sugar for protocols. For example here:

let myFunction: (String) -> Int = { value in Int(value) ?? 0 }

I imagine that the compiler translates this into the following code:

protocol Function1<T, R> {
  associatedtype T
  associatedtype R

  func callAsFunction(_: T) -> R
}

struct __ClosureAtLineXInFileY: Function1 {
  func callAsFunction(_ value: String) -> Int {
    Int(value) ?? 0
  }
}

let myFunction: any Function1<String, Int> = __ClosureAtLineXInFileY()

This is especially useful to understand what is really happening with capture. This:

let defaultValue = 15

let myFunction: (String) -> Int = { value in Int(value) ?? defaultValue }

I imagine turns into this:

protocol Function1<T, R> {
  associatedtype T
  associatedtype R

  func callAsFunction(_: T) -> R
}

struct __ClosureAtLineXInFileY: Function1 {
  func callAsFunction(_ value: String) -> Int {
    Int(value) ?? defaultValue
  }

  private let defaultValue: Int
}

let defaultValue = 15

let myFunction: any Function1<String, Int> = __ClosureAtLineXInFileY(defaultValue: defaultValue)

I teach the mechanics of closure to junior devs by showing them this. It’s also how you would implement closure if you, for some reason, wanted or needed to use an explicit protocol instead of a function type.
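A minimal runnable sketch of that manual translation, assuming a hand-written Function1 protocol and a hypothetical struct name (ParseOrDefault); nothing here is what the compiler actually generates. It only shows that a struct with a stored property behaves like the capturing closure:

```swift
// Hand-written stand-in for the imagined compiler-provided protocol.
protocol Function1<T, R> {
  associatedtype T
  associatedtype R

  func callAsFunction(_ input: T) -> R
}

// Manual equivalent of `{ value in Int(value) ?? defaultValue }`.
// The name is hypothetical; the stored property plays the role of the capture.
struct ParseOrDefault: Function1 {
  let defaultValue: Int

  func callAsFunction(_ value: String) -> Int {
    Int(value) ?? defaultValue
  }
}

let defaultValue = 15

let viaClosure: (String) -> Int = { value in Int(value) ?? defaultValue }
let viaStruct = ParseOrDefault(defaultValue: defaultValue)

let parsed = viaStruct("42")              // 42
let fallback = viaStruct("not a number")  // 15
let agrees = viaClosure("oops") == viaStruct("oops")
```

The struct copies defaultValue when it is created, which matches how a closure captures an immutable let.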

This proves useful yet again to understand the proliferation of attributes we can now apply to function types. For example, @Sendable is really saying to add a conformance of Sendable to the existential. This:

let myFunction: @Sendable (String) -> Int

Becomes this:

let myFunction: any Function1<String, Int> & Sendable

What I want to propose is that this equivalence be made formal, to the point that today’s function syntax becomes literally syntactic sugar for the corresponding protocol code.

What is needed for this to be possible? For one, we need to be able to define protocols for callable types. We could manually define one for each number of parameters up to some reasonable maximum, but the “correct” way would be to make use of a parameter pack for the associated type:

protocol Function<each Ts, R> {
  associatedtype each Ts
  associatedtype R

  func callAsFunction(_: repeat each Ts) -> R
}
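Parameter packs as associated types don’t exist yet, so the protocol above can’t be compiled today. As a rough sketch of the part that does work on a Swift 5.9 or newer toolchain, a generic function can already abstract over arity with a parameter pack. The helper name invoke and the two sample functions are hypothetical:

```swift
// Forwards a pack of arguments to a function value of matching arity.
// `invoke` is an illustrative helper, not a standard library API.
func invoke<each T, R>(
  _ function: (repeat each T) -> R,
  with arguments: repeat each T
) -> R {
  function(repeat each arguments)
}

func add(_ a: Int, _ b: Int) -> Int {
  a + b
}

func describe(_ name: String, _ count: Int, _ flag: Bool) -> String {
  "\(name):\(count):\(flag)"
}

let sum = invoke(add, with: 3, 4)                 // 7
let text = invoke(describe, with: "x", 2, true)   // "x:2:true"
```

This only covers the call side; expressing the same thing as a protocol requirement is exactly the missing feature.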

More significantly, IMO, making today’s closures/blocks sugar for callable protocols would require introducing two features we see in other languages that for whatever reason had to make closures formally equivalent to special types of interfaces:

1. Automatic closure over inner and local types
2. Anonymous types

The first point means that we wouldn’t have to spell out capture manually if we used actual protocols. We could do this:

protocol MyCallable {
  func callAsFunction(input: String) -> Int
} 

…

func someFunction() {
  let defaultValue = 15

  struct MyClosure: MyCallable {
    func callAsFunction(input: String) -> Int {
      Int(input) ?? defaultValue
    }
  }
}

And the compiler will automatically add a private defaultValue member to MyClosure and rewrite any initializers to automatically receive it from the local value. This also implies we would be able to customize capture on a locally defined type just as we can today with blocks:

var byValue = 5
var byReference = 10
let readOnlyEitherWay = "Hello!"

struct MyClosure: MyCallable { [byValue, readOnlyEitherWay] in
  func callAsFunction(input: String) -> Int {
    print(readOnlyEitherWay)
    byReference += 1
    return Int(input) ?? defaultValue
  }
}

Again, the compiler would rewrite this type to appropriately store either copies of the captured variables, or getters/setters for them, and modify initializers to pass them in. I believe this means that the concept of @escaping needs to be extended to all protocol-type parameters. Since protocol existentials can always be stored today, this means for backward compatibility @escaping would have to be the default. This is the opposite of function types: you’d have to turn escaping off with @noescape rather than turn it on (maybe a major language update like 5 -> 6, or 6 -> 7, is an opportunity to clean that up?).

This ability to capture generally in local types would be very useful on its own IMO. It would make it easier to use more explicit callable protocols, which can increase type safety: I may want to say that any old (String) -> Int isn’t good enough here, and that the value has to conform to a specific callable protocol, so I can flag accidentally passing in the wrong function that just happens to have a matching signature. There are also other situations I’ve encountered where I had to do capture manually in local types and wished I could do it automatically with the above syntax. Sometimes as a compromise I make a concrete type that implements a protocol by accepting function type members for each requirement, which is just unnecessary boilerplate.
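For reference, the compromise mentioned above looks roughly like this in today’s Swift. The names ClosureBackedCallable and makeParser are hypothetical, chosen only for this sketch:

```swift
protocol MyCallable {
  func callAsFunction(input: String) -> Int
}

// Boilerplate wrapper: one stored closure per protocol requirement.
// The closure does the capturing that a local type cannot do today.
struct ClosureBackedCallable: MyCallable {
  let body: (String) -> Int

  func callAsFunction(input: String) -> Int {
    body(input)
  }
}

func makeParser(defaultValue: Int) -> some MyCallable {
  // `defaultValue` is captured by the closure, not by the struct itself.
  ClosureBackedCallable(body: { input in Int(input) ?? defaultValue })
}

let parser = makeParser(defaultValue: 15)
let ok = parser(input: "12")       // 12
let missing = parser(input: "??")  // 15
```

Every additional protocol requirement needs another stored closure and another forwarding method, which is the boilerplate that automatic capture in local types would remove.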

One problem that comes up here is that once you can capture in local types, the self keyword becomes ambiguous. Java and Kotlin fix this by allowing this to be scoped (dot syntax in Java, trailing @ syntax in Kotlin). I’ve never been a huge fan of that, although it does correctly solve the problem. Another approach might be to require capture of an outer self to be done explicitly and bound to a different name:

protocol MyProtocol {
  func send() -> OuterValue
}

struct OuterValue {
  func giveMeAProtocol() -> some MyProtocol {
    struct LocalType: MyProtocol { [outerSelf = self] in 
      func send() -> OuterValue {
        outerSelf
      }
    }

    return LocalType()
  }
}
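For comparison, here is a sketch of how the same thing has to be written in today’s Swift, where a local type cannot capture anything and the outer self is passed in by hand. The id property and the Equatable conformance are additions made only so the example can be checked:

```swift
protocol MyProtocol {
  func send() -> OuterValue
}

struct OuterValue: Equatable {
  var id: Int

  func giveMeAProtocol() -> some MyProtocol {
    // No capture list is available, so the "capture" is an ordinary
    // stored property filled in through the memberwise initializer.
    struct LocalType: MyProtocol {
      let outerSelf: OuterValue

      func send() -> OuterValue {
        outerSelf
      }
    }

    return LocalType(outerSelf: self)
  }
}

let original = OuterValue(id: 7)
let roundTripped = original.giveMeAProtocol().send()
```

The proposed [outerSelf = self] syntax would generate the stored property and the initializer argument automatically.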

If that approach were used, we’d probably need to introduce a way to explicitly bind inout capture. The “obvious” way seems to be with &:

struct OuterValue {
  mutating func giveMeAProtocol() -> some MyProtocol {
    struct LocalType: MyProtocol { [outerSelf = &self] in // Won’t compile because you’re trying to store something with `inout` capture of mutable `self`, which is automatically `@escaping`
      func send() -> OuterValue {
        outerSelf
      }
    }

    return LocalType()
  }
}

Extending closure to inner (not local) types introduces another challenge: if an inner type captures the outer self, it’s no longer statically initializable. I can’t create an OuterType.InnerType(), because the InnerType captures an instance of OuterType, which doesn’t exist there. I would instead have to initialize one as an OuterType().InnerType(). It then raises the question of how you express which one you want. Java does this with the keyword static (so a nested static class doesn’t capture the outer this, but a nested class does), Kotlin uses the keyword inner (a plain nested class doesn’t capture outer this, an inner class does). I’m not totally sure how this would interact with protocols and associated types. I think by default a nested type can’t satisfy a normal associated type requirement unless it’s a “static” (in Java parlance) type. But then that might mean you’d want to specify in a protocol that an associated type can capture outer self, which forces generic code to initialize instances of that associated type through an instance of the outer type.

The second point I think is a simpler matter: it just means we get to create types without naming them. In the above example, why bother naming that LocalType? I’d rather just define it as I’m returning it:

protocol MyProtocol {
  func send() -> OuterValue
}

struct OuterValue {
  func giveMeAProtocol() -> some MyProtocol {
    // `return` can be omitted now
    return struct: MyProtocol { [outerSelf = self] in 
      func send() -> OuterValue {
        outerSelf
      }
    }
  }
}

It might make sense to use struct as the implicit default, and infer where possible any necessary conformances. And then this gets closer to block syntax:

struct OuterValue {
  func giveMeAProtocol() -> some MyProtocol {
    return { [outerSelf = self] in 
      func send() -> OuterValue {
        outerSelf
      }
    }
  }
}

We could also decide that for inferred conformances that have only a single function requirement, it can also infer the one function being implemented, and then you have literal block syntax:

struct OuterValue {
  func giveMeAProtocol() -> some MyProtocol {
    return { [outerSelf = self] in 
        outerSelf
    }
  }
}

And that’s how we’d achieve “blocks are literally just sugar over protocols”.

I think this can help get a handle on all the attributes that are being added to function types. It opens up a way to allow user-defined attributes, i.e. let function: @MyOtherProtocol (String) -> Int is just telling the compiler to require the concrete type assigned to this variable to conform to MyOtherProtocol, and to require an anonymous type assigned to it to include conformance to that protocol. And since most protocols will, unlike Sendable, include requirements that need to be implemented, extending closure and anonymity to arbitrary local types also gives us a way to satisfy those extra requirements inline.

This also opens up a way to abstract over “Sendability”. Today, I can’t declare a conditional extension on LazyMapSequence to conform to Sendable only if both its Base and the transform are Sendable, because the transform isn’t a generic parameter, it’s just a function type. But if we rewrote it to be struct LazyMapSequence<Base: Sequence, Element, Transform: Function<Base.Element, Element>>, then we could write a constraint where Transform: Sendable, but not lose the ability for users to supply blocks with closure for the transform.
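The conditional Sendable idea can be approximated today with a hand-written single-argument protocol. In this sketch Function1, GenericLazyMap, Doubler, and requireSendable are all hypothetical names, and the transform has to be a named struct because blocks can’t conform to protocols yet:

```swift
protocol Function1<T, R> {
  associatedtype T
  associatedtype R

  func callAsFunction(_ input: T) -> R
}

// A lazy map whose transform is a generic parameter instead of a function type.
struct GenericLazyMap<Base: Sequence, Transform: Function1>: Sequence
where Transform.T == Base.Element {
  let base: Base
  let transform: Transform

  struct Iterator: IteratorProtocol {
    var baseIterator: Base.Iterator
    let transform: Transform

    mutating func next() -> Transform.R? {
      guard let element = baseIterator.next() else { return nil }
      return transform.callAsFunction(element)
    }
  }

  func makeIterator() -> Iterator {
    Iterator(baseIterator: base.makeIterator(), transform: transform)
  }
}

// The constraint that cannot be written while the transform is a plain function type.
extension GenericLazyMap: Sendable where Base: Sendable, Transform: Sendable {}

struct Doubler: Function1, Sendable {
  func callAsFunction(_ input: Int) -> Int { input * 2 }
}

// Only accepts Sendable values, so this call type-checks the conditional conformance.
func requireSendable<Value: Sendable>(_ value: Value) -> Value { value }

let mapped = requireSendable(GenericLazyMap(base: [1, 2, 3], transform: Doubler()))
let doubled = Array(mapped)  // [2, 4, 6]
```

With blocks as sugar for anonymous conforming types, the Doubler struct could be written inline as an ordinary closure.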

This also implies that function types as they’re used today really need to be (at least in Swift 6) adorned with any:

let function: any (String) -> Int // This is really the correct spelling

That correctly signals that such a function type is a type-erasing existential box, which implies the loss of type information in generic code. Correspondingly, when we use function types in functions, we have the option to replace any with some:

func map<T>(_ transform: some (Element) -> T) -> [T]

This allows us to avoid (especially if things are inlined) the cost and type information loss of existentials where it isn’t necessary to use them.
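A rough approximation of the some spelling with today’s features, assuming a hand-written Function1 protocol; mapCalling and Stringify are hypothetical names used only here:

```swift
protocol Function1<T, R> {
  associatedtype T
  associatedtype R

  func callAsFunction(_ input: T) -> R
}

extension Array {
  // `some Function1<Element, R>` is an implicit generic parameter, so the
  // concrete transform type stays visible to the compiler at the call site.
  func mapCalling<R>(_ transform: some Function1<Element, R>) -> [R] {
    var results: [R] = []
    results.reserveCapacity(count)
    for element in self {
      results.append(transform.callAsFunction(element))
    }
    return results
  }
}

struct Stringify: Function1 {
  func callAsFunction(_ input: Int) -> String { "#\(input)" }
}

let labels = [1, 2, 3].mapCalling(Stringify())  // ["#1", "#2", "#3"]
```

Because the parameter is generic rather than existential, the call can be specialized for Stringify instead of going through a type-erased box.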

So, in summary, this is the strategy I envision:

1. Add support for anonymous types, which I think is the simpler of the two capabilities. I think this just requires deciding on what defaults/inference are allowed.
2. Add support for closure in inner and local types. This can be broken down as:
   - Add support for capture in local types only, and only explicit capture by value of outer self. Make protocol types @escaping by default and add support to annotate any protocol-type parameter in functions with @noescape.
   - Decide to either support implicit capture of outer self using some kind of disambiguation syntax, or add support for explicit capture by reference using &.
   - Add a keyword like inner to nested types to allow them to capture outer self. Those types cannot satisfy associated type requirements.
   - Add a way to declare associated types as inner types, which requires generic code to initialize them through instances of the outer type.
3. Wait until Swift supports parameter packs for associated types.
4. Redefine function types from a built-in type to a typealias for a Function protocol with associated types for a parameter pack and return value.
   - The first step might be to make the typealias be for the any Function<…> existential, and this gets changed to the protocol directly only at a major version update. Before that, to use some you have to just spell out Function<…>.
5. Start rewriting Standard Library (and then later platform SDK) functions to bind function type members to generic parameters (like the LazyMapSequence example above).
   - With all the previous steps in place, this should be completely transparent to client code, it’s really just hoisting the generic parameter of functions like map, which get erased to an existential by the time they are stored, up to the enclosing type to preserve type information.
6. Redefine annotations on function types to signal conformance to a corresponding protocol (or possibly application of a macro?). I’m not sure how this applies to actor isolation. Is @MainActor eventually going to signal conformance to a protocol? Will it eventually become a non-magical macro?

dnadoba · April 29, 2024, 8:39pm

I quite like this idea and have written my thoughts on this topic down here on a similar thread:

The Function protocol looks very interesting. If we add an associated error type we can support throwing functions as well e.g.:

protocol Function<each Parameters, Error, Result> {
  associatedtype each Parameters
  associatedtype Error 
  associatedtype Result

  func callAsFunction(_: repeat each Parameters) throws(Error) -> Result
}

In addition, we would want to make it ~Escapable so that it is non-escaping by default, and likely ~Copyable or inheriting from AnyObject to have reference semantics.

Async functions would likely just need a separate AsyncFunction protocol.

Nobody1707 · April 29, 2024, 11:35pm

Would we need a refined protocol with a mutating callAsFunction(_:) for functions with mutable captures, like in Rust, or would they just be reference types and can get away with the non-mutating call?

dnadoba · April 30, 2024, 12:12am

I think both ways are possible. However for ~Copyable Functions a separate protocol for mutating callAsFunction() would be required. Same for consuming callAsFunction() (on a ~Copyable struct) if we want to support closures that execute at most once, which could consume ~Copyable types. That would be great because closures today can only borrow ~Copyable types.
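A small sketch of what a separate mutating protocol could look like for copyable value types, written with today’s features. MutatingFunction1 and RunningTotal are hypothetical names, and the ~Copyable and consuming variants are not shown:

```swift
// Hypothetical counterpart to a non-mutating Function1, loosely analogous
// to Rust's FnMut but without any lifetime tracking.
protocol MutatingFunction1<T, R> {
  associatedtype T
  associatedtype R

  mutating func callAsFunction(_ input: T) -> R
}

// The mutable "capture" lives inside the struct, so calls must be mutating.
struct RunningTotal: MutatingFunction1 {
  var total = 0

  mutating func callAsFunction(_ step: Int) -> Int {
    total += step
    return total
  }
}

var accumulate = RunningTotal()
let first = accumulate(2)   // 2
let second = accumulate(3)  // 5

// Copies are independent, unlike a closure that captured a `var` by reference.
var copy = accumulate
let third = copy(10)        // 15
let fourth = accumulate(1)  // 6
```

Making the conforming type a class instead would give the shared-state behavior of today’s closures with a non-mutating call.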

Nobody1707 · April 30, 2024, 12:32am

I think the missing features for this are generalized inout and consuming bindings & parameter packs as associated types.

Also, I can't seem to refine any protocol from or to consuming callAsFunction even though I can definitely refine from a mutating callAsFunction to a non-mutating one.

jrose · April 30, 2024, 2:10am

I'll note that mutating functions can't be used as values at all today, for better or worse, because it's unclear whether the mutation should last for the lifetime of the closure (not even exactly possible) or only for the duration of the closure call (what you'd get if you wrote out the closure explicitly). Rust's choice is something like the former, but they can describe that choice with lifetimes.

Nobody1707 · April 30, 2024, 2:13am

I thought the big reason was the failed currying experiment stopping us from taking a mutating method and getting an (inout Self, Args...) -> Result from it.

jrose · April 30, 2024, 2:21am

The reason you can't say SomeType.mutatingMethod(_:) and have it mean { $0.mutatingMethod($1) } is that, yeah (SE-0042). But you still could have said someValue.mutatingMethod(_:) and have it mean { someValue.mutatingMethod($0) }, and Swift doesn't allow that either. I personally think that's fine, but someone else might have wanted it.

In any case, Rust's FnMut would be like if you started a mutating access (in the exclusivity sense, SE-0176), then waited until the closure was destroyed to end it. I'm not saying Swift will never have something like that—it's not that different from having inout in a return type—but we're a ways off.

Nobody1707 · April 30, 2024, 2:24am

Oh, I did not realize that about FnMut, but it does explain some issues I was having with closures.

dnadoba · April 30, 2024, 7:03pm

Turns out there has already been a pitch for something similar: