Monadic DSL (based on function builders)

Max_Desiatov · June 9, 2019, 5:25pm

First of all I'd like to mention, I deeply appreciate the work of the Core Team and compiler engineers focused on delivering these new features for 5.1 release. Even though we have a somewhat "special" review process for few of these, the improvements did start clicking in my head by this point. They feel like very impactful changes that would allow us to build very powerful stuff.

Please correct me if I'm wrong, but I haven't seen any points mentioned about variable bindings in the recent pitch discussion. I'll name it "function builders" for now, even though there seems to be a strong preference to change this to "builder functions".

People keep mentioning inability to use for and switch statements in the DSL, but I think that lack of variable bindings is far more important and could have a much bigger implications long term.

The new DSL could allow expressing not only markup-related structures like View types in SwiftUI and HTML in the proposal itself or in libraries that started popping up, but also more interesting control flow structures. Consider this extension that utilises Combine:

struct UnknownError: Error {}

extension URLSession {
  func request(url: URL) -> AnyPublisher<Data, Error> {
    .init(Publishers.Future { completion in
      self.dataTask(with: url) {
        switch ($0, $2) {
        case let (_, error?):
          return completion(.failure(error))
        case let (data?, _):
          return completion(.success(data))
        default:
          return completion(.failure(UnknownError()))
        }
      }
    })
  }
}

Then given this builder

@_functionBuilder
struct PublisherBuilder {
  static func buildBlock<T>(
    _ publishers: AnyPublisher<T, Error>...
  ) -> AnyPublisher<T, Error> {
    guard let first = publishers.first else {
      return Publishers.Empty().eraseToAnyPublisher()
    }
    return publishers.reduce(first) { f1, f2 in
      f1.flatMap { _ in f2 }.eraseToAnyPublisher()
    }
  }
}

func chain<T>(
  @PublisherBuilder child: () -> AnyPublisher<T, Error>
) -> AnyPublisher<T, Error> {
  return child()
}

we could write something like this without async/await:

let session = URLSession(configuration: .default)

chain {
  session.request(url: url1)
  session.request(url: url2)
  session.request(url: url3)
}

In this example requests would run sequentially one after another. Quite frequently though, one sequential async operation needs to feed data into the other:

chain {
  session.request(url: url1)
  session.request(url: url2).flatMap {
    session.request(url: processResponse($0))
  }.eraseToAnyPublisher()
}

If variable bindings were supported in the DSL, I hope we could write it like this:

chain {
  session.request(url: url1)
  let response = session.request(url: url2)
  session.request(url: processResponse(response))
}

I admit, I see a lot of feedback for the builders proposal pitch about the magic of DSL being hidden, so this could be more explicit in the fact that response is being "unwrapped" from the publisher (in addition to whatever changes are possible made to the DSL itself to make it more explicit):

chain {
  session.request(url: url1)
  let response <- session.request(url: url2)
  session.future(url: processResponse(response))
}

Looking at the draft proposal, the most obvious API surface could be added on top with

static func buildBinding(
  _ component: Component, 
  _ binder: (Expression) -> Component
) -> Component

optional requirement on the @functionBuilder type being involved, e.g. for Publishers:

  static func buildBinding<T1, T2>(
    _ publisher: AnyPublisher<T1, Error>,
    _ binder: @escaping (T1) -> AnyPublisher<T2, Error>
  ) -> AnyPublisher<T2, Error> {
    publisher.flatMap {
      binder($0)
    }.eraseToAnyPublisher()
  }

On the implementation side, AST rewrite for this new API would need to handle pattern bindings, but I hope this is realistic. If there's a positive feedback to this pitch, I'm ready to try adding this as a PR to the current @John_McCall's work in progress PR.

Why is this called "monadic"?

What pushed me towards writing this is an apparent similarity between the proposed syntax and the monadic do-notation in PureScript and Haskell. Also, coroutines were mentioned in the builders thread, and coroutines are also monads. Now I know some people might say that monads are a useless mathematical abstractions that is hard to explain, but I'm not convinced that's true. As Swift developers we use monads on a regular basis in our code: optionals, arrays, Result type, futures, and now Publishers in Combine. Not having abstractions that capture common patterns among these types causes us to reinvent the wheel multiple types for every possible combination: that's why we have separate flatMap and compactMap implementations, even though they do the same thing and could be expressed with a slight change to the type system mentioned in the Generics Manifesto.

I do think that the best way to explain monads is not to rely on any mathematical abstractions, but to simply declare that a monad is a "protocol" with two simple requirements. Then showing how many real types follow these requirements, and how this helps building useful abstractions is key. I assume many people did suspect something fishy was going on, when we had flatMap for both arrays of arrays and arrays of optionals . I finally understood how powerful monads in Swift could be, when I saw flatMap name used for futures composition, instead of the usual then. And now with flatMap in Combine, we really do need to expand our vocabulary with a common name for types that have a flatMap implementation and certain initializer defined on them.

In pseudocode it could look like this:

protocol Monad {
  associatedtype Element
  init(_ single: Element)
  func flatMap<T>(_ binder: (Element) -> Self<T>) -> Self<T>
}

The only reason I call this "pseudocode" is that we can't express Self<T> requirements in protocols yet. If/when we have that, we could have the Monad protocol in the standard library, along with implementations of this protocol for Optional, Array, Set, Result etc. Implementing @functionBuilder for such types would be trivial, I actually do have a feeling that a simple builder could be synthesized by the compiler automatically for every type conforming to Monad.

Many interesting things can be built on top of that. Ever wish to avoid unwrapping optionals with guard just to use that optional in the next statement? Optional is a Monad, so it's easy with the DSL:

optional {
  let a <- array.first
  let b <- array.last // won't be executed if binding of `a` fails
  a + b // won't be executed if any bindings fail
}

We could express far more interesting things, e.g. Monad implementation for Array allows us to find all combinations of elements from given arrays that satisfy a predicate:

array {
  let a <- [1,2,3]
  let b <- [1,2,3]
  a + b == 4 ? [(a, b)] : []
} // returns [(1, 3),(2, 2),(3, 1)]

Here it feels like arrays with the monadic notation can express "branching" computations that explore all possible combinations.

Since the Monad protocol is such a powerful abstraction, much more interesting stuff can be expressed, e.g. monadic DSL for Result is equivalent to try and throws. We've been using specialized monadic notation for error handling, years before we even had Result type itself in the standard library

I'm pretty sure people can find a lot more interesting applications for this, e.g. recoverable errors, coroutines, monad transformers that lift monads of one type to another etc.

Hope any of this makes sense, thank you for reading. Looking forward to your feedback!

Lantua · June 9, 2019, 7:24pm

Regarding variable binding, note that you can do

chain {
  session.request(url: url1)
  let response = session.request(url: url2)
  response
  session.request(url: processResponse(response))
}

John_McCall · June 9, 2019, 7:34pm

Allowing the use of Swift statement-block syntax to silently express non-sequential control flow is an anti-goal for me. Exceptions are "just a monad", but Swift's error-handling design requires the use of try to mark throwing expressions for a reason — and that's just marking early termination, not arbitrary forking or backtracking, which surely ought to be more explicit in source, not less. Similarly, if Swift ever provides an async/await feature, we will almost assuredly require asynchronous calls to be marked with await, and there's literally no local control flow there at all — async/await is just the standard routine abstraction with a different implementation.

Also, the monadic transform you're proposing only works for straightline code; I am not aware of any way to merge control flow for an arbitrary monad.

Max_Desiatov · June 9, 2019, 7:52pm

This one's already possible in Xcode 11 with the example provided by @Lantua above:

Lantua:

Regarding variable binding, note that you can do

chain {
  session.request(url: url1)
  let response = session.request(url: url2)
  response
  session.request(url: processResponse(response))
}

This isn't an ideal syntax, but it's still much more readable than the usual flatMap pyramids of doom.

With regards to "silently express", this would be possible with the proposal as it's posted on GitHub with buildExpression transform, although that one isn't availble with your PR, or the current publicly available build of Xcode 11. I'm for one, also against possible confusion caused by that, and it looks like implementing buildExpression as stated in the proposal would allow implicit "silent" lifting of arbitrary expressions into the monad expression being built.

If buildExpression isn't going to be implemented in the final proposal and variable binding transform is added on top of that, none of the possible DSL would allow to silently lift into a monad. I don't know if anyone would like to implement the Result DSL builder, as there's little benefit to it compared to available try/throw syntax. But in that context the implementation would be exactly like try in throwing functions, but inverted: all monadic expressions have "implicit" try and non-monadic have to be explicitly lifted with .success:

result {
  let a <- result1
  let b <- result2
  5 // this line causes an error until `buildExpression` is available IIRC
  .success(a + b)
}

The optional example in the original post doesn't have explicit lifting, because in my understanding is that non-optional values are lifted to optionals implicitly anywhere?

Sorry, I'm not sure what "merge control flow" means in this context, would you be able to provide an example?

Max_Desiatov · June 9, 2019, 7:56pm

Oh, my bad, this doesn't work, although I wish it did:

Lantua · June 9, 2019, 8:07pm

Not the person in question, but I'm sure he refers to branching paths (if-else, for, while).

Foo {
  someResult
  if condition {
    resultTrue
  } else {
    resultFalse
  }
}

// Synthesized

Foo {
  let state0 = ...
  let state1 = state0.consume(someResult)

  let state2T, state2F: ...
  if condition {
    state2T = state1.consume(resultTrue)
  } else {
    state2F = state1.consume(resultFalse)
  }
  let state2 = merge(state2T, state2F) // How do we merge them?

  return state2.finalize()
}

The problem is to get the resulting monad after the branching path.
I suppose we could enforce that all the branching paths result in the same monad type, and any iteration block doesn't change the type of the monad, but that's up for debate.

John_McCall · June 10, 2019, 3:07pm

Putting aside that let isn't supported in function builders yet, I'm not sure what about this would turn into non-sequential control flow under the proposed function builders feature. This is just binding a local and then also making it a block result. Function builders do not do a CPS conversion.

Similarly, I don't know how buildExpression would allow anything non-sequential. I suppose if we didn't ban autoclosure parameters there then you could evaluate statements twice or even out of order, but I think we clearly should ban them by the logic of the proposal.

Yes, that's a very nice summary of the primary language-design problem with monads as a tool for abstracting over program behavior.

Lantua covered it.

Joe_Groff · June 10, 2019, 3:38pm

In F#'s compute expressions, monadic bindings are decorated differently from regular bindings; they use let* for bindings that unwrap a value into the following scope. Ocaml adopted let* as well and also introduced let+ for applicative rather than monadic bindings. Maybe we could support buildUnwrappedBinding(value: M<T>, in body: (T) -> M<U>) as a function builder method for explicit unwrapped bindings in the future, though as you note, that would require us to CPS or coroutine-ify the built body.

GetSwifty · June 10, 2019, 6:50pm

FWIW, my frustration with the term isn't the abstraction, but the name. To anyone without a very specific interest in math/ category theory, you're more likely to think it has something to do with an alien race than a term in programming.

Oddly it seems implementation's naming makes sense...so maybe there isn't an understandable term for the abstraction. At least in popular english. I would love it if someone could come up with a better term for it

Regardless, I really like what the idea generally enables. Especially around threading and Reactive programming it could see it enabling some powerful and expressive APIs (Coroutines , C#'s async/await, etc) above and beyond what FunctionBuilders enable.

I'm not sure if it could/should replace FunctionBuilders, but to me it seem like a more programming-focused (rather than DSL focused) implementation.