Generalized pattern matching

A while ago @Michael_Ilseman posted a document discussing future directions for String. This document includes a sketch for generalized pattern matching (here: State of String: ABI, Performance, Ergonomics, and You! · GitHub). I would like to start a discussion focused specifically on this topic.

The design in the string update document consists of the following protocol:

protocol Pattern {
  associatedtype In
  associatedtype Out
  func match(_: In) -> Out?
}

as well as the following usage for user-defined patterns:

let myValue: T = ...
let pattern: MyTPattern<(U,U)> = ... // matches a T to (U, U)
let otherPattern: OtherTPattern<V> = ... // matches a T to V
switch myValue {
  case (let a: Int, let b: Int) <- pattern: ...
  case let pair <- pattern: ... // pair has type (U,U)
  case let d: Double <- otherPattern: ...
  case let num <- otherPattern: // num has type V
}

There is a lot to like about this design. In particular, patterns can be stored in variables. They are first class entities in the langauge.

There are two ideas that I started thinking about when reading this design. First, unfortunately this design does not allow users to define a set patterns that exhaustively matches a given type. I think it would be useful to discuss what a design that does support exhaustiveness might look like.

Second, perhaps we could support a new form of dot shorthand to allow user-defined patterns to be matched using the same syntax as enum cases. Suppose we have the following extension on MyTPattern from the preceding example:

extension MyTPattern {
   static var twoInts: MyTPattern<(Int, Int)> { ... }
}

This pattern declaration along with a dot shorthand might allow us to write a switch using this syntax:

switch myValue {
  case .twoInts(let int1, let int2): ...
  // other patterns
}

The idea here is that the case statement creates a type context that expects a pattern matching T. Static members of types that conform to Pattern where In == T would be available for use. The more general syntactic form supporting patterns as variables would also be available.

An alternative approach might be for dot shorthand to look for static members of T that return a type conforming to Pattern where In == T. This alternative might help implementation by scoping lookup to static members of T rather than all types X: Pattern where In == T.

I am interested in hearing what the community thinks about these ideas in particular and user-defined patterns in general.

I'm not sure to achieve this exactly, but I think we should aim to come up with a solution that unifies the various kinds of pattern matching we have:

  • conditional destructuring: case let (x, y) = getPoint()
  • ~= operator (implicitly used in some case statements)
  • <- operator (from the proposal).

I'm also not a fan of using a Pattern protocol, because that precludes the use of anonymous closures as patterns. Oddly enough, Java's batshit crazy lambdas-as-instantiation-of-an-anonomous-inner-subclass approach would work well here.

Perhaps the right side of this <- operator can be any (In) -> Out? function, which could either be an anonymous closure e.g. let i <- { Int(str) }, or any Pattern's match function could be used: let x, y <- PointPattern.match, which is itself a (In) -> Out? function

1 Like

If you think about pattern matching in Swift 4 as a function from (T) -> Bool, it’s a basic pass/fail. My proposal was extending it to produce a value upon passing, i.e. a function from (T) -> U?. Or, alternatively, as a partial function from (T) -> U with a statically-unknown domain, where the caller “guards” against the argument not being part of the function’s run-time domain.

By “exhaustiveness”, are you proposing a way to add some kind of totality-annotation to a group of these partial functions to aid static reasoning and/or provide exhaustiveness in switches? That certainly sounds interesting! Do you have any concrete use cases that you think would be improved by this addition? How “fragile” would this concept of totality be, that is, how amenable to library evolution, unplanned-for corner cases, changes to T, etc.?

For this switch syntax the compiler takes case .twoInts(…) and infers case MyTPattern.twoInts(…)? If the search space include patterns from other modules, then as you mention some restrictions to the scope of the search might aid compilation time.

You mentioned an alternative where the compiler looks for static members of T. This seems like it might risk having T become a grab-bag of all possible patterns, and I don’t know how exhaustiveness annotations could fit with this model. What about another alternative to help further reduce the scope:

switch MyTPattern.match(myValue) {
  case .twoInts(…)
}

where name lookup is constrained to static members of MyTPattern? What does the type of “match” look like?

For the idea of dot syntax for pattern matching, do you have any concrete use cases to help demonstrate the power of this syntax? Alternatively, how would this syntax map onto the proposed direction for the Regex type?

@Michael_Ilseman sorry for the extremely delayed response. I've been quite busy lately! What I have in mind in terms of usage is similar to what you could accomplish today by declaring an enum for the set of patterns, defining a property that projects the underlying type to the enum, and then switching on the projected enum value. Here's an example:

enum ListPattern<S: Sequence> {
    case empty
    case nonEmpty(S.Element, S.SubSequence)
}
extension Sequence where SubSequence: Sequence, SubSequence.SubSequence == SubSequence {
    var pattern: ListPattern<SubSequence> {
        if let first = Array(prefix(1)).first {
            return .nonEmpty(first, dropFirst())
        } else {
            return .empty
        }
    }
}

switch [1, 2, 3].pattern {
case .empty: print("empty")
case .nonEmpty(let v, let s): print("v: \(v) s: \(s)")
}

The problem with this approach is that it does not interact properly with nested patterns. More generally, I have a vague notion that some combination of support for fully general user-defined pattern matching, pattern requirements in protocols, the ownership system, and optimizer smarts might let us write some rather elegant and general yet still relatively high performance code. I started thinking about this when I was reading @lorentey's book on Optimizing Collections.

In other words, why shouldn't we be able to write elegant algorithms that recursively decompose a data structure just because the elements aren't stored with indirect / recursive enums? This is the primary use case I would like to enable if possible.

This is a valid concern. I think I would be ok with something along these lines if we could figure out how to make it work in nested patterns.

The idea here is mostly to make user-defined patterns feel as "first class" as possible. This shorthand would provide usage syntax for user-defined patterns that matches the patterns provided by enum cases.