Additional String Processing APIs

jjrscott · July 15, 2020, 9:19pm

I've been thinking of a similar thing for a while now, but from a bit of a different angle:

What does Brian Kernighan's Regular Expression Matcher actually mean when implemented in Swift?

The result similarly drops down a few levels to BidirectionalCollection et al. and splits the special pattern-matching tokens (*, $, etc.) out into an enum. I'm a bit out of practice with generics, but I do think this is more in the spirit of Swift and its DSLs than some runtime encoding of the pattern as a string.

enum RegularExpressionElement<Element> : Equatable where Element : Equatable {
    case any              // .
    case zeroOrMore       // *
    case begin            // ^
    case end              // $
    case element(Element) // a character
}
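The comment on each case gives the one-to-one mapping from the C pattern syntax. As an illustration only, a hypothetical `parsePattern(_:)` helper (not from the post or any library) could perform that translation for `String` patterns. Following Kernighan's C code, this sketch treats `^` as an anchor only at the start and `$` only at the end; as a simplifying assumption, it always maps `*` to `.zeroOrMore`.

```swift
enum RegularExpressionElement<Element: Equatable>: Equatable {
    case any              // .
    case zeroOrMore       // *
    case begin            // ^
    case end              // $
    case element(Element) // a literal element
}

/// Hypothetical helper: translate a Kernighan-style pattern string into enum tokens.
func parsePattern(_ pattern: String) -> [RegularExpressionElement<Character>] {
    let lastOffset = pattern.count - 1
    var tokens: [RegularExpressionElement<Character>] = []
    for (offset, ch) in pattern.enumerated() {
        switch ch {
        case "^" where offset == 0:          tokens.append(.begin)      // anchor only at the start
        case "$" where offset == lastOffset: tokens.append(.end)        // anchor only at the end
        case ".":                            tokens.append(.any)
        case "*":                            tokens.append(.zeroOrMore) // assumption: always special
        default:                             tokens.append(.element(ch))
        }
    }
    return tokens
}
```

With this, `parsePattern("^h..lo*!$")` yields the same array as the hand-written enum pattern in the example, while a `$` or `^` in the middle of a pattern stays a literal `.element`.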

extension BidirectionalCollection where Element : Equatable {
    typealias RegularExpression = ArraySlice<RegularExpressionElement<Self.Element>>

    /* match: search for regexp anywhere in collection */
    func match(_ regexp: RegularExpression) -> Bool // signature only; body omitted here
}

Here's the example from the original, first with the pattern written as a string (as in the C version) and then as an array of enum cases:

"hellooooo!".match("^h..lo*!$")

"hellooooo!".match([.begin, .element("h"), .any, .any, .element("l"), .element("o"), .zeroOrMore, .element("!"), .end])

There's a full implementation here: Reimplementation of Brian Kernighan's Regular Expression Matcher on BidirectionalCollection.
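The linked code isn't reproduced here. To give a rough idea of what such a port involves, the following is a minimal sketch, not the linked implementation, that keeps the shape of Kernighan's `match`, `matchhere` and `matchstar`. The helper names `matchHere`, `matchStar` and `matchesOne` are hypothetical, and the sketch assumes an unlabeled `match(_:)`. Unlike the C code, a `.begin` or `.end` that is not in its anchoring position never matches anything, rather than matching a literal `^` or `$`.

```swift
// Minimal sketch of Kernighan's matcher on BidirectionalCollection.
// Not the linked implementation; helper names are hypothetical.

enum RegularExpressionElement<Element: Equatable>: Equatable {
    case any              // .
    case zeroOrMore       // * (applies to the preceding token, as in the C original)
    case begin            // ^
    case end              // $
    case element(Element) // a literal element
}

/// True if `token` consumes the single element `value` (`.` or a literal).
func matchesOne<E: Equatable>(_ token: RegularExpressionElement<E>, _ value: E) -> Bool {
    switch token {
    case .any: return true
    case .element(let e): return e == value
    default: return false // ^, $ and * never consume an element here
    }
}

/// matchHere: search for regexp at the beginning of text (C's `matchhere`).
func matchHere<C: Collection>(_ regexp: ArraySlice<RegularExpressionElement<C.Element>>,
                              _ text: C) -> Bool
    where C.Element: Equatable, C.SubSequence == C {
    guard let first = regexp.first else { return true }   // empty pattern matches
    let rest = regexp.dropFirst()
    if rest.first == .zeroOrMore {                         // "x*": repeat the first token
        return matchStar(first, rest.dropFirst(), text)
    }
    if first == .end && rest.isEmpty {                     // trailing "$": text must be used up
        return text.isEmpty
    }
    guard let c = text.first, matchesOne(first, c) else { return false }
    return matchHere(rest, text.dropFirst())
}

/// matchStar: search for token* followed by regexp at the beginning of text (C's `matchstar`).
func matchStar<C: Collection>(_ token: RegularExpressionElement<C.Element>,
                              _ regexp: ArraySlice<RegularExpressionElement<C.Element>>,
                              _ text: C) -> Bool
    where C.Element: Equatable, C.SubSequence == C {
    var remaining = text
    // Try the rest of the pattern after 0, 1, 2, ... repetitions (shortest first, like the C code).
    while !matchHere(regexp, remaining) {
        guard let c = remaining.first, matchesOne(token, c) else { return false }
        remaining = remaining.dropFirst()
    }
    return true
}

extension BidirectionalCollection where Element: Equatable {
    typealias RegularExpression = ArraySlice<RegularExpressionElement<Element>>

    /// match: search for regexp anywhere in the collection (C's `match`).
    func match(_ regexp: RegularExpression) -> Bool {
        if regexp.first == .begin {
            return matchHere(regexp.dropFirst(), self[...])
        }
        var text = self[...]
        // Try every starting position, including the empty suffix.
        while !matchHere(regexp, text) {
            if text.isEmpty { return false }
            text = text.dropFirst()
        }
        return true
    }
}
```

Because the helpers are generic over `Collection`, the same method also works on arrays, e.g. `[1, 2, 3, 4].match([.element(2), .element(3)])` returns `true`.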

Update: I know that .zeroOrMore(.element("o")) would make more sense, but I wanted to stay as close to the C original as possible for this demonstration.
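For comparison, the variant from the update could look like the following sketch, assuming an `indirect case zeroOrMore(RegularExpressionElement)` that carries the token being repeated. The matcher then no longer needs to peek at the next token to spot a `*`. The helper names `matchesOne` and `matchHere` are hypothetical, and this is not the linked implementation.

```swift
// Variant sketch: "*" wraps the token it repeats instead of following it.
enum RegularExpressionElement<Element: Equatable>: Equatable {
    case any                                           // .
    indirect case zeroOrMore(RegularExpressionElement) // x*
    case begin                                         // ^
    case end                                           // $
    case element(Element)                              // a literal element
}

/// True if `token` consumes the single element `value` (`.` or a literal).
func matchesOne<E: Equatable>(_ token: RegularExpressionElement<E>, _ value: E) -> Bool {
    switch token {
    case .any: return true
    case .element(let e): return e == value
    default: return false // ^, $ and nested * never consume a single element here
    }
}

/// Search for regexp at the beginning of text; no look-ahead is needed for "*".
func matchHere<C: Collection>(_ regexp: ArraySlice<RegularExpressionElement<C.Element>>,
                              _ text: C) -> Bool
    where C.Element: Equatable, C.SubSequence == C {
    guard let first = regexp.first else { return true } // empty pattern matches
    let rest = regexp.dropFirst()
    switch first {
    case .zeroOrMore(let token):
        // Shortest match first: try rest after 0, 1, 2, ... repetitions of token.
        var remaining = text
        while !matchHere(rest, remaining) {
            guard let c = remaining.first, matchesOne(token, c) else { return false }
            remaining = remaining.dropFirst()
        }
        return true
    case .end where rest.isEmpty:                        // trailing "$": text must be used up
        return text.isEmpty
    default:
        guard let c = text.first, matchesOne(first, c) else { return false }
        return matchHere(rest, text.dropFirst())
    }
}

extension BidirectionalCollection where Element: Equatable {
    typealias RegularExpression = ArraySlice<RegularExpressionElement<Element>>

    /// Search for regexp anywhere in the collection.
    func match(_ regexp: RegularExpression) -> Bool {
        if regexp.first == .begin { return matchHere(regexp.dropFirst(), self[...]) }
        var text = self[...]
        while !matchHere(regexp, text) {
            if text.isEmpty { return false }
            text = text.dropFirst()
        }
        return true
    }
}
```

Usage: `"hellooooo!".match([.begin, .element("h"), .any, .any, .element("l"), .zeroOrMore(.element("o")), .element("!"), .end])` returns `true`. A side benefit of this shape is that a stray `.zeroOrMore` with nothing to repeat can no longer be written.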