[Pitch] Rule<T: Collection> — A Composable Pattern Recognizer for any Collection

MaximilianServais · March 24, 2026, 9:57pm

that's a really good idea, thanks!

I'm currently working on a performance test for Rule with String and String.UTF16View. I've already incorporated your idea to see if there are any side effects.

I don't have much time left this week, but as soon as I make some progress with the performance tests, I'll update my post and include the new code. I'll mention in the acknowledgments section that you contributed the idea. Is that OK?

Loooop · March 25, 2026, 10:52am

That's fine. Thanks.

ExFalsoQuodlibet · March 25, 2026, 2:12pm

No, but it should exist. I’m also uninterested in participating in a thread where the text is generated by AI (even if it’s rewritten).

It changes everything. These threads are conversations, and I (alongside many other people) am not interested “conversing” with AI-generated content. AI is not “just a tool”, and thinking that it is downplays how it profoundly (and negatively) impacts how we exchange and relate with each other’s ideas.

Avi · March 25, 2026, 2:59pm

The original author has been interacting with commenters on this thread. Just because his posts may be AI generated or edited does not mean he is not reading replies and responding to them from his own knowledge. To me, it smacks of bigotry to jump straight to the assumption that the human poster is acting merely as a go-between for commenters and his chosen agent.

Would you have the same visceral reaction if the poster had a professional translator write the English version for them so as to present content that is more widely-accessible to the population of this forum? If they required that translator to act as a go-between because he doesn’t understand the English replies he receives?

I know this is all off-topic for the thread, and I apologize to the OP. I think there should be a discussion about this and the rules established by the forum maintainers. AI tools are going to see increased usage, and it’s absolutely useless and wasteful to have the same anti-AI posts show up every time.

John_McCall · March 25, 2026, 3:45pm

I agree. I understand that some folks are annoyed about the level of obvious LLM authorship in the proposal, and they are totally welcome to decline to contribute to the thread because of it, but piling on the author when they’re actually here and writing their own posts is not really acceptable.

The Core Team is working on a policy about LLM use in contributions. The details are still up in the air, but it is going to provide guidance about responsible and respectful use, not be a blanket ban.

At any rate, I would like the discussion going forward in this thread to be about the technical content of the proposal.

snej · March 25, 2026, 3:49pm

This looks very similar to the mature swift-parsing package. Have you compared it against it?

Parser combinators are very useful — I’ve used swift-parsing myself — but IMHO not broadly enough to include in a language’s standard library.

greggwon · March 26, 2026, 4:04pm

This proposal is largely the construction of a lexical analyzer that would tokenize a stream and a grammar parser that would evaluate/shift tokens in the order they arrive and then decide on a ‘reduction’ on the recognition of a ‘rule’ that can be resolved from the set of tokens provided.

I like that this is a little more freeform and a lot more reusable than a “lex” and “yacc” composed language system or a recursive descent parser because the pieces are extractable and composable much more readily.

MaximilianServais · March 26, 2026, 10:37pm

Hi Jens,

Thanks for pointing that out.

This looks very similar to the mature swift-parsing package. Have you compared it against it?

Not yet, but I will. I'll need some time to properly check what Swift-Parsing is doing.

bradhowes · March 29, 2026, 10:13am

Thanks for the pitch. I do think there is some benefit in having a library that offers the features, but I am not sure there is enough value in it being part of the standard library. I agree with @Jon_Shier that this could be useful for motivating language improvements based on performance issues you might uncover. I seem to recall the PointFree team with their already-mentioned swift-parsing did some performance measurements with String character and utf8 views (I even wrote my own package based on theirs for math expression parsing).

MaximilianServais · March 29, 2026, 4:32pm

Hi @snej

I finally had a chance to have a closer look.
I think @greggwon has already answered it. Rule is a pattern recognizer, not a parser framework. Think of it as: Regex that can handle nested structures for any Collection. So, same idea but broader reach. Building a parser on top of it is absolutely possible, but that's your code, not the library. This is very different from swift-parsing .

MaximilianServais · March 29, 2026, 4:40pm

Hi @bradhowes

Thanks for your comment!
As I just said to @snej : Rule is a pattern recognizer, not a parser framework like swift-parsing. Rule has the same idea as Regex, but it works on any Collection (not only strings) and can handle nested structures. What you do with the capture tree afterwards is up to you.

Cool parser package swift-math-parser !

MaximilianServais · March 29, 2026, 4:46pm

Hi @Loooop

I wanted to edit my original post to update the code and add an acknowledgment for your N: Equatable suggestion, but I can no longer edit it.

So let me just say it here: thanks, it was a clean improvement that fits perfectly into the design!

MaximilianServais · March 29, 2026, 6:04pm

First benchmarks!

Great news – the first benchmarks are here!
All times are averages over 100 runs on a MacBook Pro M3, ~585,000 characters of generated input.

Test	Regex	Rule<String>	Rule<UTF16>	Rule<UTF16> + jumpTo
1. Numbers	49.1 ms	23.0 ms	11.6 ms	8.4 ms
2. Words	30.1 ms	22.4 ms	8.0 ms	7.7 ms
3. Quoted strings	13.1 ms	12.8 ms	6.7 ms	1.8 ms
4. Dates DD.MM.YYYY	23.2 ms	24.6 ms	8.8 ms	6.3 ms

Tests 5–6 can't use Regex — it cannot express nested structures. The comparison baseline is a hand-written character scanner, which is the only alternative today.

Test	Manual scanner	Rule<String>	Rule<UTF16>	Rule<UTF16> + jumpTo
5. Balanced parentheses	5.1 ms	40.9 ms	39.0 ms	2.8 ms
6. Nested block comments	11.9 ms	46.8 ms	40.0 ms	6.0 ms

Test details

The input (~585,000 characters) is split into two sections:

Flat section (~510,000 characters): 6,000 log-like lines containing dates, quoted strings, numbers, and words — patterns that both Regex and Rule can handle.
Nested section (~75,000 characters): balanced parentheses (depths 2–8) and nested block comments (depths 1–4) — patterns that Regex fundamentally cannot handle.

Numbers — integers and decimals: /[0-9]+(?:\.[0-9]+)?/ e.g. 43 or 3.14
Words — ASCII letter runs: /[a-zA-Z]+/ e.g. GET, localhost
Quoted strings — double-quoted content: /"[^"]*"/ e.g. "/api/v1/resource/3"
Dates — DD.MM.YYYY: /[0-9]{2}\.[0-9]{2}\.[0-9]{4}/ e.g. 22.03.2026
Balanced parentheses — recursive grammar, all nesting depths in one pass
Nested block comments — /* … */ with nested /* … */, recursive grammar

Discussion

As @jrose and the others rightly pointed out, performance is important because in the past, generic solutions often meant significant performance losses. As I mentioned in my last message to @jrose , the Rule struct is the one thing that stays the same in the design. Everything else, like the collection type, matchers and scanning strategy, is up to you. I gave three examples: (a) A rule for the common case, (b) a rule for String.UTF16View when you know your characters are fixed-width, and (c) a custom contiguous collection if you need to go further. These benchmarks now give us real numbers for scenarios (a) and (b).

For the flat tests (1–4), Rule is already as good as Regex. I'm actually surprised, since Regex has had priority access, so I expected it to be faster. Maybe someone here would like to run the tests themselves to check the numbers on other machines? (The new code is below)

Rule<String.UTF16View> is about 2x faster simply because index(after:) is a fixed offset instead of a UTF-8 decode, so there's no magic, just less work per step.

I'm really pleased with how tests 5 and 6 turned out. The rule <String.UTF16View> with jumpToNext takes 2.8 ms and 6.0 ms, which is quicker than the hand-written scanner that every project has to write from scratch these days. Plus, you also get a composable grammar and a structured capture tree.

One thing worth saying: Rule is a pattern recognizer, not a parser framework. Think of it as Regex that can handle nested structures for any Collection - same idea, broader reach. You can use it as a Regex replacement if you like.

jumpToNext is a good example of what "you can add your own rules" looks like in practice. It's a small extension specific to String.UTF16View that skips ahead to the next relevant code unit with no closure dispatch (similar to memchr, but as a Rule). I haven’t proposed it as part of the core API (but maybe it should be), just a specific rule that plugs straight into the composition system. The counting rule used to track matches in the benchmarks works the same way. The library didn't need any changes. That's the cool thing about Rule< T, N>

The String matchers were previously using CharacterSet from Foundation, but I've rewritten them in pure Swift for these benchmarks. The matchers aren't final, and there's still room to optimize, using ContiguousArray for character sets is one example I've already looked at. But I think these numbers make a solid case: Rule<T,N> is not a trade-off between "generic, flexible, easy to read" and "fast". You can have both!

MaximilianServais · March 29, 2026, 6:11pm

Code Part 1

Unfortunately I can no longer edit the original post, so I'm posting the updated version below. Sorry for the noise!

Rule.swift

//
//  Rule.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 13.03.26.
//
//  A universal, composable pattern matching library for any Swift Collection.
//
//  Rules describe patterns that can be applied to strings, byte arrays, token lists,
//  or any other Collection type. They compose into complex parsers and lexers through
//  a small set of orthogonal primitives: matchers, combinators, selectors, quantifiers,
//  lookahead, and captures.
//

// MARK: - Rule

/// A composable pattern that evaluates a Collection within a given range.
///
/// `Rule` is the core building block of the library. Each rule answers one question:
/// *"Does this pattern match at this position, and if so, how much input does it consume?"*
///
/// Rules compose through combinators (`sequence`), selectors (`firstMatch`, `longestMatch`),
/// quantifiers (`repeat`, `zeroOrMore`, `oneOrMore`), and lookahead (`expecting`,
/// `notExpecting`). This small set of orthogonal primitives can express anything from
/// simple patterns to full recursive parsers.
///
/// If the library does not provide what you need, create a custom rule with a closure:
///
///     let myRule = Rule<String, Never> { sequence, range in
///         // Your matching logic here
///     }
///
struct Rule<T: Collection, N: Equatable> {
    private let evaluationBlock: (T, Range<T.Index>) -> RuleMatch<T,N>?
    
    /// Creates a rule with a custom evaluation closure.
    ///
    /// The closure receives the input collection and the range to match within.
    /// Return a `RuleMatch` on success, or `nil` if the pattern does not match.
    init(evaluationBlock: @escaping (T, Range<T.Index>) -> RuleMatch<T,N>?) {
        self.evaluationBlock = evaluationBlock
    }
    
    /// Evaluates the rule on `sequence` within `range`.
    ///
    /// Returns a `RuleMatch` describing the consumed range and any captures,
    /// or `nil` if the pattern does not match at the given position.
    func evaluate(_ sequence: T, in range: Range<T.Index>) -> RuleMatch<T,N>? {
        return evaluationBlock(sequence, range)
    }
    
    /// Evaluates the rule on the entire sequence.
    ///
    /// Convenience for the common case where matching starts at the beginning.
    func evaluate(_ sequence: T) -> RuleMatch<T,N>? {
        return evaluationBlock(sequence, sequence.startIndex..<sequence.endIndex)
    }
}


// MARK: - RuleMatch

/// The result of a successful rule evaluation.
///
/// Contains the consumed range and any captures extracted during evaluation.
/// A `nil` return from `Rule.evaluate` means the rule did not match;
/// a non-nil `RuleMatch` means it did, and describes how much input was consumed.
struct RuleMatch<T: Collection, N: Equatable> {
    let range: Range<T.Index>
    let captures: [RuleCapture<T,N>]
    
    init(range: Range<T.Index>, captures: [RuleCapture<T,N>] = []) {
        self.range = range
        self.captures = captures
    }
}

extension RuleMatch: CustomStringConvertible {
    
    /// A flat summary showing the matched range and capture count.
    var description: String {
        "RuleMatch [\(range.lowerBound)..<\(range.upperBound)] captures: \(captures.count)"
    }
}

extension RuleMatch: CustomDebugStringConvertible {
    
    /// Full representation including the capture tree.
    var debugDescription: String {
        var description = "RuleMatch [\(range.lowerBound)..<\(range.upperBound)]"
        if !captures.isEmpty {
            description += RuleCapture.treeDescription(for: captures)
        }
        return description
    }
}


// MARK: - RuleCapture

/// A named region within the input, forming a tree of nested captures.
///
/// Captures are the structural output of rule evaluation. They record *what* was
/// matched and *where*, organized hierarchically. A capture's `name` identifies it;
/// interpretation of the captured content is left to a separate phase after matching.
struct RuleCapture<T: Collection, N: Equatable> {
    let name: N
    let range: Range<T.Index>
    let innerCaptures: [Self]
}

extension RuleCapture: CustomStringConvertible {
    
    /// A flat, single-line summary showing name and range.
    var description: String {
        "\(name) [\(range.lowerBound)..<\(range.upperBound)]"
    }
}

extension RuleCapture: CustomDebugStringConvertible {
    
    /// Full tree representation including all nested captures.
    var debugDescription: String {
        var description = "\(name) [\(range.lowerBound)..<\(range.upperBound)]"
        if !innerCaptures.isEmpty {
            description += Self.treeDescription(for: innerCaptures, indent: "")
        }
        return description
    }
    
    /// Renders captures as an indented ASCII tree.
    ///
    /// Produces a hierarchical view that makes it easy to visualize
    /// the structure of parsed content during development and debugging.
    static func treeDescription(for captures: [Self], indent: String = "") -> String {
        var description = ""
        
        for (index, capture) in captures.enumerated() {
            let isLast = index == captures.count - 1
            
            description += "\n"
            
            let captureDescription = capture.debugDescription
            let prefix = isLast ? "╰─ " : "├─ "
            let continuation = isLast ? "   " : "│  "
            
            let lines = captureDescription.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline).map(String.init)
            for (lineIndex, line) in lines.enumerated() {
                if lineIndex == 0 {
                    description += indent + prefix + line
                } else {
                    description += "\n" + indent + continuation + line
                }
            }
        }
        
        return description
    }
}


// MARK: - Capture Tree Navigation

extension RuleCapture {
    
    /// Finds the first capture with the given name using depth-first search.
    ///
    /// Returns `nil` if no capture with that name exists in the tree.
    func find(_ name: N) -> Self? {
        if self.name == name { return self }
        for inner in innerCaptures {
            if let found = inner.find(name) { return found }
        }
        return nil
    }
    
    /// Finds all captures with the given name using depth-first search.
    ///
    /// Returns an empty array if no captures with that name exist in the tree.
    func findAll(_ name: N) -> [Self] {
        var results: [Self] = []
        if self.name == name { results.append(self) }
        for inner in innerCaptures {
            results.append(contentsOf: inner.findAll(name))
        }
        return results
    }
}


// MARK: - Matchers

extension Rule {
    
    /// Always succeeds, consuming nothing.
    ///
    /// Useful as a fallback in `firstMatch` for error recovery or default cases:
    /// `.firstMatch(expectedRule, .empty().capture(as: "Error"))`
    static func empty() -> Self {
        Self { _, range in
            return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
        }
    }
    
    /// Matches any single element, regardless of content.
    ///
    /// Fails only if the range is empty. Use inside repeat loops to consume
    /// input one element at a time: `.zeroOrMore(.sequence(.notExpecting(stop), .any()))`
    static func any() -> Self {
        Self { sequence, range in
            guard !range.isEmpty else { return nil }
            let next = sequence.index(after: range.lowerBound)
            return RuleMatch(range: range.lowerBound..<next, captures: [])
        }
    }
    
    /// Matches a single element satisfying the given predicate.
    ///
    /// This is the generic building block for element-level conditions on any collection type.
    /// Domain-specific matchers like `unicode(_:)` for strings are built on the same principle.
    ///
    ///     .element(where: { $0.isLetter })
    ///     .element(where: { $0 > 0x7F })
    static func element(where predicate: @escaping (T.Element) -> Bool) -> Self {
        Self { sequence, range in
            guard !range.isEmpty else { return nil }
            guard predicate(sequence[range.lowerBound]) else { return nil }
            let next = sequence.index(after: range.lowerBound)
            return RuleMatch(range: range.lowerBound..<next, captures: [])
        }
    }
    
    /// Succeeds only if the remaining range is empty. Consumes nothing.
    ///
    /// Use this to assert that a rule must match the entire remaining input,
    /// not just a prefix: `.sequence(decimal, .endOfSequence())`
    static func endOfSequence() -> Self {
        Self { _, range in
            guard range.isEmpty else { return nil }
            return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
        }
    }
}


// MARK: - Annotation

extension Rule {
    
    /// Wraps the match in a named capture group.
    ///
    /// The entire matched range becomes a `RuleCapture` node with the given name.
    /// Any captures from the inner rule become `innerCaptures` of the new node.
    func capture(as name: N) -> Self {
        Self { sequence, range in
            guard let result = self.evaluate(sequence, in: range) else { return nil }
            let capture = RuleCapture(name: name, range: result.range, innerCaptures: result.captures)
            return RuleMatch(range: result.range, captures: [capture])
        }
    }
    
    /// Wraps the match in a named capture only if a name is provided.
    ///
    /// Convenience for optional capturing in generic combinators.
    /// Passes through unchanged when `name` is `nil`.
    func captureIf(_ name: N?) -> Self {
        guard let name else { return self }
        return self.capture(as: name)
    }
}


// MARK: - Combinator

extension Rule {
    
    /// Matches a sequence of rules consecutively.
    ///
    /// Each rule is applied in order, starting where the previous one ended.
    /// If any rule fails, the entire sequence fails. Captures from all rules
    /// are combined in the result.
    static func sequence(_ rules: Self...) -> Self {
        precondition(rules.count > 0, "At least one rule is required.")
        let rules = ContiguousArray(rules)
        
        return Self { sequence, range in
            var remaining = range
            var captures: [RuleCapture<T,N>] = []
            
            for rule in rules {
                guard let result = rule.evaluate(sequence, in: remaining) else { return nil }
                remaining = result.range.upperBound..<range.upperBound
                if !result.captures.isEmpty {
                    captures.append(contentsOf: result.captures)
                }
            }
            
            return RuleMatch(range: range.lowerBound..<remaining.lowerBound, captures: captures)
        }
    }
}


// MARK: - Selectors

extension Rule {
    
    /// Returns the result of the first rule that matches.
    ///
    /// Rules are evaluated in order — the first success wins. Order defines priority.
    /// Use this when you need deterministic precedence (e.g. keywords before identifiers).
    static func firstMatch(_ rules: Self...) -> Self {
        precondition(rules.count > 0, "At least one rule is required.")
        let rules = ContiguousArray(rules)
        
        return Self { sequence, range in
            for rule in rules {
                if let result = rule.evaluate(sequence, in: range) { return result }
            }
            return nil
        }
    }
    
    /// Returns the result of the rule that consumes the most input.
    ///
    /// All rules are evaluated and the longest match wins. Order does not matter.
    /// Use this when the correct interpretation is the greediest one (e.g. tokenizers).
    static func longestMatch(_ rules: Self...) -> Self {
        precondition(rules.count > 0, "At least one rule is required.")
        let rules = ContiguousArray(rules)
        
        return Self { sequence, range in
            var best: RuleMatch<T,N>?
            
            for rule in rules {
                if let result = rule.evaluate(sequence, in: range) {
                    if best == nil || best!.range.upperBound < result.range.upperBound {
                        best = result
                    }
                }
            }
            
            return best
        }
    }
}


// MARK: - Quantifiers

extension Rule {
    
    /// Repeats a rule between `min` and `max` times (greedy).
    ///
    /// The rule is applied as many times as possible up to `max`. Succeeds if it
    /// matched at least `min` times. Stops early if the rule fails or makes no
    /// progress (to prevent infinite loops with zero-width matches).
    static func `repeat`(_ rule: Self, min: UInt, max: UInt) -> Self {
        precondition(min <= max, "Minimum must be ≤ maximum.")
        
        return Self { sequence, range in
            var count: UInt = 0
            var remaining = range
            var captures: [RuleCapture<T,N>] = []
            
            while !remaining.isEmpty && count < max {
                // Check for match.
                guard let result = rule.evaluate(sequence, in: remaining) else { break }
                // If there is no progress break to avoid infinite looping.
                if result.range.upperBound == remaining.lowerBound { break }
                // Increase count.
                count += 1
                // Prepare next loop.
                remaining = result.range.upperBound..<range.upperBound
                if !result.captures.isEmpty {
                    captures.append(contentsOf: result.captures)
                }
            }
            
            guard min <= count else { return nil }
            return RuleMatch(range: range.lowerBound..<remaining.lowerBound, captures: captures)
        }
    }
    
    /// Matches the rule zero or one time.
    ///
    /// Equivalent to .optional or `.repeat(rule, min: 0, max: 1)`  but faster than using `repeat`.
    static func zeroOrOne(_ rule: Self) -> Self {
        return rule.optional
    }
    
    /// Matches the rule zero or more times.
    ///
    /// Equivalent to `.repeat(rule, min: 0, max: .max)` but faster than using `repeat`.
    static func zeroOrMore(_ rule: Self) -> Self {
        
        return Self { sequence, range in
            var remaining = range
            var captures: [RuleCapture<T,N>] = []
            
            while !remaining.isEmpty {
                // Check for match.
                guard let result = rule.evaluate(sequence, in: remaining) else { break }
                // If there is no progress break to avoid infinite looping.
                if result.range.upperBound == remaining.lowerBound { break }
                // Prepare next loop.
                remaining = result.range.upperBound..<range.upperBound
                if !result.captures.isEmpty {
                    captures.append(contentsOf: result.captures)
                }
            }
            
            return RuleMatch(range: range.lowerBound..<remaining.lowerBound, captures: captures)
        }
    }
    
    /// Matches the rule one or more times.
    ///
    /// Equivalent to `.repeat(rule, min: 1, max: .max)`  but faster than using `repeat`.
    static func oneOrMore(_ rule: Self) -> Self {
        
        return Self { sequence, range in
            var remaining = range
            var captures: [RuleCapture<T,N>] = []
            
            while !remaining.isEmpty {
                // Check for match.
                guard let result = rule.evaluate(sequence, in: remaining) else { break }
                // If there is no progress break to avoid infinite looping.
                if result.range.upperBound == remaining.lowerBound { break }
                // Prepare next loop.
                remaining = result.range.upperBound..<range.upperBound
                if !result.captures.isEmpty {
                    captures.append(contentsOf: result.captures)
                }
            }
            
            guard range.lowerBound < remaining.lowerBound else { return nil }
            return RuleMatch(range: range.lowerBound..<remaining.lowerBound, captures: captures)
        }
    }
    
    /// Makes the rule optional (zero or one match).
    ///
    /// Instance property for readability in chains: `sign.optional`, `whitespace.optional`.
    var optional: Self {
        
        return Self { sequence, range in
            if let result = self.evaluate(sequence, in: range) { return result }
            return RuleMatch(range: range.lowerBound..<range.lowerBound)
        }
    }
}


// MARK: - Lookahead

extension Rule {
    
    /// Succeeds if the rule *would* match at the current position, but consumes nothing.
    ///
    /// Use this to assert what comes next without advancing the position.
    /// For example, `.expecting(keyword)` ensures a keyword follows,
    /// but leaves it for the next rule to consume.
    static func expecting(_ rule: Self) -> Self {
        Self { sequence, range in
            guard rule.evaluate(sequence, in: range) != nil else { return nil }
            return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
        }
    }
    
    /// Succeeds if the rule would *not* match at the current position. Consumes nothing.
    ///
    /// The complement of `expecting`. Most commonly used inside repeat loops:
    /// `.zeroOrMore(.sequence(.notExpecting(stop), .any()))` — consume everything
    /// until `stop` would match.
    static func notExpecting(_ rule: Self) -> Self {
        Self { sequence, range in
            guard !range.isEmpty else { return nil }
            if rule.evaluate(sequence, in: range) != nil { return nil }
            return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
        }
    }
}


// MARK: - Convenience: Repeat with Termination

extension Rule {
    
    /// Repeats `body` until `end` would match, then stops. Does NOT consume `end`.
    ///
    /// Equivalent to `.zeroOrMore(.sequence(.notExpecting(end), body))`.
    /// The optional `as` parameter captures the repeated body under a name.
    static func `repeat`(_ body: Self, as bodyName: N? = nil, endBefore end: Self) -> Self {
        zeroOrMore(.sequence(.notExpecting(end), body))
            .captureIf(bodyName)
    }
    
    /// Repeats `body` until `end` matches, then consumes `end` as well.
    ///
    /// At each position the rule first tries `end`. If `end` matches, the
    /// repetition succeeds and includes `end` in the consumed range.
    /// Otherwise `body` is evaluated; if `body` fails, the whole rule fails.
    /// The optional `as` parameter captures the repeated body under a name.
    /// To also capture `end`, apply `.capture(as:)` to the end rule before passing it.
    static func `repeat`(_ body: Self, as bodyName: N? = nil, endAfter end: Self) -> Self {
        
        return Self { sequence, range in
            var remaining = range
            var captures: [RuleCapture<T,N>] = []
            
            while !remaining.isEmpty {
                // Check for end match.
                if let result = end.evaluate(sequence, in: remaining) {
                    // Only add capture if needed since this has an negaive impact on performence.
                    if !result.captures.isEmpty { captures.append(contentsOf: result.captures) }
                    return RuleMatch(range: range.lowerBound..<result.range.upperBound, captures: captures)
                }
                // Else evaluate body match.
                guard let result = body.evaluate(sequence, in: remaining) else { break }
                // If there is no progress break to avoid infinite looping.
                if result.range.upperBound == remaining.lowerBound { break }
                // Prepare next loop
                remaining = result.range.upperBound..<range.upperBound
                if !result.captures.isEmpty {
                    captures.append(contentsOf: result.captures)
                }
            }
        
            return nil
        }
    }
}


// MARK: - Indirection (Rule Library)

/// A named collection of rules that enables mutual recursion and late binding.
///
/// Rules can reference other rules in the library by key. The lookup happens at
/// evaluation time, so rules can be registered after the reference is created.
/// This enables mutually recursive grammars (e.g. expressions containing sub-expressions).
///
/// The library tracks recursion depth during evaluation and triggers a
/// `preconditionFailure` if it exceeds `maxDepth`, catching infinite recursion early.
class RuleLibrary<I: Hashable, T: Collection,N: Equatable> {
    
    /// Maximum allowed recursion depth (default: 1000).
    let maxDepth: Int
    
    /// Current recursion depth during evaluation.
    ///
    /// Incremented on entry, decremented on exit via `defer`.
    /// Readable by rules that want to bail out gracefully before hitting the hard limit.
    private(set) var depth: Int = 0
    
    private var dictionary: [I: Rule<T,N>] = [:]
    
    /// Creates a rule library with a configurable recursion depth limit.
    init(maxDepth: Int = 1_000) {
        self.maxDepth = maxDepth
    }
    
    subscript(identifier: I) -> Rule<T,N>? {
        get { dictionary[identifier] }
        set { dictionary[identifier] = newValue }
    }
    
    /// Evaluates a named rule from the library with recursion depth tracking.
    ///
    /// Returns the match result, or `nil` if the rule does not exist or does not match.
    func evaluate(_ identifier: I, sequence: T, in range: Range<T.Index>) -> RuleMatch<T,N>? {
        precondition(depth < maxDepth, "Rule recursion depth exceeded \(maxDepth). Possible infinite recursion.")
        
        depth += 1
        defer { depth -= 1 }
        
        return dictionary[identifier]?.evaluate(sequence, in: range)
    }
}

extension Rule {
    
    /// Creates a rule that delegates to a named rule in a library.
    ///
    /// The library is captured weakly to avoid retain cycles. The actual rule lookup
    /// happens at evaluation time, so rules can be registered after this reference
    /// is created. This enables mutually recursive grammars.
    static func rule<I: Hashable>(_ identifier: I, in library: RuleLibrary<I, T, N>) -> Self {
        weak var weakLibrary: RuleLibrary<I, T, N>? = library
        
        return Self { sequence, range in
            weakLibrary?.evaluate(identifier, sequence: sequence, in: range)
        }
    }
}

MaximilianServais · March 29, 2026, 6:17pm

Code Part 2

RuleString.swift

//
//  RuleString.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 13.03.26.
//
//  String-specific matchers for Rule<String, N>.
//


// MARK: - String-Specific Matchers

extension Rule where T == String {

    // MARK: Core

    /// Matches a single character whose sole Unicode scalar falls within one of the given ranges.
    ///
    /// Only single-scalar characters are matched. Multi-scalar characters
    /// (e.g. flag emoji) will not match.
    ///
    ///     let digit  = Rule<String, Never>.unicode(["0"..."9"])
    ///     let letter = Rule<String, Never>.unicode(["a"..."z", "A"..."Z"])
    static func unicode(_ ranges: [ClosedRange<UnicodeScalar>]) -> Self {
        // // In case of 1 range: single contains check.
        if ranges.count == 1 {
            let range = ranges[0]
            return Self { sequence, searchRange in
                guard !searchRange.isEmpty else { return nil }
                let character = sequence[searchRange.lowerBound]
                // Only match characters with exactly one Unicode scalar.
                guard character.unicodeScalars.count == 1,
                      let scalar = character.unicodeScalars.first,
                      range.contains(scalar) else { return nil }
                return RuleMatch(range: searchRange.lowerBound..<sequence.index(after: searchRange.lowerBound))
            }
        }
        // // In case of 2+ ranges: iterate.
        let contiguous = ContiguousArray(ranges)
        return Self { sequence, searchRange in
            guard !searchRange.isEmpty else { return nil }
            let character = sequence[searchRange.lowerBound]
            // Only match characters with exactly one Unicode scalar.
            guard character.unicodeScalars.count == 1,
                  let scalar = character.unicodeScalars.first else { return nil }
            for range in contiguous {
                if range.contains(scalar) {
                    return RuleMatch(range: searchRange.lowerBound..<sequence.index(after: searchRange.lowerBound))
                }
            }
            return nil
        }
    }

    // MARK: Convenience

    /// Matches a single character whose sole Unicode scalar falls within one of the given ranges.
    ///
    ///     .unicode("a"..."z", "A"..."Z")
    static func unicode(_ ranges: ClosedRange<UnicodeScalar>...) -> Self {
        unicode(ranges)
    }

    // MARK: Character
    
    /// Matches a single character that equals one of the given Unicode scalars.
    ///
    ///     .unicode(".", ",", "!")
    static func characters(_ characters: Character...) -> Self {
        precondition(!characters.isEmpty, "Must not be empty.")
        
        // Rule for single units.
        if characters.count == 1 {
            let singleCharacter = characters[0]
            return Self { sequence, range in
                let index = range.lowerBound
                guard index < range.upperBound else { return nil }
                guard sequence[index] == singleCharacter else { return nil }
                return RuleMatch(range: index..<sequence.index(after: index))
            }
        }
        
        // Rule for multiple units.
        let characters = ContiguousArray(characters)
        return Self { sequence, range in
            let index = range.lowerBound
            guard index < range.upperBound else { return nil }
            for character in characters {
                if sequence[index] == character {
                    return RuleMatch(range: index..<sequence.index(after: index))
                }
            }
            return nil
        }
    }

    // MARK: Word

    /// Matches an exact string at the current position.
    ///
    /// Compares character by character. Use `caseInsensitive: true` to ignore case.
    static func word(_ word: String, caseInsensitive: Bool = false) -> Self {
        precondition(!word.isEmpty, "Word must not be empty.")

        return Self { sequence, range in
            var index = range.lowerBound

            for character in word {
                guard index < range.upperBound else { return nil }

                let match = caseInsensitive
                ? sequence[index].lowercased() == character.lowercased()
                : sequence[index] == character

                guard match else { return nil }
                index = sequence.index(after: index)
            }

            return RuleMatch(range: range.lowerBound..<index)
        }
    }
}

RuleUTF16.swift

//
//  RuleUTF16.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 13.03.26.
//
//  String.UTF16View-specific matchers for Rule<String.UTF16View, N>.
//


// MARK: - String.UTF16View-Specific Matchers

extension Rule where T == String.UTF16View {

    // MARK: Core

    /// Matches a single UTF-16 code unit within any of the given ranges.
    ///
    /// Uses a bitmap for O(1) lookup when there are many ranges,
    /// direct comparison for few ranges.
    ///
    ///     let digit  = Rule<String.UTF16View, Never>.utf16([0x30...0x39])
    ///     let letter = Rule<String.UTF16View, Never>.utf16([0x41...0x5A, 0x61...0x7A])
    static func utf16(_ ranges: [ClosedRange<UInt16>]) -> Self {
        // In case of 1 range: single contains check.
        if ranges.count == 1 {
            let range = ranges[0]
            return Self { sequence, searchRange in
                guard !searchRange.isEmpty else { return nil }
                guard range.contains(sequence[searchRange.lowerBound]) else { return nil }
                return RuleMatch(range: searchRange.lowerBound..<sequence.index(after: searchRange.lowerBound))
            }
        }
        // In case of 2-4 ranges: iterate.
        if ranges.count <= 4 {
            let contiguous = ContiguousArray(ranges)
            return Self { sequence, searchRange in
                guard !searchRange.isEmpty else { return nil }
                let unit = sequence[searchRange.lowerBound]
                for range in contiguous {
                    if range.contains(unit) {
                        return RuleMatch(range: searchRange.lowerBound..<sequence.index(after: searchRange.lowerBound))
                    }
                }
                return nil
            }
        }
        // In case of 5+ ranges: bitmap for O(1) lookup.
        var bitmap = ContiguousArray<UInt64>(repeating: 0, count: 0xD800 / 64)
        for range in ranges {
            for unit in range {
                bitmap[Int(unit / 64)] |= 1 << (unit % 64)
            }
        }
        return Self { sequence, searchRange in
            guard !searchRange.isEmpty else { return nil }
            let unit = sequence[searchRange.lowerBound]
            // Surrogate code units (0xD800-0xDFFF) never match.
            guard unit < 0xD800 else { return nil }
            guard bitmap[Int(unit / 64)] & (1 << (unit % 64)) != 0 else { return nil }
            return RuleMatch(range: searchRange.lowerBound..<sequence.index(after: searchRange.lowerBound))
        }
    }

    // MARK: UInt16 Convenience

    /// Matches a single UTF-16 code unit within any of the given ranges.
    ///
    ///     .utf16(0x30...0x39, 0x41...0x5A)
    static func utf16(_ ranges: ClosedRange<UInt16>...) -> Self {
        utf16(ranges)
    }

    /// Matches a single UTF-16 code unit equal to one of the given values.
    ///
    ///     .utf16(0x22)   // matches '"'
    static func utf16(_ units: UInt16...) -> Self {
        precondition(!units.isEmpty, "Must not be empty.")
        
        // Rule for single units.
        if units.count == 1 {
            let codeUnit = units[0]
            return Self { sequence, range in
                let index = range.lowerBound
                guard index < range.upperBound else { return nil }
                guard sequence[index] == codeUnit else { return nil }
                return RuleMatch(range: index..<sequence.index(after: index))
            }
        }
        
        // Rule for multiple units.
        let utf16Units = ContiguousArray(units)
        return Self { sequence, range in
            let index = range.lowerBound
            guard index < range.upperBound else { return nil }
            for codeUnit in utf16Units {
                if sequence[index] == codeUnit {
                    return RuleMatch(range: index..<sequence.index(after: index))
                }
            }
            return nil
        }
    }

    // MARK: Word

    /// Matches an exact string at the current position.
    ///
    /// Compares code unit by code unit against the UTF-16 encoding of `word`.
    static func word(_ word: String) -> Self {
        let utf16Units = ContiguousArray(word.utf16)
        precondition(!utf16Units.isEmpty, "Word must not be empty.")

        return Self { sequence, range in
            var index = range.lowerBound

            for codeUnit in utf16Units {
                guard index < range.upperBound else { return nil }
                guard sequence[index] == codeUnit else { return nil }
                index = sequence.index(after: index)
            }

            return RuleMatch(range: range.lowerBound..<index)
        }
    }
    
    // MARK: UnicodeScalar Convenience - Only for this example

    /// A convinence methode so we can write .utf16("a"..."z", "A"..."Z"), This methode alows use all
    /// Unicodescalars in the prameter but will crash if the Unicodescalars are out of scope. This is fare
    /// from ideal and is only inteded to be used in this example to make the examples easier to read.
 
    static func utf16(_ ranges: ClosedRange<UnicodeScalar>...) -> Self {
        utf16(ranges.map {
            precondition($0.upperBound.value <= 0xD7FF,
                "Unicode scalar U+\(String($0.upperBound.value, radix: 16, uppercase: true)) "
                + "is outside the BMP. UTF16View can only match BMP scalars (U+0000 to U+D7FF).")
            return UInt16($0.lowerBound.value)...UInt16($0.upperBound.value)
        })
    }

    static func utf16(_ scalars: UnicodeScalar...) -> Self {
        utf16(scalars.map {
            precondition($0.value <= 0xD7FF,
                "Unicode scalar U+\(String($0.value, radix: 16, uppercase: true)) "
                + "is outside the BMP. UTF16View can only match BMP scalars (U+0000 to U+D7FF).")
            let unit = UInt16($0.value)
            return unit...unit
        })
    }
}

RuleCustomExtensions.swift

//
//  RuleCustomExtensions.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 28.03.26.
//


// MARK: - String.UTF16View Fast Scanning

extension Rule where T == String.UTF16View {

    /// Skips forward to the next occurrence of `codeUnit`.
    /// Returns a zero-width match at that position, or `nil` if not found.
    static func jumpToNext(_ codeUnit: UInt16) -> Self {
        Self { sequence, range in
            var index = range.lowerBound

            while index < range.upperBound {
                if sequence[index] == codeUnit {
                    return RuleMatch(range: index..<index)
                }
                index = sequence.index(after: index)
            }

            return nil
        }
    }

    /// Skips forward (past the current position) to the next code unit within `codeUnitRange`.
    /// Returns a zero-width match at that position, or `nil` if not found.
    static func jumpToNext(_ codeUnitRange: ClosedRange<UInt16>) -> Self {
        Self { sequence, range in
            // Start one past the current position to guarantee progress.
            var index = sequence.index(after: range.lowerBound)

            while index < range.upperBound {
                if codeUnitRange.contains(sequence[index]) {
                    return RuleMatch(range: index..<index)
                }
                index = sequence.index(after: index)
            }

            return nil
        }
    }

    /// Skips forward (past the current position) to the next code unit within any of the given ranges.
    /// Returns a zero-width match at that position, or `nil` if not found.
    static func jumpToNext(_ codeUnitRanges: ClosedRange<UInt16>...) -> Self {
        Self { sequence, range in
            // Start one past the current position to guarantee progress.
            var index = sequence.index(after: range.lowerBound)

            while index < range.upperBound {
                for codeUnitRange in codeUnitRanges {
                    if codeUnitRange.contains(sequence[index]) {
                        return RuleMatch(range: index..<index)
                    }
                }
                index = sequence.index(after: index)
            }

            return nil
        }
    }
}

Examples.swift

//
//  Examples.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 22.03.26.
//
//  Practical examples showing how to use Rule for common pattern matching tasks.
//  Each example builds on the previous one, introducing new primitives step by step.
//

import Foundation


// MARK: - Finding Numbers

/// The simplest useful rule: matching integers and decimal numbers in a string.
///
/// This example introduces `unicode`, `oneOrMore`, `sequence`, and `optional` —
/// the basic building blocks you will use in almost every rule.

func findingNumbers() {
    typealias MyRule = Rule<String,Never>
    
    let digit = MyRule.unicode("0"..."9")
    
    // An integer: one or more digits.
    let integer = MyRule.oneOrMore(digit)
    
    // A decimal number: digits, a dot, then more digits.
    // The fractional part is optional, so "42" and "42.5" both match.
    let decimal = MyRule.sequence(
        integer,
        MyRule.sequence(.word("."), integer).optional
    )
    
    let input = "42.5"
    if let match = decimal.evaluate(input) {
        let matched = input[match.range]
        print("Number: \(matched)")
        // Number: 42.5
    }
}


// MARK: - Finding Dates

/// Matching dates in DD.MM.YYYY format — a slightly more complex pattern
/// that shows how `sequence` and `repeat` combine for fixed-length fields.
///
/// This also introduces `capture` to extract the individual components
/// (day, month, year) from a match, so you can work with them separately.

func findingDates() {
    enum Token {
        case day, month, year, date
    }
    
    typealias MyRule = Rule<String,Token>
    
    let digit = MyRule.unicode("0"..."9")
    let dot   = MyRule.word(".")
    
    // Exactly two digits for day and month, four for year.
    let day   = MyRule.repeat(digit, min: 2, max: 2).capture(as: .day)
    let month = MyRule.repeat(digit, min: 2, max: 2).capture(as: .month)
    let year  = MyRule.repeat(digit, min: 4, max: 4).capture(as: .year)
    
    let date = MyRule.sequence(day, dot, month, dot, year).capture(as: .date)
    
    let input = "22.03.2026"
    if let match = date.evaluate(input) {
        print(match.debugDescription)
        // RuleMatch [0..<10]
        // ╰─ Date [0..<10]
        //    ├─ Day [0..<2]
        //    ├─ Month [3..<5]
        //    ╰─ Year [6..<10]
        
        // Extract the individual parts using the capture tree:
        if let dateCapture = match.captures.first {
            for part in dateCapture.innerCaptures {
                print("\(part.name): \(input[part.range])")
            }
            // Day: 22
            // Month: 03
            // Year: 2026
        }
    }
}


// MARK: - Finding Quoted Strings

/// Matching quoted strings like `"hello world"` — including everything between
/// the opening and closing quote.
///
/// This introduces `repeat(endAfter:)` which repeats a body rule until
/// an end pattern matches, then consumes the end pattern as well.
/// It also shows `repeat(endBefore:)` which stops before the end pattern
/// without consuming it.

func findingQuotedStrings() {
    enum Token {
        case content, quotedString
    }
    
    typealias MyRule = Rule<String,Token>

    let quote = MyRule.word("\"")
    
    // Match everything from opening quote through closing quote.
    // `endAfter` consumes the closing quote as part of the match.
    let quotedString = MyRule.sequence(
        quote,
        MyRule.repeat(.any(), as: .content, endAfter: quote)
    ).capture(as: .quotedString)
    
    let input = "\"hello world\""
    if let match = quotedString.evaluate(input) {
        print(match.debugDescription)
        // RuleMatch [0..<13]
        // ╰─ QuotedString [0..<13]
        //    ╰─ Content [1..<12]
        
        // Extract just the content without quotes:
        if let content = match.captures.first?.find(.content) {
            print("Content: \(input[content.range])")
            // Content: hello world
        }
    }
}


// MARK: - Finding Identifiers and Keywords

/// Matching programming-language-style identifiers (variable names) and
/// distinguishing them from keywords.
///
/// This introduces `firstMatch` for priority-based selection: keywords
/// are checked first, so `if` is recognized as a keyword, not an identifier.
/// It also shows `notExpecting` to ensure a keyword is not just a prefix
/// of a longer identifier (e.g. `iffy` should be an identifier, not keyword `if`).

func findingIdentifiersAndKeywords() {
    enum Token {
        case identifier, keyword
    }
    
    typealias MyRule = Rule<String,Token>
    
    let letter = MyRule.unicode("a"..."z", "A"..."Z")
    let digit  = MyRule.unicode("0"..."9")
    let letterOrDigit = MyRule.firstMatch(letter, digit)
    
    // An identifier starts with a letter, followed by letters or digits.
    let identifier = MyRule.sequence(
        letter,
        .zeroOrMore(letterOrDigit)
    )
    
    // A keyword is a specific word that is NOT followed by more letters/digits.
    // Without `notExpecting`, the word "iffy" would match keyword "if".
    func keyword(_ word: String) -> MyRule {
        .sequence(
            .word(word),
            .notExpecting(letterOrDigit)
        ).capture(as: .keyword)
    }
    
    // Keywords have priority over identifiers — check them first.
    let token = MyRule.firstMatch(
        keyword("if"),
        keyword("else"),
        keyword("return"),
        identifier.capture(as: .identifier)
    )
    
    for input in ["if", "iffy", "return", "x1"] {
        if let match = token.evaluate(input) {
            //    let name = match.captures.first?.name ?? "?"
            //    print("\(input) → \(name)")
            if let name = match.captures.first?.name {
                print("\(input) → \(name)")
            } else {
                print("\(input) → ?")
            }
        }
    }
    // if → Keyword
    // iffy → Identifier
    // return → Keyword
    // x1 → Identifier
}


// MARK: - Finding Multi-Line Comments

/// Matching C-style block comments: `/* ... */` — where the content can
/// span multiple lines and contain anything except the closing `*/`.
///
/// This shows `notExpecting` combined with `any` inside a repeat loop:
/// "consume any character, as long as it is not the start of `*/`."
/// This is a fundamental pattern for matching delimited content.

func findingMultiLineComments() {
    enum Token {
        case open, close, body, comment
    }
    
    typealias MyRule = Rule<String,Token>

    let open  = MyRule.word("/*")
    let close = MyRule.word("*/")
    
    // Repeat any character as long as `*/` does not follow, then consume `*/`.
    let comment = MyRule.sequence(
        open.capture(as: .open),
        MyRule.repeat(.any(), as: .body, endAfter: close.capture(as: .close))
    ).capture(as: .comment)
    
    let input = "/* This is\na comment */"
    if let match = comment.evaluate(input) {
        print(match.debugDescription)
        // RuleMatch [0..<23]
        // ╰─ Comment [0..<23]
        //    ├─ Open [0..<2]
        //    ├─ Body [2..<21]
        //    ╰─ Close [21..<23]
        
        if let body = match.captures.first?.find(.body) {
            print("Body: \(input[body.range])")
            // Body:  This is
            // a comment
        }
    }
}


// MARK: - Finding Key-Value Pairs

/// Matching simple `key = value` pairs separated by newlines —
/// like a configuration file or HTTP headers.
///
/// This combines several techniques: `element(where:)` for custom predicates,
/// `repeat(endBefore:)` to collect characters up to a delimiter, and
/// `oneOrMore` to collect multiple pairs.

func findingKeyValuePairs() {
    enum Token {
        case key, value, pair
    }
    
    typealias MyRule = Rule<String,Token>
    
    let gap     = MyRule.zeroOrMore(.characters(" ","\t"))
    let newline = MyRule.word("\n")
    let equals  = MyRule.word("=")
    
    // A key is one or more letters.
    let key = MyRule.oneOrMore(.unicode("a"..."z", "A"..."Z")).capture(as: .key)
    
    // A value is everything up to a newline or end of input.
    let value = MyRule.oneOrMore(
        .sequence(.notExpecting(newline), .notExpecting(.endOfSequence()), .any())
    ).capture(as: .value)
    
    // A single pair: key = value
    let pair = MyRule.sequence(
        key, gap, equals, gap, value
    ).capture(as: .pair)
    
    // Multiple pairs separated by newlines.
    let config = MyRule.sequence(
        pair,
        .zeroOrMore(.sequence(newline, pair))
    )
    
    let input = "host = localhost\nport = 8080\nmode = debug"
    if let match = config.evaluate(input) {
        print(match.debugDescription)
        // RuleMatch [0..<41]
        // ├─ Pair [0..<16]
        // │  ├─ Key [0..<4]
        // │  ╰─ Value [7..<16]
        // ├─ Pair [17..<28]
        // │  ├─ Key [17..<21]
        // │  ╰─ Value [24..<28]
        // ╰─ Pair [29..<41]
        //    ├─ Key [29..<33]
        //    ╰─ Value [36..<41]
        
        // Read the parsed key-value pairs:
        for pairCapture in match.captures {
            if let k = pairCapture.find(.key),
               let v = pairCapture.find(.value) {
                print("\(input[k.range]) → \(input[v.range])")
            }
        }
        // host → localhost
        // port → 8080
        // mode → debug
    }
}

MaximilianServais · March 29, 2026, 6:25pm

Code Part 3

MathExpression.swift

//
//  MathExpression.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 13.03.26.
//
//  A complete math expression parser built with Rule.
//
//  This example demonstrates how Rule scales from simple patterns to a full
//  recursive-descent parser with operator precedence, functions, error recovery,
//  and capture tree evaluation — all built from the same composable primitives.
//
//  Grammar:
//
//    root        =  gap? (expression | gap | unknown)+
//    expression  =  term (('+' | '-') term)*
//    term        =  factor (('*' | '/') factor)*
//    factor      =  '-'? (number | function | parenthesized)
//    function    =  functionKey '(' expression ')'
//    parenthesized = '(' expression ')'
//    number      =  hexadecimal | decimal | integer
//    hexadecimal =  '#' hexDigits
//    decimal     =  digits '.' digits | '.' digits | digits '.'
//    integer     =  digits
//    unknown     =  any single character (error recovery)
//

import Foundation

private enum Token {
    case root, expression, term, factor, function, parenthesized, number, hexadecimal, decimal, integer, unknown
    case negate, plus, minus, multiply, divide
    case sin, cos, sqrt, abs
}

// MARK: - Diagnostics

/// A diagnostic marker at a specific position in the input.
private struct Diagnostic {
    let name: Token
    let startOffset: Int
    let length: Int
}

/// Collects all "Unknown Expression" diagnostics from a capture tree.
private func collectDiagnostics(_ capture: RuleCapture<String,Token>, in source: String) -> [Diagnostic] {
    var results: [Diagnostic] = []
    
    if capture.name == .unknown {
        let startOffset = source.distance(from: source.startIndex, to: capture.range.lowerBound)
        let length = source.distance(from: capture.range.lowerBound, to: capture.range.upperBound)
        results.append(Diagnostic(name: capture.name, startOffset: startOffset, length: length))
    }
    
    for inner in capture.innerCaptures {
        results.append(contentsOf: collectDiagnostics(inner, in: source))
    }
    
    return results
}

/// Prints the input with a visual marker line below any unknown tokens.
///
///     ┌──────────────────────────────────────┐
///     │ Diagnostic: Unknown expression       │
///     │                                      │
///     │ 123+456+yxc+(2*5*zu)                 │
///     │         ^^^      ^^                  │
///     └──────────────────────────────────────┘
private func printDiagnostics(for input: String, capture: RuleCapture<String,Token>) {
    let diagnostics = collectDiagnostics(capture, in: input)
    
    let label = diagnostics.isEmpty
    ? "Diagnostic: No issues"
    : "Diagnostic: Unknown expression"
    
    var markerChars = Array(repeating: Character(" "), count: input.count)
    for diag in diagnostics {
        for offset in diag.startOffset..<(diag.startOffset + diag.length) {
            if offset < markerChars.count { markerChars[offset] = "^" }
        }
    }
    let markerLine = String(markerChars)
    
    let contentWidth = max(input.count, label.count)
    let border = String(repeating: "─", count: contentWidth + 2)
    let pad = { (s: String) in s + String(repeating: " ", count: contentWidth - s.count) }
    
    print("┌" + border + "┐")
    print("│ " + pad(label) + " │")
    print("│ " + pad("")    + " │")
    print("│ " + pad(input) + " │")
    if !diagnostics.isEmpty { print("│ " + pad(markerLine) + " │") }
    print("└" + border + "┘")
}


// MARK: - Evaluation

/// Walks the capture tree and computes a numeric result.
///
/// Every decision is based on capture names — the source text is only
/// read for number literals. Operators and functions are identified
/// purely by their capture names.
private func evaluateMath(_ capture: RuleCapture<String,Token>, in source: String) -> Double {
    
    switch capture.name {
        case .root:
            if capture.innerCaptures.count == 1,
               capture.innerCaptures[0].name == .expression {
                return evaluateMath(capture.innerCaptures[0], in: source)
            }
            return .nan
            
        case .expression:
            var result = evaluateMath(capture.innerCaptures[0], in: source)
            var i = 1
            while i + 1 < capture.innerCaptures.count {
                let op  = capture.innerCaptures[i].name
                let rhs = evaluateMath(capture.innerCaptures[i + 1], in: source)
                if op == .plus  { result += rhs }
                else if op == .minus { result -= rhs }
                i += 2
            }
            return result
            
        case .term:
            var result = evaluateMath(capture.innerCaptures[0], in: source)
            var i = 1
            while i + 1 < capture.innerCaptures.count {
                let op  = capture.innerCaptures[i].name
                let rhs = evaluateMath(capture.innerCaptures[i + 1], in: source)
                if op == .multiply  { result *= rhs }
                else if op == .divide { result /= rhs }
                i += 2
            }
            return result
            
        case .factor:
            var sign  = 1.0
            var value = 0.0
            for inner in capture.innerCaptures {
                if inner.name == .negate { sign = -1.0 }
                else { value = evaluateMath(inner, in: source) }
            }
            return sign * value
            
        case .integer:
            return Double(String(source[capture.range]))!
            
        case .decimal:
            return Double(String(source[capture.range]))!
            
        case .hexadecimal:
            let hex = String(source[capture.range]).dropFirst() // skip '#'
            return Double(UInt64(hex, radix: 16) ?? 0)
            
        case .parenthesized:
            for inner in capture.innerCaptures {
                if inner.name == .expression { return evaluateMath(inner, in: source) }
            }
            preconditionFailure("Parenthesized expression without inner Expression")
            
        case .function:
            var funcName = Token.unknown
            var argument = 0.0
            for inner in capture.innerCaptures {
                switch inner.name {
                    case .sin, .cos, .sqrt, .abs: funcName = inner.name
                    case .expression: argument = evaluateMath(inner, in: source)
                    default: break
                }
            }
            switch funcName {
                case .sin:  return sin(argument)
                case .cos:  return cos(argument)
                case .sqrt: return sqrt(argument)
                case .abs:  return abs(argument)
                default: preconditionFailure("Unknown function")
            }
            
        case .unknown:
            return .nan
            
        default:
            preconditionFailure("Unknown capture name: \(capture.name)")
    }
}


// MARK: - Demo

/// Runs the math expression parser demo.
///
/// Parses, evaluates, and prints a series of math expressions covering
/// basic arithmetic, operator precedence, parentheses, negation, decimals,
/// hexadecimals, functions, error recovery, and capture tree output.
func mathExpressionDemo() {
    typealias R = Rule<String,Token>
    
    // --- Character classes ---
    
    let unicode16Digit      = R.unicode("0"..."9")
    let unicode16Hex        = R.characters("0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f","A","B","C","D","E","F")
    //    let unicode16Letter     = R.unicode(.letters)
    let unicode16Whitespace = R.characters(" ")
    
    // --- Character sequences ---
    
    let digits       = R.oneOrMore(unicode16Digit)
    let hexadecimals = R.oneOrMore(unicode16Hex)
    let gap          = R.oneOrMore(unicode16Whitespace)
    
    // --- Punctuation ---
    
    let decimalSeparator = R.characters(".")
    let parenthesisOpen  = R.characters("(")
    let parenthesisClose = R.characters(")")
    
    // --- Operators ---
    // Each operator has its own capture name so the evaluation function can
    // identify it without reading the source text.
    // "Negate" is separate from "Minus": Minus is binary (a - b), Negate is unary (-a).
    
    let plus     = R.characters("+").capture(as: .plus)
    let minus    = R.characters("-").capture(as: .minus)
    let negate   = R.characters("-").capture(as: .negate)
    let multiply = R.characters("*").capture(as: .multiply)
    let divide   = R.characters("/").capture(as: .divide)
    
    let addOrSubtract    = R.firstMatch(plus, minus)
    let multiplyOrDivide = R.firstMatch(multiply, divide)
    
    // --- Keywords ---
    // Case insensitive: sin, SIN, and Sin all match.
    // longestMatch ensures "sinh" would win over "sin" if added later.
    
    let functionKey = R.longestMatch(
        R.word("sin",  caseInsensitive: true).capture(as: .sin),
        R.word("cos",  caseInsensitive: true).capture(as: .cos),
        R.word("sqrt", caseInsensitive: true).capture(as: .sqrt),
        R.word("abs",  caseInsensitive: true).capture(as: .abs)
    )
    
    // --- Numbers ---
    // longestMatch ensures "3.14" matches as Decimal, not Integer "3".
    
    let integer = digits.capture(as: .integer)
    
    let decimal = R.longestMatch(
        R.sequence(digits, decimalSeparator, digits),
        R.sequence(decimalSeparator, digits),
        R.sequence(digits, decimalSeparator)
    ).capture(as: .decimal)
    
    let hexadecimal = R.sequence(R.characters("#"), hexadecimals).capture(as: .hexadecimal)
    let number      = R.longestMatch(hexadecimal, decimal, integer)
    
    // --- Error recovery ---
    
    let unknownExpression = R.any().capture(as: .unknown)
    
    // --- Grammar ---
    // The RuleLibrary enables recursion: an expression can contain a
    // parenthesized expression, which again contains an expression.
    
    enum GrammarKey { case expression, term, factor }
    let grammar = RuleLibrary<GrammarKey, String, Token>()
    
    let expression = R.rule(.expression, in: grammar)
    let term       = R.rule(.term,       in: grammar)
    let factor     = R.rule(.factor,     in: grammar)
    
    let parenthesizedExpression = R.sequence(
        parenthesisOpen, gap.optional, expression, gap.optional, parenthesisClose
    ).capture(as: .parenthesized)
    
    let function = R.sequence(
        functionKey, parenthesisOpen, gap.optional, expression, gap.optional, parenthesisClose
    ).capture(as: .function)
    
    // Factor: optional negation + number, function, or parenthesized expression.
    grammar[.factor] = R.sequence(
        negate.optional, .firstMatch(number, function, parenthesizedExpression)
    ).capture(as: .factor)
    
    // Term: factors joined by * or /. Left-to-right: a * b / c = (a * b) / c.
    grammar[.term] = R.sequence(
        factor, .zeroOrMore(.sequence(gap.optional, multiplyOrDivide, gap.optional, factor))
    ).capture(as: .term)
    
    // Expression: terms joined by + or -. Left-to-right: a + b - c = (a + b) - c.
    grammar[.expression] = R.sequence(
        term, .zeroOrMore(R.sequence(gap.optional, addOrSubtract, gap.optional, term))
    ).capture(as: .expression)
    
    // --- Parse ---
    // The outer loop tries expression first, then skips whitespace, then
    // captures one unknown character as fallback. This ensures the parser
    // never fails — unknown input is visible in the capture tree.
    
    let fullMatch = R.sequence(
        gap.optional,
        .oneOrMore(.firstMatch(expression, gap, unknownExpression)),
        .endOfSequence()
    ).capture(as: .root)
    
    func parse(_ input: String) -> RuleCapture<String,Token>? {
        fullMatch.evaluate(input)?.captures.first
    }
    
    // --- Test helper ---
    
    func test(_ input: String, expected: Double) {
        guard let root = parse(input) else {
            print("[FAIL] '\(input)' — parse failed")
            return
        }
        
        let result = evaluateMath(root, in: input)
        
        let passed = expected.isNaN ? result.isNaN : Swift.abs(result - expected) < 0.0001
        
        if passed { print("[PASS] '\(input)' = \(result)") }
        else      { print("[FAIL] '\(input)' = \(result)  (expected \(expected))") }
        
        if !collectDiagnostics(root, in: input).isEmpty {
            printDiagnostics(for: input, capture: root)
        }
    }
    
    // --- Tests ---
    
    print("--- Basic arithmetic ---\n")
    test("1 + 2",               expected: 3)
    test("10 - 3",              expected: 7)
    test("2 * 3",               expected: 6)
    test("10 / 4",              expected: 2.5)
    
    print("\n--- Operator precedence ---\n")
    test("2 + 3 * 4",           expected: 14)
    test("10 - 2 * 3",          expected: 4)
    test("2 * 3 + 4 * 5",       expected: 26)
    
    print("\n--- Parentheses ---\n")
    test("(2 + 3) * 4",                    expected: 20)
    test("((1 + 2))",                      expected: 3)
    test("(2 + 3) * (4 - 1)",              expected: 15)
    test("(3 + ((2 + 6) / 4)) * (6 - 1)",  expected: 25)
    
    print("\n--- Negation ---\n")
    test("-5",                  expected: -5)
    test("-5 + 10",             expected: 5)
    test("-(2 + 3)",            expected: -5)
    test("-2 * -3",             expected: 6)
    
    print("\n--- Integers ---\n")
    test("42",                  expected: 42)
    test("100 + 200",           expected: 300)
    
    print("\n--- Decimals ---\n")
    test("3.14",                expected: 3.14)
    test("0.5 + 0.5",           expected: 1.0)
    test("2.5 * 4",             expected: 10.0)
    test(".5 + .5",             expected: 1.0)
    test("3. + .14",            expected: 3.14)
    
    print("\n--- Hexadecimal ---\n")
    test("#FF",                 expected: 255)
    test("#10 + #10",           expected: 32)
    test("#A * 2",              expected: 20)
    
    print("\n--- Functions ---\n")
    test("sqrt(9)",             expected: 3.0)
    test("abs(0 - 5)",          expected: 5.0)
    test("sqrt(3 * 3 + 4 * 4)", expected: 5.0)
    test("cos(0)",              expected: 1.0)
    test("SIN(0)",              expected: 0.0)
    test("Sqrt(16)",            expected: 4.0)
    
    print("\n--- Unknown tokens ---\n")
    test("123+456+yxc+(2*5*zu)", expected: .nan)
    
    print("\n--- Capture tree ---\n")
    let example = "2 + 3 * ( 4 + 1 )"
    if let root = parse(example) {
        print("'\(example)':\n")
        print(root.debugDescription)
        
        // Root [0..<18]
        // ╰─ Expression [0..<18]
        //    ├─ Term [0..<1]
        //    │  ╰─ Factor [0..<1]
        //    │     ╰─ Integer [0..<1]
        //    ├─ Plus [2..<3]
        //    ╰─ Term [4..<18]
        //       ├─ Factor [4..<5]
        //       │  ╰─ Integer [4..<5]
        //       ├─ Multiply [6..<7]
        //       ╰─ Factor [8..<18]
        //          ╰─ ParenthesizedExpression [8..<18]
        //             ╰─ Expression [10..<15]
        //                ├─ Term [10..<11]
        //                │  ╰─ Factor [10..<11]
        //                │     ╰─ Integer [10..<11]
        //                ├─ Plus [12..<13]
        //                ╰─ Term [14..<15]
        //                   ╰─ Factor [14..<15]
        //                      ╰─ Integer [14..<15]
    }
}

BenchmarkInput.swift

//
//  BenchmarkInput.swift
//  Universal Pattern Recognizer
// 
//  Created by Maximilian Servais on 22.03.26.
//
//  Deterministic test data and timing infrastructure for performance benchmarks.
//
//  The text is built from two sections:
//
//  Flat section  — 6 000 log-like lines (~510 000 characters)
//  ────────────────────────────────────────────────────────────
//  These patterns can be matched by Swift Regex and Rule alike.
//  They are designed to stress pure throughput on a large, realistic input.
//
//    Dates  "DD.MM.YYYY"      6 000 instances
//    Quoted strings            6 000 instances
//    Numbers (int + decimal)  ~18 000 instances
//    Words                    ~60 000 instances
//
//  Nested section — ~1 400 expressions (~75 000 characters)
//  ────────────────────────────────────────────────────────────
//  These patterns require counting nesting depth. Classical regular expressions
//  cannot handle them; Rule can, using the same composable primitives.
//
//    Balanced parentheses  1 400 expressions  (depths 2–8, 200 per depth)
//    Block comments          700 expressions  (depths 1–4, 175 per depth)
//
//  Total input size: ~585 000 characters.
//

import Foundation


// MARK: - Test Input

/// The shared input for all benchmarks.
///
/// Computed once on first access and reused across every benchmark function.
/// The content is fully deterministic — identical across all runs.
let benchmarkText: String = {
    var lines: [String] = []
    lines.reserveCapacity(8_500)

    // ── Flat section ────────────────────────────────────────────────────────

    let levels   = ["INFO", "WARN", "ERROR"]
    let methods  = ["GET", "POST", "PUT", "DELETE", "PATCH"]
    let paths    = (0..<20).map { "/api/v1/resource/\($0)" }
    let statuses = [200, 201, 400, 404, 500]

    // Cycle through five valid dates so every line contains DD.MM.YYYY.
    let dates    = ["01.01.2025", "15.06.2025", "22.03.2026", "07.11.2024", "30.09.2026"]

    for i in 0..<6_000 {
        let date   = dates[i % dates.count]
        let hh     = String(format: "%02d", (i / 3_600) % 24)
        let mm     = String(format: "%02d", (i / 60)    % 60)
        let ss     = String(format: "%02d",  i           % 60)
        let level  = levels[i   % levels.count]
        let method = methods[i  % methods.count]
        let path   = paths[i    % paths.count]
        // Two numbers per line: a decimal response time and an integer status code.
        let ms     = Double(i % 500) + 0.1 * Double((i * 7) % 10)
        let status = statuses[i % statuses.count]

        lines.append(
            "\(date) \(hh):\(mm):\(ss) [\(level)] \(method) \"\(path)\" completed in \(ms)ms with status \(status)"
        )
    }

    lines.append("") // section separator

    // ── Nested section: balanced parentheses ────────────────────────────────
    //
    // Builds arithmetic expressions with parentheses nested `depth` levels deep.
    //
    // depth 2: (1+2)
    // depth 3: ((1+2)*3)
    // depth 4: (((1+2)*3)+4)
    // …
    //
    // A rule can find all matching groups recursively in a single pass.
    // A classical regex cannot count nesting depth at all.

    for depth in 2...8 {
        for base in 0..<200 {
            var expr = "\(base % 9 + 1)"
            for d in 1..<depth {
                let op: Character = d % 2 == 0 ? "+" : "*"
                expr = "(\(expr)\(op)\((base + d) % 9 + 1))"
            }
            lines.append(expr)
        }
    }

    lines.append("") // section separator

    // ── Nested section: block comments ──────────────────────────────────────
    //
    // Builds C-style block comments nested `depth` levels deep.
    //
    // depth 1: /* comment */
    // depth 2: /* outer /* comment */ outer */
    // depth 3: /* a /* b /* comment */ b */ a */
    //
    // A rule with a recursive grammar matches the outermost comment in one pass
    // and exposes all inner comments as captures — no second pass required.
    // Classical regex fails even at depth 2.

    for depth in 1...4 {
        for i in 0..<175 {
            var comment = "body \(i)"
            for d in 0..<depth {
                comment = "/* L\(d) \(comment) */"
            }
            lines.append(comment)
        }
    }

    return lines.joined(separator: "\n")
}()


// MARK: - Benchmark Print

/// Prints a single benchmark result line with aligned columns.
private func printResult(_ label: String, matches: Int, averageMs: Double) {
    let namePart  = label.padding(toLength: nameColumnWidth, withPad: " ", startingAt: 0)
    let matchPart = "Matches " + String(matches).padding(toLength: matchColumnWidth, withPad: " ", startingAt: 0)
    let timePart  = String(format: "%.1f ms", averageMs)
    print("  \(namePart) \(matchPart) \(timePart)")
}


// MARK: - Benchmark Infrastructure

/// Measures the average wall-clock time of `block` over `iterations` runs.
///
/// - Parameters:
///   - label:      A short description printed in the results table.
///   - iterations: Number of timed runs after one warmup run (default: 5).
///   - block:      The work to measure. Returns an `Int` so the compiler
///                 cannot optimise the call away.
///
/// The function prints one formatted result line and returns the average in milliseconds.
@discardableResult
func measure(_ label: String, iterations: Int = 10, block: () -> Int) -> Double {
    // Warmup: one untimed run to prime caches and JIT.
    var sink = block()

    let startTime = Date()
    for _ in 0..<iterations {
        sink = block()
    }
    let totalTime = -startTime.timeIntervalSinceNow * 1_000 // convert to ms
    let avg = totalTime / Double(iterations)
    
    // Touch sink so the compiler cannot discard the block's side-effects
    // in case print("  Matches found: \(sink)") is removed.
    _ = sink
    
    // Print result
    printResult( label, matches: sink, averageMs: avg)
    return avg
}

/// Column widths for aligned benchmark output.
private let nameColumnWidth    = 45
private let matchColumnWidth   = 10
private let timeColumnWidth    = 12

MaximilianServais · March 29, 2026, 6:26pm

Code Part 4 (last)

Again, sorry for the noise. I thought I could update the code in my original post. If this keeps happening, I'll probably move the code to GitHub.

RegexBenchmark.swift

//
//  RegexBenchmark.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 24.03.26.
//
//  Performance measurements using Swift Regex (tests 1–4) and hand-written
//  character scanners for the patterns that Regex fundamentally cannot express
//  (tests 5–6).
//
//  The manual scanners in tests 5–6 are not a workaround or an optimisation —
//  they are the only option. Every project that needs nested matching has to
//  write this kind of code by hand, once per grammar, and maintain it separately.
//  That cost should be part of any honest performance comparison.
//

import Foundation


// MARK: - Regex Benchmark

func regexBenchmark() {

    let text = benchmarkText

    print("┌─────────────────────────────────────────────────────────┐")
    print("│  Regex Benchmark                                        │")
    print("└─────────────────────────────────────────────────────────┘")
    print()

    // ── Tests 1–4: What Swift Regex can do ──────────────────────────────────

    print("  Tests 1–4  Swift Regex\n")

    // Test 1: Numbers (integers and decimals)
    //
    // Matches sequences of digits with an optional decimal part.
    // Both "42" and "3.14" are matched; the decimal point must be followed by digits.
    let numberRegex = /[0-9]+(?:\.[0-9]+)?/

    measure("1. Numbers") {
        text.matches(of: numberRegex).count
    }

    // Test 2: Words
    //
    // Matches runs of ASCII letters. This exercises the inner loop of the Regex
    // engine, which advances one character at a time through letter runs.
    let wordRegex = /[a-zA-Z]+/

    measure("2. Words") {
        text.matches(of: wordRegex).count
    }

    // Test 3: Quoted strings
    //
    // Matches a double-quoted string whose content contains no quotes.
    // All log lines in the flat section contain exactly one such string.
    let quotedRegex = /"[^"]*"/

    measure("3. Quoted strings") {
        text.matches(of: quotedRegex).count
    }

    // Test 4: Dates in DD.MM.YYYY format
    //
    // Matches exactly two digits, a dot, two digits, a dot, and four digits.
    // Every log line in the flat section begins with such a date.
    let dateRegex = /[0-9]{2}\.[0-9]{2}\.[0-9]{4}/

    measure("4. Dates DD.MM.YYYY") {
        text.matches(of: dateRegex).count
    }

    // ── Tests 5–6: What Regex cannot do ─────────────────────────────────────
    //
    // Regular expressions are built on finite automata which have no memory
    // of nesting depth. There is no Regex pattern that correctly matches
    // balanced parentheses or nested block comments for arbitrary depth.
    // The only option is a hand-written scanner with an explicit depth counter.

    print()
    print("  Tests 5–6  Manual scanner  (Regex cannot express nesting)\n")

    // Test 5: Balanced parentheses
    // Counts every complete, balanced (...) group. A single pass with a depth counter.
    // The scanner produces only a count; structured output would require more code.
    measure("5. Balanced parentheses   [manual scanner]") {
        var count = 0
        var depth = 0

        for ch in text {
            if ch == "(" {
                depth += 1
            } else if ch == ")" && depth > 0 {
                depth -= 1
                if depth == 0 { count += 1 }
            }
        }

        return count
    }

    // Test 6: Nested block comments /* ... /* ... */ ... */
    // Counts every outermost /* */ block. A single pass tracking /* and */ with a depth counter.
    // Same limitation as test 5: only a count, no structured output.
    measure("6. Nested block comments  [manual scanner]") {
        var count = 0
        var depth = 0
        var idx   = text.startIndex

        while idx < text.endIndex {
            let next = text.index(after: idx)
            guard next < text.endIndex else { break }

            if text[idx] == "/" && text[next] == "*" {
                depth += 1
                idx = text.index(after: next) // consume both characters
            } else if text[idx] == "*" && text[next] == "/" && depth > 0 {
                depth -= 1
                if depth == 0 { count += 1 }
                idx = text.index(after: next) // consume both characters
            } else {
                idx = next
            }
        }

        return count
    }

    print()
}

RuleBenchmarkString.swift

//
//  RuleBenchmarkString.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 24.03.26.
//
//  Performance measurements using Rule<String, Never>.
//
//  String uses a variable-width character layout. Every call to index(after:)
//  must decode the next UTF-8 sequence to find the character boundary,
//  it cannot simply add an offset. This is the baseline Rule performance
//  on the most common collection type.
//
//  Tests 1-4 are identical in intent to the Regex benchmark.
//  Tests 5-6 use a recursive Rule grammar, one pass, composable with every
//  other Rule primitive, no external state, structured output included.
//


// MARK: - Rule<String, Never> Benchmark

func ruleBenchmarkString() {

    typealias R = Rule<String, Never>

    let text = benchmarkText


    // ── Build rules ──────────────────────────────────────────────────────────
    //
    // Rules are constructed once before timing starts.
    // Construction allocates closures; the cost must not pollute the measurements.


    // ── Character classes ────────────────────────────────────────────────────

    let digit  = R.unicode("0"..."9")
    let letter = R.unicode("a"..."z", "A"..."Z")


    // ── Test 1: Numbers ──────────────────────────────────────────────────────
    //
    // The optional decimal part ensures "3.14" is taken as a decimal,
    // not as integer "3".

    let integer = R.oneOrMore(digit)
    let decimal = R.sequence(integer, .sequence(.word("."), integer).optional)
    let number  = decimal

    // ── Test 2: Words ────────────────────────────────────────────────────────

    let word = R.oneOrMore(letter)

    // ── Test 3: Quoted strings ───────────────────────────────────────────────
    //
    // repeat(_:endAfter:) tries the end rule at each position first.
    // If end matches, the repetition succeeds (including end).
    // Otherwise the body is consumed and the loop continues.

    let quote        = R.characters("\"")
    let quotedString = R.sequence(quote, .repeat(.any(), endAfter: quote))

    // ── Test 4: Dates in DD.MM.YYYY format ──────────────────────────────────

    let dot  = R.word(".")
    let date = R.sequence(
        .repeat(digit, min: 2, max: 2),
        dot,
        .repeat(digit, min: 2, max: 2),
        dot,
        .repeat(digit, min: 4, max: 4)
    )

    // ── Test 5: Balanced parentheses ─────────────────────────────────────────
    //
    // A recursive grammar: a group is '(' ... ')' where ... can  either another
    // group (recursion) or any non-paren character.

    enum ParenthesisRuleKey { case parenthesis }
    let parenthesisLib = RuleLibrary<ParenthesisRuleKey, String, Never>()
    let singleParenthesis = R.rule(.parenthesis, in: parenthesisLib)

    parenthesisLib[.parenthesis] = R.sequence(
        .characters("("),
        .repeat(.firstMatch(singleParenthesis, .any()), endAfter: .characters(")"))
    )

    // ── Test 6: Nested block comments ────────────────────────────────────────
    //
    // The same recursive pattern as test 5, this time for /* ... */ blocks
    // that may contain nested /* ... */ blocks.
    
    enum BlockCommentRuleKey { case blockComment }
    let blockCommentLib = RuleLibrary<BlockCommentRuleKey, String, Never>()
    let openComment  = R.word("/*")
    let closeComment = R.word("*/")
    let singleBlockComment = R.rule(.blockComment, in: blockCommentLib)

    let bodyChar = R.sequence(
        .notExpecting(openComment),
        .notExpecting(closeComment),
        .any()
    )

    blockCommentLib[.blockComment] = R.sequence(
        openComment,
        .zeroOrMore(.firstMatch(singleBlockComment, bodyChar)),
        closeComment
    )


    // ── Match function ───────────────────────────────────────────────────────

    // Scans the full text for non-overlapping matches of `rule`.
    func matches(_ rule: R) -> Int {
        var counter = 0
        var index = text.startIndex

        while index < text.endIndex {
            if let result = rule.evaluate(text, in: index..<text.endIndex) {
                counter += 1
                index = result.range.upperBound
            } else {
                index = text.index(after: index)
            }
        }

        return counter
    }


    // ── Print results ────────────────────────────────────────────────────────

    print("┌─────────────────────────────────────────────────────────┐")
    print("│  Rule<String, Never> Benchmark                          │")
    print("└─────────────────────────────────────────────────────────┘")
    print()

    print("  Tests 1-4  Rule<String, Never>, same patterns as Regex\n")

    measure("1. Numbers")          { matches(number) }
    measure("2. Words")            { matches(word) }
    measure("3. Quoted strings")   { matches(quotedString) }
    measure("4. Dates DD.MM.YYYY") { matches(date) }

    print()
    print("  Tests 5-6  Rule<String, Never>, recursive grammar, one pass\n")

    measure("5. Balanced parentheses")  { matches(singleParenthesis) }
    measure("6. Nested block comments") { matches(singleBlockComment) }

    print()
}

RuleBenchmarkUTF16.swift

//
//  RuleBenchmarkUTF16.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 24.03.26.
//
//  Performance measurements using Rule<String.UTF16View, Never>.
//
//  Measuring Rule performance on String.UTF16View is interesting because
//  UTF16View uses fixed-width 16-bit code units. index(after:) can add a
//  constant offset instead of decoding a variable-width UTF-8 sequence,
//  so iteration is cheaper than on String. This benchmark measures how
//  much that difference matters in practice.
//
//  Tests 1-4 are identical in intent to the String and Regex benchmarks.
//  Tests 5-6 use a recursive Rule grammar, one pass, composable with every
//  other Rule primitive, no external state, structured output included.
//


// MARK: - Rule<String.UTF16View, Never> Benchmark

func ruleBenchmarkUTF16() {

    let benchmarkTextUTF16 = benchmarkText.utf16

    typealias R = Rule<String.UTF16View, Never>


    // ── Build rules ──────────────────────────────────────────────────────────
    //
    // Rules are constructed once before timing starts.
    // Construction allocates closures; the cost must not pollute the measurements.


    // ── Character classes ────────────────────────────────────────────────────

    let digit  = R.utf16("0"..."9")
    let letter = R.utf16("a"..."z", "A"..."Z")


    // ── Code unit ranges ─────────────────────────────────────────────────────
    //
    // UInt16 ranges used in the optimized jumpTo tests.

    let digitRange:            ClosedRange<UInt16> = 0x30...0x39   // '0'...'9'
    let lowerRange:            ClosedRange<UInt16> = 0x61...0x7A   // 'a'...'z'
    let upperRange:            ClosedRange<UInt16> = 0x41...0x5A   // 'A'...'Z'
    let quoteUnit:             UInt16              = 0x22          // '"'
    let parenthesisRange:      ClosedRange<UInt16> = 0x28...0x29   // '('...')'
    let commentDelimiterRange: ClosedRange<UInt16> = 0x2A...0x2F   // '*'...'/'


    // ── Test 1: Numbers ──────────────────────────────────────────────────────
    //
    // The optional decimal part ensures "3.14" is taken as a decimal,
    // not as integer "3".

    let integer = R.oneOrMore(digit)
    let decimal = R.sequence(integer, .sequence(.word("."), integer).optional)
    let number  = decimal

    // ── Test 2: Words ────────────────────────────────────────────────────────

    let word = R.oneOrMore(letter)

    // ── Test 3: Quoted strings ───────────────────────────────────────────────
    //
    // repeat(_:endAfter:) tries the end rule at each position first.
    // If end matches, the repetition succeeds (including end).
    // Otherwise the body is consumed and the loop continues.

    let quote        = R.utf16("\"")
    let quotedString = R.sequence(quote, .repeat(.any(), endAfter: quote))
    let quotedStringWithJump = R.sequence(quote, .jumpToNext(quoteUnit), quote)

    // ── Test 4: Dates in DD.MM.YYYY format ──────────────────────────────────

    let dot  = R.utf16(".")
    let date = R.sequence(
        digit, digit,
        dot,
        digit, digit,
        dot,
        digit, digit, digit, digit
    )

    // ── Test 5: Balanced parentheses ─────────────────────────────────────────
    //
    // A recursive grammar: a group is '(' ... ')' where ... can  either another
    // group (recursion) or any non-paren character.

    enum ParenthesisRuleKey { case parenthesis, parenthesis_useJumpToNext }
    let parenthesisLib = RuleLibrary<ParenthesisRuleKey, String.UTF16View, Never>()
    let singleParenthesis = R.rule(.parenthesis, in: parenthesisLib)
    let singleParenthesis_useJumpToNext = R.rule(.parenthesis_useJumpToNext, in: parenthesisLib)

    parenthesisLib[.parenthesis] = R.sequence(
        .utf16("("),
        .repeat(.firstMatch(singleParenthesis, .any()), endAfter: .utf16(")"))
    )

    parenthesisLib[.parenthesis_useJumpToNext] = R.sequence(
        .utf16("("),
        .repeat(.firstMatch(singleParenthesis_useJumpToNext, .jumpToNext(parenthesisRange)), endAfter: .utf16(")"))
    )

    // ── Test 6: Nested block comments ────────────────────────────────────────
    //
    // The same recursive pattern as test 5, this time for /* ... */ blocks
    // that may contain nested /* ... */ blocks.

    enum BlockCommentRuleKey { case blockComment, blockComment_useJumpToNext }
    let blockCommentLib = RuleLibrary<BlockCommentRuleKey, String.UTF16View, Never>()
    let openComment  = R.word("/*")
    let closeComment = R.word("*/")
    let singleBlockComment = R.rule(.blockComment, in: blockCommentLib)
    let singleBlockComment_useJumpToNext = R.rule(.blockComment_useJumpToNext, in: blockCommentLib)

    let bodyChar = R.sequence(
        .notExpecting(openComment),
        .notExpecting(closeComment),
        .any()
    )

    blockCommentLib[.blockComment] = R.sequence(
        openComment,
        .zeroOrMore(.firstMatch(singleBlockComment, bodyChar)),
        closeComment
    )

    blockCommentLib[.blockComment_useJumpToNext] = R.sequence(
        openComment,
        .repeat(.firstMatch(singleBlockComment_useJumpToNext, .jumpToNext(commentDelimiterRange)), endAfter: closeComment)
    )


    // ── Match functions ──────────────────────────────────────────────────────

    // Scans the full text for non-overlapping matches of `rule`.
    func matches(_ rule: R) -> Int {
        var counter = 0
        var index = benchmarkTextUTF16.startIndex

        while index < benchmarkTextUTF16.endIndex {
            if let result = rule.evaluate(benchmarkTextUTF16, in: index..<benchmarkTextUTF16.endIndex) {
                counter += 1
                index = result.range.upperBound
            } else {
                index = benchmarkTextUTF16.index(after: index)
            }
        }

        return counter
    }

    // Scans the full text, but skips code units that cannot begin a match.
    func matches(_ rule: R, firstCharacterHint: [ClosedRange<UInt16>]) -> Int {
        var counter = 0
        var index = benchmarkTextUTF16.startIndex

        let jumpToNextIndex: () -> ()
        if firstCharacterHint.count == 1 {
            let codeUnitRange = firstCharacterHint[0]
            jumpToNextIndex = {
                index = benchmarkTextUTF16.index(after: index)
                while index < benchmarkTextUTF16.endIndex {
                    if codeUnitRange.contains(benchmarkTextUTF16[index]) { return }
                    index = benchmarkTextUTF16.index(after: index)
                }
            }
        } else {
            let codeUnitRanges = firstCharacterHint
            jumpToNextIndex = {
                index = benchmarkTextUTF16.index(after: index)
                while index < benchmarkTextUTF16.endIndex {
                    for codeUnitRange in codeUnitRanges {
                        if codeUnitRange.contains(benchmarkTextUTF16[index]) { return }
                    }
                    index = benchmarkTextUTF16.index(after: index)
                }
            }
        }

        while index < benchmarkTextUTF16.endIndex {
            if let result = rule.evaluate(benchmarkTextUTF16, in: index..<benchmarkTextUTF16.endIndex) {
                counter += 1
                index = result.range.upperBound
            } else {
                jumpToNextIndex()
            }
        }

        return counter
    }

    // Counter used by the jumpTo rules to count matches from inside the rule.
    var counter: Int = 0
    let countingRule = R { string, range in
        counter += 1
        return RuleMatch(range: range.lowerBound..<range.lowerBound)
    }


    // ── Print results ────────────────────────────────────────────────────────

    print("┌─────────────────────────────────────────────────────────┐")
    print("│  Rule<String.UTF16View, Never> Benchmark                │")
    print("└─────────────────────────────────────────────────────────┘")
    print()

    print("  Using: Rule.zeroOrMore(.repeat(.any(), endAfter: .sequence( <rule>, countingRule) ))\n")

    measure(" 1. Numbers") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.any(), endAfter: .sequence(number, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }
    measure(" 2. Words") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.any(), endAfter: .sequence(word, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }
    measure(" 3. Quoted strings") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.any(), endAfter: .sequence(quotedStringWithJump, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }
    measure(" 4. Dates DD.MM.YYYY") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.any(), endAfter: .sequence(date, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }
    measure(" 5. Balanced parentheses") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.any(), endAfter: .sequence(singleParenthesis, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }
    measure(" 6. Block comments") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.any(), endAfter: .sequence(singleBlockComment, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }

    print()
    print("  Using: matches( <rule> )\n")

    measure(" 1. Numbers")               { matches(number) }
    measure(" 2. Words")                 { matches(word) }
    measure(" 3. Quoted strings")        { matches(quotedString) }
    measure(" 4. Dates DD.MM.YYYY")      { matches(date) }
    measure(" 5. Balanced parentheses")  { matches(singleParenthesis) }
    measure(" 6. Nested block comments") { matches(singleBlockComment) }

    print()
    print("  Using: Rule.zeroOrMore(.repeat(.jumpToNext( <codeUnitRange> ), endAfter: .sequence( <rule>, countingRule) ))\n")

    measure(" 1. Numbers              [jumpTo]") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.jumpToNext(digitRange), endAfter: .sequence(number, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }
    measure(" 2. Words                [jumpTo]") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.jumpToNext(lowerRange, upperRange), endAfter: .sequence(word, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }
    measure(" 3. Quoted strings       [jumpTo]") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.jumpToNext(quoteUnit), endAfter: .sequence(quotedStringWithJump, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }
    measure(" 4. Dates DD.MM.YYYY     [jumpTo]") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.jumpToNext(digitRange), endAfter: .sequence(date, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }
    measure(" 5. Balanced parentheses [jumpTo]") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.jumpToNext(parenthesisRange), endAfter: .sequence(singleParenthesis_useJumpToNext, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }
    measure(" 6. Block comments       [jumpTo]") {
        counter = 0
        let scanRule = R.zeroOrMore(.repeat(.jumpToNext(commentDelimiterRange), endAfter: .sequence(singleBlockComment_useJumpToNext, countingRule)))
        let _ = scanRule.evaluate(benchmarkTextUTF16)
        return counter
    }

    print()
    print("  Using: matches( <rule>, firstCharacterHint: [<codeUnitRange>] )\n")

    measure(" 1. Numbers              [scanWithHint]") { matches(number, firstCharacterHint: [digitRange]) }
    measure(" 2. Words                [scanWithHint]") { matches(word, firstCharacterHint: [lowerRange, upperRange]) }
    measure(" 3. Quoted strings       [scanWithHint]") { matches(quotedStringWithJump, firstCharacterHint: [quoteUnit...quoteUnit]) }
    measure(" 4. Dates DD.MM.YYYY     [scanWithHint]") { matches(date, firstCharacterHint: [digitRange]) }
    measure(" 5. Balanced parentheses [scanWithHint]") { matches(singleParenthesis_useJumpToNext, firstCharacterHint: [parenthesisRange]) }
    measure(" 6. Block comments       [scanWithHint]") { matches(singleBlockComment_useJumpToNext, firstCharacterHint: [commentDelimiterRange]) }

    print()
}

ksluder · March 29, 2026, 6:55pm

Perhaps you could have appended these to your most recent reply? But yes, the alternative strategy of keeping the code on GitHub and linking to it from here seems sound.

MaximilianServais · March 29, 2026, 7:00pm

Hi @ksluder,

Thanks for your reply! Yes, I posted the first code part as a reply to my benchmarks, but the post is still visible and not hidden under my benchmark post. Did you mean to add the code at the bottom of my benchmark post? I received a message saying that I had exceeded the maximum character length allowed; otherwise, I would have done this, as it would have been the cleanest solution.

If there is another clean way of doing this, I'd be grateful for any tips!

ksluder · March 29, 2026, 7:03pm

Ah, I was unaware of the character limit. In that case, I definitely think that keeping the source code in GitHub and linking to it from here (with a specific commit hash) is preferable to having multiple posts. You might even prefer to keep the raw benchmark data on GitHub too, and just surface the highlights here. Then folks will only receive one reply notification email while still having access to your raw data.