[Pitch] Rule<T: Collection> — A Composable Pattern Recognizer for any Collection

Introduction

I would like to pitch a new standard library type, Rule<T: Collection>, a universal, composable pattern recognizer for any Swift Collection.

A Rule answers one question: "Does this pattern match here, and if so, how much input does it consume?" Everything else — repetition, alternatives, lookahead, capturing — is built by composing rules together using a small set of orthogonal primitives.

Because Rule is generic over Collection, the same combinators work on String, String.UTF8View, [Character], [UInt8], or any other collection type. You define domain-specific matchers for your element type; the library provides the composition.

This pitch covers:

  • The generic core: Rule<T>, matchers, combinators, quantifiers, lookahead, and captures.
  • Standard matchers for common collection types (e.g. String, [UInt8]).
  • The RuleLibrary type for recursive and late-bound grammars.

Motivation

Swift's Regex and RegexBuilder (SE-0350, SE-0351) brought powerful string matching to the language. But regular expressions were designed in an era of simple byte streams and ASCII text. Swift's world is broader than that.

1. They work only on String — not on its other views

Regex operates on String, but Swift has many ways to look at text: String.UTF8View, String.UnicodeScalarView, [Character], raw [UInt8] buffers. A compiler working on UTF-8 bytes cannot use Regex. A text processor working on [Character] — where every grapheme cluster, every emoji, every combined character like à is a single element — cannot use Regex either.

This matters because Swift deliberately distinguishes these views. Regex only supports one of them.

2. They cannot express recursive structures

Matching nested parentheses, balanced brackets, or any recursive grammar is outside the regex formalism. This is not a limitation that can be fixed with more features — it is a fundamental boundary of regular languages. Anyone who needs to parse structured data must leave Regex behind entirely and build a separate parser.

3. Extending them requires framework changes

If RegexBuilder does not have the combinator you need, you wait for a future proposal. You cannot plug in your own matching logic. With a closure-based design, any developer can write a Rule { sequence, range in ... } and it composes with every other rule immediately — no framework changes, no proposals, no waiting.

4. The mental model changes between domains

String matching uses Regex. Byte matching uses manual loops. Token matching uses switch statements. The underlying composition primitives are always the same — sequence, choice, repetition, lookahead — but every domain reinvents them from scratch.

5. What is missing is a universal pattern recognizer

Swift has a strong tradition of generic, composable abstractions: Collection, Sequence, Comparable. Pattern matching is a gap. There is no generic answer to the question "does this sequence of elements match this pattern?" — regardless of whether the elements are characters, bytes, tokens, or something else entirely.

Rule fills that gap.


Consider how the same composition works across different collection types:

// Classic string matching — familiar territory
let decimal: Rule<String> = .sequence(
    .oneOrMore(.unicode(.decimalDigits)),
    .word("."),
    .oneOrMore(.unicode(.decimalDigits))
)

// The same pattern on raw UTF-8 — for compilers, network parsers,
// and anywhere you work with bytes instead of String
let asciiDigit = Rule<String.UTF8View>.element(where: { (0x30...0x39).contains($0) })
let utf8Decimal: Rule<String.UTF8View> = .sequence(
    .oneOrMore(asciiDigit),
    .byte(0x2E),   // "."
    .oneOrMore(asciiDigit)
)

// [Character] — where à, 한, and 🏳️‍🌈 are each ONE element.
// No Unicode scalar complications, no combining character surprises.
let displayName: Rule<[Character]> = .oneOrMore(.firstMatch(letter, digit, emoji, .element(where: { $0 == "_" })))

// And the same primitives work far beyond text —
// recognizing a swipe gesture in a touch event stream
let swipeRight: Rule<[TouchEvent]> = .sequence(
    touchDown,
    .oneOrMore(movedRight),
    touchUp
)

The first three examples solve real pain points in string and text processing that Regex cannot address. The last one shows that the same composition primitives work on any sequence where patterns matter.

Proposed Solution

A new generic type Rule<T: Collection> that provides:

  • A universal contract: every rule is a function (T, Range<T.Index>) -> RuleMatch<T>?. Match or fail, nothing more.
  • Composable primitives: a small, orthogonal set of combinators that work on any collection type.
  • Structural captures: a tree of named ranges that records what matched and where, without mixing in value interpretation.
  • User extensibility: if the library does not have what you need, write a Rule { sequence, range in ... } closure and it plugs right into the composition system.

The key insight is that the composition primitives (sequence, choice, repetition, lookahead) are collection-agnostic. Only the leaf matchers are domain-specific. By separating these concerns, one set of combinators serves every collection type.
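This separation can be illustrated in a few lines. The following is a minimal, self-contained sketch of the principle — using a hypothetical `Mini` type, not the pitched API: the leaf matcher inspects elements, while the combinator only moves indices, so the same `sequence` serves `String` and `[UInt8]` alike.

```swift
// Minimal sketch (hypothetical `Mini`, not the pitched API): a recognizer
// maps (collection, start index) to the index after the match, or nil.
struct Mini<C: Collection> {
    let run: (C, C.Index) -> C.Index?
}

// Leaf matcher — domain-specific: inspects a single element.
func element<C: Collection>(where predicate: @escaping (C.Element) -> Bool) -> Mini<C> {
    Mini { input, index in
        guard index < input.endIndex, predicate(input[index]) else { return nil }
        return input.index(after: index)
    }
}

// Combinator — collection-agnostic: only chains indices, never touches elements.
func sequence<C: Collection>(_ rules: [Mini<C>]) -> Mini<C> {
    Mini { input, start in
        var index = start
        for rule in rules {
            guard let next = rule.run(input, index) else { return nil }
            index = next
        }
        return index
    }
}

// The same `sequence` serves both String and [UInt8]:
let abText: Mini<String>   = sequence([element { $0 == "a" }, element { $0 == "b" }])
let abBytes: Mini<[UInt8]> = sequence([element { $0 == 0x61 }, element { $0 == 0x62 }])

let text = "abc"
print(abText.run(text, text.startIndex) != nil)   // true
print(abBytes.run([0x61, 0x62, 0x63], 0) == 2)    // true
```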

Comparison with RegexBuilder

Capability               RegexBuilder          Rule
Works on String          ✓                     ✓
Works on any Collection  ✗                     ✓
Recursive grammars       ✗                     ✓ (via RuleLibrary)
User-defined matchers    Limited               Full (closure-based)
Captures                 Typed tuples          Named tree (separate phase)
Value transformation     Built into captures   Separate phase

Rule provides a general foundation that covers the same space as RegexBuilder — and extends it to any Collection type, recursive grammars, and user-defined matchers.

Detailed Design

Core Types

/// The result of a successful rule evaluation.
struct RuleMatch<T: Collection> {
    let range: Range<T.Index>
    var captures: [RuleCapture<T>]
}

/// A named region within the input, forming a tree of nested captures.
struct RuleCapture<T: Collection> {
    let name: String
    let range: Range<T.Index>
    let innerCaptures: [RuleCapture<T>]
}

/// A composable pattern that evaluates a Collection within a given range.
struct Rule<T: Collection> {
    init(evaluationBlock: @escaping (T, Range<T.Index>) -> RuleMatch<T>?)
    func evaluate(_ sequence: T, in range: Range<T.Index>) -> RuleMatch<T>?
}

Primitive Set

The entire library is built from a small set of orthogonal primitives:

Matchers — leaf rules that inspect elements:

extension Rule {
    /// Always succeeds, consuming nothing.
    static func empty() -> Rule<T>

    /// Matches any single element.
    static func any() -> Rule<T>

    /// Matches a single element satisfying a predicate.
    static func element(where predicate: @escaping (T.Element) -> Bool) -> Rule<T>

    /// Succeeds only at the end of the remaining range.
    static func endOfSequence() -> Rule<T>
}

Combinator — sequential composition:

extension Rule {
    /// Matches rules consecutively. Fails if any rule fails.
    static func sequence(_ rules: Rule<T>...) -> Rule<T>
}

Selectors — choosing between alternatives:

extension Rule {
    /// Returns the first rule that matches. Order defines priority.
    static func firstMatch(_ rules: Rule<T>...) -> Rule<T>

    /// Returns the rule that consumes the most input. Order does not matter.
    static func longestMatch(_ rules: Rule<T>...) -> Rule<T>
}

Quantifiers — repetition:

extension Rule {
    /// Repeats a rule between min and max times (greedy).
    static func `repeat`(_ rule: Rule<T>, min: UInt, max: UInt) -> Rule<T>

    static func zeroOrOne(_ rule: Rule<T>) -> Rule<T>
    static func zeroOrMore(_ rule: Rule<T>) -> Rule<T>
    static func oneOrMore(_ rule: Rule<T>) -> Rule<T>

    /// Instance property for readability: `sign.optional`
    var optional: Rule<T> { get }
}

Lookahead — zero-width assertions:

extension Rule {
    /// Succeeds if rule would match, consumes nothing.
    static func expecting(_ rule: Rule<T>) -> Rule<T>

    /// Succeeds if rule would NOT match, consumes nothing.
    static func notExpecting(_ rule: Rule<T>) -> Rule<T>
}

Annotation — structural captures:

extension Rule {
    /// Wraps the matched range in a named capture.
    func capture(as name: String) -> Rule<T>
}

Convenience — repeat with termination:

extension Rule {
    /// Repeats body until end would match. Does NOT consume end.
    static func `repeat`(_ body: Rule<T>, as name: String? = nil, endBefore end: Rule<T>) -> Rule<T>

    /// Repeats body until end matches, then consumes end as well.
    static func `repeat`(_ body: Rule<T>, as name: String? = nil, endAfter end: Rule<T>) -> Rule<T>
}
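As an illustration, a line comment rule could be written with the endBefore variant. This is a sketch in the pitched API:

```swift
// Sketch in the pitched API: a "//" line comment whose text runs up to,
// but not including, the terminating newline.
let lineComment: Rule<String> = .sequence(
    .word("//"),
    .repeat(.any(), as: "CommentText", endBefore: .unicode("\n"))
)
```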

Indirection: RuleLibrary

For recursive and mutually-recursive grammars, rules need to reference each other before they are defined. RuleLibrary provides late binding with recursion depth tracking:

class RuleLibrary<I: Hashable, T: Collection> {
    let maxDepth: Int
    private(set) var depth: Int

    init(maxDepth: Int = 1_000)

    subscript(identifier: I) -> Rule<T>? { get set }
}

extension Rule {
    /// Creates a rule that delegates to a named rule in a library.
    /// Lookup happens at evaluation time, enabling mutual recursion.
    static func rule<I: Hashable>(_ identifier: I, in library: RuleLibrary<I, T>) -> Rule<T>
}

Example — parsing a simplified JSON-like format where values can be numbers, words, or nested { key: value } objects:

let lib = RuleLibrary<String, String>()

let digit = Rule<String>.oneOrMore(.unicode(.decimalDigits))
let word  = Rule<String>.oneOrMore(.unicode(.letters))
let gap   = Rule<String>.zeroOrMore(.unicode(.whitespaces))

lib["value"] = .firstMatch(
    digit.capture(as: "Number"),
    word.capture(as: "Word"),
    .rule("object", in: lib)
)

lib["pair"] = .sequence(
    word.capture(as: "Key"),
    gap, .word(":"), gap,
    .rule("value", in: lib)
)

lib["object"] = .sequence(
    .word("{"), gap,
    .rule("pair", in: lib),
    .zeroOrMore(.sequence(.word(","), gap, .rule("pair", in: lib))),
    gap, .word("}")
).capture(as: "Object")

let input = "{name: Max, age: 30, address: {city: Berlin}}"

if let match = lib["object"]?.evaluate(input, in: input.startIndex..<input.endIndex) {
    print(match.debugDescription)
}

// (String indices rendered as integer offsets for readability)
// Object [0..<45]
// ├─ Key [1..<5]        → "name"
// ├─ Word [7..<10]      → "Max"
// ├─ Key [12..<15]      → "age"
// ├─ Number [17..<19]   → "30"
// ├─ Key [21..<28]      → "address"
// ╰─ Object [30..<44]
//    ├─ Key [31..<35]   → "city"
//    ╰─ Word [37..<43]  → "Berlin"

The recursive "object" rule references itself through the library — "value" can be a "Word", a "Number", or another "object". The capture tree reflects the nesting, making it straightforward to walk the structure and extract values in a separate phase.

String-Specific Matchers

Standard matchers for Rule<String>:

extension Rule where T == String {
    /// Matches a character whose Unicode scalar belongs to the given CharacterSet.
    static func unicode(_ characterSet: CharacterSet) -> Rule<T>

    /// Matches specific Unicode scalars.
    static func unicode(_ scalars: UnicodeScalar...) -> Rule<T>

    /// Matches Unicode scalars within ranges.
    static func unicode(_ ranges: ClosedRange<UnicodeScalar>...) -> Rule<T>

    /// Matches an exact string at the current position.
    static func word(_ word: String, caseInsensitive: Bool = false) -> Rule<T>
}

Equivalent matchers can be provided for other collection types — String.UTF8View, [Character], [UInt8], and so on. The standard library would ship matchers for the most common types. But the real strength of Rule is that if something is missing, you add it yourself: write a Rule { sequence, range in ... } closure and it composes with every other rule immediately. No proposals, no framework changes — just your code, plugged into the same composition system.
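For example, an exact-byte matcher for [UInt8] could live entirely in user code. This is a sketch; `byte(_:)` here is illustrative, not part of the proposal:

```swift
// Sketch: a user-defined matcher for Rule<[UInt8]>, built with the public
// closure initializer — no library changes needed. `byte(_:)` is illustrative.
extension Rule where T == [UInt8] {
    /// Matches one exact byte at the current position.
    static func byte(_ value: UInt8) -> Rule<[UInt8]> {
        Rule { bytes, range in
            guard !range.isEmpty, bytes[range.lowerBound] == value else { return nil }
            return RuleMatch(range: range.lowerBound..<bytes.index(after: range.lowerBound),
                             captures: [])
        }
    }
}
```

The same effect is available on any collection through the generic `.element(where:)`; a dedicated matcher simply reads better at the call site.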

Three-Phase Architecture

The design follows a strict separation of concerns:

Input → Rules → Capture Tree → Your Code
         ↑          ↑              ↑
      Patterns    Structure     Semantics

Rules decide whether something matches and how much it consumes. They know nothing about what the match means.

Captures record what was matched and where, organized as a tree. They carry no values, no types, no interpretation.

Your code walks the capture tree and decides what to do with it — compute a value, highlight syntax, report errors, or anything else.

This separation is deliberate. The same capture tree can serve different purposes: a syntax highlighter reads the ranges, a compiler reads the values, a linter reads the structure. Mixing these concerns into the rule system would limit reuse.

Design Decisions

Why captures are name + range only (no value transformers)

I considered adding a transformer closure that converts matched ranges into typed values directly in RuleCapture. The problem is generics: every capture in a tree can produce a different type, leading to type erasure (Any), existential types, or heterogeneous trees. This complexity ripples through every combinator.

The clean alternative: transform in a separate phase. Navigate the capture tree with find() and findAll(), read the ranges, and convert values in your own code. This keeps captures as a simple, serializable data structure.
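For instance, extracting an integer from a "Number" capture happens entirely outside the rule system. A sketch, where `numberRule` is an illustrative rule that produces a "Number" capture:

```swift
// Sketch of the separate value phase: the rule produced ranges,
// your code interprets them. `numberRule` is illustrative.
let input = "42"
if let match = numberRule.evaluate(input),
   let number = match.captures.first?.find("Number") {
    let value = Int(input[number.range])   // interpretation happens here, not in the rule
}
```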

Why firstMatch and longestMatch instead of a single choice

The original design had a single choice combinator. In practice, the two use cases are fundamentally different:

  • firstMatch — order matters, first success wins (priority-based, like PEG).
  • longestMatch — order does not matter, longest match wins (greediness-based, like tokenizers).

Separate names make the behavior self-documenting.
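A tokenizer example makes the difference concrete (a sketch in the pitched API):

```swift
// Sketch: matching the input "ifdef".
let ifKeyword  = Rule<String>.word("if")
let identifier = Rule<String>.oneOrMore(.unicode(.letters))

// firstMatch: `ifKeyword` is tried first and wins — consumes only "if".
let byPriority = Rule<String>.firstMatch(ifKeyword, identifier)

// longestMatch: `identifier` consumes all of "ifdef" and wins.
let byLength = Rule<String>.longestMatch(ifKeyword, identifier)
```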

Why expecting / notExpecting instead of lookahead / not

"Lookahead" is regex jargon that requires prior knowledge. "Not" is ambiguous. expecting and notExpecting read like English, form a natural pair, and their zero-width behavior is implied by the name.

Why closures instead of a Context generic parameter

I considered Rule<T: Collection, Context> for shared state. This infects every type signature and prevents composition between rules with different context types. Since Rule is closure-backed, you simply capture state in the closure environment when needed.
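A sketch of the closure-capture approach — `word` is a letters rule as in the RuleLibrary example, and `SeenWords` is an illustrative store, not part of the proposal:

```swift
// Sketch: shared state lives in the closure environment, not in a
// Context generic parameter. `SeenWords` is illustrative.
final class SeenWords { var values: Set<String> = [] }

let seen = SeenWords()
let word = Rule<String>.oneOrMore(.unicode(.letters))

let recordedWord = Rule<String> { input, range in
    guard let match = word.evaluate(input, in: range) else { return nil }
    seen.values.insert(String(input[match.range]))   // state updated as a side effect
    return match
}
```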

Source Compatibility

This is purely additive — new types and extensions. No existing APIs are changed.

Alternatives Considered

Extend RegexBuilder to support generic collections

This would require fundamental changes to the Regex type, which is deeply tied to string processing and the regex formalism. The recursive grammar support alone would change the computational model from regular to context-free.

Use Swift macros for grammar definitions

Macros could provide DSL syntax, but the core composition primitives are still needed. Macros could be a future addition on top of Rule, not a replacement.

Result builders for rule composition

Result builders could eliminate commas in sequences and enable if/else syntax. However, Rule's existing API is already compact:

// With result builders
Rule {
    oneOrMore(unicode(.decimalDigits))
    word(".")
    oneOrMore(unicode(.decimalDigits))
}

// Current API — barely different
.sequence(
    .oneOrMore(.unicode(.decimalDigits)),
    .word("."),
    .oneOrMore(.unicode(.decimalDigits))
)

The practical gain is small — a few dots and commas. But there is also a downside: with a result builder, the composition semantics become implicit. Is the block a sequence? A firstMatch? A longestMatch? With the current API, the intent is always visible at the call site. Removing that explicitness runs counter to Swift's emphasis on clarity at the point of use.

Future Directions

  • Standard matchers for additional collection types — String.UTF8View, [Character], [UInt8], and others. The generic core is ready; the matchers are additive and can be introduced incrementally.
  • Frequently needed, reusable rules — patterns like caching/memoization are already possible today through closure capture into external data structures (following the same principle as RuleLibrary). If specific patterns prove to be widely needed, they could be added to the library as composable rules — always as new rules, never as new parameters on the core type.

Acknowledgments

This design draws on ideas from PEG (Parsing Expression Grammars), parser combinators (particularly their Haskell and Scala traditions), and the design philosophy of Swift's own RegexBuilder.


Code (Full Implementation) - Part 1

Here is the full implementation of Rule.

The file is self-contained with no dependencies outside of Foundation. Drop it into any Swift project or package to get started. The structure follows the pitch exactly: Rule is defined first, followed by RuleMatch and RuleCapture, then the primitive extensions in order — Matchers, Annotation, Combinator, Selectors, Quantifiers, Lookahead, Convenience, RuleLibrary, and String-specific matchers.

One matcher worth highlighting: element(where:) — a generic matcher that takes a predicate closure. It is the building block for domain-specific matchers on any collection type; unicode(_:) for String is built on the same principle.

Rule

//
//  Rule.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 13.03.26.
//
//  A universal, composable pattern matching library for any Swift Collection.
//
//  Rules describe patterns that can be applied to strings, byte arrays, token lists,
//  or any other Collection type. They compose into complex parsers and lexers through
//  a small set of orthogonal primitives: matchers, combinators, selectors, quantifiers,
//  lookahead, and captures.
//

import Foundation


// MARK: - Rule

/// A composable pattern that evaluates a Collection within a given range.
///
/// `Rule` is the core building block of the library. Each rule answers one question:
/// *"Does this pattern match at this position, and if so, how much input does it consume?"*
///
/// Rules compose through combinators (`sequence`), selectors (`firstMatch`, `longestMatch`),
/// quantifiers (`repeat`, `zeroOrMore`, `oneOrMore`), and lookahead (`expecting`,
/// `notExpecting`). This small set of orthogonal primitives can express anything from
/// simple patterns to full recursive parsers.
///
/// If the library does not provide what you need, create a custom rule with a closure:
///
///     let myRule = Rule<String> { sequence, range in
///         // Your matching logic here
///     }
///
struct Rule<T: Collection> {

    private let evaluationBlock: (T, Range<T.Index>) -> RuleMatch<T>?

    /// Creates a rule with a custom evaluation closure.
    ///
    /// The closure receives the input collection and the range to match within.
    /// Return a `RuleMatch` on success, or `nil` if the pattern does not match.
    init(evaluationBlock: @escaping (T, Range<T.Index>) -> RuleMatch<T>?) {
        self.evaluationBlock = evaluationBlock
    }

    /// Evaluates the rule on `sequence` within `range`.
    ///
    /// Returns a `RuleMatch` describing the consumed range and any captures,
    /// or `nil` if the pattern does not match at the given position.
    func evaluate(_ sequence: T, in range: Range<T.Index>) -> RuleMatch<T>? {
        return evaluationBlock(sequence, range)
    }

    /// Evaluates the rule on the entire sequence.
    ///
    /// Convenience for the common case where matching starts at the beginning.
    func evaluate(_ sequence: T) -> RuleMatch<T>? {
        return evaluationBlock(sequence, sequence.startIndex..<sequence.endIndex)
    }
}


// MARK: - RuleMatch

/// The result of a successful rule evaluation.
///
/// Contains the consumed range and any captures extracted during evaluation.
/// A `nil` return from `Rule.evaluate` means the rule did not match;
/// a non-nil `RuleMatch` means it did, and describes how much input was consumed.
struct RuleMatch<T: Collection> {
    let range: Range<T.Index>
    var captures: [RuleCapture<T>]

    init(range: Range<T.Index>, captures: [RuleCapture<T>]) {
        self.range = range
        self.captures = captures
    }
}

extension RuleMatch: CustomStringConvertible {

    /// A flat summary showing the matched range and capture count.
    var description: String {
        "RuleMatch [\(range.lowerBound)..<\(range.upperBound)] captures: \(captures.count)"
    }
}

extension RuleMatch: CustomDebugStringConvertible {

    /// Full representation including the capture tree.
    var debugDescription: String {
        var description = "RuleMatch [\(range.lowerBound)..<\(range.upperBound)]"
        if !captures.isEmpty {
            description += RuleCapture.treeDescription(for: captures)
        }
        return description
    }
}


// MARK: - RuleCapture

/// A named region within the input, forming a tree of nested captures.
///
/// Captures are the structural output of rule evaluation. They record *what* was
/// matched and *where*, organized hierarchically. A capture's `name` identifies it;
/// interpretation of the captured content is left to a separate phase after matching.
struct RuleCapture<T: Collection> {
    let name: String
    let range: Range<T.Index>
    let innerCaptures: [RuleCapture<T>]
}

extension RuleCapture: CustomStringConvertible {

    /// A flat, single-line summary showing name and range.
    var description: String {
        "\(name) [\(range.lowerBound)..<\(range.upperBound)]"
    }
}

extension RuleCapture: CustomDebugStringConvertible {

    /// Full tree representation including all nested captures.
    var debugDescription: String {
        var description = "\(name) [\(range.lowerBound)..<\(range.upperBound)]"
        if !innerCaptures.isEmpty {
            description += RuleCapture.treeDescription(for: innerCaptures, indent: "")
        }
        return description
    }

    /// Renders captures as an indented ASCII tree.
    ///
    /// Produces a hierarchical view that makes it easy to visualize
    /// the structure of parsed content during development and debugging.
    static func treeDescription(for captures: [RuleCapture], indent: String = "") -> String {
        var description = ""

        for (index, capture) in captures.enumerated() {
            let isLast = index == captures.count - 1

            description += "\n"

            let captureDescription = capture.debugDescription
            let prefix = isLast ? "╰─ " : "├─ "
            let continuation = isLast ? "   " : "│  "

            let lines = captureDescription.components(separatedBy: .newlines)
            for (lineIndex, line) in lines.enumerated() {
                if lineIndex == 0 {
                    description += indent + prefix + line
                } else {
                    description += "\n" + indent + continuation + line
                }
            }
        }

        return description
    }
}


// MARK: - Capture Tree Navigation

extension RuleCapture {

    /// Finds the first capture with the given name using depth-first search.
    ///
    /// Returns `nil` if no capture with that name exists in the tree.
    func find(_ name: String) -> RuleCapture<T>? {
        if self.name == name { return self }
        for inner in innerCaptures {
            if let found = inner.find(name) { return found }
        }
        return nil
    }

    /// Finds all captures with the given name using depth-first search.
    ///
    /// Returns an empty array if no captures with that name exist in the tree.
    func findAll(_ name: String) -> [RuleCapture<T>] {
        var results: [RuleCapture<T>] = []
        if self.name == name { results.append(self) }
        for inner in innerCaptures {
            results.append(contentsOf: inner.findAll(name))
        }
        return results
    }
}


// MARK: - Matchers

extension Rule {

    /// Always succeeds, consuming nothing.
    ///
    /// Useful as a fallback in `firstMatch` for error recovery or default cases:
    /// `.firstMatch(expectedRule, .empty().capture(as: "Error"))`
    static func empty() -> Rule<T> {
        Rule<T> { _, range in
            return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
        }
    }

    /// Matches any single element, regardless of content.
    ///
    /// Fails only if the range is empty. Use inside repeat loops to consume
    /// input one element at a time: `.zeroOrMore(.sequence(.notExpecting(stop), .any()))`
    static func any() -> Rule<T> {
        Rule<T> { sequence, range in
            guard !range.isEmpty else { return nil }
            let next = sequence.index(after: range.lowerBound)
            return RuleMatch(range: range.lowerBound..<next, captures: [])
        }
    }

    /// Matches a single element satisfying the given predicate.
    ///
    /// This is the generic building block for element-level conditions on any collection type.
    /// Domain-specific matchers like `unicode(_:)` for strings are built on the same principle.
    ///
    ///     .element(where: { $0.isLetter })
    ///     .element(where: { $0 > 0x7F })
    static func element(where predicate: @escaping (T.Element) -> Bool) -> Rule<T> {
        Rule<T> { sequence, range in
            guard !range.isEmpty else { return nil }
            guard predicate(sequence[range.lowerBound]) else { return nil }
            let next = sequence.index(after: range.lowerBound)
            return RuleMatch(range: range.lowerBound..<next, captures: [])
        }
    }

    /// Succeeds only if the remaining range is empty. Consumes nothing.
    ///
    /// Use this to assert that a rule must match the entire remaining input,
    /// not just a prefix: `.sequence(decimal, .endOfSequence())`
    static func endOfSequence() -> Rule<T> {
        Rule<T> { _, range in
            guard range.isEmpty else { return nil }
            return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
        }
    }
}


// MARK: - Annotation

extension Rule {

    /// Wraps the match in a named capture group.
    ///
    /// The entire matched range becomes a `RuleCapture` node with the given name.
    /// Any captures from the inner rule become `innerCaptures` of the new node.
    func capture(as name: String) -> Rule<T> {
        Rule<T> { sequence, range in
            guard let result = self.evaluate(sequence, in: range) else { return nil }
            let capture = RuleCapture(name: name, range: result.range, innerCaptures: result.captures)
            return RuleMatch(range: result.range, captures: [capture])
        }
    }

    /// Wraps the match in a named capture only if a name is provided.
    ///
    /// Convenience for optional capturing in generic combinators.
    /// Passes through unchanged when `name` is `nil`.
    func captureIf(_ name: String?) -> Rule<T> {
        guard let name else { return self }
        return self.capture(as: name)
    }
}


// MARK: - Combinator

extension Rule {

    /// Matches a sequence of rules consecutively.
    ///
    /// Each rule is applied in order, starting where the previous one ended.
    /// If any rule fails, the entire sequence fails. Captures from all rules
    /// are combined in the result.
    static func sequence(_ rules: Rule<T>...) -> Rule<T> {
        precondition(rules.count > 0, "At least one rule is required.")

        return Rule<T> { sequence, range in
            var remaining = range
            var captures: [RuleCapture<T>] = []

            for rule in rules {
                guard let result = rule.evaluate(sequence, in: remaining) else { return nil }
                remaining = result.range.upperBound..<range.upperBound
                captures.append(contentsOf: result.captures)
            }

            return RuleMatch(range: range.lowerBound..<remaining.lowerBound, captures: captures)
        }
    }
}


// MARK: - Selectors

extension Rule {

    /// Returns the result of the first rule that matches.
    ///
    /// Rules are evaluated in order — the first success wins. Order defines priority.
    /// Use this when you need deterministic precedence (e.g. keywords before identifiers).
    static func firstMatch(_ rules: Rule<T>...) -> Rule<T> {
        precondition(rules.count > 0, "At least one rule is required.")

        return Rule<T> { sequence, range in
            for rule in rules {
                if let result = rule.evaluate(sequence, in: range) { return result }
            }
            return nil
        }
    }

    /// Returns the result of the rule that consumes the most input.
    ///
    /// All rules are evaluated and the longest match wins. Order does not matter.
    /// Use this when the correct interpretation is the greediest one (e.g. tokenizers).
    static func longestMatch(_ rules: Rule<T>...) -> Rule<T> {
        precondition(rules.count > 0, "At least one rule is required.")

        return Rule<T> { sequence, range in
            var best: RuleMatch<T>?

            for rule in rules {
                if let result = rule.evaluate(sequence, in: range) {
                    if best == nil || best!.range.upperBound < result.range.upperBound {
                        best = result
                    }
                }
            }

            return best
        }
    }
}


// MARK: - Quantifiers

extension Rule {

    /// Repeats a rule between `min` and `max` times (greedy).
    ///
    /// The rule is applied as many times as possible up to `max`. Succeeds if it
    /// matched at least `min` times. Stops early if the rule fails or makes no
    /// progress (to prevent infinite loops with zero-width matches).
    static func `repeat`(_ rule: Rule<T>, min: UInt, max: UInt) -> Rule<T> {
        precondition(min <= max, "Minimum must be ≤ maximum.")

        return Rule<T> { sequence, range in
            var count: UInt = 0
            var remaining = range
            var captures: [RuleCapture<T>] = []

            while !remaining.isEmpty && count < max {
                guard let result = rule.evaluate(sequence, in: remaining) else { break }
                if result.range.upperBound == remaining.lowerBound { break }

                count += 1
                remaining = result.range.upperBound..<range.upperBound
                captures.append(contentsOf: result.captures)
            }

            guard min <= count else { return nil }
            return RuleMatch(range: range.lowerBound..<remaining.lowerBound, captures: captures)
        }
    }

    /// Matches the rule zero or one time.
    ///
    /// Equivalent to `.repeat(rule, min: 0, max: 1)`.
    static func zeroOrOne(_ rule: Rule<T>) -> Rule<T> {
        .repeat(rule, min: 0, max: 1)
    }

    /// Matches the rule zero or more times.
    ///
    /// Equivalent to `.repeat(rule, min: 0, max: .max)`.
    static func zeroOrMore(_ rule: Rule<T>) -> Rule<T> {
        .repeat(rule, min: 0, max: .max)
    }

    /// Matches the rule one or more times.
    ///
    /// Equivalent to `.repeat(rule, min: 1, max: .max)`.
    static func oneOrMore(_ rule: Rule<T>) -> Rule<T> {
        .repeat(rule, min: 1, max: .max)
    }

    /// Makes the rule optional (zero or one match).
    ///
    /// Instance property for readability in chains: `sign.optional`, `whitespace.optional`.
    var optional: Rule<T> {
        return .zeroOrOne(self)
    }
}


// MARK: - Lookahead

extension Rule {

    /// Succeeds if the rule *would* match at the current position, but consumes nothing.
    ///
    /// Use this to assert what comes next without advancing the position.
    /// For example, `.expecting(keyword)` ensures a keyword follows,
    /// but leaves it for the next rule to consume.
    static func expecting(_ rule: Rule<T>) -> Rule<T> {
        Rule<T> { sequence, range in
            guard rule.evaluate(sequence, in: range) != nil else { return nil }
            return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
        }
    }

    /// Succeeds if the rule would *not* match at the current position. Consumes nothing.
    ///
    /// The complement of `expecting`. Most commonly used inside repeat loops:
    /// `.zeroOrMore(.sequence(.notExpecting(stop), .any()))` — consume everything
    /// until `stop` would match.
    static func notExpecting(_ rule: Rule<T>) -> Rule<T> {
        Rule<T> { sequence, range in
            guard !range.isEmpty else { return nil }
            if rule.evaluate(sequence, in: range) != nil { return nil }
            return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
        }
    }
}
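
The `notExpecting` + `any` idiom mentioned above has simple loop semantics. As a standalone illustration (plain Swift, not part of the pitched API; `consumeUntil` is a hypothetical name), this is what `.zeroOrMore(.sequence(.notExpecting(stop), .any()))` computes:

```swift
// Standalone sketch (hypothetical helper, not part of the pitch):
// consume one character at a time as long as the remaining input does
// not begin with `stop`. This is the loop form of
// `.zeroOrMore(.sequence(.notExpecting(stop), .any()))`.
func consumeUntil(_ stop: String, in input: String) -> Substring {
    var index = input.startIndex
    while index < input.endIndex,                 // any() needs input left
          !input[index...].hasPrefix(stop) {      // notExpecting(stop)
        index = input.index(after: index)         // any() consumes one element
    }
    return input[..<index]                        // `stop` itself is not consumed
}

print(consumeUntil("*/", in: "comment body */ tail"))  // "comment body "
print(consumeUntil("*/", in: "no terminator at all"))  // whole input
```

The `repeat(endBefore:)` convenience below packages exactly this shape; `repeat(endAfter:)` additionally consumes the terminator.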


// MARK: - Convenience: Repeat with Termination

extension Rule {

    /// Repeats `body` until `end` would match, then stops. Does NOT consume `end`.
    ///
    /// Equivalent to `.zeroOrMore(.sequence(.notExpecting(end), body))`.
    /// The optional `as` parameter captures the repeated body under a name.
    static func `repeat`(_ body: Rule<T>, as bodyName: String? = nil, endBefore end: Rule<T>) -> Rule<T> {
        zeroOrMore(.sequence(.notExpecting(end), body))
            .captureIf(bodyName)
    }

    /// Repeats `body` until `end` matches, then consumes `end` as well.
    ///
    /// Equivalent to `.sequence(.zeroOrMore(.sequence(.notExpecting(end), body)), end)`.
    /// The optional `as` parameter captures the repeated body under a name.
    /// To also capture `end`, apply `.capture(as:)` to the end rule before passing it.
    static func `repeat`(_ body: Rule<T>, as bodyName: String? = nil, endAfter end: Rule<T>) -> Rule<T> {
        .sequence(
            zeroOrMore(.sequence(.notExpecting(end), body))
                .captureIf(bodyName),
            end
        )
    }
}


// MARK: - Indirection (Rule Library)

/// A named collection of rules that enables mutual recursion and late binding.
///
/// Rules can reference other rules in the library by key. The lookup happens at
/// evaluation time, so rules can be registered after the reference is created.
/// This enables mutually recursive grammars (e.g. expressions containing sub-expressions).
///
/// The library tracks recursion depth during evaluation and triggers a
/// `preconditionFailure` if it exceeds `maxDepth`, catching infinite recursion early.
class RuleLibrary<I: Hashable, T: Collection> {

    /// Maximum allowed recursion depth (default: 1000).
    let maxDepth: Int

    /// Current recursion depth during evaluation.
    ///
    /// Incremented on entry, decremented on exit via `defer`.
    /// Readable by rules that want to bail out gracefully before hitting the hard limit.
    private(set) var depth: Int = 0

    private var dictionary: [I: Rule<T>] = [:]

    /// Creates a rule library with a configurable recursion depth limit.
    init(maxDepth: Int = 1_000) {
        self.maxDepth = maxDepth
    }

    subscript(identifier: I) -> Rule<T>? {
        get { dictionary[identifier] }
        set { dictionary[identifier] = newValue }
    }

    /// Evaluates a named rule from the library with recursion depth tracking.
    ///
    /// Returns the match result, or `nil` if the rule does not exist or does not match.
    func evaluate(_ identifier: I, sequence: T, in range: Range<T.Index>) -> RuleMatch<T>? {
        precondition(depth < maxDepth, "Rule recursion depth exceeded \(maxDepth). Possible infinite recursion.")

        depth += 1
        defer { depth -= 1 }

        return dictionary[identifier]?.evaluate(sequence, in: range)
    }
}

extension Rule {

    /// Creates a rule that delegates to a named rule in a library.
    ///
    /// The library is captured weakly to avoid retain cycles. The actual rule lookup
    /// happens at evaluation time, so rules can be registered after this reference
    /// is created. This enables mutually recursive grammars.
    static func rule<I: Hashable>(_ identifier: I, in library: RuleLibrary<I, T>) -> Rule<T> {
        weak var weakLibrary: RuleLibrary<I, T>? = library

        return Rule<T> { sequence, range in
            weakLibrary?.evaluate(identifier, sequence: sequence, in: range)
        }
    }
}
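
For readers unfamiliar with the weak-capture technique used by `rule(_:in:)`, here is a minimal standalone sketch of the same late-binding idea, using a plain dictionary of closures instead of the pitched `RuleLibrary` (all names here are illustrative, not part of the pitch):

```swift
// A registry of named predicates. Like RuleLibrary, lookup is by key.
final class Registry {
    private var entries: [String: (String) -> Bool] = [:]
    subscript(name: String) -> ((String) -> Bool)? {
        get { entries[name] }
        set { entries[name] = newValue }
    }
}

// Returns a closure that captures the registry weakly and resolves the
// entry at call time. Entries may be registered after this reference is
// created (late binding), and a deallocated registry yields nil instead
// of forming a retain cycle.
func reference(_ name: String, in registry: Registry) -> (String) -> Bool? {
    weak var weakRegistry = registry
    return { input in weakRegistry?[name]?(input) }
}

var registry: Registry? = Registry()
let isDigit = reference("digit", in: registry!)

// Late binding: the entry is registered after the reference was made.
registry!["digit"] = { s in !s.isEmpty && s.allSatisfy { $0.isNumber } }
print(isDigit("42") as Any)   // Optional(true)

registry = nil                // registry deallocated
print(isDigit("42") as Any)   // nil — the weak reference simply goes empty
```

The real `Rule.rule(_:in:)` works the same way, with `RuleLibrary.evaluate` additionally tracking recursion depth.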


// MARK: - String-Specific Matchers

extension Rule where T == String {

    /// Matches a single character whose Unicode scalar belongs to the given `CharacterSet`.
    ///
    /// Only characters consisting of exactly one Unicode scalar are considered.
    /// Multi-scalar characters (e.g. flag emoji) will not match.
    static func unicode(_ characterSet: CharacterSet) -> Rule<T> {
        Rule<T> { sequence, range in
            guard !range.isEmpty else { return nil }

            let character = sequence[range.lowerBound]

            guard character.unicodeScalars.count == 1,
                  let scalar = character.unicodeScalars.first,
                  characterSet.contains(scalar) else { return nil }

            return RuleMatch(range: range.lowerBound..<sequence.index(after: range.lowerBound), captures: [])
        }
    }

    /// Matches a single character matching one of the given Unicode scalars.
    ///
    /// Convenience for matching specific characters: `.unicode("a", "b", "c")`.
    static func unicode(_ scalars: UnicodeScalar...) -> Rule<T> {
        var set = CharacterSet()
        for scalar in scalars {
            set.insert(charactersIn: String(scalar))
        }
        return unicode(set)
    }

    /// Matches a single character whose Unicode scalar falls within one of the given ranges.
    ///
    /// Convenience for matching character ranges: `.unicode("a"..."z", "A"..."Z")`.
    static func unicode(_ ranges: ClosedRange<UnicodeScalar>...) -> Rule<T> {
        var set = CharacterSet()
        for range in ranges {
            set.insert(charactersIn: range.lowerBound...range.upperBound)
        }
        return unicode(set)
    }

    /// Matches an exact string at the current position.
    ///
    /// Compares character by character. Use `caseInsensitive: true` to ignore case.
    static func word(_ word: String, caseInsensitive: Bool = false) -> Rule<T> {
        precondition(!word.isEmpty, "Word must not be empty.")

        return Rule<T> { sequence, range in
            var index = range.lowerBound

            for character in word {
                guard index < range.upperBound else { return nil }

                let match = caseInsensitive
                    ? sequence[index].lowercased() == character.lowercased()
                    : sequence[index] == character

                guard match else { return nil }
                index = sequence.index(after: index)
            }

            return RuleMatch(range: range.lowerBound..<index, captures: [])
        }
    }
}
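
The single-scalar restriction in `unicode(_:)` is easy to verify directly. This standalone snippet (plain Swift plus Foundation, independent of the pitched API) shows which characters qualify:

```swift
import Foundation

// A Character (grapheme cluster) may consist of several Unicode scalars.
// `.unicode(_:)` only considers single-scalar characters, because a
// CharacterSet membership test applies to exactly one scalar.
let simple: Character = "a"            // one scalar
let flag: Character = "🇩🇪"             // two regional-indicator scalars
let combining: Character = "e\u{301}"  // 'e' + combining acute accent

print(simple.unicodeScalars.count)     // 1: eligible to match
print(flag.unicodeScalars.count)       // 2: never matches
print(combining.unicodeScalars.count)  // 2: never matches

// The precomposed form of "é" is a single scalar and would match:
let precomposed: Character = "\u{E9}"
print(precomposed.unicodeScalars.count)                                  // 1
print(CharacterSet.letters.contains(precomposed.unicodeScalars.first!))  // true
```

So a rule built on `.unicode(.letters)` matches the precomposed `"é"` but not the combining-sequence form, even though both render identically.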

Code (Full Implementation) - Part 2

Here are six practical examples showing the library in use, in order of increasing complexity.

Each example introduces one or two new primitives and shows a realistic use case. The progression is intentional: findingNumbers uses only the most basic matchers; findingKeyValuePairs combines almost everything. Reading them in order gives a complete tour of the API without requiring the formal documentation.

The examples are written as standalone functions so they can be called directly from a playground or a @main entry point.

  • findingNumbers: unicode, oneOrMore, sequence, optional

  • findingDates: repeat(min:max:), capture, capture tree navigation

  • findingQuotedStrings: repeat(endAfter:), find() on captures

  • findingIdentifiersAndKeywords: firstMatch, notExpecting for word boundaries

  • findingMultiLineComments: notExpecting + any inside repeat loops

  • findingKeyValuePairs: notExpecting with endOfSequence, multiple captures combined

Examples

//
//  Examples.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 22.03.26.
//
//  Practical examples showing how to use Rule for common pattern matching tasks.
//  Each example builds on the previous one, introducing new primitives step by step.
//

import Foundation


// MARK: - Finding Numbers

/// The simplest useful rule: matching integers and decimal numbers in a string.
///
/// This example introduces `unicode`, `oneOrMore`, `sequence`, and `optional` —
/// the basic building blocks you will use in almost every rule.

func findingNumbers() {
    let digit = Rule<String>.unicode(.decimalDigits)

    // An integer: one or more digits.
    let integer = Rule<String>.oneOrMore(digit)

    // A decimal number: digits, a dot, then more digits.
    // The fractional part is optional, so "42" and "42.5" both match.
    let decimal = Rule<String>.sequence(
        integer,
        Rule<String>.sequence(.word("."), integer).optional
    )

    let input = "42.5"
    if let match = decimal.evaluate(input) {
        let matched = input[match.range]
        print("Number: \(matched)")
        // Number: 42.5
    }
}


// MARK: - Finding Dates

/// Matching dates in DD.MM.YYYY format — a slightly more complex pattern
/// that shows how `sequence` and `repeat` combine for fixed-length fields.
///
/// This also introduces `capture` to extract the individual components
/// (day, month, year) from a match, so you can work with them separately.

func findingDates() {
    let digit = Rule<String>.unicode(.decimalDigits)
    let dot   = Rule<String>.word(".")

    // Exactly two digits for day and month, four for year.
    let day   = Rule<String>.repeat(digit, min: 2, max: 2).capture(as: "Day")
    let month = Rule<String>.repeat(digit, min: 2, max: 2).capture(as: "Month")
    let year  = Rule<String>.repeat(digit, min: 4, max: 4).capture(as: "Year")

    let date = Rule<String>.sequence(day, dot, month, dot, year).capture(as: "Date")

    let input = "22.03.2026"
    if let match = date.evaluate(input) {
        print(match.debugDescription)
        // RuleMatch [0..<10]
        // ╰─ Date [0..<10]
        //    ├─ Day [0..<2]
        //    ├─ Month [3..<5]
        //    ╰─ Year [6..<10]

        // Extract the individual parts using the capture tree:
        if let dateCapture = match.captures.first {
            for part in dateCapture.innerCaptures {
                print("\(part.name): \(input[part.range])")
            }
            // Day: 22
            // Month: 03
            // Year: 2026
        }
    }
}


// MARK: - Finding Quoted Strings

/// Matching quoted strings like `"hello world"` — including everything between
/// the opening and closing quote.
///
/// This introduces `repeat(endAfter:)` which repeats a body rule until
/// an end pattern matches, then consumes the end pattern as well.
/// It also shows `repeat(endBefore:)` which stops before the end pattern
/// without consuming it.

func findingQuotedStrings() {
    let quote = Rule<String>.word("\"")

    // Match everything from opening quote through closing quote.
    // `endAfter` consumes the closing quote as part of the match.
    let quotedString = Rule<String>.sequence(
        quote,
        Rule<String>.repeat(.any(), as: "Content", endAfter: quote)
    ).capture(as: "QuotedString")

    let input = "\"hello world\""
    if let match = quotedString.evaluate(input) {
        print(match.debugDescription)
        // RuleMatch [0..<13]
        // ╰─ QuotedString [0..<13]
        //    ╰─ Content [1..<12]

        // Extract just the content without quotes:
        if let content = match.captures.first?.find("Content") {
            print("Content: \(input[content.range])")
            // Content: hello world
        }
    }
}


// MARK: - Finding Identifiers and Keywords

/// Matching programming-language-style identifiers (variable names) and
/// distinguishing them from keywords.
///
/// This introduces `firstMatch` for priority-based selection: keywords
/// are checked first, so `if` is recognized as a keyword, not an identifier.
/// It also shows `notExpecting` to ensure a keyword is not just a prefix
/// of a longer identifier (e.g. `iffy` should be an identifier, not keyword `if`).

func findingIdentifiersAndKeywords() {
    let letter = Rule<String>.unicode(.letters)
    let digit  = Rule<String>.unicode(.decimalDigits)
    let letterOrDigit = Rule<String>.firstMatch(letter, digit)

    // An identifier starts with a letter, followed by letters or digits.
    let identifier = Rule<String>.sequence(
        letter,
        .zeroOrMore(letterOrDigit)
    )

    // A keyword is a specific word that is NOT followed by more letters/digits.
    // Without `notExpecting`, the word "iffy" would match keyword "if".
    func keyword(_ word: String) -> Rule<String> {
        .sequence(
            .word(word),
            .notExpecting(letterOrDigit)
        ).capture(as: "Keyword")
    }

    // Keywords have priority over identifiers — check them first.
    let token = Rule<String>.firstMatch(
        keyword("if"),
        keyword("else"),
        keyword("return"),
        identifier.capture(as: "Identifier")
    )

    for input in ["if", "iffy", "return", "x1"] {
        if let match = token.evaluate(input) {
            let name = match.captures.first?.name ?? "?"
            print("\(input) → \(name)")
        }
    }
    // if → Keyword
    // iffy → Identifier
    // return → Keyword
    // x1 → Identifier
}


// MARK: - Finding Multi-Line Comments

/// Matching C-style block comments: `/* ... */` — where the content can
/// span multiple lines and contain anything except the closing `*/`.
///
/// This shows `notExpecting` combined with `any` inside a repeat loop:
/// "consume any character, as long as it is not the start of `*/`."
/// This is a fundamental pattern for matching delimited content.

func findingMultiLineComments() {
    let open  = Rule<String>.word("/*")
    let close = Rule<String>.word("*/")

    // Repeat any character as long as `*/` does not follow, then consume `*/`.
    let comment = Rule<String>.sequence(
        open.capture(as: "Open"),
        Rule<String>.repeat(.any(), as: "Body", endAfter: close.capture(as: "Close"))
    ).capture(as: "Comment")

    let input = "/* This is\na comment */"
    if let match = comment.evaluate(input) {
        print(match.debugDescription)
        // RuleMatch [0..<23]
        // ╰─ Comment [0..<23]
        //    ├─ Open [0..<2]
        //    ├─ Body [2..<21]
        //    ╰─ Close [21..<23]

        if let body = match.captures.first?.find("Body") {
            print("Body: \(input[body.range])")
            // Body:  This is
            // a comment
        }
    }
}


// MARK: - Finding Key-Value Pairs

/// Matching simple `key = value` pairs separated by newlines —
/// like a configuration file or HTTP headers.
///
/// This combines several techniques: `notExpecting` with `endOfSequence()`
/// to collect value characters up to a newline or the end of input, and
/// `zeroOrMore` to collect multiple newline-separated pairs.

func findingKeyValuePairs() {
    let gap     = Rule<String>.zeroOrMore(.unicode(.init(charactersIn: " \t")))
    let newline = Rule<String>.word("\n")
    let equals  = Rule<String>.word("=")

    // A key is one or more letters.
    let key = Rule<String>.oneOrMore(.unicode(.letters)).capture(as: "Key")

    // A value is everything up to a newline or end of input.
    let value = Rule<String>.oneOrMore(
        .sequence(.notExpecting(newline), .notExpecting(.endOfSequence()), .any())
    ).capture(as: "Value")

    // A single pair: key = value
    let pair = Rule<String>.sequence(
        key, gap, equals, gap, value
    ).capture(as: "Pair")

    // Multiple pairs separated by newlines.
    let config = Rule<String>.sequence(
        pair,
        .zeroOrMore(.sequence(newline, pair))
    )

    let input = "host = localhost\nport = 8080\nmode = debug"
    if let match = config.evaluate(input) {
        print(match.debugDescription)
        // RuleMatch [0..<41]
        // ├─ Pair [0..<16]
        // │  ├─ Key [0..<4]
        // │  ╰─ Value [7..<16]
        // ├─ Pair [17..<28]
        // │  ├─ Key [17..<21]
        // │  ╰─ Value [24..<28]
        // ╰─ Pair [29..<41]
        //    ├─ Key [29..<33]
        //    ╰─ Value [36..<41]

        // Read the parsed key-value pairs:
        for pairCapture in match.captures {
            if let k = pairCapture.find("Key"),
               let v = pairCapture.find("Value") {
                print("\(input[k.range]) → \(input[v.range])")
            }
        }
        // host → localhost
        // port → 8080
        // mode → debug
    }
}

The final example is a complete math expression parser: integers, decimals, hexadecimals (#FF), the four arithmetic operators with correct precedence, unary negation, parentheses, and built-in functions (sin, cos, sqrt, abs).

It is included here for two reasons. First, it demonstrates RuleLibrary in a real recursive grammar — expression contains term, term contains factor, and factor can contain a parenthesized expression. Second, it shows the full three-phase architecture from the pitch in practice: the grammar rules match and produce a capture tree, and a separate evaluateMath function walks the tree and computes the result. The two phases share nothing except the capture names.

The parser also includes error recovery: unknown characters are captured as "Unknown Expression" rather than causing a failure, and a diagnostic printer marks their positions in the source. This pattern — firstMatch(expectedRule, .any().capture(as: "Error")) — generalizes to any situation where a parser should report problems without stopping.

MathExpression

//
//  MathExpression.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 13.03.26.
//
//  A complete math expression parser built with Rule.
//
//  This example demonstrates how Rule scales from simple patterns to a full
//  recursive-descent parser with operator precedence, functions, error recovery,
//  and capture tree evaluation — all built from the same composable primitives.
//
//  Grammar:
//
//    root        =  gap? (expression | gap | unknown)+
//    expression  =  term (('+' | '-') term)*
//    term        =  factor (('*' | '/') factor)*
//    factor      =  '-'? (number | function | parenthesized)
//    function    =  functionKey '(' expression ')'
//    parenthesized = '(' expression ')'
//    number      =  hexadecimal | decimal | integer
//    hexadecimal =  '#' hexDigits
//    decimal     =  digits '.' digits | '.' digits | digits '.'
//    integer     =  digits
//    unknown     =  any single character (error recovery)
//

import Foundation


// MARK: - Diagnostics

/// A diagnostic marker at a specific position in the input.
private struct Diagnostic {
    let name: String
    let startOffset: Int
    let length: Int
}

/// Collects all "Unknown Expression" diagnostics from a capture tree.
private func collectDiagnostics(_ capture: RuleCapture<String>, in source: String) -> [Diagnostic] {
    var results: [Diagnostic] = []

    if capture.name == "Unknown Expression" {
        let startOffset = source.distance(from: source.startIndex, to: capture.range.lowerBound)
        let length = source.distance(from: capture.range.lowerBound, to: capture.range.upperBound)
        results.append(Diagnostic(name: capture.name, startOffset: startOffset, length: length))
    }

    for inner in capture.innerCaptures {
        results.append(contentsOf: collectDiagnostics(inner, in: source))
    }

    return results
}

/// Prints the input with a visual marker line below any unknown tokens.
///
///     ┌──────────────────────────────────────┐
///     │ Diagnostic: Unknown expression       │
///     │                                      │
///     │ 123+456+yxc+(2*5*zu)                 │
///     │         ^^^      ^^                  │
///     └──────────────────────────────────────┘
private func printDiagnostics(for input: String, capture: RuleCapture<String>) {
    let diagnostics = collectDiagnostics(capture, in: input)

    let label = diagnostics.isEmpty
        ? "Diagnostic: No issues"
        : "Diagnostic: Unknown expression"

    var markerChars = Array(repeating: Character(" "), count: input.count)
    for diag in diagnostics {
        for offset in diag.startOffset..<(diag.startOffset + diag.length) {
            if offset < markerChars.count { markerChars[offset] = "^" }
        }
    }
    let markerLine = String(markerChars)

    let contentWidth = max(input.count, label.count)
    let border = String(repeating: "─", count: contentWidth + 2)
    let pad = { (s: String) in s + String(repeating: " ", count: contentWidth - s.count) }

    print("┌" + border + "┐")
    print("│ " + pad(label) + " │")
    print("│ " + pad("")    + " │")
    print("│ " + pad(input) + " │")
    if !diagnostics.isEmpty { print("│ " + pad(markerLine) + " │") }
    print("└" + border + "┘")
}


// MARK: - Evaluation

/// Walks the capture tree and computes a numeric result.
///
/// Every decision is based on capture names — the source text is only
/// read for number literals. Operators and functions are identified
/// purely by their capture names.
private func evaluateMath(_ capture: RuleCapture<String>, in source: String) -> Double {

    switch capture.name {

    case "Root":
        if capture.innerCaptures.count == 1,
           capture.innerCaptures[0].name == "Expression" {
            return evaluateMath(capture.innerCaptures[0], in: source)
        }
        return .nan

    case "Expression":
        var result = evaluateMath(capture.innerCaptures[0], in: source)
        var i = 1
        while i + 1 < capture.innerCaptures.count {
            let op  = capture.innerCaptures[i].name
            let rhs = evaluateMath(capture.innerCaptures[i + 1], in: source)
            if op == "Plus"  { result += rhs }
            else if op == "Minus" { result -= rhs }
            i += 2
        }
        return result

    case "Term":
        var result = evaluateMath(capture.innerCaptures[0], in: source)
        var i = 1
        while i + 1 < capture.innerCaptures.count {
            let op  = capture.innerCaptures[i].name
            let rhs = evaluateMath(capture.innerCaptures[i + 1], in: source)
            if op == "Multiply"  { result *= rhs }
            else if op == "Divide" { result /= rhs }
            i += 2
        }
        return result

    case "Factor":
        var sign  = 1.0
        var value = 0.0
        for inner in capture.innerCaptures {
            if inner.name == "Negate" { sign = -1.0 }
            else { value = evaluateMath(inner, in: source) }
        }
        return sign * value

    case "Integer":
        return Double(String(source[capture.range]))!

    case "Decimal":
        return Double(String(source[capture.range]))!

    case "Hexadecimal":
        let hex = String(source[capture.range]).dropFirst() // skip '#'
        return Double(UInt64(hex, radix: 16) ?? 0)

    case "ParenthesizedExpression":
        for inner in capture.innerCaptures {
            if inner.name == "Expression" { return evaluateMath(inner, in: source) }
        }
        preconditionFailure("Parenthesized expression without inner Expression")

    case "Function":
        var funcName = ""
        var argument = 0.0
        for inner in capture.innerCaptures {
            switch inner.name {
            case "sin", "cos", "sqrt", "abs": funcName = inner.name
            case "Expression": argument = evaluateMath(inner, in: source)
            default: break
            }
        }
        switch funcName {
        case "sin":  return sin(argument)
        case "cos":  return cos(argument)
        case "sqrt": return sqrt(argument)
        case "abs":  return abs(argument)
        default: preconditionFailure("Unknown function")
        }

    case "Unknown Expression":
        return .nan

    default:
        preconditionFailure("Unknown capture name: \(capture.name)")
    }
}


// MARK: - Demo

/// Runs the math expression parser demo.
///
/// Parses, evaluates, and prints a series of math expressions covering
/// basic arithmetic, operator precedence, parentheses, negation, decimals,
/// hexadecimals, functions, error recovery, and capture tree output.
func mathExpressionDemo() {
    typealias R = Rule<String>

    // --- Character classes ---

    let unicode16Digit      = R.unicode(.decimalDigits)
    let unicode16Hex        = R.unicode(CharacterSet(charactersIn: "0123456789abcdefABCDEF"))
    let unicode16Letter     = R.unicode(.letters)
    let unicode16Whitespace = R.unicode(.whitespaces)

    // --- Character sequences ---

    let digits       = R.oneOrMore(unicode16Digit)
    let hexadecimals = R.oneOrMore(unicode16Hex)
    let gap          = R.oneOrMore(unicode16Whitespace)

    // --- Punctuation ---

    let decimalSeparator = R.unicode(".")
    let parenthesisOpen  = R.unicode("(")
    let parenthesisClose = R.unicode(")")

    // --- Operators ---
    // Each operator has its own capture name so the evaluation function can
    // identify it without reading the source text.
    // "Negate" is separate from "Minus": Minus is binary (a - b), Negate is unary (-a).

    let plus     = R.unicode("+").capture(as: "Plus")
    let minus    = R.unicode("-").capture(as: "Minus")
    let negate   = R.unicode("-").capture(as: "Negate")
    let multiply = R.unicode("*").capture(as: "Multiply")
    let divide   = R.unicode("/").capture(as: "Divide")

    let addOrSubtract    = R.firstMatch(plus, minus)
    let multiplyOrDivide = R.firstMatch(multiply, divide)

    // --- Keywords ---
    // Case insensitive: sin, SIN, and Sin all match.
    // longestMatch ensures "sinh" would win over "sin" if added later.

    let functionKey = R.longestMatch(
        R.word("sin",  caseInsensitive: true).capture(as: "sin"),
        R.word("cos",  caseInsensitive: true).capture(as: "cos"),
        R.word("sqrt", caseInsensitive: true).capture(as: "sqrt"),
        R.word("abs",  caseInsensitive: true).capture(as: "abs")
    )

    // --- Numbers ---
    // longestMatch ensures "3.14" matches as Decimal, not Integer "3".

    let integer = digits.capture(as: "Integer")

    let decimal = R.longestMatch(
        R.sequence(digits, decimalSeparator, digits),
        R.sequence(decimalSeparator, digits),
        R.sequence(digits, decimalSeparator)
    ).capture(as: "Decimal")

    let hexadecimal = R.sequence(R.unicode("#"), hexadecimals).capture(as: "Hexadecimal")
    let number      = R.longestMatch(hexadecimal, decimal, integer)

    // --- Error recovery ---

    let unknownExpression = R.any().capture(as: "Unknown Expression")

    // --- Grammar ---
    // The RuleLibrary enables recursion: an expression can contain a
    // parenthesized expression, which again contains an expression.

    enum GrammarKey { case expression, term, factor }
    let grammar = RuleLibrary<GrammarKey, String>()

    let expression = R.rule(.expression, in: grammar)
    let term       = R.rule(.term,       in: grammar)
    let factor     = R.rule(.factor,     in: grammar)

    let parenthesizedExpression = R.sequence(
        parenthesisOpen, gap.optional, expression, gap.optional, parenthesisClose
    ).capture(as: "ParenthesizedExpression")

    let function = R.sequence(
        functionKey, parenthesisOpen, gap.optional, expression, gap.optional, parenthesisClose
    ).capture(as: "Function")

    // Factor: optional negation + number, function, or parenthesized expression.
    grammar[.factor] = R.sequence(
        negate.optional, .firstMatch(number, function, parenthesizedExpression)
    ).capture(as: "Factor")

    // Term: factors joined by * or /. Left-to-right: a * b / c = (a * b) / c.
    grammar[.term] = R.sequence(
        factor, .zeroOrMore(.sequence(gap.optional, multiplyOrDivide, gap.optional, factor))
    ).capture(as: "Term")

    // Expression: terms joined by + or -. Left-to-right: a + b - c = (a + b) - c.
    grammar[.expression] = R.sequence(
        term, .zeroOrMore(R.sequence(gap.optional, addOrSubtract, gap.optional, term))
    ).capture(as: "Expression")

    // --- Parse ---
    // The outer loop tries expression first, then skips whitespace, then
    // captures one unknown character as fallback. This ensures the parser
    // never fails — unknown input is visible in the capture tree.

    let fullMatch = R.sequence(
        gap.optional,
        .oneOrMore(.firstMatch(expression, gap, unknownExpression)),
        .endOfSequence()
    ).capture(as: "Root")

    func parse(_ input: String) -> RuleCapture<String>? {
        fullMatch.evaluate(input)?.captures.first
    }

    // --- Test helper ---

    func test(_ input: String, expected: Double) {
        guard let root = parse(input) else {
            print("[FAIL] '\(input)' — parse failed")
            return
        }

        let result = evaluateMath(root, in: input)

        let passed = expected.isNaN ? result.isNaN : Swift.abs(result - expected) < 0.0001

        if passed { print("[PASS] '\(input)' = \(result)") }
        else      { print("[FAIL] '\(input)' = \(result)  (expected \(expected))") }

        if !collectDiagnostics(root, in: input).isEmpty {
            printDiagnostics(for: input, capture: root)
        }
    }

    // --- Tests ---

    print("--- Basic arithmetic ---\n")
    test("1 + 2",               expected: 3)
    test("10 - 3",              expected: 7)
    test("2 * 3",               expected: 6)
    test("10 / 4",              expected: 2.5)

    print("\n--- Operator precedence ---\n")
    test("2 + 3 * 4",           expected: 14)
    test("10 - 2 * 3",          expected: 4)
    test("2 * 3 + 4 * 5",       expected: 26)

    print("\n--- Parentheses ---\n")
    test("(2 + 3) * 4",                    expected: 20)
    test("((1 + 2))",                      expected: 3)
    test("(2 + 3) * (4 - 1)",              expected: 15)
    test("(3 + ((2 + 6) / 4)) * (6 - 1)",  expected: 25)

    print("\n--- Negation ---\n")
    test("-5",                  expected: -5)
    test("-5 + 10",             expected: 5)
    test("-(2 + 3)",            expected: -5)
    test("-2 * -3",             expected: 6)

    print("\n--- Integers ---\n")
    test("42",                  expected: 42)
    test("100 + 200",           expected: 300)

    print("\n--- Decimals ---\n")
    test("3.14",                expected: 3.14)
    test("0.5 + 0.5",           expected: 1.0)
    test("2.5 * 4",             expected: 10.0)
    test(".5 + .5",             expected: 1.0)
    test("3. + .14",            expected: 3.14)

    print("\n--- Hexadecimal ---\n")
    test("#FF",                 expected: 255)
    test("#10 + #10",           expected: 32)
    test("#A * 2",              expected: 20)

    print("\n--- Functions ---\n")
    test("sqrt(9)",             expected: 3.0)
    test("abs(0 - 5)",          expected: 5.0)
    test("sqrt(3 * 3 + 4 * 4)", expected: 5.0)
    test("cos(0)",              expected: 1.0)
    test("SIN(0)",              expected: 0.0)
    test("Sqrt(16)",            expected: 4.0)

    print("\n--- Unknown tokens ---\n")
    test("123+456+yxc+(2*5*zu)", expected: .nan)

    print("\n--- Capture tree ---\n")
    let example = "2 + 3 * ( 4 + 1 )"
    if let root = parse(example) {
        print("'\(example)':\n")
        print(root.debugDescription)

        // Root [0..<17]
        // ╰─ Expression [0..<17]
        //    ├─ Term [0..<1]
        //    │  ╰─ Factor [0..<1]
        //    │     ╰─ Integer [0..<1]
        //    ├─ Plus [2..<3]
        //    ╰─ Term [4..<17]
        //       ├─ Factor [4..<5]
        //       │  ╰─ Integer [4..<5]
        //       ├─ Multiply [6..<7]
        //       ╰─ Factor [8..<17]
        //          ╰─ ParenthesizedExpression [8..<17]
        //             ╰─ Expression [10..<15]
        //                ├─ Term [10..<11]
        //                │  ╰─ Factor [10..<11]
        //                │     ╰─ Integer [10..<11]
        //                ├─ Plus [12..<13]
        //                ╰─ Term [14..<15]
        //                   ╰─ Factor [14..<15]
        //                      ╰─ Integer [14..<15]

        print("\nResult: \(evaluateMath(root, in: example))")
        // Result: 17.0
    }
}

3 Likes

Hello, @MaximilianServais

Thank you for sharing this.

I really like the examples, which all compile, and work as intended. :slight_smile:

1 Like

It is a very neat idea! I remember Dave Abrahams experimenting with it back in the early days of Swift. Where it fell short was that we never managed to get the performance up to snuff. Some things I remember, vs a dedicated engine:

  • When the data is in a contiguous span, you don't need to do function calls to move around in it
  • When the data is made of fixed-size units, you can fit them in registers instead of having to use indirection (sometimes even getting SIMD out of it)
  • When the patterns are all based on integer or Unicode rules, the engine can optimize them
  • More things I'm forgetting (this was probably about 10 years ago!)

So as a library that exists in the ecosystem, I think this is super cool. As a pitch for the standard library, I'd like to see how well it performs first, rather than assume we'll be able to optimize it later. Because last time it was tried, we weren't.

8 Likes

It appears the original pitch, as well as the follow-up posts, are largely (entirely?) AI-generated. I'd like to ask the author to disclose exactly how much AI was used in writing them, and how much was used in the implementation behind the pitch.

10 Likes

Why?

1 Like

Why not? We all have the right to know where code comes from, whether from AI, a real person, or a mix of both. AI use, especially to this degree, raises real quality and long-term commitment issues that affect whether such a proposal should even be considered for use or inclusion. Is it worth discussing if all we get are more bot responses?

10 Likes

Quality and commitment questions are orthogonal to the source of the code. A human has to be involved in any event, and that human may be a subpar developer or may get bored and move on. The code is available. The design is laid out. The pitch can be judged on its merits, no matter its source.

1 Like

Nowadays, pitching a feature implies you're going to see it through review and into the language, so knowing the author can actually do that is not orthogonal at all. Drive-by proposals never go anywhere, for good reason; the same concerns apply here.

But that wasn't my original problem. We all deserve to know whether we're talking to a real person who is genuinely taking feedback and updating code. This is a community, and a collaborative process. So I ask for the courtesy of disclosing when AI tools are used in our communication, and to what degree, so I can judge for myself the quality and reliability of the content provided. If I'm going to spend my time participating in this forum, I'd like to know I'm not arguing with a machine.

24 Likes

I don’t see how the use of AI, which is just a tool, changes anything — until and unless AI agents start posting to the forum on their own behalf.

Considering the posts appear to be entirely AI-written, it’s quite possible this is an agent of some kind.

6 Likes

Hi Jon,

Thank you for taking the time to read my post! I can see your point about AI-generated content, so let me clarify the origin of this idea and its history, which may be of interest.

I started developing this idea three years ago as a replacement for RegEx. I needed a RegEx that could handle nested structures and work with any type of string representation. Originally, this code was intended for a calculator app (which is why I needed the nesting behaviour). Since then, however, I have used it to write my own tool to read .gcode content and perform post-processing for my 3D printing. I have also tested the code to implement a JSON parser and a Markdown processor.

The more I use the code, the more convinced I am that it should be a standard part of the Swift library. Like Swift protocols, which make code reusable, generic and combinable, Rule<T:Collection> makes Regex-like work reusable, generic and combinable. This is why I’m pitching this idea.

So far, the project has been manual work. In preparation for presenting this idea in the forum, I created a playground (I’m using Xcode), copied in the rule file (you can see that all the files have recent dates), added the examples, and wrote the outlines for this post, including the design decisions and the reasons for choosing certain names. Then I used AI: I gave Claude everything and told it to unify the code documentation, unify the layout, rewrite my outlines into proper text, and format everything into Markdown. Afterwards, I made some manual changes here and there, but you are right that the text has been rewritten by AI.

Does this answer your question?

3 Likes

P.S. For disclosure, I used Apple's AI to proofread this text as well. I have seen platforms where you can mark content as 'AI generated' or 'generated with AI assistance', etc. Does this exist on this forum?

Hi @ibex,

Thank you for taking the time to test my code. I did test the examples myself before posting them, but mistakes can always happen, so thank you for verifying them!

I spent a while thinking about how best to explain the idea in this forum. I think the easiest way to understand the concept is just to let people experiment with it. I'm glad to hear you enjoyed it! :grinning_face:

Hi Jordan,

Thank you for reading my proposal and of course for sharing this information. I wasn't aware of Dave Abrahams' earlier work, so it's useful to know where he encountered difficulties.

To be honest, I don't know enough about compiler internals to address all of these points. However, I will share my understanding, and please correct me if I'm wrong!

First, I want to clarify the core idea, because I think it's easy to underestimate: a Rule is simply a generic closure (T, Range<T.Index>) -> RuleMatch? where T: Collection. That's the entire contract. There is no engine or intermediate representation. When a Rule is executed, it's simply Swift code using the given collection. This means:

  • The collection, not the framework, determines how fast the code can run. If the collection has slow index access (like String), the rule has slow index access. If the collection has faster index access (like String.UTF16View), the rule has faster index access. If the collection offers direct access to contiguous memory, the rule has access to contiguous memory, and so on. The framework adds no abstraction layer; the default matchers simply call index(after:) on whatever T is.

  • Every rule is replaceable. Predefined combinators such as sequence, oneOrMore, etc. are simply convenience functions that return closures, so developers don't have to reinvent the wheel. If a generic matcher is too slow, you can replace it (e.g. via extension Rule where T == MyContiguousCollection) and access the collection with pointer arithmetic, SIMD, or anything else.
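To make the "no engine" point concrete, here is a minimal, self-contained sketch of that contract. The stripped-down `Rule` and `RuleMatch` below are stand-ins for illustration, not the full implementation:

```swift
// Minimal sketch: a Rule is nothing more than a generic closure.
struct RuleMatch<T: Collection> {
    let range: Range<T.Index>
}

struct Rule<T: Collection> {
    let evaluationBlock: (T, Range<T.Index>) -> RuleMatch<T>?

    func evaluate(_ sequence: T) -> RuleMatch<T>? {
        evaluationBlock(sequence, sequence.startIndex..<sequence.endIndex)
    }
}

// A hand-written matcher: one ASCII digit in a raw byte buffer.
// No intermediate representation — just Swift code reading the collection.
let asciiDigit = Rule<[UInt8]> { bytes, range in
    guard !range.isEmpty, (0x30...0x39).contains(bytes[range.lowerBound]) else { return nil }
    return RuleMatch(range: range.lowerBound..<bytes.index(after: range.lowerBound))
}

let input: [UInt8] = Array("7x".utf8)
print(asciiDigit.evaluate(input) != nil)  // true: '7' is a digit
```

When the closure fails, it returns nil; when it succeeds, the returned range says exactly how much input was consumed — that really is the entire contract.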

There are three basic scenarios I can think of:

The common case: Rule with String
Swift's String has a variable-width character layout, so every index step requires index(after:). I understand that the stdlib Regex engine has privileged access to the internal UTF-8 buffer, so it has a performance advantage over Rule. However, Rule natively handles recursive and nested structures — something classical regex cannot do without external code. That external code has its own cost that should be factored into any overall comparison.

Known character width: String.UTF16View
Here the index is offset-based. The default combinators use index(after:), which is significantly faster for String.UTF16View than for String. If you know that your string contains no emoji or composed characters, you can use String.UTF16View and get a speed boost just by choosing a different view of the same string, using the exact same combinators.
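To illustrate what "same combinators, different view" means, here is a sketch with a stripped-down stand-in for Rule. The same generic element matcher runs unchanged on String (elements are Characters) and on String.UTF16View (elements are UInt16 code units with an offset-based index):

```swift
// Stand-in Rule for illustration: a matcher is a closure over any Collection.
struct Rule<T: Collection> {
    let run: (T, Range<T.Index>) -> Range<T.Index>?

    // One generic combinator, usable with every collection type.
    static func element(where predicate: @escaping (T.Element) -> Bool) -> Rule<T> {
        Rule { sequence, range in
            guard !range.isEmpty, predicate(sequence[range.lowerBound]) else { return nil }
            return range.lowerBound..<sequence.index(after: range.lowerBound)
        }
    }
}

let text = "héllo"

// On String, the element is a Character (grapheme cluster).
let letter = Rule<String>.element(where: { $0.isLetter })
print(letter.run(text, text.startIndex..<text.endIndex) != nil)  // true

// On UTF16View, the element is a UInt16 code unit; the index is offset-based.
let utf16 = text.utf16
let asciiLowercase = Rule<String.UTF16View>.element(where: { (0x61...0x7A).contains($0) })
print(asciiLowercase.run(utf16, utf16.startIndex..<utf16.endIndex) != nil)  // true ('h')
```

The combinator code is identical in both cases; only the choice of collection changes the cost of each index step.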

Custom contiguous collection
This is why I created the code this way — it's where the design really pays off!
Because T is generic, you can define your own collection backed by contiguous memory (e.g. a ContiguousCollection) where the index is a pointer. The default combinators like sequence will still use index(after:), but you can write your own rule if needed — for example:

extension Rule where T == MyContiguousCollection {
    static func sequence(_ rules: [Rule<T>]) -> Rule<T> { … }
}

Your custom rule can now access the collection directly via the memory pointer, with no function calls. Start with the generic rules, profile, and swap out only the hot paths — the rest remains untouched.
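A sketch of what such a specialization could look like, using [UInt8] as a stand-in for a contiguous collection, with a stripped-down Rule as before. The name digitRun is hypothetical, not part of the library:

```swift
// Stand-in Rule for illustration.
struct Rule<T: Collection> {
    let run: (T, Range<T.Index>) -> Range<T.Index>?
}

extension Rule where T == [UInt8] {
    /// Hot-path specialization: matches a run of ASCII digits by scanning
    /// the contiguous buffer directly, with no per-element index(after:) calls.
    static func digitRun() -> Rule<[UInt8]> {
        Rule { bytes, range in
            let end = bytes.withUnsafeBufferPointer { buffer -> Int in
                var i = range.lowerBound
                while i < range.upperBound, (0x30...0x39).contains(buffer[i]) { i += 1 }
                return i
            }
            return end > range.lowerBound ? range.lowerBound..<end : nil
        }
    }
}

let bytes = Array("123abc".utf8)
if let matched = Rule<[UInt8]>.digitRun().run(bytes, 0..<bytes.count) {
    print(matched)  // 0..<3
}
```

The generic combinators keep working alongside this; only the matcher for the profiled hot path is replaced.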

The case for stdlib inclusion
The only thing set in stone is the base rule type itself. The default rules handle common patterns and nested structures 'out of the box', and provide matchers for common collections, such as String and UTF16View. You can add any rule you like, whether it's generic or collection-specific.

Regarding the privileged access that Regex seems to have: since rules are generic, the standard library could simply add extension Rule where T == String with matchers that have the same direct access to String's internal buffer. Generic rules would remain available for every other collection, while String-specific rules would close the performance gap.

I totally agree with you that we shouldn't "assume we'll be able to optimize it later." After all, this is a pitch for the standard library and it should be carefully thought through. Thanks for sharing your thoughts and for reading this answer, which turned out much longer than planned! :grin:

P.S. I'd be happy to sketch out what a ContiguousCollection -based example could look like, if that would be useful. Also, do you know how to properly benchmark something like this without writing a full compiler?

Yes. Unfortunately you still seem to be running replies through Claude, as evidenced by the random emojis and em dashes, so I won't really be participating.

As for the idea itself, it seems powerful, but it would be used even more rarely than regexes, so I don't think it makes a good candidate for inclusion in the language itself. It would be more successful as an open source project, able to develop and iterate more quickly than if it were locked to the language.

What I think could be most useful here is if you benchmarked your code, analyzed it, and found any language improvements that would help improve performance, as a more concrete version of what Jordan suggested. Then you can propose smaller changes to the language that improve the capability and performance of your library.
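For example, a first micro-benchmark doesn't need a full compiler: timing raw index traversal over the different String views already isolates the index-stepping cost discussed above. A hypothetical sketch (the measure helper is my own, not from the pitch):

```swift
import Foundation

// Hypothetical micro-benchmark: how expensive is index(after:) on each view?
// This measures raw traversal only, not the Rule combinators themselves.
func measure(_ label: String, _ body: () -> Int) {
    let start = Date()
    let count = body()
    let ms = Date().timeIntervalSince(start) * 1000
    print("\(label): \(count) elements in \(String(format: "%.1f", ms)) ms")
}

let text = String(repeating: "Hello, world! ", count: 100_000)

measure("String (Character)") {
    var count = 0
    var i = text.startIndex
    while i < text.endIndex { count += 1; i = text.index(after: i) }
    return count
}

measure("String.UTF16View") {
    let view = text.utf16
    var count = 0
    var i = view.startIndex
    while i < view.endIndex { count += 1; i = view.index(after: i) }
    return count
}

measure("String.UTF8View") {
    let view = text.utf8
    var count = 0
    var i = view.startIndex
    while i < view.endIndex { count += 1; i = view.index(after: i) }
    return count
}
```

From there, profiling actual Rule evaluation against an equivalent Regex on the same input would show where the closure-composition overhead sits relative to the traversal cost.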

8 Likes

Hi MaximilianServais,
I would suggest making Rule, RuleMatch, RuleCapture generic even on the name of the captured content, which simply requires being Equatable.
This makes it possible, where appropriate, to define a finite set of token names with an enum, without the risk of mistyping a string.
Here's how Rule.swift would change.

Rule
//
//  Rule.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 13.03.26.
//
//  A universal, composable pattern matching library for any Swift Collection.
//
//  Rules describe patterns that can be applied to strings, byte arrays, token lists,
//  or any other Collection type. They compose into complex parsers and lexers through
//  a small set of orthogonal primitives: matchers, combinators, selectors, quantifiers,
//  lookahead, and captures.
//

import Foundation

// MARK: - Rule

/// A composable pattern that evaluates a Collection within a given range.
///
/// `Rule` is the core building block of the library. Each rule answers one question:
/// *"Does this pattern match at this position, and if so, how much input does it consume?"*
///
/// Rules compose through combinators (`sequence`), selectors (`firstMatch`, `longestMatch`),
/// quantifiers (`repeat`, `zeroOrMore`, `oneOrMore`), and lookahead (`expecting`,
/// `notExpecting`). This small set of orthogonal primitives can express anything from
/// simple patterns to full recursive parsers.
///
/// If the library does not provide what you need, create a custom rule with a closure:
///
///     let myRule = Rule<String, Never> { sequence, range in
///         // Your matching logic here
///     }
///
struct Rule<T: Collection, N: Equatable> {
	private let evaluationBlock: (T, Range<T.Index>) -> RuleMatch<T,N>?
	
	/// Creates a rule with a custom evaluation closure.
	///
	/// The closure receives the input collection and the range to match within.
	/// Return a `RuleMatch` on success, or `nil` if the pattern does not match.
	init(evaluationBlock: @escaping (T, Range<T.Index>) -> RuleMatch<T,N>?) {
		self.evaluationBlock = evaluationBlock
	}
	
	/// Evaluates the rule on `sequence` within `range`.
	///
	/// Returns a `RuleMatch` describing the consumed range and any captures,
	/// or `nil` if the pattern does not match at the given position.
	func evaluate(_ sequence: T, in range: Range<T.Index>) -> RuleMatch<T,N>? {
		return evaluationBlock(sequence, range)
	}
	
	/// Evaluates the rule on the entire sequence.
	///
	/// Convenience for the common case where matching starts at the beginning.
	func evaluate(_ sequence: T) -> RuleMatch<T,N>? {
		return evaluationBlock(sequence, sequence.startIndex..<sequence.endIndex)
	}
}


// MARK: - RuleMatch

/// The result of a successful rule evaluation.
///
/// Contains the consumed range and any captures extracted during evaluation.
/// A `nil` return from `Rule.evaluate` means the rule did not match;
/// a non-nil `RuleMatch` means it did, and describes how much input was consumed.
struct RuleMatch<T: Collection, N: Equatable> {
	let range: Range<T.Index>
	let captures: [RuleCapture<T,N>]
	
	init(range: Range<T.Index>, captures: [RuleCapture<T,N>]) {
		self.range = range
		self.captures = captures
	}
}

extension RuleMatch: CustomStringConvertible {
	
	/// A flat summary showing the matched range and capture count.
	var description: String {
		"RuleMatch [\(range.lowerBound)..<\(range.upperBound)] captures: \(captures.count)"
	}
}

extension RuleMatch: CustomDebugStringConvertible {
	
	/// Full representation including the capture tree.
	var debugDescription: String {
		var description = "RuleMatch [\(range.lowerBound)..<\(range.upperBound)]"
		if !captures.isEmpty {
			description += RuleCapture.treeDescription(for: captures)
		}
		return description
	}
}


// MARK: - RuleCapture

/// A named region within the input, forming a tree of nested captures.
///
/// Captures are the structural output of rule evaluation. They record *what* was
/// matched and *where*, organized hierarchically. A capture's `name` identifies it;
/// interpretation of the captured content is left to a separate phase after matching.
struct RuleCapture<T: Collection, N: Equatable> {
	let name: N
	let range: Range<T.Index>
	let innerCaptures: [Self]
}

extension RuleCapture: CustomStringConvertible {
	
	/// A flat, single-line summary showing name and range.
	var description: String {
		"\(name) [\(range.lowerBound)..<\(range.upperBound)]"
	}
}

extension RuleCapture: CustomDebugStringConvertible {
	
	/// Full tree representation including all nested captures.
	var debugDescription: String {
		var description = "\(name) [\(range.lowerBound)..<\(range.upperBound)]"
		if !innerCaptures.isEmpty {
			description += Self.treeDescription(for: innerCaptures, indent: "")
		}
		return description
	}
	
	/// Renders captures as an indented ASCII tree.
	///
	/// Produces a hierarchical view that makes it easy to visualize
	/// the structure of parsed content during development and debugging.
	static func treeDescription(for captures: [Self], indent: String = "") -> String {
		var description = ""
		
		for (index, capture) in captures.enumerated() {
			let isLast = index == captures.count - 1
			
			description += "\n"
			
			let captureDescription = capture.debugDescription
			let prefix = isLast ? "╰─ " : "├─ "
			let continuation = isLast ? "   " : "│  "
			
			let lines = captureDescription.components(separatedBy: .newlines)
			for (lineIndex, line) in lines.enumerated() {
				if lineIndex == 0 {
					description += indent + prefix + line
				} else {
					description += "\n" + indent + continuation + line
				}
			}
		}
		
		return description
	}
}


// MARK: - Capture Tree Navigation

extension RuleCapture {
	
	/// Finds the first capture with the given name using depth-first search.
	///
	/// Returns `nil` if no capture with that name exists in the tree.
	func find(_ name: N) -> Self? {
		if self.name == name { return self }
		for inner in innerCaptures {
			if let found = inner.find(name) { return found }
		}
		return nil
	}
	
	/// Finds all captures with the given name using depth-first search.
	///
	/// Returns an empty array if no captures with that name exist in the tree.
	func findAll(_ name: N) -> [Self] {
		var results: [Self] = []
		if self.name == name { results.append(self) }
		for inner in innerCaptures {
			results.append(contentsOf: inner.findAll(name))
		}
		return results
	}
}


// MARK: - Matchers

extension Rule {
	
	/// Always succeeds, consuming nothing.
	///
	/// Useful as a fallback in `firstMatch` for error recovery or default cases:
	/// `.firstMatch(expectedRule, .empty().capture(as: errorName))`
	static func empty() -> Self {
		Self { _, range in
			return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
		}
	}
	
	/// Matches any single element, regardless of content.
	///
	/// Fails only if the range is empty. Use inside repeat loops to consume
	/// input one element at a time: `.zeroOrMore(.sequence(.notExpecting(stop), .any()))`
	static func any() -> Self {
		Self { sequence, range in
			guard !range.isEmpty else { return nil }
			let next = sequence.index(after: range.lowerBound)
			return RuleMatch(range: range.lowerBound..<next, captures: [])
		}
	}
	
	/// Matches a single element satisfying the given predicate.
	///
	/// This is the generic building block for element-level conditions on any collection type.
	/// Domain-specific matchers like `unicode(_:)` for strings are built on the same principle.
	///
	///     .element(where: { $0.isLetter })
	///     .element(where: { $0 > 0x7F })
	static func element(where predicate: @escaping (T.Element) -> Bool) -> Self {
		Self { sequence, range in
			guard !range.isEmpty else { return nil }
			guard predicate(sequence[range.lowerBound]) else { return nil }
			let next = sequence.index(after: range.lowerBound)
			return RuleMatch(range: range.lowerBound..<next, captures: [])
		}
	}
	
	/// Succeeds only if the remaining range is empty. Consumes nothing.
	///
	/// Use this to assert that a rule must match the entire remaining input,
	/// not just a prefix: `.sequence(decimal, .endOfSequence())`
	static func endOfSequence() -> Self {
		Self { _, range in
			guard range.isEmpty else { return nil }
			return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
		}
	}
}


// MARK: - Annotation

extension Rule {
	
	/// Wraps the match in a named capture group.
	///
	/// The entire matched range becomes a `RuleCapture` node with the given name.
	/// Any captures from the inner rule become `innerCaptures` of the new node.
	func capture(as name: N) -> Self {
		Self { sequence, range in
			guard let result = self.evaluate(sequence, in: range) else { return nil }
			let capture = RuleCapture(name: name, range: result.range, innerCaptures: result.captures)
			return RuleMatch(range: result.range, captures: [capture])
		}
	}
	
	/// Wraps the match in a named capture only if a name is provided.
	///
	/// Convenience for optional capturing in generic combinators.
	/// Passes through unchanged when `name` is `nil`.
	func captureIf(_ name: N?) -> Self {
		guard let name else { return self }
		return self.capture(as: name)
	}
}


// MARK: - Combinator

extension Rule {
	
	/// Matches a sequence of rules consecutively.
	///
	/// Each rule is applied in order, starting where the previous one ended.
	/// If any rule fails, the entire sequence fails. Captures from all rules
	/// are combined in the result.
	static func sequence(_ rules: Self...) -> Self {
		precondition(rules.count > 0, "At least one rule is required.")
		
		return Self { sequence, range in
			var remaining = range
			var captures: [RuleCapture<T,N>] = []
			
			for rule in rules {
				guard let result = rule.evaluate(sequence, in: remaining) else { return nil }
				remaining = result.range.upperBound..<range.upperBound
				captures.append(contentsOf: result.captures)
			}
			
			return RuleMatch(range: range.lowerBound..<remaining.lowerBound, captures: captures)
		}
	}
}


// MARK: - Selectors

extension Rule {
	
	/// Returns the result of the first rule that matches.
	///
	/// Rules are evaluated in order — the first success wins. Order defines priority.
	/// Use this when you need deterministic precedence (e.g. keywords before identifiers).
	static func firstMatch(_ rules: Self...) -> Self {
		precondition(rules.count > 0, "At least one rule is required.")
		
		return Self { sequence, range in
			for rule in rules {
				if let result = rule.evaluate(sequence, in: range) { return result }
			}
			return nil
		}
	}
	
	/// Returns the result of the rule that consumes the most input.
	///
	/// All rules are evaluated and the longest match wins. Order does not matter.
	/// Use this when the correct interpretation is the greediest one (e.g. tokenizers).
	static func longestMatch(_ rules: Self...) -> Self {
		precondition(rules.count > 0, "At least one rule is required.")
		
		return Self { sequence, range in
			var best: RuleMatch<T,N>?
			
			for rule in rules {
				if let result = rule.evaluate(sequence, in: range) {
					if best == nil || best!.range.upperBound < result.range.upperBound {
						best = result
					}
				}
			}
			
			return best
		}
	}
}


// MARK: - Quantifiers

extension Rule {
	
	/// Repeats a rule between `min` and `max` times (greedy).
	///
	/// The rule is applied as many times as possible up to `max`. Succeeds if it
	/// matched at least `min` times. Stops early if the rule fails or makes no
	/// progress (to prevent infinite loops with zero-width matches).
	static func `repeat`(_ rule: Self, min: UInt, max: UInt) -> Self {
		precondition(min <= max, "Minimum must be ≤ maximum.")
		
		return Self { sequence, range in
			var count: UInt = 0
			var remaining = range
			var captures: [RuleCapture<T,N>] = []
			
			while !remaining.isEmpty && count < max {
				guard let result = rule.evaluate(sequence, in: remaining) else { break }
				if result.range.upperBound == remaining.lowerBound { break }
				
				count += 1
				remaining = result.range.upperBound..<range.upperBound
				captures.append(contentsOf: result.captures)
			}
			
			guard min <= count else { return nil }
			return RuleMatch(range: range.lowerBound..<remaining.lowerBound, captures: captures)
		}
	}
	
	/// Matches the rule zero or one time.
	///
	/// Equivalent to `.repeat(rule, min: 0, max: 1)`.
	static func zeroOrOne(_ rule: Self) -> Self {
		.repeat(rule, min: 0, max: 1)
	}
	
	/// Matches the rule zero or more times.
	///
	/// Equivalent to `.repeat(rule, min: 0, max: .max)`.
	static func zeroOrMore(_ rule: Self) -> Self {
		.repeat(rule, min: 0, max: .max)
	}
	
	/// Matches the rule one or more times.
	///
	/// Equivalent to `.repeat(rule, min: 1, max: .max)`.
	static func oneOrMore(_ rule: Self) -> Self {
		.repeat(rule, min: 1, max: .max)
	}
	
	/// Makes the rule optional (zero or one match).
	///
	/// Instance property for readability in chains: `sign.optional`, `whitespace.optional`.
	var optional: Self {
		return .zeroOrOne(self)
	}
}


// MARK: - Lookahead

extension Rule {
	
	/// Succeeds if the rule *would* match at the current position, but consumes nothing.
	///
	/// Use this to assert what comes next without advancing the position.
	/// For example, `.expecting(keyword)` ensures a keyword follows,
	/// but leaves it for the next rule to consume.
	static func expecting(_ rule: Self) -> Self {
		Self { sequence, range in
			guard rule.evaluate(sequence, in: range) != nil else { return nil }
			return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
		}
	}
	
	/// Succeeds if the rule would *not* match at the current position. Consumes nothing.
	///
	/// The complement of `expecting`. Most commonly used inside repeat loops:
	/// `.zeroOrMore(.sequence(.notExpecting(stop), .any()))` — consume everything
	/// until `stop` would match.
	static func notExpecting(_ rule: Self) -> Self {
		Self { sequence, range in
			guard !range.isEmpty else { return nil }
			if rule.evaluate(sequence, in: range) != nil { return nil }
			return RuleMatch(range: range.lowerBound..<range.lowerBound, captures: [])
		}
	}
}


// MARK: - Convenience: Repeat with Termination

extension Rule {
	
	/// Repeats `body` until `end` would match, then stops. Does NOT consume `end`.
	///
	/// Equivalent to `.zeroOrMore(.sequence(.notExpecting(end), body))`.
	/// The optional `as` parameter captures the repeated body under a name.
	static func `repeat`(_ body: Self, as bodyName: N? = nil, endBefore end: Self) -> Self {
		zeroOrMore(.sequence(.notExpecting(end), body))
			.captureIf(bodyName)
	}
	
	/// Repeats `body` until `end` matches, then consumes `end` as well.
	///
	/// Equivalent to `.sequence(.zeroOrMore(.sequence(.notExpecting(end), body)), end)`.
	/// The optional `as` parameter captures the repeated body under a name.
	/// To also capture `end`, apply `.capture(as:)` to the end rule before passing it.
	static func `repeat`(_ body: Self, as bodyName: N? = nil, endAfter end: Self) -> Self {
		.sequence(
			zeroOrMore(.sequence(.notExpecting(end), body))
				.captureIf(bodyName),
			end
		)
	}
}


// MARK: - Indirection (Rule Library)

/// A named collection of rules that enables mutual recursion and late binding.
///
/// Rules can reference other rules in the library by key. The lookup happens at
/// evaluation time, so rules can be registered after the reference is created.
/// This enables mutually recursive grammars (e.g. expressions containing sub-expressions).
///
/// The library tracks recursion depth during evaluation and triggers a
/// `preconditionFailure` if it exceeds `maxDepth`, catching infinite recursion early.
class RuleLibrary<I: Hashable, T: Collection, N: Equatable> {
	
	/// Maximum allowed recursion depth (default: 1000).
	let maxDepth: Int
	
	/// Current recursion depth during evaluation.
	///
	/// Incremented on entry, decremented on exit via `defer`.
	/// Readable by rules that want to bail out gracefully before hitting the hard limit.
	private(set) var depth: Int = 0
	
	private var dictionary: [I: Rule<T,N>] = [:]
	
	/// Creates a rule library with a configurable recursion depth limit.
	init(maxDepth: Int = 1_000) {
		self.maxDepth = maxDepth
	}
	
	subscript(identifier: I) -> Rule<T,N>? {
		get { dictionary[identifier] }
		set { dictionary[identifier] = newValue }
	}
	
	/// Evaluates a named rule from the library with recursion depth tracking.
	///
	/// Returns the match result, or `nil` if the rule does not exist or does not match.
	func evaluate(_ identifier: I, sequence: T, in range: Range<T.Index>) -> RuleMatch<T,N>? {
		precondition(depth < maxDepth, "Rule recursion depth exceeded \(maxDepth). Possible infinite recursion.")
		
		depth += 1
		defer { depth -= 1 }
		
		return dictionary[identifier]?.evaluate(sequence, in: range)
	}
}

extension Rule {
	
	/// Creates a rule that delegates to a named rule in a library.
	///
	/// The library is captured weakly to avoid retain cycles. The actual rule lookup
	/// happens at evaluation time, so rules can be registered after this reference
	/// is created. This enables mutually recursive grammars.
	static func rule<I: Hashable>(_ identifier: I, in library: RuleLibrary<I, T, N>) -> Self {
		weak var weakLibrary: RuleLibrary<I, T, N>? = library
		
		return Self { sequence, range in
			weakLibrary?.evaluate(identifier, sequence: sequence, in: range)
		}
	}
}


// MARK: - String-Specific Matchers

extension Rule where T == String {
	
	/// Matches a single character whose Unicode scalar belongs to the given `CharacterSet`.
	///
	/// Only characters consisting of exactly one Unicode scalar are considered.
	/// Multi-scalar characters (e.g. flag emoji) will not match.
	static func unicode(_ characterSet: CharacterSet) -> Self {
		Self { sequence, range in
			guard !range.isEmpty else { return nil }
			
			let character = sequence[range.lowerBound]
			
			guard character.unicodeScalars.count == 1,
				  let scalar = character.unicodeScalars.first,
				  characterSet.contains(scalar) else { return nil }
			
			return RuleMatch(range: range.lowerBound..<sequence.index(after: range.lowerBound), captures: [])
		}
	}
	
	/// Matches a single character matching one of the given Unicode scalars.
	///
	/// Convenience for matching specific characters: `.unicode("a", "b", "c")`.
	static func unicode(_ scalars: UnicodeScalar...) -> Self {
		var set = CharacterSet()
		for scalar in scalars {
			set.insert(charactersIn: String(scalar))
		}
		return unicode(set)
	}
	
	/// Matches a single character whose Unicode scalar falls within one of the given ranges.
	///
	/// Convenience for matching character ranges: `.unicode("a"..."z", "A"..."Z")`.
	static func unicode(_ ranges: ClosedRange<UnicodeScalar>...) -> Self {
		var set = CharacterSet()
		for range in ranges {
			set.insert(charactersIn: range.lowerBound...range.upperBound)
		}
		return unicode(set)
	}
	
	/// Matches an exact string at the current position.
	///
	/// Compares character by character. Use `caseInsensitive: true` to ignore case.
	static func word(_ word: String, caseInsensitive: Bool = false) -> Self {
		precondition(!word.isEmpty, "Word must not be empty.")
		
		return Self { sequence, range in
			var index = range.lowerBound
			
			for character in word {
				guard index < range.upperBound else { return nil }
				
				let match = caseInsensitive
				? sequence[index].lowercased() == character.lowercased()
				: sequence[index] == character
				
				guard match else { return nil }
				index = sequence.index(after: index)
			}
			
			return RuleMatch(range: range.lowerBound..<index, captures: [])
		}
	}
}
2 Likes

And here's how the examples change:

Examples
//
//  Examples.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 22.03.26.
//
//  Practical examples showing how to use Rule for common pattern matching tasks.
//  Each example builds on the previous one, introducing new primitives step by step.
//

import Foundation


// MARK: - Finding Numbers

/// The simplest useful rule: matching integers and decimal numbers in a string.
///
/// This example introduces `unicode`, `oneOrMore`, `sequence`, and `optional` —
/// the basic building blocks you will use in almost every rule.

func findingNumbers() {
	typealias MyRule = Rule<String,Never>
	
	let digit = MyRule.unicode(.decimalDigits)
	
	// An integer: one or more digits.
	let integer = MyRule.oneOrMore(digit)
	
	// A decimal number: digits, a dot, then more digits.
	// The fractional part is optional, so "42" and "42.5" both match.
	let decimal = MyRule.sequence(
		integer,
		MyRule.sequence(.word("."), integer).optional
	)
	
	let input = "42.5"
	if let match = decimal.evaluate(input) {
		let matched = input[match.range]
		print("Number: \(matched)")
		// Number: 42.5
	}
}


// MARK: - Finding Dates

/// Matching dates in DD.MM.YYYY format — a slightly more complex pattern
/// that shows how `sequence` and `repeat` combine for fixed-length fields.
///
/// This also introduces `capture` to extract the individual components
/// (day, month, year) from a match, so you can work with them separately.

func findingDates() {
	enum Token {
		case day, month, year, date
	}
	
	typealias MyRule = Rule<String,Token>
	
	let digit = MyRule.unicode(.decimalDigits)
	let dot   = MyRule.word(".")
	
	// Exactly two digits for day and month, four for year.
	let day   = MyRule.repeat(digit, min: 2, max: 2).capture(as: .day)
	let month = MyRule.repeat(digit, min: 2, max: 2).capture(as: .month)
	let year  = MyRule.repeat(digit, min: 4, max: 4).capture(as: .year)
	
	let date = MyRule.sequence(day, dot, month, dot, year).capture(as: .date)
	
	let input = "22.03.2026"
	if let match = date.evaluate(input) {
		print(match.debugDescription)
		// RuleMatch [0..<10]
		// ╰─ Date [0..<10]
		//    ├─ Day [0..<2]
		//    ├─ Month [3..<5]
		//    ╰─ Year [6..<10]
		
		// Extract the individual parts using the capture tree:
		if let dateCapture = match.captures.first {
			for part in dateCapture.innerCaptures {
				print("\(part.name): \(input[part.range])")
			}
			// day: 22
			// month: 03
			// year: 2026
		}
	}
}


// MARK: - Finding Quoted Strings

/// Matching quoted strings like `"hello world"` — including everything between
/// the opening and closing quote.
///
/// This introduces `repeat(endAfter:)`, which repeats a body rule until
/// an end pattern matches and then consumes the end pattern as well.
/// Its counterpart, `repeat(endBefore:)`, stops before the end pattern
/// without consuming it.

func findingQuotedStrings() {
	enum Token {
		case content, quotedString
	}
	
	typealias MyRule = Rule<String,Token>

	let quote = MyRule.word("\"")
	
	// Match everything from opening quote through closing quote.
	// `endAfter` consumes the closing quote as part of the match.
	let quotedString = MyRule.sequence(
		quote,
		MyRule.repeat(.any(), as: .content, endAfter: quote)
	).capture(as: .quotedString)
	
	let input = "\"hello world\""
	if let match = quotedString.evaluate(input) {
		print(match.debugDescription)
		// RuleMatch [0..<13]
		// ╰─ QuotedString [0..<13]
		//    ╰─ Content [1..<12]
		
		// Extract just the content without quotes:
		if let content = match.captures.first?.find(.content) {
			print("Content: \(input[content.range])")
			// Content: hello world
		}
	}
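	
	// For contrast, a sketch of the `endBefore` variant mentioned above.
	// (Assumes the pitched `repeat(_:as:endBefore:)` overload.) Because
	// `endBefore` stops in front of the closing quote without consuming it,
	// the closing quote has to be matched explicitly afterwards:
	let quotedStringAlt = MyRule.sequence(
		quote,
		MyRule.repeat(.any(), as: .content, endBefore: quote),
		quote
	).capture(as: .quotedString)
	_ = quotedStringAlt // shown for comparison only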
}


// MARK: - Finding Identifiers and Keywords

/// Matching programming-language-style identifiers (variable names) and
/// distinguishing them from keywords.
///
/// This introduces `firstMatch` for priority-based selection: keywords
/// are checked first, so `if` is recognized as a keyword, not an identifier.
/// It also shows `notExpecting` to ensure a keyword is not just a prefix
/// of a longer identifier (e.g. `iffy` should be an identifier, not keyword `if`).

func findingIdentifiersAndKeywords() {
	enum Token {
		case identifier, keyword
	}
	
	typealias MyRule = Rule<String,Token>
	
	let letter = MyRule.unicode(.letters)
	let digit  = MyRule.unicode(.decimalDigits)
	let letterOrDigit = MyRule.firstMatch(letter, digit)
	
	// An identifier starts with a letter, followed by letters or digits.
	let identifier = MyRule.sequence(
		letter,
		.zeroOrMore(letterOrDigit)
	)
	
	// A keyword is a specific word that is NOT followed by more letters/digits.
	// Without `notExpecting`, the word "iffy" would match keyword "if".
	func keyword(_ word: String) -> MyRule {
		.sequence(
			.word(word),
			.notExpecting(letterOrDigit)
		).capture(as: .keyword)
	}
	
	// Keywords have priority over identifiers — check them first.
	let token = MyRule.firstMatch(
		keyword("if"),
		keyword("else"),
		keyword("return"),
		identifier.capture(as: .identifier)
	)
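	
	// Alternative sketch using `longestMatch` (introduced later in the
	// MathExpression example): it keeps the alternative that consumes the
	// most input, so the identifier "iffy" beats the keyword "if" without
	// a `notExpecting` guard. Assumption: on equal lengths the first
	// listed alternative wins, so plain "if" is still a keyword.
	let tokenByLength = MyRule.longestMatch(
		MyRule.word("if").capture(as: .keyword),
		MyRule.word("else").capture(as: .keyword),
		MyRule.word("return").capture(as: .keyword),
		identifier.capture(as: .identifier)
	)
	_ = tokenByLength // shown for comparison only; the loop below uses `token`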
	
	for input in ["if", "iffy", "return", "x1"] {
		if let match = token.evaluate(input) {
			if let name = match.captures.first?.name {
				print("\(input) → \(name)")
			} else {
				print("\(input) → ?")
			}
		}
	}
	// if → keyword
	// iffy → identifier
	// return → keyword
	// x1 → identifier
}


// MARK: - Finding Multi-Line Comments

/// Matching C-style block comments: `/* ... */` — where the content can
/// span multiple lines and contain anything except the closing `*/`.
///
/// This reuses `repeat(endAfter:)`: "consume any character until `*/`
/// follows, then consume `*/` as well". It also shows that the delimiters
/// themselves can carry captures, so the capture tree records the
/// positions of `/*`, the body, and `*/` separately.

func findingMultiLineComments() {
	enum Token {
		case open, close, body, comment
	}
	
	typealias MyRule = Rule<String,Token>

	let open  = MyRule.word("/*")
	let close = MyRule.word("*/")
	
	// Repeat any character as long as `*/` does not follow, then consume `*/`.
	let comment = MyRule.sequence(
		open.capture(as: .open),
		MyRule.repeat(.any(), as: .body, endAfter: close.capture(as: .close))
	).capture(as: .comment)
	
	let input = "/* This is\na comment */"
	if let match = comment.evaluate(input) {
		print(match.debugDescription)
		// RuleMatch [0..<23]
		// ╰─ Comment [0..<23]
		//    ├─ Open [0..<2]
		//    ├─ Body [2..<21]
		//    ╰─ Close [21..<23]
		
		if let body = match.captures.first?.find(.body) {
			print("Body: \(input[body.range])")
			// Body:  This is
			// a comment
		}
	}
}


// MARK: - Finding Key-Value Pairs

/// Matching simple `key = value` pairs separated by newlines —
/// like a configuration file or HTTP headers.
///
/// This combines several techniques: `notExpecting` paired with `any` to
/// collect characters up to a delimiter or the end of input, and
/// `zeroOrMore` over a `sequence` to collect any number of pairs.

func findingKeyValuePairs() {
	enum Token {
		case key, value, pair
	}
	
	typealias MyRule = Rule<String,Token>
	
	let gap     = MyRule.zeroOrMore(.unicode(.init(charactersIn: " \t")))
	let newline = MyRule.word("\n")
	let equals  = MyRule.word("=")
	
	// A key is one or more letters.
	let key = MyRule.oneOrMore(.unicode(.letters)).capture(as: .key)
	
	// A value is everything up to a newline or end of input.
	let value = MyRule.oneOrMore(
		.sequence(.notExpecting(newline), .notExpecting(.endOfSequence()), .any())
	).capture(as: .value)
	
	// A single pair: key = value
	let pair = MyRule.sequence(
		key, gap, equals, gap, value
	).capture(as: .pair)
	
	// Multiple pairs separated by newlines.
	let config = MyRule.sequence(
		pair,
		.zeroOrMore(.sequence(newline, pair))
	)
	
	let input = "host = localhost\nport = 8080\nmode = debug"
	if let match = config.evaluate(input) {
		print(match.debugDescription)
		// RuleMatch [0..<41]
		// ├─ Pair [0..<16]
		// │  ├─ Key [0..<4]
		// │  ╰─ Value [7..<16]
		// ├─ Pair [17..<28]
		// │  ├─ Key [17..<21]
		// │  ╰─ Value [24..<28]
		// ╰─ Pair [29..<41]
		//    ├─ Key [29..<33]
		//    ╰─ Value [36..<41]
		
		// Read the parsed key-value pairs:
		for pairCapture in match.captures {
			if let k = pairCapture.find(.key),
			   let v = pairCapture.find(.value) {
				print("\(input[k.range]) → \(input[v.range])")
			}
		}
		// host → localhost
		// port → 8080
		// mode → debug
	}
}
MathExpression
//
//  MathExpression.swift
//  Universal Pattern Recognizer
//
//  Created by Maximilian Servais on 13.03.26.
//
//  A complete math expression parser built with Rule.
//
//  This example demonstrates how Rule scales from simple patterns to a full
//  recursive-descent parser with operator precedence, functions, error recovery,
//  and capture tree evaluation — all built from the same composable primitives.
//
//  Grammar:
//
//    root        =  gap? (expression | gap | unknown)+
//    expression  =  term (('+' | '-') term)*
//    term        =  factor (('*' | '/') factor)*
//    factor      =  '-'? (number | function | parenthesized)
//    function    =  functionKey '(' expression ')'
//    parenthesized = '(' expression ')'
//    number      =  hexadecimal | decimal | integer
//    hexadecimal =  '#' hexDigits
//    decimal     =  digits '.' digits | '.' digits | digits '.'
//    integer     =  digits
//    unknown     =  any single character (error recovery)
//

import Foundation

enum Token {
	case root, expression, term, factor, function, parenthesized, number, hexadecimal, decimal, integer, unknown
	case negate, plus, minus, multiply, divide
	case sin, cos, sqrt, abs
}

// MARK: - Diagnostics

/// A diagnostic marker at a specific position in the input.
private struct Diagnostic {
	let name: Token
	let startOffset: Int
	let length: Int
}

/// Collects all "Unknown Expression" diagnostics from a capture tree.
private func collectDiagnostics(_ capture: RuleCapture<String,Token>, in source: String) -> [Diagnostic] {
	var results: [Diagnostic] = []
	
	if capture.name == .unknown {
		let startOffset = source.distance(from: source.startIndex, to: capture.range.lowerBound)
		let length = source.distance(from: capture.range.lowerBound, to: capture.range.upperBound)
		results.append(Diagnostic(name: capture.name, startOffset: startOffset, length: length))
	}
	
	for inner in capture.innerCaptures {
		results.append(contentsOf: collectDiagnostics(inner, in: source))
	}
	
	return results
}

/// Prints the input with a visual marker line below any unknown tokens.
///
///     ┌──────────────────────────────────────┐
///     │ Diagnostic: Unknown expression       │
///     │                                      │
///     │ 123+456+yxc+(2*5*zu)                 │
///     │         ^^^      ^^                  │
///     └──────────────────────────────────────┘
private func printDiagnostics(for input: String, capture: RuleCapture<String,Token>) {
	let diagnostics = collectDiagnostics(capture, in: input)
	
	let label = diagnostics.isEmpty
	? "Diagnostic: No issues"
	: "Diagnostic: Unknown expression"
	
	var markerChars = Array(repeating: Character(" "), count: input.count)
	for diag in diagnostics {
		for offset in diag.startOffset..<(diag.startOffset + diag.length) {
			if offset < markerChars.count { markerChars[offset] = "^" }
		}
	}
	let markerLine = String(markerChars)
	
	let contentWidth = max(input.count, label.count)
	let border = String(repeating: "─", count: contentWidth + 2)
	let pad = { (s: String) in s + String(repeating: " ", count: contentWidth - s.count) }
	
	print("┌" + border + "┐")
	print("│ " + pad(label) + " │")
	print("│ " + pad("")    + " │")
	print("│ " + pad(input) + " │")
	if !diagnostics.isEmpty { print("│ " + pad(markerLine) + " │") }
	print("└" + border + "┘")
}


// MARK: - Evaluation

/// Walks the capture tree and computes a numeric result.
///
/// Every decision is based on capture names — the source text is only
/// read for number literals. Operators and functions are identified
/// purely by their capture names.
private func evaluateMath(_ capture: RuleCapture<String,Token>, in source: String) -> Double {
	
	switch capture.name {
		case .root:
			if capture.innerCaptures.count == 1,
			   capture.innerCaptures[0].name == .expression {
				return evaluateMath(capture.innerCaptures[0], in: source)
			}
			return .nan
			
		case .expression:
			var result = evaluateMath(capture.innerCaptures[0], in: source)
			var i = 1
			while i + 1 < capture.innerCaptures.count {
				let op  = capture.innerCaptures[i].name
				let rhs = evaluateMath(capture.innerCaptures[i + 1], in: source)
				if op == .plus  { result += rhs }
				else if op == .minus { result -= rhs }
				i += 2
			}
			return result
			
		case .term:
			var result = evaluateMath(capture.innerCaptures[0], in: source)
			var i = 1
			while i + 1 < capture.innerCaptures.count {
				let op  = capture.innerCaptures[i].name
				let rhs = evaluateMath(capture.innerCaptures[i + 1], in: source)
				if op == .multiply  { result *= rhs }
				else if op == .divide { result /= rhs }
				i += 2
			}
			return result
			
		case .factor:
			var sign  = 1.0
			var value = 0.0
			for inner in capture.innerCaptures {
				if inner.name == .negate { sign = -1.0 }
				else { value = evaluateMath(inner, in: source) }
			}
			return sign * value
			
		case .integer:
			return Double(String(source[capture.range]))!
			
		case .decimal:
			return Double(String(source[capture.range]))!
			
		case .hexadecimal:
			let hex = String(source[capture.range]).dropFirst() // skip '#'
			return Double(UInt64(hex, radix: 16) ?? 0)
			
		case .parenthesized:
			for inner in capture.innerCaptures {
				if inner.name == .expression { return evaluateMath(inner, in: source) }
			}
			preconditionFailure("Parenthesized expression without inner Expression")
			
		case .function:
			var funcName = Token.unknown
			var argument = 0.0
			for inner in capture.innerCaptures {
				switch inner.name {
					case .sin, .cos, .sqrt, .abs: funcName = inner.name
					case .expression: argument = evaluateMath(inner, in: source)
					default: break
				}
			}
			switch funcName {
				case .sin:  return sin(argument)
				case .cos:  return cos(argument)
				case .sqrt: return sqrt(argument)
				case .abs:  return abs(argument)
				default: preconditionFailure("Unknown function")
			}
			
		case .unknown:
			return .nan
			
		default:
			preconditionFailure("Unknown capture name: \(capture.name)")
	}
}


// MARK: - Demo

/// Runs the math expression parser demo.
///
/// Parses, evaluates, and prints a series of math expressions covering
/// basic arithmetic, operator precedence, parentheses, negation, decimals,
/// hexadecimals, functions, error recovery, and capture tree output.
func mathExpressionDemo() {
	typealias R = Rule<String,Token>
	
	// --- Character classes ---
	
	let unicode16Digit      = R.unicode(.decimalDigits)
	let unicode16Hex        = R.unicode(CharacterSet(charactersIn: "0123456789abcdefABCDEF"))
	let unicode16Whitespace = R.unicode(.whitespaces)
	
	// --- Character sequences ---
	
	let digits       = R.oneOrMore(unicode16Digit)
	let hexadecimals = R.oneOrMore(unicode16Hex)
	let gap          = R.oneOrMore(unicode16Whitespace)
	
	// --- Punctuation ---
	
	let decimalSeparator = R.unicode(".")
	let parenthesisOpen  = R.unicode("(")
	let parenthesisClose = R.unicode(")")
	
	// --- Operators ---
	// Each operator has its own capture name so the evaluation function can
	// identify it without reading the source text.
	// "Negate" is separate from "Minus": Minus is binary (a - b), Negate is unary (-a).
	
	let plus     = R.unicode("+").capture(as: .plus)
	let minus    = R.unicode("-").capture(as: .minus)
	let negate   = R.unicode("-").capture(as: .negate)
	let multiply = R.unicode("*").capture(as: .multiply)
	let divide   = R.unicode("/").capture(as: .divide)
	
	let addOrSubtract    = R.firstMatch(plus, minus)
	let multiplyOrDivide = R.firstMatch(multiply, divide)
	
	// --- Keywords ---
	// Case insensitive: sin, SIN, and Sin all match.
	// longestMatch ensures "sinh" would win over "sin" if added later.
	
	let functionKey = R.longestMatch(
		R.word("sin",  caseInsensitive: true).capture(as: .sin),
		R.word("cos",  caseInsensitive: true).capture(as: .cos),
		R.word("sqrt", caseInsensitive: true).capture(as: .sqrt),
		R.word("abs",  caseInsensitive: true).capture(as: .abs)
	)
	
	// --- Numbers ---
	// longestMatch ensures "3.14" matches as Decimal, not Integer "3".
	
	let integer = digits.capture(as: .integer)
	
	let decimal = R.longestMatch(
		R.sequence(digits, decimalSeparator, digits),
		R.sequence(decimalSeparator, digits),
		R.sequence(digits, decimalSeparator)
	).capture(as: .decimal)
	
	let hexadecimal = R.sequence(R.unicode("#"), hexadecimals).capture(as: .hexadecimal)
	let number      = R.longestMatch(hexadecimal, decimal, integer)
	
	// --- Error recovery ---
	
	let unknownExpression = R.any().capture(as: .unknown)
	
	// --- Grammar ---
	// The RuleLibrary enables recursion: an expression can contain a
	// parenthesized expression, which again contains an expression.
	
	enum GrammarKey { case expression, term, factor }
	let grammar = RuleLibrary<GrammarKey, String, Token>()
	
	let expression = R.rule(.expression, in: grammar)
	let term       = R.rule(.term,       in: grammar)
	let factor     = R.rule(.factor,     in: grammar)
	
	let parenthesizedExpression = R.sequence(
		parenthesisOpen, gap.optional, expression, gap.optional, parenthesisClose
	).capture(as: .parenthesized)
	
	let function = R.sequence(
		functionKey, parenthesisOpen, gap.optional, expression, gap.optional, parenthesisClose
	).capture(as: .function)
	
	// Factor: optional negation + number, function, or parenthesized expression.
	grammar[.factor] = R.sequence(
		negate.optional, .firstMatch(number, function, parenthesizedExpression)
	).capture(as: .factor)
	
	// Term: factors joined by * or /. Left-to-right: a * b / c = (a * b) / c.
	grammar[.term] = R.sequence(
		factor, .zeroOrMore(.sequence(gap.optional, multiplyOrDivide, gap.optional, factor))
	).capture(as: .term)
	
	// Expression: terms joined by + or -. Left-to-right: a + b - c = (a + b) - c.
	grammar[.expression] = R.sequence(
		term, .zeroOrMore(R.sequence(gap.optional, addOrSubtract, gap.optional, term))
	).capture(as: .expression)
	
	// --- Parse ---
	// The outer loop tries expression first, then skips whitespace, then
	// captures one unknown character as fallback. This ensures the parser
	// never fails — unknown input is visible in the capture tree.
	
	let fullMatch = R.sequence(
		gap.optional,
		.oneOrMore(.firstMatch(expression, gap, unknownExpression)),
		.endOfSequence()
	).capture(as: .root)
	
	func parse(_ input: String) -> RuleCapture<String,Token>? {
		fullMatch.evaluate(input)?.captures.first
	}
	
	// --- Test helper ---
	
	func test(_ input: String, expected: Double) {
		guard let root = parse(input) else {
			print("[FAIL] '\(input)' — parse failed")
			return
		}
		
		let result = evaluateMath(root, in: input)
		
		let passed = expected.isNaN ? result.isNaN : Swift.abs(result - expected) < 0.0001
		
		if passed { print("[PASS] '\(input)' = \(result)") }
		else      { print("[FAIL] '\(input)' = \(result)  (expected \(expected))") }
		
		if !collectDiagnostics(root, in: input).isEmpty {
			printDiagnostics(for: input, capture: root)
		}
	}
	
	// --- Tests ---
	
	print("--- Basic arithmetic ---\n")
	test("1 + 2",               expected: 3)
	test("10 - 3",              expected: 7)
	test("2 * 3",               expected: 6)
	test("10 / 4",              expected: 2.5)
	
	print("\n--- Operator precedence ---\n")
	test("2 + 3 * 4",           expected: 14)
	test("10 - 2 * 3",          expected: 4)
	test("2 * 3 + 4 * 5",       expected: 26)
	
	print("\n--- Parentheses ---\n")
	test("(2 + 3) * 4",                    expected: 20)
	test("((1 + 2))",                      expected: 3)
	test("(2 + 3) * (4 - 1)",              expected: 15)
	test("(3 + ((2 + 6) / 4)) * (6 - 1)",  expected: 25)
	
	print("\n--- Negation ---\n")
	test("-5",                  expected: -5)
	test("-5 + 10",             expected: 5)
	test("-(2 + 3)",            expected: -5)
	test("-2 * -3",             expected: 6)
	
	print("\n--- Integers ---\n")
	test("42",                  expected: 42)
	test("100 + 200",           expected: 300)
	
	print("\n--- Decimals ---\n")
	test("3.14",                expected: 3.14)
	test("0.5 + 0.5",           expected: 1.0)
	test("2.5 * 4",             expected: 10.0)
	test(".5 + .5",             expected: 1.0)
	test("3. + .14",            expected: 3.14)
	
	print("\n--- Hexadecimal ---\n")
	test("#FF",                 expected: 255)
	test("#10 + #10",           expected: 32)
	test("#A * 2",              expected: 20)
	
	print("\n--- Functions ---\n")
	test("sqrt(9)",             expected: 3.0)
	test("abs(0 - 5)",          expected: 5.0)
	test("sqrt(3 * 3 + 4 * 4)", expected: 5.0)
	test("cos(0)",              expected: 1.0)
	test("SIN(0)",              expected: 0.0)
	test("Sqrt(16)",            expected: 4.0)
	
	print("\n--- Unknown tokens ---\n")
	test("123+456+yxc+(2*5*zu)", expected: .nan)
	
	print("\n--- Capture tree ---\n")
	let example = "2 + 3 * ( 4 + 1 )"
	if let root = parse(example) {
		print("'\(example)':\n")
		print(root.debugDescription)
		
		// Root [0..<17]
		// ╰─ Expression [0..<17]
		//    ├─ Term [0..<1]
		//    │  ╰─ Factor [0..<1]
		//    │     ╰─ Integer [0..<1]
		//    ├─ Plus [2..<3]
		//    ╰─ Term [4..<17]
		//       ├─ Factor [4..<5]
		//       │  ╰─ Integer [4..<5]
		//       ├─ Multiply [6..<7]
		//       ╰─ Factor [8..<17]
		//          ╰─ Parenthesized [8..<17]
		//             ╰─ Expression [10..<15]
		//                ├─ Term [10..<11]
		//                │  ╰─ Factor [10..<11]
		//                │     ╰─ Integer [10..<11]
		//                ├─ Plus [12..<13]
		//                ╰─ Term [14..<15]
		//                   ╰─ Factor [14..<15]
		//                      ╰─ Integer [14..<15]
		
		print("\nResult: \(evaluateMath(root, in: example))")
		// Result: 17.0
	}
}