I have been working on my own String Pattern Matching Library (SPML) since the Swift 2 / 3 days (for a book project I've been working on for as long, as well). When Swift 5.7 introduced RegEx Builder, I was happy to abandon that effort to use the more idiomatic RegEx Builder DSL for the book.
However, there were a few things about the SPML that I really liked, such as the ability to handle context free languages (as well as regular languages) and the easy ability to insert semantic actions (code) within the grammars. So recently I dusted off the SPML and started playing around with it again.
One of my first questions concerning the performance of RegEx Builder versus SPML. So I created a simple little benchmark to compare the two. Here's the RegEx Builder expression I used:
let logLine = Regex
{
Repeat(copies...copies)
{
Anchor.startOfLine
"["
Repeat(.digit, count: 4)
"-"
Repeat(.digit, count: 2)
"-"
Repeat(.digit, count: 2)
One(.whitespace)
Repeat(.digit, count: 2)
":"
Repeat(.digit, count: 2)
":"
Repeat(.digit, count: 2)
"."
Repeat(.digit, count: 3)
"] ["
ChoiceOf { "TRACE"; "DEBUG"; "INFO"; "WARN"; "ERROR" }
"] ["
OneOrMore(.word)
ZeroOrMore { "."; OneOrMore(.word) }
"] ("
OneOrMore(.digit)
")"
OneOrMore(.whitespace)
ZeroOrMore(.any, .reluctant)
"\n"
Anchor.endOfLine
}
}
Executing this 150,000 times took around 1,200 seconds. That is, to put it nicely, a lot of time. (note that this did not include the time to build the RegEx value; I excluded that from the benchmark because the RegEx initializer is notoriously slow).
I fed this benchmark eight lines of text for each iteration (that is, "copies" had the value 8) and supplied eight lines of text that this expression would successfully match. Here is a typical line from the corpus ("body of text") string:
[2025-09-01 12:34:56.789] [TRACE] [net.http] (1442) anna.smith+test@example.com | https://example.com | {"id": 42, "name": "Widget", "price": 12.50, "tags": ["swift","regex","benchmark"]} | let foo = 123; let bar = foo + 456 // sum | This is is a test with a repeated word word for backreference. | 0.0.0.0
There are eight of these lines in the corpus string, based on the parameters I used when running the benchmark code.
I'm wondering if I haven't done something stupid in this regular expression that is causing backtracking to kick in and create a massive slowdown. To put things in perspective, the following SPML code executes in 6.5 seconds:
let result =
p.match( corpus )
{
p.nPat_fast(
n:copies,
block:{ tail in
// Date/Time:
p.c( "[" ){
p.nADigit( n:4 ){
p.c( "-" ){
p.nADigit( n:2 ){
p.c( "-" ){
p.nADigit( n:2 ){
p.ws(){
p.nADigit( n:2 ){
p.c( ":" ){
p.nADigit( n:2 ){
p.c( ":" ){
p.nADigit( n:2 ){
p.c( "." ){
p.nADigit( n:3 ){
p.s( "] [" ){
// Level:
p.choiceOf(
[
{tail in p.s("TRACE", tail)},
{tail in p.s("DEBUG", tail)},
{tail in p.s("INFO", tail)},
{tail in p.s("WARN", tail)},
{tail in p.s("ERROR", tail)}
]
)
{
p.s( "] [" ){
// Module:
p.oneOrMoreWordChar(){
p.zeroOrMorePat(
block:
{ tail in
p.s(".")
{
p.oneOrMoreWordChar( tail )
}
}
){
p.s("] (" ){
p.nADigit( n:4 ){
p.c( ")" ){
p.oneOrMoreWS(){
p.thruChar( "\n" ){
tail()
}}}}}}}}}}}}}}}}}}}}}}}}
} // block
) // nPat
} // Match
While my ego would love to believe that my astounding coding skills produced something that ran over two orders of magnitude faster than RegEx Builder, common sense suggests that there is something else wrong here.
FWIW, here's the full RegEx Builder benchmark program (the SPML variant is roughly the same except for the actual regular expression code):
// BenchRegex.swift
import Foundation
import RegexBuilder
// MARK: - Utilities
struct BenchResult
{
let name : String
let iterations: Int
let duration : Duration
let matches : Int
}
@inline(__always)
func time<T>(label name:String, iterations: Int = 25, _ body: () -> T) -> (result: T, elapsed: Duration)
{
let clock = ContinuousClock()
let start = clock.now
let result = body()
let end = clock.now
return (result, end - start)
}
@inline(__always)
func runTimed(label name:String, iterations: Int = 25, _ body: () -> Int) -> BenchResult
{
let (_, elapsed) = time(label: name, iterations: iterations)
{
var total = 0
for _ in 0..<iterations
{
total &+= body()
}
return total
}
// One extra pass to report “per-run” match count:
let totalMatches = body()
return BenchResult(name: name, iterations: iterations, duration: elapsed, matches: totalMatches)
}
@inline(__always)
func fmt(_ d: Duration) -> String
{
let c = d.components
let seconds = Double(c.seconds) + Double(c.attoseconds) / 1e18
return String(format: "%.2f ms", seconds * 1000.0)
}
// MARK: - Corpus Generator
func makeCorpus( _ copies:Int ) -> String
{
let levels = ["TRACE","DEBUG","INFO","WARN","ERROR"]
let modules = ["net.http","auth.core","db.pg","ui.kit","regex.engine","scheduler"]
let emails = [
"anna.smith+test@example.com",
"john-doe@example.co.uk",
"δοκιμή+αλφα@παράδειγμα.gr",
"bad..dots@example.com",
"simple@example.org",
"x@y.z"
]
let urls = [
"https://example.com",
"http://localhost:8080/path/to/resource?x=1&y=2#frag",
"ftp://192.168.0.1/pub/file.tar.gz",
"https://sub.domain.example.io:443/a/b/c?q=swift%20regex",
"ws://[::1]/socket"
]
let jsons = [
#"{"id": 42, "name": "Widget", "price": 12.50, "tags": ["swift","regex","benchmark"]}"#,
#"{"ok": true, "msg": "hello \"world\"", "pi": 3.14159}"#,
#"{"bad": [1,2,,3], "x": }"#
]
let code = [
"let foo = 123; let bar = foo + 456 // sum",
"func greet(_ name: String) -> String { return \"Hello, \\(name)!\" }",
"struct Point { var x: Double; var y: Double }",
"if a == b && c != d || e <= f { print(\"ok\") }"
]
let text = [
"This is is a test with a repeated word word for backreference.",
"Sphinx of black quartz, judge my vow. 12345 abcde Áéîøü",
"P@ssw0rdPolicyMustHaveMix and length >= 12 with symbols!"
]
let dates = [
"2025-09-01 12:34:56.789",
"1970-01-01 00:00:00.000",
"2000-02-29 23:59:59.123"
]
let ipv4s = [
"0.0.0.0","10.0.0.1","127.0.0.1","192.168.100.200","256.12.12.12","12.34.56.078"
]
var parts: [String] = []
var lvlNdx = 0
var modNdx = 0
var dateNdx = 0
var emailNdx = 0
var urlNdx = 0
var jsonNdx = 0
var codeNdx = 0
var textNdx = 0
var ipv4sNdx = 0
for _ in 0..<copies
{
let lvl = levels[lvlNdx]
lvlNdx = (lvlNdx + 1) % levels.count
let mod = modules[modNdx]
modNdx = (modNdx + 1) % modules.count
let dt = dates[dateNdx]
dateNdx = (dateNdx + 1) % dates.count
let msg = [
emails[emailNdx],
urls[urlNdx],
jsons[jsonNdx],
code[codeNdx],
text[textNdx],
ipv4s[ipv4sNdx]
].joined(separator: " | ")
emailNdx = (emailNdx + 1) % emails.count
urlNdx = (urlNdx + 1) % urls.count
jsonNdx = (jsonNdx + 1) % jsons.count
codeNdx = (codeNdx + 1) % code.count
textNdx = (textNdx + 1) % text.count
ipv4sNdx = (ipv4sNdx + 1) % ipv4s.count
parts.append("[\(dt)] [\(lvl)] [\(mod)] (\(Int.random(in: 1000...9999))) \(msg)")
}
return parts.joined(separator: "\n") + "\n"
}
// MARK: - Benchmark Runner
func main()
{
let copies = 8
let iterations = 150000 // bump for longer timing windows
let corpus = makeCorpus( copies )
print("Corpus size: \(corpus.utf8.count) bytes")
print("Iterations per regex: \(iterations)\n")
var results: [BenchResult] = []
let logLine = Regex
{
Repeat(copies...copies)
{
Anchor.startOfLine
"["
Repeat(.digit, count: 4)
"-"
Repeat(.digit, count: 2)
"-"
Repeat(.digit, count: 2)
One(.whitespace)
Repeat(.digit, count: 2)
":"
Repeat(.digit, count: 2)
":"
Repeat(.digit, count: 2)
"."
Repeat(.digit, count: 3)
"] ["
ChoiceOf { "TRACE"; "DEBUG"; "INFO"; "WARN"; "ERROR" }
"] ["
OneOrMore(.word)
ZeroOrMore { "."; OneOrMore(.word) }
"] ("
OneOrMore(.digit)
")"
OneOrMore(.whitespace)
ZeroOrMore(.any, .reluctant)
"\n"
Anchor.endOfLine
}
}
let r = runTimed(label: "log", iterations: iterations)
{
var count = 0
for _ in corpus.matches(of: logLine)
{
count &+= 1
}
return count
}
results.append®
print("log", "matches: \(r.matches)", "time: \(r.duration)" )
let totalSeconds = results.reduce(0.0)
{ acc, r in
let c = r.duration.components
return acc + Double(c.seconds) + Double(c.attoseconds) / 1e18
}
//let sTemp = String(format: "%.3f s", totalSeconds)
print("\nTotal wall time (sum of individual regex runs): \(totalSeconds)")
}
main()