Appending to this review thread (apologies, not sure where best to post this):
Working out this example in another review:
let kind = Reference(Substring.self) // 🫤
let date = Reference(Substring.self)
let account = Reference(Substring.self)
let amount = Reference(Substring.self)
let regex = Regex {
    // Match a line of the format, e.g., "DEBIT  03/03/2022  Totally Legit Shell Corp  $2,000,000.00"
    // (fields are separated by two or more whitespace characters).
    let fieldBreak = /\s\s+/
    Capture(/\w+/, as: kind); fieldBreak
    Capture(/\S+/, as: date); fieldBreak
    Capture(/(?:(?!\s\s).)+/, as: account); fieldBreak // Note that account names may contain spaces.
    Capture(/.*/, as: amount)
}
…made me realize that the builder DSL presents an ugly tradeoff:
- either tolerate verbose repetition of Reference(Substring.self), or
- do away with the legibility and safety of named captures.
Positional captures are something we’d want to discourage: they not only harm readability, but also make code more brittle because of the danger of position errors (e.g. capture 2 becomes capture 3, code far below still assumes it is 2 and is now broken, oops).
It would be nice if Substring were the default Capture type for a no-args Reference initializer:
let kind = Reference() // 😐 better
let date = Reference()
let account = Reference()
let amount = Reference()
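For what it's worth, here is a rough sketch of how close one can get without changing the library. It assumes that the type argument of Reference's initializer is defaulted, as it appears to be in the RegexBuilder module that shipped with Swift 5.7, so that only the generic parameter needs spelling out. The Ref alias is a made-up name for illustration, not library API, and the literals use #/…/# delimiters only so the snippet compiles without the bare-slash flag.

```swift
import RegexBuilder

// Hypothetical shorthand (not part of RegexBuilder): spell the capture type once.
typealias Ref = Reference<Substring>

// Assumes Reference's initializer defaults its type argument to Capture.self.
let kind = Ref()
let amount = Ref()

let regex = Regex {
    Capture(#/\w+/#, as: kind)
    #/\s+/#
    Capture(#/.*/#, as: amount)
}

if let match = "DEBIT  $2,000,000.00".firstMatch(of: regex) {
    // Captures are read back by reference rather than by position.
    print(match[kind], match[amount])
}
```

This still repeats the alias on every line, so it only softens the verbosity rather than removing it.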
Perhaps it would even be worth providing a method, overloaded for various tuple sizes, that returns a tuple of references of the same type, with a further zero-argument overload that makes Substring the default type:
let (kind, date, account, amount) = Regex.references() // 🙂 not so bad! (all Substring captures)
let (startDate, endDate) = Regex.references(Date.self) // multiple references of a different type
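To make the idea concrete, here is a minimal sketch of such helpers written as free functions. Everything named references below is hypothetical and not part of RegexBuilder. I used free functions rather than static methods on Regex because Regex is generic over its output, so a bare Regex.references() would presumably need some way to pin that parameter. Only the 2- and 4-element overloads are shown, and the literals use #/…/# delimiters so the snippet does not depend on the bare-slash flag.

```swift
import RegexBuilder

// Hypothetical helpers, not RegexBuilder API: one overload per tuple size.
func references<T>(_ type: T.Type) -> (Reference<T>, Reference<T>) {
    (Reference(type), Reference(type))
}

func references<T>(_ type: T.Type) -> (Reference<T>, Reference<T>, Reference<T>, Reference<T>) {
    (Reference(type), Reference(type), Reference(type), Reference(type))
}

// Zero-argument overloads supply Substring as the default capture type.
func references() -> (Reference<Substring>, Reference<Substring>) {
    (Reference(Substring.self), Reference(Substring.self))
}

func references() -> (Reference<Substring>, Reference<Substring>, Reference<Substring>, Reference<Substring>) {
    (Reference(Substring.self), Reference(Substring.self),
     Reference(Substring.self), Reference(Substring.self))
}

// The arity of the destructuring pattern selects the overload.
let (kind, date, account, amount) = references()
let (day, month) = references(Int.self)

let fieldBreak = #/\s\s+/#
let ledgerLine = Regex {
    Capture(#/\w+/#, as: kind)
    fieldBreak
    Capture(#/\S+/#, as: date)
    fieldBreak
    Capture(#/(?:(?!\s\s).)+/#, as: account) // account names may contain single spaces
    fieldBreak
    Capture(#/.*/#, as: amount)
}

// References of a non-Substring type pair with a transforming capture.
let dayMonth = Regex {
    TryCapture(#/\d+/#, as: day) { Int($0) }
    "/"
    TryCapture(#/\d+/#, as: month) { Int($0) }
}

let line = "DEBIT  03/03/2022  Totally Legit Shell Corp  $2,000,000.00"
if let match = line.firstMatch(of: ledgerLine) {
    print(match[kind], match[date], match[account], match[amount])
}
if let match = "03/04".firstMatch(of: dayMonth) {
    print(match[day], match[month])
}
```

The obvious cost is one pair of overloads per supported tuple size, which is the same kind of boilerplate the builder already accepts elsewhere.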
Something along these lines could make named captures in builder DSL regexes much more tolerable.
Something to consider in the revision?