SE-0351: Regex Builder DSL

Appending to this review thread (apologies, not sure where best to post this):

Working out this example in another review:

let kind = Reference(Substring.self)  // 🫤
let date = Reference(Substring.self)
let account = Reference(Substring.self)
let amount = Reference(Substring.self)

let regex = Regex {
  // Match a line of the format, e.g., "DEBIT  03/03/2022  Totally Legit Shell Corp  $2,000,000.00"
  let fieldBreak = /\s\s+/
  Capture(/\w+/,               as: kind);    fieldBreak
  Capture(/\S+/,               as: date);    fieldBreak
  Capture(/(?:(?!\s\s).)+/,     as: account); fieldBreak  // Note that account names may contain spaces. (Whitespace inside a bare /…/ literal is matched literally, so none is used here.)
  Capture(/.*/,                as: amount)
}

…made me realize that the builder DSL presents an ugly tradeoff:

  1. either tolerate verbose repetition of Reference(Substring.self), or
  2. do away with the legibility and safety of named captures.

Positional captures are something we’d want to discourage: they not only harm readability but also make code more brittle, because positions can shift (e.g. capture 2 becomes capture 3, and code far below that still assumes it is 2 is now broken, oops).


It would be nice if Substring were the default Capture for a no-args Reference initializer:

let kind = Reference()  // 😐 better
let date = Reference()
let account = Reference()
let amount = Reference()

Perhaps it would even be worth providing a method, overloaded for various tuple sizes, that returns a tuple of references of the same type, with Substring as the default type:

let (kind, date, account, amount) = Regex.references()   // 🙂 not so bad! (all Substring captures)

let (startDate, endDate) = Regex.references(Date.self)  // multiple references of a different type

Something along these lines could make named captures in builder DSL regexes much more tolerable.
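
For what it's worth, here is a rough sketch of how this could look as plain helper functions, assuming a Swift 5.7+ toolchain. The `references` functions are hypothetical and not part of RegexBuilder or the proposal; I've written them as free functions rather than `Regex.references()` to sidestep the question of how `Regex`'s generic `Output` would be inferred at the call site. The tuple pattern on the left-hand side picks the overload. The regex itself is a simplified all-builder variant of the example above.

```swift
import RegexBuilder

// Hypothetical helpers (not part of RegexBuilder): one overload per tuple size.
func references() -> (Reference<Substring>, Reference<Substring>) {
  (Reference(Substring.self), Reference(Substring.self))
}

func references() -> (Reference<Substring>, Reference<Substring>,
                      Reference<Substring>, Reference<Substring>) {
  (Reference(Substring.self), Reference(Substring.self),
   Reference(Substring.self), Reference(Substring.self))
}

// Same idea for an explicit capture type, e.g. references(Date.self).
func references<T>(_ type: T.Type) -> (Reference<T>, Reference<T>) {
  (Reference(type), Reference(type))
}

// The four-element pattern selects the four-reference overload.
let (kind, date, account, amount) = references()

// Two or more whitespace characters separate the fields.
let fieldBreak = Repeat(2...) { CharacterClass.whitespace }

let line = Regex {
  Capture(as: kind) { OneOrMore(.word) }
  fieldBreak
  Capture(as: date) { OneOrMore(CharacterClass.whitespace.inverted) }
  fieldBreak
  // Reluctant, so the account name stops at the first field break.
  Capture(as: account) { OneOrMore(.any, .reluctant) }
  fieldBreak
  Capture(as: amount) { OneOrMore(.any) }
}

let input = "DEBIT  03/03/2022  Totally Legit Shell Corp  $2,000,000.00"
let match = input.wholeMatch(of: line)

if let match {
  print(match[kind], match[date], match[account], match[amount])
}
```

The call sites stay as short as in the snippets above, and each capture is still read back by name via `match[kind]` and friends rather than by position.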

Something to consider in the revision?
