SE-0354 (Second Review): Regex Literals

It occurs to me that another line of argument is that Swift simply should not support extended mode at all. Once again, I am musing, not necessarily advocating. The argument is that the concise literal syntax is best for short regexes, and any regex that does not fit on a single line should use the builder DSL to break it across multiple lines.

Wondering how this plays out, I tried translating @hamishknight’s example from above:

…into a builder DSL expression with a similar spirit of formatting:

let kind = Reference(Substring.self)
let date = Reference(Substring.self)
let account = Reference(Substring.self)
let amount = Reference(Substring.self)

let regex = Regex {
  // Match a line of the format e.g "DEBIT  03/03/2022  Totally Legit Shell Corp  $2,000,000.00"
  let fieldBreak = /\s\s+/
  Capture(/\w+/,               as: kind);    fieldBreak
  Capture(/\S+/,               as: date);    fieldBreak
  Capture(/(?:(?!\s\s).)+/,    as: account); fieldBreak  // Note that account names may contain spaces.
  Capture(/.*/,                as: amount)
}

Is that compelling enough to dispense with extended mode altogether? I’m not sure.

The repetition of Reference(Substring.self) is certainly unsatisfying, and makes me wish again for the DSL to support named capture groups as tuple labels to parallel the behavior of literals. (One day, hopefully!)
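To make the comparison concrete, here is a minimal sketch of the two ways of reading a capture by name as I understand them: a `Reference` subscript on the builder side, and a tuple label on the literal side. The two-field pattern, the sample string, and the names `kindRef`, `dateRef`, `built`, and `literal` are all made up for illustration. I use `#/…/#` delimiters only so the snippet does not depend on the bare-slash compiler setting.

```swift
import RegexBuilder

// Hypothetical two-field sample, shortened from the bank-statement example.
let sample = "DEBIT  03/03/2022"

// Builder DSL: each named capture needs its own Reference declared up front.
let kindRef = Reference(Substring.self)
let dateRef = Reference(Substring.self)
let built = Regex {
  Capture(#/\w+/#, as: kindRef)
  #/\s\s+/#
  Capture(#/\S+/#, as: dateRef)
}

// Literal: named groups become tuple labels on the match output.
let literal = #/(?<kind>\w+)\s\s+(?<date>\S+)/#

if let m = sample.firstMatch(of: built) {
  print(m[kindRef], m[dateRef])  // looked up through the references
}
if let m = sample.firstMatch(of: literal) {
  print(m.kind, m.date)          // looked up through tuple labels
}
```

The literal gets its names for free from the pattern itself, while the builder needs one extra declaration per capture, which is the repetition I am complaining about.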

If we’re willing to dispense with the clarity and safety of named capture groups, the DSL builder version isn't such a bad alternative to extended mode:

let regex = Regex {
  // Match a line of the format e.g "DEBIT  03/03/2022  Totally Legit Shell Corp  $2,000,000.00"
  let fieldBreak = /\s\s+/
  Capture(/\w+/); fieldBreak             // kind
  Capture(/\S+/); fieldBreak             // date
  Capture(/(?:(?!\s\s).)+/); fieldBreak  // account (Note that account names may contain spaces.)
  Capture(/.*/)                          // amount
}
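
What giving up the names costs shows at the use site, where the fields come back by position. A rough sketch, assuming the sample line from the comment above and again using `#/…/#` delimiters so it does not depend on the bare-slash setting (`fieldBreak` is hoisted out of the builder here purely for convenience):

```swift
import RegexBuilder

let fieldBreak = #/\s\s+/#
let regex = Regex {
  Capture(#/\w+/#); fieldBreak             // kind
  Capture(#/\S+/#); fieldBreak             // date
  Capture(#/(?:(?!\s\s).)+/#); fieldBreak  // account
  Capture(#/.*/#)                          // amount
}

let line = "DEBIT  03/03/2022  Totally Legit Shell Corp  $2,000,000.00"
if let match = line.firstMatch(of: regex) {
  // Element 0 is the whole match; captures follow in declaration order.
  let (_, kind, date, account, amount) = match.output
  print(kind, date, account, amount)

  // The same fields by index. Nothing stops a reader from confusing .2 and .3.
  print(match.3)
}
```

The names now live only in comments and in whatever local variables the caller chooses, so the compiler cannot catch a swapped index.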

I’d say that the builder is an improvement for my own multiline example from above, although it's probably less representative of common usage than Hamish’s example:

 #/
     (
         hello        # morning
         |
         good night   # evening  (this and only this space character is preserved)
     )
     (
         ,\s+
         every
         (body|one)
     )?
/#

Regex {
	ChoiceOf {
		"hello"       // morning
		"good night"  // evening  (no special handling of space character necessary)
	}
	}
	Optionally {
		/,\s+/
		"every"
		/body|one/
	}
}
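
As a quick sanity check on the builder version, here is a small sketch that runs it against a few inputs. The name `greeting` and the sample strings are mine, and the embedded literals use `#/…/#` so the snippet does not depend on the bare-slash setting.

```swift
import RegexBuilder

let greeting = Regex {
  ChoiceOf {
    "hello"
    "good night"   // the space is ordinary string content here
  }
  Optionally {
    #/,\s+/#
    "every"
    #/body|one/#
  }
}

// wholeMatch(of:) succeeds only if the entire string matches.
for candidate in ["hello", "good night, everyone", "hello everybody"] {
  let matched = candidate.wholeMatch(of: greeting) != nil
  print(candidate, matched)
}
```

I would expect the first two inputs to match and the third to fail, since the optional tail requires the comma.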

Perhaps multiline / extended mode won’t pull its weight as a feature in Swift? I’m not sure I’ve convinced myself here, but it’s worth considering the question.
