You'd need to remove the whitespace inside those regexes, so you'd have:
let kind = Reference(Substring.self)
let date = Reference(Substring.self)
let account = Reference(Substring.self)
let amount = Reference(Substring.self)
let regex = Regex {
  // Match a line of the format, e.g., "DEBIT  03/03/2022  Totally Legit Shell Corp  $2,000,000.00"
  // (fields are separated by two or more whitespace characters).
  let fieldBreak = /\s\s+/
  Capture(/\w+/, as: kind); fieldBreak
  Capture(/\S+/, as: date); fieldBreak
  Capture(/(?:(?!\s\s).)+/, as: account); fieldBreak // Note that account names may contain spaces.
  Capture(/.*/, as: amount)
}
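To make the behavior concrete, here is a minimal sketch of running that builder against one sample line and reading the fields back through the references. It assumes a toolchain where RegexBuilder is available. The `#/.../#` delimiters are my substitution so the snippet does not depend on bare-slash literals being enabled, and the `line` constant is made-up sample input with fields separated by two spaces.

```swift
import RegexBuilder

let kind = Reference(Substring.self)
let date = Reference(Substring.self)
let account = Reference(Substring.self)
let amount = Reference(Substring.self)

let regex = Regex {
    // Fields are separated by two or more whitespace characters.
    let fieldBreak = #/\s\s+/#
    Capture(#/\w+/#, as: kind); fieldBreak
    Capture(#/\S+/#, as: date); fieldBreak
    // Consume characters until the next run of two whitespace characters,
    // so single spaces inside the account name are kept.
    Capture(#/(?:(?!\s\s).)+/#, as: account); fieldBreak
    Capture(#/.*/#, as: amount)
}

// Hypothetical sample input, not taken from the thread.
let line = "DEBIT  03/03/2022  Totally Legit Shell Corp  $2,000,000.00"

if let match = line.wholeMatch(of: regex) {
    // Each reference indexes the match and yields the captured Substring.
    print(match[kind], match[date], match[account], match[amount], separator: " | ")
}
```

Because whitespace inside each single-line literal is semantic here, adding a stray space to any of them would change what is matched, which is why it has to be removed.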
Are you envisioning the scenario where a multi-line regex treats contained newlines as verbatim content, or would they be outright forbidden?
Similarly, what does a newline in a literal with semantic whitespace entail? Verbatim treatment or error? What about spaces around the next line?
Syntactic options are a little different in practice from semantic options, even though they use the same mechanism in traditional regex syntax. (Regex syntax conflates things that the builders treat orthogonally or via API.)
The i option would preferably be spelled as regex.ignoresCase(), which extends well to structured builders. E.g., string literals are verbatim by default, but you could add that to ignore case for just that component. (Assuming we want the API directly on String; otherwise it might be spelled "literal content".regex.ignoresCase().)
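As a rough sketch of what per-component case insensitivity could look like in a builder, assuming ignoresCase() ends up on Regex as spelled above and that String exposes a regex property (the "id=" and "abc" literals are invented for illustration):

```swift
import RegexBuilder

let tagged = Regex {
    // Case-insensitive for this one component only.
    "id=".regex.ignoresCase()
    // Still a verbatim, case-sensitive string literal.
    "abc"
}

let matchesMixedPrefix = "ID=abc".wholeMatch(of: tagged) != nil
let matchesUppercaseBody = "ID=ABC".wholeMatch(of: tagged) != nil
print(matchesMixedPrefix, matchesUppercaseBody)
```

The modifier is scoped to the component it is called on, so the second literal keeps its default verbatim behavior. That is the property that makes this spelling compose better than a whole-regex flag.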
Ignoring whitespace could be a modifier, but that implies a semantic change. E.g., it seems like /abc/.ignoringWhitespace() intends to match the input "a b\r\nc".
This is in [Pitch] Unicode for String Processing
@nnnnnnnn any update or thoughts here?