While I'm lightly (but not passionately) in favor of respecting precedent from other languages and making /…/ parse, I'm really uncomfortable with making #/…/# the only regex literal syntax. It's just…uuuuugly.
Regexes are already a nasty symbol soup, and more noise doesn't help readability. You may not love this:
/[+-]?\d(\.\d+)/
…but you'll have a hard time convincing me that this is an improvement:
#/[+-]?\d(\.\d+)/#
I can just imagine explaining that to students: “No, no, both # and / are delimiters, whereas all those other symbols are part of the Swift syntax….” Regex literal syntax is daunting enough as it is.
(An aside: several comments mention #/…#/ instead of #/…/#. Surely you mean the latter, not the unbalanced former?! Keep in mind that the Swift extended string syntax is #"…"#. That is a syntactic precedent to strictly respect.)
The extra noise is especially bothersome given the proposal’s appealing idea that regex literals might be used to concisely express small molecules in a larger DSL-based regex. In such usage, regex literals are small, and the signal-to-noise reduction of the bigger delimiter is thus significant. The version on the right is a significant regression to my eye:
let regex = Regex {                   let regex = Regex {
  Capture { /[$£]/ }                    Capture { #/[$£]/# }
  TryCapture {                          TryCapture {
    /\d+/                                 #/\d+/#
    "."                                   "."
    /\d{2}/                               #/\d{2}/#
  } transform: {                        } transform: {
    Amount(twoDecimalPlaces: $0)          Amount(twoDecimalPlaces: $0)
  }                                     }
}                                     }
Bleah. If we're introducing that much noise, let’s at least make it expressive noise (keeping in mind Doug’s concerns upthread about #regex(…), which are compelling):
let regex = Regex {
  Capture { #re"[$£]" }
  TryCapture {
    #re"\d+"
    "."
    #re"\d{2}"
  } transform: {
    Amount(twoDecimalPlaces: $0)
  }
}
Again, I am personally in favor of allowing /…/. Parsing concerns seem more fear-based than evidence-based. Some syntactic familiarity is always nice to offer language newcomers. But if community sentiment runs squarely against it, I’d argue in favor of a new pitch thread devoted to finding a better delimiter than #/…/#. That just seems to me like syntactic salt (or maybe syntactic Bitrex) against using regex literals at all.
While syntax always steals the show, I'd like to reiterate my concerns over some other things that IMO deserve a little more discussion than they've received:
- Mismatch in optionality between literals and the DSL for nested capture groups:
  /(.)*|\d/
  → match type of (Substring, Substring?)
  …but IIUC…
  ChoiceOf {
    // supposed to be equiv to above; don't know DSL well yet; making up details
    Capture { ZeroOrMore(.any) }
    .digit
  }
  → match type of (Substring, Substring??)
- The hidden change in the meaning of all whitespace when #/…/# contains a newline:
  #/ (foo|bar)(d|f|t) /#
  matches " foot "
  …but IIUC…
  #/
  (foo|bar)
  (d|f|t)
  /#
  does not match " foot "
  …which seems to me like a footgun.
  Edit: whitespace in the middle makes an even more compelling example of this problem:
  #/hello world/#
  matches "hello world"
  #/
  hello world
  /#
  does not match "hello world"