SE-0354 (Second Review): Regex Literals

I am a long-time regex user, and have happily made use in Ruby of multi-line regex literals in extended mode (mostly for the ability to add comments). As such, I'm sympathetic to this line of argument. However, I don’t find it as open-and-shut as this.

As someone who has both seen and authored multiline extended mode regexes in the wild, this feels like a leap to me too. I think it is plausible — likely, even — that general usage ends up following this pattern in practice:

…especially given the ability to mix small, single-line regex literals into a larger builder superstructure, as @AliSoftware discussed.
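
To make that pattern concrete, here is a rough sketch of what I have in mind, not anything taken from the proposal text. The name `logLine` and the sample input are invented for illustration, and I use the `#/…/#` delimiters only so the sketch does not depend on the bare-slash compiler flag:

```swift
import RegexBuilder

// Hypothetical example: parse a line such as "2022-05-17 ERROR disk full".
// Each piece is a small single-line literal; the builder supplies the
// structure and a natural place for comments.
let logLine = Regex {
    Capture { #/\d{4}-\d{2}-\d{2}/# }   // date
    " "
    Capture { #/[A-Z]+/# }              // severity level
    " "
    Capture { #/.+/# }                  // free-form message
}

if let match = "2022-05-17 ERROR disk full".wholeMatch(of: logLine) {
    // Captures come back positionally, after the whole-match element.
    let (_, date, level, message) = match.output
    print(date, level, message)
}
```

The comments here do the job that extended mode comments would do in a single multi-line literal, which is part of why I suspect the builder form could absorb much of that use case.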


Extended mode regex literals are likely to end up being one of those things that is present in the language, but rarely used. A few adherents use them and love them; most teams discourage them with style guides and linters and social consensus, if they are aware of them at all.

(Please note, again, that their utility in other languages is not evidence that they will be equally useful in Swift, given the presence of features those other languages do not have.)

If this speculation about usage is correct, then in that hypothetical world, extended mode multiline regexes take on one of two roles:

  1. They might be a curious side pocket, like differentiable functions or #dsohandle or @usableFromInline: valuable to a few people, occasionally vexing to those who encounter them in the wild without context, but essentially invisible and harmless to all but those few developers actively seeking the feature.
  2. They might be a legacy albatross, like AnyObject dispatch: something that adds language surface so as to trip up people not even intending to use the feature, and adds bulk and bugs and corner cases that increase the maintenance burden of the language, but is impossible to remove because too much code relies on it.

Something like @xwu’s suggestion of an explicit (?x) mitigates the “accidental syntax” risk of scenario 2, but increases the “maintenance burden” part of that scenario. All the wrangling over parsing and whitespace upstream here is dancing with similar tradeoffs.

Ultimately, I trust the judgement of the core team and the other language maintainers on whether we’re risking scenario 1 or 2 here. It is worth pointing out, however, that they don’t need to make that decision stuck with nothing better than the speculation in this review thread:

The new information we could acquire is data to replace speculation about actual usage in practice. That’s not nothing.


A question for @Michael_Ilseman:

What features, precisely, are “currently inexpressible in the builders” if builders can contain single-line, non-extended-mode regex literals?

The only one I’m aware of is named captures becoming tuple labels. That is a serious builder shortcoming, and one I would certainly hope to see addressed (or at least mitigated!) regardless of the availability of extended mode.
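
For anyone who has not run into this, a small sketch of the difference as I understand it. The names `literal`, `built`, and `year` are mine, and `Reference` is just the closest builder-side stand-in for a capture name that I know of, not a claim about how the gap should be closed:

```swift
import RegexBuilder

// Literal: the named groups surface as labels on the output tuple.
let literal = #/(?<year>\d{4})-(?<month>\d{2})/#
if let m = "2022-05".wholeMatch(of: literal) {
    print(m.output.year, m.output.month)
}

// Builder: captures are positional. A Reference lets one capture be
// looked up by a Swift name, but the tuple itself stays unlabeled.
let year = Reference(Substring.self)
let built = Regex {
    Capture(as: year) { #/\d{4}/# }
    "-"
    Capture { #/\d{2}/# }
}
if let m = "2022-05".wholeMatch(of: built) {
    print(m[year], m.output.2)   // month is only reachable by position
}
```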

Are there any other features that would be inexpressible without extended mode? That seems important.
