SE-0354: Regex Literals

Douglas_Gregor · May 2, 2022, 10:26pm

I'm +1 on the proposal as written. Lots more commentary below.

This is really the fundamental question for this proposal. Aside from two minor things called out in the proposal (multiple layers of optionality, named captures), everything this proposal does is expressible via regex builders. Regex builders are clear and expressive, and I can absolutely see myself wanting to use them for complicated regular expressions. But they are really quite verbose.

We also need runtime construction of regexes, because people will want to take regexes as inputs. The syntax for these is effectively settled outside of Swift, so arguments that regex syntax is bad and therefore we shouldn't do anything but regex builders in Swift don't make sense to me. Now, these regexes are quite concise, sometimes to a fault (especially for big regexes), but for simple matches they are great, and online references for regexes are plentiful.

That leaves a gap between "type safe and expressive but verbose" and "not type safe but concise", and we don't want to make this a choice between "verbose" and "not type safe." Hence, this proposal to add regex literals as the in-between that is both concise and type-safe. Starting from the concise runtime regex syntax and giving it strong type information is absolutely the right approach to fill that gap.
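To make the three options concrete, here is a minimal sketch assuming Swift 5.7 or later. The pattern and the `input` string are made up for illustration, and the literal uses the extended `#/.../#` delimiters so it compiles without any extra flag.

```swift
import RegexBuilder

let input = "id: 42"

// 1. Regex builder: type safe and expressive, but verbose.
let built = Regex {
    "id: "
    Capture { OneOrMore(.digit) }
}

// 2. Runtime construction from a string: concise, but the pattern is only
//    checked at run time and the captures are type-erased.
let dynamic: Regex<AnyRegexOutput> = try Regex(#"id: (\d+)"#)

// 3. Regex literal: concise, checked at compile time, with typed captures.
let literal = #/id: (\d+)/#

let fromBuilder = input.wholeMatch(of: built)?.1
let fromDynamic = input.wholeMatch(of: dynamic)?.output[1].substring
let fromLiteral = input.wholeMatch(of: literal)?.1

print(fromBuilder ?? "", fromDynamic ?? "", fromLiteral ?? "")
```

All three match the same text; the difference is how much you write and how much the compiler knows about the captures.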

Really, the only point of discussion here is the delimiters. The proposal suggests /.../, which has precedent in Perl, JavaScript, and Ruby, as well as command-line tools like sed. /.../ also extends to raw literals #/.../# and multi-line literals in a natural way, echoing raw string literals.
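As a rough illustration of how the delimiters extend, the sketch below uses the extended and multi-line forms as I understand them from the proposal. The patterns and sample strings are invented for the example.

```swift
// Extended delimiters: forward slashes inside the pattern need no escaping,
// much like quotes inside a #"..."# raw string.
let pathRegex = #/usr/(lib|bin)/(\w+)/#
let pathMatch = "/usr/bin/swift".firstMatch(of: pathRegex)

// Multi-line form: whitespace in the pattern is not significant, so a longer
// regex can be spaced out. Named captures become labeled tuple elements.
let keyValue = #/
  (?<key> \w+ ) \s* = \s* (?<value> .+ )
/#
let pair = "name = Doug".wholeMatch(of: keyValue)

print(pathMatch?.1 ?? "", pathMatch?.2 ?? "")
print(pair?.key ?? "", pair?.value ?? "")
```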

As for alternatives, there aren't that many that make sense. Most-discussed here is #regex(...). It does have the advantage of implying that the result will be a Regex. But it doesn't eliminate the need to bake regex syntax into the language, and it doesn't make regex syntax easier to understand. #regex(...) doesn't fit well with other things that share its syntactic form, like #selector(...) or #available(...), because there the ... is always delimiter-balanced, which regex literals aren't. The suggestion for #regex/.../ sorta addresses that, but now it's even more different from #selector(...) et al. And unlike the proposed /.../ syntax, #regex(...) also doesn't adapt to raw and multi-line literals as well: would we use #regex#(...#)? #regex(#/.../#)?

If not for the source-compatibility issue with /.../, I don't think we would be discussing #regex(...). Aside from "it has regex in the name", it's worse than the proposed /.../ in almost every way. And we don't really have other great alternatives on the table.

So, let's talk about source compatibility.

Swift tries to be forward-looking: we decide where we want to be, then figure out how to get there, and when.

Sometimes there's no way to get there, and we have to go back and try a new design. The /.../ syntax is not such a huge source break that we cannot ever get there. We're working toward Swift 6, which has already queued up some source-breaking changes (for trailing closures, any, #file). Against that backdrop, it is easy to justify the narrow source break that comes from /.../ literals. So if we're to argue against /.../ due to the source-compatibility issues, at most it's an argument not to permit /.../ in Swift 5.x mode. It's not an argument for another, second-best syntax.
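For a sense of how narrow the break is, here is a sketch of one case along the lines of those the proposal describes, as I read it. The `combine` function is a hypothetical helper, not something from the proposal or the standard library.

```swift
// Hypothetical helper that takes two binary operators.
func combine(_ f: (Int, Int) -> Int, _ g: (Int, Int) -> Int) -> Int {
    g(f(12, 3), 2)
}

// Without bare-slash regex literals, each `/` below is the division operator:
//
//     combine(/, /)
//
// With bare-slash literals enabled, the text between the two slashes would
// instead be read as a regex literal, so that call no longer compiles.
// Parenthesizing the operator references is unambiguous in both modes.
let result = combine((/), (/))
print(result)
```

Code that passes `/` around as a bare operator reference like this is rare, which is consistent with the small number of failures reported.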

We've also made source-breaking changes within Swift 5.x. The introduction of the await keyword just this last year was source-breaking, because one could previously have defined a function named await and called it with await(1, 2). That didn't prevent us from taking the syntax we wanted for async/await, even though, based on the fixes I ended up doing personally, I suspect it caused more failures in practice than the 16/2968 failures reported for /.../. We didn't even stage that change in with a compiler flag; there was just a warning in Swift 5.4 that said "this is going to break" before we broke it in Swift 5.5 six months later.

The proposed /.../ source break is gentler than what we did with await, because it's under the control of a flag. In the run-up to Swift 6, we should be embracing this approach wholeheartedly, such that each Swift 6 source break has a flag associated with it so folks can nudge their Swift 5.x code along toward Swift 6 incrementally, gaining the benefits that came with each of these changes. I have a design in mind for this that I'll bring up in another discussion.
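As a sketch of what opting in per package can look like: the manifest below assumes a toolchain with Swift tools version 5.8 or later, where SwiftPM gained `enableUpcomingFeature`, which postdates this review. The package and target names are hypothetical. On Swift 5.7 the proposal's own compiler flag, -enable-bare-slash-regex, plays the same role.

```swift
// swift-tools-version:5.8
import PackageDescription

let package = Package(
    name: "RegexDemo", // hypothetical package name
    targets: [
        .executableTarget(
            name: "RegexDemo",
            swiftSettings: [
                // Opt this target into bare /.../ regex literals while
                // otherwise staying in Swift 5 language mode.
                .enableUpcomingFeature("BareSlashRegexLiterals")
            ]
        )
    ]
)
```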

Swift's source stability has gotten massively better since the turbulent days of Swift 1-4, but it's not an absolute. I hope it never becomes an absolute, because that would lead us into bad long-term decisions.

I consider the intense focus on source compatibility in this review to be overblown. If #regex(...) is the better syntax, argue that without reference to source compatibility, and give it the same level of in-depth design that the authors have provided for /.../. I've thought about #regex(...) and found enough holes in the design (noted above) that it's a very distant second choice to me.

Doug