SE-0354: Regex Literals

ensan-hcl · May 2, 2022, 6:01am

First, thank you for addressing my comment in the proposal. I appreciate it.

However, since now you've clearly stated that regex literals will be equipped with named captures, I must put -1 on the whole proposal. I prefer not to introduce regex literals after all, rather than supporting an attractive feature only with horrible syntax.

Without regex literals, we still have Regex DSL and String based API. I believe that's enough, and I also think they can be even better than regex literals; they accept interpolation and concatenation, which regex literals do not support. I think it would be good to see for a while how inconvenient it is with just the String based API and the Regex DSL.
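To make the interpolation and concatenation point concrete, here is a minimal sketch of the String based API, assuming the `Regex(_:)` initializer that takes a pattern string and yields type-erased captures. The helper names `dimensionRegex(unit:)` and `firstDimension(in:unit:)` are invented for illustration only.

```swift
// Hypothetical helper: builds a pattern at run time by concatenating strings.
func dimensionRegex(unit: String) throws -> Regex<AnyRegexOutput> {
    // The raw string keeps the backslash literal; the unit is appended at run time.
    // Note: `unit` is not escaped, so regex metacharacters in it would be interpreted.
    let pattern = #"(\d+)"# + unit
    return try Regex(pattern) // throws if the pattern is malformed
}

// Hypothetical helper: returns the digits in front of the first "<number><unit>".
func firstDimension(in text: String, unit: String) throws -> Substring? {
    let regex = try dimensionRegex(unit: unit)
    // Captures are type-erased, so they are looked up by index at run time.
    return text.firstMatch(of: regex)?.output[1].substring
}
```

The trade-off is that a mistake in the pattern only shows up when the initializer throws, not at compile time.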

Panajev · May 2, 2022, 6:27am

Agree with this. It has not been convincingly explained why the familiarity of a bare syntax like /…/, versus regex() or one of the other alternatives provided, warrants big changes in the language and implementation complexity.

If we were to have almost 100% JavaScript or almost 100% Perl regex syntax compatibility I could see it, but if we deviate a few inches that argument frankly goes out of the window.

Panajev · May 2, 2022, 6:39am

Sorry John, but in this case you are talking around the point made in the post you replied to.

The proposal author can also focus on the substance of the proposal and use a different solution for the delimiter, like #/…/#, that avoids the complexity of the /…/ syntax it spends time talking about. It is the “the delimiter is not the important part of the proposal, stop focusing on it” put against the “No, the delimiter must stay what it is, consequences and all” that seems odd. It is becoming a Schrödinger's-cat-like argument: the delimiter is important and not important at the same time, depending on the argument we want to make. No offence.

s-k · May 2, 2022, 7:33am

While I am OK with adding a Regex literal to Swift, I am strongly against using the /.../ syntax for the following reasons:

I think it could be very hard for people new to programming (in Swift) to know what they are looking at when coming across such a literal. This syntax also isn't very googleable.
I strongly prefer simple solutions over easy ones. It may be easy for someone coming from another language to recognize such literals. (However, I would argue that (at most) 5 minutes of googling for something like #regex would clear up the situation.) On the other hand, judging from the proposal, this syntax brings so much complexity to the compiler that it is not worth it.
The only reason given in the proposal for this syntax is: "Other languages do it this way." That is, in my opinion, a very weak reason for the drawbacks it brings. And again: In 5 minutes, anyone can get accustomed to another syntax. Swift has, on many occasions, opted to drop a bad design used in many other languages.
While it is not stated in the proposal, maybe one reason for proposing this syntax is its brevity. While I don't see this as a convincing reason, adding one more character (for the syntax r"...") would alleviate most of the drawbacks.
I am a professional programmer that reasonably often works with string processing. However, I use Regexes only about once a year or even less frequently. Also, Swift has thrived for the past eight years without a Regex literal. To me, this signifies that they are not vitally important and are not worth bringing so much complexity to the language.

michelf · May 2, 2022, 10:08am

On the topic of syntax highlighting and /.../, I suppose one issue will be the lack of context. For a while we'll have two dialects (Swift 5 and 6) with different highlighting rules. In general syntax highlighters don't have access to compiler flags, so they'll just have to pick one and mis-highlight the rest. In some cases I suppose users could tag code snippets as swift5 or swift6, but that won't work when the language is derived from the file extension.

Maybe the whole issue is blown out of proportion and it does not really matter if files are occasionally mis-highlighted. But whenever it happens, it reflects badly on whatever tool you're using, and to some extent the language itself.

masters3d · May 2, 2022, 3:41pm

There is precedent for syntax that starts with #: raw, multiline strings. Could we have an assessment of what the disadvantages are if we only support the extended regex literal syntax with #? To me the bare / is syntax sugar for the extended version.

Perhaps punt the bare / literal to a future proposal so we can make forward progress.

Nevin · May 2, 2022, 4:06pm

I do not have the time or inclination to go through all these regex proposals in detail.

However, I am deeply, deeply opposed to Perl-style regular expressions. They are fundamentally incompatible with Swift’s goal of clarity at the point of use.

Swift is an opinionated language, and it has chosen clarity. Perl-style regex literals are antithetical to that. We cannot, must not, absolutely should not sacrifice Swift’s clarity for them.

Perl-style regular expressions are a terrible syntax. They are illegible, incomprehensible, and aggressively unwelcoming. We need to proactively and vigilantly ensure that they do not enter the Swift language.

Douglas_Gregor · May 2, 2022, 10:26pm

I'm +1 on the proposal as written. Lots more commentary below.

This is really the fundamental question for this proposal. Aside from two minor things called out in the proposal (multiple layers of optionality, named captures), everything this proposal does is expressible via regex builders. Regex builders are clear and expressive, and I can absolutely see myself wanting to use them for complicated regular expressions. But they are really quite verbose.

We also need runtime construction of regexes, because people will want to take regexes as inputs. The syntax for these is effectively settled outside of Swift, so arguments that regex syntax is bad and therefore we shouldn't do anything but regex builders in Swift don't make sense to me. Now, these regexes are quite concise, sometimes to a fault (especially for big regexes), but for simple matches they are great, and online references for regexes are plentiful.

That leaves a gap between "type safe and expressive but verbose" and "not type safe but concise", and we don't want to make this a choice between "verbose" and "not type safe." Hence, this proposal to add regex literals as the in-between that is both concise and type-safe. Starting from the concise runtime regex syntax and giving it strong type information is absolutely the right approach to fill that gap.
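As a rough illustration of the three options being compared, the sketch below matches the same "digits-digits" input with a regex builder, a run-time string, and a literal. It uses the extended `#/.../#` delimiters only so that it does not depend on the bare-slash flag; the function names are invented for this example.

```swift
import RegexBuilder

// 1. Regex builder: strongly typed captures, but verbose.
func yearViaBuilder(_ s: String) -> Substring? {
    let regex = Regex {
        Capture { OneOrMore(.digit) }
        "-"
        Capture { OneOrMore(.digit) }
    }
    return s.wholeMatch(of: regex)?.output.1
}

// 2. Run-time string: concise, but captures are type-erased
//    and a malformed pattern is only reported when this runs.
func yearViaString(_ s: String) throws -> Substring? {
    let regex = try Regex(#"(\d+)-(\d+)"#)
    return s.wholeMatch(of: regex)?.output[1].substring
}

// 3. Literal: concise, checked at compile time, and the named
//    captures become labels on the output tuple.
func yearViaLiteral(_ s: String) -> Substring? {
    let regex = #/(?<year>\d+)-(?<month>\d+)/#
    return s.wholeMatch(of: regex)?.output.year
}
```

All three return the first capture for an input such as "2022-05"; only the literal gives both brevity and a typed, labeled result.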

Really, the only point of discussion here is the delimiters. The proposal suggests /.../, which is precedented in Perl, JavaScript, and Ruby, as well as command-line tools like sed. /.../ also extends out to raw literals #/.../# and multi-line literals in a natural way, echoing raw string literals.

As for alternatives, there aren't that many that make sense. Most-discussed here is #regex(...). It does have the advantage of implying that the result will be a Regex. It's not eliminating the need to bake regex syntax into the language, or making regex syntax easier to understand. #regex(...) doesn't fit well with other things that share its syntactic form, like #selector(...) or #available(...), because there the ... is always delimiter-balanced, which regex literals aren't. The suggestion for #regex/.../ sorta addresses that, but now it's even more different from #selector(...) et al. And unlike the proposed /.../ syntax, #regex(...) also doesn't adapt to raw and multi-line literals as well: would we use #regex#(...#)? #regex(#/...#/)?

If not for the source-compatibility issue with /.../, I don't think we would be discussing #regex(...). Aside from "it has regex in the name", it's worse than the proposed /.../ in almost every way. And we don't really have other great alternatives on the table.

So, let's talk about source compatibility.

Swift tries to be forward-looking: we decide where we want to be, then figure out how to get there, and when.

Sometimes there's no way to get there, and we have to go back and try a new design. The /.../ syntax is not such a huge source break that we cannot ever get there. We're working toward Swift 6, which has already queued up some source-breaking changes (for trailing closures, any, #file). Against that backdrop, it is easy to justify the narrow source break that comes from /.../ literals. So if we're to argue against /.../ due to the source-compatibility issues, at most it's an argument not to permit /.../ in Swift 5.x mode. It's not an argument for another, second-best syntax.

We've also made source-breaking changes within Swift 5.x. The introduction of the await keyword just this last year was source-breaking, because one could previously have defined a function named await and called it with await(1, 2). That didn't prevent us from taking the syntax we wanted for async/await, even though (based on the fixes I ended up doing personally), I suspect it caused more failures in practice than the 16/2968 failures reported for /.../. We didn't even stage that change in with a compiler flag; just a warning in Swift 5.4 that said "this is going to break" before we broke it in Swift 5.5 six months later.
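For readers who did not hit that break, here is a small sketch of the kind of code involved. The function is hypothetical, and escaping the name with backticks is shown as one way to keep such code compiling, not as a description of how any particular project was fixed.

```swift
// A hypothetical pre-concurrency helper that happened to be named `await`.
// Backticks force the name to be treated as an ordinary identifier.
func `await`(_ a: Int, _ b: Int) -> Int {
    return a + b
}

// The call site needs the same escaping; an unescaped `await(1, 2)`
// is the spelling that stopped meaning "call the function named await".
let sum = `await`(1, 2)
print(sum)
```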

The proposed /.../ source break is gentler than what we did with await, because it's under the control of a flag. In the run-up to Swift 6, we should be embracing this approach wholeheartedly, such that each Swift 6 source break has a flag associated with it so folks can nudge their Swift 5.x code along toward Swift 6 incrementally, gaining the benefits that came with each of these changes. I have a design in mind for this that I'll bring up in another discussion.

Swift's source stability has gotten massively better since the turbulent days of Swift 1-4, but it's not an absolute. I hope it never becomes an absolute, because that would lead us into bad long-term decisions.

I consider the intense focus on source compatibility in this review to be overblown. If #regex(...) is the better syntax, argue that without reference to source compatibility, and give it the same level of in-depth design that the authors have provided for /.../. I've thought about #regex(...) and found enough holes in the design (noted above) that it's a very distant second choice to me.

Doug

Karl · May 2, 2022, 10:40pm

It doesn't seem to be settled whether or not /.../ really is the best syntax, though. The proposal itself seems to suggest that other languages are moving towards user-defined literals instead (which have the clear benefit that less escaping is required). It could better articulate why adding /.../ literals is so compelling in the first place, instead of just, oh, "it's a term of art", and then later implying that it's actually more of an anachronism.

Even if source compatibility isn't an absolute, breaking it should require a clear, compelling reason IMO. If it really is worthwhile, there shouldn't be a problem making that case.

So "Swift is an opinionated language, and it has chosen clarity" can mean many things.

Swift also has result builders, so there is precedent for incorporating DSLs in the language. Regexes are a critically important text processing DSL.

You can make the argument that, just like SwiftUI view builders allow complex view hierarchies to be understood more easily, regexes allow complex text processing to be understood more easily. You have to compare the regex to the equivalent non-regex function to really judge how much the compact notation really helps/hurts understanding.

Douglas_Gregor · May 2, 2022, 10:49pm

Does any other syntax have such a compelling argument?

Literals in the language have specific syntactic forms. \d+, \d+[.]\d+, "...", [...], [... : ... ], { ... }, etc. If we agree that regex literals should be part of the language, they need a syntax. We could grab some other delimiter like |...|, but the arguments for it are basically the same as for /.../. I think that's why #regex(...) is popular to discuss as an alternative, because it has "regex" in the name. But its problems make it worse than picking a punctuation character as delimiter. And if we have to pick a punctuation character as delimiter, might as well make it the one that's well-precedented.

Doug

Karl · May 2, 2022, 10:55pm

My understanding from the proposal is that extended literals with #/.../# do not have the same source compatibility concerns, but have the benefits that user-defined literals in other languages do - that you can avoid a lot of noisy escaping.

A fair few reviews so far seem to be of the opinion that extended literals alone would be enough, and that we don't need the bare syntax. I'm not able to find a compelling counter-argument in the proposal why we absolutely need those bare slashes, and why they are worth the cost of the source break (even though I agree source breaks can be justified).
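A small sketch of the escaping benefit, assuming the extended `#/.../#` delimiters as described in the proposal. The function name and the sample path are made up for illustration.

```swift
// Hypothetical helper: extracts the middle component of a "/usr/<name>/bin" path.
func middleComponent(of path: String) -> Substring? {
    // With extended delimiters, forward slashes inside the pattern need no
    // escaping. In the bare form each one would have to be written as `\/`.
    let regex = #/^/usr/(\w+)/bin$/#
    return path.firstMatch(of: regex)?.output.1
}
```

For a path-heavy pattern like this one, the extended form is arguably the more readable of the two spellings.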

Jumhyn · May 2, 2022, 10:56pm

I'm sympathetic to the #regex(...) syntax, but you and the proposal both do a good job IMO of outlining why it's probably not desirable. However, I think you've glossed over what I consider the best alternative—one whose benefits are already thoroughly discussed in the proposal: allow for the extended literals #/.../#, ##/.../##, etc., with no bare version.

In addition to allowing unescaped forward slashes (and unescaped /#, /##, etc. at deeper levels of nesting), I think the use of # offers a great indicator of "literal." It's not universal as you note, but it's heavily precedented throughout the language. In addition to #selector we have #colorLiteral, #imageLiteral, #keyPath, #file and friends... I personally have a strong mental association between "#" and "literal" when reading swift.

I still don't believe the proposal does a great job justifying why we need the super-terse bare form to be introduced at the same time as the extended literals.

xwu · May 2, 2022, 10:57pm

By and large, I think the tenor of the discussion is that it’s not that the arguments for are insufficiently compelling, it’s that the arguments against are outweighing the arguments for. Thus, another delimiter with the same arguments for it but not the same arguments against it would be strictly superior. (For example: '…'. Maybe |…| but I don’t think it’s been worked out as explicitly.)

allevato · May 2, 2022, 11:28pm

+1 on having regex literals in the first place, and –50 on having the bare /.../ syntax.

The two above comments essentially summarize my thoughts as well. As the proposal points out, grafting the bare /.../ syntax onto the language we have today introduces unnecessary complexity and removes features (certain custom operators) from the language and breaks a popular third-party package.

Given that #/.../# would provide the same capabilities with minimal added syntax compared to the proposed /.../ (in fact, it's strictly better, as it would allow unescaped forward slashes), and it has none of the limitations of that bare syntax with regard to arbitrarily closing off slices of the custom operator namespace, why even take the chance and then be stuck with it? I haven't seen the case be made yet that supporting only #/.../# instead of /.../ would be so onerous as to justify making a change that we can't easily reverse.

Most major programming languages do not have a built-in regular expression syntax like this. The ones that do were either designed with it in mind to begin with, or were able to add it without as much difficulty because they didn't have to deal with as many syntactic ambiguities. So I think the "term of art" argument doesn't hold up here, because there's close-to-zero chance that someone will come to Swift from some other randomly-picked language they used previously and assume that it does support /.../ or be confused if it uses something different like #/.../# as the syntax instead. I imagine the first thing most folks do when they need a regular expression is a web search for "how do regex in $LANG", and then once you know, you know.

The risk of doing bare /.../ now would seem to outweigh the minor advantage of shedding a couple other characters around the literal.

YOCKOW · May 3, 2022, 1:02am

digression

I don’t want to make sarcastic remarks, but it seems that members of the Core Team fall in with bare /…/ and others don’t.

Skeptics may suspect the Core Team had already decided something.

Les_Pruszynski · May 3, 2022, 1:16am

And they have all the right to do so.

scanon · May 3, 2022, 1:28am

Broadly I agree with the argument that Doug laid out, summarized here:

We need to have runtime construction of regexes and literals as well as the DSL. They fill very real holes in the usage model that the DSL cannot. Therefore the only subject for real discussion is the choice of delimiter.
Inasmuch as any delimiter is precedented from other languages and tools, that thing is /.../. It is not universal by any stretch, but it is vastly more common than any other option on the table.
I have a certain abstract fondness for #regex(...) or regex'...' for the reason of extensibility, but there are also very real drawbacks to both of those that ultimately make them non-starters for me.

The only alternative that I really take seriously is providing only #/.../# and not providing /.../ at all. The argument for this delimiter is that it does not require a source break. The question then is whether /.../ is enough better than #/.../# to justify the break.

As Doug noted, the source break is "small"; in particular, the only case in which the break is really concerning to me is the operator usage for enum case paths. Every other hypothetical example of breakage that I've seen is both contrived and has a trivial workaround. For enum case paths, there is a clear and correct solution: make them a first-class language feature.

So for me, the hurdle that /.../ has to clear is small. And it clears it, for the same reason that we would not accept spelling dictionary literals #[1:'a', 2:'b']#. This syntax is "fine", but there's an obviously better one that is readily available. "Swift is a pragmatic language," and not using the readily available better syntax is the opposite of a pragmatic decision. It is superficially pragmatic, but only in the very short term. There are infinitely more programs yet to be written in Swift than exist today, and we should eat this minor break for the sake of all of the programs still to come.

+1

rvsrvs · May 3, 2022, 1:54am

this would seem to me to argue for doing #/.../# only now and implementing the source break once the optics change has actually been implemented and maintainers of current code have had a chance to adopt it.

My fear is that the clear and correct solution doesn't get implemented for a long time or perhaps not at all.

scanon · May 3, 2022, 2:28am

isn't this what the flag is for?

woolsweater · May 3, 2022, 4:44am

That's not really so. The forward slash is extremely commonly used for other things. Most notably comments. Picking a delimiter that is not a valid operator character (admittedly difficult), or just one that is uncommonly used in source, say, «[a-z]+», would have a very different set of consequences.