[Returned for Revision] SE-0354: Regex Literals

Hi everyone,

The review of SE-0354: Regex Literals ran from April 28 to May 10. The core team has decided to return the proposal for revision while accepting in principle the need for a regex literal and the use of /.../ as the delimiter. A second review with this revision is open now.

The majority of discussion in the first review was regarding the choice of delimiter, and its impact on existing source – specifically due to removal of prefix / operators. During the review discussion, an alternative parsing rule was established that eliminated the need to remove these operators.

The additions to the proposal consist of two parts:

  • looking forward for unmatched closing parentheses within the regular expression, and only parsing the / as a regex if there are none. This resolves ambiguity such as f(x, /, y).reduce(/)
  • parsing / as an operator if there is no second / on the same line
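
As a rough illustration of the two rules above, here is a toy model in Swift. It is a simplified sketch, not the compiler's actual implementation: the function name `couldStartRegexLiteral` is hypothetical, and it only handles backslash escapes and parentheses, ignoring details such as character classes.

```swift
/// Toy model of the two disambiguation rules (illustrative only).
/// `rest` is the remainder of the source line immediately after a
/// candidate opening `/`.
func couldStartRegexLiteral(_ rest: String) -> Bool {
    var depth = 0        // currently open parentheses inside the candidate regex
    var escaped = false  // true when the previous character was a backslash
    for ch in rest {
        if escaped { escaped = false; continue }
        switch ch {
        case "\\":
            escaped = true
        case "(":
            depth += 1
        case ")":
            // Rule 1: an unmatched `)` before the closing `/` means operator.
            if depth == 0 { return false }
            depth -= 1
        case "/":
            // Found a second `/` on the same line with no unmatched `)`.
            return true
        default:
            break
        }
    }
    // Rule 2: no second `/` on this line, so treat the first `/` as an operator.
    return false
}

// First `/` in `f(x, /, y).reduce(/)`: an unmatched `)` comes before the next `/`.
let firstSlash = couldStartRegexLiteral(", y).reduce(/)")
// `/` in `x / 2`: no second `/` on the line.
let division = couldStartRegexLiteral(" 2")
// `/[a-z]+(\d)/`: balanced parentheses and a closing `/`.
let literal = couldStartRegexLiteral("[a-z]+(\\d)/")
print(firstSlash, division, literal)
```

Under this model the first two candidates are treated as operators and only the last one as a regex literal.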

Testing by the proposal authors indicates that several open-source packages that used those operators now compile cleanly with the 5.7 release branch when enabling regex literals.

Given this, the core team has decided to open a second round of review, with the new parsing rule, for further feedback. In particular, the core team would like this review to focus on other aspects of the proposal, such as multi-line non-semantic whitespace literals, and the typed capture behavior. Feedback on any unanticipated edge cases with the new parsing rule would also be appreciated.
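
For readers who have not tried these features, the following is a small sketch of what typed captures and a multi-line literal look like. It assumes a Swift 5.7 or later toolchain and uses the extended #/.../# delimiter, which compiles without enabling the bare-slash syntax. The pattern and the variable names are made up for illustration.

```swift
// Single-line literal with two unnamed captures.
// Its type is inferred as Regex<(Substring, Substring, Substring)>.
let simple = #/(\d{4})-(\d{2})/#

// Multi-line literal: whitespace and newlines inside it are non-semantic.
// Named captures become tuple labels in the output type.
let verbose = #/
  (?<year>\d{4})
  -
  (?<month>\d{2})
/#

var year: Substring = ""
var month: Substring = ""
if let match = "2022-05".wholeMatch(of: verbose) {
    // The captures are statically typed; no casting or index lookups are needed.
    year = match.output.year
    month = match.output.month
}

let unnamed = "2022-05".wholeMatch(of: simple)?.output
print(year, month, unnamed?.1 ?? "", unnamed?.2 ?? "")
```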

Some reviewers expressed a desire to avoid the terse regex syntax entirely, or to allow it to be constructed only at runtime, with only the DSL provided for creating a Regex at compile time. The core team endorses the principle of a regex syntax literal as a first-class feature of the language. Regexes are a time-tested way of matching string content. Swift would take this concept and add notable improvements such as grapheme-based matching and strongly-typed captures.
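
To make "grapheme-based matching" concrete, here is a minimal sketch, assuming a Swift 5.7 or later toolchain and the default matching semantics. The sample string is made up for illustration.

```swift
// "é" written as "e" followed by a combining acute accent (U+0301).
let decomposed = "cafe\u{301}"

let characterCount = decomposed.count              // 4 Characters (graphemes)
let scalarCount = decomposed.unicodeScalars.count  // 5 Unicode scalars

// With default semantics, `.` matches one whole Character, so the two
// scalars of the accented "e" are expected to be consumed together.
let dotMatchesAccentedE = decomposed.wholeMatch(of: #/caf./#) != nil
print(characterCount, scalarCount, dotMatchesAccentedE)
```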

The core team agrees with the proposal authors that a succinct literal is important. Here, succinct means a single-character delimiter, matching the precedent of string, array, and dictionary literals. One of Swift's goals is clean, concise syntax. While #/.../# could support the feature on its own, it is not as concise as the proposed /.../ syntax. This is especially important for regular expressions used within the DSL, but also for other non-DSL uses such as in switch statements.
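
As a sketch of what literals embedded in the DSL look like, the example below mixes the two forms. It assumes Swift 5.7 or later with the RegexBuilder module, and it uses #/.../# so that it compiles without enabling the bare-slash syntax; with that syntax enabled, each embedded literal could be written as /\d+/ instead. The version-string format is invented for illustration.

```swift
import RegexBuilder

// A builder-style regex with short literals embedded as components.
let version = Regex {
    "v"
    Capture { #/\d+/# }   // major version
    "."
    Capture { #/\d+/# }   // minor version
}

var major: Substring = ""
var minor: Substring = ""
if let match = "v12.3".wholeMatch(of: version) {
    // Output is (whole match, first capture, second capture).
    (_, major, minor) = match.output
}
print(major, minor)
```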

Given the need for a single-character literal, /.../ is appropriate as a term of art for regexes. Other single-character options either suffer similar ambiguities to /, or would use a delimiter that doesn't imply regex and would perhaps have a better alternative use in future, such as '...'.

The core team acknowledges that this is a contentious decision. Many on the forum thread expressed the view that #/.../# is clearer than the bare slash delimiter. The core team does not take this position and agrees with the proposers that /.../ is preferable. This is primarily an aesthetic judgement, but there are also other criteria involved. In particular, it seems hard to justify why string literals have both " and #" forms, while the analogous regular expression form has only #/.

Some reviewers preferred a fuller spelling, for example #regex(...). Given the goal of concise syntax, the core team does not think this would be a good direction. A motivation cited for #regex(...) was that other literals begin with # – for example #filename and #selector(...). While this is true, the inference that # means literal is incorrect. # denotes compiler integration, of which literals are a subset. However, the most common literals, such as those for strings or arrays, do not use a #, since it introduces unnecessary noise. The core team considers this to apply to regex literals also.

Reviewers expressed concern at the complexity of the parsing rules needed to properly disambiguate operators from delimiters. The core team notes that this kind of parsing rule is normal for Swift – a very similar situation exists for handling < – but that this kind of parsing rule is rarely surfaced in this level of detail, so may be new for some evolution participants. The need for non-Swift compiler parsers (such as editor syntax highlighting) to be able to handle the rules is an acknowledged issue here – but the belief is the rules presented are straightforward enough to be implemented by other parsers too.

Thanks to everyone who participated in the review, which led to a much improved proposal.

Ben Cohen
Review Manager

Will /.../ be gated behind a production flag and on by default for Swift 6?

That is covered by the proposal document. The answer is yes to both – but feedback on that (and whether it's the right approach) should be part of the new review thread.

I hope this is not an inappropriate forum for this comment and I truly do not mean to disparage anyone and hope it can be read generously and with the spirit of ignorance informing its author (I am a dumb) ... I think the summary of the discussion above is incredibly good and remarkably fair -- however I don't personally feel heard from the summary ...

I expressed a view -- and I don't think I'm the only one to express this view though it was a minority view and largely without popular traction which in aggregate was very much drowned out by discussion of "very popular subjects of controversy" ... my view is that copying the regular expression syntax of old is a tragedy on future generations ... swift could've done better and imo the arguments to support "copy pasting pcre with minor improvements to preserve familiarity and a specific ideal of brevity" are very much not well supported ...

Anyway I just wanted to throw this out there so that my grandchildren can see I objected to this effort to breathe significant longevity into a monstrosity that they might've avoided ...

Interestingly i do find the language of the core team above fascinating -- and am impressed by the skill and effort required to navigate an enormously complex discussion of mostly trivial things -- and over a topic where popular controversy takes hold ... while at the same time -- I remain fully uninspired by the result produced ...

I'm not using swift as a daily driver but no doubt will at some point in the future and will likely experience shock and that sense which sometimes comes when forced to deal with technology's sometimes oppressive legacies ... "this exists for reasons of someone's idea of practicality which they held at the time"

onward and forward and massive apologies if this missive offends anyone ...

There’s no need to restate your opposition to traditional regex syntax. Your feedback is explicitly addressed here:

Interpreting the above as related to my comment implies agreement with the idea that terse syntax == pcre-like. If you disagree with that equivocation as I do then you would not see that statement as related to my comment ...

I can agree with the idea that my comment is not needed though ;)

The problem with the "Swift could've done better" expression is that it does not suggest anything in particular, nor does it state the problem. At best, I see that the argument is that regex is arcane.

That may be true (and likely is), but it’s also rather hard to find an equally concise syntax that is easily recognizable/understandable for those in on such arcana. Any alternative will have to be much, much better than regex as one also needs to justify not using regex.

Question: why does one need to beef up the language with things that can be offloaded to third party packages? NSRegularExpression is also an option. Part of foundation.

Type safety is one reason. NSRegularExpression and third-party frameworks can't perform compile-time checks of your expressions, and can't expose the results in a type-safe manner. Another argument is that of "currency types". That is, if we rely on third-party frameworks, we can't pass these regexes around between frameworks. If, for example, Vapor were to rely on one third-party library to support regex-like URL matching, and some database layer used another third-party library, you couldn't pass regexes between them.

I'm sure there are other arguments to be made as well.
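
A minimal sketch of the type-safety point, assuming Swift 5.7 or later with Foundation available. The patterns and variable names are made up for illustration.

```swift
import Foundation

// NSRegularExpression takes the pattern as a string, so the unbalanced
// parenthesis below is only discovered when this line runs.
let legacy = try? NSRegularExpression(pattern: "(\\d+")
let legacyFailedAtRuntime = (legacy == nil)

// A regex literal is parsed by the compiler; the same mistake written as a
// literal would be a compile-time error. The captures are also part of the
// type: Regex<(Substring, Substring, Substring)>.
let range = #/(\d+)-(\d+)/#

var lower: Substring = ""
var upper: Substring = ""
if let match = "10-20".wholeMatch(of: range) {
    (_, lower, upper) = match.output
}
print(legacyFailedAtRuntime, lower, upper)
```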

The Swift compiler would be able to provide efficient native code for individual regexes, instead of shipping a regex interpreter in the runtime. This can give tremendous performance gains.

Apart from what the others have already said, I just want to add my (very subjective) opinion that it is just absolutely horrible to do anything with NSRegularExpression. It is one of the last APIs in Foundation that is still really 'un-swifty' and I either avoid doing any string processing in Swift at all or I use a handcrafted solution instead of RegExes because it is so painful right now. Because of that, I have always waited for native and fast RegEx support in Swift and now it's finally coming.

This is addressed in SE-0350: Regex Type and Overview, which is linked from this proposal:

Note that this is not the plan. The immediate plan is to compile all regexes to bytecode at runtime. Future directions include compiling trivial regexes to native code and compiling other regexes to bytecode at compile time, but at no point is it expected that Swift will drop the runtime bytecode interpreter.

I think the argument is more specific than that -- you can't interpret a sequence of regexp characters without internalizing the entire set of control and escape characters, full stop. That makes PCRE hard to read and learn (I suppose that's an opinion, but I think it is accurate to say that PCRE is hard to learn ...). A syntax which required delimiters around terminals would be much more readable and easier to learn, since as the reader you know exactly which characters have 'special meaning' and which represent 'just characters to match' (see [Pitch] Regular Expression Literals - #4 by breathe).

I just don't buy that argument ... I advocated that the proposal should include scenarios representative of the real-world use cases to which these kinds of tools are applied. In my mind the basic scenario in which these expressions will evolve in a codebase is along these lines:

(1) you are given an informally specified/ad-hoc format defined by some number of examples
(2) you build a thing to do a thing (validate it obeys some constraints or get some data out of it)
(3) more examples of the language/format come in on which your code now breaks
(4) edit the patterns to fix; rinse and repeat

A sequence of examples showing the evolution of a set of literals given a scenario like that would allow alternative literal designs to be compared/contrasted along far more meaningful and practical dimensions than any of the discussions about the literal design achieved ...

If the proposal authors wish to propose regex as the basis for text processing, it is not their job to design an entirely new system that would, by definition, be completely untested.

If you have an alternative to regex that you wish to propose, you are welcome to do so.

Sure, suppose that you managed to convince me, or even the core team for that matter, I still struggle to find an actionable item from your original post and our discussion. Let’s focus on that if you’d like to pursue this direction.

Perhaps.

I perceive this as unrelated to anything that I said. But I do understand from your comment that your goal is to get me to stop engaging on this topic -- and I will respect that wish. Apologies once again for any offense or noise introduced and for any discussion raised (or forced!) in an inappropriate context.

No one is saying that. If you think it'd make the language better, you're all the more welcome. However, you'd need to articulate your idea so that people can follow, investigate, and decide whether to agree or disagree. (I'm not talking about this forum or proposal, just a general interaction of any discourse.) I feel like there is some helpful feedback to be had in your post, but if that's the case, you might need to distill it even further.


Also, note that the core team already decided to accept /.../ as the regex literal in principle as per this announcement, so any argument to break that needs to be rather compelling.


With all that said, this is an announcement post about 2nd review. It might be better to discuss it in the actual review thread instead.
