SE-0354: Regex Literals

+1 but only support extended literal #/.../#

-1 for bare literal /.../

7 Likes
  • What is your evaluation of the proposal?

+ 1

  • Is the problem being addressed significant enough to warrant a change to Swift?

Yes. A regex literal would be very useful, especially with compile-time checking.

  • Does this proposal fit well with the feel and direction of Swift?

Yes. Allowing named captures to drive tuple elements fits right in.
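A minimal sketch of what named captures driving tuple elements looks like in practice, assuming a Swift 5.7 or later toolchain. The function name `parseHexAssignment` and the sample pattern are illustrative only, and the extended `#/.../#` form is used so the snippet does not depend on the bare `/.../` syntax being enabled.

```swift
// Hypothetical helper: parses input such as "count = FF" into a name and a value.
func parseHexAssignment(_ input: String) -> (name: String, value: Int)? {
    // Inferred type: Regex<(Substring, identifier: Substring, hex: Substring)>
    let regex = #/(?<identifier>[[:alpha:]]\w*) = (?<hex>[0-9A-F]+)/#

    // The capture names become labels on the match output tuple.
    guard let match = input.wholeMatch(of: regex),
          let value = Int(match.hex, radix: 16)
    else { return nil }

    return (String(match.identifier), value)
}

if let result = parseHexAssignment("count = FF") {
    print(result.name, result.value)
}
```

A misspelled label such as `match.hx` would be rejected at compile time, which is the main draw of typed captures.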

I also appreciate the proposed solution for including regex literals within more complex Regex result builders for cases when that may be clearer than a multiline regex literal.

Can these two concepts combine to get named tuple elements from using a regex literal within a result builder? If not this seems to be a missed opportunity.
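For reference, a rough sketch of embedding a regex literal inside a result builder, assuming Swift 5.7 or later with the `RegexBuilder` module. The name `keyValue` and the pattern are made up for illustration. The sketch uses positional captures only, so it does not settle whether labels carry over from an embedded literal.

```swift
import RegexBuilder

// A builder-style regex in which one component is written as a literal.
let keyValue = Regex {
    Capture { #/[a-z]+/# }          // regex literal used as a component
    "="
    Capture { OneOrMore(.digit) }   // DSL component
}

if let match = "retries=3".wholeMatch(of: keyValue) {
    // Captures are read by position here.
    print(match.1, match.2)
}
```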

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

It's good to see a common format used here.

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I've been following the previous discussions and proposals.

Enthusiastic +1 on the goal:

I have reservations about feature mismatch between literal and DSL, but can accept that provided we quickly move in the direction of adding language features necessary to have DSL and literal parity.

Concerning delimiters:

I clearly see that a lot of thought and effort went into making /.../ work. The huge amount of work invested in this choice proves that it is impossible to make it work without serious compromises. The compromises are twofold: You propose we both remove/restrict features from the language and simultaneously make it more complex with new special rules and exceptions. I don't like this, even if the features are rarely used and special rules rarely encountered.

I honestly think that the main reason for defending this delimiter is the big effort already invested in it. I also clearly see that there has not been enough serious work and effort on making any of the alternatives work well.

Considering how close we are to WWDC, is there any real chance/possibility that the core team will seriously consider anything other than accepting /.../ as is?

If there is, then I will have more to say.

8 Likes

Please review the proposal without concern to implementation timeframes. If you believe there are better alternatives, it's best to lay them out and explain why you feel they're preferable.

Please don't second-guess people's motivations like this – it's not appropriate. The proposal authors have laid out a case for why they believe that /.../ is the best choice for the language, based on factors like look and feel, familiarity with other languages etc. This should be taken at face value, and engaged with as presented in good faith rather than making accusations that the true reason is sunk cost or something else.

6 Likes

My apologies if I came across accusatory. That is the only plausible explanation I could come up with. I don't mean any disrespect to any of the people involved, and apologize if I did.

I have seen strong opposition from the core team when it comes to ideas that result in the removal/restriction of (otherwise harmless) features and simultaneously add complexity by introducing new exceptions and special rules.

To me, the choice of the delimiter does not look to be so fundamental as to warrant such an invasive change to the language. And all of the discussion so far has not convinced me otherwise. Some very common languages such as Python don't even have a dedicated delimiter for regex literals. And most of the ones that already use /.../ are already offering alternatives.

Since I can't explain the current choice any other way to myself, I thought I better bring it up. This gives you a chance to clarify further why you think this is a worthy tradeoff and why /.../ is so fundamental. I don't think what has been said so far in defense of /.../ is enough justification for the damage to the language. I don't think I am the only one who sees it this way. Maybe others are too polite to bring it up.

I will put up my suggestions in my next post.

5 Likes

Part 1
Having dedicated literal: +1. I already stated my support for the goal of having regex literals in Swift source code that provide compile-time checks and typed-capture inference.

Part 2
Choice of delimiter. I think the magic literal alternative is not adequately considered. The idea of replacing the internal delimiter with another character, specifically /, is not adequately considered. The only justification provided is that we have not done this before. Are there any other down sides to it besides it being novel?

The advantage of #regex/.../ is that for the contents of the literal it has exactly the same behavior as if the literal was /.../. It does also score in familiarity, as it is already the same literal as say Perl, with a prefix. The other advantage is opening the door for normalizing this type of syntax for other compile-time checked and potentially library-based new types of literals (like property wrappers). We should be able to be flexible about the specific delimiter used after the #whatever and if we support library defined custom literals, we could let the library specify the supported delimiters. For example, both #sql"...." and #sql'...'. We should also be able to easily extend it to support multi-line variants.

The down sides that I see are being novel and its verbosity.

I don't see the first one (bringing something new to the language) as inherently bad, unless there is a clearly better alternative within the current norms, and I don't see one. Please correct me if I am wrong on this.

The verbosity: this is more subjective and depends on people's preferences, uses and experiences. This can be mostly addressed by privileging regexes as the most common case of such delimiters as being allowed to use #/.../ (as opposed to shortened magic literal) in addition to the already proposed #/.../#.

I also think we could explore a bit deeper the idea of using single quotes without completely sacrificing them to this use case. For example, we could require the '/' and '//' literals to be written as ''/' and ''//' to enable using '/.../' to achieve minimal noise and keep single quote still usable for uses like ASCII literal (e.g. 'A') and source compatibility. I know ''/' and ''//' would be special cases, but they would come after '/.../' is established. We could even reserve the whole starting with a delimiter case to support other literal types with other delimiters (e.g. '!...!').

5 Likes
  • What is your evaluation of the proposal?

+1 on the idea of having regex literals.
+1 on #/.../# syntax. (The re'', #regex(), and #() alternatives are acceptable to me as well.)
-1 on the bare /.../ syntax. It introduces too much ambiguity (for humans) and odd special cases to Swift syntax for the sake of a feature that most people should hopefully be avoiding in favor of the regex DSL. Outright banning / from prefix operators eliminates a lot of potentially useful syntax for libraries. It's already hard enough to come up with good operator names using only ASCII characters; removing an ASCII character from prefix/postfix use does not feel good.

  • Is the problem being addressed significant enough to warrant a change to Swift?

Yes, regular expressions are a powerful tool and it's useful to have a special literal syntax for expressing them, as this makes it more obvious when they're being used and allows for editors and other tooling to provide syntax highlighting, linting, etc.

  • Does this proposal fit well with the feel and direction of Swift?

I don't believe the bare /.../ syntax fits Swift well. Most of the text of the proposal documents the numerous places where Swift syntax needs to be adjusted to shoehorn this syntax into the language. A more appropriate syntax for Swift wouldn't require such wide and far-reaching changes, nor would it introduce so many potentially ambiguous situations.

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I've used Perl professionally since 1997 and a majority of the Perl software I've written has used regex literals. I also use sed but its regex functionality is a subset of Perl's, with essentially the same syntax. I also use that syntax in other software such as ed/vi/vim. I've occasionally used other programming languages with similar regex literals (e.g., Ruby, Javascript) and those that use other regex literal syntax (e.g., Julia's r"..." syntax) and those that don't have special syntax or co-opt a more general raw string syntax (e.g., Rust's r"..." raw string syntax) for regular expressions.

I think Perl has by far the best support for regex literals but it's important to recognize that in Perl the bare /.../ syntax is actually sugar for a couple of different operations. The general syntax for a regex literal in Perl is the qr{} operator, which can be spelled qr"..." or qr '...' or qr(...) or qr[...] or even qr A...A (That last one is rarely a good choice. :rofl:) In some contexts /.../ is sugar for qr{...} but in other contexts it's not. (because Perl :disappointed:) In other contexts a bare /.../ in Perl is sugar for the m{} operator instead, which both creates a regex and immediately matches a string against it.

It would be nice if Swift could have similar flexibility in delimiters (but only for quote-like or bracket-like characters, not the "any non-whitespace ASCII" rule that Perl has) via something like #regex[] but it's not absolutely necessary.

Perl examples:

$string = "foo bar baz";  # assign a string to a scalar variable
$string =~ m{o*}; # Try to match the regex /o*/ to the string
$string =~ /o*/; # shorthand for matching the regex to the string

@array = split qr{\s+}, $string; # split the string on runs of whitespace
@array = split /\s+/, $string; # shorthand for splitting the string on whitespace

$regex = qr/foo( ba.)*/; # $regex contains a regular expression (not a string)
# The qr// operator is required above, as using bare /.../ is actually sugar for a rather non-obvious operation in this case:
$foo = /foo( ba.)*/; # shorthand for $foo = ($_ =~ m/foo( ba.)*/);

I've used Perl on a regular basis for a quarter century now and yet off the top of my head I'm not 100% sure of all the cases where /.../ is sugar for qr{...} and where it's sugar for m{...}

The /.../ syntax in Swift would be less confusing in this regard than in Perl, as in Swift it would simply be regex literal syntax and not be sugar for a variety of different (sometimes surprising) operations. Thus the behavior of /.../ in Swift would actually be the equivalent to qr{...} in Perl and not /.../ in Perl. So choosing /.../ to match Perl is not actually matching Perl!
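To make the qr{...} comparison concrete, here is a small sketch, assuming Swift 5.7 or later. The literal only produces a `Regex` value, and any matching is a separate, explicit call. The variable names are illustrative, and `#/.../#` is used so the snippet does not depend on the bare syntax being enabled.

```swift
// The literal is just a value, much like Perl's qr{...}; nothing is matched yet.
let regex = #/foo( ba.)*/#
let text = "foo bar baz"

// Matching is an explicit operation on the string.
let hasMatch = text.contains(regex)

// Splitting takes a regex value as the separator.
let words = text.split(separator: #/\s+/#)

print(hasMatch, words)
```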

The flexibility to choose a delimiter is extremely useful. The #/.../# syntax proposed here is adequate for this and is reasonably Swifty, given the parallels to string literal syntax. A #regex(...) syntax could conceivably allow more Perl-like flexibility in choosing delimiters but I feel that #/.../# is a more Swifty choice. Although the #regex() syntax would be more easily extended to include other types of literals, I can't think of any other types of string-like literals that would be nearly as useful as regular expressions, so that doesn't feel necessary to me.

Personally I'd prefer that one or more # be required, rather than the zero or more allowed in strings. This would allow us to avoid breaking changes in Swift's syntax, reducing cognitive load on programmers by making it more clear that regular expressions are being used. If we adopt the proposed /.../ it feels like we're just making life far too easy for entrants into any future obfuscated Swift contests.

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I've read this proposal and the related syntax proposal SE-0355 in depth and followed the pitch threads closely. (Like several others, I participated in the pitch thread to suggest adopting only the #/.../# syntax and not the bare /.../ syntax.)

19 Likes

+1
After a thorough review of the proposal, I support this feature as is. That said, I am more than willing to accept #/.../# as the next best thing. This feature far exceeds what's currently available to me in Java and (along with the other text processing proposals) will help to bring Swift to a wider audience. I'm thankful for all the amazing effort put into bringing this feature to the language.

3 Likes

+0.75 on the proposal.

  • The delimiter-extending sequence /…/, #/…/#, ##/…/##, etc. is a nice solution to the escaping / custom delimiter problem and parallels Swift strings nicely.

  • Allowing /…/ maintains a healthy familiarity for newcomers from other languages, and I’d prefer supporting it. The mind-bending parsing rules give me pause, though my impression is that other parts of Swift’s parser are equally ridden with special cases that mostly just work in practice.

    I’m thus in favor of allowing /…/ if and only if the level of existing source breakage is tolerable. Do we have an analysis of how much existing code this breaks? A run on the Swift compatibility suite, for example?

  • This non-parallel between /…/ and #/…/# is highly bothersome, a special case I’m loath to introduce, though I don’t see a better solution offhand:

    In order to help avoid further parsing ambiguities, a /.../ regex literal will not be parsed if it starts with a space or tab character. This restriction may be avoided by using the extended #/.../# literal.

  • I love the passing of named capture groups through the tuples. I’m troubled by the non-availability of this feature in the DSL, particularly for this reason:

  • While the previous point can also be fixed later, this mismatch between literals and DSLs seems like it’s probably best fixed now, either in this proposal or in the DSL proposal:

    The optional wrapping does not become nested, at most one layer of optionality is applied. For example:

    let regex = /(.)*|\d/ // regex: Regex<(Substring, Substring?)>

    This behavior differs from that of the DSL, which does apply multiple layers of optionality in such cases due to a current limitation of result builders.

  • The change in the meaning of whitespace when #/…/# spans multiple lines is bothersome. I wonder whether the multiline (or rather, non-significant whitespace) syntax should be #///…///#, to parallel #"""…"""#.
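The whitespace change in the last point can be sketched as follows, assuming Swift 5.7 or later; the pattern and the name `yearMonth` are made up for illustration. When the extended literal spans several lines, whitespace in the pattern is not matched and `#` starts a comment that runs to the end of the line.

```swift
// Multi-line form: whitespace is non-significant and `#` begins a comment.
let yearMonth = #/
    (?<year>  \d{4} )   # four-digit year
    -
    (?<month> \d{2} )   # two-digit month
/#

if let match = "2022-05".wholeMatch(of: yearMonth) {
    print(match.year, match.month)
}
```

In the single-line form the same spaces would be matched literally, which is the asymmetry the bullet objects to.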

In short, a very welcome proposal with too many special cases for complete comfort. Let's think twice about those special cases, though in the balance, I'm in favor of adding them if they do prove to be the best choice.
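The optionality rule quoted from the proposal above can be checked with an explicit type annotation. This is a sketch assuming Swift 5.7 or later, with the proposal's pattern rewritten in the extended `#/.../#` form; the variable name is illustrative.

```swift
// The capture sits under both `*` and `|`, yet only one layer of
// optionality appears in the inferred type (Substring?, not Substring??).
let flattened: Regex<(Substring, Substring?)> = #/(.)*|\d/#

if let match = "ab".wholeMatch(of: flattened) {
    // match.1 is a plain Optional<Substring>.
    print(match.1 as Any)
}
```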

7 Likes

I am +1 on having regex literals and support in the language.

I am very much -∞ on using / as the delimiter. Perl allows custom delimiters, and has for over 20 years. The last time I was using Perl, the primary benefit being touted was the ability to choose a delimiter that was not to appear in the regex itself. Given Perl's niche at the time, not having to escape path literals was a big deal.

I feel that the only benefit to the chosen delimiter is that it allows copy+paste without altering the content. I think this is a very poor reason to break existing code and introduce more magic into the parser that kinda-mostly-almost-always-but-not-really works.

8 Likes

Avi, I'm confused here. This proposal addresses the custom delimiter problem you describe, and does so using the same # solution as Swift strings:

#"String containing "quotes.""#
#/Regex containing sla/sh/es./#

##"String containing a #" hashquote."##
##/Regex containing a #/ hashslash./##

etc
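A runnable sketch of the same parallel, assuming Swift 5.7 or later; the path pattern and variable names are illustrative. Inside `#/.../#` a forward slash needs no escaping, just as a double quote needs none inside `#"..."#`.

```swift
// Raw string: inner double quotes need no escaping.
let quoted = #"String containing "quotes.""#

// Extended regex literal: inner slashes need no escaping...
let unescaped = #/usr/lib/modules/([^/]+)/vmlinuz/#
// ...although escaping them is still accepted and means the same thing.
let escaped = #/usr\/lib\/modules\/([^\/]+)\/vmlinuz/#

let path = "/usr/lib/modules/5.15/vmlinuz"
if let match = path.firstMatch(of: unescaped) {
    print(quoted, match.1)
}
```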

Are you speaking against this #/…/# solution, proposing an alternative that does not parallel Swift string literals?

Or are you fine with #/…/#, and only speaking against allowing the zero-hash /…/ syntax?

1 Like

I support this proposal. +1. It's well thought out and will be a very positive addition to Swift.

The only really controversial part of the proposal is the use of the /regex/ delimiters. I believe we can hold our noses a bit and live with the mitigations outlined. The value achieved in using a simple syntax in common with other languages far outweighs the few cases where ambiguity will occur: I don't believe we will see substantive problems in broad practice.

5 Likes

+1

Although I have no immediate use for this, I think regexes are an excellent addition to Swift and being able to use literals like this makes regexes a first class feature of the language. My first and main experience with regexes was with Perl and having regexes as part of the language was one of the highlights of using it. Yes, Swift is not Perl but adding this will make Swift more useful in situations where you might instead reach for a scripting language or external library.

I've been following the discussions on this, and the main objection seems to be that it's a breaking change. But I am persuaded by the arguments, with data, that its impact on most actual code is small, smaller than that of many other changes. And /regex/ is by far the best syntax to use, as it is instantly recognisable and easily understandable to anyone familiar with regular expressions.

I very clearly and explicitly stated the syntax I object to. There isn't a single # in my previous comment.

  • What is your evaluation of the proposal?

-1. I'd rather have seen a good regex literal design implemented than a respin of familiar but horrible designs of old.

  • Is the problem being addressed significant enough to warrant a change to Swift?

Probably. Tho to be honest, it's hard for me to see this literal design as such a big improvement over just encoding regular expressions into strings -- there's very little to no readability gain offered by these literals imo

  • Does this proposal fit well with the feel and direction of Swift?

+0

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

This feels almost identically bad to the most common regex literal designs in other languages I've used -- modulo some swift specific problems associated with the delimiter choice (I don't find this surprising since the explicit goal was to copy the most familiar regex literal syntax)

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I paid attention to early pitches and completely lost interest when the goal seemed to be to maximize familiar syntax at any cost

5 Likes

I remember some of the previous proposals mentioned having user types which could be expressed as Regex literals. Is that part of another proposal or has it been dropped (at least for now)?

Generally in Swift, the standard library types are constructed from literals using protocols, such as ExpressibleByStringLiteral, ExpressibleByIntegerLiteral, etc - but this seems to be the first time (as far as I know) where a type is just outright created by the compiler from a literal with no corresponding protocol. This is all the proposal says about it:

The compiler will parse the contents of a regex literal using regex syntax outlined in Regex Construction, diagnosing any errors at compile time. The capture types and labels are automatically inferred based on the capture groups present in the regex. Regex literals allows editors and source tools to support features such as syntax coloring inside the literal, highlighting sub-structure of the regex, and conversion of the literal to an equivalent result builder DSL.
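For contrast, here is a sketch of the protocol-driven route that other literals take, assuming Swift 5.7 or later. `GlobPattern` is a hypothetical user type invented for this illustration and is not part of any proposal.

```swift
// Hypothetical user type that opts in to string literals through a protocol.
struct GlobPattern: ExpressibleByStringLiteral {
    let source: String
    init(stringLiteral value: String) {
        self.source = value
    }
}

// A string literal can become a GlobPattern because of the conformance.
let glob: GlobPattern = "*.swift"

// A regex literal always produces a Regex; there is no protocol through
// which a user type could be initialized from it.
let swiftFile: Regex<Substring> = #/\w+\.swift/#

print(glob.source, "main.swift".contains(swiftFile))
```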

That in itself is worth calling out, but actually, I ask because I've been thinking about building URL patterns using a regex-like syntax.

We can't just use a normal regular expression, because even literal segments may need to be matched through percent-encoding (k may need to match k, %4b and %4B. Non-ASCII needs longer sequences of bytes) and take other normalization processes into account. It really is a custom pattern, but there are lots of interesting ways you can express them with regexes or incorporate regexes, and as part of that, I may want/need a deeper understanding of the AST or the ability to construct my own type from a user's literal (or restrict some regex features). I haven't really thought deeply about what I would need for that, or prototyped it, but it is something I'd like to play with so I'd like to know what happened to the protocol.
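As a rough illustration of the percent-encoding point, and assuming Swift 5.7 or later, a pattern for the single literal character k would have to accept all three spellings. The helper name `matchesLiteralK` is hypothetical; a real URL pattern type would generate such alternatives itself rather than have them written by hand.

```swift
// Hypothetical helper: does `segment` spell the single character "k",
// either directly or through percent-encoding (%4b or %4B)?
func matchesLiteralK(_ segment: String) -> Bool {
    segment.wholeMatch(of: #/k|%4[bB]/#) != nil
}

print(matchesLiteralK("k"), matchesLiteralK("%4b"), matchesLiteralK("%4B"))
```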

8 Likes

That is future work. The ExpressibleBy* approach is geared around the needs of data literals, which it barely serves. Regex literals are more akin to algorithm literals. Thus, I think it is better to improve the library-compiler interface here. From an early thread:

Even better than raw mode is the ability for the regex parser to pretty-print its AST using a requested syntax variant.

Either way this is incremental and future work. Nothing in the proposal precludes this.

This is a fascinating use case and I'm really interested in exploring it! (outside the scope of this specific proposal, of course).

4 Likes

I don't think we should hamper regex literals because of compiler limitations preventing the same goodness from appearing for the DSL. Doing so is counter productive if we care about these limitations actually being addressed. Tuples have been muddy and under-featured in Swift (lots of historical reasons). There's never been as tangible a demonstration of this limitation within the Swift toolchain until now.

As an aside (I don't think this is necessarily your argument, but I can see it being related), I support developers having a policy or linter rules against using certain features. I could understand regex literals being that feature for some. However, I think regex literals serve a useful role for large swaths of developers.

4 Likes

I addressed this thoroughly in the pitch thread. Copying some of that content here for easy viewing:

Note there is no work being done, AFAICT, on SQL literals or these other theoretical use cases. I'm personally interested, but they're not plan of record.

Also, it seems clear that foreign source fragments might have their own needs above and beyond regex's, so we certainly wouldn't want to constrain them with regex-based assumptions. I think it's better to design the general facility in light of general usage, which is multiple releases out as it involves further evolving the compiler-library interaction story. The regex work actually advances it behind the scenes, but it needs more examples than just regex to help complete it.

2 Likes

+0.75. I support the decision to use / as a delimiter for regexes, though I have some concerns about other various minutiae.


I feel that many posters on this thread and the previous thread are overstating the harm that comes from the ambiguous cases listed in the proposal. While there definitely are cases where there is ambiguity, these cases seem very rare. I've never once needed to write anything like foo(/, /) or bar[/] + bar[/] in Swift. In the previous thread, Mishal Shah found that only 1 out of 2,879 projects on the Swift Package Index broke due to the ambiguities introduced in this proposal. To date, I haven't seen a case where an ambiguity would occur in realistic Swift code. Additionally, standalone operators already have cases where parentheses or an explicit closure are required e.g. let divide: (_, _) -> Int = (/).

Here's how I imagine these ambiguities will play out:

  1. It's extremely unlikely that a Swift programmer will encounter a situation where they have to disambiguate between two / operators and a /.../ regex literal in the first place.
  2. Even if they do get into that situation, they will recognize the situation due to syntax highlighting and, oftentimes, related errors. They can then use parentheses or closures to disambiguate.
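The two workarounds in step 2 can be sketched like this; the variable names are illustrative. Both forms are ordinary Swift and refer to the division operator without writing a bare `/` as an argument.

```swift
// Parenthesized operator reference.
let divide: (Int, Int) -> Int = (/)

// Passing the operator in parentheses as an argument.
let viaParens = [12, 20].reduce(240, (/))

// Or spelling the operation out as a closure.
let viaClosure = [12, 20].reduce(240) { $0 / $1 }

print(divide(6, 3), viaParens, viaClosure)
```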

I get the appeal of only having #/.../# as a language designer. But as a language user, the # characters are just noise and the ambiguities are rare enough that they don't really pull their weight, especially considering that plain /.../ is a term of art for regular expressions. Objective-C developers know about the clutter that comes from repeatedly using a special character (like #) to maintain backwards compatibility.


In regard to the CasePaths library, I agree that the deprecation of prefix / is unfortunate. However, I don't think we should hold Swift back for the sake of one library — especially since the library could switch to another operator, like |. Unless I'm mistaken, it seems that swift-syntax should be able to automate replacing / with | in existing codebases. And if case paths do get natively implemented in the language, delimited by \, then people would have to rewrite their code anyway.


I still think that named captures should be supported by the DSL if they're supported by literals. It's not a dealbreaker for me if that doesn't happen, but I do think this sort of thing is antithetical to how literals and the DSL are supposed to work. Regex literals are supposed to be terse, succinct expressions while the DSL is supposed to be more powerful, readable, and composable at the expense of being more verbose. Requiring programmers to un-DSLize (for lack of a better term) their Regex in order to have named captures would undermine the power, readability, and composability that the DSL is supposed to have over literals.

Reference is sort of similar to named captures, but I don't think it's close enough. It doesn't have the same semantics as named captures and replaces type system guarantees with runtime checks and confusing rules.


I have reservations about the -enable-bare-slash-regex flag as well. I'd love to be able to use /.../ from day one, but I'm worried about creating a new dialect of Swift. If Swift 6 is coming out soon, though, it's less of an issue. I'd like to know what the intended use of this flag is. Is it just for regex enthusiasts? Or is it something that's intended to be added by default to new Swift packages and Xcode projects?

Also, how would this flag work with features like playgrounds or the REPL?

7 Likes