SE-0354: Regex Literals

Part 1
Having dedicated literal: +1. I already stated my support for the goal of having regex literals in Swift source code that provide compile-time checks and typed-capture inference.

Part 2
Choice of delimiter. I think the magic literal alternative is not adequately considered, in particular the idea of replacing its internal delimiter with another character, specifically /. The only justification provided for rejecting it is that we have not done this before. Are there any other downsides to it besides it being novel?

The advantage of #regex/.../ is that for the contents of the literal it has exactly the same behavior as if the literal was /.../. It does also score in familiarity, as it is already the same literal as say Perl, with a prefix. The other advantage is opening the door for normalizing this type of syntax for other compile-time checked and potentially library-based new types of literals (like property wrappers). We should be able to be flexible about the specific delimiter used after the #whatever and if we support library defined custom literals, we could let the library specify the supported delimiters. For example, both #sql"...." and #sql'...'. We should also be able to easily extend it to support multi-line variants.

The downsides that I see are its novelty and its verbosity.

I don't see the first one (bringing something new to the language) as inherently bad, unless there is a clearly better alternative within the current norms, and I don't see one. Please correct me if I am wrong on this.

The verbosity: this is more subjective and depends on people's preferences, uses and experiences. This can be mostly addressed by privileging regexes as the most common case of such delimiters as being allowed to use #/.../ (as opposed to shortened magic literal) in addition to the already proposed #/.../#.

I also think we could explore, a bit more deeply, using single quotes without completely sacrificing them to this use case. For example, we could require that the '/' and '//' literals be written as ''/' and ''//' to enable using '/.../' to achieve minimal noise, while keeping single quotes usable for things like ASCII literals (e.g. 'A') and preserving source compatibility. I know ''/' and ''//' would be special cases, but they would come after '/.../' is established. We could even reserve the whole "starts with a delimiter" case to support other literal types with other delimiters (e.g. '!...!').

5 Likes
  • What is your evaluation of the proposal?

+1 on the idea of having regex literals.
+1 on #/.../# syntax. (The re'', #regex(), and #() alternatives are acceptable to me as well.)
-1 on the bare /.../ syntax. It introduces too much ambiguity (for humans) and odd special cases to Swift syntax for the sake of a feature that most people should hopefully be avoiding in favor of the regex DSL. Outright banning / from prefix operators eliminates a lot of potentially useful syntax for libraries. It's already hard enough to come up with good operator names using only ASCII characters; removing an ASCII character from prefix/postfix use does not feel good.

  • Is the problem being addressed significant enough to warrant a change to Swift?

Yes, regular expressions are a powerful tool and it's useful to have a special literal syntax for expressing them, as this makes it more obvious when they're being used and allows for editors and other tooling to provide syntax highlighting, linting, etc.

  • Does this proposal fit well with the feel and direction of Swift?

I don't believe the bare /.../ syntax fits Swift well. Most of the text of the proposal documents the numerous places where Swift syntax needs to be adjusted to shoehorn this syntax into the language. A more appropriate syntax for Swift wouldn't require such wide and far-reaching changes, nor would it introduce so many potentially ambiguous situations.

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I've used Perl professionally since 1997 and a majority of the Perl software I've written has used regex literals. I also use sed but its regex functionality is a subset of Perl's, with essentially the same syntax. I also use that syntax in other software such as ed/vi/vim. I've occasionally used other programming languages with similar regex literals (e.g., Ruby, Javascript) and those that use other regex literal syntax (e.g., Julia's r"..." syntax) and those that don't have special syntax or co-opt a more general raw string syntax (e.g., Rust's r"..." raw string syntax) for regular expressions.

I think Perl has by far the best support for regex literals but it's important to recognize that in Perl the bare /.../ syntax is actually sugar for a couple of different operations. The general syntax for a regex literal in Perl is the qr{} operator, which can be spelled qr"..." or qr '...' or qr(...) or qr[...] or even qr A...A. (That last one is rarely a good choice. :rofl:) In some contexts /.../ is sugar for qr{...} but in other contexts it's not (because Perl :disappointed:). In those other contexts a bare /.../ in Perl is sugar for the m{} operator instead, which both creates a regex and immediately matches a string against it.

It would be nice if Swift could have similar flexibility in delimiters (but only for quote-like or bracket-like characters, not the "any non-whitespace ASCII" rule that Perl has) via something like #regex[] but it's not absolutely necessary.

Perl examples:

$string = "foo bar baz";  # assign a string to a scalar variable
$string =~ m{o*}; # Try to match the regex /o*/ to the string
$string =~ /o*/; # shorthand for matching the regex to the string

@array = split qr{\s+}, $string; # split the string on runs of whitespace
@array = split /\s+/, $string; # shorthand for splitting the string on whitespace

$regex = qr/foo( ba.)*/; # $regex contains a regular expression (not a string)
# The qr// operator is required above, as using bare /.../ is actually sugar for a rather non-obvious operation in this case:
$foo = /foo( ba.)*/; # shorthand for $foo = ($_ =~ m/foo( ba.)*/);

I've used Perl on a regular basis for a quarter century now and yet off the top of my head I'm not 100% sure of all the cases where /.../ is sugar for qr{...} and where it's sugar for m{...}.

The /.../ syntax in Swift would be less confusing in this regard than in Perl, as in Swift it would simply be regex literal syntax and not be sugar for a variety of different (sometimes surprising) operations. Thus the behavior of /.../ in Swift would actually be the equivalent to qr{...} in Perl and not /.../ in Perl. So choosing /.../ to match Perl is not actually matching Perl!
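To make the comparison concrete, here is a rough Swift counterpart to the Perl snippets above. It is only a sketch: it assumes a Swift 5.7 or later toolchain, uses the extended #/.../# delimiters so that it does not depend on the bare-slash flag, and the variable names are illustrative.

```swift
let string = "foo bar baz"

// Like Perl's qr{...}: the literal is only a value; nothing is matched yet.
let regex = #/foo( ba.)*/#

// Matching is always an explicit call; there is no implicit match against $_.
if let match = string.firstMatch(of: regex) {
    print(match.0)   // the whole matched substring
}

// Splitting on runs of whitespace, comparable to Perl's split.
let words = string.split(separator: #/\s+/#)
print(words)
```

The point of the sketch is that the literal has a single meaning everywhere it appears, which is the qr{...} behavior rather than the context-dependent /.../ behavior.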

The flexibility to choose a delimiter is extremely useful. The #/.../# syntax proposed here is adequate for this and is reasonably Swifty, given the parallels to string literal syntax. A #regex(...) syntax could conceivably allow more Perl-like flexibility in choosing delimiters but I feel that #/.../# is a more Swifty choice. Although the #regex() syntax would be more easily extended to include other types of literals, I can't think of any other types of string-like literals that would be nearly as useful as regular expressions, so that doesn't feel necessary to me.

Personally I'd prefer that one or more # be required, rather than the zero or more allowed in strings. This would allow us to avoid breaking changes in Swift's syntax and would reduce cognitive load on programmers by making it clearer that regular expressions are being used. If we adopt the proposed /.../ it feels like we're just making life far too easy for entrants into any future obfuscated Swift contests.

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I've read this proposal and the related syntax proposal SE-0355 in depth and followed the pitch threads closely. (Like several others, I participated in the pitch thread to suggest adopting only the #/.../# syntax and not the bare /.../ syntax.)

19 Likes

+1
After a thorough review of the proposal, I support this feature as is. That said, I am more than willing to accept #/.../# as the next best thing. This feature far exceeds what's currently available to me in Java and (along with the other text processing proposals) will help to bring Swift to a wider audience. I'm thankful for all the amazing effort put into bringing this feature to the language.

3 Likes

+0.75 on the proposal.

  • The delimiter-extending sequence /…/, #/…/#, ##/…/##, etc. is a nice solution to the escaping / custom delimiter problem and parallels Swift strings nicely.

  • Allowing /…/ maintains a healthy familiarity for newcomers from other languages, and I’d prefer supporting it. The mind-bending parsing rules give me pause, though my impression is that other parts of Swift’s parser are equally ridden with special cases that mostly just work in practice.

    I’m thus in favor of allowing /…/ if and only if the level of existing source breakage is tolerable. Do we have an analysis of how much existing code this breaks? A run on the Swift compatibility suite, for example?

  • This non-parallel between /…/ and #/…/# is highly bothersome, a special case I’m loath to introduce, though I don’t see a better solution offhand:

    In order to help avoid further parsing ambiguities, a /.../ regex literal will not be parsed if it starts with a space or tab character. This restriction may be avoided by using the extended #/.../# literal.

  • I love the passing of named capture groups through the tuples. I’m troubled by the non-availability of this feature in the DSL, particularly for this reason:

  • While the previous point can also be fixed later, this mismatch between literals and DSLs seems like it’s probably best fixed now, either in this proposal or in the DSL proposal:

    The optional wrapping does not become nested, at most one layer of optionality is applied. For example:

    let regex = /(.)*|\d/ // regex: Regex<(Substring, Substring?)>

    This behavior differs from that of the DSL, which does apply multiple layers of optionality in such cases due to a current limitation of result builders.

  • The change in the meaning of whitespace when #/…/# spans multiple lines is bothersome. I wonder whether the multiline (or rather, non-significant whitespace) syntax should be #///…///#, to parallel #"""…"""#.
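For readers who have not tried the literal yet, the points in the list above can be sketched roughly as follows. This assumes a Swift 5.7 or later toolchain and uses the extended delimiters throughout; the patterns and names are illustrative, not taken from the proposal, except for the one whose type is quoted above.

```swift
// Named groups become tuple labels on the match output.
let date = #/(?<year>\d{4})-(?<month>\d{2})/#
if let match = "2022-05".wholeMatch(of: date) {
    print(match.year, match.month)
}

// Optionality is flattened to a single layer; the annotation below
// is the type quoted from the proposal for this pattern.
let flat: Regex<(Substring, Substring?)> = #/(.)*|\d/#

// Multi-line form: whitespace inside the literal is not significant,
// so the pattern can be laid out across several lines.
let pair = #/
  (?<key> \w+ )
  \s* = \s*
  (?<value> \d+ )
/#
if let match = "key = 42".wholeMatch(of: pair) {
    print(match.key, match.value)
}
```

The third literal is the case the last bullet is worried about: the same delimiters change how whitespace is interpreted once the literal spans multiple lines.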

In short, a very welcome proposal with too many special cases for complete comfort. Let's think twice about those special cases, though in the balance, I'm in favor of adding them if they do prove to be the best choice.

7 Likes

I am +1 on having regex literals and support in the language.

I am very much -∞ on using / as the delimiter. Perl allows custom delimiters, and has for over 20 years. The last time I was using Perl, the primary benefit being touted was the ability to choose a delimiter that would not appear in the regex itself. Given Perl's niche at the time, not having to escape path literals was a big deal.

I feel that the only benefit to the chosen delimiter is that it allows copy+paste without altering the content. I think this is a very poor reason to break existing code and introduce more magic into the parser that kinda-mostly-almost-always-but-not-really works.

8 Likes

Avi, I'm confused here. This proposal addresses the custom delimiter problem you describe, and does so using the same # solution as Swift strings:

#"String containing "quotes.""#
#/Regex containing sla/sh/es./#

##"String containing a #" hashquote."##
##/Regex containing a #/ hashslash./##

etc

Are you speaking against this #/…/# solution, proposing an alternative that does not parallel Swift string literals?

Or are you fine with #/…/#, and only speaking against allowing the zero-hash /…/ syntax?
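As a compilable version of the parallel drawn above, the following sketch assumes a Swift 5.7 or later toolchain; the regex patterns are made up for illustration.

```swift
// One pair of # lets the body contain the bare delimiter unescaped.
let quoted = #"String containing "quotes.""#
let path = #/usr/lib/(\w+)/#

// Two pairs let the body contain the single-hash terminator as well.
let hashQuoted = ##"String containing a #" hashquote."##
let hashPath = ##/key/#(\d+)/##

print(quoted, hashQuoted)
print("usr/lib/swift".wholeMatch(of: path) != nil)
print("key/#42".wholeMatch(of: hashPath) != nil)
```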

1 Like

I support this proposal. +1. It's well thought out and will be a very positive addition to Swift.

The only really controversial part of the proposal is the use of the /regex/ delimiters. I believe we can hold our noses a bit and live with the mitigations outlined. The value achieved in using a simple syntax in common with other languages far outweighs the few cases where ambiguity will occur: I don't believe we will see substantive problems in broad practice.

5 Likes

+1

Although I have no immediate use for this, I think regexes are an excellent addition to Swift and being able to use literals like this makes regexes a first-class feature of the language. My first and main experience with regexes was with Perl, and having regexes as part of the language was one of the highlights of using it. Yes, Swift is not Perl, but adding this will make Swift more useful in situations where you might otherwise reach for a scripting language or external library.

I've been following the discussions on this, and the main objection seems to be that it's a breaking change. But I am persuaded by the arguments, with data, that its impact on most actual code is small, smaller in fact than that of many other changes. And /regex/ is by far the best syntax to use, as it is instantly recognisable and easily understandable to anyone familiar with regular expressions.

I very clearly and explicitly stated the syntax I object to. There isn't a single # in my previous comment.

  • What is your evaluation of the proposal?

-1. I'd rather have seen a good regex literal design implemented than a respin of familiar but horrible designs of old.

  • Is the problem being addressed significant enough to warrant a change to Swift?

Probably. Tho to be honest, it's hard for me to see this literal design as such a big improvement over just encoding regular expressions into strings. There's very little to no readability gain offered by these literals imo.

  • Does this proposal fit well with the feel and direction of Swift?

+0

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

This feels almost identically bad to the most common regex literal designs in other languages I've used -- modulo some swift specific problems associated with the delimiter choice (I don't find this surprising since the explicit goal was to copy the most familiar regex literal syntax)

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I paid attention to early pitches and completely lost interest when the goal seemed to be to maximize familiar syntax at any cost

5 Likes

I remember some of the previous proposals mentioned having user types which could be expressed as Regex literals. Is that part of another proposal or has it been dropped (at least for now)?

Generally in Swift, the standard library types are constructed from literals using protocols, such as ExpressibleByStringLiteral, ExpressibleByIntegerLiteral, etc - but this seems to be the first time (as far as I know) where a type is just outright created by the compiler from a literal with no corresponding protocol. This is all the proposal says about it:

The compiler will parse the contents of a regex literal using regex syntax outlined in Regex Construction, diagnosing any errors at compile time. The capture types and labels are automatically inferred based on the capture groups present in the regex. Regex literals allow editors and source tools to support features such as syntax coloring inside the literal, highlighting sub-structure of the regex, and conversion of the literal to an equivalent result builder DSL.
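The contrast being drawn here can be sketched as follows. Greeting is a hypothetical type invented for the example, and the snippet assumes a Swift 5.7 or later toolchain.

```swift
// A library type can opt into string literals through a protocol...
struct Greeting: ExpressibleByStringLiteral {
    let text: String
    init(stringLiteral value: String) { text = value }
}
let greeting: Greeting = "hello"

// ...whereas a regex literal always produces the standard library's Regex,
// with the output type inferred by the compiler from the capture groups.
let identifier: Regex<(Substring, Substring)> = #/id-(\d+)/#

print(greeting.text)
print("id-7".wholeMatch(of: identifier)?.1 ?? "no match")
```

There is no conformance a library could add to make the second literal build a type of its own, which is the gap the question is about.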

That in itself is worth calling out, but actually, I ask because I've been thinking about building URL patterns using a regex-like syntax.

We can't just use a normal regular expression, because even literal segments may need to be matched through percent-encoding (k may need to match k, %6b and %6B; non-ASCII characters need longer sequences of bytes) and take other normalization processes into account. It really is a custom pattern, but there are lots of interesting ways you can express them with regexes or incorporate regexes, and as part of that, I may want/need a deeper understanding of the AST or the ability to construct my own type from a user's literal (or restrict some regex features). I haven't really thought deeply about what I would need for that, or prototyped it, but it is something I'd like to play with so I'd like to know what happened to the protocol.
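To give a feel for the kind of rewriting involved, here is a small sketch that expands a literal segment into a pattern accepting percent-encoded spellings. It is not a design for the URL pattern type itself: percentTolerantPattern is a hypothetical helper, it works on a pattern string at run time rather than on the literal's AST, and it assumes a Swift 5.7 or later toolchain.

```swift
// Hypothetical helper: every UTF-8 byte of `literal` may appear percent-encoded
// (all-lowercase or all-uppercase hex), and ASCII letters and digits may also
// appear as themselves.
func percentTolerantPattern(for literal: String) -> String {
    literal.utf8.map { byte -> String in
        let hex = String(byte, radix: 16)
        let lower = "%" + (hex.count == 1 ? "0" + hex : hex)
        let upper = lower.uppercased()
        var alternatives = lower == upper ? [lower] : [lower, upper]
        let isAlphanumeric = (0x30...0x39).contains(byte)
            || (0x41...0x5A).contains(byte)
            || (0x61...0x7A).contains(byte)
        if isAlphanumeric {
            alternatives.append(String(Unicode.Scalar(byte)))
        }
        return "(?:" + alternatives.joined(separator: "|") + ")"
    }.joined()
}

// The pattern is only known at run time, so the output type is erased.
let pattern = try Regex(percentTolerantPattern(for: "key"))
print("key".wholeMatch(of: pattern) != nil)
```

Doing the same thing for a regex the user wrote as a literal would require access to its structure, which is why the question about a protocol or AST access matters.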

8 Likes

That is future work. The ExpressibleBy* approach is geared around the needs of data literals, which it barely serves. Regex literals are more akin to algorithm literals. Thus, I think it is better to improve the library-compiler interface here. From an early thread:

Even better than raw mode is the ability for the regex parser to pretty-print its AST using a requested syntax variant.

Either way this is incremental and future work. Nothing in the proposal precludes this.

This is a fascinating use case and I'm really interested in exploring it! (outside the scope of this specific proposal, of course).

4 Likes

I don't think we should hamper regex literals because of compiler limitations preventing the same goodness from appearing for the DSL. Doing so is counterproductive if we care about these limitations actually being addressed. Tuples have been muddy and under-featured in Swift (lots of historical reasons). There's never been as tangible a demonstration of this limitation within the Swift toolchain until now.

As an aside (I don't think this is necessarily your argument, but I can see it being related), I support developers having a policy or linter rules against using certain features. I could understand regex literals being that feature for some. However, I think regex literals serve a useful role for large swaths of developers.

4 Likes

I addressed this thoroughly in the pitch thread. Copying some of that content here for easy viewing:

Note there is no work being done, AFAICT, on SQL literals or these other theoretical use cases. I'm personally interested, but they're not plan of record.

Also, it seems clear that foreign source fragments might have their own needs above and beyond regex's, so we certainly wouldn't want to limit them with regex-based assumptions. I think it's better to design the general facility in light of general usage, which is multiple releases out as it involves further evolving the compiler-library interaction story. The regex work actually advances it behind the scenes, but it needs more examples than just regex to help complete it.

2 Likes

+0.75. I support the decision to use / as a delimiter for regexes, though I have some concerns about other various minutiae.


I feel that many posters on this thread and the previous thread are overstating the harm that comes from the ambiguous cases listed in the proposal. While there definitely are cases where there is ambiguity, these cases seem very rare. I've never once needed to write anything like foo(/, /) or bar[/] + bar[/] in Swift. In the previous thread, Mishal Shah found that only 1 out of 2,879 projects on the Swift Package Index broke due to the ambiguities introduced in this proposal. To date, I haven't seen a case where an ambiguity would occur in realistic Swift code. Additionally, standalone operators already have cases where parentheses or an explicit closure are required, e.g. let divide: (_, _) -> Int = (/).

Here's how I imagine these ambiguities will play out:

  1. It's extremely unlikely that a Swift programmer will encounter a situation where they have to disambiguate between two / operators and a /.../ regex literal in the first place.
  2. Even if they do get into that situation, they will recognize the situation due to syntax highlighting and, oftentimes, related errors. They can then use parentheses or closures to disambiguate.
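The two workarounds mentioned in step 2 look roughly like this. combine is a hypothetical function standing in for the foo(/, /) case, and the snippet is meant to compile whether or not bare /.../ literals are enabled.

```swift
// Hypothetical function that takes two binary operators.
func combine(_ f: (Int, Int) -> Int, _ g: (Int, Int) -> Int) -> Int {
    g(f(12, 3), 2)
}

// Writing combine(/, /) is the ambiguous spelling.
// Parenthesized operator references avoid it...
let viaParens = combine((/), (/))

// ...and so do explicit closures.
let viaClosures = combine({ $0 / $1 }, { $0 / $1 })

print(viaParens, viaClosures)
```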

I get the appeal of only having #/.../# as a language designer. But as a language user, the # characters are just noise and the ambiguities are rare enough that they don't really pull their weight, especially considering that plain /.../ is a term of art for regular expressions. Objective-C developers know about the clutter that comes from repeatedly using a special character (like #) to maintain backwards compatibility.


In regard to the CasePaths library, I agree that the deprecation of prefix / is unfortunate. However, I don't think we should hold Swift back for the sake of one library — especially since the library could switch to another operator, like |. Unless I'm mistaken, it seems that swift-syntax should be able to automate replacing / with | in existing codebases. And if case paths do get natively implemented in the language, delimited by \, then people would have to rewrite their code anyway.


I still think that named captures should be supported by the DSL if they're supported by literals. It's not a dealbreaker for me if that doesn't happen, but I do think this sort of thing is antithetical to how literals and the DSL are supposed to work. Regex literals are supposed to be terse, succinct expressions while the DSL is supposed to be more powerful, readable, and composable at the expense of being more verbose. Requiring programmers to un-DSLize (for lack of a better term) their Regex in order to have named captures would undermine the power, readability, and composability that the DSL is supposed to have over literals.

Reference is sort of similar to named captures, but I don't think it's close enough. It doesn't have the same semantics as named captures and replaces type system guarantees with runtime checks and confusing rules.
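For comparison, this is roughly what the Reference approach looks like in the builder DSL. It is a sketch assuming the RegexBuilder module from a Swift 5.7 or later toolchain; the pattern and names are illustrative.

```swift
import RegexBuilder

// Each reference is a separate value that has to be declared up front.
let year = Reference(Substring.self)
let month = Reference(Substring.self)

let date = Regex {
    Capture(as: year) { OneOrMore(.digit) }
    "-"
    Capture(as: month) { OneOrMore(.digit) }
}

if let match = "2022-05".wholeMatch(of: date) {
    // Lookup goes through a subscript rather than a tuple label.
    print(match[year], match[month])
}
```

With a literal, the equivalent captures would be reachable as labeled tuple elements of the output type; here the association between a reference and a capture is made through the reference value instead.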


I have reservations about the -enable-bare-regex-syntax flag as well. I'd love to be able to use /.../ from day one, but I'm worried about creating a new dialect of Swift. If Swift 6 is coming out soon, though, it's less of an issue. I'd like to know what the intended use of this flag is. Is it just for regex enthusiasts? Or is it something that's intended to be added by default to new Swift packages and Xcode projects?

Also, how would this flag work with features like playgrounds or the REPL?

7 Likes

Aha! I'd been hoping to see a stat like that. I wrote above:

1 out of 2894 certainly meets my threshold for “tolerable.” That removes my concern about allowing /…/. (Other concerns from the OP still stand, none dealbreakers.)

1 Like

It should be noted that I was only talking about projects that broke due to ambiguity — I didn't count projects that broke due to the use of a prefix / operator. (I probably should have said 1 out of 2879 and not included those projects at all — I'll edit my original post.)

Here are the findings in full:

5 Likes

To repeat from the pitch thread, the issue is that the CasePaths package shown in the previous posting is actually quite popular bc it is a dependency of swift-composable-architecture (aka TCA). The CasePaths / operator gets used in every SwiftUI end-user application that uses TCA.

My expectation is that most of the public packages tested above are frameworks and not apps, and that therefore those numbers are probably not a good reflection of how many end-user apps are using TCA. Under this proposal, each of those apps (whose number is not known) would get a very pervasive source break that users of TCA are not looking forward to. (it's used in every screen in a TCA app multiple times).

That operator was not arbitrarily chosen; it's there primarily to deal with a shortcoming in Swift's optics system (i.e. Swift has lenses but not prisms). / was chosen bc of its likeness to the \ operator for keypaths (lenses). The pitch thread discussed the possibility that a future version of Swift could incorporate direct language support for casepaths (prisms), but that's a completely separate evolution proposal on a completely separate timeframe.

Personally, I'd be happy with direct language support for casepaths and to see TCA use that support (as would TCA's authors by their own admission). That would be great and I would heartily applaud it. Even so, doing that won't go back and fix all the existing code that will no longer compile.

You can't make an omelette without breaking eggs. The question is: do we really like omelettes that much?

7 Likes

This is my vote as well.

1 Like

I opened a PR with some updates and clarifications.

I broke out some of the aesthetic motivation into multiple sentences instead of it all being glommed into one potentially-confusing paragraph. Thanks to @Karl and others who requested some clarification.

I added a future direction for library-extensible support (Thanks again to @Karl for pointing out its omission):

Library-extensible protocol support

A regex literal describes a string processing algorithm which can be run over some model of String. The precise semantics of running over extended grapheme clusters vs Unicode scalar values is part of [Unicode for String Processing][regex-unicode]. Libraries may wish to extend this behavior, but the approach presented by various ExpressibleBy* protocols is underpowered as libraries would need access to the structure of the algorithm itself.

A better (and future) approach is to open up the regex parser's AST, API, and AST actions to libraries. Here are some examples of why a library might want to customize regex:

A library may wish to provide support for a different or higher level model of string. For example, using localized comparison or tailored grapheme-cluster breaks. Such a use case would need access to the structure of the string processing algorithm literal.

A library may wish to provide support for running over another engine, such as ICU, PCRE, or JavaScript. Such a use case would want to pretty-print Swift's regex syntax into one of these syntax variants.

A library may wish to provide their own higher-level structure around which regex literals can be embedded for the purpose of multi-tier processing. For example, processing URLs where regex literal-character portions would be converted into percent-encoded equivalents (with some kind of character class customization/mapping as well). Additionally, a library may have the desire to explicitly delineate patterns that evaluate within a component vs patterns spanning multiple components. Such an approach would benefit from access to the real AST and rich semantic API.

I added an alternative for forbidding features not present in the DSL (thanks to @ensan-hcl for mentioning this):

Restrict feature set to that of the builder DSL

The regex builder DSL is unable to provide some of the features presented, such as named captures as tuple labels. An alternative could be to cut those features from the literal out of concern they may lead to an over-use of the literals. However, to do so would remove the clearest demonstration of the need for better type-level operations including working with labeled tuples.

Similarly, there is no literal equivalent for some of the regex builder features, but that isn't an argument against them. The regex builder DSL has references which serve this role (though not as concisely) and they are useful beyond just naming captures.

Regex literals should not be outright avoided, they should be used well. Artificially hampering their usage doesn't provide any benefit and we wouldn't want to lock these limitations into Swift's ABI.

And I added a sub-section to discussion about #regex(...) extensibility to foreign language snippets (thanks to @rvsrvs for reminding me of our extensive discussion from the pitch thread):

On future extensibility to other foreign language snippets

One of the benefits of #regex(...) or re'...' is the extensibility to other kinds of foreign language snippets, such as SQL. Nothing in this proposal precludes a scalable approach to foreign language snippets using #lang(...) or lang'...'. If or when that happens, regex could participate as well, but the proposed syntax would still be valuable as regex literals are unique in their prevalence as fragments passed directly to API, as well as components of a result builder DSL.

12 Likes