[Pitch #2] Regex Literals

I really don't like bare /regex/ syntax; it just doesn't fit into Swift very well. The changes needed to force it into the language are just too invasive.

I think requiring #/regex/# is fine. A backslash-escaped / would still be treated like a bare / inside a regex, so there's no portability problem with copying/pasting a slash-delimited regex from Perl or Ruby and then slapping a # on either end of it. (/foo\/bar/ and #/foo\/bar/# are both equivalent to #/foo/bar/#)
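To make the copy/paste point concrete, here is a minimal sketch. It assumes a toolchain where the `#/.../#` delimiters are available (Swift 5.7 or later); the variable names and the sample input are made up for illustration.

```swift
// Pasted from a slash-delimited regex, with the backslash escape left in place.
let escapedSlash = #/foo\/bar/#
// Same pattern with the escape dropped; inside `#/.../#` a bare `/` needs no escape.
let plainSlash = #/foo/bar/#

let input = "see foo/bar for details"
let escapedHit = input.firstMatch(of: escapedSlash)?.output
let plainHit = input.firstMatch(of: plainSlash)?.output

// Both literals find the same substring.
print(escapedHit == plainHit)
```

So a regex copied from Perl or Ruby keeps working whether or not you bother to remove the escapes.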

12 Likes

For what it's worth, we are actively exploring various enhancements to keypaths, including paths to enum cases, and other reflection capabilities in the Swift 6 timeframe, in part based on the benefits shown by community packages such as CasePaths and others.

45 Likes

That's really exciting news. The Swift optics story is pretty good now, but feels incomplete in this respect.

To contribute some data on the level of source impact in Swift 6 mode: we tested the proposed literal syntax using the Swift Package Index package collection. Packages were selected if they built successfully for Swift 5.6 using Swift Package Manager (swift build) on macOS.

  • Total packages - 2968
  • 16 Projects failed due to Regex Literal change
    • 15 Projects failed with error: prefix operator may not contain '/'
    • 1 Project failed due to / ambiguity
  • 74 Projects failed due to unrelated reasons

We also tested the changes against a large closed-source Swift codebase and found one instance of failure.
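For anyone wondering what the first failure category looks like in source, here is a rough sketch of a CasePaths-style prefix operator. The `Token` type and the operator body are hypothetical and much simpler than what any real package does. It compiles in Swift 5 language mode; under the pitched rules the `prefix operator /` declaration is the kind of line that would be rejected with "prefix operator may not contain '/'".

```swift
// Hypothetical prefix operator in the style of community optics packages.
prefix operator /

enum Token: Equatable {
    case number(Int)
    case word(String)
}

// Returns the case constructor unchanged, so `/Token.number` reads like a path to the case.
prefix func / <Value>(embed: @escaping (Value) -> Token) -> (Value) -> Token {
    embed
}

let makeNumber = /Token.number
print(makeNumber(7) == .number(7))
```

The use site `/Token.number` is where a bare regex literal and a prefix `/` compete for the same token.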

7 Likes

If Swift had a real dependency ecosystem where we could accurately determine the number of projects using a dependency then we could really determine the change’s impact. But these are hard breaks, not something that can easily be worked around.

This is helpful information. TBH, 15 projects out of 2968 defining or using a definition of an operator containing / is more than I would have predicted.

In the absence of the data here, if Swift did not support unparenthesized references to operators and there weren't other parsing issues requiring innovations such as the no-leading-space-or-tab rule, I would have been pretty sanguine about the bare /.../ syntax even if it meant requiring the banning of / in prefix operators, since I've always thought that to be a niche thing.

But in the face of the empirical data, the amount of breakage is, to me, sufficient by itself to call this syntax into question, independent of the other issues that I had fixated on more.

12 Likes

Thanks both, I have edited the pitch to fix these issues.

3 Likes

Is there strong need for multi-line regex literals and # comments? I don't understand why we need this feature.

  • Multi-line regex literals are supported in only a few languages, so portability matters less for them.

  • They are worthwhile when we want to improve readability, but Swift will introduce the Regex DSL. I think developers who want readable regexes should use the Regex DSL rather than multi-line regex literals.

  • It seems sufficient to use the Regex DSL, as @1-877-547-7272 mentioned.
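For readers who have not seen the builder DSL, this is roughly what the readable alternative looks like. It is a sketch that assumes the `RegexBuilder` module as it later shipped with Swift 5.7; the pattern and the sample input are invented.

```swift
import RegexBuilder

// A key=value pattern spelled out with the builder DSL instead of a literal.
let keyValue = Regex {
    Capture { OneOrMore(.word) }   // the key
    "="
    Capture { OneOrMore(.digit) }  // the value
}

if let match = "width=42".wholeMatch(of: keyValue) {
    // Captures come back as positional tuple elements.
    let (_, key, value) = match.output
    print(key, value)
}
```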

4 Likes

:thought_balloon:
I'm probably in a minority, but all I need for regex is the Regex builder DSL; that is, I espouse "no custom literal" in the context of this pitch.
The DSL is Swifty, but regex literals aren't (in my subjective view).
A regex is easy to write, but the way it is parsed and the way it processes strings are more complex. I think writing a regex should be as difficult as that underlying complexity warrants (at least in Swift).

I hope regex literals will not break Swift even if they are accepted.

2 Likes

Even ignoring the source-breaking aspect (which shouldn't be ignored) of using the /, the difficulties in integrating it into the language as is seem to make it less than ideal. I'm reminded of the difficulties the multiple trailing closure syntax had, where existing constructs had to be deprecated and an emergency change to the parsing rules had to be made to get the unlabeled first parameter syntax to work. Can we just avoid all of that for a very slightly more verbose syntax?

Personally, I'm not even sure I agree that regexes are so important they deserve their own operator anyway.

13 Likes

I'm in favour of this pitch. Anything to make regular expressions less verbose, and more consistent with usage in other languages is a good thing to my mind/use.

I'm not ignoring the impact this change may have on libraries like CasePaths, but I'd be disappointed to see the Swift project holding back language improvements due to third-party libraries. Doubly so, given that it sounds like a suitable replacement is coming in the same timeframe as this pitch's intended implementation.

4 Likes

I don't think it's fair to say that CasePaths is holding back Swift evolution; it simply demonstrated that the / operator could be really nice for optics. IMHO optics, by their composable nature, deserve the more convenient syntax, while not much is lost if regexes are wrapped in #.

I would love to hear more about the future of optics before settling one way or another.

8 Likes

The obvious solution for case keypaths is to “just” add it to the language using \, which removes any conflict with /. This requires an evolution proposal and some implementation work, but there’s no technical obstacle to that resolution.

3 Likes

I feel strongly that we need to let go of the idea that we will be able to support the bare /regex/ syntax. While it is certainly a well-known precedent for the spelling of a regex literal, it was always a bad precedent, and I don't recommend that Swift pursue the sort of lexical cartwheels and recourse to heuristics, blurring the line between lexical and semantic analysis, that these languages have to resort to. Regex literals are a very rich and arbitrary combination of characters, and a single-character delimiter is always going to be too flimsy for cases that come up sufficiently frequently to make the extended #/regex/# syntax necessary anyway.

All this aside, the solution proposed for some of these problems, deprecating the use of / as a prefix operator, is going to source-break an innovative and significant open source project, The Composable Architecture, and a good deal of client source that uses it. In the end, they got there first, and clinging to having the bare '/regex/' syntax available simply can't be justified for a much more niche requirement such as regex literals, particularly when the almost as recognisable #/regex/# syntax is being proposed anyway and would avoid all this disruption.

I'm sure, however, that if a way could be found to include \ in the characters that can be used to define operators, the TCA folk would be only too happy.

9 Likes

I’m in favor of this proposal as is. That said, I'm curious whether the loss of "/" to third-party libraries is a sort of contract violation: if you beat the Swift team to a symbol, do you own it? Maybe there should be a broader discussion about this.
+1

My understanding is that Swift doesn't have to resort to any blurring between lexical, syntactic, and semantic analysis. The compiler changes proposed are strictly lexical. They also prioritize preserving the meaning of existing code via the no-leading-spaces rule. An interesting discussion here could be around whether or how much this rule devalues the /.../ syntax.
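As a small illustration of what the no-leading-spaces rule is protecting, consider ordinary chained division. This is my reading of the pitch rather than a statement of the exact lexer rules, and the names are made up: because each `/` below is followed by a space, the text between the two slashes could not open a regex literal, so the expression keeps its existing meaning.

```swift
let total = 12, rows = 3, columns = 2

// Each `/` is followed by a space, so this stays integer division
// rather than being read as a regex literal ` rows `.
let perCell = total / rows / columns
print(perCell)
```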

IMO, this is the most compelling counter-argument to the proposed /.../ syntax.

And this, or native support for case paths, would (IMO) be the most compelling workaround.

4 Likes

Due to current limitations in Swift (and it's unclear how or if these will be fixed in any near term), this will not work. The DSL, unlike a literal, cannot support named captures presented as labeled tuple members.
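A quick sketch of what a literal gives you here, assuming a Swift 5.7 or later toolchain; the pattern, names, and input are invented for illustration. The named groups surface as labels on the output tuple, which is the part a builder-based regex cannot currently express.

```swift
// Named capture groups become labeled members of the match output tuple.
let namedPair = #/(?<key>\w+)=(?<value>\d+)/#

if let match = "width=42".wholeMatch(of: namedPair) {
    // Access by label rather than by position.
    print(match.output.key, match.output.value)
}
```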

Some of the appeal of multi-line literals from other languages is subsumed by the DSL, but not all. It's also a trivial extension to support this behavior with no broader impact on the Swift language.
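For reference, a multi-line literal with `#` comments looks roughly like this. The sketch assumes the extended multi-line form as it later shipped in Swift 5.7, where whitespace inside the literal is non-semantic; the date pattern itself is just an example.

```swift
// In the multi-line form, whitespace is ignored and `#` starts a line comment.
let isoDate = #/
  (?<year>  \d{4}) -   # four-digit year
  (?<month> \d{2}) -   # two-digit month
  (?<day>   \d{2})     # two-digit day
/#

if let match = "2022-03-15".wholeMatch(of: isoDate) {
    print(match.output.year, match.output.month, match.output.day)
}
```

Note that it still gets labeled captures, which is the piece the DSL does not subsume.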

1 Like

I think /.../ syntax would also affect custom operators.

infix operator /¢*¢/

extension Int {
    static func /¢*¢/ (lhs: Int, rhs: Int) -> Int {
        return lhs + rhs
    }
}

func foo(op: (Int, Int) -> Int) {
    print(op(1,2))
}

foo(op: /¢*¢/)

1 Like

I don’t think a #/regex/# syntax makes sense without a /regex/ syntax. The #’s only make sense to me as a logical extrapolation of the raw string syntax. I think developers will find a raw regex syntax without a corresponding plain regex syntax surprising.

That people are nonetheless advocating for #/regex/# syntax I think goes to show that / really is the only delimiter that properly evokes regular expressions. While the proposal has exhaustively listed the edge cases that need to be worked around to make / work, and it can seem like a lot when you read through them all at once, I don’t think developers will actually encounter these cases frequently in practice. And when they do, I don’t think it will be a confusing experience (especially with the regex appropriately syntax highlighted).

4 Likes

This is a pretty big assumption. I don't think a succinct delimiter is necessary at all and prefer something like #regex which is actually readable. But given the choice between / and #/, #/ seems rather obviously superior given it avoids edge cases.

And your point about the use of # needing to align with raw strings is rather easily explained by saying that regex literals are always raw strings. Since they have different rules from string literals anyway, that seems to make sense.
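To show the raw-string analogy in code, here is a sketch assuming a Swift 5.7 or later toolchain, with the run-time `Regex(_:)` string initializer used for comparison; the pattern and input are made up. The literal and the raw string carry the pattern verbatim, while the ordinary string needs every backslash doubled.

```swift
// The same pattern spelled three ways.
let fromLiteral = #/\d+\.\d+/#                     // regex literal: backslashes as-is
let fromRawString = try! Regex(#"\d+\.\d+"#)       // raw string: backslashes as-is
let fromPlainString = try! Regex("\\d+\\.\\d+")    // plain string: backslashes doubled

let input = "version 5.7 shipped"
let literalHit = input.firstMatch(of: fromLiteral).map { String(input[$0.range]) }
let rawHit = input.firstMatch(of: fromRawString).map { String(input[$0.range]) }
let plainHit = input.firstMatch(of: fromPlainString).map { String(input[$0.range]) }

// All three spellings match the same text.
print(literalHit == rawHit && rawHit == plainHit)
```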

8 Likes