SE-0354: Regex Literals

This is what bothers me the most. If the Core Team was honest about what parts of a proposal are actually up for debate/discussion, everyone would be happier. There might be less involvement still, but people wouldn't stay away out of disappointment or disillusionment. It would be because they are simply not interested.

5 Likes

I saw some earlier comments about how this doesn't break very much, but there are known cases where it does break code for anyone who has overloaded /, such as the reasonably popular and useful CasePaths library. I'm sure there are also web frameworks that use / for routing, and I wouldn't be surprised to see URL parsing libraries that do the same.
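To make the kind of code at stake concrete, here is a minimal sketch of a routing-style overload of /. The Path type and its components property are hypothetical, invented purely for illustration; they are not taken from CasePaths or any real web framework.

```swift
// Hypothetical path-building type; not from any real library.
struct Path: Equatable {
    var components: [String] = []

    // Infix `/` appends one component, so routes read like URLs.
    static func / (lhs: Path, rhs: String) -> Path {
        Path(components: lhs.components + [rhs])
    }
}

let root = Path()
let userRoute = root / "users" / "42"
print(userRoute.components)
```

As far as I understand the proposal, infix uses with whitespace on both sides, like this one, are the least exposed; the prefix form used by CasePaths is where the restriction would bite hardest.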

Anyway, this is a long way of saying I'd cast a vote for #/.../ or even #regex(…), without wading into a big discussion.

3 Likes

I'd like to add my voice as well. I primarily am an app developer and Swift is my native tongue for development. I have found that there are times when I need to resort to using regexes for things but that always comes with having to look up resources for how to use them. I have been working professionally with Swift for years and I still find regexes arcane and avoid them when it is possible to avoid them.

Personally I find that regexes give off more of a "quick and dirty" feel as a way to solve a problem, prioritizing terseness over readability and clarity. In every codebase I work in where regexes are used, they are almost universally paired with code comments because they are difficult to parse at a glance.

As someone who is a Swift-first app developer, I don't think that regexes fit with the clarity and expressivity that made me love Swift. That said, I understand pragmatically that regexes are not going away and that there should be a way to work with them, and for them to be easier and safer to write.

I have a different association with # that I'm not sure has been discussed: there is already a paradigm in which # indicates that characters inside a string are not escaped and are treated literally.

Borrowing from Hacking with Swift, we see a paradigm where adding # around the delimiters shows that we will not need to escape inside:

let regularString = "\\Hello \\World"
let rawString = #"\Hello \World"#
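The same equivalence is what makes raw strings attractive for regex patterns held in strings. A small sketch (the variable names are only illustrative):

```swift
// Both literals produce the identical pattern text: \d+\.\d+
let escapedPattern = "\\d+\\.\\d+"
let rawPattern = #"\d+\.\d+"#

print(escapedPattern == rawPattern)

// Inside a raw string, interpolation needs the extra # as well.
let digits = #"\d"#
let composed = #"\#(digits)+"#
print(composed)
```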

In my mind, having the #/ ... /# syntax not only avoids the source break but is, I believe, clearer about the intent of what will happen between the delimiters, especially to people who are using Swift as a first or home language. On top of that, I don't think that this change clears the bar for a source break. Async/await fundamentally changed the programming model and safety for how we interact with concurrent code, and I found that source breaks for that were justifiable to elevate that syntax. I don't think that regex literals clear the same bar.

7 Likes

Two ideas I haven't seen mentioned yet in this thread:


I think hiding the functionality of -enable-bare-regex-syntax behind a compiler flag is suboptimal. Xcode's build settings UI is hard to navigate and cluttered. And compiler flags aren't supported at all by playgrounds and Swift Playgrounds app projects.

What if, in addition to the flag, we added a new compiler control statement that enabled bare regex syntax?

#enable(bareRegexSyntax)

This would make bare regex syntax available in Swift 5 mode from day one, without the need to pass any obscure flags. It could also be extended to most other source-breaking changes, which parallels Doug Gregor's idea to create a compiler flag for each Swift 6 source break.


I'm a bit uncomfortable with the fact that parentheses can't be used to disambiguate this case. Should we deprecate operators like this as well? Operators with more than one / can't be very popular, right? And if we implement ` delimiters for operators as suggested by @tem, I think this will be even less of an issue.

I've thought about it a bit, and I haven't convinced myself one way or the other whether I believe we should have regex literals at all, but I have convinced myself that we shouldn't use bare /.

The difference in readability and writability between #/.../# and /.../ just doesn't seem to me to be large enough to warrant any source break to add the latter, even when lots of literals are in use. And in some cases (regexes starting with a space or including a slash) the #/.../# syntax is objectively better. So, in short, I think we should just add #/.../#, and leave the bare slash delimiters as a potential future direction.
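A minimal sketch of the "including a slash" case, assuming a toolchain with regex literal support (Swift 5.7 or later, and on Apple platforms an OS whose runtime includes the string processing library). The pattern and names are made up for illustration.

```swift
// Assumes Swift 5.7+ with regex literal support.

// Slashes inside the extended delimiters need no escaping.
let installPath = #/usr/local/(\w+)/#

let match = "usr/local/bin".wholeMatch(of: installPath)
let lastComponent = match.map { String($0.output.1) }
print(lastComponent ?? "no match")
```

With bare delimiters, each interior slash would have to be written as \/ instead.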

+1 to #/.../# literals
-1 to /.../ literals

2 Likes

A post was merged into an existing topic: SE-0355: Regex Syntax and Runtime Construction

Since this is text around motivation and alternatives considered rather than the implementation, I went ahead and merged this.

3 Likes

Hello @Ben_Cohen @Michael_Ilseman re-reading this section:

We could choose to avoid adding the bare forward slash syntax, and instead require at least one # character to be present in the delimiter. This would retain some of the familiarity of /.../ while avoiding the parsing ambiguities and source breaking changes.

However we feel that /.../ is the better choice of default syntax, especially for simple regex where the additional noise of the # characters would be undesirable.

I do appreciate the extra historical context behind the bare delimiter syntax /…/, but I still do not see how the noise of an extra # character, in a language that does not prioritise terseness above all, trumps source-breaking changes. Are we saying the “only” reason this is breaking external libraries, with their authors going through hard work to keep everything functioning, is that #/…/# is not quite as visually tidy as /…/? We got people past things such as the removal of the pre- and post-increment operators, the removal of the old-style for loop syntax, and do - while being renamed to repeat - while; I think the community would rather not break source and learn #/…/# as a consequence :).

10 Likes

What is your evaluation of the proposal?

We should add support for regex literals, but the proposal should be simplified to only introduce #/.../# syntax.

Is the problem being addressed significant enough to warrant a change to Swift?

Yes. While I personally dislike regexes, it is clearly true that they are widely used, and string literals are an inadequate way to express them. I’m not aware of a competing concise string matching syntax, and if we only included the Regex DSL people would continue to use traditional regexes embedded in string literals.

Does this proposal fit well with the feel and direction of Swift?

Not in its current form.

Custom operators are a central part of Swift, and a key constraint on its syntax. Until now, the rules for what makes an operator have been reasonably clear; one might wish to use . or # in an operator, but these are at least uniformly excluded.

/ is clearly a useful operator symbol; it’s part of the small, easily typed subset of operator characters, and we have one widely used case of / as a prefix operator. Removing this character, even if only from prefix operators, will inconvenience not only CasePaths but also unknown use cases, now and in the future.

Additionally, the complex rules around whitespace and juxtaposition of regex literals with infix operators create annoying and potentially confusing edge cases that developers will run into, all for very little benefit:

  • The claim that /.../ is a term of art is quite weak, given that it’s apparently supported in three languages but discouraged in two of them.
  • Regardless of precedent, slashes are a bad delimiter since searching for slashes is common, and escaping them makes it even harder to read and write regexes correctly.
  • In any case, #/.../# syntax is perfectly adequate to invoke the historical connection, while doing the job better.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I have held my nose and used regex literals in the past; at no point did I think “Thank goodness I don’t have to add two more delimiter characters, escaping all these slashes was totally worth it”.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I’ve been gloomily following this pitch in detail.

20 Likes

What is your evaluation of the proposal?

I have been waiting for regex literals, so overall it is positive.
The /…/ syntax is a bold -1 (see below).

Is the problem being addressed significant enough to warrant a change to Swift?

Yes. But no.
Yes for having a regex literal.
No for having it with ambiguities.

Does this proposal fit well with the feel and direction of Swift?

Kind of.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

AFAIK, regex literals in other languages do not create ambiguities.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Between quick reading and in-depth study.


While I would love to have /…/, the ambiguities are a deal breaker for me.

While there is a work-around for let x = arr.reduce(1, /) / 5, I do not expect a helpful error message for let x = foo(a, /, b).reduce(1, /).

How about:

func foo(_ a: [Int], _ o: (Int, Int) -> Int, _ b: [Int]) -> [Int] { [4] }
func foo(_ a: [Int], _ r: Regex<Substring>) -> [Int] { [2] }

let a = [1,2,3]
let b = [4,5,6]

let x = foo(a, +, b).reduce(1, /)
let y = foo(a, /, b).reduce(1, /)
let z = foo(a, /, b).reduce(1, +)

Would this compile? What value will y have?


+1 for #/…/#
+0.5 for #(…)
+0 for #regex(/…/)
-1 for /…/

7 Likes

I am +1 on adding regex support, but -1 on adding the bare /.../ syntax.

I don't have much to say that hasn't already been said, but I don't believe that the benefits bare /.../ provides are worth the costs, especially when we have multiple alternatives that work without the same costs.

My preference would be #/.../#, however I would also be fine with every other suggestion I've seen in this thread.

5 Likes

While the Swift code within Apple may or may not be different, I think there's a subtly different question of import here which is whether developer sentiment within Apple (with regards to whether the 'clean' syntax is a marked improvement, and whether the source break, however severe, is worth it) is representative of the general Swift community. I think that's very hard to determine, but IMO this thread offers a good signal that it may not be the case.

Again, I'd like to bring up a subtly different alternative consideration: whether more controversial source breaks should be rolled up into larger proposals when they could (seemingly) be considered on their own merits, separate from the bulk of the proposed functionality.

On this point, I think the comparison with async/await fails on two counts. Not only was the source break relatively uncontroversial in that situation, but also adopting a non-source breaking syntax (say, #await) until Swift 6 came around would have left the language in a worse state—we would have had to continue supporting the two alternate syntaxes for a long time to come.

But splitting the source break off from this proposal poses no such issues, as far as I can tell. We'll apparently want the 'unclean' syntax regardless of the presence of the 'clean' syntax, so if/when the source break is later accepted on its own merits, the language will end up in exactly the same place it would have otherwise. This also gives us a chance to validate some of the assumptions that this proposal makes: most notably, that the 'clean' syntax is something that will be beneficial to the language/community overall. A later, targeted proposal would be able to cite evidence such as StackOverflow questions/forum discussions asking about why Swift's regex syntax is so ugly, or if there's a better way.

At the very least, I think a separate proposal would help the bare syntax be justified/explained on its own merits, which I think is pretty crucial for what is clearly a controversial source break. If this small aspect of the proposal is important enough to justify a source break, it really should be able to stand on its own. Even with the updates to the proposal, the case for /.../ doesn't really convince me. The claim that forward slash is "instantly recognizable" as regex falls a bit flat for me—it's not instantly recognizable to me as regex—and so it feels as though this mostly applies to people who are already intimately familiar with regexes, which are not obviously the people we should be optimizing the literal syntax for. As a point of reference for "instant recognizability", the iconography that both IntelliJ and Visual Studio Code use for "regex search" is .*, not something like /abc/.

I would feel so much better about my own ability to evaluate the tradeoffs of this source break if it were able to be considered on its own in a small, targeted proposal, after we already had broad community experience with what seems to be the most uncontroversial alternative (#/.../#).

I'd also really like to see some additional justification of this point:

I initially expressed ambivalence about this, but other comments have convinced me that this is a surprising departure from precedent that I don't believe has been adequately addressed in the review thread (unless I've missed a comment somewhere), and I'm not sure I understand the implications.

Would a later proposal (such as improved optics features) which had to make source-breaking changes to the bare regex syntax be considered 'truly' source breaking, since it would break -enable-bare-regex-syntax mode? Why is this being proposed as a production flag rather than an unreviewed -enable-experimental-bare-regex-syntax?

17 Likes

Yes, this is quite a surprising departure from what we've been told in the past about not wanting there to be different "dialects of Swift" via the use of feature flags.

Numerous times, there have been discussions about making improvements around controlling warnings emitted for the use of deprecated declarations and their conflicting nature with -warnings-as-errors. Many of those discussions have been sidelined with the proclamation that "we don't want dialects of Swift". And now here we are, proposing that very thing be added for a different feature.

I don't have a lot of skin in the game here; I'll continue to work with/on Swift whether /.../ or #/.../# or neither or both end up being accepted. My main interest at this point is having some consistency and an understanding either of why this situation is different or of whether the core team's position has changed/evolved since those other discussions, since that would inform how I approach future Swift Evolution discussions.

11 Likes

Warning seems reasonable to me, though note we will reject unknown letter escape sequences, which should avoid confusion in that case. I've added this to the list of warnings to implement (Implement parser warnings · Issue #380 · apple/swift-experimental-string-processing · GitHub).

3 Likes

foo(a, /, b).reduce(1, /) would unfortunately become a regex literal and require disambiguation by writing it as e.g. foo(a, (/), b).reduce(1, /). We might be able to extend the ) heuristic to check for any unbalanced ) between the delimiters, which would help avoid most of these ambiguities. I will investigate this further.
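The parenthesized form can be sketched as follows. The combine function is a hypothetical stand-in with the same shape as foo from the question; it is only there to make the snippet self-contained.

```swift
// Hypothetical helper mirroring the shape of `foo` from the question.
func combine(_ a: [Int], _ op: (Int, Int) -> Int, _ b: [Int]) -> [Int] {
    zip(a, b).map { op($0.0, $0.1) }
}

// `(/)` is an ordinary parenthesized operator reference,
// so the first slash is not treated as opening a regex literal.
let y = combine([8, 9], (/), [2, 3]).reduce(24, /)
print(y)
```

Here combine yields [4, 3], and folding with / from 24 gives 2.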

4 Likes

That's an interesting idea! It seems reasonable to allow backticks on operators as well as identifiers, and that would allow disambiguation of operators from regex literals. I will investigate this further.

5 Likes

Speaking of @tem's idea: as a less noisy alternative to #, and one where we can explain why it is used and what it means, there is another possibility:

    `/.../`
    // Extended format being:
    ``/.../`` // add ` as needed

This saves ' and does not conflict with existing uses. What do you guys think?

4 Likes

This is inaccurate; . within an operator is allowed and has special parsing rules. ..< is a valid operator, but >.. is not.

From The Swift Programming Language:

You can also define custom operators that begin with a dot ( . ). These operators can contain additional dots. For example, .+. is treated as a single operator. If an operator doesn’t begin with a dot, it can’t contain a dot elsewhere. For example, +.+ is treated as the + operator followed by the .+ operator.
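A short sketch of the rule quoted above: an operator that begins with a dot may contain further dots. The element-wise .+. operator here is hypothetical, defined only to show that the declaration is accepted.

```swift
// Hypothetical element-wise addition operator; it begins with a dot,
// so it may contain further dots.
infix operator .+. : AdditionPrecedence

func .+. (lhs: [Int], rhs: [Int]) -> [Int] {
    zip(lhs, rhs).map { $0.0 + $0.1 }
}

let sums = [1, 2, 3] .+. [10, 20, 30]
print(sums)
```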

7 Likes

+1 good alternative

1 Like

I like backticks as a delimiter because it's already widely used when embedding code in Markdown, and regexes are more program than data. Backticks are also not as noisy as pound signs.

But there is a slight issue that would make it ambiguous if backticks could also surround operators in the future:

`/.*/`    // operator or regex literal?

By the way, a potential issue with backticks surrounding operators is that they could be juxtaposed with backticked identifiers:

`/``default`

which at the very least doesn't look clear. But it would be extremely rare.

Backticks could also extend neatly to generalized foreign language multi-line literals by mirroring Markdown:

```sql
SELECT *
FROM users;
```

But I'm not sure what the compact one-line version would look like, whereas #sql'SELECT * FROM users;' would work well I think (using backticks even, but I can't figure out how to put backticks inside backticks here).

Right now I'm more in favor of the "allow only extended regex literals" alternative with the proposed #/../# syntax, even though I still think that anything but a clear and firm promise to never introduce /../ may have a self-reinforcing effect if CasePaths gets assimilated into the language, because library authors would avoid creating new operators that are in jeopardy of future source break, wouldn't they? I think #/../# could be quite noisy in some situations but the best way to find out is to let people use it for some time.

Either that or completely rejecting regex literals (for now) to give us some more time to evaluate the alternatives. All of the other new regex-related features will be plenty to get excited about when they drop!

Edit:

Actually, (I've had another change of heart) if the popular CasePaths library is given enough time to migrate the source breakage (via the -enable-bare-slash-regex compiler flag) then I guess accepting /../ would be the pragmatic choice, if not the ideal one. Aside from the concrete source break, cutting into the very limited pool of viable custom operator symbols hurts a bit, but custom operators are admittedly an already somewhat esoteric feature and if the actual demand for /-containing (prefix) operators is very low (which it seems to be, outside of CasePaths), then it seems like a reasonable compromise.

Additionally, generalized foreign language literals are not precluded by this choice. Deciding on regex literals before a more general design might well make regex literals inconsistent with those future foreign language literals, but that's not a big issue. I don't even know of any languages with such a generalized feature (Markdown code blocks don't count), whereas regex literals do exist in other languages, so it's ok for regex literals to be treated specially in Swift too IMO (even if regexes are often obtuse and 'arcane' or 'legacy').

I'm very interested in a comprehensive, more modern, replacement for regexes as @Nevin has advocated, but I'm not aware of any existing prototypes/designs that we could evaluate today. It seems like something that could take years and I don't see the benefit of holding back on first-class regex support that has already been (mostly?) implemented. If there is some all-new design that's clearly better than regexes, then I don't see why that would be precluded by regexes existing in the language. There were vague claims to that effect, but one could also counter, vaguely, that having first-class regex support sets the bar for any alternative to have to clear, and would make it easier to compare and contrast the two.

2 Likes