SE-0354: Regex Literals

Speaking of @tem's idea: as a less noisy alternative to #, and one where we can readily explain why it is used and what it means, there is another possibility:

    `/.../`
    // Extended format being:
    ``/.../`` // add ` as needed

This saves ' and does not conflict with existing uses. What do you guys think?

4 Likes

This is inaccurate; . within an operator is allowed and has special parsing rules. ..< is a valid operator, but >.. is not.

From The Swift Programming Language:

You can also define custom operators that begin with a dot ( . ). These operators can contain additional dots. For example, .+. is treated as a single operator. If an operator doesn’t begin with a dot, it can’t contain a dot elsewhere. For example, +.+ is treated as the + operator followed by the .+ operator.
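
As a rough illustration of the rule quoted above (a sketch only: the `.+.` operator and its addition behaviour are invented here, they are not in the standard library):

```swift
// Hypothetical operator, declared only to illustrate the lexing rule.
// It begins with a dot, so it may contain further dots.
infix operator .+. : AdditionPrecedence

func .+. (lhs: Int, rhs: Int) -> Int {
    lhs + rhs
}

let sum = 1 .+. 2               // `.+.` is lexed as a single operator token
let halfOpen = Array(0 ..< 3)   // `..<` is the standard half-open range operator
print(sum, halfOpen)
```

An operator such as `>..` cannot be declared this way, because it contains a dot without beginning with one.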

7 Likes

+1 good alternative

1 Like

I like backticks as a delimiter because it's already widely used when embedding code in Markdown, and regexes are more program than data. Backticks are also not as noisy as pound signs.

But there is a slight issue that would make it ambiguous if backticks could also surround operators in the future:

`/.*/`    // operator or regex literal?

By the way, a potential issue with backticks surrounding operators is that they could be juxtaposed with backticked identifiers:

`/``default`

which at the very least doesn't look clear. But it would be extremely rare.

Backticks could also extend neatly to generalized foreign language multi-line literals by mirroring Markdown:

```sql
SELECT *
FROM users;
```

But I'm not sure what the compact one-line version would look like, whereas #sql'SELECT * FROM users;' would work well I think (using backticks even, but I can't figure out how to put backticks inside backticks here).

Right now I'm more in favor of the "allow only extended regex literals" alternative with the proposed #/../# syntax, even though I still think that anything but a clear and firm promise to never introduce /../ may have a self-reinforcing effect if CasePaths gets assimilated into the language, because library authors would avoid creating new operators that are in jeopardy of future source break, wouldn't they? I think #/../# could be quite noisy in some situations but the best way to find out is to let people use it for some time.

Either that or completely rejecting regex literals (for now) to give us some more time to evaluate the alternatives. All of the other new regex-related features will be plenty to get excited about when they drop!

Edit:

Actually, (I've had another change of heart) if the popular CasePaths library is given enough time to migrate the source breakage (via the -enable-bare-slash-regex compiler flag) then I guess accepting /../ would be the pragmatic choice, if not the ideal one. Aside from the concrete source break, cutting into the very limited pool of viable custom operator symbols hurts a bit, but custom operators are admittedly an already somewhat esoteric feature and if the actual demand for /-containing (prefix) operators is very low (which it seems to be, outside of CasePaths), then it seems like a reasonable compromise.

Additionally, generalized foreign language literals are not precluded by this choice. Deciding on regex literals before a more general design might well make regex literals inconsistent with those future foreign language literals, but that's not a big issue. I don't even know of any languages with such a generalized feature (Markdown code blocks don't count), whereas regex literals do exist in other languages, so it's ok for regex literals to be treated specially in Swift too IMO (even if regexes are often obtuse and 'arcane' or 'legacy').

I'm very interested in a comprehensive, more modern, replacement for regexes as @Nevin has advocated, but I'm not aware of any existing prototypes/designs that we could evaluate today. It seems like something that could take years and I don't see the benefit of holding back on first-class regex support that has already been (mostly?) implemented. If there is some all-new design that's clearly better than regexes, then I don't see why that would be precluded by regexes existing in the language. There were vague claims to that effect, but one could also counter, vaguely, that having first-class regex support sets the bar for any alternative to have to clear, and would make it easier to compare and contrast the two.

2 Likes

Sorry for one more digression.

This could be my last post in this thread.

Thank you for noticing my posts.

I am a mere bikeshedding person. I mean I thought it was enough for me to write my own stance.
I didn't imagine I would end up mentioning anything about the formalities. I just expressed my fear, though.
The reason I had concerns may be that I knew, by hearsay, this was not the first time they had behaved this way.

———

Also my intention is being distorted:

My fear is not that their opinion is different from mine, but that they don’t seem neutral.

3 Likes

I commented about this earlier in the thread. To quote myself:

I just started a separate thread to discuss this promised design. My central thesis here is that we want to have a general way for Swift 5.x to opt into the source-breaking changes we've queued up for Swift 6 one at a time. This keeps coming up for Swift 6 features (-warn-concurrency for data-race safety; requests for a "require any on all existentials" mode) because folks want to adopt new features as soon as they can. This isn't creating permanent dialects, which we want to avoid: it's creating an incremental adoption path that smooths the transition to Swift 6. This way, developers won't have to confront every single breaking change all at once when they flip the language mode, which could be daunting. We can do better for the incremental adoption path and also get the syntax we want.

We have already made a number of source-breaking changes to the language that are queued up for Swift 6, and several of them have far larger impact on source code than what's being discussed here (any and Sendable checking will hit pretty much every bit of Swift code everywhere). We have to manage this transition well, or we'll end up with a permanent Swift 5/6 split. Against that backdrop, I consider the problems with the source-break of /.../ to be fairly minor.

That's why I'm looking for arguments as to why /.../ is the wrong destination that don't rely on the source compatibility angle. There really haven't been that many---folks that only want #/.../# but not /.../ tend to cite source compatibility alone. #regex(...) gained early favor in this thread, but I've already said why I think it's worse than the other options presented.

Doug

14 Likes

But avoiding source breaks is important, Doug, and even one of the proposal authors conceded this was a showstopper during the pitch phase.

For me the strategy of managing the transition using feature flags is worse than the original problem, breaking the idea that Swift syntax is a linear progression forward. Far better not to create the problem in the first place, no?

The "necessity" for the source break arises directly from the fact that the bare /regex/ syntax is the wrong destination. This, in turn, is a consequence of what is in my opinion the naïve view that it will ever be possible to contain the full range of possible regexes inside single-character delimiters, let alone one which is already an operator in Swift. It's as futile as trying to construct the enclosure for a tiger with secondhand chicken wire and as unlikely to end well. The result is weird escaping rules and whitespace sensitivity that need to be documented, and the occasional mis-parses that have already been mentioned.

You need a distinct introducer that is not currently part of the language, for example #/, to switch the lexer into regex tokenising mode, and a distinct terminator, /#, to cater for elements that may come up inside the regex. The #/regex/# syntax is no great beauty but fits this requirement well and also borrows from raw strings the notion that, while it is essentially a string, \ escapes are passed through.
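
To make that concrete, here is a minimal sketch of the extended form, assuming a toolchain that implements the proposed `#/.../#` literal (Swift 5.7 or later); the path and the variable names are made up for illustration:

```swift
// Extended delimiters: an interior `/` needs no escaping, and `\w`
// reaches the regex engine as regex syntax rather than a string escape.
let modulePath = #/usr/lib/(\w+)/#

// `wholeMatch(of:)` returns nil unless the entire string matches.
let captured = "usr/lib/swift".wholeMatch(of: modulePath)?.1

if let captured {
    print(captured)
}
```

With the bare form, each interior slash would have to be written as `\/`.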

7 Likes

Another reason some, like myself, disagree that it is the right destination is that it adds a lot more noise inside the regex itself: / has to be escaped, and it is not so infrequent in non-trivially short regexes.

I'm not sure why, but the source compatibility issue is being hand-waved away, as if this change were clearly and evidently understood to be as important as the other source-breaking changes you mentioned and thus a necessity (with the burden of proof on the community)… and this is what I have not seen: the reason why having the bare /…/ is important (the visual noise angle seems a bit overemphasised). The other changes were bigger source-breaking changes, but their importance was also very high.

Still, if we were looking from a clarity point of view alone, escaping / seems to be a much worse scenario than decorating the delimiters #/…/# … then the cases and libraries it breaks are the cherry on top :).

5 Likes

I, for one, touched on source compatibility, but my central complaint is that /.../ complicates the mental model of the language while also being a manifestly bad syntax for regexes, since matching slashes is very common.

If the answer to that is to use “extended” literals either every time a slash is needed, or all the time, it’s hard to see how using /.../ some of the time will be an aesthetic or legibility advantage.

(Incidentally, citing the legacy of ed, which is mainly remembered for its outstandingly user-hostile syntax, in support of proposed Swift syntax is… difficult to describe politely.)

12 Likes

That is a gross mischaracterization of our dialogue. If you're willing to engage in good faith, I'm happy to engage with you further on these costs or any technical details. But stop misrepresenting my views.

The core team has an aesthetic preference for /.../, in an if-we-had-a-time-machine-like scenario. We don't have a time machine, and so this carries certain costs and impacts. The goal of this proposal is to accurately and fully detail those costs and present the most workable solution we can. It's up to the core team to weigh the value of an aesthetic preference against the impact of doing it now.

Breaking a popular 3rd party library is one of those costs, and in my view is the "most compelling". Especially because your other points didn't make much sense:

The first argument you link to:

This proposal does in fact propose a #/regex/# syntax for the contained-/ problem and other benefits. The proposal also goes to great pains to describe how the lexer decides what to do when encountering a /. If you read the proposal, you will see that both of these are present, and if there's any other information you need to help understand how Swift's lexer works, you can ask for clarification.

The second argument:

affects a different programming language than Swift.

And the rest of your arguments were addressed in the portion of that reply that you didn't quote:

Again, the syntactic/semantic blurring affects a different programming language than the one we're proposing these changes to.

2 Likes

Can you expand on why this differs from string literals, which have the same situation for their delimiter?

I am guessing the reasons would be either:

  • " is uncommon in strings, but / is more common in regexes; or
  • strings are so fundamental to the language that even though the same reasoning applies, #"..."# for all strings would be unacceptable, but is more acceptable for the less common case of regex literals.

I'd be interested if there's a third reason I'm missing. I think the first reason is the most common one cited. Speaking personally, I don't find / cropping up in so many regexes (bear in mind they must appear in the expression, not the matched string) that it justifies a blanket "just always use the escaping one" rule of thumb. Though I gather that might be the general feeling in the Perl community?
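
For comparison, the string-literal version of the same trade-off looks roughly like this (a small sketch; the sentence is made up):

```swift
// Bare delimiter: every interior `"` has to be escaped.
let escaped = "She said \"hi\" and left"

// Extended (raw) delimiter: interior quotes are written as-is.
let raw = #"She said "hi" and left"#

print(escaped == raw)
```

Both literals produce the same string value; only the spelling in source differs.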

1 Like

A couple more plausible reasons come to mind (though I don’t really agree with the second, insofar as it implies I’d drop the ‘bare’ string literal syntax were source compatibility not a concern):

  • " as the string literal delimiter is a much more pervasive term of art than / is for regexes.

  • The source compatibility break required to change string literal delimiters would be far too large, compared to solving it at the proposal stage for regex literals, even though the same issues apply to both.

5 Likes

Calm down, man. I've been patiently trying to engage you in good faith for 18 months now, trying to point out that, in my experience, trying to shoe-horn Perl regex syntax into Swift perhaps wasn't a particularly good idea.

And now you find yourself sitting on a review, having to find more and more inventive strategies to stage a source break (I've been there), which is in my opinion very avoidable if you would just rephrase bare /regex/ syntax to be a future direction that can't be pursued until TCA makes room for it.

This was an interesting and relevant thread about the lengths someone had to go to in order to replicate Ruby's parsing rules for regex literals as a result of using / as a single-character delimiter. I remain unconvinced the analysis in the review is exhaustive of all the problems that users will encounter.

I tried to do you a favour by not quoting that as Hamish was a little more honest further down the same thread.

Speaking freely, I'm tired of trying to nudge this conversation in the right direction when the proposers simply don't seem to be taking input. Deprecating TCA for the sake of an aesthetic preference is out of the question ("compelling counter-argument" is the phrase you used) in the real world, and let's not waste energy trying to pretend it is or that it can be "managed".

3 Likes

Posting as review manager for some moderation feedback:

This thread is getting a little heated again, so I'll ask everyone to keep in mind it's important to stay civil. I know it's easy to get swept up in an argument and slip into a more jousting style of debate – I'm guilty of it myself sometimes – but in formal review threads it's particularly important to engage with arguments on their merits.

@johnno1962 Michael's complaint that you were representing his view in bad faith seems well-founded.* It is clearly not the case that "the proposal authors conceded this was a showstopper". Michael was merely acknowledging it was worthy of serious consideration. So to say

is off-base. This is also demonstrated by Hamish's posts:

"Not taking input" and "not conceding that they are wrong and you are right" are not the same things.

Additionally,

is also a misrepresentation of what's being proposed. Making this change would not deprecate CasePaths. It would require CasePaths to migrate to a different operator or take a different approach (such as adopting a native key path feature if it's implemented). It is perfectly reasonable to say this is unacceptable – but that's not the same thing as claiming the whole of the Composable Architecture framework is being threatened with deprecation.

In all of these cases and others, it appears rhetorical force is being used to strengthen an argument. This approach doesn't work (the people you need to convince – the Core Team – will not find your posts more compelling) but it does have the effect of making the evolution thread more hostile. Please take a step back and think about how to make your case without these techniques.

* Nevertheless @Michael_Ilseman it's better to just point out the actual meaning of your quoted passage, rather than call someone out for misquoting it. If someone consistently does this kind of thing, the review manager will step in to ask the person to stop.

3 Likes

The contention in this thread has been almost laser focused on bare single enclosing /…/ for simple regex literals.

I would suggest that the core team defer the bare syntax topic to a new proposal focused on deprecating prefix / and repurposing it for simple regex literals.

It would be great if the new language group that is being put together could tackle these types of changes.

4 Likes

The extent to which proposals should be broken up into separate proposals is something we've discussed a fair bit in the core team, especially when it comes to large groups of themed proposals such as we've seen with concurrency, string processing, and generics. It is tricky, because huge numbers of "micro-proposals" can lead to fragmented reviews that interrelate but are hard to tie together. They can also create review fatigue.

For example, this proposal was split apart from the very closely related proposal around regular expression "interior" syntax. The consequence of this was that the other proposal received almost no commentary – and what commentary it did receive was very closely related to literals. So in that case, it seems it may have been more separate proposals than was necessary (though another factor is that "giant" proposals are hard to read, even if they do form a cohesive whole).

In very many proposals, there is often "one specific thing" that drives a lot of the discussion. But it is usually the case that breaking up the proposal isn't the right fix for this. One approach the core team has taken is to accept a proposal "in principle", but put it back into review for further feedback on other aspects (sometimes combined with amendments to the proposal addressing feedback during review).

4 Likes

My sympathies to anyone trying to parse Ruby. I tried that 15 years ago and it didn't turn out well. This was, at least at the time, a very well-known issue unique to Ruby compared with pretty much any other popular programming language. There were many high-profile discussions on the ruby language mailing lists on this topic and who was going to maintain the horrendous YACC file (around the time of the Ruby 2 transition). I stopped working with Ruby shortly after so I do not know if this was ever improved (it looks unlikely).

The concern about the breakage of TCA when this flag is enabled is real and concrete. A significant amount of this thread is devoted to that topic, and the core team and project lead have even been weighing in to provide what assurances they can.

This is one of many syntactic migrations for Swift 6. The formal SE process, like all processes, is incomplete and doesn't have a vehicle for every unique consideration. It is my recommendation to the core team to not remove the migration flag until there's a good place for developers to migrate to, and that would (ideally) be language support for case paths. A recommendation is all that proposal authors can give, as a proposal itself is a recommendation to the core team.

The rest of your concerns appear as a vague unease about the changes. This proposal goes to great lengths to explore and try to address this vague unease (a fact that's somehow being used against it).

Do you have any information whatsoever that would clarify your vague unease? If you do, please share it so actual discussion can take place.

Thanks for the reply. I really don't think I have much more to say other than to re-iterate that IMHO, bare /regex/ syntax is not a good destination in itself even without taking into account the migration issues it creates. I feel I have already made every effort to articulate the reasons for this "vague unease".

6 Likes

Several points in this thread have changed my mind on the entire idea. I'm now -1 on the proposal.

I was originally in the "don't break my source" camp. @tkremenek, @Ben_Cohen, and others have pointed out that the major source break associated with enum case paths is under consideration for inclusion in Swift 6. If that really happens, it moves me to the "please break my source" camp: I would voluntarily change all my code to use what I consider to be a missing feature of the language anyway, and I'm sure that's true for much of the TCA community.

The bigger issue for me has become grammar extensibility. The entire proposal is about hosting an external grammar inside of swift. This particular grammar is reasonably characterized as far and away the most widely used "small language" out there. It is also reasonably characterized as strongly resembling line noise on a 300 baud acoustic coupler (dating myself there).

The #regex( ... ) delimiter appeals to me because it does not privilege one particular small language (regex) over others that have been mentioned in both the pitch thread and upthread here. In spite of its lexical terseness, the / ... / syntax does not appeal to me precisely because it takes an element of the small space of operator characters and reserves it for regex use only. @Michael_Ilseman has pointed out (reasonably) that doing that reservation does not preclude us from doing a more general small language hosting in the future, but doing this one this way and others another way seems like imposing additional cognitive burdens on a language syntax which, let's be frank, already has a lot of them.

Summarizing some of the above, #regex( ... ) syntax has been ruled out on two bases (if I have read everything correctly):

  1. It doesn't seem to fit with #available, #file, and #selector.
  2. The particular foibles of regex syntax mean that it will be difficult to disambiguate the closing paren of the #regex element from the line noise of the actual regex itself.

The first seems somewhat strange to me. It's difficult to imagine anything those have in common other than "#ImASpecialCompilerThing". #regex doesn't seem to me to be at all out of place in that list, especially when you compare it with #selector. People who are new to the language are going to have no idea what that is about.

The second is what has me changing my vote on the proposal. To me it seems difficult to argue that / ... / provides needed visual decluttering while ignoring that the ... right there in the middle provides the exact opposite. And that that cluttered syntax itself is what makes it incredibly complicated to host this small language inside of Swift in a manner that can be readily extended to other small langs. That got me thinking that where we are is this:

  1. The Swift language is not ready to host external languages in a form other than Result Builders
  2. We have an excellent well-thought out DSL in Result Builder form that does exactly what the regex language does
  3. It would not be difficult to provide tooling to generate a DSL implementation from a regex string. (And perhaps, though I'm well outside my depth here, the reverse as well)
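
For readers who have not seen the Result Builder form mentioned in point 2, here is a minimal sketch, assuming the `RegexBuilder` module as it ships with Swift 5.7 or later; the path and the names are invented for illustration:

```swift
import RegexBuilder

// Roughly what the regex string `usr/lib/(\w+)` expresses, spelled with the DSL.
let builtPath = Regex {
    "usr/lib/"              // literal text; slashes need no escaping here
    Capture {
        OneOrMore(.word)    // one or more word characters, like `\w+`
    }
}

// `wholeMatch(of:)` returns nil unless the entire string matches.
let libraryName = "usr/lib/swift".wholeMatch(of: builtPath)?.1

if let libraryName {
    print(libraryName)
}
```

A tool of the kind suggested in point 3 would generate a builder like this from the regex string; no such tool is shown or assumed here.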

I would much prefer to see a comprehensive proposal for "small lang" extensions to the language (NB this conversation and experience with things like ASP and JSX has convinced me that this may not be possible). Until that time, I would much rather have regex translation implemented at the tool level than at the language level.

7 Likes

Also the Atom text editor.

3 Likes