SE-0354: Regex Literals

Another reason some, like myself, disagree that it is the right destination is that it adds a lot more noise to the regex itself: you have to escape /, which is not so infrequent in non-trivially short regexes.

I am not sure why, but the source compatibility issue is being hand-waved away, as if this change were clearly and evidently understood to be as important as the other source compatibility changes you mentioned, and thus a necessity (burden of proof on the community)… and this is what I have not seen: the reason why having the bare /…/ is important (the visual noise angle seems a bit overemphasised). The other changes were bigger source-breaking changes, but their importance was also very high.

Still, if we were looking at this from a clarity point of view alone, escaping / seems to be a much worse scenario than decorating the delimiters as #/…/# … and then we can look at the cases and libraries it breaks as the cherry on top :).
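To make the escaping trade-off concrete, here is a minimal sketch comparing the two spellings for a pattern that contains slashes. The URL and the variable names are made up for illustration, and it assumes Swift 5.7 or later (macOS 13 / iOS 16 or later on Apple platforms). The bare form is shown only as a comment, since it needs bare-slash literals to be enabled in the compiler.

```swift
// Hypothetical input, chosen only because it contains several slashes.
let url = "https://example.com/docs/index.html"

// Bare delimiters: every "/" inside the pattern has to be escaped.
//     let bare = /https?:\/\/([^\/]+)\/(.*)/

// Extended delimiters: the same pattern with no escaping of "/".
let extended = #/https?://([^/]+)/(.*)/#

var host = ""
var path = ""
if let match = url.wholeMatch(of: extended) {
    host = String(match.1)   // first capture group
    path = String(match.2)   // second capture group
}
print(host, path)
```

Whether the escaped bare form or the decorated form reads better is exactly the point being argued in this thread; the sketch only shows what each looks like.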

5 Likes

I, for one, touched on source compatibility, but my central complaint is that /.../ complicates the mental model of the language while also being a manifestly bad syntax for regexes, since matching slashes is very common.

If the answer to that is to use “extended” literals either every time a slash is needed, or all the time, it’s hard to see how using /.../ some of the time will be an aesthetic or legibility advantage.

(Incidentally, citing the legacy of ed, which is mainly remembered for its outstandingly user-hostile syntax, in support of proposed Swift syntax is… difficult to describe politely.)

12 Likes

That is a gross mischaracterization of our dialogue. If you're willing to engage in good faith, I'm happy to engage with you further on these costs or any technical details. But stop misrepresenting my views.

The core team has an aesthetic preference for /.../, in an if-we-had-a-time-machine-like scenario. We don't have a time machine, and so this carries certain costs and impacts. The goal of this proposal is to accurately and fully detail those costs and present the most workable solution we can. It's up to the core team to weigh the value of an aesthetic preference against the impact of doing it now.

Breaking a popular 3rd party library is one of those costs, and in my view is the "most compelling". Especially because your other points didn't make much sense:

The first argument you link to:

This proposal does in fact propose a #/regex/# syntax for the contained-/ problem and other benefits. The proposal also goes to great pains to describe how the lexer decides what to do when encountering a /. If you read the proposal, you will see that both of these are present, and if there's any other information you need to help understand how Swift's lexer works, you can ask for clarification.

The second argument:

affects a different programming language than Swift.

And the rest of your arguments were addressed in the portion of that reply that you didn't quote:

Again, the syntactic/semantic blurring affects a different programming language than the one we're proposing these changes to.

2 Likes

Can you expand on why this differs from string literals, which have the same situation for their delimiter?

I am guessing the reasons would be either:

  • " is uncommon in strings, but / is more common in regexes; or
  • strings are so fundamental to the language that even though the same reasoning applies, #"..."# for all strings would be unacceptable, but is more acceptable for the less common case of regex literals.

I'd be interested if there's a third reason I'm missing. I think the first reason is the most common one cited. Speaking personally, I don't find / cropping up in so many regexes (bear in mind it must appear in the expression, not the matched string) that it justifies a blanket "just always use the escaping one" rule of thumb. Though I gather that might be the general feeling in the Perl community?

1 Like

A couple more plausible reasons come to mind (though I don’t really agree with the second, insofar as it implies I’d drop the ‘bare’ string literal syntax were source compatibility not a concern):

  • " as the string literal delimiter is a much more pervasive term of art than / is for regexes.

  • The source compatibility break required to change string literal delimiters would be far too large, compared to solving it at the proposal stage for regex literals, even though the same issues apply to both.

5 Likes

Calm down man, I've been patiently trying to engage you in good faith for 18 months now trying to point out that in my experience trying to shoe-horn Perl regex syntax into Swift wasn't perhaps a particularly good idea.

And now you find yourself sitting on a review, having to find more and more inventive strategies to stage a source break (I've been there), which is in my opinion very avoidable if you would just rephrase bare /regex/ syntax to be a future direction that can't be pursued until TCA makes room for it.

This was an interesting and relevant thread about the lengths someone had to go to in order to replicate Ruby's parsing rules for regex literals as a result of using / as a single-character delimiter. I remain unconvinced that the analysis in the review is exhaustive of all the problems that users will encounter.

I tried to do you a favour by not quoting that as Hamish was a little more honest further down the same thread.

Speaking freely I'm tired of trying to nudge this conversation in the right direction when the proposers simply don't seem to be taking input. Deprecating TCA for the sake of an aesthetic preference is out of the question ("compelling counter-argument" is the phrase you used) in the real world and let's not waste energy trying to pretend it is or it can be "managed".

3 Likes

Posting as review manager for some moderation feedback:

This thread is getting a little heated again, so I'll ask everyone to keep in mind it's important to stay civil. I know it's easy to get swept up in an argument and slip into a more jousting style of debate – I'm guilty of it myself sometimes – but in formal review threads it's particularly important to engage with arguments on their merits.

@johnno1962 Michael's complaint that you were representing his view in bad faith seems well-founded.* It is clearly not the case that "the proposal authors conceded this was a showstopper". Michael was merely acknowledging it was worthy of serious consideration. So to say

is off-base. This is also demonstrated by Hamish's posts:

"Not taking input" and "not conceding that they are wrong and you are right" are not the same things.

Additionally,

is also a misrepresentation of what's being proposed. Making this change would not deprecate CasePaths. It would require CasePaths to migrate to a different operator or take a different approach (such as adopting a native key path feature if it's implemented). It is perfectly reasonable to say this is unacceptable – but that's not the same thing as claiming the whole of the Composable Architecture framework is being threatened with deprecation.

In all of these cases and others, it appears rhetorical force is being used to strengthen an argument. This approach doesn't work (the people you need to convince – the Core Team – will not find your posts more compelling) but it does have the effect of making the evolution thread more hostile. Please take a step back and think about how to make your case without these techniques.

* Nevertheless @Michael_Ilseman it's better to just point out the actual meaning of your quoted passage, rather than call someone out for misquoting it. If someone consistently does this kind of thing, the review manager will step in to ask the person to stop.

3 Likes

The contention in this thread has been almost laser focused on bare single enclosing /…/ for simple regex literals.

I would suggest that the core team defer the bare syntax topic to a new proposal focused on deprecating prefix / and repurposing it for simple regex literals.

It would be great if the new language group that is being put together could tackle these types of changes.

4 Likes

The extent to which proposals should be broken up into separate proposals is something we've discussed a fair bit in the core team, especially when it comes to large groups of themed proposals such as we've seen with concurrency, string processing, and generics. It is tricky, because huge numbers of "micro-proposals" can lead to fragmented reviews that interrelate but are hard to tie together. They can also create review fatigue.

For example, this proposal was split apart from the very closely related proposal around regular expression "interior" syntax. The consequence of this was that the other proposal received almost no commentary – and what commentary it did receive was very closely related to literals. So in that case, it seems it may have been more separate proposals than was necessary (though another factor is that "giant" proposals are hard to read, even if they do form a cohesive whole).

In very many proposals, there is often "one specific thing" that drives a lot of the discussion. But it is usually the case that breaking up the proposal isn't the right fix for this. One approach the core team has taken is to accept a proposal "in principle", but put it back into review for further feedback on other aspects (sometimes combined with amendments to the proposal addressing feedback during review).

4 Likes

My sympathies to anyone trying to parse Ruby. I tried that 15 years ago and it didn't turn out well. This was, at least at the time, a very well-known issue unique to Ruby compared with pretty much any other popular programming language. There were many high-profile discussions on the ruby language mailing lists on this topic and who was going to maintain the horrendous YACC file (around the time of the Ruby 2 transition). I stopped working with Ruby shortly after so I do not know if this was ever improved (it looks unlikely).

The concern about the breakage of TCA when this flag is enabled is real and concrete. A significant amount of this thread is devoted to that topic, and the core team and project lead have even been weighing in to provide what assurances they can.

This is one of many syntactic migrations for Swift 6. The formal SE process, like all processes, is incomplete and doesn't have a vehicle for every unique consideration. It is my recommendation to the core team to not remove the migration flag until there's a good place for developers to migrate to, and that would (ideally) be language support for case paths. A recommendation is all that proposal authors can give, as a proposal itself is a recommendation to the core team.

The rest of your concerns appear as a vague unease about the changes. This proposal goes to great lengths to explore and try to address this vague unease (a fact that's somehow being used against it).

Do you have any information whatsoever that would clarify your vague unease? If you do, please share it so actual discussion can take place.

Thanks for the reply. I really don't think I have much more to say other than to re-iterate that IMHO, bare /regex/ syntax is not a good destination in itself even without taking into account the migration issues it creates. I feel I have already made every effort to articulate the reasons for this "vague unease".

6 Likes

Several points in this thread have changed my mind on the entire idea. I'm now -1 on the proposal.

I was originally in the "don't break my source" camp. @tkremenek, @Ben_Cohen, and others have pointed out that the major source break associated with enum case paths is under consideration for inclusion in Swift 6. If that really happens, it moves me to the "please break my source" camp: I would voluntarily change all my code to use what I consider to be a missing feature of the language anyway, and I'm sure that's true for much of the TCA community.

The bigger issue for me has become grammar extensibility. The entire proposal is about hosting an external grammar inside of Swift. This particular grammar is reasonably characterized as far and away the most widely used "small language" out there. It is also reasonably characterized as strongly resembling line noise on a 300 baud acoustic coupler (dating myself there).

The #regex( ... ) delimiter appeals to me because it does not privilege one particular small language (regex) over others that have been mentioned in both the pitch thread and upthread here. In spite of its lexical terseness, the / ... / syntax does not appeal to me precisely because it takes an element of the small space of operator characters and reserves it for regex use only. @Michael_Ilseman has pointed out (reasonably) that making that reservation does not preclude us from doing a more general small language hosting in the future, but doing this one this way and others another way seems like imposing additional cognitive burdens on a language syntax which, let's be frank, already has a lot of them.

Summarizing some of the above, #regex( ... ) syntax has been ruled out on two bases (if I have read everything correctly):

  1. It doesn't seem to fit with #available, #file, and #selector.
  2. The particular foibles of regex syntax mean that it will be difficult to disambiguate the closing paren of the #regex element from the line noise of the actual regex itself.

The first seems somewhat strange to me. It's difficult to imagine anything those have in common other than "#ImASpecialCompilerThing". #regex doesn't seem to me to be at all out of place in that list, especially when you compare it with #selector. People who are new to the language are going to have no idea what that is about.

The second is what has me changing my vote on the proposal. To me it seems difficult to argue that / ... / provides needed visual decluttering while ignoring that the ... right there in the middle provides the exact opposite. And that that cluttered syntax itself is what makes it incredibly complicated to host this small language inside of Swift in a manner that can be readily extended to other small langs. That got me thinking that where we are is this:

  1. The Swift language is not ready to host external languages in a form other than Result Builders
  2. We have an excellent well-thought out DSL in Result Builder form that does exactly what the regex language does
  3. It would not be difficult to provide tooling to generate a DSL implementation from a regex string. (And perhaps, though I'm well outside my depth here, the reverse as well)

I would much prefer to see a comprehensive proposal for "small lang" extensions to the language (NB this conversation and experience with things like ASP and JSX has convinced me that this may not be possible). Until that time, I would much rather have regex translation implemented at the tool level than at the language level.
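As a rough illustration of the point that the Result Builder DSL can express what the regex language does, here is a minimal sketch that writes one small pattern both ways and runs both against the same input. The pattern, the input string, and the variable names are made up for illustration, and it assumes Swift 5.7 or later with the RegexBuilder module available.

```swift
import RegexBuilder

// Literal form (extended delimiters, so no compiler flag is assumed).
let literal = #/(\d+)-(\d+)/#

// The same pattern spelled with the result-builder DSL.
let dsl = Regex {
    Capture { OneOrMore(.digit) }
    "-"
    Capture { OneOrMore(.digit) }
}

// Hypothetical input used only for this comparison.
let input = "10-20"
let literalMatch = input.wholeMatch(of: literal)
let dslMatch = input.wholeMatch(of: dsl)

// Both forms produce the same captures for this input.
let sameFirst = literalMatch?.1 == dslMatch?.1
let sameSecond = literalMatch?.2 == dslMatch?.2
print(sameFirst, sameSecond)
```

A tool that translated between the two forms, as suggested above, would in effect be generating the second spelling from the first; this sketch does that translation by hand for a single pattern and makes no claim about how such tooling would be built.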

7 Likes

Also the Atom text editor.

3 Likes

I would love to have a literal syntax for regexes in Swift: it's quick and easy to handle, and we get named captures. But I do not see the attachment to the bare /.../ syntax. It wouldn't be the first time Swift moved away from features of other languages that people seem to be used to.

The Swift community thought that increment and decrement operators were too confusing and gave up on them, but this proposal finds that this kind of construct isn't confusing: f(/, /)?

I think the proposal does respond to a need for a simple syntax that is still expressive, like a regex literal. I don't think the bare syntax is worth keeping; it seems to be more confusing than anything. I would vote for #/.../# as the minimum regex literal syntax because it has the merit of being quick to type while still retaining what other languages see as a regex. With this you can just copy the code from another language, add # around it, and you're done. If having # by default is going to be confusing, I'd rather vote for #Regex() than restrict the custom operator syntaxes.

7 Likes

I do not see how this is an acceptable syntax: f(/, /). How is that not confusing? Are we calling a function with two arguments, or a function with one argument that is a regex? It doesn't seem right to me to find that kind of syntax acceptable while finding value++ unacceptable in the language; at least, if I want to, I can reintroduce increment and decrement operators in my own code.
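To make the f(/, /) case concrete, here is a minimal sketch. The function combine and its arithmetic are hypothetical, chosen only to show an operator being passed as an argument. The spelling under discussion is left as a comment, because how it parses depends on whether bare regex literals are enabled. The parenthesised spelling is intended to stay an operator reference in either mode, since the text between the slashes then contains an unbalanced closing parenthesis.

```swift
// A hypothetical function that takes two binary operations.
func combine(_ f: (Double, Double) -> Double,
             _ g: (Double, Double) -> Double) -> Double {
    // Apply f to fixed operands, then feed the result to g.
    g(f(8, 2), 2)
}

// The spelling under discussion. With bare regex literals enabled,
// the text between the two slashes could be read as a regex:
//     combine(/, /)

// Wrapping each operator in parentheses keeps it an operator reference.
let result = combine((/), (/))
print(result)
```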

3 Likes

One way to resolve this ambiguity is to make the previous suggestion of backtick escaping mandatory for using operator characters as identifiers in Swift 6 mode. Then bare operators can only ever mean application.

To me it seems that using the / operator would come up more often than using regular expressions, so we would make the very common / much harder and more confusing to use by slapping backticks everywhere, for the sake of the much less commonly used regex literal syntax, which I believe is actually made better with the explicit #/ /#.

From the other regex proposals, I was led to believe that the regex literal syntax wasn't even the preferred one. It seems that we want to use the declarative/result builder syntax over the literal syntax, and the literal syntax is here more for convenience and "compatibility with other languages" than clarity, and I don't believe that either of those justifies the change. And as I pointed out before, both of those reasons weren't enough to keep the increment and decrement operators in the standard library.

I find the bare regex literal syntax to be extremely confusing on its own, especially when I see it used in Perl, and I do not see how requiring the # around a literal is such a burden on the proposal.

7 Likes

Backticks would only be necessary to use the / operator as an identifier. My intuition is that a programmer who uses regex literals at all will use them much more frequently than any Swift programmer uses an operator as a bare identifier.

3 Likes

Given that the choice here is between allowing the syntaxes /.../ and #/.../# together versus allowing only #/.../#, I still do not see the benefit of the bare syntax compared to all the issues that it brings in. I don't think it's just an issue of source compatibility: it removes good uses of the operators and makes illegal some usages that are seen as advantages of the Swift language, like some custom operators, use of operators as parameters to other functions, etc.

The only argument I see in favor of the bare syntax (as opposed to requiring # as a minimum) is "other languages do it", but given the other languages in question (Perl, Ruby, and JavaScript), that'd be an argument against this syntax: they're not bastions of clarity and readability.

6 Likes

Fully support strongly typed captures and compile time checked literals for regexes. The inclusion of both plain /abc/ and extended #/abc/# literals mirrors string literals nicely. I'm happy with the /abc/ syntax as long as it doesn't cause problems in practice. I think this will take some real world experience (as did the introduction of multiple trailing closures), so leaving this behind a flag will hopefully allow that to be done without too much disruption.

One of the main problems with using regex literals in other languages is having to run the code before the syntax of the literal embedded in a string can be checked. Having compile time literals for regexes will make them significantly more usable in Swift.

Adding compile time support for regex syntax and captures fits well with Swift's goals of being safe, expressive and to 'present excellent diagnostics'.
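As a small sketch of the difference described above, the following contrasts a literal, whose syntax and named captures are checked when the code is compiled, with a pattern held in a string, whose syntax is only checked when the initializer runs. The date pattern, the inputs, and the variable names are made up for illustration, and it assumes Swift 5.7 or later.

```swift
// Literal: a malformed pattern here would be a compile-time error,
// and the named captures become labelled, typed members of the match.
let date = #/(?<year>\d{4})-(?<month>\d{2})/#

var year = ""
var month = ""
if let match = "2022-05".wholeMatch(of: date) {
    year = String(match.year)
    month = String(match.month)
}

// String-based pattern: the unbalanced "(" is only detected at run time,
// when the throwing initializer parses the string.
var runtimeFailed = false
do {
    _ = try Regex("(\\d{4}")
} catch {
    runtimeFailed = true
}
print(year, month, runtimeFailed)
```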

I've used regexes in several other languages and have found literal support in JavaScript and Ruby useful to have.


Parsing ambiguity

It would be nice if it were possible to avoid the parsing ambiguity between regex literals and certain operators. One of the preferences expressed elsewhere in the proposal review is that Swift could only have the extended syntax. My feeling is that extended literals look too heavyweight for an ‘everyday’ syntax.

One possibility that I haven't seen expressed is adding only a leading symbol before the first / to help remove the ambiguity (cf Lisp's quoting and Ruby's symbol literals):

  • :/[a-z]+/ (this also looks similar to Raku’s adverb syntax). Would this be ambiguous with colons in method parameters? It seems like it might work, but I'm sure I haven't thought through all the possibilities.

  • '/[a-z]+/. I think this wouldn't 'burn' the use of the single quote for other literals, as long as whatever it was used for didn’t need a / at the first character.

Both of these don't look as visually heavyweight as #/[a-z]+/#.

Another question - could some of the ambiguities people have mentioned be resolved by requiring spaces to be escaped within the literal? e.g. foo(/, /) - is it two / operators or a regex? If this were foo(/,\ /), would it be unambiguous?

3 Likes