SE-0354: Regex Literals

hooman · May 3, 2022, 4:46am

My objection goes deeper than the source break. My problem is with the destination, after we get there. I know that Swift is already hard to parse and that there are multiple edge cases and special rules (e.g. spacing around binary operators). But most of those special rules make sense and aid human readability as well. Also, those cases are unavoidable if we want to keep Swift syntax within the C family of languages and still get advanced capabilities like custom operators. The expense of getting /.../ is not just the transitory expense of a source-break migration period. It also has a permanent expense: restricting the language and adding complexity to it.

And what do we gain? A shorter (and now mostly legacy) delimiter around regular expression literals. I really don't see how it is so much better than #/.../ or #/.../# or even '...' and '/.../' alternatives that it is worth it. I have also already explained my case for the generality and flexibility of the magic literal alternative, which can be sugared with these shorter variants.

This is a very good example: await was a source break, but the Swift community was very supportive because of the reward. Also, it just added a contextual keyword; it did not remove any features or introduce weird edge cases and workarounds.

Swift is now so much more powerful with async/await, and the precedent for using those words is also much stronger among other popular languages. Moreover, the source break was again unavoidable, because staying Swifty required some marker keyword there, and no matter what we chose we would break something. Yes, await was a very worthy and mostly unavoidable source break, and the community embraced the change.

Here I don't see the end result as worth it, especially since we can achieve the goal of adding regex literals without this cost. Is getting the bare / delimiter really such a big deal?

I guess you guys already have some sample/test Swift code with regex literals in it. Can somebody please post a decent-sized, fairly realistic sample of Swift code that compares the different delimiters for regex literals, so we can visualize how each one looks? My gut feeling is that having more noise might be a good thing, to make regex literals stand out a little. I don't expect dense use of several regex literals in close proximity except in the middle of the regex DSL. And I think it is also not a bad thing if the regex DSL stands out a little bit.

Avi · May 3, 2022, 5:09am

The responses from the Core Team suggest that the choice of delimiter is not up for review. This decision was made before the pitch phase. I just wish the Core Team was more honest and up-front. We all know that Swift's evolution is 99% Apple and 1% community. I just wish there was more transparency about that 1%, but that's not Apple's way.

Panajev · May 3, 2022, 6:10am

You make very good points, but this one is a pretty powerful argument if we are seeking concise and expressive code. Escaping inside a regex string just adds noise.

Panajev · May 3, 2022, 6:11am

I do not want to believe that here. The #/…/# delimiter for multi line regex literals is already part of the game, we are “just” talking about not allowing the bare /…/ syntax to begin with… :/.

Panajev · May 3, 2022, 6:13am

I am sorry, but what are the arguments for the flag and for the bare /…/ syntax? How much work would it be to revert the changes made to allow the bare /…/ syntax too? What do we gain by having it instead of #/…/# literals?

jazzbox · May 3, 2022, 7:30am

I think it is extremely misleading to compare /.../ with #/.../#. The first is of course 'prettier', but both are very short and, in this case, not even valid regexes. If we look at a longer regex with many [(+* characters in it, then this difference disappears.

dlbuckley · May 3, 2022, 8:26am

+0.75 for how it is currently.

Before getting on to the proposal itself: there seem to be a number of accusations that the core team has already made a decision and that community feedback will be ignored. I don't think it's very healthy to make these assumptions/accusations, but I can see why they have arisen. I think some confidence in how much the community can influence a review was lost in the multiple trailing closure syntax debacle. This seems to be one of those instances where the review is quite polarising (between the core team and the community), and I fear there will again be some kickback if it's accepted as-is. As it stands now, I think most people have accepted or just ignored the multiple trailing closure syntax change.

Going back to the proposal at hand: I find it quite odd to have two different ways to parse single-line and multi-line regexes. I understand why we have different ways to parse single-line and multi-line strings, but that is a very different situation from the one we have now, where we are proposing something entirely new that hasn't existed before.

I completely agree with what @Douglas_Gregor mentions above regarding #regex(...), it's a completely different use case compared to #selector(...), #file, #line etc. These always feel like they are for reaching outside of the core language into something else, they are conveniences to interact with something which is not built in. So in my mind this is a non starter as the new regex work is built in to the core of the language.

As for /.../ and #/.../#, I would prefer that we stick to one syntax whichever way we go. I don't actually have strong resistance to one or the other, but there is a little unease in my mind regarding /.../ simply due to the caveats mentioned in the proposal (and the strong opinions of many others against it in this thread).

If the proposal landed 'as is' I would still be happy as I think the regex work being done is fantastic, well thought out and long overdue. I can't wait to start using it no matter what happens.

bzamayo · May 3, 2022, 8:45am

This part feels very bizarre, and against the principles of the language up to now. Isn't this a dialect? What is the urgency that makes this flag so desperately needed? A less aggressive (or, more accurately, more typical) transition path, implementing the unencumbered #/…/# syntactic form in Swift 5 and deferring /…/ until Swift 6, is perfectly acceptable here.

Panajev · May 3, 2022, 8:49am

It feels like the change must happen exactly as it is in the pitch. Instead of making assumptions, I will ask it straight here: is there a possibility of changes like these being made to the proposal, or of the proposal getting more revisions?

smuroftfiws · May 3, 2022, 9:48am

Me too.

Michael_Ilseman · May 3, 2022, 1:01pm

Yes

rvsrvs · May 3, 2022, 1:53pm

I guess I didn't understand that. The proposal seems to call for bare syntax to be enabled full stop in Swift 6. As I read the proposal, the flag is only to allow bare syntax to be enabled early in Swift 5 and seems to be obsoleted in Swift 6.

My problem with that approach is that the optics proposals have not surfaced yet and are not guaranteed to address this problem when/if they do.

I know that @Ben_Cohen has stated it's under consideration, and I think that is excellent, but I'd vastly prefer to see that proposal before I buy into having to fix my code.

hashemi · May 3, 2022, 2:14pm

The suggestion I would make is that if we were to allow /.../, then we should impose more restrictions on the regexes allowed with those simple delimiters. Perhaps we can restrict them to regexes starting with (, [, or a period. I would say that we should also disallow escaping within the /.../ delimiters. I'm hoping that this reduces the source-breaking changes required in the language outside of regex.

Basically, the /.../ delimiters would only be used for expressions that are immediately recognizable as regexes. This would capture the vast majority of popular short regex cases like /[0-9]+/ or /[a-zA-Z_][0-9a-zA-Z_]*/. Even a single character can be expressed within those rules, like this: /[0]+/.

Anything else would require the extended syntax or Regex DSL.
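To make the suggested restriction concrete, here is a minimal sketch of a check a lexer might apply to the body of a bare /.../ literal. The function name isAllowedBareSlashBody is hypothetical, and this rule is only the idea floated in this post, not anything SE-0354 specifies.

```swift
/// Hypothetical check modelling the restriction suggested above:
/// a bare /.../ body is accepted only if it starts with "(", "[", or "."
/// and contains no backslash escapes. Not part of SE-0354.
func isAllowedBareSlashBody(_ body: String) -> Bool {
    let allowedFirstCharacters: Set<Character> = ["(", "[", "."]
    let backslash: Character = "\\"

    // Reject empty bodies and bodies that start with any other character.
    guard let first = body.first, allowedFirstCharacters.contains(first) else {
        return false
    }
    // Reject any escaping inside the delimiters.
    return !body.contains(backslash)
}

// The examples from the post all pass the check.
for body in ["[0-9]+", "[a-zA-Z_][0-9a-zA-Z_]*", "[0]+"] {
    print(body, isAllowedBareSlashBody(body))
}

// These would need the extended #/.../# syntax (or the DSL) under the rule.
for body in ["\\d+", "abc", "[\\w]+"] {
    print(body, isAllowedBareSlashBody(body))
}
```

Under this rule a common pattern such as \d+ would have to be written as [0-9]+ or moved to the extended syntax, which is the trade-off the post is proposing.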

The purist in me wants to be down on the /.../ syntax but I know in real world usage it'll be fine and I do agree that it is nicer than the alternatives, especially for those typical/commonly used regex expressions. I'm just worried that the ways in which the syntax is source breaking are difficult to explain. Just knowing that the rules are there might add a bit of cognitive burden, even though it's unlikely to be an issue in real practice, and if they were, we would know immediately through a compiler error. I'm sure we'll all forget about them very quickly after they're implemented.

I don't think /.../ would make much sense on its own merits as the regex delimiter today if it weren't for Perl, sed, etc. I imagine that the majority of Swift developers are coming from Objective-C, Java, Kotlin, Python, or C++ backgrounds, which don't support /.../. I don't see /.../ commonly used in JS examples, but I'm not in the JS community. So I don't think many existing Swift developers or Swift newcomers would benefit much from the familiarity of /.../. Regex wizards coming from Perl, on the other hand, would find it trivial to adapt to any alternative syntax we came up with.

If I were to pick an alternative, I would say '...', unless there's another compelling use for which we're saving single-quoted strings. My second choice would be re"..." or r"...". Coming from Python, I find this works very well in practice. The additional character before "..." is very conspicuous, especially with syntax highlighting. It would also be extensible to more types of embedded languages in the future.

hooman · May 3, 2022, 2:24pm

This is a good point.

Paul_Cantrell · May 3, 2022, 11:16pm

Regardless of the merits and/or problems of the /…/ syntax, I do appreciate that Swift takes this attitude, and is willing to make judicious breaking changes to do it.

Languages that promise near-total eternal source compatibility do have their place in this world. Becoming excessively breakage-averse freezes a language in time, however, and eventually makes it feel like a burden to developers, and sends developers out…well, inventing Kotlin.

Let's not develop a communal phobia of source breakage. Instead, the consideration should be long-term vision (as Doug said), weighed against empirical measurement of actual breakage.

In the case at hand, that measurement of breakage gave results that seem entirely acceptable to me. I don't have such ardent feelings as others about the merits or demerits of /…/, but I don't see source breakage per se as a reason to reject it. I was worried about breakage, I saw the data, and now I'm not worried about breakage.

While I don't love special cases in general, I'm less concerned about them in the parser, since the problems they cause are immediately apparent and usually easily resolvable. As I wrote above:

Given that these rules do also seem to “mostly just work in practice,” I'm more concerned about these two other inconsistencies I mentioned:

Optionality mismatch between literals and the DSL.
The change in the meaning of whitespace when #/…/# contains a line break.

The latter is especially concerning to me, since its effect won't be apparent until runtime, and will result not in an error, but in puzzling changes to matching behavior.
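The whitespace concern can be illustrated with a minimal sketch, assuming Swift 5.7 or later, where #/.../# literals work without any flag and multi-line literals turn on extended syntax by default, so unescaped whitespace is ignored. The helper names are hypothetical; on Apple platforms the Regex APIs need macOS 13 / iOS 16, hence the availability annotations.

```swift
// Assumes Swift 5.7+; Regex matching needs macOS 13 / iOS 16 on Apple platforms.

// Single-line literal: the space between "a" and "b" is a literal character.
@available(macOS 13, iOS 16, tvOS 16, watchOS 9, *)
func matchesSingleLineLiteral(_ input: String) -> Bool {
    input.wholeMatch(of: #/a b/#) != nil
}

// Multi-line literal: extended syntax is on by default, so the space (and the
// surrounding newlines and indentation) is ignored and the pattern acts like "ab".
@available(macOS 13, iOS 16, tvOS 16, watchOS 9, *)
func matchesMultiLineLiteral(_ input: String) -> Bool {
    let regex = #/
      a b
      /#
    return input.wholeMatch(of: regex) != nil
}

if #available(macOS 13, iOS 16, tvOS 16, watchOS 9, *) {
    print(matchesSingleLineLiteral("a b")) // true
    print(matchesMultiLineLiteral("a b"))  // false
    print(matchesMultiLineLiteral("ab"))   // true
}
```

Both versions compile without complaint; the difference only shows up when a match unexpectedly fails at runtime, which is the point being made above.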

I spoke above in favor of bare /…/, and would be very surprised to learn that I'm on the Swift core team. FWIW.

Jon_Shier · May 4, 2022, 12:07am

We also shouldn't let Swift's inability to inform us of the "actual breakage" of this proposal lead us down a poor path. Given that Swift has no real way to judge the "actual breakage" of this proposal, we can't really point to such limited results and say it's okay. But you could at least try to judge it by the library's relative popularity, which suggests that hundreds to thousands of apps will break because of this choice. Even for those of us who accept that regex literals are necessary and useful, this seems unnecessarily high, for a few reasons.

There's nothing about / itself that lends itself to use with regex literals. That is, unlike String and ", there's no case from first principles for / here.
For languages which use / and support custom delimiters, the usual recommendation is to use the custom delimiter so you don't have to worry about escaping. Swift could easily jump to that recommendation and not need to handle any of this breakage at all.
Only a single language, Ruby, supports / without customization, so I don't find the language compatibility argument to be compelling.
As mentioned, the await example isn't a compelling comparison, as await was extremely high value, rather uniquely required, and rather easy for users to work around. And any other keyword would've run into the same issue, so it was generally unavoidable. / doesn't meet any of these criteria.
Not only is this a source break, but it also generally makes / unusable. The proposal has offered no justification for reserving an entire operator (well, most of one) for this feature.
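The point about custom delimiters sparing you from escaping can be seen with a pattern that contains slashes, such as a URL. This is a minimal sketch, assuming Swift 5.7 or later, where #/.../# works by default while bare /.../ sits behind a compiler flag in Swift 5 mode; hostName(in:) is a hypothetical helper, and the bare-slash spelling is shown only in a comment.

```swift
// Assumes Swift 5.7+; Regex matching needs macOS 13 / iOS 16 on Apple platforms.
// hostName(in:) is a hypothetical helper for illustration.
@available(macOS 13, iOS 16, tvOS 16, watchOS 9, *)
func hostName(in url: String) -> Substring? {
    // With the extended delimiter, the slashes in "://" are written as-is.
    // The bare-slash spelling would need escapes: /https?:\/\/([\w.]+)/
    url.wholeMatch(of: #/https?://([\w.]+)/#)?.output.1
}

if #available(macOS 13, iOS 16, tvOS 16, watchOS 9, *) {
    print(hostName(in: "https://swift.org") ?? "no match") // swift.org
    print(hostName(in: "ftp://swift.org") ?? "no match")   // no match
}
```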

In the end, though, even without the source break, I don't think the issues surrounding the use of / are justified by this proposal.

ksluder · May 4, 2022, 12:32am

I don’t think this is a well reasoned assertion. What justification exists for insisting on double quotes for string literals other than similarity to C? Does the existence of languages such as Python and Visual Basic which treat single and double quotes equivalently undermine this justification in the same way you argue the existence of languages that support customizable regex delimiters undermines the justification for /…/?

Jon_Shier · May 4, 2022, 12:33am

Their use in plain English, of course. Most other operators default to behaviors similar to those in English, given their original authors.

ksluder · May 4, 2022, 12:35am

Standard UK English uses single quotes for outermost quotation. Swift insists on double quotes. Other programming languages allow both—and of those, some treat them as equivalent (Python, VB) and others give them different meanings (shell script). Some languages even support arbitrary delimiters with heredoc syntax.

Jon_Shier · May 4, 2022, 12:36am

What's your point?