SE-0354: Regex Literals

Agreed, but this is orthogonal to whether the bare /…/ syntax is worth the implementation trouble and the readability issues (having to escape / does not seem better than not having to) compared to #/…/#.

Can you speak to the possibility of staging this with changes to optics to include enum case keypaths? That has been mentioned several times as a desired feature to ameliorate some of the breakage pain, but there has been no mention of combining the two features before a final breaking change is made in Swift 6.

Posting as review manager for some moderation feedback:

Nevin is right here. People should generally accept that people are always representing their own perspective. Generally speaking, it's best to avoid meta-commentary on how people are arguing.

This is especially true in areas of taste – for example, there is no "right" answer to whether #/.../# looks worse than /.../. Nevertheless, it's good if people can try their best to substantiate their views. In cases of code aesthetics, examples are particularly helpful.

That said, @Nevin it would also be good for you to tone down the rhetorical flamethrower a little. Referring to things as "dumpster fires" isn't really appropriate language, and certainly won't help persuade the core team of your position.

I understand it was posted good-naturedly, but please avoid including humorous pics in your posts to evolution threads. This is particularly true of snowclone-style memes ("X all the things", "No, it is the X that are wrong" etc). They're not really appropriate for proposal reviews (other areas of the forum are perhaps more OK with it). People often link to XKCD as an aside, but a link is enough.

8 Likes
// bare
let regex = Regex {
   digit
   /\ [+-] /
   digit
}

// extended
let regex = Regex {
   digit
   #/ [+-] /#
   digit
}

The above is from the proposal. If we want to optimize for this type of regex composition and the # delimiters are not acceptable, then let's use '/…/' (I did see the note in the alternatives):

let regex = Regex {
   digit
   '/ [+-] /'
   digit
}


I personally think that we should not optimize the syntax for small partial regexes.
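For reference, here is a minimal runnable sketch of the extended form, assuming Swift 5.7 or later with the RegexBuilder module. `One(.digit)` stands in for the proposal's `digit` placeholder, and the name `signedPair` is invented for illustration.

```swift
import RegexBuilder

// `One(.digit)` replaces the proposal's `digit` placeholder (an assumption).
// Whitespace inside a single-line literal is matched literally, so this
// expects exactly one space on each side of the sign.
let signedPair = Regex {
    One(.digit)
    #/ [+-] /#
    One(.digit)
}

let matched = "1 + 2".wholeMatch(of: signedPair) != nil
let rejected = "1+2".wholeMatch(of: signedPair) != nil
```

The '/…/' spelling suggested above is not valid Swift, so only the #/…/# form is shown.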

2 Likes

I'm definitely +1 for introducing RegEx literals into Swift. IMHO they have immense value in various kinds of string processing and I have always been massively bothered every time I had to use the ObjC-style RegEx-API.

Of course we also get the 'more swifty' RegEx builder DSL and that is definitely a good thing that I will use whenever a literal would be too cumbersome to write and read, but, unlike some others in this thread, I don't think that we can introduce that DSL exclusively and leave literals completely out.
Especially for short matches it will be so incredibly much nicer to just write a little one-liner instead of writing a multi-line result builder closure.

Additionally, I think, RegEx literals will be particularly handy while conceptualizing and testing out your code and can then be translated into the DSL later on.
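As a rough illustration of that workflow, the sketch below writes a short match as a literal and then translates it into the builder DSL. It assumes Swift 5.7 or later. It uses the #/…/# spelling only because that one compiles without extra flags. The names `literal`, `built`, and `input` are made up for the example.

```swift
import RegexBuilder

// One-liner: two runs of digits separated by a hyphen, each captured.
let literal = #/(\d+)-(\d+)/#

// The same pattern translated into the builder DSL.
let built = Regex {
    Capture { OneOrMore(.digit) }
    "-"
    Capture { OneOrMore(.digit) }
}

let input = "10-20"
let a = input.wholeMatch(of: literal)
let b = input.wholeMatch(of: built)
```

Both regexes have the output type (Substring, Substring, Substring), so the captures can be read the same way from either match, for example `a?.output.1`.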


However, like many others, I'm greatly concerned about the choice of delimiter for the literals and about the set of trade-offs it is gonna bring. Even if the data that we currently have were really representative and the amount of source breakage indeed very low, I'd still think it should not be necessary to ban a whole set of operators that use a common symbol and have been possible to write since Swift 1.0, or to introduce complicated parsing rules that may not be much of a burden for the compiler but, IMHO, very much are for humans.
Also, and correct me if I'm wrong, until now, Swift had a very clear division between operator characters and delimiter characters and it'd be a shame to lose that.

I think there are much more compelling delimiter choices (just #/.../#, '/.../', #re'...', or even #re/.../, just to name a few again) and the proposal definitely does too little to give arguments for /.../ (except for "it's a term of art, although it is becoming less popular, but it's still a compelling choice") and arguments against all other choices, given the consequences.

5 Likes

I am very disappointed to hear such a statement from the project lead because it seems as if the conclusion has been made before the deadline.


As some folks already pointed out, regex literals have very subjective problems rather than technical issues.
Now I feel that it may be trivial whether or not the new syntax will break source compatibility.
What matters is whether regex literals are really clear or illegible. Which delimiters are chosen may be less important.
Of course, "every man to his taste". No one can judge which is correct. Consequently the discussion could get heated.


Regular expressions were born over half a century ago (as someone also pointed out in this thread).
At that time, to be short was to be the best because our ancestors had to type regexes into a terminal by hand.
Today, we have "code completion" or something like it. A DSL can be entered with a few keystrokes.
Do we actually need regex literals?
Is to be short to be clear?

There are definitely only subjective answers, but my subjective answer is no.

The core team is the elite. The elite has much knowledge about many languages and related things.
Regex literals are easy to write and read for those who are wise in such syntaxes.
That, too, is a subjective perception.

What about newcomers?
They would regard regex literals as ancient magic spells. As a matter of fact, regex is just legacy.
On the other hand, the DSL consists of simple English words such as OneOrMore.
It is very clear to read. Even non-programmers can understand it.

To be short is not to be clean nor clear.

I know Mr. Lattner is not on the core team any longer. However, this statement of his is still very valuable to us.
I don't want to guess that the reason the core team sent him off was to introduce this kind of ugly syntax.


Just in case, my stance is:

Let me slide into this review to restate my firm opposition from the outset to the valiant efforts of the proposal authors to support the 'bare' /regex/ syntax in Swift and my surprise to see it being brought to review in the proposal. It simply is not worth the known lexical contortions and source breaking of popular open source projects it would inflict on the language. At best it could be presented as a "future direction" for when these issues have been thought through. I just don't see #/regex/# being much less recognisable or "clean" as an alternative "term of art" or the need to support both.

Not strictly relevant to the review, given that ( and ) are special characters important to regex syntax I don't see a syntax like #re(stuff) being much more promising. Any literal for something as subtle as a regex terminated by a single character simply isn't going to cut it.

10 Likes
Aside

You've mentioned this a couple of times. It's more of a procedural point, but I agree. For other proposals as well, we're seeing members of the core team jump into pitches and reviews to defend even the tiniest minutiae. For the recent light-weight generics pitch, every member of the core team who works on the compiler/stdlib - Doug, John, Joe, Ben, etc. - all of them were there, defending tiny details about syntax.

It's unfair to the community, IMO. It's obviously valuable to hear what the core team members think, but it does often lead to situations where, to argue with any tiny point, volunteer community members who are given very little time to review a proposal and little/no insight into future plans are suddenly met by the core team in defensive formation, building a wall and blocking the debate.

It's very corporate, if you know what I mean ;)

The core team ultimately makes the decision, in a closed process without even any meeting minutes, and the secrecy is so extreme that even the founder of the project and former Apple executive (I think he was on the "executive leadership" website at one point...?) finds the process intolerable. It has never been a democracy, and it is absolutely open to arbitrary "I just want this" kinds of decisions.

So it is obvious why the words of the core team can shift the debate. They can instantly render the efforts of community members moot with a single random musing that they just "aren't feeling" a particular feature. The core team do not need to justify anything.

@tkremenek's point is basically that - he says he thinks /.../ looks nicer. Full stop. Case closed. You can try to convince him (or other core team members), but at the end of the day they shape the future of the language based on arbitrary feelings like that.

That's just how this system works. And that itself isn't necessarily unworkable, but it does mean that the musings of core team members have extraordinary weight. Increasingly, they are being less cautious about how they throw that weight around.

Personally, I've basically followed Chris' example and don't care about swift-evolution any more. I think the community is at an all-time low, and the core team are too far removed to even notice it. I somehow got roped in to this discussion, but for the last few months I generally don't comment/like/anything on evolution proposals any more. After years in the community, I think the process is a waste of time for community members; they are more like "soft announcements" than reviews.

12 Likes

I understand why some may be concerned with me expressing an opinion they disagree with. As I said in my post, I am stating my opinion. I suspect folks would rather hear the diversity of the views and perspectives, with their reasoning to support those perspectives rather than not hearing them.

All of the signals from reviews feed into the core team's decision-making discussion. Of course, I have an opinion, which I will express in the core team discussion, but the signal raised in review threads is always considered.

Everyone: Let's please keep this thread civil. This topic is polarizing with very different viewpoints. We can politely ask people to explain their rationale and perspective, even if we won't agree with them entirely or potentially at all.

8 Likes

I don't have much to add here that hasn't been said already, except to signal support for #regex(...) as an alternative to /.../.

I'm gonna voice my preference to not use /.../, too. If votes are being tallied.

6 Likes

This seems like a powerful feature to have in the language, and I believe it should be prioritized for inclusion in Swift 6.

4 Likes

Thanks, Allen, for chiming in. Can you elaborate more about your preference? The signal of evolution threads comes from when folks explain their viewpoints.

I don't feel the "cleanliness" of the /.../ syntax warrants consideration of any source breakage, especially when the general consensus around the delimiter seems to be a range from ambivalence to "anything except the one that breaks code".

ps I didn't mean for my last post to be a reply. Sorry if you felt @'ed, it was a mistake and I can't edit the reply away.

8 Likes

Thank you for elaborating!

That's my feeling as well. I find the weird excuses more offensive than the actual process, which I'm actually perfectly fine with :) Just be honest about it.

BTW: I prefer the decided // syntax over the "garden fences" (#). But I also don't really like operator overloading like the usr / lib example in the first place; as fancy as it looks, operator overloading is only sound in a tiny number of cases.

1 Like

This is what bothers me the most. If the Core Team were honest about what parts of a proposal are actually up for debate/discussion, everyone would be happier. There might still be less involvement, but people wouldn't stay away out of disappointment or disillusionment. It would be because they are simply not interested.

5 Likes

I saw some earlier comments about how this doesn't break very much, but there are known cases where it does break code for anyone who has overloaded /, such as the reasonably popular and useful CasePaths library. I'm sure there are also web frameworks that use / for routing, and I wouldn't be surprised to see URL parsing libraries that do the same.
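To make the routing case concrete, here is a minimal sketch of a / overload used for path building. `RoutePath` is a hypothetical type invented for this illustration and is not taken from CasePaths or any real framework.

```swift
// Hypothetical path type, for illustration only.
struct RoutePath: Equatable {
    var components: [String]
}

// Infix overload of `/` that appends one component to the path.
func / (lhs: RoutePath, rhs: String) -> RoutePath {
    RoutePath(components: lhs.components + [rhs])
}

let root = RoutePath(components: [])
let route = root / "users" / "42"
```

Spaced infix uses like this one are, as far as the proposal describes, not where the breakage arises. The cases it calls out involve / in prefix position or as an unapplied operator reference, so this shows how common such overloads are, not a snippet that would stop compiling.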

Anyway, this is a long way of saying I'd cast a vote for #/.../# or even #regex(…), without wading into a big discussion.

3 Likes

I'd like to add my voice as well. I primarily am an app developer and Swift is my native tongue for development. I have found that there are times when I need to resort to using regexes for things but that always comes with having to look up resources for how to use them. I have been working professionally with Swift for years and I still find regexes arcane and avoid them when it is possible to avoid them.

Personally I find regexes give off more of a "quick and dirty" way to solve a problem and prioritize terseness over readability and clarity. In any codebase I work in where regexes are used, they are almost universally paired with code comments because they are difficult to parse at a glance.

As someone who is a Swift-first app developer, I don't think that regexes fit with the clarity and expressivity that made me love Swift. That said, I understand pragmatically that regexes are not going away and that there should be a way to work with them and for them to be easier and safer to write.

I have a different association with # that I'm not sure has been discussed: there is already a paradigm where # indicates that characters are not being escaped and are treated literally inside of strings.

Borrowing from Hacking with Swift, we see that adding # around a string's delimiters means we will not need to escape inside:

let regularString = "\\Hello \\World"
let rawString = #"\Hello \World"#

In my mind, the #/ ... /# syntax not only avoids the source break but also makes clearer what will happen between the delimiters, especially to people who are using Swift as a first or home language. On top of that, I don't think that this change clears the bar for a source break. Async/await fundamentally changed the programming model and safety for how we interact with concurrent code, and I found that source breaks for that were justifiable to elevate that syntax. I don't think that regex literals clear the same bar.
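A small sketch of the parallel, assuming Swift 5.7 or later; `pathRegex` and `match` are names made up for the example. The analogy covers the delimiter only: inside #/…/# a forward slash needs no escaping, but a backslash is still regex syntax, whereas in a raw string it is a literal character.

```swift
// Ordinary string: each backslash has to be escaped.
let regularString = "\\Hello \\World"
// Raw string: the # delimiters switch off backslash escaping.
let rawString = #"\Hello \World"#

// Extended regex literal: `/` can appear unescaped inside,
// while `\w` keeps its usual regex meaning (a word character).
let pathRegex = #/usr/lib/(\w+)/#
let match = "usr/lib/swift".wholeMatch(of: pathRegex)
```

With the bare /…/ form, each inner slash would need to be written as \/.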

7 Likes

Two ideas I haven't seen mentioned yet in this thread:


I think hiding the functionality of -enable-bare-regex-syntax behind a compiler flag is suboptimal. Xcode's build settings UI is hard to navigate and cluttered. And compiler flags aren't supported at all by playgrounds and Swift Playgrounds app projects.

What if, in addition to the flag, we added a new compiler control statement that enabled bare regex syntax?

#enable(bareRegexSyntax)

This would make bare regex syntax available in Swift 5 mode from day one, without the need to pass any obscure flags. It could also be extended to most other source-breaking changes, which parallels Doug Gregor's idea to create a compiler flag for each Swift 6 source break.


I'm a bit uncomfortable with the fact that parentheses can't be used to disambiguate this case. Should we deprecate operators like this as well? Operators with more than one / can't be very popular, right? And if we implement ` delimiters for operators as suggested by tem, I think this will be even less of an issue.