SE-0354: Regex Literals

hooman · May 1, 2022, 10:45pm

I would like to expand a bit more on the possibilities opened up by the magic literal solution:

We can have a local contextual scope for the arguments of the magic literals (generally, not just for #regex). This means that we could have a separate contextual interpretation for the syntax and delimiters of such arguments (parameters).

This way, we can define #regex/…/ to be a shorthand for trailing literals, the same way that map{…} is a shorthand for map( {…} ).

The full syntax would be #regex( /…/ ), and / would keep its meaning everywhere else. This opens the possibility of supporting Perl-style modifier prefixes for the regex literal as well as even additional arguments to select the language variant and behavior of the literal. For example, we would be able to support all the Perl variants noted by @tim1724 such as: #regex(qr{...}) or even #regex(Perl, qr{...}). Privileging #regex with # shorthand would give us #(qr{...}).

This would offer a general, extensible foundation for such things, and we could add syntactic sugar for the frequently used ones, the same way we do with [Int] vs Array<Int>.
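The #regex( /…/ ) spelling above is an idea from this post, not valid Swift, so the sketch below only illustrates the two existing sugars the post uses as analogies: a trailing closure is shorthand for passing the closure as the last argument, and [Int] is shorthand for Array<Int>. The variable names are illustrative.

```swift
// Trailing-closure syntax: both calls pass the same closure as map's argument.
let doubledExplicit = [1, 2, 3].map({ $0 * 2 })
let doubledTrailing = [1, 2, 3].map { $0 * 2 }

// Type sugar: [Int] and Array<Int> name exactly the same type.
let sugared: [Int] = doubledTrailing
let spelledOut: Array<Int> = sugared
```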

On the other hand, a tiny bit of syntactic noise (e.g. #/.../ instead of /.../) might not be such a bad thing. As far as I know, the intent is to discourage the overuse of bare regex literals, and a tiny bit of noise may tip the scales in the right direction.

I will address the comments that downplay the extent of the damage to the language caused by adopting the bare /.../ later when/if I can spare some time to do so.

hooman · May 1, 2022, 10:51pm

I posted my message above before reading this, but IMHO the logic I propose to get from #regex(...) to #/.../ still stands and offers a better solution that avoids source-breaking changes.

Jon_Shier · May 1, 2022, 11:56pm

What is your evaluation of the proposal?

-0.25

Honesty compels me to admit that, despite regexes being a generally terrible API, these literals will play an important part in Swift's string processing story. However, I see these literals as an API of last resort, for cases where copy-pasting an existing regex matters most or where inline capture is required. Therefore they're unworthy of the privileged syntax this proposal gives them.

Nothing in this proposal justifies the breakage and edge cases introduced by the use of /. In fact, the proposal itself spends more time explaining the edge cases introduced by this usage than the actual capabilities of the literals themselves. That, in and of itself, should indicate that / is not a good choice.

In addition to the general issues introduced by the use of /, which the proposal actually addresses fairly comprehensively, I have to once again reiterate and echo the concerns voiced in this review and the various pitches leading up to it: nothing in this proposal justifies the relatively massive source-breaking change this use of / represents. @rvsrvs is 100% correct: breaking TCA will break hundreds, if not thousands, of apps. In fact, it seems likely this represents the largest source break since Swift declared source stability five years ago. Of course, we can't quantify this breakage due to Apple's continued neglect of the Swift ecosystem and the lack of analytics around package usage. But the library itself has over 6k stars on GitHub, making it one of the most popular Swift libraries out there.

In addition to the concrete impact this breakage will have, this raises important considerations for the community in general. Specifically, how popular does a library have to be for Apple to avoid breaking it (when such breakage is easily avoidable)? If TCA isn't popular enough, is Alamofire? Alamofire currently has over 37k stars on GitHub and is usually recognized as, if not the most popular Swift library, certainly one of the most popular. Yet it represents just a single entry in the compatibility suite. Does that mean it's subject to breakage at any point? Why would people spend their valuable time developing unique solutions to problems in the Swift community when it could be broken at any time?

Is the problem being addressed significant enough to warrant a change to Swift?

Probably, though the other parts of the regex proposals are far more valuable.

Does this proposal fit well with the feel and direction of Swift?

Not especially. Nasty, complex, inline literals are usually a feature of last resort. But, given the placement within Swift's sphere of features, it could be. If this proposal were a first implementation of custom literals, then it might be more valuable. If it were exploring the future of protocols for literals, that might be valuable. If it were exploring generalized inline captures from literals, that might be more valuable. But in the end this proposal feels most like something we want for copy/paste compatibility with regexes we find on Stack Overflow, which has never been a priority for Swift before.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

Regexes in other languages suck pretty hard. The overall set of proposals certainly puts Swift out in front of them, for the most part.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I've been tracking the various pitches and proposals and tracking the development of the string processing library on GitHub.

woolsweater · May 2, 2022, 12:43am

There's no question that / is the canonical delimiter for regex. But that comes from contexts where regexes are not used in direct conjunction with other syntax (grep, sed, etc.). To my mind, an awfully important question is just how many other general-purpose programming languages use this literal syntax embedded as a normal expression. I don't claim full knowledge*, but I know of three, the same ones mentioned in the proposal: Javascript, Ruby, and Perl. Are there others? And then of those, how many:

use / for comments (Perl and Ruby use #)
have custom operators like Swift (that may include slashes)

Any? And then when we add on operator references as expressions, I think Swift is all alone in this region of syntactical space.
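For readers less familiar with the two Swift features in question, here is a minimal sketch of both: a custom infix operator whose spelling contains a slash, and operators passed around unapplied as function values. The </> operator is a hypothetical example, not something from the standard library. Both forms compile today; the proposal's lexing rules are what have to keep code like this from being read as the start of a regex literal.

```swift
// Hypothetical custom operator containing "/": joins two path components.
infix operator </> : AdditionPrecedence

func </> (lhs: String, rhs: String) -> String {
    lhs + "/" + rhs
}

// AdditionPrecedence is left-associative: ("usr" </> "local") </> "bin".
let joined = "usr" </> "local" </> "bin"

// Operator references used as values: the bare "/" is the division function.
let divide: (Double, Double) -> Double = (/)
let quotient = [2.0, 5.0].reduce(1000.0, /)   // (1000 / 2) / 5
```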

So, while I agree that this would be kind of a cool feature to have, I am skeptical that slash-literals-as-expression rises to the universality that the proposal suggests. And given that, I don't think the costs would be worth it.

A final point is that there will always be some places where code is read that can't avail themselves of rigorous highlighting logic, GitHub for example. A straightforwardly unambiguous delimiter and set of rules make it easier for that highlighting to be decent, and they aid reading even when there is no highlighting. An extra mark, like r'/[a-z]+/' or a pair of octothorpes, is no great burden in my opinion, and might even be a positive.

*And for some reason I'm having a devil of a time searching the web for this information.

John_McCall · May 2, 2022, 2:36am

Not to take anything away from the rest of your review, but this argument is not very convincing. At best, it would be an argument that the proposal document isn't very well drafted, which would be unfortunate but also something that reviewers would ideally look past to focus on the substance of the proposal. In this case, though, it's just misplaced: the capabilities of the literal are variously covered in SE-0355 (the regex syntax) and SE-0350 (the general API of the Regex type), so you wouldn't expect them to be redundantly described in SE-0354, which is specific to wrapping up the regex syntax into a literal.

Like I said, this doesn't take anything away from the rest of your review; your review stands by itself well enough without trying to make a point about the quality of the proposal document.

Jon_Shier · May 2, 2022, 2:45am

Not sure I see your point here. If the literal syntax were fully included in one of the other proposals, my point would be the same, I would've just said the literal syntax section spends more time describing its edge cases rather than the actual functionality. Perhaps the point would've been blunted since the relative size isn't as extreme, but the point stands. Boiling it down into its own proposal just makes the point more obvious. But I'll express it differently (and more generally) if that helps: new features which have to be described more in terms of the edge cases they have than on the actual functionality they add probably aren't a good idea.

John_McCall · May 2, 2022, 2:50am

No, you have it backwards. You are using the fact that the description of something has been pulled into a separate proposal (or section of a proposal) from the description of its functionality as evidence that it doesn't have much functionality.

Jon_Shier · May 2, 2022, 2:54am

My understanding is that, without this proposal, the only things Swift loses by requiring use of the Regex initializer are inline captures and their integration into the Regex DSL. Is that incorrect? That you can represent a regex as a string seems separate from the actual literal, is it not?
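A minimal sketch of the difference being asked about, assuming Swift 5.7 or later (where the Regex type and the #/.../# literal form are available; on Apple platforms they require macOS 13 / iOS 16). A pattern passed to the Regex initializer is parsed at run time and its captures are only available dynamically through AnyRegexOutput, while a literal lets the compiler infer a typed output tuple, including named captures. The function names are hypothetical.

```swift
// Assumes Swift 5.7+; on Apple platforms Regex needs macOS 13 / iOS 16 or later.
@available(macOS 13.0, iOS 16.0, tvOS 16.0, watchOS 9.0, *)
func yearViaInitializer(_ input: String) throws -> Substring? {
    // The pattern is a String parsed at run time, so a malformed pattern only fails here,
    // and the captures are untyped: the output is AnyRegexOutput, indexed by position.
    let regex = try Regex(#"(\d{4})-(\d{2})"#)
    guard let match = input.wholeMatch(of: regex) else { return nil }
    return match.output[1].substring
}

@available(macOS 13.0, iOS 16.0, tvOS 16.0, watchOS 9.0, *)
func yearViaLiteral(_ input: String) -> Substring? {
    // The literal is parsed at compile time; its output type is inferred as
    // (Substring, year: Substring, month: Substring), so captures are typed and named.
    let regex = #/(?<year>\d{4})-(?<month>\d{2})/#
    guard let match = input.wholeMatch(of: regex) else { return nil }
    return match.output.year
}
```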

Ben_Cohen · May 2, 2022, 2:56am

Note that while this is certainly a consideration, the use in Javascript of / for division, regexes, and comments, suggests this is not really a problem in practice. This, combined with the fact that regexes cannot start with a space or ) means syntax highlighting using cross-language highlighting techniques found in places like GitHub or Textmate etc can be implemented with a very high degree of accuracy.

Jon_Shier · May 2, 2022, 2:58am

This point seems specious. There's zero connection between the use of something in JavaScript and whether it's a problem in practice. (I'm half joking, but still.)

woolsweater · May 2, 2022, 3:02am

Are you speaking as the review manager? I don't see how my point is in any way inappropriate to a review of this proposal.

Ben_Cohen · May 2, 2022, 3:05am

No, not speaking as review manager here. Your comment was perfectly appropriate. I am just pointing out the situation already exists in Javascript and (unless I'm mistaken) is not challenging to syntax highlight.

michelf · May 2, 2022, 3:23am

What is your evaluation of the proposal?

I think it's a good idea to have regex literals. But the /.../ syntax appears to create a lot of bizarre situations in Swift that would be better avoided. Having only #/.../# delimiters would be perfectly fine and avoid creating a disruption.

Is the problem being addressed significant enough to warrant a change to Swift?

I don't think the /.../ syntax makes it worth dealing with the breakage it would cause. But in general having a way to express regex literals seems worth it.

Does this proposal fit well with the feel and direction of Swift?

To me, this proposal does a good job of discrediting the /.../ syntax by listing all the limitations and features that would have to be slightly broken in unexpected ways. It's sort of going timidly backward on things that have existed since Swift 1.0. There's no evidence unapplied operators and prefix / are causing any harm, which should be the threshold for breaking them.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I have occasionally used languages using / as a regex delimiter, and I don't feel it's a good delimiter. A good delimiter is one that you almost never need to escape, and in my experience those are generally balanced delimiters like (...). But #/.../# seems good too, as I can hardly see any situation where you'd need to escape something in it.
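As an illustration of the escaping point, assuming Swift 5.7 or later: inside the extended #/.../# form only /# closes the literal, so a slash-separated date pattern can be written as is, whereas a bare /.../ literal would need each inner slash escaped as \/. The function name is hypothetical.

```swift
// Assumes Swift 5.7+; on Apple platforms Regex needs macOS 13 / iOS 16 or later.
@available(macOS 13.0, iOS 16.0, tvOS 16.0, watchOS 9.0, *)
func slashDateParts(_ input: String) -> (year: Substring, month: Substring, day: Substring)? {
    // Inside #/.../# only "/#" closes the literal, so the inner "/" separators
    // need no escaping (a bare /.../ literal would need \/ for each of them).
    let slashDate = #/(\d{4})/(\d{2})/(\d{2})/#
    guard let match = input.wholeMatch(of: slashDate) else { return nil }
    return (match.output.1, match.output.2, match.output.3)
}
```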

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Read the proposal and the conversation about it. Some part of the earlier pitch.

Ben_Cohen · May 2, 2022, 3:56am

The idea here is that tools such as GitHub, which are general cross-language tools, might be significantly disadvantaged by hard-to-handle rules even if we think it's OK for the Swift parser and tooling to incorporate them.

Given this concern, I think it's reasonable to cite Javascript as a counter-example.

I guess we could enumerate the possibilities:

1. Highlighting Javascript regexes in these tools is not a big problem, and won't be for Swift either
2. Highlighting Javascript regexes was a massive pain, but hopefully that pain can be re-used for Swift
3. Highlighting Javascript regexes is a major issue, and should be a cautionary tale for Swift
4. Javascript is sufficiently different from Swift that it's fine for JS but not Swift

I must admit I'm making an informed guess, but I'm pretty sure the answer is 1. I'm not aware of 3 being the case, and playing around with GitHub and Textmate suggests their highlighting of JS regexes is pretty robust. For 4, Swift and JS certainly differ, but I'm not sure any of those differences are material here. JS's rules around semicolon insertion versus Swift's multi-line statements might be one.

YOCKOW · May 2, 2022, 4:11am

JavaScript has no way to define custom operators, so it cannot serve as a rebuttal. In fact, a number of projects will be broken by bare /.../, as already mentioned in this thread and the pitch thread.

Jon_Shier · May 2, 2022, 4:42am

Like I said, it was half a joke, and I didn't make the original point, but I think the situation is more 4 than you may like. Swift has always had trouble with syntax highlighting by tools which can't actually parse the language. This is especially true of the relatively simple grammars supported by tools like GitHub (which I believe are less powerful than TextMate grammars, which most tools now avoid due to complexity). I don't know how JavaScript defines the edge cases around parsing the regex literals and other uses of /, but we have Swift's just staring us in the face right here. Perhaps we can just avoid the issue altogether?

Ben_Cohen · May 2, 2022, 4:47am

It is definitely true that Swift is a pain to parse. The question at hand, though, is not about that, but whether this proposal makes it materially harder and, in this particular sub-thread, whether Swift's differences from JS make parsing /-delimited regexes harder than in JS, where this appears to be a solved problem.

Jon_Shier · May 2, 2022, 4:51am

By the proposal's own content it makes it "materially harder" to parse Swift by introducing various edge cases required for accurate parsing. Whether that additional difficulty is acceptable to the community is, I believe, at least part of the point of this review.

Ben_Cohen · May 2, 2022, 5:24am

The word "materially" has meaning here: it means the rules add significant complexity that makes it meaningfully harder. It doesn't just mean "there are some extra rules".

While the proposal goes into depth about the parsing consequences, the actual rules it describes are fairly straightforward. Implementing new delimited regex syntax in a Swift syntax highlighter is not going to be made significantly more challenging through rules such as "// is always a comment, not an empty regex" or "regex literals cannot start with a space or )".
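To make the quoted rules concrete, here is a deliberately simplified sketch of the check a cross-language highlighter might run when it meets a /. Only the comment rule and the "cannot start with a space or )" rule come from this thread; the previous-character rule is an assumption borrowed from the usual JavaScript-highlighter heuristic, and the function name is hypothetical. It is not the Swift compiler's lexer, which also handles extended delimiters, escapes and more.

```swift
/// Hypothetical, simplified check: does the "/" at `index` open a regex literal?
/// A sketch of a highlighter heuristic, not the Swift compiler's real lexer.
func slashStartsRegexLiteral(_ source: [Character], at index: Int) -> Bool {
    guard source.indices.contains(index), source[index] == "/" else { return false }

    // A "/" at the very end of the input cannot open a literal.
    guard index + 1 < source.count else { return false }
    let next = source[index + 1]

    // Rule from the thread: "//" is always a comment, never an empty regex.
    // Treating "/*" as a comment opener is added here for completeness.
    if next == "/" || next == "*" { return false }

    // Rule from the thread: a regex literal cannot start with a space or ")".
    // Tabs and newlines are treated like spaces in this sketch.
    if [" ", "\t", "\n", ")"].contains(next) { return false }

    // Assumed rule: skip back over spaces and tabs; if the previous character
    // ends an operand (identifier, number, closing bracket, string), "/" is division.
    var i = index - 1
    while i >= 0, [" ", "\t"].contains(source[i]) { i -= 1 }
    guard i >= 0 else { return true }
    let previous = source[i]
    if previous.isLetter || previous.isNumber || previous == "_" { return false }
    return !")]}\"".contains(previous)
}
```

A real highlighter would still need the proposal's remaining rules; the point is only that rules like these reduce to a few local character checks.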

Now, you might take the position "ok, not so hard, but it's harder than if we repurposed single quotes". I think that would be a rather strange position to take: making a decision about the Swift language based not on its impact on Swift users, but rather on avoiding minor inconvenience to tooling developers. I don't think the core team would weigh that as meaningful input from the review. It's certainly not a consideration we've applied before (any contextual keyword added is a pain for syntax highlighting).

So it really comes down to whether Swift regexes would be significantly harder to parse than the very similar situation with Javascript regexes. For that to be taken as a serious negative to consider, someone would need to demonstrate that they were.

Jon_Shier · May 2, 2022, 5:59am

Every time features are discarded because they’re too difficult to implement or parse, Swift implicitly assists tool makers by keeping language complexity under control. But perhaps third party tooling should be a regular consideration. Swift can’t succeed if it can only be parsed by Apple’s tools. The rest of us need those third party tools.

In any case, the point wasn’t originally mine.