SE-0354: Regex Literals

As a really minor side note, one area where #/.../# is slightly superior is user experience. When writing #/, an IDE can auto-insert /# after the cursor position, enclosing the literal as already done with string/dictionary/array/tuple literals and avoiding any major disruption in the semantic checking of the content that follows. With a bare /.../, there wouldn't be any indication about the role of the inserted /, since it could serve as an operator, the beginning of a comment (either /* or //) or a regex literal.
As I said, a really minor side effect.

3 Likes

This source break removes a capability that is currently used by third parties. Do any of the source breaks have this property?

3 Likes

Several of them mean you have to go to an alternative, more heavyweight syntax to do what you're doing today, sure. Assuming that all we'll need to do is back-tick prefix operator / to resolve the conflict, the introduction of await in Swift 5.5 is quite close in spirit: if you had code like await(f()) before Swift 5.5, it had to get back-ticks around await to compile in 5.5.
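To make the await comparison concrete, here is a minimal sketch. The function named `await` and the helper f are hypothetical, invented only for illustration; the point is that an identifier colliding with the keyword stays usable once it is wrapped in backticks.

```swift
// Hypothetical helper that happened to be named `await` before Swift 5.5.
// From Swift 5.5 on, the name has to be escaped with backticks.
func `await`<T>(_ value: T) -> T {
    value
}

// Hypothetical function whose result is passed through the helper.
func f() -> Int {
    42
}

// Before Swift 5.5 this call could be spelled await(f()).
let result = `await`(f())
```

The migration is local to each use site, which is the property the comparison with a back-ticked prefix / relies on.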

Doug

3 Likes

IMO the relative controversy over the bare regex syntax indicates that there's a large portion of the community that does not view this source break as 'worth it' compared to the break for, say, any P. I don't think opponents of the 'bare' regex syntax should need to argue that the source break here is somehow worse than other source breaks, except to say that the benefit of the source break is not outweighed by the costs.

Given that the argued benefit is based on an "aesthetic preference" it feels a bit dismissive to me to say that you've seen no "objective" argument that this source break is worse than other accepted breaks. Indeed, the exact same objective data about the size of this break could support the conclusion that this break is 'better' or 'worse' overall entirely based on the (subjective) perceived benefit.

And further, the fact that we have accepted other, large source breaks for Swift 6 is to me a good reason to be more skeptical of additional source breaks, not less. I don't think we should too quickly discount the additional marginal cost of more migrations to verify, more libraries to update, etc., just because we already have larger source breaks planned.

9 Likes

/ will no longer be usable as a prefix operator. There is no workaround for that being proposed.

1 Like

FYI based on this being suggested upthread, the proposal authors are exploring using backticks to avoid requiring removal of operator prefix / – putting it on a similar footing to await. It's possible this could be combined with a heuristic that only two slashes on a line would trigger requiring backticks.

9 Likes

I'd like to think that the arguments I stated at the end of my post are about precedent with regex literals in other languages and consistency with other literals in Swift.

Additionally, there is an objective measure of source breakage, and that's how much code will be affected by the change. We know, objectively, that the scale of source code breakage from this proposal is far less than any P. A tiny fraction of projects will be affected by this proposal (even fewer if this pans out) vs. nearly 100% of projects for any P and Sendable. We're not even in the same ballpark here w.r.t. source breakage, and we shouldn't pretend we are.

Perhaps I should make my point differently: there has been a lot of discussion here about source breakage, and I find the amount of concern expressed is completely out of proportion to the actual demonstrated source breakage from the change.

Yes, it's fine to subjectively say that this amount of source break isn't worth it in your opinion, but evaluating that argument means being realistic about both (1) the actual cost of the source break, and (2) the downsides from adopting an alternative syntax like #/.../#. The more data we get, (1) seems smaller than first anticipated, and the more I think about the relationship of regex literals to other parts of the language, the more (2) seems to grow.

The problem with this line of argument is that I, or anyone else, can selectively wield it for any syntax I don't like, so long as it has the tiniest potential source break. And using that argument for this proposal, rather than something like any P or Sendable, would establish a baseline of unacceptable source breakage so low that essentially nothing can change from now on for Swift 6.

Doug

4 Likes

Aside from the near 100% source breaks of things like any, which are easy to see, Swift lacks the tooling necessary to objectively determine the degree of source breakage of any change, unless you limit "objective" to simply mean "relative to the size of the language". By that measure, sure, the removal of / as an operator is objectively small, since it's only one character and operator among many. But I don't think that would be a useful measure to most people.

I think most people would consider "objective" to be more meaningful as a consideration of "How many Swift projects will this break?" or "How many changes are required to fix this?" The Swift ecosystem currently has no way of measuring those, or any, impacts. We can guess, based on the relative popularity of the CasePaths library and The Swift Composable Architecture in general, that it would number from hundreds to thousands of projects, but without actual stats we can't get more precise. So I don't think you can say "the amount of concern expressed is completely out of proportion to the actual demonstrated source breakage from the change" when you can't see the demonstrated source breakage. For anyone using the mentioned libraries, this breakage will be just as bad as the breakage from any and much harder to fix. With any you can simply mass apply the fixit and be done with it. There's no workaround here (perhaps yet?).

So while this change has a relatively small impact to the language, you can't say the same thing about the Swift ecosystem itself. So it's probably a good idea to stop thinking about the impact as "objectively" small.

5 Likes

The numbers we have include "16/2968 projects in the Swift Package Index", "0 projects in the source compatibility suite" and "1 project out of all of the Swift code at Apple". In my experience, even the most minor source break we unintentionally make during the normal flow of compiler development breaks more projects than the above, so from my perspective as compiler implementer those results are very, very good.

You have claimed that this data is not representative, and there's no way to definitively counter your claim because we can't see most of the Swift source code in the world. Maybe CasePaths' 547 stars undercount its influence on the wider Swift ecosystem. The 6.1k stars for the Swift Composable Architecture might be a better indicator, or maybe not. We're guessing here, but I do want to point out a bit of old precedent: a while ago we took away $ as a bare identifier, breaking the 4.2k-starred Dollar library, because it was the right thing to do.

There is most certainly a workaround, and I'm surprised that you didn't know about it: define a single-parameter function that does what prefix / does. Maybe we call it casePathRoot, so

/Authentication.authenticated

becomes

casePathRoot(Authentication.authenticated)

If this suggestion works out for this proposal, it'll be

`/`Authentication.authenticated

So there is a workaround, and it's a fairly localized fix to uses of the prefix / operator. It is similar to any in its locality but on a demonstrably smaller scale. I think we can agree on this part?
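A minimal sketch of the named-function workaround, under stated assumptions: CasePathSketch is a hypothetical stand-in and not the real CasePaths API, casePathRoot is the name floated above, and the Authentication enum is invented for illustration.

```swift
// Hypothetical enum standing in for the one used in the example above.
enum Authentication {
    case authenticated(String)
    case unauthenticated
}

// Hypothetical, heavily simplified stand-in for a case path:
// it only remembers how to embed a value into the enum.
struct CasePathSketch<Root, Value> {
    let embed: (Value) -> Root
}

// A named single-parameter function doing the job the prefix operator did.
func casePathRoot<Root, Value>(
    _ embed: @escaping (Value) -> Root
) -> CasePathSketch<Root, Value> {
    CasePathSketch(embed: embed)
}

// The enum case constructor is passed as an ordinary function value.
let path = casePathRoot(Authentication.authenticated)
let rebuilt = path.embed("token-123")
```

Each use of the prefix operator would be rewritten in place to a call like this one, which is what makes the fix local.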

All of this is temporary, of course. Key paths absolutely should be able to refer to enum cases, and when they do, they'll be preferred to CasePaths because they can integrate better in the rest of the language. When that happens, do we then revisit the /.../ syntax or has it been forever taken by one library?

The reality is that we're both extrapolating from the data we have, because fundamentally that's all we can do when most of the Swift code in the world is closed-source. I've looked at that data and I'm comfortable with this source break: I know how narrow it is and how smoothly its rollout will go. Every hat I wear in the Swift world has a lightning rod attached to the top, so I don't take source breaks lightly.

Doug

16 Likes

What is your evaluation of the proposal?

  • ½
    While regex literals can sometimes be useful, they do not have to look exactly like in certain other languages (notably Perl) which, due to their niche, place a lot more emphasis on regexes than Swift is likely to do. Some of these languages, notably, implemented regex literals several years before implementing raw string literals, which already serve to make regex strings more workable.
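As a small illustration of the point about raw string literals, the sketch below writes the same pattern text twice, once with ordinary escaping and once as a raw string. The pattern itself is an arbitrary example.

```swift
// Ordinary string literal: every backslash in the pattern must be doubled.
let escaped = "\\d+\\.\\d+"

// Raw string literal: backslashes are taken literally, so the pattern
// reads the way it would in a regex reference.
let raw = #"\d+\.\d+"#

// Both spellings produce the same eight-character pattern text.
let samePattern = (escaped == raw)
```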

Is the problem being addressed significant enough to warrant a change to Swift?

Not if it involves adding special-cased syntax which is both source-breaking and non-extensible.

I am particularly thinking of the / … / syntax, but I feel that any kind of syntax that is tailored only for regex literals is to couple Swift too closely to a single sub-language which has no particular shared history with Swift. I know that whilst I do sometimes resort to a regular expression to solve a problem, I write a lot more raw JSON literals, whereas others may write SQL literals or HTML literals. And I think VB.NET has XML literals. If each kind of literal should have its unique literal delimiters, we would run out of character combinations.

I therefore prefer that alternatives like #Regex(…) or #re"…" be used. They are purely additive, do not add as many special cases or complexities to lexing and parsing and are more obvious at point of use.

Does this proposal fit well with the feel and direction of Swift?

No. Swift literals have so far been limited to the types in the standard library, whilst also adoptable for other types through ExpressibleBy*Literal. There is little precedent for adding support for a language-within-a-language, the closest being ResultBuilders, which are also used for regexes in this proposal as well as being applicable to many other use cases instead of just being a kind of SwiftUI literal.
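For readers unfamiliar with the ExpressibleBy*Literal mechanism mentioned here, a minimal sketch follows. The Pattern type is hypothetical and exists only to show how a user-defined type can adopt an existing literal syntax without any new delimiters.

```swift
// Hypothetical wrapper type that can be written as a string literal.
struct Pattern: ExpressibleByStringLiteral {
    let source: String

    // Called by the compiler when a string literal is used where
    // a Pattern is expected.
    init(stringLiteral value: String) {
        source = value
    }
}

// The type annotation is what turns the literal into a Pattern.
let digits: Pattern = #"\d+"#
```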

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I have used them in JavaScript, but I find regex literals there to be given an unnecessarily prominent place in the language with no hints regarding their nature. It is not easy to search for two slashes interspersed with random characters. A syntax like Regex() or something similar with normal letters would enable the newcomer to look up its meaning and usage and better fit in with Swift’s ideal of progressive disclosure.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I have read the proposal twice, as well as having read the pitch thread.

2 Likes

I've been following the review thread here so I don't have exactly the perspective you are asking for, but I'm not intimately familiar with regexes in general, and I have a fair amount of experience teaching introductory programming students, so I feel I can still give some useful insight on the corner cases here. I also work on diagnostics in the compiler, so I can give some insight into what's possible for letting programmers know what went wrong in these corner cases.

Similar to the whitespace rules in this proposal, the compiler also has whitespace rules for operators, and I think those whitespace rules save us from these corner cases when operators are directly applied to arguments. I could be wrong about that, but I've found it difficult to cook up an applied operator expression that compiles today, and does not compile under SE-0354, and even if we found one, you'd have to write the code in an extremely specific way.

That leaves unapplied operator references to / as the only problematic case. Now, unapplied operator references are already problematic, because they must have a concrete contextual type in order to compile in the first place. That means it must be an argument to a function, an assignment to a local variable that already has a type, etc. Otherwise, overload resolution would not be able to disambiguate between all of the global overloads of /. This context is extremely helpful, because it means the error is nearly always going to manifest as a type mismatch between the Regex type and the operator function type, which means it's fairly easily detectable in the diagnostics code to produce a tailored message and a fix-it to, e.g., surround the operator reference in back ticks.
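The role of the contextual type can be sketched as follows. The names divide and quotient are illustrative only; the parenthesized spelling (/) is used for the unapplied reference.

```swift
// With an explicit function type, overload resolution can pick the
// Double overload of the operator.
let divide: (Double, Double) -> Double = (/)

// Without that annotation, a declaration such as
//     let ambiguous = (/)
// has no contextual type, and the post above notes that overload
// resolution cannot choose among the global overloads of the operator.

// The typed reference can then be passed around like any other function.
let quotient = [2.0, 4.0].reduce(64.0, divide)
```

Because the expected type is always known at such a use site, a mismatch between a regex type and a function type is the kind of error a tailored diagnostic could detect.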

Now, to answer your question of "What do developers expect let y = foo(a, /, b).reduce(1, /) to do? How confident are they?" more directly. If we accept SE-0354 and you're asking somebody "What do you expect let y = foo(a, /, b).reduce(1, /) to do?", you're clearly asking a trivia question, because if someone attempts to write this in their project, they'll find out what it does when they get a (hopefully actionable after a bit of tailoring!) message telling them what went wrong and how to fix it. Trivia like this is extremely easy to cook up for any programming language given a combination of features. For example, I could ask all of you right now, what do you expect to happen given this code?

func test(_ arg1: String = "", _ arg2: Int = 1) {}

test(10)

or

func test(x: Double) -> Double {
  x/2/.pi
}

and many programmers would get the answers wrong, but it doesn't matter, because programmers rarely hit this behavior in practice, and in the rare case when somebody does, they find out what happens immediately with the error message produced. This is how people learn! People don't randomly encounter code that doesn't compile, and as long as the error message is actionable -- which should be fairly straightforward to achieve in all of the corner cases that I've seen in this thread, since they're all just argument-to-parameter type mismatches -- I don't think there's significant confusion or astonishment being introduced here.

I'd also like to note that, if we were to only parse #/.../# and not the bare /.../, attempting to write a bare /.../, which programmers might reasonably try given the parallel with extended string literals, would result in a pile of unhelpful parser diagnostics. The best way to guide programmers away from invalid syntax is to actually make the compiler understand the syntax anyway for the purpose of detecting it and diagnosing it as invalid, so we'd likely want to have that sort of parser support in place even if we were to choose only supporting the extended syntax.

18 Likes

If we can add support for backtick-quoted operators then I'd be ok with bare /.../ syntax and I'd be fully onboard with the proposal as written. It still wouldn't be my first choice of syntax but it would be good enough.

1 Like

Another minor point regarding this example:

foo in this case is presumably something like "zip then map" i.e. zip(a,b).map(/). In Swift, though, you would not write that function signature with the operation in the middle (despite making the operation "infix" between the two sequences having a certain appeal), because trailing closures encourage the function argument to always be the last argument. And when that is the case, you get foo(a, b, /) which fixes the parse.
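A minimal sketch of the "zip then map" reading, with the operation as the last argument. The signature of foo is an assumption, since the thread never defines it.

```swift
// Hypothetical definition of foo: combine two sequences element-wise.
// The operation comes last, as trailing-closure style encourages.
func foo(
    _ a: [Double],
    _ b: [Double],
    _ op: (Double, Double) -> Double
) -> [Double] {
    zip(a, b).map { pair in op(pair.0, pair.1) }
}

let a = [8.0, 9.0]
let b = [2.0, 3.0]

// Only one slash appears on this line, so there is nothing for it
// to pair with as a regex delimiter.
let quotients = foo(a, b, /)
```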

Yes, it is still possible, perhaps including functions that will take two unapplied instances of / – but it becomes increasingly less likely, and this seems to reinforce the reason why source compatibility testing has shown parsing ambiguity to be effectively a non-issue for this proposal (as distinct from the impact of eliminating operator prefix /, which is certainly not a non-issue, whether or not you believe it's acceptable impact).

3 Likes

My fundamental issue is that "objective" keeps being bandied about in this thread as if, due to some equation, this source breakage is okay when others aren't. As you've clearly stated, that's not how it works. Instead, we all balance data, experience, and preference, and different people reach different conclusions about acceptability. So repeatedly pointing to the very limited data we have to support this breakage, when we know the actual impact will be some magnitude greater than what we see (libraries using / X libraries using that as a dependency X number of users of that dependency X number of uses by those users) is, at best, myopic. My replies have attempted to point out that relying solely on the available data is very limiting, nothing more.

Now, there is a lot Swift and Apple could be doing to gather more data about things like this, but that's a discussion for another thread.

3 Likes

It's interesting to note that inside this regular expression you encounter a closing parenthesis first, which should be a parse error when parsing the regex. Presumably, the Swift parser could treat this situation as not-a-regex (just like it could treat a single / on a line as not-a-regex) and parse it as normal Swift code.

This is a good point! But also worth noting, I think, that this complexity would not necessarily extend to tools which can mostly assume correct code/syntax, and/or which don’t really care about producing diagnostics. For instance, a source hosting/viewing tool like GitHub could probably get away with a less-complex syntax highlighting algorithm were /.../ not considered valid syntax, since users are (typically) not editing code there. So I think we should not totally discount the wins from keeping parsing rules simple just because we may want to parse some invalid syntax for the purposes of diagnostics.

4 Likes

I don't really have a horse in this race, but I've been wondering over the past few days whether there might not be a different way of "assembling" all of the constituent parts that might satisfy most of the objections on both sides.

To construct a regex out of a string, you use syntax like this:

    let regex = Regex("…guts…") // <-- says "Regex"

To construct a regex in a builder, you use syntax like this:

    let regex = Regex { // <-- says "Regex"
        … builderGuts …
    }

and you can construct a builder regex (using proposed syntax) like this:

    let regex = Regex { /…guts…/ } // <-- says "Regex"  **(a)**

IIUC, the last construct is functionally equivalent to:

    let regex = /…guts…/ // <--- doesn't say "Regex"  **(b)**
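For comparison, here is a sketch of these forms using the Regex API as it later shipped, assuming Swift 5.7 or newer (and a recent enough OS on Apple platforms). Two details differ from the shorthand above: the string initializer throws, so it needs try, and the extended #/.../# delimiters are used because the bare form needs Swift 6 mode or an opt-in flag. The digit pattern is an arbitrary example.

```swift
import RegexBuilder

// From a string: the pattern is parsed at run time, so the initializer
// can throw. try! is acceptable here only because the pattern is fixed.
let fromString = try! Regex("[0-9]+")

// From a builder.
let fromBuilder = Regex {
    OneOrMore(.digit)
}

// From a literal, using the extended delimiters.
let fromLiteral = #/[0-9]+/#

// A literal nested inside a builder, corresponding to form (a).
let literalInBuilder = Regex {
    #/[0-9]+/#
}

let input = "order 123"
let literalMatch = input.firstMatch(of: fromLiteral)?.output
let nestedMatch = input.firstMatch(of: literalInBuilder)?.output
```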

Now, a number of people want the literal regex syntax to say that it's a regex, so that the meaning is clear regardless of how messy the literal guts are, or what delimiter is chosen.

It seems to me that it might be feasible to use (a) as the real regex literal syntax, and dropping (b) from the proposal.

What I have in mind is this:

  1. Delimited regex literals would only be used inside a Regex { … } construct.
  2. Inside that construct, the proposed "new" rules for lexical interpretation of the / symbol would apply.
  3. Outside that construct, the existing rules for / would apply.
  4. This is 100% not-source-breaking, because the Regex { … } construct does not currently exist.
  5. Compiler Magic™ would allow the Regex type name to be recognized lexically as something sorta like the #regex symbol that some people have proposed.
  6. If what's inside the braces is just a /-delimited regex literal, the whole expression is just a regex literal.
  7. If what's inside the braces contains something other than (or as well as) a /-delimited literal, then it's a regex builder, not a regex literal. Multiple /-delimited literals would be allowed inside a builder without needing to nest them with additional Regex {…} syntax, so using lots of itty-bitty literals this way is no harder than originally proposed.

IOW, I'm suggesting something like #regex(…) literal syntax, but actually spelling it Regex {/…/}. Then, the interior of regex builders and literals would be the only lexical contexts where / is interpreted in a new way (for certain edge cases spelled out in the proposal).

The downsides here are:

  • The spelling of a top-level literal regex has a few more characters. However, this can be seen as an advantage — every regex is introduced by the exact same Regex symbol without exception. :confetti_ball:

  • The rules for / are different in different places. However, the members of the core team who've weighed in here have consistently stated that they don't expect the new rules to confuse developers. :sunglasses:

5 Likes

Xcode already has special visual representation of literals like #imageLiteral(…)s so for folks who are using Xcode the same treatment could be applied to extended regex literals so they can be visually appealing when single line.

In most platforms folks can adopt code ligature fonts to mitigate unwanted noise if they choose.

For me /../ is such a foreign concept. It’s as if somebody told me that they wanted to use percentage signs %..% because historically this is the way it’s been done.

4 Likes

Have I got this right? We're suggesting hundreds, perhaps thousands of lines of existing code out there in the community have to move to this unattractive syntax so we can roll out bare /regex/ syntax on "aesthetic grounds" :upside_down_face:. If we are to deprecate on TCA, let's go the extra mile to a better end point where the syntax we're suggesting people have to take the trouble to move to could at least be \Authentication.authenticated, which also solves the problem, rather than this halfway house.

All this so the very, very, very small number of people who will use the proposed syntax over the DSL version don't have to hold their nose and type a couple of extra #. What kind of preemptive "active harm" justification is this?

I believe if we are to pursue the bare regex syntax as a destination it will be a multi-year process that needs to be planned out thoughtfully rather than conspicuous specific source breaks of key open source projects not even being mentioned in the proposal which I find "odd".

If you can't see why this is receiving so much attention take time out of your day to watch the 5 free videos on the TCA homepage. It will enrich your programming life.

2 Likes

I have not used Case Paths yet, but I'm pretty sure that trying them is adopting them.

Case Paths are a great example of community-driven evolution. They provide a great service for building the kind of software that Swift is used for today.

And that's where I don't get why we're debating so much about regex literals. Is Swift about to become a fashionable alternative to sed, ruby, awk, or Perl? I'm quite versed in regex, and yet in years of Swift development, I think I can count on the fingers of one hand the number of times I needed some.

Why this fuss about regex literals??

The Core Team is again posing as a hostile group who could really improve its empathy skills, and pay due credit to the community. I just don't get what the benefits are. I'm desperately searching for enthusiasm, pride, and care.

12 Likes