SE-0354: Regex Literals

technogen · May 12, 2022, 7:42pm

I agree with the necessity to provide a terse alternative to the DSL. I'm only arguing that if the regex is terse enough to write in regex syntax format, it must be simple enough that a statically typed capture list would be of negligible use, compared to a more complex regex syntax. But a more complex regex syntax, as you've mentioned, is not going to be a good candidate for the terse syntax.

My argument is that those examples, where a simple one-liner would be better, would be just as readable with the Regex("...") syntax.
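For readers comparing the two spellings discussed here, the following is a minimal sketch, assuming Swift 5.7 or later and a platform that ships the string-processing runtime. The pattern, the input, and the helper name `parseRange` are invented for illustration. It shows the same pattern written as a literal (here with the extended `#/.../#` delimiters) and as a string passed to `Regex`:

```swift
// Hypothetical helper, for illustration only.
func parseRange(_ input: String) throws -> (literal: (Substring, Substring)?, dynamic: (Substring?, Substring?)?) {
    // Literal form: parsed at compile time, so the captures are statically typed
    // as (Substring, Substring, Substring).
    let literalRegex = #/(\d+)-(\d+)/#
    let typed = input.wholeMatch(of: literalRegex).map { ($0.output.1, $0.output.2) }

    // String form: parsed at run time, so the initializer can throw and the
    // output is the type-erased AnyRegexOutput, indexed by position.
    let dynamicRegex = try Regex(#"(\d+)-(\d+)"#)
    let erased = input.wholeMatch(of: dynamicRegex).map { ($0.output[1].substring, $0.output[2].substring) }

    return (typed, erased)
}
```

Both forms match the same text; the difference is when the pattern is validated and how the captures are typed.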

Nobody1707 · May 12, 2022, 8:05pm

That changes my opinion to a solid +1.

xwu · May 12, 2022, 8:54pm

hamishknight:

After exploring things further, we have come up with a revised parsing behavior that does not require prefix operators containing / to be banned, and fixes all the case path compatibility issues we have previously seen.

The changes are twofold:

When encountering a prefix operator containing / , we will not parse a regex literal if there is no closing / delimiter on the same line. This is the same behavior as with unapplied infix operators.

The ) heuristic has been expanded such that the entire range of the regex literal is scanned for an unbalanced ) . If such a case is encountered, we will not parse a regex literal.

Together, these changes mean that many uses of prefix / will be unaffected by the introduction of regex literals. It also means that ambiguities can be readily disambiguated with parentheses; for example, foo(/x, y / z) can be disambiguated as foo((/x), y / z) due to the expanded ) heuristic.
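The disambiguation in the quoted example can be sketched as follows. The prefix operator, its behavior (negation), and the function `foo` are all hypothetical, chosen only so that the snippet is self-contained:

```swift
// Hypothetical prefix operator, declared only for this illustration.
prefix operator /

// Arbitrary behavior for the sketch: negate the operand.
prefix func / (value: Int) -> Int {
    -value
}

// Hypothetical function taking two arguments.
func foo(_ a: Int, _ b: Int) -> Int {
    a + b
}

let x = 3, y = 8, z = 2

// Wrapping the prefix application in parentheses puts an unbalanced `)` before
// the next `/`, so the span between the two slashes is not taken as a regex literal.
let disambiguated = foo((/x), y / z)
```

Here `disambiguated` is computed as `foo(-3, 4)`.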

This is great and very encouraging!

It’d be really nice to be able to re-review now with the whole picture laid out containing all of these revisions, with a focus perhaps specifically on iteratively refining the rules to make them as simple-to-understand yet as useful (both for regexes and in preserving existing uses) as possible.

michelf · May 12, 2022, 9:42pm

With the change around parentheses, yet another way to disambiguate prefix / becomes:

(/)(x)

where (/) transforms the operator to an unapplied function, and (x) calls that function. This is also backward compatible and should be applicable to even the most contrived examples. But it should very rarely be needed, I assume.
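A minimal sketch of that form, assuming a hypothetical prefix operator on Int whose behavior (multiplying by ten) is invented for the example:

```swift
// Hypothetical prefix operator, for illustration only.
prefix operator /

// Arbitrary behavior for the sketch: scale the operand by ten.
prefix func / (value: Int) -> Int {
    value * 10
}

let operand = 4

// `(/)` names the operator as an unapplied function; with a single Int
// argument, overload resolution picks the prefix function declared above.
let applied = (/)(operand)
```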

hooman · May 12, 2022, 9:54pm

This is a huge improvement, thanks. I agree with @xwu that we had better do a focused re-review.

Just to provide a tiny data point: I do have a few regexes (single digit, not sure how to count as some are related) that have unbalanced ) in them.

EDIT: Let me clarify that none of those are in Swift code and only a couple might end up in Swift code in my case.

xwu · May 12, 2022, 10:22pm

Unbalanced unescaped closing parenthesis? The linked PR clarifies that escapes and custom character classes are taken into account.

hooman · May 12, 2022, 10:40pm

How would an unbalanced, unescaped ) appear in a regex? I thought it implies any unbalanced ), which would include escaped.

xwu · May 13, 2022, 12:24am

Exactly!

How would unbalanced \) appear outside of a regex or string literal?
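To make the distinction in this exchange concrete, here is a deliberately simplified sketch of a scan for an unbalanced, unescaped `)` that skips escapes and custom character classes. It is an illustration under stated assumptions, not the compiler's actual lexer logic, and the function name `containsUnbalancedCloseParen` is hypothetical:

```swift
/// Returns true if `candidate` contains a `)` that is neither escaped, inside a
/// custom character class `[...]`, nor matched by an earlier unescaped `(`.
/// Simplified sketch only; the real rule lives in the compiler's lexer.
func containsUnbalancedCloseParen(_ candidate: String) -> Bool {
    var groupDepth = 0   // unescaped `(` currently open
    var classDepth = 0   // nesting depth of `[...]`
    var escaped = false  // true when the previous character was an unescaped `\`

    for character in candidate {
        if escaped {
            // Whatever follows a backslash is taken literally.
            escaped = false
            continue
        }
        switch character {
        case "\\":
            escaped = true
        case "[":
            classDepth += 1
        case "]" where classDepth > 0:
            classDepth -= 1
        case "(" where classDepth == 0:
            groupDepth += 1
        case ")" where classDepth == 0:
            if groupDepth == 0 { return true }
            groupDepth -= 1
        default:
            break
        }
    }
    return false
}
```

Under this sketch, `#"\d+\)"#` and `"[)]"` are not flagged, while the text between the slashes in `foo((/x), y / z)`, namely `"x), y "`, is.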

tim1724 · May 13, 2022, 1:41am

This is fantastic! Thank you for putting in so much effort to prevent breaking the use of / in custom operators.

Avi · May 13, 2022, 3:52am

That's not the DSL. You're changing your argument.

technogen · May 13, 2022, 3:57am

My argument has been consistent across my messages. I argued that regex literals aren't worth their complexity, given a combination of a DSL and a string-based Regex parser.

I would like to hear constructive counter-arguments, please.

Avi · May 13, 2022, 4:03am

You wrote in your initial post:

That is not an argument that Regex plus result builders will cover uses of regex literals. That's an argument that not even Regex should be used for regex literals.

If regex literals are to be (type) checked by the compiler, it makes more sense to define a literal type than it does to embed special knowledge of a standard library type and one of its constructors (Regex and its string-literal initializer).

At the end of the day, though, whether regex literals carry their own weight is an opinion. You've stated your opinion. So far, no one agrees with you. There's no point to a drawn-out debate because we already know where the Core Team stands.

technogen · May 13, 2022, 4:18am

Please refer to this post: SE-0354: Regex Literals - #158 by Ben_Cohen

Making baseless assumptions about the popularity of the opinions of others and resorting to aggression due to somebody not agreeing with you is hardly going to help validate the point you're making.

In contrast to that, @Ben_Cohen has taken time to give a well-presented argument why getting rid of a terse option would be a bad idea, which I agreed with and continued the discussion by outlining another solution to the problem of not having a terse alternative. I'm not presuming that answering my post is on his list of priorities, so I'm not taking the silence as a "yes" or "no".

Changing your opinion during a constructive discussion is a side effect of rational argumentation. Getting aggressive and trying to undermine opposing opinions by insulting people is destructive to a healthy community.

Paul_Cantrell · May 13, 2022, 5:32am

hamishknight:

However even with that rule in place, we'd have to contend with cases such as:
///
/// Some interesting function
///
func foo() {}

///
/// Another interesting function
///
func bar() {}

Could /// only form a regex when it occurs (1) inside a function body or (2) in expression position? Doc comments would not normally occur in either of those positions, I think.

I sense impending doom for the dream of making /// parallel """, but I am a foolishly optimistic person by nature and have to ask.

johnno1962 · May 13, 2022, 5:36am

I wouldn't even try to support ///. When I talk about unifying string and regex I'm not chasing the bare syntax rainbow. Regex literals would be raw strings only (prefixed by #), in their various forms, in line with the fact that they do not expand \ escapes or interpolations.

hamishknight · May 13, 2022, 10:24am

Yup, I have edited my post to clarify this.

iressler · May 13, 2022, 2:15pm

This changes my review of bare /.../ to +1, since this removes (imo) the biggest costs associated with it, and I think the remaining costs are worth it.

I agree that this may benefit from some form of a new review; with so much of this thread focused on bare /.../ most other discussions get lost. But I don't have anything to add myself so I don't have a strong opinion.

masters3d · May 13, 2022, 2:34pm

Embedding in the DSL is the only use case that makes sense to me, because there the bare regex literal occupies a whole line. We already elide commas in DSLs, so why not also elide #'s? It's contextual, easy to explain, and web searchable. Bare literals everywhere don't have a good rationale for me.
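For context, embedding a literal in the builder DSL looks roughly like this. It is a minimal sketch assuming Swift 5.7 or later with the RegexBuilder module available; the pattern and the name `idRegex` are made up for illustration, and the literal uses the extended `#/.../#` delimiters:

```swift
import RegexBuilder

// Hypothetical pattern: the text "id:" followed by captured digits.
let idRegex = Regex {
    "id:"
    Capture {
        // A regex literal embedded as one component of the builder.
        #/\d+/#
    }
}

let idMatch = "id:42".wholeMatch(of: idRegex)
```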

hamishknight · May 13, 2022, 3:31pm

I would not be surprised if there are cases of people using /// instead of regular comments inside function bodies. That being said, at least those cases would be straightforward to fix. The main issue with restricting this to only within function bodies is that multi-line regex literals would then be unusable (or require additional #s) at the top level of a main.swift file or playground. And IMO it doesn't seem unreasonable to want to write documentation comments in those cases.

I don't think it's impossible to get /// working in most cases; we might be able to leverage the parser's isStartOfSwiftDecl heuristic to exclude most/all of the cases where you have a documentation comment. You then wouldn't be able to start a multi-line regex with a decl introducer keyword like var, func, or class. However it would add even more complexity to source tooling such as syntax highlighting. Additionally, it's possible the editor may no longer be able to automatically insert /// to continue a documentation comment in case you are trying to write a regex.

Now, this might be a worthwhile tradeoff if it's felt that the delimiter is worth it, but I'm not entirely sure whether it is. While it does nicely mirror """, it doesn't have the same semantics due to whitespace being non-semantic. In that regard, it could be argued that it's more surprising than multi-line #/. It additionally doesn't have the same term-of-art or conciseness benefits that /.../ has.
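For reference, a multi-line literal with the #/ delimiter looks roughly like the sketch below, assuming Swift 5.7 or later. The pattern and the name `keyValue` are invented for illustration. Whitespace inside the literal is non-semantic, which is the difference from """ noted above:

```swift
// Multi-line literal: whitespace in the pattern is ignored and
// `#` starts an end-of-line comment.
let keyValue = #/
  (\w+)     # key
  \s*=\s*   # separator with optional spaces
  (\d+)     # value
/#

let keyValueMatch = "width = 42".wholeMatch(of: keyValue)
```

The spaces and line breaks between the pattern pieces do not match anything; only the explicit `\s*` does.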

Ben_Cohen · May 16, 2022, 3:22pm

13 posts were merged into an existing topic: SE-0354 (Second Review): Regex Literals