SE-0355: Regex Syntax and Runtime Construction

Michael_Ilseman · May 11, 2022, 4:55pm

You can work around it, poorly, by making sure prefix doesn't contain the subsequence \E and wrapping it in a \Q...\E.

I think the better general solution is to support regex interpolations, which is future work.
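
For readers who want to see the \Q...\E workaround concretely, here is a minimal sketch. The helper name makeQuotedPrefixRegex and the error type are hypothetical, made up for illustration; the sketch assumes the runtime Regex(_:) initializer accepts quoted sequences as described in the proposal.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum QuotedPrefixError: Error {
    case containsQuoteTerminator
}

// Hypothetical helper: matches `prefix` literally, then captures the rest.
func makeQuotedPrefixRegex(_ prefix: String) throws -> Regex<AnyRegexOutput> {
    // \E would end the quoted run early, so reject any prefix containing it.
    guard !prefix.contains("\\E") else {
        throw QuotedPrefixError.containsQuoteTerminator
    }
    // Everything between \Q and \E is taken verbatim, metacharacters included.
    return try Regex("\\Q" + prefix + "\\E(.*)")
}

let quoted = try makeQuotedPrefixRegex("a.b")
let hit = "a.bcd".wholeMatch(of: quoted)
let suffix = hit?.output[1].substring      // text after the literal prefix
let miss = "axbcd".wholeMatch(of: quoted)  // "." is not a wildcard here
```

The guard is what makes this a poor workaround: a prefix that legitimately contains \E has to be rejected or split up by hand.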

Michael_Ilseman · May 11, 2022, 5:19pm

My understanding (please correct me if I'm wrong, @rxwei @nnnnnnnn) is that with the soon-to-be-revised DSL's mapOutput, you could write:

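import RegexBuilder

// Matches `prefixStr` verbatim, then captures the rest of the input as `suffix`.
func buildPrefixExpression(_ prefixStr: String) throws -> Regex<(Substring, suffix: Substring)> {
    Regex {
        prefixStr            // a String component is matched literally
        Capture { /.*/ }
    }.mapOutput {
        ($0, suffix: $1)     // relabel the capture in the output tuple
    }
}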

nnnnnnnn · May 11, 2022, 6:55pm

That's right — this kind of control over composition is one of the primary motivations for creating the RegexBuilder approach to building regexes.
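
As a rough illustration of that composition point, the sketch below embeds a runtime string in a RegexBuilder regex without mapOutput. The function name prefixRegex is hypothetical; the point is only that a String used as a builder component is matched verbatim, so no escaping is needed.

```swift
import RegexBuilder

// Hypothetical helper: the prefix is matched as literal text, the rest is captured.
func prefixRegex(_ prefix: String) -> Regex<(Substring, Substring)> {
    Regex {
        prefix                        // no metacharacter interpretation
        Capture { ZeroOrMore(.any) }  // everything after the prefix
    }
}

let dotted = prefixRegex("a.b")
let rest = "a.bcd".wholeMatch(of: dotted)?.1       // capture after "a.b"
let noMatch = "axbcd".wholeMatch(of: dotted)       // "." must match a literal dot
```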

christopherweems · May 11, 2022, 7:55pm

That's fantastic!

.mapOutput(..) will be a heavy hitter for loads of regex code for sure.

hooman · May 12, 2022, 2:41pm

I fully support the work being done here, and it looks really good. But I can't vote on it. I can't provide valid, detailed feedback on the proposal because I have limited exposure to, and few actual use cases for, many of the advanced and somewhat problematic corners of regex syntax, the unification effort, and full Unicode adoption. A huge amount of work has been done, but I am afraid it might be too soon to commit to this at the standard library level and make it subject to source-break rules.

I didn't get a chance to read the responses, so please accept my apology if this question is a duplicate:

If this proposal is accepted and released, are we going to be locked out of breaking changes to the syntax until Swift 7? Strings with this literal syntax might be stored externally. If we do make a breaking change, will the compiler and Xcode be able to help migrate the existing strings, especially if we create the string at runtime from literal string fragments plus dynamic runtime information (such as a user-provided word to match)?

scanon · May 12, 2022, 3:34pm

No. There are several mechanisms available that could assist us in doing a migration if we had to (though I don't think that we will). The first one that came to mind is that rather than migrate existing strings, we would continue to support the existing syntax via a labeled Regex(swift5_7syntax: pattern) or similar, and migrate existing unlabeled inits to that via tooling. I can think of a few other ways to address it as well, so I do not believe we would have painted ourselves into a corner.

hooman · May 12, 2022, 3:42pm

Good to hear. How about the ABI?

scanon · May 12, 2022, 5:05pm

We'd be able to do a similar thing at the ABI level so that already-compiled code continued to see the same behavior.

hooman · May 12, 2022, 5:09pm

Great. In that case I am fully +1 on this.

wowbagger · June 4, 2022, 7:01am

Apologies for another extremely late review.

My primary concern is the proposal's adherence to the group numbering convention.

I'm not aware of the historical reasons for this convention, but I suspect it was good enough for people back then without overly complicating regex parsing, perhaps out of concern for the technical constraints of the time. I don't know; I'm only speculating.

Regardless of what the reasons might have been, I don't think it's good to stay with this convention for nested groups. Most human eyes and brains are not good at counting (this is why we have things like rainbow parentheses), which makes the numbering an error-prone area. Especially with Substring-only captures, it can be difficult to spot wrong numbering until the program is run. Additionally, with this linear numbering, editing a group will very likely mean editing many unrelated match calls. These match calls could be very far away from the regex pattern definition, maybe even in different projects, and thus very difficult to keep track of and update when the numbering changes. This seems to run contrary to Swift's stance on good local reasoning.
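
A small sketch of the renumbering concern, using made-up patterns and inputs: groups are numbered by the position of their opening parenthesis, so adding one group at the front shifts every later index, nested or not.

```swift
// Original pattern: group 1 is the whole range, groups 2 and 3 are nested in it.
let original = #/((\d+)-(\d+)) (\w+)/#

// Edited pattern: one new leading group, so every existing index moves by one.
let edited = #/(\w+): ((\d+)-(\d+)) (\w+)/#

let before = "10-20 units".wholeMatch(of: original)
let after = "range: 10-20 units".wholeMatch(of: edited)

let lowBefore = before?.2   // lower bound is capture 2 in the original
let lowAfter = after?.3     // the same group is capture 3 after the edit
let staleIndex = after?.2   // old call sites using .2 now get the whole range
```

Because every capture here is a Substring, a call site that still reads .2 after the edit keeps compiling and only misbehaves at run time.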

Perhaps nested numbering via nested tuples is a better solution for nested groups?

ben-cohen · July 26, 2022, 6:52pm

Review Conclusion

The proposal has been accepted.