SE-0350 (second review): Regex Type and Overview

Hello, Swift community.

The second review of SE-0350: Regex Type and Overview begins now and runs through April 28, 2022.

The second revision makes some adjustments to naming:

  • Matching functions now follow the pattern Regex.wholeMatch(in:)
  • Regex.init(_: String) has dropped its compiling: label
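As a rough illustration of the two naming changes above, here is a minimal sketch, assuming a Swift 5.7 or later toolchain (on Apple platforms, an OS release that ships the Regex type). The helper name `isAllDigits` and the pattern are hypothetical and chosen only for the example.

```swift
// Sketch only: `isAllDigits` is a made-up helper, not part of the proposal.
func isAllDigits(_ input: String) -> Bool {
    // Run-time construction from a String, with no `compiling:` label.
    guard let regex = try? Regex("[0-9]+") else { return false }
    // Anchored matching uses the `wholeMatch(in:)` spelling.
    // `try?` turns a thrown error into nil, which is treated as "no match".
    return (try? regex.wholeMatch(in: input)) != nil
}

print(isAllDigits("12345"))
print(isAllDigits("12a45"))
```

Swallowing errors with `try?` keeps the sketch short; real code may prefer to propagate them.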

The proposal also now clarifies that matching will check for task cancellation and throw.

An alternative considered section now discusses the benefits of the bytecode interpreter.


This review is part of a collection of proposals for better string processing in Swift. The proposal authors have put together a proposal overview with links to in-progress pitches and reviews. This proposal introduces the fundamental type, Regex, to the standard library, and outlines how it will be used, including setting up future proposals. It will be run simultaneously with one of the ways to create the Regex type: from a result builder DSL.

As with the concurrency initiative last year, the core team acknowledges that reviewing a large number of interlinked proposals can be challenging. In particular, acceptance of one of the proposals should be considered provisional on future discussions of follow-on proposals that are closely related but have not yet completed the evolution review process. Similarly, reviewers should hold back on in-depth discussion of a subject of an upcoming review. Please do your best to review each proposal on its own merits, while still understanding its relationship to the larger feature.


Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to the review manager. If you do email me directly, please put "SE-0350" somewhere in the subject line.

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift. When writing your review, here are some questions you might want to answer:

  • What is your evaluation of the proposal?
  • Is the problem being addressed significant enough to warrant a change to Swift?
  • Does this proposal fit well with the feel and direction of Swift?
  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

More information about the Swift evolution process is available at:

https://github.com/apple/swift-evolution/blob/main/process.md

As always, thank you for contributing to Swift.

Ben Cohen

Review Manager

Sorry to nitpick. Title is "SE-0352", but "SE-0350" is correct, isn't it?

Fixed, thank you.

+1 - I'm glad to see first class support for Regexes in Swift.

Yes - NSRegularExpression is awkward to use in Swift, and regexes are an important part of making text processing concise and convenient.

Using tuples to provide static guarantees about the number and names of captures makes them fit well with Swift's goals of safety and correctness. Aligning the numbers of captures with back references makes it easier to use them correctly.
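To make the point about tuple-typed captures concrete, here is a small sketch, assuming Swift 5.7 or later. The function `parseYearMonth` and its pattern are hypothetical. Element 0 of the output tuple is the whole match, and elements 1 and 2 line up with capture groups 1 and 2, the same numbers a back reference would use.

```swift
// Sketch only: `parseYearMonth` is a made-up helper for illustration.
func parseYearMonth(_ input: String) -> (year: Int, month: Int)? {
    guard
        // The `as:` parameter states the expected capture structure;
        // construction throws if the pattern does not have this shape.
        let regex = try? Regex(#"(\d{4})-(\d{2})"#,
                               as: (Substring, Substring, Substring).self),
        let match = try? regex.wholeMatch(in: input),
        // Captures are statically typed tuple elements, not an untyped array.
        let year = Int(match.output.1),
        let month = Int(match.output.2)
    else { return nil }
    return (year, month)
}

if let result = parseYearMonth("2022-04") {
    print(result.year, result.month)
}
```

Asking for `match.output.3` here would be rejected at compile time, which is the static guarantee being praised.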

I've used regexes in Ruby, JavaScript, Java and .NET. This and the related proposals generally compare well with these other languages. I've commented separately about being able to introspect a Regex to find the number and names of captures in the 'Run-time Regex Construction' pitch.

Have contributed to the pitch discussion and the first review of the proposal. Compared the API being proposed to the other languages I'm familiar with.

Is this review specifically for the amended aspects or does the core team want feedback on the overall proposal again?

This is primarily to get feedback on the amended aspects. There's no need to re-post feedback from the last review.

:+1: Happy to see both adjustments in naming which I’d raised in the original review period. Appreciate clarifications and extended discussion incorporated into the proposal text.

The cancellation bit is interesting. I think there’s been some discussion on whether that should be used in non-async code, but I think the justification of it having the potential to be long running code makes sense.

Will we see first-party support for regexes in a switch case?

switch userInput {
case /[aeiou]+/:
    return "All vowels here"
case /[bcdfghjklmnpqrstvwxyz]+/:
    return "All consonants"
default:
    return "Not all vowels or consonants"
}

Not a silver bullet, but I’ve found this very useful to break up long regular expressions when you don’t need to capture anything.

Pattern Matching Operator Definition
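// Lets a Regex appear as a `case` pattern: the case matches only when
// the regex matches the whole input. Matching errors count as "no match".
func ~=<Output>(a: Regex<Output>, b: String) -> Bool {
    guard let _ = try? a.wholeMatch(in: b) else { return false }
    return true
}

// Same as above, for Substring inputs.
func ~=<Output>(a: Regex<Output>, b: Substring) -> Bool {
    guard let _ = try? a.wholeMatch(in: b) else { return false }
    return true
}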

Related earlier post of mine: Pitch #2 Regex Literals - #55 by christopherweems - Pitches - Swift Forums

Yes, and it probably makes sense for it to appear in the upcoming "String processing algorithms" proposal.

Thanks for bringing it up! This is an unintentional omission (PR).

While this is nice to have, I believe the truly interesting and powerful use cases come with the ability to extract values from the pattern match itself, which is future work and requires language features.

Should the behaviour of ~= be anchored (wholeMatch) or unanchored (firstMatch)? I've commented about this in the String processing algorithms pitch thread to avoid derailing the review of SE-0350.
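The difference between the two behaviours can be sketched as follows, assuming Swift 5.7 or later. The helper `matchModes` and the vowel pattern are hypothetical; the sketch only contrasts `wholeMatch(in:)` with `firstMatch(in:)` and does not say which one `~=` should use.

```swift
// Sketch only: `matchModes` is a made-up helper for comparison.
func matchModes(_ input: String) -> (whole: Bool, first: Bool) {
    guard let vowels = try? Regex("[aeiou]+") else { return (false, false) }
    // Anchored: the regex must consume the entire input.
    let whole = (try? vowels.wholeMatch(in: input)) != nil
    // Unanchored: the regex may match anywhere inside the input.
    let first = (try? vowels.firstMatch(in: input)) != nil
    return (whole, first)
}

for word in ["aeiou", "audio", "rhythm"] {
    let modes = matchModes(word)
    print(word, modes.whole, modes.first)
}
```

With an unanchored `~=`, a `case` for the vowel pattern would also accept "audio", which contains a consonant; with an anchored one it would not.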

I'd love to see regexes be given the ability to declare constants and variables inline when they're used within a case condition, rather than requiring constants and variables to be declared outside the regex like a value-binding pattern would. For example:

if case /(?<let identifier>[[:alpha:]]\w*) = (?<let hex? = Int($0, radix: 16)>[0-9A-F]+)/ = string {
    print(identifier, hex)
}

IMO this is much clearer, and aligns much better with Chris Lattner's original vision for regular expressions.

That would be further work built on top of a destructuring pattern match, and as such is future work (potentially even further into the future). I also discussed extended (and breaking) regex syntax a little here: [Pitch] Regex Syntax - #27 by Michael_Ilseman, where <...> could be used for interpolation or bindings.

Review Conclusion

The proposal has been accepted.

Late to the party:

Why are there explicit overloads for String and Substring, rather than being generic over S: StringProtocol?
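For reference, a generic spelling of the kind being asked about might look like the sketch below, assuming Swift 5.7 or later. The function `wholeMatches` is hypothetical, and copying the input into a String is a simplification made for the example, not a claim about how the standard library would or should implement it.

```swift
// Sketch only: `wholeMatches` is a made-up generic wrapper.
func wholeMatches<S: StringProtocol, Output>(
    _ regex: Regex<Output>, in input: S
) -> Bool {
    // Assumption: convert to String so the existing String overload can be
    // reused. This copies the input, which a concrete Substring overload avoids.
    return (try? regex.wholeMatch(in: String(input))) != nil
}

if let digits = try? Regex("[0-9]+") {
    print(wholeMatches(digits, in: "2022"))               // String input
    print(wholeMatches(digits, in: "2022-04".prefix(4)))  // Substring input
}
```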
