How to replace regex matches with a template using the "Unicode scalar" semantic level

I would like to do something like this:

let result = "a\u{0358}".replace(
    regex: try! Regex(#"([a-z])\x{0358}"#),
    withTemplate: try! Regex(#"$1\x{0307}"#),
    semanticLevel: .unicodeScalar
)

In this example "a͘" would change to "ȧ".

In other words, I would like to replace regex matches with a template using the "Unicode scalar" semantic level, so that I can replace, e.g., specific combining characters. (The example above is artificial, not a real case, but I really do need replacements of this kind.) It should also work on macOS, Linux, and Windows.

I couldn't find a way to do this with the current APIs. Any help would be welcome, thanks.

If I understand your goal correctly, this seems to work for me:

let regex = #/([a-z])\x{0358}/#.matchingSemantics(.unicodeScalar)
let input = "a\u{0358}"
let result = input.replacing(regex) { match in
    "\(match.output.1)\u{0307}"
}

(I only tested on macOS, but the Regex APIs should be available on all platforms, right?)
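To see why matchingSemantics(.unicodeScalar) matters here, the short sketch below (using only the standard Regex APIs from the snippet above) checks that "a\u{0358}" is a single Character but two Unicode scalars, and that under scalar semantics the first capture group holds just the base letter:

```swift
let input = "a\u{0358}"

// "a" + U+0358 COMBINING DOT ABOVE RIGHT forms one Character (grapheme
// cluster) but consists of two Unicode scalars, so the pattern has to
// look at individual scalars to separate the letter from the mark.
let characterCount = input.count              // grapheme clusters
let scalarCount = input.unicodeScalars.count  // Unicode scalars

let regex = #/([a-z])\x{0358}/#.matchingSemantics(.unicodeScalar)
if let match = input.wholeMatch(of: regex) {
    // With scalar semantics, group 1 captures only the base letter.
    print(match.output.1)
}
```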


Thanks, your code works (tested on macOS and on Windows). I need to use replacement expressions of the form "...$1...$2...", but I can substitute the $1 references in the expression myself.

As far as I know, Swift doesn't provide APIs for replacing a regex with a pattern string where patterns such as "$1" evaluate to matched groups.

Maybe somebody has already written such an API on top of the Regex APIs, but I'm not aware of anything.

It does work with "...".replacingOccurrences(of: regex, with: theReplacement, options: .regularExpression, range: nil) (where regex is a pattern string), so it seems strange to me that Swift doesn't provide this for the new Regex type. Some convenience methods seem to be missing from the library here; they would make this easier.
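For comparison, a minimal sketch of that Foundation route. Note that it does not use the Swift Regex type at all: the pattern is a plain string handed to NSRegularExpression (ICU), which matches code points and expands "$1" in the template. That it behaves identically on Linux and Windows via swift-corelibs-foundation is an assumption here, not something tested in this thread.

```swift
import Foundation

let input = "a\u{0358}"

// The .regularExpression option routes through NSRegularExpression, which
// works on code points (not grapheme clusters) and understands "$1".
let result = input.replacingOccurrences(
    of: #"([a-z])\x{0358}"#,
    with: "$1\u{0307}",
    options: .regularExpression
)
```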

Here's a quick and dirty solution that implements replacement patterns of the form "$1" on top of the Regex APIs:

extension RangeReplaceableCollection where SubSequence == Substring {
    func replacing(
        _ regex: some RegexComponent,
        withPattern replacementPattern: String,
        maxReplacements: Int = .max
    ) -> Self {
        let numberedGroupRegex = #/\$([0-9]+)/#
            .asciiOnlyCharacterClasses()
            .matchingSemantics(.unicodeScalar)
        return self.replacing(regex, maxReplacements: maxReplacements) { match in
            let typeErasedMatch: Regex<AnyRegexOutput>.Match = .init(match)
            return replacementPattern.replacing(numberedGroupRegex) { groupMatch in
                let groupNumber = Int(groupMatch.output.1)!
                let replacement = typeErasedMatch[groupNumber].substring!
                return replacement
            }
        }
    }
}

let input = "a\u{0358}"
let regex = #/([a-z])\x{0358}/#.matchingSemantics(.unicodeScalar)
let result = input.replacing(regex, withPattern: "$1\u{0307}")
print(result)

Very much experimental, untested, and possibly not a great solution, but it seems to work for your use case. Note that the function will crash if the replacement pattern refers to a non-existent group, such as "$2" in the example, or to a group that did not participate in the match (its substring is then nil).
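One possible variation on the extension above, sketched under the assumption that throwing is preferable to crashing: it checks the group number against the match's output count, throws a hypothetical TemplateError for a missing group, and substitutes an empty string for a group that exists but did not participate. The names TemplateError and replacing(_:withTemplate:maxReplacements:) are made up for this sketch; they are not part of the standard library.

```swift
// Hypothetical error type (not part of any library): reported when a "$n"
// reference names a capture group that the regex does not have.
enum TemplateError: Error, Equatable {
    case noSuchGroup(Int)
}

extension RangeReplaceableCollection where SubSequence == Substring {
    /// Hypothetical variant: replaces each match of `regex` with `template`,
    /// where "$n" refers to capture group n ("$0" is the whole match).
    /// Throws instead of crashing for a missing group; a group that exists
    /// but did not participate in the match is replaced with "".
    func replacing(
        _ regex: some RegexComponent,
        withTemplate template: String,
        maxReplacements: Int = .max
    ) throws -> Self {
        // Scalar semantics so "$1" is found even if a combining mark follows it.
        let groupReference = #/\$([0-9]+)/#
            .asciiOnlyCharacterClasses()
            .matchingSemantics(.unicodeScalar)
        return try self.replacing(regex, maxReplacements: maxReplacements) { match -> String in
            let erased = Regex<AnyRegexOutput>.Match(match)
            return try template.replacing(groupReference) { reference -> Substring in
                let digits = reference.output.1
                guard let number = Int(digits), number < erased.output.count else {
                    throw TemplateError.noSuchGroup(Int(digits) ?? -1)
                }
                return erased.output[number].substring ?? ""
            }
        }
    }
}

let input = "a\u{0358}"
let regex = #/([a-z])\x{0358}/#.matchingSemantics(.unicodeScalar)
let result = try! input.replacing(regex, withTemplate: "$1\u{0307}")
```

With this version, a template such as "$2" for a one-group regex throws TemplateError.noSuchGroup(2) instead of trapping.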


Wow, cool, thank you very much!


Just as a note: I made a corresponding macro.


Interesting. And by making this a macro that replaces "$1" with "\(match.output.1)" at compile time, you can make sure that no non-existent groups can be used, did I get that right? Are there other benefits to making this a macro?

On the flip side, you lose the ability to generate the pattern string programmatically at runtime, right?

With non-existent groups it does not compile, but the user gets a short list of unclear error messages, so the behaviour isn't perfect yet. (This differs from cases where the macro itself throws errors; then the user of the macro gets clear error messages.)
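To illustrate the compile-time check, here is a hand-written sketch of what rewriting a template such as "$2$1" into typed tuple accesses might look like. It is not the actual output of the macro mentioned above, just the equivalent closure form: the regex has two groups, so its output type is (Substring, Substring, Substring), and a reference to a third group would fail to compile.

```swift
// Two capture groups: output type is (Substring, Substring, Substring).
let regex = #/([a-z])([0-9])/#

// Template "$2$1" written out as tuple accesses; writing match.output.3
// here would be a compile-time error rather than a runtime crash.
let swapped = "a1 b2".replacing(regex) { match in
    "\(match.output.2)\(match.output.1)"
}
```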

Should be faster.

With the RegexWithCharacterClasses macro, I use many character classes defined there (the classes available in standard regex syntax don't suffice for my needs). Without those macros, the combination was slowing down my applications, so I hope (and think) that I'm fine again on the efficiency side.

Yes.

Question: I wrap the replacement of $1, $2, … in autoreleasepool (using the AutoreleasepoolShim) because this was necessary for some older replace methods. I'm not sure whether it is still necessary for these newer methods (and in the future it should generally be unnecessary with the new Foundation). Does anyone know more? Thanks.


I'm almost certain it's not necessary. Native Swift objects don't have the concept of autoreleasing, and the new Regex engine is fully implemented in Swift.

The older regex APIs that String inherits from its automatic bridging to NSString create Objective-C objects under the hood (on Apple platforms), so an autorelease pool may make sense for these, e.g. if you're calling such an API over and over in a loop.
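A minimal sketch of that advice, assuming a cross-platform helper is wanted: the hypothetical function withAutoreleasePoolIfAvailable (not the AutoreleasepoolShim package's API) uses autoreleasepool where the Objective-C runtime exists and simply runs the body elsewhere. It wraps only the NSString-bridged Foundation call inside the loop, since that is the kind of call that may create autoreleased objects.

```swift
import Foundation

// Hypothetical helper: on Apple platforms, drain autoreleased Objective-C
// objects after each call; where there is no Objective-C runtime
// (Linux, Windows), just run the body.
func withAutoreleasePoolIfAvailable<T>(_ body: () throws -> T) rethrows -> T {
    #if canImport(ObjectiveC)
    return try autoreleasepool(invoking: body)
    #else
    return try body()
    #endif
}

let lines = ["a\u{0358}", "b\u{0358}", "x"]

// Only the Foundation (NSString-bridged) call is wrapped; the pure-Swift
// Regex APIs would not need this.
let converted = lines.map { line in
    withAutoreleasePoolIfAvailable {
        line.replacingOccurrences(
            of: #"([a-z])\x{0358}"#,
            with: "$1\u{0307}",
            options: .regularExpression
        )
    }
}
```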


Note: I asked about an implementation issue in a separate topic.