sspringer
(Stefan Springer)
1
I would like to do something like this:
```swift
let result = "a\u{0358}".replace(
    regex: try! Regex(#"([a-z])\x{0358}"#),
    withTemplate: try! Regex(#"$1\x{0307}"#),
    semanticLevel: .unicodeScalar
)
```
In this example "a͘" would change to "ȧ".
So I would like to replace regex matches with a template, matching at the "Unicode scalar" semantic level, so that I can replace, e.g., specific combining characters. (The example above is artificial, not a real case, but I really do need to make replacements of this kind.) And it should work on macOS, Linux, and Windows.
I could not figure out how to do this with the current APIs. Any help would be welcome, thanks.
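To see why the Unicode scalar level matters for this example, here is a small standard-library-only sketch: "a͘" is a single Character made of two Unicode scalars, so a pattern that targets the combining mark has to look at scalars rather than Characters. It also shows that the decomposed target "a\u{0307}" compares equal to the precomposed "ȧ" (U+0227).

```swift
// "a͘": LATIN SMALL LETTER A followed by COMBINING DOT ABOVE RIGHT (U+0358).
let original = "a\u{0358}"
// "ȧ": LATIN SMALL LETTER A followed by COMBINING DOT ABOVE (U+0307).
let wanted = "a\u{0307}"

// The whole thing is one Character (one extended grapheme cluster)...
print(original.count)
// ...but two Unicode scalars, which is the level a scalar-semantics regex sees.
print(original.unicodeScalars.count)
print(original.unicodeScalars.map { String($0.value, radix: 16) })

// String comparison uses canonical equivalence, so the decomposed form equals
// the precomposed U+0227 LATIN SMALL LETTER A WITH DOT ABOVE.
print(wanted == "\u{0227}")
```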
ole
(Ole Begemann)
2
If I understand your goal correctly, this seems to work for me:
```swift
let regex = #/([a-z])\x{0358}/#.matchingSemantics(.unicodeScalar)
let input = "a\u{0358}"
let result = input.replacing(regex) { match in
    "\(match.output.1)\u{0307}"
}
```
(I only tested on macOS, but the Regex APIs should be available on all platforms, right?)
sspringer
(Stefan Springer)
3
Thanks, your code works (tested on macOS and on Windows). I need to use replacement expressions of the form "...$1...$2...", but I can substitute the $1 etc. in the expression myself.
ole
(Ole Begemann)
4
As far as I know, Swift doesn't provide APIs for replacing a regex with a pattern string where patterns such as "$1" evaluate to matched groups.
Maybe somebody has already written such an API on top of the Regex APIs, but I'm not aware of anything.
sspringer
(Stefan Springer)
5
It does work with "...".replacingOccurrences(of: regex, with: theReplacement, options: .regularExpression, range: nil), so it seems strange to me that Swift does not provide this in "our" case. Some convenience methods seem to be missing from the library here; they would make this easier.
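For comparison, a minimal sketch of that Foundation call applied to the example from the question. It assumes Foundation is available (swift-corelibs-foundation on Linux and Windows). With .regularExpression the search string is treated as an ICU pattern, as used by NSRegularExpression, and the replacement as a template in which "$1" refers to the first capture group.

```swift
import Foundation

let input = "a\u{0358}"
// ICU pattern: a lowercase ASCII letter captured as group 1, then U+0358.
// Template: group 1 followed by U+0307 COMBINING DOT ABOVE.
let result = input.replacingOccurrences(
    of: #"([a-z])\x{0358}"#,
    with: "$1\u{0307}",
    options: .regularExpression,
    range: nil
)
print(result == "\u{0227}")
```

Note that this goes through the NSString-based API rather than the Swift Regex type, so the pattern is only checked at runtime.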
ole
(Ole Begemann)
6
Here's a quick and dirty solution that implements replacement patterns of the form "$1" on top of the Regex APIs:
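```swift
extension RangeReplaceableCollection where SubSequence == Substring {
    /// Replaces matches of `regex` using a pattern in which "$N" stands for capture group N.
    func replacing(
        _ regex: some RegexComponent,
        withPattern replacementPattern: String,
        maxReplacements: Int = .max
    ) -> Self {
        // "$" followed by ASCII digits. Matching per Unicode scalar means a combining
        // mark right after the digits (as in "$1\u{0307}") is handled on its own
        // instead of as part of the last digit's Character.
        let numberedGroupRegex = #/\$([0-9]+)/#
            .asciiOnlyCharacterClasses()
            .matchingSemantics(.unicodeScalar)
        return self.replacing(regex, maxReplacements: maxReplacements) { match in
            // Type-erase the match so groups can be looked up by a runtime index.
            let typeErasedMatch: Regex<AnyRegexOutput>.Match = .init(match)
            return replacementPattern.replacing(numberedGroupRegex) { groupMatch in
                let groupNumber = Int(groupMatch.output.1)!
                // Traps if the group does not exist or did not take part in the match.
                let replacement = typeErasedMatch[groupNumber].substring!
                return replacement
            }
        }
    }
}

let input = "a\u{0358}"
let regex = #/([a-z])\x{0358}/#.matchingSemantics(.unicodeScalar)
let result = input.replacing(regex, withPattern: "$1\u{0307}")
print(result)
```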
Very much experimental, untested, and possibly not a great solution, but it seems to work for your use case. Note that the function will crash if the replacement pattern refers to a non-existent group, such as "$2" in the example, or to an optional group that did not participate in the match.
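One possible way around the crash, sketched under two design assumptions that are not from the thread: a "$N" whose N is beyond the last group is left in the output as written, and a group that did not take part in the match expands to an empty string. The function name replacingGroupReferences(in:matching:withPattern:) is made up for this sketch; it reuses the same Regex APIs as the helper above.

```swift
// Hypothetical non-crashing variant of the "$N" helper (name invented for this sketch).
// Assumptions: unknown group numbers stay as "$N"; non-participating groups become "".
func replacingGroupReferences(
    in input: String,
    matching regex: some RegexComponent,
    withPattern replacementPattern: String
) -> String {
    // "$" followed by ASCII digits, matched per Unicode scalar so that a combining
    // mark right after the digits does not merge with them.
    let numberedGroupRegex = #/\$([0-9]+)/#.matchingSemantics(.unicodeScalar)
    return input.replacing(regex) { match -> String in
        // Type-erase the match so capture groups can be looked up by number.
        let erased: Regex<AnyRegexOutput>.Match = .init(match)
        return replacementPattern.replacing(numberedGroupRegex) { groupMatch -> Substring in
            // erased.output.count includes the whole match at index 0.
            guard let number = Int(groupMatch.output.1), number < erased.output.count else {
                return groupMatch.output.0  // unknown group: keep "$N" verbatim
            }
            return erased[number].substring ?? ""
        }
    }
}

let scalarRegex = #/([a-z])\x{0358}/#.matchingSemantics(.unicodeScalar)
print(replacingGroupReferences(in: "a\u{0358}", matching: scalarRegex, withPattern: "$1\u{0307}$2"))
```

Keeping unknown references verbatim makes mistakes visible in the output instead of trapping; throwing an error would be an equally reasonable choice.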
sspringer
(Stefan Springer)
7
Wow, cool, thank you very much!
sspringer
(Stefan Springer)
8
Just as a note: I made a corresponding macro.
ole
(Ole Begemann)
9
Interesting. And by making this a macro that replaces "$1" with "\(match.output.1)" at compile time, you can make sure that no non-existent groups can be used, did I get that right? Are there other benefits to making this a macro?
On the flip side, you lose the ability to generate the pattern string programmatically at runtime, right?
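A hand-written sketch of the idea described above, for a template like "$2$1\u{0307}": each "$N" becomes a typed capture access such as match.output.2. The macro's actual expansion is not shown in the thread, so this only illustrates why a non-existent group becomes a compile-time error; the function name swapAndMark is hypothetical.

```swift
// Hypothetical expansion of the template "$2$1\u{0307}" into typed capture accesses.
func swapAndMark(_ input: String) -> String {
    // Two capture groups, so match.output is (Substring, Substring, Substring).
    let regex = #/([a-z])([a-z])\x{0358}/#.matchingSemantics(.unicodeScalar)
    return input.replacing(regex) { match in
        // Writing match.output.3 here would not compile: there is no third group.
        "\(match.output.2)\(match.output.1)\u{0307}"
    }
}

print(swapAndMark("ba\u{0358}"))
```

The trade-off mentioned above follows directly: the template is fixed in source code, so it cannot be assembled at runtime.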
sspringer
(Stefan Springer)
10
With non-existent groups it does not compile, but the compiler reports a short list of unclear error messages, so the behavior is not perfect here. (This is different from cases where the macro itself throws errors, which give the user of the macro clear error messages.)
It should also be faster.
I use the RegexWithCharacterClasses macro with many character classes defined there (the classes available in standard regular expressions are not sufficient for my needs). Without these macros, this was slowing down the corresponding applications, so I hope (and think) that I am now fine again on the efficiency side.
Yes.
Question: I am using autoreleasepool (and the AutoreleasepoolShim) around the replacement of $1, $2, …, because this was necessary for some older replace methods. I am not sure whether it is still necessary for these newer methods (and in the future it should generally not be necessary with the new Foundation). Does anyone know more? Thanks.
ole
(Ole Begemann)
11
I'm almost certain it's not necessary. Native Swift objects don't have the concept of autoreleasing, and the new Regex engine is fully implemented in Swift.
The older regex APIs that String inherits from its automatic bridging to NSString create Objective-C objects under the hood (on Apple platforms), so an autorelease pool may make sense for these, e.g. if you're calling such an API over and over in a loop.
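A sketch of the loop situation described above, assuming a batch of lines processed with the NSString-bridged regex API. The fallback autoreleasepool function is a hypothetical stand-in for platforms without Objective-C, written only so the example compiles everywhere; it is not the AutoreleasepoolShim package's implementation.

```swift
import Foundation

#if !canImport(ObjectiveC)
// Hypothetical stand-in: without Objective-C there are no autoreleased objects,
// so the "pool" just runs the closure.
func autoreleasepool<Result>(invoking body: () throws -> Result) rethrows -> Result {
    try body()
}
#endif

// Each call goes through the NSString-bridged regex API, which on Apple platforms
// may create autoreleased Objective-C objects; each iteration drains its own pool.
func addDotAbove(to lines: [String]) -> [String] {
    lines.map { line in
        autoreleasepool {
            line.replacingOccurrences(
                of: #"([a-z])\x{0358}"#,
                with: "$1\u{0307}",
                options: .regularExpression
            )
        }
    }
}

print(addDotAbove(to: ["a\u{0358}", "xyz"]))
```

With the Swift Regex-based replacing(_:with:) instead, there is nothing for such a pool to collect, which is the point made above.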
sspringer
(Stefan Springer)
12
Note: I asked about an implementation issue in a separate topic.