Regex initialization seems to be really slow. We ported some text-parsing code, which uses regular expressions heavily, from TypeScript to Swift and found that parsing 100 files with the Swift version took up to 20x longer than the original TypeScript code (in a browser).
After further investigation, we found Regex initialization to be the culprit. I am listing some (contrived) examples, to show what I mean:
The following code takes approx 3s to run:
for _ in 0..<100000 {
    // The regex literal sits inside the loop body, so it is evaluated on every iteration.
    _ = "abcde".matches(of: /a.c.e/)
}
The regex literal seems to be directly translated to Regex("a.c.e"), because replacing the regex literal with the initializer expression yields the same performance.
BUT: The following code runs in 0.3s (10 times faster):
// The regex is constructed once, outside the loop.
let regex = /a.c.e/
for _ in 0..<100000 {
    _ = "abcde".matches(of: regex)
}
So, constructing the regex seems to take much more time than actually doing the matching, and pulling the regex initialization out of the loop gives a huge performance boost.
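For anyone who wants to reproduce the comparison, here is a rough timing sketch. It is only an illustration: the `Timing` type and the two function names are made up for this example, the iteration count is arbitrary, and the absolute numbers will depend on the machine, the build configuration and the Swift version. It uses the `#/…/#` delimiter form of the same literal so that it compiles without the bare-slash regex flag.

```swift
// Hypothetical helper type for this sketch: elapsed time plus a match count
// (the count is kept so the work cannot be optimized away entirely).
struct Timing {
    var elapsed: Duration
    var matchCount: Int
}

// Variant 1: the regex literal appears inside the loop body.
func timeLiteralInLoop(iterations: Int) -> Timing {
    var count = 0
    let elapsed = ContinuousClock().measure {
        for _ in 0..<iterations {
            count += "abcde".matches(of: #/a.c.e/#).count
        }
    }
    return Timing(elapsed: elapsed, matchCount: count)
}

// Variant 2: the regex is created once, before the loop.
func timeHoisted(iterations: Int) -> Timing {
    let regex = #/a.c.e/#
    var count = 0
    let elapsed = ContinuousClock().measure {
        for _ in 0..<iterations {
            count += "abcde".matches(of: regex).count
        }
    }
    return Timing(elapsed: elapsed, matchCount: count)
}

let inLoop = timeLiteralInLoop(iterations: 10_000)
let hoisted = timeHoisted(iterations: 10_000)
print("literal in loop: \(inLoop.elapsed), hoisted: \(hoisted.elapsed)")
```

Both variants do the same matching work, so any difference in the printed durations should come from where the regex is constructed.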
Our conclusion was to circumvent Regex initialization wherever possible by defining private class constants for every regex we use (which is a bit cumbersome). So we wondered whether the Swift compiler couldn't do this automatically by "externalizing" all Regex literals into some constant pool or virtual/fileprivate/static/whatever variable, such that each is initialized only once (instead of on each and every execution of the code that uses it).
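To make the manual workaround concrete, here is a minimal sketch of what we mean by keeping the regex in a constant. The `LineParser` type and its `countMatches(in:)` method are hypothetical names invented for this example, and a stored `let` property is used as an assumption about what works across language modes; a `static let` would be the closer equivalent of a class constant.

```swift
// Hypothetical parser type: the regex is built once per instance,
// not once per call to countMatches(in:).
struct LineParser {
    // Stored constant, initialized a single time when the parser is created.
    private let pattern = #/a.c.e/#

    // Reuses the stored regex for every call.
    func countMatches(in text: String) -> Int {
        text.matches(of: pattern).count
    }
}

let parser = LineParser()
var total = 0
for _ in 0..<1_000 {
    total += parser.countMatches(in: "abcde")
}
print(total)
```

The cumbersome part is that every pattern needs its own named property, which is exactly what we were hoping the compiler could do for us.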
And related to this:
String.replacing(Regex, with: String) does not support back references in the replacement string. So we had to resort to String.replacingOccurrences(of: String, with: String, options: .regularExpression), which does support back references, at the cost of constructing the regex from the String on every call (which hurts performance again).
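One possible way to keep a pre-built regex and still reuse captured groups is the closure-based overload of `replacing`, where the replacement is computed from each match. This is a sketch rather than a drop-in substitute for back-reference templates: the pattern and the input string are invented for illustration, and the captures are read as tuple elements of the match instead of `$1`-style placeholders.

```swift
// Built once; the two capture groups take the place of back references.
let swapPattern = #/(\w+)@(\w+)/#

// The closure receives each match and returns the replacement text,
// reading the captures as match.1 and match.2.
let swapped = "user@example".replacing(swapPattern) { match in
    "\(match.2) at \(match.1)"
}
print(swapped)
```

It is more verbose than a replacement template, but the regex can live in a constant like any other.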
It would be nice if String.replacing(Regex, with: String) could be made to support back references (so that we can use externalized regex literals again) or, if that is too much of a breaking change to String.replacing(), maybe we could have a variant String.replacingOccurrences(of: Regex, with: String).
Or, just make the Regex initializer (much) faster.