String.replacing performance

Recently, upon spotting a use of Regex in the wild to achieve the equivalent of a literal string replacement, I wondered about the performance implications of using Regex this way. To my surprise, I learned that using String.replacing(Regex...) is actually about 50% faster than using String.replacing(String...). In my tests, I ran thousands of iterations for each scenario, including the use of Emoji characters, multiple replacements, etc, and in every test I made, a call like this:

myString.replacing(/foo/, with: "bar")

Is about 50% faster than:

myString.replacing("foo", with: "bar")

This made me wonder if Regex might have received more performance attention than the code path that .replacing(String) passes through, and if so, maybe it would make sense for the implementation of the latter to simply call through to the former?

After observing the difference in performance between the two above, I wondered how NSString would fare, so I changed the Swift String version to:

(myString as NSString).replacingOccurrences(of: "foo", with: "bar")

To my astonishment, this version is around 50x faster than the Swift Regex case. Here is a typical Instruments Time Profiler run with each of the Regex, Swift String, and NSString cases running in succession:

I know there are sophisticated Unicode correctness issues at play in Swift String, and the C-based counterpart might be benefiting from speed of incorrectness. But these disparities, both the one that causes Regex to perform better than a literal string, and the one where NSString soundly wallops Swift String, made me wonder if there is room for improvements here.

I'd love to hear from anybody with experience in this area of the code base about whether they think there is merit in filing a bug or otherwise pursuing improvements here.

Daniel

13 Likes

These API are somewhere that we know that there are good opportunities for large performance improvements, and relatively little effort has been directed at them so far, because most of our initial focus was on getting the API surface right.

Filing "hey, I tried a random synthetic benchmark and it was slow" bugs is not particularly useful--we already know that there's work to be done on these. But filing bugs of the form "this particular workload is very important to my use of the API" is quite useful, because we can add those meaningful cases to our benchmark suites and ensure that they are included in the improvements we make.

14 Likes

Makes sense. Thank you!

2 Likes

Maybe take a quick once-over of this earlier thread - not because it's the exact same situation (kind of reverse in fact) but because it has some good discussion about the current state of both String and Regex APIs. They need some love, in a nutshell. They are open source, by the way. :wink:

1 Like

thanks for doing the extensive testing, i suspect this might be a situation where you’re running into the rough edges of swift overload resolution.

first, godbolt.

in the example with the /foo/ regex literal, swift is calling sSm17_StringProcessingSs11SubSequenceRtzrlE9replacing_4with15maxReplacementsxqd_0__qd__SitSlRd__AA14RegexComponentRd_0_SJ7ElementRtd__r0_lF, which demangles to (extension in _StringProcessing):Swift.RangeReplaceableCollection< where A.SubSequence == Swift.Substring>.replacing<A, B where A1: Swift.Collection, B1: _StringProcessing.RegexComponent, A1.Element == Swift.Character>(_: B1, with: A1, maxReplacements: Swift.Int) -> A

from the signature it says it is an extension on RangeReplaceableCollection where Self.SubSequence == Substring so it can take advantage of string-specific optimizations because it can specialize many of the types.

in your second example with the "foo" string literal, it is calling sSm17_StringProcessingSQ7ElementRpzrlE9replacing_4with15maxReplacementsxqd___qd_0_SitSlRd__SlRd_0_ABQyd__ACRSABQyd_0_AGRSr0_lF which demangles to (extension in _StringProcessing):Swift.RangeReplaceableCollection< where A.Element: Swift.Equatable>.replacing<A, B where A1: Swift.Collection, B1: Swift.Collection, A.Element == A1.Element, A1.Element == B1.Element>(_: A1, with: B1, maxReplacements: Swift.Int) -> A.

this is an abstract function over RangeReplaceableCollection where Element:Equatable and isn’t likely getting specialized enough to benefit from optimization.

this probably has something to do with NSString being a concrete type and benefiting from more specialization than Self:RangeReplaceableCollection. it might be a good idea to add overloads to String and Substring that explicitly call the specialized implementations.

4 Likes

Usually that happens automatically whenever Foundation is loaded.

And this is not uncommon - Foundation's String (i.e. NSString) functions are generally much faster than those for Swift's standard String. For both of the reasons already covered - incorrect / inappropriate use of overly generic algorithms, and simply inefficient implementations / algorithms.