Would you be able to extract a representative benchmark from this? We'll get you on the fast paths, but we'd also like to track and improve slow-path performance. There's some (eventually unnecessary) overhead in the machinery here for interfacing with ICU, and we're always looking for more reasons to prioritize "Stop using ICU for normalization".
Alternatively or additionally, do you have an instruments trace you can share? bugs.swift.org
The `_foreign` part means that you have a lazily-bridged NSString that is incapable of providing a pointer+length to UTF-8 contents. In contrast, other strings are `fastUTF8`, meaning that they can provide such access (small strings spill to the stack to do so).
This specific terminology arose during the implementation and is now burned into Swift's ABI, albeit in a way that's not exposed to most users. "Piercing the String Veil" has more details on the internal structure of a String, and settles on the terms "opaque" vs "contiguous" instead of "foreign" vs "fastUTF8". SE-0247 (Contiguous Strings), in Swift 5.1, added API giving you control over this.
As an experiment, try adding a mutating `makeContiguousUTF8()` call to your strings. That forces opaque strings to be bridged into a native form, which also captures information such as whether the string is all ASCII (implying NFC).
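As a rough sketch of that experiment (requires a Swift 5.1 or later toolchain): the helper name `makeAllContiguous` and the sample strings are made up for illustration, while `makeContiguousUTF8()`, `isContiguousUTF8`, and `withContiguousStorageIfAvailable` are the standard library APIs. The literals here are already native, so the call is a no-op for them; the strings that should benefit are ones that arrived lazily bridged from Objective-C.

```swift
// Hypothetical helper: force every string into native, contiguous UTF-8
// storage once, up front, before doing many comparisons or hashes.
func makeAllContiguous(_ strings: inout [String]) {
    for index in strings.indices {
        // No-op for strings that are already contiguous; opaque (lazily
        // bridged) strings are copied into native storage.
        strings[index].makeContiguousUTF8()
    }
}

var keys = ["alpha", "café", "naïve"]
makeAllContiguous(&keys)

// After the call, each string reports contiguous UTF-8 storage...
let allContiguous = keys.allSatisfy { $0.isContiguousUTF8 }

// ...and can hand out a pointer+length view of its UTF-8 bytes.
let byteCounts = keys.map { key in
    key.utf8.withContiguousStorageIfAvailable { $0.count }
}
print(allContiguous, byteCounts)
```

Doing this once where the strings enter your code (rather than at every comparison) should keep the copy off the hot path.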
If you can't use a 5.1 toolchain, you can fake it like this, which is a little less optimal than what 5.1 has.
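One way such a workaround might look, as a sketch only: it assumes that rebuilding the string from a copy of its UTF-8 bytes is acceptable, and the method name `copyIntoNativeStorage` is hypothetical. Unlike `makeContiguousUTF8()`, it always copies, even when the string is already native.

```swift
extension String {
    // Hypothetical stand-in for makeContiguousUTF8() on older toolchains.
    // Copies the UTF-8 code units out and decodes them into a fresh,
    // natively stored String.
    mutating func copyIntoNativeStorage() {
        self = String(decoding: Array(self.utf8), as: UTF8.self)
    }
}

var name = "résumé"
let before = name
name.copyIntoNativeStorage()

// The contents are unchanged; only the backing storage is rebuilt.
print(name == before)
```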
What you describe likely results in the string being bridged to Objective-C and then lazily imported back to Swift, and in the process the information about it being in NFC is lost.
Exposing normalization operations and queries is future work. It's a combination of the new Unicode API mentioned in "Unicode Enthusiasts Unite", getting off of ICU for normalization ("Stop using ICU for normalization"), and exposing the perf-flag operations mentioned further in "Piercing the String Veil".
Posts, performance data, benchmarks, and bug reports help us prioritize this higher :-)
No, lifetime is an orthogonal concern and you're looking at the wrong bit. This is the bit for whether it's fast.
The `makeContiguousUTF8()` API is the best we have for now (it keeps you off the really slow paths), until we add the `normalize()` API as well.
There are also a fair number of bridging performance improvements in Swift 5.1 that may help some of your related code. CC @David_Smith.