var start = string.startIndex // 1
let end = string.index(start, offsetBy: 10) // 2
let length: CFIndex = string.utf16.distance(from: start, to: end) // 3
Is that the best way to do it?
I'm surprised that the second line is about 4x faster than the third line.
My goal: I have a string broken up into saved lengths, each an Int representing a Character count. I want to iterate over the string using those Character lengths, but I want to generate CFRanges.
Right, I'm assuming a UTF-8 backing and (half the work of) transcoding to get a UTF-16 count instead (for CFRange). Counting Characters (extended grapheme clusters) should be about equally hard for UTF-8 and UTF-16.
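To make the units concrete, here is a small sketch comparing the different counts for one string. The sample text is just an illustration, not taken from the original question.

```swift
// Stand-in sample: "café 👍🏽", written with escapes so the scalars are unambiguous.
let sample = "caf\u{E9} \u{1F44D}\u{1F3FD}"

print(sample.count)                 // Characters (extended grapheme clusters)
print(sample.unicodeScalars.count)  // Unicode scalars
print(sample.utf16.count)           // UTF-16 code units, the unit CFRange uses
print(sample.utf8.count)            // UTF-8 code units
```

The four counts differ (6, 7, 9 and 14 here), which is why a saved Character length can't be reused directly as a CFRange length.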
Character counts are not stable across Unicode versions, so if they are saved or transmitted separately from the string itself, they risk pointing at something different after they are reloaded.
If you disallow undefined scalars, then you can pick a normalization form, apply it after loading, and save the ranges as scalar distances in that normalization form. Otherwise, to be completely stable, you need to save raw bytes (not strings) and save the ranges as byte distances.
Unless the context guarantees the string is long enough, you probably want to use index(_:offsetBy:limitedBy:) instead.
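A minimal sketch of the difference, using a made-up five-Character string: index(_:offsetBy:) would trap when asked for 10 Characters, while the limitedBy variant returns nil.

```swift
let string = "short"  // stand-in value, only 5 Characters long
let start = string.startIndex

// Returns nil instead of trapping when fewer than 10 Characters remain.
if let end = string.index(start, offsetBy: 10, limitedBy: string.endIndex) {
    print(string[start..<end])
} else {
    print("fewer than 10 Characters remain")
}
```

Note that an offset landing exactly on the limit still succeeds and returns the limit itself.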
Outside of where I'm actually using this code in my app, string.index(start, offsetBy: 10) is the expensive call and string.utf16.distance(from: start, to: end) is the cheap call. Inside my app, where I care, it's the opposite.
Maybe this IS the problem. What's a foolproof way to make sure that I'm dealing with a native Swift string? I am trying:
let maybeNotNative: String = ...
var forceNative = String(maybeNotNative[...])
forceNative.makeContiguousUTF8()
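One way to sanity-check the result is the standard library's isContiguousUTF8 property. This is only a sketch: the literal below is a stand-in for whatever string the app actually receives, and the property reports storage layout, not how fast any particular call will be.

```swift
// Stand-in for a string that may have come from Objective-C bridging.
let maybeNotNative: String = "caf\u{E9}"

var forceNative = String(maybeNotNative[...])
forceNative.makeContiguousUTF8()

// True when the string is backed by contiguous UTF-8 storage.
print(forceNative.isContiguousUTF8)
```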
To recap, I'm trying to iterate over a string. I pass in lengths in Character units. I want to get out CFRanges (which are in UTF-16 code units). The two key methods to do this are:
String.index(_:offsetBy:limitedBy:) // Advance existing index by chars
String.UTF16View.distance(from:to:) // Measure the UTF-16 length between two indices
Yeah, unfortunately a native string in this case is worse than a bridged one. For a pure ASCII native string this is not an issue, but anything outside of ASCII requires transcoding both indices (start and end), although transcoding start is trivial. For a UTF-16 backed string, this operation is trivial (end - start). And yes, what Jordan said about counting characters is true: it's pretty much equally hard for both UTF-8 and UTF-16 backed strings.
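Putting the pieces of this thread together, the loop could look roughly like the sketch below. The function name cfRanges(in:characterLengths:) is hypothetical, and the code assumes an Apple platform where importing Foundation makes CFRange and CFIndex available. It keeps a running UTF-16 location so that each step only measures the distance from the previous index, never from startIndex.

```swift
import Foundation

/// Hypothetical helper: turns consecutive Character-count lengths into
/// CFRanges measured in UTF-16 code units.
func cfRanges(in string: String, characterLengths: [Int]) -> [CFRange] {
    var ranges: [CFRange] = []
    var start = string.startIndex
    var location: CFIndex = 0  // UTF-16 offset of `start` from the beginning

    for length in characterLengths {
        // Stop on a negative length or when the saved lengths run past the end.
        guard length >= 0,
              let end = string.index(start, offsetBy: length, limitedBy: string.endIndex)
        else { break }

        // Only the current piece is measured, not the whole prefix.
        let utf16Length = string.utf16.distance(from: start, to: end)
        ranges.append(CFRange(location: location, length: utf16Length))

        location += utf16Length
        start = end
    }
    return ranges
}

// Usage with a made-up string and lengths: "café", " 👍🏽", "!"
let ranges = cfRanges(in: "caf\u{E9} \u{1F44D}\u{1F3FD}!", characterLengths: [4, 2, 1])
for range in ranges {
    print(range.location, range.length)
}
```

This does not remove the transcoding cost described above for non-ASCII native strings; it only avoids repeating work for the part of the string that has already been walked.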