I have an implementation which I’d like to create a pull request for, but I don’t know where it should go. The current implementation is inside ‘StringLegacy.swift’, but I’d presume it would be better placed elsewhere.
In addition, this uses the character view, and I’m not sure whether that follows the normalisation required for comparisons, or whether different normalisation routines need to be written.
Finally, it’s not clear whether this is the right way to solve the problem, or whether the implementation should be in the open-source version of the Foundation NSString implementation.
I'm pretty sure the current implementation uses Unicode canonical
equivalence to perform the comparison. This is equivalent to invoking
`decomposedStringWithCanonicalMapping` on both strings and then
comparing the resulting UTF-8 (or UTF-16) sequences, although it doesn't
actually build new strings.
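A minimal sketch of that equivalence, assuming Foundation's `decomposedStringWithCanonicalMapping` is available on the target platform. `canonicalHasPrefix` is a hypothetical helper name for illustration, not a standard library API, and unlike the implementation described above it does build new, decomposed strings:

```swift
import Foundation

// Hypothetical helper (not a standard library API): reports whether `string`
// starts with `prefix` under Unicode canonical equivalence, by decomposing
// both strings (NFD) and comparing their UTF-8 code units.
// Note: this sketch allocates two new strings; the real implementation does not.
func canonicalHasPrefix(_ string: String, _ prefix: String) -> Bool {
    let decomposedString = string.decomposedStringWithCanonicalMapping
    let decomposedPrefix = prefix.decomposedStringWithCanonicalMapping
    return decomposedString.utf8.starts(with: decomposedPrefix.utf8)
}

// U+00E9 (precomposed "é") decomposes to "e" + U+0301 COMBINING ACUTE ACCENT,
// so this matches even though the original UTF-8 bytes differ.
print(canonicalHasPrefix("\u{E9}t\u{E9}", "e\u{301}"))  // true
```

Because only decomposed code units are compared, `canonicalHasPrefix("\u{E9}", "e")` is also `true`, even though "é" and "e" are different characters.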
Canonical equivalence is not the same thing as comparing character sequences.
For example, given the following two strings:
    let s1 = "\u{1F1E6}\u{1F1F9}" // 🇦🇹
    let s2 = "\u{1F1E6}" // U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A
The current behavior of `s1.hasPrefix(s2)` is to return `true`, but any
kind of comparison on the character sequences is going to return `false`,
as both strings have one character each, and the characters are not the
same. This is because `String.characters` is based on the notion of
extended grapheme clusters, which is not the same thing as canonical
equivalence.
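The difference can be observed with the standard library alone. This sketch assumes a current Swift toolchain, where `String` is itself a collection of `Character`s (the `characters` view was later folded into `String`), and compares the same two strings first as grapheme clusters and then as Unicode scalars:

```swift
// The two strings from the example above.
let s1 = "\u{1F1E6}\u{1F1F9}"  // 🇦🇹, a single extended grapheme cluster
let s2 = "\u{1F1E6}"           // a lone regional indicator, also one cluster

// Character-level view: each string holds exactly one Character,
// and those two Characters are not equal.
print(s1.count, s2.count)   // 1 1
print(s1.starts(with: s2))  // false

// Scalar-level view: s1's first scalar is U+1F1E6, so s2 is a prefix.
print(s1.unicodeScalars.starts(with: s2.unicodeScalars))  // true
```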
FWIW, the comparison operators on String (for non-ASCII strings)
actually bridge into the ObjC runtime; if there is no ObjC runtime, it
appears to use ICU instead to do the comparison over UTF-16 or UTF-8 (see
`stdlib/public/stubs/UnicodeNormalization.cpp`).
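Whichever backend does the work, the observable contract of `==` is canonical equivalence rather than code-unit equality. The check below assumes a current toolchain and illustrates only that semantics; it says nothing about whether ObjC, ICU, or native code performs the comparison on a given platform:

```swift
let precomposed = "\u{E9}"   // "é" as a single scalar, U+00E9
let decomposed = "e\u{301}"  // "e" followed by U+0301 COMBINING ACUTE ACCENT

// String equality treats canonically equivalent strings as equal...
print(precomposed == decomposed)                          // true
// ...even though their UTF-8 code units differ.
print(Array(precomposed.utf8) == Array(decomposed.utf8))  // false
```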
-Kevin Ballard
(The message quoted at the top was written by Alex Blewitt via swift-dev on Tue, Dec 22, 2015, at 01:28 PM.)