Bug in lexicographical compare operators?

acharseth · August 19, 2021, 9:12pm

Since Swift supports Unicode, I expect string and character comparisons to follow lexicographical (alphabetical) order. The Norwegian alphabet has 29 letters: after Z come Æ, Ø and Å.
So I should be able to check whether a character is one of the special Norwegian letters by testing that it is >= Æ and <= Å. This does not seem to work, however. Look at this example:

var AEisN=false
var OEisN=false
var AAisN=false

if ("Æ" >= "Æ" && "Æ" <= "Å") { AEisN=true }
if ("Ø" >= "Æ" && "Ø" <= "Å") { OEisN=true }
if ("Å" >= "Æ" && "Å" <= "Å") { AAisN=true }

print("'Æ'=Norwegian letter?: \(AEisN)")
print("'Ø'=Norwegian letter?: \(OEisN)")
print("'Å'=Norwegian letter?: \(AAisN)")

I expected this to print true in all three cases, but it prints false for each.

I am using the CodeSnack IDE app to test this, and it appears to be running a recent Swift version, since:

#if swift(>=5.4)
print("Hello, Swift 5!")
#endif

prints Hello, Swift 5!

Is this a bug in the Swift specification or in this implementation, or am I doing something wrong here?

tera · August 19, 2021, 9:38pm

Interestingly, Å < Æ in Greenlandic. Maybe Swift doesn't know which language you mean?

A simpler check would be to create a Norwegian CharacterSet and check whether the character in question belongs to it.
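
A minimal sketch of that membership check. Note that it swaps in a plain Set<Character> rather than Foundation's CharacterSet (which works on Unicode scalars rather than Characters); the names norwegianSpecialLetters and isNorwegianSpecialLetter are made up for illustration:

```swift
// Hypothetical names, not from any library: a fixed set of the three
// extra Norwegian letters in both cases.
let norwegianSpecialLetters: Set<Character> = ["Æ", "Ø", "Å", "æ", "ø", "å"]

// Returns true if the character is one of the letters that follow Z
// in the Norwegian alphabet.
func isNorwegianSpecialLetter(_ character: Character) -> Bool {
    return norwegianSpecialLetters.contains(character)
}

print(isNorwegianSpecialLetter("Ø")) // true
print(isNorwegianSpecialLetter("Z")) // false
```

Because membership does not depend on any ordering, this sidesteps the comparison question entirely. Character equality in Swift is based on canonical equivalence, so a decomposed Å (A followed by a combining ring) should match as well.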

Jumhyn · August 19, 2021, 9:45pm

I can't track down a specific reference in the docs to explain this, but we can see a bit more about what is (probably) happening by inspecting the code points of the characters you've called out.

Notably, the UTF-8 encodings of the characters (with their code points in parentheses) are:

Æ: 0xC3 0x86 (U+00C6)
Å: 0xC3 0x85 (U+00C5)
Ø: 0xC3 0x98 (U+00D8)

I believe that this implies that the actual ordering of those characters is Å < Æ < Ø, which can be confirmed:

print("Å" < "Æ") // true
print("Æ" < "Ø") // true

The best reference I could find that implies this behavior is from the String docs:

Basic string operations are not sensitive to locale settings, ensuring that string comparisons and other operations always have a single, stable result, allowing strings to be used as keys in Dictionary instances and for other purposes.

As @tera notes, the relative ordering of specific characters is locale-dependent, so it's not possible to have a fixed ordering of characters that is also "correct" in every locale.
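
For anyone who wants to reproduce the inspection, here is a rough sketch that prints the Unicode scalar values and UTF-8 bytes of the three letters and then sorts them with the default comparison. The variable names are illustrative, and the literals are written as escapes on the assumption that we want the precomposed forms regardless of how the source file is saved:

```swift
// Æ, Ø, Å in Norwegian alphabet order, written as precomposed scalars.
let letters = ["\u{C6}", "\u{D8}", "\u{C5}"]

for letter in letters {
    // Hex value of each Unicode scalar in the string.
    let scalars = letter.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
    // Hex value of each UTF-8 code unit (byte).
    let bytes = letter.utf8.map { String($0, radix: 16, uppercase: true) }
    print(letter, "scalars:", scalars, "UTF-8 bytes:", bytes)
}

// The default, locale-independent ordering used by < and sorted().
let sortedLetters = letters.sorted()
print(sortedLetters) // ["Å", "Æ", "Ø"]
```

The sorted result matches the order of the underlying values (C5, C6, D8), which is consistent with the default comparison following the Unicode encoding rather than any particular alphabet.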

harlanhaskins · August 19, 2021, 9:46pm

The standard < operators are not locale-dependent, as @Jumhyn pointed out. If you're looking to lexicographically compare strings that are presented to the user in a locale-sensitive way, Foundation defines localizedStandardCompare that you can use instead:

https://developer.apple.com/documentation/foundation/nsstring/1409742-localizedstandardcompare
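
A hedged sketch of what locale-aware comparison could look like. localizedStandardCompare uses the current locale, so to make the Norwegian intent explicit this example also uses Foundation's compare(_:options:range:locale:) with a "nb_NO" locale; the variable names are illustrative, and the actual result depends on the collation data available on the platform, so no output is shown:

```swift
import Foundation

// Assumption: "nb_NO" (Norwegian Bokmål) collation data is available.
let norwegian = Locale(identifier: "nb_NO")
let mixedLetters = ["Å", "Ø", "Z", "Æ", "A"]

// Sort using locale-aware collation instead of the default < operator.
let localeSorted = mixedLetters.sorted {
    $0.compare($1, options: [], range: nil, locale: norwegian) == .orderedAscending
}
print(localeSorted)

// localizedStandardCompare follows the user's current locale instead,
// so its result can differ from machine to machine.
let result = "Æ".localizedStandardCompare("Å")
print(result == .orderedAscending)
```

On a system with Norwegian collation rules, the sorted array would be expected to place Æ, Ø and Å after Z, which is the ordering the original question was looking for.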