Character Comparison Doesn't Follow Unicode Scalar Value?

I stumbled on something odd when comparing two characters, "ὰ" (\u{1F70}) and "ώ" (\u{1F7D}). I expected "ὰ" < "ώ", but Swift thinks the opposite. Why is this the case?

let a: Character = "\u{1F70}"  // ὰ
let b: Character = "\u{1F7D}"  // ώ

print(a)
print(b)

print(a < b)  // false

let aScalar = a.unicodeScalars.first!.value
let bScalar = b.unicodeScalars.first!.value
print(aScalar < bScalar) // true

The output is

ὰ
ώ
false
true

Characters and strings follow the rules of the Unicode Collation Algorithm. Among other things, the actual result is locale-dependent; do you get the same results if you set your locale to Greek?


No, Swift standard library operations are locale-independent. Locale-aware operations are vended by Foundation.
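To make the distinction concrete, here is a minimal sketch that puts the standard library's `<` next to a locale-aware comparison from Foundation. The helper name `localizedOrder` and the locale identifier "el" are my own choices for illustration, not anything from the posts above, and I'm deliberately not stating what the Greek locale returns.

```swift
import Foundation

// Hypothetical helper (not a Foundation API): compares two Characters
// under the locale named by `localeID`, using Foundation's
// compare(_:options:range:locale:).
func localizedOrder(_ lhs: Character, _ rhs: Character, localeID: String) -> ComparisonResult {
    let locale = Locale(identifier: localeID)
    return String(lhs).compare(String(rhs), options: [], range: nil, locale: locale)
}

let a: Character = "\u{1F70}"  // ὰ
let b: Character = "\u{1F7D}"  // ώ

// Standard library: the result does not depend on any locale setting.
print(a < b)

// Foundation: the result is computed under the requested locale.
switch localizedOrder(a, b, localeID: "el") {
case .orderedAscending:  print("ascending")
case .orderedSame:       print("same")
case .orderedDescending: print("descending")
}
```

Changing the system locale should not affect the first `print`; only the Foundation call takes a locale into account.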


Thanks! I thought Character was treated differently, but it looks like it's just a wrapper around String. In my case I'm only interested in comparing Characters by their Unicode scalar values.

Thanks for the clarification!
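If ordering by raw scalar values is really what is wanted, one possible sketch is to compare the scalar sequences lexicographically instead of looking only at the first scalar. The method name `precedesByScalarValue` is made up for this example.

```swift
extension Character {
    // Hypothetical helper: orders two Characters by their Unicode scalar
    // values, scalar by scalar, with no normalization applied.
    func precedesByScalarValue(_ other: Character) -> Bool {
        return unicodeScalars.lexicographicallyPrecedes(other.unicodeScalars)
    }
}

let a: Character = "\u{1F70}"  // ὰ
let b: Character = "\u{1F7D}"  // ώ

print(a.precedesByScalarValue(b))  // true, since 0x1F70 < 0x1F7D
print(b.precedesByScalarValue(a))  // false

// Sorting with the same rule:
let sorted = [b, a].sorted { $0.precedesByScalarValue($1) }
print(sorted)
```

One caveat with this approach: it is not consistent with `==`. Two canonically equivalent Characters (for example a precomposed and a decomposed "é") compare equal with `==` but are ordered differently here.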

Ah, then UTS#10 linked to above by @ksluder has an important sentence for you to read:

The basic principle to remember is: The position of characters in the Unicode code charts does not specify their sort order.

I guess it must be a common enough misconception that they saw fit to write this out in bold and italics.


I think Swift also doesn't guarantee the sort order of Characters.
In fact, Swift changed its strategy for comparing Characters in Swift 4.2.

let character_u = Character("u") // U+0075
let character_v = Character("v") // U+0076
let character_uDiaeresisMacron = Character("\u{01D6}") // ǖ
print(character_u < character_uDiaeresisMacron) // Prints "true"
print(character_v < character_uDiaeresisMacron) // Prints "true" in Swift >= 4.2, otherwise "false".
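
One way to make sense of these results, assuming current Swift orders Characters by their canonically normalized (NFC) scalars (that is my understanding, but treat it as an assumption), is to look at what each character normalizes to. The helper `nfcScalars` below is hypothetical and uses Foundation's `precomposedStringWithCanonicalMapping` to do the normalization.

```swift
import Foundation

// Hypothetical helper: the scalars of a Character after NFC normalization,
// formatted as "U+XXXX" strings.
func nfcScalars(_ c: Character) -> [String] {
    return String(c).precomposedStringWithCanonicalMapping.unicodeScalars
        .map { String(format: "U+%04X", $0.value) }
}

print(nfcScalars("\u{1F70}"))  // ["U+1F70"]  ὰ keeps its code point
print(nfcScalars("\u{1F7D}"))  // ["U+03CE"]  ώ (oxia) maps to omega with tonos
print(nfcScalars("\u{01D6}"))  // ["U+01D6"]  ǖ stays precomposed
```

Under that assumption, "ώ" is compared as U+03CE, which is smaller than U+1F70, and that would explain why "ὰ" < "ώ" is false even though 0x1F70 < 0x1F7D. Likewise "v" (U+0076) sorts before "ǖ" (U+01D6).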

There is an intended sort order (of sorts). The change in Swift 4.2 would have been a bugfix:


In addition to the other answer: note that a character can consist of several Unicode scalars.


Yeah, I bumped into this a few minutes after posting. let c: Character = "\r\n" is treated as a single Character.
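
A small sketch of the multi-scalar point, using only standard library calls. It also shows why `unicodeScalars.first!` in the original question looks at just part of a Character in general.

```swift
let crlf: Character = "\r\n"
print(crlf.unicodeScalars.count)         // 2 (U+000D, U+000A)

let decomposed: Character = "e\u{301}"   // "e" + combining acute accent
let precomposed: Character = "\u{E9}"    // é as a single scalar
print(decomposed.unicodeScalars.count)   // 2
print(precomposed.unicodeScalars.count)  // 1

// Canonically equivalent Characters are equal, even though
// their scalar sequences differ.
print(decomposed == precomposed)         // true
```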