Is this a bug in Swift Stdlib Character isUppercase()?

SDGGiesbrecht · March 7, 2020, 8:58pm

letter l with combining diacritics from the test was considered lowercase as the function latinL.isUppercase() returned false. This looks like a bug to me. Could someone confirm?

The first six diacritics aren’t letters and have no case, which is why they do not change anything about how L (or any of the other Latin letters there) is considered.

But the last “diacritic” is actually a letter, even though it is printed underneath and is part of the same “grapheme cluster” as far as Unicode is concerned when it is in sentence case:

sentence case	ᾅδης	03B1 0314 0301 0345 • 03B4 • 03B7 • 03C2
title case	Αἵδης	0391 • 03B9 0314 0301 • 03B4 • 03B7 • 03C2
uppercase “font”	ΑΙΔΗΣ	0391 • 0399 • 0394 • 0397 • 03A3

Logically, it means the test is invalid. It is (at the human level) equivalent to requiring the snake case of myGreatURLiRequest to be my_great_urli_request instead of my_great_ur_li_request.
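For anyone who wants to check the difference between the caseless diacritics and the cased iota subscript directly, here is a minimal sketch using the standard library’s Unicode.Scalar.Properties. The helper hex and the variable names are mine, not from the test in question, and I picked the acute accent only as a representative of the caseless diacritics.

```swift
// Helper (hypothetical name): format a scalar in the usual U+XXXX notation.
func hex(_ scalar: Unicode.Scalar) -> String {
    let digits = String(scalar.value, radix: 16, uppercase: true)
    return "U+" + String(repeating: "0", count: max(0, 4 - digits.count)) + digits
}

let acute: Unicode.Scalar = "\u{0301}"          // COMBINING ACUTE ACCENT
let iotaSubscript: Unicode.Scalar = "\u{0345}"  // COMBINING GREEK YPOGEGRAMMENI

// Print whether each scalar is cased and what it uppercases to.
for scalar in [acute, iotaSubscript] {
    let properties = scalar.properties
    let mapped = properties.uppercaseMapping.unicodeScalars
        .map(hex)
        .joined(separator: " ")
    print(hex(scalar), "isCased:", properties.isCased, "uppercases to:", mapped)
}

// The same question at the Character level, with a Latin base letter.
let plain: Character = "L\u{0301}"             // L + acute accent
let withIota: Character = "L\u{0301}\u{0345}"  // L + acute accent + iota subscript
print("L + acute isUppercase:", plain.isUppercase)
print("L + acute + iota subscript isUppercase:", withIota.isUppercase) // reported as false above
```

The acute accent has no case and maps to itself, while the iota subscript is cased and uppercases to a full capital iota (U+0399). That mapping is why a cluster containing it stops comparing equal to its own uppercased form.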

However, as you can see from the chart, Unicode’s encoding of the letter is a huge mess for legacy reasons. I doubt a machine could do the “right” thing here no matter how hard you tried.

It would probably be wisest to pull that last scalar off of the test. But I don’t know if it is worth also trying to “fix” the implementation to match real human expectations (which your own re‐implementation happens to be closer to). We’re dealing with a letter that was officially abolished in 1982 and sees no use in any living language. No other Unicode characters work like this one either. Archeologists may use it in their papers, but the chances of it being used in a JSON key are basically 0 even in Greece.

then it looks like there may be a discrepancy between Character's isUppercase() and CharacterSet.uppercaseLetters

These are not the same thing. To avoid repeating myself, please see here: