> letter l with combining diacritics from the test was considered lowercase as the function `latinL.isUppercase()` returned false. This looks like a bug to me. Could someone confirm?
The first six diacritics aren’t letters and have no case, which is why they do not change anything about how L (or any of the other Latin letters there) is considered.
But the last “diacritic” (U+0345, the iota subscript) is actually a letter, even though it is printed underneath and, in sentence case, is part of the same “grapheme cluster” as far as Unicode is concerned:
| Form | Text | Scalars (hex; • separates grapheme clusters) |
|---|---|---|
| sentence case | ᾅδης | 03B1 0314 0301 0345 • 03B4 • 03B7 • 03C2 |
| title case | Αἵδης | 0391 • 03B9 0314 0301 • 03B4 • 03B7 • 03C2 |
| uppercase “font” | ΑΙΔΗΣ | 0391 • 0399 • 0394 • 0397 • 03A3 |
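A minimal sketch using only the Swift standard library shows the mechanism. The three caseless marks (U+0301, U+0300, U+0308) are stand-ins I picked for illustration; the test's actual six diacritics may differ. As far as I can tell from the standard library source, `Character.isUppercase` for a multi-scalar cluster checks whether uppercasing leaves the cluster unchanged, so a single lowercase scalar anywhere in the cluster is enough to make it return `false`.

```swift
// Sketch: why U+0345 changes Character.isUppercase.
// U+0301, U+0300 and U+0308 are assumed stand-ins for the test's
// caseless diacritics; the real test may use different marks.

let ypogegrammeni: Unicode.Scalar = "\u{0345}"  // COMBINING GREEK YPOGEGRAMMENI
let props = ypogegrammeni.properties

// It is a nonspacing mark, yet it is lowercase and has an uppercase mapping
// (U+0399 GREEK CAPITAL LETTER IOTA), i.e. it behaves like a letter for casing.
print(props.generalCategory)   // nonspacingMark
print(props.isLowercase)       // true
print(props.uppercaseMapping)  // Ι

// Caseless marks: uppercasing leaves the cluster unchanged, so it stays uppercase.
let plainMarks: Character = "L\u{0301}\u{0300}\u{0308}"
print(plainMarks.isUppercase)  // true

// With the iota subscript, uppercasing turns U+0345 into U+0399, so the
// cluster is no longer equal to its own uppercase form.
let withIota: Character = "L\u{0301}\u{0300}\u{0308}\u{0345}"
print(withIota.isUppercase)    // false

// The capital iota is a base letter, so the uppercased result splits
// into two grapheme clusters, like the 0391 • 0399 break in the ΑΙΔΗΣ row.
print(withIota.uppercased().count)  // 2
```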
Logically, this means the test is invalid. At the human level, it is equivalent to requiring the snake case of `myGreatURLiRequest` to be `my_great_urli_request` instead of `my_great_ur_li_request`.
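To make the analogy concrete, here is a sketch of a snake-case converter that splits words roughly the way `JSONEncoder`'s `.convertToSnakeCase` strategy is usually described: on a lower-to-upper transition, and before the last capital of an acronym that is followed by a lowercase letter. `snakeCased(_:)` is a hypothetical name, and this is a simplified stand-in, not Foundation's implementation.

```swift
/// Hypothetical, simplified camelCase -> snake_case converter (not Foundation's code).
/// Assumed splitting rule: a new word starts at an uppercase letter that
/// follows a lowercase one, or at the last capital of an uppercase run
/// when that capital is followed by a lowercase letter ("URLi" -> "UR" + "Li").
/// Leading/trailing underscores and other edge cases are ignored.
func snakeCased(_ key: String) -> String {
    let chars = Array(key)
    var words: [String] = []
    var current = ""
    for i in chars.indices {
        let c = chars[i]
        if c.isUppercase && !current.isEmpty {
            // current is non-empty, so i >= 1 and chars[i - 1] is valid.
            let previousIsLower = chars[i - 1].isLowercase
            let nextIsLower = i + 1 < chars.count && chars[i + 1].isLowercase
            if previousIsLower || nextIsLower {
                words.append(current)
                current = ""
            }
        }
        current.append(c)
    }
    if !current.isEmpty {
        words.append(current)
    }
    return words.map { $0.lowercased() }.joined(separator: "_")
}

print(snakeCased("myGreatURLiRequest"))  // my_great_ur_li_request
print(snakeCased("myURLProperty"))       // my_url_property
```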
However, as you can see from the chart, Unicode’s encoding of the letter is a huge mess for legacy reasons. I doubt a machine could do the “right” thing here no matter how hard you tried.
It would probably be wisest to pull that last scalar out of the test. But I don’t know if it is worth also trying to “fix” the implementation to match real human expectations (which your own re‐implementation happens to be closer to). We’re dealing with a letter that was officially abolished in 1982 and sees no use in any living language. No other Unicode characters work like this one either. Archeologists may use it in their papers, but the chances of it being used in a JSON key are essentially zero, even in Greece.
> then it looks like there may be a discrepancy between Character's `isUppercase()` and `CharacterSet.uppercaseLetters`
These are not the same thing. To avoid repeating myself, please see here: