Right - and this leads folks to try things like a.uppercased().lowercased() == b.uppercased().lowercased(). Which handles more cases correctly, but still not necessarily all. And it's just plain inefficient.
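A rough Foundation-based sketch of the difference (the example strings are purely illustrative): the round-trip builds temporary strings, whereas the comparison options fold case (and optionally diacritics) during the comparison itself.

```swift
import Foundation

let a = "Encyclopædia"
let b = "ENCYCLOPÆDIA"

// The round-trip trick: builds four temporary strings just to compare two.
let roundTrip = a.uppercased().lowercased() == b.uppercased().lowercased()

// Foundation can fold case (and, if you want, diacritics too) during the
// comparison itself, without allocating intermediates.
let caseOnly = a.compare(b, options: .caseInsensitive) == .orderedSame
let caseAndDiacritics = a.compare(
    b,
    options: [.caseInsensitive, .diacriticInsensitive],
    range: nil,
    locale: .current
) == .orderedSame
```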
Not off-hand, but I'd be amazed if there aren't some.
I don't know much about non-English locales (nor that specific example), which is why I cautioned in my blog post on this topic that insensitivity to these aspects might be wrong in some locales.
e.g. in Spanish it is wrong to treat "n" and "ñ" the same, because there are words which differ only in that respect but mean completely different things (and the difference can be very important, like coño vs cono - more examples).
And yet, lots of Spanish websites and applications do treat them equivalently in a lot of cases anyway, e.g. if you enter "nina" into spanishdict.com it will still match the actual word "niña".
Tip / side-rant
There's no such thing as "el nino" or "la nina" - it's "el niño" and "la niña", and they're pronounced differently because ñ is not an n, despite its visual similarities.
Similarly, it's a piña colada (pin-ya, not peen-a), because piña is pineapple in Spanish. "Pina" doesn't mean anything. The drink is literally named "strained pineapple" (strained in the kitchen sense, not the stressed sense).
Maybe that's just it doing auto-correct, but for a random Swift app without access to an auto-correct dictionary, being diacritic-insensitive isn't the worst approximation thereof (especially if you take pains to prioritise exact matches over inexact ones).
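As a sketch of what I mean by prioritising exact matches (the helper name and candidate list are made up for illustration): rank exact hits ahead of candidates that only match once case and diacritics are ignored.

```swift
import Foundation

// Hypothetical helper: exact hits sort ahead of candidates that only match
// once case and diacritics are folded away.
func rankedMatches(for query: String, in candidates: [String]) -> [String] {
    let exact = candidates.filter { $0 == query }
    let folded = candidates.filter { candidate in
        candidate != query
            && candidate.range(
                of: query,
                options: [.caseInsensitive, .diacriticInsensitive],
                range: nil,
                locale: .current
            ) != nil
    }
    return exact + folded
}

// "nina" still finds "niña", but a literal "nina" entry (if any) would rank first.
print(rankedMatches(for: "nina", in: ["niña", "Nina", "nine"]))
```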
It's hard to do context-unaware string handling - even just knowing a locale isn't enough to do it perfectly. e.g. in an English locale does "nina" mean:
- A name (that you mistakenly didn't capitalise, "Nina").
- A misspelling (e.g. "nine" or "nana").
- An attempt to use the Spanish word "niña".
- Something else?
Even a native speaker can have trouble deciding, even with a lot of context, and the decision matters because it determines whether that possibly-missing tilde is important or not.
Which, again, is why I think being insensitive to all these aspects of characters is actually a really good general default. At least for searching (better to have false positives than false negatives) and security (better to avoid duplicates which might confuse a human).
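In Foundation terms (just a sketch, but these APIs do exist), that default is roughly what the localizedStandard family and explicit folding give you:

```swift
import Foundation

let text = "Piña Colada"

// Case- and diacritic-insensitive, locale-aware search - the "forgiving"
// default argued for above.
let found = text.localizedStandardContains("pina")

// The same kind of folding done explicitly - e.g. to normalise keys so that
// visually-confusable near-duplicates collapse to one value.
let key = text.folding(
    options: [.caseInsensitive, .diacriticInsensitive, .widthInsensitive],
    locale: .current
)
```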
Of course there are some use-cases where that insensitivity is inappropriate (e.g. showing changes in a document, where even if the semantics don't change you probably still want to see glyph changes).