Will Swift String's understanding of Characters remain stable?

ole · April 10, 2024, 2:35pm

I don't think this is true without exception, or at least it wasn't true in the past. For example, Unicode 9.0 changed the grapheme cluster boundary rules for emoji flags.

The rule in Unicode 9.0 (still current in Unicode 15.1):

Do not break within emoji flag sequences. That is, do not break between regional indicator (RI) symbols if there is an odd number of RI characters before the break point.

The previous rule in Unicode 8.0 was:

Do not break between regional indicator symbols.

This changed the grapheme breaking rules for existing code points (the regional indicator symbols were introduced in Unicode 6.0).
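To make the difference between the two rules concrete, here is a minimal sketch that applies each one to a run of consecutive regional indicators. The names `FlagRule` and `clusterSizes(riCount:rule:)` are hypothetical and made up for this illustration. This is not how the standard library implements grapheme breaking, and it ignores every other boundary rule.

```swift
/// Hypothetical enum naming the two rule sets quoted above.
enum FlagRule {
    case unicode8   // "Do not break between regional indicator symbols."
    case unicode9   // Break only when an even number of RIs precedes the break point.
}

/// Returns the size (in code points) of each grapheme cluster for a run of
/// `riCount` consecutive regional indicator symbols under the given rule.
/// Illustration only: real segmentation has many more rules.
func clusterSizes(riCount: Int, rule: FlagRule) -> [Int] {
    guard riCount > 0 else { return [] }
    var sizes = [1] // the first RI always starts a new cluster
    for precedingCount in 1..<riCount {
        let mayBreak: Bool
        switch rule {
        case .unicode8:
            mayBreak = false
        case .unicode9:
            mayBreak = precedingCount % 2 == 0
        }
        if mayBreak {
            sizes.append(1)               // start a new cluster
        } else {
            sizes[sizes.count - 1] += 1   // extend the current cluster
        }
    }
    return sizes
}

print(clusterSizes(riCount: 4, rule: .unicode8)) // [4]
print(clusterSizes(riCount: 4, rule: .unicode9)) // [2, 2]
print(clusterSizes(riCount: 5, rule: .unicode9)) // [2, 2, 1]
```

Under the newer rule, an odd-length run leaves a single unpaired regional indicator as its own cluster at the end.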

For example, under Unicode 8.0 rules, the string in `let str = "🇦🇷🇯🇵"` would be treated as a single Character consisting of four code points. Since Unicode 9.0, it is treated as two Characters with two code points each.
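You can check this on your own toolchain with something like the snippet below. It assumes a Swift runtime whose grapheme breaking follows the Unicode 9.0 (or later) rules, which I believe is the case for current releases. The string is written with scalar escapes only so that the four code points are visible in the source.

```swift
// U+1F1E6 (A) + U+1F1F7 (R) form 🇦🇷, U+1F1EF (J) + U+1F1F5 (P) form 🇯🇵.
let str = "\u{1F1E6}\u{1F1F7}\u{1F1EF}\u{1F1F5}"

// The number of code points does not depend on the grapheme breaking rules.
print(str.unicodeScalars.count) // 4

// The number of Characters does. Under Unicode 9.0+ rules this is 2.
print(str.count)

// Code points per Character. Under Unicode 9.0+ rules this is [2, 2].
let scalarsPerCharacter = str.map { $0.unicodeScalars.count }
print(scalarsPerCharacter)
```

A runtime that still applied the Unicode 8.0 rule would report a single Character containing all four code points.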