“e” and “◌́” are each one grapheme (atomic unit of meaning), “é” is one integrated cluster of graphemes (roughly a group of graphemes which work together in a way that breaks the notion of sequential order). “Grapheme cluster” is the more technical term for what Swift calls Character
. Every Character
represents exactly 1 grapheme cluster.
[Character]
behaves as you seem to mean by “like an array of integers”. Both String
and [Character]
have elements that are Character
instances, but the two types behave in fundamentally different ways.
[Character]
keeps its elements separate from one another, effectively storing not just the text, but a set of arbitrary boundaries within it according to how it was constructed. It is capable of answering the question “How many characters did I insert?” in O(1), but it still requires O(n) to figure out “How many characters does the text I represent have now?” The tradeoff is that when loading a text file, it is O(n) to find the boundaries in the first place during initialization.
String
on the other hand stores its contents as an uninterrupted stream. That means it is completely unable to answer the question “How many characters did I insert?”. But instead, it is capable of loading a text file without even bothering to figure out how many characters there are. It is O(n) to answer the question “How many characters does the text I represent have now?”, but for many uses of String
, you never need to ask for that in the first place.
This illustrates the behavioural difference:
let characters: [Character] = ["ᄒ", "ᅡ", "ᆫ", "ᄀ", "ᅮ", "ᆨ", "ᄋ", "ᅥ"]
let alsoCharacters: [Character] = ["한", "국", "어"]
assert(String(characters) == String(alsoCharacters))
assert(characters.count == 8)
assert(String(characters).count == 3)
assert([Character](String(characters)).count == 3)
These escapes are only present in the source code. By the time they are compiled into a binary, they have already been replaced with exactly the same bytes as their direct forms. Escaped literals have nothing to do with runtime efficiency.
This is impossible. It takes time to decipher either:
- how many clusters are in the stream (
String
), or
- if the cluster delineation is still valid (
[Character]
)
You could create a type that audits and records the delineation after each mutation. It could help in situations where you mutate fewer times than you count. But the initialize‐then‐count process would still be O(n), and the efficiency of all mutating operations would suffer greatly.