Why does Swift use 21 bits per scalar, i.e. UTF-32 encoding, to represent a string?

A fixed-width encoding still doesn't let us reach an arbitrary character in a string in constant time:
UTF-32 does not make calculating the displayed width of a string easier, since even with a "fixed width" font there may be more than one code point per character position (combining characters) or more than one character position per code point ("grapheme clusters" for CJK ideographs).
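For example, a combining accent gives one displayed character built from two scalars, so character positions don't line up with code points no matter how wide the code unit is. A quick sketch:

```swift
// "é" written as "e" + U+0301 COMBINING ACUTE ACCENT:
// one displayed character, two Unicode scalars, three UTF-8 code units.
let combined = "e\u{0301}"
print(combined.count)                 // 1  (Characters, i.e. grapheme clusters)
print(combined.unicodeScalars.count)  // 2
print(combined.utf8.count)            // 3
```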

And it uses too much memory. So what's the advantage?

You meant 21 bits per scalar? What do you mean by "Swift uses"? Are you referring to String? String instances have a UTF-8 and UTF-16 representation.
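For instance, both code-unit views can be inspected directly on a String (a small sketch using the standard `utf8` and `utf16` views):

```swift
let s = "héllo"
// Two views over the same String contents, in different code units.
print(Array(s.utf8))   // UTF-8 code units:  [104, 195, 169, 108, 108, 111]
print(Array(s.utf16))  // UTF-16 code units: [104, 233, 108, 108, 111]
```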

21 bits per scalar

It doesn't. Strings are commonly stored as UTF-16 or UTF-8 (though currently UTF-8 is only used for ASCII contents). Either way, this is abstracted away from the user.
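Roughly, what you see from the outside is the scalar values themselves, each at most 21 bits wide and carried in a `UInt32`, regardless of how the String is encoded internally. A small sketch:

```swift
let s = "a😀"
for scalar in s.unicodeScalars {
    // Unicode.Scalar.value is a UInt32 holding a value of at most 21 bits,
    // independent of the String's in-memory encoding.
    print(scalar, String(scalar.value, radix: 16))  // a 61, 😀 1f600
}
```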
