Pitch: String Gaps and Missing APIs

Michael_Ilseman · March 13, 2019, 12:30am

You should not bake in knowledge of grapheme breaking statically in your code, as it is a run-time concept.

This is actually why the standard library must not state the supported version of Unicode in its documentation. The version of the standard library in the SDK that you build with and read the documentation for is not the same as the version that you will link with at run time. The supported version of Unicode is a run-time concept.
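As a rough sketch of what "run-time concept" means in practice: the scalar count of a string is fixed by its contents, but the `Character` count comes from whatever grapheme-breaking rules ship with the standard library you link against. The sample string here is an arbitrary choice for illustration, and no particular grapheme count is assumed.

```swift
// Sketch: ask the standard library for grapheme boundaries at run time
// instead of hard-coding an expected count.
// Sample string (chosen for illustration): man, ZWJ, woman, ZWJ, girl.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}"

// The number of Unicode scalars is determined by the string's contents alone.
let scalarCount = family.unicodeScalars.count

// The number of Characters depends on the grapheme-breaking rules of the
// standard library linked at run time, so compute it rather than assume it.
let graphemeCount = family.count

print("scalars: \(scalarCount), graphemes: \(graphemeCount)")
```

Code that needs the grapheme count should read it from `count` like this rather than store a number derived from one particular Unicode version.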

There has been some desire for something like Unicode.version or similar as a static variable, so that you can guard against it at run time if necessary.

These read the same to me. Take the example of .nonLossyASCII or Punycode, each of which is capable of encoding and decoding all of Unicode. What does available(in:) then mean, if we can encode and decode to and from Unicode scalar values? Do you mean that the value is trivially encoded, i.e. its integer value corresponds directly to a (truncation of) the Unicode scalar value?

A simpler example: what is the result of available(in:) for UTF-16 on a non-BMP scalar?
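To make the UTF-16 question concrete, here is a small sketch using only existing standard library calls (`String.utf16`, `UTF16.width`, `UTF16.isLeadSurrogate`). The scalar U+1F600 is an arbitrary non-BMP choice for illustration. It is encodable in UTF-16, but not trivially: its code units are a surrogate pair, not a truncation of the scalar value.

```swift
// Sketch: a non-BMP scalar round-trips through UTF-16, but its code units
// are not a truncation of its scalar value.
let scalar: Unicode.Scalar = "\u{1F600}"  // arbitrary non-BMP scalar

// The actual UTF-16 encoding: a lead and trail surrogate.
let units = Array(String(scalar).utf16)

// What a "trivial" encoding would give: the low 16 bits of the scalar value.
let truncated = UInt16(truncatingIfNeeded: scalar.value)

// Number of UTF-16 code units needed for this scalar.
let width = UTF16.width(scalar)

print(units.map { String($0, radix: 16) })  // ["d83d", "de00"]
print(String(truncated, radix: 16))         // "f600"
print(width)                                // 2
print(UTF16.isLeadSurrogate(units[0]))      // true
```

So the answer hinges on which meaning is intended: "can be encoded" is true here, "is trivially encoded" is false.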