Unicode Enthusiasts Unite
Hello all, I thought I’d share some thoughts on future API directions relating to Unicode.
The standard library provides lots of functionality through things like the Unicode namespace and Unicode Scalar Properties. String should continue to expand on this to deliver functionality to those unfortunate enthusiastic developers who interact with Unicody details on a frequent basis.
Normalized Scalar Views
As briefly discussed here, String.UnicodeScalarView should expose properties providing lazily-normalized views for NFC, NFD, NFKC, NFKD, and maybe FCC. We should also have a “Swift canonical form”, which is the unspecified form that’s used for efficient string comparisons, as well as ways to force a string into that normal form.
String’s implementation of comparison, in a fall-back path, performs a lazy normalization of a sliding window into NFC. Unfortunately, we pay some overhead in how we interact with ICU (e.g. transcoding). A native Swift implementation of normalization algorithms would greatly improve, and simplify, comparison’s implementation. This would also improve Swift’s portability by reducing one of the main ways the standard library currently depends on ICU.
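To make the motivation concrete, here is a small sketch using only existing standard library API. It shows that String comparison already honors canonical equivalence, while the scalar view exposes the un-normalized contents. The variable names are mine, chosen for illustration.

```swift
// "é" spelled two ways: one precomposed scalar vs. "e" plus a combining accent.
let precomposed = "\u{E9}"     // U+00E9 (what NFC would produce)
let decomposed = "e\u{301}"    // U+0065 U+0301 (what NFD would produce)

// String comparison is based on canonical equivalence, so these are equal.
let stringsEqual = precomposed == decomposed

// The scalar views are not normalized, so they differ.
let scalarsEqual = precomposed.unicodeScalars.elementsEqual(decomposed.unicodeScalars)

print(stringsEqual, scalarsEqual)
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)
```

A lazily-normalized view, as pitched above, would let code that works at the scalar level see the same sequence for both spellings without building a new String first.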
Emoji Analysis
Whether a Character is or is not an emoji is actually complex and environment-dependent. For this reason, emoji analysis was postponed from SE-0221: Character Properties. We should revisit this, watching newer Unicode versions and perhaps consulting with the Unicode Consortium to understand how we can carefully balance usability with source-stability.
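As a quick illustration of why this is subtle, the scalar-level properties that exist today already give answers that can surprise. The sketch below only uses `Unicode.Scalar.Properties.isEmoji` and `isEmojiPresentation`; the constant names are mine.

```swift
let digitOne: Unicode.Scalar = "1"
let grinningFace: Unicode.Scalar = "\u{1F600}"

// The ASCII digits carry the Emoji property because they can begin
// keycap sequences, even though they render as text by default.
let digitIsEmoji = digitOne.properties.isEmoji
let digitHasEmojiPresentation = digitOne.properties.isEmojiPresentation

// U+1F600 is an emoji and also renders as one by default.
let faceIsEmoji = grinningFace.properties.isEmoji
let faceHasEmojiPresentation = grinningFace.properties.isEmojiPresentation

print(digitIsEmoji, digitHasEmojiPresentation)
print(faceIsEmoji, faceHasEmojiPresentation)
```

A Character-level answer has to combine properties like these across all the scalars in a grapheme cluster, and what actually gets displayed still depends on the fonts and rendering environment.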
Bidirectional Properties
Unicode defines Bidirectional Class Values for scalars, and dictates how applications can use them to display text correctly. We can follow up on the work in SE-0211 to add these.
    extension Unicode {
      /// The bidirectional classification of a Unicode scalar.
      ///
      /// This classification is used for presenting directionality of text by the
      /// [Unicode Standard](https://www.unicode.org/reports/tr9/#Bidirectional_Character_Types)
      public enum BidirectionalClass {
        /// A strong left-to-right character.
        ///
        /// The value corresponds to the category `Left_To_Right` (abbreviated `L`) in the
        /// [Unicode Standard](https://unicode.org/reports/tr44/#Bidi_Class_Values).
        case leftToRight
        ...
      }
    }
    extension Unicode.Scalar {
      /// The bidirectional class of the scalar.
      ///
      /// This property corresponds to the "Bidi_Class" property in the
      /// [Unicode Standard](http://www.unicode.org/versions/latest/).
      public var bidiClass: BidirectionalClass { get }
    }
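To give a feel for how such a property might be consumed, here is a deliberately tiny stand-in. Everything in it is hypothetical: `SketchBidiClass` and `sketchBidiClass(of:)` are names I made up, only four of the Bidi_Class values are modeled, and only a handful of well-known ranges are covered. A real implementation would be driven by the Unicode Character Database rather than hard-coded ranges.

```swift
// Hypothetical stand-in for the pitched Unicode.BidirectionalClass.
enum SketchBidiClass {
    case leftToRight     // L
    case rightToLeft     // R
    case arabicLetter    // AL
    case europeanNumber  // EN
}

// Hypothetical lookup covering only a few ranges; returns nil for
// everything this sketch does not model.
func sketchBidiClass(of scalar: Unicode.Scalar) -> SketchBidiClass? {
    switch scalar.value {
    case 0x41...0x5A, 0x61...0x7A:  // ASCII letters
        return .leftToRight
    case 0x30...0x39:               // ASCII digits
        return .europeanNumber
    case 0x05D0...0x05EA:           // Hebrew letters alef through tav
        return .rightToLeft
    case 0x0627...0x064A:           // Arabic letters alef through yeh
        return .arabicLetter
    default:
        return nil
    }
}

// Example: does a string contain any strongly right-to-left scalar?
func containsStrongRTL(_ text: String) -> Bool {
    return text.unicodeScalars.contains { scalar in
        switch sketchBidiClass(of: scalar) {
        case .rightToLeft?, .arabicLetter?: return true
        default: return false
        }
    }
}
```

With the real property, `containsStrongRTL` would switch over `scalar.bidiClass` instead of the toy lookup.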
Current version
We should expose a way to get the run-time version of Unicode that’s available.
    extension Unicode {
      static var currentVersion: Unicode.Version { get }
    }
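For context, the closest existing hook I know of is `Unicode.Scalar.Properties.age`, which reports the Unicode version in which a scalar was assigned, or nil if the run-time data considers it unassigned. `Unicode.Version` is currently a `(major: Int, minor: Int)` tuple. The helper below is a hypothetical name of mine, sketching the kind of check a run-time version would make simpler.

```swift
// Hypothetical helper: was `scalar` assigned in `version` or earlier,
// according to the Unicode data available at run time?
func wasIntroduced(_ scalar: Unicode.Scalar, by version: Unicode.Version) -> Bool {
    // `age` is nil for scalars the run-time Unicode data treats as unassigned.
    guard let age = scalar.properties.age else { return false }
    return (age.major, age.minor) <= (version.major, version.minor)
}

// "a" dates back to Unicode 1.1.
print(wasIntroduced("a", by: (major: 1, minor: 1)))
```

Probing individual scalars like this only gives a lower bound on the run-time version, which is why a direct query would be useful.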
SIMD-accelerated decoding and analysis
Pending more SIMD feature work (CC @scanon), the standard library should expose more of its internal decoding and analysis functionality.
This includes things such as classifying a code unit, accelerating length calculations between UTF-8 and UTF-16, transcoding, etc.
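As a rough scalar (non-SIMD) sketch of two of these operations: classifying a UTF-8 code unit, and computing the UTF-16 length of valid UTF-8 without transcoding. The names `UTF8UnitKind`, `classify(_:)`, and `utf16Length(ofValidUTF8:)` are hypothetical and are not the standard library's internal API. The length routine assumes its input is already valid UTF-8.

```swift
// Hypothetical classification of a single UTF-8 code unit.
enum UTF8UnitKind {
    case ascii, continuation, leader2, leader3, leader4, invalid
}

func classify(_ unit: UInt8) -> UTF8UnitKind {
    switch unit {
    case 0x00...0x7F: return .ascii
    case 0x80...0xBF: return .continuation
    case 0xC2...0xDF: return .leader2
    case 0xE0...0xEF: return .leader3
    case 0xF0...0xF4: return .leader4
    default:          return .invalid  // 0xC0, 0xC1, 0xF5...0xFF never occur in valid UTF-8
    }
}

// Each scalar encoded in 1 to 3 bytes becomes one UTF-16 code unit;
// each 4-byte scalar becomes a surrogate pair. Continuation bytes add nothing.
func utf16Length<S: Sequence>(ofValidUTF8 bytes: S) -> Int where S.Element == UInt8 {
    var count = 0
    for unit in bytes {
        switch classify(unit) {
        case .ascii, .leader2, .leader3: count += 1
        case .leader4:                   count += 2
        case .continuation, .invalid:    break
        }
    }
    return count
}

let sample = "a\u{E9}\u{4E2D}\u{1F600}"
print(utf16Length(ofValidUTF8: sample.utf8), sample.utf16.count)
```

Because each byte is classified independently of its neighbors, this is the kind of loop that maps naturally onto vector compares and population counts.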
Out-of-scope for now: locale
Locale is currently considered out of scope for the standard library. Even simple operations such as determining the current user’s locale can depend on higher-level layers of the OS. Baking in any notion of locale at this stage could limit Swift’s applicability to systems programming.
Conventions for dealing with localized content are best left to the platform (e.g. Cocoa).