Adding Unicode properties to UnicodeScalar/Character

Michael_Ilseman · January 30, 2018, 5:31pm

Thank you for kicking this off! https://github.com/allevato/icu-swift seems like a great testing ground for this.

This is a great way to approach the task. We may want to expose all the raw information for sophisticated use cases, and selectively bless some queries for common use.

My vote is for isLowercase, etc., to be present directly for common use and semantically equivalent to something built on top of a more general facility, much like a subset of the queries icu-swift provides.

I feel like this will end up being a case-by-case (all puns intended) tiny research project.

Case is pretty tricky. Unicode defines at least 3 different levels of thinking about case, because of course it does.

Old-fashioned notions of case (such as in Java or CharacterSet) are based on general category, but that proved not to be very future-proof and is skewed towards bicameral alphabets. The second level of case comes from derived properties, which is likely what we'll want for UnicodeScalar at least. Relevant (trimmed) quote from the State of String thread:

Finally, the third level is the String functions such as isLowercase and isUppercase. I don't know if this level is overkill on Character, but off-the-cuff it seems viable. There might be some decent fast-paths in the implementation we can use for common scenarios.

AFAICT, Unicode's isLowercase would return true for caseless graphemes, such as "7". I don't know what behavior we want to expose; perhaps a grapheme has to satisfy both Unicode's isCased and isLowercase for our Swift computed property isLowercase to return true.

Or we expose both and let the user sort it out. Really depends on the use.
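To make the two options concrete, here is a rough sketch using only String.lowercased() and String.uppercased() from the standard library. The helper names (unicodeIsLowercase, approximateIsCased, strictIsLowercase) are hypothetical and only for illustration, not a proposed API. The isCased check is an approximation: it ignores titlecase, which the Unicode definition also considers.

```swift
// Hypothetical helpers for illustration only; not a proposed API.

// String-level notion of lowercase: lowercasing leaves the grapheme unchanged.
// Swift's String == compares by canonical equivalence, so no explicit
// normalization step is done here.
func unicodeIsLowercase(_ c: Character) -> Bool {
    let s = String(c)
    return s == s.lowercased()
}

// Approximation of isCased: some case mapping changes the grapheme.
// Assumption: titlecase is ignored, since only lower/upper mappings are used.
func approximateIsCased(_ c: Character) -> Bool {
    let s = String(c)
    return s != s.lowercased() || s != s.uppercased()
}

// The stricter option: the grapheme must be cased AND already lowercase.
func strictIsLowercase(_ c: Character) -> Bool {
    return approximateIsCased(c) && unicodeIsLowercase(c)
}

let samples: [Character] = ["a", "A", "7"]
for c in samples {
    print(c, unicodeIsLowercase(c), approximateIsCased(c), strictIsLowercase(c))
}
```

With these definitions "7" counts as lowercase under the first check but not under the strict one, which is exactly the choice we'd have to make (or punt to the user by exposing both).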