Adding Unicode properties to UnicodeScalar/Character

Ah yes, you're right. I was getting wrapped up in the various definitions. In that case, it looks like the derived "Lowercase" and "Uppercase" properties would directly give us what we want for scalars and single-scalar characters.

AFAIK, this is where the Default Case Detection rules in section 3.13 of the Unicode Standard come in. So to try to restate everything:

  1. For Unicode.Scalars, and for Characters consisting of a single Unicode.Scalar or of multiple scalars that are canonically equivalent to a single scalar, isLowercase equals the value of that single scalar's derived Lowercase property.

  2. For Characters consisting of multiple scalars that are not canonically equivalent to a single scalar, isLowercase is true if and only if C == toLowercase(C) && isCased(C).

Case #1 is really a subset of case #2, but it presents an optimization opportunity for single scalars, where we don't have to compute a temporary mapping and test equality. Overall, this behavior is consistent with what's described by 3.13 and produces the correct results both for something like "a + several combining accents" (where isCased is true, so the result stays true) and for emoji sequences (where isCased is false, so the whole cluster is reported as false).
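
To make the two cases concrete, here is a rough sketch of how they might look in code. The names sketchIsCased and sketchIsLowercase are made up for illustration and are not proposed API. It assumes the derived property is reachable as Unicode.Scalar.Properties.isLowercase, and it only takes the fast path for Characters that are literally one scalar, so sequences that are merely canonically equivalent to one scalar fall through to the general rule.

```swift
// Hypothetical helpers for illustration only; not proposed API.

// isCased(C) in the 3.13 sense: C changes under at least one case mapping.
// (Titlecase is left out of this sketch for brevity.)
func sketchIsCased(_ c: Character) -> Bool {
    let s = String(c)
    return s != s.lowercased() || s != s.uppercased()
}

func sketchIsLowercase(_ c: Character) -> Bool {
    let scalars = c.unicodeScalars

    // Case 1: a single scalar, so read the derived Lowercase property
    // directly and skip building a temporary mapped string.
    if scalars.count == 1, let scalar = scalars.first {
        return scalar.properties.isLowercase
    }

    // Case 2: C == toLowercase(C) && isCased(C).
    // String == compares by canonical equivalence.
    let s = String(c)
    return s == s.lowercased() && sketchIsCased(c)
}

// Sample inputs: "a" plus two combining accents, and a flag emoji
// built from two regional indicator scalars.
let accented = Character("a\u{0301}\u{0308}")
let flag = Character("\u{1F1FA}\u{1F1F8}")
```

With these definitions, accented goes through case 2 and comes out lowercase because uppercasing changes it, while flag is unchanged by both mappings, so it is not cased and the result is false.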

How does that sound?
