Another thought: We discussed earlier that CharacterSet is inadequate because its definitions of lowercaseCharacters and uppercaseCharacters are based on general categories rather than derived properties.
But as shown above, there are still scalars (like feminine/masculine ordinals ª/º) where the property value is inconsistent with the result of the case detection function.
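To make the ª/º mismatch concrete, here is a minimal sketch using the `Unicode.Scalar.Properties` API from the standard library. One assumption to flag loudly: the "case detection function" is not spelled out in this post, so I use a general-category check (`generalCategory == .lowercaseLetter`) as a stand-in for it. The helper name `lowercaseViews(of:)` is made up for illustration.

```swift
// Hypothetical helper: returns both "views" of lowercaseness for one scalar.
// `property` is the Lowercase derived property; `category` is a
// general-category check, used here as an ASSUMED stand-in for the
// case detection function discussed in the thread.
func lowercaseViews(of s: Unicode.Scalar) -> (property: Bool, category: Bool) {
    let property = s.properties.isLowercase
    let category = s.properties.generalCategory == .lowercaseLetter
    return (property, category)
}

// "a" and "A" are controls; U+00AA and U+00BA are the ordinal indicators.
let samples: [Unicode.Scalar] = ["a", "A", "\u{00AA}", "\u{00BA}"]

for s in samples {
    let views = lowercaseViews(of: s)
    let hex = String(s.value, radix: 16, uppercase: true)
    // Flag any scalar where the two definitions disagree.
    let note = views.property == views.category ? "consistent" : "MISMATCH"
    print("U+\(hex) property=\(views.property) category=\(views.category) \(note)")
}
```

Any scalar reported as a mismatch is one where a set built from the property and a set built from the detection function would disagree about membership.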
If, in the future, we want a Unicode.ScalarSet type that works as one would expect, I think users would expect the following to be true:
∀ (s ∈ Unicode.ScalarSet.lowercaseScalars) s.isLowercase == true
∀ (s ∉ Unicode.ScalarSet.lowercaseScalars) s.isLowercase == false
...which means we cannot implement that set in terms of the Lowercase Unicode property alone. Likely, we would need two APIs, to match the proposed pair of APIs in the previous post:
- Unicode.ScalarSet.lowercaseScalars is defined as the set of scalars for which s.isLowercase == true
- Unicode.ScalarSet(havingProperty: .lowercase) is defined as the set of scalars for which s.hasProperty(.lowercase) == true
The second one can be built directly on top of ICU uset_* APIs. The harder question is how we implement the first in a way that's both efficient and safe with respect to future changes to the Unicode data.
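For reference, here is a deliberately naive sketch of the two sets described above, built by brute-force enumeration of every scalar value. Everything named here is hypothetical: `isLowercaseSketch` stands in for the proposed `s.isLowercase` (I assume a general-category check, which may not be the final definition), and a plain `Set<Unicode.Scalar>` stands in for `Unicode.ScalarSet`. It does nothing for the efficiency question; it only pins down the semantics that a real implementation would have to preserve.

```swift
// ASSUMPTION: placeholder for the proposed `s.isLowercase`.
// The real definition is still an open question in this thread.
func isLowercaseSketch(_ s: Unicode.Scalar) -> Bool {
    return s.properties.generalCategory == .lowercaseLetter
}

// Naive construction: walk the whole code space and keep matching scalars.
func scalars(where predicate: (Unicode.Scalar) -> Bool) -> Set<Unicode.Scalar> {
    var result = Set<Unicode.Scalar>()
    for value in UInt32(0)...0x10FFFF {
        // The failable initializer returns nil for surrogate code points.
        if let s = Unicode.Scalar(value), predicate(s) {
            result.insert(s)
        }
    }
    return result
}

// Stand-in for Unicode.ScalarSet.lowercaseScalars.
let lowercaseScalars = scalars(where: isLowercaseSketch)

// Stand-in for Unicode.ScalarSet(havingProperty: .lowercase).
let havingLowercaseProperty = scalars(where: { $0.properties.isLowercase })

// Scalars that are in one set but not the other.
let disagreement = lowercaseScalars.symmetricDifference(havingLowercaseProperty)
print("scalars where the two sets differ:", disagreement.count)
```

Because the first set is derived from the predicate itself, the invariant holds by construction and stays correct when the Unicode data changes. The cost is a full scan, so a real implementation would presumably want a precomputed or lazily cached representation instead.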