Adding Unicode properties to UnicodeScalar/Character

I've started making some progress on an implementation of this. So far it's pretty straightforward: I've taken the non-deprecated binary properties defined by the Unicode Standard (not yet any that are ICU-only additions) and implemented and documented them. Thanks for the documentation tips! This is just a start, and I'll flesh out the documentation further; there's still a lot of work to do there. (For example, as you mentioned, we should clarify how properties like casing and whitespace work.)
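To make the shape of this concrete, here is a minimal sketch. It is not the code in the branch. A hypothetical `ScalarBinaryProperties` wrapper exposes two binary properties as `Bool`s. A real implementation would presumably query ICU (for example through `u_hasBinaryProperty`). To keep the example self-contained, it uses small hand-copied range tables for `White_Space` and `ASCII_Hex_Digit` instead.

```swift
// Hypothetical sketch: a wrapper that exposes Unicode binary properties as Bools.
// A real implementation would ask ICU; here small hand-copied range tables
// stand in so the example runs without ICU.
struct ScalarBinaryProperties {
    let scalar: Unicode.Scalar

    init(_ scalar: Unicode.Scalar) {
        self.scalar = scalar
    }

    // Code point ranges carrying the White_Space property (per PropList.txt).
    private static let whiteSpaceRanges: [ClosedRange<UInt32>] = [
        0x0009...0x000D, 0x0020...0x0020, 0x0085...0x0085, 0x00A0...0x00A0,
        0x1680...0x1680, 0x2000...0x200A, 0x2028...0x2029, 0x202F...0x202F,
        0x205F...0x205F, 0x3000...0x3000,
    ]

    // Code point ranges carrying the ASCII_Hex_Digit property: 0-9, A-F, a-f.
    private static let asciiHexDigitRanges: [ClosedRange<UInt32>] = [
        0x0030...0x0039, 0x0041...0x0046, 0x0061...0x0066,
    ]

    /// True if the scalar has the Unicode `White_Space` binary property.
    var isWhiteSpace: Bool {
        return ScalarBinaryProperties.whiteSpaceRanges.contains { $0.contains(scalar.value) }
    }

    /// True if the scalar has the Unicode `ASCII_Hex_Digit` binary property.
    var isASCIIHexDigit: Bool {
        return ScalarBinaryProperties.asciiHexDigitRanges.contains { $0.contains(scalar.value) }
    }
}

// Usage: U+3000 IDEOGRAPHIC SPACE is whitespace; "g" is not a hex digit.
print(ScalarBinaryProperties("\u{3000}").isWhiteSpace)
print(ScalarBinaryProperties("g").isASCIIHexDigit)
```

Each property is just a `Bool`, so documenting it is mostly a matter of naming the Unicode property it mirrors.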

The work so far is pushed in this branch: https://github.com/allevato/swift/compare/master...unicode-properties

I plan to keep chipping away at this in the near term as my time allows, adding more of the common properties we discussed above.

One question I'd like some input on: when I start adding enum-typed properties, some of those enums will be quite large. The easiest thing to do would be to make them RawRepresentable with the underlying ICU-defined integer values as the raw values, but that's a leaky abstraction and I assume we don't want that (what I wouldn't give for internal conformance right now!).
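To illustrate what "leaky" means here, below is a hypothetical `EastAsianWidth` enum done the easy way. East Asian Width is used only as a small stand-in for the large enums in question. The integer values are assumed to mirror ICU's `UEastAsianWidth` constants in `uchar.h`. Because the enum is `RawRepresentable`, both `rawValue` and `init(rawValue:)` become public API, so clients can start depending on ICU's numbering.

```swift
// Hypothetical "easy" approach: ICU's integer codes become the public raw values.
// The numbers are assumed to match UEastAsianWidth in ICU's uchar.h.
public enum EastAsianWidth: Int32 {
    case neutral = 0    // U_EA_NEUTRAL
    case ambiguous = 1  // U_EA_AMBIGUOUS
    case halfwidth = 2  // U_EA_HALFWIDTH
    case fullwidth = 3  // U_EA_FULLWIDTH
    case narrow = 4     // U_EA_NARROW
    case wide = 5       // U_EA_WIDE
}

// Any client can now write code like this, coupling itself to ICU's numbering:
let fromCode = EastAsianWidth(rawValue: 5)
let code = EastAsianWidth.fullwidth.rawValue
print(fromCode as Any, code)
```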

So right now I'm planning to write those enums by hand, each with an internal init containing a large switch that converts from the ICU raw value. I could use GYB to simplify this a bit, but that would add yet another place where we depend on GYB, which isn't great. Any strong opinions either way?
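A minimal sketch of the hand-written alternative follows, again using a hypothetical `EastAsianWidth` enum as a stand-in. The public enum has no raw values. The conversion from ICU's integer code lives in an `init` declared in an extension without an access modifier, which makes it internal by default. The numeric cases are assumed to match `UEastAsianWidth` in ICU's `uchar.h` and would need checking against the actual header.

```swift
// Hypothetical sketch: public cases only, no raw values, so ICU's integer
// codes never leak into the public API.
public enum EastAsianWidth {
    case neutral, ambiguous, halfwidth, fullwidth, narrow, wide
}

extension EastAsianWidth {
    // Internal (default access) conversion from ICU's UEastAsianWidth code.
    // Values are assumed to match uchar.h; verify before relying on them.
    init(rawICUValue: Int32) {
        switch rawICUValue {
        case 0: self = .neutral    // U_EA_NEUTRAL
        case 1: self = .ambiguous  // U_EA_AMBIGUOUS
        case 2: self = .halfwidth  // U_EA_HALFWIDTH
        case 3: self = .fullwidth  // U_EA_FULLWIDTH
        case 4: self = .narrow     // U_EA_NARROW
        case 5: self = .wide       // U_EA_WIDE
        default:
            // Trap on codes this version doesn't know; a failable init is an alternative.
            fatalError("Unexpected UEastAsianWidth value \(rawICUValue)")
        }
    }
}

print(EastAsianWidth(rawICUValue: 5))
```

The trade-off is visible even at this size. The switch is mechanical, which is what makes GYB tempting for the larger enums, but nothing about ICU's numbering escapes the module.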
