String.CodeUnits view

An earlier discussion about the String redesign included a .codeUnits view, which exposed the String's contiguous code units.

This came up about a year ago during the discussion of Index conversions ([Revised and review extended] SE-0180 - String Index Overhaul - #8 by dabrahams), but it never made it into a concrete proposal. I'm wondering whether that's still the plan.

My use case for this API is that I'm passing the string to a C library which returns offsets of interesting elements, and I want to convert those offsets back to String.Index values. There doesn't seem to be a convenient way to do that today: the only contiguous representation you get is utf8CString. This has several deficiencies: it doesn't provide a length (the library in question requires start + end pointers), it introduces unnecessary transcoding (the library could read from the native UTF-16 buffer), and it makes index conversions more expensive (back from the UTF8 view to the String's native view).
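To make the workflow concrete, here is a rough sketch of the round trip as it can be written without a .codeUnits view. `findColon` is a hypothetical stand-in for the C library (it is not a real API), and `indexOfColon(in:)` is just an illustrative helper name. The sketch copies the UTF-8 code units into an array first, which is exactly the extra copy and transcoding I'd like to avoid:

```swift
// Hypothetical stand-in for the C library: scans the bytes in [start, end)
// and returns the byte offset of the first ':' or -1 if there is none.
func findColon(_ start: UnsafePointer<UInt8>, _ end: UnsafePointer<UInt8>) -> Int {
    var p = start
    while p < end {
        if p.pointee == UInt8(ascii: ":") { return p - start }
        p += 1
    }
    return -1
}

// Illustrative helper (assumed name): passes the string's UTF-8 bytes to the
// "library" and converts the returned byte offset back to a String.Index.
func indexOfColon(in s: String) -> String.Index? {
    // Copy the UTF-8 code units into contiguous storage so that start and
    // end pointers are available.
    let bytes = Array(s.utf8)
    let offset = bytes.withUnsafeBufferPointer { buf -> Int in
        // baseAddress can be nil for an empty buffer.
        guard let base = buf.baseAddress else { return -1 }
        return findColon(base, base + buf.count)
    }
    guard offset >= 0 else { return nil }
    // Walk the UTF8 view to turn the byte offset into an index.
    return s.utf8.index(s.utf8.startIndex, offsetBy: offset)
}
```

For example, `indexOfColon(in: "clé:valeur")` gives an index `i` for which `s[..<i]` is "clé", even though the byte offset (4) differs from the character offset (3).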

CC: @dabrahams, @Ben_Cohen


Yes, this is still the plan for strings with contiguous backing storage (in contrast to Strings lazily bridged from arbitrary subclasses of NSString, which may not be contiguous). This is currently complicated by the fact that ASCII strings are stored in 1-byte storage while non-ASCII strings are stored in 2-byte UTF-16 storage. There are also complications regarding potential UTF-8 support. These will be sorted out during the wind-down towards ABI stability.
