SE-0243: Codepoint and Character Literals

There are some totally valid concerns here, but I want to address some misunderstandings about Swift and Unicode.

Swift firmly establishes that String and related types are Unicode and not some other standard.

Unicode is a “Universal Character Encoding”. It is a mapping between “characters” (cough, not in the grapheme cluster sense) and numbers starting from 0. This assignment to a specific number is the crux of the standard. The elements of this universal encoding we call Unicode are “code points” (cough, or Unicode scalar values, for only the valid ones).

A “Unicode Encoding” is an encoding of the universal encoding (or a subset of it), which may have alternate numbers (e.g. EBCDIC) or more complex details (e.g. a smaller-width representation). The elements of such an encoding are called “code units”.

Nothing in this proposal is attempting to bake in anything about particular “code units” from some particular Unicode encoding; rather, it is addressing the element type of Unicode itself, the “code point”.
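To make the code point / code unit distinction concrete, here is how it surfaces in Swift today:

let euro: Unicode.Scalar = "€"
print(euro.value)          // 8364, i.e. U+20AC: the code point, Unicode's own number
print(Array("€".utf8))     // [226, 130, 172]: UTF-8 code units
print(Array("€".utf16))    // [8364]: UTF-16 code units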

(This is all pretty confusing.)

I’m not presenting an argument that Swift syntax should operate under some implicit conversion between a syntactic construct such as a literal and a number. That’s the purpose of this review thread and I understand that people can disagree for valid reasons. I’m just trying to dispel some of the FUD around mixing up Unicode and some particular Unicode encoding.

Code points are integer values. The code point ‘G’ is inherently 71 in Unicode. The idea that the digit ‘8’ is the same thing as the number 56 is the whole point of character encodings.
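For instance, in Swift today:

let g: Unicode.Scalar = "G"
let eight: Unicode.Scalar = "8"
print(g.value)      // 71
print(eight.value)  // 56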

Swift supports Unicode, but does not mandate a particular Unicode encoding. EBCDIC is not a subset of Unicode as decoding it to Unicode involves remapping some values.

(Again, this is not an argument that Swift’s particular syntactic construct for code points should produce Swift’s number types, just that these kinds of encoding-related concerns are not relevant)

This proposal does not change that.

Swift forces Unicode on us, and then Unicode forces ASCII on us. Unicode by explicit design is a superset of ASCII. From the standard:

While taking the ASCII character set as its starting point, the Unicode Standard goes far beyond ASCII’s limited ability to encode only the upper- and lowercase letters A through Z. It provides the capacity to encode all characters used for the written languages of the world—more than 1 million characters can be encoded.

UTF-8 is a red herring here. We’re talking about Unicode itself, i.e. code points, not code units. If 0x61 were to map to ‘x’, that wouldn’t be Unicode, and if it isn’t Unicode, it isn’t Swift.

Any encoding that’s not literally compatible with ASCII (i.e. without decoding) is not literally compatible with Unicode. Such encodings might be “Unicode encodings” (encoding of an encoding), meaning that they need to go through a process of decoding in order to be literally equivalent to ASCII/Unicode.

Could you elaborate? For many tasks, pattern matching over String.utf8 is exactly what you should be doing.
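Something along these lines, say. Today the byte values have to be spelled as hex (or decimal) literals, which is part of what this proposal aims to improve:

let header = "content-length: 42"
var digits = [UInt8]()
for byte in header.utf8 {
    switch byte {
    case 0x30 ... 0x39:      // '0' ... '9'
        digits.append(byte)
    default:
        break
    }
}
print(digits)                // [52, 50], the UTF-8 bytes of "42"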


I do agree that arithmetic feels odd and out of place for these literals. I feel like most of the utility comes from equality comparisons and pattern matching.

@taylorswift @johnno1962, did you explore the impact of overloads for ~= and ==? I don’t know if this would cause more type checking issues in practice. (cc @xedin)
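Concretely, I mean something along these lines (purely a sketch to make the question precise, not a worked-out design):

// match a UInt8 code unit against a Unicode.Scalar pattern
func ~= (pattern: Unicode.Scalar, value: UInt8) -> Bool {
    return pattern.isASCII && UInt8(pattern.value) == value
}

func == (lhs: UInt8, rhs: Unicode.Scalar) -> Bool {
    return rhs.isASCII && lhs == UInt8(rhs.value)
}

That would let byte-level code write case ":" patterns and comparisons like byte == "/" over UInt8 directly; the open question is what such overloads do to type-checker performance.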

Alternatively, are there any other options for excluding these from operators? I don’t recall exactly how availability works with overload resolution (@xedin?), but would it be possible to have some kind of unavailable/obsoleted/prefer-me-but-don’t-compile-me overloads for arithmetic operators that take ExpressibleByUnicodeScalarLiteral?

10 Likes

I hear you. I’ve already conceded, in a previous post, that integer conversions may be sufficiently unacceptable to some that they may not pass; then there are practical considerations such as this glitch. It’s good we’re thrashing this out. I’m still waiting for more people to chime in on whether single quotes for character literals, as a use for single quotes in general, would be a worthwhile ergonomic improvement to Swift in itself.

It absolutely is off topic here, which is why @taylorswift shouldn't have brought it up as an argument.

Or we just change the way Unicode.Scalars are printed.

My comment was directed at no one in particular, but rather to potential future posters in general. I saw no problem with what had been posted so far, which was a reasonable exploration of the area where the subjects intersect. I just knew it had potential to grow out of hand very quickly, and I did not want the review derailed.

1 Like

I accept this argument and retract my previous argument about alternative encodings.

I would support an ExpressibleByUnicodeScalarLiteral improvement with new _ExpressibleByBuiltinUnicodeScalarLiteral conformances:

extension UTF8 {
    //get rid of the old typealias to UInt8. Leave UInt8 alone!:
    struct CodeUnit: _ExpressibleByBuiltinUnicodeScalarLiteral, ExpressibleByUnicodeScalarLiteral {
        // 8-bit only, compiler-enforced. Custom types can also use UTF8.CodeUnit as their UnicodeScalarLiteralType:
        typealias UnicodeScalarLiteralType = CodeUnit

        var value: UInt8
    }
}

This would use the well-known double quotes. It would add compiler-enforced 8- and 16-bit code unit types.

It would not pollute Integer APIs at all.

The only problem, of course, is that changing the Element of String.UTF8View etc. would be a breaking change (its Element is currently the UTF8.CodeUnit typealias, i.e. UInt8). Maybe there needs to be a String.betterUTF8View (or whatever other name), and the old utf8 view etc. would just be deprecated.

Everyone who wants to mess around with code units can then use types like [UTF8.CodeUnit] instead of [UInt8].

2 Likes

I feel like the constant values are still important; see this ugly wall of code in the PNG library.

That doesn't seem all that bad to me; it's just a bunch of magic values expressed as static constants?

Well, it would be a lot nicer if

public static
let IHDR:Tag = .init(73, 72, 68, 82)

were

public static
let IHDR:Tag = .init('I', 'H', 'D', 'R')

or just

public static 
let IHDR:Vector4<UInt8> = ('I', 'H', 'D', 'R')

(coming soon to a swift evolution near you!)

It does look like there's an ExpressibleByStringLiteral conformance wanting to come out in that particular example.

1 Like

Not really. No compile-time validation of string length or code point range, and it requires runtime setup with ICU.
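Concretely, a string-literal-based version can only check those things when the program runs. A sketch, with a made-up FourCC type standing in for Tag:

struct FourCC: ExpressibleByStringLiteral {
    var bytes: [UInt8]

    init(stringLiteral value: String) {
        let units = Array(value.utf8)
        // length and ASCII range can only be checked here, at run time
        precondition(units.count == 4 && units.allSatisfy { $0 < 0x80 },
            "expected exactly four ASCII characters")
        self.bytes = units
    }
}

let good: FourCC = "IHDR"
//let bad: FourCC = "kloß"   // compiles fine, traps only when executed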

1 Like

For this and similar use-cases, I think you should either conform the type itself to ExpressibleByStringLiteral, or else use Jonas’s idea of an ASCII struct:

struct ASCII: ExpressibleByUnicodeScalarLiteral {
  var rawValue: UInt8
  …
}

Neither validates the Unicode.Scalar values to be ASCII. If that seems like an edge concern, think about how easy it is to accidentally type a '–' instead of a '-' or a '“' instead of a '"'. In fact, in school lecture slides I've seen more accidental uses of ’ than correct uses of '.

There is nothing stopping the library authors from writing an initializer that takes four Unicode scalars. This is an API entirely under the end user's control.

1 Like

Whether written in binary, octal, hex, or decimal, all of those are integer literals. Swift does not use any prefixes to distinguish between literals. This is actually a bit of a challenge for decimal vs hex float literals, but nonetheless, there is no use of prefixes in Swift anywhere to indicate different literals or different default literal types.

Yes, but it should take four ASCII scalars, not four Unicode.Scalars. They are only the same for a specific subset of Unicode.Scalars. What happened to the importance of text encodings?

That too is a precondition under the control of the API authors.

PNG allows user-defined chunk types, which is why Tag is a struct, and not an enum providing the ASCII slugs as computed properties on self. (This also prevents users from accidentally shadowing a public chunk type, since the stored representation means it will just get interpreted as the public chunk.) Having the initializer take four UInt8s is a form of idiotproofing, so that code like

let myType:Tag = .init("k", "l", "o", "ß")

or, god save us,

let myType:Tag = "kloß"

won’t compile to begin with. I know this because I am the API author.
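For anyone who hasn't read the library, the shape being described is roughly the following (a simplified sketch, not the actual source):

struct Tag: Hashable {
    // the stored representation: four bytes
    let b0: UInt8, b1: UInt8, b2: UInt8, b3: UInt8

    init(_ b0: UInt8, _ b1: UInt8, _ b2: UInt8, _ b3: UInt8) {
        self.b0 = b0; self.b1 = b1; self.b2 = b2; self.b3 = b3
    }

    // a user-defined Tag with the same four bytes simply is the public chunk type,
    // because equality is defined by the stored bytes
    static let IHDR: Tag = .init(73, 72, 68, 82)   // 'I', 'H', 'D', 'R'
}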

UInt8 isn’t perfect; it’s still possible to sneak a 0xFF in there. But the beautiful thing about this proposal is that once we have ASCII-validated character literals, no one in their right mind would type a decimal number instead of an ASCII-validated character literal.

Just because we haven’t done it before in Swift doesn’t mean we can’t do it in the future. It’s heavily precedented in other languages. I don’t think “no literal prefixes” is anywhere in the language design goals. We are running short on delimiters, after all.

You make the ASCII type enforce the ASCII-ness.
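That enforcement is still a runtime check, but it could look something like this (filling in the part elided from the earlier sketch):

struct ASCII: ExpressibleByUnicodeScalarLiteral {
    var rawValue: UInt8

    init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        // rejects '–', '“', and anything else outside the 7-bit range
        precondition(scalar.isASCII, "'\(scalar)' is not an ASCII scalar")
        self.rawValue = UInt8(scalar.value)
    }
}

let dash: ASCII = "-"      // fine
//let bad: ASCII = "–"     // en dash: compiles, but traps at run time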

If you want compile-time enforcement, then perhaps the language should expose a mechanism for defining arbitrary-sized binary integer types, such as UInt7.

… and this type’s init would take a UInt8 (or a UInt7 in a perfect world). We’re really just kicking the can deeper into the nest.