SE-0243: Codepoint and Character Literals

These are really useful! I really don't see your point, though -- `String.utf8` (and `Unicode.Scalar.ascii`) seem to provide perfectly elegant, safe and efficient solutions to all of them:

```swift
// storing a bytestring value
static var liga = "liga".utf8

// storing an ASCII scalar to mixed utf8-ASCII text
var xml: [UInt8] = ...
xml.append('/'.ascii)
xml.append('>'.ascii)

// ASCII range operations
let current: UnsafePointer<UInt8> = ...
if 'a'.ascii ... 'z'.ascii ~= current.pointee {
    ...
}

// ASCII arithmetic operations
let year: ArraySlice<UInt8> = ...
var value: Int = 0
for digit: UInt8 in year {
    guard '0'.ascii ... '9'.ascii ~= digit else {
        ...
    }
    value *= 10
    value += Int(digit - '0'.ascii)
}

// reading an ASCII scalar from mixed utf8-ASCII text
let xml: [UInt8] = ...
if let i: Int = xml.firstIndex(of: '<'.ascii) {
    ...
}

// matching ASCII signatures
let c: UnsafeRawBufferPointer = ...
if c.starts(with: "PLTE".utf8) {
    ...
}
```
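For clarity: `Unicode.Scalar.ascii` isn't in the standard library today -- the closest existing API is `UInt8(ascii:)` -- so here is a minimal sketch of the extension the examples above assume. (The snippets also use the proposal's single-quote spelling; with today's syntax you'd write the scalar explicitly, as in the usage line below.)

```swift
extension Unicode.Scalar {
    /// Hypothetical property assumed by the examples above: the scalar's
    /// ASCII code, trapping if the scalar is outside the ASCII range.
    /// (Not in today's standard library; the closest existing API is
    /// `UInt8(ascii:)`.)
    var ascii: UInt8 {
        precondition(isASCII, "scalar is not ASCII")
        return UInt8(value)
    }
}

// With today's double-quoted syntax, a call site would read:
let slash = ("/" as Unicode.Scalar).ascii  // 47
```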

Note: I took the liberty of replacing `Int8` above with `UInt8`. As far as I know, `Int8` data typically comes from C APIs imported as `CChar`, which is a truly terrible type: it's documented to be either `UInt8` or `Int8`, depending on the platform. Any code that doesn't immediately and explicitly rebind C strings to `UInt8` is arguably broken.
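To make that rebinding concrete, here's one way to do it -- a minimal sketch; `withUnsignedBytes` is a name I'm making up, and it assumes a NUL-terminated input:

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical helper: views a NUL-terminated C string as UInt8 bytes,
/// regardless of whether CChar is signed on the current platform.
func withUnsignedBytes<R>(
    of cString: UnsafePointer<CChar>,
    _ body: (UnsafeBufferPointer<UInt8>) throws -> R
) rethrows -> R {
    let count = Int(strlen(cString))
    return try cString.withMemoryRebound(to: UInt8.self, capacity: count) {
        try body(UnsafeBufferPointer(start: $0, count: count))
    }
}

// usage: match an ASCII signature without caring about CChar's signedness
// let tag: UnsafePointer<CChar> = ...
// let isPLTE = withUnsignedBytes(of: tag) { $0.starts(with: "PLTE".utf8) }
```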
