SE-0243: Codepoint and Character Literals

ExpressibleByUnicodeScalarLiteral does allow you to run into the same issues today. But there are three significant differences.

  1. To encounter most of the issues today, you have to compose separate concepts:

    This is dangerous code:

    let x = "é" // Creating a text literal.
    let n = x.unicodeScalars.first!.value // Getting the numeric encoding.
    

    Thanks to Unicode equivalence, after copy‐and‐paste, etc., this source might compile so that n is 0x65 or 0xE8.

    But each line on its own would have been perfectly reasonable and safe in different contexts:

    let x = "é" // Creating a text literal (like line 1 above).
    if readLine().contains(x.first) { // Using it safely as text.
        print("Your string has no “é” in it.")
    }
    
    print("Your string is composed of these characters:")
    for x in readLine().unicodeScalars {
        let n = x.value // Getting the numeric encoding (like line 2 above).
        print("U+\(n)") // Safely expecting it to be absolutely anything.
    }
    

    But this proposal adds the combination as a single operation:

    let n: UInt8 = 'a'
    

    Thus it has the extra responsibility to consider where and when the entire combined operation as a whole is and isn’t reliable.

  2. The subset of issues you can already encounter today without composition fail immediately with a compiler error.

    This is dangerous code:

    let x = "é" as Unicode.Scalar
    

    That may or many not still fit into a Unicode.Scalar after copy‐and‐paste, but you will know immediately.

    The proposal, had it not limited itself, would have introduced instances of single operations that would be derailed silently:

    let x: UInt32 = `Å` // Started out as 0x212B, might become 0xC5.
    

    Hidden, nebulous logic changes would be far worse than the sudden compiler failures we can encounter today.

  3. Today, the most straightforward, safest way to get a vulnerable scalar from a literal currently requires all the same compiler and language functionality as the dangerous code:

    let x = "\u{E8}" as Unicode.Scalar
    

    Making the dangerous variant illegal would have made this safe way impossible as well.

    On the other hand, regarding the additions in the proposal, the most straightforward safe way to get a vulnerable integer is completely unrelated in both syntax and functionality, and is thus unaffected by the safety checks:

    let x = 0xE8
    

(For any new readers who want more information about this, the relevant discussion and explanations begin here in the pitch thread and run for about 20 posts until a consensus was reached around what was called “Option 4”. Please read those first if you want to ask a question or post an opinion about the restriction to ASCII.)

7 Likes