SE-0243: Codepoint and Character Literals

taylorswift · March 7, 2019, 11:18pm

It’s helpful to outline a big-picture roadmap for Swift literals to understand how this concept would work and how it fits into the rest of the language. I think everyone agrees the current ExpressibleBy system is excessively complex and magical, and it’s becoming clear the language is starting to outgrow it.

To be clear, this is not part of the proposal, but rather a vision for how 'x' as UInt8 will evolve from “special compiler magic” to “general language feature”.

Basically, in place of protocol conformances, we would have a set of @ attributes that would mark initializers as being literal initializers.

enum Int.Base 
{
    case decimal, octal, binary, hexadecimal
}
enum Double.Base 
{
    case decimal, hexadecimal
}
extension Double 
{
    // instead of receiving fully-parsed `Builtin` values, these 
    // initializers just receive minimally-parsed lexer tokens
    // the `Int(integer)` part means it depends on `Int`’s `@integerLiteral` 
    // initializer.
    @integerLiteral(expressible, Int(integer))
    init(sign:Sign, base:Int.Base, digits:[Int])

    @floatLiteral(expressible, Int(integer), Double(integer))
    init(sign:Sign, base:Base, fraction:[Int], digits:[Int], exponent:[Int])
}

// invokes Double.init(sign: .plus, base: .decimal, digits:[9, 8, 9, 1])
let x:Double = 1989

// invokes Double.init(sign: .plus, base: .hexadecimal, 
//                 fraction: [10], digits:[1, 15], exponent: [2, 1])
let y:Double = 0xF1.Ap12
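
To make the integer case concrete, here is a rough sketch in today's Swift of what such an initializer body might compute, written as an ordinary runtime function. `Sign`, `IntBase` and `makeDouble` are hypothetical stand-ins (they are not stdlib API), and the sketch assumes the digits arrive least-significant first, which is what the `1989` comment above suggests.

```swift
// Hypothetical stand-ins for the token types in the sketch above.
enum Sign { case plus, minus }
enum IntBase: Int {
    case binary = 2, octal = 8, decimal = 10, hexadecimal = 16
}

// Folds lexer digits into a Double. Assumption: `digits` is ordered
// least-significant first, so 1989 arrives as [9, 8, 9, 1].
func makeDouble(sign: Sign, base: IntBase, digits: [Int]) -> Double {
    let radix = Double(base.rawValue)
    // Walk from the most significant digit down (Horner's method).
    let magnitude = digits.reversed().reduce(0.0) { $0 * radix + Double($1) }
    return sign == .plus ? magnitude : -magnitude
}

let x = makeDouble(sign: .plus, base: .decimal, digits: [9, 8, 9, 1])
let h = makeDouble(sign: .minus, base: .hexadecimal, digits: [1, 15])
print(x, h)
```

In the envisioned design the same arithmetic would be constant-folded by the compiler instead of running at runtime.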

Unlike the ExpressibleBy initializers, @{}Literal-annotated initializers would not be callable at runtime (a gaping hole in the current system, which makes no sense and has been the source of endless headaches), so it follows that they would not be part of the ABI. This is important because it makes the initializer arguments constant expressions, allowing literal syntax errors to be thrown from static #asserts inside the initializers, rather than from C++ code in the compiler. This would move a lot of C++ implementation into the standard library and allow us to get rid of a considerable amount of Builtin cruft. It would also open up a lot of new possibilities for things like arbitrary-precision integer types, which right now have to go through Builtin.IntLiteral.
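
For anyone who hasn't run into the runtime-callable hole, here is a small illustration in current Swift. `Meters` is a made-up type for the example; the protocol and its `init(integerLiteral:)` requirement are the real ones.

```swift
// A small hypothetical type that adopts the current protocol.
struct Meters: ExpressibleByIntegerLiteral {
    let value: Int
    init(integerLiteral value: Int) { self.value = value }
}

// The intended use: the compiler calls the initializer for a literal.
let fromLiteral: Meters = 42

// Nothing stops ordinary code from calling the same initializer with a
// value computed at runtime, which is not a literal at all.
let runtimeValue = Int.random(in: 1...6)
let fromRuntime = Meters(integerLiteral: runtimeValue)
print(fromLiteral.value, fromRuntime.value)
```

Because the second call is legal, the initializer can't assume its argument is a compile-time constant.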

For static string, string, character, and unicode scalar literals, we would be able to drop the overlapping ExpressibleByStringLiteral:ExpressibleByExtendedGraphemeClusterLiteral:ExpressibleByUnicodeScalarLiteral mess we have right now, and just have unified @textLiteral and @textElementLiteral attributes.
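
As a quick illustration of the overlap in current Swift: a type that only asks for string literals still ends up conforming to the other two protocols through inheritance. `Tag` and the two helper functions are hypothetical names for this example only.

```swift
// Only `init(stringLiteral:)` is written out; the inherited requirements
// are satisfied by default implementations in the standard library.
struct Tag: ExpressibleByStringLiteral {
    let name: String
    init(stringLiteral value: String) { name = value }
}

// These only compile for types that conform to the inherited protocols.
func acceptsScalarLiteral<T: ExpressibleByUnicodeScalarLiteral>(_: T.Type) -> Bool { true }
func acceptsClusterLiteral<T: ExpressibleByExtendedGraphemeClusterLiteral>(_: T.Type) -> Bool { true }

let tag: Tag = "swift"
let inheritsBoth = acceptsScalarLiteral(Tag.self) && acceptsClusterLiteral(Tag.self)
print(tag.name, inheritsBoth)
```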

extension String 
{
    @textLiteral(expressible, Unicode.Scalar(text))
    init(hashtags:Int, unicodeScalars:[Unicode.Scalar])
}

Note that @textLiteral and @textElementLiteral initializers take an array of Unicode.Scalars, because grapheme cluster boundaries aren’t known to the compiler at compile-time.
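
A small example of why scalars and grapheme clusters are different things, in current Swift. The scalar count is fixed by the source text, while the character count depends on Unicode segmentation rules applied by the standard library.

```swift
// "e" followed by U+0301 COMBINING ACUTE ACCENT.
let accented = "e\u{301}"

// Two scalars went into the literal...
let scalarCount = accented.unicodeScalars.count

// ...but segmentation groups them into a single Character.
let characterCount = accented.count
print(scalarCount, characterCount)
```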

The compiler would evaluate a text literal (or any other literal) in source by looking for all visible @textLiteral initializers that are marked expressible and that match the type context, if there is one. If there isn’t one, it will look for an initializer in the TextLiteralType typealias, analogous to what we do now.
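
For reference, this is roughly how the existing lookup behaves today: the same literal resolves to different types depending on the type context, and falls back to a default type when there is none.

```swift
let inferred = "a"                  // no type context: falls back to String
let asCharacter: Character = "a"    // type context selects Character
let asScalar: Unicode.Scalar = "a"  // type context selects Unicode.Scalar

print(type(of: inferred), type(of: asCharacter), type(of: asScalar))
```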

The as coercion operator would do something similar, but it would not need a typealias (since the rhs of the operator gives the concrete type), and the initializer would not have to be marked expressible. This gives us coercible-but-not-expressible by ___ types.

extension UInt8 
{
    @textElementLiteral(Int(integer), UInt32(integer)) 
    init(unicodeScalars:[Unicode.Scalar])
    {
        // can use `Int` and `UInt32` literals because we declared them 
        // as dependencies
        #assert(unicodeScalars.count == 1 && 
            unicodeScalars[0].value & 0xff_ff_ff_80 == 0)

        self.init(truncatingIfNeeded: unicodeScalars[0].value)
    }
}
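
The same validation can be sketched in today's Swift as a plain runtime function, with `nil` standing in for what would be a compile-time error under the vision. `asciiCodeUnit` is a hypothetical helper, not an existing API.

```swift
// Hypothetical runtime analogue of the initializer above.
func asciiCodeUnit(_ unicodeScalars: [Unicode.Scalar]) -> UInt8? {
    // Exactly one scalar, and no bits set above the low seven.
    guard unicodeScalars.count == 1,
          unicodeScalars[0].value & 0xff_ff_ff_80 == 0
    else { return nil }
    return UInt8(truncatingIfNeeded: unicodeScalars[0].value)
}

let a = asciiCodeUnit(["a"])
let eAcute = asciiCodeUnit(["\u{E9}"])   // U+00E9 is outside ASCII
let two = asciiCodeUnit(["a", "b"])      // more than one scalar
print(a as Any, eAcute as Any, two as Any)
```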

Of course, this isn’t really that different from defining a function or method that takes explicitly @compilerEvaluable arguments, and it would be spelled similarly to the existing UInt8(ascii:) initializer. (Though we obviously couldn’t reuse the function signature.) But I think as is clearer and more readable, and makes more sense given its existing semantics in the language. This would be especially true if functions with @compilerEvaluable-restricted arguments share the same call site syntax as normal Swift functions, since in situations like

let a:Character = 'a'
let ord:UInt8 = foo(a)

you can’t tell whether foo is folding its argument without knowing its signature.
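
The existing UInt8(ascii:) initializer shows the same ambiguity at the call site. It takes a Unicode.Scalar and validates it at runtime, so a call with a literal and a call with a runtime value look identical.

```swift
// Argument is a literal.
let literalCode = UInt8(ascii: "a")

// Argument is an ordinary value; the call site reads the same way.
let scalar: Unicode.Scalar = "a"
let runtimeCode = UInt8(ascii: scalar)
print(literalCode, runtimeCode)
```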