SE-0243: Codepoint and Character Literals

I feel like the constant values are still important; see this ugly wall of code in the PNG library.

That doesn't seem all that bad to me, it's just a bunch of magic values expressed as static constants?

Well, it would be a lot nicer if

public static
let IHDR:Tag = .init(73, 72, 68, 82)

was

public static
let IHDR:Tag = .init('I', 'H', 'D', 'R')

or just

public static 
let IHDR:Vector4<UInt8> = ('I', 'H', 'D', 'R')

(coming soon to a swift evolution near you!)

It does look like there's an ExpressibleByStringLiteral conformance wanting to come out in that particular example.

1 Like

Not really. There's no compile-time validation of string length or codepoint range, and it requires runtime setup with ICU.
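
To illustrate (a rough sketch; the Tag shape here is made up for the example): a string-literal conformance can only check the length and the scalar range when the initializer actually runs.

struct Tag: ExpressibleByStringLiteral {
    var bytes: (UInt8, UInt8, UInt8, UInt8)

    init(stringLiteral value: String) {
        let scalars = Array(value.unicodeScalars)
        // these checks trap at runtime; the compiler accepts any string
        precondition(scalars.count == 4, "tag must be exactly four scalars")
        precondition(scalars.allSatisfy { $0.isASCII }, "tag must be ASCII")
        self.bytes = (UInt8(scalars[0].value), UInt8(scalars[1].value),
                      UInt8(scalars[2].value), UInt8(scalars[3].value))
    }
}

let ihdr: Tag = "IHDR" // fine
let bad: Tag = "kloß"  // also compiles; only traps when executed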

1 Like

For this and similar use-cases, I think you should either conform the type itself to ExpressibleByStringLiteral, or else use Jonas’s idea of an ASCII struct:

struct ASCII: ExpressibleByUnicodeScalarLiteral {
  var rawValue: UInt8
  …
}

Neither validates the Unicode.Scalar values to be ASCII. If that seems like an edge concern, think about how easy it is to accidentally type a '–' instead of a '-', or a '“' instead of a '"'. In fact, in school lecture slides I've seen more accidental uses of ‘ than correct uses of '.

There is nothing stopping the library authors from writing an initializer that takes four Unicode scalars. This is an API entirely under the end user's control.

1 Like

Whether written in binary, octal, hex, or decimal, all of those are integer literals. Swift does not use prefixes to distinguish between kinds of literals. This is actually a bit of a challenge for decimal vs. hex float literals, but nonetheless, there is no use of prefixes anywhere in Swift to indicate different kinds of literals or different default literal types.

Yes, but it should take four ASCII scalars, not four Unicode.Scalars. They are only the same for a specific subset of Unicode.Scalars. What happened to the importance of text encodings?

That too is a precondition under the control of the API authors.
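
For example (Tag here is just a stand-in type; Unicode.Scalar.isASCII is in the standard library):

struct Tag {
    var bytes: (UInt8, UInt8, UInt8, UInt8)

    // the library states its encoding precondition itself
    init(_ a: Unicode.Scalar, _ b: Unicode.Scalar,
         _ c: Unicode.Scalar, _ d: Unicode.Scalar) {
        precondition([a, b, c, d].allSatisfy { $0.isASCII },
            "chunk tag scalars must be ASCII")
        self.bytes = (UInt8(a.value), UInt8(b.value),
                      UInt8(c.value), UInt8(d.value))
    }
}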

PNG allows user-defined chunk types; this is why Tag is a struct, and not an enum providing the ASCII slugs as computed properties on self. (This also prevents users from accidentally shadowing a public chunk type, since the stored representation means it will just get interpreted as the public chunk.) Having the initializer take four UInt8s is a form of idiotproofing, so that code like

let myType:Tag = .init("k", "l", "o", "ß")

or, god save us,

let myType:Tag = "kloß"

won’t compile to begin with. I know this because I am the API author.

UInt8 isn’t perfect; it’s still possible to sneak a 0xFF in there. But the beautiful thing about this proposal is that once we have ASCII-validated character literals, no one in their right mind would type a decimal number instead of an ASCII-validated character literal.
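
For example, with the four-UInt8 initializer, this passes the type checker even though 0xFF is not an ASCII code:

let sneaky:Tag = .init(0xFF, 72, 68, 82) // nothing flags the stray byte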

Just because we haven’t done it before in Swift doesn’t mean we can’t do it in the future. It’s heavily precedented in other languages. I don’t think “no literal prefixes” is anywhere in the language design goals. We are running short on delimiters, after all.

You make the ASCII type enforce the ASCII-ness.

If you want compile-time enforcement, then perhaps the language should expose a mechanism for defining arbitrary-sized binary integer types, such as UInt7.

…and this type’s init would take a UInt8 (or a UInt7 in a perfect world). We’re really just kicking the can deeper into the nest.

Sounds like you want compile-time checking of preconditions. As you point out, UInt8 doesn't actually idiot-proof your code at compile time, and this proposal isn't required for making it possible either.

4 Likes

Actually, I do recall that Chris Lattner has specifically said he or the core team made that choice early on. Whether the core team still holds that view today is unclear.

…erm, what is your point? The ASCII type would also conform to ExpressibleByUnicodeScalarLiteral, and the literal initializer would ensure that only ASCII characters are permitted:

extension ASCII: ExpressibleByUnicodeScalarLiteral {
  init(unicodeScalarLiteral value: Unicode.Scalar) {
    self.rawValue = UInt8(ascii: value)
  }
}
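
Note that UInt8(ascii:) enforces the range with a runtime trap, not a compile-time error:

let i: ASCII = "I" // fine
let e: ASCII = "é" // type-checks, but UInt8(ascii:) traps when it runs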

Then you can make your Tag initializer take four ASCII values, and call it like this:

let IHDR = Tag("I", "H", "D", "R")
1 Like

It’s not quite that simple. Here’s a quick sketch, using @constexpression as a strawman attribute:

extension ASCII:ExpressibleByUnicodeScalarLiteral 
{
    init(unicodeScalarLiteral value:@constexpression Unicode.Scalar) 
    {
        #assert(value.value & 0xffff_ff80 == 0, 
            "Literal value '\(value)' is not an ASCII literal")
        self.rawValue = .init(truncatingIfNeeded: value.value)
    }
}

Of course, even in theory this wouldn’t actually compile, because ExpressibleByUnicodeScalarLiteral requires an init(unicodeScalarLiteral: Unicode.Scalar), not an init(unicodeScalarLiteral: @constexpression Unicode.Scalar). So this would probably get tied in with the improved literal initializer attributes discussed in the tuple thread. (Weird how all these things seem to connect with each other, lol.)

extension ASCII 
{
    @unicodeScalarLiteral // implied @constexpression for all arguments
    init(unicodeScalarLiteral value:Unicode.Scalar) 
    {
        #assert(value.value & 0xffff_ff80 == 0, 
            "Literal value '\(value)' is not an ASCII literal")
        self.rawValue = .init(truncatingIfNeeded: value.value)
    }
}

It really sounds though like what you’re actually chiselling out here is a standard library ASCII type, which would be ExpressibleByUnicodeScalarLiteral and get magic compiler checks in the same way that Int8 gets checked in the current proposal implementation. Definitely possible, but it would be a radically different direction from the current proposal and we’d have to start from scratch.

struct ASCII:ExpressibleByUnicodeScalarLiteral
{
    var _value:UInt8
    public 
    var value:UInt8 
    {
        return self._value
    }
    
    // dangerous, but that’s a design problem with the 
    // `ExpressibleBy` protocols
    public 
    init(unicodeScalarLiteral value:Unicode.Scalar) 
    {
        self._value = .init(truncatingIfNeeded: value.value)
    }
    
    public 
    init<T>(truncatingIfNeeded value:T) where T:BinaryInteger 
    {
        self._value = .init(truncatingIfNeeded: value & 0x7f) // mask to the 7-bit ASCII range
    }
    
    public 
    init?<T>(_ value:T) where T:BinaryInteger 
    {
        guard let value:UInt8 = .init(exactly: value), 
            value & 0x80 == 0
        else 
        {
            return nil 
        }
        
        self._value = value 
    }
    
    static 
    func &+ (lhs:ASCII, rhs:ASCII) -> ASCII 
    {
        return .init(truncatingIfNeeded: lhs.value &+ rhs.value)
    }
    static 
    func &- (lhs:ASCII, rhs:ASCII) -> ASCII 
    {
        return .init(truncatingIfNeeded: lhs.value &- rhs.value)
    }
}
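
A quick usage sketch of that type (the printed value assumes the 0x7f mask in init(truncatingIfNeeded:)):

let zero:ASCII = "0" // initialized from a unicode scalar literal
let seven:ASCII = zero &+ .init(truncatingIfNeeded: 7)
print(seven.value)   // 55, the ASCII code for "7"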

No, I’m saying you can literally get the syntax you want right now without any changes to the standard library at all. You just have to write the ASCII type in your project, and start using it.

The only thing missing is compile-time overflow checking.
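
That is, the kind of diagnostic integer literals already get today:

let a: Int8 = 300  // error: integer literal '300' overflows when stored into 'Int8'
let b: ASCII = "ß" // nothing stops this at compile time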

3 Likes

But it isn't arithmetic on characters; it is arithmetic on the integers you get from an ASCII table lookup (which, for the current internal representation as UTF-8, doesn't necessitate a table).

UInt8(ascii: "a") is an explicit way to state this, but is considered wordy. I would argue that 'a' is not wordy enough.
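
For comparison, the explicit spelling in today's Swift:

let digits = Array("2019".utf8)                    // [50, 48, 49, 57]
let values = digits.map { $0 - UInt8(ascii: "0") } // [2, 0, 1, 9]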

3 Likes