SE-0243: Codepoint and Character Literals

Or we just change the way Unicode.Scalars are printed.

My comment was directed at no one in particular, but rather to potential future posters in general. I saw no problem with what had been posted so far, which was a reasonable exploration of the area where the subjects intersect. I just knew it had potential to grow out of hand very quickly, and I did not want the review derailed.

1 Like

I accept this argument and retract my previous argument about alternative encodings.

I would support an ExpressibleByUnicodeScalarLiteral improvement with new _ExpressibleByBuiltinUnicodeScalarLiteral conformances:

extension UTF8 {
    //get rid of the old typealias to UInt8. Leave UInt8 alone!:
    struct CodeUnit: _ExpressibleByBuiltinUnicodeScalarLiteral, ExpressibleByUnicodeScalarLiteral {
        //8-bit only, compiler-enforced. Custom types can also use UTF8.CodeUnit as their UnicodeScalarLiteralType:
        typealias UnicodeScalarLiteralType = CodeUnit

        var value: UInt8
    }
}

This would use the well-known double quotes. It would add compiler-enforced 8- and 16-bit code unit types.

It would not pollute Integer APIs at all.

The only problem, of course, is that changing the Element of String.UTF8View etc. would be a breaking change (its Element is currently UTF8.CodeUnit, i.e. UInt8). Maybe there needs to be a String.betterUTF8View (or whatever other name), and the old utf8View etc. would just be deprecated.

Everyone who wants to mess around with code units can then use types like [UTF8.CodeUnit] instead of [UInt8].
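
As a rough sketch, usage could then read something like this (hypothetical; it does not compile today, since the CodeUnit struct above does not exist):

//hypothetical, assuming the UTF8.CodeUnit struct sketched above:
let signature: [UTF8.CodeUnit] = ["P", "N", "G"] //each element is a single 8-bit code unit
let space: UTF8.CodeUnit = " "                   //an out-of-range scalar would be rejected at compile time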

2 Likes

I feel like the constant values are still important; see this ugly wall of code in the PNG library.

That doesn't seem all that bad to me, it's just a bunch of magic values expressed as static constants?

well it would be a lot nicer if

public static
let IHDR:Tag = .init(73, 72, 68, 82)

was

public static
let IHDR:Tag = .init('I', 'H', 'D', 'R')

or just

public static 
let IHDR:Vector4<UInt8> = ('I', 'H', 'D', 'R')

(coming soon to a swift evolution near you!)

It does look like there's an ExpressibleByStringLiteral wanting to come out in that particular example.

1 Like

Not really. No compile-time validation of string length or codepoint range, and it requires runtime setup with ICU.
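
To illustrate (a minimal sketch with made-up names, not the library's actual API): a string-literal initializer can only check length and ASCII-ness when it runs, not when it compiles.

struct Tag: ExpressibleByStringLiteral {
    var bytes: [UInt8]

    init(stringLiteral value: String) {
        // Length and ASCII-ness can only be checked here, at runtime.
        precondition(value.utf8.count == 4, "tag must be exactly four bytes")
        precondition(value.unicodeScalars.allSatisfy { $0.value < 0x80 }, "tag must be ASCII")
        self.bytes = Array(value.utf8)
    }
}

let good: Tag = "IHDR"    // fine
let bad: Tag  = "IHDRR"   // compiles, but only fails when it runs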

1 Like

For this and similar use-cases, I think you should either conform the type itself to ExpressibleByStringLiteral, or else use Jonas’s idea of an ASCII struct:

struct ASCII: ExpressibleByUnicodeScalarLiteral {
  var rawValue: UInt8
  …
}

Neither validates that the Unicode.Scalar values are ASCII. If that seems like an edge concern, think about how easy it is to accidentally type a '–' instead of a '-' or a '“' instead of a '"'. In fact, in school lecture slides I've seen more accidental uses of ’ than correct uses of '.

There is nothing stopping the library authors from writing an initializer that takes four Unicode scalars. This is an API entirely under the end user's control.
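
For example, something along these lines (a sketch only; the stored layout and names are made up, not the library's actual API):

struct Tag {
    var bytes: (UInt8, UInt8, UInt8, UInt8)

    init(_ a: Unicode.Scalar, _ b: Unicode.Scalar, _ c: Unicode.Scalar, _ d: Unicode.Scalar) {
        // UInt8(ascii:) traps at runtime if a scalar falls outside the ASCII range.
        self.bytes = (UInt8(ascii: a), UInt8(ascii: b), UInt8(ascii: c), UInt8(ascii: d))
    }
}

let ihdr = Tag("I", "H", "D", "R")   // double-quoted scalar literals; no integer literals needed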

1 Like

Whether written in binary, octal, hex, or decimal, all of those are integer literals. Swift does not use prefixes to distinguish between kinds of literals. This is actually a bit of a challenge for decimal vs. hex float literals, but nonetheless, there is no use of prefixes anywhere in Swift to indicate different kinds of literals or different default literal types.

Yes, but it should take four ASCII scalars, not four Unicode.Scalars. They are only the same for a specific subset of Unicode.Scalars. What happened to the importance of text encodings?

That too is a precondition under the control of the API authors.

PNG allows user-defined chunk types, which is why Tag is a struct, and not an enum, providing the ASCII slugs as computed properties on self. (This also prevents users from accidentally shadowing a public chunk type, since the stored representation means it will just get interpreted as the public chunk.) Having the initializer take four UInt8s is a form of idiot-proofing, so that code like

let myType:Tag = .init("k", "l", "o", "ß")

or, god save us,

let myType:Tag = "kloß"

won’t compile to begin with. I know this because I am the API author.

UInt8 isn’t perfect; it’s still possible to sneak a 0xFF in there, but the beautiful thing about this proposal is that once we have ASCII-validated character literals, no one in their right mind would type a decimal number instead of an ASCII-validated character literal.
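
Roughly, the shape is this (a sketch, not the library's exact code):

struct Tag
{
    var bytes:(UInt8, UInt8, UInt8, UInt8)

    init(_ b0:UInt8, _ b1:UInt8, _ b2:UInt8, _ b3:UInt8)
    {
        self.bytes = (b0, b1, b2, b3)
    }
}

let ihdr:Tag   = .init(73, 72, 68, 82)      // "IHDR" spelled as decimal code units
let sneaky:Tag = .init(0xFF, 72, 68, 82)    // still compiles: UInt8 alone cannot rule out non-ASCII bytes
// let bad:Tag = .init("k", "l", "o", "ß")  // does not compile: string literals are not UInt8s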

Just because we haven’t done it before in Swift doesn’t mean we can’t do it in the future. It’s heavily precedented in other languages. I don’t think “no literal prefixes” is anywhere in the language design goals. We are running short on delimiters, after all.

You make the ASCII type enforce the ASCII-ness.

If you want compile-time enforcement, then perhaps the language should expose a mechanism for defining arbitrary-sized binary integer types, such as UInt7.

…and this type’s init would take a UInt8 (or a UInt7 in a perfect world). We’re really just kicking the can deeper into the nest.

Sounds like you want compile-time checking of preconditions. As you point out, UInt8 doesn't actually idiot-proof your code at compile time, and this proposal isn't required for making it possible either.

4 Likes

Actually, I do recall that Chris Lattner has specifically said he or the core team made that choice early on. Whether the core team still holds that view today is unclear.

…erm, what is your point? The ASCII type would also conform to ExpressibleByUnicodeScalarLiteral, and the literal initializer would ensure that only ASCII characters are permitted:

extension ASCII: ExpressibleByUnicodeScalarLiteral {
  init(unicodeScalarLiteral value: Unicode.Scalar) {
    // UInt8(ascii:) traps if the scalar is outside the ASCII range,
    // so a non-ASCII scalar can never produce an ASCII value.
    self.rawValue = UInt8(ascii: value)
  }
}

Then you can make your Tag initializer take four ASCII values, and call it like this:

let IHDR = Tag("I", "H", "D", "R")
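
That initializer itself might look something like this (a sketch; the stored layout of Tag is assumed here):

struct Tag {
  var bytes: (UInt8, UInt8, UInt8, UInt8)

  init(_ a: ASCII, _ b: ASCII, _ c: ASCII, _ d: ASCII) {
    // Each argument has already been range-checked by ASCII's literal initializer.
    self.bytes = (a.rawValue, b.rawValue, c.rawValue, d.rawValue)
  }
}
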
1 Like