SE-0243: Codepoint and Character Literals

There might someday be a common 8-bit encoding in which the ASCII literal 'a' would be misinterpreted as an 'x', but TODAY we live in a world where there is a ton of ASCII, especially at the lower levels.

If there is some future 8-bit encoding that makes this untenable, Swift could almost certainly find a way to make this clear.

But for now, we need a clear, concise way to express single-character ASCII values without a ton of boilerplate.

I’m seeing a surprising amount of pushback on integers being expressible by character literals and all that it implies, namely that code like the following would compile:

let m1 = ('a' as Int) + 12     // this is (barely) acceptable to me
let m2 = 'a' + 12              // ...but this seems extremely unwise
let wat = 'a' * 'b' / 'z'      // ...and this is just absurd

In fact, the last line gives an error, but only because the expression is ambiguous: there are too many ways it could compile. This does compile:

let wat: Int = 'a' * 'b' / 'z'      // ...and this is just absurd

Arithmetic on character values is more useful than one might think; consider the hex-decoding example:

    func nibble(_ char: UInt16) throws -> UInt16 {
        switch char {
        case '0' ... '9':
            return char - '0'
        case 'a' ... 'f':
            return char - 'a' + 10
        case 'A' ... 'F':
            return char - 'A' + 10
        default:
            throw NSError(domain: "bad character", code: -1, userInfo: nil)
        }
    }

Imagine you had to write a JSONDecoder.
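
For comparison, here is a minimal sketch of the same decoder in Swift as it compiles today, with every literal spelled through `UInt8(ascii:)`. It takes and returns `UInt8` rather than `UInt16`, and `HexError` is a hypothetical error type introduced only for this illustration.

```swift
// Hypothetical error type, used only for this sketch.
struct HexError: Error {}

// Decodes one ASCII hex digit into its numeric value (0...15).
func nibble(_ char: UInt8) throws -> UInt8 {
    switch char {
    case UInt8(ascii: "0") ... UInt8(ascii: "9"):
        return char - UInt8(ascii: "0")
    case UInt8(ascii: "a") ... UInt8(ascii: "f"):
        return char - UInt8(ascii: "a") + 10
    case UInt8(ascii: "A") ... UInt8(ascii: "F"):
        return char - UInt8(ascii: "A") + 10
    default:
        throw HexError()
    }
}

let twelve = try? nibble(UInt8(ascii: "c"))    // Optional(12)
let rejected = try? nibble(UInt8(ascii: "g"))  // nil, "g" is not a hex digit
```

The logic is unchanged; only the spelling of the literals differs, which is the boilerplate under discussion.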

I'd thought that, through an accident of ABI stability history, we had found the Goldilocks point where this sort of behaviour wasn’t enabled by default and users had to opt in (by declaring an ExpressibleByUnicodeScalarLiteral conformance). But this seems to be sufficiently unintuitive to leave it in no man's land in people's minds, neither conservative nor convenient.

I’m fine with not deprecating UInt8(ascii: "a") as the alternative, but I’d have more time for it if it were generic over its return type and able to take its type from the expression context. In reality you typically have to use something like Int8(UInt8(ascii: "a")), which is a bit of a handful.
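
A generic variant along those lines might look like the sketch below. This is not a standard library API: the initializer is hypothetical, and it is named `init(asciiScalar:)` here only to avoid colliding with the existing `UInt8.init(ascii:)`.

```swift
extension FixedWidthInteger {
    /// Hypothetical initializer, not part of the standard library.
    /// Traps if `scalar` is outside the ASCII range.
    init(asciiScalar scalar: Unicode.Scalar) {
        precondition(scalar.isASCII, "scalar is outside the ASCII range")
        // ASCII values are at most 127, so they fit in every fixed-width integer type.
        self.init(scalar.value)
    }
}

let dash = Int8(asciiScalar: "-")    // no intermediate UInt8 needed
let colon = CChar(asciiScalar: ":")
let letter = Int(asciiScalar: "a")
```

With this in place the double conversion `Int8(UInt8(ascii: "-"))` collapses into a single call.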

I’ve seen surprisingly little pushback on the other aspect of the proposal, which is to deprecate "" in favour of '' for Unicode.Scalar and Character literals. This is a source-breaking change, but it seems to crop up in comparatively few places:

"string".split(separator: "a")
...: warning: double quotes deprecated in favour of single quotes to express 'Character'
// and ironically
UInt8(ascii: "a")
...: warning: double quotes deprecated in favour of single quotes to express 'Unicode.Scalar'

Is there any appetite for proceeding with this part of the proposal now, deferring the rest until it is possible to add gated conformances to ExpressibleByUnicodeScalarLiteral by default, so that we can judge the other part of the proposal on its end state rather than its awkward halfway point? IMO this is a worthwhile change in its own right for the ergonomics of the language: it makes explicit the contexts where we are dealing with a single character, and starting this change now will make eventual adoption of integers being expressible by character literals easier if we take another look at it.

Then please show us this way. From my point of view, once the API of Integers has been polluted with the new protocol conformance, there is no way back, because code that makes use of it never explicitly mentions encodings. So how would the future compiler know if the collection of integers being worked on is supposed to eventually be interpreted as UTF-8 etc. or some other encoding? It quite obviously can't.

This is an excellent argument for extending init(ascii:) to all fixed-width integer types--or at least to Int8.

I'm sorry, but that code is horrible. Why does it take a UInt16 and not a UInt8 (or either, via generics)? Why does it return a UInt16? The maximum value that can be returned is obviously 15, so why not UInt8 or, in keeping with Swift's "use Int unless there are specific reasons to use another integer type" guidance, just Int? This is not a good use case.

Most importantly, the standard library can already convert from strings to integers with any radix, so that functionality already exists.
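
As a quick illustration of that existing functionality, the failable `init(_:radix:)` initializers on the integer types parse a whole string in a given radix. Note that this operates on strings, not on individual bytes in a buffer, so it is a sketch of the alternative rather than a drop-in replacement for the function above.

```swift
// Failable parsing with an explicit radix; returns nil on invalid input.
let byte = UInt8("ff", radix: 16)     // Optional(255)
let mixedCase = Int("Ff", radix: 16)  // letters are accepted in either case
let invalid = Int("fg", radix: 16)    // nil, "g" is not a hex digit
```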

I agree with you here; that is also pretty bad and should be criticized more. Character literals being just shorter string literals has a certain elegance, just as integer literals are basically float literals without the ".123" or "e123" extensions. I'd actually like it very much if ExpressibleByFloatLiteral would inherit from ExpressibleByIntegerLiteral in the same way ExpressibleByStringLiteral inherits from ExpressibleByCharacterLiteral and ExpressibleByUnicodeScalarLiteral. But that's not what matters here.

Can someone help me better understand the motivations behind using single quotes vs. reusing double quotes?

A pain point of using characters in Swift is they lack a first-class literal syntax. Users have to manually coerce string literals to a Character or Unicode.Scalar type using as Character or as Unicode.Scalar, respectively.

Couldn't the same be said about Set and ArrayLiteralConvertible? Given that Character can be inferred as a string literal, this pain point also feels overstated.
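
To make the comparison concrete, here is a small sketch of how the same double-quoted literal is typed by context in current Swift, next to the analogous array-literal case for `Set`. The variable names are only for illustration.

```swift
// The same literal takes its type from the surrounding context.
let character: Character = "a"
let scalar: Unicode.Scalar = "a"
let string = "a"                             // defaults to String

// The separator literal is inferred from the parameter type.
let parts = "banana".split(separator: "n")   // "ba", "a", "a"

// An array literal likewise becomes a Set only when the context asks for one.
let letters: Set<Character> = ["a", "b"]
let array = ["a", "b"]                       // defaults to [String]
```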

Having the collection share the same syntax as its element also harms code clarity and makes it difficult to tell if a double-quoted literal is being used as a string or character in some cases.

Clarity in knowing what a literal is feels like an issue of literal syntax in general, regardless of collection/element differences. And by distinguishing a collection literal from its element, does this proposal also suggest that any collection of character literal elements be expressible as a string literal?

Character types also don't support string literal interpolation, which is another reason to move away from double quotes.

This doesn't seem like a strong motivator. The compiler already bans this, no?

I guess what I'm looking for is why reusing double quotes wasn't even mentioned in "alternatives considered." Seems like a big omission?

  • What is your evaluation of the proposal?

-1

  • Does this proposal fit well with the feel and direction of Swift?

No, definitely not the proposed integer type conformances.

  • Is the problem being addressed significant enough to warrant a change to Swift?

Not sure, because the proposal lacks convincing examples.

From the proposal:

With these changes, the hex code example can be written much more naturally:

let hexcodes: [UInt8] = [
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 
    'a', 'b', 'c', 'd', 'e', 'f'
]

for scalar in int8buffer {
    switch scalar {
    case 'a' ... 'f':
        // lowercase hex letter
    case 'A' ... 'F':
        // uppercase hex letter
    case '0' ... '9':
        // hex digit
    default:
        // something else
    }
}

I don't see how this change is warranted, given that you can have the following today:

let hexcodes: [AsciiScalar] = [
    "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
    "a", "b", "c", "d", "e", "f"
]

for scalar in hexcodes {
    switch scalar {
    case "a" ... "f":
        // lowercase hex letter
    case "A" ... "F":
        // uppercase hex letter
    case "0" ... "9":
        // hex digit
    default:
        // something else
    }
}

With a struct:

struct AsciiScalar {
    let value: UInt8
}

and adding conformances to ExpressibleByUnicodeScalarLiteral and Strideable.
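
A minimal sketch of those two conformances follows, assuming the struct above. The trapping behaviour on non-ASCII input and the `classify` helper are choices made for this illustration, not something specified in the thread.

```swift
struct AsciiScalar {
    let value: UInt8
}

extension AsciiScalar: ExpressibleByUnicodeScalarLiteral {
    // Allows `let s: AsciiScalar = "a"`; traps on non-ASCII scalars.
    init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        precondition(scalar.isASCII, "not an ASCII scalar")
        self.value = UInt8(ascii: scalar)
    }
}

extension AsciiScalar: Strideable {
    // Strideable also supplies Comparable, which `...` range patterns need.
    func distance(to other: AsciiScalar) -> Int {
        Int(other.value) - Int(value)
    }

    func advanced(by n: Int) -> AsciiScalar {
        AsciiScalar(value: UInt8(Int(value) + n))
    }
}

// Hypothetical helper mirroring the switch statement above.
func classify(_ scalar: AsciiScalar) -> String {
    switch scalar {
    case "a" ... "f": return "lowercase hex letter"
    case "A" ... "F": return "uppercase hex letter"
    case "0" ... "9": return "hex digit"
    default: return "something else"
    }
}
```

With that in place, `classify("b")` and `classify("7")` take ordinary double-quoted literals and no integer type gains a new conformance.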

I am sure there are other, more convincing examples, but if so, please add those to the proposal.

That being said, it would probably be nice if ExpressibleByUnicodeScalarLiteral allowed for a single quoted character.

Yes, it does. And it's very obvious to see in code whether the literal you're looking at contains interpolation or not, especially with syntax highlighting and different colors for the inline expression and the string contents. So it's working well already, no need for single quotes :blush::+1:

I don't have a strong opinion on the subject, but I can look at those lines of code of mine:

func components(cString: UnsafePointer<CChar>, length: Int)
    -> DatabaseDateComponents?
{
    assert(strlen(cString) == length)
    guard length >= 5 else { return nil }
    if cString.advanced(by: 4).pointee == 45 /* '-' */ {
        return datetimeComponents(cString: cString, length: length)
    }
    if cString.advanced(by: 2).pointee == 58 /* ':' */ {
        return timeComponents(cString: cString, length: length)
    }
    return nil
}

For the context, this piece of code decides if we are parsing a SQLite date string (YYYY-MM-DD...), or a time (HH:MM...). The string length is provided by SQLite (so the strlen check is just a debugging assertion).

The interesting parts are of course:

if cString.advanced(by: 4).pointee == 45 /* '-' */ { ... }
if cString.advanced(by: 2).pointee == 58 /* ':' */ { ... }

In the current state of Swift, they could also have been written this way:

if cString.advanced(by: 4).pointee == Int8(UInt8(ascii: "-")) { ... }
if cString.advanced(by: 2).pointee == Int8(UInt8(ascii: ":")) { ... }

And with this proposal (as stated in the Motivation section), we would read instead:

if cString.advanced(by: 4).pointee == '-' { ... }
if cString.advanced(by: 2).pointee == ':' { ... }

To me this looks like a net enhancement. I'm right in the target. Considering this kind of code can run in a tight loop, I also don't mind skipping a few conversion CPU cycles (no, I did not check for actual evidence of a real gain).

If this concern is valid, what about some "magic" import:

import ASCII
// Now all character literals are ascii

This would fit most needs (when one file deals with ASCII literals only), and yet prevent any ASCII lock-in, prevent any implicit conversion, and allow support for other encodings (import EBCDIC).

It's not really "magic" of course: those modules would just add the required conformances.
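
As a rough sketch of what such a module might contain, written against today's double-quoted literals: the opt-in is a single conformance. The `ASCII` module itself is hypothetical, and recent compilers may warn about declaring a conformance between a type and a protocol that both belong to the standard library.

```swift
// Hypothetical contents of an `ASCII` module: an opt-in conformance that lets
// UInt8 be written as a single-scalar literal. Traps on non-ASCII scalars.
extension UInt8: ExpressibleByUnicodeScalarLiteral {
    public init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        self.init(ascii: scalar)
    }
}

let dash: UInt8 = "-"
let method: [UInt8] = ["G", "E", "T"]
```

A file that does not import the module would see none of this, which is the point of the suggestion.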

We already have such imports in Swift, such as import Foundation (which does come with a lot of real magic).

We do have heterogeneous comparisons for integers, so this already works today:

if cString.advanced(by: 4).pointee == UInt8(ascii: "-") { ... }
if cString.advanced(by: 2).pointee == UInt8(ascii: ":") { ... }

(Which doesn't mean we don't need the generic FixedWidthInteger.init(ascii:) variant, of course.)
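
A self-contained sketch of that heterogeneous comparison, using `withCString` on sample strings in place of a buffer handed over by SQLite:

```swift
// The pointee is a signed C char and the right-hand side is a UInt8;
// BinaryInteger's heterogeneous == compares them without a conversion.
let isDate = "2019-03-05".withCString {
    $0.advanced(by: 4).pointee == UInt8(ascii: "-")
}
let isTime = "12:30:00".withCString {
    $0.advanced(by: 2).pointee == UInt8(ascii: ":")
}
let isNeither = "hello".withCString {
    $0.advanced(by: 4).pointee == UInt8(ascii: "-")
}
```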

Thank you for this great example, I love it!
That's because it shows off one thing that's very wrong with the very idea behind this proposal: Unnecessary use of low-level APIs instead of proper high-level APIs that already solve the problem perfectly fine.

Here's my suggestion:

extension DatabaseDateComponents {
    init?(cString: UnsafePointer<CChar>) {
        let string = String(cString: cString)

        guard string.count >= 5 else {
            return nil
        }

        if string[4] == "-" {
            self.init(datetime: string)
        } else if string[2] == ":" {
            self.init(time: string)
        } else {
            return nil
        }
    }
}

While I tend to agree with you in general that hyperoptimized code does not need sugar when higher‐level constructs are already ergonomic...

You realize it is not actually that simple (yet)? :wink:

I think this idea is way better than the proposal at hand.

Really, I think that no code should ever import this though. Even if it's an interesting convenience feature, you've already brought up an important point: It wouldn't be possible to mix encodings. That might seem like a small thing, but it ultimately breaks an important promise that the language, so far, has made: All of its features can be used together, however you want. That's the kind of stuff that a strong general-use programming language does. Providing you with weird hacks for low-level solutions, breaking the promise of "an import of a module adds only declarations and doesn't change the behavior of the compiler in any other way" – that's the kind of stuff that bad, awkward, special-usecase programming languages do.

I'm not sure what kind of magic import Foundation comes with (mostly coercion between types?), but it's important to consider that Foundation existed before Swift and anything that it does can be considered a backwards-compatibility hack, whereas that claim cannot be made about any new awkwardness introduced by this proposal.

But it could be :heart:

Or at least:

if string[offset: 4] == "-" {

Code is more often read than written, and this example reads perfectly clearly to me. It’s no more work to use this initializer than that required to convert among numeric types, and it is much, much more concise in comparison to C than is UnsafeMutablePointer to its C counterpart.

Thanks Vogel, this is an opportunity to revisit this old piece of code. I don't want to abuse anyone's time, yet:

  • We don't have string[4], unfortunately :sweat_smile:. It rather reads string[string.index(string.startIndex, offsetBy: 4)]

  • Next, there is a performance loss: 2s with the String(cString:) variant, vs. 0.48s with my ugly raw char comparison. (Xcode 10.1, performance test with a tight loop that checks only the code we are talking about).
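
For reference, the index arithmetic from the first point can be wrapped in an offset-based subscript like the `string[offset: 4]` spelling suggested earlier. This is a hypothetical extension, not a standard library API, and it still walks the string from the start on every access, so it does nothing for the performance gap described in the second point.

```swift
extension String {
    /// Hypothetical convenience subscript, not part of the standard library.
    /// Returns nil when `offset` is out of bounds instead of trapping.
    subscript(offset offset: Int) -> Character? {
        guard offset >= 0,
              let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
              i < endIndex
        else { return nil }
        return self[i]
    }
}

let separator = "2019-03-05"[offset: 4]   // Optional("-")
let pastTheEnd = "ab"[offset: 2]          // nil
```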

So I'd rather keep my ugly code than use a "proper high-level API". I write a utility library, not a Swift style guide: I only care about fast database decoding.

For the record, comparison with UInt8(ascii: "-") or 45 yields no noticeable performance difference.

Looks like UInt8(ascii:) is compiled down to the value, so the executed code is exactly the same.

For the simple, single comparisons shown, I might agree. For anything more complex, the literal syntax is much more readable.
