SE-0243: Codepoint and Character Literals

stephencelis · March 5, 2019, 5:13pm

Can someone help me better understand the motivations behind using single quotes vs. reusing double quotes?

A pain point of using characters in Swift is they lack a first-class literal syntax. Users have to manually coerce string literals to a Character or Unicode.Scalar type using as Character or as Unicode.Scalar, respectively.

Couldn't the same be said about Set and ArrayLiteralConvertible? Given that Character can be inferred as a string literal, this pain point also feels overstated.
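To make the comparison concrete, here is a minimal sketch of how double-quoted and bracketed literals already take their type from context in current Swift. The constant names are only illustrative.

```swift
// A double-quoted literal defaults to String, but a type annotation
// (or any other context) can make it a Character or a Unicode.Scalar.
let asString = "a"                      // String
let asCharacter: Character = "a"        // Character
let asScalar: Unicode.Scalar = "a"      // Unicode.Scalar

// Bracketed literals behave the same way: Array by default,
// Set when the context asks for one.
let asArray = [1, 2, 3]                 // [Int]
let asSet: Set = [1, 2, 3]              // Set<Int>
```

In each pair the literal syntax is identical; only the surrounding type context differs.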

Having the collection share the same syntax as its element also harms code clarity and makes it difficult to tell if a double-quoted literal is being used as a string or character in some cases.

Clarity in knowing what a literal is feels like an issue of literal syntax in general, regardless of collection/element differences. And by distinguishing a collection literal from its element, does this proposal also suggest that any collection of character literal elements be expressible as a string literal?

Character types also don't support string literal interpolation, which is another reason to move away from double quotes.

This doesn't seem like a strong motivator. The compiler already bans this, no?

I guess what I'm looking for is why reusing double quotes wasn't even mentioned in "alternatives considered." Seems like a big omission?

bobergj · March 5, 2019, 5:16pm

What is your evaluation of the proposal?

-1

Does this proposal fit well with the feel and direction of Swift?

No, definitely not the proposed integer type conformances.

Is the problem being addressed significant enough to warrant a change to Swift?

Not sure, because the proposal lacks convincing examples.

From the proposal:

With these changes, the hex code example can be written much more naturally:

let hexcodes: [UInt8] = [
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 
    'a', 'b', 'c', 'd', 'e', 'f'
]

for scalar in int8buffer {
    switch scalar {
    case 'a' ... 'f':
        // lowercase hex letter
    case 'A' ... 'F':
        // uppercase hex letter
    case '0' ... '9':
        // hex digit
    default:
        // something else
    }
}

I don't see how this change is warranted, given that you can have the following today:

let hexcodes: [AsciiScalar] = [
    "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
    "a", "b", "c", "d", "e", "f"
]

for scalar in hexcodes {
    switch scalar {
    case "a" ... "f":
        // lowercase hex letter
        break
    case "A" ... "F":
        // uppercase hex letter
        break
    case "0" ... "9":
        // hex digit
        break
    default:
        // something else
        break
    }
}

With a struct:

struct AsciiScalar {
    let value: UInt8
}

and adding conformances to ExpressibleByUnicodeScalarLiteral and Strideable.
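The conformances are not spelled out in the post, so the following is only one possible sketch of them. The precondition on non-ASCII scalars, the Int stride, and the classify helper are assumptions made for illustration, not part of the original code.

```swift
struct AsciiScalar {
    let value: UInt8
}

extension AsciiScalar: ExpressibleByUnicodeScalarLiteral {
    // Assumption: reject non-ASCII scalars at run time with a trap.
    init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        precondition(scalar.isASCII, "not an ASCII scalar")
        self.value = UInt8(scalar.value)
    }
}

// Strideable refines Comparable; its default == and < are derived
// from distance(to:), so only these two methods are required.
extension AsciiScalar: Strideable {
    func distance(to other: AsciiScalar) -> Int {
        return Int(other.value) - Int(value)
    }

    // Traps if the result does not fit in UInt8.
    func advanced(by n: Int) -> AsciiScalar {
        return AsciiScalar(value: UInt8(Int(value) + n))
    }
}

// Hypothetical helper showing that double-quoted range patterns work.
func classify(_ scalar: AsciiScalar) -> String {
    switch scalar {
    case "a" ... "f": return "lowercase hex letter"
    case "A" ... "F": return "uppercase hex letter"
    case "0" ... "9": return "hex digit"
    default: return "something else"
    }
}
```

With this in place, the literals in the switch are inferred as AsciiScalar from the type of the value being matched.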

I am sure there are other, more convincing examples, but if so, please add those to the proposal.

That being said, it would probably be nice if ExpressibleByUnicodeScalarLiteral allowed for a single quoted character.

Vogel · March 5, 2019, 6:01pm

Yes, it does. And it's very obvious to see in code whether the literal you're looking at contains interpolation or not, especially with syntax highlighting and different colors for the inline expression and the string contents. So it's working well already, no need for single quotes

gwendal.roue · March 5, 2019, 6:22pm

I don't have a strong opinion on the subject, but I can look at those lines of code of mine:

func components(cString: UnsafePointer<CChar>, length: Int)
    -> DatabaseDateComponents?
{
    assert(strlen(cString) == length)
    guard length >= 5 else { return nil }
    if cString.advanced(by: 4).pointee == 45 /* '-' */ {
        return datetimeComponents(cString: cString, length: length)
    }
    if cString.advanced(by: 2).pointee == 58 /* ':' */ {
        return timeComponents(cString: cString, length: length)
    }
    return nil
}

For context, this piece of code decides if we are parsing a SQLite date string (YYYY-MM-DD...), or a time (HH:MM...). The string length is provided by SQLite (so the strlen check is just a debugging assertion).

The interesting parts are of course:

if cString.advanced(by: 4).pointee == 45 /* '-' */ { ... }
if cString.advanced(by: 2).pointee == 58 /* ':' */ { ... }

In the current state of Swift, they could also have been written this way:

if cString.advanced(by: 4).pointee == Int8(UInt8(ascii: "-")) { ... }
if cString.advanced(by: 2).pointee == Int8(UInt8(ascii: ":")) { ... }

And with this proposal (as stated in the Motivation section), we would read instead:

if cString.advanced(by: 4).pointee == '-' { ... }
if cString.advanced(by: 2).pointee == ':' { ... }

To me this looks like a net enhancement. I'm right in the target. Considering this kind of code can run in a tight loop, I also don't mind skipping a few conversion CPU cycles (no I did not check for actual evidence of a real gain).

gwendal.roue · March 5, 2019, 6:28pm

If this concern is valid, what about some "magic" import:

import ASCII
// Now all character literals are ascii

This would fit most needs (when one file deals with Ascii literals only), and yet prevent any ASCII lock-in, prevent any implicit conversion, and allow support for other encodings (import EBCDIC).

It's not really "magic" of course: those modules would just add the required conformances.

We already have such imports in Swift, such as import Foundation (which does come with a lot of real magic).

lorentey · March 5, 2019, 6:42pm

gwendal.roue:

In the current state of Swift, they could also have been written this way:
if cString.advanced(by: 4).pointee == Int8(UInt8(ascii: "-")) { ... }
if cString.advanced(by: 2).pointee == Int8(UInt8(ascii: ":")) { ... }

We do have heterogeneous comparisons for integers, so this already works today:

if cString.advanced(by: 4).pointee == UInt8(ascii: "-") { ... }
if cString.advanced(by: 2).pointee == UInt8(ascii: ":") { ... }

(Which doesn't mean we don't need the generic FixedWidthInteger.init(ascii:) variant, of course.)
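A small self-contained sketch of the heterogeneous comparison, using an in-memory C string instead of an UnsafePointer so it can run on its own. The sample text and constant names are illustrative assumptions.

```swift
let text = "2019-03-05"
// utf8CString yields NUL-terminated CChar values (Int8 or UInt8,
// depending on the platform).
let bytes = Array(text.utf8CString)

// BinaryInteger provides == across different integer types, so a CChar
// can be compared with a UInt8 without an explicit conversion.
let hasDashAt4 = bytes[4] == UInt8(ascii: "-")
let hasColonAt2 = bytes[2] == UInt8(ascii: ":")

// The same holds for an explicitly signed value.
let minus: Int8 = 45
let sameValue = minus == UInt8(ascii: "-")
```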

Vogel · March 5, 2019, 6:43pm

gwendal.roue:

I don't have a strong opinion on the subject, but I can look at those lines of code of mine:

func components(cString: UnsafePointer<CChar>, length: Int)
    -> DatabaseDateComponents?
{
    assert(strlen(cString) == length)
    guard length >= 5 else { return nil }
    if cString.advanced(by: 4).pointee == 45 /* '-' */ {
        return datetimeComponents(cString: cString, length: length)
    }
    if cString.advanced(by: 2).pointee == 58 /* ':' */ {
        return timeComponents(cString: cString, length: length)
    }
    return nil
}

Thank you for this great example, I love it!
That's because it shows off one thing that's very wrong with the very idea behind this proposal: Unnecessary use of low-level APIs instead of proper high-level APIs that already solve the problem perfectly fine.

Here's my suggestion:

extension DatabaseDateComponents {
    init?(cString: UnsafePointer<CChar>) {
        let string = String(cString: cString)

        guard string.count >= 5 else {
            return nil
        }

        if string[4] == "-" {
            self.init(datetime: string)
        } else if string[2] == ":" {
            self.init(time: string)
        } else {
            return nil
        }
    }
}

SDGGiesbrecht · March 5, 2019, 6:51pm

While I tend to agree with you in general that hyperoptimized code does not need sugar when higher‐level constructs are already ergonomic...

You realize it is not actually that simple (yet)?

Vogel · March 5, 2019, 6:54pm

gwendal.roue:

If this concern is valid, what about some "magic" import:
import ASCII
// Now all character literals are ascii
This would fit most needs (when one file deals with Ascii literals only), and yet prevent any ASCII lock-in, prevent any implicit conversion, and allow support for other encodings ( import EBCDIC ).

It's not really "magic" of course: those modules would just add the required conformances.

We already have such imports in Swift, such as import Foundation (which does come with a lot of real magic).

I think this idea is way better than the proposal at hand.

Really, I think that no code should ever import this though. Even if it's an interesting convenience feature, you've already brought up an important point: It wouldn't be possible to mix encodings. That might seem like a small thing, but it ultimately breaks an important promise that the language, so far, has made: All of its features can be used together, however you want. That's the kind of stuff that a strong general-use programming language does. Providing you with weird hacks for low-level solutions, breaking the promise of "an import of a module adds only declarations and doesn't change the behavior of the compiler in any other way" – that's the kind of stuff that bad, awkward, special-usecase programming languages do.

I'm not sure what kind of magic import Foundation comes with (mostly coercion between types?), but it's important to consider that Foundation existed before Swift and anything that it does can be considered a backwards-compatibility hack, whereas that claim cannot be made about any new awkwardness introduced by this proposal.

Vogel · March 5, 2019, 6:55pm

But it could be

Or at least:

if string[offset: 4] == "-" {

xwu · March 5, 2019, 7:17pm

lorentey:

We do have heterogeneous comparisons for integers, so this already works today:
if cString.advanced(by: 4).pointee == UInt8(ascii: "-") { ... }
if cString.advanced(by: 2).pointee == UInt8(ascii: ":") { ... }
(Which doesn't mean we don't need the generic FixedWidthInteger.init(ascii:) variant, of course.)

Code is more often read than written, and this example reads perfectly clearly to me. It’s no more work to use this initializer than that required to convert among numeric types, and it is much, much more concise in comparison to C than is UnsafeMutablePointer to its C counterpart.

gwendal.roue · March 5, 2019, 7:25pm

Thanks Vogel, this is an opportunity to revisit this old piece of code. I don't want to abuse anyone's time, yet:

We don't have string[4], unfortunately. It rather reads string[string.index(string.startIndex, offsetBy: 4)]
Next, there is a performance loss: 2s with the String(cString:) variant, vs. 0.48s with my ugly raw char comparison. (Xcode 10.1, performance test with a tight loop that checks only the code we are talking about).
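For reference, offset-based access to a String can be sketched as below. The helper name character(in:atOffset:) is hypothetical, and the bounds checking is an addition that the one-line subscript above does not perform.

```swift
// Hypothetical helper: returns the Character at an integer offset,
// or nil when the offset is out of range. Walking to the index is O(n)
// because String is not random-access.
func character(in string: String, atOffset offset: Int) -> Character? {
    guard offset >= 0,
          let index = string.index(string.startIndex,
                                   offsetBy: offset,
                                   limitedBy: string.endIndex),
          index < string.endIndex
    else { return nil }
    return string[index]
}
```

The linear walk to the index is part of the cost of the String-based variant, on top of the copy made by String(cString:).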

So I'd rather keep my ugly code than use a "proper high-level API". I write a utility library, not a Swift style guide: I only care about fast database decoding.

gwendal.roue · March 5, 2019, 7:31pm

For the record, comparison with UInt8(ascii: "-") or 45 yields no noticeable performance difference.

Jon_Shier · March 5, 2019, 7:37pm

Looks like UInt8(ascii:) is compiled down to the value, so the executed code is exactly the same.

Jon_Shier · March 5, 2019, 7:38pm

For the simple, single comparisons shown, I might agree. For anything more complex, the literal syntax is much more readable.

michelf · March 5, 2019, 7:53pm

I have yet to form a good opinion on everything here, but I just realized something worth mentioning. You can write this today:

let x = UInt8("8") // result: 0x08

It parses the string and gives you the integer 8. This proposal makes it so you can write a one-character string with single quotes, which means this would become valid:

let x = UInt8('8') // result: 0x08

I assume the type of the literal '8' will be inferred to String (because this is what the initializer wants) and so it would parse the string and give you the integer 8.

So far so good... but now if we make UInt8 initializable by a literal directly, notice how the result from this is different:

let x: UInt8 = '8' // result: 0x38

or this:

let x = '8' as UInt8 // result: 0x38

That seems very confusing and error prone to me.
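The two readings can already be contrasted in current Swift, without single-quoted literals, by placing the string-parsing initializer next to UInt8(ascii:). This is only a sketch of the discrepancy; the constant names are illustrative.

```swift
// Parses the text "8" as a decimal number. The initializer is failable,
// so the result is an Optional.
let parsed = UInt8("8")            // Optional(8)

// Takes the ASCII code of the character "8" instead.
let code = UInt8(ascii: "8")       // 56, i.e. 0x38
```

The same character therefore yields 8 or 56 depending on which interpretation is chosen, which is the ambiguity being pointed out.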

johnno1962 · March 5, 2019, 7:55pm

Well spotted. This is picked up by and diagnosed as an error in the implementation. You'll no longer be able to write UInt8("<any single digit>"), though I'd love to know why you'd want to! This only applies to literals, not String values.

Vogel · March 5, 2019, 7:56pm

That's an important point. As mentioned above when @SDGGiesbrecht brought this up, I really think we need this in the standard library:

extension Collection {
    subscript(offset offset: Int) -> Element {
        return self[index(startIndex, offsetBy: offset)]
    }
}

Otherwise, there will always be cases where it's more convenient to use Arrays instead of Strings, just because of the easier syntax.

That's fair. If you care that much about performance, because it's happening suuuuuper often in a loop, then maybe sometimes you want to write it like this. But in that case, it's okay if it needs to be slightly verbose or even outright ugly. That's how someone reading the code knows that it's not semantically perfect (Different types for numbers and characters), but instead just something that happens to be working correctly the way it is written.

The point is this:

It's good that your variant is ugly, because that ugliness communicates something.

That being said, I actually think that the main performance problem here is copying the string from the cString, so maybe it would be helpful to have this:

extension DatabaseDateComponents {
    init?(cString: UnsafePointer<CChar>) {
        self = String.withUnsafeCString(cString) { //allows using the cString as a String without copying it
            guard $0.count >= 5 else {
                return nil
            }

            //... do the other stuff
        }
    }
}

xwu · March 5, 2019, 7:58pm

That is incompatible with SE-0213, which states that T(literal) should have the same behavior as constructing the literal when a type is expressible by that literal.

Well spotted point by @michelf. Adding Unicode scalar literal conformance to integer types would either be source breaking or break SE-0213 (or both, I guess).
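As a hedged illustration of the SE-0213 rule with integer literals, assuming a Swift 5 or later compiler: when the type is expressible by the literal, the initializer-call form is treated like a coercion of that literal.

```swift
// The literal does not fit in Int, but under SE-0213 the call is
// type-checked as `0xffff_ffff_ffff_ffff as UInt64`.
let viaInitializer = UInt64(0xffff_ffff_ffff_ffff)
let viaCoercion = 0xffff_ffff_ffff_ffff as UInt64
```

If UInt8 gained a conformance for single-quoted literals, the same rule would make UInt8('8') mean '8' as UInt8 rather than a parse of the text.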

michelf · March 5, 2019, 8:00pm

Wait, you mean UInt8("8") is going to be a compile-time error but not UInt8("88")?