SE-0243: Codepoint and Character Literals

Thanks Vogel, this is an opportunity to revisit this old piece of code. I don't want to abuse anyone's time, but:

  • We don't have string[4], unfortunately :sweat_smile:. Instead, it reads string[string.index(string.startIndex, offsetBy: 4)].

  • Next, there is a performance loss: 2s with the String(cString:) variant vs. 0.48s with my ugly raw char comparison (Xcode 10.1, a performance test with a tight loop that exercises only the code we are talking about).

So I'd rather keep my ugly code than use a "proper high-level API". I write a utility library, not a Swift style guide: I only care about fast database decoding.
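To illustrate, here is a minimal sketch of the two variants I'm comparing (the function names are hypothetical, not the actual library code):

// Raw byte comparison: reads the byte directly, no String allocation.
func isDash(_ cString: UnsafePointer<CChar>, at offset: Int) -> Bool {
    return UInt8(bitPattern: cString[offset]) == UInt8(ascii: "-")
}

// String(cString:) variant: copies the C string into a String first.
func isDashViaString(_ cString: UnsafePointer<CChar>, at offset: Int) -> Bool {
    let string = String(cString: cString)
    let index = string.index(string.startIndex, offsetBy: offset) // traps if out of range
    return string[index] == "-"
}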

5 Likes

For the record, comparison with UInt8(ascii: "-") or 45 yields no noticeable performance difference.

2 Likes

Looks like UInt8(ascii:) is compiled down to the value, so the executed code is exactly the same.
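For example, this is a quick way to see it (a minimal check, not a proof of the codegen):

let dash = UInt8(ascii: "-") // folds to the constant 45 at compile time
assert(dash == 45)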

1 Like

For the simple, single comparisons shown, I might agree. For anything more complex, the literal syntax is much more readable.

3 Likes

I have yet to form a good opinion on everything here, but I just realized something worth mentioning. You can write this today:

let x = UInt8("8") // result: 0x08

It parses the string and gives you the integer 8. This proposal makes it so you can write a one-character string with single quotes, which means this would become valid:

let x = UInt8('8') // result: 0x08

I assume the type of the literal '8' will be inferred to be String (because this is what the initializer wants), so it'd parse the string and give you the integer 8.

So far so good... but now, if we make UInt8 initializable by a literal directly, notice how the result of this is different:

let x: UInt8 = '8' // result: 0x38

or this:

let x = '8' as UInt8 // result: 0x38

That seems very confusing and error prone to me.

12 Likes

Well spotted. This is picked up and diagnosed as an error in the implementation. You'll no longer be able to write UInt8("<any single digit>"), though I'd love to know why you'd want to! This only applies to literals, not String values.

That's an important point. As mentioned above when @SDGGiesbrecht brought this up, I really think we need this in the standard library:

extension Collection {
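    /// Returns the element `offset` positions from `startIndex`.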
    subscript(offset offset: Int) -> Element {
        return self[index(startIndex, offsetBy: offset)]
    }
}

Otherwise, there will always be cases where it's more convenient to use Arrays instead of Strings, just because of the easier syntax.
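With that extension, the lookup from the top of the thread reads (a sketch, assuming the subscript above):

let string = "2018-04-21"
let c = string[offset: 4] // "-", no manual index math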

That's fair. If you care that much about performance because it's happening suuuuuper often in a loop, then maybe sometimes you want to write it like this. But in that case, it's okay if it needs to be slightly verbose or even outright ugly. That's how someone reading the code knows that it's not semantically clean (different types for numbers and characters), but just something that happens to work correctly the way it is written.

The point is this:

It's good that your variant is ugly, because that ugliness communicates something.

That being said, I actually think that the main performance problem here is copying the string from the cString, so maybe it would be helpful to have this:

extension DatabaseDateComponents {
    init?(cString: UnsafePointer<CChar>) {
        // Hypothetical API: would let us read the cString as a String without copying it
        let components = String.withUnsafeCString(cString) { string -> DatabaseDateComponents? in
            guard string.count >= 5 else {
                return nil
            }

            //... do the other stuff
        }
        guard let components = components else { return nil }
        self = components
    }
}

That is incompatible with SE-0213, which states that T(literal) should have the same behavior as constructing the literal when a type is expressible by that literal.

Well spotted point by @michelf. Adding Unicode scalar literal conformance to integer types would either be source breaking or break SE-0213 (or both, I guess).

4 Likes

Wait, you mean UInt8("8") is going to be a compile-time error but not UInt8("88")?

4 Likes

But there might be existing code that's already doing that.

Also, what about this:

UInt8("10")

Why would that be allowed, but

UInt8("9")

not? Very, very confusing.

Edit: @michelf was quicker

It doesn't look like UInt8 conforms to ExpressibleByStringLiteral, which is what I understood SE-0213 to apply to.

We're talking conformance to ExpressibleByUnicodeScalarLiteral, which going forward will still be expressible using double quotation marks.
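For instance, this compiles today:

let scalar: Unicode.Scalar = "8" // a unicode scalar literal, written with double quotes
print(scalar.value) // 56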

To be explicit, consider:

let x = UInt8("8")
let y = "8" as UInt8

If UInt8 conforms to ExpressibleByUnicodeScalarLiteral, x should be equal to y according to SE-0213.
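To make the conflict concrete, here is roughly what such a conformance could look like (an illustration only, not part of the proposal text):

extension UInt8: ExpressibleByUnicodeScalarLiteral {
    public init(unicodeScalarLiteral value: Unicode.Scalar) {
        self = UInt8(value.value) // traps for scalars above U+00FF
    }
}

// Per SE-0213, UInt8("8") would then short-circuit to the conformance:
let x = UInt8("8")   // 0x38, the scalar's encoding, no longer Optional(8)
let y = "8" as UInt8 // 0x38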

1 Like

In the long run the plan is that it won't, and character literals will separate from string literals, apart from the unavoidable let a: String = 'a'. This problem will go away when compiling Swift 6, and both Int8("8") and Int("88") will give an error. (String literals will no longer look for ExpressibleByUnicodeScalarLiteral inside the type checker; this change would be internal to the compiler.)

Neither the removal of the double quotes nor the removal of these actually useful initializers is part of this proposal, so while it may be somebody's long-term plan it is not the plan of record for Swift.

1 Like

Why? What's wrong with them?

Looks like I need to make a correction: Int8("8") and Int("88") will both work as they did before this proposal after Swift 6, and they will no longer short-circuit to the ExpressibleByUnicodeScalarLiteral initialiser for single-digit literals. Apologies for the mistake.

No, no, no! Please, no one ever do this. Converting bytestrings to String can change the indexing of the bytestring characters, since some of them could be UTF-8 continuation bytes or form grapheme clusters. (These would not be valid ASCII scalars, but it's not like CChar knows that.) At the very minimum, use init(decoding:as:) with the Unicode.ASCII codec. It's disheartening that so many people here who love to trumpet the importance of text encodings pay no heed to them when it actually matters.
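For reference, a minimal example of that decoding (the bytes are made up for illustration):

let bytes: [UInt8] = [50, 48, 49, 56] // "2018" in ASCII
let string = String(decoding: bytes, as: Unicode.ASCII.self)
print(string) // "2018"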

However, as you’ve already discovered, String is exactly the wrong tool for this job, since it does not vend random access subscripts.

4 Likes

However, according to SE-0213, if an end user adds the conformance, then the compiler should be short-circuiting these initializers, and might already do so. This would be a very confusing behavior.

Bottom line is that, with the guarantees of backward compatibility and what's already in the standard library, I don't think that integer types can be made to conform to ExpressibleByUnicodeScalarLiteral without producing some really confusing inconsistent emergent behaviors in the standard library.

Since the whole point here is to help users convert ASCII characters to corresponding integer values, adding some convenience at the cost of adding these footguns does not seem wise.

3 Likes

I agree it's unfortunate, which is why the implementation marks it as an error. If I'm honest, I don't know why the compiler takes this route. There is an IntX.init(_ string: String) initialiser in the stdlib, but it chooses instead to process "88" as an integer literal given half the chance.

It does so because we agreed that it should do so in SE-0213: this is precisely the consequence of proposing to conform an integer to ExpressibleByUnicodeScalarLiteral under those rules, and (besides '1' / '1' and other examples above) another demonstration of why this conformance is unwise.
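Concretely, under the proposed conformance, '1' / '1' would compile and mean integer division of ASCII encodings:

let q: UInt8 = '1' / '1' // hypothetical: 0x31 / 0x31 == 1, not what "dividing digits" suggests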

What this proposal is attempting to state by the conformance is that, semantically, "8" is an accepted literal expression for 56 in all circumstances. The trouble is that, excepting the specific circumstance where you want the ASCII encoding of the character 8, it is not. What the standard library has already included are methods where "8" in fact converts to 8, not 56. By making those conversion initializers unadorned by a label, it has already staked out a claim that in fact that is the usually desired conversion.

5 Likes