Single Quoted Character Literals (Why yes, again)

johnno1962 · December 8, 2022, 5:56pm

I've been looking at the possibility of rehabilitating the half-proposal of three years ago, which would only propose single-quoted literals as an alternative syntax for Character and Unicode.Scalar literals. It's difficult to write the motivation section without mentioning the eventual plan to implement the integer conversions. In which case, why would one review one without the other?

This comes about because there is a subtle difference between this proposal and the one last reviewed: rather than being about distinguishing between String and Character literals, it is closer to the very original pitch, in that it is oriented towards converting these literals to integer constants. One could of course implement '1' as a mere synonym for the integer constant 49, but the proposal seeks to offer more flexibility by using Swift's type inference and the ExpressibleBy protocols, so that the literal can also take on a Character identity (the element type of a String), which some find overly complicated.

I don't know where to go from here. If the review manager would offer to bring to review a revised two step proposal making the case as best I can, I would prepare for that. What I can't do is randomly prepare proposals that never come to review.

xwu · December 8, 2022, 6:09pm

It sounds like you’re not convinced that character literals themselves meet the bar to be accepted as a standalone proposal—this is what the core team suggested could be re-pitched and (if the pitch goes well) reviewed after rejecting the original proposal.

You’re not required to pitch what you don’t believe in; someone else can champion this part, or—if there’s no one who believes in it enough to step up, well then we have the answer.

johnno1962 · December 8, 2022, 6:25pm

It's true I'm not very interested in introducing single-quoted syntax on its own without the integer conversions. There would seem to be very little point, but I will prepare a two-step proposal if this pitch doesn't get ensnared in procedural issues and some support is demonstrated for the idea. I suggest we proceed with this as a pitch/discussion for the idea as described above. If there is support, but splitting is necessary for it to have any prospect of coming to review, it can be split into two proposals at that point rather than being written off at this stage.

michelf · December 8, 2022, 6:49pm

Seems like an impasse.

Maybe the pitch should instead be to change ExpressibleByUnicodeScalarLiteral to a marker protocol (a source breaking change for Swift 6) and then allow private conformances to marker protocols (because no dynamic cast makes that much simpler).

You could then write this anywhere you need it:

private extension UInt8: ExpressibleByUnicodeScalarLiteral {
	init(unicodeScalarLiteral value: UnicodeScalar) {
		self = UInt8(ascii: value)
	}
}

let magic = "A" as UInt8

Presumably there are other situations where it'd be useful for various kinds of literals to have a private meaning, so maybe a pitch for all ExpressibleBySomethingLiteral protocols to become marker protocols would get some traction.
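For comparison, here is a minimal sketch of what can be written today without any protocol changes. It relies on the standard library's `UInt8.init(ascii:)`, which takes a `Unicode.Scalar` and traps at runtime if the scalar is outside the ASCII range. The helper name `ascii(_:)` is hypothetical and only illustrates keeping the shorthand file-local.

```swift
// Available today: UInt8.init(ascii:) accepts a Unicode.Scalar literal
// and traps at runtime if the scalar is not ASCII.
let magic = UInt8(ascii: "A")

// Hypothetical file-private helper: shortens call sites without adding
// a conformance that would be visible to other files or modules.
fileprivate func ascii(_ scalar: Unicode.Scalar) -> UInt8 {
    return UInt8(ascii: scalar)
}

let slash = ascii("/")
```

Unlike the private conformance pitched above, this keeps the conversion explicit at every use site, which is exactly the verbosity the pitch is trying to remove.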

taylorswift · December 8, 2022, 7:14pm

i have largely taken a back seat on this proposal this time around, and i am tremendously grateful to @johnno1962 for investing time in reworking this proposal and continuing to push this forward, because i personally had very little interest in re-engaging with the swift evolution process after getting responses like these on pitches i had made in the past.

we are not the US Senate, and i do not think that “lack of consensus building and advocacy” in and of itself is a valid grounds for opposing an idea. if proposals came pre-packaged with advocacy and consensus, what would be the point of discussing them on swift evolution in the first place?

i recognize that there are specific concrete concerns (e.g. EBCDIC) that merit discussion here and i do not deny their existence or importance. but from what i gather by reading this thread, much of the discussion has consisted of appeals to precedent and assertions without evidence that just because one does not personally agree with an idea, that “there is no support” for it across the entire swift community.

i would encourage participants in this review to speak for themselves, and keep their contributions focused on their own experiences and use-cases, without trying to speak for the community at large or making claims about the relative popularity of various ideas.

taylorswift · December 8, 2022, 7:21pm

i opened up a project i had been working on earlier this week, and stumbled upon this function:

@inlinable public mutating
func next() -> UInt8?
{
    while let digit:UInt8 = self.iterator.next()
    {
        switch digit 
        {
        case 0x30 ... 0x39: return digit      - 0x30
        case 0x61 ... 0x66: return digit + 10 - 0x61
        case 0x41 ... 0x46: return digit + 10 - 0x41
        default:            continue
        }
    }
    return nil
}

i really should have written a docstring for it, because it took me several seconds to remember that this was a method for parsing hex digits.

how much better would it be if it simply looked like this?

@inlinable public mutating
func next() -> UInt8?
{
    while let digit:UInt8 = self.iterator.next()
    {
        switch digit 
        {
        case '0' ... '9':   return digit      - '0'
        case 'a' ... 'f':   return digit + 10 - 'a'
        case 'A' ... 'F':   return digit + 10 - 'A'
        default:            continue
        }
    }
    return nil
}
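As a point of reference, the closest spelling available in current Swift, without single-quoted literals, appears to be `UInt8(ascii:)`. The sketch below is a free function rather than the iterator method above, and the name `hexValue(of:)` is an assumption made for illustration.

```swift
/// Maps an ASCII hex digit byte to its numeric value (0...15),
/// or returns nil if the byte is not a hex digit.
func hexValue(of digit: UInt8) -> UInt8? {
    switch digit {
    case UInt8(ascii: "0") ... UInt8(ascii: "9"):
        return digit - UInt8(ascii: "0")
    case UInt8(ascii: "a") ... UInt8(ascii: "f"):
        return digit - UInt8(ascii: "a") + 10
    case UInt8(ascii: "A") ... UInt8(ascii: "F"):
        return digit - UInt8(ascii: "A") + 10
    default:
        return nil
    }
}
```

This is readable without an ASCII table, but each literal costs an explicit initializer call, which is the ergonomic gap the single-quoted version is meant to close.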

xwu · December 8, 2022, 7:41pm

Modulo optimization issues, I think it'd be best if it looked like this:

UnicodeScalar(digit).properties.numericValue.flatMap { UInt8(exactly: $0) }

bbrk24 · December 8, 2022, 7:42pm

Would numericValue even work here?

  1> ("C" as Unicode.Scalar).properties.numericValue
$R0: Double? = nil

xwu · December 8, 2022, 7:45pm

Whoops, think-o there. With hex conversions, UInt8(String($0), radix: 16) is what we're looking for; Unicode.Scalar.Properties.isASCIIHexDigit would be used to check that it's one of the valid inputs.
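Put together, the corrected suggestion might look roughly like the following. This is only a sketch: the function name `hexValueViaStdlib(_:)` is hypothetical, and it combines the `isASCIIHexDigit` property check with the radix-16 integer initializer mentioned above.

```swift
/// Converts an ASCII hex digit byte to its value using only
/// standard library facilities; returns nil for any other byte.
func hexValueViaStdlib(_ digit: UInt8) -> UInt8? {
    // Unicode.Scalar.init(_: UInt8) is non-failable: every byte maps
    // to a scalar in U+0000...U+00FF.
    let scalar = Unicode.Scalar(digit)
    // Reject everything that is not 0-9, a-f, or A-F.
    guard scalar.properties.isASCIIHexDigit else { return nil }
    // Parse the one-character string as a base-16 integer.
    return UInt8(String(scalar), radix: 16)
}
```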

My overarching point is that this is one of those things for which the Swift standard library should vend an implementation and does, perhaps less ergonomically or performantly at present than ideal.

taylorswift · December 8, 2022, 7:48pm

i assume that @xwu probably meant:

Character.init(Unicode.Scalar.init(digit)).hexDigitValue.flatMap(UInt8.init(exactly:))

i plugged this into godbolt (with -O):

func test1(_ digit:UInt8) -> UInt8?
{
    Character.init(Unicode.Scalar.init(digit)).hexDigitValue.flatMap(UInt8.init(exactly:))
}

func test2(_ digit:UInt8) -> UInt8?
{
    switch digit 
    {
    case 0x30 ... 0x39: return digit      - 0x30
    case 0x61 ... 0x66: return digit + 10 - 0x61
    case 0x41 ... 0x46: return digit + 10 - 0x41
    default:            return nil
    }
}

version 1:

output.test1(Swift.UInt8) -> Swift.UInt8?:
        test    dil, dil
        js      .LBB1_6
        inc     dil
        movzx   eax, dil
        test    eax, eax
        je      .LBB1_3
.LBB1_4:
        bsr     ecx, eax
        xor     ecx, 31
        jmp     .LBB1_5
.LBB1_6:
        movzx   eax, dil
        mov     ecx, eax
        and     ecx, 63
        shl     ecx, 8
        shr     eax, 6
        add     eax, ecx
        add     eax, 33217
        test    eax, eax
        jne     .LBB1_4
.LBB1_3:
        mov     ecx, 32
.LBB1_5:
        push    rbp
        push    r14
        push    rbx
        sub     rsp, 16
        shr     ecx, 3
        mov     esi, 4
        sub     rsi, rcx
        lea     ecx, [8*rsi]
        mov     rdx, -1
        shl     rdx, cl
        not     rdx
        mov     eax, eax
        movabs  rcx, 71775015237779199
        add     rcx, rax
        and     rcx, rdx
        mov     qword ptr [rsp], rcx
        mov     rdi, rsp
        call    ($sSS18_uncheckedFromUTF8ySSSRys5UInt8VGFZ)@PLT
        mov     r14, rdx
        mov     rdi, rax
        mov     rsi, rdx
        call    ($sSJ13hexDigitValueSiSgvg)@PLT
        mov     rbx, rax
        mov     ebp, edx
        mov     rdi, r14
        call    swift_bridgeObjectRelease@PLT
        test    rbx, rbx
        sets    cl
        or      cl, bpl
        cmp     rbx, 256
        setae   al
        mov     edx, 256
        cmovb   rdx, rbx
        or      al, cl
        and     al, 1
        movzx   esi, al
        shl     esi, 8
        movzx   edx, dl
        xor     eax, eax
        test    cl, 1
        cmove   eax, edx
        or      eax, esi
        add     rsp, 16
        pop     rbx
        pop     r14
        pop     rbp
        ret

version 2:


output.test2(Swift.UInt8) -> Swift.UInt8?:
        lea     eax, [rdi - 58]
        cmp     al, -10
        jae     .LBB2_3
        lea     eax, [rdi - 103]
        cmp     al, -6
        jae     .LBB2_4
        lea     eax, [rdi - 71]
        add     dil, -55
        xor     ecx, ecx
        cmp     al, -6
        setb    al
        movzx   edi, dil
        cmovb   edi, ecx
        jmp     .LBB2_6
.LBB2_3:
        add     dil, -48
        jmp     .LBB2_5
.LBB2_4:
        add     dil, -87
.LBB2_5:
        xor     eax, eax
.LBB2_6:
        movzx   ecx, al
        shl     ecx, 8
        movzx   eax, dil
        or      eax, ecx
        ret

so i think version 2 is a clear winner here, and the “performance issues notwithstanding” are actually quite significant.
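Before comparing the two on performance, it may be worth checking that they agree on every input. A small sketch of such a check follows; the function names `viaCharacter` and `viaSwitch` are stand-ins for test1 and test2 above, and the closure form of `flatMap` is used here only to keep overload resolution simple.

```swift
// Stand-in for test1: goes through Character.hexDigitValue.
func viaCharacter(_ digit: UInt8) -> UInt8? {
    return Character(Unicode.Scalar(digit)).hexDigitValue.flatMap { UInt8(exactly: $0) }
}

// Stand-in for test2: matches raw ASCII ranges directly.
func viaSwitch(_ digit: UInt8) -> UInt8? {
    switch digit {
    case 0x30 ... 0x39: return digit      - 0x30
    case 0x61 ... 0x66: return digit + 10 - 0x61
    case 0x41 ... 0x46: return digit + 10 - 0x41
    default:            return nil
    }
}

// Collect every byte value on which the two implementations disagree.
let mismatches = (UInt8.min ... UInt8.max).filter { viaCharacter($0) != viaSwitch($0) }
print("disagreements:", mismatches.count)
```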

xwu · December 8, 2022, 7:50pm

Sure, and we should address that.

taylorswift · December 8, 2022, 8:04pm

i think the reality is this does not scale, because there are always going to be use cases that don’t reach the bar for inclusion in the standard library, but would still benefit from a readable literal syntax.

sometimes it is not needed. when i pitched this in 2018, the compiler was not very smart, and the difference between the following really mattered:

func test1(_ terminal:UInt8) -> Void?
{
    switch Unicode.Scalar.init(terminal) 
    {
    case "/", "\\":     return ()
    default:            return nil
    }
}

func test2(_ terminal:UInt8) -> Void?
{
    switch terminal 
    {
    //    '/'   '\'
    case 0x2f, 0x5c:    return ()
    default:            return nil
    }
}

but today LLVM can figure out they are the same, so i personally plan on rewriting a lot of these functions to use Unicode.Scalar instead of UInt8:

main:
        xor     eax, eax
        ret

output.test1(Swift.UInt8) -> ()?:
        cmp     dil, 47
        setne   cl
        cmp     dil, 92
        setne   al
        and     al, cl
        ret

output.test2(Swift.UInt8) -> ()?:
        jmp     (output.test1(Swift.UInt8) -> ()?)

__swift_reflection_version:
        .short  3

but just because the compiler has gotten better over the past five years does not mean this feature is not needed anymore, because it still trips and falls on more complex cases, and i think unless you are someone who has memorized their ASCII tables, the following is still quite hard to decipher without the comments:

@inlinable public mutating
func next() -> UInt8?
{
    while let digit:UInt8 = self.iterator.next(), digit != 0x3D // '='
    {
        switch digit
        {
        case 0x41 ... 0x5a: // A-Z
            return digit - 0x41
        case 0x61 ... 0x7a: // a-z
            return digit - 0x61 + 26
        case 0x30 ... 0x39: // 0-9
            return digit - 0x30 + 52
        case 0x2b: // +
            return 62
        case 0x2f: // /
            return 63
        default:
            continue
        }
    }
    return nil
}

extension Base64
{
    public
    enum Digits
    {
        public static
        let ascii:[UInt8] =
        [
            0x41,
            0x42,
            0x43,
            0x44,
            0x45,
            0x46,
            0x47,
            0x48,
            0x49,
            0x4a,
            0x4b,
            0x4c,
            0x4d,
            0x4e,
            0x4f,
            0x50,
            0x51,
            0x52,
            0x53,
            0x54,
            0x55,
            0x56,
            0x57,
            0x58,
            0x59,
            0x5a,
            0x61,
            0x62,
            0x63,
            0x64,
            0x65,
            0x66,
            0x67,
            0x68,
            0x69,
            0x6a,
            0x6b,
            0x6c,
            0x6d,
            0x6e,
            0x6f,
            0x70,
            0x71,
            0x72,
            0x73,
            0x74,
            0x75,
            0x76,
            0x77,
            0x78,
            0x79,
            0x7a,
            0x30,
            0x31,
            0x32,
            0x33,
            0x34,
            0x35,
            0x36,
            0x37,
            0x38,
            0x39,
            0x2b,
            0x2f,
        ]
    }
}
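For the lookup table specifically, one workaround that exists today is to build the array from the UTF-8 view of a string literal. The sketch below assumes a hypothetical standalone namespace `Base64Digits` instead of the nested `Base64.Digits` above, and it trades a compile-time array literal for a one-time runtime initialization.

```swift
enum Base64Digits {
    /// The 64 digits of the standard base64 alphabet as ASCII code units,
    /// in encoding order: A-Z, a-z, 0-9, '+', '/'.
    static let ascii: [UInt8] = Array(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".utf8
    )
}
```

This covers tables, but it does not help with the range patterns in the decoder, where individual code units are still needed.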

taylorswift · December 8, 2022, 8:11pm

i also think it’s worth mentioning that just because the compiler can optimize between UInt8 and Unicode.Scalar in a local scope, does not mean that we can use Unicode.Scalar freely in API. one benefit of

func test1(_ terminal:UInt8) -> Void?
{
    switch Unicode.Scalar.init(terminal) 
    {
    case "/", "\\":     return ()
    default:            return nil
    }
}

over

func test3(_ terminal:Unicode.Scalar) -> Void?
{
    switch terminal 
    {
    case "/", "\\":     return ()
    default:            return nil
    }
}

is that test3(_:) needs to be marked @inlinable and potentially @_alwaysEmitIntoClient, whereas test1(_:) can stay in its home binary.

benrimmington · December 8, 2022, 8:41pm

C++20 allows character literals to be prefixed with:

u8'*' for a UTF-8 code unit of type char8_t.
u'*' for a UTF-16 code unit of type char16_t.
U'*' for a UTF-32 code unit of type char32_t.

Would it be technically possible to have prefixed literals in Swift?

For example, U'*' instead of Unicode.Scalar('*').
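For context, the three code-unit widths can already be obtained in current Swift, though not with a prefixed literal. A rough sketch, with variable names chosen purely for illustration:

```swift
// The scalar itself, via the existing double-quoted literal syntax.
let star: Unicode.Scalar = "*"

// UTF-8 code unit (traps if the scalar is not ASCII).
let utf8Unit: UInt8 = UInt8(ascii: star)

// UTF-16 code unit, taken from the string's UTF-16 view.
// Force-unwrapping is safe here because the literal is non-empty.
let utf16Unit: UInt16 = "*".utf16.first!

// UTF-32 code unit, which is simply the scalar value.
let utf32Unit: UInt32 = star.value
```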

Jon_Shier · December 8, 2022, 8:42pm

What if you defaulted the literals to the "ASCII byte" representation and then pulled the conversions into a separate proposal? That seems useful for most of the uses presented here and would be in line with what I would've expected if string or other literals had originally gone through the evolution process.

That said, this does seem like a bit of a catch-22, as most of the usefulness of literals in Swift is their ability to represent multiple types while working alongside type inference. Pretty much all of them, especially nil, would've been pretty useless without the inference and conversions built in.

johnno1962 · December 8, 2022, 9:01pm

Almost everything is technically possible, but that wouldn't be in line with Swift's clean syntax for literals. It also wouldn't be necessary: with the ExpressibleBy conformances, any integer type can take the ASCII value. It was decided during the pitch to restrict it to ASCII values rather than tangle with multiple possible Unicode.Scalar encodings of values outside that range.

johnno1962 · December 8, 2022, 9:07pm

I don't follow you. By ASCII byte do you mean the integer value conversion? I'm tempted to rework the proposal to include the single-quoted alternative syntax, the new marker protocols, and everything except the conformances for the conversions, which could then be offered as a library/Swift package if they never passed review.

beccadax · December 8, 2022, 9:54pm

I was pretty busy yesterday, so I didn't have time to read it properly.

johnno1962:

Motivation

A pain point of using characters in Swift is they lack a first-class literal syntax. Users have to manually coerce string literals to a Character or Unicode.Scalar type using as Character or as Unicode.Scalar, respectively. Having the collection share the same syntax as its element also harms code clarity and makes it difficult to tell if a double-quoted literal is being used as a string or a character in some cases.

While the motivation for distinguishing between String and Character literals mostly consists of ergonomic and readability concerns, doing so would also bring Swift in line with other popular languages which do make this syntactic distinction, and facilitates a subsequent effort to improve support for low-level UInt8/Int8 buffer processing tasks common in parsers and codecs.

In my mind, this proposal does two things:

1. Visibly distinguishes literals that use Unicode.Scalar and Character from those that use String.
2. Adds a feature for specifying an integer literal based on a Unicode scalar.

I'm not sold on #1 yet, but I think I could be convinced if the Motivation section were more persuasive. The first paragraph asserts the existence of various usability problems, but doesn't actually demonstrate them. I'd like to see examples of code that accidentally does the wrong thing or is very difficult to read because we don't have distinct character literals. Basically, remember the first piece of advice given to fiction writers: Show, don't tell.

By contrast, I see a decent amount of value in #2; it's one of those features that most code won't need, but the code that does need it (like the lexer code I linked to yesterday) will use the feature all over the place and greatly benefit from it. However, because it's a new feature, the proposal ought to explore alternatives for handling these use cases. (Should there be a new literal for an array of integral Unicode scalar values? Should there be new string and character types for handling unvalidated, possibly invalid Unicode text instead of treating them as integer arrays?)

I think this may be part of the reason the Core Team recommended splitting the proposal. (Along with the fact that #2 was much more contentious than #1, of course.) These two aspects have to be motivated in very different ways. You don't necessarily have to split the proposal if you're sure that's the wrong move, but if you don't, you'll need to convince the 2023 Language Workgroup that the 2019 Core Team's recommendation was wrong.

This section probably needs to be better organized because it's currently pretty hard to follow. For example, there should be one contiguous subsection explaining that single quotes are well-precedented in other languages and Swift won't need them for anything else, rather than having this scattered around the entire section. You probably also need to write a subsection explicitly discussing the ways arithmetic with character literals is weird, the design features that are supposed to mitigate this weirdness, and why those features are the right ones for the job.

The way this section is written is incredibly confusing—I actually read your protocol hierarchy backwards at first (because inheritance is usually drawn the other way, with more-derived classes below less-derived classes!) and spent half an hour writing increasingly confused critiques of the backwards design. Even now that I've figured that out, though, I still don't quite see how this design is supposed to work, particularly in terms of initializing types that only conform to the marker protocols.

To get everyone on the same page, I strongly recommend you write the actual declarations for the new protocols along with first cuts at their doc comments, and describe any modifications to the semantics of existing protocols. (For instance, which protocols imply support for which syntaxes?) I would also write examples of the code the compiler should generate when a single-quoted literal is used for a type that only conforms to the new protocols. (Just Swift expressions—the equivalent of saying ""a" as Unicode.Scalar lowers to Unicode.Scalar(unicodeScalarLiteral: 97)"). And I would specify which types will gain conformances to which protocols.

I would also consider whether you really want all of the consequences of using marker protocols for the new features. In particular, it's currently possible to constrain a generic parameter on, say, ExpressibleByExtendedGraphemeClusterLiteral and then use the double-quoted literal syntax with that generic type. Will that be possible with the new marker protocols? If not, should we be okay with that?
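To make the generic case concrete, here is a minimal sketch of the kind of code that compiles today and whose fate under marker protocols is in question. The function name `makeLetterA()` is hypothetical.

```swift
// A generic function that builds its result from a double-quoted
// literal, relying only on the literal protocol constraint.
func makeLetterA<T: ExpressibleByExtendedGraphemeClusterLiteral>() -> T {
    return "a"
}

// The same literal yields different types depending on inference.
let character: Character = makeLetterA()
let string: String = makeLetterA()
```

If the new protocols carry no initializer requirement, it is unclear what the body of such a function would lower to for an arbitrary `T`, which is the concern raised above.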

Punting this question was appropriate in 2019, but at this point, we are allowing proposals to specify breaking changes that will be implemented in the Swift 6 language mode. Do you think we should force use of single quotes in Swift 6? If so, can you convince us the source break is worth making? (If not, are the problems with double-quoted character literals really severe enough to justify the proposal at all?) And if we are breaking this in Swift 6, should we deprecate it in Swift 5?

beccadax · December 8, 2022, 10:10pm

I appreciate that evolving the language by writing proposals is time-consuming and the results are frustratingly uncertain. I've experienced evolution heartbreak myself. But at the end of the day, "consensus building and advocacy" are just fancy words for "convincing people you're right". In an all-volunteer organization, nobody is assigned to do that for you, so as the authors of a proposal you're ultimately going to have to be the ones making sure people understand its merits and feel their concerns have been taken into account.

A big part of that is the technical work—designing, implementing, and clearly documenting your change. Good technical work makes it much easier to advocate, because when people look at it, they can see that it's solid stuff. But it's not enough on its own.

xwu · December 8, 2022, 10:30pm

It's always been said that rejected proposals are rejected; there isn't an expiration date on that.

In this case, the 2019 core team made room for a specific subset of the rejected proposal to be pitched separately with a path to a new review. This draft, however, has only a "subtle difference [to] that last reviewed," and that proposal has already been rejected. Put plainly, I am not sure that from a procedural standpoint there is a path to review, even if the workgroup were entirely convinced that the core team was wrong. As per our workgroup charter, decision-making is delegated from the core team to the workgroup and our decisions can be overruled by the core team. It would follow that it is not within our remit to reconsider the core team's existing rejection.

I mention this because @beccadax's advice is, in my view, very good; however, it does show that there is a significant amount of design work, consensus building, and iteration ahead. I would hate to see much energy poured into the technical aspect of this work but in a form that is not reviewable. It's certainly not the goal of the Swift Evolution process to throw up make-work barriers, but it is still a process; as with all processes, form and procedure still matter.