Single Quoted Character Literals (Why yes, again)

beccadax · December 8, 2022, 9:54pm

I was pretty busy yesterday, so I didn't have time to read it properly.

johnno1962:

Motivation

A pain point of using characters in Swift is they lack a first-class literal syntax. Users have to manually coerce string literals to a Character or Unicode.Scalar type using as Character or as Unicode.Scalar, respectively. Having the collection share the same syntax as its element also harms code clarity and makes it difficult to tell if a double-quoted literal is being used as a string or a character in some cases.

While the motivation for distinguishing between String and Character literals mostly consists of ergonomic and readability concerns, doing so would also bring Swift in line with other popular languages which do make this syntactic distinction, and facilitates a subsequent effort to improve support for low-level UInt8/Int8 buffer processing tasks common in parsers and codecs.

In my mind, this proposal does two things:

Visibly distinguishes literals that use Unicode.Scalar and Character from those that use String.
Adds a feature for specifying an integer literal based on a Unicode scalar.

I'm not sold on #1 yet, but I think I could be convinced if the Motivation section were more persuasive. The first paragraph asserts the existence of various usability problems, but doesn't actually demonstrate them. I'd like to see examples of code that accidentally does the wrong thing or is very difficult to read because we don't have distinct character literals. Basically, remember the first piece of advice given to fiction writers: Show, don't tell.

By contrast, I see a decent amount of value in #2; it's one of those features that most code won't need, but the code that does need it (like the lexer code I linked to yesterday) will use the feature all over the place and greatly benefit from it. However, because it's a new feature, the proposal ought to explore alternatives for handling these use cases. (Should there be a new literal for an array of integral Unicode scalar values? Should there be new string and character types for handling unvalidated, possibly invalid Unicode text instead of treating them as integer arrays?)

I think this may be part of the reason the Core Team recommended splitting the proposal. (Along with the fact that #2 was much more contentious than #1, of course.) These two aspects have to be motivated in very different ways. You don't necessarily have to split the proposal if you're sure that's the wrong move, but if you don't, you'll need to convince the 2023 Language Workgroup that the 2019 Core Team's recommendation was wrong.

This section probably needs to be better organized because it's currently pretty hard to follow. For example, there should be one contiguous subsection explaining that single quotes are well-precedented in other languages and Swift won't need them for anything else, rather than having this scattered around the entire section.You probably also need to write a subsection explicitly discussing the ways arithmetic with character literals is weird, the design features that are supposed to mitigate this weirdness, and why those features are the right ones for the job.

The way this section is written is incredibly confusing—I actually read your protocol hierarchy backwards at first (because inheritance is usually drawn the other way, with more-derived classes below less-derived classes!) and spent half an hour writing increasingly confused critiques of the backwards design. Even now that I've figured that out, though, I still don't quite see how this design is supposed to work, particularly in terms of initializing types that only conform to the marker protocols.

To get everyone on the same page, I strongly recommend you write the actual declarations for the new protocols along with first cuts at their doc comments, and describe any modifications to the semantics of existing protocols. (For instance, which protocols imply support for which syntaxes?) I would also write examples of the code the compiler should generate when a single-quoted literal is used for a type that only conforms to the new protocols. (Just Swift expressions—the equivalent of saying ""a" as Unicode.Scalar lowers to Unicode.Scalar(unicodeScalarLiteral: 97)"). And I would specify which types will gain conformances to which protocols.

I would also consider whether you really want all of the consequences of using marker protocols for the new features. In particular, it's currently possible to constrain a generic parameter on, say, ExpressibleByExtendedGraphemeClusterLiteral and then use the double-quoted literal syntax with that generic type. Will that be possible with the new marker protocols? If not, should we be okay with that?

Punting this question was appropriate in 2019, but at this point, we are allowing proposals to specify breaking changes that will be implemented in the Swift 6 language mode. Do you think we should force use of single quotes in Swift 6? If so, can you convince us the source break is worth making? (If not, are the problems with double-quoted character literals really severe enough to justify the proposal at all?) And if we are breaking this in Swift 6, should we deprecate it in Swift 5?