Single Quoted Character Literals (Why yes, again)

i also think it’s worth mentioning that just because the compiler can optimize between UInt8 and Unicode.Scalar in a local scope, does not mean that we can use Unicode.Scalar freely in API. one benefit of

func test1(_ terminal:UInt8) -> Void?
{
    switch Unicode.Scalar.init(terminal) 
    {
    case "/", "\\":     return ()
    default:            return nil
    }
}

over

func test3(_ terminal:Unicode.Scalar) -> Void?
{
    switch terminal 
    {
    case "/", "\\":     return ()
    default:            return nil
    }
}

is that test3(_:) needs to be marked @inlinable and potentially @_alwaysEmitIntoClient, whereas test1(_:) can stay in its home binary.


C++20 allows character literals to be prefixed with:

  • u8'*' for a UTF-8 code unit of type char8_t.
  • u'*' for a UTF-16 code unit of type char16_t.
  • U'*' for a UTF-32 code unit of type char32_t.

Would it be technically possible to have prefixed literals in Swift?

For example, U'*' instead of Unicode.Scalar('*').


What if you made the literals default to the "ASCII byte" representation and then pulled the conversions into a separate proposal? That seems useful for most of the uses presented here and would be in line with what I would've expected if string or other literals had originally gone through the evolution process.

That said, this does seem like a bit of a catch-22, as most of the usefulness of literals in Swift is their ability to represent multiple types while working alongside type inference. Pretty much all of them, especially nil, would've been pretty useless without the inference and conversions built in.

Almost everything is technically possible, but that wouldn't be in line with Swift's clean syntax for literals. It also wouldn't be necessary, as with the ExpressibleBy conformances any integer can take the ASCII value. It was decided during the pitch to restrict it to ASCII values rather than tangle with multiple possible Unicode.Scalar encodings of values outside that range.
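For comparison, here is a rough sketch of how the three C++ prefixed forms can already be spelled in current Swift, without any new literal syntax. The variable names are only illustrative and are not part of the pitch.

```swift
// Current Swift spellings for a single code unit or scalar.
let utf8Unit: UInt8 = UInt8(ascii: "*")       // traps at run time if the scalar is not ASCII
let utf16Unit: UInt16 = "*".utf16.first!      // first UTF-16 code unit of a one-character String
let scalar: Unicode.Scalar = "*"              // double-quoted literal, typed by context

print(utf8Unit, utf16Unit, scalar.value)      // 42 42 42
```

The verbosity of the first two spellings is the kind of thing the integer conformances in the pitch are meant to remove.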


I don't follow you. By ASCII byte do you mean the integer value conversion? I'm tempted to rework the proposal to include the single-quoted alternative syntax, the new marker protocols, and everything except the conformances for the conversions, which you could then offer as a library/Swift Package if that never passed review.

I was pretty busy yesterday, so I didn't have time to read it properly. 🙂

In my mind, this proposal does two things:

  1. Visibly distinguishes literals that use Unicode.Scalar and Character from those that use String.

  2. Adds a feature for specifying an integer literal based on a Unicode scalar.

I'm not sold on #1 yet, but I think I could be convinced if the Motivation section were more persuasive. The first paragraph asserts the existence of various usability problems, but doesn't actually demonstrate them. I'd like to see examples of code that accidentally does the wrong thing or is very difficult to read because we don't have distinct character literals. Basically, remember the first piece of advice given to fiction writers: Show, don't tell.

By contrast, I see a decent amount of value in #2; it's one of those features that most code won't need, but the code that does need it (like the lexer code I linked to yesterday) will use the feature all over the place and greatly benefit from it. However, because it's a new feature, the proposal ought to explore alternatives for handling these use cases. (Should there be a new literal for an array of integral Unicode scalar values? Should there be new string and character types for handling unvalidated, possibly invalid Unicode text instead of treating them as integer arrays?)

I think this may be part of the reason the Core Team recommended splitting the proposal. (Along with the fact that #2 was much more contentious than #1, of course.) These two aspects have to be motivated in very different ways. You don't necessarily have to split the proposal if you're sure that's the wrong move, but if you don't, you'll need to convince the 2023 Language Workgroup that the 2019 Core Team's recommendation was wrong.

This section probably needs to be better organized because it's currently pretty hard to follow. For example, there should be one contiguous subsection explaining that single quotes are well-precedented in other languages and Swift won't need them for anything else, rather than having this scattered around the entire section. You probably also need to write a subsection explicitly discussing the ways arithmetic with character literals is weird, the design features that are supposed to mitigate this weirdness, and why those features are the right ones for the job.

The way this section is written is incredibly confusing—I actually read your protocol hierarchy backwards at first (because inheritance is usually drawn the other way, with more-derived classes below less-derived classes!) and spent half an hour writing increasingly confused critiques of the backwards design. Even now that I've figured that out, though, I still don't quite see how this design is supposed to work, particularly in terms of initializing types that only conform to the marker protocols.

To get everyone on the same page, I strongly recommend you write the actual declarations for the new protocols along with first cuts at their doc comments, and describe any modifications to the semantics of existing protocols. (For instance, which protocols imply support for which syntaxes?) I would also write examples of the code the compiler should generate when a single-quoted literal is used for a type that only conforms to the new protocols. (Just Swift expressions—the equivalent of saying ""a" as Unicode.Scalar lowers to Unicode.Scalar(unicodeScalarLiteral: 97)"). And I would specify which types will gain conformances to which protocols.

I would also consider whether you really want all of the consequences of using marker protocols for the new features. In particular, it's currently possible to constrain a generic parameter on, say, ExpressibleByExtendedGraphemeClusterLiteral and then use the double-quoted literal syntax with that generic type. Will that be possible with the new marker protocols? If not, should we be okay with that?
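To make the generic case concrete, here is a minimal sketch of what is possible today with the existing double-quoted syntax. The function name `letterA` is made up for illustration.

```swift
// A generic function constrained only on the grapheme-cluster literal protocol.
func letterA<T: ExpressibleByExtendedGraphemeClusterLiteral>() -> T {
    return "a"   // the double-quoted literal is type-checked as T
}

let c: Character = letterA()
let s: String = letterA()
print(c, s)      // a a
```

The open question is whether an equivalent function could be written with a single-quoted literal if the new protocols carry no requirements.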

Punting this question was appropriate in 2019, but at this point, we are allowing proposals to specify breaking changes that will be implemented in the Swift 6 language mode. Do you think we should force use of single quotes in Swift 6? If so, can you convince us the source break is worth making? (If not, are the problems with double-quoted character literals really severe enough to justify the proposal at all?) And if we are breaking this in Swift 6, should we deprecate it in Swift 5?


I appreciate that evolving the language by writing proposals is time-consuming and the results are frustratingly uncertain. I've experienced evolution heartbreak myself. But at the end of the day, "consensus building and advocacy" are just fancy words for "convincing people you're right". In an all-volunteer organization, nobody is assigned to do that for you, so as the authors of a proposal you're ultimately going to have to be the ones making sure people understand its merits and feel their concerns have been taken into account.

A big part of that is the technical work—designing, implementing, and clearly documenting your change. Good technical work makes it much easier to advocate, because when people look at it, they can see that it's solid stuff. But it's not enough on its own.


It's always been said that rejected proposals are rejected; there isn't an expiration date on that.

In this case, the 2019 core team made room for a specific subset of the rejected proposal to be pitched separately with a path to a new review. However, a draft proposal with a "subtle difference [to] that last reviewed," which has already been rejected—put plainly, I am not sure that from a procedural standpoint there is a path to review even if the workgroup were entirely convinced that the core team was wrong. As per our workgroup charter, decision-making is delegated from the core team to the workgroup and our decisions can be overruled by the core team. It would follow that it is not within our remit to reconsider the core team's existing rejection.

I mention this because @beccadax's advice is, in my view, very good; however, it does show that there is a significant amount of design work, consensus building, and iteration ahead. I would hate to see much energy poured into the technical aspect of this work but in a form that is not reviewable. It's certainly not the goal of the Swift Evolution process to throw up make-work barriers, but it is still a process; as with all processes, form and procedure still matter.

I wouldn’t necessarily even call this an issue of form or procedure. It’s a matter of listening to the decision makers and then working with them and others on an improved proposal. That’s just called “working in a group”.

As @beccadax elaborated, no decisions—not even technical ones—are made based solely on evaluation of blind submissions. Decisionmaking is a social process in any organization, and it’s usually counterproductive to dismiss the feedback of those whom the organization has vested with decisionmaking authority. The more productive approaches are to either incorporate that feedback, or to gather sufficient support from trusted voices in the organization to lobby against that feedback.


Hear, hear. This pitch has been in development for 4 years. One wonders where the finish line is.

unless i am misunderstanding the proposal in its newest iteration, @_marker protocols cannot declare requirements, so user-defined types cannot implement ExpressibleBySingleQuotedLiteral alone; the conformances for Unicode.Scalar, Character, UInt8, etc would have to be baked into the compiler, or rely on ExpressibleByUnicodeScalarLiteral.

from what i recall during the first review, one of the more widespread criticisms of the original proposal was this:

One concern raised during the review was that because ExpressibleByStringLiteral refines ExpressibleByExtendedGraphemeClusterLiteral, then type context will allow expressions like 'x' + 'y' == "xy".

which does not coexist happily with 'x' + 'y' == 241.
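The two readings can be approximated in current Swift, with double quotes and an explicit `UInt8(ascii:)` conversion standing in for the pitched single-quoted literals. This is only a sketch of the conflict, not of the proposed syntax.

```swift
// In a String context, the two literals are Strings and `+` concatenates.
let concatenated: String = "x" + "y"
print(concatenated == "xy")    // true

// With explicit ASCII conversion, `+` is integer addition: 120 + 121.
let summed: UInt8 = UInt8(ascii: "x") + UInt8(ascii: "y")
print(summed)                  // 241
```

With single-quoted literals expressing both String and UInt8, the same spelling `'x' + 'y'` would pick one of these meanings purely from type context.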

with that in mind, could we simply create a new, unrelated hierarchy for ExpressibleByCharacterLiteral? (which is a serendipitously unclaimed name in the standard library.)

@_marker
protocol _ExpressibleByBuiltinCharacterLiteral
{
}

extension Unicode.Scalar:_ExpressibleByBuiltinCharacterLiteral {}
extension Character:_ExpressibleByBuiltinCharacterLiteral {}

protocol ExpressibleByASCIILiteral
{
    init(asciiLiteral:UInt8)
}
protocol ExpressibleByCharacterLiteral:ExpressibleByASCIILiteral
{
    associatedtype CharacterLiteralType:_ExpressibleByBuiltinCharacterLiteral
    init(characterLiteral:CharacterLiteralType)
}
extension ExpressibleByCharacterLiteral
    where CharacterLiteralType == Unicode.Scalar
{
    init(asciiLiteral:UInt8)
    {
        self.init(characterLiteral: .init(asciiLiteral))
    }
}
extension ExpressibleByCharacterLiteral
    where CharacterLiteralType == Character
{
    init(asciiLiteral:UInt8)
    {
        self.init(characterLiteral: .init(.init(asciiLiteral)))
    }
}
extension UInt8:ExpressibleByASCIILiteral
{
    init(asciiLiteral:UInt8) { self = asciiLiteral }
}
extension Unicode.Scalar:ExpressibleByCharacterLiteral
{
    init(characterLiteral:Self) { self = characterLiteral }
}
extension Character:ExpressibleByCharacterLiteral
{
    init(characterLiteral:Self) { self = characterLiteral }
}

the key thing to note here is that String does not conform to ExpressibleByCharacterLiteral. so we would not have the situation where 'x' + 'y' == "xy" can occur.

ExpressibleByExtendedGraphemeClusterLiteral and ExpressibleByUnicodeScalarLiteral could then continue to exist unchanged with the double-quoted syntax, and the language could deprecate them at whatever pace people are comfortable with, which may very well be “never”.


behavioral changes i can foresee:

Basic type identities

('€')                   → ('€' as Character)

// compilation error
('€' as String)         → Never 

("1" + "1")             → ("11" as String)

// compilation error, because `+ (lhs:String, rhs:Character)` does not exist
("1" + '€')             → Never 

// compilation error, because `+ (lhs:Character, rhs:Character)` does not exist
('1' + '1' as String)   → Never

// compilation error, because `UInt8` is not implicitly convertible to `Int`
('1' + '1' as Int)      → Never

Initializers of integers

Int.init("0123")        → (123 as Int?)
// compilation error, because `Int.init(_:Character)` does not exist
// compilation error, because `Int.init(_:Unicode.Scalar)` does not exist
// compilation error, because `Int.init(_:UInt8)` exists but `'€'` is not ASCII
Int.init('€')           → Never

Int.init('3')           → Int.init(51 as UInt8) → (51 as Int)
(['a', 'b'] as [Int8])  → ([97, 98] as [Int8])

More arithmetic

('a' + 1)           → (98 as UInt8)
('b' - 'a' + 10)    → (11 as UInt8)
// runtime error, from integer overflow
('a' * 'b')         → Never
("123".firstIndex(of: '2')) → (String.Index.init(_rawBits: 65799) as String.Index?)

TBH I never had a problem with that. To me it seems logical. Strings are made up of Characters concatenated; a Character is itself a (short) String. I don't recommend implementing a new protocol hierarchy just to avoid this. There is no reason this change should affect ABI.

I would accept it is more problematic if you're not expecting it, but it is difficult to avoid if one wants to offer other, more useful forms of code point arithmetic. Having both existing at the same time depending on type context is confusing if you seek out problems, but the simple case of integer conversions is at least simple.

this does not exist in the standard library today, "x" + "y" desugars to ("x" as String) + ("y" as String).

the following is not valid swift:

let xy:String = ("x" as Character) + ("y" as Character)
// cannot convert value of type 'Character' to expected argument type 'String'

You can't convert a Character to String but a Character literal can express a String. In the case of 'x' + 'y', the literals are both expressing strings as that is the operator that is available.
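The distinction between a literal and a value can be seen in a few lines of current Swift. This is a small sketch with illustrative names.

```swift
let x: Character = "x"
let y: Character = "y"

// Values of type Character must be converted explicitly before concatenation.
let fromValues = String(x) + String(y)

// Literals carry no type of their own, so here both are inferred as String.
let fromLiterals: String = "x" + "y"

print(fromValues == fromLiterals)   // true
```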

which is why i suggested not injecting things into the root of the ExpressibleByUnicodeScalarLiteral hierarchy, and instead having a parallel hierarchy that String would not conform to.

i think it makes sense that

static func + (lhs:Character, rhs:Character) -> String

does not exist, because it is analogous to

static func + (lhs:Int, rhs:Int) -> [Int]

moreover, i anticipate that people would not like

Int.init('3') → Int.init(51 as UInt8) → (51 as Int)

if '3' were capable of expressing a String, because we would expect Int.init("3") to return 3 as Int and not 51 as Int.
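The two results being contrasted can be reproduced today, again using `UInt8(ascii:)` as a stand-in for the pitched integer conversion:

```swift
// Existing failable initializer that parses decimal text.
let parsed = Int("3")                    // Optional(3)

// Explicit ASCII conversion yields the code point instead.
let codePoint = Int(UInt8(ascii: "3"))   // 51

print(parsed == 3, codePoint)            // true 51
```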

This is the difference in emphasis between the new pitch and the old. Single-quoted literals now need to be considered more integer-like than character-like, though they can still take that role, which is also why breaking the proposal in two will present problems.

Separating the hierarchy and introducing new types or protocols, i.e. breaking ABI, will require people to update their users' operating systems before they can use the feature, which I'm keen to avoid.

I believe ASCII arithmetic should be implemented not on UInt8, but on a dedicated ASCII type, so that we can write 'x' + 'y' == ASCII(241), which reads more clearly.

You could then write code like this.

@inlinable public mutating
func next() -> UInt8?
{
    while let digit = self.iterator.next()
    {
        let asciiCharacter = ASCII(digit)
        switch asciiCharacter 
        {
        case '0' ... '9':   return asciiCharacter      - '0'
        case 'a' ... 'f':   return asciiCharacter + 10 - 'a'
        case 'A' ... 'F':   return asciiCharacter + 10 - 'A'
        default:            continue
        }
    }
    return nil
}
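For reference, roughly the same hex-digit logic can be written in current Swift on plain `UInt8`, at the cost of spelling out `UInt8(ascii:)` everywhere. This is a sketch with a hypothetical `hexValue(of:)` helper; unlike the iterator above, it returns nil for non-digits instead of skipping them.

```swift
/// Returns the numeric value of an ASCII hexadecimal digit, or nil for any other byte.
func hexValue(of byte: UInt8) -> UInt8? {
    switch byte {
    case UInt8(ascii: "0") ... UInt8(ascii: "9"): return byte - UInt8(ascii: "0")
    case UInt8(ascii: "a") ... UInt8(ascii: "f"): return byte - UInt8(ascii: "a") + 10
    case UInt8(ascii: "A") ... UInt8(ascii: "F"): return byte - UInt8(ascii: "A") + 10
    default:                                      return nil
    }
}

let digits = "7fZ".utf8.map(hexValue(of:))
print(digits)   // [Optional(7), Optional(15), nil]
```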

i think that making single-quoted literals more integer-like necessitates making them less string-like, because trying to have them do both is generating difficulties that @xwu and @beccadax have highlighted.

and i think that is okay, because we do not currently consider 1 and [1] to be interchangeable, and the fact that "1" as Character looks like "1" as String is a historical oddity that arose from Swift using the " delimiter for both types.

which i think is a huge argument in favor of using ' for Character, Unicode.Scalar, and something that would unblock expressing UInt8 with them.

if we want single-quoted literals to be string-like, then Int.init('3') will become a problem because:

  1. T.init(_:T) disregards default literal inference, and infers T (SE-0213)

  2. we want Int to be expressible by '3'

  3. we want String to be expressible by '3'

  4. we already have Int.init(_:String)

these four things cannot all be true at the same time, and 1 and 4 are a fact of the language.
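Point 1 can be observed with integer literals in current Swift. The snippet below is only a sketch of that existing rule, not of the pitched syntax:

```swift
// The literal is type-checked as UInt64 directly, so this compiles.
// Under default inference it would be an Int literal, which cannot hold this value.
let big = UInt64(0xffff_ffff_ffff_ffff)
print(big == UInt64.max)   // true
```

By the same rule, if Int were expressible by '3', then Int.init('3') would be read as '3' as Int rather than going through any String-taking initializer.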

in my view, 3 is not needed, and is not really consistent with the first of the two major goals of the proposal, which is to have a separate syntax for characters that does not collide with the one we use for strings, and it actively undermines the second major goal of the proposal (integer coercions).

so i really think we could reach a broader “consensus” if we just accept that we will write

let x:String = .init('a')

the same way we write

let x:[Int] = [1]

this was floated during the first review, but having an ASCII type would create more problems than it would solve, because it would not be compatible with:

  • String.UTF8View
  • UnsafeBufferPointer<UInt8>
  • UnsafeRawBufferPointer

all of which have an element type of UInt8, and which would exclude 99% of the cases where ASCII literals would be used.

moreover ASCII implies that the codepoint is less than 0x80, which makes it impossible to inter-operate with UTF-8 strings.
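As a small illustration of why the element type matters, here is how a plain `UInt8` value is used against a UTF-8 view and a byte buffer in current Swift. The names `path` and `slash` are illustrative.

```swift
let path = "usr/local/bin"
let slash = UInt8(ascii: "/")

// String.UTF8View has Element == UInt8, so a UInt8 value can be compared directly.
let separators = path.utf8.filter { $0 == slash }.count

// The same comparison works against a raw byte buffer.
let bytes: [UInt8] = Array(path.utf8)
let firstSlash = bytes.withUnsafeBufferPointer { $0.firstIndex(of: slash) }

print(separators, firstSlash ?? -1)   // 2 3
```

A separate ASCII wrapper type would need a conversion at each of these comparison sites.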


What use case is served by adding two character literals and getting an integer back?

The way I see it, characters are a lot like dates—it makes sense to talk about advancing or backtracking a character by some distance:

let f = 'a' + 5
let t = 'z' - 6

and to compute the distance between two characters:

print('f' - 'a')  // 5

Ranges and comparisons naturally fall out of these relationships, as well.
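A rough sketch of that date-like model in current Swift follows. The helpers `advance(_:by:)` and `distance(from:to:)` are hypothetical names, and they are restricted to ASCII purely to keep the example short.

```swift
// A "point" plus a distance gives another point; nil if the result leaves ASCII.
func advance(_ scalar: Unicode.Scalar, by offset: Int) -> Unicode.Scalar? {
    guard scalar.isASCII else { return nil }
    let value = Int(scalar.value) + offset
    guard (0 ..< 128).contains(value) else { return nil }
    return Unicode.Scalar(UInt8(value))
}

// The difference between two points is a plain integer distance.
func distance(from start: Unicode.Scalar, to end: Unicode.Scalar) -> Int {
    return Int(end.value) - Int(start.value)
}

let a: Unicode.Scalar = "a"
let f: Unicode.Scalar = "f"

print(advance(a, by: 5) == f)       // true
print(distance(from: a, to: f))     // 5
```

Note that nothing in this model defines adding two scalars together, which is the operation being questioned here.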

But like adding two dates, 'x' + 'y' for two arbitrary characters seems nonsensical, only occurring as a side effect of an integer conversion. But what does that integer result actually mean, and why would we ever want to support it? Even if we supported single character literals and the operations I mentioned above, I don't think we want Swift users to be writing 'x' + 'y' in their own code.
