Single Quoted Character Literals (Why yes, again)

johnno1962 · December 7, 2022, 5:32pm

Hello again S/E,

I'd like to re-pitch, one last time, introducing Single Quoted character literals as an alternative syntax for literals that are intended to be a single Character or Unicode.Scalar. The motivation for this, as ever, is twofold. First, to bring Swift into line with many other C-style languages where there is a separate syntax for Strings and their constituent atoms (in Swift's case Character). Second, to facilitate a convenient conversion between ASCII character literals and integer values for low-level programming. This is the second pitch of an idea that had a well-subscribed pitch and an active review, but which was rejected with the suggestion that it be broken up into two proposals. This was done, but the result languished as a swift-evolution PR and eventually timed out.

This re-pitch comes with a revitalised implementation that resolves the issues which came up in the first review by using "marker protocols" to avoid any potential ABI stability issues and to gate the integer convertibility feature to only ASCII character literals. A toolchain for evaluation is available here.

Single Quoted Character Literals

Proposal: SE-XXXX
Authors: Kelvin Ma (“Taylor Swift”), John Holdsworth
Review manager: Ben Cohen
Status: Pending second review
Implementation: Single Quoted Literals. by johnno1962 · Pull Request #61477 · apple/swift · GitHub
Threads: 1 2

Introduction

Swift emphasizes a Unicode-correct definition of what constitutes a Character, but unlike most common programming languages, Swift does not have a dedicated syntax for Character literals. Instead, three overlapping “ExpressibleBy” protocols and Swift’s type inference come together to produce a syntax where a double quoted string literal can take the role of a String, Character, or Unicode.Scalar value depending on its content and the expression context.

This proposal assigns an alternative syntax for Character and Unicode.Scalar values, using single quote (') delimiters. This change solely affects the type inference of single literals, and does not seek to change the current compiler validation behaviour for these constructs.

Motivation

A pain point of using characters in Swift is they lack a first-class literal syntax. Users have to manually coerce string literals to a Character or Unicode.Scalar type using as Character or as Unicode.Scalar, respectively. Having the collection share the same syntax as its element also harms code clarity and makes it difficult to tell if a double-quoted literal is being used as a string or a character in some cases.
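As a point of reference, here is a minimal sketch of the status quo described above, written for a current Swift 5 compiler (no single-quoted syntax involved). The constant names are illustrative only and do not come from the proposal.

```swift
// A double-quoted literal with no type context defaults to String.
let inferred = "a"

// Getting a Character or Unicode.Scalar requires an explicit coercion
// (or a type annotation).
let character = "a" as Character
let scalar = "a" as Unicode.Scalar

// Here the same double-quoted spelling is silently a Character, because
// firstIndex(of:) on String takes the collection's Element type.
let commaIndex = "a,b".firstIndex(of: ",")

print(type(of: inferred), type(of: character), type(of: scalar))
```

Nothing at the call site of firstIndex(of:) tells the reader that "," is a Character rather than a String, which is the readability concern raised in this section.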

While the motivation for distinguishing between String and Character literals mostly consists of ergonomic and readability concerns, doing so would also bring Swift in line with other popular languages which do make this syntactic distinction, and would facilitate a subsequent effort to improve support for low-level UInt8/Int8 buffer processing tasks common in parsers and codecs.

Proposed solution

We propose to adopt 'x' as an alternative syntax for all textual literal types up to and including ExtendedGraphemeClusterLiteral, but not including StringLiteral. These literals will be used to express Character, Unicode.Scalar, and types like Unicode.UTF16.CodeUnit in the standard library (a.k.a. UInt16). These literals would have a default type of Character, as Character is the preferred element type of String. In addition, where the character literal is a single ASCII code point, conversions to an integer value are made available using a new ExpressibleByASCIILiteral conformance in the standard library.

Use of single quotes for character/scalar literals is highly precedented in other languages, including C, Objective-C, C++, Java, Elm, and Rust, although different languages have slightly differing ideas about what a “character” is. We choose to use the single quote syntax specifically because it reinforces the notion that strings and character values are different: the former is a sequence, the latter is an element (though a single element can itself be a String). Character types also don’t support string literal interpolation and can be optimized, which is another reason to move away from double quotes.

Advantages for a developer to migrate to the single quote distinction:

Differentiate in the source when a literal is intended to be used in a Character or Unicode.Scalar context as opposed to String
Distinct default type of Character making available that type's methods and properties.

Improvements to the new implementation over that previously reviewed:

Single-quoted literals have their own new ExpressibleBy marker protocols preventing source breaking changes to the use of double quoted literals in existing source.
A distinct protocol for ASCII literals further ensures the more contentious integer conversions are only available for literals that are a single ASCII codepoint.

Example usage

Some expressions using single quoted literal syntax, their value and their type:

Basic type identities

	'€' // >€< Character
	'€' as String // >€< String
	// Literal "arithmetic"
	"1"+"1" // >11< String
	"1"+'€' // >1€< String
	'1'+'1' as String // >11< String
	'1'+'1' as Int // >98< Int

Initializers of integers

	Int("0123") as Any // >Optional(123)< Optional<Int>
	Int('€') as Any // >nil< Optional<Int>
	Int('3') // >51< Int
	['a', 'b'] as [Int8] // >[97, 98]< Array<Int8>

More arithmetic

	'a' + 1 // >98< Int
	'b' - 'a' + 10 // >11< Int
	// difficult to avoid allowing
	'a' * 'b' as Int8 // overflows at compilation
	"123".firstIndex(of: '2') as Any 
		// >Optional(Swift.String.Index(_rawBits: 65799))< Optional<Index>
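For readers who want to check the integer results above against a released compiler, the same values can be reproduced today with the existing UInt8(ascii:) initializer. This is only a sketch of the equivalent arithmetic, not the proposed syntax, and the constant names are illustrative.

```swift
// ASCII code units obtained with the existing standard library API.
let codeOfA = Int(UInt8(ascii: "a"))    // 97
let codeOfB = Int(UInt8(ascii: "b"))    // 98
let codeOfOne = Int(UInt8(ascii: "1"))  // 49

// Counterparts of '1'+'1' as Int, 'a' + 1 and 'b' - 'a' + 10.
let sumOfOnes = codeOfOne + codeOfOne   // 98
let next = codeOfA + 1                  // 98
let offset = codeOfB - codeOfA + 10     // 11

// 97 * 98 = 9506 does not fit in Int8, so the product overflows.
// The reporting variant is used here to avoid a trap at run time.
let product = Int8(codeOfA).multipliedReportingOverflow(by: Int8(codeOfB))

print(sumOfOnes, next, offset, product.overflow)
```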

Subtleties involving joined graphemes

	'👩🏼‍🚀'.asciiValue as Any /// >nil< Optional<UInt8>
	('😎' as UnicodeScalar).value // >128526< UInt32
	('👩🏼‍🚀' as UnicodeScalar).value // compilation error
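The grapheme subtleties can also be observed with double-quoted literals in current Swift. The sketch below assumes the astronaut emoji in the examples is the sequence U+1F469 U+1F3FC U+200D U+1F680 (woman, skin tone modifier, zero-width joiner, rocket); it is spelled with escapes so that the scalars are unambiguous. The constant names are illustrative.

```swift
// Assumed scalar sequence for the astronaut emoji: four scalars joined
// into what Swift treats as a single extended grapheme cluster.
let astronaut = "\u{1F469}\u{1F3FC}\u{200D}\u{1F680}"
let clusterCount = astronaut.count
let scalarCount = astronaut.unicodeScalars.count

// A multi-scalar Character has no ASCII value.
let astronautASCII = astronaut.first?.asciiValue

// A single-scalar emoji can be coerced to Unicode.Scalar.
let sunglasses = "\u{1F60E}" as Unicode.Scalar
let sunglassesValue = sunglasses.value

print(clusterCount, scalarCount, sunglassesValue)
```

Because the astronaut is made of several scalars, coercing its literal to Unicode.Scalar cannot succeed, which is why the last example above is a compilation error.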

Single quotes in Swift, a historical perspective

In Swift 1.0, single quotes were reserved for some yet-to-be-determined syntactical purpose. Since then, pretty much all of the things that might have used single quotes have already found homes in other parts of the Swift syntactical space:

Syntax for multi-line string literals uses triple quotes (""").
String interpolation syntax uses standard double quote syntax.
Raw-mode string literals settled into the #""# syntax.
Discussions around regex literals arrived at slashes (/) as the delimiter or #//# syntax.

Given that, the desire for a lightweight syntax for single characters, and the precedent in other languages, it is natural to use single quotes for this purpose.

Detailed design

This is a change that is internal to the Swift compiler and does not affect how these literal values are represented at runtime and hence does not affect the ABI. Single quoted literals are largely identical to double quoted String literals, supporting the same existing escape syntax, and they reuse the same code in the lexer, which happened to already support parsing single quoted syntax. However, the compiler would in addition perform a best-effort attempt at validating that they contain a single extended grapheme cluster, as it currently does when an as Character type coercion annotation is present. Validation behaviour for Unicode.Scalar literals will be unaffected.

// Modified String literal protocol hierarchy:
ExpressibleByStringLiteral
  ↳ ExpressibleByExtendedGraphemeClusterLiteral
      ↳ ExpressibleByUnicodeScalarLiteral
          ↳ @_marker ExpressibleBySingleQuotedLiteral
              ↳ @_marker ExpressibleByASCIILiteral

This is realised by introducing two new ExpressibleBy marker protocols, ExpressibleBySingleQuotedLiteral and ExpressibleByASCIILiteral, which are inserted above the existing ExpressibleByUnicodeScalarLiteral in the double quoted literal protocols. As they are prefixed with @_marker, this will not affect the ABI of the existing protocol's witness table used by code compiled with a previous toolchain. ExpressibleBySingleQuotedLiteral is used only to change the default type of single quoted literals in an expression without type context, and ExpressibleByASCIILiteral is used to further gate the ASCII to integer value conversions.

Source compatibility

As the use of the new single quoted syntax is opt-in, existing code will continue to compile as before, so the proposed implementation is not source breaking. Only where the user has opted to use the new single quoted spelling will the integer conversions be available for ASCII literals. It is straightforward to add a warning and fix-it to prompt the user to move to the new syntax in the course of time. In future it would be possible for the compiler to statically reject double quoted literal syntax being used for Character or UnicodeScalar literals at the type checking stage, without affecting ABI, in the interest of untangling the various textual literal forms. As literal delimiters are a purely compile-time construct, and all double-quoted literals currently default to String, this will not impact migrated Swift code. In practice, the Character and Unicode.Scalar types do not occur frequently in code, so migrating would not be an arduous task.

Effect on ABI stability

Assuming injecting @_marker protocols does not alter witness table layout and ABI, this is a purely lexer- and type checker-level change which does not affect the storage or entry points of Character and Unicode.Scalar. The new ExpressibleByASCIILiteral initializers for integers are marked @_transparent, are therefore inlined, and will back deploy.

Effect on API resilience

This is a purely lexer- and type checker-level change which does not affect the API of the standard library apart from the two new marker protocols which are not used directly.

Alternatives considered

The most obvious alternative is to simply leave things the way they are where double quoted String literals can perform service as Characters or UnicodeScalar values as required. At its heart, while this is transparent to users, this devalues the role of Characters in source code — a distinction that may come in handy working in lower-level code.

Another alternative discussed on another thread was “Unicode Scalar Literals”. Unicode scalar literals would have the benefit of allowing concise access to code point and ASCII APIs, as methods and properties could be accessed from 'a' expressions instead of unwieldy ('a' as Unicode.Scalar) expressions. However the authors feel this would contradict Swift’s String philosophy, which explicitly recognizes Character as the natural element of String, not Unicode.Scalar.

ksluder · December 7, 2022, 6:11pm

For clarity, is 'é'.utf8 guaranteed to return the same value as "é".utf8, regardless of the encoding of the source file? AFAIK the docs don’t specify when Swift strings are normalized.

Character types also don’t support string literal interpolation and can be optimized, which is another reason to move away from double quotes.

Can I still say \u{65}?

johnno1962 · December 7, 2022, 6:16pm

If it was before, yes, as it uses only a minor modification to the existing parsing of String literals.

Yes, that is expanded during lexical analysis.

Alejandro · December 7, 2022, 6:33pm

String is normalized during comparison and hashing (Comparable, Equatable, and Hashable).
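A small sketch of what this means in current Swift, using escapes so the result does not depend on how the source file happens to encode é. It only illustrates existing double-quoted String behaviour and says nothing about what single-quoted literals would do; the constant names are illustrative.

```swift
let precomposed = "\u{E9}"    // é as one scalar (U+00E9)
let decomposed = "e\u{301}"   // e followed by a combining acute accent

// Comparison and hashing use canonical equivalence...
let areEqual = precomposed == decomposed
let distinctCount = Set([precomposed, decomposed]).count

// ...but the stored code units are not normalized, so the UTF-8 views differ.
let precomposedBytes = Array(precomposed.utf8)  // [0xC3, 0xA9]
let decomposedBytes = Array(decomposed.utf8)    // [0x65, 0xCC, 0x81]

print(areEqual, distinctCount, precomposedBytes, decomposedBytes)
```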

beccadax · December 7, 2022, 9:53pm

Without commenting on the specifics of your proposal, it may be worth noting one major change in Swift's ecosystem since the last review: the SwiftParser project. This project has a goal of roundtripping even invalid UTF-8 source code, so it often works with UInt8 arrays (iirc) instead of Strings. That has led to some awkward code that a feature like this could potentially improve.
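To illustrate the kind of byte-level code being described, here is a hedged sketch in current Swift. It is not taken from SwiftParser; isIdentifierStart is a hypothetical helper written only to show how ASCII comparisons on UInt8 buffers are spelled today.

```swift
// Hypothetical helper: classifies a raw source byte without going
// through String or Character.
func isIdentifierStart(_ byte: UInt8) -> Bool {
    switch byte {
    case UInt8(ascii: "a")...UInt8(ascii: "z"),
         UInt8(ascii: "A")...UInt8(ascii: "Z"),
         UInt8(ascii: "_"):
        return true
    default:
        return false
    }
}

// Working on bytes keeps the input intact even if it is not valid UTF-8.
let sourceBytes = Array("_x1".utf8)
let flags = sourceBytes.map(isIdentifierStart)
print(flags)
```

Under the pitch, each UInt8(ascii: "a") in a UInt8 context could be written as 'a', which is the verbosity this feature aims to remove.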

mickeyl · December 8, 2022, 6:58am

I like it a lot, the lack of it leads to more verbose code.

ensan-hcl · December 8, 2022, 7:33am

Is there any motivation for 'arithmetic' expressions like 'a' + 'a' or 'a' * 'a'? Or do you mean that it is inevitable in enabling the match between 'a' and UInt8?

johnno1962 · December 8, 2022, 7:51am

Thanks @beccadax, the code you mentioned is exactly the type of code this pitch seeks to improve. Why not comment on the specifics of the proposal though? It seemed to me inserting marker protocols into the pantheon of String's ExpressibleBy protocols was a way to introduce the functionality without running into ABI issues.

benrimmington · December 8, 2022, 12:45pm

When SE-0243 was rejected, you were advised to split it into two separate proposals. Can the ASCII literals be moved to a "future directions" section?

If the marker protocols aren't used directly, can they be moved to the _ExpressibleByBuiltin…Literal hierarchy, and/or hidden with a leading underscore?

In Swift 5.7, the default types of all double-quoted literals are String:

github.com/apple/swift/blob/release/5.7.0/stdlib/public/core/Policy.swift#L102-L108
      
          /// The default type for an otherwise-unconstrained unicode scalar literal.
          public typealias UnicodeScalarType = String
          /// The default type for an otherwise-unconstrained Unicode extended
          /// grapheme cluster literal.
          public typealias ExtendedGraphemeClusterType = String
          /// The default type for an otherwise-unconstrained string literal.
          public typealias StringLiteralType = String

I don't know where this is documented, but top-level type aliases can override the defaults:

typealias UnicodeScalarType = Unicode.Scalar
typealias ExtendedGraphemeClusterType = Character

// U+263A: smiling face.
// U+FE0F: emoji presentation selector.

type(of: "\u{263A}")          //-> Unicode.Scalar.Type
type(of: "\u{263A}\u{FE0F}")  //-> Character.Type
type(of: "")                  //-> String.Type

Would function-scoped type aliases be a useful way to enable ASCII literals? Should it be limited to UInt8 rather than all integer types?

JohnBlackburne · December 8, 2022, 1:25pm

I have no problems with single quotes being available to delimit Strings, as an alternative to double quotes. I might even use it myself to highlight or distinguish one use of Strings from another. Using it to explicitly indicate Character instances seems fine. But...

	'1'+'1' as String // >11< String
	'1'+'1' as Int // >98< Int

I really dislike the above, which seems counter to Swift's strict typing and the predictability it provides, and frankly just confusing. What is the underlying type of '1'? I.e. when you do

let one = '1'

what do you get? If it's Int then the value of '1' + '1' should be 2, and the first line should produce the String "2". If it's Char as the rest of the proposal implies, adding them produces a String "11" then the second line should produce the Int 11.

So if they are Characters converting them to any integer type should require an explicit cast or conversion. Yes, that leads to more verbose code. But also clear and unambiguous code.
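For comparison, the explicit conversions available in current Swift already distinguish the different meanings of "1" discussed here. A minimal sketch using existing standard library properties; the constant names are illustrative.

```swift
let one: Character = "1"

// The ASCII code of the character, as an optional UInt8.
let asciiCode = one.asciiValue          // 49

// The numeric value the character represents, as an optional Int.
let digitValue = one.wholeNumberValue   // 1

// Concatenation works on strings and yields "11", which parses as 11.
let concatenated = String(one) + String(one)
let parsed = Int(concatenated)

print(asciiCode as Any, digitValue as Any, concatenated, parsed as Any)
```

Each conversion names its intent, so 49, 1 and 11 cannot be confused with one another.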

johnno1962 · December 8, 2022, 2:49pm

JohnBlackburne:

let one = '1'
what do you get? If it's Int then the value of '1' + '1' should be 2, and the first line should produce the String "2". If it's Char as the rest of the proposal implies, adding them produces a String "11" then the second line should produce the Int 11.

The type of one will be Character according to the implementation and can be overridden locally in the same manner @benrimmington mentions:

typealias SingleQuotedType = Unicode.Scalar

I tried splitting the proposal into two but nothing happened for three years - probably because there is no functional advantage to just changing the syntax for Character and Unicode.Scalar literals.

The possibility that '1' can have an integer value corresponding to its ASCII code is something you're either comfortable with, and can see the advantages of in the code @beccadax mentions, or not. It won't be everybody's cup of tea. The thrust of this pitch is a little different to the previous one in that if you want this behaviour you have to opt into it by using single quoted syntax and using the literal in an integer context. The behaviour of double quoted String literals remains completely unchanged.

xwu · December 8, 2022, 3:17pm

Unless I’m mistaken, the core team specifically requested that this behavior be split into a separate later proposal, with single-quoted character literals proposed on its own (and, yes, therefore having to stand on its own in terms of justification and usefulness): I don’t see how a proposal which specifically does not address and indeed rejects that core team feedback can move forward.

ksluder · December 8, 2022, 3:18pm

On Swift for s390x, does '1' + '1' add their ASCII codes, or their EBCDIC codes?

johnno1962 · December 8, 2022, 3:22pm

ASCII, Swift uses ASCII.

ksluder · December 8, 2022, 3:23pm

That causes problems in the real world.

johnno1962 · December 8, 2022, 3:24pm

EBCDIC causes problems in the real world.

ksluder · December 8, 2022, 3:26pm

The fact of the matter is EBCDIC exists and is still in active use. Given one of the flagship use cases is round-tripping source code in its native encoding, it seems rather curious to privilege ASCII on platforms where EBCDIC is the native encoding.

johnno1962 · December 8, 2022, 3:32pm

I tried what you suggest (Revised proposal for part one of SE0243 - Character Literals. by johnno1962 · Pull Request #1049 · apple/swift-evolution · GitHub), but nothing happened. Changing to single quoted literals without the integer convertibility feature isn't particularly useful in itself.

ksluder · December 8, 2022, 3:36pm

Were you told that was the reason, or are you coming to that conclusion yourself? What consensus building and advocacy did you perform before, during, and after submitting that PR?

There’s a lot going on in the Swift project, and things won’t get done unless they’re on somebody’s roadmap.

johnno1962 · December 8, 2022, 3:49pm

I think at this point it's better to judge this pitch on its current consistent whole rather than its history. There has been a lot of water under the bridge since then and the new implementation should be much less contentious. I don't have another three years to draft, implement and test not one but two proposals on the off chance they come to review.