Hello again S/E,
I'd like to re-pitch one last time, introducing Single Quoted character literals as an alternative syntax for literals that are intended to be a single Character or Unicode.Scalar. The motivation for this, as ever, is twofold. First, to bring Swift into line with many other C-style languages where there is a separate syntax for Strings and their constituent atoms (in Swift's case, Character). The second motivation is to facilitate a convenient conversion between ASCII character literals and integer values for low-level programming. This is the second pitch of an idea that had a well-subscribed pitch and an active review, but was rejected with the suggestion that it be broken up into two proposals. That was done, but the result languished as a swift-evolution PR and eventually timed out.
This re-pitch comes with a revitalised implementation that resolves the issues which came up in the first review by using "marker protocols" to avoid any potential ABI stability issues and to gate the integer convertibility feature to only ASCII character literals. A toolchain for evaluation is available here.
Single Quoted Character Literals
- Proposal: SE-XXXX
- Authors: Kelvin Ma (“Taylor Swift”), John Holdsworth
- Review manager: Ben Cohen
- Status: Pending second review
- Implementation: Single Quoted Literals. by johnno1962 · Pull Request #61477 · apple/swift · GitHub
- Threads: 1 2
Swift emphasizes a Unicode-correct definition of what constitutes a Character, but unlike most common programming languages, Swift does not have a dedicated syntax for Character literals. Instead, three overlapping “ExpressibleBy” protocols and Swift’s type inference come together to produce a syntax where a double quoted string literal can take the role of a String, Character, or Unicode.Scalar value depending on its content and the expression context.
This proposal assigns an alternative syntax for Character and Unicode.Scalar values, using single quote (') delimiters. This change solely affects the type inference of single-quoted literals, and does not seek to change the current compiler validation behaviour for these constructs.
A pain point of using characters in Swift is that they lack a first-class literal syntax. Users have to manually coerce string literals to a Character or Unicode.Scalar type using as Character or as Unicode.Scalar, respectively. Having the collection share the same syntax as its element also harms code clarity and, in some cases, makes it difficult to tell whether a double-quoted literal is being used as a string or a character.
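For reference, the coercions in question look roughly like this in current Swift. This is a minimal sketch that uses only double-quoted literals; the constant names are illustrative and not part of the proposal.

```swift
// Current Swift: the same double-quoted literal can take on three types.
let asString = "a"                      // no type context: defaults to String
let asCharacter = "a" as Character      // explicit coercion required
let asScalar = "a" as Unicode.Scalar    // explicit coercion required

// Print the inferred static types of the three constants.
print(type(of: asString), type(of: asCharacter), type(of: asScalar))
```

Nothing in the spelling of the literal itself tells the reader which of the three types is meant; only the surrounding context does.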
While the motivation for distinguishing between String and Character literals mostly consists of ergonomic and readability concerns, doing so would also bring Swift in line with other popular languages which do make this syntactic distinction, and would facilitate a subsequent effort to improve support for the low-level Int8 buffer processing tasks common in parsers and codecs.
We propose to adopt the 'x' syntax as an alternative syntax for all textual literal types up to and including ExtendedGraphemeClusterLiteral, but not including StringLiteral. These literals will be used to express Character, Unicode.Scalar, and types like Unicode.UTF16.CodeUnit (a.k.a. UInt16) in the standard library. These literals would have a default type of Character, as Character is the preferred element type of String. In addition, where the character literal is a single ASCII code point, conversions to an integer value are made available using a new ExpressibleByASCIILiteral conformance in the standard library.
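To make the low-level use case concrete, here is a minimal sketch of how ASCII values are obtained in current Swift, without single-quoted literals. UInt8(ascii:) and Character.asciiValue are existing standard library API; the hexDigitValue helper is hypothetical and exists only for illustration.

```swift
// Current Swift: obtaining the ASCII value behind a character literal.
let viaScalar = UInt8(ascii: "a")                 // literal inferred as Unicode.Scalar
let viaCharacter = ("a" as Character).asciiValue  // optional: nil for non-ASCII

// Hypothetical helper: decode one hexadecimal digit from a byte buffer.
func hexDigitValue(_ byte: UInt8) -> UInt8? {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"):
        return byte - UInt8(ascii: "0")
    case UInt8(ascii: "a")...UInt8(ascii: "f"):
        return byte - UInt8(ascii: "a") + 10
    case UInt8(ascii: "A")...UInt8(ascii: "F"):
        return byte - UInt8(ascii: "A") + 10
    default:
        return nil  // not a hexadecimal digit
    }
}

// Decode each UTF-8 byte of a short input; non-hex bytes map to nil.
let digits = Array("1fZ".utf8).map(hexDigitValue)
print(digits)
```

With the proposed integer conversions, the range bounds above could presumably be spelled as plain single-quoted literals in an integer context rather than through UInt8(ascii:).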
Use of single quotes for character/scalar literals has ample precedent in other languages, including C, Objective-C, C++, Java, Elm, and Rust, although different languages have slightly differing ideas about what a “character” is. We choose to use the single quote syntax specifically because it reinforces the notion that strings and character values are different: the former is a sequence, the latter is an element (though a single element can itself be a String). Character types also don’t support string literal interpolation and can be optimized, which is another reason to move away from double quotes.
Advantages for a developer of migrating to the single-quoted syntax:
- Differentiate in the source when a literal is intended to be used in a Character or Unicode.Scalar context as opposed to a String context.
- Distinct default type of Character, making available that type's methods and properties.
Improvements in the new implementation over the one previously reviewed:
- Single-quoted literals have their own new ExpressibleBy marker protocols, preventing source-breaking changes to the use of double quoted literals in existing source.
- A distinct protocol for ASCII literals further ensures the more contentious integer conversions are only available for literals that are a single ASCII codepoint.
Some expressions using single quoted literal syntax, their value and their type:
Basic type identities
    '€' // >€< Character
    '€' as String // >€< String
    // Literal "arithmetic"
    "1"+"1" // >11< String
    "1"+'€' // >1€< String
    '1'+'1' as String // >11< String
    '1'+'1' as Int // >98< Int
Initializers of integers
    Int("0123") as Any // >Optional(123)< Optional<Int>
    Int('€') as Any // >nil< Optional<Int>
    Int('3') // >51< Int
    ['a', 'b'] as [Int8] // >[97, 98]< Array<Int8>

    'a' + 1 // >98< Int
    'b' - 'a' + 10 // >11< Int
    // difficult to avoid allowing
    'a' * 'b' as Int8 // overflows at compilation
    "123".firstIndex(of: '2') as Any // >Optional(Swift.String.Index(_rawBits: 65799))< Optional<Index>
Subtleties involving joined graphemes
    '👩🏼🚀'.asciiValue as Any // >nil< Optional<UInt8>
    ('😎' as UnicodeScalar).value // >128526< UInt32
    ('👩🏼🚀' as UnicodeScalar).value // compilation error
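The reason a joined grapheme cannot be coerced to a single scalar can be checked in current Swift. The sketch below assumes the astronaut emoji is the usual woman, skin-tone modifier, zero-width joiner, rocket sequence, and writes it with escapes so the joiner is not lost in copy and paste.

```swift
// Woman + skin-tone modifier + zero-width joiner + rocket, spelled as escapes.
let astronaut = "\u{1F469}\u{1F3FC}\u{200D}\u{1F680}"

print(astronaut.count)                     // number of Characters (grapheme clusters)
print(astronaut.unicodeScalars.count)      // number of Unicode scalars
print(astronaut.first?.asciiValue as Any)  // a non-ASCII Character has no ASCII value

// A single-scalar emoji can be coerced to Unicode.Scalar without trouble.
let sunglasses = "\u{1F60E}" as Unicode.Scalar
print(sunglasses.value)
```

Because the cluster is made of several scalars, it fits in a Character but not in a Unicode.Scalar, which is why the last line of the example above fails to compile.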
Single quotes in Swift, a historical perspective
In Swift 1.0, single quotes were reserved for some yet-to-be-determined syntactical purpose. Since then, pretty much all of the things that might have used single quotes have already found homes in other parts of the Swift syntactical space:
- syntax for multi-line string literals uses triple quotes (""").
- string interpolation syntax uses standard double quote syntax.
- raw-mode string literals settled into the #"..."# syntax.
- Discussions around regex literals arrived at slashes (/) as the delimiter, or #/.../#.
Given that, the desire for a lightweight syntax for single characters, and the precedent in other languages, it is natural to use single quotes for this purpose.
This is a change that is internal to the Swift compiler and does not affect how these literal values are represented at runtime, and hence does not affect the ABI. Single quoted literals are largely identical to double quoted String literals, supporting the same existing escape syntax, and they reuse the same code in the lexer, which happened to already support parsing single quoted syntax. However, the compiler would in addition perform a best-effort attempt at validating that they contain a single extended grapheme cluster, as it currently does when an as Character type coercion annotation is present. Validation behaviour for Unicode.Scalar literals will be unaffected.
    // Modified String literal protocol hierarchy:
    ExpressibleByStringLiteral
      ↳ ExpressibleByExtendedGraphemeClusterLiteral
        ↳ ExpressibleByUnicodeScalarLiteral
          ↳ @_marker ExpressibleBySingleQuotedLiteral
            ↳ @_marker ExpressibleByASCIILiteral
This is realised by introducing two new ExpressibleBy marker protocols, ExpressibleBySingleQuotedLiteral and ExpressibleByASCIILiteral, which are inserted above the existing ExpressibleByUnicodeScalarLiteral in the double quoted literal protocols. As they are prefixed with @_marker, this will not affect the ABI of the existing protocol's witness table used by code compiled with a previous toolchain. ExpressibleBySingleQuotedLiteral is used only to change the default type of single quoted literals in an expression without type context, and ExpressibleByASCIILiteral is used to further gate the ASCII to integer value conversions.
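As a rough illustration of how marker protocols can sit above an ordinary protocol, consider the sketch below. It assumes the arrows in the hierarchy denote protocol refinement. Every name in it (DemoASCIILiteral, DemoSingleQuotedLiteral, DemoScalarLiteral, Byte, typeName) is a hypothetical stand-in rather than the proposal's actual declarations, and @_marker is an underscored attribute that is not an officially supported language feature.

```swift
// Hypothetical stand-ins for the proposal's protocols; none of these are real API.
// Marker protocols carry no requirements and therefore need no witness table.
@_marker protocol DemoASCIILiteral {}
@_marker protocol DemoSingleQuotedLiteral: DemoASCIILiteral {}

// An ordinary protocol may refine a marker protocol and still carry requirements.
protocol DemoScalarLiteral: DemoSingleQuotedLiteral {
    init(demoScalar: Unicode.Scalar)
}

struct Byte: DemoScalarLiteral {
    var value: UInt8
    init(demoScalar: Unicode.Scalar) {
        // Truncating conversion, for illustration only.
        value = UInt8(truncatingIfNeeded: demoScalar.value)
    }
}

// Marker protocols remain usable as compile-time generic constraints.
func typeName<T: DemoASCIILiteral>(_ value: T) -> String {
    String(describing: T.self)
}

print(typeName(Byte(demoScalar: "a")))
```

The constraint on typeName is checked entirely by the type checker, which is the property the proposal relies on to gate behaviour without touching existing witness tables.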
As use of the new single quoted syntax is opt-in, existing code will continue to compile as before, so the proposed implementation is not source breaking. Only where the user has opted to use the new single quoted spelling will the integer conversions be available for ASCII literals. It is straightforward to add a warning and fix-it to prompt the user to move to the new syntax in the course of time. In future it would be possible for the compiler to statically reject double quoted literal syntax being used for Character and UnicodeScalar literals at the type checking stage, without affecting ABI, in the interest of untangling the various textual literal forms. As literal delimiters are a purely compile-time construct, and all double-quoted literals currently default to String, this will not impact migrated Swift code. In practice, the Character and Unicode.Scalar types do not occur frequently in code, so migrating would not be an arduous task.
Effect on ABI stability
As @_marker protocols do not alter witness table layout or ABI, this is a purely lexer- and type checker-level change which does not affect the storage or entry points of Character and Unicode.Scalar. The new integer initializers for ExpressibleByASCIILiteral literals are marked @_transparent and are therefore inlined and will back deploy.
Effect on API resilience
This is a purely lexer- and type checker-level change which does not affect the API of the standard library apart from the two new marker protocols which are not used directly.
The most obvious alternative is to simply leave things the way they are, where double quoted String literals can perform service as Character or UnicodeScalar values as required. At its heart, while this is transparent to users, it devalues the role of Characters in source code, a distinction that may come in handy when working in lower-level code.
Another alternative discussed on another thread was “Unicode Scalar Literals”. Unicode scalar literals would have the benefit of allowing concise access to code point and ASCII APIs, as methods and properties could be accessed from 'a' expressions instead of unwieldy ('a' as Unicode.Scalar) expressions. However, the authors feel this would contradict Swift’s String philosophy, which explicitly recognizes Character as the natural element of