Hello again S/E,
I'd like to re-pitch one last time, introducing Single Quoted character literals as an alternative syntax for literals that are intended to be a single Character or Unicode.Scalar. The motivation for this, as ever, is twofold. First, to bring Swift into line with many other C-style languages where there is a separate syntax for Strings and their constituent atoms (in Swift's case, Character). Second, to facilitate a convenient conversion between ASCII character literals and integer values for low-level programming. This is the second pitch of an idea whose first pitch was well subscribed and went to an active review, but which was rejected with the suggestion that it be broken up into two proposals. This was done, but the result languished as a swift-evolution PR and eventually timed out.
This re-pitch comes with a revitalised implementation that resolves the issues which came up in the first review by using "marker protocols" to avoid any potential ABI stability issues and to gate the integer convertibility feature to only ASCII character literals. A toolchain for evaluation is available here.
Single Quoted Character Literals
- Proposal: SE-XXXX
- Authors: Kelvin Ma ("Taylor Swift"), John Holdsworth
- Review manager: Ben Cohen
- Status: Pending second review
- Implementation: Single Quoted Literals. by johnno1962 · Pull Request #61477 · apple/swift · GitHub
- Threads: 1 2
Introduction
Swift emphasizes a Unicode-correct definition of what constitutes a Character, but unlike most common programming languages, Swift does not have a dedicated syntax for Character literals. Instead, three overlapping "ExpressibleBy" protocols and Swift's type inference come together to produce a syntax where a double quoted string literal can take the role of a String, Character, or Unicode.Scalar value depending on its content and the expression context.
This proposal assigns an alternative syntax for Character and Unicode.Scalar values, using single quote (') delimiters. This change solely affects the type inference of single quoted literals, and does not seek to change the current compiler validation behaviour for these constructs.
Motivation
A pain point of using characters in Swift is that they lack a first-class literal syntax. Users have to manually coerce string literals to a Character or Unicode.Scalar type using as Character or as Unicode.Scalar, respectively. Having the collection share the same syntax as its element also harms code clarity and, in some cases, makes it difficult to tell whether a double-quoted literal is being used as a string or a character.
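The ambiguity described above can be seen in Swift as it stands today. The following is a minimal sketch using only existing double-quoted syntax; the constant names are illustrative and not part of the proposal.

```swift
// Today's Swift: one double-quoted spelling, three possible types.
let inferred = "a"                      // no context, so this defaults to String
let character: Character = "a"          // needs a type annotation...
let scalar = "a" as Unicode.Scalar      // ...or an explicit coercion

// Reading the literal alone does not tell you which of these you have;
// only the surrounding annotation or coercion does.
let inferredIsString = type(of: inferred) == String.self
let characterIsCharacter = type(of: character) == Character.self
let scalarIsScalar = type(of: scalar) == Unicode.Scalar.self
```

Under the proposal, an unannotated 'a' would instead default to Character, so the annotation or coercion would no longer be needed for the common case.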
While the motivation for distinguishing between String and Character literals mostly consists of ergonomic and readability concerns, doing so would also bring Swift in line with other popular languages which do make this syntactic distinction, and would facilitate a subsequent effort to improve support for the low-level UInt8/Int8 buffer processing tasks common in parsers and codecs.
Proposed solution
We propose to adopt 'x' as an alternative syntax for all textual literal types up to and including ExtendedGraphemeClusterLiteral, but not including StringLiteral. These literals will be used to express Character, Unicode.Scalar, and types like Unicode.UTF16.CodeUnit (a.k.a. UInt16) in the standard library. These literals would have a default type of Character, as Character is the preferred element type of String. In addition, where the character literal is a single ASCII code point, conversions to an integer value are made available using a new ExpressibleByASCIILiteral conformance in the standard library.
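For comparison, here is a sketch of how ASCII-to-integer conversion is typically written in current Swift, using the existing UInt8(ascii:) initializer. The hexValue(of:) helper is a hypothetical example and not part of the proposal or the standard library.

```swift
/// Hypothetical helper: returns the numeric value of an ASCII hex digit,
/// or nil if the byte is not a hex digit.
func hexValue(of byte: UInt8) -> UInt8? {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"):
        return byte - UInt8(ascii: "0")
    case UInt8(ascii: "a")...UInt8(ascii: "f"):
        return byte - UInt8(ascii: "a") + 10
    case UInt8(ascii: "A")...UInt8(ascii: "F"):
        return byte - UInt8(ascii: "A") + 10
    default:
        return nil
    }
}

// Decode the bytes of "1f" into a single value: 1 * 16 + 15.
let bytes = Array("1f".utf8)
let decoded = bytes.reduce(0) { $0 * 16 + Int(hexValue(of: $1) ?? 0) }
```

With the proposed conversions, each UInt8(ascii: "a") above could presumably be spelled 'a' in a UInt8 context, which is the kind of buffer-processing code the integer convertibility feature targets.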
Use of single quotes for character/scalar literals is well precedented in other languages, including C, Objective-C, C++, Java, Elm, and Rust, although different languages have slightly differing ideas about what a "character" is. We choose to use the single quote syntax specifically because it reinforces the notion that strings and character values are different: the former is a sequence, the latter is an element (though a single element can itself be a String). Character types also don't support string literal interpolation and can be optimized, which is another reason to move away from double quotes.
Advantages for a developer to migrate to the single quote distinction:
- Differentiate in the source when a literal is intended to be used in a Character or Unicode.Scalar context as opposed to String.
- Distinct default type of Character, making available that type's methods and properties.
Improvements to the new implementation over that previously reviewed:
- Single-quoted literals have their own new ExpressibleBy marker protocols, preventing source breaking changes to the use of double quoted literals in existing source.
- A distinct protocol for ASCII literals further ensures the more contentious integer conversions are only available for literals that are a single ASCII codepoint.
Example usage
Some expressions using single quoted literal syntax, their value and their type:
// Basic type identities
'€' // >€< Character
'€' as String // >€< String
// Literal "arithmetic"
"1"+"1" // >11< String
"1"+'€' // >1€< String
'1'+'1' as String // >11< String
'1'+'1' as Int // >98< Int
// Initializers of integers
Int("0123") as Any // >Optional(123)< Optional<Int>
Int('€') as Any // >nil< Optional<Int>
Int('3') // >51< Int
['a', 'b'] as [Int8] // >[97, 98]< Array<Int8>
// More arithmetic
'a' + 1 // >98< Int
'b' - 'a' + 10 // >11< Int
// difficult to avoid allowing
'a' * 'b' as Int8 // overflows at compilation
"123".firstIndex(of: '2') as Any
// >Optional(Swift.String.Index(_rawBits: 65799))< Optional<Index>
// Subtleties involving joined graphemes
'👩🏼‍🚀'.asciiValue as Any // >nil< Optional<UInt8>
('😎' as UnicodeScalar).value // >128526< UInt32
('👩🏼‍🚀' as UnicodeScalar).value // compilation error
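The joined-grapheme subtlety can be reproduced in current Swift with double-quoted literals. This is a sketch only: the ZWJ sequence is written with \u{...} escapes so its scalars are visible, and the constant names are illustrative.

```swift
// A ZWJ sequence: woman + skin-tone modifier + zero-width joiner + rocket.
// It is one extended grapheme cluster, so it is a valid Character.
let astronaut: Character = "\u{1F469}\u{1F3FC}\u{200D}\u{1F680}"
let scalarCount = astronaut.unicodeScalars.count   // several scalars, one Character
let ascii = astronaut.asciiValue                   // nil: not an ASCII character

// A single-scalar emoji is expressible as a Unicode.Scalar.
let sunglasses: Unicode.Scalar = "\u{1F60E}"
let codePoint = sunglasses.value
```

Because the astronaut is made of four scalars, coercing it to a single Unicode.Scalar cannot succeed, which is why the last line of the listing above is rejected at compile time.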
Single quotes in Swift, a historical perspective
In Swift 1.0, single quotes were reserved for some yet-to-be determined syntactical purpose. Since then, pretty much all of the things that might have used single quotes have already found homes in other parts of the Swift syntactical space:
- syntax for multi-line string literals uses triple quotes (""")
- string interpolation syntax uses standard double quote syntax.
- raw-mode string literals settled into the #""# syntax.
- Discussions around regex literals arrived at slashes (/) as the delimiter, or the #//# syntax.
Given that, the desire for a lightweight syntax for single characters, and the precedent in other languages for characters, it is natural to use single quotes for this purpose.
Detailed design
This is a change that is internal to the Swift compiler and does not affect how these literal values are represented at runtime, and hence does not affect the ABI. Single quoted literals are largely identical to double quoted String literals, supporting the same existing escape syntax, and they reuse the same code in the lexer, which happened to already support parsing single quoted syntax. However, the compiler would in addition perform a best-effort attempt at validating that they contain a single extended grapheme cluster, as it currently does when an as Character type coercion annotation is present. Validation behaviour for Unicode.Scalar literals will be unaffected.
// Modified String literal protocol hierarchy:
ExpressibleByStringLiteral
  ↳ ExpressibleByExtendedGraphemeClusterLiteral
    ↳ ExpressibleByUnicodeScalarLiteral
      ↳ @_marker ExpressibleBySingleQuotedLiteral
        ↳ @_marker ExpressibleByASCIILiteral
This is realised by introducing two new ExpressibleBy marker protocols, ExpressibleBySingleQuotedLiteral and ExpressibleByASCIILiteral, which are inserted above the existing ExpressibleByUnicodeScalarLiteral in the double quoted literal protocols. As they are prefixed with @_marker, this will not affect the ABI of the existing protocol's witness table used by code compiled with a previous toolchain. ExpressibleBySingleQuotedLiteral is used only to change the default type of single quoted literals in an expression without type context, and ExpressibleByASCIILiteral is used to further gate the ASCII to integer value conversions.
Source compatibility
As the use of the new single quoted syntax is opt-in, existing code will continue to compile as before, so the proposed implementation is not source breaking. Only where the user has opted to use the new single quoted spelling will the integer conversions be available for ASCII literals. It is straightforward to add a warning and fix-it to prompt the user to move to the new syntax in the course of time. In future it would be possible for the compiler to statically reject double quoted literal syntax being used for Character or UnicodeScalar literals at the type checking stage, without affecting ABI, in the interest of untangling the various textual literal forms. As literal delimiters are a purely compile-time construct, and all double-quoted literals currently default to String, this will not impact migrated Swift code. In practice, the Character and Unicode.Scalar types do not occur frequently in code, so migrating would not be an arduous task.
Effect on ABI stability
Assuming injecting @_marker protocols does not alter witness table layout and ABI, this is a purely lexer- and type checker-level change which does not affect the storage or entry points of Character and Unicode.Scalar. The new integer initializers for ExpressibleByASCIILiteral literals are marked @_transparent and are therefore inlined and will back deploy.
Effect on API resilience
This is a purely lexer- and type checker-level change which does not affect the API of the standard library apart from the two new marker protocols which are not used directly.
Alternatives considered
The most obvious alternative is to simply leave things the way they are, where double quoted String literals can perform service as Character or UnicodeScalar values as required. At its heart, while this is transparent to users, it devalues the role of Character in source code, a distinction that may come in handy when working in lower-level code.
Another alternative discussed on another thread was "Unicode Scalar Literals". Unicode scalar literals would have the benefit of allowing concise access to code point and ASCII APIs, as methods and properties could be accessed from 'a' expressions instead of unwieldy ('a' as Unicode.Scalar) expressions. However, the authors feel this would contradict Swift's String philosophy, which explicitly recognizes Character as the natural element of String, not Unicode.Scalar.
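The trade-off between the two defaults can be seen in how current Swift exposes each type. This is a sketch with illustrative constant names, using only existing standard library APIs.

```swift
let word = "abc"

// String's element type is Character, so iterating or indexing a String
// yields Character values directly.
let element = word.first!

// Unicode.Scalar values are reached through a separate view.
let scalar = word.unicodeScalars.first!

// Both types expose ASCII and code point information, with different shapes.
let viaCharacter = element.asciiValue               // Optional<UInt8>
let viaScalar = scalar.value                        // UInt32
let coerced = ("a" as Unicode.Scalar).isASCII       // today's explicit coercion
```

A Character default keeps single quoted literals aligned with what String itself hands back, at the cost of an explicit coercion when scalar-only APIs are wanted.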