Prepitch: Character integer literals

Good catch. :thumbsup:

I love this question anyway. :heart:

Yes, the semicolon and the Greek question-mark are different Unicode characters, even though they compare equal in Swift.
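A minimal demonstration of that equality, using nothing beyond the standard library:

```swift
let semicolon = "\u{003B}"  // SEMICOLON
let greekQM   = "\u{037E}"  // GREEK QUESTION MARK

// String comparison uses canonical equivalence, so these compare equal…
print(semicolon == greekQM)  // true

// …even though the underlying scalars are different code points.
print(semicolon.unicodeScalars.first! == greekQM.unicodeScalars.first!)  // false
```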

I’m not sure what point you are trying to make.

It is the wrong question. We are not trying to encode “é” at all. We are trying to treat it as an integer. And there is only one integer which canonically maps to it, namely 233.
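For reference, the single-scalar (precomposed, NFC) form of “é” and its canonical integer value:

```swift
// U+00E9 LATIN SMALL LETTER E WITH ACUTE, the precomposed form of "é"
let eAcute: Unicode.Scalar = "\u{E9}"
print(eAcute.value)  // 233
```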

I have now taken the time to read the entire thread from the very beginning. There are two comments I would like to call out:

  1. @xwu said this, and sparked a long back‐and‐forth over whether it was a good idea:

He made several direct quotations from Unicode Technical Reports, which demonstrate he is not a newcomer to Unicode concepts, and that he knows where to look for answers. But he still got it wrong, and the code he suggested is unreliable. The statements in the technical reports mean that ASCII strings will never change to something else under NFx, and Latin‐1 strings will never do so under NFC. They do not mean that nothing else will ever change into an ASCII string under NFx, or into a Latin‐1 string under NFC. The assertion example I used a few posts ago demonstrates clearly that something ASCII can be equal to something non‐ASCII. So for @xwu’s code, something could appear in a signature which is not an ASCII string (such as the Greek question mark, U+037E), but is canonically equivalent to an ASCII string (a semicolon, U+003B), thus causing the check to go in an unintended direction. (Though I don’t think anything folds down to the “J”, “F”, or “I” he actually used.)

I say this because the fact that someone knowledgeable can make such a mistake, and that several knowledgeable people can then argue about it for such a long time without being able to demonstrate the problem clearly, shows just how easy such mistakes are to make, and just how much safer a direct‐to‐integer ASCII literal would be. Until I read that, I was generally against this entire pitch, but it—and it alone—switched me completely around to seeing it as a significant safety improvement for the ASCII range. This is precisely because it insulates you against Unicode pitfalls.

I do share most of @xwu’s actual Unicode‐related concerns, though, and I stand by all my previous statements demonstrating how unsafe such an idea is in the Unicode domain, where it instead makes you more vulnerable to Unicode pitfalls.

That brings me to the second comment I would like to call out:

  1. @johnno1962 said this:

This seems to be a very wise suggestion. The ASCII stuff may succeed or fail during review based solely on its own merits, dangers, and use cases—which are vastly different than the merits, dangers and use cases of the Unicode realm. In fact I think the two are opposite to one another. So let Unicode be a separate and distinct round two.


I have bad news: that is, by definition, an encoding. To treat a character as a number, you need to select an encoding. What you are arguing for is simply this particular set of encodings:

UInt8 : Latin-1
UInt16 : UCS-2
UInt32 : UTF-32

It’s not a great choice.


Nevin is arguing for truncated Unicode scalars, which I know gives the same result as Latin-1/UCS-2/UTF-32 but is still more principled than seemingly picking encodings at random. I think it can be useful in some situations, and it makes the programming model simpler because it can reuse the same protocol as Unicode scalar literals (if I understood right). So it's not without its advantages, but I agree there is more potential for harm by misuse for those who haven't memorized the ASCII table or aren't perfectly aware of the encoding they're working with. How much harm this represents, and whether it is acceptable or not, is the real debate here.
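The equivalence being discussed can be checked in today's Swift: truncating a scalar's code point reproduces the legacy encodings over the ranges where they agree (a quick sketch, standard library only):

```swift
let copyright: Unicode.Scalar = "\u{A9}"  // © U+00A9, in the Latin-1 range

// Truncating the scalar value to 8 bits yields exactly its Latin-1 byte…
let latin1Byte = UInt8(truncatingIfNeeded: copyright.value)   // 0xA9

// …and for a BMP scalar, truncating to 16 bits yields its UCS-2 code unit.
let flower: Unicode.Scalar = "\u{82B1}"   // 花
let ucs2Unit = UInt16(truncatingIfNeeded: flower.value)       // 0x82B1
```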


I'd like to propose a radically different approach — inspired from the concerns about non-ASCII characters:

  • double quotes mean Unicode string, character, or scalar
  • single quotes mean ASCII string or scalar

So whenever you need to be sure something is purely ASCII you use single-quotes:

let a: String = "planté" // unicode string
let b: String = 'planté' // ERROR: é is not ascii
let c: String = 'plante' // ascii string (no é in this one)
let a: Character = "é" // U+00E9
let b: Character = 'é' // ERROR: U+00E9 is not ascii
let c: Character = 'p' // U+0070
let a: UnicodeScalar = "é" // U+00E9
let b: UnicodeScalar = 'é' // ERROR: U+00E9 is not ascii
let c: UnicodeScalar = 'p' // U+0070

We can then allow ASCII literals to initialize any numeric type:

let a: UInt32 = "é" // ERROR: UInt32 does not conform to ExpressibleByUnicodeScalarLiteral
let b: UInt32 = 'é' // ERROR: U+00E9 is not ascii
let c: UInt32 = 'p' // 0x00000070
let a: UInt16 = "é" // error: UInt16 does not conform to ExpressibleByUnicodeScalarLiteral
let b: UInt16 = 'é' // error: U+00E9 is not ascii
let c: UInt16 = 'p' // 0x0070
let a: UInt8 = "é" // ERROR: UInt8 does not conform to ExpressibleByUnicodeScalarLiteral
let b: UInt8 = 'é' // ERROR: U+00E9 is not ascii
let c: UInt8 = 'p' // 0x70

And you can also initialize an array of numbers from an ASCII string:

let a: [UInt8] = "plante" // ERROR: Array does not conform to ExpressibleByStringLiteral
let b: [UInt8] = 'plante' // ascii
let a: [UInt16] = "plante" // ERROR: Array does not conform to ExpressibleByStringLiteral
let b: [UInt16] = 'plante' // ascii

Of course, this approach completely flips on its head the current character literal proposal.


Now this I could get behind. Especially the part where an ASCII string can be used to express an array of integers. I’m not entirely convinced that ASCII is worth promoting with first-class exclusive syntax in Swift—if this is only for legacy compatibility then I lean against.

But if it is worth supporting pure ASCII, then your idea sounds like the way to do it.


Just when I thought the thread was settling… I’m sorry, but I really don’t see how we should be elevating a legacy concept such as ASCII to such a prominent role in Swift. I like Swift’s highly principled abstraction that a Character is a single atomic visual entity regardless of how it is represented, and for me this is what we should be trying to encapsulate with single-quote syntax. Some of these Characters can be represented by a single Unicode scalar, some of those fit in an integer storage implied by the expression context, and some of those are ASCII. These are secondary internal distinctions which can be used to gate a new shorthand that allows us to express an integer with a character value, but the requirements of the niche shorthand shouldn’t feed back into the definition of what a character literal is. I actually argued for your approach myself further up the thread, calling them "code point literals" when they were not limited to ASCII, as it improved diagnostics, but eventually saw the light and changed my mind. Character with a capital C is the abstraction we want to be capturing with the new single-quoted literal. Honest!


Sorry about that. I'm just going a bit beyond what was discussed.

The thinking is that if limiting character literals to ASCII makes them less error-prone, the same is also true of strings when you intend to use them in terms of scalars. If you don't want combining and equivalent characters to get in your way, use an ASCII string.

This could also be a good way to dispel doubts about lookalike characters lurking in sensitive strings. You can't hide a cyrillic "а" in the string "paypal" if the string is limited to ASCII.
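That check is easy to demonstrate in current Swift:

```swift
let real = "paypal"
let fake = "p\u{0430}ypal"   // U+0430 CYRILLIC SMALL LETTER A, a Latin "a" lookalike

// The strings render identically but are not canonically equivalent…
print(real == fake)                          // false

// …and an ASCII-only restriction would reject the spoofed one outright.
print(fake.allSatisfy({ $0.isASCII }))       // false
```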

But I'll admit the idea of "yet another string syntax" looks a bit unappealing.


Although it's possible to think of these as "truncated Unicode scalars," the stated motivation for the proposal makes it clear that it's not at all how it's going to be used, or why it's useful. There are no C APIs that semantically work with an array of the least significant 8/16 bits of Unicode scalars--they work with an array of ASCII or Latin-1 characters, or UCS-2-encoded characters.

This seems like a pragmatic approach that can be extended in the future. I would like to get this pitch ready for SE.

This has been a very long thread with many changes. The most recent draft is out of date with this direction and also has a lot of irrelevant prose. Should we start a fresh slate?


Sorry @michelf, you threw such a curve ball in there I totally missed what you were suggesting, which is a whole ’nother proposal. Perhaps that might be a valid alternative use of single quotes, but Character is at least precedented, and another idea for the security feature you propose could be a new escape character, e.g. \a, which would force ASCII only, as in the string "\a".

Seems like we have pretty much settled on “Option 4 for now” then. There is one other detail about the migration to single-quoted literals that might be worth discussing here or in the review: if Character literals are to become the preferred spelling for expressing Unicode.Scalar and Character (and sometimes Int*) in Swift, should the existing conformances of String eventually be deprecated (with a fixit)? That would be mildly source-compatibility breaking, and over what timescale should it happen?

I also proposed it in the hope of offering something even cleaner than option 4. See the nice table it produces below. The rest just falls as a consequence of the design, but I don't think this was communicated very well. I'll be fine with option 4 too.

Option 5: ASCII with single-quotes & Unicode with double-quotes

Single quoted (ASCII)

Literal  UInt8  UInt16  UInt32  Unicode.Scalar  Character  String  Notes
'x'      120    120     120     U+0078          x          x       ASCII scalar
'©'      error  error   error   error           error      error   Latin‐1 scalar
'é'      error  error   error   error           error      error   Latin‐1 scalar which expands under NFD
'花'     error  error   error   error           error      error   BMP scalar
';'      error  error   error   error           error      error   BMP scalar which changes under NFx
'שּׂ'      error  error   error   error           error      error   BMP scalar which expands under NFx
'𓀎'     error  error   error   error           error      error   Supplemental plane scalar
'ē̱'      error  error   error   error           error      error   Character with no single‐scalar representation
'ab'     error  error   error   error           error      error   Multiple characters

Double quoted (same as now, no duplication or deprecation)

Literal  UInt8  UInt16  UInt32  Unicode.Scalar   Character  String  Notes
"x"      error  error   error   U+0078           x          x       ASCII scalar
"©"      error  error   error   U+00A9*          ©          ©       Latin‐1 scalar
"é"      error  error   error   U+00E9/error*    é          é       Latin‐1 scalar which expands under NFD
"花"     error  error   error   U+82B1*          花         花      BMP scalar
";"      error  error   error   U+037E/U+003B*†  ;          ;       BMP scalar which changes under NFx
"שּׂ"      error  error   error   U+FB2D/error*    שּׂ          שּׂ       BMP scalar which expands under NFx
"𓀎"     error  error   error   U+1300E*         𓀎         𓀎      Supplemental plane scalar
"ē̱"      error  error   error   error            ē̱          ē̱       Character with no single‐scalar representation
"ab"     error  error   error   error            error      ab      Multiple characters

i don’t see anything wrong with option 5. in fact if you go back to the upper part of the thread it’s a lot closer to the original pitch than the current proposal. in fact someone suggested a v similar idea pretty early in the process except they were trying to cast multichar ascii literals to integer slugs instead of String.

the problem is i don’t think it’s as politically attractive as the current proposal, which is a long-negotiated compromise between a lot of different groups of people with different goals for single quoted literals. (again, read the thread, the whole thread.)

there’s a bunch of people who didn’t want to use up the single-quote syntax on “something as niche as ascii strings”. these people were only placated by appealing to the C (and basically every other language) precedent where single quotes are for “single objects” and double quotes are for “vector objects”. so i don’t think let s:String = 'ab' is going to fly.

there’s a bunch of people (especially in the core team) who wanted to extend the single quote syntax to cover Unicode.Scalar and Character, so that these (important!) types finally get a dedicated literal syntax instead of having to write as Character everywhere. they did not want to see single quotes limited to just the single-codepoint ASCII range (U+0000 through U+007F). so i don’t think having let c:Character = '👩🏼‍💻' error out is going to fly.

this thread has 3631637642 posts because influential people wanted these features in the proposal. stripping them out realistically is just gonna put us back where we started and all these things are just gonna get rehashed anew.

Is your first paragraph a separate thought from the rest?

¶1 says you like Option 4.

¶2–5 say you have reservations about “it”. But none of those reservations apply to Option 4; they all apply to Michel’s alternative idea.


What you're describing is not Option 4. Here is Option 4. Unicode.Scalar and Character can be created with (appropriate) single-quoted literals.


oh sorry that’s a typo, I was referring to michel fortin’s table (option 5, so many options…). the actual option 4 looks sensible to me.


*option 5, but again, i like option 5— it’s simple and completely avoids the deprecation/functional duplication issues with option 4, where we would have to figure out how to phase out double quotes without disturbing ABI. i’m just saying i don’t think it has a realistic chance of passing.


Since option 4 seems to be the most popular, I’ve rewritten the proposal document based on it, which can be viewed here


Integer-convertible character literals


Swift’s String type is designed for Unicode correctness and abstracts away the underlying binary representation of the string to model it as a Collection of grapheme clusters. This is an appropriate string model for human-readable text, as to a human reader, the atomic unit of a string is (usually) the extended grapheme cluster. When treated this way, many logical string operations “just work” the way users expect.

However, it is also common in programming to need to express values which are intrinsically numeric but have textual meaning when taken as an ASCII value. We propose adding a new literal syntax that uses single quotes (') and is transparently convertible to Swift’s integer types. This syntax, but not the behavior, will extend to all “scalar” text literals, up to and including Character, and will become the preferred literal syntax for these types.


For both correctness and efficiency, [UInt8] (or another integer array type) is usually the most appropriate representation for an ASCII string. (See Stop converting Data to String for a discussion on why String is an inappropriate representation.)

A major pain point of integer arrays is that they lack a clear and readable literal form. In C, 'a' is a character literal with the value 97. Swift has no such equivalent, requiring awkward spellings like UInt8(ascii: "a"). Alternatives, like spelling out the values in hex or decimal directly, are even worse. This harms the readability of code, and is one of the sore points of bytestring processing in Swift.

static char const hexcodes[16] = {
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    'a', 'b', 'c', 'd', 'e', 'f'
};

The Swift equivalent is far noisier:

let hexcodes = [
    UInt8(ascii: "0"), UInt8(ascii: "1"), UInt8(ascii: "2"), UInt8(ascii: "3"),
    UInt8(ascii: "4"), UInt8(ascii: "5"), UInt8(ascii: "6"), UInt8(ascii: "7"),
    UInt8(ascii: "8"), UInt8(ascii: "9"), UInt8(ascii: "a"), UInt8(ascii: "b"),
    UInt8(ascii: "c"), UInt8(ascii: "d"), UInt8(ascii: "e"), UInt8(ascii: "f")
]

Sheer verbosity can be reduced by applying “clever” higher-level constructs such as

let hexcodes = [
    "0", "1", "2", "3",
    "4", "5", "6", "7",
    "8", "9", "a", "b",
    "c", "d", "e", "f"
].map{ UInt8(ascii: $0) }

or even

let hexcodes = Array(UInt8(ascii: "0") ... UInt8(ascii: "9")) + 
               Array(UInt8(ascii: "a") ... UInt8(ascii: "f"))

though this comes at the expense of an even higher noise-to-signal ratio, as we are forced to reference concepts such as function mapping, concatenation, range construction, Array materialization, and run-time type conversion, when all we wanted to express was a fixed set of hardcoded values.

In addition, the init(ascii:) initializer only exists on UInt8. If you're working with other types like Int8 (common when dealing with C APIs that take char), it is much more awkward. Consider scanning through a char* buffer as an UnsafeBufferPointer<Int8>:

for scalar in int8buffer {
    switch scalar {
    case Int8(UInt8(ascii: "a")) ... Int8(UInt8(ascii: "f")):
        // lowercase hex letter
    case Int8(UInt8(ascii: "A")) ... Int8(UInt8(ascii: "F")):
        // uppercase hex letter
    case Int8(UInt8(ascii: "0")) ... Int8(UInt8(ascii: "9")):
        // hex digit
    default:
        // something else
    }
}

Aside from being ugly and verbose, transforming Unicode.Scalar literals also sacrifices compile-time guarantees. The statement let char: UInt8 = 1989 is a compile-time error, whereas let char: UInt8 = .init(ascii: "߅") is a run-time error.
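The difference is easy to observe in today's Swift (the failing lines are shown commented out, because they do not compile or would trap, respectively):

```swift
// Integer literals are range-checked at compile time:
let fine: UInt8 = 89
// let bad: UInt8 = 1989        // compile-time error: literal overflows UInt8

// But the ascii: initializer is only checked at run time:
let y = UInt8(ascii: "Y")       // 89
// let oops = UInt8(ascii: "߅") // run-time trap: not an ASCII scalar
```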

ASCII scalars are inherently textual, so it should be possible to express them with a textual literal without requiring layers upon layers of transformations. Just as applying the String APIs runs counter to Swift’s stated design goals of safety and efficiency, forcing users to express basic data values in such a convoluted and unreadable way runs counter to our design goal of expressiveness.

Integer character literals would provide benefits to String users. One of the future directions for String is to provide performance-sensitive or low-level users with direct access to code units. Having numeric character literals for use with this API is hugely motivating. Furthermore, improving Swift’s bytestring ergonomics is an important part of our long term goal of expanding into embedded platforms.

Proposed solution

Let's do the obvious thing here, and conform Swift’s integer literal types to ExpressibleByUnicodeScalarLiteral. These conversions will only be valid for the ASCII range (U+0000 through U+007F); unicode scalar literals outside that range will be invalid, and diagnosed similarly to the way we currently diagnose overflowing integer literals. This is a conservative limitation which we believe is warranted, as allowing transparent unicode conversion to integer types carries major encoding pitfalls we want to protect users from.

ExpressibleBy     UnicodeScalarLiteral  ExtendedGraphemeClusterLiteral  StringLiteral
UInt8, … , Int    yes*                  no                              no
Unicode.Scalar    yes                   no                              no
Character         yes (inherited)       yes                             no
String            no*                   no*                             yes
StaticString      no*                   no*                             yes

Cells marked with an asterisk * indicate behavior that is different from the current language behavior.

As we are introducing a separate literal syntax 'a' for “scalar” text objects, and making it the preferred syntax for Unicode.Scalar and Character, it will no longer be possible to initialize Strings or StaticStrings from unicode scalar literals or character literals. To users, this will have no discernible impact, as double-quoted literals will simply be inferred as string literals.

This proposal will have no impact on custom ExpressibleBy conformances. However, the integer types UInt8 through Int will now be available as source types provided by the ExpressibleByUnicodeScalarLiteral.init(unicodeScalarLiteral:) initializer. For these specializations, the initializer will be responsible for enforcing the compile-time ASCII range check on the unicode scalar literal.

init()            unicodeScalarLiteral  extendedGraphemeClusterLiteral  stringLiteral
:UInt8, … , :Int  yes*                  no                              no
:Unicode.Scalar   yes                   no                              no
:Character        yes (upcast)          yes                             no
:String           yes (upcast)          yes (upcast)                    yes (upcast)
:StaticString     yes (upcast)          yes (upcast)                    yes

The ASCII range restriction will only apply to single-quote literals coerced to an integer type. Any valid Unicode.Scalar can be written as a single-quoted unicode scalar literal, and any valid Character can be written as a single-quoted character literal.

                 'a'     'é'     'β'     '𓀎'      '👩‍✈️'    "ab"
:String          error   error   error   error    error   "ab"
:Character       'a'     'é'     'β'     '𓀎'      '👩‍✈️'    error
:Unicode.Scalar  U+0061  U+00E9  U+03B2  U+1300E  error   error
:UInt32          97      error   error   error    error   error
:UInt16          97      error   error   error    error   error
:UInt8           97      error   error   error    error   error
:Int8            97      error   error   error    error   error

With these changes, the hex code example can be written much more naturally:

let hexcodes: [UInt8] = [
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    'a', 'b', 'c', 'd', 'e', 'f'
]

for scalar in int8buffer {
    switch scalar {
    case 'a' ... 'f':
        // lowercase hex letter
    case 'A' ... 'F':
        // uppercase hex letter
    case '0' ... '9':
        // hex digit
    default:
        // something else
    }
}

Choice of single quotes

We propose to adopt the 'x' syntax for all textual literal types up to and including ExtendedGraphemeClusterLiteral, but not including StringLiteral. These literals will be used to express integer types, Character, Unicode.Scalar, and types like UTF16.CodeUnit in the standard library.

The default inferred literal type for let x = 'a' will be Character, following the principle of least surprise. This also allows for a natural user-side syntax for differentiating methods overloaded on both Character and String.

Single-quoted literals will be inferred to be integer types in cases where a Character or Unicode.Scalar overload does not exist, but an integer overload does. This can lead to strange spellings such as '1' + '1' == 98. However, we foresee problems arising from this to be quite rare, as the type system will almost always catch such mistakes, and very few users are likely to express a String with two literals instead of the much more obvious "11".

Use of single quotes for character/scalar literals is heavily precedented in other languages, including C, Objective-C, C++, Java, and Rust, although different languages have slightly differing ideas about what a “character” is. We choose the single-quote syntax specifically because it reinforces the notion that strings and character values are different: the former is a sequence, the latter is a scalar (and "integer-like"). Character types also don't support string literal interpolation, which is another reason to move away from double quotes.

Single quotes in Swift, a historical perspective

In Swift 1.0, we wanted to reserve single quotes for some yet-to-be determined syntactical purpose. However, today, pretty much all of the things that we once thought we might want to use single quotes for have already found homes in other parts of the Swift syntactical space. For example, syntax for multi-line string literals uses triple quotes ("""), and string interpolation syntax uses standard double quote syntax. With the passage of SE-0200, raw-mode string literals settled into the #""# syntax. In current discussions around regex literals, most people seem to prefer slashes (/).

At this point, it is clear that the early syntactic conservatism was unwarranted. We do not foresee another use for this syntax, and given the strong precedent in other languages for characters, it is natural to use it.

Existing double quote initializers for characters

We propose deprecating the double quote literal form for Character and Unicode.Scalar types and slowly migrating them out of Swift.

let c2 = 'f'               // preferred
let c1: Character = "f"   // deprecated

Detailed Design

The only standard library change will be to add {UInt8, Int8, ..., Int} to the list of allowed Self.UnicodeScalarLiteralType types. (This entails conforming the integer types to _ExpressibleByBuiltinUnicodeScalarLiteral.) The ASCII range checking will be performed at compile-time in the typechecker, in essentially the same way that overflow checking for ExpressibleByIntegerLiteral.IntegerLiteralType types works today.

protocol ExpressibleByUnicodeScalarLiteral {
    associatedtype UnicodeScalarLiteralType:
        {StaticString, ..., Unicode.Scalar} + {UInt8, Int8, ..., Int}
    init(unicodeScalarLiteral: UnicodeScalarLiteralType)
}

The default inferred type for all single-quoted literals will be Character, addressing a longstanding pain point in Swift, where Characters had no dedicated literal syntax.

typealias UnicodeScalarLiteralType           = Character
typealias ExtendedGraphemeClusterLiteralType = Character 

This will have no source-level impact, as all double-quoted literals get their default inferred type from the StringLiteralType typealias, which currently overshadows ExtendedGraphemeClusterLiteralType and UnicodeScalarLiteralType. The UnicodeScalarLiteralType typealias will remain meaningless, but ExtendedGraphemeClusterLiteralType typealias will now be used to infer a default type for single-quoted literals.

Source compatibility

This proposal could be done in a way that is strictly additive, but we feel it is best to deprecate the existing double quote initializers for characters, and the UInt8.init(ascii:) initializer.

Here is a specific sketch of a deprecation policy:

  • Continue accepting these in Swift 5 mode with no change.

  • Introduce the new syntax support into Swift 5.1.

  • Swift 5.1 mode would start producing deprecation warnings (with a fixit to change double quotes to single quotes.)

  • The Swift 5 to 5.1 migrator would change the syntax (by virtue of applying the deprecation fixits.)

  • Swift 6 would not accept the old syntax.

During the transition period, "a" will remain a valid unicode scalar literal, so it will be possible to initialize integer types with double-quoted ASCII literals.

let ascii: Int8 = "a" // produces a deprecation warning

However, as this will only be possible in new code, and will produce a deprecation warning from the outset, this should not be a problem.

Effect on ABI stability

All changes except deprecating the UInt8.init(ascii:) initializer are either additive, or limited to the type checker, parser, or lexer. Removing String and StaticString’s ExpressibleByUnicodeScalarLiteral and ExpressibleByExtendedGraphemeClusterLiteral conformances would otherwise be ABI-breaking, but this can be implemented entirely in the type checker, since source literals are a compile-time construct.

Removing UInt8.init(ascii:) would break ABI, but this is not necessary to implement the proposal; it’s merely housekeeping.

Effect on API resilience


Alternatives considered

Integer initializers

Some have proposed extending the UInt8(ascii:) initializer to other integer types (Int8, UInt16, … , Int). However, this forgoes compile-time validity checking, and entails a substantial increase in API surface area for questionable gain.
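For concreteness, a hypothetical sketch of that rejected alternative (not part of the proposal; the precondition makes the check a run-time one, which is exactly the drawback noted above, and the generic extension is my own illustration rather than anyone's proposed API):

```swift
extension FixedWidthInteger {
    /// Hypothetical analogue of UInt8.init(ascii:) for all integer types.
    init(ascii scalar: Unicode.Scalar) {
        precondition(scalar.isASCII, "not an ASCII scalar")
        self.init(truncatingIfNeeded: scalar.value)
    }
}

let a = Int8(ascii: "a")    // 97
let b = UInt16(ascii: "a")  // 97
```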

Lifting the ASCII range restriction

Some have proposed allowing any unicode scalar literal whose codepoint index does not overflow the target integer type to be convertible to that integer type. Consensus was that this is an easy source of unicode encoding bugs, and provides little utility to the user. If people change their minds in the future, this restriction can always be lifted in a source and ABI compatible way.

Single-quoted ASCII strings

Some have proposed allowing integer array types to be expressible by multi-character ASCII strings such as 'abcd'. We consider this to be out of scope of this proposal, as well as unsupported by precedent in C and related languages.


One thing I took from option 5 is initialising an array of Ints from the characters in a string. Perhaps this could be added to the standard library as an annex to the proposal:

extension Array where Element: FixedWidthInteger {
  public init(_ characters: String) {
    self = characters.unicodeScalars.map { Element($0.value) }
  }
}

let hexcodes2 = [Int8]("0123456789abcdef")

You could also make it conform to ExpressibleByStringLiteral
