While discussing these single-quoted literals, I have become increasingly convinced of several interesting things about them. Here is an opinionated point-by-point summary:
- The ergonomics of low-level string processing (most especially, ASCII processing) is a significant pain point for Swift users, especially when it comes to dealing with individual code points. We need to do something to fix this.
- Characters aren't integers; integers aren't characters. There must not be any implicit conversion between 'charactery' and integral types and values. The act of encoding a string/character needs to leave a mark in the source; the mark must unambiguously indicate the specific encoding used. This prohibition against implicit conversions applies to initialization, comparisons (`==`), pattern matching (`~=`), and any other way a number and a character could appear in an expression together.
- The consequences of allowing integer types to be directly initialized from single- or double-quoted stringy literals are unacceptable. Abominations like `'a' % '2'` or `56 >> 'q'` or `42.multipliedFullWidth(by: 'b')` must not be allowed to compile. This is merely a corollary of the previous point, but it highlights a particularly obnoxious side effect.
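For contrast, the explicit conversions the standard library already offers do leave exactly this kind of mark in the source; a quick sketch of today's spellings:

```swift
// Each conversion names the encoding, so the numeric value is never implicit.
let semicolon: Unicode.Scalar = ";"
let byte = UInt8(ascii: semicolon)        // 59; traps if the scalar isn't ASCII
let codePoint = semicolon.value           // 59 as UInt32: the Unicode code point

// Going the other way is just as explicit; every UInt8 is a valid scalar,
// so this initializer never fails.
let scalar = Unicode.Scalar(59 as UInt8)  // ";"
print(byte, codePoint, scalar)
```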
- `Character` is on the wrong level of abstraction when it comes to processing ASCII bytes. (`"\r\n"` is a single `Character` that represents a sequence of two ASCII characters, and `Character` considers the ASCII semicolon `;` to be substitutable with GREEK QUESTION MARK (U+037E). These are clearly inappropriate features for the byte-processing use case.) `Character.asciiValue` is fundamentally broken: it can cause silent data loss, and therefore it needs to be deprecated. (Note: this is not to say the `Character` abstraction isn't useful at all. On the contrary: it's clearly the right choice for `String`'s element type.)
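Both quirks are easy to reproduce in a playground; the CRLF result below is the documented behavior of `asciiValue`, which silently drops the CR:

```swift
let crlf: Character = "\r\n"      // a single Character made of two ASCII bytes
print(crlf.asciiValue as Any)     // Optional(10): the CR is silently lost

let semicolon: Character = ";"
let greekQuestionMark: Character = "\u{037E}"
// Character equality is based on canonical equivalence, so these compare equal:
print(semicolon == greekQuestionMark)   // true
```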
- `Unicode.Scalar` and its associated string view are much closer to the level of actual encodings, and they are much more appropriate abstractions for low-level text processing. This is particularly true for ASCII, but it also applies to any other context where equivalency under Unicode normalization would be inappropriate or unnecessary. `Unicode.Scalar` is a type that is crying out for its own literal syntax. It has grown an awesome set of APIs in Swift 5; it's a shame that its rich properties are locked away behind convoluted syntax. I want to be able to just type `'\u{301}'.name` into a playground to learn about a particular code point.
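For comparison, here is what that lookup requires today; `properties` is where Swift 5 exposes the Unicode database (the single-quoted spelling above is the proposed shorthand, not current Swift):

```swift
// Today, you must spell out the type before you can reach the Unicode properties.
let acute = "\u{301}" as Unicode.Scalar
print(acute.properties.name ?? "(unnamed)")   // "COMBINING ACUTE ACCENT"
print(acute.properties.isDiacritic)           // true
```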
- There are no strong use cases for adding dedicated literal syntax for the `Character` type. `"👨👩👧👦" as Character` and `'👨👩👧👦' as Character` both seem acceptable spellings, with a preference for the first.
- Arranging things so that `'\r'` evaluates to the Unicode scalar U+000D and `'\r'.ascii` evaluates to the `UInt8` value 13 would resolve all of the issues above. As a best-effort quality-of-implementation improvement, the compiler should produce a diagnostic for obviously wrong cases like `'é'.ascii`. However, if the diagnostic proves too difficult or objectionable, then it's acceptable to merely rely on runtime traps provided by the stdlib; people processing ASCII text can be expected to know which characters are in ASCII. (AFAICT, the diagnostic would be straightforward to implement and relatively uncontroversial.)
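A minimal sketch of how the stdlib half of this could behave, assuming single-quoted literals default to `Unicode.Scalar`; the `ascii` property is hypothetical, modeled here as an extension with a runtime trap:

```swift
extension Unicode.Scalar {
    /// Hypothetical API sketched from the proposal; not in the stdlib.
    /// The ASCII code of this scalar. Traps when the scalar isn't ASCII.
    var ascii: UInt8 {
        precondition(isASCII, "scalar is not ASCII")
        return UInt8(value)
    }
}

// With the proposed literals this would be spelled '\r'.ascii; today we
// have to use the double-quoted equivalent:
let cr: Unicode.Scalar = "\r"
print(cr.ascii)   // 13
```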
(Note: I'm using strongly worded statements above, but these are merely my personal convictions. I might change my mind about some of them in the future, if I'm given good reason to do so. Indeed, I previously believed that `'a'` should default to `Character`.)