There are some totally valid concerns here, but I want to address some misunderstandings about Swift and Unicode.
Swift firmly establishes that String and related types are Unicode and not some other standard.
Unicode is a “Universal Character Encoding”. It is a mapping between “characters” (*cough* not in the grapheme-cluster sense) and numbers starting from 0. This assignment of each character to a specific number is the crux of the standard. The elements of this universal encoding we call Unicode are called “code points” (*cough* or “Unicode scalar values” for only the valid ones).
A “Unicode Encoding” is an encoding of the universal encoding (or a subset of it), which may have alternate numbers (e.g. EBCDIC) or more complex details (e.g. uses a smaller-width representation). The elements of such an encoding are called “code units”.
Nothing in this proposal attempts to bake in anything about particular “code units” from some particular Unicode encoding; rather, it addresses the element type of Unicode itself: the “code point”.
(This is all pretty confusing.)
I’m not presenting an argument that Swift syntax should operate under some implicit conversion between a syntactic construct such as a literal and a number. That’s the purpose of this review thread and I understand that people can disagree for valid reasons. I’m just trying to dispel some of the FUD around mixing up Unicode and some particular Unicode encoding.
Code points are integer values. The code point ‘G’ is inherently 71 in Unicode. The idea that the digit ‘8’ is the same thing as the number 56 is the point of character encodings.
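A quick illustration using the standard library’s `Unicode.Scalar`, whose `value` property exposes the code point as a `UInt32`:

```swift
// Code points are just numbers: Unicode.Scalar exposes them via `.value`.
let g: Unicode.Scalar = "G"
print(g.value)       // 71 — ‘G’ is U+0047

let eight: Unicode.Scalar = "8"
print(eight.value)   // 56 — the digit ‘8’ is U+0038
```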
Swift supports Unicode, but does not mandate a particular Unicode encoding. EBCDIC is not a subset of Unicode, as decoding it to Unicode involves remapping some values.
(Again, this is not an argument that Swift’s particular syntactic construct for code points should produce Swift’s number types, just that these kinds of encoding-related concerns are not relevant.)
This proposal does not change that.
Swift forces Unicode on us, and then Unicode forces ASCII on us. Unicode by explicit design is a superset of ASCII. From the standard:
While taking the ASCII character set as its starting point, the Unicode Standard goes far beyond ASCII’s limited ability to encode only the upper- and lowercase letters A through Z. It provides the capacity to encode all characters used for the written languages of the world—more than 1 million characters can be encoded.
UTF-8 is a red herring here. We’re talking about Unicode itself, i.e. code points not code units. If 0x61 were to map to ‘x’, that wouldn’t be Unicode, and if it isn’t Unicode, it isn’t Swift.
Any encoding that’s not literally compatible with ASCII (i.e. without decoding) is not literally compatible with Unicode. Such encodings might be “Unicode encodings” (encoding of an encoding), meaning that they need to go through a process of decoding in order to be literally equivalent to ASCII/Unicode.
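To make the ASCII-compatibility point concrete, here is a small sketch: for an ASCII character, the Unicode code point, the ASCII value, and the UTF-8 code unit are all numerically identical.

```swift
// For ASCII, code point == ASCII value == UTF-8 code unit.
let a: Unicode.Scalar = "a"
print(a.value)           // 97 (0x61), the ASCII value of ‘a’
print(Array("a".utf8))   // [97] — UTF-8 passes ASCII bytes through unchanged
```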
Could you elaborate? For many tasks, pattern matching over String.utf8 is exactly what you should be doing.
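For example, a byte-level scan over `String.utf8` (`UInt8(ascii:)` is standard library API; the newline-counting task is just an illustration):

```swift
// Sketch: pattern matching over UTF-8 code units to count newlines.
let text = "line one\nline two\n"
var lines = 0
for byte in text.utf8 {
    switch byte {
    case UInt8(ascii: "\n"):
        lines += 1
    default:
        break
    }
}
print(lines)   // 2
```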
I do agree that arithmetic feels odd and out of place for these literals. I feel like most of the utility comes from equality comparisons and pattern matching.
Alternatively, are there any other options for excluding these from operators? I don’t recall exactly how availability works with overload resolution (@xedin?), but would it be possible to have some kind of unavailable/obsoleted/prefer-me-but-don’t-compile-me overloads for arithmetic operators that take
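One shape that could take (purely a sketch; `CodePointValue` is a hypothetical stand-in, and whether overload resolution would actually prefer and then reject these overloads is exactly the open question):

```swift
// Hypothetical sketch: poison arithmetic with unavailable operator overloads
// while leaving equality and pattern matching intact.
struct CodePointValue: Equatable {
    var value: UInt32
}

@available(*, unavailable, message: "arithmetic on code points is not supported")
func + (lhs: CodePointValue, rhs: CodePointValue) -> CodePointValue {
    fatalError("unavailable")
}

// Equality still works; `g + g` would be rejected at compile time.
let g = CodePointValue(value: 71)
print(g == CodePointValue(value: 71))   // true
```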