Yes. This feature will be a boon to anyone doing C interop, textual parsing, or working with other low-level data formats. The current UInt8(ascii:) initializer is a pain to type and decreases the readability of code. Furthermore, this lets us finally represent single Characters conveniently without explicitly typing or as-casting them.
100%. I'm bummed that we can't get the conformances in automatically, but I understand that's a limitation of ABI stability right now and this is still a step in the right direction.
This fits nicely with how character literals are handled in many languages similar to Swift, so users will find it natural, and the compile-time validation fits perfectly.
Read and participated in various threads about the issue on the forum.
A new syntax is introduced for single-quoted character literals.
No new literal-expressible protocol is added. Instead, the existing unicode-scalar and extended-grapheme-cluster literal protocols are used.
The default type of a single-quoted character literal is Character.
Since String conforms to ExpressibleByExtendedGraphemeClusterLiteral, a single-quoted character literal can be assigned to a String:
let s: String = 'ü'
It is possible to extend the standard-library integer types (UInt8, Int, etc.) with conformance to ExpressibleByUnicodeScalarLiteral. Doing so will allow the assignment of a single-quoted ASCII character literal to an integer:
let n: UInt8 = '*'
This is checked at compile time to ensure the literal is in the ASCII range (0–127). The double-quoted syntax does not work for integers.
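For reference, here is a minimal sketch of what such an opt-in conformance could look like in today's Swift. This is not part of the proposal text: it uses double-quoted literals, the ASCII range check happens at run time rather than at compile time as the proposal would provide, and newer compilers may warn that the conformance is retroactive.

```swift
// Sketch only: an opt-in conformance in today's Swift. The proposal would
// move the ASCII check to compile time; here it can only happen at run time.
extension UInt8: ExpressibleByUnicodeScalarLiteral {
    public init(unicodeScalarLiteral value: Unicode.Scalar) {
        precondition(value.isASCII, "only ASCII scalars fit safely in UInt8")
        self = UInt8(value.value)
    }
}

let star: UInt8 = "*"
assert(star == 42)
```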
The existing double-quoted syntax for unicode-scalar and extended-grapheme-cluster literals is deprecated, and we can expect it to be removed in a future version of Swift.
• I think String should not be expressible by a single-quoted literal, only a double-quoted literal.
• It is not clear to me why integers are restricted to being represented by ASCII characters only. Every Unicode scalar has a unique integer value, which is already available in Swift at runtime:
let x: Unicode.Scalar = "∫"
let n: UInt32 = x.value
print(n) // 8747
It seems straightforward to make this available at compile-time through the literal syntax being proposed.
• • •
In light of the ABI concerns regarding retroactive conformance of integer types to the ExpressibleByUnicodeScalarLiteral protocol, perhaps another possibility should be considered:
We currently have a scenario where double-quoted literals can be interpreted through any of three different ExpressibleBy___Literal protocols: String, UnicodeScalar, and ExtendedGraphemeCluster. Evidently, there is no problem with the same syntax being used for multiple kinds of literals.
We also have four different syntaxes for integer literals: decimal, hexadecimal (0x), binary (0b), and octal (0o). Thus there is also apparently no problem with multiple syntaxes for the same kind of literal.
Therefore, perhaps we should make single-quoted characters also be valid *integer* literals. Instead of trying to shoehorn integer types into accepting UnicodeScalar literals, we could simply make the compiler recognize single-quoted characters as another way to spell integer literals.
Then there is no ABI concern, and in fact no new conformance to add at all.
Small note: please don't use the term "retroactive conformances" for "conformances added in a compatible ABI version". For better or worse, we're already using it for "conformances added in a separate module from either the type or the protocol", and the issues around the two are not the same. Maybe "version-dependent conformances" or "backwards-deployable conformances" (I'm not sure which meaning's being used here).
I am weakly in favor of the proposal except for the deprecation part, because I don't want to assume we'll have a -swift-version 5.1. I also don't see why init(ascii:) makes sense to deprecate, since it can be useful for run-time values too. Finally,
Some have proposed allowing integer array types to be expressible by multi-character ASCII strings such as 'abcd'. We consider this to be out of scope of this proposal, as well as unsupported by precedent in C and related languages.
I agree that this is out of scope, but there is some precedent for it: multicharacter scalar literals as an implementation-defined part of C and C++. I've definitely had use for "sequence of bytes that have a nice ASCII representation" in parsing code and I'd support such an addition in the future.
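For what it's worth, the byte-sequence use case already has a reasonable spelling in today's Swift, shown here purely as a point of comparison (not part of the proposal):

```swift
// Today's spelling for "a sequence of bytes with a nice ASCII representation":
let magic: [UInt8] = Array("GIF89a".utf8)
assert(magic == [71, 73, 70, 56, 57, 97])
```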
This isn’t ideal, but it’s difficult to avoid in an ABI-stable world:
Greater flexibility would have been my preference, but a long discussion about the subtleties of Unicode normalisation persuaded the thread that it was best to steer clear of representing integers outside the ASCII range (the Option 2 vs. Option 4 debate). We can always relax this constraint later if a killer use case turns up.
This was the first implementation proposed (codepoint literals), but it became obvious that the proper abstraction for single quotes in Swift's string world is Character literals, which can use the more flexible existing protocols for nearly the same result.
Use single-quoted literals for ExpressibleByUnicodeScalarLiteral with all the same conforming types as today.
Use single-quoted literals for ExpressibleByExtendedGraphemeClusterLiteral (sidenote: ugh, why was this not spelled ExpressibleByCharacterLiteral?) with all the same conforming types as today.
And also, use single-quoted literals for ExpressibleByIntegerLiteral with all the same conforming types as today.
…just because the conformance is there, doesn’t mean we have to accept the syntax.
I have a vague recollection that some types conforming to Collection do not (or at least did not) allow direct subscripting, and this is/was intentional. It might have been some sort of Range, and the reasoning was that people don’t expect the subscript to be an identity operation. It still worked in generic contexts, it just wasn’t allowed to be used concretely.
I don’t know if that’s still the case, but I am fairly confident it was at some point.
So if we wanted to, I expect we could do something similar and make it an error to directly assign a single-quoted literal to a String.
Do we not have a (Character, Character) -> String overload of the + operator? Because that’s what I’d expect this to use.
I used to agree with you, but it turns out taking ExpressibleByStringLiteral out of the text literal protocol hierarchy brings a lot of ABI issues with it, and in principle, let s: String = 'a' isn’t too different from let f: Float = 1. Complain about the binary-stability constraints all you want, but if you ask me it’s not worth fighting the ABI over this.
Early iterations of the proposal allowed exactly what you’re asking, but developers who use a lot of international text said there would be a lot of problems relating to Unicode encoding once you get past U+007F (literals that change value when you save the .swift file???). We think their concerns are valid, so we have the ASCII restriction.
We don’t really have to. I don’t care too much about the init(ascii:) initializer and that part can be omitted from the proposal without affecting anything else.
We don’t. It’s okay, I could have sworn we had one too. One draft of the proposal did specifically add this exact operator on Character × Character, but once you have 'a' + 'a', the next logical step is 'a' * 5, which of course brings us to 'a' * 'a' (???).
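For concreteness, the overload being discussed is a one-liner. This is a hypothetical sketch of the operator from that draft, not something in the standard library:

```swift
// Hypothetical operator discussed above — not in the standard library.
func + (lhs: Character, rhs: Character) -> String {
    String(lhs) + String(rhs)
}

let x: Character = "a"
let y: Character = "b"
let joined = x + y
assert(joined == "ab")
```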
I like most parts, except for the proposed integer conformances. Integer literal initialization seems like a relatively niche feature that comes with an unreasonably large conceptual/pedagogical cost.
I can fully appreciate the convenience of being able to write code like this while I'm writing a parser for some binary file format that includes ASCII elements:
let magic: [UInt8] = ['G', 'I', 'F', '8', '9', 'a']
However, in the overwhelmingly vast majority of my time, I'm not writing such a parser, and the notion that it's okay to consider some characters unambiguously identical to their ASCII codes rubs me the wrong way.
By their nature, characters aren't integer values. There is nothing inherently "71ish" about the letter G; the idea that the digit 8 is the same thing as the number 56 is absurd.
We can encode G and 8 to integers (or, rather, a series of bits) by selecting an encoding. There are many encodings to choose from, and not all use the same representations as ASCII; indeed, Swift has some support for running on IBM Z systems that prefer encodings from the EBCDIC family.
The Swift stdlib has so far been careful to always make the selection of an encoding explicit in Swift source code. This has been best practice in API design for multiple decades now, so I always assumed this was a deliberate choice -- any implicit encoding would certainly lead to hard-to-spot mistakes. This proposal is quite flagrantly breaking this practice. Limiting the feature to ASCII is a good compromise, but it does not fully eliminate the problem.
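To illustrate the point about explicitness, the stdlib's existing decoding APIs name the encoding at the call site:

```swift
// The encoding is always spelled out at the use site; nothing is implicit.
let bytes: [UInt8] = [71, 73, 70]
let text = String(decoding: bytes, as: UTF8.self)
assert(text == "GIF")
// Encoding in the other direction is just as explicit:
assert(Array(text.utf8) == bytes)
```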
In view of this, I would prefer if the passage about eventually adding ExpressibleByUnicodeScalarLiteral conformance to Int8/.../Int was dropped from the proposal text. Instead, users should keep explicitly opting into the integer initialization feature, even if versioned conformances become available. The required one-liner conformance declaration honestly doesn't seem like an unreasonable burden for authors of parser packages.
The UInt8(ascii:) initializer should not be deprecated; rather, it should be added to all integer types. (The objections raised in the Alternatives Considered section seem quite weak to me.)
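A generalized init(ascii:) could plausibly be a single protocol extension. This is a sketch of the idea, not a vetted implementation:

```swift
// Sketch: extend init(ascii:) from UInt8 to every fixed-width integer type.
// Like the existing UInt8.init(ascii:), it traps on non-ASCII input.
extension FixedWidthInteger {
    init(ascii scalar: Unicode.Scalar) {
        precondition(scalar.isASCII, "not an ASCII scalar")
        self.init(scalar.value)
    }
}

let g = Int32(ascii: "G")
assert(g == 71)
```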
Yes, despite the objection above.
Single-quoted Unicode.Scalar and Character literals are obviously desirable.
As detailed above, I consider integer initialization to be a bad idea in general. Limiting it to ASCII is a workable compromise; indefinitely leaving it opt-in would be even better.
I don't know any language that properly supports Unicode at the level Swift aims for. APIs that implicitly assume a string encoding are generally considered subpar in the libraries I have worked with -- unfortunately, character initialization is often a legacy language feature that has to be kept unchanged.
I was involved in late-stage pitch discussions, and thought long and hard about the issues.
I think there is enough reason to give ASCII preferred treatment over all other encodings, just from the fact that so many major binary formats explicitly specify it. For example, the png standard:
3.2. Chunk layout
Each chunk consists of four parts:
A 4-byte chunk type code. For convenience in description and in examining PNG files, type codes are restricted to consist of uppercase and lowercase ASCII letters (A-Z and a-z, or 65-90 and 97-122 decimal). However, encoders and decoders must treat the codes as fixed binary values, not character strings. For example, it would not be correct to represent the type code IDAT by the EBCDIC equivalents of those letters. Additional naming conventions for chunk types are discussed in the next section.
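The fixed-binary-value comparison the spec calls for is exactly the kind of code the proposal would make lighter; today it reads like this:

```swift
// Today's spelling of a PNG chunk type code as fixed binary values.
let idat: [UInt8] = [UInt8(ascii: "I"), UInt8(ascii: "D"),
                     UInt8(ascii: "A"), UInt8(ascii: "T")]
assert(idat == [73, 68, 65, 84])
```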
Non-ASCII 7-bit encodings are so rare compared to non-UTF-8 >7-bit encodings that I really don’t think it’s valid to use the same argument for both. We shouldn’t make the common case (ASCII) difficult to use just to accommodate a very uncommon case (EBCDIC).
We have a + operator that concatenates arrays, but nobody expects there to be a * operator that repeats an array n times. Some people might want that, but we don’t have it and the slope is not slippery.
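Concretely, as a sanity check of that claim:

```swift
// Array concatenation exists; repetition does not get an operator.
let joined = [1, 2] + [3, 4]
assert(joined == [1, 2, 3, 4])
// The stdlib's spelling for repetition is an initializer, not `*`:
let repeated = Array(repeating: 0, count: 3)
assert(repeated == [0, 0, 0])
```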
Furthermore, with the proposal in its current form, 'a' * 'a' would be valid Swift (after a zero-effort user-supplied conformance, which is only required for ABI reasons), expressing the multiplication of 97 by itself as an integer. So if you think “'a' * 'a'” is objectionable, then it follows that the current proposal is as well.
Your whole premise is that 'a' shouldn’t be a String, but there really isn’t a strong argument why it shouldn’t be; if we go with the ABI-friendly route, we get 'a' + 'a' = "aa" for free. 'a' * 'a' is unfortunate, but we already decided it’s fine for users, since the type checker would almost certainly catch that mistake. For standard-library authors, 'a' * 'a' is a different question: if we are vending + (Character, Character) -> String, we would also have to consider * (Character, Int) -> String along with it (Python has it, after all).
Way off topic, but: it works because 'a' is a character literal expression that can be expressed as a single Unicode scalar, so the compiler looks for ExpressibleByUnicodeScalarLiteral, which String must conform to by virtue of inheriting it through ExpressibleByStringLiteral (as opposed to Character, which is a concrete type). I don’t know exactly why you're seeing that specific error.
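That hierarchy can be checked directly, since every protocol in it inherits the literal initializers of its parents:

```swift
// String conforms to ExpressibleByUnicodeScalarLiteral because
// ExpressibleByStringLiteral refines ExpressibleByExtendedGraphemeClusterLiteral,
// which in turn refines ExpressibleByUnicodeScalarLiteral.
let s = String(unicodeScalarLiteral: "a")
assert(s == "a")

func scalarExpressible<T: ExpressibleByUnicodeScalarLiteral>(_: T.Type) -> Bool { true }
assert(scalarExpressible(String.self))
```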