What is the return value of your proposed function for the single character \r\n?
I do not aim to--and I believe it is neither possible nor desirable to--make everyone happy. I'd like to arrive, through discussion, at the best possible API. Previously, I was quite adamant that 'x' should be a character literal, but after this extensive discussion I no longer believe that to be the case.
For a design to have clarity, one has to commit. What is this thing we introduce? With that clarity necessarily comes making clear what this thing is not. There are plenty of reasons why, as @lorentey points out, it is more urgent to have a Unicode scalar literal. Therefore, it is not a character literal. The whole underpinning of the proposal is about why it's been a bad idea to have one literal syntax serve three types of literals (Unicode scalar, extended grapheme cluster, string). We have a chance to correct that here; it would be key not to replicate that error.
One argument could run: we're not able to decide where to draw the line between single-quoted and double-quoted literals (Character, Unicode.Scalar, ASCII), and this would be a source-breaking change, generating a collective groan from the community. The targeted operators I suggest seem to meet the ergonomic requirements I was looking for, can be implemented now in Swift 5.0, and can be made free of foot-guns if we restrict the Unicode.Scalars to ASCII in their implementation. For me the question is whether these should make it into the standard library and be another battery included with Swift, or be a CocoaPod/SPM module.
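Roughly the shape I have in mind, sketched against today's Swift; the exact operator set is up for debate, and the ASCII restriction here is just a plain runtime precondition:

// Compare an integer code unit against a scalar literal, ASCII only.
func == (codeUnit: UInt8, scalar: Unicode.Scalar) -> Bool {
    precondition(scalar.isASCII, "operator is restricted to ASCII scalars")
    return codeUnit == UInt8(ascii: scalar)
}
// Allow the same scalars to be used as switch patterns over code units.
func ~= (scalar: Unicode.Scalar, codeUnit: UInt8) -> Bool {
    precondition(scalar.isASCII, "operator is restricted to ASCII scalars")
    return UInt8(ascii: scalar) == codeUnit
}

let byte: UInt8 = 0x2C
byte == ","                 // true; "," is inferred as a Unicode.Scalar here
switch byte {
case ",":  print("comma")   // uses the ~= overload above
default:   break
}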
I'm not converging on Unicode.Scalar, but I could live with it. How will it not break source? ...I guess the two syntaxes could live side by side.
D'uh, I know that! I mean the operators I suggest work with Swift 5.0 as it is, but could be added to the stdlib for 5.1, with or without single-quote syntax.
There's a nice symmetry in using double quotes for human-readable text (String and Character), and single quotes for machine-readable text (Unicode.Scalar).
It's unfortunate that String's UTF-16 and UTF-8 code units are represented by UInt16 and UInt8, instead of dedicated types, because that probably excludes them from being expressed by single-quote literals.
It sure is unideal. But we might as well say that this is not ideal:
"2" < "10" // returns false
and then ask people to be precise in what they write to remove any ambiguity:
"2".isLexicographicallyOrdered(with: "10")
But forcing this on everyone is going to have a lot of drawbacks. The convenience of < for strings comes with edge cases where the result can be surprising, but that's the price you pay for being able to express things concisely.
I accept and embrace the fact that unicode scalars and ASCII characters are numbers, and I believe anyone working with them should too, and thus to me it makes sense to create ranges of them and check for inclusion. That it lets you write bizarre things makes it not ideal, especially to the untrained eye, but a more verbose alternative is not ideal either because it makes things less readable when the verbosity gets repeated.
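Concretely, the kind of range-and-inclusion check I mean, spelled with today's double-quoted literals and explicit annotations so the types are unambiguous:

let uppercase: ClosedRange<Unicode.Scalar> = "A" ... "Z"   // a range of scalars
let scalar: Unicode.Scalar = "G"
uppercase.contains(scalar)                                 // true

// the same idea on raw code units
let byte: UInt8 = 0x47
(UInt8(ascii: "A") ... UInt8(ascii: "Z")).contains(byte)   // true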
I think it makes sense to more clearly separate literals for things that are numeric in nature (unicode scalars and ASCII characters) from things that aren't (grapheme clusters). But I also believe restricting single quote literals to unicode scalars is going to encourage people to use scalars when they really should use Character, which is probably worse than being able to do weird unicode scalar ranges expressed with double quotes.
This last problem would be mostly gone if those new literals were restricted to ASCII range. Whatever you do with an ASCII character represented as a Character or UnicodeScalar, it'll probably do the same. The only exception would be matching "\r\n" as a Character. But then restricting it to the ASCII range means you can't deprecate double-quotes for UnicodeScalar literals. We're turning full circle.
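(For reference, a minimal sketch of how "\r\n" behaves today, since that is the edge case in question:)

let crlf: Character = "\r\n"   // one extended grapheme cluster
crlf.unicodeScalars.count      // 2: U+000D followed by U+000A
crlf.asciiValue                // Optional(10): asciiValue normalizes CR-LF to LF
// a single Unicode.Scalar can never represent "\r\n"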
I agree! The issue I have with trapping ASCII is that it basically implies that everything else Unicode.Scalar can represent is the edge case. We're basically turning Unicode.Scalar into an ASCII type with 1,111,870 invalid states. At this point, it would be better just to introduce a proper 7-bit ASCII type.
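Something along these lines is what I mean; purely a sketch, with a placeholder name and the smallest possible surface, where construction fails instead of trapping:

// A dedicated 7-bit code-unit type.
struct ASCII: Equatable, Comparable
{
    let value: UInt8    // invariant: value < 0x80

    init?(_ scalar: Unicode.Scalar)
    {
        guard scalar.isASCII else { return nil }
        self.value = UInt8(scalar.value)
    }

    static func < (lhs: ASCII, rhs: ASCII) -> Bool
    {
        lhs.value < rhs.value
    }
}

ASCII("\t")   // Optional(ASCII(value: 9))
ASCII("€")    // nil: not representable in 7 bits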
@inlinable is not enough here; to make compile-time validation work, you have to annotate it with @constexpression or whatever it's called once the feature is finalized, to prevent it from being callable on a dynamic value. Any API ought to be either checked with runtime preconditions or compile-time assertions, but not both. Making the behavior change depending on conditions known only to godbolt seems like a recipe for confusion.
As I mentioned above, I am unconcerned and unconvinced that this requires compile-time validation, though if it can be done heuristically then certainly it is a bonus. It's an issue separable from the main topic here regarding literals so I would rather not delve into that further.
enum BitVectorType
{
    static
    let uint8:(UInt8, UInt8, UInt8, UInt8) =
    (
        (" " as Character).asciiValue!,
        ("8" as Character).asciiValue!,
        ("'" as Character).asciiValue!,
        ("d" as Character).asciiValue!
    )
    static
    let uint8x:(UInt8, UInt8, UInt8, UInt8) =
    (
        (" " as Character).asciiValue!,
        ("8" as Character).asciiValue!,
        ("'" as Character).asciiValue!,
        ("h" as Character).asciiValue!
    )
    static
    let uint8b:(UInt8, UInt8, UInt8, UInt8) =
    (
        (" " as Character).asciiValue!,
        ("8" as Character).asciiValue!,
        ("'" as Character).asciiValue!,
        ("b" as Character).asciiValue!
    )
    static
    let uint16:(UInt8, UInt8, UInt8, UInt8) =
    (
        ("1" as Character).asciiValue!,
        ("6" as Character).asciiValue!,
        ("'" as Character).asciiValue!,
        ("d" as Character).asciiValue!
    )
    static
    let uint16x:(UInt8, UInt8, UInt8, UInt8) =
    (
        ("1" as Character).asciiValue!,
        ("6" as Character).asciiValue!,
        ("'" as Character).asciiValue!,
        ("h" as Character).asciiValue!
    )
    static
    let uint16b:(UInt8, UInt8, UInt8, UInt8) =
    (
        ("1" as Character).asciiValue!,
        ("6" as Character).asciiValue!,
        ("'" as Character).asciiValue!,
        ("b" as Character).asciiValue!
    )
    static
    let uint32:(UInt8, UInt8, UInt8, UInt8) =
    (
        ("3" as Character).asciiValue!,
        ("2" as Character).asciiValue!,
        ("’" as Character).asciiValue!,
        ("d" as Character).asciiValue!
    )
    static
    let uint32x:(UInt8, UInt8, UInt8, UInt8) =
    (
        ("3" as Character).asciiValue!,
        ("2" as Character).asciiValue!,
        ("'" as Character).asciiValue!,
        ("h" as Character).asciiValue!
    )
    static
    let uint32b:(UInt8, UInt8, UInt8, UInt8) =
    (
        ("3" as Character).asciiValue!,
        ("2" as Character).asciiValue!,
        ("'" as Character).asciiValue!,
        ("b" as Character).asciiValue!
    )
}
did you see it? how long did it take you?
This will compile without warnings. Because static properties are lazily computed, it will run without errors. An enum full of constants is also the last thing you'd think to unit-test, and how would you give this testing coverage anyway? It would be monumentally stupid if people had to add tests consisting of nothing but instantiating constants to catch this sort of bug.
Less than 10 seconds, without the aid of any external tools, on my first read through the code, purely by visual inspection, without backtracking. It's here:
("ā" as Character).asciiValue!,
I don't see in what way this is specific to these constants versus any other constants. This seems like a great argument that there should be better tooling for all constants--and I would agree about that.
That said, constants very much should be tested: that's not monumentally stupid. You can't rely on any compiler to tell you that the constant value you supplied is correct: it's not enough in your example to know that you used ASCII characters--you need to know you've used the right ones. Since you're saying you can't trust careful reading, how do you intend to suss out copypasta and other silly errors without testing?
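For instance, even a near-trivial test that forces each constant and asserts the expected bytes would have caught it--a sketch, assuming XCTest and the names from the example above:

import XCTest

final class BitVectorTypeTests: XCTestCase {
    func testDescriptorsAreExpectedASCII() {
        // Forcing evaluation of each lazily initialized constant is enough to
        // trap on the non-ASCII quote; asserting the bytes also catches
        // copy-paste mistakes between descriptors.
        XCTAssert(BitVectorType.uint8  == (0x20, 0x38, 0x27, 0x64))
        XCTAssert(BitVectorType.uint8x == (0x20, 0x38, 0x27, 0x68))
        XCTAssert(BitVectorType.uint8b == (0x20, 0x38, 0x27, 0x62))
        XCTAssert(BitVectorType.uint16 == (0x31, 0x36, 0x27, 0x64))
        XCTAssert(BitVectorType.uint32 == (0x33, 0x32, 0x27, 0x64))  // traps on the curly quote today
    }
}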
congratulations on your eyeballs. you get the point. It's really, really easy to accidentally type a ’ instead of a ' on a lot of platforms. Some of them will turn one into the other automatically. And in a lot of programming fonts, these characters look similar. It's also not the only character susceptible to "unicode trespassing"; there's - and –, ` and ‘, etc. So yes, i think this is an issue.
Of course you need careful reading. The great thing about ASCII strings is the characters are all visually different enough that careful reading is all you need to do to verify your constants are correct. Unicode means you have to examine them byte by encoded byte.