I think it should be the created type's decision whether or not to allow multiple 'char' literals.
Also, this should be a completely new literal type, with its own ExpressibleBy*Literal protocol, or protocols.
Possible names I can think of are ExpressibleByCodepointLiteral for the single 'char' case, and ExpressibleByTextLiteral for the multiple 'char' case.
As for real-world advantages of 'char' literals, some standards use integer discriminators whose ASCII interpretation is correlated with their semantic meaning; when putting that sort of data in code, it would be better to write the more readable ASCII in source instead of a raw number.
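To make that concrete, here is a minimal sketch assuming nothing beyond the standard library: formats such as PNG chunk types or RIFF FourCC codes store ASCII bytes as integer discriminators, and today the readable spelling needs UInt8(ascii:). The single-quote line at the end is hypothetical syntax from this pitch, not something that compiles today.

let rawTag: UInt8 = 0x49             // opaque: which discriminator is 0x49?
let readableTag = UInt8(ascii: "I")  // same value (73), but the intent is visible
// let pitchedTag: UInt8 = 'I'       // hypothetical single-quote syntax from this pitch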
Do you think there would be value in a separate ExpressibleBy protocol for this? As in, you could have ExpressibleBySingleQuotedLiteral (perhaps with a better name), to which FixedWidthInteger (or even perhaps Numeric), Character, and Unicode.Scalar could conform.
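For what it's worth, a rough sketch of the protocol shape being asked about, modelled on the existing ExpressibleBy*Literal protocols; the name and the conformance below are placeholders from this thread, not an accepted design:

protocol ExpressibleBySingleQuotedLiteral {
    associatedtype SingleQuotedLiteralType
    init(singleQuotedLiteral value: SingleQuotedLiteralType)
}

// One of the conformances mentioned above, sketched with Unicode.Scalar
// as the payload type:
extension Character: ExpressibleBySingleQuotedLiteral {
    init(singleQuotedLiteral value: Unicode.Scalar) {
        self.init(value)
    }
}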
One reason I ask is that, as an effect of the reference implementation (where Character conforms to ExpressibleByIntegerLiteral), users can type the following:
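(The quoted snippet appears to have been dropped from this excerpt; judging by the reply below, it was presumably along these lines, which is an assumption rather than the original code:)

let x = 8 as Character   // would compile under the reference implementation, yielding U+0008 (backspace)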
I don’t think this is a good enough reason. Maybe JavaScript programmers see it differently, but I wouldn’t assume 8 as Character to be anything but "\u{8}".
Now that raw strings have a different syntax, thanks to your work, I'm happy with the idea of using single quotes for character literals. Re-reading the months-old discussion here, I find the view in these posts compelling:
This approach seems clean and straightforward. I can't say I understand the benefit (besides lower implementation complexity) of making single-quoted literals a type of integer literal. This only limits the applications for this literal form, and therefore makes it harder for a proposal to clear the bar for usefulness. A new ExpressibleBy*Literal also lets you do the natural thing of making '🙂' default to Character instead of an Int, which cuts down on a lot of weird behaviour (e.g. I presume I can write var x = '🙂' * '🙂' or '8' + '9' in your prototype, etc.).
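To illustrate the kind of weird behaviour meant here: under a model where single-quoted literals are just integer literals, arithmetic on them type-checks. The single-quote expression in the comment is hypothetical syntax; the UInt8(ascii:) lines reproduce the effect in shipping Swift.

let eight = UInt8(ascii: "8")   // 56
let nine  = UInt8(ascii: "9")   // 57
let sum   = eight + nine        // 113, the codepoint of "q"
// With integer-backed character literals, '8' + '9' would similarly
// compile and produce 113 instead of being rejected.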
Thanks for the clarifications. I think I’ve finally got the bigger picture, updated the proof-of-concept implementation, and made a toolchain available here.
There is now a separate ExpressibleByCodepointLiteral protocol and the default type of these literals is now Character. The examples above all work except the following expectation:
let x3: Character = '🇨🇦' // not ok: the flag is two Unicode scalars, not a single codepoint
Despite having type Character, single-quoted “codepoint” literals are derived from integer literals in this implementation and can only represent single-codepoint graphemes. The advantage of taking this tack is better error reporting and checking that the codepoint fits into the destination type.
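A small sketch of the range check described here, assuming the hypothetical single-quote syntax in the comments; the last two lines show that without literal support the same mistake only surfaces at run time:

// let ok:  UInt8 = 'A'   // U+0041 is 65, fits in UInt8
// let bad: UInt8 = 'Ā'   // U+0100 is 256, would be rejected at compile time
let scalar: Unicode.Scalar = "Ā"
let byte = UInt8(exactly: scalar.value)   // nil at run time, not a compile-time error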
I hope this experiment will be of use moving this pitch along.
Great, thanks. I still think it might be hard for a user to understand why they can't write a Character using what they will roughly think of as a character literal there, though.
I've been thinking about single-delimiter literals for quite some time, and imho we could skip the second ' here.
I'm not sure how important the aspect of brevity is, but 33% would be a significant reduction ;-)
It’s an artefact of the implementation, or should I say, it’s all I could get to work. Somebody who actually knows what they are doing might fare better, but I don’t think this is a burdensome limitation in practice. In fact, I like the model that these literals are more Int-like than Character-like. There is no escaping the need to have at least some knowledge of Unicode’s vagaries.
You can see one problem in your message - external colorising editors would get confused.
That's a small minority of characters, and imho the numerical value is better in this case - or something like Character.space (which could be shortened to .space in many situations).
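A sketch of the Character.space idea, assuming a plain user-side extension rather than anything that exists in the standard library:

extension Character {
    static let space: Character = " "
}

var padded = "value:"
padded.append(.space)   // leading-dot shorthand; same as padded.append(Character.space)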
But afaics, ' wouldn't be problematic either, given that a lonely single quote would always be an error.
Would this actually be allowed?
If that's the case, it would be a really obvious argument for keeping the closing delimiter... but I thought that the length of the literal always has to be one, and in this case, there's no need for a second way to signal its end.
... if the compile-time code execution story finally makes some progress, this wouldn't even be expensive ;-)
But why should we not use '? If there is no other idea for it, there's little merit in not utilizing it.