There is no semantic reason why `UInt8(ascii:)` requires the ICU runtime or could not be done at compile time.
Okay, so it is mostly a limitation of the current implementation? Or something to do with the moving target unicode spec/ICU version/whatever? My main concern is that it won't make any sense from a user's perspective.
What's the ICU runtime?
Again, that's still very niche. If I ever saw someone classify between ASCII and Latin by a mysterious `let isAscii = i > 0`, I would instantly reject the CR. No questions asked.
That sort of feature is much better to just be wrapped in a method or boolean computed property.
It could be better used for regex literals, grammar generators, or made into some sort of facility for supporting user DSLs.
I think Swift has more important things to do for the comma, besides mimicking the look of a 50 year old language feature* for supporting a largely defunct 52 year old text encoding.
* Single quotes as chars date back at least as far as the B programming language, circa 1969.
i mean, it's only niche to some people. If you do any sort of work with any kind of binary format, you're gonna need these, and need them a lot
Yes, it's mostly a limitation of the current implementation, though it may be difficult to remove the limitation that single quoted literals be single code points. The key requirement is that you get an error when a character overflows the target type at compile time.
let d2: Int8 = '🙂'
k.swift:6:16: error: codepoint literal '128578' overflows when stored into 'Int8'
The easiest way to do this is to leverage the existing Integer literal code. This reuse imposes the limitation that a single quoted literal can only be a single code point but it does mean you can give a meaningful error message early on.
let invalid = '👨🏼‍🚀'
k.swift:2:15: error: character not expressible as a single codepoint
Perhaps this limitation can be relaxed in the longer term if it turns out to be a problem, but for now the prototype is probably adequate, along with Chris' draft, to move to a worthwhile review.
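The two diagnostics above are compile-time checks in the prototype. As a rough runtime analogue in today's Swift, the same two conditions can be sketched as follows. This is only an illustration: `codepointValue(of:as:)` is a hypothetical helper, not part of the prototype or the standard library.

```swift
// Hypothetical helper: mirrors, at runtime, the two checks the prototype
// is described as making at compile time.
func codepointValue<T: FixedWidthInteger>(of character: Character, as type: T.Type) -> T? {
    let scalars = character.unicodeScalars
    // Check 1: the character must consist of a single code point.
    guard scalars.count == 1, let scalar = scalars.first else { return nil }
    // Check 2: the code point must fit in the target integer type.
    return T(exactly: scalar.value)
}

let a = codepointValue(of: "a", as: UInt8.self)            // 97
let smiley8 = codepointValue(of: "🙂", as: Int8.self)       // nil: 128578 overflows Int8
let smiley32 = codepointValue(of: "🙂", as: UInt32.self)    // 128578
let flag = codepointValue(of: "🇺🇸", as: UInt32.self)        // nil: two code points
```

The difference with the prototype is that a `nil` here only shows up at runtime, whereas the pitch wants the mistake rejected before the program ever runs.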
that proposal (which is mine btw though chris contributed a lot) has an important difference and that's that the inferred type of `'a'` is `Character` and you do `'a' as Int` or `let scalar:Int = 'a'`. So we have a `CodepointLiteral` that defaults to `Character`, and we conform `Int` and friends to `ExpressibleByCodepointLiteral`.
Your idea if i get it right is that the inferred type of `'a'` is `Int` and you do `'a' as Character` or `let character:Character = 'a'` via `ExpressibleByIntegerLiteral`.
Personally i think the inferred type being `Int` makes more sense, but we also do miss out on an opportunity to provide a syntax for `Character` literals that don't need an `as` annotation, which we currently don't have. Being able to write
let character:Character = 'a'
isn't much of a win over the current
let character:Character = "a"
The drawback is we risk confusing people (even more) about Swift's string model since not every valid `Character` can be written as a `CodepointLiteral`, but every valid `Character` can be written as a `StringLiteral`.
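For reference, this is how the existing double-quoted literals behave in current Swift. It is a small sketch using only standard-library types; the variable names are illustrative.

```swift
let inferred = "a"                // no annotation: inferred as String
let character: Character = "a"    // same literal, annotated as Character
let scalar: Unicode.Scalar = "a"  // same literal, annotated as Unicode.Scalar
let viaAs = "a" as Character      // `as` coercion instead of an annotation

// A flag is one Character made of two Unicode scalars, so it has no
// single-code-point spelling.
let flag: Character = "🇺🇸"
print(type(of: inferred), scalar.value, flag.unicodeScalars.count)
// prints "String 97 2"
```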
Sorry @taylorswift, I should have mentioned this was your proposal & I'm not trying to propose anything different but I can provide you with an implementation to meet the review bar if it helps as I want to see this pitch succeed. Initially I tried to get away with a model where these literals were of type `Int` but made a second pass and they are now created using a protocol `ExpressibleByCodepointLiteral` which all int types conform to along with `Unicode.Scalar` and `Character`, so one of these literals can have any of those types.
The implementation imposes a limitation that these "codepoint" literals be only a single codepoint as they are processed internally as integer literals. The default type is very easy to configure and should probably remain `Character` to future-proof the model for a time when perhaps this restriction can be relaxed. This will confuse some people but there is a clear error message and we are still far better off than we are with `UInt8(ascii:)`, supporting the majority of common 20 bit characters.
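For comparison, this is roughly what the existing spelling looks like today. `UInt8(ascii:)` is a real standard-library initializer; the particular byte values are just illustrative.

```swift
// UInt8(ascii:) takes a Unicode.Scalar and traps at runtime if it is not ASCII.
let lowercaseA = UInt8(ascii: "a")   // 97
let newline = UInt8(ascii: "\n")     // 10

// For a run of ASCII bytes, the UTF-8 view of a string is a common alternative.
let magic = Array("PNG".utf8)        // [80, 78, 71]
```

Neither form covers code points above U+007F, which is the gap the single-quoted literals are meant to fill.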
the more i think about it the more I think `'a'` should default to `Unicode.Scalar` because we can think of "charactery" constructs in Swift as lying on a spectrum from raw to cooked
raw                           cooked
UInt32 → Unicode.Scalar → Character
so we have three choices for what `'a'` should default to.
- `UInt32` is a perfectly sensible default type. Note that I really don't think `Int` or `UInt` is a good idea, since this should never work: `"\u{200000}"`. I'm fine with the default type having 11 more bits than it should be allowed to have since the purpose is just to remind people that these are not just "integer literals written with letters". And I assume most people would be using these with explicit type annotations anyway.
  Just to be clear, we should be able to coerce `'a'` to a 64-bit integer type, we just shouldn't allow this method to set any bits higher than position 21.
- `Unicode.Scalar` is also a perfectly sensible default type, since, well, that's what these `'a'` literals are. We also kill two birds with one stone by underloading `UnicodeScalarLiteral` from double quotes, so that `Unicode.Scalar`s actually have their own literal syntax.
- `Character` as a default type would have similar benefits to `Unicode.Scalar` in that we would get a way to write a `CharacterLiteral` without needing an explicit type annotation. However it's not a good candidate because not every possible `Character` can be written as a codepoint literal. I don't think `'🇺🇸'` should ever work. This means we don't get to underload `CharacterLiteral` from double quotes, since we still need a way to express `Character`s like `"🇺🇸"`.
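The bit counts in the first option can be checked against today's `Unicode.Scalar`, whose failable `UInt32` initializer already rejects out-of-range values. A minimal sketch (the constant names are illustrative):

```swift
let a: Unicode.Scalar = "a"
let value: UInt32 = a.value                         // 97

// The failable initializer rejects anything that is not a Unicode scalar value.
let highest = Unicode.Scalar(0x10FFFF as UInt32)    // non-nil: the largest scalar
let tooBig = Unicode.Scalar(0x200000 as UInt32)     // nil: beyond the Unicode range
let surrogate = Unicode.Scalar(0xD800 as UInt32)    // nil: surrogate code point

// 0x10FFFF needs 21 bits, leaving 11 unused bits in a UInt32.
let spareBits = (0x10FFFF as UInt32).leadingZeroBitCount   // 11
```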
You can argue that maybe `'🇺🇸'` should work, and we should just apply the single-codepoint restriction to explicitly typed integers or `Unicode.Scalar`s, but then single-quoted literals kind of just become double quoted literals that work for integers, but don't work for `String`s. I don't think this overlap helps us and I think this would only lead to confusion for the `Unicode.Scalar` and `Character` types, since we now have two ways of expressing these literals. It would be as if `let n:Int = 1.0` became a thing.
This also means we can't call single-quoted literals "`CodepointLiteral`"s since, well, they're not codepoints anymore.
I think most of us agree that it should default to the most cooked representation we can reasonably do, which is why `Character` as a default was so popular in this thread. But I think there's a good reason to pull back one level and set the default at `Unicode.Scalar`. It also takes the problem from adding a whole new literal type to the language to just modifying the existing `Unicode.Scalar` literal type to have different syntax.
I'm in the niche you describe, I have an app that reads a binary format. I convert a `String` to `Data` using the ASCII encoding. Serves this purpose perfectly.
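A minimal sketch of that approach, assuming Foundation is available. The GIF header is a hypothetical example format chosen for illustration, not the poster's actual format.

```swift
import Foundation

// Hypothetical example: check a file's magic bytes against an ASCII tag.
let fileStart = Data([0x47, 0x49, 0x46, 0x38, 0x39, 0x61])   // first six bytes of a GIF89a file
let expected = "GIF89a".data(using: .ascii)                   // Data? (nil if not encodable)
let isGIF = (expected == fileStart)
```

The conversion happens at runtime and yields an optional, which is the trade-off the rest of the thread is weighing against a compile-time checked literal.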
In Swift, a string is now a collection of characters, as it should be in a modern language that has full, first-class support for Unicode. The currency "unit" of a string is firmly the character (or extended grapheme cluster), and I think it would be quite unjustifiable if `'🇺🇸'` didn't work.
Are you saying to draw the line between `'` and `"` between `String` and `Character` instead of between `Character` and `Unicode.Scalar`? I'm not opposed to it but it would mean messing with the existing syntax for both `Character` and `Unicode.Scalar` (getting rid of `"` for those two) instead of just `Unicode.Scalar`. So a little more source breaking.
Sure, no problem if that's how you want to approach it. It does impact the naming here though. `ExpressibleByCodepointLiteral` isn't an accurate name if the limitation is expected to be relaxed. And in a formal review I would argue against it not being able to represent all `Character`s because I don't think it makes sense from a user's perspective.
I don't agree with this at all. I think `'🇺🇸'` should definitely work, and single quotes should be the preferred way to write all `Character` literals (I'm ambivalent about whether the double-quoted versions should be eventually deprecated). I don't find this `let n: Int = 1.0` argument convincing. It's more like if `let d = 1.0` and `let d: Double = 1` both worked to write a `Double`, and the difference was just the default type inferred for the different literal forms. And hey, that's exactly how it does work.
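The numeric analogy can be checked directly in current Swift; the names below are illustrative.

```swift
let d1 = 1.0            // float literal, default type Double
let d2: Double = 1      // integer literal, contextual type Double
let n = 1               // integer literal, default type Int
// let m: Int = 1.0     // does not compile: Int is not ExpressibleByFloatLiteral

print(type(of: d1), type(of: d2), type(of: n))
// prints "Double Double Int"
```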
I don't see why anything has to be source breaking. The conformance to 'ExpressibleByStringLiteral' could be deprecated (if not strictly by annotating as deprecated, then by a custom warning built into the compiler) instead of removed.
I think there's a better way to do this:
Swift has several existing protocols `ExpressibleByUnicodeScalarLiteral`, `ExpressibleByExtendedGraphemeClusterLiteral`, `ExpressibleByStringLiteral` that all cohabit the double-quote literal space and look like this:
protocol ExpressibleByUnicodeScalarLiteral
{
init(unicodeScalarLiteral:{Unicode.Scalar, Character, String, StaticString})
}
where the `unicodeScalarLiteral:` argument in the requirement is a user chosen `associatedtype` chosen from `Unicode.Scalar`, `Character`, and `String`. The compiler checks if the double quoted literal can be narrowed into a `Unicode.Scalar`, and then converts it to whatever type you like for the initializer. This conversion always succeeds because you can always upcast a `Unicode.Scalar` to a `Character` or a `String`, so the compiler (or standard library) does it for you.
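As a small illustration of the "user chosen" literal type, here are two hypothetical types (`ScalarValue` and `ScalarText` are made-up names) that adopt the same existing protocol but ask for the literal in different forms:

```swift
// Two hypothetical types adopting the same protocol with different literal types.
struct ScalarValue: ExpressibleByUnicodeScalarLiteral {
    let value: UInt32
    // Receives the literal as a Unicode.Scalar.
    init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        value = scalar.value
    }
}

struct ScalarText: ExpressibleByUnicodeScalarLiteral {
    let text: String
    // Receives the same kind of literal, already widened to a String.
    init(unicodeScalarLiteral string: String) {
        text = string
    }
}

let v: ScalarValue = "a"   // v.value == 97
let t: ScalarText = "a"    // t.text == "a"
```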
In contrast the `ExpressibleByExtendedGraphemeClusterLiteral` protocol requirement doesn't let you write an initializer that takes a `Unicode.Scalar`, since the compiler in that case only checked if the literal could be narrowed to a `Character`, not all the way down to a `Unicode.Scalar`.
protocol ExpressibleByExtendedGraphemeClusterLiteral
{
init(extendedGraphemeClusterLiteral:{Character, String, StaticString})
}
These protocols filter out the double-quoted literals that don't match their requirement at compile time. That's why you get a nice error message when you try to do this:
struct S:ExpressibleByExtendedGraphemeClusterLiteral
{
    let value:UInt32
    init(extendedGraphemeClusterLiteral:String)
    {
        self.value = extendedGraphemeClusterLiteral.unicodeScalars.first!.value
    }
}
let s1:S = "a",
    s2:S = "aa"
literal.swift:11:12: error: cannot convert value of type 'String' to specified type 'S'
    s2:S = "aa"
           ^~~~
That sounds a lot like what we're trying to do, except more extreme. What we want to do is add two protocols `ExpressibleByUnicode16Literal` and `ExpressibleByUnicode8Literal` just like the ones above, except they don't just check if the double quoted literal can be downcast into a `Character` or a 21-bit `Unicode.Scalar`, they check if it can be downcast all the way to a 16 or 8 bit integer.
There are a lot of benefits.
- We get compile time checking of whether a double-quoted literal overflows a {8, 16, 21}-bit integer, and the compiler emits the same helpful error it already gives for double-quoted literals that can't be `Character`s or `Unicode.Scalar`s.
- To write
  let n:UInt8 = "a"
  all we have to do is conform `UInt8` to `ExpressibleByUnicode8Literal` and the compiler will do the overflow checking. We can make all the ints do the same:
  Int8   -> ExpressibleByUnicode8Literal
  UInt16 -> ExpressibleByUnicode16Literal
  Int16  -> ExpressibleByUnicode16Literal
  UInt32 -> ExpressibleByUnicodeScalarLiteral
  Int32  -> ExpressibleByUnicodeScalarLiteral
  UInt64 -> ExpressibleByUnicodeScalarLiteral
  Int64  -> ExpressibleByUnicodeScalarLiteral
  UInt   -> ExpressibleByUnicodeScalarLiteral
  Int    -> ExpressibleByUnicodeScalarLiteral
- This means you can't sneak something like this past the compiler:
  literal.swift:1:25: error: invalid unicode scalar
  let u:UInt = "\u{800000}"
  which is a good thing.
Instead of using single quote literals for integers, we can just add 8-bit and 16-bit literals as a logical extension of the `String -> Character -> Unicode.Scalar` train. We can actually do `Int32` and higher already:
extension Int32:ExpressibleByUnicodeScalarLiteral
{
    public
    init(unicodeScalarLiteral:Unicode.Scalar)
    {
        self = .init(bitPattern: unicodeScalarLiteral.value)
    }
}
let integer:Int32 = "a"
// 97
let hex:[Int32] = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
                   "a", "b", "c", "d", "e", "f"]
// [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102]
Maybe we don't need to use single quote literals after all.
Interesting approach, with the benefit of not requiring a new literal form. The benefits you list seem possible with either single- or double-quoted literals though. I think I would still prefer the single-quoted version because it cordons off a lot of the weird behaviour away from the double quoted literals, which are more commonly used. With your `Int32` prototype, for example, you can write:
let x = "8" * "9" // 3192 (in a few (degenerate) languages this would give 72 or "72" or something)
let y = "=" * 5 // 305 (in some languages this is used to repeat a string, giving "=====")
And probably many other weird edge cases I haven't thought of. Most of these things will still be possible with single-quoted literals, but they will be encountered less often.
Single quotes also provide a more convenient way to write a `Character`, which seems like a nice minor benefit to me for some string processing (e.g. defining a `Set` of `Character`s to later filter/split with).
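For context, this is what the `Set` of `Character`s use case looks like with today's double-quoted literals. It is a sketch using only standard-library calls; the sample strings are made up.

```swift
// Today the element type has to come from an annotation, since "," alone is a String.
let separators: Set<Character> = [",", ";", " "]

let fields = "a,b;c d"
    .split(whereSeparator: { separators.contains($0) })
    .map { String($0) }
// fields == ["a", "b", "c", "d"]

let digits: Set<Character> = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
let onlyDigits: String = "tel: 555-0100".filter { digits.contains($0) }
// onlyDigits == "5550100"
```

With a single-quoted literal that defaults to `Character`, the `Set<Character>` annotations would presumably become unnecessary.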
This is getting to the nub of the problem. Historically, the protocol `ExpressibleByStringLiteral` is a descendant of `ExpressibleByUnicodeScalarLiteral`, so we can't make Int types conform to `ExpressibleByUnicodeScalarLiteral` without giving undesirable behaviour to the `String` type. Therefore, we need a new protocol like `ExpressibleByCodepointLiteral` for Int types to conform to for things to work.
I'd go further and say if only Int types conform to `ExpressibleByCodepointLiteral` then it is only useful for these literals to represent integer code points - `Character`'s conformance to this protocol is a convenience and shouldn't drive its semantics. It's a bit of a break for Swift to concede that something exists in a string other than `Character`s but I don't see how to avoid it.
wait I don't understand, if `ExpressibleByStringLiteral` derives from `ExpressibleByUnicodeScalarLiteral` then we can conform `Int` to its superprotocol without affecting stuff like `String` which conforms to the subprotocols, right? What undesirable `String` behavior do you foresee?
I think we should separate the integer literal part of the problem from the single-quotes part of the problem. We can get integer literals just by extending the set of existing double-quoted literal protocols. Single-quoted literals are then a question of whether we want some of the types in the double-quoted literal space
(`UInt8` → `UInt16` → `Unicode.Scalar` → `Character` → `String`)
to have a different syntax. If we do that I think it should be a clean partitioning, and if something can be written with single-quotes, it shouldn't be allowed to be written with double-quoted literals.
The design I would propose is
ExpressibleByUnicode8Literal                // adopted by: UInt8, Int8
            ↓
ExpressibleByUnicode16Literal               // adopted by: UInt16, Int16
            ↓
ExpressibleByUnicodeScalarLiteral           // adopted by: UInt32, Int32
                                            //             UInt64, Int64
            ↓                               //             UInt, Int
ExpressibleByExtendedGraphemeClusterLiteral // adopted by: Character
            ↓
ExpressibleByStringLiteral                  // adopted by: String
// if we create a new single quoted literal type, we should make it the
// sole literal type for `Character` and below, and set `Character` to be
// its default inferred type.
typealias ExtendedGraphemeClusterType = Character
typealias UnicodeScalarType = Character
typealias Unicode16Type = Character
typealias Unicode8Type = Character
In your example:
extension Int32: ExpressibleByUnicodeScalarLiteral
{
    public init(unicodeScalarLiteral: Unicode.Scalar)
    {
        self = .init(bitPattern: unicodeScalarLiteral.value)
    }
}
let i: Int32 = "1" + "1"
print(UInt8(ascii: "1"), i)
// prints 49 98
This is what @jawbroken was trying to avoid.
I don't think this is avoidable. When you use the `+` operator you're basically signalling to swift that you want this:
let i:Int32 = ("1" as Int32) + ("1" as Int32)
If you agree that
let i1:Int32 = "1",
    i2:Int32 = "1"
let i:Int32 = i1 + i2
should give you 98, then the first one should too.
Keep in mind that this:
let string = "1" + "1"
// "11"
would still work as expected.
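The same behaviour can be tried without modifying a standard-library type. The sketch below uses a hypothetical wrapper, `CodeUnit32`, that adopts the existing `ExpressibleByUnicodeScalarLiteral` protocol; it is an illustration of the argument, not the proposed design.

```swift
// Hypothetical wrapper type, used so no standard-library type is modified.
struct CodeUnit32: ExpressibleByUnicodeScalarLiteral, Equatable {
    var value: Int32

    init(value: Int32) { self.value = value }

    // Called for literals such as "1" when the context asks for CodeUnit32.
    init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        value = Int32(bitPattern: scalar.value)
    }

    static func + (lhs: CodeUnit32, rhs: CodeUnit32) -> CodeUnit32 {
        CodeUnit32(value: lhs.value + rhs.value)
    }
}

let sum: CodeUnit32 = "1" + "1"   // annotated: both literals become CodeUnit32
let text = "1" + "1"              // unannotated: literals default to String

print(sum.value, text)
// prints "98 11"
```

The annotation is what selects the integer interpretation; without it, the literals fall back to their default type and concatenate as strings.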