But there might be existing code that's already doing that.
Also, what about this:
UInt8("10")
Why would that be allowed, but
UInt8("9")
not? Very, very confusing.
Edit: @michelf was quicker
It doesn't look like UInt8 conforms to ExpressibleByStringLiteral, which is what I understood SE-0213 to mean.
We're talking conformance to ExpressibleByUnicodeScalarLiteral, which going forward will still be expressible using double quotation marks.
To be explicit, consider:
let x = UInt8("8")
let y = "8" as UInt8
If UInt8 conforms to ExpressibleByUnicodeScalarLiteral, x should be equal to y according to SE-0213.
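To make the divergence concrete, here is a minimal sketch, assuming UInt8 were given the conformance (the second line does not compile today):

let parsed = UInt8("8")     // today: the failable parsing initializer, Optional(8)
let coerced = "8" as UInt8  // with the conformance: a literal coercion, 56 (the ASCII value of '8')
// Under SE-0213, UInt8("8") would resolve as the literal coercion as well,
// silently changing its meaning from Optional(8) to 56.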
In the long run the plan is that it won't, and character literals will be separated from string literals, apart from the unavoidable let a: String = "a". This problem will go away compiling Swift 6, and both Int8("8") and Int("88") will give an error. (String literals will no longer look for ExpressibleByUnicodeScalarLiteral inside the type checker; this change would be internal to the compiler.)
Neither the removal of the double quotes nor the removal of these actually useful initializers is part of this proposal, so while it may be somebody's long-term plan it is not the plan of record for Swift.
Why? What's wrong with them?
Looks like I need to make a correction: Int8("8") and Int("88") will both work after Swift 6 as they did before this proposal, and they will no longer short-circuit to the ExpressibleByUnicodeScalarLiteral initialiser for single-digit literals. Apologies for the mistake.
no, no, no! Please, no one ever do this. Converting bytestrings to String can change the indexing of the bytestring characters, since some of them could be UTF-8 continuation bytes or form grapheme clusters. (These would not be valid ASCII scalars, but it's not like CChar knows that.) At the very minimum, use init(decoding:as:) with the Unicode.ASCII codec. It's disheartening that so many people here who love to trumpet the importance of text encodings pay no heed to them when it actually matters.
However, as you've already discovered, String is exactly the wrong tool for this job, since it does not vend random access subscripts.
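For reference, a minimal sketch of the decoding being suggested, using only standard library API (the byte values are just an example):

let bytes: [UInt8] = [72, 105, 33]                         // the ASCII bytes for "Hi!"
let text = String(decoding: bytes, as: Unicode.ASCII.self) // decode explicitly as ASCII
// text == "Hi!"; bytes outside the ASCII range are not valid for this codec,
// rather than being silently reinterpreted as something else.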
However, according to SE-0213, if an end user adds the conformance, then the compiler should be short circuiting them and might already do so. This would be a very confusing behavior.
Bottom line is that, with the guarantees of backward compatibility and what's already in the standard library, I don't think that integer types can be made to conform to ExpressibleByUnicodeScalarLiteral without producing some really confusing inconsistent emergent behaviors in the standard library.
Since the whole point here is to help users convert ASCII characters to corresponding integer values, adding some convenience at the cost of adding these footguns does not seem wise.
I agree it's unfortunate, which is why the implementation marks it as an error. If I'm honest, I don't know why the compiler takes this route. There is an IntX.init(_ string: String) initialiser in the stdlib, but it chooses instead to process "88" as an integer literal given half the chance.
It does so because we agreed that it should do so in SE-0213: this is precisely the consequence of proposing to conform an integer to ExpressibleByUnicodeScalarLiteral under those rules, and (besides '1' / '1' and other examples above) another demonstration of why this conformance is unwise.
What this proposal is attempting to state by the conformance is that, semantically, "8" is an accepted literal expression for 56 in all circumstances. The trouble is that, excepting the specific circumstance where you want the ASCII encoding of the character 8, it is not. What the standard library has already included are methods where "8" in fact converts to 8, not 56. By making those conversion initializers unadorned by a label, it has already staked out a claim that in fact that is the usually desired conversion.
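Concretely, in today's standard library (a short illustration, not something this proposal changes):

UInt8("8")          // Optional(8): the unlabeled initializer parses the text as a number
UInt8(ascii: "8")   // 56: the labeled initializer asks for the ASCII encoding of the character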
Please read the documentation before making wild assumptions like this one. @gwendal.roue stated quite clearly that these are SQLite dates, which cannot contain any non-ASCII characters: Date And Time Functions
Even if they could, chances are that String indexing would probably be the correct choice: 201🥚-01-01 could be correctly parsed as January first, twothousandeggteen, while [CChar] indexing could not recognize it properly.
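For example, splitting on the separator with String API works no matter how many bytes each character occupies (a generic sketch, not the actual SQLite-parsing code under discussion):

let date = "2019-01-01"
let parts = date.split(separator: "-")   // ["2019", "01", "01"]
let year = Int(parts[0])                 // Optional(2019)
// Fixed byte offsets into the UTF-8 buffer would shift if any character were multi-byte,
// but Character-level operations like split(separator:) are unaffected.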
As I've mentioned earlier, this is very dangerous thinking. If that logic applies, I might as well just convert all of my Strings to Arrays now whenever I need some kind of character lookup. This is really just a missing convenience function that should exist as an extension of Collection, as sketched below.
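A hedged sketch of the kind of convenience that may be meant here (the subscript label is hypothetical, not an existing standard library API):

extension Collection {
    // Hypothetical offset-based lookup; O(n) unless the collection is RandomAccessCollection.
    subscript(offset offset: Int) -> Element {
        self[index(startIndex, offsetBy: offset)]
    }
}

let date = "2019-01-01"
date[offset: 5]   // "0", the first digit of the month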
Discussions about this are plentiful and they tend to run forever. If it interests you, please take it to one of these threads instead. It is off topic here.
If I might suggest a slight amendment that would address some of the concerns raised here:
A good analogy to the issue of text-to-integer encodings is the issue of numeric-literal-to-integer encodings. Meaning, if you write down the digits 1000, we have no idea if it is decimal, octal, binary, or hexadecimal. In this case, the literal is 1000, and the codec is the numeric base. In Swift, we use the 0x, 0o, and 0b prefixes to distinguish between all the different integer encodings.
let n1: Int = 1000   // decimal literal, n1 == 1000 {base ten}
let n2: Int = 0b1000 // binary literal,  n2 == 8    {base ten}
let n3: Int = 0o1000 // octal literal,   n3 == 512  {base ten}
let n4: Int = 0x1000 // hex literal,     n4 == 4096 {base ten}
Following this precedent, we should reserve 'a' exclusively for Character, and assign the prefixes u'a' and a'a' for Unicode.Scalar and ASCII literals. A natural extension to EBCDIC would then be e'a', for example.
let c0: String = "a"          // string literal,    c0 == "a"
let c1: Character = 'a'       // character literal, c1 == 'a'
let c2: Unicode.Scalar = u'a' // codepoint literal, c2 == U'a'
let c3: UInt8 = a'a'          // ASCII literal,     c3 == 97
let c4: UInt8 = e'a'          // EBCDIC literal,    c4 == 129
I'm aware prefixed string literals were rejected a while back when we were talking about raw string syntax, but I think there are good reasons to adopt it for single-quoted "single-element" text literals:
Swift already prints Unicode.Scalar values with the prefix U', and it would be nice if the literal syntax aligned with the debug description syntax.
We would have an unambiguous way to write both Character and Unicode.Scalar values now, as opposed to just Character literals (proposal as written), or neither (status quo).
It's easily extensible to accommodate alternative character-to-integer mappings like EBCDIC.
It's concise and clear to read, especially with appropriate syntax highlighting, so we'd get
let fontFeature = (a'k', a'e', a'r', a'n')
instead of
let fontFeature = (UInt8(ascii: "k"), UInt8(ascii: "e"), UInt8(ascii: "r"), UInt8(ascii: "n")).
There are some totally valid concerns here, but I want to address some misunderstandings about Swift and Unicode.
Swift firmly establishes that String and related types are Unicode and not some other standard.
Unicode is a "Universal Character Encoding". It is a mapping between "characters" (cough, not in the grapheme cluster sense) and numbers starting from 0. This assignment to a specific number is the crux of the standard. The elements of this universal encoding we call Unicode are "code points" (cough, or Unicode scalar values for only the valid ones).
A "Unicode Encoding" is an encoding of the universal encoding (or a subset of it), which may have alternate numbers (e.g. EBCDIC) or more complex details (e.g. uses a smaller-width representation). The elements of such an encoding are called "code units".
Nothing in this proposal is attempting to bake in anything about particular "code units" from some particular Unicode encoding; rather, it is addressing Unicode's own element type, the "code point".
(This is all pretty confusing.)
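A small illustration of the distinction, using only standard library API:

let scalar: Unicode.Scalar = "é"  // a code point, U+00E9
scalar.value                      // 233: the number Unicode assigns to that code point
Array("é".utf8)                   // [195, 169]: the UTF-8 code units that encode it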
I'm not presenting an argument that Swift syntax should operate under some implicit conversion between a syntactic construct such as a literal and a number. That's the purpose of this review thread and I understand that people can disagree for valid reasons. I'm just trying to dispel some of the FUD around mixing up Unicode and some particular Unicode encoding.
Code points are integer values. The code point "G" is inherently 71 in Unicode. The idea that the digit 8 is the same thing as the number 56 is the point of character encodings.
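Checking those values against today's standard library (just an illustration of the claim above):

let g: Unicode.Scalar = "G"
g.value             // 71
UInt8(ascii: "8")   // 56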
Swift supports Unicode, but does not mandate a particular Unicode encoding. EBCDIC is not a subset of Unicode as decoding it to Unicode involves remapping some values.
(Again, this is not an argument that Swift's particular syntactic construct for code points should produce Swift's number types, just that these kinds of encoding-related concerns are not relevant.)
This proposal does not change that.
Swift forces Unicode on us, and then Unicode forces ASCII on us. Unicode by explicit design is a superset of ASCII. From the standard:
While taking the ASCII character set as its starting point, the Unicode Standard goes far beyond ASCII's limited ability to encode only the upper- and lowercase letters A through Z. It provides the capacity to encode all characters used for the written languages of the world: more than 1 million characters can be encoded.
UTF-8 is a red herring here. We're talking about Unicode itself, i.e. code points, not code units. If 0x61 were to map to "x", that wouldn't be Unicode, and if it isn't Unicode, it isn't Swift.
Any encoding that's not literally compatible with ASCII (i.e. without decoding) is not literally compatible with Unicode. Such encodings might be "Unicode encodings" (an encoding of an encoding), meaning that they need to go through a process of decoding in order to be literally equivalent to ASCII/Unicode.
Could you elaborate? For many tasks, pattern matching over String.utf8 is exactly what you should be doing.
I do agree that arithmetic feels odd and out of place for these literals. I feel like most of the utility comes from equality comparisons and pattern matching.
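For context, a small sketch of the kind of utf8 pattern matching this refers to, in today's spelling (without the proposal):

let line = "HTTP/1.1 200 OK"
var digits = 0
for byte in line.utf8 {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"):  // range matching on bytes, no arithmetic needed
        digits += 1
    default:
        break
    }
}
// digits == 5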
@taylorswift @johnno1962, did you explore the impact of overloads for ~= and ==? I don't know if this would cause more type checking issues in practice. (cc @xedin)
Alternatively, are there any other options for excluding these from operators? I don't recall exactly how availability works with overload resolution (@xedin?), but would it be possible to have some kind of unavailable/obsoleted/prefer-me-but-don't-compile-me overloads for arithmetic operators that take ExpressibleByUnicodeScalarLiteral?
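Purely as a sketch of what is being asked about (not a vetted design; whether overload resolution would actually prefer such an overload is exactly the open question):

extension FixedWidthInteger where Self: ExpressibleByUnicodeScalarLiteral {
    // Hypothetical poison-pill overload: if the type checker picked it for
    // expressions like 'a' + 'b', the unavailability would surface as an error.
    @available(*, unavailable, message: "arithmetic on character literals is probably a mistake")
    static func + (lhs: Self, rhs: Self) -> Self { fatalError() }
}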
I hear you. I've already conceded in a previous post that integer conversions may be sufficiently unacceptable to some that they may not pass, and then there are practical considerations such as this glitch. It's good we're thrashing this out. I'm still waiting for more people to chime in on whether single quotes for character literals, as a use for single quotes in general, would be a worthwhile ergonomic improvement to Swift in itself.
It absolutely is off topic here; that's why @taylorswift shouldn't have brought it up as an argument.
Or we just change the way Unicode.Scalars are printed.
My comment was directed at no one in particular, but rather to potential future posters in general. I saw no problem with what had been posted so far, which was a reasonable exploration of the area where the subjects intersect. I just knew it had potential to grow out of hand very quickly, and I did not want the review derailed.
I accept this argument and retract my previous argument about alternative encodings.
I would support an ExpressibleByUnicodeScalarLiteral improvement with new _ExpressibleByBuiltinUnicodeScalarLiteral conformances:
extension UTF8 {
    // Get rid of the old typealias to UInt8. Leave UInt8 alone!
    struct CodeUnit: _ExpressibleByBuiltinUnicodeScalarLiteral, ExpressibleByUnicodeScalarLiteral {
        // 8-bit only, compiler-enforced. Custom types can also use UTF8.CodeUnit as their UnicodeScalarLiteralType:
        typealias UnicodeScalarLiteralType = CodeUnit
        var value: UInt8
    }
}
This would use the well-known double quotes. It would add compiler-enforced 8- and 16-bit code unit types.
It would not pollute Integer APIs at all.
The only problem of course is that changing the Element of String.UTF8View etc. would be a breaking change (it is currently UTF8.CodeUnit). Maybe there needs to be a String.betterUTF8View (or whatever other name) and the old utf8View etc. would just be deprecated.
Everyone that wants to mess around with code units can then use types like [UTF8.CodeUnit] instead of [UInt8].
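A hypothetical usage sketch, assuming such a CodeUnit type existed with the conformance sketched above (none of these spellings are real API today):

let newline: UTF8.CodeUnit = "\n"              // double-quoted scalar literal, checked to fit in 8 bits
let header: [UTF8.CodeUnit] = ["G", "E", "T"]  // byte-oriented code without touching UInt8 itself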