If you look at the code, this is prevented by the use of UInt8(ascii:) under the covers, which has a precondition that the scalar value is < 128, so this would be a compilation error.
How would that work without a pattern matching operator accepting the double quoted strings and comparing the UInt8(ascii:) value (unless you're suggesting switching on strings???)
public static func ~= (s: Unicode.Scalar, i: Self) -> Bool {
    return i == UInt8(ascii: s)
}
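For context, the operator only makes sense as a member of some type; here is a self-contained sketch of how it might be declared and used (placing it in an extension on UInt8 is my assumption):

```swift
// Sketch: pattern-matching a UInt8 against Unicode.Scalar literals.
extension UInt8 {
    public static func ~= (s: Unicode.Scalar, i: Self) -> Bool {
        // UInt8(ascii:) has a precondition that the scalar is < 128,
        // so a non-ASCII pattern would trap at runtime.
        return i == UInt8(ascii: s)
    }
}

let byte: UInt8 = 0x59 // "Y"
switch byte {
case "Y": print("matched Y") // the literal is inferred as Unicode.Scalar
default: print("no match")
}
```

The string literal in the case position resolves to Unicode.Scalar because no `~=` overload exists that would let it be a UInt8.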
Edit: OK, I can see now, you're switching on UnicodeScalars. This is a good option. Let's implement it!
I suppose I failed to provide an implementation that would make clear what I meant:
extension UInt8 {
    var ascii: UnicodeScalar? {
        guard self < 128 else { return nil }
        return UnicodeScalar(self)
    }
}

let value = UInt8(ascii: "Y")
switch value.ascii {
case "Y"?: print("Found you!")
default: print("Not found")
}
Edit: this was written too hastily and is now fixed. I didn't mean it to trap with a precondition, but to return nil
when the integer is out of ASCII range. But I suppose this is debatable.
Aren't preconditions runtime errors?
To be honest I don't know. I just checked and disappointingly it's a runtime error. My bad there.
The proposed implementation would trap (at runtime and possibly depending on the order of execution).
I'm going to let the thread run for a while now and stop shooting from the hip. I appreciate the replies that are coming through now (for and against)! I'd really like to put this open question to bed either way after 6 years! @michelf's revised suggestion of a nillable property on UInt8 is a very good one.
This sounds like exactly the kind of use case which motivated Swift's current division between integers and codepoints.
A type that is initializable from UInt8 cannot use the type system to statically enforce that it only holds characters in the range [0, 127]. Wherever you use such a type you invite either a runtime crash, locale-dependent behavior, or disparity with C code running on the same platform. If you go with either the second or third option, you now have a type that can compare against partial UTF-8 codepoint sequences. And if you're using such a type to parse a mixed stream of binary and human-readable UTF-8 text, such a type invites you to write bugs that accidentally match the wrong bytes.
How would that happen pray tell with the operators I've proposed? I am no longer proposing integer conversions to any type but limited type safe comparisons which is another way to solve the problem.
You picked the first option.
There is no UInt7 type in Swift and it is unlikely there ever will be. If we had given ASCII literals a unique syntax using, say, a single-quoted literal type, a compile-time check might have been possible, but I don't have the energy to fight that battle any more. UInt8(ascii: "é") is a runtime error in Swift unfortunately. Why should these new affordances be held to a different standard?
There may be other reasons but I can't help wondering if your trenchant opposition to new affordances for ASCII in Swift isn't founded in concerns that they would somehow disfavour support for EBCDIC. That needn't be the case. A starting point would be IBM contributing an implementation of something like UInt8(ebcdic: "A"). Let us ASCII folks move forward into the 1970s.
Because you are proposing to make comparisons between integers and codepoints 0–127 more convenient by hiding the word "ascii" from the developer. There are very reasonable arguments that it should not be easy to do this, because it contributes to the ignorance of English-speaking programmers that other languages exist. That ignorance can manifest as an unexpected program crash when a user types their name ("José") or it can be much worse, leading to locale-dependent or even undefined behavior.
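To make the failure mode concrete: "é" is U+00E9 (value 233), outside the ASCII range, so UInt8(ascii: "é") traps with a precondition failure at runtime. A nil-returning conversion avoids the crash (the helper name below is hypothetical, not a standard library API):

```swift
// "é" is U+00E9 (value 233): not ASCII, so UInt8(ascii: "é")
// would trap with a precondition failure at runtime.
let e: Unicode.Scalar = "é"
print(e.value, e.isASCII) // 233 false

// A nil-returning conversion instead of a trapping one:
func asciiByte(_ s: Unicode.Scalar) -> UInt8? {
    s.isASCII ? UInt8(s.value) : nil
}
print(asciiByte("Y") as Any) // Optional(89)
print(asciiByte("é") as Any) // nil
```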
I'd welcome Int7, Int15 and other "Int 2^n - 1" types in Swift: in some contexts the half range is quite enough for the task at hand and memory usage could be reduced dramatically: e.g. an array of Optional<Int63> would take half the space of an array of Optional<Int64> (another option for a similar space reduction is to sacrifice the most negative number to represent nil).
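The size claim is easy to check: Int64 uses all of its bit patterns, so Optional<Int64> needs an extra tag byte, and 8-byte alignment rounds the stride up to 16. (Int63 itself is hypothetical; the point is that a spare bit would let the optional fit in 8 bytes.)

```swift
// Int64 has no spare bit patterns, so the optional's nil tag
// needs an extra byte, and 8-byte alignment doubles the stride.
print(MemoryLayout<Int64>.stride)            // 8
print(MemoryLayout<Optional<Int64>>.size)    // 9
print(MemoryLayout<Optional<Int64>>.stride)  // 16
```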
That seems a little overstated. If someone codes UInt8(ascii: "é") in an infrequently travelled code path it will result in a crash, but undefined behaviour, seriously? The operators would not bring this crash about due to an atypical input stream. People who are coding with buffers of bytes need to know what they are doing, and the proposed affordance makes their lives slightly easier and their code slightly less of a mess. There is no escaping that text issues require education and a least common denominator is required; I don't have a problem with ASCII being implied for low-level code in the same way that Unicode is implied in Swift's String model. We'll agree to disagree on this point at the end of the day.
i did a quick search for case 0x in some of my current projects, and found what i think is a prototypical use case for operating on ASCII bytes:
func remainder(hex:UInt8) -> UInt8?
{
    switch hex
    {
    case 0x30 ... 0x39: hex - 0x30
    case 0x61 ... 0x66: hex + 10 - 0x61
    case 0x41 ... 0x46: hex + 10 - 0x41
    default: nil
    }
}
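For reference, the values line up (the function is repeated under a different name so the check is self-contained; switch expressions need Swift 5.9 or later):

```swift
// Same hex-digit decoding as above, with a few checks.
func hexRemainder(_ hex: UInt8) -> UInt8? {
    switch hex {
    case 0x30 ... 0x39: hex - 0x30
    case 0x61 ... 0x66: hex + 10 - 0x61
    case 0x41 ... 0x46: hex + 10 - 0x41
    default: nil
    }
}

assert(hexRemainder(0x30) == 0)   // "0"
assert(hexRemainder(0x39) == 9)   // "9"
assert(hexRemainder(0x66) == 15)  // "f"
assert(hexRemainder(0x46) == 15)  // "F"
assert(hexRemainder(0x2F) == nil) // "/" is not a hex digit
```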
i think it is not in dispute that this spelling is just awful. i think that @michelf's suggestion is helpful, but insufficient, because it doesn't solve the "distance to" problem:
func remainder2(hex:UInt8) -> UInt8?
{
    switch Unicode.Scalar.init(hex)
    {
    case "0" ... "9": hex - 0x30
    case "a" ... "f": hex + 10 - 0x61
    case "A" ... "F": hex + 10 - 0x41
    default: nil
    }
}
i don't like that this is still subtracting from integer literals.
(by the way, both of these compile to the exact same machine code on Swift 5.10. how far we have come from the 4.x days!)
in my mind there are two simple additions we can make to Michel's idea that would get this snippet to something satisfactory:
- add a distance(to:) method to Unicode.Scalar
- make Unicode.Scalar expressible by a single-quoted literal
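A sketch of what the first addition might look like (the name and the Int return type are my assumptions, by analogy with Strideable; the single-quoted-literal part would need a language change, so the usage here sticks to double quotes):

```swift
// Hypothetical distance(to:) on Unicode.Scalar: the signed
// difference between the two scalar values.
extension Unicode.Scalar {
    func distance(to other: Unicode.Scalar) -> Int {
        Int(other.value) - Int(self.value)
    }
}

let zero: Unicode.Scalar = "0"
let seven: Unicode.Scalar = "7"
print(zero.distance(to: seven)) // 7
```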
then we could write
func remainder2(hex:UInt8) -> UInt8?
{
    let digit:Unicode.Scalar = .init(hex)
    switch digit
    {
    case "0" ... "9": return '0'.distance(to: digit)
    case "a" ... "f": return 'a'.distance(to: digit) + 10
    case "A" ... "F": return 'A'.distance(to: digit) + 10
    default: return nil
    }
}
Indeed. Neither of us has to convince each other; the only people who ever truly need to be convinced of anything are the Language Workgroup.
But it doesn't look that awful to me when expressed like this:
func remainder2(hex:UInt8) -> UInt8?
{
    switch Unicode.Scalar.init(hex)
    {
    case "0" ... "9": hex - UInt8(ascii: "0")
    case "a" ... "f": hex + 10 - UInt8(ascii: "a")
    case "A" ... "F": hex + 10 - UInt8(ascii: "A")
    default: nil
    }
}
Ok, sure, hex - UInt8(ascii: "0") is a bit verbose, but is it really worse than a hypothetical '0'.distance(to: digit)?
Looking at this from a different angle: how about these new constants were simply Int? 'Abcd' or perhaps 0_Abcd (to follow the 0x / 0o / 0b tradition) would be indistinguishable from 0x41626364. The character set could be restricted to ASCII only.
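The claimed equivalence checks out: packing the four ASCII bytes of "Abcd" big-endian reproduces 0x41626364.

```swift
// Pack the ASCII bytes of "Abcd" big-endian into one Int.
let packed = "Abcd".unicodeScalars.reduce(0) { acc, scalar in
    precondition(scalar.isASCII, "restricted to ASCII")
    return (acc << 8) | Int(scalar.value)
}
print(String(packed, radix: 16)) // 41626364
```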
I would dispute that. For my taste, this spelling is better than any spelling in terms of character/string literals because it makes explicit that you are depending on properties of the encoding (that these runs of characters have specific values and are laid out consecutively).
but there are a ton of places i can find where i have a hex literal that represents some ASCII character, and while there might be some value in making the reliance on the numeric encoding explicit, i just don't think that's worth having to fire up python3 and run hex(ord('=')) every time i want to do some operation on an ASCII character.
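For what it's worth, that round-trip doesn't strictly need python3; the same lookup is a one-liner in a Swift REPL or playground:

```swift
// 0x3d, the ASCII code of "=", without leaving Swift:
let eq = UInt8(ascii: "=")
print(String(eq, radix: 16)) // 3d
```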
For single-character constants you could use something like this:
enum AsciiChar: UInt8, Hashable, Comparable {
    case d0 = 0x30
    case d9 = 0x39
    case a = 0x61
    case f = 0x66
    case A = 0x41
    case F = 0x46

    static func < (lhs: Self, rhs: Self) -> Bool {
        lhs.rawValue < rhs.rawValue
    }

    static func - (lhs: UInt8, rhs: Self) -> UInt8 {
        lhs - rhs.rawValue
    }
}

func remainder(hex: UInt8) -> UInt8? {
    // Note: switching over AsciiChar(rawValue: hex)! would trap for any
    // byte that isn't one of the six named cases (e.g. 0x35, "5"), so
    // the ranges are matched against the raw bytes instead.
    switch hex {
    case AsciiChar.d0.rawValue ... AsciiChar.d9.rawValue: hex - .d0
    case AsciiChar.a.rawValue ... AsciiChar.f.rawValue: hex + 10 - .a
    case AsciiChar.A.rawValue ... AsciiChar.F.rawValue: hex + 10 - .A
    default: nil
    }
}