Single Quoted Character Literals (Why yes, again)

If you look at the code, this is prevented by the use of UInt8(ascii:) under the covers, which has a precondition that the scalar value is < 128, so this would be a compilation error.

How would that work without a pattern matching operator accepting the double quoted strings and comparing the UInt8(ascii:) value (unless you're suggesting switching on strings???)

  public static func ~= (s: Unicode.Scalar, i: Self) -> Bool {
    return i == UInt8(ascii: s)
  }

Edit: OK, I can see now, you're switching on UnicodeScalars. This is a good option. Let's implement it!

I suppose I failed to provide an implementation that would make clear what I meant:

extension UInt8 {
	var ascii: UnicodeScalar? {
		guard self < 128 else { return nil }
		return UnicodeScalar(self)
	}
}

let value = UInt8(ascii: "Y")
switch value.ascii {
case "Y": print("Found you!")
default: print("👀")
}

Edit: this was written too hastily and is now fixed. I didn't mean it to trap with a precondition, but to return nil when the integer is out of ASCII range. But I suppose this is debatable.
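To make the nil path concrete, here is a minimal sketch of the same idea. The property name `asciiScalar` and the helper `describe(_:)` are made up for illustration and are not part of the standard library or of any proposal.

```swift
extension UInt8 {
    /// Hypothetical property: the byte as a Unicode scalar if it is in the
    /// ASCII range (0...127), otherwise nil. It never traps.
    var asciiScalar: Unicode.Scalar? {
        self < 128 ? Unicode.Scalar(self) : nil
    }
}

/// Hypothetical helper that shows all three outcomes of the switch.
func describe(_ byte: UInt8) -> String {
    // Bytes outside the ASCII range are rejected before any comparison.
    guard let scalar = byte.asciiScalar else { return "not ASCII" }
    switch scalar {
    case "Y": return "Found you!"
    default:  return "some other ASCII character"
    }
}
```

A byte such as 0xC3 (the first UTF-8 code unit of "é") takes the "not ASCII" path instead of trapping.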

Aren’t preconditions runtime errors?

To be honest I don't know. I just checked and disappointingly it's a runtime error. My bad there.

The proposed implementation would trap (at runtime and possibly depending on the order of execution).
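For comparison, a variant of the pattern matching operator that treats a non-ASCII scalar as "no match" rather than trapping might look like the sketch below. This is an assumption about one possible design, not the implementation under discussion; `isYes(_:)` is a hypothetical helper.

```swift
extension UInt8 {
    /// Hypothetical operator: lets a Unicode scalar literal appear as a
    /// `case` pattern when switching over a byte. A non-ASCII scalar never
    /// matches, so no precondition can fail at runtime.
    static func ~= (pattern: Unicode.Scalar, byte: UInt8) -> Bool {
        // `&&` short-circuits, so the conversion only runs for values < 128.
        pattern.isASCII && byte == UInt8(pattern.value)
    }
}

/// Hypothetical helper that switches directly over a byte.
func isYes(_ byte: UInt8) -> Bool {
    switch byte {
    case "Y", "y": return true
    default:       return false
    }
}
```

The trade-off is that a pattern like "È" silently never matches instead of being reported, which is exactly the kind of behaviour the thread is debating.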

I'm going to let the thread run for a while now and stop shooting from the hip. I appreciate the replies that are coming through now (for and against)! I'd really like to put this open question to bed either way after 6 years! @michelf's revised suggestion of a nillable property on UInt8 is a very good one.

This sounds like exactly the kind of use case which motivated Swift’s current division between integers and codepoints.

A type that is initializable from UInt8 cannot use the type system to statically enforce that it only holds characters in the range [0, 127]. Wherever you use such a type you invite either a runtime crash, locale-dependent behavior, or disparity with C code running on the same platform. If you go with either the second or third option, you now have a type that can compare against partial UTF-8 codepoint sequences. And if you’re using such a type to parse a mixed stream of human-readable binary and UTF-8 text, such a type invites you to write bugs that accidentally match the wrong bytes.
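A small sketch of the "wrong bytes" hazard, using only standard library calls: the first UTF-8 code unit of "é" is 0xC3, and converting that single byte to a Unicode.Scalar yields U+00C3 ("Ã"), a different character. The constant names are only for illustration.

```swift
// "é" (U+00E9) is encoded in UTF-8 as two code units.
let eAcute = "\u{E9}"
let utf8Bytes = Array(eAcute.utf8)              // [0xC3, 0xA9]

// Treating one code unit as a scalar reinterprets it as U+00C3, so a
// comparison against "Ã" succeeds on a byte that is only part of "é".
let firstByteAsScalar = Unicode.Scalar(utf8Bytes[0])
let looksLikeATilde = firstByteAsScalar == "\u{C3}"
```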

How would that happen pray tell with the operators I've proposed? I am no longer proposing integer conversions to any type but limited type safe comparisons which is another way to solve the problem.

You picked the first option.

There is no UInt7 type in Swift and it is unlikely there ever would be. If we had given ASCII literals a unique syntax using say, single quoted literal type, a compile time check might have been possible but I don't have the energy to fight that battle any more. UInt8(ascii: "È") is a run time error in Swift unfortunately. Why should these new affordances be held to a different standard?

There may be other reasons but I can't help wondering if your trenchant opposition to new affordances to ASCII in Swift isn't founded in concerns that they would somehow disfavour support for EBCDIC. That needn't be the case. A starting point would be IBM contributing an implementation of something like UInt8(ebcdic: "A"). Let us ASCII folks move forward into the 1970s.

Because you are proposing to make comparisons between integers and codepoints 0–127 more convenient by hiding the word “ascii” from the developer. There are very reasonable arguments that it should not be easy to do this, because it contributes to the ignorance of English-speaking programmers that other languages exist. That ignorance can manifest as an unexpected program crash when a user types their name (“José”) or it can be much worse, leading to locale-dependent or even undefined behavior.

I'd welcome Int7, Int15 and other "Int 2^n - 1" types to Swift: in some contexts the half range is quite enough for the task at hand and memory usage could be reduced dramatically: e.g. an array of Optional<Int63> would take half the space of an array of Optional<Int64> (another option for a similar space reduction is to sacrifice the most negative number to represent nil).
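To put rough numbers on the space argument: Optional<Int64> needs an extra byte for the nil case, which alignment typically rounds up to 16 bytes per array element on 64-bit platforms. Below is a minimal sketch of the "sacrifice the most negative number" alternative. The type `CompactOptionalInt64` is hypothetical and only illustrates the layout.

```swift
/// Hypothetical wrapper: stores an optional 64-bit integer in 8 bytes by
/// reserving Int64.min as the representation of nil.
struct CompactOptionalInt64 {
    private var storage: Int64

    init(_ value: Int64?) {
        precondition(value != Int64.min, "Int64.min is reserved to represent nil")
        storage = value ?? Int64.min
    }

    /// The wrapped value, or nil if the sentinel is stored.
    var value: Int64? { storage == Int64.min ? nil : storage }
}

// Per-element footprint in an array. The exact optional stride is
// platform-dependent; 16 is what typical 64-bit targets report.
let optionalStride = MemoryLayout<Int64?>.stride
let compactStride = MemoryLayout<CompactOptionalInt64>.stride
```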

That seems a little overstated. If someone codes UInt8(ascii: "é") in an infrequently traveled code path it will result in a crash, but undefined behaviour, seriously? The operators would not bring this crash about due to an atypical input stream. People who are coding with buffers of bytes need to know what they are doing, and the proposed affordance makes their lives slightly easier and their code slightly less of a mess. There is no escaping that text issues require education and that a least common denominator is required. I don't have a problem with ASCII being implied for low level code in the same way that Unicode is implied in Swift's String model. We'll agree to disagree on this point at the end of the day.

i did a quick search for case 0x in some of my current projects, and found what i think is a prototypical use case for operating on ASCII bytes:

func remainder(hex:UInt8) -> UInt8?
{
    switch hex
    {
    case 0x30 ... 0x39: hex      - 0x30
    case 0x61 ... 0x66: hex + 10 - 0x61
    case 0x41 ... 0x46: hex + 10 - 0x41
    default:            nil
    }
}

i think it is not in dispute that this spelling is just awful. i think that @michelf ’s suggestion is helpful, but insufficient, because it doesn’t solve the “distance to” problem:

func remainder2(hex:UInt8) -> UInt8?
{
    switch Unicode.Scalar.init(hex)
    {
    case "0" ... "9": hex      - 0x30
    case "a" ... "f": hex + 10 - 0x61
    case "A" ... "F": hex + 10 - 0x41
    default:          nil
    }
}

i don’t like that this is still subtracting from integer literals.

(by the way, both of these compile to the exact same machine code on Swift 5.10. how far we have come from the 4.x days!)

in my mind there are two simple additions we can make to Michel’s idea that would get this snippet to something satisfactory:

  1. add a distance(to:) method to Unicode.Scalar
  2. make Unicode.Scalar expressible by a single-quoted literal

then we could write

func remainder2(hex:UInt8) -> UInt8?
{
    let digit:Unicode.Scalar = .init(hex)
    switch digit
    {
    case "0" ... "9": return '0'.distance(to: digit)
    case "a" ... "f": return 'a'.distance(to: digit) + 10
    case "A" ... "F": return 'A'.distance(to: digit) + 10
    default:          return nil
    }
}
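Single-quoted literals do not exist in Swift today, so the snippet above cannot be compiled as written. A rough approximation in current Swift, assuming a hypothetical `distance(to:)` extension that returns an optional UInt8 (the shape of the real method, if it were ever added, is an open question), could look like this:

```swift
extension Unicode.Scalar {
    /// Hypothetical helper, not in the standard library: the number of code
    /// points from `self` up to `other`, or nil if `other` comes first or
    /// the distance does not fit in a UInt8.
    func distance(to other: Unicode.Scalar) -> UInt8? {
        guard other.value >= self.value else { return nil }
        return UInt8(exactly: other.value - self.value)
    }
}

/// Hypothetical name; same logic as `remainder2`, with double-quoted
/// literals coerced to Unicode.Scalar instead of single-quoted ones.
func hexValue(_ byte: UInt8) -> UInt8? {
    let digit = Unicode.Scalar(byte)
    switch digit {
    case "0" ... "9": return ("0" as Unicode.Scalar).distance(to: digit)
    case "a" ... "f": return ("a" as Unicode.Scalar).distance(to: digit).map { $0 + 10 }
    case "A" ... "F": return ("A" as Unicode.Scalar).distance(to: digit).map { $0 + 10 }
    default:          return nil
    }
}
```

The `as Unicode.Scalar` coercions are the noise that a dedicated literal syntax would remove.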

Indeed. Neither of us has to convince each other; the only people who ever truly need to be convinced of anything are the Language Workgroup.

But it doesn't look that awful to me when expressed like this:

func remainder2(hex:UInt8) -> UInt8?
{
    switch Unicode.Scalar.init(hex)
    {
    case "0" ... "9": hex      - UInt8(ascii: "0")
    case "a" ... "f": hex + 10 - UInt8(ascii: "a")
    case "A" ... "F": hex + 10 - UInt8(ascii: "A")
    default:          nil
    }
}

Ok, sure, hex - UInt8(ascii: "0") is a bit verbose, but is it really worse than a hypothetical '0'.distance(to: digit)?

Looking at this from a different angle: how about making these new constants simply Int?

'Abcd' or perhaps 0_Abcd (to follow the 0x 0o 0b tradition) would be indistinguishable from 0x41626364. The character set could be restricted to ascii only.
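Without new syntax, the same value can be computed by a function. This is a minimal sketch: `packASCII` is a made-up name, and it returns UInt32 rather than Int purely to make the four-character limit explicit.

```swift
/// Hypothetical helper: packs 1 to 4 ASCII characters into an integer,
/// most significant byte first. Returns nil for empty, over-long, or
/// non-ASCII input.
func packASCII(_ text: String) -> UInt32? {
    let bytes = Array(text.utf8)
    guard (1...4).contains(bytes.count),
          bytes.allSatisfy({ $0 < 128 }) else { return nil }
    // Shift the accumulated value left by one byte and append the next one.
    return bytes.reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}
```

With this, packASCII("Abcd") produces the same value as the literal 0x41626364, but only at runtime, whereas a literal form could be checked at compile time.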

I would dispute that. For my taste, this spelling is better than any spelling in terms of character/string literals because it makes explicit that you are depending on properties of the encoding (that these runs of characters have specific values and are laid out consecutively).

but there are a ton of places i can find where i have a hex literal that represents some ASCII character, and while there might be some value in making the reliance on the numeric encoding explicit, i just don’t think that’s worth having to fire up python3 and run hex(ord('=')) every time i want to do some operation on an ASCII character.
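For what it's worth, the lookup itself can also be done in Swift using only the standard library, though that still means leaving the code being written to run it somewhere:

```swift
// Swift counterpart of Python's hex(ord('=')).
let equalsSign = UInt8(ascii: "=")
let hexSpelling = "0x" + String(equalsSign, radix: 16, uppercase: true)
```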

For single-character constants you could use something like this:

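enum AsciiChar: UInt8, Hashable, Comparable {
    case d0 = 0x30
    case d9 = 0x39
    case a = 0x61
    case f = 0x66
    case A = 0x41
    case F = 0x46
    
    static func < (lhs: Self, rhs: Self) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
    static func - (lhs: UInt8, rhs: Self) -> UInt8 {
        lhs - rhs.rawValue
    }
    // Match a raw byte against a range of named characters, so that bytes
    // between the named endpoints (such as "5") match too.
    static func ~= (range: ClosedRange<Self>, byte: UInt8) -> Bool {
        range.lowerBound.rawValue <= byte && byte <= range.upperBound.rawValue
    }
}

func remainder(hex: UInt8) -> UInt8? {
    // Switch on the byte itself: AsciiChar(rawValue: hex)! would trap for
    // any byte that is not one of the six named cases.
    switch hex {
        case AsciiChar.d0 ... AsciiChar.d9: hex - .d0
        case AsciiChar.a ... AsciiChar.f: hex + 10 - .a
        case AsciiChar.A ... AsciiChar.F: hex + 10 - .A
        default: nil
    }
}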