Single Quoted Character Literals (Why yes, again)

johnno1962 · March 8, 2024, 6:47pm

If you looked at the code, this is prevented by the use of UInt8(ascii:) under the covers, which has a precondition that the scalar value is < 128, so this would be a compilation error.

johnno1962 · March 8, 2024, 6:51pm

michelf:

Personally, I'd go for an even simpler solution. We could add an ascii property on UInt8 converting it to a UnicodeScalar? as a counterpart to UInt8(ascii:). Then you can write this:
switch self.previous.ascii {
case " ", "\r", "\n", "\t", "(", "[", "{", ",", ";", ":", "\0":
  return false

How would that work without a pattern matching operator accepting the double quoted strings and comparing the UInt8(ascii:) value (unless you're suggesting switching on strings???)

  public static func ~= (s: Unicode.Scalar, i: Self) -> Bool {
    return i == UInt8(ascii: s)
  }

Edit: OK, I can see now, you're switching on UnicodeScalars. This is a good option. Let's implement it!
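For readers following along, here is a minimal sketch of how the operator quoted above could be exercised. It assumes the operator is declared inside an `extension UInt8`; the helper name `isSeparator` is made up for illustration. The pattern literals are inferred as `Unicode.Scalar`, and a non-ASCII pattern would trap at runtime inside `UInt8(ascii:)`.

```swift
extension UInt8 {
    // Lets a Unicode.Scalar literal act as a `case` pattern against a byte.
    // UInt8(ascii:) traps at runtime if the pattern scalar is not ASCII.
    static func ~= (pattern: Unicode.Scalar, byte: UInt8) -> Bool {
        return byte == UInt8(ascii: pattern)
    }
}

// Hypothetical helper, for illustration only.
func isSeparator(_ byte: UInt8) -> Bool {
    switch byte {
    case " ", "\r", "\n", "\t", ",", ";", ":":
        return true
    default:
        return false
    }
}
```

With this approach the switch stays on the raw byte, whereas the `ascii` property approach switches on an optional `Unicode.Scalar`.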

michelf · March 8, 2024, 6:56pm

I suppose I failed to provide an implementation that would make clear what I meant:

extension UInt8 {
	var ascii: UnicodeScalar? {
		guard self < 128 else { return nil }
		return UnicodeScalar(self)
	}
}

let value = UInt8(ascii: "Y")
switch value.ascii {
case "Y": print("Found you!")
default: print("👀")
}

Edit: this was written too hastily and is now fixed. I didn't mean it to trap with a precondition, but to return nil when the integer is out of ASCII range. But I suppose this is debatable.

ksluder · March 8, 2024, 6:56pm

Aren’t preconditions runtime errors?

johnno1962 · March 8, 2024, 7:01pm

To be honest I don't know. I just checked and disappointingly it's a runtime error. My bad there.
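To make the distinction concrete, here is a small sketch of a non-trapping counterpart to `UInt8(ascii:)`. The function `asciiByte` is hypothetical and not part of the standard library; it relies only on the existing `Unicode.Scalar.isASCII` and `value` properties.

```swift
// Hypothetical non-trapping counterpart to UInt8(ascii:), for illustration.
func asciiByte(_ scalar: Unicode.Scalar) -> UInt8? {
    // isASCII is true only for scalar values below 128.
    return scalar.isASCII ? UInt8(scalar.value) : nil
}

let upperY = asciiByte("Y")          // Optional(89)
let accented = asciiByte("\u{C8}")   // nil for "È"; UInt8(ascii:) would trap at runtime instead
```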

johnno1962 · March 8, 2024, 7:20pm

The proposed implementation would trap (at runtime and possibly depending on the order of execution).

I'm going to let the thread run for a while now and stop shooting from the hip. I appreciate the replies that are coming through now (for and against)! I'd really like to put this open question to bed either way after 6 years! @michelf's revised suggestion of a nillable property on UInt8 is a very good one.

ksluder · March 8, 2024, 7:24pm

This sounds like exactly the kind of use case which motivated Swift’s current division between integers and codepoints.

A type that is initializable from UInt8 cannot use the type system to statically enforce that it only holds characters in the range [0, 127]. Wherever you use such a type you invite either a runtime crash, locale-dependent behavior, or disparity with C code running on the same platform. If you go with either the second or third option, you now have a type that can compare against partial UTF-8 codepoint sequences. And if you’re using such a type to parse a mixed stream of human-readable binary and UTF-8 text, such a type invites you to write bugs that accidentally match the wrong bytes.
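One way to see the partial-sequence hazard with today's API is the sketch below. It uses only the existing `Unicode.Scalar.init(_: UInt8)` initializer, which maps a byte to the scalar with the same numeric value, so a byte taken from the middle of a UTF-8 sequence is silently reinterpreted as a Latin-1 character.

```swift
// "é" is the single scalar U+00E9, encoded in UTF-8 as two bytes.
let encoded = Array("\u{E9}".utf8)        // [0xC3, 0xA9]

// Each byte becomes the scalar with the same numeric value,
// which is not what the UTF-8 stream actually contains.
let first = Unicode.Scalar(encoded[0])    // U+00C3 ("Ã")
let second = Unicode.Scalar(encoded[1])   // U+00A9 ("©")

// Checking for ASCII first avoids the accidental match.
let firstIsASCII = first.isASCII          // false
```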

johnno1962 · March 9, 2024, 1:58pm

How would that happen pray tell with the operators I've proposed? I am no longer proposing integer conversions to any type but limited type safe comparisons which is another way to solve the problem.

ksluder · March 9, 2024, 6:14pm

You picked the first option.

johnno1962 · March 9, 2024, 6:55pm

There is no UInt7 type in Swift and it is unlikely there ever would be. If we had given ASCII literals a unique syntax using say, single quoted literal type, a compile time check might have been possible but I don't have the energy to fight that battle any more. UInt8(ascii: "È") is a run time error in Swift unfortunately. Why should these new affordances be held to a different standard?

There may be other reasons but I can't help wondering if your trenchant opposition to new affordances to ASCII in Swift isn't founded in concerns that they would somehow disfavour support for EBCDIC. That needn't be the case. A starting point would be IBM contributing an implementation of something like UInt8(ebcdic: "A"). Let us ASCII folks move forward into the 1970s.

ksluder · March 9, 2024, 7:00pm

Because you are proposing to make comparisons between integers and codepoints 0–127 more convenient by hiding the word “ascii” from the developer. There are very reasonable arguments that it should not be easy to do this, because it contributes to the ignorance of English-speaking programmers that other languages exist. That ignorance can manifest as an unexpected program crash when a user types their name (“José”) or it can be much worse, leading to locale-dependent or even undefined behavior.

tera · March 9, 2024, 7:09pm

I'd welcome Int7, Int15 and other "Int 2^n - 1" types in Swift: in some contexts the half range is quite enough for the task at hand, and memory usage could be reduced dramatically. For example, an array of Optional<Int63> would take half the space of an array of Optional<Int64> (another option for a similar space reduction is to sacrifice the most negative number to represent nil).
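The space argument can be checked with `MemoryLayout`. The sketch below assumes a typical 64-bit platform; since `Int63` does not exist, `Bool?` stands in as an example of a type whose unused bit patterns let `Optional` add no storage.

```swift
// Int64 uses every bit pattern, so Optional needs an extra tag byte,
// and alignment then pads each array element to 16 bytes.
let plainStride = MemoryLayout<Int64>.stride        // 8
let optionalSize = MemoryLayout<Int64?>.size        // 9
let optionalStride = MemoryLayout<Int64?>.stride    // 16

// Bool has spare bit patterns, so Optional<Bool> still fits in one byte.
let optionalBoolSize = MemoryLayout<Bool?>.size     // 1
```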

johnno1962 · March 9, 2024, 7:30pm

That seems a little overstated. If someone codes UInt8(ascii: "é") in an infrequently traveled code path it will result in a crash, but undefined behaviour, seriously? The operators would not bring this crash about due to an atypical input stream. People who are coding with buffers of bytes need to know what they are doing, and the proposed affordance makes their lives slightly easier and their code slightly less of a mess. There is no escaping that text issues require education and that a least common denominator is required. I don't have a problem with ASCII being implied for low level code in the same way that Unicode is implied in Swift's String model. We'll agree to disagree on this point at the end of the day.

taylorswift · March 9, 2024, 11:19pm

i did a quick search for case 0x in some of my current projects, and found what i think is a prototypical use case for operating on ASCII bytes:

func remainder(hex:UInt8) -> UInt8?
{
    switch hex
    {
    case 0x30 ... 0x39: hex      - 0x30
    case 0x61 ... 0x66: hex + 10 - 0x61
    case 0x41 ... 0x46: hex + 10 - 0x41
    default:            nil
    }
}

i think it is not in dispute that this spelling is just awful. i think that @michelf’s suggestion is helpful, but insufficient, because it doesn’t solve the “distance to” problem:

func remainder2(hex:UInt8) -> UInt8?
{
    switch Unicode.Scalar.init(hex)
    {
    case "0" ... "9": hex      - 0x30
    case "a" ... "f": hex + 10 - 0x61
    case "A" ... "F": hex + 10 - 0x41
    default:          nil
    }
}

i don’t like that this is still subtracting from integer literals.

(by the way, both of these compile to the exact same machine code on Swift 5.10. how far we have come from the 4.x days!)

in my mind there are two simple additions we can make to Michel’s idea that would get this snippet to something satisfactory:

add a distance(to:) method to Unicode.Scalar
make Unicode.Scalar expressible by a single-quoted literal

then we could write

func remainder2(hex:UInt8) -> UInt8?
{
    let digit:Unicode.Scalar = .init(hex)
    switch digit
    {
    case "0" ... "9": return '0'.distance(to: digit)
    case "a" ... "f": return 'a'.distance(to: digit) + 10
    case "A" ... "F": return 'A'.distance(to: digit) + 10
    default:          return nil
    }
}
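Since single-quoted literals do not exist in Swift today, the idea above can only be approximated. The following is a rough sketch, assuming a hypothetical `distance(to:)` extension on `Unicode.Scalar` (not a standard library API) and using typed constants in place of the single-quoted literals; the function name `hexDigitValue` is also made up.

```swift
extension Unicode.Scalar {
    // Hypothetical API. Traps if `other` precedes `self`
    // or if the distance does not fit in a UInt8.
    func distance(to other: Unicode.Scalar) -> UInt8 {
        return UInt8(other.value - self.value)
    }
}

func hexDigitValue(_ hex: UInt8) -> UInt8? {
    let digit = Unicode.Scalar(hex)
    // Stand-ins for the proposed '0', 'a' and 'A' literals.
    let zero: Unicode.Scalar = "0"
    let lowerA: Unicode.Scalar = "a"
    let upperA: Unicode.Scalar = "A"
    switch digit {
    case "0" ... "9": return zero.distance(to: digit)
    case "a" ... "f": return lowerA.distance(to: digit) + 10
    case "A" ... "F": return upperA.distance(to: digit) + 10
    default:          return nil
    }
}
```

The range patterns guarantee that `digit` is never below the base scalar, so the trapping subtraction inside `distance(to:)` is not reachable from this function.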

ksluder · March 9, 2024, 11:19pm

Indeed. Neither of us has to convince each other; the only people who ever truly need to be convinced of anything are the Language Workgroup.

michelf · March 9, 2024, 11:41pm

But it doesn't look that awful to me when expressed like this:

func remainder2(hex:UInt8) -> UInt8?
{
    switch Unicode.Scalar.init(hex)
    {
    case "0" ... "9": hex      - UInt8(ascii: "0")
    case "a" ... "f": hex + 10 - UInt8(ascii: "a")
    case "A" ... "F": hex + 10 - UInt8(ascii: "A")
    default:          nil
    }
}

Ok, sure, hex - UInt8(ascii: "0") is a bit verbose, but is it really worse than a hypothetical '0'.distance(to: digit)?

tera · March 10, 2024, 12:31am

Looking at this from a different angle: how about making these new constants simply Int?

'Abcd' or perhaps 0_Abcd (to follow the 0x 0o 0b tradition) would be indistinguishable from 0x41626364. The character set could be restricted to ascii only.
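Without new literal syntax, the same packing can be approximated at runtime. This is only a sketch: `packASCII` is a hypothetical helper, and it returns `UInt64` rather than `Int` so that eight characters fit without touching the sign bit.

```swift
// Hypothetical helper: packs up to eight ASCII characters into an integer,
// most significant byte first. Returns nil for non-ASCII or overlong input.
func packASCII(_ text: String) -> UInt64? {
    let bytes = Array(text.utf8)
    guard bytes.count <= 8, bytes.allSatisfy({ $0 < 128 }) else {
        return nil
    }
    return bytes.reduce(UInt64(0)) { ($0 << 8) | UInt64($1) }
}

let packed = packASCII("Abcd")   // Optional(0x41626364)
```

A literal form would do this at compile time; the runtime version mainly shows the intended byte order.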

scanon · March 10, 2024, 12:39am

I would dispute that. For my taste, this spelling is better than any spelling in terms of character/string literals because it makes explicit that you are depending on properties of the encoding (that these runs of characters have specific values and are laid out consecutively).

taylorswift · March 10, 2024, 12:48am

but there are a ton of places i can find where i have a hex literal that represents some ASCII character, and while there might be some value in making the reliance on the numeric encoding explicit, i just don’t think that’s worth having to fire up python3 and run hex(ord('=')) every time i want to do some operation on an ASCII character.
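For what it's worth, the same lookup can be done without leaving Swift, for example in a REPL or playground. This is just a small sketch using the existing `UInt8(ascii:)` and `String(_:radix:)` initializers.

```swift
// Swift counterpart of Python's hex(ord('=')).
let equalsSign = UInt8(ascii: "=")             // 61
let hexDigits = String(equalsSign, radix: 16)  // "3d"
```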

tera · March 10, 2024, 12:54am

For single-character constants you could use something like this:

enum AsciiChar: UInt8, Hashable, Comparable {
    case d0 = 0x30
    case d9 = 0x39
    case a = 0x61
    case f = 0x66
    case A = 0x41
    case F = 0x46
    
    static func < (lhs: Self, rhs: Self) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
    static func - (lhs: UInt8, rhs: Self) -> UInt8 {
        lhs - rhs.rawValue
    }
}

func remainder(hex: UInt8) -> UInt8? {
    switch AsciiChar(rawValue: hex)! {
        case .d0 ... .d9: hex - .d0
        case .a ... .f: hex + 10 - .a
        case .A ... .F: hex + 10 - .A
        default: nil
    }
}
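One caveat with the snippet above: `AsciiChar(rawValue: hex)!` traps for any byte that has no matching case, which includes ordinary digits such as "5" (0x35). A possible variation, sketched here with made-up names (`ASCIIByte`, `hexDigit(fromASCII:)`), wraps the byte in a struct so that every ASCII value is representable and the range patterns still work.

```swift
// Hypothetical wrapper: any byte below 128 is a valid value.
struct ASCIIByte: Comparable {
    let value: UInt8

    init?(_ byte: UInt8) {
        guard byte < 128 else { return nil }
        value = byte
    }

    // Named constants for the range endpoints used below.
    static let d0 = ASCIIByte(0x30)!
    static let d9 = ASCIIByte(0x39)!
    static let a = ASCIIByte(0x61)!
    static let f = ASCIIByte(0x66)!
    static let A = ASCIIByte(0x41)!
    static let F = ASCIIByte(0x46)!

    static func < (lhs: Self, rhs: Self) -> Bool {
        lhs.value < rhs.value
    }
}

func hexDigit(fromASCII hex: UInt8) -> UInt8? {
    // Non-ASCII bytes return nil instead of trapping.
    guard let char = ASCIIByte(hex) else { return nil }
    switch char {
    case ASCIIByte.d0 ... ASCIIByte.d9: return hex - ASCIIByte.d0.value
    case ASCIIByte.a ... ASCIIByte.f:   return hex + 10 - ASCIIByte.a.value
    case ASCIIByte.A ... ASCIIByte.F:   return hex + 10 - ASCIIByte.A.value
    default:                            return nil
    }
}
```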