Single Quoted Character Literals (Why yes, again)

johnno1962 · March 10, 2024, 11:16am

There's an operator for that™️. It's just a question of where you draw the line for inclusion in the stdlib based on how frequently it would be used.

```
extension UInt8 {
  ...
  public static func - (i: Self, s: Unicode.Scalar) -> Self {
    return i - UInt8(ascii: s)
  }
}
```

From memory, it broke a surprising number of Swift's tests as well, so I thought it best to remove it.
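As a minimal, self-contained sketch of how such an operator behaves (assuming it lives in user code rather than the stdlib; the constant names are only illustrative, and the scalar is bound to a typed constant to keep type inference out of the picture):

```swift
extension UInt8 {
    /// Subtracts the ASCII value of `s` from `i`.
    /// Traps if `s` is not ASCII (the precondition of UInt8(ascii:))
    /// or if the subtraction underflows.
    static func - (i: UInt8, s: Unicode.Scalar) -> UInt8 {
        return i - UInt8(ascii: s)
    }
}

let zero: Unicode.Scalar = "0"
let byte: UInt8 = 0x37            // the ASCII byte for "7"
let digitValue = byte - zero      // 0x37 - 0x30
print(digitValue)                 // prints 7
```

The same overload is what would make an expression such as `byte - "0"` possible, which is also where the extra load on overload resolution comes from.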

tera · March 10, 2024, 1:05pm

Could this be done with a macro?

```
#ascii("a")    // 0x61
#ascii("Abcd") // 0x41626364
```

johnno1962 · March 10, 2024, 1:13pm

Yes, but verrry slowwwwly.. Macros open up a world of new possibilities but let's not forget you could do that with a function!

Edit: This is interesting though, as a macro is a form of constexpr evaluated in the compiler. You could have a #uInt8Ascii("È") macro used inside UInt8(ascii:) or the operators I propose that would provide an error at compilation time.

Another Edit: This isn't how macros work when compiled into other code and you can't use macros in the stdlib anyway but there is a germ of something interesting in there. If only preconditions could be checked at compile time.
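For reference, the plain-function version of the same idea might look like the sketch below. The name `asciiCode` and the choice of returning nil for invalid input are assumptions made for illustration; the check happens at run time, which is exactly the limitation discussed above.

```swift
/// Packs 1 to 4 ASCII characters into a UInt32, first character in the
/// most significant position. Returns nil for empty, over-long or
/// non-ASCII input. (Hypothetical helper, not an existing API.)
func asciiCode(_ text: String) -> UInt32? {
    let bytes = Array(text.utf8)
    guard (1...4).contains(bytes.count) else { return nil }
    var code: UInt32 = 0
    for byte in bytes {
        guard byte < 0x80 else { return nil }   // reject non-ASCII bytes
        code = (code << 8) | UInt32(byte)
    }
    return code
}

print(asciiCode("a") == 0x61)            // prints true
print(asciiCode("Abcd") == 0x41626364)   // prints true
```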

tera · March 12, 2024, 3:40am

Search for "fourCharacterCode" as an example.

We could use the following "decision tree":

1. need: we need this feature | we don't need this feature
2. char literals: single | multi
3. ascii only: yes | no (then what?)
4. notation: 'Abcd' | #macroName("Abcd") | 0_Abcd
5. type: UInt8 | UnicodeScalar | else
6. design: a simple enum (single char literal only) | macro | a new language feature

My preference on this one would be 1. yes, 2. multi, 3. ascii only, 4. 'Abcd', 5. UInt8, and 6. a new language feature. However, the macro approach is not so bad.

nnnnnnnn · March 13, 2024, 4:54pm

I just read through the proposal – thanks for continuing to push for a solution in this area! I do believe there's a version of this that could improve the experience of working with ASCII comparisons.

I think it would help if the proposal presented the intended semantics of the proposed operators before presenting the implementation, since there are some potentially controversial aspects to the implementation that may not be necessary. From what I can see, there are at least two routes you could take:

Allow ASCII comparisons between UInt8 and (existing syntax) single-character string literals.
Alternatively, provide general UInt8 comparisons with UnicodeScalar values, even for values >= 128. This design might be less surprising for users, who would otherwise see:
```
0x65 == "e"      // true
0x65 == "\u{65}" // true
0xE9 == "é"      // false (or a runtime error)
0xE9 == "\u{E9}" // false (or a runtime error)
```
At the same time, there's less clarity with this approach, since for values >= 128, you no longer have the fact of the Unicode scalar value being equal to the UTF-8 encoded value. A value of 0x65 pulled from a Data instance can be interpreted as "e", but the same can't be said for higher UInt8 values (there are some assumptions in this statement, as well).
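The point about values >= 128 can be checked directly in current Swift. In this small sketch the helper name `utf8Bytes` is just illustrative:

```swift
/// Returns the UTF-8 encoding of a single Unicode scalar.
func utf8Bytes(of scalar: Unicode.Scalar) -> [UInt8] {
    return Array(String(Character(scalar)).utf8)
}

let e: Unicode.Scalar = "\u{65}"        // "e"
let eAcute: Unicode.Scalar = "\u{E9}"   // "é"

// For ASCII, the scalar value and the single UTF-8 byte coincide.
print(e.value == 0x65, utf8Bytes(of: e) == [0x65])               // prints true true

// Above ASCII they diverge: the scalar value is 0xE9,
// but the UTF-8 encoding is the two bytes 0xC3 0xA9.
print(eAcute.value == 0xE9, utf8Bytes(of: eAcute) == [0xC3, 0xA9]) // prints true true
```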

With either of these goals, I think it's possible to bring the implementation into alignment without some of the danger it currently poses. Some notes:

The operator functions, as currently defined, will trap if used with non-ASCII Unicode scalars. If the goal is to provide ASCII comparisons, comparing any non-ASCII Unicode scalar could simply return false instead of trapping. If not limited to ASCII, anything over 255 could return false.
Since these are heterogeneous operators, the additions should provide overloads that allow comparison in both directions: 0x65 == "e" and "e" == 0x65.
~~I wouldn't expect the Optional<UInt8> overloads to be necessary – did you find that lifting non-optional values for the comparisons didn't happen?~~

Edit: I see what this is doing – lifting to Optional only helps for equatable types. Funnily enough, our existing heterogeneous operators don't provide these optional overloads, so you get:
```
let x: Int = 5
let y: Int? = 5
let z: UInt8? = 5
x == y // true
x == z // error: value of optional type 'UInt8?' must be unwrapped to a value of type 'UInt8'
```
(I guess it's up for discussion whether we need them here and/or if we should add them for optional integers as well.)
The two additional initializers (for FixedWidthInteger and arrays of integers) don't seem to serve the purpose of enabling comparisons. What's the motivation for these inclusions? Again, I don't think it makes sense to trap on invalid input here, so if these were to be included, I think they would be better as failable initializers.
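To make the first two notes concrete, here is a rough sketch of non-trapping, ASCII-only comparisons defined in both directions. It is written as a user-code extension and is not the proposal's implementation; typed constants are used so the example does not depend on literal inference.

```swift
extension UInt8 {
    /// True only when `rhs` is ASCII and has the same value as `lhs`.
    /// Non-ASCII scalars compare unequal instead of trapping.
    static func == (lhs: UInt8, rhs: Unicode.Scalar) -> Bool {
        return rhs.isASCII && UInt32(lhs) == rhs.value
    }

    /// The mirrored overload, so the comparison works in either order.
    static func == (lhs: Unicode.Scalar, rhs: UInt8) -> Bool {
        return rhs == lhs
    }
}

let e: Unicode.Scalar = "e"
let eAcute: Unicode.Scalar = "\u{E9}"
let byte: UInt8 = 0x65
let highByte: UInt8 = 0xE9

print(byte == e, e == byte, highByte == eAcute)   // prints true true false
```

Dropping the `isASCII` check would give the second route instead, where 0xE9 compares equal to "\u{E9}" and only scalars above 255 return false.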

johnno1962 · March 15, 2024, 8:14pm

Thanks for saying this. For me it's unfinished business bordering on an obsession after all this time, but I may have reached the point where I can make a Swift Package available that could satisfy the various constituencies. The principal motivation is that we need to find an alternative to this sort of abomination:

johnno1962:

    switch self.previous {
     case UInt8(ascii: " "), UInt8(ascii: "\r"), UInt8(ascii: "\n"), UInt8(ascii: "\t"),  // whitespace
       UInt8(ascii: "("), UInt8(ascii: "["), UInt8(ascii: "{"),  // opening delimiters
       UInt8(ascii: ","), UInt8(ascii: ";"), UInt8(ascii: ":"),  // expression separators

Using a few well-chosen operators this is possible, and there is also the novel alternative of a .ascii property on UInt8 suggested by @michelf, though that could never trap if you tried to compare a literal that wasn't in the ASCII range. Your point that not trapping may be better is valid, but that leaves you in a situation where there is little hope of picking up invalid code. It should trap, but only for a DEBUG build, IMO. You can't do that if the operators were added to the standard library, but you can with a package, so I think this package may be a better alternative:

The package contains both @michelf's approach if you import ASCII and the operators approach if you import ImplicitASCII, so people know where the magic is coming from. It took a while experimenting with where to put @inlinable or @_transparent etc., but the final product is performant when tested with various branches of swift-syntax (property approach and operators approach). The Implicit approach will give a fatal error on invalid non-ASCII cases or comparisons for a DEBUG build and ignores them for a release build, which seems the best of both worlds. The property approach is marginally faster, but the swift-syntax project's tests are not passing, which I've not investigated as yet.
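To illustrate the "trap in DEBUG, ignore in release" idea, here is a rough sketch using a pattern-match operator and `assert`, which is only evaluated in unoptimised builds. This is not the package's actual code; the operator placement and the `isSeparator` helper are assumptions made for the example.

```swift
extension UInt8 {
    /// Lets a Unicode scalar appear as a `case` pattern when switching on a byte.
    /// A non-ASCII pattern stops a debug build via `assert`; in an optimised
    /// build the assertion is compiled out and the pattern just fails to match.
    static func ~= (pattern: Unicode.Scalar, value: UInt8) -> Bool {
        assert(pattern.isASCII, "non-ASCII scalar compared with a UInt8")
        return pattern.isASCII && UInt32(value) == pattern.value
    }
}

/// Hypothetical helper mirroring the shape of the quoted switch.
func isSeparator(_ byte: UInt8) -> Bool {
    switch byte {
    case " ", "\r", "\n", "\t":   // whitespace
        return true
    case "(", "[", "{":           // opening delimiters
        return true
    case ",", ";", ":":           // expression separators
        return true
    default:
        return false
    }
}

print(isSeparator(0x20), isSeparator(UInt8(ascii: "a")))   // prints true false
```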

It would be better if invalid comparisons were an error at compile time, but that needs support in the compiler and, to avoid breaking existing code, pretty soon you need to start looking at pressing single quotes into service with integer conversions to UInt8. I looked at that and the changes aren't that onerous, but the main obstacle is the current legislative logjam after the last review (which is an interesting read after all this time). There is a toolchain available if anyone would like to try it out.

Anyway, a bit more data for you. My current position is a Package would be a perfectly fine way to solve this problem since it is possible to get it to be performant which matters for this type of code. Inside the package my preference is for the operators (Ironically I was reviewing the original review and I put them forward in 2019 but nobody picked up on the idea - not even me it would seem). Using a package, people would be able to use either approach or even import a Package for other character encodings.

Edit: In answer to a couple of your questions:

Consider the FixedWidthInteger(unicode:) initialiser "bonus content" I slipped in to provide an alternative to UInt8(ascii:) (which so often has to be subsequently cast), expanded out for Unicode values. I've made it a failable initialiser that returns nil if the value overflows. I've removed the Array initialiser as it was problematic for embedded systems and people can always write their own.

I didn't consider it to be necessary. Adding still more overloads to comparison operators imposes load on the TypeChecker slowing it down for compiling all code however slightly.
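A failable FixedWidthInteger(unicode:) of the kind described could be sketched roughly as follows. This is an assumption about its shape for illustration, not the code from the package:

```swift
extension FixedWidthInteger {
    /// Creates an integer from a Unicode scalar's value,
    /// or returns nil if the value does not fit in this integer type.
    init?(unicode scalar: Unicode.Scalar) {
        self.init(exactly: scalar.value)
    }
}

let small = UInt8(unicode: "e")           // fits in a byte
let tooBig = UInt8(unicode: "\u{20AC}")   // EURO SIGN, U+20AC, does not fit
let wide = UInt16(unicode: "\u{20AC}")    // fits in 16 bits
print(small == 0x65, tooBig == nil, wide == 0x20AC)   // prints true true true
```

Note that a plain overflow check accepts scalars in the 128...255 range for UInt8, so it is looser than UInt8(ascii:).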

Dmitriy_Ignatyev · March 30, 2024, 12:12pm

I would argue that this code has poor readability. For high-level developers it is close to assembler. Personally, I also don't want to keep codePoint–asciiChar mapping tables in my head or google them every time. ASCII literals make code easy to understand. It is the compiler's responsibility to convert human-readable code to machine code and do other low-level stuff.

jpmhouston · April 7, 2024, 1:24pm

Am I the only one who dislikes "x" being able to represent a single Unicode.Scalar and then especially, comparable to an int? This seems like a category error to me, something that would make the language worse. (/does make? I can't recall how much of this special case for a single element string is in the language already)

Let a string be a string, I want distinct syntax to represent elements of a string. The syntax 'x' is a term of art in C-derived languages to mean an element of a string, Swift should use it.

johnno1962 · April 11, 2024, 8:16pm

I quite agree. While I've explored adding comparison operators as a means of writing code that works on byte streams of text that isn't terrible as a tactical measure with Swift as it is today, I still believe pressing single quotes into service as an alternative syntax for elements of a string (anything up to a character) is the way forward. This distinct syntax could have special conformances allowing expressibility of UInt8 only if the literal happens to be ASCII. I've kept exploring this in a private PR on the compiler ("Character syntax", Pull Request #17 on johnno1962/swift on GitHub), which remains quite straightforward.

The TL;DR of how the implementation works is it injects a new @_marker "compiler protocol" ExpressibleByASCIIScalarLiteral at the head of the existing string literal protocol hierarchy ExpressibleByStringLiteral: ExpressibleByExtendedGraphemeClusterLiteral: ExpressibleByUnicodeScalarLiteral. ExpressibleByASCIIScalarLiteral is only activated when a single quoted literal is a single ASCII value. The second part (which can be reviewed separately) is a conformance of UInt8 only to that protocol:

```
extension UInt8: ExpressibleByASCIIScalarLiteral,
  _ExpressibleByBuiltinUnicodeScalarLiteral {

  @_transparent @_alwaysEmitIntoClient
  public init(_builtinUnicodeScalarLiteral value: Builtin.Int32) {
    self = UInt8(UInt32(value))
  }
}
```
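For contrast, this is roughly what the existing literal protocols allow in user code today: a wrapper type can adopt ExpressibleByUnicodeScalarLiteral, but the ASCII check can only happen at run time. The `ASCIIByte` type is hypothetical and only illustrates the gap a compiler-level protocol would close.

```swift
/// A hypothetical byte wrapper that can be written as a double-quoted literal.
struct ASCIIByte: ExpressibleByUnicodeScalarLiteral, Equatable {
    let value: UInt8

    init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        // Checked when the program runs, not when it is compiled.
        precondition(scalar.isASCII, "literal is not ASCII")
        value = UInt8(scalar.value)
    }
}

let newline: ASCIIByte = "\n"
let letterA: ASCIIByte = "A"
print(newline.value, letterA.value)   // prints 10 65
```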

The advantages of the approach explored in the PR are:

Perhaps Swift's double quotes for everything is overly consistent and it would be helpful to distinguish and provide errors early for single quoted literals which are not Characters.
This would move Swift back into line with practically every other language which makes the distinction.
The change is additive and not source breaking. Swift's 10,000+ tests all compile and run.
A distinct syntax allows for the introduction of a focused expressibility to UInt8 that would provide a compilation time error if the literal was not ASCII.
Using a @_marker protocol it is possible to make this feature available without ABI issues or backporting drama.

At present we have a valid technical approach, something something, improvement to Swift. The missing part in the middle I can't provide.

Here is some random code run through a toolchain from the PR, for discussion:

```
let u = UInt8(ascii: "b")
let v: UInt8 = 'c'
let e = UInt8(ascii: "È") // runtime error (current behaviour)
let f: UInt8 = 'È' // compile time error
let c: Character = '👩🏼‍🚀'
let d = 'ab' // error, not Character
let j: UInt8 = 'e' + 5
let k = u - 'a'
let l = 'a' * 'l' // compile time error (overflow)
let m = 'a' * 'È' // compile time error
let n: [UInt8] = ['a', 'b', 'c']
let h = j == 'j'
switch u {
case 'a' ... 'z':
    print("LETTER")
case 'È': // compile time error
    print("WTF?")
default:
    print("DEFAULT")
}
```
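For comparison, the valid parts of that listing can be written in Swift as it is today roughly as follows. The `classify` wrapper is only there to make the example self-contained:

```swift
let u = UInt8(ascii: "b")
let v = UInt8(ascii: "c")
let j: UInt8 = UInt8(ascii: "e") + 5
let k = u - UInt8(ascii: "a")
let n: [UInt8] = Array("abc".utf8)
let h = j == UInt8(ascii: "j")

/// Classifies a byte the same way as the switch in the listing.
func classify(_ byte: UInt8) -> String {
    switch byte {
    case UInt8(ascii: "a") ... UInt8(ascii: "z"):
        return "LETTER"
    default:
        return "DEFAULT"
    }
}

print(k, n, h, classify(u))   // prints 1 [97, 98, 99] true LETTER
```

The non-ASCII cases have no compile-time equivalent here: UInt8(ascii: "È") still compiles and only fails at run time.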