Prepitch: Character integer literals


(^) #1

In C, 'a' is a char literal, equivalent to 97. Swift has no equivalent, which forces awkward spellings like ("a" as Unicode.Scalar).value, which may or may not need additional casts to convert it to the right integer type. Or worse, it means spelling out the values in hex or decimal directly. This harms the readability of code.

static char const hexcodes[16] = 
{
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 
    'a', 'b', 'c', 'd', 'e', 'f'
};

let hexcodes:[UInt8] = 
    [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102] 
    // what do these numbers mean???
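
For comparison, here is a minimal sketch of what can already be written today without memorizing the ASCII table. The names `hexcodesFromString` and `a` are made up for illustration and are not part of the original post:

```swift
// Derive the same bytes from a string literal instead of hard-coding numbers.
// `utf8` yields one UInt8 per character here because every character is ASCII.
let hexcodesFromString: [UInt8] = Array("0123456789abcdef".utf8)

// The scalar-based spelling mentioned above: `.value` is a UInt32,
// so an explicit conversion is still needed to get a UInt8.
let a = UInt8(("a" as Unicode.Scalar).value)
```

This works for tables, but it still does not give a literal-like spelling for a single code unit.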

i propose we use single quotes (') as an alternate spelling of integer literals, where the value is the unicode scalar value of the character. This is a logical and useful extension of the C model: in Swift we would now have 'ΐ' == 912, so we could use char literals for larger integer types as well.


#2

My first thoughts are that this seems like a niche use case which probably isn’t worth reserving single quotes for (there’s still a lot of design work around String for the future which could probably use them), and that it might be confusing that it doesn’t match the Swift definition of a Character. Could this be handled with a slightly less awkward spelling that isn’t as concise, like an instance method with an appropriate name? Or a macro-like # prefixed function that took a string and converted it into an array of [UInt8] or similar?


#3

Is there a reason you can’t just extend whatever type your project uses to be ExpressibleByStringLiteral?
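
A rough sketch of what that suggestion could look like, assuming the project owns a byte-buffer type. `ByteString` is a hypothetical name, not a type from the standard library or from this thread:

```swift
// Hypothetical project-owned wrapper around raw bytes.
struct ByteString: ExpressibleByStringLiteral, Equatable {
    let bytes: [UInt8]

    // Called by the compiler when a string literal is used where
    // a ByteString is expected.
    init(stringLiteral value: String) {
        bytes = Array(value.utf8)
    }
}

let hexcodes: ByteString = "0123456789abcdef"
```

Note that this covers whole sequences of bytes; it does not help with comparing a single UInt8 against a character.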


(^) #4

that’s what i thought at first but I found myself looking up ASCII tables so often that i’ve come to believe this is something that deserves to have syntax in the language. as for reserving single quotes, i’ve heard this excuse so many times, but no one has ever proposed or even has an idea of what single quotes would even be used for in such a context, whereas integer char literals have precedent in C, C++, and Java. i can’t think of a more appropriate

also # macros, eww


(^) #5

in my opinion, extending an external type to conform to an external protocol is a horrible idea


#6

I think your particular project, whatever it is, is probably not representative of the general population if you’re using ASCII integer codes that frequently. There have been multiple discussions about the use of single quotes on the mailing list, e.g. controlling escaping behaviour. While you’re correct that many C-adjacent languages use single quotes for this, it’s far from universal. Some examples, from a quick search of relatively popular languages (forgive any errors):

  • Ruby, PHP: Single quotes disable string interpolation and most escaping.
  • Python, JavaScript: No significant difference between single and double quotes (except which one needs to be escaped within the string).

(Chris Lattner) #7

I’m not arguing this is great, but FYI, you can spell this as UInt8(ascii: "a") today.


(^) #8

the thing is half the time you want to use these things as the right hand side of a comparison, and having an expression on the right hand side just isn’t a good look

if u == '@' 
{
    ...
} 
else if u == '#'
{
    ...
} 
...

vs

if u == UInt8(ascii: "@") // what is going on here? 
// does the compiler even know this is a constant?
{
    ...
}
else if u == UInt8(ascii: "#")
{
    ...
} 
...
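
One way to keep the comparisons short with today's syntax is to hoist the conversions into named constants. This is only a sketch; `at`, `hash`, and `describe` are illustrative names, not something proposed in the thread:

```swift
// Named once, so the call sites read almost like character literals.
let at = UInt8(ascii: "@")
let hash = UInt8(ascii: "#")

func describe(_ u: UInt8) -> String {
    if u == at {
        return "at sign"
    } else if u == hash {
        return "hash"
    } else {
        return "other"
    }
}
```

The cost is one declaration per character used, which is the bookkeeping the single-quote spelling is meant to avoid.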

(Xiaodi Wu) #9

Yes, it does.


(Tim Buchheim) #10

If I were doing a lot of work where ASCII constants would be useful I’d probably do something like this:

enum ASCII {
	static let NUL = 0
	static let SOH = 1
	static let STX = 2
	// …
	static let _0 = 48
	static let _1 = 49
	// …
	static let at = 64
	static let A = 65
	static let B = 66
	// …
	static let a = 97
	// …
}

and use let foo = [ASCII.F, ASCII.o, ASCII.o]

Well, actually I’d probably just define enum A to save some typing: let foo = [A.F, A.o, A.o]

But really I’d want to re-think why I was using so many ASCII constants in my code to begin with. If I had so many that I need to shorten them like this I probably just want to read in an external data file instead.
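
One caveat with the enum as written: constants declared without a type annotation are inferred as Int, so the array would be [Int] rather than [UInt8]. A variant with explicit types might look like the following sketch, which shows only the two letters needed for the example:

```swift
enum ASCII {
    // Explicit UInt8 annotations so arrays of these constants are [UInt8].
    static let F: UInt8 = 70
    static let o: UInt8 = 111
}

let foo = [ASCII.F, ASCII.o, ASCII.o]

// Decode the bytes back to text to check the values are the intended letters.
let decoded = String(decoding: foo, as: UTF8.self)
```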


(^) #11

yes it probably does, but as many people have found recently, the answer to the general question “does the compiler know this is a constant” is “we really don’t know for sure”


(Xiaodi Wu) #12

I’m not sure what you mean. The answer is yes, it does.

Source:

let x = UInt8(ascii: "@")

Assembly output (-O):

main:
  push rbp
  mov rbp, rsp
  mov byte ptr [rip + output.x : Swift.UInt8], 64
  xor eax, eax
  pop rbp
  ret

output.x : Swift.UInt8:

(^) #13

running -emit-assembly and counting the movs is a learning exercise, not a workflow.


(Tony Allevato) #14

Swift already has precedent for treating untyped double-quoted string literals as different types via the ExpressibleBy{String,Character,UnicodeScalar}Literal protocols, instead of introducing different quotes.
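
As a small illustration of that existing behavior, the same double-quoted literal takes on different types depending on the annotation. The variable names here are arbitrary:

```swift
let s = "x"                    // no annotation: defaults to String
let c: Character = "x"         // same literal, inferred as Character
let u: Unicode.Scalar = "x"    // same literal, inferred as Unicode.Scalar

// The scalar exposes its numeric value as a UInt32.
let code = u.value
```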

So, if we want to pursue this, the consistent way would be to define a new compiler-known protocol, ExpressibleByASCIICodeUnitLiteral, have UInt8 conform to it, and then update the handling of string literals in the compiler to make them type-check correctly, detecting errors at compile-time instead of runtime:

func foo(_ x: UInt8) { ... }

foo("a")   // works
foo("È")   // error: U+00C8 is not a valid ASCII code unit
foo("ab")  // error: the literal has more than one code unit

(^) #15

this is pretty close to what i’m proposing. i only use single quotes because i think double quotes are a little overloaded in the language currently


(Xiaodi Wu) #16

Still not sure what you’re getting at. You asked a yes-or-no question and I answered it; I believe my answer is correct, and I showed my work.


(^) #17

yes, and you’ve convinced me that the compiler knows that in this case (at least for now, barring a regression), but anyone else not reading this thread would have to repeat the experiment for themselves to figure this out. and you haven’t convinced me that it will happen in every context, since the compiler’s thought process is pretty much opaque to anyone who’s not a swift compiler dev, and we’re all too familiar with the compiler making strange and nonobvious decisions

either way we’re really making a mountain out of what was originally a small aside as this prepitch is really more about ergonomics


(Chris Lattner) #18

This can be made to work, but the logical endpoint of this is that UInt16 would allow code points that fit into a single UTF-16 code unit and UInt32 would allow any code point. Is that desirable? I could see how this could be confusing to some:

let x : Int = "f"

One nice thing about the C approach with single quotes is that it makes it clear what is going on, and it would allow defining a new default type for:

 let x = 'x'

which would clearly be Character.

-Chris


(Tony Allevato) #19

I think the argument could be made that we want to provide APIs that satisfy the most common needs of users without worrying about taking things to their logical end. Unscientifically, my gut tells me that processing ASCII text that arrives in the form of [UInt8] or Data probably qualifies as more valuable to have as shorthand than UTF-16 or UTF-32.

Another option would be to introduce a unique ASCIICodeUnit type that is essentially a 7-bit unsigned integer instead of extending UInt8 directly, but that might make interop with UInt8s more verbose elsewhere.

I wouldn’t expect let x : Int = "f" to be supported; for clarity, we’d probably only want to support specific-size integers, if support for more than UInt8 was up for debate. But that’s just my opinion and attempt at drawing the line somewhere.

By itself, a new quoting scheme specifically for Character feels like a separate issue from the one in the OP about making it cleaner to write ASCII literals that can be inferred as UInt8. Unless, are you suggesting that the scheme could be stretched such that single quotes would default to Character but also be inferred as other types representing “singular” text entities (ASCII UInt8, Unicode.Scalar), and double quotes would only be inferred as sequences of those entities (String, StaticString)?


(Xiaodi Wu) #20

If we’re going to enable uses like

func foo(_ x: UInt8) { ... }
foo("a")   // works

then I think that’d be the way to go.

It looks very confusing that you can supply what looks like a string literal to a function that takes a numeric argument. Unless I’m mistaken, the Swift compiler can already work with builtin types such as Int7 internally, and the LLVM intrinsics are there to support casting without issue. With such a type, your hypothetical function would be self-documenting:

func foo(_ x: ASCII.CodeUnit) { ... }
foo("a") // of course this works
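
Something close to this can be approximated in today's Swift with a wrapper type, as sketched below. This is an assumption-laden illustration, not the builtin Int7-backed type described above: `ASCII.CodeUnit` is a hypothetical struct wrapping a UInt8, and the ASCII check happens at run time via a precondition rather than at compile time.

```swift
enum ASCII {
    // Hypothetical wrapper; stores the code unit in a UInt8.
    struct CodeUnit: ExpressibleByUnicodeScalarLiteral, Equatable {
        let value: UInt8

        // Accepts a single-scalar literal such as "a".
        // Non-ASCII scalars trap at run time (not a compile-time error).
        init(unicodeScalarLiteral scalar: Unicode.Scalar) {
            precondition(scalar.isASCII, "literal is not an ASCII scalar")
            value = UInt8(scalar.value)
        }
    }
}

// Returns the wrapped byte so the result can be inspected.
func foo(_ x: ASCII.CodeUnit) -> UInt8 {
    return x.value
}

let result = foo("a")
```

A multi-character literal such as "ab" is rejected at compile time with this approach, since it is not a valid Unicode scalar literal.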