Single Quoted Character Literals (Why yes, again)

No, the protocol hierarchy is still as I described, and the type checker still finds the ASCII-to-String conversions. There is simply a hard-coded check in the compiler's C++ that now flags them as an error.

I'm trying to find the pitch thread of this previous attempt.

The only reference I can find is the "Three outstanding proposals" thread (August 2019), where someone asked:

Do you have link to the current thread on Single Quote Character Literal?

but didn't receive a reply.

I never re-pitched it, since it would have been a follow-up to a review that had only just happened. Clearly, I should have been building consensus and advocating for it.

Having hopefully defused the debate about Single Quoted literals being convertible to Strings with a bit of hard-coded sleight of hand, I'd like to start rehearsing arguments for the over-generalised arithmetic available on the integer values Single Quoted literals can express, as it keeps coming up again and again.

The first thing to note is that this form of unconstrained arithmetic on code point values is a feature of four of the five most popular programming languages in use today, according to TIOBE (aside: seriously? Is Python really the most widely used programming language in the world??). However, it is seen as a legacy concept, and one that Swift is "above" with its strongly abstract String model.

Perhaps the better defence for introducing integer-convertible single quoted literals into Swift is the same argument that justifies UnsafePointers in the language: something many would never use (or would rather not use), but if you need it, it's critical that an escape hatch is available. Look at the code @beccadax mentions (in the Swift compiler project, no less).

Could we create a new ASCII type on which only the "good" operators are defined? I don't think so. Apart from the ABI issues involved in introducing a new type, you lose the interoperability with buffers of integers that is a primary motivation. While we could clearly avoid defining multiplication and division operators, some operators (+ and -) are useful for offsetting and taking the difference of code points, so how can we avoid 'x' + 'y' working? I don't think the solution lies there.

I guess in the end you just have to roll with it and accept that, if the integer conversions are allowed, it will not be possible to prevent people from writing some absurd expressions in their code. I don't believe this form of permissiveness would have many negative consequences: it is unlikely to crash your app, and it is not the sort of thing one would type inadvertently. Note that one of the nonsense expressions much discussed during review, 'a'.isMultiple(of: 2), was never possible, as the default type of a Single Quoted literal is Character, and that determines which methods are available.

Anyway, let me know if you find these arguments convincing or not.

The simplest argument against allowing arithmetic on ASCII characters is “why just ASCII?” What about ISO Latin 1? Or Windows-1252, which is what most text that claims to be ISO Latin 1 is actually encoded in? All Unicode codepoints have a numeric value; why not allow arithmetic directly on UnicodeScalar? Why not EBCDIC for those folks at IBM writing Swift for z/OS?

All of the reasons Swift doesn’t have arithmetic on these character encodings apply equally to ASCII.

(Maybe this discussion should be split off from the pitch thread…)
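The "why not allow arithmetic directly on UnicodeScalar?" question has a concrete wrinkle in today's API: `Unicode.Scalar` defines no arithmetic operators, so you go through its `UInt32` `value` and back through the failable `Unicode.Scalar(_:)` initializer, which rejects surrogate code points. A small sketch using only existing standard library API:

```swift
// Arithmetic on a scalar today: take its UInt32 value, add, and convert back.
let a: Unicode.Scalar = "a"
let next = Unicode.Scalar(a.value + 1)      // Optional, contains the scalar "b"

// Not every UInt32 is a valid scalar: U+D800...U+DFFF are surrogates,
// so stepping past U+D7FF yields nil rather than a character.
let lastBeforeSurrogates: UInt32 = 0xD7FF
let intoSurrogates = Unicode.Scalar(lastBeforeSurrogates + 1)

print(next == "b", intoSurrogates == nil)   // true true
```

Any built-in scalar arithmetic would have to decide what `+` does at that boundary, which is one more reason the discussion narrowed to ASCII.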

My original preference was for all code points (values of up to 21 bits), but there was considerable pushback on this in the review thread due to the multiple encodings Unicode allows for "the same" character. I resisted for a while but conceded it was best to stick to "just ASCII". I don't know what the solution for EBCDIC would look like (for which arithmetic on code points makes no sense at all, as the letters do not have contiguous code points), and that's not a problem I'm trying to solve at this point.
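To make the "same character, multiple encodings" concern concrete, here is a small sketch using Swift's existing `Character` API (no new syntax assumed): "é" can be written as one precomposed scalar or as "e" followed by a combining acute accent. The two compare equal as `Character`s but have different scalar values, so "the code point of é" has no single answer.

```swift
// "é" spelled two ways: precomposed U+00E9, and "e" + U+0301 COMBINING ACUTE ACCENT.
let precomposed: Character = "\u{E9}"
let decomposed: Character = "e\u{301}"

// Character equality uses Unicode canonical equivalence, so these are equal...
print(precomposed == decomposed)                      // true

// ...but the underlying scalar values differ.
print(precomposed.unicodeScalars.map { $0.value })    // [233]
print(decomposed.unicodeScalars.map { $0.value })     // [101, 769]
```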

The utility of Single Quoted literals is partly aesthetic, but it lies primarily in providing a more convenient syntax for UInt8(ascii: "\n"). The arithmetic is an unfortunate side effect of the integer conversions, though it has its uses.
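For comparison, this is how ASCII code units are spelled as integers in Swift today, using the existing `UInt8(ascii:)` initializer. The single quoted forms in the comments are the proposed syntax, not something current compilers accept. Note that `UInt8(ascii:)` traps at run time if handed a non-ASCII scalar.

```swift
// Today's spelling of ASCII code units as integers.
let newline = UInt8(ascii: "\n")    // proposed: '\n'
let lowerA = UInt8(ascii: "a")      // proposed: 'a'
let lowerZ = UInt8(ascii: "z")      // proposed: 'z'

// The offset/difference arithmetic discussed in this thread.
let alphabetSpan = lowerZ - lowerA  // proposed: 'z' - 'a'
print(newline, alphabetSpan)        // 10 25
```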

A bigger issue, IMO, is that a major potential audience for this feature seems to be people working with binary formats, which often use legacy encodings. But the BMP is only compatible with one specific legacy encoding: ISO 8859-1. The aesthetic appeal of 'z' - 'a' masks the complexity of understanding necessary to reason about '€' + 1. Does it depend on the encoding of the source file? Of the host machine? Of the target? Of the user’s current locale?

How many people would even think to ask these questions? No matter which behavior is chosen, some large number of Windows programmers will think there’s a bug, because half of them will be expecting it to behave like other Windows legacy APIs that assume CP1252, and the other half will expect it to behave like modern, Unicode-aware APIs that incorporate ISO Latin 1.
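To illustrate why `'€' + 1` has no single obvious answer, here is a sketch that computes the "value" of € under a few encodings with the standard library. The Windows-1252 byte 0x80 is hard-coded from that code page's chart, since Swift's standard library has no CP1252 codec; ISO 8859-1 has no € at all (0x80 is a C1 control there).

```swift
let euro: Unicode.Scalar = "€"

// Unicode scalar value (also its single UTF-16 code unit).
let scalarValue = euro.value                          // 0x20AC
// UTF-8 needs three bytes, so there is no single byte to add 1 to.
let utf8Bytes = Array(String(Character(euro)).utf8)
// Windows-1252 maps € to 0x80 (hard-coded; not a stdlib codec).
let cp1252Byte: UInt8 = 0x80

print(String(scalarValue + 1, radix: 16))             // 20ad
print(utf8Bytes.map { String($0, radix: 16) })        // ["e2", "82", "ac"]
print(String(cp1252Byte + 1, radix: 16))              // 81 (unassigned in CP1252)
```

Each line is a defensible reading of "€ plus one", and they all disagree.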

Bitwise operations do make sense on EBCDIC. But I’m not bringing up EBCDIC because I think the API design for EBCDIC needs to be solved; I’m bringing it up because it’s part of the design space and the API needs to be designed in a way that it can eventually be accommodated.
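A small sketch of why bitwise operations still make sense on EBCDIC while sequential arithmetic does not. The EBCDIC byte values are hard-coded from code page 037 (an assumption: Swift has no EBCDIC codec, and other EBCDIC code pages vary in places); the ASCII values are the standard ones.

```swift
// ASCII: upper and lower case differ only in bit 0x20, and a...z are contiguous.
let asciiLowerA: UInt8 = 0x61, asciiUpperA: UInt8 = 0x41
// EBCDIC (code page 037, hard-coded): case differs only in bit 0x40...
let ebcdicLowerA: UInt8 = 0x81, ebcdicUpperA: UInt8 = 0xC1
// ...but the alphabet is split into runs: a-i (0x81-0x89), j-r (0x91-0x99), s-z (0xA2-0xA9).
let ebcdicLowerI: UInt8 = 0x89, ebcdicLowerJ: UInt8 = 0x91

// Bitwise case conversion works in both encodings.
print((asciiLowerA & ~0x20) == asciiUpperA)   // true
print((ebcdicLowerA | 0x40) == ebcdicUpperA)  // true

// "Next letter" arithmetic does not: in EBCDIC, 'i' + 1 is not 'j'.
print(ebcdicLowerI + 1 == ebcdicLowerJ)       // false
```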

EBCDIC is like Walter’s ex-wife’s Pomeranian in the Big Lebowski. He’s watching it while his ex is in Hawaii with her boyfriend, so he brings it to the bowling alley.

The Dude [incredulous]: “you brought the Pomeranian bowling?!”
Walter: “I brought it bowling; I’m not renting it shoes. I’m not buying it a beer. He’s not taking your turn.”
The Dude: “If my ex-wife asked me to watch her dog while she and her boyfriend went to Honolulu […]”
Walter: “First of all, Dude, you don’t have an ex.”

Some unfortunate folks out there have to carry EBCDIC the Pomeranian around, and the rest of us don’t understand why because we were never married to its owner, IBM.

My sympathies to the IBM folk. They also have the misfortune of working on one of the last big-endian 64-bit architectures, which must cause them no end of problems, but that doesn't mean they should share those problems with us. In the end, the only lowest common denominator one can reach for is ASCII.

Here's a weird idea... since the goal is to produce integers, we could extend the existing hex, octal, and binary literal syntax to recognize Unicode and/or ASCII scalar values:

let x = 0uA // unicode scalar value of "A"
let y = 0aA // ascii value of "A"

But this wouldn't really work for tabs or space characters (among others), so a quoted version should be allowed too:

let x = 0u'A' // unicode scalar value of "A"
let y = 0a'A' // ascii value of "A"

This is just another way to write an integer literal. The examples above would produce Int values, the default integer type, because there's no context to infer another type.
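For reference, a rough sketch of what the suggested `0u'A'` and `0a'A'` literals would evaluate to, written with APIs that exist today. The prefixed literal syntax itself is only the idea being floated here; no compiler accepts it.

```swift
// What 0u'A' would mean: the Unicode scalar value of "A", as the default Int.
let scalarA: Unicode.Scalar = "A"
let x = Int(scalarA.value)
// What 0a'A' would mean: the ASCII value of "A" (UInt8(ascii:) traps on non-ASCII input).
let y = Int(UInt8(ascii: "A"))
// The quoted form is what makes characters like space expressible: 0a' '.
let asciiSpace = Int(UInt8(ascii: " "))
print(x, y, asciiSpace)   // 65 65 32
```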

This is an interesting idea, if a little unprecedented. For non-printables you could use \ escapes. One problem, I guess, is that you'd typically be using character literals for symbols, not letters, which isn't going to interact well with the lexer.

Re the other languages, one of the strongest arguments for Swift's strict typing is to avoid the problems that arise in those languages because they let you do things like those that would be added by this proposal.

While you see this as an argument for the proposal, I see it as an argument against it, and one of the strongest. I have years of experience with C/C++ and don't want to go back there.

I haven't thought much about what set of characters would work without quotes. But the quoted version would be necessary for many character values:

let asciiSpace = 0a' '

Maybe it'd be better to always require quotes; I don't know. The basic idea was to make it look like an integer literal so you can't argue that arithmetic is unexpected. But maybe some people will object to this regardless:

assert(0a1 + 0a2 == 99)
// or
assert(0a'1' + 0a'2' == 99)

Edit: arithmetic example changed to be easier to object to.
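Why someone might object: once a character is written as an integer, it is easy to confuse its code with the digit it depicts. A sketch in current Swift, using `UInt8(ascii:)` to stand in for the hypothetical `0a'…'` literals and the existing `Character.wholeNumberValue` property for the digit's numeric value:

```swift
// Code-unit arithmetic, as 0a'1' + 0a'2' would behave: 49 + 50.
let codeSum = Int(UInt8(ascii: "1")) + Int(UInt8(ascii: "2"))
// Numeric value of the digits themselves: 1 + 2.
let one: Character = "1", two: Character = "2"
let digitSum = (one.wholeNumberValue ?? 0) + (two.wholeNumberValue ?? 0)
print(codeSum, digitSum)   // 99 3
```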

Please don't be melodramatic, nobody is asking you to make that choice. Introducing this accommodation for a particular type of coding in Swift does not deprecate any other aspect of the powerfully abstract Swift String model. The suggested feature is opt-in for those that need it.

It doesn't work like that, though. If a feature is added to the language, people will use it. Then, whether I want to or not, I will encounter it: in sample code, in Swift library code, and in code written by other people that I'm asked to work on and have to reason about.

I don't understand why we need an expression like 'a' + 1 in the first place. Wouldn't an expression like 'a'.asciiOffset(1) be sufficient? If we did that, we could rule out 'x' + 'y' from the beginning.
Such operations appear natural only because our expected alphabetical order happens to match the order of the ASCII codes. Expressions like '*' + '1' are never natural. I think using a method makes it explicit that ASCII order is being used.
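A minimal sketch of the method-based alternative, written against today's `Character` since single quoted literals don't exist yet. `asciiOffset(_:)` is the hypothetical name from the post above; returning `nil` for non-ASCII input or for results outside 0...127 is an assumption about how such a method might behave. With no `+` defined, `'x' + 'y'` simply has no meaning in this design.

```swift
extension Character {
    /// Hypothetical API: the ASCII character `distance` code points away from
    /// this one, or nil if this isn't ASCII or the result leaves 0...127.
    func asciiOffset(_ distance: Int) -> Character? {
        guard let code = asciiValue else { return nil }   // nil for non-ASCII
        let shifted = Int(code) + distance
        guard (0...127).contains(shifted) else { return nil }
        return Character(Unicode.Scalar(UInt8(shifted)))
    }
}

let a: Character = "a"
print(a.asciiOffset(1) ?? "?")   // b
```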

i’ve gone back and reworked the proposal based on the feedback from this thread. since it is a significant departure from the proposal in its current form, i decided to post it as a new thread. here is the link for anyone interested:

Unfortunately, this pitch wandered into a legislative labyrinth that I don't have the wit to find my way out of, nor the wisdom to know when to give up. So... I've started casting about for alternatives and found an extension and a new Array initialiser that I believe solve the bulk of what I was looking to achieve.

First, looking at the "awkward code" @beccadax mentioned:

If you define just this one simple extension: (Edited)

extension FixedWidthInteger {
    /// Basic equality operators
    @_transparent
    public static func == (i: Self, s: Unicode.Scalar) -> Bool {
        return i == s.value
    }
    @_transparent
    public static func != (i: Self, s: Unicode.Scalar) -> Bool {
        return i != s.value
    }
    /// Used in switch statements
    @_transparent
    public static func ~= (s: Unicode.Scalar, i: Self) -> Bool {
        return i == s.value
    }
    /// Maybe useful now and then
    @_transparent
    public static func - (i: Self, s: Unicode.Scalar) -> Self {
        return i - Self(s.value)
    }
}

The code could be transformed from:

    switch self.previous {
    case UInt8(ascii: " "), UInt8(ascii: "\r"), UInt8(ascii: "\n"), UInt8(ascii: "\t"), // whitespace
      UInt8(ascii: "("), UInt8(ascii: "["), UInt8(ascii: "{"),            // opening delimiters
      UInt8(ascii: ","), UInt8(ascii: ";"), UInt8(ascii: ":"),              // expression separators
      0:                          // whitespace / last char in file
      return false

to

    switch self.previous {
    case " ", "\r", "\n", "\t", "(", "[", "{", ",", ";", ":",              // whitespace, opening delimiters, expression separators
      0:                          // whitespace / last char in file
      return false

This would be a candidate for inclusion in the standard library IMHO, as it is an additive change so narrowly targeted that it shouldn't cause collateral damage to the language. The compiler test suite runs through with only 12 failures, all in tests of invalid code where the diagnostic changed. I have also tested 1000 or so Swift packages from the Swift Package Index and didn't see any problems.
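To try the operators outside the compiler, here is a self-contained sketch: the same extension minus the underscored `@_transparent` attribute (which is meant for standard library use) and minus the `-` overload, plus a hypothetical `isSeparator(_:)` helper modelled on the lexer snippet. The helper's name and the sample input are illustrative assumptions, not code from swift-syntax.

```swift
extension FixedWidthInteger {
    /// Compare an integer code unit with a Unicode scalar literal.
    static func == (i: Self, s: Unicode.Scalar) -> Bool {
        return i == s.value
    }
    static func != (i: Self, s: Unicode.Scalar) -> Bool {
        return i != s.value
    }
    /// Allow Unicode scalar literals as `case` patterns in a switch over integers.
    static func ~= (s: Unicode.Scalar, i: Self) -> Bool {
        return i == s.value
    }
}

/// Hypothetical helper: true for whitespace, opening delimiters,
/// expression separators, and the 0 byte, mirroring the lexer cases above.
func isSeparator(_ byte: UInt8) -> Bool {
    switch byte {
    case " ", "\r", "\n", "\t", "(", "[", "{", ",", ";", ":", 0:
        return true
    default:
        return false
    }
}

let source = Array("f(x, y)".utf8)
print(source.filter(isSeparator).count)   // 3: "(", ",", " "
```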

Coming full circle on how these pitches started out:

It may be better to simply introduce a new initialiser on Arrays of FixedWidthIntegers; something along the lines of:

extension Array where Element: FixedWidthInteger {
    /// Initialise an Integer array of "characters"
    @inline(__always)
    public init(unicode: String, default: UInt32 = 0) {
        self.init(unicode.map {
            let scalars = $0.unicodeScalars
            if scalars.count == 1,
                let v = scalars.first?.value,
                v <= Element.max {
                return Element(v)
            }
            return Element(`default`)
        })
    }
}

So the code above would become:

let hexcodes = [UInt8](unicode: "0123456789abcdef")
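A runnable sketch of the initialiser in use. It copies the initialiser above, minus `public` and `@inline(__always)`, with an explicit closure type and the `default` parameter given the internal name `fallback`; the `hexEncode(_:)` helper is a hypothetical example of the lookup-table pattern, not code from the pitch.

```swift
extension Array where Element: FixedWidthInteger {
    /// One element per Character; characters that aren't a single scalar,
    /// or whose value doesn't fit in Element, become `fallback`
    /// (which itself must fit in Element, or the conversion traps).
    init(unicode: String, default fallback: UInt32 = 0) {
        self = unicode.map { (ch: Character) -> Element in
            let scalars = ch.unicodeScalars
            if scalars.count == 1, let v = scalars.first?.value, v <= Element.max {
                return Element(v)
            }
            return Element(fallback)
        }
    }
}

/// Hypothetical helper: lowercase hex encoding via a lookup table.
func hexEncode(_ bytes: [UInt8]) -> String {
    let hexcodes = [UInt8](unicode: "0123456789abcdef")
    var out: [UInt8] = []
    out.reserveCapacity(bytes.count * 2)
    for byte in bytes {
        out.append(hexcodes[Int(byte >> 4)])     // high nibble
        out.append(hexcodes[Int(byte & 0x0F)])   // low nibble
    }
    return String(decoding: out, as: UTF8.self)
}

print(hexEncode([0xDE, 0xAD, 0xBE, 0xEF]))   // deadbeef
```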

Between these two suggestions, I believe the majority of the use cases I felt were poorly served by current Swift find a solution. I've put together a small Swift package so you can try these ideas out, but I hope we can find a path for them into the stdlib:

Thanks for pushing this forward. Even though I've been writing code for at least the last six years, I'm still confused seeing double quotes on Unicode scalar and grapheme cluster literals.

I suppose an additional pitch could be made later, e.g. to make double quotes on these literals a warning in Swift 6 mode.

Just to be clear, I'm altering this pitch to tack away from changing the literal syntax for UnicodeScalars (and Characters) to single quotes and creating new integer conversions using the ExpressibleBy protocols on those literals (even if that might be nice), and toward using targeted operators to plug what seem to be a few gaps in the Swift language for low-level coding. As an example to show this can tidy code up, I've ported swift-syntax's Lexer mentioned above to use these new operators:

I don't know exactly what the performance implications of using a protocol extension are going to be (if someone wants to fill me in on that), but the compiler and tests still run in the same amount of time (though I don't imagine lexing is on the critical path).
