Prepitch: Character integer literals


(^) #279

There is no ExpressibleByASCIILiteral for the same reason there is no ExpressibleByUInt32Literal or ExpressibleByUInt16Literal. You get the ASCII restriction by making the argument type of Self.init(unicodeScalarLiteral:) one of the integer types among the many choices allowed for Self.UnicodeScalarLiteralType. I agree this is a very confusing system, but it’s how Swift’s literal protocols currently work, and changing that would be a much bigger change than this proposal aims to be. If someone is making custom conformances to the literal protocols, we can assume they are pretty well versed in the intricacies of the type system, so I don’t think this would be a problem.
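Concretely, the kind of conformance being described would look something like this under the pitched design (a sketch only: the Septet type is made up, and neither integer UnicodeScalarLiteralTypes nor single-quoted literals exist in current Swift):

struct Septet: ExpressibleByUnicodeScalarLiteral {
    var bits: UInt8

    // Under the pitch, choosing UInt8 as the argument type is what opts
    // this type into the ASCII-only restriction.
    init(unicodeScalarLiteral value: UInt8) {
        self.bits = value
    }
}

let letterA: Septet = 'a'   // would compile: 'a' is ASCII (0x61)
// let eAcute: Septet = 'é' // would be rejected: 'é' is not ASCII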


(Xiaodi Wu) #280

I know how Swift’s literal protocols currently work; I’m explicitly suggesting that they would be made less confusing, and the ASCII-restricted behavior of a numeric type expressible by a character literal more obvious, by changing that for character literals. I think this proposal should aim to make that bigger change.

If Swift’s character literal protocols were not already designed this way, it is implausible that anyone would ask for two different protocols to cover three different behaviors. For backwards compatibility reasons, we can’t coalesce them into one protocol even though integer literals work that way in Swift. The only sensible design left is to have distinct protocols, rather than pretending that UInt8 is expressible by a Unicode scalar but not by an extended grapheme cluster in the same way that Unicode.Scalar is, which in this design direction it really isn’t.


(Jeremy David Giesbrecht) #281

I philosophically agree with you, but ABI stability has been a big concern so it may not be practical anymore.

The wording of the compiler error message is an alternative way to make the same thing clear. An error along the lines of “… is not ASCII” should already be enough to quell any questions about why it didn’t work when "e" did. If they move on to the question of “Why shouldn’t it work?”, then ExpressibleByASCIILiteral wouldn’t really answer that for them either.

The idea of an ExpressibleByASCIILiteral could still be mentioned in the proposal. I do like it if it is ABI‐viable. It would be a small enough and likely uncontested difference that the core team can always “accept with revisions” to add or subtract it from the implementation based on their better understanding of its ABI impact. Ask for it without demanding and invite them to decide for themselves.


(^) #282

the impact would be that ExpressibleByUnicodeScalarLiteral (and indirectly ExpressibleByExtendedGraphemeClusterLiteral) would now inherit from ExpressibleByASCIILiteral instead of being a base protocol. These requirements could be satisfied by default implementations, but i don’t know the ABI impact of that.
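concretely, the shape i have in mind is something like this (the names and requirements are illustrative only, not the actual standard library declarations):

protocol ExpressibleByASCIILiteral {
    associatedtype ASCIILiteralType
    init(asciiLiteral: ASCIILiteralType)
}

// today this is a base protocol; under the idea above it would inherit instead:
protocol ExpressibleByUnicodeScalarLiteral: ExpressibleByASCIILiteral {
    associatedtype UnicodeScalarLiteralType
    init(unicodeScalarLiteral: UnicodeScalarLiteralType)
}

// existing conformers could be kept source-compatible with a default:
extension ExpressibleByUnicodeScalarLiteral
    where ASCIILiteralType == UnicodeScalarLiteralType {
    init(asciiLiteral value: ASCIILiteralType) {
        self.init(unicodeScalarLiteral: value)
    }
}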


(John Holdsworth) #283

These are the constraints that guided the new implementation. Aside from being more complicated, introducing ExpressibleByASCIILiteral at this stage would also break the deal we made with ourselves to leave the door open for Option 2.


(^) #284

can we please just get this to review? this thread has been going in circles for a year now


(Jeremy David Giesbrecht) #285

Not really. That would be no more difficult than adding conformances for Int types to ExpressibleByUnicodeScalarLiteral. In fact, that set‐up would allow you to trivially add such a conformance in your own module if you understood what you were doing and wanted to live dangerously. The way we have it now, the conformance is already there and in the way, so you cannot do it yourself. (...Well, though uglier, you could add a conformance to the character version to work around it, so maybe never mind.)
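For instance, under that set‐up something like the following could live in a client module. (This is purely illustrative; whether such a retroactive conformance would actually be permitted depends on the final design, and the single‐quoted literal is the pitch’s syntax, not current Swift.)

extension Int: ExpressibleByUnicodeScalarLiteral {
    public init(unicodeScalarLiteral value: Unicode.Scalar) {
        self = Int(value.value) // take the full scalar value, dangerously
    }
}

let euro: Int = '€' // 0x20AC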

But more people are thinking this than are outgoing enough to say it:


(^) #286

it’s not that, it’s the fact that we’re changing ExpressibleByUnicodeScalarLiteral from a base protocol to an inherited one. this means the requirements for ExpressibleByUnicodeScalarLiteral, and every literal protocol that inherits it, are different now. to users, there’s no change, since the new requirements all get default implementations, but again, idk how this affects ABI


(Jeremy David Giesbrecht) #287

I meant the present change would not make John’s desired future addition any more difficult at that future time.

I understand the difference it makes at the present.


(Xiaodi Wu) #288

On the contrary, I think this thread is just starting to go in some interesting directions. Given that it won’t make it into Swift 5.0, I don’t see any reason to cut off such fruitful discussion.


(Tino) #289

I have no idea what merit continuing this thread may have or if anyone reads all those messages, but as it is tagged as a pitch and even labeled as "prepitch", I think it's sensible to move forward by creating a new topic.

Also, imho it would be a good idea to start this with an introduction to the fundamental problems of encodings (I guess there are some good resources that could be linked), and their impact on Swift:
It's quite hard to find information about source file encoding for swiftc, and swiftc --help doesn't even mention what encoding it expects.


(^) #290

about midway through the thread we added exactly such a section, but we removed it because people said it “wasn’t relevant”. the “background and terminology” section is a vestige of that originally much longer background section.


(Jean-Daniel) #291

And what is the use case for that? Your integer will not be usable with any UTF-8 byte buffer, which is really the only representation that uses UInt8.


(Tino) #292

SE is hard... ;-) but afaics, there's not even a link to the current proposal text in the first post (imho that alone should be reason enough to start a fresh thread), and it's tedious to find this deleted section.
I have no strong opinion how detailed the "basics" should be addressed in the proposal text, but I think it would be beneficial to have at least a link to a page that explains all those points that may seem trivial for experts, but still have potential to confuse less experienced readers.

Of course, there are some resources that explain why strings in Swift are so much more "complicated" than in other languages, but I'm not aware of any official document (mhh, looking at recent entries from https://swift.org/blog/: Maybe someone with write access could post an updated version of https://www.mikeash.com/pyblog/friday-qa-2015-11-06-why-is-swifts-string-api-so-hard.html there? ;-))


(John Holdsworth) #293

The original proposal document you’re looking for is here. It’s Unicode that is hard, and thankfully we are able to avoid a detailed discussion about it since we limited the integer conversion feature to ASCII (a.k.a. Option 4). Swift does a great job of hiding the vagaries of Unicode under a clean abstraction. We’re not proposing anything that would change that other than separating Character/Unicode.Scalar literals from String literals, which lets us make integers expressible by the new character literals for ASCII characters.
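In usage terms the aim is roughly this (a sketch of the intended behaviour with the pitch’s single-quote syntax, not a spec):

let a: UInt8 = 'a'           // ASCII, so this works and a == 0x61
let s: Unicode.Scalar = 'é'  // non-ASCII is still fine for Unicode.Scalar
// let bad: UInt8 = 'é'      // error: 'é' is not ASCII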


(David Waite) #294

To emphasize your point, even the source document (where this character literal is present) is in an encoding, and also risks being normalized or denormalized by other tools.

With ASCII, the risk there is that a tool might make code that didn't compile before work, or code which did compile before fail.

Extending beyond ASCII, the primary risk is that people will get the codes wrong (e.g. not understanding that the character literal will be interpreted as Latin-1/UCS-2/UTF-32). I do not know if Swift source files can be in a non-UTF-8 encoding, where this becomes even more of a confusing issue for the developer.
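For example (runnable today, and assuming the source file keeps "é" in precomposed form):

let scalar = "é".unicodeScalars.first!.value // 0xE9: the Latin-1/UCS-2/UTF-32 code
let utf8 = Array("é".utf8)                   // [0xC3, 0xA9]: not the same byte(s)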

With UCS-2/UTF-32, the risk is that a valid scalar will be interpreted at compile time as a different valid scalar. Initializing a Unicode.Scalar from a literal is a risk, but one people could be expected to understand, considering they are working with literal initialization of Unicode scalars.

IMHO, it's far easier just to limit it to ASCII only, and (probably more arguably) [U]Int8 only.


(Frank Swarbrick) #295

FWIW, the IBM Toolkit for Swift on z/OS allows for source code in any of 3 different EBCDIC code pages.

As an EBCDIC user this whole thread makes me a bit antsy, but I don't specifically know if it will be unusable or hard to use on an EBCDIC-based OS.


(Jeremy David Giesbrecht) #296

Doesn’t the following assertion hold when compiled on z/OS right now?

assert("a".unicodeScalars.first?.value == 0x61)

If so, nothing will really change. Since 'a' will be treated the same way, nothing will be unusable and source code will be completely compatible between the two systems. This assertion will also hold:

assert('a' == 0x61)

If that is not actually how you want it to behave, then please elaborate.


(Frank Swarbrick) #297

Yes, the above assertion holds on z/OS. My issue is that there might be a case where I want to call a z/OS subsystem service or API that uses an EBCDIC codepage as its native encoding. Let's say it expects to be called with an 8-byte EBCDIC character string. If only ASCII/UTF-8 is supported then I can't just do the following:

let r = zOSAPI1(['T', 'E', 'S', 'T', ' ', ' ', ' ', ' '])
Or perhaps even:
let r = zOSAPI1('TEST    ')

Because that would pass an array of 8 ASCII characters, not an array of 8 EBCDIC characters.

That being said, I don't know how realistic this example is. Certainly there are many z/OS systems that have this type of API. But to make them useful and more Swifty one would want to write Swift wrappers around them anyway, where we'd just use Swift strings and then convert the strings to EBCDIC character arrays within the wrapper, using the Foundation supplied String.Encoding method(s). So what is the likelihood of a user application calling the API directly like this with a literal? Perhaps not all that likely.

One issue I have with this entire thread is that the example use cases are very small and perhaps not all that realistic either. I'd like to see some more fleshed-out use cases for me to be able to think about how they might work in an EBCDIC world.

Perhaps I'm making a mountain out of a molehill. I don't know...


(Jeremy David Giesbrecht) #298

...sigh...

The dangers of that sort of method are part of why this pitch exists:

let bytes: [UInt8] = ebcdic37("â") // 0x42

The above code would crash at runtime if the “â” got decomposed while the source file was being handled. The compiler would instead have created the Unicode sequence [U+0061, U+0302], which fails conversion to EBCDIC 37: [0x81, invalid]
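You can see the hazard today without any new syntax (precomposedStringWithCanonicalMapping and its decomposed counterpart come from Foundation):

import Foundation

let precomposed = "\u{E2}" // "â" as a single scalar
let decomposed = precomposed.decomposedStringWithCanonicalMapping
print(precomposed.unicodeScalars.map { String($0.value, radix: 16) }) // ["e2"]
print(decomposed.unicodeScalars.map { String($0.value, radix: 16) })  // ["61", "302"]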

But I guess you would be insulated as long as the source file itself stayed in the EBCDIC encoding, so you are not as vulnerable as the rest of us.


I don’t know how to make that sort of thing possible in both of two different encodings at the compiler level. But you could still do it at runtime. In pseudocode:

import Foundation // for precomposedStringWithCanonicalMapping

func zOSAPI1(_ parameter: [Character]) {
    // Re-compose each character so a decomposed "â" still maps to a single scalar.
    let safetyCheckedScalars = parameter.map {
        String($0).precomposedStringWithCanonicalMapping.unicodeScalars.first!
    }
    let bytes = safetyCheckedScalars.map { $0.ebcdic37Value } // ebcdic37Value is hypothetical
    zOS.zOSAPI1(bytes) // ← Call the actual primitive.
}
let r = zOSAPI1(['T', 'E', 'S', 'T', ' ', ' ', ' ', ' '])