Prepitch: Character integer literals

johnno1962 · January 11, 2019, 10:44pm

These are the constraints that guided the new implementation. Aside from being more complicated if we introduced ExpressibleByASCIILiteral at this stage it would also break the deal we made with ourselves to leave the door open for Option 2.

taylorswift · January 11, 2019, 11:50pm

can we please just get this to review? this thread has been going in circles for a year now

SDGGiesbrecht · January 11, 2019, 11:57pm

Not really. That would be no more difficult than adding conformances for Int types to ExpressibleByUnicodeScalarLiteral. In fact, that set‐up would allow you to trivially add such a conformance in your own module if you understood what you were doing and wanted to live dangerously. The way we have it now, the conformance is already there and in the way, so you cannot do it yourself. (...Well, though uglier, you could add a conformance to the character version to work around it, so maybe never mind.)

But more people are thinking this than are outgoing enough to say it:

taylorswift · January 12, 2019, 12:14am

it’s not that, it’s the fact that we’re changing ExpressibleByUnicodeScalarLiteral from a base protocol to an inherited one. this means the requirements for ExpressibleByUnicodeScalarLiteral, and every literal protocol that inherits it, are different now. to users, there’s no change, since the new requirements all get default implementations, but again, idk how this affects ABI

SDGGiesbrecht · January 12, 2019, 1:06am

I meant the present change would not make John’s desired future addition any more difficult at that future time.

I understand the difference it makes at the present.

xwu · January 13, 2019, 10:02am

On the contrary, I think this thread is just starting to go in some interesting directions. Given that it won’t make it into Swift 5.0, I don’t see any reason to cut off such fruitful discussion.

Tino · January 13, 2019, 11:07am

I have no idea what merit continuing this thread may have or if anyone reads all those messages, but as it is tagged as a pitch and even labeled as "prepitch", I think it's sensible to move forward by creating a new topic.

Also, imho it would be a good idea to start this with an introduction to the fundamental problems of encodings (I guess there are some good resources that could be linked), and their impact on Swift:
It's quite hard to find information about source file encoding for swiftc, and swiftc --help doesn't even mention what encoding it expects.

taylorswift · January 13, 2019, 2:01pm

about midway through the thread we added exactly such a section, but we removed it because people said it “wasn’t relevant”. the “background and terminology” section is a vestige of that originally much longer background section.

Jean-Daniel · January 13, 2019, 7:21pm

And what is the use case for that? Your int will not be usable with any UTF-8 byte buffer, which is actually the only representation using UInt8.

Tino · January 14, 2019, 12:42pm

SE is hard... ;-) but afaics, there's not even a link to the current proposal text in the first post (imho that alone should be reason enough to start a fresh thread), and it's tedious to find this deleted section.
I have no strong opinion how detailed the "basics" should be addressed in the proposal text, but I think it would be beneficial to have at least a link to a page that explains all those points that may seem trivial for experts, but still have potential to confuse less experienced readers.

Of course, there are some resources that explain why strings in Swift are so much more "complicated" than in other languages, but I'm not aware of any official document (mhh, looking at recent entries from https://swift.org/blog/: maybe someone with write access could post an updated version of mikeash.com's "Friday Q&A 2015-11-06: Why is Swift's String API So Hard?" there? ;-)

johnno1962 · January 14, 2019, 1:03pm

The original proposal document you’re looking for is here. It’s Unicode that is hard and thankfully we are able to avoid a detailed discussion about it since we limited the integer conversion feature to ASCII (a.k.a. Option 4). Swift does a great job of hiding the vagaries of Unicode under a clean abstraction. We’re not proposing anything that would change that other than separating Character/Unicode.Scalar literals from String literals, which allows us to make integers expressible by the new character literals for ASCII characters.

dwaite · January 15, 2019, 6:54pm

To emphasize your point, even the source document (where this character literal is present) is in an encoding, and also risks being normalized or denormalized by other tools.

With ASCII, the risk there is that a tool might make code that didn't compile before work, or code which did compile before fail.

Extending beyond ASCII, the primary risk is that people will get the codes wrong (e.g. not understand that the character literal will be interpreted as Latin 1/UCS-2/UTF-32). I do not know if source files in swift can be in non-UTF8 encoding, where this becomes even more of a confusing issue for the developer.

With UCS-2/UTF-32, the risk is that a valid scalar will be interpreted at compile time as a different valid scalar. Initializing a Unicode.Scalar from a literal is a risk but one people could be expected to understand considering they are working with literal initialization of unicode scalars.

IMHO, it's far easier just to limit it to ASCII only, and (probably more arguably) [U]Int8 only.

fswarbrick · January 16, 2019, 12:25am

FWIW, the IBM Toolkit for Swift on z/OS allows for source code in any of 3 different EBCDIC code pages.

As an EBCDIC user this whole thread makes me a bit antsy, but I don't specifically know if it will be unusable or hard to use on an EBCDIC based OS.

SDGGiesbrecht · January 16, 2019, 1:04am

Doesn’t the following assertion hold when compiled on z/OS right now?

assert("a".unicodeScalars.first?.value == 0x61)

If so, nothing will really change. Since 'a' will be treated the same way, nothing will be unusable and source code will be completely compatible between the two systems. This assertion will also hold:

assert('a' == 0x61)

If that is not actually how you want it to behave, then please elaborate.
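For reference, the first assertion can be checked with today's double-quoted literals. This is only a minimal sketch using the standard library; `UInt8(ascii:)` is the existing initializer, which traps at run time on non-ASCII input, and the constant names are made up for illustration.

```swift
// Unicode scalar values do not depend on the encoding of the source file.
let scalarValue = "a".unicodeScalars.first?.value
assert(scalarValue == 0x61)

// The existing standard-library spelling for a single ASCII code unit.
let asciiByte = UInt8(ascii: "a")
assert(asciiByte == 0x61)
```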

fswarbrick · January 16, 2019, 1:28am

Yes, the above assertion holds on z/OS. My issue is that there might be a case where I want to call a z/OS subsystem service or API that uses an EBCDIC codepage as its native encoding. Let's say it expects to be called with an 8-byte EBCDIC character string. If only ASCII/UTF-8 is supported then I can't just do the following:

let r = zOSAPI1(['T', 'E', 'S', 'T', ' ', ' ', ' ', ' '])
Or perhaps even:
let r = zOSAPI1('TEST    ')

Because that would pass an array of 8 ASCII characters, not an array of 8 EBCDIC characters.

That being said, I don't know how realistic this example is. Certainly there are many z/OS systems that have this type of API. But to make them useful and more Swifty one would want to write Swift wrappers around them anyway, where we'd just use Swift strings and then convert the strings to EBCDIC character arrays within the wrapper, using the Foundation supplied String.Encoding method(s). So what is the likelihood of a user application calling the API directly like this with a literal? Perhaps not all that likely.

One issue I have with this entire thread is that the example use cases are very small and perhaps not all that realistic either. I'd like to see some more fleshed-out use cases for me to be able to think about how they might work in an EBCDIC world.

Perhaps I'm making a mountain out of a mole hill. I don't know...

SDGGiesbrecht · January 16, 2019, 2:12am

...sigh...

The dangers of that sort of method are part of why this pitch exists:

let bytes: [UInt8] = ebcdic37("â") // 0x42

The above code would crash at runtime if the “â” got decomposed while the source file was being handled. The compiler would instead have created the Unicode sequence [U+0061, U+0302], which fails conversion to EBCDIC 37: [0x81, invalid]

But I guess you would be insulated as long as the source file itself stayed in the EBCDIC encoding, so you are not as vulnerable as the rest of us.
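The decomposition hazard can be reproduced with escapes, so the example itself cannot be altered by a tool that normalizes the file. A minimal sketch using only the standard library; the constant names are illustrative:

```swift
// Two spellings of "â" that are canonically equivalent.
let precomposed = "\u{E2}"     // U+00E2
let decomposed = "a\u{302}"    // U+0061 followed by U+0302

// String comparison honours canonical equivalence, so the strings are equal...
let sameString = precomposed == decomposed

// ...but the underlying scalar sequences differ, which is what a
// scalar-by-scalar conversion to a legacy encoding would see.
let precomposedScalars = precomposed.unicodeScalars.map { $0.value }
let decomposedScalars = decomposed.unicodeScalars.map { $0.value }
print(sameString, precomposedScalars, decomposedScalars)
```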

I don’t know how to make that sort of thing possible in both of two different encodings at the compiler level. But you could still do it at runtime. In pseudocode:

func zOSAPI1(_ parameter: [Character]) {
    let safetyCheckedScalars = parameter.map({ $0.precomposedStringWithCanonicalMapping.unicodeScalars.first! })
    let bytes = safetyCheckedScalars.map({ $0.ebcdic37Value })
    zOS.zOSAPI1(bytes) // ← Call the actual primitive.
}
let r = zOSAPI1(['T', 'E', 'S', 'T', ' ', ' ', ' ', ' '])
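A version of that idea which runs today might look like the sketch below. It is an assumption-laden illustration, not the pitch's design: `ebcdic37Table` and `ebcdic37Bytes` are hypothetical names, the table covers only the handful of characters used here, and instead of normalizing it simply returns nil for any character that is not a single mapped scalar.

```swift
// Partial, illustrative EBCDIC 37 table (hypothetical; only the scalars used below).
let ebcdic37Table: [Unicode.Scalar: UInt8] = [
    " ": 0x40, "E": 0xC5, "S": 0xE2, "T": 0xE3, "a": 0x81, "\u{E2}": 0x42,
]

/// Hypothetical helper: converts text to EBCDIC 37 bytes, or returns nil if
/// any character is not exactly one scalar or is missing from the table.
func ebcdic37Bytes(_ text: String) -> [UInt8]? {
    var bytes: [UInt8] = []
    for character in text {
        let scalars = Array(character.unicodeScalars)
        guard scalars.count == 1, let byte = ebcdic37Table[scalars[0]] else {
            return nil
        }
        bytes.append(byte)
    }
    return bytes
}

// "TEST" padded with four spaces to the 8 bytes the example API expects.
let testBytes = ebcdic37Bytes("TEST" + String(repeating: " ", count: 4))

// A decomposed "â" is one Character but two scalars, so conversion fails.
let decomposedBytes = ebcdic37Bytes("a\u{302}")
```

Returning an optional moves the failure from a crash to something the wrapper can handle; it still happens at run time, not at compile time.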

Ben_Cohen · February 26, 2019, 9:14pm

Hi everyone,

In prepping this proposal for review, we've encountered a snag in the implementation that will need to be addressed. Once Swift is in the OS, all new types shipped in the OS will be subject to availability via @available annotations. However, Swift does not yet have a way of annotating a conformance with availability. This is a capability that could be added to the language in future, but does not exist right now.

With new protocols, this is not a problem: the protocol itself will have availability, and therefore any conformances will too. Likewise new structs will have availability so can conform to existing protocols. But this won't apply to new conformances of existing types to existing protocols. The pitch proposes exactly this: conforming existing integer types to existing ExpressibleBy... protocols.

This means we will need to adapt the proposal to accommodate this. The core team's preferred approach would be that we subset the proposal to just introducing the 'x' syntax for the scalar and grapheme literals. This addresses the pain point of having to give Character literals type context. It will also allow users to conform the integer types themselves via a short extension. Not as convenient as having it built in to the standard library, to be sure, but not too taxing. Once we have the ability to apply conformances retroactively, we could then put them in the standard library too.

Let me know your thoughts,
Ben

p.s. the ability to declare availability on a conformance is certainly something the core team expects to see added to the language in future versions of Swift – any thoughts on the design/implementation of that feature would be welcome over on the compiler development forum.

taylorswift · February 26, 2019, 9:32pm

The core of the proposal is the statically checked ascii literals; the character and unicode scalar literal syntax is just a side benefit. the first thing isn’t possible in user-defined extensions, the second is just a syntactic difficulty from not having enough type context. taking the second without the first feels like cutting out the cake for the icing.

how long should we expect before this gets unblocked?

Nevin · February 26, 2019, 10:03pm

As long as the static checking for ASCII literals is confined to single-quoted characters, what’s the problem?

protocol ExpressibleBySingleQuotedCharacterLiteralOrWhatever { … }

extension Int: ExpressibleBy… { … }

let x: Int = 'a'    // 97

taylorswift · February 26, 2019, 10:21pm

there can be no static checks in user extensions?
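To make the trade-off concrete, here is a minimal sketch of the kind of short user-side extension mentioned above, written against today's double-quoted literals since the single-quoted syntax is only pitched. It is an illustration under those assumptions, not the proposal's implementation: the ASCII check is an ordinary `precondition`, so it runs when the literal is evaluated and is not enforced by the compiler.

```swift
// Conformance declared in user code on a standard-library type.
// Newer compilers may warn that this is a retroactive conformance.
extension UInt8: ExpressibleByUnicodeScalarLiteral {
    public init(unicodeScalarLiteral value: Unicode.Scalar) {
        // Run-time check only: a non-ASCII literal still compiles, then traps here.
        precondition(value.isASCII, "literal is not ASCII")
        self = UInt8(value.value)
    }
}

let letterA: UInt8 = "a"
let header: [UInt8] = ["T", "E", "S", "T"]
```

With this in scope, something like `let e: UInt8 = "é"` would be accepted by the compiler and fail only when executed, which is the gap the statically checked version is meant to close.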