Prepitch: Character integer literals


(Xiaodi Wu) #288

On the contrary, I think this thread is just starting to go in some interesting directions. Given that it won’t make it into Swift 5.0, I don’t see any reason to cut off such fruitful discussion.


(Tino) #289

I have no idea what merit continuing this thread may have or if anyone reads all those messages, but as it is tagged as a pitch and even labeled as "prepitch", I think it's sensible to move forward by creating a new topic.

Also, imho it would be a good idea to start this with an introduction to the fundamental problems of encodings (I guess there are some good resources that could be linked) and their impact on Swift:
It's quite hard to find information about source file encoding for swiftc, and swiftc --help doesn't even mention what encoding it expects.


(^) #290

about midway through the thread we added exactly such a section, but we removed it because people said it “wasn’t relevant”. the “background and terminology” section is a vestige of that originally much longer background section.


(Jean-Daniel) #291

And what is the use case for that? Your Int will not be usable with any UTF-8 byte buffer, which is the only representation that actually uses UInt8.
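(For reference, here is how the byte-buffer case the pitch targets is spelled in current Swift, via the existing UInt8(ascii:) initializer:)

let buffer: [UInt8] = Array("key: value".utf8)
// Matching a byte today means writing out the code point, or going through UInt8(ascii:):
if buffer.first == UInt8(ascii: "k") {
    print("starts with 'k'")
}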


(Tino) #292

SE is hard... ;-) but afaics, there's not even a link to the current proposal text in the first post (imho that alone should be reason enough to start a fresh thread), and it's tedious to find this deleted section.
I have no strong opinion on how thoroughly the "basics" should be addressed in the proposal text, but I think it would be beneficial to have at least a link to a page that explains all those points that may seem trivial to experts but still have the potential to confuse less experienced readers.

Of course, there are some resources that explain why strings in Swift are so much more "complicated" than in other languages, but I'm not aware of any official document. (Mhh, looking at recent entries on https://swift.org/blog/: maybe someone with write access could post an updated version of https://www.mikeash.com/pyblog/friday-qa-2015-11-06-why-is-swifts-string-api-so-hard.html there? ;-)


(John Holdsworth) #293

The original proposal document you’re looking for is here. It’s Unicode that is hard, and thankfully we are able to avoid a detailed discussion of it since we limited the integer conversion feature to ASCII (a.k.a. Option 4). Swift does a great job of hiding the vagaries of Unicode under a clean abstraction. We’re not proposing anything that would change that, other than separating Character/Unicode.Scalar literals from String literals, which in turn lets us make integers expressible by the new character literals for ASCII characters.


(David Waite) #294

To emphasize your point, even the source document (where this character literal is present) is in an encoding, and also risks being normalized or denormalized by other tools.

With ASCII, the risk there is that a tool might make code that didn't compile before work, or code which did compile before fail.

Extending beyond ASCII, the primary risk is that people will get the codes wrong (e.g. not understand whether the character literal will be interpreted as Latin-1, UCS-2, or UTF-32). I do not know whether Swift source files can be in a non-UTF-8 encoding, in which case this becomes even more confusing for the developer.

With UCS-2/UTF-32, the risk is that a valid scalar will be interpreted at compile time as a different valid scalar. Initializing a Unicode.Scalar from a literal is a risk, but one people could be expected to understand, considering they are explicitly working with literal initialization of Unicode scalars.
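To illustrate in current Swift: the scalar value of "é" happens to coincide with its Latin-1 and UTF-32 encodings, but not with its UTF-8 bytes, which is exactly where getting the codes wrong happens:

let e: Unicode.Scalar = "é"   // U+00E9
print(e.value)                // 233 (0xE9): same as the Latin-1 and UTF-32 encodings
print(Array("é".utf8))        // [195, 169]: UTF-8 needs two bytes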

IMHO, it's far easier just to limit it to ASCII only, and (probably more arguably) [U]Int8 only.


(Frank Swarbrick) #295

FWIW, the IBM Toolkit for Swift on z/OS allows for source code in any of 3 different EBCDIC code pages.

As an EBCDIC user this whole thread makes me a bit antsy, but I don't specifically know whether the proposed feature would be unusable or merely hard to use on an EBCDIC-based OS.


(Jeremy David Giesbrecht) #296

Doesn’t the following assertion hold when compiled on z/OS right now?

assert("a".unicodeScalars.first?.value == 0x61)

If so, nothing will really change. Since 'a' will be treated the same way, nothing will be unusable and source code will be completely compatible between the two systems. This assertion will also hold:

assert('a' == 0x61)

If that is not actually how you want it to behave, then please elaborate.


(Frank Swarbrick) #297

Yes, the above assertion holds on z/OS. My issue is that there might be a case where I want to call a z/OS subsystem service or API that uses an EBCDIC codepage as its native encoding. Let's say it expects to be called with an 8-byte EBCDIC character string. If only ASCII/UTF-8 is supported then I can't just do the following:

let r = zOSAPI1(['T', 'E', 'S', 'T', ' ', ' ', ' ', ' '])
Or perhaps even:
let r = zOSAPI1('TEST    ')

Because that would pass an array of 8 ASCII characters, not an array of 8 EBCDIC characters.

That being said, I don't know how realistic this example is. Certainly there are many z/OS systems that have this type of API. But to make them useful and more Swifty one would want to write Swift wrappers around them anyway, where we'd just use Swift strings and then convert the strings to EBCDIC character arrays within the wrapper, using the Foundation supplied String.Encoding method(s). So what is the likelihood of a user application calling the API directly like this with a literal? Perhaps not all that likely.
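(Sketching that wrapper conversion as it would look on Darwin, where EBCDIC code page 37 is reachable through CoreFoundation; whether the same spelling is available on z/OS toolchains is an assumption on my part:)

import Foundation

let cfEncoding = CFStringConvertEncodingToNSStringEncoding(
    CFStringEncoding(CFStringEncodings.EBCDIC_CP037.rawValue))
let ebcdic37 = String.Encoding(rawValue: cfEncoding)
let bytes = "TEST    ".data(using: ebcdic37) // 8 EBCDIC bytes, or nil on failure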

One issue I have with this entire thread is that the example use cases are very small and perhaps not all that realistic either. I'd like to see some more fleshed-out use cases to be able to think about how they might work in an EBCDIC world.

Perhaps I'm making a mountain out of a mole hill. I don't know...


(Jeremy David Giesbrecht) #298

...sigh...

The dangers of that sort of method are part of why this pitch exists:

let bytes: [UInt8] = ebcdic37("â") // [0x42]

The above code would crash at runtime if the “â” got decomposed while the source file was being handled. The compiler would instead have created the Unicode sequence [U+0061, U+0302], which fails conversion to EBCDIC 37: [0x81, invalid]
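This is easy to reproduce in current Swift; the two spellings compare equal as Strings but carry different scalar sequences:

import Foundation // for precomposedStringWithCanonicalMapping

let precomposed = "\u{E2}"   // "â" as the single scalar U+00E2
let decomposed = "a\u{302}"  // U+0061 followed by the combining circumflex U+0302
assert(precomposed == decomposed)       // equal as Swift Strings
print(precomposed.unicodeScalars.count) // 1
print(decomposed.unicodeScalars.count)  // 2
print(decomposed.precomposedStringWithCanonicalMapping.unicodeScalars.count) // 1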

But I guess you would be insulated as long as the source file itself stayed in the EBCDIC encoding, so you are not as vulnerable as the rest of us.


I don’t know how to make that sort of thing possible in two different encodings at once at the compiler level. But you could still do it at runtime. In pseudocode:

func zOSAPI1(_ parameter: [Character]) {
    // Precompose first, so that "â" arrives as U+00E2 rather than U+0061 U+0302.
    let safetyCheckedScalars = parameter.map { String($0).precomposedStringWithCanonicalMapping.unicodeScalars.first! }
    let bytes = safetyCheckedScalars.map { $0.ebcdic37Value } // ebcdic37Value is hypothetical
    zOS.zOSAPI1(bytes) // ← Call the actual primitive.
}
zOSAPI1(['T', 'E', 'S', 'T', ' ', ' ', ' ', ' '])

(Ben Cohen) #299

Hi everyone,

In prepping this proposal for review, we've encountered a snag in the implementation that will need to be addressed. Once Swift is in the OS, all new types shipped in the OS will be subject to availability via @available annotations. However, Swift does not yet have a way of annotating a conformance with availability. This is a capability that could be added to the language in future, but does not exist right now.

With new protocols, this is not a problem: the protocol itself will have availability, and therefore any conformances will too. Likewise, new structs will have availability, so they can conform to existing protocols. But this won't apply to new conformances of existing types to existing protocols. The pitch proposes adding exactly this: conforming existing integer types to existing ExpressibleBy... protocols.

This means we will need to adapt the proposal to accommodate this. The core team's preferred approach would be to subset the proposal to just introducing the 'x' syntax for the scalar and grapheme literals. This addresses the pain point of having to give Character literals type context. It will also allow users to conform the integer types themselves via a short extension. Not as convenient as having it built into the standard library, to be sure, but not too taxing. Once we have the ability to declare availability on a conformance, we could then put the conformances in the standard library too.

Let me know your thoughts,
Ben

p.s. the ability to declare availability on a conformance is certainly something the core team expects to see added to the language in future versions of Swift – any thoughts on the design/implementation of that feature would be welcome over on the compiler development forum.


(^) #300

The core of the proposal is the statically checked ascii literals; the character and unicode scalar literal syntax is just a side benefit. the first thing isn’t possible in user-defined extensions, the second is just a syntactic difficulty from not having enough type context. taking the second without the first feels like cutting out the cake for the icing.
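to make the first thing concrete, this is the static check that would be lost (pitched behavior, not current swift):

let a: Int8 = 'a'   // ok: 97, validated at compile time
let e: Int8 = 'é'   // pitched to be a compile-time error: U+00E9 is not ascii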

how long should we expect before this gets unblocked?


#301

As long as the static checking for ASCII literals is confined to single-quoted characters, what’s the problem?

protocol ExpressibleBySingleQuotedCharacterLiteralOrWhatever { … }

extension Int: ExpressibleBy… { … }

let x: Int = 'a'    // 97
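Fleshed out, as a sketch, assuming the single-quoted literal feeds the existing ExpressibleByUnicodeScalarLiteral machinery (from user code the ASCII restriction can only be a runtime check, which is the catch raised in the reply below):

extension Int: ExpressibleByUnicodeScalarLiteral {
    public init(unicodeScalarLiteral value: Unicode.Scalar) {
        // User code cannot reject non-ASCII at compile time; trap at runtime instead.
        precondition(value.isASCII, "'\(value)' is not an ASCII scalar")
        self = Int(value.value)
    }
}

let x: Int = 'a'    // 97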

(^) #302

there can be no static checks in user extensions?


(Ben Cohen) #303

So, even though you cannot add availability to the conformance, you can add the functions that fulfill the conformance. So it would be possible to add the init(unicodeScalarLiteral:) in the proposal, guarded by availability. If it's within that function that the static checks happen, this would still allow users to add the conformance to Int8 themselves while getting the compile-time warning. It's worth an experiment to confirm.
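A sketch of that arrangement (the availability annotation is purely illustrative):

// Shipped in the standard library, with availability on the member:
extension Int8 {
    @available(macOS 10.15, *) // illustrative placeholder
    public init(unicodeScalarLiteral value: Unicode.Scalar) {
        // The hoped-for compile-time ASCII check would live inside this initializer.
        self = Int8(truncatingIfNeeded: value.value)
    }
}

// Added by the user, sidestepping availability on the conformance itself:
extension Int8: ExpressibleByUnicodeScalarLiteral {}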


(^) #304

the tightest that unicodeScalarLiteral can be constrained to is Unicode.Scalar, which says nothing about whether the scalar value (which is 32 bits long) is actually an ascii scalar. Are you suggesting we add Int8 to the list of allowed Self.UnicodeScalarLiteralType associatedtypes?
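for reference, the stdlib machinery in question (abridged):

public protocol ExpressibleByUnicodeScalarLiteral {
    associatedtype UnicodeScalarLiteralType: _ExpressibleByBuiltinUnicodeScalarLiteral
    init(unicodeScalarLiteral value: UnicodeScalarLiteralType)
}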


(Ben Cohen) #305

I haven't looked at this part of the implementation PR, but how was the desired feature of a compile-time error being handled previously?

The question is, if everything except the actual conformance of Int8 to ExpressibleByUnicodeScalarLiteral were implemented, but the conformance itself left to the user (sidestepping the need for availability), would you still get that compile-time feedback? It's worth running a quick experiment to confirm (and if it doesn't work, seeing if the implementation can be phrased slightly differently so it does work).


(^) #306

It would be possible only if Int8 were conformed to _ExpressibleByBuiltinUnicodeScalarLiteral, because this protocol is what constrains Self.UnicodeScalarLiteralType. I can’t do this test because it would involve modifying a builtin compiler protocol. This is the only way that would make it possible to write an initializer for UInt8 and Int8 that takes a statically checked Int8 input and conforms them to ExpressibleByUnicodeScalarLiteral.


(Ben Cohen) #307

The test I'm describing involves changing the standard library i.e. modify the PR to still add the needed initializers, but just don't add the conformances, and instead try adding them from outside the std lib using that compiler and see if all the expected behaviors still work.

Now, it's possible that approach won't work with _ExpressibleByBuiltinUnicodeScalarLiteral since it's special (you certainly can't write your own implementation of the init, because it takes a Builtin.Int32 – but maybe if that implementation is already there you can add your own conformance). It's also a more dubious practice to encourage conformance to an underscored protocol. But it's worth an experiment at least.