Prepitch: Character integer literals

taylorswift · November 20, 2018, 7:10pm

I think we’re getting two things mixed up here. The choice of naming is completely irrelevant to the functionality and changes we’re trying to make to the language here. Changing the protocol names will also have no ABI impact, as we would only be adding two protocols (...CodepointLiteral, ...CharacterLiteral) and keeping the old two protocols (...UnicodeScalarLiteral, ...ExtendedGraphemeClusterLiteral) exactly the same in the ABI. The legacy protocols would then effectively die when we phase out the double-quotes syntax (another ABI-irrelevant change) since they would only apply to double quoted unicode scalar and character literals. Their entry points could stay in the ABI for all anyone cares, but they would be effectively unreachable to modern Swift users, which is what we want, to avoid cluttering up the API.

Implementing the functionality, on the other hand, is necessarily ABI breaking, at least if we’re set on doing it in-place on {...UnicodeScalarLiteral, ...ExtendedGraphemeClusterLiteral}. It is possible to make {Int32, UInt32, Int64, UInt64, Int, UInt} expressible by codepoint literals without touching the protocol system; in fact, you can do it today with the double-quote syntax. This is not possible for {Int8, UInt8, Int16, UInt16} without tinkering with the protocols, which is horrible because these four types probably make up 99% of the use cases for codepoint literals.
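To make the "you can do it today" part concrete, here is a minimal sketch for UInt32 using the existing double-quote syntax. The extension is my own illustration written in user code, not something the standard library ships, and recent compilers may warn that it is a retroactive conformance.

```swift
// Assumption: this conformance is added in user code for illustration only;
// the standard library does not provide it.
extension UInt32: ExpressibleByUnicodeScalarLiteral {
    // The literal arrives as a Unicode.Scalar, whose `value` is already a UInt32,
    // so no narrowing conversion (and no overflow check) is needed.
    public init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        self = scalar.value
    }
}

// Double-quoted literals holding a single scalar now type-check as UInt32.
let a: UInt32 = "a"
let newline: UInt32 = "\n"
print(a, newline)
```

The same pattern carries over to the other wide integer types by converting `scalar.value` to the target type, since every Unicode scalar value fits in them.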

There’s also the issue of removing API cruft from the protocols. The first two tables here sum it up: the idea is to make each constraint disjoint, which cleans up both the API and the implementation a lot, and gets rid of the confusing mess of overlapping conformances we have today. This is just housekeeping though, and not really central to the feature we’re trying to add.

The main advantage I see to using new protocol names is that we basically get to design them as if the old literal protocol design didn’t exist. This is only workable because we’re also introducing a new syntax ('a' vs "a") at the same time, so we can tie the new binary interface to the new syntax, and sweep the old binary interface under the rug with the old syntax. This gives us a chance to “break ABI” without really breaking ABI. I think this is a lot more convenient than embarking on a (quixotic) push to change the ABI stability policy of the entire language.