Prepitch: Character integer literals

@taylorswift, @johnno1962, what are your thoughts on things like '\u{200D}', interpolations, multi-line and raw, for Character literals? It's conceivable that Character's conformance could construct from raw scalar values or interpolations that way. Off the cuff, I'd say this is a "rejected" direction, because users can always use the double quotes to access all of that.

For the content of the pitch itself, I feel like it can be distilled down into its essence:

This can just mention that we have awesome String literals (which are continuing to get increasingly awesome), but it's also common in programming to want to use things that appear as characters to users in numeric contexts, using their Unicode scalar value (example: C chars).

The bytestrings concept is totally something we'll be exploring more in the future, but seems very out of place in this pitch because we're not pitching bytestrings. We can just drop it.

We can just drop this entire section. The discussion of encodings is unrelated, terminology used can just be standard terminology, canonical equivalence is unrelated, bytestrings is an unrelated concept except as a literary device for the motivation section, same for machine strings, etc.

Again, can drop bytestrings concept and encoding validity discussion, which is unrelated to this pitch except as a motivator. The motivation is simply that it's common to want to use the visual representation of a numeric value in code when that corresponds to a character.

One of the future directions for String (a more recent link escapes me, but an old one is here) is to provide performance-sensitive or low-level users with direct access to code units. In that world, it would be much nicer to have numeric-character literals for use in conjunction with this hypothetical future API:

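extension String {  // hypothetical API, sketched on top of String.withUTF8 (Swift 5.1+)
  func withCodeUnits<T>(_ f: (UnsafeBufferPointer<UInt8>) throws -> T) rethrows -> T {
    var copy = self  // withUTF8 is mutating, so operate on a local copy
    return try copy.withUTF8(f)
  }
}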

The value of character literals which can convert to UInt8 for the body of f is hugely motivating compared to raw numbers in code.
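For comparison, this is roughly what the body of such a closure looks like with today's spelling. It is a sketch assuming Swift 5.1 or later, where String.withUTF8 provides this kind of code-unit access; the helper name countNewlines is hypothetical. UInt8(ascii:) is the closest existing way to write a byte as a character; under the pitch it would presumably become a plain '\n'.

```swift
// Hypothetical helper: count newline bytes by scanning the UTF-8 code units directly.
func countNewlines(in text: String) -> Int {
  var copy = text  // withUTF8 is mutating, so work on a local copy
  return copy.withUTF8 { bytes in
    // UInt8(ascii: "\n") reads as a character; the raw alternative is the magic number 10.
    bytes.reduce(0) { count, byte in byte == UInt8(ascii: "\n") ? count + 1 : count }
  }
}

print(countNewlines(in: "one\ntwo\nthree"))  // prints 2
```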

If we want to go with the tables in Prepitch: Character integer literals - #180 by Michael_Ilseman, then the proposed solution is fairly straightforward. The tables are pretty self-explanatory, we use single-quote for character literals, and we can list the protocol declarations under "Detailed Design". The deck-chair rearrangement necessary for source compatibility can go under the "Source Compatibility" section to keep it out of the spotlight.

Actually, this has a nice ABI impact of purging all the unnecessary intermediary protocols. If necessary, we can keep the entry points in case this doesn't make the deadline.

It seems like several alternatives have been debated on this thread. You can also mention that we're not going to extend anything fancy like interpolations or scalar values into the character literal syntax.

If you'd like, I can also help drive this proposal because I think it is a compelling future direction for String, if you're willing to wait a few weeks ;-)

Pitch please!

This should absolutely be built on top of compile-time constant expressions and could be done entirely in the library once the feature is implemented and powerful enough. I'd love to see a @compilerEvaluable init(_ url: StaticString) that does exactly what you describe.

Yes, no, no, and no. Most codepoints are very hard to type in source, so \u{200D} would definitely be useful. I don't see interpolations and raw being worth it for a character literal. I don't see a use case for multi-line character literals at all.
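For reference, the \u{...} escape already works inside double-quoted literals, including ones whose contextual type is a single Unicode.Scalar. The pitch would presumably carry the same escape into single quotes ('\u{200D}'), which does not compile today. A minimal illustration:

```swift
// A single-scalar literal written with an escape instead of an invisible character.
let zeroWidthJoiner: Unicode.Scalar = "\u{200D}"

// The same escape embedded in an ordinary String literal.
let joined = "a\u{200D}b"

print(zeroWidthJoiner.value == 0x200D)  // prints true
print(joined.unicodeScalars.count)      // prints 3
```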

The idea is that character literals combined with Array's Collection conformance are a good enough bytestring API that we wouldn't need a separate bytestring type anymore.
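As a rough illustration: Array's Collection conformance already supplies the searching and comparison operations a bytestring type would need; what is missing is a pleasant way to spell individual bytes. This sketch uses today's UInt8(ascii:) where the pitch would use single-quoted literals; the request line is made up for the example.

```swift
// Raw bytes of a (made-up) HTTP request line, held in a plain array.
let request: [UInt8] = Array("GET /index.html HTTP/1.1\r\n".utf8)

// Ordinary Collection APIs do the work of a bytestring type.
let isGet = request.starts(with: Array("GET ".utf8))
let firstSpace = request.firstIndex(of: UInt8(ascii: " "))
let endsWithCRLF = request.suffix(2).elementsEqual([UInt8(ascii: "\r"), UInt8(ascii: "\n")])

print(isGet, endsWithCRLF)  // prints true true
```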

A lot of confusion and arguing took place earlier in the thread because a lot of people misunderstood or were speaking with different definitions, or were confused about Unicode, so I figured it'd be worth it to define them.

That was the original motivation paragraph, but that just caused everyone to argue about "C strings vs Swift strings".

I'm confused by that table.

I thought Unicode.Scalar and Character currently aren't ExpressibleByStringLiteral, and we weren't planning on making them so. String, however, is ExpressibleBy(ExtendedGraphemeCluster)Literal and ExpressibleByUnicodeScalarLiteral as well. We were planning on removing these, though.

Currently they are not, but they are ExpressibleByUnicodeScalarLiteral / ExpressibleByExtendedGraphemeClusterLiteral respectively. Both of those opt you into the double-quote syntax, with some compiler checking for single-scalar / single-grapheme.
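For reference, this is the double-quote behaviour being described, as it works in Swift today. The commented-out lines at the end are expected to be rejected by the compiler's single-scalar / single-grapheme checks.

```swift
// Double-quoted literals reach these types through the existing literal protocols.
let scalar: Unicode.Scalar = "A"       // via ExpressibleByUnicodeScalarLiteral
let character: Character = "e\u{301}"  // via ExpressibleByExtendedGraphemeClusterLiteral

// One grapheme can still contain several scalars: "e" plus a combining acute accent.
print(scalar.value)                    // prints 65
print(character.unicodeScalars.count)  // prints 2

// Expected compile-time errors:
// let twoScalars: Unicode.Scalar = "AB"
// let twoCharacters: Character = "AB"
```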

This seems overcomplicated for user type conformances. I guess what I was getting at is that it would be nice if there was one protocol per syntactic form you wanted to be expressible by: one for single quote and one for double quotes. I don't know how often it is that a user type would want the compiler "overflow" checking (i.e. single-grapheme/single-scalar), or how best to signal this to the compiler.

As @allevato mentioned, this might be doable via @compilerEvaluable and some kind of compilation-time assertion in the future, and stdlib types are still checked much like integer literals. Integer literals do not have this split with one protocol per bit-width, and in hindsight guaranteeing single-grapheme in the compiler ended up being unworkable.

The way it's done for integers is with an associated type on the Expressible protocol. To implement a conforming type, you give it an initializer that takes an instance of the associated type.

The associated type, of course, must conform to a built-in, compiler-known, standard-library-only protocol. Thus the integer literal is converted into a built-in integer type at compile-time, which is then passed into the init for the user-defined type.

Essentially the same approach should work here as well: the Expressible protocol gets an associated type constrained to a compiler-known protocol which only Character, Unicode.Scalar, and certain integer types conform to.

That way, when the associated type is an integer, the compiler can verify that the literal comprises just one codepoint and does not overflow.
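The integer side of this machinery is visible from user code today: a type picks its literal type (here UInt8, one of the standard library's builtin-literal types) through its initializer, and the literal is checked against that type before the initializer runs. A minimal sketch; the Byte type is hypothetical, and the commented-out line is expected to be diagnosed as an overflow at compile time.

```swift
// Hypothetical wrapper whose literal representation is UInt8.
struct Byte: ExpressibleByIntegerLiteral {
  var value: UInt8

  // IntegerLiteralType is inferred as UInt8 from this initializer's parameter type.
  init(integerLiteral value: UInt8) {
    self.value = value
  }
}

let ok: Byte = 200
// let tooBig: Byte = 300  // expected compile-time error: 300 overflows UInt8

print(ok.value)  // prints 200
```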

Yes, we know all of this lol, it's how the implementation works:

https://github.com/kelvin13/swift/tree/unicodeintegerliterals

There are more details in the latest draft: Integer-convertible character literals · GitHub

I feel like you're taking a very strict interpretation of backward compatibility and "additive" that's painting the design into a corner. I'd like to address your paragraph above point by point to give a different perspective:

  • "[these protocols] are already spoken for so it would be a source-breaking change": yes, these protocols have an existing tie-in to double-quoted string literals. On the other hand, their use is extremely rare, and it may be that a change in behavior, while technically source-breaking, will have little practical impact. The Core Team has accepted a number of such changes (including in the Swift 5 time frame) where the benefits of moving the language forward outweigh the costs of minor source breakage.
  • "[...] would no longer be additive": yes, it's technically true, but there are two issues here. First, the result of forcing this to be additive is that we have 5 ExpressibleByStringishLiteral protocols in a complicated hierarchy. Second, it puts enormous pressure on the process because this proposal has ABI impact in a way that is currently hard to back-deploy should it miss the "Swift 5 ABI stability" window.
  • "problems like Ints being expressible by double quoted string literals": again, this assumes strict backward compatibility. If we were to say that single-quoted literals correspond to the ExpressibleByUnicodeScalarLiteral and ExpressibleByExtendedGraphemeClusterLiteral protocols, whereas double-quoted literals correspond to 'ExpressibleByStringLiteral' (only), we don't get this problem. Perhaps there is a fallback for Swift <= 4 mode where double-quoted literals can correspond to ExpressibleByUnicodeScalarLiteral and ExpressibleByExtendedGraphemeClusterLiteral with a suitable warning.
  • "single-grapheme double quoted literals having a different default type from multi grapheme strings": The default type for double-quoted literals would remain String. For single-quoted literals it would become 'Character'.

I recommend that this proposal re-use the existing ExpressibleByUnicodeScalarLiteral and ExpressibleByExtendedGraphemeClusterLiteral protocols. Doing so has a number of advantages:

  • The proposal has less (or no) ABI impact, so we don't need to rush the process quite as thoroughly. Adding conformances for the integer types to ExpressibleByUnicodeScalarLiteral is ABI-impacting, but on a smaller scale
  • The end result is simpler: 3 protocols that are well-motivated vs. 5 protocols
  • The proposal itself is a smaller change, making it easier to review
  • Single vs. double-quoted literals are used consistently to distinguish character/code-point literals vs. string literals (still)
  • Source compatibility is mostly a matter of making the compiler cope with double-quoted literals in Swift <= 4 mode, rather than an enduring part of the standard library design (and ABI)

I'd also encourage you to remove the + and * operators from the proposal. They are not central to the proposal itself, can be independently added later, and are likely to cause a significant distraction during the review.

Doug

Hi Doug,

I'm not going to disagree with anything you said at all. In fact, further down the thread from the message you quote, I contradict myself to go with the flow and put forward the following model:

ExpressibleByUnicodeScalarLiteral        _LegacyExpressibleByUnicodeScalarLiteral
             ↓                                               ↓
ExpressibleByCharacterLiteral            _LegacyExpressibleByExtendedGraphemeClusterLiteral
                                                             ↓
                                                ExpressibleByStringLiteral

typealias StringLiteralType                         = String
typealias _LegacyExtendedGraphemeClusterLiteralType = String
typealias _LegacyUnicodeScalarLiteralType           = String
typealias CharacterLiteralType                      = Character
typealias UnicodeScalarLiteralType                  = Character

Is this something we could agree on? ExpressibleByExtendedGraphemeClusterLiteral has been renamed ExpressibleByCharacterLiteral: it has the same signature, but the new name is much better, freed of Unicode jargon. ExpressibleByUnicodeScalarLiteral differs only in that it takes an IntegerLiteral rather than BuiltIn.UInt32, so that compile-time overflow detection can work. The _Legacy protocols are kept around so the stdlib can express the Swift 4 conformances of double-quoted strings against them and select the default type, but they are clearly signalled as slated for deprecation. In this model the way forward is clear, though anybody who has written custom conformances to the legacy protocols (which must be incredibly rare) will have to make minor changes to their code, so this is scarcely source-breaking.

The real problem with this process, however, is not the specifics (though the proposal should be updated) but that we need to get to review quite soon to tie these things down in time for adoption in Swift 5. An implementation of almost anything is possible in the time we have available, even if it is getting short; we only need to decide what it is.

John

No, the two protocols we seek to add are unconnected to the existing ExpressibleByExtendedGraphemeClusterLiteral and ExpressibleByUnicodeScalarLiteral protocols, other than that many types such as Character and Unicode.Scalar will conform to both the old protocols and the new protocols. The new protocols have no relationship to ExpressibleByStringLiteral by design.

The old protocols are intended to be deprecated by Swift 6, so in the end we would only have 3 textual literal protocols, as before. The question is just whether we remove them in Swift 5 or in Swift 6.

I've dusted off and rebased the prototype implementation, squashed it into a single commit, and made the minor changes needed to have it conform to the model I put forward above. I'm pretty sure this is our best option and that we'll not be able to reuse the existing protocols, certainly not to the point of being non-API-breaking: as @taylorswift points out, they need to break the inheritance relationship to ExpressibleByStringLiteral. The result is 3 public, well-defined protocols related to character/string literals (ExpressibleByUnicodeScalarLiteral and ExpressibleByStringLiteral, with the renamed ExpressibleByCharacterLiteral in between) and two legacy protocols to bring Swift 4 behaviour forward in the short term.

Were it me, I'd update the proposal to adopt this model and hope we can get this in before the freeze. Introducing character literals after Swift 5 looks like it could be much more difficult, if I'm reading the tea leaves correctly.

The final branching for Swift 5 is tomorrow.

Commits are merged after ā€œfinalā€ branching. My maths is based on 2 weeks for review 1 week of adjudication Two months bake-in time before February. The implementation is ready. Given there seems little prospect of abi-breaking changes after that I donā€™t see an alternative other than to push.

The alternative would be to go after a design without ABI impact along the lines that Doug has sketched out.

Ultimately, it's up to the core team, but I really don't see how one could justify cherrypicking such a large overhaul after the final branching date when it still has yet to be reviewed (or even converged on a final design for review, as far as I'm aware). Indeed, if such a change could still be merged, then what is the bar for changes that can't?

I guess my point is that it seems wiser to meet the ABI stability cutoff with the existing design and its known downsides than to rush with an unknown design with potentially unknown downsides. Even with the best of planning I'd expect that perfecting multiple new protocols would take months of tweaking.

I'm encouraging you to consider adopting the existing protocols as they are. It feels like we don't lose much---the protocol names aren't perfect, but they're rarely seen anyway, for example, and we could probably use the constant-evaluation work to help get overflow diagnostics. The benefit is that it reduces the ABI effect of the proposal down to only added conformances... the rest is in the source-level language, reducing risk/churn/schedule pressure a bit.

Doug

I'll have to concede we have probably missed the boat this time around. I just wanted to make sure I was off the critical path. The flap was due to the "it's now or never" aspect of the impending ABI freeze in Swift 5. I'm sure, Doug, that if you say I should reuse the existing protocols it is possible, but I'm also sure it's going to be more difficult. (I wrote the rest of this post before seeing your reply...)

Is it worth perhaps taking a step back to double-check that the movement to ABI stability is absolutely necessary? I'm not convinced myself.

I have to admit I winced when I heard the phrase "Bake Swift into the operating system" at this year's WWDC, as I'm under no illusions about just how difficult a technical feat this is going to be, and it's far too early in Swift's lifecycle to be talking of freezing things. Swift has a very large ABI surface due to the prevalence of value types and its avoidance of "late binding", so there is plenty of scope for things to go wrong.

That said, there are compelling reasons to try to stabilise and share the Swift standard libraries. Having to ship them in every copy of every app has never been popular, and not sharing them probably adds half a second to an app's initial startup time and wastes device memory. Also, I've no doubt Apple would internally like to start shipping system frameworks such as HealthKit developed in Swift to offer a richer API, and the Swift team would like to be able to meet this requirement.

One way to implement this is to place the Swift libraries at a new shared path and adjust the application's "Runtime search path" (RPATH) to pick them up at runtime rather than from the app package as it does now.

/System
	Library
		Swift
			libSwiftCore.dylib
			HealthKit.framework

I wonder, though, whether it would be possible to introduce the concept of an "ABI stability window" where these libraries are versioned and Xcode would adjust the RPATH built into the application, so apps built with different versions of Swift would use different versions of the shared Swift libraries. In this model the ABI would not be fixed forever (which is a very long time) but for the duration of the window.

/System
	Library
		Swift
			4_2
				libSwiftCore.dylib
				HealthKit.framework
			5_0
				libSwiftCore.dylib
				HealthKit.framework

These "Swift Version Packs" would accumulate over time and could be shipped with the operating system or, better, downloaded on demand as part of the installation of an app by the "App Store" app. All the required information about which version of Swift is required and which system libraries are being used would be available in the load commands of the app executable, or could be added as attributes in the application's Info.plist. For backward compatibility, App Store Connect requests to load .ipas from versions of the App Store app that have not been upgraded could inject the correct libraries into the app package at the server, as before.

Ultimately the granularity of the ABI stability window could be brought all the way down to Swift minor language versions (There are only a few a year) and the requirement for the concept of ABI stability simply melts away as applications would share the Swift standard libraries if they were compiled with the same version and not break if they didn't. There are a few details to work out but this would allow all the advantages of using shared libraries without impeding the future evolution of Swift or having to modify the operating system.

I'm really afraid this proposal is gonna get derailed again, so I have to ask: is this purely a protocol naming issue? Exactly what conforms to what, what inherits what, and what gets what default implementation is already described in the proposal document and its following detailed design section, and it doesn't seem to me like there is any dispute over the underlying design and the conformances we will drop (for example, String : Self.CharacterLiteralType). It sounds like it's entirely a matter of replacing all occurrences of the phrases "CodepointLiteral" and "CharacterLiteral" with "UnicodeScalarLiteral" and "ExtendedGraphemeClusterLiteral".

I have no issue with reusing the existing naming scheme (UnicodeScalarLiteral, ExtendedGraphemeClusterLiteral.) It will make the migration a little rockier and break a bit more source, but if the core team is okay with it, I am okay with it as well. I do prefer the term CharacterLiteral over ExtendedGraphemeClusterLiteral though.

Adding conformances to ExpressibleByUnicodeScalarLiteral to Int and friends without the corresponding overflow checks was rejected a long time ago, and I think many here pushing this proposal consider it a dealbreaker. I do not think @compilerEvaluable will land in any semblance of a timeframe that would make this point moot.

I also hear a lot of talk about separating the "single-element" single-quoted literal domain from the String domain. I agree with this, but removing String (and StaticString) conformances from ExpressibleByUnicodeScalarLiteral and ExpressibleByExtendedGraphemeClusterLiteral and their respective initializer associatedtype constraints is necessarily ABI breaking. A much bigger issue is adding {UInt8, Int8, UInt16, ..., UInt, Int} to the set of conforming Self.CodepointLiteralType types, which is also ABI breaking. While the String/StaticString conformances can be written off as API cruft, this proposal cannot be implemented at all without the {UInt8, Int8, UInt16, ..., UInt, Int} conformances.
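To make the overlap concrete: in current Swift, String and StaticString accept a single-scalar literal through ExpressibleByUnicodeScalarLiteral just as Unicode.Scalar and Character do, so one generic function can produce any of them from the same literal. The helper name makeX is hypothetical; the point is only which standard library types satisfy the constraint today.

```swift
// Builds any ExpressibleByUnicodeScalarLiteral type from the literal "x".
func makeX<T: ExpressibleByUnicodeScalarLiteral>(_ type: T.Type) -> T {
  return "x"
}

let scalar = makeX(Unicode.Scalar.self)
let character = makeX(Character.self)
let string = makeX(String.self)              // String also conforms today
let staticString = makeX(StaticString.self)  // ...and so does StaticString

print(scalar.value, character, string, staticString)  // prints 120 x x x
```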

My sense here is that is mostly a protocol naming issue given that we're now squarely in a "changing ABI is hard" world, and these protocols have been locked into the ABI of Swift 5, and this proposal (super unfortunately IMO) has missed the boat for Swift 5.

If this proposal can be made to work with the existing protocols, then we can avoid having character literals require specific deployment targets....

-Chris

I have to say I'm in the dark about what the constraints are of the new "changing ABI is hard" world. The proposal as put forward was additive and put forward two new protocols. Is this no longer going to be possible? What is going to be possible??

I think this is the first data point in a theme that is going to occur again and again trying to evolve swift in the post fixed ABI world - that it will be considerably harder. I personally feel ABI stability is a huge mistake and will prove to be impractical in the longer term.

Take for example someone who installs Swift with iOS 13 but cannot upgrade for whatever reason. Two years down the line they will not be able to install new apps using a new, additively evolved Swift, so a mechanism to upgrade the Swift on the device outside the OS that is running is necessary. All I suggested above is that these updates be versioned so that we can break free of the need to "fix the ABI" altogether and free evolution back up. The only alternative is to not allow Swift to evolve at all, and it is far too early to do that.

I think we're getting two things mixed up here. The choice of naming is completely irrelevant to the functionality and changes we're trying to make to the language here. Changing the protocol names will also have no ABI impact, as we would only be adding two protocols (...CodepointLiteral, ...CharacterLiteral) and keeping the old two protocols (...UnicodeScalarLiteral, ...ExtendedGraphemeClusterLiteral) exactly the same in the ABI. The legacy protocols would then effectively die when we phase out the double-quotes syntax (another ABI-irrelevant change) since they would only apply to double quoted unicode scalar and character literals. Their entry points could stay in the ABI for all anyone cares, but they would be effectively unreachable to modern Swift users, which is what we want, to avoid cluttering up the API.

Implementing the functionality, on the other hand, is necessarily ABI-breaking, at least if we're set on doing it in place on {...UnicodeScalarLiteral, ...ExtendedGraphemeClusterLiteral}. It is possible to make {Int32, UInt32, Int64, UInt64, Int, UInt} expressible by codepoint literals without touching the protocol system; in fact, you can actually do it today with the double-quote syntax. Doing the same for {Int8, UInt8, Int16, UInt16} with compile-time overflow checking is not possible without tinkering with the protocols, which is horrible because these four types probably make up 99% of the use cases for codepoint literals.
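The asymmetry can be sketched in user code. Because a conformance may use Unicode.Scalar as its UnicodeScalarLiteralType, a retroactive conformance lets UInt32 accept double-quoted single-scalar literals today, with no range check needed. The same conformance for UInt8 also compiles, but the only place to reject an out-of-range scalar is a run-time check inside the initializer rather than a compile-time diagnostic, which is the missing overflow check discussed earlier in the thread. This is an illustration under those assumptions, not a proposed design; recent compilers may also warn about retroactive conformances of standard library types.

```swift
// Retroactive conformance: a double-quoted single-scalar literal can initialize UInt32.
extension UInt32: ExpressibleByUnicodeScalarLiteral {
  public init(unicodeScalarLiteral scalar: Unicode.Scalar) {
    self = scalar.value  // every scalar value fits in UInt32, so nothing can overflow
  }
}

// The same conformance for UInt8 compiles, but the range check can only happen at run time.
extension UInt8: ExpressibleByUnicodeScalarLiteral {
  public init(unicodeScalarLiteral scalar: Unicode.Scalar) {
    guard let byte = UInt8(exactly: scalar.value) else {
      preconditionFailure("U+\(String(scalar.value, radix: 16)) does not fit in UInt8")
    }
    self = byte
  }
}

let a: UInt32 = "a"
let b: UInt8 = "b"
// let euro: UInt8 = "€"  // type-checks, but would trap at run time (U+20AC > 0xFF)

print(a, b)  // prints 97 98
```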

There's also the issue of removing API cruft from the protocols. The first two tables here sum it up: the idea is to make each constraint disjoint, which cleans up both the API and the implementation a lot, and gets rid of the confusing mess of overlapping conformances we have today. This is just housekeeping, though, and not really central to the feature we're trying to add.

The main advantage I see to using new protocol names is that we basically get to design them as if the old literal protocol design didn't exist. This is only workable because we're also introducing a new syntax ('a' vs "a") at the same time, so we can tie the new binary interface to the new syntax, and sweep the old binary interface under the rug with the old syntax. This gives us a chance to "break ABI" without really breaking ABI. I think this is a lot more convenient than embarking on a (quixotic) push to change the ABI stability policy of the entire language.
