Prepitch: Character integer literals

(Tino) #69

I've been thinking about single-delimiter literals for quite some time, and imho we could skip the second ' here.
I'm not sure how important the aspect of brevity is, but 33% would be a significant reduction ;-)

let hexcodes = ['0, '1, '2, '3, '4, '5, '6, '7, '8, '9, 'a, 'b, 'c, 'd, 'e, 'f]

doesn't look that bad to me (and whitespace is always invisible when you specify it directly).

(John Holdsworth) #70

It’s an artefact of the implementation, or should I say it’s all I could get to work. Somebody who actually knows what they are doing might fare better, but I don’t think this is a burdensome limitation in practice. In fact, I like the model that these literals are more Int-like than Character-like. There is no escaping the need to have at least some knowledge of Unicode’s vagaries.

You can see one problem in your message - external colorising editors would get confused.

(Ben Rimmington) #71

Why not use the ExpressibleByUnicodeScalarLiteral protocol and Unicode.Scalar struct?

(John Holdsworth) #72

That’s easy to do and might be more correct the way things turned out:

/// The default type for single quoted "character" literals.
public typealias CodepointLiteralType = Unicode.Scalar
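For context, a type adopts today's ExpressibleByUnicodeScalarLiteral with a single initializer. A minimal sketch, where the Codepoint wrapper is hypothetical and merely stands in for the proposed conformances on Int and friends:

```swift
/// Hypothetical wrapper demonstrating ExpressibleByUnicodeScalarLiteral
/// as it exists today; the proposal would conform integer types similarly.
struct Codepoint: ExpressibleByUnicodeScalarLiteral {
    let value: UInt32
    init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        self.value = scalar.value
    }
}

let a: Codepoint = "a"
assert(a.value == 0x61)
```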

(^) #74

how would you write the space character ' '?

(Dante Broggi) #75

One thing I think these literals should be able to do is:

let literal: UInt32 = 'aeio'
assert(literal == 0x6165696f)

Which would require multi-scalar literals.

Edit: would be UInt32.
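The packing being asked for can be sketched with today's Swift as a shift-and-or over the scalars, big-endian as shown above:

```swift
// Sketch: pack the scalars of "aeio" into a UInt32, high byte first,
// mimicking the proposed multi-scalar literal semantics.
let packed = "aeio".unicodeScalars.reduce(UInt32(0)) { acc, scalar in
    (acc << 8) | scalar.value
}
assert(packed == 0x6165_696F)
```

Note this only behaves sensibly when every scalar fits in one byte, which is exactly the ambiguity raised for 'aθi' below.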

(^) #76

I don’t think this is obvious behavior at all. what would this be?

let literal: UInt32 = 'aθi'

(Tino) #77

That's a small minority of characters, and imho the numerical value is better in this case - or something like a named constant (which could be shortened to .space in many situations).
But afaics, ' wouldn't be problematic either, given that a lonely single quote would always be an error.

Would this actually be allowed?
If that's the case, it would be a really obvious argument for keeping the closing delimiter... but I thought that the length of the literal always has to be one, and in this case, there's no need for a second way to signal its end.

(Alexander Momchilov) #78

I really can't get behind wasting the reserved ' character on something so niche/uncommon that could easily be done with a map call:

let hexcodes = [
	"0", "1", "2", "3", "4", "5", "6", "7",
	"8", "9", "a", "b", "c", "d", "e", "f"
].map { Character($0).asciiValue! }

// [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102]

(Michel Fortin) #79

I see you've chosen big-endian here. This will not please everyone.

(Tino) #80

... if the compile-time code execution story finally makes some progress, this wouldn't even be expensive ;-)
But why should we not use '? If there is no other idea for it, there's little merit in not utilizing it.

(^) #81

this constructs Character values from literals which cannot be done at compile time since the grapheme cluster stuff depends on the ICU runtime

you might also want to have them as Int8s instead of UInt8s since that makes testing ASCII vs latin extended easy (ASCII is always positive)
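The sign trick can be sketched as follows (the byte values here are illustrative):

```swift
// With Int8 storage, an ASCII byte is always non-negative (ASCII is 0...127),
// while every byte of a multi-byte UTF-8 sequence has the high bit set.
let ascii: [Int8] = [104, 105]               // "hi"
let nonASCII = Int8(bitPattern: 0xC3)        // first byte of UTF-8 "é"
assert(ascii.allSatisfy { $0 >= 0 })
assert(nonASCII < 0)
```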

(Xiaodi Wu) #82

There is no semantic reason why UInt8(ascii:) requires the ICU runtime or could not be done at compile time.


Okay, so it is mostly a limitation of the current implementation? Or something to do with the moving target of the Unicode spec/ICU version/whatever? My main concern is that it won't make any sense from a user's perspective.
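For reference, UInt8(ascii:) exists in the standard library today; the range check it performs happens at run time rather than compile time, which is what the proposal aims to improve on:

```swift
// UInt8(ascii:) takes a Unicode.Scalar and traps at run time
// if the scalar is outside the ASCII range.
let newline = UInt8(ascii: "\n")
let zero = UInt8(ascii: "0")
assert(newline == 10)
assert(zero == 48)
```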

(Alexander Momchilov) #84

What's the ICU runtime?

Again, that's still very niche. If I ever saw someone distinguish ASCII from Latin Extended with a mysterious let isAscii = i > 0, I would instantly reject the CR. No questions asked.

That sort of feature is much better wrapped in a method or a boolean computed property.
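A minimal sketch of that wrapping (the property name here is hypothetical):

```swift
// Hypothetical helper: hide the sign trick behind a readable name
// instead of writing `i >= 0` at every call site.
extension Int8 {
    var isASCIIByte: Bool { self >= 0 }
}

assert(Int8(65).isASCIIByte)      // "A"
assert(!Int8(-1).isASCIIByte)     // high bit set: not ASCII
```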

(Alexander Momchilov) #85

It could be better used for regex literals, grammar generators, or made into some sort of facility for supporting user DSLs.

I think Swift has more important things to do with the character, besides mimicking the look of a 50-year-old language feature* to support a largely defunct 52-year-old text encoding.

  • Single-quote character literals date back at least to the B programming language, circa 1969.

(^) #86

i mean, it’s only niche to some people. If you do any sort of work with any kind of binary format, you’re gonna need these, and need them a lot

(John Holdsworth) #87

Yes, it’s mostly a limitation of the current implementation, though it may be difficult to remove the requirement that single quoted literals be single code points. The key requirement is that you get an error at compile time when a character overflows the target type.

let d2: Int8 = '🙂'
k.swift:6:16: error: codepoint literal '128578' overflows when stored into 'Int8'

The easiest way to do this is to leverage the existing Integer literal code. This reuse imposes the limitation that a single quoted literal can only be a single code point but it does mean you can give a meaningful error message early on.
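The overflow in the diagnostic above can be checked with today's APIs: 🙂 is scalar U+1F642, whose value 128578 has no exact Int8 representation:

```swift
// The 🙂 scalar's value overflows Int8, matching the compile-time
// diagnostic shown above.
let smiley: Unicode.Scalar = "🙂"
assert(smiley.value == 128578)               // U+1F642
assert(Int8(exactly: smiley.value) == nil)   // does not fit in Int8
```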

let invalid = '👨🏼‍🚀'
k.swift:2:15: error: character not expressible as a single codepoint

Perhaps this limitation can be relaxed in the longer term if it turns out to be a problem, but for now the prototype is probably adequate, along with Chris’ draft, to move to a worthwhile review.

(^) #88

that proposal (which is mine btw though chris contributed a lot) has an important difference and that’s that the inferred type of 'a' is Character and you do 'a' as Int or let scalar:Int = 'a'. So we have a CodepointLiteral that defaults to Character, and we conform Int and friends to ExpressibleByCodepointLiteral.

Your idea if i get it right is that the inferred type of 'a' is Int and you do 'a' as Character or let character:Character = 'a' via ExpressibleByIntegerLiteral.

Personally i think the inferred type being Int makes more sense, but we also do miss out on an opportunity to provide a syntax for Character literals that don’t need an as annotation which we currently don’t have. Being able to write

let character: Character = 'a'

isn’t much of a win over the current

let character: Character = "a"

The drawback is we risk confusing people (even more) about Swift’s string model since not every valid Character can be written as a CodepointLiteral, but every valid Character can be written as a StringLiteral.
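That distinction can be illustrated with today's Character API:

```swift
// Every Character is expressible as a string literal, but only
// single-scalar Characters could be written as codepoint literals.
let astronaut: Character = "👨🏼‍🚀"   // multi-scalar grapheme cluster
let letter: Character = "a"        // single scalar
assert(astronaut.unicodeScalars.count > 1)
assert(letter.unicodeScalars.count == 1)
```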

(John Holdsworth) #89

Sorry @taylorswift, I should have mentioned this was your proposal & I’m not trying to propose anything different, but I can provide you with an implementation to meet the review bar if it helps, as I want to see this pitch succeed. Initially I tried to get away with a model where these literals were of type Int, but after a second pass they are now created using a protocol ExpressibleByCodepointLiteral, which all integer types conform to along with Unicode.Scalar and Character, so one of these literals can have any of those types.

The implementation imposes a limitation that these “codepoint” literals be only a single codepoint, as they are processed internally as integer literals. The default type is very easy to configure and should probably remain Character, to future-proof the model for a time when this restriction can perhaps be relaxed. This will confuse some people, but there is a clear error message, and we are still far better off than we are with UInt8(ascii:), supporting the majority of common 21-bit characters.