Single-quoted code unit literals

John_McCall · December 14, 2022, 6:48pm

Okay. So in your opinion, having character literals would not be useful without having integer conformances to the character-literal protocols?

jrose · December 14, 2022, 7:01pm

I’m not sure I think they’re needed at all, but I don’t consider the Collection/Element problem enough motivation on its own to make a change like this. (OptionSets have the same problem.) So a proposal that just changes Character and UnicodeScalar is uninteresting to me; I basically never use those types to begin with*, and so a syntax to construct only them makes the language a little more complicated for a use case I’ll never see.

* and generally one shouldn’t, because even grapheme clusters usually can’t be manipulated independently. Consider uppercasing “ß”, which at least historically has produced “SS”.
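A small sketch of the footnote's point, assuming the current standard library's full case mapping (the names `sharpS` and `upper` are only illustrative). It shows why `Character.uppercased()` returns a `String` rather than a `Character`:

```swift
// Case mapping is not always one-to-one, so uppercasing a single
// Character hands back a String that may hold several Characters.
let sharpS: Character = "ß"
let upper: String = sharpS.uppercased()

print(upper, upper.count)
```
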

sspringer · December 15, 2022, 12:30am

My take:

As expressed by @John_McCall, “enabling the syntax, have it require one of the two existing protocols (extended grapheme cluster / unicode scalar), and make it use Character as its default literal type”.
Analogous to let d: Double = 2, the compiler might accept let i: UInt8 = 'a' (or let myArray: [UInt8] = ['a','b']), figuring out the right integer value, and giving meaning to digit + 10 - 'a'. (Any opinion on this?)
Please no further magic with types, keep it simple. Once something is a Character or UnicodeScalar, do not subtract it from an integer.
Is then anything missing to make real use cases easier?
Please do not confound Unicode scalars in a limited range expressible by UInt8 or UInt16 with UTF-8 encoding or UTF-16 encoding. Note that you need 21 bits to express any Unicode codepoint as the corresponding number (which makes it expressible by UInt32).
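For comparison, here is a sketch of what the points above look like in Swift as it stands today, using only existing standard library API (`UInt8(ascii:)`, `Unicode.Scalar.value`, `String.utf8`). The helper `hexValue(of:)` and the constant names are hypothetical, chosen only for illustration; the single-quoted forms in the post are not valid Swift at the time of this thread.

```swift
// Today's spelling of an ASCII code unit: UInt8(ascii:) instead of 'a'.
let a = UInt8(ascii: "a")

// Hypothetical helper: the `digit + 10 - 'a'` arithmetic from the post,
// written with UInt8(ascii:). Returns nil for non-hex input.
func hexValue(of codeUnit: UInt8) -> UInt8? {
    switch codeUnit {
    case UInt8(ascii: "0")...UInt8(ascii: "9"):
        return codeUnit - UInt8(ascii: "0")
    case UInt8(ascii: "a")...UInt8(ascii: "f"):
        return codeUnit + 10 - UInt8(ascii: "a")
    default:
        return nil
    }
}

// A scalar value that fits in UInt8 is not the same thing as a UTF-8 byte:
// U+00E9 has the value 0xE9, but its UTF-8 encoding takes two code units.
let eAcute: Unicode.Scalar = "\u{E9}"
let scalarValue = eAcute.value
let utf8Bytes = Array(String(eAcute).utf8)

// The largest Unicode scalar value is U+10FFFF, which needs 21 bits.
let maxScalar: UInt32 = 0x10FFFF
let bitsNeeded = maxScalar.bitWidth - maxScalar.leadingZeroBitCount
```
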

taylorswift · December 15, 2022, 3:59am

i remember one of the things that sank the 2018 proposal was the ExpressibleByExtendedGraphemeClusterLiteral → ExpressibleByStringLiteral implication.

so this is why i recreated separate ExpressibleByCodepointLiteral and ExpressibleByCharacterLiteral protocols in the current proposal. but i accept that people do not like the idea of introducing two new protocols that do very similar things to two existing protocols we already have today. so in light of that, i would like to gauge everyone’s feelings about a potential alternative because we do have a feature in the compiler today that we did not have four years ago: marker protocols.

because the whole reason ExpressibleByCharacterLiteral exists in the proposal is so that ExpressibleByStringLiteral won’t inherit from it but if you take away that reason it doesn’t really make sense to have ExpressibleByCharacterLiteral that is just a duplicate of ExpressibleByExtendedGraphemeClusterLiteral
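A minimal sketch of the implication being discussed, using only the existing literal protocols. Because `ExpressibleByStringLiteral` refines `ExpressibleByExtendedGraphemeClusterLiteral`, which in turn refines `ExpressibleByUnicodeScalarLiteral`, any syntax keyed to the two narrower protocols would also be open to `String`. The function name `acceptsScalarLiteralType` is hypothetical.

```swift
// Hypothetical probe: compiles for any type that can be written
// as a single-scalar literal under the existing hierarchy.
func acceptsScalarLiteralType<T: ExpressibleByUnicodeScalarLiteral>(_: T.Type) -> Bool {
    true
}

// All three compile, including String, because of the refinement chain.
let scalarOK = acceptsScalarLiteralType(Unicode.Scalar.self)
let characterOK = acceptsScalarLiteralType(Character.self)
let stringOK = acceptsScalarLiteralType(String.self)

// The same double-quoted literal can take any of the three types;
// without an annotation it defaults to String.
let asScalar: Unicode.Scalar = "a"
let asCharacter: Character = "a"
let inferred = "a"
```
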

and i think a better way to approach this is to instead say which types we are going to opt in to the single quoted syntax and that we will not be enabling this syntax for String and StaticString the same way we added Sendable but said some of the types like UnsafePointer are not going to be Sendable by default.

so if we add a @_marker protocol ExpressibleBySingleQuotedLiteral that types like Character and Unicode.Scalar conform to but types like String and StaticString don’t conform to, then we can continue to use the ExpressibleByExtendedGraphemeClusterLiteral and ExpressibleByUnicodeScalarLiteral protocols.

that way single quoted Character literals won’t be range-limited and can contain extended grapheme clusters.

'🇨🇦'.property
'🇺🇸'.function()

and the new ExpressibleByASCIILiteral/ExpressibleByBMPLiteral protocols would be orthogonal to ExpressibleBySingleQuotedLiteral and types would have to conform to both.

because it is a marker protocol it isn’t in the ABI so it would backdeploy.
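A rough sketch of the opt-in idea, under stated assumptions: `@_marker` is an underscored, unofficial attribute, and `SingleQuotedSketch` and `accepts(_:)` are hypothetical names standing in for the proposed protocol. The compiler does not attach any single-quoted syntax to it; the sketch only shows a requirement-free protocol that `Character` and `Unicode.Scalar` adopt while `String` does not, and that a `Character` can already hold a multi-scalar grapheme cluster.

```swift
// Assumption: @_marker is an underscored attribute and may change.
// A marker protocol has no requirements and no runtime representation.
@_marker protocol SingleQuotedSketch {}

extension Character: SingleQuotedSketch {}
extension Unicode.Scalar: SingleQuotedSketch {}
// String and StaticString deliberately do not conform.

// Hypothetical function that only accepts the opted-in types.
func accepts<T: SingleQuotedSketch>(_ value: T) -> T { value }

// A flag is one Character made of two regional-indicator scalars,
// so the Character domain is not range-limited.
let flag: Character = "🇨🇦"
let sameFlag = accepts(flag)
let scalarsInFlag = flag.unicodeScalars.count

// accepts("x" as String) would not compile: String is not opted in.
```
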

johnno1962 · December 15, 2022, 11:10am

I agree, I was just thinking this yesterday. As this is coming up again and again, I've pushed a commit that splits the new _ExpressibleByASCIILiteral and _ExpressibleBySingleQuotedLiteral marker protocols off, rather than grafting them onto the existing hierarchy. You can then manually conform Character and Unicode.Scalar to these protocols directly.

@_marker _ExpressibleByASCIILiteral
   ↳ @_marker _ExpressibleBySingleQuotedLiteral
 
ExpressibleByUnicodeScalarLiteral
   ↳ ExpressibleByExtendedGraphemeClusterLiteral
      ↳ ExpressibleByStringLiteral

This doesn't really change the situation that I wouldn't want to see single quoted character literals accepted without the integer conversions and it seems the language working group has already adjudicated on that.

I'd find that a procedural anomaly if it weren't for the fact that it seems to be a reflection of the view of the broader Swift community.

taylorswift · December 15, 2022, 3:18pm

johnno1962:

I agree, I was just thinking this yesterday. As this is coming up again and again, I've pushed a commit that splits the new _ExpressibleByASCIILiteral and _ExpressibleBySingleQuotedLiteral marker protocols off, rather than grafting them onto the existing hierarchy. You can then manually conform Character and Unicode.Scalar to these protocols directly.
@_marker _ExpressibleByASCIILiteral
   ↳ @_marker _ExpressibleBySingleQuotedLiteral
 
ExpressibleByUnicodeScalarLiteral
   ↳ ExpressibleByExtendedGraphemeClusterLiteral
      ↳ ExpressibleByStringLiteral

thank you john! but it isn’t quite what i was suggesting, rather i was envisioning that ExpressibleBySingleQuotedLiteral would be a syntactical marker protocol, and it would be orthogonal to the literal expression domain protocols, which have runtime impact and cannot be @_marker. so, it would look like:

// new
@_marker ExpressibleBySingleQuotedLiteral
    ↳ ExpressibleByASCIILiteral // only for ASCII-restricted domains
        ↳ ExpressibleByBMPLiteral // only for BMP-restricted domains
 
// existing
ExpressibleByUnicodeScalarLiteral
    ↳ ExpressibleByExtendedGraphemeClusterLiteral
        ↳ ExpressibleByStringLiteral

extension Unicode.Scalar:ExpressibleBySingleQuotedLiteral
{
}
extension Character:ExpressibleBySingleQuotedLiteral
{
}

thoughts?

so, i was thinking about the “we don’t want users writing 'x' + 'y'” problem, and i realized this problem isn’t actually exclusive to UTF-8/UTF-16, we have a lot of types in the ecosystem today that suffer from a similar problem.

at the risk of going off topic, i want to talk about FilePath, because i think FilePath is a good case study for where we have a similar problem that does not have to do with code units.

because we usually want to think of FilePath as a collection of path components, and this kind of abstraction probably wants to support concatenation with + like:

let directory:FilePath = "Sources"
let fileID:FilePath = "Foo.swift"

let file:FilePath = directory + fileID // returns "Sources/Foo.swift"

and FilePath is also stringlike so we probably want to make it ExpressibleByStringLiteral so you could do:

self.load(textures: ["albedo.png", "specular.png", "normals.png"])

but then users would be able to write nonsensical things like

self.parse(sourceFile: "Sources" + "Foo.swift")

and we don’t want this because what is sourceFile? is it "SourcesFoo.swift" or is it "Sources/Foo.swift"?
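A self-contained sketch of the ambiguity, assuming a stand-in type rather than the real FilePath: `PathSketch`, its `components` storage, and its `+` operator are all hypothetical. It shows that the same expression `"Sources" + "Foo.swift"` means two different things depending on the type the literals are inferred as.

```swift
// Hypothetical stand-in for a path type that is both
// string-literal-expressible and concatenable by component.
struct PathSketch: ExpressibleByStringLiteral, Equatable {
    var components: [String]

    init(components: [String]) {
        self.components = components
    }

    init(stringLiteral value: String) {
        components = value.split(separator: "/").map(String.init)
    }

    var string: String { components.joined(separator: "/") }

    // Concatenates path components, not characters.
    static func + (lhs: PathSketch, rhs: PathSketch) -> PathSketch {
        PathSketch(components: lhs.components + rhs.components)
    }
}

// Literals default to String: plain string concatenation.
let asString = "Sources" + "Foo.swift"

// Same expression, but the context makes both literals PathSketch values.
let asPath: PathSketch = "Sources" + "Foo.swift"

// The spelling with explicit coercions on each literal.
let explicit = ("Sources" as PathSketch) + ("Foo.swift" as PathSketch)
```

Here `asString` is `"SourcesFoo.swift"` while `asPath.string` is `"Sources/Foo.swift"`, which is the silent difference in meaning the post is worried about.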

and what we really need is to be able to add a little bit of friction to the literal type inference so that you would always have to write:

self.load(textures: ["albedo.png", "specular.png"] as [FilePath])

self.parse(sourceFile: ("Sources" as FilePath) + ("Foo.swift" as FilePath))

and i think this is actually a more general feature that we need, and maybe it could look like an attribute on an ExpressibleBy conformance like:

extension FilePath:@noninferred ExpressibleByStringLiteral
{
}

John_McCall · December 15, 2022, 5:13pm

There was also a prior review here. The current draft proposal has an extended reverie explaining why the Core Team was wrong, and of course you’re entitled to think that and argue it in your proposal, but I don’t think you can be too surprised that the Language Workgroup still believes now what many of its members fairly clearly believed then.

taylorswift · December 15, 2022, 5:58pm

although i agree with @johnno1962's viewpoint technically, i think it is obvious that bare conformances to ExpressibleByASCIILiteral for UInt8 are controversial and detracting from the proposal.

so regardless of whether one would describe the proposal as a “reverie” (my oh my everyone's delusional except for me!) i think the only productive way forward is to limit the proposed changes to areas where there is broad agreement, namely:

@_marker ExpressibleBySingleQuotedLiteral
    ↳ ExpressibleByASCIILiteral // only for ASCII-restricted domains
        ↳ ExpressibleByBMPLiteral // only for BMP-restricted domains

and conformances to ExpressibleBySingleQuotedLiteral for Character and Unicode.Scalar that allow expressing the full range of those types with single quoted literals.

i have put a lot of thought into formulating this design in a manner that would not block off future directions for UTF string processing, and in my view at least, continuing to try and push through an omnibus bill does not make sense.