Allow more characters (like whitespace and punctuation) for escaped identifiers

Hello everyone, a quick update!

The proposal now includes the defined grammar for escaped identifiers and describes how to handle Objective-C interoperability. I will now try to move it forward by asking for feedback from the Core Team, as it looks like there are no remaining open points to address.

Thanks again to all of you who helped shape this proposal, and in particular to @allevato for helping me out with some parts of the implementation.

3 Likes

Will this enable identifiers beginning with a number? I assume it won't, but asking anyways.

Yes it will! :grin:

The proposal removes any character constraint (apart from prefix; in that case the compiler will emit a diagnostic error, since prefix is reserved for compiler internals).

My theory is that non-escaped identifiers avoid starting with a number for the sake of numeric literal parsing efficiency and (maybe) because mangling did not support it.
Escaped identifiers will not conflict with numeric parsing, and mangling does support names starting with a digit; therefore, I saw no reason to keep this limitation.

May I ask why you assumed that starting with a number would still not be enabled? I am asking to understand whether the proposal (specifically the grammar section) was not clear enough, or whether it was simply what you were expecting. If you have in mind any use case for identifiers starting with a number, I would love to hear it! :slight_smile:

1 Like

That's actually great to hear. I can think of two cases that will immediately benefit:

  • Allowing typed HTTP status codes to be identified by the raw status code (e.g. HTTPStatus.'300')
  • Allowing for asset names to be represented directly (e.g. a wrapper around SF Symbols could use identifiers that match the asset name exactly like '10.circle' instead of _10_circle.)

(I used ' in place of the backtick because Markdown formatting was tripping over it.)

I didn't read the grammar, but now that I've read it, it's clear. I assumed that this:

was the approach being taken which would've disallowed identifiers beginning with numbers (I think.)

2 Likes

With this proposal, all identifiers have two syntactic forms, only one of them being always parseable.

A bad side effect is that this proposal will break code generators, documentation generators, and other programs of this family.

A fix, for those generators, will be to escape all identifiers, just in case they could not be parsed raw.

// Welcome to the future
import `TheModule`
class `TheClass`: `TheProtocol` {
    func `theFunc`() { ... }
    func `when I told you escaping was useful`() { ... } 
}

It reminds me of generated SQL (except that almost nobody reads SQL, when we read Swift all day long):

-- Say hello to double quotes, just in case
SELECT * FROM "player" WHERE "id" = 1

Let's take a practical example, starting with our own tooling: the Swift interface generator embedded in Xcode. It escapes `default`, `self`, and the like, but preserves other identifiers without escaping them, for legibility. Thank you, this is a quality tool.

The Swift interface generator ships with a function which decides if an identifier should be escaped or not (returns true for default, false for foobar).

This function is currently simple (it only has to check for a known list of keywords).

This function will become complex.

It is likely that this function will not be easily available to third-party tooling. Those tools may just give up and escape everything, or ship with a poor man's buggy implementation.

It is likely that this function will be slow, forcing performance-focused tools to, again, escape everything.

With this proposal, all Swift identifiers can mean something in other languages.

This gives a security consideration: as long as HTML documentation generators are not "fixed" for this change, we'll see funny code appear:

func `<script>alert("pwnd")</script>`() { ... }

I stop here, but I'd be happy if the authors of the pitch would consider all those unfortunate consequences with care :-) Do we want "fallacies developers think about Swift identifiers" web pages to flourish?

2 Likes

But isn't that already the case today? You can escape identifiers today that don't actually need to be escaped:

let `x` = 5
print(x)    // 5
print(`x`)  // also 5

"More" complex, but it's difficult to imagine it being significantly more complex, based on the implementation already provided by the PR author. Now instead of checking a list of keywords, it also checks the token to determine if it contains any non-identifier-safe code points (with a special case for property wrapper dollar signs).
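As a rough illustration of how small that extended check could stay, here is a minimal sketch. It is not the compiler's or the PR's implementation: `needsEscaping`, `reservedWords`, and the two character predicates are hypothetical names, the keyword list is a tiny illustrative subset, and the character test is ASCII-only, so it conservatively asks for escaping on valid Unicode identifiers and ignores the dollar-sign special case.

```swift
// Hypothetical helper, for illustration only.
// The keyword list is a small subset of Swift's reserved words.
let reservedWords: Set<String> = [
    "class", "default", "func", "import", "let", "self", "struct", "var",
]

// ASCII-only simplification: the real grammar also admits many Unicode ranges.
func isIdentifierHead(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar {
    case "a"..."z", "A"..."Z", "_": return true
    default: return false
    }
}

// Digits are allowed everywhere except in the first position.
func isIdentifierBody(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar {
    case "0"..."9": return true
    default: return isIdentifierHead(scalar)
    }
}

// Returns true when `name` cannot be written as a plain identifier
// under the simplified rules above.
func needsEscaping(_ name: String) -> Bool {
    // Step 1: the check tools already need today.
    if reservedWords.contains(name) { return true }

    // Step 2: the additional check, every code point must be identifier-safe.
    let scalars = name.unicodeScalars
    guard let first = scalars.first, isIdentifierHead(first) else { return true }
    return !scalars.dropFirst().allSatisfy(isIdentifierBody)
}
```

With these simplified rules, `needsEscaping("default")` and `needsEscaping("10.circle")` are true, while `needsEscaping("foobar")` is false.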

Special cases are unfortunate, of course, but that brings us to the next point:

Why do you think it's unlikely for that function to be available to third-party tooling? The Swift syntax parser is already factored out into a dylib that ships with the toolchain for use by SwiftSyntax. I think it would be entirely reasonable and within the realm of possibility to add the relevant C binding to that library and Swift API to SwiftSyntax to answer the question "does this identifier need to be escaped?"

Moreover, since the list of keywords that the compiler checks is already not available to third-party tooling, those tools already have to duplicate some logic and may get it wrong. That list of keywords may also differ subtly between Swift versions (although hopefully it no longer does, for source compatibility reasons). The list of keywords where this is necessary is also somewhat non-obvious, because certain reserved words are context-sensitive. This recent thread is a great example, where the word set can be used as an identifier outside of a computed property, but in one specific location in an accessor block, it must be escaped or it's interpreted as a keyword:

struct S {
  var set = [0]  // OK, "set" is unambiguous here

  var first: Int? {
    set.first  // error: "set" treated as start of accessor
  }
}
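For completeness, a sketch of how generated code can sidestep this today. The struct mirrors the example above; the `last` property is added here only to show a second workaround.

```swift
struct S {
  var set = [0]

  // Escaping makes the parser read `set` as the property name,
  // not as the start of a setter.
  var first: Int? {
    `set`.first
  }

  // Qualifying with `self` avoids the ambiguity as well.
  var last: Int? {
    self.set.last
  }
}
```

A generator has to know that this position is special in order to pick either form, which is the kind of edge case meant here.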

So the conditions under which code generators have to mangle or escape identifiers are already somewhat fraught with edge cases, and I don't think expanding the space of valid escaped identifiers exacerbates that significantly—especially if we take the opportunity to provide clients with an API that matches the one used by the compiler, which would be an improvement over the state of things today.

Since my previous message may sound like it ruins the hopes of many, I'd like to suggest two things which may address the original motivation:

@description("test validation should succeed when input is less than ten")
func test#() { ... }

@description("test validation should fail when input is more than twenty")
func test#() { ... }

The first is a suffix (here #) which has the compiler generate a unique name for the function, preserving its prefix. The name is unknown to the programmer, but unique in the relevant scope. XCTest, for example, finds as many test prefixed selectors as expected.

The second is a free-form annotation (here @description) which is made available at runtime for whatever purpose (like printing something) - I don't know how, this is just the baby of an idea.

4 Likes

I can, but I don't. You missed the paragraph about quality generators.

Why do you think it's unlikely for that function to be available to third-party tooling?

Because generators are written in many languages, and run on many architectures, most of them won't have access to the holy dylib.

My post contains other objections. You don't have to rush :-)

Another consideration is runtime API that does, or may in the future, want to be able to parse qualified Swift symbol names, for things like dynamic type or method lookup. If identifiers are allowed to include punctuation marks like . or <, for instance, this could confuse an API that tried to look up a type by name:

struct `Foo<Int>.Bar` { }

struct Foo<T> { struct Bar { } }

let t = typeByName("Foo<Int>.Bar")

It might be prudent to keep characters that are significant in the type grammar off-limits from identifiers to avoid introducing escaping problems for runtime APIs.

13 Likes

That's not the point I was replying to, which was the statement "with this proposal, all identifiers have two syntactic forms, only one of them being always parseable". That read as if it was implying that it was this proposal that made that functionality possible, but it's already possible today. Did I misinterpret what you were saying?

I didn't miss it; more importantly, statements like these are unnecessarily antagonistic. Let's stick to the technical merits of the discussion.

I have experience writing code generators as well (I'm one of the maintainers of swift-protobuf), so I do understand the issues involved, especially when translating identifiers from one schema to another.

Generators today could take the easy way out and escape every identifier if they wanted to, because the language allows it. I don't think we've seen that to any great degree, and I don't think the chances are that much higher that we'd see it a lot more with new identifier rules. That's just conjecture on my part, but that's what your concerns were as well; do you have any concrete reasons to believe that generated code will suffer because of this change?

Code generators are also a very small subset of the day-to-day code written and read in Swift. I'm not sure that the possibility of someone writing a "bad" code generator should be a mark against a feature. And as someone who uses generated code in a number of my projects, I'm not sure I'd care that much if someone escaped all the identifiers in the generated code, because I don't look at the generated implementation that often. I'm usually more interested in viewing an interface-only API digest provided by Xcode, which would presumably only escape the identifiers that actually need it (since the escaping is not actually part of the identifier in the AST). But I realize that reasonable people may disagree on this point.

That's fine, though—the language doesn't have to provide an API for every possible language/architecture to identify identifiers. The grammar rules for identifiers in Swift are already fairly complex, especially with regard to the ranges of acceptable Unicode code points. To my knowledge, there's not an API anywhere today that allows third-parties to exactly match that in their own tooling regardless of language/architecture, so Swift providing one to third-parties who write their tools in C and Swift would still be a major improvement. And again, that would make it available to third-party tooling, satisfying the requirement in your original post; if someone chooses to write that tooling in a language that doesn't provide access to that API, then that's their choice, and they need to work around that decision.

3 Likes

Yes you did. I suggest a re-read.

That's a good point! We should definitely consider this.

One possibility would be for the API to parse the identifier the same way that the compiler would, thus requiring escaping inside the string if you wanted to handle identifiers that otherwise contained special delimiters:

struct `Foo<Int>.Bar` { }  // #1
struct Foo<T> { struct Bar { } }  // #2

let t = typeByName("Foo<Int>.Bar")  // #2
let t = typeByName("`Foo<Int>.Bar`")  // #1

There's some possible ambiguity about symbols that would need to be escaped in source but not in the string API call, like

struct `Foo Bar` {}

// Should this work? The API probably doesn't *need* to escape the
// identifier here.
let t = typeByName("Foo Bar")

// Or should we require this, for consistency with source?
let t = typeByName("`Foo Bar`")

Off the top of my head, I'm not sure I have a strong preference on this one.
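To make the "parse it the way the compiler would" option a bit more concrete, here is a minimal sketch of splitting a qualified name into components while respecting backticks. `splitQualifiedName` is a hypothetical helper, not an existing runtime API, and it only tracks backticks and angle brackets rather than the full type grammar.

```swift
// Hypothetical helper: splits a qualified name on dots, but never inside
// backticks or generic argument lists. Returns nil for unbalanced input.
func splitQualifiedName(_ name: String) -> [String]? {
    var components: [String] = []
    var current = ""
    var insideBackticks = false
    var angleDepth = 0

    for character in name {
        switch character {
        case "`":
            // Backticks toggle escaped mode and stay part of the component.
            insideBackticks.toggle()
            current.append(character)
        case "<" where !insideBackticks:
            angleDepth += 1
            current.append(character)
        case ">" where !insideBackticks:
            angleDepth -= 1
            if angleDepth < 0 { return nil }
            current.append(character)
        case "." where !insideBackticks && angleDepth == 0:
            // A top-level dot ends the current component.
            components.append(current)
            current = ""
        default:
            current.append(character)
        }
    }

    guard !insideBackticks, angleDepth == 0 else { return nil }
    components.append(current)
    return components
}
```

Under this sketch, "Foo<Int>.Bar" yields two components (declaration #2), while "`Foo<Int>.Bar`" yields a single escaped component (declaration #1).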

1 Like

Thanks @Joe_Groff, valid consideration.

I do like this option, as it feels more coherent with the approach of the proposal: it keeps the "every char is allowed because this is an escaped identifier" rule. If we go with that, I lean towards always respecting the grammar, since that matches how we statically reference a type too:

`Foo Bar`() // Valid
let t = typeByName("`Foo Bar`") // Valid
Foo Bar() // Compiler error
let t = typeByName("Foo Bar") // Runtime error

I do not fully understand whether _typeByName currently supports only Swift mangled names or also the example you mentioned; can you confirm which is the case?
If we don't currently support qualified complex type names, do you think @allevato's suggested option could be a valid one to implement if/when the runtime API supports them?

_typeByName currently only supports mangled names, that's correct, so it wouldn't immediately be a concern because the mangling handles special characters already. My concern was about hypothetical future APIs that might want to parse identifier names in their human-consumable form.

3 Likes

It also supports top-level classes by name, no?

class Foo {}

print(_typeByName("Module.Foo")!) // Module.Foo

This feels unimportant and not worth the extra complication. Does not meet the threshold imv.

1 Like

Great. There were always some edge places where I needed the identifier to start with a number.

enum Dimension {
  case `1D`
  case `2D`
  case `3D`
  ...
}

enum Union3<A, B, C> {
  case `1`(A)
  case `2`(B)
  case `3`(C)
}
5 Likes

Hi Gwendal, thanks for bringing the topic of code and documentation generator!

I do agree that the proposal, like any change to the grammar, may have an effect on this type of program, and I want to share my reasons for believing the impact may not be significant.

Most popular code generators, documentation generators and linters, such as Jazzy, Sourcery, SwiftLint and SourceDocs, use SourceKit under the hood.
Since back-ticks are currently (prior to the proposal) treated as leading and trailing trivia, tools that already support escaped identifiers will automatically support the proposed change once they update SourceKit (an update that may be required anyway for new Swift versions).
Do you have concrete examples of popular tools where this change does have an impact?

Regarding the HTML documentation security concern, I find it hard to see how this proposal can contribute to the issue.
Any printed method, identifier or comment should already be escaped, as Swift already supports many characters outside the seven-bit ASCII that HTML supports without escaping. For example, characters like the < and > signs are, in fact, part of method declarations, and they should already be escaped.
I dug deeper and checked Jazzy's implementation: it does not have the mentioned issue, because it either escapes the Swift declaration or wraps it in a code tag.
I am assuming other documentation generators do the same.
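To illustrate what "already escaped" means here, a minimal sketch of escaping a declaration before it reaches an HTML page. `htmlEscaped` is a hypothetical helper written for this post, not Jazzy's actual code; a real generator would more likely rely on its templating library.

```swift
// Hypothetical helper: replaces the characters that are significant in HTML.
func htmlEscaped(_ text: String) -> String {
    var result = ""
    for character in text {
        switch character {
        case "&": result += "&amp;"
        case "<": result += "&lt;"
        case ">": result += "&gt;"
        case "\"": result += "&quot;"
        case "'": result += "&#39;"
        default: result.append(character)
        }
    }
    return result
}

// The declaration from the security example, as it would appear in source.
let declaration = "func `<script>alert(\"pwnd\")</script>`()"

// Escaped first, then wrapped in a code tag, so the browser shows it as text.
let html = "<code>" + htmlEscaped(declaration) + "</code>"
```

The same escaping is already needed today for generic signatures such as `Foo<T>`, so a generator that handles those correctly would not treat the identifier above as markup.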

I fully agree that we should be mindful of how a change may impact not only the language but the language ecosystem too, so thanks for raising this and making me review the current state of Swift code and documentation generators.

2 Likes

That information is so out of date that when it actually was (sort of) true some people on this forum weren’t born yet. :wink:


But yes, as a former contributor to Jazzy and SwiftLint who went on to write in‐house replacements, this won’t make any significant difference to any such tools. In case you missed it, you also have the lead developer (@allevato) of Swift’s official formatter (swift-format) actively campaigning to have this added.

This would actually make things much easier. Right now I have about 200 lines of code just dedicated to producing valid identifiers. The ability to slap spaced grave accents on either end and use a file name as‐is would be so much simpler.
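A much-reduced sketch of the difference, not my actual code: `sanitizedIdentifier(from:)` and `escapedIdentifier(from:)` are hypothetical names, the sanitizer is ASCII-only, and the escaped variant assumes that backticks and line breaks stay illegal inside an escaped identifier.

```swift
// Today's approach: rewrite every character that cannot appear in a plain
// identifier (ASCII-only simplification of the real rules).
func sanitizedIdentifier(from fileName: String) -> String {
    let digits: ClosedRange<Unicode.Scalar> = "0"..."9"
    var result = ""
    for scalar in fileName.unicodeScalars {
        switch scalar {
        case "a"..."z", "A"..."Z", "0"..."9", "_":
            result.unicodeScalars.append(scalar)
        default:
            result.append("_")
        }
    }
    // A plain identifier cannot start with a digit.
    if let first = result.unicodeScalars.first, digits.contains(first) {
        result = "_" + result
    }
    return result
}

// With the proposal: use the name as-is between backticks.
// Assumption: backticks and line breaks remain forbidden inside the escape.
func escapedIdentifier(from fileName: String) -> String? {
    let forbidden: Set<Character> = ["`", "\n", "\r", "\r\n"]
    guard !fileName.isEmpty,
          !fileName.contains(where: { forbidden.contains($0) })
    else { return nil }
    return "`" + fileName + "`"
}
```

The sanitizer loses information ("10.circle" and "10_circle" collide), which is where most of the extra lines in a real implementation go; the escaped form keeps the name intact.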


But the real reason I want it is that outside the English world, I’ve found that camel case just doesn’t always cut it and there is no legible solution without access to apostrophes, hyphens and other currently invalid joiners. French is one of the least problematic, but since I know at least one other person in this thread speaks it, I’ll use it for my examples. Which of the following uses the “right” style?

  • « aujourd’hui »

    • aujourdHui
    • aujourdhui
    • aujourd_Hui
    • aujourd_hui
    • aujourdeHui
    • `aujourd’hui`
  • « Faire quelque chose jusqu’à ce moment‐là. »

    • faireQuelqueChose(jusquÀ: ceMomentLà)
    • faireQuelqueChose(jusquà: ceMomentlà)
    • faireQuelqueChose(jusqu_À: ceMoment_Là)
    • faireQuelqueChose(jusqu_à: ceMoment_là)
    • faireQuelqueChose(jusqueÀ: ceMomentLà)
    • faireQuelqueChose(jusqueÀ: ceMoment_Là)
    • faireQuelqueChose(`jusqu’à`: `ceMoment‐là`)

To me, the currently valid ones all feel weird and are hard to choose between. But I would choose the last one in a heartbeat if it were available, especially if code completion can suggest it before the problematic character is typed, and can automatically place the accents on either side of the identifier.

8 Likes

But Swift is a language based on English. Why does it matter that its grammar and syntax fit poorly with other languages?

Do you also want to support localized versions of keywords, allowing arguments to be spelled before the function name, or adding argument labels after argument values?
