[Pitch] Revisiting backtick-delimited identifiers that allow more non-identifier characters

allevato · September 7, 2024, 6:56pm

I'm proposing (again) that we allow identifiers to contain whitespace and other non-identifier characters when surrounded by backticks (full pitch write-up). For example,

import `my/cool/project/ui/navigation`

@Test func `tapping pushes the nav stack`() { /* ... */ }

Now you might be thinking: this was proposed (and rejected) as SE-0275. Four years later, I'm hoping to revisit it based on both new information and continued lived experience. I've written a new proposal (I am not the original author; thank you to @adellibovi for their work on SE-0275!) that I hope addresses the concerns, raised by both the community and the Core Team, that led to its rejection. To summarize briefly:

Descriptive test naming

One of the primary motivating use cases was descriptive test naming, and one of the reasons for rejection was that the Core Team wanted to see a testing framework design emerge that provided a different way to attach a string to a test case. That framework has now emerged as swift-testing, but users wanting a descriptive test name are required to name the test twice:

@Test("tapping pushes the nav stack")
func tappingPushesTheNavStack() { /* ... */ }

This is redundant but also introduces inconsistency; test result reports and test UIs will show the descriptive names, but other tooling (debuggers, backtraces, index data for code navigation) will use the declaration name. I believe this is a worse resting spot than if we simply allowed the function name and the descriptive name to be the same (and for reasons I discuss further in the pitch write-up, I believe the overall design of swift-testing is correct and that it should not be fundamentally changed to tackle this problem in a different way).

Module naming

The use case I wrestle with the most is module naming. For massive codebases where a project might contain hundreds or more fine-grained modules, letting developers choose their own module names doesn't scale, nor does the flat namespace that we have for all Swift modules in general.

We (Bazel users) have adopted the convention of automatically deriving module names by mangling the path to the module's build target; for example, a target labeled //my/cool/project/ui:navigation would be named my_cool_project_ui_navigation. This is "fine" but it poses challenges for the humans writing those imports since they have to always transform the build target labels into the right module name. In these massive codebases, we also try to automate as much as we can involving dependency/import management via tooling. Since this transformation is not reversible, it makes writing such tooling more difficult.
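Here is a minimal sketch of that convention. It assumes the only transformation is dropping the leading `//` and mapping `/` and `:` to `_`. The helper `derivedModuleName(fromLabel:)` and the second target label are hypothetical, and a real build system's rules may differ. The sketch also shows why the mapping cannot be inverted: two distinct labels can produce the same module name.

```swift
// Hypothetical helper: derive a Swift module name from a Bazel-style label.
// Assumes the only rules are "drop leading //" and "map / and : to _".
func derivedModuleName(fromLabel label: String) -> String {
    var body = Substring(label)
    // Drop the leading "//" workspace-root marker, if present.
    if body.hasPrefix("//") { body = body.dropFirst(2) }
    // Replace path and package separators with underscores; keep everything else.
    return String(body.map { ($0 == "/" || $0 == ":") ? "_" : $0 })
}

let original = derivedModuleName(fromLabel: "//my/cool/project/ui:navigation")
// A hypothetical second target whose directory name contains an underscore.
let lookalike = derivedModuleName(fromLabel: "//my/cool_project/ui:navigation")

print(original)               // my_cool_project_ui_navigation
// Both labels collapse to the same name, so the label cannot be recovered.
print(original == lookalike)  // true
```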

Over the years I've experimented with other designs for this problem, but ultimately I keep coming back to the same spot: anything else I've come up with adds even more complexity to serialization and search path logic. The proposed solution (letting the build system provide module aliases that have non-identifier characters) is far simpler and cleaner (and Clang modules can already do this, if you eschew the @import syntax and use path-based #includes, so those come along for free).

The new proposal also tries to go into more detail about edge cases and interactions with tooling and possible future directions that were raised as concerns.

The full write-up can be found here. The write-up also includes links to a partial implementation. Thanks for taking a look!

dmt · September 7, 2024, 9:08pm

I also used to work with modules with long names formed from paths in the file system. Therefore, I understand the motivation in this part of the proposal and definitely welcome it. I'm not sure there's such a strong need for raw identifiers for type and function names, but I don't mind either.

A couple of notes to consider:

Is there a publicly available API for getting the fully qualified name (FQN) of a type? I'm aware of _typeName, but it's underscored, so it's probably fine to break clients that depend on it. But if other functions do the same, we might want to consider disallowing . in raw identifiers.

I'd also prefer a more conservative way of specifying the set of allowed characters in a raw identifier. Rather than saying "Everything is allowed except ..." we define two lists:

Always allowed - Unicode ranges of characters that are always allowed
Allowed when at least one character from the "always allowed" list is present - operator characters and maybe something else in the future.

This way, it will be more resilient to future Unicode extensions.
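The two-list rule could be sketched as follows. The specific character classes are assumptions chosen for illustration and are not what dmt or the pitch specifies: letters, digits, and the space count as "always allowed", while math symbols and punctuation count as "allowed alongside". `isValidRawIdentifier` is a hypothetical helper.

```swift
// Hypothetical two-list rule; the character classes are illustrative stand-ins,
// not a proposed specification.
func isAlwaysAllowed(_ c: Character) -> Bool {
    // Letters, digits, and the plain space are always permitted.
    c.isLetter || c.isNumber || c == " "
}

func isConditionallyAllowed(_ c: Character) -> Bool {
    // Operator-like and punctuation characters are permitted only alongside
    // at least one always-allowed character.
    c.isMathSymbol || c.isPunctuation
}

func isValidRawIdentifier(_ text: String) -> Bool {
    // A backtick would end the identifier early, and newlines are never allowed.
    if text.isEmpty || text.contains(where: { $0 == "`" || $0.isNewline }) {
        return false
    }
    // Every character must belong to one of the two lists...
    guard text.allSatisfy({ isAlwaysAllowed($0) || isConditionallyAllowed($0) }) else {
        return false
    }
    // ...and at least one must come from the always-allowed list.
    return text.contains(where: isAlwaysAllowed)
}

print(isValidRawIdentifier("tapping pushes the nav stack")) // true
print(isValidRawIdentifier("4:2:2"))                        // true
print(isValidRawIdentifier("+"))                            // false
```

With this rule, a purely operator spelling such as `+` is rejected, so it cannot be confused with an operator. Adding a newly encoded Unicode character to either list is then an explicit decision rather than a default.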

xwu · September 7, 2024, 9:41pm

I'm not super sure how I feel about the feature overall, and I can imagine several counterarguments that could carry the day in my mind.*

But I do want to focus on one specific corner of this iteration of the proposal:

struct UsesWrapper {
  @Wrapper var `0`: Int

  func f() {
    let closure: (Int) -> Int = {
      doSomethingWith(`$0`)  // ok, refers to projected value of `0`
      return $0              // ok, refers to unnamed closure argument
    }
  }
}

Currently, for garden-variety identifiers, superfluous backticks don't change meaning. For example, I can write:

struct T { /* ... */ }
let x = T()
// or...
let x = `T`()

Now, as you've shown and no doubt know, there are places where we forbid superfluous backticks, such as "1 `+` 2" and "`$0`" [sorry, I can't put these into code voice because the rules around escaping backticks are rather counterintuitive].

Where we do decide to allow additional uses of backticks for whatever reason, I would urge that we observe the same invariant as for garden-variety identifiers. By that principle, in the specific example I quoted from your pitch the closure argument $0 would simply shadow the projected value of the property wrapper, just as it'd shadow any closure argument from an outer closure if it were nested. I think the alternative you present, where we allow backticks to mean something different (whether all the time or just contextually) presents more opportunity for confusion for the reader.

I'd also imagine that macros may want to be able to emit superfluously backtick-enclosed identifiers, and having niche cases where they mean something different would be not good from that perspective as well.
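The invariant described above can be checked against today's compiler. For an ordinary identifier, wrapping it in backticks does not change which declaration it refers to. The names in this sketch are illustrative.

```swift
struct T {
    var count = 1
}

let x = T()
let y = `T`()                     // same type as T(); the backticks add nothing
let total = x.`count` + y.count   // `count` and count name the same property
let `answer` = 42                 // declared with backticks, referenced without
print(total, answer)              // 2 42
```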

Another thing I will bring up is Unicode normalization of such identifiers.

There have been, of course, prior discussions on doing this for all identifiers, which may or may not be too late to do—we should definitely not deal with that more general question as part of this feature.

But given that backtick-delimited identifiers actively invite a larger character set and (possibly) more natural language uses, and given that the "raw identifiers" you pitch would be totally new without legacy concerns, and additionally given that we now have Swift-native implementations of Unicode normalization and now compiler passes written in Swift, do we want to build normalization into these new identifiers from the get-go?
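To make the normalization concern concrete: Swift's `String` already compares by canonical equivalence, but the two spellings below are different scalar and byte sequences. This sketch assumes, for illustration, that identifiers would be compared by their raw bytes unless normalization were built in.

```swift
// Two spellings of "café": precomposed U+00E9 vs. "e" plus combining U+0301.
let precomposed = "caf\u{E9}"
let decomposed = "cafe\u{301}"

// String equality uses canonical equivalence, so these compare equal...
print(precomposed == decomposed)             // true
// ...and hashing agrees, so a Set keeps only one of them.
print(Set([precomposed, decomposed]).count)  // 1

// The underlying scalars and UTF-8 bytes differ, so a table keyed on raw
// bytes would treat them as two distinct names.
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count) // 4 5
print(Array(precomposed.utf8) == Array(decomposed.utf8))                // false
```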

*) For example, one might advance an argument that it's probably a good idea that tests with very long names should also have a short name such that the current status quo in swift-testing where they're named twice is a feature and not a bug. I haven't really tried lately, but at least for a while Swift compiler tests had a recommended max character count for their names so that they would display properly in charts and graphs.

xwu · September 7, 2024, 10:06pm

Besides the point about not changing behavior in the presence of superfluous backticks, I should mention that we already have language precedent on this issue.

This is because we have a small number of implicit argument names that don't start with $, such as error in catch blocks, newValue in willSet, etc.

Here, the behavior is clear: you cannot use backticks to refer to an explicitly named thing in an outer scope—it still refers to the implicit argument which shadows that outer thing:

struct T {
  let newValue: Int = 42
  var x: Int { willSet { print(`newValue`) } }
}
var y = T(x: 42)
y.x = 24 // prints `24`, not `42`

Now, I do understand that currently in the Swift grammar we treat $0 in closures as something other than a bona fide identifier, but diverging the behavior on that basis feels a wee bit language lawyer-y.

allevato · September 7, 2024, 10:14pm

String(reflecting:), when passed a metatype (Any.Type), calls _typeName under the hood and produces the qualified name. I don't know if the format has ever been made into a documented guarantee anywhere, but folks use it, so I don't think we could make arbitrary changes to it now.

At the same time, it's not something that you can really round-trip back to another API to get the metatype back in a supported way (that I'm aware of). Since this would be adding purely new kinds of names to the set of possible outputs, it wouldn't technically be changing any behavior (only adding new behaviors), but we'd still need to be considerate of what people are using that API for and what they would expect to get back if they handed it a raw-identified type (for example, if they were parsing it themselves).
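The following sketch shows both outputs. The exact module prefix in `String(reflecting:)` depends on how the file is built, so only the suffix is relied on here. The naive `split` stands in for a hypothetical client that parses the name itself.

```swift
// A nested type to inspect.
struct Outer {
    struct Inner {}
}

let qualified = String(reflecting: Outer.Inner.self)    // module-qualified
let unqualified = String(describing: Outer.Inner.self)  // last component only
print(qualified)    // e.g. "main.Outer.Inner", depending on the module name
print(unqualified)  // Inner

// A client parsing the qualified name itself might split on ".".
let components = qualified.split(separator: ".").map(String.init)
print(components.suffix(2)) // ["Outer", "Inner"]
// If a component could itself contain ".", this split would no longer
// recover the right components.
```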

I'm open-minded about additional rules regarding which characters are or aren't permitted, but I'm also concerned that those kinds of carve-outs can quickly become arbitrary. We already have some very silly situations today in Swift's categorization ( is an operator, but is an identifier). There are definitely improvements to be had around categorization as a whole, and it feels like something best looked at holistically rather than done specifically for raw identifiers.

xwu · September 7, 2024, 10:16pm

I kind of love that you know this, and that we all know it now too.

allevato · September 7, 2024, 10:54pm

xwu:

Where we do decide to allow additional uses of backticks for whatever reason, I would urge that we observe the same invariant as for garden-variety identifiers. By that principle, in the specific example I quoted from your pitch the closure argument $0 would simply shadow the projected value of the property wrapper, just as it'd shadow any closure argument from an outer closure if it were nested. I think the alternative you present, where we allow backticks to mean something different (whether all the time or just contextually) presents more opportunity for confusion for the reader.

I'd also imagine that macros may want to be able to emit superfluously backtick-enclosed identifiers, and having niche cases where they mean something different would be not good from that perspective as well.

This is definitely a valid concern and interpretation! I'll disagree however on the basis that the language syntax defines $0 as not a regular "identifier" but as a "dollar-identifier". You do mention this as being "language lawyer-y" and I was inclined to coyly agree with that at first. But since you brought up macros, on that basis the distinction between the two token types actually does matter and is already important end-user-facing information since macros have made the entire language syntax into public API.

For the sake of this discussion, a "dollar identifier" is any identifier that starts with $ followed by one or more digits; $0 is a dollar identifier, but $someWrappedProperty is a regular identifier (one that includes the dollar sign).
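The definition above can be expressed as a small classifier over token spellings. This is a hypothetical sketch: `TokenKind` and `classify(_:)` are not swift-syntax APIs. It looks only at the spelling, not at the surrounding context a real lexer would consider.

```swift
// Hypothetical token kinds mirroring the definition above.
enum TokenKind: Equatable {
    case dollarIdentifier  // "$" followed by one or more digits, e.g. $0
    case identifier        // everything else, e.g. $someWrappedProperty
}

func classify(_ spelling: String) -> TokenKind {
    guard spelling.first == "$" else { return .identifier }
    let rest = spelling.dropFirst()
    // Only ASCII digits after the dollar sign make a dollar identifier.
    let allDigits = !rest.isEmpty && rest.allSatisfy { $0.isASCII && $0.isNumber }
    return allDigits ? .dollarIdentifier : .identifier
}

print(classify("$0"))                   // dollarIdentifier
print(classify("$12"))                  // dollarIdentifier
print(classify("$someWrappedProperty")) // identifier
```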

For regular identifiers, backticks don't change the meaning, which is why I can rationalize the behavior of newValue that you mentioned; newValue is a regular identifier injected into the accessor's scope so wrapping it like `newValue` doesn't change its meaning, just as x is equivalent to `x`.

But likewise, backticks also already change the meaning of some symbols today: just as `for` changes the meaning of for from a keyword to an identifier, `$0` would change the meaning of the dollar-identifier $0 into a regular identifier.

If it weren't for the fact that property wrappers and closure arguments share the same prefix, I could see this shaking out differently. But if we want to allow purely numeric identifiers (for which I think there are valid use cases), treating `$0` as a dollar identifier instead of a regular identifier would forbid property wrappers for those identifiers. I think the potential for confusion is unlikely enough that a special exception to the rule would do more harm than good.
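For reference, the shared `$` prefix already coexists in today's Swift. A projected value such as `$count` is a regular identifier, while `$0` names a closure argument. Backticks also already turn a keyword such as `for` into an identifier. The wrapper and property names in this sketch are illustrative.

```swift
// A minimal property wrapper with a projected value.
@propertyWrapper
struct Wrapper {
    var wrappedValue: Int
    var projectedValue: String { "projected \(wrappedValue)" }
}

struct UsesWrapper {
    @Wrapper var count: Int = 0

    func labels(for values: [Int]) -> [String] {
        // $count is a regular identifier naming the projected value;
        // $0 is the dollar identifier for the closure's first argument.
        values.map { "\($count) \($0)" }
    }
}

// Backticks turn the keyword `for` into an ordinary identifier.
let `for` = "keyword used as a name"

print(UsesWrapper().labels(for: [1, 2])) // ["projected 0 1", "projected 0 2"]
print(`for`)                             // keyword used as a name
```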

Given the existing syntactic distinction between identifiers and dollar-identifiers, I don't think this situation would actually arise in practice. Macros operating in a closure context need to be aware of that distinction already today if they're processing things all the way down at a token level, and code that touches identifier-kind tokens to potentially wrap them in backticks simply wouldn't/shouldn't do the same for dollarIdentifier-kind tokens.

This is a very good idea if there aren't any other hurdles to implementing it (i.e., how close are we to shedding the C++ parser and only using the Swift parser in the compiler?).

After years of writing language tooling and trawling for edge cases, trust me, there are a lot of things I wish I didn't know.

xwu · September 7, 2024, 11:14pm

Yeah, between our repurposing of $ as a "regular" identifier for projected values and this pitch to make 0 a "regular" identifier (even if mandatorily backticked), I see no justification other than historical accident for $0 to be a special class of identifier rather than exactly like newValue: a regular identifier injected into the relevant scope.

There is no need obviously to rip out how it's internally represented in the compiler, but my point is that the distinction ought not be user-facing (modulo swift-syntax) and I'd like us not to leak that into observable language semantics.

DevAndArtist · September 8, 2024, 2:43am

Since this seems like it would support the following case, I'm optimistically supportive of the pitch:

enum Dimension {
  case `2D`
  case `3D`
}

ksluder · September 8, 2024, 2:46am

Likewise, I recently needed to make an enum of chroma subsampling modes and resorted to prefixing everything with YCbCr because I couldn't just say case `422`, etc. Though if this pitch were implemented, I wonder if I'd be tempted to use case `4:2:2`.
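The workaround described above might look like the following, with the human-readable spelling moved into a raw value. The enum and case names are hypothetical. The constraint being worked around is that today an identifier cannot start with a digit or contain `:`.

```swift
// Hypothetical enum: a YCbCr prefix makes each case a valid identifier,
// and the conventional notation lives in the raw value.
enum ChromaSubsampling: String, CaseIterable {
    case yCbCr444 = "4:4:4"
    case yCbCr422 = "4:2:2"
    case yCbCr420 = "4:2:0"
}

print(ChromaSubsampling.yCbCr422.rawValue)                // 4:2:2
print(ChromaSubsampling(rawValue: "4:2:0") == .yCbCr420)  // true
print(ChromaSubsampling.allCases.map(\.rawValue))         // ["4:4:4", "4:2:2", "4:2:0"]
```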

DevAndArtist · September 8, 2024, 5:32am

Additionally this might bring us a step closer for creating a variadic generic Either / OneOf enum with orthogonally projected cases that start with an index.

// straw man code
enum Either<each T> {
  case `0`(T[0])
  case `1`(T[1])
  …
}

j-f1 · September 10, 2024, 12:38am

Another thing to consider here (which I just ran into) is that if let $0 { ... } is not valid syntax in today’s Swift — you have to name the closure argument. I’m not quite sure how this should intersect with rethinking “dollar identifiers” like this. Note that if let $foo { ... } from a property wrapper’s projected value is legal.

Filozoff · September 14, 2024, 5:20pm

+1 for an alternative way to write test names. Kotlin has had this for a very long time, and I find it quite useful.

Joe_Groff · October 22, 2024, 6:47pm

Since it'll probably come up a lot in this thread in particular, I thought it'd be worth the reminder that Markdown includes HTML, so you can still wrap code in <code> tags as an alternative to backticks and write <code>\`code with backticks\`</code> to get `code with backticks`.

FlorianPircher · October 22, 2024, 7:18pm

You can also use double backticks and a space on either side, like so:

`` `code` ``

renders as: `code`

ksluder · October 22, 2024, 7:43pm

I distinctly remember trying that when I saw Xiaodi’s post. I believe there’s been at least one Discourse update since then, so perhaps this is a recent improvement!

grynspan · October 22, 2024, 10:45pm

I'll just say that Swift Testing is open to adopting this functionality if it is added (we'll need to find a way to produce a machine-readable "canonical" test ID for such a function; my problem, not yours).

Edit: We'd likely also need a replacement for _typeName() that can reliably produce a string or sequence of strings for a name that might contain stray periods.
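One possible shape for such a replacement, offered only as a hypothetical sketch. The qualified name is kept as a list of components, and when a single string is needed, any component containing a period is quoted in backticks. The function names are invented here and are not Swift Testing or standard library API. The scheme assumes a component can never itself contain a backtick.

```swift
// Hypothetical encoding: join components with ".", quoting any component
// that contains a period. Assumes components never contain a backtick.
func encodeQualifiedName(_ components: [String]) -> String {
    components
        .map { component in
            component.contains(where: { $0 == "." }) ? "`\(component)`" : component
        }
        .joined(separator: ".")
}

// Inverse of the encoding: split on periods that are not inside backticks.
func decodeQualifiedName(_ text: String) -> [String] {
    var components: [String] = []
    var current = ""
    var inBackticks = false
    for ch in text {
        switch ch {
        case "`":
            // Toggle quoting; the backticks are not part of the name.
            inBackticks.toggle()
        case "." where !inBackticks:
            // An unquoted period separates components.
            components.append(current)
            current = ""
        default:
            current.append(ch)
        }
    }
    components.append(current)
    return components
}

let parts = ["MyTests", "tapping pushes the nav stack.v2"]
let encoded = encodeQualifiedName(parts)
print(encoded)                               // MyTests.`tapping pushes the nav stack.v2`
print(decodeQualifiedName(encoded) == parts) // true
```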