SE-0275: Allow more characters (like whitespaces and punctuations) for escaped identifiers

Just for the record now that there are other ideas flying around:

  • If the proposal were constrained to enable no new characters, but only to allow the grave accents to escape operators as well as identifiers, I would be disappointed, but I would still support it.
  • If the proposal were constrained so that the accents only allowed a few privileged new characters (such as only space), I would shift to be in opposition. This would obliterate many of the use cases brought up in the pitch phase (many of which received no mention in the proposal itself). Examples include direct use of paths and URLs, easier code generation from user strings, improved legibility options for foreign scripts (especially caseless ones), etc. On its own, “I’d rather name tests with spaces instead of camel case.” just doesn’t pull its weight in my opinion.

Doesn't Swift use name-mangling when finalizing the results of compiling functions into object code (to support overloads)? And doesn't the normal name of the function affect what the mangled name will be? If yes to both, then doesn't this proposal actually affect ABI to a degree? We would have to define how exotic identifiers are mapped onto the mangled namespace.

1 Like

The mangling for these added symbols is in fact already defined. The same punycode-based encoding that gets used for Unicode characters in names also works for ASCII non-alphanumeric characters.

6 Likes
  • What is your evaluation of the proposal?

-1

  • Is the problem being addressed significant enough to warrant a change to Swift?

No. The ability to reference an operator such as + seems worth doing, but not the arbitrary characters in identifiers.

  • Does this proposal fit well with the feel and direction of Swift?

As the author said, "Swift has a beautiful concise yet expressive syntax."

Beautiful: the back ticks are ugly and make visually parsing the language unpleasant. I would not want to work in a file that was littered with this mess.

Concise: if your identifiers are concise, this feature isn't needed (the long identifiers with spaces being one of the motivating statements).

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

People mention testing without using something like Quick as a reason to support this but I don't see that this proposal would improve anything in that regard. Prose in identifiers is not an improvement. Add a comment if a concise but descriptive identifier name isn't sufficient.

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Read the proposal, read all the comments, imagined what a file full of identifiers like this would be like to look at and edit and concluded that there really doesn't seem to be any need for this and the results will be less enjoyable to interact with.

11 Likes

Thanks Owen for the feedback and review!

Good that you raised this and I want to clarify it.
Newlines are indeed scoped out of the proposal. Even though the grammar only mentions \r and \n, the rest are not allowed either; some are, in fact, invalid characters, and Swift will warn about them and remove them.

There is a toolchain available if you want to play around with it :)
https://ci.swift.org/job/swift-PR-toolchain-osx/467//artifact/branch-master/swift-PR-28966-467-osx.tar.gz

Hi @michelf, that is indeed an important topic, so let me explain more.

Expanding the grammar for escaped identifiers will still respect any current semantic constraints.
As an example, $ dollar identifiers are still compiler-reserved names, so `$identifierNames` will produce an error. This means that even if the parser allows you to potentially declare something, it does not necessarily mean that semantically you can.

The same goes for operators: if an escaped identifier can be "processed", in its context, as an operator, then it will be an operator, and any logic associated with that set of identifiers will apply.

As for this, it is an operator function. For example, if you move it inside the declaration of a struct, the compiler will report "Operator '+' declared in type 'Int' must be 'static'", just as it would for the non-escaped version.

Thanks Jeremy for giving some insight into the pitch discussion.
I want to slightly correct this sentence and make it clear. We do not explicitly check the first character; instead, the compiler checks whether the whole identifier can be an operator (obviously, if the first character is not an operator character, we already know that the whole identifier is not an operator). This choice didn't increase complexity; it actually made things simpler, as those checks were already in place. For the record, `+a`() would be considered a function.
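To make the "whole identifier" rule concrete, here is a rough sketch of that classification. It is not the compiler's code: `asciiOperatorCharacters`, `EscapedNameKind`, and `classify` are names made up for illustration, and the character set is a simplified ASCII-only stand-in for Swift's real operator character set, which also covers many Unicode code points.

```swift
// Simplified, ASCII-only stand-in for Swift's operator characters (assumption).
let asciiOperatorCharacters: Set<Character> = [
    "/", "=", "-", "+", "!", "*", "%", "<", ">", "&", "|", "^", "~", "?"
]

// Hypothetical result type: what the text between backticks is treated as.
enum EscapedNameKind {
    case operatorName
    case identifier
}

/// Classifies the text between backticks by looking at every character,
/// not just the first one.
func classify(_ name: String) -> EscapedNameKind {
    let allOperatorCharacters = !name.isEmpty
        && name.allSatisfy { asciiOperatorCharacters.contains($0) }
    return allOperatorCharacters ? .operatorName : .identifier
}
```

Under these assumptions, `+` and `<=>` come out as operators, while `+a` comes out as a regular identifier because it contains a non-operator character.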

Yes, referencing operators was considered a nice side effect and not really the main goal.

I was playing around with BDD and you could do (take it as a proof of concept) something like this:

    func `test account has sufficient funds`() {
        given(`the account balance is`(100.dollars)) {
            and(`the card is valid`)
        }
        .when(`the account holder requests`(20.dollars))
        .then(`the account balance should be`(80.dollars))
    }

The different methods (`the account balance is`, `the card is valid`, etc.) can be used for setting up the test or asserting a particular condition. I personally find it exciting to be able to explore opportunities like this one. I also understand, and respect, that some of us may still prefer the camelCase option (or not).

That said, Michel, I hope this helps with your review. In particular, I hope it clarifies your doubts about operators; if not, please let me know what other questions you have, and I would be happy to answer them :)

Hi TJ,

Thanks for raising the topic about complexity of implementation.
I want to share that the whole proposal fits in a Pull Request with 118 additions and 61 deletions. This may give you a better idea.

Generally negative, except for the side effect of being able to reference operator functions as static members, which I think is useful.

Maybe. The proposal mentions a few “problems”. I think the issue of naming test cases does warrant some kind of change to Swift, although I’m not convinced this is the best way to handle that issue. Being able to arbitrarily name functions in normal code, I think is a non-goal, and not worth “fixing”. It would hurt readability of normal non-test code.

If accepted, the backtick syntax fits well with the language, and there is precedent in using it to escape identifiers.

Kotlin has a similar feature, but a far more restrictive set of valid characters. I far prefer Kotlin’s stance over this proposal.

I followed the pitch thread, read the proposal, but did not review the actual proposed implementation PR.

9 Likes

Thanks for your thoughts, Svein.

I want to say something about this topic, as it also came up in the pitch thread.

Poor choices of identifier names are an already existing issue. Developers are fundamentally free to name their identifiers the way they want, i.e., the compiler will not tell us if an identifier does not follow the Swift API guidelines, so if we want to have something unreadable, we already can.

The proposal wants to give developers and teams an opportunity to weigh different options and pick what feels like a meaningful and readable choice for them, whatever that choice is.

These clarifications are very sensible and greatly change my evaluation of what’s being proposed. However, the draft implementation does not suffice on its own as documentation of the proposed behavior and all of this information should be in the proposal itself.

It is critical for the evaluation of this proposal that it is laid out clearly in prose form what, in the end, will be an allowed operator or non-operator identifier.

9 Likes

Yes, I understand that. And I feel that it is a bad idea, hence my feedback. I am well aware that people are already free to write ugly code, but this will make it easier to do so, at very little benefit as far as I'm concerned. I've yet to see convincing (to me) examples of code that would benefit from this proposal.

I understand that you and others feel differently about this issue.

3 Likes

I like what this pitch wants to achieve, but -1 on the current implementation. Perhaps some examples would make me more positive, but my feeling is that functional code that liberally used this feature would quickly become a mess of backticks and become quite ugly/confusing, and where it was used sparingly, escaped calls/definitions would look jarring and inconsistent with everything else.

Test code is one place where I can definitely understand the benefit - but even there, I can only see people really using it for the names of their test methods, and those are rarely referenced from other code, which sort of defeats the point.

There's also the benefit of referencing operators, but there might be better ways to enable that.

4 Likes

I still have unanswered questions about operators from my previous post. Namely, do you need to use the labels when calling the operator directly like you'd do when calling a function?

    // given this
    func +(lhs: Int, rhs: Int) -> Int
    // which one is valid?
    let three = Int.`+`(1, 2)
    let three = Int.`+`(lhs: 1, rhs: 2)

Normally, parameter labels are enforced for regular functions, but at least up to now have been meaningless for operator functions. Does this proposal make them meaningful when called directly?
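For reference, this is how operator functions behave in Swift today, independent of the proposal. A small sketch using a made-up `Money` type: the `lhs`/`rhs` labels in the declaration never appear at a call site, whether the operator is used infix or referenced as a function value (function values never carry labels anyway). Whether the proposed Int.`+` spelling would require labels is the open question, and this sketch doesn't answer it.

```swift
// Hypothetical type used only for illustration.
struct Money: Equatable {
    var cents: Int

    // The labels `lhs` and `rhs` are never written at a call site.
    static func + (lhs: Money, rhs: Money) -> Money {
        Money(cents: lhs.cents + rhs.cents)
    }
}

let a = Money(cents: 100)
let b = Money(cents: 250)

// Infix use: no labels.
let infixTotal = a + b

// Referencing the operator as a function value: still no labels.
let add: (Money, Money) -> Money = (+)
let referencedTotal = add(a, b)
```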

Also, are these declarations allowed?

    class `%` {}
    struct `...` {}
    enum `=` { case `?` }
    let `+`: (`=`, `%`) -> `...`
    func `*`(`#`: Int, `@`: Int) -> Int

I'm not suggesting anyone do this, but I wonder if you can actually use operator-like identifiers everywhere.

That seems like a lot of boilerplate to write a test that is basically like this:

    func `test account has sufficient funds`() {
        account.balance = 100.dollars
        card.valid = true
        account.holder.request(20.dollars)
        XCTAssertEqual(account.balance, 80.dollars)
    }

This might just be unfamiliarity, but I had to search and find this before I could understand your code and rewrite it to something familiar. Even then, I'm not too sure if those pseudo-control-flow functions actually do anything (given, when, and then) or why they need to be chained.

You're writing an intermediary API sitting between your actual tested API and the actions you want to test. It might be nice that usage of this intermediary API reads like prose with actual spaces, but the whole concept of testing using an intermediary layer seems counterproductive to me.

5 Likes

Thank you for your response! I wasn't actually too concerned with the implementation complexity, though. It's things like "even if the parser allows you to potentially declare something, it does not necessarily mean that semantically you can."

This is a fundamental change in how we should think about identifiers and explaining them because "many" things are valid but many aren't. This could just be that I'm viewing this as a change instead of coming to it fresh. (I suffer from this with access modifiers)

Conceptually +1, and I had it in the back of my mind to propose something similar for a separately motivated use case.

But, I think the current proposal leaves a lot open to interpretation as it's currently written which is potentially causing confusion. There was a lot of good discussion in the original thread that went into more specific details about edge cases and that didn't get captured in the proposal write-up, which is unfortunate:

  • Which, if any, identifiers are still banned, even when escaped, and what is the rationale for that? (For example, it appears that $-prefixed identifiers are still reserved for compiler/debugger use.)

  • When an identifier contains both operator and non-operator code points, how does the compiler determine whether it is an operator or a regular identifier? (It appears that it is an operator if all code points are operator code points.)

  • The grammar for escaped identifiers excludes the backtick (U+0060) as a valid code point. Is it worth discussing, perhaps as a future direction, the ability to include backticks in identifiers by using a syntax inspired by raw string literals (e.g., #`this`has`some`backticks`#)?

  • Joe Groff's point about how a future type-lookup-by-name API would handle these identifiers is an important one. The proposal doesn't have to offer a complete solution now, but it should acknowledge it and briefly discuss possible approaches (for example, having the API understand backticks and parsing them out of the string the same way the compiler would parse the identifier) so that it's clear that this proposal doesn't completely block any future progress in that area.

  • The proposal's motivation only focuses on the test use case. Other use cases were mentioned in the discussion thread that (arguably) would strengthen the motivation, such as generated names for accessors of assets that start with non-identifier characters (`10_circle`). Indeed, when writing code generators, being able to escape such identifiers is a much better solution than mangling them, because mangling them sets off a chain reaction—you must then ensure that the mangled name doesn't collide with some other identifier in the same scope, ad infinitum.

Having this be clear in the proposal is important not just for review purposes, but because the proposal serves as future documentation/specification for the feature later on if it is accepted.
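To illustrate the code-generator chain reaction mentioned in the last bullet, here is a rough sketch of the kind of ad hoc mangling a generator has to do today. `mangle` and `uniqueNames(for:)` are hypothetical helpers, not part of any existing tool, and the replacement rules are just one plausible choice.

```swift
/// Hypothetical ad hoc mangling: replace anything that is not an ASCII
/// letter, ASCII digit, or underscore with "_", and prefix a leading digit.
func mangle(_ assetName: String) -> String {
    var result = String(assetName.map { (ch: Character) -> Character in
        let isASCIIAlphanumeric = ch.isASCII && (ch.isLetter || ch.isNumber)
        return (isASCIIAlphanumeric || ch == "_") ? ch : "_"
    })
    if let first = result.first, first.isNumber {
        result = "_" + result
    }
    return result
}

/// Maps each asset name to a unique identifier, appending a numeric suffix
/// whenever the mangled name is already taken.
func uniqueNames(for assetNames: [String]) -> [String: String] {
    var used = Set<String>()
    var mapping = [String: String]()
    for name in assetNames {
        let base = mangle(name)
        var candidate = base
        var suffix = 2
        // The suffixed name can collide too, so keep checking.
        while used.contains(candidate) {
            candidate = base + "_\(suffix)"
            suffix += 1
        }
        used.insert(candidate)
        mapping[name] = candidate
    }
    return mapping
}
```

With this sketch, the assets "10_circle" and "10 circle" both mangle to "_10_circle", so the second one has to be renamed. With escaped identifiers, the generator could emit each name as-is between backticks.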

Which code points are allowed

The proposal as written currently allows any Unicode code point other than U+000A, U+000D, and U+0060 inside an escaped identifier. In my opinion, this proposal should allow users to escape more than the existing set of identifier and operator code points for use in identifiers, but I also think there's room to discuss further reasonable exceptions.
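As a sketch of that rule as I read it (not the compiler's implementation), the check amounts to rejecting three scalars. `isAllowedEscapedIdentifierBody` is a made-up name, and requiring a non-empty body is my assumption.

```swift
/// Returns true when `body` (the text between the backticks) is non-empty
/// and contains none of U+000A, U+000D, or U+0060.
func isAllowedEscapedIdentifierBody(_ body: String) -> Bool {
    let excluded: Set<Unicode.Scalar> = ["\u{000A}", "\u{000D}", "\u{0060}"]
    return !body.isEmpty
        && body.unicodeScalars.allSatisfy { !excluded.contains($0) }
}
```

Semantic restrictions, such as the reserved $-prefixed names, would sit on top of a purely lexical check like this one.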

As one example, the colon (:) is not currently permitted as an identifier or operator code point, but I would like to allow it because my own use case depends on it (being able to name a module something of the form `//foo/bar:baz`).

So in general, I don't think we should arbitrarily forbid printable meaningful characters just because they aren't already valid operator/identifier code points. Backticks currently signify "this sequence of code points is not currently a valid identifier because it has meaning as a reserved word, so I need to escape it", and I view this proposal as generalizing that statement to "this sequence of code points is not currently a valid identifier, so I need to escape it".

With regard to such printable characters, I would be disappointed and less likely to support its acceptance if it were saddled with additional restrictions, like "only existing operator/identifier code points are allowed, but also space", or "these printable characters, but not these". If a character or sequence does need to be banned, it should be backed by technical rationale (such as $-prefixed identifiers being reserved by the compiler/debugger), not by subjective aesthetic calls.

With that being said, there are also a number of unprintable/control code points that probably ought to be discussed and likely omitted. Other folks replying here have already mentioned other Unicode line/paragraph separators. I certainly don't think we want an identifier with embedded nulls. What about the other U+0001–U+001F code points not already mentioned, or U+007F–U+009F? Should other Unicode spaces be permitted (U+00A0, U+2000, etc.), or just U+0020?
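One possible way to frame those questions is in terms of Unicode general categories, which the standard library exposes through `Unicode.Scalar.Properties`. This is only a sketch of how the categories line up with the code points above; the helper names are made up, and it does not claim to be the rule the proposal should adopt.

```swift
/// True for control characters (general category Cc, which covers
/// U+0000–U+001F and U+007F–U+009F) and for the Unicode line and
/// paragraph separators (Zl, Zp).
func isControlOrLineBreak(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.properties.generalCategory {
    case .control, .lineSeparator, .paragraphSeparator:
        return true
    default:
        return false
    }
}

/// True for space separators (Zs) other than U+0020.
func isNonASCIISpace(_ scalar: Unicode.Scalar) -> Bool {
    scalar.properties.generalCategory == .spaceSeparator && scalar.value != 0x20
}
```

A restriction written this way would at least be backed by a category-level rationale rather than a hand-picked list of characters.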

Yes. In addition to the author's original motivation of improving test method naming and making it possible to refer to operator functions by name, I would love to follow this up with some frontend changes that allow me to give Swift modules names that match their Bazel target labels. This would be a huge win for the usability of Bazel builds in and out of Google.

Yes. Swift already goes farther than many languages used today with respect to Unicode support in language identifiers. This feature would give us a lot of new capabilities:

  • Cleaner test method names vs. camelCasing
  • Ability to reference operator methods by name
  • Fewer Anglo-centric constraints around identifiers (one poster in the original discussion referenced French and other Romance languages where apostrophes are commonplace as parts of words)
  • A better "escape hatch" for code generators and other names compared to ad hoc mangling schemes

Some reviewers argue that some or all of the above capabilities could be had without opening the door to the much wider set of Unicode code points in an escaped identifier. That may be true, but I don't find the argument that we'll end up with a significant amount of unreadable or ugly code to be all that compelling—it simply doesn't align with the reality of what we've seen in 5+ years of Swift.

From day one, Swift has supported emoji in identifiers (to the degree that it's shown off in the official documentation!), but we haven't had an epidemic of libraries released where their usage depends on writing strange hieroglyphs in your code. Similarly, you can make incredibly confusing names today by swapping out characters that look the same but are semantically different, like Latin uppercase A and Greek uppercase Alpha, but this doesn't happen in reality. Even if we limit ourselves to 7-bit ASCII, no language or compiler that I'm aware of prevents the identifier O0O0O000OOOO0 or III11ll1lIIlIl11IlI1.
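As a quick illustration of the look-alike point, here is a small sketch comparing Latin A with Greek Alpha as strings. `scalarValues` is a helper made up for this example.

```swift
let latinA = "A"             // U+0041 LATIN CAPITAL LETTER A
let greekAlpha = "\u{0391}"  // U+0391 GREEK CAPITAL LETTER ALPHA

/// Returns the Unicode scalar values that make up a string.
func scalarValues(_ text: String) -> [UInt32] {
    text.unicodeScalars.map { $0.value }
}

// The two render almost identically in most fonts, but they are
// different strings made of different scalars.
let looksTheSameButDiffers = latinA != greekAlpha
```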

I trust most programmers to make sensible, grown-up decisions. If this proposal is accepted and one of your teammates or contributors tries to name a method `*!@$%`, squash it in a code review. If you want to have tighter rules about identifiers in your own code base, apply a linter rule. If some developer releases a library that is hard to use because it has poorly named escaped identifiers, then that library is unlikely to see much adoption, so that problem essentially solves itself. I'm not sure what exactly we need to be protected from, but it's not something that this proposal will be responsible for.

I would hate to see the number of legitimate and good use cases that this proposal would enable blocked because of a hypothetical fear that programmers are going to lose their sensibilities when it comes to naming conventions.

No; I wasn't aware F# or Kotlin had similar features, but I'm glad that there is prior art there.

Participated in the original discussion thread and on the PR.

12 Likes

This.

7 Likes

-1. I'm strongly against this proposal for several reasons:

  • It makes basic tooling harder to build, because the simple notion of "what's an identifier that can name things in Swift" is no longer simple. GitHub syntax-highlighting var in the var `some var` = 0 example from the proposal is one example. If you double-click on either of the words within the back-ticks, there is no editor or tool out there that will properly select the whole identifier. And because back-ticks aren't a balanced set of delimiters, you literally have to scan to the beginning of the source line to determine what is the "identifier".

  • Putting unstructured text within the back-ticks cuts off any future evolution that might put more structured information within back-ticks. For example, providing the ability to name zero-parameter functions or to name the accessors of properties as a function entity, e.g., x.`getter:property` or x.`zeroParameterMethod()`, which cannot be named today. Similarly for subscripts. Some of this was discussed a long time ago, but still seems like a good direction for the language. We shouldn't cut off future directions for a small win.
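To make the tooling point in the first bullet concrete, here is a rough sketch of what a "select the whole identifier" feature would have to do: scan from the start of the line and pair up back-ticks. `escapedIdentifier(in:containing:)` is a made-up helper, and it deliberately ignores back-ticks inside string literals and comments, which a real tool would also need to handle.

```swift
/// Returns the escaped identifier (without its back-ticks) that contains the
/// character at `offset`, or nil if that character is not between back-ticks.
/// Back-ticks are paired by scanning from the start of the line.
func escapedIdentifier(in line: String, containing offset: Int) -> String? {
    let characters = Array(line)
    var openIndex: Int? = nil
    for (index, character) in characters.enumerated() where character == "`" {
        if let start = openIndex {
            // Closing back-tick: the identifier body is start+1 ..< index.
            if offset > start && offset < index {
                return String(characters[(start + 1)..<index])
            }
            openIndex = nil
        } else {
            // Opening back-tick.
            openIndex = index
        }
    }
    return nil
}
```

For the line from the proposal, var `some var` = 0, clicking on either word between the back-ticks resolves to the full name "some var", while clicking on the leading var keyword resolves to nothing.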

No. There are two motivating use cases as far as I can tell:

  • Having spaces in the names of test functions. This is a small convenience, and we can probably improve this case in another way.

  • Being able to reference an operator by name, e.g., Int.`+`. This one could be addressed by allowing Int.+ in the grammar. It's more discoverable and should be straightforward.

    Doug

36 Likes

I'm with @Douglas_Gregor and a few others on this one, I'm -1.

I've always viewed the back-tick solution as a backup feature that was an ugly but necessary evil for very certain edge cases. This proposal brings it to the forefront as a 1st class language feature where I think a more elegant solution could be found.

I can see the one thing that everyone likes is the ability to do Int.'+' (replace single quotes with back-ticks), and I agree that's a really nice thing to have. But like @Douglas_Gregor said, it's nicer to have it without the back-ticks at all. I can't comment on the implementation side, but from a user point of view it's discoverable and makes sense whilst also feeling swifty.

I have no experience of using test cases with spaces like the example, but while I see the appeal at the call site, the declaration itself is very convoluted and difficult to parse when skimming through code. I think a better solution is out there for this, possibly something along the lines of property wrappers but for functions, but for me the back-ticks solution isn't it.

I like the desire behind the proposal, but not the solution.

12 Likes

-1

a) Snake case is perfectly fine for the case with the test-method names.
b) Back-ticks are okay if unavoidable for edge cases. Better is without.
c) Referencing operators can be solved separately.
+) Back-ticks hurt readability and are a complication.

9 Likes

-1. I don't believe the proposal as written would be a good step forward for Swift.

Aspects of it are, and I think improvements can be made here, I just am not in favor of this specific proposal.

From a technical position, as far as I can tell, the proposal cannot be implemented. It suggests:

This proposal wants to extend the current grammar for every escaped identifier (properties, methods, types etc...) by allowing every Unicode scalar.

...but of course, backtick shouldn't be allowed inside the backticks :-).

My more general concern is that this is taking a very important grammar production and throwing the door way open, completely occupying the space that may be important for other reasons. Taking this proposal would mean that it is impossible to introduce a grammar into the backquotes to refer to aggregate names (e.g. the getter on a decl), for example. It isn't clear that we would want to do that, but that's the thing about the future: it is hard to predict.

I would be more in favor of specific limited expansions, e.g. if it were important to allow a horizontal space in backticks then we could consider that. OTOH, I consider that to have very marginal value over using underscore: I don't see how:

    func `foo bar`() {
    }

is better than:

    func foo_bar() {
    }

Finally, if such an expansion of the grammar takes place, we should incorporate something about this into the Swift API Naming Guidelines.

No.

Detailed review.

-Chris

18 Likes