SE-0275: Allow more characters (like whitespaces and punctuations) for escaped identifiers

sveinhal · January 14, 2020, 8:00am

Generally negative, except for the side effect of being able to reference operator functions as static members, which I think is useful.

Maybe. The proposal mentions a few “problems”. I think the issue of naming test cases does warrant some kind of change to Swift, although I’m not convinced this is the best way to handle that issue. Being able to arbitrarily name functions in normal code, I think is a non-goal, and not worth “fixing”. It would hurt readability of normal non-test code.

If accepted, the backtick syntax fits well with the language, and there is precedent in using it to escape identifiers.

Kotlin has a similar feature, but a far more restrictive set of valid characters. I far prefer Kotlin’s stance over this proposal.

I followed the pitch thread, read the proposal, but did not review the actual proposed implementation PR.

adellibovi · January 14, 2020, 8:44am

Thanks for your thoughts, Svein.

I want to say something about this topic, as it also came up in the pitch thread.

Poor choices of identifier names are an already existing issue. Developers are fundamentally free to name their identifiers the way they want: the compiler will not tell us if an identifier does not follow the Swift API guidelines, so if we want something unreadable, we can already have it.

The proposal wants to give developers and teams an opportunity to weigh different options and pick what feels like a meaningful and readable choice for them, whatever that choice is.

xwu · January 14, 2020, 11:25am

These clarifications are very sensible and greatly change my evaluation of what’s being proposed. However, the draft implementation does not suffice on its own as documentation of the proposed behavior and all of this information should be in the proposal itself.

It is critical for the evaluation of this proposal that it is laid out clearly in prose form what, in the end, will be an allowed operator or non-operator identifier.

sveinhal · January 14, 2020, 12:31pm

Yes, I understand that. And I feel that it is a bad idea, hence my feedback. I am well aware that people are already free to write ugly code, but this will make it easier to do so, at very little benefit as far as I'm concerned. I've yet to see convincing (to me) examples of code that would benefit from this proposal.

I understand that you and others feel differently about this issue.

Pampel · January 14, 2020, 1:58pm

I like what this pitch wants to achieve, but -1 to the current implementation. Perhaps some examples would make me more positive, but my feeling is that functional code that liberally used this feature would quickly become a mess of backticks and quite ugly/confusing, and where it was used sparingly, escaped calls/definitions would look jarring and inconsistent with everything else.

Test code is one place where I can definitely understand the benefit - but even there, I can only see people really using it for the names of their test methods, and those are rarely referenced from other code, which sort of defeats the point.

There's also the benefit of referencing operators, but there might be better ways to enable that.

michelf · January 14, 2020, 2:10pm

I still have unanswered questions about operators from my previous post. Namely, do you need to use the labels when calling the operator directly like you'd do when calling a function?

    // given this
    func +(lhs: Int, rhs: Int) -> Int
    // which one is valid?
    let three = Int.`+`(1, 2)
    let three = Int.`+`(lhs: 1, rhs: 2)

Normally, parameter labels are enforced for regular functions, but at least up to now have been meaningless for operator functions. Does this proposal make them meaningful when called directly?
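For comparison, here is a minimal sketch of how operator functions can already be referenced in Swift today, without the proposed `Int.`+`` spelling. It does not answer what the proposal would do with labels; it only shows that a function value obtained from an operator is called without them. The names `add`, `three`, and `total` are illustrative.

```swift
// Current Swift: wrapping an operator in parentheses yields a function value.
let add: (Int, Int) -> Int = (+)

// Function values never carry argument labels, so the call is unlabeled.
let three = add(1, 2)

// An operator can also be passed directly where a function is expected.
let total = [1, 2, 3].reduce(0, +)

print(three, total)
```

Whether a direct call through an escaped operator name would behave like this unlabeled form or like a regular labeled function call is exactly the open question raised above.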

Also, are these declarations allowed?

    class `%` {}
    struct `...` {}
    enum `=` { case `?` }
    let `+`: (`=`, `%`) -> `...`
    func `*`(`#`: Int, `@`: Int) -> Int

I'm not suggesting anyone do this, but I wonder if you can actually use operator-like identifiers everywhere.

adellibovi:

I was playing around with BDD and you could do (take it as a proof of concept) something like this:

    func `test account has sufficient funds`() {
        given(`the account balance is`(100.dollars)) {
            and(`the card is valid`)
        }
        .when(`the account holder requests`(20.dollars))
        .then(`the account balance should be`(80.dollars))
    }

That seems like a lot of boilerplate to write a test that is basically like this:

    func `test account has sufficient funds`() {
        account.balance = 100.dollars
        card.valid = true
        account.holder.request(20.dollars)
        XCTAssertEqual(account.balance, 80.dollars)
    }

This might just be unfamiliarity, but I had to search and find this before I could understand your code and rewrite it to something familiar. Even then, I'm not too sure if those pseudo-control-flow functions actually do anything (given, when, and then) or why they need to be chained.

You're writing an intermediary API sitting between your actual tested API and the actions you want to test. It might be nice that usage of this intermediary API reads like prose with actual spaces, but the whole concept of testing using an intermediary layer seems counterproductive to me.

griotspeak · January 14, 2020, 5:02pm

Thank you for your response! I wasn't actually too concerned with the implementation complexity, though. It's things like "even if the parser allows you to potentially declare something, it does not necessary mean that semantically you can."

This is a fundamental change in how we should think about identifiers and explain them, because "many" things are valid but many aren't. This could just be that I'm viewing this as a change instead of coming to it fresh. (I suffer from this with access modifiers.)

allevato · January 14, 2020, 7:24pm

Conceptually +1, and I had it in the back of my mind to propose something similar for a separately motivated use case.

But I think the proposal as currently written leaves a lot open to interpretation, which is potentially causing confusion. There was a lot of good discussion in the original thread that went into more specific details about edge cases and that didn't get captured in the proposal write-up, which is unfortunate:

Which, if any, identifiers are still banned, even when escaped, and what is the rationale for that? (For example, it appears that $-prefixed identifiers are still reserved for compiler/debugger use.)
When an identifier contains both operator and non-operator code points, how does the compiler determine whether it is an operator or a regular identifier? (It appears that it is an operator if all code points are operator code points.)
The grammar for escaped identifiers excludes the backtick (U+0060) as a valid code point. Is it worth discussing, perhaps as a future direction, the ability to include backticks in identifiers by using a syntax inspired by raw string literals (e.g., #`this`has`some`backticks`#)?
Joe Groff's point about how a future type-lookup-by-name API would handle these identifiers is an important one. The proposal doesn't have to offer a complete solution now, but it should acknowledge it and briefly discuss possible approaches (for example, having the API understand backticks and parse them out of the string the same way the compiler would parse the identifier) so that it's clear that this proposal doesn't completely block any future progress in that area.
The proposal's motivation only focuses on the test use case. Other use cases were mentioned in the discussion thread that (arguably) would strengthen the motivation, such as generated names for accessors of assets that start with non-identifier characters (`10_circle`). Indeed, when writing code generators, being able to escape such identifiers is a much better solution than mangling them, because mangling them sets off a chain reaction—you must then ensure that the mangled name doesn't collide with some other identifier in the same scope, ad infinitum.

Having this be clear in the proposal is important not just for review purposes, but because the proposal serves as future documentation/specification for the feature later on if it is accepted.
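The code generator point in the list above can be made concrete with a small sketch. The mangling scheme below is hypothetical (it is not taken from any real generator), and the escaped spellings it produces would only be valid identifiers if this proposal were accepted. It shows three distinct asset names collapsing into one mangled identifier, while escaping keeps them distinct.

```swift
// Hypothetical mangling: keep ASCII letters and digits, replace everything
// else with "_", and prefix "_" if the result would start with a digit.
func mangledIdentifier(for assetName: String) -> String {
    var result = ""
    for character in assetName {
        if character.isASCII && (character.isLetter || character.isNumber) {
            result.append(character)
        } else {
            result.append("_")
        }
    }
    if let first = result.first, first.isNumber {
        result = "_" + result
    }
    return result
}

// Escaping instead of mangling: the name is wrapped, not rewritten.
// Assumes the name contains no backtick, line feed, or carriage return.
func escapedIdentifier(for assetName: String) -> String {
    return "`" + assetName + "`"
}

let assets = ["10_circle", "10-circle", "10 circle"]
let mangled = assets.map(mangledIdentifier(for:))
let escaped = assets.map(escapedIdentifier(for:))

print(Set(mangled).count) // distinct mangled names
print(Set(escaped).count) // distinct escaped names
```

A generator using the mangled form would need a second pass to detect and resolve the collision, and that pass could itself collide with other names in scope.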

Which code points are allowed

The proposal as written currently allows any Unicode code point other than U+000A, U+000D, and U+0060 inside an escaped identifier. In my opinion, this proposal should allow users to escape more than the existing set of identifier and operator code points for use in identifiers, but I also think there's room to discuss further reasonable exceptions.

As one example, the colon (:) is not currently permitted as an identifier or operator code point, but I would like to allow it because my own use case depends on it (being able to name a module something of the form `//foo/bar:baz`).

So in general, I don't think we should arbitrarily forbid printable meaningful characters just because they aren't already valid operator/identifier code points. Backticks currently signify "this sequence of code points is not currently a valid identifier because it has meaning as a reserved word, so I need to escape it", and I view this proposal as generalizing that statement to "this sequence of code points is not currently a valid identifier ~~because it has meaning as a reserved word~~, so I need to escape it".

With regard to such printable characters, I would be disappointed and less likely to support its acceptance if it were saddled with additional restrictions, like "only existing operator/identifier code points are allowed, but also space", or "these printable characters, but not these". If a character or sequence does need to be banned, it should be backed by technical rationale (such as $-prefixed identifiers being reserved by the compiler/debugger), not by subjective aesthetic calls.

With that being said, there are also a number of unprintable/control code points that probably ought to be discussed and likely omitted. Other folks replying here have already mentioned other Unicode line/paragraph separators. I certainly don't think we want an identifier with embedded nulls. What about the other U+0001–001F code points not already mentioned, or U+007F–U+009F? Should other Unicode spaces be permitted (U+00A0, U+2000, etc.), or just U+0020?
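To make the difference between the two rule sets easier to see, here is a rough sketch. `isAllowedAsWritten` encodes the rule as described in this thread (everything except U+000A, U+000D, and U+0060). `isAllowedStricter` is only one possible tightening, an assumption for illustration, that also rejects control characters and Unicode line/paragraph separators. Neither function is taken from the proposal's implementation.

```swift
// Rule as described in the thread: any scalar except LF, CR, and backtick.
func isAllowedAsWritten(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.value {
    case 0x000A, 0x000D, 0x0060:
        return false
    default:
        return true
    }
}

// Hypothetical stricter rule: additionally reject control characters
// (U+0000–U+001F, U+007F–U+009F) and Unicode line/paragraph separators.
func isAllowedStricter(_ scalar: Unicode.Scalar) -> Bool {
    guard isAllowedAsWritten(scalar) else { return false }
    switch scalar.properties.generalCategory {
    case .control, .lineSeparator, .paragraphSeparator:
        return false
    default:
        return true
    }
}

// Returns the first scalar in `name` that the given rule rejects, if any.
func firstRejectedScalar(in name: String,
                         using rule: (Unicode.Scalar) -> Bool) -> Unicode.Scalar? {
    return name.unicodeScalars.first { !rule($0) }
}
```

Under the rule as written, an embedded null or U+2028 passes; under the stricter sketch both are rejected, while the space and the colon remain allowed. Whether other Unicode spaces should pass is left open here, as it is in the discussion.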

Yes. In addition to the author's original motivation of improving test method naming and making it possible to refer to operator functions by name, I would love to follow this up with some frontend changes that allow me to give Swift modules names that match their Bazel target labels. This would be a huge win for the usability of Bazel builds in and out of Google.

Yes. Swift already goes farther than many languages used today with respect to Unicode support in language identifiers. This feature would give us a lot of new capabilities:

Cleaner test method names vs. camelCasing
Ability to reference operator methods by name
Fewer Anglo-centric constraints around identifiers (one poster in the original discussion referenced French and other Romance languages where apostrophes are commonplace as parts of words)
A better "escape hatch" for code generators and other names compared to ad hoc mangling schemes

Some reviewers argue that some or all of the above capabilities could be had without opening the door to the much wider set of Unicode code points in an escaped identifier. That may be true, but I don't find the argument that we'll end up with a significant amount of unreadable or ugly code to be all that compelling—it simply doesn't align with the reality of what we've seen in 5+ years of Swift.

From day one, Swift has supported emoji in identifiers (to the degree that it's shown off in the official documentation!), but we haven't had an epidemic of libraries released where their usage depends on writing strange hieroglyphs in your code. Similarly, you can make incredibly confusing names today by swapping out characters that look the same but are semantically different, like Latin uppercase A and Greek uppercase Alpha, but this doesn't happen in reality. Even if we limit ourselves to 7-bit ASCII, no language or compiler that I'm aware of prevents the identifier O0O0O000OOOO0 or III11ll1lIIlIl11IlI1.

I trust most programmers to make sensible, grown-up decisions. If this proposal is accepted and one of your teammates or contributors tries to name a method `*!@$%`, squash it in a code review. If you want to have tighter rules about identifiers in your own code base, apply a linter rule. If some developer releases a library that is hard to use because it has poorly named escaped identifiers, then that library is unlikely to see much adoption, so that problem essentially solves itself. I'm not sure what exactly we need to be protected from, but it's not something that this proposal will be responsible for.

I would hate to see the number of legitimate and good use cases that this proposal would enable blocked because of a hypothetical fear that programmers are going to lose their sensibilities when it comes to naming conventions.

No; I wasn't aware F# or Kotlin had similar features, but I'm glad that there is prior art there.

Participated in the original discussion thread and on the PR.

Avi · January 14, 2020, 7:46pm

This.

Douglas_Gregor · January 15, 2020, 5:45pm

-1. I'm strongly against this proposal for several reasons:

It makes basic tooling harder to build, because the simple notion of "what's an identifier that can name things in Swift" is no longer simple. GitHub syntax-highlighting var in the var `some var` = 0 example from the proposal is one example. If you double-click on either of the words within the back-ticks, there is no editor or tool out there that will properly select the whole identifier. And because back-ticks aren't a balanced set of delimiters, you literally have to scan to the beginning of the source line to determine what is the "identifier".
Putting unstructured text within the back-ticks cuts off any future evolution that might put more structured information within back-ticks. For example, providing the ability to name zero-parameter functions or to name the accessors of properties as a function entity, e.g., x.`getter:property` or x.`zeroParameterMethod()`, which cannot be named today. Similarly for subscripts. Some of this was discussed a long time ago, but still seems like a good direction for the language. We shouldn't cut off future directions for a small win.
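The tooling point above, that back-ticks are not balanced delimiters, can be illustrated with a small sketch. The function below is a hypothetical helper, not part of any real editor: to decide whether a cursor position sits inside an escaped identifier, it has to pair back-ticks from the start of the line, because an opening and a closing back-tick look identical. It deliberately ignores string literals and comments, which a real tool would also need to handle.

```swift
// Hypothetical helper: returns the range of the back-tick-delimited span
// (including both back-ticks) that contains `position`, or nil if none does.
func escapedIdentifierRange(in line: String,
                            containing position: String.Index) -> Range<String.Index>? {
    var openTick: String.Index? = nil
    var current = line.startIndex
    while current < line.endIndex {
        if line[current] == "`" {
            if let start = openTick {
                // This back-tick closes the span opened at `start`.
                let end = line.index(after: current)
                if start <= position && position < end {
                    return start..<end
                }
                openTick = nil
            } else {
                // This back-tick opens a new span.
                openTick = current
            }
        }
        current = line.index(after: current)
    }
    return nil // No closed span contains the position.
}

let sampleLine = "var `some var` = 0"
// Offset 10 is the "v" of the second "var", inside the back-ticks.
let cursor = sampleLine.index(sampleLine.startIndex, offsetBy: 10)
if let range = escapedIdentifierRange(in: sampleLine, containing: cursor) {
    print(sampleLine[range])
}
```

With bracket-like delimiters a tool could scan outward from the cursor; here the only reliable approach is counting from the beginning of the line.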

No. There are two motivating use cases as far as I can tell:

Having spaces in the names of test functions. This is a small convenience, and we can probably improve this case in another way.
Being able to reference an operator by name, e.g., Int.`+`. This one could be addressed by allowing Int.+ in the grammar. It's more discoverable and should be straightforward.

Doug

dlbuckley · January 15, 2020, 10:10pm

I'm with @Douglas_Gregor and a few others on this one, I'm -1.

I've always viewed the back-tick solution as a backup feature that was an ugly but necessary evil for very specific edge cases. This proposal brings it to the forefront as a first-class language feature where I think a more elegant solution could be found.

I can see the one thing that everyone likes is the ability to do Int.`+`, and I agree that's a really nice thing to have. But like @Douglas_Gregor said, it's nicer to have it without the back-ticks at all. I can't comment on the implementation side, but from a user point of view it's discoverable and makes sense whilst also feeling swifty.

I have no experience of using test cases with spaces like the example, but while I see the appeal at the call site, the declaration itself is very convoluted and difficult to parse when skimming through code. I think a better solution is out there for this, possibly something along the lines of property wrappers but for functions, but for me the back-tick solution isn't it.

I like the desire behind the proposal, but not the solution.

phlebotinum · January 15, 2020, 10:34pm

-1

a) Snake case is perfectly fine for the case with the test-method names.
b) Back-ticks are okay if unavoidable for edge cases. Better is without.
c) Referencing operators can be solved separately.
+) Back-ticks hurt readability and are a complication.

Chris_Lattner3 · January 15, 2020, 10:49pm

-1. I don't believe the proposal as written would be a good step forward for Swift.

Aspects of it are, and I think improvements can be made here, I just am not in favor of this specific proposal.

From a technical position, as far as I can tell, the proposal cannot be implemented. It suggests:

This proposal wants to extend the current grammar for every escaped identifier (properties, methods, types etc...) by allowing every Unicode scalar.

...but of course, backtick shouldn't be allowed inside the backticks :-).

My more general concern is that this is taking a very important grammar production and throwing the door way open, completely occupying the space that may be important for other reasons. Taking this proposal would mean that it is impossible to introduce a grammar into the backquotes to refer to aggregate names (e.g. the getter on a decl), for example. It isn't clear that we would want to do that, but that's the thing about the future: it is hard to predict.

I would be more favorable toward specific limited expansions, e.g. if it were important to allow a horizontal space in backticks then we could consider that. OTOH, I consider that to have very marginal value over using underscore: I don't see how:

    func `foo bar`() {
    }

is better than:

    func foo_bar() {
    }

Finally, if such an expansion of the grammar takes place, we should incorporate something about this into the Swift API Naming Guidelines.

No.

Detailed review.

-Chris

compnerd · January 16, 2020, 4:40am

-1

This seems to add support for something which generally increases the complexity of the language, for a feature which, it seems, can be accommodated in other ways.

No. The strongest motivation seems to be testing usages. This is a reasonable request, but it seems better as associated "metadata".

No.

Yes, testing with it-style declarations in JavaScript offers a similar level of self-documenting code, with string metadata associated with the function. This feels like trying to augment the syntax for something similar, which doesn't seem necessary to support the functionality.
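As a rough illustration of the "description as metadata" idea, here is a minimal sketch in Swift. It is not XCTest and not any existing framework: `DescribedTest`, `TestSuite`, `it`, and `isValid` are hypothetical names, and the suite simply pairs a free-form string with a closure that reports success as a Bool.

```swift
// Hypothetical: a test is a free-form description plus a body.
struct DescribedTest {
    let description: String
    let body: () -> Bool
}

// Hypothetical registry in the spirit of JavaScript's it("...", fn).
struct TestSuite {
    private(set) var tests: [DescribedTest] = []

    mutating func it(_ description: String, _ body: @escaping () -> Bool) {
        tests.append(DescribedTest(description: description, body: body))
    }

    // Runs every test and returns the descriptions of the ones that failed.
    func failingDescriptions() -> [String] {
        return tests.filter { !$0.body() }.map { $0.description }
    }
}

// Code under test, made up for this example.
func isValid(_ input: Int) -> Bool {
    return input < 10
}

var suite = TestSuite()
suite.it("validation should succeed when input is less than ten") {
    isValid(9)
}
suite.it("validation should fail when input is ten or more") {
    !isValid(12)
}

let failures = suite.failingDescriptions()
print(failures.isEmpty ? "all passed" : "failed: \(failures)")
```

The description can contain any characters a string literal allows, without touching the identifier grammar; the trade-off is that the test is no longer a named function that other code or tools can reference.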

A quick read

adellibovi · January 16, 2020, 7:34am

Hi Chris, thanks for the feedback, I want to share a correction about this.

Backticks are indeed not allowed; in fact, this is specified in the proposal, in the grammar details section (I used the Unicode value, which may be why you missed it; anyway, I will take it as something to improve in future proposals).
It is therefore very important to clarify that the proposal can be implemented, and in fact already is implemented, in a PR attached to the proposal together with a working toolchain.

adellibovi · January 16, 2020, 8:13am

Hi @Douglas_Gregor thank you very much for the review!

Even if this topic comes from 5 years ago, I do agree with you that it could be interesting and good for Swift.
Without going into the details of a different proposal, I would like to ask you (and @Chris_Lattner3, since he raised a similar lost opportunity) why we should necessarily consider that this proposal completely cuts off something like your suggestions. I do understand that it would be considered a "source breaking change", but it is also true that a conflict would be rather unlikely: in order to conflict with getter:property, someone would need to have defined a func/property named that way. Do you feel that we should completely avoid that, no matter if it may not happen often or even at all?

GitHub syntax highlighting is unfortunately already broken with the current support for backticks: something like var `class` won't be correctly highlighted (as class is treated as a keyword). This should not be read as "it is already bad, so it is not important". Rather, I believe that extending support for, and therefore increasing usage of, the back-tick syntax may actually help prioritize and improve syntax highlighting.
It is worth mentioning, to make it clear for all readers, that this obviously won't apply to Xcode, which will support the feature natively if the proposal is accepted.

Thanks again,

Lantua · January 16, 2020, 6:45pm

I second this. I think this proposal should be updated a few more times.

It also doesn't explain very well that ` is still not part of the identifier. Should the escaped identifier without ` be a valid identifier, it will be treated as that identifier, and should the escaped identifier without ` be a valid operator, it will be treated as that operator. This is an important part of the design that we probably want to make clear that it is not changing.
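The existing behavior described here can be checked in Swift as it stands today. In the small sketch below (the names are illustrative), the back-ticks only escape the token; they are not part of the resulting name.

```swift
// `default` is a keyword, so back-ticks are required to use it as a name.
let `default` = 10

// limit is not a keyword; the back-ticks are allowed but redundant.
let `limit` = `default` * 2

// Both spellings refer to the same constant.
let doubled = limit + `limit`

// A function named after a keyword is declared and called with back-ticks.
func `repeat`(_ text: String, times: Int) -> String {
    return String(repeating: text, count: times)
}

print(doubled, `repeat`("ab", times: 2))
```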

I also agree with this. I would add that we might accidentally include many more characters that could be annoying to work with. Whitelisting might be better than blacklisting in this case.

broadway_lamb · January 16, 2020, 7:49pm

-1. I think the problem that this proposal aims to solve is not on the language side, but rather on the testing framework side, and therefore should be solved by different means.

If we did go with the language-level change, I'd rather prefer this issue to be addressed using custom attributes:

    @TestDescription("validation should succeed when input is less than ten")
    func testValidation() {
        // ...
    }

Besides, I agree with @Douglas_Gregor regarding syntax highlighting, which will be broken. It is already hard enough to properly highlight Swift code (even today GitHub fails to do it in some cases, e.g. with contextual keywords).

I believe the problem described in the Motivation section is not broad enough to be solved with such a radical syntax change. As people have already mentioned, there's nothing wrong with using underscores instead of whitespace. I personally have never found a need to use whitespace in my test method names, and I'm quite an active XCTest user.

IMO, it does not.

I have not.

I've read through the text of the proposal.

adellibovi · January 16, 2020, 11:07pm

Thank you, everyone, for the important feedback you have all been sharing.
I opened a PR with amendments to clarify the doubts that have been raised over the past few days.
You can find the full diff here: Amend changes to SE-0275 by adellibovi · Pull Request #1113 · apple/swift-evolution · GitHub

numist · January 16, 2020, 11:07pm

There's some interesting prior art here in SQL; the language standard allows identifiers to be arbitrarily named—including whitespace—requiring double-quotes around identifiers and single-quotes around string literals. Engines tend to allow the omission of double quotes for identifiers that match [a-zA-Z0-9]+ (as this proposal must imply for source compatibility), and some engines also allow double quoted string literals when there is no ambiguity.

In my experience this permissiveness (and complexity: the rules for how a token may be interpreted are now three levels deep!) adds far more confusion than benefit, and I feel like it is not in Swift's best interests to support relatively arbitrary symbol naming.