Classification of Identifier

krzyzanowskim · February 25, 2019, 12:28am

Looks like Identifier tokens are not classified as SwiftSyntax.SyntaxClassification.identifier. Quick question: is this intentional? if so, why?

see:

github.com

apple/swift/blob/cb1f8194742fbd2598ed5324ff4fad1784f71b5c/utils/gyb_syntax_support/Token.py#L315


      
                             protocol='ExpressibleByColorLiteral'),
          
          
Literal('IntegerLiteral', 'integer_literal',
                  classification='IntegerLiteral', serialization_code=111),
          Literal('FloatingLiteral', 'floating_literal',
                  classification='FloatingLiteral', serialization_code=112),
          Literal('StringLiteral', 'string_literal',
                  classification='StringLiteral', serialization_code=113),
          
          
Misc('Unknown', 'unknown', serialization_code=115),
          Misc('Identifier', 'identifier', classification=None,
               serialization_code=105),
          Misc('UnspacedBinaryOperator', 'oper_binary_unspaced',
               serialization_code=107),
          Misc('SpacedBinaryOperator', 'oper_binary_spaced', serialization_code=108),
          Misc('PostfixOperator', 'oper_postfix', serialization_code=110),
          Misc('PrefixOperator', 'oper_prefix', serialization_code=109),
          Misc('DollarIdentifier', 'dollarident', classification='DollarIdentifier',
               serialization_code=106),
          
          
Misc('ContextualKeyword', 'contextual_keyword', classification='Keyword',

akyrtzi · February 25, 2019, 1:13am

Not sure if it's intentional, this was added by @ahoppen, but I noticed it as well and I believe it should tag those tokens as identifier, and have lit-test-helper optionally ignore them for its testing output (because the classification output is too noisy with them).

I intend to make such a change after I finish my WIP rewrite of the classification mechanism that makes it much more efficient.

ahoppen · February 25, 2019, 8:12am

This is intentional. I'll explain a bit how the syntax classification algorithm works and why it is necessary.

Some preliminaries

There is the syntax classification None (the Python keyword) and "None" (the string). "None" means that a token should not be syntax coloured, None states that the colouring should be inherited from the parent node. If there is no syntax colouring specified by any parent node, a colouring of "None" is assumed.

Why do we need the inheritance?

IdentifierTokens appear all over the place where they aren't identifiers as far a syntax colouring is concerned. A very simple example is the TypeIdentifier. If an identifier is determined to be a type, it should be coloured as a type and not as a plain identifier. Another interesting example is the availability version restriction where the macOS in @available(macOS 10.13, *) is an identifier but should be coloured as a keyword.

How is the issue solved?

All tokens except IdentifierToken have an intrinsic syntax colouring (i.e. a TypealiasToken is always a token, there is nothing to argue about). That intrinsic colouring can be overwritten if the Child that contains the token has force_classification set to true but that's only relevant for parentheses in string interpolation, which are weird anyway.

As described above the syntax colouring of an IdentifierToken depends on its context. In all cases where the identifier is e.g. a variable name, no context is set and thus the colouring defaults to "None" (hence has no syntax colouring as you would expect). In https://github.com/apple/swift/blob/master/utils/gyb_syntax_support/TypeNodes.py#L6 (the TypeIdentifier) we override the default for the identifier token to be a TypeIdentifier since we want the contained identifier to be coloured as a type. We cannot force the classification because the Self token should still be coloured as a keyword.
I believe, here is also a case where the inheritance of colouring goes more than one level up but I can't remember which one it was from the top of my head.

I hope that explains your question. If you'd like to see the differences, set the syntax colouring of an IdentifierToken to "None" and run the tests. That should give you an overview of where the inheritance is needed (it's quite a lot of places).

krzyzanowskim · February 25, 2019, 11:31am

Thanks @ahoppen!

As described above the syntax colouring of an IdentifierToken depends on its context.

Do you know a code snippet where this actually works? I'm trying to make it happen, though the Identifier is never classified - I guess there is never a context available? Or did I misunderstood and it not suppose to work out of the box for any case.

akyrtzi · February 25, 2019, 3:58pm

I understand the need for "inheritance" but why do we assume no syntax coloring for normal identifiers ? That should be a decision for the client, e.g. imagine a client that wants to render normal identifiers with italics. Inheriting the classification from the parent when required and tagging a normal identifier as identifier do not seem contradictory to me.

ahoppen · February 25, 2019, 4:36pm

I understand the need for "inheritance" but why do we assume no syntax coloring for normal identifiers ? That should be a decision for the client, e.g. imagine a client that wants to render normal identifiers with italics. Inheriting the classification from the parent when required and tagging a normal identifier as identifier do not seem contradictory to me.

I just mimicked the behaviour that sourcekitd previously provided in which identifiers were not syntax coloured. I assume that should be easy to fix by either
a) setting the classification of IdentiferExpr in ExprNodes.py:72 to be Identifier or
b) changing the default classification in SyntaxClassifier.swift.gyb:38 from .none to .identifier

I am leaning towards a) being the cleaner approach, but I'm not 100% sure of the top of my head. Would need to check how the results differ.

ahoppen · February 25, 2019, 4:39pm

According to the test coloring.swift lines 11 - 18 it should work in

struct S {
  var x : Int
}

In there Int should be coloured as a type while it is an IdentifierToken.

krzyzanowskim · February 25, 2019, 4:43pm

Yes, this works. I expected it work with:

func foo() {
    self.x
}

in this context, x is an identifier.

akyrtzi · February 25, 2019, 4:49pm

I see, thanks for clarifying. It was beneficial for sourcekitd to reduce size of de/serialization, which is why we avoided tagging normal identifiers, but for SwiftSyntax this doesn't apply.

As I mentioned earlier I'd like to hold on before making classifier changes, I'm close to creating a PR for changes for it being more efficient. We can make changes to tag identifiers soon after.

ahoppen · February 25, 2019, 6:12pm

I am leaning towards a) being the cleaner approach, but I'm not 100% sure of the top of my head. Would need to check how the results differ.

Just a quick follow-up on how the results differ. With approach a) obviously only variables in expressions are coloured. With approach b) among others, type names in type declarations, enum cases and function names are also classified as identifiers. If we are to change this, I would suggest hand-picking the places where identifiers should be classified as an identifier using approach a)

krzyzanowskim · June 29, 2019, 7:40pm

As far as I'm aware, we never back to tag identifiers, did we?

akyrtzi · July 1, 2019, 9:41pm

No, opened PR here: [utils/gyb_syntax_support] Add the classification kind for identifiers by akyrtzi · Pull Request #25925 · apple/swift · GitHub

krzyzanowskim · July 2, 2019, 3:53pm

awesome! thanks for handling it