[GSoC 2024] Improving keyword completion in SwiftSyntax - Initial approach & discussion

krishnababani · March 5, 2024, 12:53am

I'm Krishna Babani, a junior studying Software Engineering at San Jose State University. I'm excited about the opportunity to contribute to Swift through Google Summer of Code 2024. As a winner of the Swift Student Challenge in 2022, I have experience building creative and impactful projects using Swift. I'm really keen on getting involved in the "Code Completion for Keywords using syntax" project for GSoC 2024. I've been diving into the swift-syntax codebase and checking out your implementation in apple/swift-syntax#1014. I wanted to share my ideas on how to tackle this challenge and hear your thoughts.

As I see it, the key hurdle lies in tuning the keyword completion recommendations to the context, so that we only suggest keywords that are valid at a specific position, even when the syntax tree does not perfectly reflect the grammar. Here's a basic outline of my approach:

1. Study the Swift language grammar thoroughly to pinpoint the rules dictating where each keyword is and is not permissible.
2. For every type of syntax node, establish which child nodes can accommodate which keywords. For instance, while a function declaration might allow keywords like throws and async in certain positions, it may not accept others like class or import.
3. When generating completions at a given position, navigate up the syntax tree to gather information about the enclosing node type and context. Then tailor the keyword suggestions according to the rules for that node type and child position.
4. Handle special contexts such as the top of a file, a type body, or a function body, each with the keyword rules that apply there.
5. Test completions in many different contexts to make sure we cover the various language structures. The goal is to propose all valid keywords for a given context while minimizing suggestions for invalid ones.
6. Integrate the completion logic with sourcekitd to make it accessible in IDEs like Xcode.
7. Assess the performance impact and ensure that any additional filtering does not significantly slow down completion suggestions.
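
A minimal sketch of the tree-walking idea in the outline above, written in plain Swift so it runs without any dependencies. The `NodeKind` cases, the `Node` class, and the `keywordRules` table are hypothetical stand-ins invented for illustration; they are not the swift-syntax API, and the keyword lists are deliberately incomplete.

```swift
// Hypothetical, simplified stand-ins for syntax node kinds; not the swift-syntax API.
enum NodeKind: Hashable {
    case sourceFile, functionDecl, functionSignature, codeBlock, memberBlock
}

// A minimal tree node with a parent link so we can walk upwards.
final class Node {
    let kind: NodeKind
    let parent: Node?
    init(_ kind: NodeKind, parent: Node? = nil) {
        self.kind = kind
        self.parent = parent
    }
}

// Hand-written illustrative rules: keywords that may appear at a position
// directly inside a node of the given kind. Incomplete on purpose.
let keywordRules: [NodeKind: Set<String>] = [
    .sourceFile: ["import", "func", "class", "struct"],
    .functionSignature: ["async", "throws", "rethrows"],
    .codeBlock: ["let", "var", "if", "return", "func", "struct", "class"],
    .memberBlock: ["func", "var", "let", "init", "class", "struct"],
]

// Walk up from the node at the cursor until a node kind with a rule is found.
func keywordSuggestions(at node: Node) -> Set<String> {
    var current: Node? = node
    while let candidate = current {
        if let keywords = keywordRules[candidate.kind] { return keywords }
        current = candidate.parent
    }
    return []
}

let file = Node(.sourceFile)
let funcNode = Node(.functionDecl, parent: file)
let signature = Node(.functionSignature, parent: funcNode)
let body = Node(.codeBlock, parent: funcNode)

print(keywordSuggestions(at: signature).sorted())       // ["async", "rethrows", "throws"]
print(keywordSuggestions(at: body).contains("import"))  // false
```

Keying the table on the nearest enclosing node kind keeps the lookup cheap, but it is also exactly where a hand-maintained table can drift from the real grammar.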

I believe that the challenging part will be aligning grammar rules with syntax tree structures during step 3. We may need to establish connections between grammar productions and syntax types.

I'd appreciate hearing your thoughts on this! Does this align with your vision? Are there any obstacles you think I might have overlooked? I'm eager to develop this idea into a proposal and collaborate on swift-syntax this summer. Please share your feedback!

Best,
Krishna Babani

ahoppen · March 5, 2024, 1:46am

Hi @krishnababani,

Great to hear that you’re interested in the project. Your outline is heading in the right direction.

swift-syntax does have information about which keywords are permissible in each location, for example here: swift-syntax/CodeGeneration/Sources/SyntaxSupport/CommonNodes.swift at 25ce3a297818076ceab8054873a6189e8b8b6688 · apple/swift-syntax · GitHub

In my opinion, one of the key problems is that the syntax tree is ambiguous about which nodes it allows in certain positions. For example a FunctionDeclSyntax contains a body: CodeBlockSyntax, whose CodeBlockItem.item can be any DeclSyntax. Thus the SwiftSyntax tree allows, eg. a ProtocolDecl inside a function. But that’s not valid in Swift and thus we shouldn’t suggest protocol as a keyword inside function bodies.

– Alex

krishnababani · March 5, 2024, 5:29am

Hi @ahoppen,

Thanks a lot for the feedback and guidance! Your insights have really helped me understand the challenges we're facing in this project.

I've gone through the CommonNodes.swift file. Here is what I discovered:

In the file, there's a list called COMMON_NODES that lists types of syntax nodes such as CodeBlock, ThrowsClause, FunctionEffectSpecifiers, Decl, Expr and more.
Each node entry provides information about its type, base type, parsing function, diagnostic name and child nodes. This gives us an overview of how the syntax tree is structured.
Some specific keyword rules are defined within these entries. For instance, the ThrowsClause node includes child nodes for keywords like throws or rethrows, while the FunctionEffectSpecifiers node contains keywords like async or reasync along with a ThrowsClause.
The definition of the CodeBlockItem node specifies that it can contain a Decl, Stmt or Expr, which aligns with your observation about FunctionDeclSyntax permitting any DeclSyntax within its body.
There are also definitions for nodes labeled as "missing", like MissingDecl, MissingExpr and so on, which I'll need to handle when creating suggestions.
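
To make the observations above concrete, here is a rough sketch of how keywords could be collected from a table of node definitions. The `NodeDefinition`, `ChildDefinition`, and `ChildKind` types and the child names are hypothetical simplifications made up for this example; the real CodeGeneration types in swift-syntax are richer and may be shaped differently.

```swift
// Hypothetical, heavily simplified model of a node definition table.
enum ChildKind {
    case keywordChoices([String])  // a token child limited to these keywords
    case node(String)              // a child that is another named node
}

struct ChildDefinition {
    let name: String
    let kind: ChildKind
}

struct NodeDefinition {
    let name: String
    let children: [ChildDefinition]
}

// Two entries mirroring the ThrowsClause / FunctionEffectSpecifiers description above.
let definitions: [String: NodeDefinition] = [
    "ThrowsClause": NodeDefinition(name: "ThrowsClause", children: [
        ChildDefinition(name: "throwsSpecifier", kind: .keywordChoices(["throws", "rethrows"])),
    ]),
    "FunctionEffectSpecifiers": NodeDefinition(name: "FunctionEffectSpecifiers", children: [
        ChildDefinition(name: "asyncSpecifier", kind: .keywordChoices(["async", "reasync"])),
        ChildDefinition(name: "throwsClause", kind: .node("ThrowsClause")),
    ]),
]

// Collect every keyword reachable from a node definition by following node children.
// `visited` guards against cycles between definitions.
func keywords(in nodeName: String, visited: inout Set<String>) -> Set<String> {
    guard visited.insert(nodeName).inserted, let definition = definitions[nodeName] else {
        return []
    }
    var result: Set<String> = []
    for child in definition.children {
        switch child.kind {
        case .keywordChoices(let choices):
            result.formUnion(choices)
        case .node(let childName):
            result.formUnion(keywords(in: childName, visited: &visited))
        }
    }
    return result
}

var visited: Set<String> = []
let effectKeywords = keywords(in: "FunctionEffectSpecifiers", visited: &visited)
print(effectKeywords.sorted())  // ["async", "reasync", "rethrows", "throws"]
```

Because the keywords come from the definitions rather than a separate list, adding a keyword to a definition would make it show up in the result without further changes.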

However, as you mentioned, these node definitions are broader than the Swift grammar. They allow for some structures that aren't actually valid in the language.

To address this, I'm considering supplementing the information from CommonNodes.swift with grammar rules stored in a separate data format. This would narrow the results of syntax tree navigation down to keyword suggestions that are valid in the current context.

Specifically, my revised plan involves:

Using CommonNodes.swift as a reference for the syntax tree layout and keyword rules.
Developing a data structure to represent grammar constraints not covered by the syntax tree, such as specifying which declaration types can appear in a code block.
When generating suggestions, navigating through the syntax tree based on the CommonNodes definitions, but refining the results using the grammar rules.
Handling "missing" nodes appropriately, since they might need special treatment.
Thoroughly testing completion results against the language specification to ensure accuracy.
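
One possible shape for such a constraint structure, as a sketch only: start from the keywords the tree shape admits, then subtract what the grammar forbids in the current context. The `CompletionContext` enum and both tables are hypothetical and incomplete, and excluding protocol from function bodies simply follows the example given earlier in this thread.

```swift
// Hypothetical contexts in which completion may be invoked.
enum CompletionContext: Hashable {
    case topLevel, typeBody, functionBody
}

// Stage 1 (assumed input): declaration keywords the tree shape would admit.
// Because a code block item can hold any declaration, the same set applies everywhere.
let treeDerivedDeclKeywords: Set<String> = [
    "func", "var", "let", "struct", "class", "enum",
    "protocol", "extension", "import", "init",
]

// Stage 2: grammar constraints the tree does not encode, written as exclusions
// per context. Illustrative and incomplete; "protocol" under .functionBody
// follows the example from this thread.
let disallowed: [CompletionContext: Set<String>] = [
    .topLevel: ["init"],
    .typeBody: ["extension", "import"],
    .functionBody: ["protocol", "extension", "import", "init"],
]

// Refine the tree-derived candidates with the exclusions for the context.
func validKeywords(in context: CompletionContext) -> Set<String> {
    treeDerivedDeclKeywords.subtracting(disallowed[context] ?? [])
}

print(validKeywords(in: .functionBody).contains("protocol"))  // false
print(validKeywords(in: .functionBody).contains("struct"))    // true
print(validKeywords(in: .topLevel).contains("protocol"))      // true
```

Expressing the constraints as exclusions keeps the tree-derived set as the source of truth, so a new keyword appears by default and only needs an entry here if some context forbids it.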

Do you think this strategy sounds reasonable?
I'm also wondering whether it would be helpful to add special handling for node types that have intricate keyword rules, such as function declarations. Feel free to share your thoughts on this.

I'm looking forward to delving into the nitty-gritty of the implementation and beginning the prototyping phase. If you have any ideas or advice as I move forward, please do let me know!

Warm regards,
Krishna

ahoppen · March 5, 2024, 7:42am

That summary sounds correct to me. I don’t think we need handling of missing nodes as they are only used to represent invalid Swift code (eg. if you write let x = and the expression on the right-hand-side of = is missing). You can see this quite well when playing around on http://swift-ast-explorer.com.

I am interested to see what kind of constraints you will come up with – I think this is the key part of the project. As another example of things that are supported in the SwiftSyntax tree but not in valid Swift code: We allow arbitrary nesting of types, which makes inout @escaping () -> Void representable in the SwiftSyntax tree (an AttributedTypeSyntax wrapped in a SomeOrAnyTypeSyntax). But during parsing an attribute on a type must occur all the way on the left.

krishnababani · March 25, 2024, 11:50pm

Hi Alex,

I've made progress on the prototype, and I'm excited to share the results with you. The current implementation does the following:

Traverses the syntax tree using the swift-syntax library and identifies various declarations and statements, such as function declarations, class declarations, struct declarations, protocol declarations, extension declarations, and more.
Provides context-specific keyword suggestions based on the type of declaration or statement encountered during the traversal. The suggestions are determined by predefined mappings between the context and the relevant keywords.

Here's a sample of the prototype's output:

Keyword suggestions for FunctionDeclaration:
  func greet(name: String) {
      print("Hello, \(name)!")
  }
  Suggestions: throws, rethrows, async, mutating

Keyword suggestions for ClassDeclaration:
  final class Person {
      var name: String
      init(name: String) {
          self.name = name
      }
  }
  Suggestions: public, internal, private

I've uploaded the code on GitHub. You can delve into the prototype and explore it here: GitHub - babanikrishna/KeywordCompletitionPrototype

I'd appreciate your feedback on whether this is a reasonable starting point, and I look forward to refining the proposal based on your insights.

If there's anything I've misunderstood or if you have any additional requirements or ideas, please don't hesitate to let me know.

Thank you for your continued support and mentorship.

ahoppen · March 26, 2024, 8:18am

Thanks for sharing the prototype that you have built. It produces good results in the cases you highlighted but I see three downsides with the approach you took:

As far as I can tell, your prototype does not take into account the position at which code completion is invoked. It will eg. suggest throws at every single code completion position in the file if there is a single function declaration in the source file.
The list of suggestions needs to be manually maintained. A goal of the project would be to automatically infer this list from the structure of the syntax tree so that new keywords in the language automatically show up. One example of this is that your implementation is missing the open access modifier, which is an easy miss.
Your approach assumes a somewhat well-formed syntax tree to start with. But in many cases when you want to complete eg. public, the declaration that you want to apply it to hasn’t been written yet. For example, when invoking code completion in an empty file, we should, among others, suggest public, struct.
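
A small sketch of what taking the cursor position into account could look like, including the empty-file case. The `Context` enum, the `Region` struct, the offsets, and the keyword table are all hypothetical; a real implementation would obtain regions from the parsed tree and, per the second point above, derive the keywords from the syntax node definitions rather than from a hand-written table.

```swift
// Hypothetical context kinds and a region of source text that establishes one.
enum Context: Hashable { case topLevel, typeBody, functionBody }

struct Region {
    let cursorOffsets: ClosedRange<Int>  // offsets at which the cursor is inside this body
    let context: Context
}

// Illustrative keyword table, hand-written here only to keep the sketch short.
let keywordsByContext: [Context: Set<String>] = [
    .topLevel: ["import", "public", "open", "struct", "class", "func"],
    .typeBody: ["public", "open", "func", "var", "init"],
    .functionBody: ["let", "var", "if", "return"],
]

// Pick the innermost region containing the cursor. With no match, fall back to
// the top level, which also covers an empty file that has no regions at all.
func completions(at offset: Int, regions: [Region]) -> Set<String> {
    let innermost = regions
        .filter { $0.cursorOffsets.contains(offset) }
        .min { $0.cursorOffsets.count < $1.cursorOffsets.count }
    return keywordsByContext[innermost?.context ?? .topLevel] ?? []
}

// Made-up offsets for a type body that contains one function body.
let regions = [
    Region(cursorOffsets: 10...40, context: .typeBody),
    Region(cursorOffsets: 22...30, context: .functionBody),
]

print(completions(at: 25, regions: regions).contains("return"))  // true
print(completions(at: 12, regions: regions).contains("return"))  // false
print(completions(at: 0, regions: []).contains("struct"))        // true
```

The result depends only on where the cursor is, so a function declaration elsewhere in the file no longer affects what is suggested at an unrelated position.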