Personal Background
Hey everyone,
I am currently a sophomore studying Computer Engineering at Georgia Tech. This summer, I will be interning at Apple in the Cupertino office on the CoreOS performance team. I will primarily be working with C++ and Swift.
I am really interested in the “code completion for keywords” project since the possibility of having my changes improve the Xcode completion feature is exciting to me. Additionally, I really enjoyed the automata & formal languages course at my school, which is why working on a practical application of a syntax tree is appealing to me.
Description
Currently, code completion in the Xcode IDE works partly due to the file CodeCompletion.cpp
. The issue with this solution is that it requires us to keep track of keywords which are currently available manually.
On the other hand, predicting keywords by leveraging Swift's Grammar rules is a better solution. This is because PR 1014 shows in file TokenChoicesFile.swift
how we can programmatically create a new file which extends KeyPath
and allows us to assign specific tokens in the tokenChoices
variable to each type of Syntax Node. The advantage of this approach is that since the specific tokens are auto generated, any change to the Swift language is automatically accounted for since each case
statement is built from the underlying grammar.
The problem with this approach is that it does not take into account the context in which the cursor is currently located. For example, consider the following (simplified) productions from the official Swift grammar:
- function-body → code-block
- code-block →
{
statements ?}
- statements → statement statements ?
- statement → declaration
;
? - declaration → protocol-declaration
Here is an example derivation using the rules above: function-body → code-block → { statements } → { statement } → { declaration ; } → { protocol-declaration ; }
We were able to turn the function-body
into a protocol-declaration
which is grammatically correct in Swift. However, even though there is a valid parse tree, it is not valid Swift code. Therefore, the problem with using the Swift-Syntax for code completion is that just using syntax rules overlooks the other rules in Swift (such as rules for scope and type checking).
Naive solution by altering AST
The code-block
from the Swift grammar is implemented by the CodeBlockItem
type. This type can be reused in many places throughout the parse tree, from all the way at the top level, to within function bodies.
Since the CodeBlockItem.item
can accept any DeclSyntax
, the naive solution here may be to extend CodeBlockItem
into objects like FunctionCodeBlock
or ProtocolCodeBlock
which have additional restrictions on which types of statements may be allowed inside them.
However, I believe that this is not a good solution since it would alter the underlying AST, which would mean huge overhead in updating all generic code-blocks
to be more specific towards their scope. Additionally, this would have very high chances of breaking the whole language since we are essentially modifying and creating new "intermediate" grammar rules.
A key insight here might be that the AST is always an OVER approximation. This means that we only need to take away false positives, rather than worrying about missing suggestions altogether. If we can find a way to:
- Define all the context / scope rules which Swift follows (other than grammar rules) in a rigorous way.
- Encode the Syntax Nodes with the new rules, so that as you traverse the AST from the root to the cursor location, the static information at each node tells you which keywords to allow / disallow when going further.
then we can start with a list of all keywords available at the top level and modify that list as we visit new nodes.
Need For Semantic Analysis
Consider the sentence The cat barked at the mailman, while the tree walked the dog in the park.
Even though it is grammatically correct, it does not make any sense. This problem is analogous to the problem of the protocol nested inside of the function-body
. Since we are only checking for syntax / grammar rules, we overlook other, non-grammatical, rules of English.
One interesting observation is that even though Xcode will suggest the protocol
keyword within a function as if it were valid, it will highlight it in red after you actually type it out.
The reason why the IDE is able to catch the error, even when it predicted the wrong keyword, is because Xcode preforms some type of background / incremental compilation process as the user types. The nested protocol fails the semantic analysis phase of this compilation process, and thus the error is thrown. If we can "model" this semantic analysis phase within the 1014 PR, we can eliminate extraneous suggestions. I see two main approaches that can be used for this issue.
Approach 1: Additional Metadata / Information on Syntax Node
As I briefly mentioned above, we can modify each type of Syntax Node to contain static information about which keywords are enabled / disabled past that node in the AST. The advantage to this solution is that if the semantic rules of Swift can be found (via documentation), it would be very easy to implement.
The downside to this solution is that it is essentially hardcoded. Changes to the Swift language will require updates to the static information on each node, which diminishes the value of adding the Swift-Syntax based completion in the first place. Additionally, if there is no good documentation on all the semantic rules of Swift, it will not be straightforward to implement.
Approach 2: Compiler Based Solution
This second approach is more exciting than the first, but theres one problem which I will discuss later. This approach works because syntax analysis and semantic analysis are the first two phases of the compilation process. The 1014 PR is already doing the syntax analysis phase and is only missing the semantic analysis in order to generate accurate results.
If we can find a way to incorporate the semantic analysis phase provided in the Swift compiler, we can have a auto completion feature which will not require any manual changes to any internal list of completions or node metadata. With this solution, if Swift were to change (along with its compiler), the completion feature would automatically adjust since it would use that underlying compiler to filter its results.
One way to incorporate the compiler is to get the current over-approximated list of keywords and try each word. If the compiler produces errors, we can eliminate that possibility.
The only downside of this solution would be speed. I am not sure how fast these "mini-compilations" can occur in a large codebase, so perhaps a lightweight version of the Swift semantic analyzer would be needed (since we don't require type checking and other semantic checks for keyword completions).
End
Any thoughts on if I am on the right track / which approach seems more reasonable? Everyone is welcome to comment, along with @ahoppen who is the mentor for this project.