Hey all,
As per the GSoC announcement, @jansvoboda11 is working on Integration of libSyntax with the compiler pipeline with @rintaro as his mentor. I'd like to provide more details on what the project entails and what changes you should expect to see coming to the Swift repository.
Introduction
We have been developing libSyntax (and SwiftSyntax as the Swift counterpart), whose purpose is to represent the syntax of Swift source code with full fidelity (including whitespace), enable structured editing, provide immutable, thread-safe data structures, and support incremental re-parsing.
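To make "full fidelity" concrete, here is a minimal sketch, not the actual libSyntax API. It assumes a toy whitespace-delimited lexer, and the names `Token`, `lex`, and `print` are hypothetical. The point is only that when each token carries its surrounding whitespace as trivia, printing the tokens back reproduces the source byte for byte.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified token: its text plus the whitespace around it.
struct Token {
  std::string leadingTrivia;   // whitespace (including newlines) before the text
  std::string text;            // the token itself; may be empty at end of file
  std::string trailingTrivia;  // spaces/tabs after the text, up to the newline
};

// Toy lexer (an assumption for illustration): tokens are runs of
// non-whitespace characters; nothing from the source is thrown away.
std::vector<Token> lex(const std::string &source) {
  auto isSpace = [](char c) { return c == ' ' || c == '\t' || c == '\n'; };
  std::vector<Token> tokens;
  std::size_t i = 0;
  while (i < source.size()) {
    Token tok;
    while (i < source.size() && isSpace(source[i]))
      tok.leadingTrivia += source[i++];
    while (i < source.size() && !isSpace(source[i]))
      tok.text += source[i++];
    while (i < source.size() && (source[i] == ' ' || source[i] == '\t'))
      tok.trailingTrivia += source[i++];
    tokens.push_back(tok);
  }
  return tokens;
}

// Printing concatenates trivia and text in order, so the output equals the
// original source exactly.
std::string print(const std::vector<Token> &tokens) {
  std::string out;
  for (const Token &tok : tokens)
    out += tok.leadingTrivia + tok.text + tok.trailingTrivia;
  return out;
}
```

For any input string `s`, `print(lex(s)) == s` holds in this sketch, which is the property a semantic AST alone does not give you.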
Currently, generation of the libSyntax tree is not part of the compiler pipeline. It is an additional, optionally produced tree that the Swift parser can generate alongside the existing (semantic) AST, which the rest of the pipeline uses and which the parser always produces as a mandatory step. When SwiftSyntax invokes the Swift parser to get the libSyntax tree for some source text, that semantic AST is completely ignored, so the work spent building it is wasted.
The goal of the GSoC project is to have the Swift parser fully embrace libSyntax and stop emitting the semantic AST. This will result in multiple benefits:
Robust architecture
Clean separation of the parsing functionality from the rest of the compiler pipeline will result in a more robust compiler architecture. Currently, instead of only focusing on parsing the Swift grammar and emitting syntactic diagnostics, the parser includes semantic functionality that muddles its role, for example name binding, local discriminator assignment, single expression closure, platform condition evaluation, and semantic diagnostics.
Decoupling semantic functionality from the parser will allow a much cleaner separation of syntactic functionality from the rest of the compiler pipeline and will allow the parser to focus on only doing the work necessary for dealing with the Swift grammar and nothing more.
SwiftSyntax parsing performance improvements
Having the Swift parser focus only on syntactic functionality and the creation of a libSyntax tree will allow the parser to be more efficient and performant. It will avoid wasting memory and CPU on creating the semantic AST and doing the semantic operations mentioned above. This is very important for SwiftSyntax parsing, which needs real-time, "as you type" performance and where every millisecond we can save counts.
Enable future work for making the compiler pipeline more suitable for interactive contexts
Once libSyntax is part of the compiler pipeline, we will be able to depend on it in interactive contexts (like in IDEs) and take advantage of libSyntax's unique properties, such as incremental re-parsing and the general ability to re-use libSyntax trees without having to re-parse from scratch and without worrying about multi-threaded access.
Upcoming Changes
The goal is to move from this existing model:
(Source) -> [Parser] -+-> (AST) -> [Sema] -> ...
                      | and
                      +-> (libSyntax) -> [SwiftSyntax, ...]
To a new compiler pipeline:
(Source) -> [Parser] -> (libSyntax) -+-> [Transformer] -> (AST) -> [Sema] -> ...
                                     | or
                                     +-> [SwiftSyntax, ...]
We also want to make this transition incrementally and gradually move the parser to the new model one function at a time. This will allow us to use the existing tests and ensure that every parser function that we converted is behaving correctly and without any breakage. Every change to the parser will get us closer to the ultimate goal but it will also be fully validated by our regression tests.
Until we reach the point of fully converting the parser, there will be an intermediate stage with two sets of parser functions: one that generates libSyntax nodes, and an associated one that generates the existing semantic AST by using the transformer to walk the libSyntax nodes. The latter function will be the one called from parser functions that have not yet been converted. Once every parser function is converted, we will remove the semantic AST producing functions from the parser and have the transformer generate the AST after the parser is done parsing the grammar, as shown in the diagram above. The thing to keep in mind is that this intermediate stage will last a significant amount of time, and the parser functions may look rather "messy" while we incrementally convert them, but this is temporary until we reach full conversion.
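As a rough illustration of that intermediate stage, here is a minimal sketch. It is not the actual parser code: every type and function name in it (`IntegerLiteralSyntax`, `Transformer`, `parseIntegerLiteralSyntax`, `parseParen`, and so on) is hypothetical, and the "grammar" is reduced to an integer literal inside parentheses. It shows one converted function that produces only syntax, its associated AST-producing wrapper that goes through the transformer, and a not-yet-converted function that still builds the AST directly and calls the wrapper.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical libSyntax-style node: syntax only, no semantic information.
struct IntegerLiteralSyntax {
  std::string digits;
};

// Hypothetical semantic AST nodes, standing in for the existing AST.
struct IntegerLiteralExpr {
  long value;
};
struct ParenExpr {
  IntegerLiteralExpr sub;
};

// Hypothetical transformer: turns a syntax node into the matching AST node.
struct Transformer {
  IntegerLiteralExpr visit(const IntegerLiteralSyntax &node) const {
    return IntegerLiteralExpr{std::stol(node.digits)};
  }
};

class Parser {
  std::string source;
  std::size_t pos = 0;
  Transformer transformer;

public:
  explicit Parser(std::string text) : source(std::move(text)) {}

  // Converted function: deals with the grammar and produces only syntax.
  IntegerLiteralSyntax parseIntegerLiteralSyntax() {
    std::size_t start = pos;
    while (pos < source.size() &&
           std::isdigit(static_cast<unsigned char>(source[pos])))
      ++pos;
    assert(pos > start && "expected an integer literal");
    return IntegerLiteralSyntax{source.substr(start, pos - start)};
  }

  // Associated AST-producing function, kept only so that unconverted
  // callers continue to work during the transition.
  IntegerLiteralExpr parseIntegerLiteral() {
    return transformer.visit(parseIntegerLiteralSyntax());
  }

  // Not yet converted: still builds the semantic AST directly, and calls
  // the AST-producing wrapper for its sub-expression.
  ParenExpr parseParen() {
    assert(pos < source.size() && source[pos] == '(');
    ++pos;
    IntegerLiteralExpr sub = parseIntegerLiteral();
    assert(pos < source.size() && source[pos] == ')');
    ++pos;
    return ParenExpr{sub};
  }
};
```

In this sketch, finishing the conversion would mean giving `parseParen` a syntax-producing form as well, deleting `parseIntegerLiteral`, and running the transformer once over the finished tree instead of from inside the parser.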
I’m super excited about the project and the benefits it will bring. Let me know of any questions or feedback you may have!