Integrating libSyntax into the compiler pipeline

akyrtzi · July 3, 2019, 11:45pm

Hey all,

As per the GSoC announcement, @jansvoboda11 is working on Integration of libSyntax with the compiler pipeline with @rintaro as his mentor. I'd like to provide more details on what the project entails and what changes you should expect to see coming to the Swift repository.

Introduction

We have been developing libSyntax (and SwiftSyntax as the Swift counterpart) whose purpose is to represent the syntax of Swift source code with full fidelity (including whitespace), enable structured editing, provide immutable, thread-safe data structures, and support incremental re-parsing.
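To make "full fidelity" and "immutable" concrete, here is a minimal Swift sketch. The types `Token` and `SyntaxNode` and the helper `replacingToken(at:with:)` are hypothetical simplifications, not the libSyntax or SwiftSyntax API.

- Every token keeps the surrounding whitespace and comments as trivia, so printing the tree reproduces the input exactly.
- An edit produces a new tree rather than mutating a shared one.

```swift
// Hypothetical, simplified model of a full-fidelity syntax tree.
// Not the libSyntax/SwiftSyntax API: just enough to show the idea.

enum TokenKind {
    case keyword, identifier, equal, integerLiteral, endOfFile
}

// A token owns the whitespace and comments around it ("trivia"),
// so no source text is ever dropped.
struct Token {
    let kind: TokenKind
    let text: String
    let leadingTrivia: String
    let trailingTrivia: String

    var fullText: String { leadingTrivia + text + trailingTrivia }
}

// Immutable value: it can be shared between threads without locking,
// and an "edit" produces a new node instead of mutating this one.
struct SyntaxNode {
    let tokens: [Token]

    // Concatenating every token with its trivia reproduces the source exactly.
    var sourceText: String { tokens.map { $0.fullText }.joined() }

    func replacingToken(at index: Int, with token: Token) -> SyntaxNode {
        var newTokens = tokens
        newTokens[index] = token
        return SyntaxNode(tokens: newTokens)
    }
}

let source = "let  x = 1 // one\n"

// In this sketch, trailing trivia runs up to (not including) the newline;
// the newline becomes leading trivia of the next token (here, end of file).
let node = SyntaxNode(tokens: [
    Token(kind: .keyword, text: "let", leadingTrivia: "", trailingTrivia: "  "),
    Token(kind: .identifier, text: "x", leadingTrivia: "", trailingTrivia: " "),
    Token(kind: .equal, text: "=", leadingTrivia: "", trailingTrivia: " "),
    Token(kind: .integerLiteral, text: "1", leadingTrivia: "", trailingTrivia: " // one"),
    Token(kind: .endOfFile, text: "", leadingTrivia: "\n", trailingTrivia: ""),
])

// Round trip: the double space and the comment survive.
precondition(node.sourceText == source)

// Structured edit: rename `x` to `y`; the original tree is unchanged.
let renamed = node.replacingToken(
    at: 1, with: Token(kind: .identifier, text: "y", leadingTrivia: "", trailingTrivia: " "))
precondition(renamed.sourceText == "let  y = 1 // one\n")
precondition(node.sourceText == source)
```

Value types with `let` properties are just one simple way to get this kind of thread safety. The sketch does not claim this is how libSyntax implements it.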

Currently, generation of the libSyntax tree is not part of the compiler pipeline. It is an optional, additional tree that the Swift parser can generate alongside the existing (semantic) AST, which the parser always produces because the rest of the pipeline depends on it. That semantic AST is completely ignored and wasted when SwiftSyntax invokes the Swift parser to get the libSyntax tree for some source text.

The goal of the GSoC project is to have the Swift parser fully embrace libSyntax and stop emitting the semantic AST. This will result in multiple benefits:

Robust architecture

Clean separation of the parsing functionality from the rest of the compiler pipeline will result in a more robust compiler architecture. Currently, instead of focusing only on parsing the Swift grammar and emitting syntactic diagnostics, the parser includes semantic functionality that muddles its role. Examples include name binding, local discriminator assignment, single-expression closure handling, platform condition evaluation, and semantic diagnostics.

Decoupling semantic functionality from the parser will give a much cleaner separation between syntactic functionality and the rest of the compiler pipeline. It will also let the parser do only the work needed to handle the Swift grammar and nothing more.

SwiftSyntax parsing performance improvements

Having the Swift parser focus only on syntactic functionality and the creation of a libSyntax tree will make it more efficient. It will avoid wasting memory and CPU time on creating the semantic AST and doing the semantic operations mentioned above. This matters a great deal for SwiftSyntax parsing, which needs real-time, "as you type" performance and for which every millisecond we can save counts.

Enable future work for making the compiler pipeline more suitable for interactive contexts

Once libSyntax is part of the compiler pipeline, we will be able to depend on it in interactive contexts (like IDEs). We can then take advantage of libSyntax's unique properties, such as incremental re-parsing and the general ability to re-use libSyntax trees without re-parsing from scratch and without worrying about multi-threaded access.

Upcoming Changes

The goal is to move from this existing model:

(Source) -> [Parser] -+-> (AST) -> [Sema] -> ...
                      | and
                      +-> (libSyntax) -> [SwiftSyntax, ...]

To a new compiler pipeline:

(Source) -> [Parser] -> (libSyntax) -+-> [Transformer] -> (AST) -> [Sema] -> ...
                                     | or
                                     +-> [SwiftSyntax, ...]
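Here is a rough sketch of the new model for a toy grammar of composition types such as `Hashable & Codable`. Everything in it (`parseType`, `generateAST`, `TypeSyntax`, `TypeAST`) is a hypothetical stand-in, not the real Parser or Transformer.

- The parser only recognizes the grammar and keeps every token plus its trailing whitespace.
- A separate transformer pass turns that syntax into the AST that Sema would consume.
- SwiftSyntax-style clients can stop after the first step.

```swift
// Hypothetical sketch of the proposed pipeline for a tiny grammar:
//   type := identifier ('&' identifier)*
// (Source) -> [Parser] -> (Syntax) -> [Transformer] -> (AST)

struct Token {
    enum Kind { case identifier, ampersand }
    let kind: Kind
    let text: String
    let trailingTrivia: String  // whitespace after the token
}

// (Syntax): full fidelity, including `&` and spacing.
struct TypeSyntax {
    let tokens: [Token]
    var sourceText: String { tokens.map { $0.text + $0.trailingTrivia }.joined() }
}

// (AST): only what later semantic stages need.
enum TypeAST: Equatable {
    case simple(String)
    case composition([String])
}

enum ParseError: Error {
    case expectedIdentifier
    case unexpectedCharacter(Character)
}

// [Parser]: recognizes the grammar and builds syntax, nothing semantic.
// Assumes the input has no leading whitespace.
func parseType(_ source: String) throws -> TypeSyntax {
    var tokens: [Token] = []
    var i = source.startIndex

    // Consumes whitespace after a token; it becomes that token's trailing trivia.
    func consumeTrivia() -> String {
        let start = i
        while i < source.endIndex, source[i].isWhitespace { i = source.index(after: i) }
        return String(source[start..<i])
    }

    var expectIdentifier = true
    while i < source.endIndex {
        let c = source[i]
        if expectIdentifier {
            guard c.isLetter || c == "_" else { throw ParseError.expectedIdentifier }
            let start = i
            while i < source.endIndex,
                  source[i].isLetter || source[i].isNumber || source[i] == "_" {
                i = source.index(after: i)
            }
            let name = String(source[start..<i])
            let trivia = consumeTrivia()
            tokens.append(Token(kind: .identifier, text: name, trailingTrivia: trivia))
        } else {
            guard c == "&" else { throw ParseError.unexpectedCharacter(c) }
            i = source.index(after: i)
            let trivia = consumeTrivia()
            tokens.append(Token(kind: .ampersand, text: "&", trailingTrivia: trivia))
        }
        expectIdentifier.toggle()
    }
    // Empty input or a dangling `&` leaves us still expecting an identifier.
    guard !expectIdentifier else { throw ParseError.expectedIdentifier }
    return TypeSyntax(tokens: tokens)
}

// [Transformer]: walks the syntax and builds the AST, dropping trivia and `&`.
func generateAST(_ syntax: TypeSyntax) -> TypeAST {
    let names = syntax.tokens.filter { $0.kind == .identifier }.map { $0.text }
    return names.count == 1 ? .simple(names[0]) : .composition(names)
}

// SwiftSyntax-style clients stop after parsing; the compiler continues to the AST.
let syntax = try! parseType("Hashable &  Codable")
precondition(syntax.sourceText == "Hashable &  Codable")
precondition(generateAST(syntax) == .composition(["Hashable", "Codable"]))
```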

We also want to make this transition incrementally, moving the parser to the new model one function at a time. This will allow us to use the existing tests to ensure that every parser function we convert behaves correctly and without any breakage. Every change to the parser will get us closer to the ultimate goal, and each one will also be fully validated by our regression tests.

Until we reach the point of fully converting the parser, there will be an intermediate stage with two sets of parser functions:

- one that generates libSyntax nodes, and
- an associated one that generates the existing semantic AST by using the transformer to walk the libSyntax nodes.

The latter function will be the one called from parser functions that have not yet been converted. Once every parser function is converted, we will remove the semantic-AST-producing functions from the parser. The transformer will then generate the AST after the parser is done parsing the grammar, as shown in the diagram above.

The thing to keep in mind is that this intermediate stage will last a significant amount of time. The parser functions may look rather "messy" while we incrementally convert them, but this is temporary until we reach full conversion.
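The intermediate stage might look roughly like the sketch below. It is hypothetical: `Parser`, `ASTGen`, `parseTypeSyntax`, `parseType` and `parseVarDecl` are illustrative names, and tokens are pre-lexed strings with no error recovery.

- A converted function returns syntax.
- Its AST-returning twin runs the transformer over that syntax.
- An unconverted caller keeps calling the twin unchanged.

```swift
// Hypothetical sketch of the transitional parser; names are illustrative.

// Syntax produced by converted parser functions (simplified to token texts).
struct CompositionTypeSyntax {
    let tokens: [String]  // e.g. ["Hashable", "&", "Codable"]
}

// Existing semantic AST that unconverted code and Sema still expect.
enum TypeAST: Equatable {
    case simple(String)
    case composition([String])
}

// The transformer; in this sketch it gains one `generate` overload
// per converted grammar production.
struct ASTGen {
    func generate(_ syntax: CompositionTypeSyntax) -> TypeAST {
        let names = syntax.tokens.filter { $0 != "&" }
        return names.count == 1 ? .simple(names[0]) : .composition(names)
    }
}

struct Parser {
    var tokens: [String]  // pre-lexed; no error recovery in this sketch
    var position = 0
    let astGen = ASTGen()

    var peek: String? { position < tokens.count ? tokens[position] : nil }

    mutating func consume() -> String {
        defer { position += 1 }
        return tokens[position]
    }

    // Converted: parses the grammar and returns syntax only.
    mutating func parseTypeSyntax() -> CompositionTypeSyntax {
        var parsed = [consume()]
        while peek == "&" {
            parsed.append(consume())  // "&"
            parsed.append(consume())  // the next type name
        }
        return CompositionTypeSyntax(tokens: parsed)
    }

    // Transitional twin: unconverted callers still call this and get the
    // legacy AST, built by running the transformer over the new syntax.
    mutating func parseType() -> TypeAST {
        let syntax = parseTypeSyntax()
        return astGen.generate(syntax)
    }

    // Not yet converted: parses `let name : Type` and still wants AST nodes.
    mutating func parseVarDecl() -> (name: String, typeAST: TypeAST) {
        _ = consume()  // "let"
        let name = consume()
        _ = consume()  // ":"
        let typeAST = parseType()
        return (name, typeAST)
    }
}

var parser = Parser(tokens: ["let", "x", ":", "Hashable", "&", "Codable"])
let decl = parser.parseVarDecl()
precondition(decl.name == "x")
precondition(decl.typeAST == .composition(["Hashable", "Codable"]))
```

Once every caller uses the syntax-returning functions, the twins can be deleted. The transformer then runs once after parsing, as in the target pipeline.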

I’m super excited about the project and the benefits it will bring, let me know of any questions or feedback you may have!

beccadax · July 4, 2019, 2:51am

This is super-exciting! Two questions:

Are there any plans to systematically test that the new parser/transformer doesn’t produce different ASTs from the old parser? For instance, I could imagine comparing the output of -dump-parse on some large corpus of Swift code to snapshots of the output before the refactoring.
In our grand tradition of bikeshedding: The box labeled “Transformer” will actually be called “ASTGen”, right?
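If someone wants to try the snapshot approach, the comparison itself is simple. This is a minimal sketch that assumes the before/after dumps have already been captured to text (for example via `-dump-parse`). `firstMismatch` and `DumpMismatch` are hypothetical helpers, and the sample strings are made up, not real compiler output.

```swift
// Hypothetical helper for the snapshot idea: compare a dump captured before
// the refactoring with one captured after, and report the first divergence.
// Capturing the dumps is assumed to happen elsewhere.

struct DumpMismatch: Equatable {
    let line: Int          // 1-based line number
    let expected: String?  // snapshot line; nil if the snapshot is shorter
    let actual: String?    // new line; nil if the new dump is shorter
}

func firstMismatch(snapshot: String, current: String) -> DumpMismatch? {
    let original = snapshot.split(separator: "\n", omittingEmptySubsequences: false).map { String($0) }
    let updated = current.split(separator: "\n", omittingEmptySubsequences: false).map { String($0) }
    for i in 0..<max(original.count, updated.count) {
        let expected: String? = i < original.count ? original[i] : nil
        let actual: String? = i < updated.count ? updated[i] : nil
        if expected != actual {
            return DumpMismatch(line: i + 1, expected: expected, actual: actual)
        }
    }
    return nil  // identical dumps
}

// Illustrative strings only, not real compiler output.
let before = "(file\n  (decl f)\n  (decl g))"
let after = "(file\n  (decl f)\n  (decl h))"
precondition(firstMismatch(snapshot: before, current: before) == nil)
precondition(firstMismatch(snapshot: before, current: after)
    == DumpMismatch(line: 3, expected: "  (decl g))", actual: "  (decl h))"))
```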

dan-zheng · July 4, 2019, 3:31am

Seeing "Transformer" immediately reminded me of compiler plugins à la Scala.
Is it possible to support a syntax-transforming compiler plugin system during the Transformer stage?

jansvoboda11 · July 4, 2019, 12:37pm

I think that existing tests validate that pretty well. If we were to generate a different AST, Sema tests would start failing. Adding a separate layer for testing AST specifically might be a good idea though, I might look into it a bit more, thanks!

As for the final name of the transformer, I'm open to suggestions and more bikeshedding

jansvoboda11 · July 4, 2019, 12:47pm

While theoretically possible, I feel like such a plugin system can be emulated by performing analyses/transformations on the source code with SwiftSyntax before the compilation step. Do you have any use cases in mind where this wouldn't be sufficient?

dan-zheng · July 4, 2019, 1:30pm

"Just using SwiftSyntax" is a good point. I suppose a fully-integrated plugin system has advantages: plugin authors could inject custom plugins during any phase of compilation (Syntax, AST, typed AST, SIL, etc), users can use multiple plugins, etc. All of this could be designed to work in Swift.

From Scala plugin docs:

A compiler plugin consists of:

Some code that implements an additional compiler phase.

Some code that uses the compiler plugin API to specify when exactly this new phase should run.

Additional code that specifies what options the plugin accepts.

Use cases:

Popular compiler plugins (as of 2018) include:

Alternate compiler back ends such as Scala.js, Scala Native, and Fortify SCA for Scala.

Linters such as Wartremover and Scapegoat.

Plugins that support reformatting and other changes to source code, such as scalafix and scalafmt (which are built on the semanticdb and scalahost compiler plugins).

Plugins that alter Scala’s syntax, such as kind-projector.

Plugins that alter Scala’s behavior around errors and warnings, such as silencer.

Plugins that analyze the structure of source code, such as Sculpt and acyclic.

Plugins that instrument user code to collect information, such as the code coverage tool scoverage.

Plugins that add metaprogramming facilities to Scala, such as Macro Paradise.

Plugins that add entirely new constructs to Scala by restructuring user code, such as scala-continuations.

Sorry to veer off-topic about plugins: a general plugin system isn't directly related to integrating libSyntax and involves more things (related: SIL transform in Swift, C++ interoperability). But it may be nice (if it makes sense) to design new compiler phases like Transformer to be amenable to eventual support for plugins. I don't think that would require too much (enable plugins written using SwiftSyntax, support plugin registration, support plugin options).
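For illustration only, here is a sketch of what "support plugin registration, support plugin options" could mean. None of this exists in the Swift compiler: `Phase`, `CompilerPlugin`, `PluginRegistry` and `TabExpander` are hypothetical. To keep the example self-contained, plugins simply rewrite source text rather than syntax or AST nodes.

```swift
// Hypothetical plugin registry, loosely following the three parts of a
// Scala compiler plugin quoted above. Nothing like this exists in the Swift
// compiler; plugins here just rewrite source text to keep the sketch small.

enum Phase: CaseIterable {
    case syntax, ast, typeChecked, sil
}

protocol CompilerPlugin {
    var name: String { get }
    var phase: Phase { get }                   // when the plugin runs
    var supportedOptions: Set<String> { get }  // which options it accepts
    func transform(_ input: String, options: [String: String]) -> String
}

enum PluginError: Error, Equatable {
    case unsupportedOption(plugin: String, option: String)
}

struct PluginRegistry {
    private var entries: [(plugin: any CompilerPlugin, options: [String: String])] = []

    // Rejects options the plugin did not declare.
    mutating func register(_ plugin: any CompilerPlugin, options: [String: String] = [:]) throws {
        for key in options.keys.sorted() where !plugin.supportedOptions.contains(key) {
            throw PluginError.unsupportedOption(plugin: plugin.name, option: key)
        }
        entries.append((plugin: plugin, options: options))
    }

    // Runs every plugin registered for `phase`, in registration order.
    func run(phase: Phase, on input: String) -> String {
        entries
            .filter { $0.plugin.phase == phase }
            .reduce(input) { text, entry in entry.plugin.transform(text, options: entry.options) }
    }
}

// Example syntax-phase plugin: expands tabs to a configurable number of spaces.
struct TabExpander: CompilerPlugin {
    let name = "tab-expander"
    let phase = Phase.syntax
    let supportedOptions: Set<String> = ["width"]

    func transform(_ input: String, options: [String: String]) -> String {
        let width = options["width"].flatMap { Int($0) } ?? 4
        let spaces = String(repeating: " ", count: width)
        return input.map { $0 == "\t" ? spaces : String($0) }.joined()
    }
}

var registry = PluginRegistry()
try! registry.register(TabExpander(), options: ["width": "2"])
precondition(registry.run(phase: .syntax, on: "\tlet x = 1") == "  let x = 1")
precondition(registry.run(phase: .sil, on: "\tunchanged") == "\tunchanged")
```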

DaveZ · July 4, 2019, 1:53pm

If the syntax nodes are comprehensive, will source locations/ranges be removed from the AST nodes? (Presumably, right?) Will back pointers exist from AST nodes to syntax nodes?

Can you also elaborate more on how this work will be broken into stages?

akyrtzi · July 4, 2019, 4:06pm

+1, almost everything depends on the AST so if something is not right I have more confidence that the tests+validation-tests will catch it than -dump-parse, which is a high level view of the AST and doesn't include every little bit of the AST nodes.

akyrtzi · July 4, 2019, 4:15pm

That is something we want to do eventually but changes to the AST are out of scope for the GSoC project, changing the Parser is already a significant amount of work.

You can see an example of converting parseTypeSimpleOrComposition. Before changes of this kind can be made, the transformer will be introduced and made available. Note, though, that the transformer will not be fully formed from the beginning; it will be extended as more parser functions are converted.

akyrtzi · July 4, 2019, 5:53pm

You can see the initial PR by @jansvoboda11.

DaveZ · July 5, 2019, 6:22am

Interesting. I assumed that the AST refactoring would come first because it would make introducing a "Transformer" stage easier. If the GSoC project won't tackle the AST changes, then when might it get done?

akyrtzi · July 5, 2019, 8:39pm

The Parser should be fully converted first before considering changing the AST.

akyrtzi · October 15, 2019, 11:08pm

Hey all, I'd like to give an update on this. @rintaro has picked up and continued development on this since Jan's GSoC project concluded, but unfortunately finishing this project is going to take significantly longer than we anticipated. There are other more urgent tasks that require our attention and focus.

We've made the difficult decision to pause development on this for now and resume it at a later point. We've put the existing parser refactoring changes in a separate branch and reverted them from master. We had to revert them from master because they were causing a serious performance regression that cannot be avoided until we complete the parser migration.

We intend to periodically update the new branch to keep it working, so that we can easily resume this project at a later point.