Update on the new Swift Parser

Douglas_Gregor · September 23, 2022, 4:43pm

Hi all,

It's been a month since we introduced the new Swift parser into swift-syntax. Since then, we've made some great progress, so I wanted to share some of that here:

The new parser is now linked into the compiler, with additional testing that we can enable for every single compile. There are two kinds of tests at the moment:
- Round-trip testing: this ensures that the syntax nodes produced by the new parser reproduce every byte of the input source correctly, no matter how broken the input source is. This is critical testing for the robustness of the parser and syntax nodes.
- Valid-parse testing: this ensures that, if the C++ parser produces no errors when parsing, the new Swift parser also produces a syntax tree that contains no unknown, unexpected, or missing nodes (all of which imply errors). This makes sure that the new parser is covering the full grammar correctly.
The compiler's full test suite (~15,000 tests) runs with both of the checks above enabled. There are only about two failures, which are intentionally crafted bad inputs with, e.g., a couple of hundred opening parentheses in a row.
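
To make the round-trip property concrete, here is a minimal sketch in plain Swift. It is a toy whitespace tokenizer, not the SwiftParser implementation: the `Token` type and the `tokenize`, `render`, and `roundTrips` functions are hypothetical names made up for illustration. The idea it shows is that every character of the input lands either in a token or in trivia, so printing the result gives back the input exactly, even when the input is nonsense.

```swift
// Toy illustration only; none of these names come from swift-syntax.
struct Token {
    var leadingTrivia: String  // whitespace that appeared before the token
    var text: String           // the token's own characters
}

/// Splits `source` on whitespace while keeping every character,
/// either as token text or as trivia.
func tokenize(_ source: String) -> (tokens: [Token], trailingTrivia: String) {
    var tokens: [Token] = []
    var trivia = ""
    var current = ""
    for ch in source {
        if ch.isWhitespace {
            // Whitespace ends the token in progress, if there is one.
            if !current.isEmpty {
                tokens.append(Token(leadingTrivia: trivia, text: current))
                trivia = ""
                current = ""
            }
            trivia.append(ch)
        } else {
            current.append(ch)
        }
    }
    if !current.isEmpty {
        tokens.append(Token(leadingTrivia: trivia, text: current))
        trivia = ""
    }
    // Whatever is left in `trivia` followed the last token.
    return (tokens, trivia)
}

/// Prints the tokens back out, trivia included.
func render(_ tokens: [Token], trailingTrivia: String) -> String {
    return tokens.map { $0.leadingTrivia + $0.text }.joined() + trailingTrivia
}

/// The round-trip check: tokenizing and printing must reproduce the input.
func roundTrips(_ source: String) -> Bool {
    let result = tokenize(source)
    return render(result.tokens, trailingTrivia: result.trailingTrivia) == source
}
```

A check like `roundTrips("let x = ((((  \n\t")` should hold for any input, valid or not, because nothing is ever discarded.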
Parser recovery is improved, including handling source files with malformed UTF-8.
Added support for creating syntax nodes via string interpolation, so one can create a syntax node by interpolating bits of syntax into a string literal, e.g.,
```
let expr: ExprSyntax = 
  """
  \(node.withoutTrivia()).toggle()
  """
```
Added the SwiftOperators library, which applies operator precedence to rewrite a syntax tree for a "sequence" expression such as x + y * z into one that reflects the order of operations.
swift-format has adopted the new parser and SwiftOperators module.
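
For readers unfamiliar with the "sequence" expression idea, here is a rough sketch of what folding a flat sequence into a tree looks like. This is not the SwiftOperators API: the `Expr` type, the `precedence` table, and the `fold` and `describe` functions are all hypothetical, the table covers only four left-associative operators, and the technique used is ordinary precedence climbing.

```swift
// Toy illustration only; none of these names come from SwiftOperators.
indirect enum Expr: Equatable {
    case name(String)
    case binary(Expr, String, Expr)
}

// Hypothetical precedence table; higher binds tighter, all left-associative.
let precedence: [String: Int] = ["+": 10, "-": 10, "*": 20, "/": 20]

/// Folds a flat operand/operator sequence such as ["x", "+", "y", "*", "z"]
/// into a tree. Returns nil if the sequence is malformed.
func fold(_ sequence: [String]) -> Expr? {
    var index = 0

    func parseOperand() -> Expr? {
        // An operand is any element that is not a known operator.
        guard index < sequence.count, precedence[sequence[index]] == nil else {
            return nil
        }
        index += 1
        return .name(sequence[index - 1])
    }

    func parse(minPrecedence: Int) -> Expr? {
        guard var lhs = parseOperand() else { return nil }
        while index < sequence.count,
              let prec = precedence[sequence[index]],
              prec >= minPrecedence {
            let op = sequence[index]
            index += 1
            // Requiring a strictly higher precedence on the right-hand side
            // makes operators of equal precedence group to the left.
            guard let rhs = parse(minPrecedence: prec + 1) else { return nil }
            lhs = .binary(lhs, op, rhs)
        }
        return lhs
    }

    let result = parse(minPrecedence: 0)
    return index == sequence.count ? result : nil
}

/// Renders a tree with explicit parentheses so the grouping is visible.
func describe(_ expr: Expr) -> String {
    switch expr {
    case .name(let n):
        return n
    case .binary(let lhs, let op, let rhs):
        return "(\(describe(lhs)) \(op) \(describe(rhs)))"
    }
}
```

With this table, folding `["x", "+", "y", "*", "z"]` groups the multiplication first, so `describe` renders it as `(x + (y * z))`. Keeping the parser's output as a flat sequence and folding it in a separate step means the parser itself does not need to know which operators exist.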

Thanks to everyone who tried it out, provided feedback, and fixed bugs. Our next big leap forward is bringing up ASTGen, the part of the Swift compiler itself that will translate Swift syntax nodes into the C++ AST and will eventually replace the C++ parser.

Doug

David_Ungar2 · September 26, 2022, 7:23pm

Great news, indeed! It would be great to add lots of weird mistakes and check for good error messages. Or even typos that still parse but confuse the type checker...
Maybe telemetry to see how people correct errors?
(Thanks for letting me blue-sky your priorities.)

Douglas_Gregor · September 28, 2022, 1:00am

Right. We’ve got a big pile of inputs from the C++ parser test suite to work through, and the goal is that it’s very easy to go add a new special-case diagnostic when we see a new weird case.

Doug

filip-sakel · September 28, 2022, 1:47am

Really exciting updates and great work on the stability! But how about performance; has it been tested against the current parser?

Douglas_Gregor · September 28, 2022, 5:39am

Yes, we've been testing performance, and @rintaro in particular has made significant improvements here.

One recent data point: SwiftLint is seeing 0-8% speedup in optimized builds when replacing SwiftSyntaxParser with SwiftParser, per https://github.com/realm/SwiftLint/pull/4216#issuecomment-1260079022.

Doug

ahoppen · October 27, 2022, 8:47pm

We’ve made some more progress in the last month, as highlighted in the October Update Post.

robnik · October 31, 2022, 6:44pm

Are there any books or other sources about compilers or language design that describe how to write a modern parser like this? I've seen a dozen "write your own compiler" tutorials or books where parsers are described as relatively simple things – like recursive descent with a bit of lookahead, then throw an error when you hit something unexpected. But then I've also seen multiple languages or compiler projects rewriting the parser at some point.

In the first post, @Douglas_Gregor mentions two points ("Resilience", and "Incremental") that sound like they could be very challenging to implement. So I wonder if you're inventing this as you go or if there are some guides about prior art that people can work from.

Max_Desiatov · October 31, 2022, 9:15pm

I personally found this article from Joe Groff on the topic to be quite illuminating: Constructing human-grade parsers.