A New Swift Parser for SwiftSyntax

Douglas_Gregor · August 22, 2022, 5:36pm

Hello Swift Community,

SwiftSyntax is a SwiftPM package that allows one to parse Swift source code into a syntax tree, manipulate that tree, and render the tree back to source code. It is used by tools such as SwiftLint and swift-format that operate on Swift source code. The parsing capabilities of SwiftSyntax (in the module SwiftSyntaxParser) are provided by a C++ library that is tightly coupled with the rest of the Swift compiler, which means that the SwiftSyntax parser is dependent on a specific Swift toolchain, complicating its use.

We have started a project to reimplement the Swift parser in Swift, to become part of the SwiftSyntax library. The current implementation can parse much of the Swift grammar already. Check out the quickstart documentation if you want to try it out.

Goals

The main goal of this project is to fully replace the C++ implementation of the Swift parser for all clients. There are two major stages in this plan:

1. Replace existing SwiftSyntaxParser: The reliance on the C++ library for parsing makes SwiftSyntaxParser hard to use in tools and applications, requiring a complicated bundling process. The first goal of this project is to deprecate the existing SwiftSyntaxParser module and replace it with the new implementation. Once the new parser is fully implemented, existing clients of SwiftSyntaxParser should be able to use the new parser (from the module SwiftParser) fairly easily. Swift 5.7 is the last release that is guaranteed to include SwiftSyntaxParser.
2. Adopt in the compiler: The eventual goal for this new parser is to replace the C++ parser within the Swift compiler, and all tools built on it. This involves creating a new component in the Swift compiler that walks the SwiftSyntax tree produced by the new parser and creates the corresponding C++ AST nodes. This step necessarily depends on allowing Swift code in the Swift compiler, which is the subject of a separate discussion, and requires goal #1 to have been completed successfully before it can make meaningful progress.

Design philosophy

We are not interested in rewriting the Swift parser just for the sake of adopting Swift. Rather, we have a number of specific goals for the new parser that themselves warrant a rewrite. The new parser follows a design philosophy that makes it more amenable to tools:

Resilient: The parser will attempt to recover from syntax errors, maintaining as much of the program structure as is feasible. It has no side effects, and in particular produces no errors regardless of how ill-formed the input source text is. Instead, all errors are described in the syntax tree itself, and can be diagnosed by a separate pass that identifies such errors. These errors come in one of two forms:
- Unexpected nodes: syntax that doesn't match any part of the Swift grammar is kept in “unexpected” child nodes, which are placed in the syntax tree and can be queried by clients.
- Missing tokens: syntax that is required by the Swift grammar but isn't present in the source code will be recorded in the resulting tree as “missing” tokens, which the parser will introduce. For example, a missing ')' in a function declaration will be inserted by the parser as a missing token. Such tokens will be skipped when rendering back to the original source code, but can also be used by tools to provide fixes for the source code.
Efficient: The parser should provide similar parsing performance to the existing C++ parser implementation that it seeks to replace.
Source-preserving: SwiftSyntax is designed to maintain all “trivia” (including whitespace, comments, etc.) precisely as it occurs in the source text, so that a syntax tree can be rendered back into text that is byte-for-byte identical to the original source. The parser must maintain this property, regardless of whether the input text was well-formed Swift code.
Minimal context: The parser requires minimal context to parse Swift code, which consists of only those things required to handle a suitable Swift dialect, e.g., whether regex literals are supported. The parser can be invoked on any input source code, starting at any major production in the grammar (e.g., full source file, an individual type, an individual expression).
Incremental: A parse tree produced for a source file can be incrementally updated for a new version of that source file, reusing syntax nodes where possible to reduce computation overhead and memory.
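
To make the "resilient" and "source-preserving" properties above concrete, here is a minimal sketch in plain Swift. It is a toy model only: the `Token` and `Presence` types and the `render` and `diagnose` functions are hypothetical names invented for illustration and are not SwiftSyntax's API. It assumes a flat token list rather than a real tree, and shows a missing ')' being recorded, skipped when rendering, and reported by a separate diagnostic pass.

```swift
// Toy model for illustration; none of these names come from SwiftSyntax.
enum Presence { case present, missing }

struct Token {
    var leadingTrivia: String = ""   // whitespace/comments before the token
    var text: String
    var trailingTrivia: String = ""  // whitespace/comments after the token
    var presence: Presence = .present
}

/// Renders tokens back to source text. Missing tokens are skipped, so the
/// output matches the original input even when that input was ill-formed.
func render(_ tokens: [Token]) -> String {
    tokens
        .filter { $0.presence == .present }
        .map { $0.leadingTrivia + $0.text + $0.trailingTrivia }
        .joined()
}

/// A separate pass that reports errors recorded in the token list.
/// The "parser" itself never emits diagnostics.
func diagnose(_ tokens: [Token]) -> [String] {
    tokens.compactMap { (token: Token) -> String? in
        token.presence == .missing ? "expected '\(token.text)'" : nil
    }
}

// Hand-built result for the ill-formed input "func f( {}":
// the ')' is absent from the source, so it is recorded as missing.
let source = "func f( {}"
let tokens = [
    Token(text: "func", trailingTrivia: " "),
    Token(text: "f"),
    Token(text: "(", trailingTrivia: " "),
    Token(text: ")", presence: .missing),
    Token(text: "{"),
    Token(text: "}"),
]

print(render(tokens) == source)  // the round trip preserves the input
print(diagnose(tokens))
```

A tool that wants to offer a fix could render the same list without filtering out the missing tokens, which is one way to read the remark that missing tokens "can also be used by tools to provide fixes".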

Get involved!

The new parser is under active development, and we’d love help! You can experiment by pulling the latest SwiftSyntax and trying some code as shown in the quickstart guide. Testing is extremely important for such a central component. If you’d like to help out, you can run swift-parser-test on your own Swift code and report any bugs you find, optionally reducing them down to smaller test cases first or even taking a shot at fixing parser bugs directly. If you’re ready to dive into development, check out our implementation status page for ideas.

[EDIT: Updated links now that the PR has been merged]

FranzBusch · August 22, 2022, 7:16pm

Love to see this happening. Do you have any plans for introducing SemVer to SwiftSyntax?
Now with the toolchain dependency dropped this ought to be possible, right? This is becoming more and more important because SPM plug-ins depend on it, and without SemVer it is super easy to run into version conflicts.

taylorswift · August 22, 2022, 8:12pm

this is so badly needed, as swift-package-factory depends on swift-syntax, and it’s still really painful to set it up on macOS for the exact reasons you’ve highlighted. (linux is a lot better but still could be streamlined.)

“Add a Swift parser library that is written in Swift” (apple/swift-syntax Pull Request #616, by ahoppen) cannot get merged soon enough. and a kudos to @ahoppen for all his work on swift-syntax. the amount of progress this library has made in the past few months is staggering.

taylorswift · August 22, 2022, 8:20pm

can we get a semver tag added to the repository? this will make it usable from SPM, because right now SPM cannot handle plugin dependencies that themselves depend on an arbitrary branch.

ahoppen · August 22, 2022, 8:27pm

I would say tagging SwiftSyntax using SemVer is definitely a goal. I don’t want to make any promises about when this will happen though, because it doesn’t make sense to lock ourselves into an API and structure of the syntax tree if it continues to change at the current pace to make it fit the parser’s needs.

taylorswift · August 22, 2022, 9:05pm

to me, this is less about conforming to SemVer’s expectations (which everyone already flouts anyways) and more about working around (what i consider) a bug in SPM. so there is value in tagging versions even if you are just going to break them in a few weeks.

please consider adding tags to the repo, even in its current state.

gwendal.roue · August 23, 2022, 12:47pm

Hello @Douglas_Gregor,

I guess this package has many intended use cases. Is one of them the ability to develop reusable and configurable SPM plugins that parse some code and do whatever they want (such as generating new code for example)? Or do you think SwiftSyntax is NOT the correct tool for such a goal?

Karl · August 23, 2022, 1:04pm

IIUC the compiler has a library architecture precisely so that components may be integrated into other applications. The same applies to other parts of the toolchain, such as SwiftPM - it also has a library architecture and can be integrated into other applications.

Otherwise, those plugins or other applications would just have to write their own parsers, which may not be as accurate, or performant, or may not stay up-to-date as the language evolves. Having these libraries also makes it easier to write great tools which benefit Swift developers, and lets those plugin authors focus on the features that matter to their plugins.

gwendal.roue · August 23, 2022, 1:19pm

Yes, but shipping a parser that was developed for a particular compiler version, where everything is tied together and works as a whole, is not the same as shipping a parser that is supposed to work with some number of future compiler versions. The former is enough from the point of view of the compiler. I will not take the latter for granted.

I'm not talking about parsing future language versions (although I expect "unexpected nodes" and "missing tokens" to provide a certain amount of forward-compatibility).

I'm talking about successfully building a package with swift-tools-version:5.8, and a dependency on SwiftSyntax for Swift 5.8, with a later compiler (Swift 5.8.1, Swift 5.9, and maybe more): this is a scenario where the version of SwiftSyntax lags behind the compiler version. Since this scenario goes further than what is strictly needed for a compiler, I feel the need to ask.

If this scenario is not supported, package plugins would need to have their dependency on SwiftSyntax depend on the compiler version, and this would make them very difficult to maintain, and frankly a PITA from the point of view of their end users (read: I wouldn't use them if a minor Xcode upgrade breaks a build)

igor-makarov · August 23, 2022, 5:35pm

What are the implications of this on the question of adding macros to the compiler?

As far as I know, Rust macros operate on the syntax token level.

gwendal.roue · August 23, 2022, 5:58pm

That's exactly what's in my mind ;-)

taylorswift · August 23, 2022, 6:47pm

these are all valid points, except for the // swift-tools-version: which is not really involved here.

using a plugin that does not match the version of the toolchain you are using is not correct usage. so source generators should never run (or ideally, be built at all) as part of a CI matrix, or be enabled by default for end users. this precludes using them as buildTools; all source generators should run as command plugins, using them otherwise is fundamentally unsafe.

right now in swift-json, i am using:

#if swift(>=5.8) && (os(Linux) || os(macOS))
    plugins.append(.package(url: "https://github.com/kelvin13/swift-package-factory", 
        branch: "swift-DEVELOPMENT-SNAPSHOT-2022-08-18-a"))
#endif

in the package manifest. this effectively requires having the swift-DEVELOPMENT-SNAPSHOT-2022-08-18-a installed in order to develop (as opposed to just use) the package. this makes it harder to contribute to the project, and ideally SPM would be able to detect the currently-installed toolchain, and choose the appropriate tag of swift-package-factory to include as a dependency. but right now SPM cannot do this.

related: Allow SPM to specify a toolchain dependency

gwendal.roue · August 23, 2022, 7:08pm

The problem with SPM packages with such precise dependencies is that it is impossible to publish them for public consumption, and expect a good user experience for the package users.

A good user experience involves 1) supporting a reasonable range of Xcode/compiler versions, and 2) not triggering SPM conflicts because sub-packages declare over-precise dependencies that can not be compared and merged.

My initial question was about 1).

Semantic versioning is the expected answer to 2).

taylorswift · August 23, 2022, 7:17pm

correct. this is why i believe that buildTool is bad practice, and source generation should:

- run as a command tool
- emit sources that are checked into the repository
- be disabled by default

ideally end users should never have to build the plugins, or even know about them in the first place. the #if swift(>=5.8) && (os(Linux) || os(macOS)) hoops are there because SPM does not support build configurations, and cannot detect the version of the toolchain.

this is why i pitched an alternative dependency type in the other thread, because plugins shouldn’t be transitive since they aren’t part of any compiled product. it should be perfectly valid for one package to depend on swift-DEVELOPMENT-SNAPSHOT-2022-08-15-a and another package in the same build graph to depend on swift-DEVELOPMENT-SNAPSHOT-2022-08-18-a, as long as the plugin never runs at compile time.

i don’t think semantic versioning will help you with 2) because the syntax parser must always match the installed toolchain exactly (including any extant parser bugs/quirks in the compiler), otherwise you’re just asking for subtle sourcegen bugs.

FranzBusch · August 23, 2022, 7:22pm

This is getting a bit off topic but I would like to understand more why you think code gen shouldn't happen during build time. This is quite a normal thing to do and makes the end user experience very smooth.

In general, I agree that SwiftSyntax shouldn't be used without SemVer, though.

Furthermore, I agree that it is unfortunate that plugin dependencies and target dependencies are mixed up right now. It would be great if we can separate that out and have different dependency graphs for them.

taylorswift · August 23, 2022, 7:32pm

the way swift-syntax-based source gen works right now is it takes 1 version of swift as input, and emits many versions of swift as output.

so you, a library author with swift-DEVELOPMENT-SNAPSHOT-2022-08-18-a installed, write templates that are compatible with swift-DEVELOPMENT-SNAPSHOT-2022-08-18-a, and then the source gen tool emits code that is compatible with however many versions of swift you intend to support.

to support many-to-many version source gen would require every nightly release of swift-syntax to also package all of its past versions (including past nightlies) as part of every tag, and also require SPM to know what toolchain it is using, which currently it does not.

i think the root of your confusion is you seem to assume that the grammar of the swift language is stable and forwards-compatible. this is only true for a small subset of the language. things like underscored attributes and modifiers pop in and out of existence all the time.

FranzBusch · August 23, 2022, 7:37pm

The core grammar of the language is stable between major versions of the language. New additions are made, but those are backwards compatible. This is the whole goal of the source breaking section of our evolution process. Including underscored features in the general stable grammar is IMO not fair. If you generate code that uses such features you are opening yourself up for source breakage. Source breakage can happen with new major Swift versions but that has a broader effect on the ecosystem and not just code gen plugins.

In the end, I strongly believe that if SwiftSyntax adopts SemVer we can safely write code gen tools as build plugins.

taylorswift · August 23, 2022, 7:40pm

this is the reality of swift library development. i don’t know many serious libraries that are able to avoid the unstable parts of the language entirely, and source breakage is something that happens quite frequently. this is why comprehensive CI is so important.

some of the underscored features, like _modify, __owned, and @_exported are pretty much load-bearing columns today.

FranzBusch · August 23, 2022, 7:44pm

I am with you that libraries are often resorting to using underscored features and sometimes even code gen tools need to generate code with them.
However, your blanket statement that one should not use SwiftSyntax in build tools is not fair. It always depends on what you are outputting, and for most code that is fine and will be forward compatible.

taylorswift · August 23, 2022, 7:52pm

let's suppose you have a protocol with primary associated types:

public 
protocol ParsingRule<Element, Index>
{
    associatedtype Element 
    associatedtype Index:Strideable

    init()
}

and you want to target swift going back to 5.4. so one way you could achieve this is by using swift-package-factory’s @retro attribute:

@retro public 
protocol ParsingRule<Element, Index>
{
    associatedtype Element 
    associatedtype Index:Strideable

    init()
}

which generates

#if swift(>=5.7)
protocol ParsingRule<Element, Index>
{
    associatedtype Element 
    associatedtype Index:Strideable

    init()
}
#else 
protocol ParsingRule
{
    associatedtype Element 
    associatedtype Index:Strideable

    init()
}
#endif

but @retro only works if swift-syntax can understand PATs. and only recent snapshots of swift-syntax can understand PATs. so if you wanted SPF to run as a buildTool and not a command plugin, that would require the user to have a toolchain installed that understands PATs, which would defeat the purpose of @retro.
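
The shape of that expansion can be sketched in a few lines of plain Swift. This is not swift-package-factory's implementation: the `retroExpand` function is a hypothetical helper, and it uses naive string manipulation where the real tool works on a swift-syntax tree. It assumes the first `<...>` in the declaration is the primary associated type clause.

```swift
/// Hypothetical helper for illustration; not part of swift-package-factory.
/// Wraps a protocol declaration in `#if swift(>=5.7)` and emits a fallback
/// copy with the primary associated type clause removed.
/// Assumption: the first `<` ... `>` pair is the primary associated type list.
func retroExpand(_ declaration: String) -> String {
    guard let open = declaration.firstIndex(of: "<"),
          let close = declaration[open...].firstIndex(of: ">")
    else {
        // Nothing to strip, so no conditional compilation is needed.
        return declaration
    }
    var legacy = declaration
    legacy.removeSubrange(open...close)
    return """
    #if swift(>=5.7)
    \(declaration)
    #else
    \(legacy)
    #endif
    """
}

let declaration = """
protocol ParsingRule<Element, Index>
{
    associatedtype Element
    associatedtype Index:Strideable

    init()
}
"""

print(retroExpand(declaration))
```

A string-based version like this sidesteps the parser entirely, which is why it does not hit the problem described above. A syntax-tree-based generator has to parse the `ParsingRule<Element, Index>` header first, so it needs a swift-syntax version that already understands primary associated types.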