Speeding up SwiftSyntax by using the parser directly

(Argyrios Kyrtzidis) #1

tl;dr: Providing direct access to the parser speeds up SwiftSyntax 8x, and it becomes 2x faster than the legacy sourcekitd syntactic request.

Currently there are 2 ways to create a SwiftSyntax tree from a source file:

  • Parse the JSON output from the compiler executable (slow)
  • Use sourcekitd + binary serialization format (fast)

Compared with the legacy request, a major benefit of SwiftSyntax is that it allows incremental re-parsing. It is still critical, though, for SwiftSyntax to have very fast, real-time performance when parsing a lot of code (e.g. the first time a file is opened, or when processing multiple files).
From that aspect, even the fastest available approach for SwiftSyntax is multiple times slower than the legacy sourcekitd syntactic request.

To see how much faster SwiftSyntax can be, I prototyped a C library that exposes a couple of APIs that use the Swift parser to create the raw syntax-tree nodes that SwiftSyntax needs, and pass them to the client. I then modified SwiftSyntax to call these C APIs and get back the raw syntax nodes directly from the parser.
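As a rough illustration of the shape such an API could take, here is a minimal C sketch of a parse function that hands raw nodes to the client through a callback. Every name in it (`demo_raw_node`, `demo_node_handler`, `demo_parse`, `demo_count_nodes`) is hypothetical, and the "parser" is a toy word splitter standing in for the Swift parser. It is not the prototype's actual interface.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical raw node: a kind tag plus a byte range in the source buffer. */
typedef struct {
    int kind;       /* toy kinds: 0 = word (letters, digits, '_'), 1 = punctuation */
    size_t offset;  /* byte offset of the node in the source */
    size_t length;  /* byte length of the node */
} demo_raw_node;

/* Client-supplied callback; `ctx` is passed through untouched. */
typedef void (*demo_node_handler)(const demo_raw_node *node, void *ctx);

/* Toy stand-in for the parser (an assumption for illustration only):
   splits the source into words and single punctuation characters and
   hands each one to the client as soon as it is found.
   Returns the number of nodes delivered. */
size_t demo_parse(const char *source, demo_node_handler handler, void *ctx) {
    size_t i = 0, count = 0;
    while (source[i] != '\0') {
        unsigned char c = (unsigned char)source[i];
        if (isspace(c)) { i++; continue; }
        demo_raw_node node = { 0, i, 0 };
        if (isalnum(c) || c == '_') {
            while (isalnum((unsigned char)source[i]) || source[i] == '_') i++;
        } else {
            node.kind = 1;
            i++;
        }
        node.length = i - node.offset;
        handler(&node, ctx);
        count++;
    }
    return count;
}

/* Example client callback: counts the nodes it receives via `ctx`. */
void demo_count_nodes(const demo_raw_node *node, void *ctx) {
    (void)node;
    (*(size_t *)ctx)++;
}
```

A client would call `demo_parse(source, demo_count_nodes, &counter)` with a `size_t counter` initialized to zero. The point of the design is that nodes cross the library boundary as plain structs in the same process, with no serialization step in between.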

The performance gains of this approach are significant; see the timings below. For the source test case I used the largest file in the firefox-ios repo and copied it 2x, so the total is a file with 4,120 LOC.

                           compiler side   SwiftSyntax side   total
legacy syntactic request   25 ms           n/a                25 ms
SwiftSyntax/sourcekitd     44 ms           60 ms              104 ms
SwiftSyntax/direct parse   11 ms           2 ms               13 ms

As you can see, using the C library SwiftSyntax became 8x faster, and even 2x faster than the legacy sourcekitd syntactic request! :racing_car::dash:
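For anyone who wants to check the ratios, here is a small C sketch of the arithmetic behind them. The constants are the totals from the timings above; the helper name `demo_speedup` is made up for this illustration.

```c
/* Total timings in milliseconds, taken from the measurements above. */
#define LEGACY_TOTAL_MS      25.0  /* legacy syntactic request */
#define SOURCEKITD_TOTAL_MS 104.0  /* SwiftSyntax/sourcekitd: 44 ms + 60 ms */
#define DIRECT_TOTAL_MS      13.0  /* SwiftSyntax/direct parse: 11 ms + 2 ms */

/* How many times faster `candidate_ms` is than `baseline_ms`. */
double demo_speedup(double baseline_ms, double candidate_ms) {
    return baseline_ms / candidate_ms;
}
```

`demo_speedup(SOURCEKITD_TOTAL_MS, DIRECT_TOTAL_MS)` is 104 / 13 = 8, and `demo_speedup(LEGACY_TOTAL_MS, DIRECT_TOTAL_MS)` is 25 / 13, roughly 1.9, which rounds to the 2x quoted above.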

The only potential downside of this approach is that we are losing the crash protection that sourcekitd provided, but:

  • The parser is a much less complicated system than the rest of the compiler pipeline, and given the additional quality assurance tools that we have, like the amazing swift stress tester, I believe we can be confident that we will protect SwiftSyntax against crashing bugs.
  • Moving SwiftSyntax off sourcekitd has an additional upside beyond performance. A long-standing desire of ours has been to isolate the syntactic functionality from sourcekitd crashes (a typechecker crash should have no effect on syntactic functionality), and this approach allows us to provide that.

Given how critical the performance of SwiftSyntax is for allowing it to replace all uses of the legacy syntactic request, I believe this approach is the way forward in order to achieve this goal.

(Jon Shier) #2

As an outsider who isn't involved in this process at all:

When SwiftSyntax was introduced, I thought I remembered a future direction of integrating it into the compiler pipeline so it would become the first step, and its output could be shared between the compiler and other tools which need to consume the high fidelity output (like the LS). Is that not the case, or is that just a long long term goal?

As a Swift user, yes please! Syntax highlighting breaking because of sourcekitd crashes is super annoying, especially since Xcode's "Report a bug" button is a cruel joke. Highlighting flashes because the entire file has to be highlighted at once are also super annoying. Flashes while editing code are super super annoying. Anything that improves this situation would be great.

On a related note, when Swift LSP support was announced, there was much rejoicing, aside from the comments that, since it would still use sourcekitd, every IDE would now suffer from the same issues Xcode sees. So this would help general developer confidence as well.

(Argyrios Kyrtzidis) #3

It's a long term goal.

(Marcin Krzyzanowski) #4

Are you saying that the sourcekitd "parser" is the slowest possible approach? Slower than playing with JSON?

(Argyrios Kyrtzidis) #5

No, JSON is multiple times slower than the sourcekitd approach. I just didn't bother taking specific measurements for it; we know this already.

(Marcin Krzyzanowski) #6

What is the "legacy syntactic request"? It's faster than "sourcekitd", and I found this part intriguing.

(Argyrios Kyrtzidis) #7

It's the functionality for opening a source document and getting a 'syntax-map' (flat array used for syntactic highlighting) and a 'document structure' (tree describing the structure of the code).
Check examples for syntax-map and document-structure.
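To make the difference between the two shapes concrete, here is a simplified C model of them: a flat array of ranges for the syntax map and a recursive node for the document structure. The type names, field names, and kind strings are placeholders chosen for this sketch; they are not the sourcekitd response format or API.

```c
#include <stddef.h>

/* One entry of a flat syntax map: a token kind and its byte range. */
typedef struct {
    const char *kind;  /* placeholder kind, e.g. "keyword" */
    size_t offset;
    size_t length;
} demo_syntax_map_entry;

/* One node of a document structure tree: a declaration-like item
   with a byte range and nested children. */
typedef struct demo_structure_node {
    const char *kind;  /* placeholder kind, e.g. "struct" */
    const char *name;
    size_t offset;
    size_t length;
    const struct demo_structure_node *children;
    size_t child_count;
} demo_structure_node;

/* Flat array: find the entry covering a byte offset with a linear scan.
   Returns NULL when no entry covers it (e.g. whitespace). */
const demo_syntax_map_entry *demo_syntax_map_find(
        const demo_syntax_map_entry *map, size_t count, size_t offset) {
    for (size_t i = 0; i < count; i++) {
        if (offset >= map[i].offset && offset < map[i].offset + map[i].length)
            return &map[i];
    }
    return NULL;
}

/* Tree: count a node plus all of its descendants by recursion. */
size_t demo_structure_count(const demo_structure_node *node) {
    size_t total = 1;
    for (size_t i = 0; i < node->child_count; i++)
        total += demo_structure_count(&node->children[i]);
    return total;
}
```

For a source such as `struct S { var x = 1 }`, the syntax map would hold one entry per highlighted token (`struct`, `S`, `var`, and so on), while the structure tree would hold a `struct` node named `S` with a single child for `x`.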

(Argyrios Kyrtzidis) #8

An update on this: we started incorporating related changes, like refactoring how swift syntax parsing works in order to slot in a C API for it. There are plenty more changes coming in, and we are also looking into the SwiftSyntax side to make it more efficient to create and visit the tree. Such changes may be source breaking as well.

Given that these are disruptive changes and that we'd like to have some time internally to "bake" the changes before making them available via a released toolchain, we are not aiming to have this available for the swift-5.0 release, but it will be part of the next release.

(JP Simard) #9

This is great news Argyrios. I really appreciate the focus on performance here!


My only worry is this potential downside. The last time I used Rust, which has the RLS (Rust Language Server), I encountered a lot of issues with the language server, primarily because it is very tightly coupled with the Rust compiler infrastructure. This led to crashes, or to situations where analysis broke down due to an error in the code that would have been easier to fix had analysis been available. Ultimately, I hope that giving SwiftSyntax direct access to the Swift parser doesn't result in scenarios where static analysis is kneecapped by issues that matter at compile time but are less important while code is being developed.

(Argyrios Kyrtzidis) #11

Not quite sure I'm following but I don't think what you describe matches what we intend to do. SwiftSyntax will be used only for syntactic processing, and any semantic analysis, like typechecking, will stay behind the sourcekitd process barrier.

(Yasuhiro Inami) #12

FYI, here are some experiments I made using (1) SwiftLang + ByteTree and (2) the newly released direct parser:

  1. Add RunSwift5Command (SwiftLang + ByteTree, experiment) by inamiy · Pull Request #13 · inamiy/SwiftRewriter

  2. Update toolchain to PR-21772-166 & use direct-parser by inamiy · Pull Request #14 · inamiy/SwiftRewriter

(Yasuhiro Inami) #13

Add RunSwift5Command (SwiftLang + ByteTree, experiment) by inamiy · Pull Request #13 · inamiy/SwiftRewriter

@akyrtzi @Xi_Ge
Is libswiftSwiftLang.dylib not available in the Xcode 10.2 toolchain?
I thought we could at least try ByteTree for faster parsing than in Swift 4.2.

(Argyrios Kyrtzidis) #14

We didn't want to put it in the toolchain because we intend at some point to move it out of the Swift repo and into its own SwiftPM package, like SwiftSyntax.
Maybe the most convenient thing would be to copy the relevant Swift files into your repo, if you want to use them; there are only a few of them.