Speeding up SwiftSyntax by using the parser directly

(Argyrios Kyrtzidis) #1

tl;dr: Providing direct access to the parser speeds up SwiftSyntax 8x, making it 2x faster than the legacy sourcekitd syntactic request.

Currently there are 2 ways to create a SwiftSyntax tree from a source file:

  • Parse the JSON output from the compiler executable (slow)
  • Use sourcekitd + binary serialization format (fast)

Compared with the legacy request, a major benefit of SwiftSyntax is that it allows incremental re-parsing. It is still critical, though, for SwiftSyntax to have very fast, real-time performance when parsing a lot of code (e.g. the first time a file is opened, or when processing multiple files).
In that respect, even the fastest available approach for SwiftSyntax is multiple times slower than the legacy sourcekitd syntactic request.

To see how much faster SwiftSyntax can be, I prototyped a C library that exposes a couple of APIs which use the Swift parser to create the raw syntax-tree nodes that SwiftSyntax needs and pass them to the client. I then modified SwiftSyntax to call these C APIs and get back the raw syntax nodes directly from the parser.
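To illustrate the general shape of such an interface, here is a minimal sketch of a C entry point that hands nodes to a client through a callback. Everything in it is an assumption made for illustration: the names `raw_node`, `raw_node_handler`, `toy_parse`, and `tally_node` are hypothetical and are not the prototype's API, and the "parser" is a toy splitter that only separates words from punctuation and emits a flat sequence rather than a tree.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical raw node: a kind plus a byte range into the source buffer. */
typedef enum { RAW_NODE_WORD, RAW_NODE_PUNCT } raw_node_kind;

typedef struct {
    raw_node_kind kind;
    size_t offset; /* byte offset of the node in the source */
    size_t length; /* byte length of the node */
} raw_node;

/* Client-supplied callback; `ctx` is passed through untouched. */
typedef void (*raw_node_handler)(const raw_node *node, void *ctx);

/* Toy stand-in for a parse entry point (NOT the Swift parser): splits the
 * source into words and single punctuation characters, skipping whitespace,
 * and hands each node to `handler`. Returns the number of nodes produced. */
size_t toy_parse(const char *source, raw_node_handler handler, void *ctx) {
    size_t count = 0;
    size_t i = 0;
    size_t n = strlen(source);
    while (i < n) {
        unsigned char c = (unsigned char)source[i];
        if (isspace(c)) {
            i++;
            continue;
        }
        raw_node node;
        node.offset = i;
        if (isalnum(c) || c == '_') {
            node.kind = RAW_NODE_WORD;
            while (i < n && (isalnum((unsigned char)source[i]) || source[i] == '_')) {
                i++;
            }
        } else {
            node.kind = RAW_NODE_PUNCT;
            i++;
        }
        node.length = i - node.offset;
        if (handler != NULL) {
            handler(&node, ctx);
        }
        count++;
    }
    return count;
}

/* Example client state and callback: tallies nodes by kind. */
typedef struct {
    size_t words;
    size_t puncts;
} node_tally;

void tally_node(const raw_node *node, void *ctx) {
    node_tally *tally = (node_tally *)ctx;
    if (node->kind == RAW_NODE_WORD) {
        tally->words++;
    } else {
        tally->puncts++;
    }
}
```

A client would call `toy_parse("let x = 1", tally_node, &tally)` with a zero-initialized `node_tally`. The point of the callback shape is that nodes reach the client in-process as they are produced, with no serialization step in between.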

The performance gains of this approach are significant; see the timings below. For the source test case I used the largest file in the firefox-ios repo and copied it 2x, for a total of 4,120 LOC.

| | compiler side | SwiftSyntax side | total |
|---|---|---|---|
| legacy syntactic request | 25 ms | - | 25 ms |
| SwiftSyntax/sourcekitd | 44 ms | 60 ms | 104 ms |
| SwiftSyntax/direct parse | 11 ms | 2 ms | 13 ms |

As you can see, using the C library, SwiftSyntax became 8x faster (104 ms vs. 13 ms), and even about 2x faster than the legacy sourcekitd syntactic request (25 ms vs. 13 ms)! :racing_car::dash:
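For anyone who wants to check the arithmetic behind those ratios, a small sketch follows. The timing values are the ones reported in the post; the helper names `total_ms` and `speedup` are made up for this example.

```c
/* Total time for one approach: compiler side plus SwiftSyntax side, in ms. */
double total_ms(double compiler_side_ms, double swiftsyntax_side_ms) {
    return compiler_side_ms + swiftsyntax_side_ms;
}

/* How many times faster `candidate_ms` is than `baseline_ms`. */
double speedup(double baseline_ms, double candidate_ms) {
    return baseline_ms / candidate_ms;
}
```

With the reported numbers, `speedup(total_ms(44, 60), total_ms(11, 2))` is 104 / 13 = 8, and `speedup(25, total_ms(11, 2))` is 25 / 13, or roughly 1.9.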

The only potential downside of this approach is that we are losing the crash protection that sourcekitd provided, but:

  • The parser is a much less complicated system than the rest of the compiler pipeline, and given the additional quality assurance tools that we have, like the amazing swift stress tester, I believe we can be confident that we will protect SwiftSyntax against crashing bugs.
  • Moving SwiftSyntax off sourcekitd has an additional upside beyond performance. A long-standing desire of ours has been to isolate the syntactic functionality from sourcekitd crashes (a typechecker crash should have no effect on syntactic functionality), and this approach allows us to provide that.

Given how critical the performance of SwiftSyntax is for allowing it to replace all uses of the legacy syntactic request, I believe this approach is the way forward in order to achieve this goal.

(Jon Shier) #2

As an outsider who isn't involved in this process at all:

When SwiftSyntax was introduced, I thought I remembered a future direction of integrating it into the compiler pipeline so it would become the first step, and its output could be shared between the compiler and other tools which need to consume the high fidelity output (like the LS). Is that not the case, or is that just a long long term goal?

As a Swift user, yes please! Syntax highlighting breaking because of sourcekitd crashes is super annoying, especially since Xcode's "Report a bug" button is a cruel joke. Highlighting flashes because the entire file has to be highlighted at once are also super annoying. Flashes while editing code are super super annoying. Anything that improves this situation would be great.

On a related note, when Swift LSP support was announced, there was much rejoicing, aside from the comments that, since it would still use sourcekitd, every IDE would now suffer from the same issues Xcode sees. So this would help general developer confidence as well.

(Argyrios Kyrtzidis) #3

It's a long term goal.

(Marcin Krzyzanowski) #4

Are you saying that the sourcekitd "parser" is the slowest possible approach? Slower than playing with JSON?

(Argyrios Kyrtzidis) #5

No, JSON is multiple times slower than the sourcekitd approach. I just didn't bother taking specific measurements for it, since we know this already.

(Marcin Krzyzanowski) #6

What is the "legacy syntactic request"? It's faster than "sourcekitd", and I found this part intriguing.

(Argyrios Kyrtzidis) #7

It's the functionality for opening a source document and getting a 'syntax-map' (flat array used for syntactic highlighting) and a 'document structure' (tree describing the structure of the code).
Check examples for syntax-map and document-structure.
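As a rough picture of the difference between the two shapes, here is a sketch in C. The struct layouts and the helpers `entry_at` and `structure_depth` are assumptions made for illustration and are not how sourcekitd represents its responses; the sketch only contrasts a flat array of ranges with a nested tree.

```c
#include <stddef.h>

/* One entry of a flat syntax map: a kind and a byte range. */
typedef struct {
    const char *kind;
    size_t offset;
    size_t length;
} syntax_map_entry;

/* One node of a document structure tree: a kind, a name, a byte range,
 * and nested children (for example, a function inside a class). */
typedef struct structure_node {
    const char *kind;
    const char *name;
    size_t offset;
    size_t length;
    const struct structure_node *children;
    size_t child_count;
} structure_node;

/* Flat lookup: linear scan for the entry covering `byte_offset`.
 * Returns NULL when no entry covers that offset. */
const syntax_map_entry *entry_at(const syntax_map_entry *map, size_t count,
                                 size_t byte_offset) {
    for (size_t i = 0; i < count; i++) {
        if (byte_offset >= map[i].offset &&
            byte_offset < map[i].offset + map[i].length) {
            return &map[i];
        }
    }
    return NULL;
}

/* Tree walk: nesting depth of the structure, counting `node` itself as 1. */
size_t structure_depth(const structure_node *node) {
    size_t deepest = 0;
    for (size_t i = 0; i < node->child_count; i++) {
        size_t depth = structure_depth(&node->children[i]);
        if (depth > deepest) {
            deepest = depth;
        }
    }
    return 1 + deepest;
}
```

For a source like `class A { func f() {} }`, the flat map would hold four entries (two keywords and two identifiers), while the structure would be a class node containing a single function node. Highlighting only needs the flat ranges; anything that cares about nesting needs the tree.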