"Immutability and Faithfulness of AST nodes"

jrose · May 28, 2019, 4:17pm

Clang just added a new section to its Internals manual on the "immutability" and "faithfulness" of its AST nodes. I wanted to bring up (with no particular urgency) whether Swift should have something similar. So far, he Swift AST has followed the "faithfulness" part pretty well, but because of the separate parsing and type-checking phases (or "semantic checking", if you prefer) it's pretty mutable.

We've had thoughts before about making the AST less mutable in various ways, one of the reasons being because it makes it easier to do live-editing tricks like incremental type-checking, where you throw out just the types that could have been touched by a change. It can also make circularity issues much easier to catch, although the request evaluator is doing that too. On the flip side, though, we've seen evidence that accessing fields of AST nodes is noticeably faster than looking them up in a map—this was tested with Clang nodes, which are allocated at offset -1 now but used to be in a global map on the ASTContext.

But wait. The Swift AST has done fairly well on the faithfulness front…but still wasn't enough to perfectly recreate the original source; instead, we now have SwiftSyntax for that. So some people (mostly not me) have also talked about building ASTs from SwiftSyntax nodes, and maintaining a link back for the source information. If we do go down that path, we'd have to be careful to not slow down compilation, but it would also allow dropping information from AST nodes that would be redundant with the syntax nodes.

I mostly wanted to bring this up so people could read the Clang change. But it seems like there's a good chance "immutable + faithful" isn't the place to end up for Swift's AST. Thoughts?

(And yes, this is sort of another IR: Syntax -> AST -> SIL -> LLVM IR -> LLVM MC -> assembly. That's compilers for you…but it also means being careful about perf.)

Slava_Pestov · May 28, 2019, 10:11pm

I would argue that as long as the mutation is hidden by the request evaluator (ie, the "internal caching" scheme we use for some requests) instead of exposing a setter method to all clients, then you get most of the benefits of immutability without the performance hit of a side table.

DaveZ · May 29, 2019, 10:57am

Hi @jrose!

I think there are four topics packed into this thread:

AST fidelity ("faithfulness")
AST reversibility
AST mutability
AST efficiency ("performance")

On the first topic, I agree that Swift is faithful to the intent of programmers. I think/hope that anybody that hacks on compilers regularly respects the dividends that "faithfulness" pays. Also, the way I like to think about it is this: the more implicit a language is, the more explicit the compiler needs to be.

On the second topic, reversibility, I think that as a much younger language than ObjC++, we shouldn't be surprised that Swift has prioritized feature work over reversibility. Anybody that skims the parser source will notice source location/range tracking data being dropped on the floor all over the place. Personally, I think the way to solve this is to create a "verify reversibility" pass for assert builds.

On the third topic, mutability, I'm disappointed that we don't do two things: 1) use const more (especially for fields) and 2) have a per-node "done" bit that mutating methods check to see if Sema believes that the node should no longer be mutated. Yes, this can be tedious in C++, and in a single-threaded context it can also feel pointless -- but here is my perspective, the people that are experts at extracting concurrency and the people that are good compiler designers are rarely the same people. If the latter want the former to try and make the compiler faster, then we need mutation to be explicit and checked.

Finally, on the topic of efficiency/performance and syntax tracking, I've been tempted to move all of the source location/range tracking fields out into separate data structures inside of a separate heap just to get better cache density for the critical path. This would be very similar to "building ASTs from SwiftSyntax nodes, and maintaining a link back for the source information" but motivated by a different goal.