Redesigning parts of the Clang Importer

zoecarver · March 24, 2021, 5:17pm

As I've been fixing bugs to, mostly, prevent C++ interop from crashing when importing various APIs, I've noticed a common theme: these bugs often occur when we do something differently than Clang does it. This makes sense because we're using Clang and importing C++. Here are a few examples of such bugs:

https://github.com/apple/swift/pull/36553
https://github.com/apple/swift/pull/35963
https://github.com/apple/swift/pull/35962
https://github.com/apple/swift/pull/35819

In fact, all these bugs occur because Clang is lazy and we are not. What I mean by that is, Clang often won't evaluate types or members until it really needs to (i.e., until they are actually used). For example, Clang will lazily evaluate inline static data members. So, if there's an error, it will only be emitted when the member is used. Because we try to evaluate every member, (before that patch) we would sometimes find invalid members that Clang hadn't yet instantiated and therefore didn't mark as invalid.

Another example of this can be demonstrated with the following snippet:

template<class T> struct X { X<X<T>> test(); } ;
void test(X<int>);

Clang has no problem compiling this program, but we fail to import it (luckily, we no longer hang/crash). Why is this? Because Clang won't evaluate the function type until it's actually called, whereas we will evaluate and instantiate the function type as we're importing it.

So, what's the solution? We could redesign some parts of the Clang importer to be lazier (especially when dealing with C++ templates). We have already started to do this with function templates (see #35819), but I think we could integrate laziness into the Clang importer in a much more fundamental way.

This would not only fix some of the issues above but also allow us to import more C++ APIs, more closely match what Clang does, and be vastly more performant, especially when importing templates.

In this post, I'm not proposing we do anything. I'm just bringing up a "problem" (or, more accurately, an observation about related problems) and brainstorming some possible solutions. I'm creating this post in response to the discussion @typesanitizer and I had here: cxx-interop Improve performance of deep template instantiations. by zoecarver · Pull Request #36553 · apple/swift · GitHub.

I'd love to hear what people think about making the Clang importer lazy and if there are any suggestions as to the best way to design this (I know there are some other parts of the compiler that rely on lazy evaluation, it would be helpful to know how those are implemented and if the model works well). As Varun said, taking a step back and thinking about the best implementation for the Clang importer might be extremely beneficial in the long run.

schuett · March 24, 2021, 6:20pm

The Swift compiler became lazy with the RequestEvaluator:

github.com

apple/swift/blob/main/docs/RequestEvaluator.md

# Request-Evaluator

The [request-evaluator](https://github.com/apple/swift/blob/main/include/swift/AST/Evaluator.h) is a new architectural approach for the Swift compiler that attempts to provide infrastructure for fine-grained tracking and visualization of dependencies, separately caching the results of computation within the compiler, and eliminating mutable state, particular from the Abstract Syntax Tree (AST). It is intended to address a number of problems that arise in the Swift compiler:

* Type checking (as well as later stages) tend to depend on mutable AST state, which is hard to reason about: which mutable state contributed to the result of a type check? How was it computed?
* Type checking, particularly in targets with more source files, is intended to be "lazy": the compiler will try to type check only what is needed to compile the given source file, and nothing more. Combined with mutable state, we often end up with compiler bugs that only manifest in multi-file targets, which are (1) the norm for real-world development, but (2) far less friendly for compiler writers writing test cases.
* Lazy type checking uses an ad hoc set of callbacks captured in the [lazy resolver](https://github.com/apple/swift/blob/main/include/swift/AST/LazyResolver.h). This ad hoc recursion can lead to cycles in the type-checking problem (e.g., to compute A we need to compute B, which depends on A): with mutable state, it's easy to cause either wrong results or infinite recursion.
* Because it's not clear which state is set by a given callback in the `LazyResolver`, the callbacks there tend to be very coarse-grained, e.g., "resolve everything about a declaration that might be used by a client." This performs too much work in multi-file targets (making the compiler slower than it should be) and contributes to more cycles (because coarse-grained queries are more likely to be cyclic where finer-grained ones might not be).

The request-evaluator attempts to address these issues by providing a central
way in which one can request information about the AST (e.g., "what is the superclass of the given class?"). The request-evaluator manages any state in its own caches, and tracks the dependencies among these requests automatically, identifying (and reporting) cycle dependencies along the way. It is designed for incremental adoption within the compiler to provide a smooth path from ASTs with mutable state, and provides useful debugging and performance-monitoring utilities to aid in its rollout.

## Requests
At the core of the request-evaluator is a "request", which is a query provided to the "evaluator", which will return the result of servicing the request. For example, a request could be "get the superclass of class `Foo`" or "get the interface type for function `bar(using:)`". Each of these request kinds would produce a type as its result, but other requests could have other result types: "get the formal access kind for declaration `x`" would return a description of the access level, or "get the optimized SIL for function `bar(using:)`" would produce a `SILFunction`.

Each "request" is described by, effectively, a description of a computation to perform. A request is a value type that is capable of comparing/hashing its state, as well as printing its value for debugging purposes. Each request has an evaluation function to compute the result of the request. Only the evaluator itself will invoke the evaluation function, memoizing the result (when appropriate) for subsequent requests.

All existing requests inherit the [`SimpleRequest`](https://github.com/apple/swift/blob/main/include/swift/AST/SimpleRequest.h) class template, which automates most of the boilerplate involved in defining a new request, and mostly abstracts away the handling of the storage/comparison/hashing/printing of the inputs to a request. Some example requests for access control are provided by [`AccessRequests.h`](https://github.com/apple/swift/blob/main/include/swift/AST/AccessRequests.h).

## Dependencies and Cycles

This file has been truncated. show original

Maybe this is a future direction for Clang too. You could ask Clang to type-check something even so it wouldn't have done so itself.

beccadax · March 24, 2021, 9:25pm

It's interesting that @schuett mentions the request evaluator, because ClangImporter is a large chunk of the compiler that has not yet been requestified, and changing that might help. I know that @codafi has talked about this in the past; he might have something to contribute to the discussion.

Some random observations:

ClangImporter is highly re-entrant and does a lot of ad-hoc cycle breaking. For instance, there are places where we partially initialize a parent declaration, insert it into a lookup table, import related declarations which will need to reference it, and only finish initializing the parent once those are imported. It would be nice to formalize this kind of thing.
However, there aren't actually that many entry points to the importer (or at least to SwiftDeclConverter). About six months ago, I experimented with instrumenting the ClangImporter to get it to explain its importing logic, and managed to narrow it to five entry points. That makes me think you won't necessarily need a million different requests to make this work.
Some of ClangImporter's logic is easy to implement when you start from the Clang declaration, but quite difficult when you start from the Swift reference to that declaration. For instance, if you're starting from a qualified lookup of init(contentsOf:) in NSData, how do you map that to the Objective-C selector initWithContentsOfURL:error:? The answer we went with is that we build a lookup table of context + Swift name to ObjC declaration by examining all declarations ahead of time (in a separate compiler job that builds a PCH with the table, even!), but that's inherently an eager operation. Is it possible to make that lazy instead?
Some of ClangImporter's logic is much easier to implement when you examine and evaluate many sibling declarations together than it would be if you were evaluating them all separately. For instance, when SwiftDeclConverter::VisitEnumDecl() sees several enumerators with the same raw value, it imports the earliest (non-unavailable) enumerator as a real case and imports the others as static lets (actually, static computed variables). If you evaluate all of the enumerators together, it's pretty easy to use a little temporary state to track which enumerators should be imported as cases and which should be imported as static lets; if you try to make this fully lazy, then suddenly you need to factor this state into a separate computation, and you're probably going to end up at least partially processing all of the enumerators anyway.

jrose · March 24, 2021, 11:48pm

By the way, I remember the implementation before the lookup table, and it was miserable. Members were imported as you referenced them, but that led to weird ordering issues because importing isn't quite idempotent. (Inheritance of requirements from protocols is a particularly nasty case in Objective-C.) Additionally, lots of that work was "lazy" at class granularity but eager once you got to members, i.e. importing one member of a particular category pulled them all in. There was no good way to support the swift_name attribute, of course. And finally, the lookup table actually sped up compilation quite a bit, because the importer didn't have to crawl through the Clang AST from scratch every time.

I'm not saying there's no good lazy solution here; we have plenty of parts of the compiler that improved on what was there before, and yet still can be improved themselves (potentially by replacement). But even so…thank you @Douglas_Gregor for designing and implementing the lookup table. :-)

EDIT and @Michael_Ilseman for separating the name lookup / name mapping from the rest of the importer. (I hope I've remembered correctly which of you were responsible for what.)