Thoughts on caching compiled parseable interfaces

jrose · October 5, 2018, 9:25pm

Background: This is part of the module stability effort. @Graydon_Hoare is currently working on the "compile and cache" part in https://github.com/apple/swift/pull/19518, and asked me to write up my thoughts on how the cache is going to work before I go on vacation for a bit.

Like header files, neither today's binary swiftmodules nor the planned parseable interfaces are self-contained; they have imports that reference other modules, and then use declarations from those modules in their own declarations. (This could be argument or return types, subclassing, adopting a protocol, overriding a method…it's basically everywhere.) Like header files, this means that if you swap out a dependency (say, Foundation) with another module that has the same name but completely different content, the modules that depend on it will fail to work. Today swiftmodules try to recover from a few cases of that, but if you step outside the "recovery" logic the compiler will just straight-up crash. It ought to emit a diagnostic and then refuse to generate code, but right now it mostly just crashes, for…implementation reasons.

This post is (mostly) not about that. The important part is that parseable interfaces are also not self-contained.

Now, the way we're planning to handle parseable interfaces is that we'll "compile" them in a mostly-self-contained environment, and then load the compiled form into the main ASTContext the way we do for swiftmodules today. But we don't want to start from scratch every time—you don't want to effectively be compiling the standard library in all its inlinable glory on every build! So we need to cache these as well, and only do the compiling work if the cache is out of date.

Sound familiar? This is basically how Clang avoids re-processing headers via pre-compiled modules. So we're taking a lot of cues from that.

Okay. Hi @Graydon_Hoare (and @harlanhaskins). Here's the new stuff.

There are two things you need to do to use a cache:

1. The cache key is where to look for an entry, to see if it exists at all. This is usually some kind of hash lookup.
2. Once you find the entry, you have to check to see if it's up to date.

In an offline discussion, Graydon pointed out that you can avoid (2) if the "up to date" check can be expressed as part of the key, i.e. if "being up to date" means "having these attributes". This is certainly simpler, but does have some problems:

- If "being up to date" includes checks like "anything older than when this was created is fine", they can't be expressed as inputs to a hash.
- If the cache isn't automatically cleaned, you could end up with a lot of out-of-date entries just sitting around. (@Douglas_Gregor was strongly pushing us away from having to clean up this cache.)

So I think we're likely to want to keep the separate up-to-date check, since it's more flexible.
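
To make the two steps concrete, here is a minimal sketch in Python, not the compiler's actual implementation. The names `ModuleCache`, `Entry`, and `source_stamp` are made up for illustration, and the "stamp" is only a stand-in for whatever the up-to-date check ends up comparing.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    module: str        # stand-in for the compiled swiftmodule
    source_stamp: int  # hypothetical stamp of the interface when it was compiled


class ModuleCache:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}
        self.builds = 0  # counts how often we actually had to compile

    def get(self, key: str, current_stamp: int, build: Callable[[], str]) -> str:
        # Step 1: use the cache key to see whether an entry exists at all.
        entry = self.entries.get(key)
        # Step 2: separately check that the entry is still up to date.
        if entry is not None and entry.source_stamp == current_stamp:
            return entry.module
        # Missing or stale: compile, then overwrite the entry under the same key.
        self.builds += 1
        entry = Entry(build(), current_stamp)
        self.entries[key] = entry
        return entry.module
```

Because a stale entry is replaced under the same key, out-of-date entries don't pile up, which is the property we lose if everything goes into the key.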

Next, what goes in the cache key? This should be anything where the same swiftinterface might generate multiple swiftmodules, so we should include things like

- The full compiler version: different compilers shouldn't stomp on each other
- Some notion of identity for the swiftinterface file: you might have two modules named "Foo" in unrelated parts of the project, and you don't want to have them fighting each other

(Note that the target platform isn't included here; there are different swiftinterface files for each platform because of #if. Take a look at the original post for more info.)
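
As a rough sketch of a key built from just those two inputs (Python, with hypothetical names): using the absolute path as the interface's "identity" is an assumption made here for illustration, and the real notion of identity may well differ.

```python
import hashlib
import os


def cache_key(compiler_version: str, interface_path: str) -> str:
    # Assumption: the absolute path serves as the swiftinterface's identity.
    identity = os.path.abspath(interface_path)
    h = hashlib.sha256()
    for part in (compiler_version, identity):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    # Truncated only to keep the example's file names short.
    return h.hexdigest()[:16]
```

Note that the file's contents and timestamp are deliberately left out of the key, so editing the interface maps to the same entry rather than creating a new one.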

On the other hand, the up-to-date check needs to take into account whether the swiftinterface file itself has changed. If this were in the cache key, every time you updated the file you'd get a new cache entry, which as noted above isn't great if the compiler isn't the one responsible for cleaning the cache.

And then there's a tricky case: dependencies. Let's use Foundation and UIKit as our examples, pretending they were pure Swift frameworks. If I change Foundation's swiftinterface, do I need to treat UIKit's cache entry as out of date? On the one hand, UIKit's module is just going to reference things from Foundation, so either Foundation's API surface stayed compatible or it didn't. If it didn't, we're going to have problems anyway. (Though admittedly the recovery for trying to recompile UIKit's swiftinterface is much better than the "oh shoot ABORT ABORT ABORT" we get from a swiftmodule.)

On the other hand, search paths might really make a difference. If I have two completely different implementations of "Foundation", and I choose between them with search paths, it's possible that UIKit's swiftinterface will behave differently in each case. (Imagine a typealias that's Int64 in one version and Double in the other.) So we probably want some notion of which dependencies went into building a cached swiftmodule, but maybe that's just a path and not the timestamp or contents.

That said, we do want to maximize sharing of cached swiftmodules, so we don't just want to hash the search paths up front. We only want to count two swiftmodules as different if the different search paths actually resulted in finding different things.
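
One way to picture "only count them as different if they found different things" is to record what each import actually resolved to and compare that, rather than the search paths themselves. The sketch below is Python with made-up helper names, and the `<dir>/<Module>.swiftinterface` layout is a simplification assumed for the example, not how the compiler really lays out or finds modules.

```python
from typing import Callable, Dict, List, Optional


def resolve(module: str, search_paths: List[str],
            exists: Callable[[str], bool]) -> Optional[str]:
    # First search path that contains the module wins.
    for directory in search_paths:
        candidate = f"{directory}/{module}.swiftinterface"
        if exists(candidate):
            return candidate
    return None


def resolved_dependencies(imports: List[str], search_paths: List[str],
                          exists: Callable[[str], bool]) -> Dict[str, Optional[str]]:
    # Record the path each import was actually found at (or None if missing).
    return {name: resolve(name, search_paths, exists) for name in imports}


def can_share(recorded: Dict[str, Optional[str]], imports: List[str],
              search_paths: List[str], exists: Callable[[str], bool]) -> bool:
    # A cached module is reusable if the current search paths find the same files.
    return recorded == resolved_dependencies(imports, search_paths, exists)
```

With this shape, a client that adds an unrelated directory to its search paths can still share the cached module, while a client whose search paths pick up a different "Foundation" cannot.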

(Other settings are tricky. If they're mirrored down from the calling context, they probably have to go into either the cache key or the up-to-date check. "ExtraClangImporterArgs" is especially tricky since some modules can't be loaded without providing some macro or another, but it also contains a bunch of things like header maps when using Xcode, and that would destroy all sharing. We might just have to mirror that one down but pretend it doesn't affect the contents of the dependencies.)

Okay, I'm sure I've left things out but I've also written a ton, so I'm going to leave off here.

P.S. Even with all this, though, we still need recovery logic in loading swiftmodule files, because we have many ways to change the way imported declarations look: the active Swift version, the deployment target, even random macros being set. None of these things are guaranteed to be consistent across the various imported Swift modules; they're set by the client for the current compilation context.