Thoughts on a hypothetical "Multiple primary file" mode

Hi, all. David Ungar and Graydon Hoare have been doing compiler performance investigations, and in particular are interested in the cases where WMO-with-Onone builds are faster than the multi-file -Onone builds. By now this is out there as a hackaround for build time on targets with many files because the compiler spends too much time doing repeated work, and whatever work is saved by -incremental isn't worth it. -incremental is still very conservative and we could probably do better there, but meanwhile there's something to this: could we save on work in non-WMO mode by processing multiple "primary files" in one frontend invocation?

(This email is a bit of a brain dump on some of the past discussions on this, because David's fairly new to the Apple side of the compiler project, and even Graydon hasn't been here for all of it. I also realized partway through writing it that I already said a lot of this in docs/Driver.md <https://github.com/apple/swift/blob/master/docs/Driver.md>, but that's more for outside people. docs/DependenciesAnalysis.rst <https://github.com/apple/swift/blob/master/docs/DependencyAnalysis.rst> may also be relevant.)

First, some background:

WMO mode
- one frontend process
- parses, resolves imports, and type-checks all files
- generates SIL for all files in a single SILModule
- usually generates N LLVM modules, which turn into N .o files in parallel
- does not support incremental builds, even under -Onone (one reason: it isn't guaranteed which content goes into which .o)

(There's also a fully single-threaded WMO mode which outputs a single .o file. This allows slightly more LLVM optimization at the cost of, well, being single-threaded. For historical reasons this is the default WMO mode on the command line, but isn't used by Xcode or SwiftPM.)

Non-WMO mode
- a.k.a. "multi-file mode", "multi-frontend mode", "primary file mode", or "incremental mode" – there isn't really a standard name for this
- parses and resolves imports in all files, type-checks just the "primary file" (specified on command line)
- …but type-checking one file can sometimes require partially type-checking others (e.g. because you inherit from a class in another file). You pretty much always get to skip function bodies, though.
- generates SIL for the primary file in a SILModule
- generates 1 LLVM module <http://llvm.org/docs/LangRef.html#module-structure>, which results in one .o file
- very conservative incremental builds (as in, "err on the side of rebuilding")

So of course there's repeated work in the non-WMO mode. Parsing all files is something that we don't consider expensive because it's a relatively small part of the total compile time, though I can't remember if we're smart enough to delay parsing of function bodies in other files. However, it turns out even the cost of opening the files can matter. Beyond that, there are a number of other pieces of potentially repeated work:

- anything brought in through the Clang importer
- anything deserialized from a Swift module (usually the standard library)
- searching for modules on disk (both Clang modules and Swift modules)
- the "partial type-checking" I alluded to above
- synthesizing helper functions for things imported from Clang (for example, memberwise initializers for structs)

David suggested today the idea of a "multiple primary file" mode to speed up compiles (prompting this email), which isn't a bad idea. However, we definitely want to be sure we don't break incremental builds. That means a few restrictions:

- We still want to generate one object file per source file.
- This object file has to be self-contained, in case we don't rebuild it next time, or do rebuild something it depends on.
- The easiest way to guarantee one self-contained object file is to have one LLVM module.
- But we probably also want one SIL module per source file, since SILModule has a field that controls what scope we can assume perfect information for, and it wouldn't be self-contained to have that cross source file boundaries. (I don't trust us to only use this during optimization, either.)
- The Swift AST and ASTContext are not thread-safe, so we can't process multiple SILModules in different threads today.
- When we do synthesize helper functions that aren't associated with a source file, we probably need to put them in every object file.

So, the least intrusive approach here that still supports incremental builds would be to type-check multiple files, but then serially process a separate SILModule for each "primary file". (We might want to call them something more like "active files".) This is (probably) what David was getting at.

However…

…there's something more interesting.

Pipelining

Currently, the way we incrementally build a project is by passing all input files to the Swift driver, which decides which files need to be rebuilt and then spawns a bunch of "frontend jobs", one per primary file that needs to be rebuilt. (It's actually a little worse than this because we may discover that a file needs to be rebuilt only after doing some of the work <https://github.com/apple/swift/blob/master/docs/Driver.md#incremental-builds>discover that a file needs to be rebuilt <https://github.com/apple/swift/blob/master/docs/Driver.md#incremental-builds> only after doing some of the work.) The "least intrusive" approach I described above would partition up the files we know we need to rebuild ahead of time, but it would still end up spawning new jobs for those "late-discovered" files. It could also end up accidentally putting five long files on one job and five short files on another, which isn't a great use of resources.

An alternate approach would be what Daniel Dunbar has called pipelining: when a particular frontend job is done, the parent process should be able to send it another file to work on, starting from the type-checking phase.

There are some downsides to pipelining:
- Even less deterministic than a normal parallel build (if "less deterministic" can be considered a thing)
- Harder to test – we'd have to make the "multiple primary file" mode anyway just to ensure the same series of processing.
- Unbounded growth – we'd probably want to start from scratch at some point if memory usage exceeded some threshold.

But overall I think this fits our model better, and if we can figure out a good way to communicate between the frontend and driver this is the way to go.

(Since David recently pushed through a nice refactoring of CompilerInstance::performSema, we know there's an additional complication around targets that have a main.swift—today we type-check that file piecemeal instead of parsing it all at once. We'd probably either subset main.swift out from this optimization, or figure out if we can drop the weird special-case type-checking.)

Faster Optimized Builds?

Also important! But since we expect most optimized builds these days are WMO builds, this sort of approach is less important than it might otherwise be.

Incremental Optimized Builds?

Unlikely for WMO, although there's already a small optimization where if a particular LLVM module's contents are exactly the same as the last build, we won't bother re-optimizing at the LLVM level. For non-WMO, naively extending the current model would require that we were sure that the current dependency tracking mechanism included all information that the optimizer uses, which I'm not confident about. However, we've talked (idly) about doing incremental processing up to the "mandatory optimizations" in SIL, i.e. up until we've produced canonical SIL and emitted flow-sensitive diagnostics. We would then write this to disk (we call these "SIB" files, for "Swift Intermediate Binary file") and then do a non-incremental stage turning this into object files. We could even do cross-file WMO-like optimization this way, either by using a single process that merged in all the SIB information, or by loading all N files into multiple processes, each responsible for one or more object files.

But of course, within most optimized builds, type-checking and SILGen is at most a third of the build time, usually less. So it's unclear whether this would be a net win given all the extra overhead.

That's probably all the relevant information I have right now, but it seems like quite enough.

Jordan

Thank you, Jordan. Lots to mull over for me here. BTW, although it was I who mentioned "multiple primary files" today, it was not my idea. I was relaying what Graydon had mentioned.
(Just to give credit where it's due.)

- David

···

On Sep 25, 2017, at 3:45 PM, Jordan Rose <jordan_rose@apple.com> wrote:

Hi, all. David Ungar and Graydon Hoare have been doing compiler performance investigations, and in particular are interested in the cases where WMO-with-Onone builds are faster than the multi-file -Onone builds. By now this is out there as a hackaround for build time on targets with many files because the compiler spends too much time doing repeated work, and whatever work is saved by -incremental isn't worth it. -incremental is still very conservative and we could probably do better there, but meanwhile there's something to this: could we save on work in non-WMO mode by processing multiple "primary files" in one frontend invocation?

(This email is a bit of a brain dump on some of the past discussions on this, because David's fairly new to the Apple side of the compiler project, and even Graydon hasn't been here for all of it. I also realized partway through writing it that I already said a lot of this in docs/Driver.md <https://github.com/apple/swift/blob/master/docs/Driver.md>, but that's more for outside people. docs/DependenciesAnalysis.rst <https://github.com/apple/swift/blob/master/docs/DependencyAnalysis.rst> may also be relevant.)

First, some background:

WMO mode
- one frontend process
- parses, resolves imports, and type-checks all files
- generates SIL for all files in a single SILModule
- usually generates N LLVM modules, which turn into N .o files in parallel
- does not support incremental builds, even under -Onone (one reason: it isn't guaranteed which content goes into which .o)

(There's also a fully single-threaded WMO mode which outputs a single .o file. This allows slightly more LLVM optimization at the cost of, well, being single-threaded. For historical reasons this is the default WMO mode on the command line, but isn't used by Xcode or SwiftPM.)

Non-WMO mode
- a.k.a. "multi-file mode", "multi-frontend mode", "primary file mode", or "incremental mode" – there isn't really a standard name for this
- parses and resolves imports in all files, type-checks just the "primary file" (specified on command line)
- …but type-checking one file can sometimes require partially type-checking others (e.g. because you inherit from a class in another file). You pretty much always get to skip function bodies, though.
- generates SIL for the primary file in a SILModule
- generates 1 LLVM module <http://llvm.org/docs/LangRef.html#module-structure>, which results in one .o file
- very conservative incremental builds (as in, "err on the side of rebuilding")

So of course there's repeated work in the non-WMO mode. Parsing all files is something that we don't consider expensive because it's a relatively small part of the total compile time, though I can't remember if we're smart enough to delay parsing of function bodies in other files. However, it turns out even the cost of opening the files can matter. Beyond that, there are a number of other pieces of potentially repeated work:

- anything brought in through the Clang importer
- anything deserialized from a Swift module (usually the standard library)
- searching for modules on disk (both Clang modules and Swift modules)
- the "partial type-checking" I alluded to above
- synthesizing helper functions for things imported from Clang (for example, memberwise initializers for structs)

David suggested today the idea of a "multiple primary file" mode to speed up compiles (prompting this email), which isn't a bad idea. However, we definitely want to be sure we don't break incremental builds. That means a few restrictions:

- We still want to generate one object file per source file.
- This object file has to be self-contained, in case we don't rebuild it next time, or do rebuild something it depends on.
- The easiest way to guarantee one self-contained object file is to have one LLVM module.
- But we probably also want one SIL module per source file, since SILModule has a field that controls what scope we can assume perfect information for, and it wouldn't be self-contained to have that cross source file boundaries. (I don't trust us to only use this during optimization, either.)
- The Swift AST and ASTContext are not thread-safe, so we can't process multiple SILModules in different threads today.
- When we do synthesize helper functions that aren't associated with a source file, we probably need to put them in every object file.

So, the least intrusive approach here that still supports incremental builds would be to type-check multiple files, but then serially process a separate SILModule for each "primary file". (We might want to call them something more like "active files".) This is (probably) what David was getting at.

However…

…there's something more interesting.

Pipelining

Currently, the way we incrementally build a project is by passing all input files to the Swift driver, which decides which files need to be rebuilt and then spawns a bunch of "frontend jobs", one per primary file that needs to be rebuilt. (It's actually a little worse than this because we may <https://github.com/apple/swift/blob/master/docs/Driver.md#incremental-builds>discover that a file needs to be rebuilt <https://github.com/apple/swift/blob/master/docs/Driver.md#incremental-builds> only after doing some of the work.) The "least intrusive" approach I described above would partition up the files we know we need to rebuild ahead of time, but it would still end up spawning new jobs for those "late-discovered" files. It could also end up accidentally putting five long files on one job and five short files on another, which isn't a great use of resources.

An alternate approach would be what Daniel Dunbar has called pipelining: when a particular frontend job is done, the parent process should be able to send it another file to work on, starting from the type-checking phase.

There are some downsides to pipelining:
- Even less deterministic than a normal parallel build (if "less deterministic" can be considered a thing)
- Harder to test – we'd have to make the "multiple primary file" mode anyway just to ensure the same series of processing.
- Unbounded growth – we'd probably want to start from scratch at some point if memory usage exceeded some threshold.

But overall I think this fits our model better, and if we can figure out a good way to communicate between the frontend and driver this is the way to go.

(Since David recently pushed through a nice refactoring of CompilerInstance::performSema, we know there's an additional complication around targets that have a main.swift—today we type-check that file piecemeal instead of parsing it all at once. We'd probably either subset main.swift out from this optimization, or figure out if we can drop the weird special-case type-checking.)

Faster Optimized Builds?

Also important! But since we expect most optimized builds these days are WMO builds, this sort of approach is less important than it might otherwise be.

Incremental Optimized Builds?

Unlikely for WMO, although there's already a small optimization where if a particular LLVM module's contents are exactly the same as the last build, we won't bother re-optimizing at the LLVM level. For non-WMO, naively extending the current model would require that we were sure that the current dependency tracking mechanism included all information that the optimizer uses, which I'm not confident about. However, we've talked (idly) about doing incremental processing up to the "mandatory optimizations" in SIL, i.e. up until we've produced canonical SIL and emitted flow-sensitive diagnostics. We would then write this to disk (we call these "SIB" files, for "Swift Intermediate Binary file") and then do a non-incremental stage turning this into object files. We could even do cross-file WMO-like optimization this way, either by using a single process that merged in all the SIB information, or by loading all N files into multiple processes, each responsible for one or more object files.

But of course, within most optimized builds, type-checking and SILGen is at most a third of the build time, usually less. So it's unclear whether this would be a net win given all the extra overhead.

That's probably all the relevant information I have right now, but it seems like quite enough.

Jordan