Improved build system integration

TL;DR

The Swift compiler is difficult to integrate efficiently into build systems such as Bazel and Buck, as it is a multithreaded compiler that does its own job scheduling. This proposal suggests a minimal change to make efficient integration possible.

Motivation

Currently the Swift compiler has two components, the driver and the frontend. The driver is responsible for creating and running separate frontend processes to carry out the various compilation tasks. This is covered in detail in the docs, with recommendations for build system integration discussed here. There are a couple of problems with integrating as suggested:

  • For whole module optimisation mode, it is not possible to schedule concurrent module builds without over- or under-subscribing threads. This is because the Swift frontend will itself parallelise part of the build if requested to create the separate object files in a single process. If multiple compiles run concurrently and each generates object files in parallel, builds can become much slower as the processes fight each other for resources. If single-threaded mode is used instead, the build system's threads are not always effectively utilised, resulting in bottlenecks while waiting for dependencies to build.
  • One optimisation to reduce build graph bottlenecks is to generate a module's swiftmodule in a separate pass. It is then possible to start building dependent modules without waiting for object file generation to complete. Unfortunately this involves a lot of wasted work: there is no way to share the work done generating the AST for the module during the swiftmodule pass, so it all has to be repeated for the object file generation pass.
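Concretely, the two strategies described above look something like the following. This is a sketch using real driver flags, but the module and file names are made up, and the commands are wrapped in `echo` as a dry run so no Swift toolchain is needed to trace them:

```shell
# Strategy 1: single-process WMO build. With -num-threads > 1 the frontend
# itself parallelises object file generation, which can oversubscribe CPUs
# when the build system also runs several module builds concurrently; with
# a single thread, build system threads can sit idle waiting on dependencies.
wmo_build="swiftc -whole-module-optimization -num-threads 4 -module-name MyModule -c A.swift B.swift C.swift"
echo "$wmo_build"

# Strategy 2: emit the swiftmodule in its own pass so dependent modules can
# start building early; the object-file pass then repeats all AST generation.
module_pass="swiftc -whole-module-optimization -emit-module -module-name MyModule -o MyModule.swiftmodule A.swift B.swift C.swift"
object_pass="swiftc -whole-module-optimization -c -module-name MyModule -o MyModule.o A.swift B.swift C.swift"
echo "$module_pass"
echo "$object_pass"
```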

Proposal

Both of these problems can be resolved with the same solution: produce an intermediate compiler output, a serialized AST, that can be reused in subsequent compiler invocations for object file generation. This allows a build system to change the build graph, as shown in the diagram, from single-threaded WMO builds to single-threaded AST generation feeding multiple object file output processes.

This solves the wasted-work problem: the compiler only has to generate an AST once, and subsequent invocations focus on output generation and can be parallelised. This allows a large number of modules to be compiled concurrently and efficiently.

There already exists an -emit-sib option in the frontend, which seems to be ideal for this purpose:

A representation of captured compiler state after semantic analysis and SIL generation, but before LLVM IR generation ("SIB", for "Swift Intermediate Binary").

Using this as input currently seems to fail at the IR verification stage, and disabling verification results in object files missing many symbols. Extending this intermediate binary format so it can successfully be used as input for compiling swiftmodules and object files would be the optimal solution to this issue.
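If `-emit-sib` were extended to serve this role, the intended pipeline might look like the following. These are hypothetical invocations (the post above notes that consuming SIB this way currently fails IR verification, and the exact frontend spelling is an assumption), shown as an `echo` dry run:

```shell
# Step 1: one process generates the serialized AST/SIL for the whole module.
sib_pass="swiftc -frontend -emit-sib -module-name MyModule -o MyModule.sib A.swift B.swift C.swift"
echo "$sib_pass"

# Step 2: subsequent processes reuse the SIB, skipping parse/Sema/SILGen.
# The swiftmodule and the object file(s) could then be produced in parallel.
module_pass="swiftc -frontend -emit-module -module-name MyModule -o MyModule.swiftmodule MyModule.sib"
object_pass="swiftc -frontend -c -module-name MyModule -o MyModule.o MyModule.sib"
echo "$module_pass"
echo "$object_pass"
```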


Thanks for bringing this up. I promise to come respond to this in detail later, but my high-level comment is that this kind of gives up on incremental builds avoiding repeated type-checking. I suspect neither Bazel nor Buck are bothering with that anyway, and I know Swift's dependency analysis is pretty conservative right now, but if you've only changed a function body this would probably not be a good way to go. I don't think it's totally unworkable, though, and while -emit-sib has kind of bit-rotted it's still at a reasonable place in the pipeline.

(Thank you for having read the docs on this! That helps a lot for having a common ground to talk about it. DependencyAnalysis.rst may also turn out to be interesting.)

cc @compnerd

FYI, as of recently Bazel does support incremental compilation for Swift: see "Add a persistent worker to support `swiftc` incremental compilation" (bazelbuild/rules_swift PR #174 on GitHub).

CMake also has the same set of issues; the approach there is to do the single-shot compile/link with the internal parallelisation. Getting incremental builds correct was sadly more important than fine-grained control (we lose the ability to build a single object file). This also complicates much of the flags handling. So, having the ability to use the traditional compilation model would be a great help in simplifying build system integration.

I suspect neither Bazel nor Buck are bothering with that anyway

You are correct that Buck does not; it currently forces single-threaded WMO. As Keith points out, Bazel is more flexible here, but I believe the default is WMO with 12 threads (right, Keith?).

DependencyAnalysis.rst may also turn out to be interesting.

We have been considering adopting this for a future optimisation. Currently in Buck, build targets key on their input files and their dependencies' input files, so modifying any line in any dependency will trigger a rebuild. Keying instead on the compiler-generated dependency information could significantly reduce rebuilds of dependent targets, effectively a Swift version of Java ABI rule keys.

Yep https://github.com/bazelbuild/rules_swift/blob/d9887a794ae032c7971499c6a7d4c0213cfe3626/swift/internal/api.bzl#L480-L487

There's another potential axis I'd like to explore with Bazel, which is a bit of a different spin on this.

For builds happening entirely on a developer's local machine, we (Bazel) want to take advantage of Swift incremental compilation whenever possible. The ideas mentioned in the original post would help in WMO situations or clean builds, but the distinction between analysis and execution in Bazel doesn't really give us the ability to dynamically choose the best strategy to use here. However, if we started passing -wmo by default for builds corresponding to Bazel's opt mode (we don't today, for historical reasons), then release builds could still benefit from this.

However, for remote builds, if we have N machines available to us where N is large, the situation changes: I would love to be able to shard out each individual frontend invocation to a separate machine (pass all of the source files to each machine, but only treat one of them as -primary-file, and then invoke the final merge-module action that depends on those outputs). My hope (supported by nothing right now, admittedly) is that even though each machine would be separately doing import resolution and type checking, the high parallelization would still end up being a win in terms of wall-clock build time.
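Sketched as commands, that sharding scheme would register one frontend-style invocation per source file plus a final merge action. The flags below are close to what the driver emits today but should be treated as an assumption, and the file names are invented; everything is wrapped in `echo` as a dry run:

```shell
sources="A.swift B.swift C.swift"

# One invocation per file (one per machine): every job sees all sources for
# cross-file name lookup, but only code-generates its -primary-file, also
# emitting a partial swiftmodule for the merge step.
for f in $sources; do
  base="${f%.swift}"
  echo "swiftc -frontend -c -primary-file $f $sources -emit-module-path $base~partial.swiftmodule -o $base.o"
done

# Final action, depending on all of the partial-module outputs above:
merge="swiftc -frontend -merge-modules -emit-module A~partial.swiftmodule B~partial.swiftmodule C~partial.swiftmodule -o MyModule.swiftmodule"
echo "$merge"
```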

For this, the techniques mentioned in Driver.md, like passing -### and then spawning those manually, don't work for Bazel because we can't execute commands at the time that we're registering actions and their inputs/outputs. Directly invoking the frontend is unsupported/fragile, so we'd need to add a supported "single-file compilation" mode to the driver. Since that mode would essentially be "almost invoking the frontend directly, and you still have to manually merge your partial modules, but at least the driver handles a few common frontend flags for you", I'm not sure how crazy the Swift team would be about such a feature.
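For reference, the `-###` technique from Driver.md looks like the following: `-###` makes the driver print the frontend jobs it would run without executing them, which is exactly the step Bazel cannot perform at action-registration time (dry-run sketch with an invented module name):

```shell
# Print, but do not execute, the frontend invocations for this build.
dry_run="swiftc -### -c -module-name MyModule A.swift B.swift C.swift"
echo "$dry_run"
```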


Are the bazel rules public? I was wondering how bazel handles linking. I think that there are two options with pros and cons:

  • explicit linking as is traditionally done with C/C++ on Unix and Windows
  • implicit linking with autolinking

For CMake the former has some advantages: you use target_link_libraries to specify the linkage. This will wire up dependencies, set up include paths, and handle the linker search path, either by using absolute paths to the library if it is a local library, or by setting up the library search path and using the named link if it is an external library.

The latter has the benefit that it is probably closer to how many people using the command line tools would use them.

Yes, all the bazel code is open source; for this specific question, some of the logic lives in rules_swift and the rest is in bazel core.

In general bazel errs on the side of explicit, and it definitely requires you to strictly define your dependencies in order to link them. rules_swift does include the info from swift-autolink-extract as part of the linking command though https://github.com/bazelbuild/rules_swift/blob/d9887a794ae032c7971499c6a7d4c0213cfe3626/swift/internal/compiling.bzl#L426-L461

@allevato could definitely provide more context on the decisions there.

Bazel/rules_swift's support for autolinking on Linux is solely there to support the core libraries packaged with the toolchain that have been compiled with -module-link-name, so that we don't have to represent things like Foundation, Dispatch, and XCTest as explicit nodes in the dependency graph and require users to express that dependency in their BUILD files just to get the right linking behavior.

For user-written modules, linking is explicit much like Bazel treats C++; you list the things you want to import as dependencies in your BUILD files and that makes them available for import and links them into your final executable.
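The `-module-link-name` mechanism mentioned above works by recording a link directive when the library itself is built, so importers pick up the dependency without the build system listing it. The following is a sketch with invented names (`MyLib`, `Client.swift`), and the `swift-autolink-extract` invocation shape is an assumption; commands are `echo`ed as a dry run:

```shell
# Building the library with -module-link-name records "link MyLib" for
# importers, so clients autolink it without an explicit BUILD dependency.
lib_build="swiftc -emit-module -emit-library -module-name MyLib -module-link-name MyLib MyLib.swift"
echo "$lib_build"

# A client just imports the module; on ELF platforms, swift-autolink-extract
# later pulls the recorded directives out of the objects for the link step.
client_build="swiftc -c -I . Client.swift -o Client.o"
autolink="swift-autolink-extract Client.o -o Client.autolink"
echo "$client_build"
echo "$autolink"
```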

Okay, this sounds like what I ended up going with and what s-p-m seems to do as well. Awesome, sounds like we have all converged to the same behaviour.

So, the tricky thing about Swift dependencies is that (like Java) every file in a module is implicitly visible to every other file in the module, and (like Java) that means that "file A.swift has changed" is not enough information to know if you have to rebuild B.swift and C.swift. If that's all the information you have, you have to assume you need to rebuild all of them. This is true even if you have the cross-file dependency info from the previous build, because of things like overloading: if I add func foo(_ x: Int) to A.swift, and B.swift has func foo(_ x: Any), then foo(1) in C.swift changes its meaning. You have to see what changed in order to know what else needs to be rebuilt.

A lot of Swift code is unfortunately slower to compile than Java [citation needed], so this "dynamic" dependency analysis is probably still worth it in the local Debug build case. Especially if you just changed a function body, which is the main case where the compiler knows it doesn't need to be extra conservative.

What about the Bazel case? I think Tony summarized the problems pretty well:

Directly invoking the frontend is definitely unsupported, but assuming you know the full set of driver commands that need to be invoked is also fairly unsupported, and that's why -### has been the recommendation. I think @compnerd convinced me in his CMake work that we can live with a "compile just this file" mode, but part of the tradeoff there was to say "let's not depend on that by having the 'link' step also be responsible for module merging, and the most correct way to do that today is to have the link step do the building as well". That last part isn't going to work for Bazel's "farm out to N machines", though.

(FWIW, -incremental and -wmo are considered incompatible, since -wmo is allowed to use any information it can find to optimize any code in the module, and the compiler isn't tracking what that information ultimately depends on.)

P.S. I still want to talk about Richard's original post: pipelining. Sorry, Richard; I'll get to that next time.


Okay. Pipelining! Here's a diagram I've used a few times over the years to talk about what I see as the opportunities:

One thing that isn't exactly on the diagram is batch mode, which is similar to "pre-checked decls" except that you run N processes all doing this, and only checking some of the decls in the "Sema" stage. It also doesn't parallelize the separate tracks after Sema, because you're already running N processes and in the simple case N is the number of CPUs already.


The suggestion in the original post sounds like "Pre-checked Decls" at first, but it's probably closer to "Split SIL" (due to the mention of SIB). There's a few reasons why I wouldn't suggest "Pre-checked Decls" as a good model to solve the pipelining problem:

  • If the goal is to produce a swiftmodule so that you can start compiling lib2, you need to at least have the SIL of the inlinable functions as well.

  • There's currently no serialization implemented for statements and expressions in Swift, just SIL.

Having someone revive SIB, however, does seem reasonable. Note that a SIB file is not standalone; it's meant to be loaded along with all the other SIB (or source) files for a module to get access to AST declarations in other files.
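A per-file "Split SIL" flow might then look like the following hypothetical sketch (invented file names, frontend flags assumed, `echo`ed as a dry run): each process emits the SIB for its primary file while seeing the other files for cross-file lookup, and the final step loads every SIB for the module together, since SIB files are not standalone:

```shell
sources="A.swift B.swift C.swift"

# One process per file: emit only that file's SIB, with the rest of the
# module's files available for AST declaration lookup.
for f in $sources; do
  echo "swiftc -frontend -emit-sib -primary-file $f $sources -o ${f%.swift}.sib"
done

# The final step consumes all of the module's SIB files at once.
final_object="swiftc -frontend -c -module-name MyModule A.sib B.sib C.sib -o MyModule.o"
echo "$final_object"
```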


All that said, it's worth noting that a set of SIB files would contain all of the SIL for a module, which means type-checking all the function bodies. This isn't the slowest part of an optimized build, but it probably is the slowest part of a debug build. So you may still get significantly faster behavior with separate -emit-module and -c invocations if your module doesn't have inlinable code, even with the repeated work of type-checking declarations. (At least in theory. @harlanhaskins, did you manage to get in the change to not type-check non-inlinable function bodies for -emit-module?)


Unfortunately I ran into some issues where the benchmarks changed. I’d like to revisit this soon, though!

Sounds good to me.

Note that a SIB file is not standalone; it's meant to be loaded along with all the other SIB (or source) files for a module

With the Split SIL approach, the idea is that each file gets a separate process generating a SIB for that file, and then those SIBs are taken as input to generate the swiftmodule and object file? Would it still be possible to do this with WMO, using a single process with a single SIB output?