Explicit Module Builds, the new Swift Driver, and SwiftPM

Hi all,

In the llbuild2 announcement, @ddunbar mentioned the goal of supporting "explicit modules" in the build system. I wanted to describe what explicit modules are, why they are important, and the development work we're doing to bring the benefits of explicit module builds to SwiftPM.

Modules in Swift

Swift programs are composed of a number of modules. Each module has a name (e.g., MyModule) and some number of source files. To build a given Swift module MyModule, we first need to have built a binary representation of each module that is imported by the source files in that module. From the compiler's perspective, there are four forms of module that can be imported:

  1. A binary .swiftmodule file: binary .swiftmodule files are created by the Swift compiler when it builds a module, and provide the interface by which other Swift modules can access the API of that module. Binary .swiftmodule files are tied to a specific compiler version.
  2. A binary (Objective-)C .pcm file: binary .pcm files are created by the Swift compiler's embedded Clang compiler when it builds an (Objective-)C module, and provide the interface by which Swift modules can access the (Objective-)C API of that module. Binary .pcm files are tied to a specific compiler version.
  3. A textual .swiftinterface file: textual .swiftinterface files are a subset of Swift source code that can be distributed along with binary libraries. They are compatible with multiple versions of the Swift compiler (i.e., the compiler that generated them and newer versions). However, they need to be compiled into a binary .swiftmodule file (case 1) to be used by the Swift compiler.
  4. (Objective-)C modules: a set of (Objective-)C headers, described by a module map that can be imported into Swift. (Objective-)C modules need to be compiled into a binary format (a .pcm file, described in case 2) to be used by the Swift compiler.

Implicit Module Builds

When the Swift compiler sees a module import such as import MyModule, it looks through the module search path to find a module with the corresponding name. The module search path can be specified by the user using -I and -F flags, and also includes some defaults (e.g., the compiler's own resource directory and some paths within the provided SDK).

When the compiler finds a binary .swiftmodule or .pcm file (cases 1 or 2, respectively) in the search path, it can load it directly. However, when it finds either a Swift or Clang textual module (cases 3 or 4, respectively), that textual representation must first be compiled (into a binary .swiftmodule or a .pcm file, respectively) before it can be loaded. The Swift compiler will implicitly spawn a thread with another compiler instance to compile each textual module into its appropriate binary module. Once complete, the binary module will be loaded into the original Swift compilation thread. Note that each thread spawned to compile a textual module may itself require additional textual modules (the ones it imports) to be compiled into binary modules.
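As a rough illustration of the lookup just described, here is a minimal Python sketch. It is not the compiler's implementation: the in-memory SearchPath model, the Found type, the find_module name, and the "<Name>/module.modulemap" layout for Clang modules are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the file system: search directory -> relative file names.
SearchPath = dict[str, set[str]]

@dataclass(frozen=True)
class Found:
    path: str
    needs_compile: bool  # True for textual modules (cases 3 and 4)

def find_module(name: str, search_path: SearchPath) -> Optional[Found]:
    """Walk the search path in order; the first directory that provides the module wins."""
    for directory, files in search_path.items():
        # Binary forms (cases 1 and 2) can be loaded directly.
        for ext in (".swiftmodule", ".pcm"):
            if name + ext in files:
                return Found(f"{directory}/{name}{ext}", needs_compile=False)
        # Textual Swift interface (case 3): must be compiled to a .swiftmodule first.
        if name + ".swiftinterface" in files:
            return Found(f"{directory}/{name}.swiftinterface", needs_compile=True)
        # Textual Clang module (case 4): assumed layout, one module map per module directory.
        if f"{name}/module.modulemap" in files:
            return Found(f"{directory}/{name}/module.modulemap", needs_compile=True)
    return None
```

In an implicit build, every result with needs_compile set triggers an on-demand compilation inside the importing compiler, and that compilation may recurse into further lookups of the same kind.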

The binary modules can generally be re-used from one compilation to the next, so they are cached in the module cache. The module cache is a shared directory on the system (within DerivedData in Xcode, or otherwise a platform-specific temporary directory), which can be overridden via the -module-cache-path command line parameter. The Swift compiler will look for an up-to-date binary module in the module cache before initiating a compile of a textual module; when it does compile the textual module into a binary module, it will be recorded in the cache for other Swift compiler instances to find. Multiple Swift compiler instances will access the module cache at the same time, so the compiler employs a lowest-common-denominator approach to manage access via LLVM's LockFileManager.
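The "look for an up-to-date binary module, otherwise compile and record it" behaviour can be sketched as follows. This is only an in-memory model in Python: the ModuleCache class and its methods are hypothetical, validation here uses a content hash rather than whatever checks the real compiler performs, and the cross-process locking mentioned above is left out entirely.

```python
import hashlib

class ModuleCache:
    """Toy model of a module cache; all names here are hypothetical."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[str, str]] = {}  # name -> (fingerprint, binary)
        self.compiles = 0  # how many textual-to-binary compilations were performed

    @staticmethod
    def fingerprint(textual_source: str) -> str:
        # Assumption: a content hash stands in for the real up-to-date check.
        return hashlib.sha256(textual_source.encode()).hexdigest()

    def load(self, name: str, textual_source: str) -> str:
        fp = self.fingerprint(textual_source)
        entry = self._entries.get(name)
        if entry is not None and entry[0] == fp:
            return entry[1]  # cache hit: the binary module is up to date
        # Cache miss or stale entry: "compile" and record the result for later loads.
        self.compiles += 1
        binary = f"<binary {name} {fp[:8]}>"
        self._entries[name] = (fp, binary)
        return binary
```

Loading the same module twice with unchanged input performs one compilation; changing the textual input invalidates the entry and forces a second one.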

Implicit module builds cause problems for both compilation performance and compiler stability:

  • Performance: in any given compile, there are likely to be many Swift compiler instances sharing the same module cache, and requiring the same binary modules. However, these compiler instances will only realize which binary modules are necessary during their compilation, which means they either need to duplicate work (each compiler instance compiles a copy of the binary module) or coordinate (most compiler instances will be stuck waiting for another to compile a certain binary module they need). At best, this is a lost opportunity for parallelism (more binary modules could have been built in parallel); at worst, it oversubscribes the machine. Additionally, every compiler instance is doing redundant work to validate each binary module in the module cache, e.g., running stat for every header file in an (Objective-)C module.
  • Correctness: the module cache has suffered from numerous problems due to cache invalidation (e.g., when the textual inputs of a module change) and races in the file system, which lead to hard-to-diagnose compiler crashes or mysterious behavior that goes away when cleaning the module cache.

Explicit Module Builds

Explicit module builds are an attempt to move the compilation of textual modules into binary modules out of the Swift compiler instance that imports the module, and up into the build system as an explicit compilation step. The build system is then responsible for scheduling the compilation, checking timestamps on inputs (for incremental builds), and ensuring that all of the binary modules needed by a Swift compilation job have already been built before that compilation job executes.
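The timestamp check that moves into the build system is the usual incremental-build rule. A minimal Python sketch, assuming modification times are available as plain numbers (the needs_rebuild name is made up for this example):

```python
from typing import Iterable, Optional

def needs_rebuild(output_mtime: Optional[float], input_mtimes: Iterable[float]) -> bool:
    """Decide whether a binary module must be (re)built from its textual inputs."""
    if output_mtime is None:
        return True  # the binary module has never been built
    # Rebuild if any textual input is newer than the existing binary module.
    return any(mtime > output_mtime for mtime in input_mtimes)
```

Because the build system runs this check once per module, individual compiler instances no longer need to revalidate the same inputs on their own.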

Explicit module builds are meant to eliminate the problems with implicit module builds, improving parallelism, reducing redundant work among Swift compiler instances, and enabling new technologies such as distributed builds. There are a number of technologies that we are working on in the Swift compilation stack to enable explicit module builds.

Fast Dependency Scanner

Swift recently gained a fast dependency scanner, which is a Swift compiler mode that scans a Swift module for import declarations and resolves which modules will be loaded. It is based on the clang-scan-deps library within Clang, for (Objective-)C modules, but is extended to also understand textual Swift modules (.swiftinterface files).

The output of the dependency scanner is a graph of all of the dependencies of that Swift module, including every module that will be imported (directly or indirectly). For each module, the graph contains a description of the compilation step required to build a binary module from a textual module. This dependency graph can be read by a build system to schedule the necessary explicit module builds before other compilation jobs.
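To show how a build system might consume such a graph, here is a small Python sketch that groups modules into "waves" that can be built in parallel. The graph shape (module name mapped to the names it imports) and the build_waves function are assumptions for illustration; the scanner's actual output format is not reproduced here.

```python
from graphlib import TopologicalSorter

def build_waves(graph: dict[str, list[str]]) -> list[list[str]]:
    """Group modules into waves; every module in a wave has all of its dependencies built."""
    sorter = TopologicalSorter(graph)  # maps each module to the modules it depends on
    sorter.prepare()                   # raises CycleError if the graph has a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything buildable right now
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave as built
    return waves

# Hypothetical module graph for a small app.
example = {
    "MyApp": ["MyLib", "Swift"],
    "MyLib": ["Swift", "CShim"],
    "CShim": [],
    "Swift": [],
}
print(build_waves(example))  # [['CShim', 'Swift'], ['MyLib'], ['MyApp']]
```

All modules within one wave are independent of each other, which is the parallelism that implicit builds tend to discover too late.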

The fast dependency scanner is still a work-in-progress. If you'd like to experiment with it on the master branch of the Swift compiler, use the -scan-dependencies command-line option to invoke the fast dependency scanner.

New Swift Driver

The new Swift Driver project is a reimplementation of the Swift compiler's "driver", which coordinates the build of a single Swift module by invoking the underlying Swift compiler to compile each .swift file and then combine the partial results into module-level output, such as a library or executable.

In other words, it's a miniature build system for a single Swift module, and even makes use of llbuild under the hood to execute the various compilation steps. The driver currently relies on implicit module builds. However, it is tightly coupled with the compiler itself, and is in the process of being extended to use the fast dependency scanner. The new Swift driver will invoke the fast dependency scanner to get a graph of all of the binary modules that need to be built, then create separate compilation steps for each binary module, using llbuild to schedule the actual build. Using the new Swift driver in this way will eliminate the use of the implicit module cache when building a single Swift module.

The new Swift driver has one other crucial architectural advantage over the driver it replaces: it is architected as a Swift library itself, designed for integration in other build systems. A build system can ask the new Swift driver to produce the set of compilation jobs that need to be executed to build a Swift module, without actually executing those compilation jobs. The build system can then add those compilation jobs to its own build graph to be executed as appropriate. This allows a single build system to coordinate all of the compilation jobs across many different Swift modules at once, rather than having the build system spawn many instances of the Swift driver, each of which is its own miniature build system separate from the others. It also means that adding support for explicit module builds to the new Swift driver makes those module-compilation jobs visible to the build system as a whole, allowing them to be appropriately scheduled (in parallel) and ensuring that a given module is only compiled once from its textual form.
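The "compile each module only once" property falls out naturally when job lists from several modules are merged into one graph. The Python sketch below models that idea only; the Job type, the merge_plans function, and the placeholder commands are hypothetical and do not reflect the swift-driver library's actual API.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Job:
    output: str               # file the job produces
    inputs: tuple[str, ...]   # files the job consumes
    command: tuple[str, ...]  # placeholder command line, not real compiler flags

def merge_plans(plans: Iterable[Iterable[Job]]) -> dict[str, Job]:
    """Merge per-module job lists into one build graph keyed by output file."""
    merged: dict[str, Job] = {}
    for plan in plans:
        for job in plan:
            existing = merged.setdefault(job.output, job)
            if existing != job:
                # Two plans disagree about how to produce the same file.
                raise ValueError(f"conflicting jobs for {job.output}")
    return merged

# Two Swift modules that both need the same explicitly built dependency.
foundation = Job("Foundation.swiftmodule", ("Foundation.swiftinterface",), ("build-module", "Foundation"))
plan_a = [foundation, Job("A.o", ("A.swift", "Foundation.swiftmodule"), ("compile", "A"))]
plan_b = [foundation, Job("B.o", ("B.swift", "Foundation.swiftmodule"), ("compile", "B"))]
merged = merge_plans([plan_a, plan_b])
```

Here the job that builds Foundation.swiftmodule appears in both plans but only once in the merged graph, so the build system schedules it a single time.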

SwiftPM

The Swift Package Manager is a build system for Swift packages, suitable for building a package and all of its dependencies across the various platforms Swift supports. We have recently implemented experimental support for using the integrated Swift driver in SwiftPM. This uses the new Swift driver library to provide the set of compilation jobs to perform to build a Swift module, integrating the results into its own build graph. Once the Swift driver starts producing jobs for explicit module builds, SwiftPM itself will integrate those into its build graph as well, eliminating implicit module builds entirely from package builds.

The experimental support for the integrated Swift driver can be enabled by using swift build --use-integrated-swift-driver. Please give it a try.

llbuild2

llbuild2 is an experimental new implementation of llbuild, designed with remote execution in mind, and with the intent to replatform SwiftPM on top (so-called swiftpm-on-llbuild2) to provide distributed builds for Swift packages. swiftpm-on-llbuild2 can leverage the same integrated Swift driver work described above for SwiftPM, while also enabling remote build execution and artifact caching to improve build performance.

An interesting intermediate step might be to replatform the new Swift driver on top of llbuild2, to make use of its artifact caching scheme for explicitly-built modules as a more robust replacement for the existing module cache. Such a scheme would benefit build systems that do not have the integrated Swift driver.

Wrap-up

Explicit module builds are an important technological change for Swift compilation, which should improve build performance and reliability, as well as being a necessary step toward distributed builds. We're working across various parts of the Swift compilation stack---the Swift compiler, new Swift driver, SwiftPM, and llbuild2---and we could use your help! In addition to the many direct development tasks, which are often called out in the README for the projects, many of these experimental tools and modes are in need of wider testing: dropping the new Swift driver into your build (e.g., in an Xcode project) or enabling --use-integrated-swift-driver in your SwiftPM builds can uncover previously-unknown bugs that can help move the project forward (and many of those will be easy to fix, too!). Much of the documentation can be improved, additional tests can be ported, and experimental ideas explored. This space is wide open and I think we can make some significant improvements in the development experience for Swift.

Doug

88 Likes

Thank you, this was a very good post on how existing builds work and how they will work, I learned a lot from it. Looking forward to a build system with better performance and error reporting.
I am wondering about those mysterious crashes: are they the source of "Command CompileSwift failed with a nonzero exit code"?

Thanks for the detailed write-up @Douglas_Gregor! Would it be worth creating something like ExplicitModuleBuilds.md in the docs directory of the main toolchain repository, so that this ongoing work is more discoverable?

8 Likes

Thanks for the awesome write up @Douglas_Gregor!

2 Likes

This is a great explanation, thanks a lot!

+1 to what Max said, I'd love to see a single document that tries to summarize everything that is going on and how the community can help by testing or contributing to the new tools.

2 Likes

Just to echo this, I think the best way to get people using these experimental features would be to enable their usage through Xcode, as it's still by far the most popular way to build Swift code. This would (hopefully) include being able to build apps as well as packages using the new settings. Otherwise I doubt we'll see much usage of these features until they ship.

4 Likes

Agreed. This is definitely a known pain point currently, that it isn't as easy as we would like to have new builds of SwiftPM fully used in Xcode.

Yes, I agree it's a pain point. swift-driver has some instructions for dropping the new Swift driver into Xcode builds, but they're a little hard to follow. Further integration of swift-driver into the toolchain will make it easier.

Doug

2 Likes

Hello. Just want to know if there is any incentive to make Swift support build-time metaprogramming, given the presence of libraries such as SwiftSyntax?

This is an awesome post. We need more of these. @Douglas_Gregor Could you point to a place where I can read more to understand how the Swift compiler decides whether to use the cache or not?

The module interface loader in the Swift frontend is responsible for deciding when to use a cached version of a module vs. rebuilding it.

Doug

3 Likes

I'm assuming you mean something like a macro system? It's been mentioned in the past, but I don't know of anyone actively working on it.

Doug

Would this allow creating several modules within one project? Something similar to namespaces without defining them explicitly as separate units of compilation?

No, this does not change the programming model at all. It’s a change to the implementation that should be invisible to users except for any benefits in build performance and robustness that it’s intended to bring (well, and bugs it might have).

2 Likes

This might not be the perfect place to ask, but still: is there anything planned for underlying module importing? As of now, it looks to me like a feature implemented with some hacks here and there that will get in the way when implementing explicit modules (or am I wrong?). Might we see a different approach to it in the future?

There are no plans to change anything about underlying module imports. The dependency scanner models Clang and Swift modules as separate entities in the module graph it emits, and the command-line flags are very different for building them, so there shouldn't be any extra issues here that we haven't accounted for.

Doug

Hi everyone,

A few months have passed since this announcement was made and I wanted to provide a quick update on the progress that @Xi_Ge, @Douglas_Gregor and I have been making on this project, spanning the Swift compiler, new Swift driver, and Swift Package Manager.

Status

SwiftPM can now self-host using Explicit Module Builds. The package manager itself is a reasonably complex Swift package, and building it exercises most of the new machinery across the involved components. This was an important milestone for getting the basics of the new compilation flow functional.

Highlights

  • The new Swift driver's Swift library architecture made it simple to integrate the basics of explicit modules into SwiftPM by providing APIs for the new build planning flow.
  • Relying exclusively on explicit module dependencies has been a powerful learning tool for understanding the fine details of how today's Implicit Module Builds work as we try to achieve functional parity between the two flows. Interesting example:
    • Dependencies on Clang modules in a build graph mean that each Clang module must potentially be built and scanned multiple times (once for each depending Swift module with a distinct target).
  • Explicit Module Builds are allowing us to diagnose a class of problems at build plan time that would otherwise be caught much later in the compilation process or even cause compiler hangs and crashes.
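The Clang-module example in the highlights can be made concrete with a few lines of Python. This is a sketch under stated assumptions: the input shape (Swift module name mapped to its target triple and Clang dependencies), the clang_module_variants function, and the module names are all invented for illustration.

```python
def clang_module_variants(
    swift_modules: dict[str, tuple[str, list[str]]],
) -> set[tuple[str, str]]:
    """Return the distinct (Clang module, target) pairs that need their own build."""
    variants: set[tuple[str, str]] = set()
    for target, clang_deps in swift_modules.values():
        for dep in clang_deps:
            # The same Clang module under a different target is a different binary.
            variants.add((dep, target))
    return variants

# Hypothetical build graph: two Swift modules share a target, a third differs.
graph = {
    "AppCore": ("x86_64-apple-macosx10.15", ["CShim"]),
    "AppUI": ("x86_64-apple-macosx10.15", ["CShim"]),
    "Widget": ("arm64-apple-ios13.0", ["CShim"]),
}
```

For this graph the single CShim module yields two variants, one per distinct target, while the two Swift modules that share a target can share one build.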

Call to action

While the new build flow is still very much experimental, there are several ways to get your hands on it and start contributing bug reports and patches. Please give it a try!

  • When using the new Swift driver as a drop-in replacement for today's driver [how-to] on the command line, Explicit Module Build of the target module is enabled with the -experimental-explicit-module-build flag.
  • SwiftPM packages can be built using Explicit Modules with the --use-integrated-swift-driver --experimental-explicit-module-build flag combination.

In expectation, the above flags should not affect the result of compilation; however, explicit module build jobs are not yet capable of interacting with the module cache, which can result in more computation and slower build times than their implicit counterparts.

15 Likes

Is there a way to tell the driver to use an explicit Clang module? That is, an option equivalent to Clang's -fmodule-file=[<name>=]<file>?

You should be able to pass in this very same Clang flag to the Swift Driver with something like:

-Xcc -Xclang -Xcc -fmodule-file=[<name>=]<file>

Then, when Swift is getting Clang to load this module, it should prefer the PCM you specify here over building one from-scratch.

For a very simple example, the above seems to work; but, it is very easy to run into various compatibility issues if your pre-built Clang PCM was built with a different set of command-line flags than those that the current compiler invocation will pass to Clang.
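If several prebuilt PCMs need to be passed this way, the flag pattern quoted above can be generated mechanically. A minimal Python sketch, assuming the same -Xcc -Xclang -Xcc -fmodule-file=<name>=<file> form is repeated once per module (the helper name and the example path are hypothetical):

```python
def clang_module_file_args(modules: dict[str, str]) -> list[str]:
    """Build the Swift driver arguments that forward -fmodule-file to Clang."""
    args: list[str] = []
    for name, pcm_path in modules.items():
        # One forwarding group per prebuilt module, in the form shown above.
        args += ["-Xcc", "-Xclang", "-Xcc", f"-fmodule-file={name}={pcm_path}"]
    return args

print(clang_module_file_args({"CShim": "/tmp/CShim.pcm"}))
# ['-Xcc', '-Xclang', '-Xcc', '-fmodule-file=CShim=/tmp/CShim.pcm']
```

This only assembles arguments; it does nothing to guard against the flag-compatibility issues mentioned above.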

Thanks a bunch! Works perfectly. 🙂