Hi all,
In the llbuild2 announcement, @ddunbar mentioned the goal of supporting "explicit modules" in the build system. I wanted to describe what explicit modules are, why they are important, and the developer work we're doing to bring the benefits of explicit module builds to SwiftPM.
Modules in Swift
Swift programs are composed of a number of modules. Each module has a name (e.g., MyModule) and some number of source files. To build a given Swift module MyModule, we first need to have built a binary representation of each module that is imported by the source files in that module. From the compiler's perspective, there are four forms of module that can be imported:
- A binary .swiftmodule file: binary .swiftmodule files are created by the Swift compiler when it builds a module, and provide the interface by which other Swift modules can access the API of that module. Binary .swiftmodule files are tied to a specific compiler version.
- A binary (Objective-)C .pcm file: binary .pcm files are created by the Swift compiler's embedded Clang compiler when it builds an (Objective-)C module, and provide the interface by which Swift modules can access the (Objective-)C API of that module. Binary .pcm files are tied to a specific compiler version.
- A textual .swiftinterface file: textual .swiftinterface files are a superset of Swift source code that can be distributed along with binary libraries. They are compatible with multiple versions of the Swift compiler (i.e., the compiler that generated them and newer versions). However, they need to be compiled into a binary .swiftmodule file (case 1) to be used by the Swift compiler.
- (Objective-)C modules: a set of (Objective-)C headers, described by a module map, that can be imported into Swift. (Objective-)C modules need to be compiled into a binary format (a .pcm file, described in case 2) to be used by the Swift compiler.
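To make the four cases concrete, here is a rough sketch of how they relate. The ModuleForm type and its member names are made up for illustration; they are not part of the compiler or any Swift library.

```swift
// Hypothetical model of the four importable module forms described above.
// None of these names come from the Swift compiler; they are illustrative only.
enum ModuleForm: CaseIterable {
    case binarySwiftModule      // case 1: .swiftmodule
    case binaryClangModule      // case 2: .pcm
    case textualSwiftInterface  // case 3: .swiftinterface
    case clangModuleMap         // case 4: (Objective-)C headers + module map

    /// The binary form the compiler ultimately loads for this kind of module.
    var loadableForm: ModuleForm {
        switch self {
        case .binarySwiftModule, .textualSwiftInterface:
            return .binarySwiftModule
        case .binaryClangModule, .clangModuleMap:
            return .binaryClangModule
        }
    }

    /// Textual forms must be compiled to a binary form before they can be loaded.
    var needsCompilation: Bool { self != loadableForm }
}

for form in ModuleForm.allCases {
    print(form, "needs compilation:", form.needsCompilation)
}
```

The point is just that cases 3 and 4 always funnel into cases 1 and 2 before the compiler can use them.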
Implicit Module Builds
When the Swift compiler sees a module import such as import MyModule, it looks through the module search path to find a module with the corresponding name. The module search path can be specified by the user using -I and -F flags, as well as having some defaults (e.g., the compiler's own resource directory and some paths within the provided SDK).
When the compiler finds a binary .swiftmodule or .pcm file (cases 1 or 2, respectively) in the search path, it can load it directly. However, when it finds either a Swift or Clang textual module (cases 3 or 4, respectively), that textual representation must first be compiled (into a binary .swiftmodule or a .pcm file, respectively) before it can be loaded. The Swift compiler will implicitly spawn a thread with another compiler instance to compile each textual module into its appropriate binary module. Once complete, the binary module will be loaded into the original Swift compilation thread. Note that each thread spawned to compile a textual module may itself require additional textual modules (the ones it imports) to be compiled into binary modules.
The binary modules can generally be re-used from one compilation to the next, so they are cached in the module cache. The module cache is a shared directory on the system (within DerivedData in Xcode, or otherwise a platform-specific temporary directory), which can be overridden via the -module-cache-path command line parameter. The Swift compiler will look for an up-to-date binary module in the module cache before initiating a compile of a textual module; when it does compile the textual module into a binary module, it will be recorded in the cache for other Swift compiler instances to find. Multiple Swift compiler instances will access the module cache at the same time, so the compiler employs a lowest-common-denominator approach to manage access via LLVM's LockFileManager.
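The lookup-then-compile behavior can be sketched roughly as follows. This is a toy, in-memory, single-threaded model: the real module cache lives on disk and is guarded by file locks, and the names here (ModuleCache, CachedModule, inputStamp) are hypothetical stand-ins rather than anything from the compiler.

```swift
// Toy model of "check the cache, otherwise compile and record the result".
// inputStamp stands in for whatever the compiler uses to decide that a cached
// binary module is still up to date with respect to its textual inputs.
struct CachedModule {
    let name: String
    let inputStamp: Int
}

final class ModuleCache {
    private var entries: [String: CachedModule] = [:]
    private(set) var compileCount = 0

    /// Returns an up-to-date binary module, "compiling" it only on a miss
    /// or when the cached entry was built from different inputs.
    func load(_ name: String, currentStamp: Int) -> CachedModule {
        if let hit = entries[name], hit.inputStamp == currentStamp {
            return hit
        }
        compileCount += 1  // a real compiler would build the binary module here
        let built = CachedModule(name: name, inputStamp: currentStamp)
        entries[name] = built
        return built
    }
}

let cache = ModuleCache()
_ = cache.load("Foundation", currentStamp: 1)  // miss: compiles
_ = cache.load("Foundation", currentStamp: 1)  // hit: reuses the entry
_ = cache.load("Foundation", currentStamp: 2)  // inputs changed: recompiles
print("compiles performed:", cache.compileCount)
```

Even in this simplified form, every caller has to repeat the validation check on each load, and correctness depends on that check being right.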
Implicit module builds cause problems for both compilation performance and correctness:
- Performance: in any given compile, there are likely to be many Swift compiler instances sharing the same module cache, and requiring the same binary modules. However, these compiler instances will only realize which binary modules are necessary during their compilation, which means they either need to duplicate work (each compiler instance compiles a copy of the binary module) or coordinate (most compiler instances will be stuck waiting for another to compile a certain binary module they need). At best, this is a lost opportunity for parallelism (more binary modules could have been built in parallel); at worst, it oversubscribes the machine. Additionally, every compiler instance is doing redundant work to validate each binary module in the module cache, e.g., running stat for every header file in an (Objective-)C module.
- Correctness: the module cache has suffered from numerous problems due to cache invalidation (e.g., when the textual inputs of a module change) and races in the file system, which lead to hard-to-diagnose compiler crashes or mysterious behavior that goes away when cleaning the module cache.
Explicit Module Builds
Explicit module builds are an attempt to move the compilation of textual modules into binary modules out of the Swift compiler instance that imports the module, and up into the build system as an explicit compilation step. The build system is then responsible for scheduling the compilation, checking timestamps on inputs (for incremental builds), and ensuring that all of the binary modules needed by a Swift compilation job have already been built before that compilation job executes.
Explicit module builds are meant to eliminate the problems with implicit module builds, improving parallelism, reducing redundant work among Swift compiler instances, and enabling new technologies such as distributed builds. There are a number of technologies that we are working on in the Swift compilation stack to enable explicit module builds.
Fast Dependency Scanner
Swift recently gained a fast dependency scanner, which is a Swift compiler mode that scans a Swift module for import declarations and resolves which modules will be loaded. It is based on the clang-scan-deps library within Clang, for (Objective-)C modules, but is extended to also understand textual Swift modules (.swiftinterface files).
The output of the dependency scanner is a graph of all of the dependencies of that Swift module, including every module that will be imported (directly or indirectly). For each module, the graph contains a description of the compilation step required to build a binary module from a textual module. This dependency graph can be read by a build system to schedule the necessary explicit module builds before other compilation jobs.
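As a rough illustration of what a build system might do with such a graph, here is a minimal sketch that orders module builds so each module comes after everything it imports. The dictionary shape and the buildOrder function are assumptions made for this example; the scanner's actual output format is not shown here, and the sketch assumes the graph has no cycles.

```swift
// Hypothetical dependency graph: module name -> names of modules it imports.
// Returns a build order in which every module appears after its dependencies
// (a depth-first post-order traversal; assumes the graph is acyclic).
func buildOrder(for root: String, dependencies: [String: [String]]) -> [String] {
    var visited = Set<String>()
    var order: [String] = []

    func visit(_ module: String) {
        // Skip modules already scheduled, so each is built only once.
        guard visited.insert(module).inserted else { return }
        for dependency in dependencies[module, default: []] {
            visit(dependency)
        }
        order.append(module)
    }

    visit(root)
    return order
}

let graph = [
    "App": ["Utils", "Foundation"],
    "Utils": ["Foundation"],
    "Foundation": [],
]
print(buildOrder(for: "App", dependencies: graph))
// prints ["Foundation", "Utils", "App"]
```

A real build system would go further and run independent modules in parallel, but the key property is the same: the full set of module builds is known up front instead of being discovered mid-compile.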
The fast dependency scanner is still a work-in-progress. If you'd like to experiment with it on the master branch of the Swift compiler, use the -scan-dependencies command-line option to invoke the fast dependency scanner.
New Swift Driver
The new Swift Driver project is a reimplementation of the Swift compiler's "driver", which coordinates the build of a single Swift module by invoking the underlying Swift compiler to compile each .swift file and then combine the partial results into module-level output, such as a library or executable.
In other words, it's a miniature build system for a single Swift module, and even makes use of llbuild under the hood to execute the various compilation steps. The driver currently relies on implicit module builds. However, it is tightly coupled with the compiler itself, and is in the process of being extended to use the fast dependency scanner. The new Swift driver will invoke the fast dependency scanner to get a graph of all of the binary modules that need to be built, then create separate compilation steps for each binary module, using llbuild to schedule the actual build. Using the new Swift driver in this way will eliminate the use of the implicit module cache when building a single Swift module.
The new Swift driver has one other crucial architectural advantage over the driver it replaces: it is architected as a Swift library itself, designed for integration in other build systems. A build system can ask the new Swift driver to produce the set of compilation jobs that need to be executed to build a Swift module, without actually executing those compilation jobs. The build system can then add those compilation jobs to its own build graph to be executed as appropriate. This allows a single build system to coordinate all of the compilation jobs across many different Swift modules at once, rather than having the build system spawn many instances of the Swift driver, each of which is its own miniature build system separate from the others. It also means that adding support for explicit module builds to the new Swift driver makes those module-compilation jobs visible to the build system as a whole, allowing them to be appropriately scheduled (in parallel) and ensuring that a given module is only compiled once from its textual form.
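Here is a small sketch of the "only compiled once" idea: if each Swift module's planned jobs are handed to one build system, identical module-compilation jobs can be collapsed. PlannedJob and mergedJobs are hypothetical names for this example and do not reflect the Swift driver library's actual types or API.

```swift
// Hypothetical description of a planned compilation job. Two jobs are treated
// as the same work if they have the same kind and produce the same output.
struct PlannedJob: Hashable {
    let kind: String    // e.g. "compile-module" or "compile-source"
    let output: String  // path of the artifact the job produces
}

/// Merges the job lists planned for several Swift modules into one list,
/// keeping the first occurrence of each job and dropping later duplicates.
func mergedJobs(_ perModule: [[PlannedJob]]) -> [PlannedJob] {
    var seen = Set<PlannedJob>()
    return perModule.flatMap { $0 }.filter { seen.insert($0).inserted }
}

let foundation = PlannedJob(kind: "compile-module", output: "Foundation.pcm")
let jobsForA = [foundation, PlannedJob(kind: "compile-source", output: "A.o")]
let jobsForB = [foundation, PlannedJob(kind: "compile-source", output: "B.o")]

for job in mergedJobs([jobsForA, jobsForB]) {
    print(job.kind, job.output)
}
```

With separate driver instances, each one would have planned the Foundation job independently; with a shared graph, it shows up exactly once.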
SwiftPM
The Swift Package Manager is a build system for Swift packages, suitable for building a package and all of its dependencies across the various platforms Swift supports. We have recently implemented experimental support for using the integrated Swift driver in SwiftPM. This uses the new Swift driver library to provide the set of compilation jobs to perform to build a Swift module, integrating the results into its own build graph. Once the Swift driver starts producing jobs for explicit module builds, SwiftPM itself will integrate those into its build graph as well, eliminating implicit module builds entirely from package builds.
The experimental support for the integrated Swift driver can be enabled by using swift build --use-integrated-swift-driver. Please give it a try.
llbuild2
llbuild2 is an experimental new implementation of llbuild, designed with remote execution in mind, and with the intent to replatform SwiftPM on top (so-called swiftpm-on-llbuild2) to provide distributed builds for Swift packages. swiftpm-on-llbuild2 can leverage the same integrated Swift driver work described above for SwiftPM, while also enabling remote build execution and artifact caching to improve build performance.
An interesting intermediate step might be to replatform the new Swift driver on top of llbuild2, to make use of its artifact caching scheme for explicitly-built modules as a more robust replacement for the existing module cache. Such a scheme would benefit build systems that do not have the integrated Swift driver.
Wrap-up
Explicit module builds are an important technological change for Swift compilation, which should improve build performance and reliability, as well as being a necessary step toward distributed builds. We're working across various parts of the Swift compilation stack (the Swift compiler, new Swift driver, SwiftPM, and llbuild2) and we could use your help! In addition to the many direct development tasks, which are often called out in the README for the projects, many of these experimental tools and modes are in need of wider testing: dropping the new Swift driver into your build (e.g., in an Xcode project) or enabling --use-integrated-swift-driver in your SwiftPM builds can uncover previously-unknown bugs that can help move the project forward (and many of those will be easy to fix, too!). Much of the documentation can be improved, additional tests can be ported, and experimental ideas explored. This space is wide open and I think we can make some significant improvements in the development experience for Swift.
Doug