Summary
Swift's incremental compilation mode relies on file timestamps to detect updated inputs. This is fragile, because a timestamp can change even when a file's contents are identical to those of the previous build: for instance, after making a change and then reverting it, or after switching back and forth between source control checkouts. This is a particular problem for build systems that distribute work across multiple hosts. We therefore propose an option to save file hash information in the dependency graph alongside timestamp information, so that these 'false positives' no longer cause unnecessary work.
The Problem
At Meta, work in developer builds can be parallelized between local and remote hosts to reduce build times. This works well under most circumstances, but it interacts poorly with Swift's incremental compilation mode, which relies on timestamp information to determine which sources and dependencies have been updated since the last build. When files are copied between machines, their timestamps are updated, and this causes unnecessary work to be scheduled. As a result, incremental compilation mode is of little practical use in these circumstances.
There are two classes of inputs where inconsistent timestamp information can schedule unwanted work. The first is the module's own Swift source files: each file's timestamp from the last build is recorded, and the file is considered updated if that timestamp has changed at all. The second is module and toolchain dependencies, which are considered unchanged only if their timestamps are older than the recorded start time of the previous build.
In a distributed build, all of these files might receive new timestamps, either as a consequence of copying the files from another machine or of provisioning a new build host.
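As a rough illustration of the two timestamp rules described above, here is a small Python sketch. It is not swift-driver code; the function names `source_changed` and `dependency_changed` are made up for this example, and the comparisons are only an assumed reading of the rules as stated in this post.

```python
import os
import tempfile


def source_changed(recorded_mtime_ns: int, current_mtime_ns: int) -> bool:
    """A source file counts as updated if its timestamp differs at all."""
    return current_mtime_ns != recorded_mtime_ns


def dependency_changed(dep_mtime_ns: int, prev_build_start_ns: int) -> bool:
    """A dependency counts as updated unless it is older than the previous build start."""
    return dep_mtime_ns >= prev_build_start_ns


# Demo: the bytes stay the same, only the timestamp moves,
# as it would after copying the file to another host.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "main.swift")
    with open(path, "w") as f:
        f.write('print("hello")\n')
    recorded = os.stat(path).st_mtime_ns

    # Push the access and modification times 10 seconds forward.
    bumped = recorded + 10 * 10**9
    os.utime(path, ns=(bumped, bumped))

    # Identical contents, yet the timestamp-only rule flags the file as changed.
    print(source_changed(recorded, os.stat(path).st_mtime_ns))
```

Under the timestamp-only rule the file is reported as changed even though no byte of it differs, which is exactly the false positive this pitch is about.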
Proposed Solution
We propose adding a new compiler flag to swift-driver that records file hash information alongside timestamp information. When the flag is enabled, two classes of file dependencies will have their contents hashed and the hashes serialized for use in the subsequent incremental compile:
- Direct .swift file dependencies, i.e. InputInfo objects
- External dependencies, i.e. ExternalDependency objects
These files will be hashed (either with the SHA256 implementation already available to swift-driver, or by adding an implementation of xxHash or the like), and those hashes serialized in the corresponding blob fields for those objects.
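To make the hashing step concrete, here is a minimal Python sketch using SHA256 from the standard library. It is not the swift-driver implementation: the helper names `hash_file` and `make_record` and the record layout (a dictionary with `mtime_ns` and `sha256` keys) are assumptions for illustration, and say nothing about how the real blob fields are laid out.

```python
import hashlib
import os


def hash_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large inputs are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_record(path: str) -> dict:
    """Hypothetical per-file record: the timestamp plus the content hash."""
    return {
        "mtime_ns": os.stat(path).st_mtime_ns,
        "sha256": hash_file(path),
    }
```

A faster non-cryptographic hash such as xxHash could be swapped in for `hashlib.sha256` here, since the hash only needs to detect changes rather than resist tampering.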
On subsequent compiles, we will rehash a file's contents if and only if its timestamp has changed in a way that would otherwise invalidate that dependency. If the hash is unchanged, we can treat the dependency as unchanged for the purposes of this compile. Whenever a file is rehashed, the freshly computed hash is recorded for subsequent builds.
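The check for a source file could look roughly like the following Python sketch. It is a sketch under assumptions, not the code in the working branch: `Record`, `hash_file` and `check_source` are hypothetical names, and a missing previous hash is treated as "changed" purely as a conservative choice for this example. External dependencies would presumably follow the same pattern, with the timestamp compared against the previous build start time instead.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    mtime_ns: int
    content_hash: Optional[str]  # None if the last build stored no hash


def hash_file(path: str) -> str:
    """Return the SHA256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_source(path: str, previous: Record) -> Tuple[bool, Record]:
    """Return (needs_rebuild, record to store for the next build)."""
    mtime_ns = os.stat(path).st_mtime_ns
    if mtime_ns == previous.mtime_ns:
        # Timestamp check passes: no hashing needed, carry the old hash forward.
        return False, Record(mtime_ns, previous.content_hash)

    # The timestamp alone would invalidate this input, so compare contents.
    new_hash = hash_file(path)
    changed = previous.content_hash is None or new_hash != previous.content_hash
    return changed, Record(mtime_ns, new_hash)
```

Because the hash is only computed when the timestamp check would already have failed, builds whose timestamps are stable pay no hashing cost.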
Implementation
I have a working branch that implements this solution via SHA256 hashes. I'm eager to find out if something like this might be acceptable to merge into swift-driver. I'm also happy to implement a faster hash solution if deemed necessary, or take any other feedback.