RFC: File Hashes for Swift Incremental Compilation

Summary

Swift's incremental compilation mode relies on file timestamps to detect updated inputs. This is fragile, because timestamps can change even when file contents are identical to the previous build, for instance after making a change and then reverting it, or after moving back and forth between source control checkouts. This is a particular issue for build systems that distribute work across multiple hosts. For this reason we propose adding an option to save file hash information in the dependency graph alongside timestamp information, to prevent these 'false positives' from causing unnecessary work.

The Problem

At Meta, work in developer builds can be parallelized between local and remote hosts to reduce build times. This works well under most circumstances, but it interacts poorly with Swift's incremental compilation mode, which relies on timestamp information to determine which sources and dependencies have been updated since the last build. When files are copied between machines, their timestamps are updated, and this causes unnecessary work to be scheduled. The result is that incremental compilation mode is of little practical use in these circumstances.

There are two classes of inputs where inconsistent timestamp information can schedule unwanted work. The first is the module's Swift source files themselves: each has its timestamp from the last build recorded, and is considered updated if that timestamp has changed at all. The second is module and toolchain dependencies, whose timestamps must be older than the recorded start time of the previous build.

In a distributed build, all of these files might be set to new timestamps either as a consequence of copying the files from another machine, or provisioning a new build host.
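To make the failure mode concrete, here is a small illustrative sketch in Python (not swift-driver code). It copies a file the way a distributed build might and reports whether the timestamp and the content hash changed. The helper names `sha256_of` and `simulate_copy` and the file names are hypothetical, chosen only for this example.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Hash a file's contents in chunks so large inputs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def simulate_copy():
    """Copy a source file and return (mtime_changed, hash_changed)."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "File.swift")
        dst = os.path.join(d, "File.copied.swift")
        with open(src, "w") as f:
            f.write("let x = 1\n")
        # Pretend the original was last modified long ago (fixed epoch seconds).
        os.utime(src, (1_000_000_000, 1_000_000_000))
        # copyfile copies contents only, so the copy gets a fresh timestamp.
        shutil.copyfile(src, dst)
        mtime_changed = os.stat(src).st_mtime_ns != os.stat(dst).st_mtime_ns
        hash_changed = sha256_of(src) != sha256_of(dst)
        return mtime_changed, hash_changed
```

A timestamp-only check would treat the copy as a modified input, while a content hash identifies it as unchanged.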

Proposed Solution

We propose adding an additional compiler flag to swift-driver that records file hash information alongside timestamp information. When enabled, two classes of file dependencies will have their contents hashed and the information serialized for use in the subsequent incremental compile:

  • Direct .swift file dependencies, i.e. InputInfo objects
  • External dependencies, i.e. ExternalDependency objects

These files will be hashed (either with the SHA256 implementation already available to swift-driver, or by adding an implementation of xxHash or the like), and those hashes serialized in the corresponding blob fields for those objects.

On subsequent compiles, if and only if the timestamp information has been updated in a way that would otherwise invalidate that dependency, we will rehash the contents of that file. If the hash is unchanged, we can regard that dependency as unchanged for the purposes of this compile. The new hash information will of course be recorded for subsequent builds.
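The check described above can be sketched roughly as follows. This is an illustrative Python model, not the swift-driver implementation: `InputRecord`, `file_sha256`, and `check_input` are hypothetical names, and the sketch covers only the source-file rule (any timestamp change triggers a rehash), not the "older than previous build start" rule for external dependencies.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class InputRecord:
    """Hypothetical stand-in for what the previous build serialized per input."""
    mtime_ns: int
    sha256: str


def file_sha256(path):
    """Hash a file's contents in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def check_input(path, previous):
    """Return (changed, record_to_save) for one input file."""
    mtime_ns = os.stat(path).st_mtime_ns
    if mtime_ns == previous.mtime_ns:
        # Timestamp check passes: no hashing is needed at all.
        return False, previous
    # Timestamp differs: fall back to comparing content hashes.
    digest = file_sha256(path)
    # Save the new timestamp and hash either way, for the next build.
    return digest != previous.sha256, InputRecord(mtime_ns, digest)
```

Because the hash is computed only when the timestamp check would already have invalidated the input, a build whose timestamps are intact pays no hashing cost on this path.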

Implementation

I have a working branch that implements this solution via SHA256 hashes. I'm eager to find out if something like this might be acceptable to merge into swift-driver. I'm also happy to implement a faster hash solution if deemed necessary, or take any other feedback.

I think this is a very reasonable proposal, and I like the proposed approach of initially enabling it on an opt-in basis + checking timestamps before rehashing inputs. I think starting with SHA256 hashing is ok as well - there aren't any compatibility constraints preventing us from replacing it with a better hash function in the future.

cc @ArtemC who might have some thoughts on this rfc as well

How does this relate to the new compilation caching opt-in feature of the build system, mentioned in the release notes for Xcode 26 Beta?

I'm a big fan of dumping the timestamp comparisons out of the incremental build. The passage of time is hard enough to comprehend while you're living it. For machines, doubly so.

Less tongue-in-cheek, I appreciate you've expressed this as a set of feature flags. The implementation looks very reasonable to me. This is one of those areas of the driver that we inherited from the C++ legacy driver and efforts like this to modernize it are sorely needed and much appreciated.

Thank you!

I will put in that one reason the legacy driver used timestamps rather than hashing, besides performance, was that people like it when running touch File.swift actually retriggers compilation, or at least linking. But that hasn't been true since the sub-file incremental logic was added, years ago, so it shouldn't figure into the decision anymore.

Thanks for the feedback. Since it's broadly positive, I've put up a PR:

Obviously this is working on a different level, but I suppose there are situations where Xcode wouldn't be able to retrieve from its cache (because that exact set of inputs is new to the build system) but incremental might still be of value (because only some of the inputs are new). And of course, this will work outside of Xcode builds.

Just want to echo my +1 as well. I think this is a good proposal and particularly staging it in with feature flags is the right way to go.
