Compile-cache key instability from non-deterministic macro plugin binaries

TL;DR

When Swift compilation caching is enabled and a target loads macro plugins, the per-source compile cache key incorporates the plugin executable’s full file content (via the dependency scanner’s CAS file tree). Because ld64 generates a fresh LC_UUID and codesign writes a non-deterministic ad-hoc signature on every link, two runs of an identical build produce different cache keys for every Swift compile that loads a plugin, even when the produced object files are byte-identical. PCMs hit. Swift compiles miss. Reuse across executors and machines does not happen.

I have a candidate fix, but I would like direction from the caching folks before writing a patch against the wrong shape.

Evidence

Two consecutive builds of the same Swift project on the same machine, same commit, same working directory, with caching enabled. I exported the cas_analytics.db from each build and diffed the three tables:

Table                             Build 1   Build 2   Common   Diverge (B1 / B2)
------------------------------    -------   -------   ------   -----------------
cas_outputs (content)             10,090    10,090    10,080   10 / 10
nodes                             10,089    10,090    10,079   10 / 11
keyvalue_metadata (cache keys)    8,317     8,317     4,571    3,746 / 3,747

Reading this:

  • The CAS object store is 99.9% reproducible. Only 10 of ~10,090 outputs differ.

  • The cache keys are not. Around 3,747 keys per build are unique to that build. That number equals the count of Swift compile tasks. The 4,571 common keys equal the count of Clang PCM and other system tasks.

In other words: swift-frontend ran, computed a fresh cache key, produced byte-identical output, and stored it under a different key than the prior build. Whatever salts the key is not reaching the output.
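The Common and Diverge columns are a plain set comparison of the keys each build recorded. Below is a minimal C++ sketch of that comparison. It assumes the keys have already been pulled out of each cas_analytics.db as strings. The database schema is not described here, so loading is left out, and the key strings in the example are made up. A key that is stable across builds lands in `common`. A salted Swift compile key shows up once in each `only` bucket, which is why the two Diverge counts track each other.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// How two builds' cache-key sets relate, mirroring the table's
// "Common" and "Diverge (B1 / B2)" columns.
struct KeySetDiff {
  std::size_t common = 0;     // keys present in both builds
  std::size_t onlyFirst = 0;  // keys unique to build 1
  std::size_t onlySecond = 0; // keys unique to build 2
};

KeySetDiff diffKeySets(const std::set<std::string> &build1,
                       const std::set<std::string> &build2) {
  KeySetDiff d;
  for (const auto &key : build1) {
    if (build2.count(key))
      ++d.common;
    else
      ++d.onlyFirst;
  }
  // Everything in build 2 that was not matched above is unique to build 2.
  d.onlySecond = build2.size() - d.common;
  return d;
}
```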

Inspecting the 10 diverging CAS outputs by size (the seven same-size pairs):

size        b1 digest   b2 digest
--------    ---------   ---------
3,284       FB6...      AB3...     same size, different content
5,320       29B...      BC9...     same size, different content
50,904      403...      5C4...     same size, different content
53,696      30F...      4B7...     same size, different content
57,632      429...      EBA...     same size, different content
13,906,816  B2C...      C86...     same size, different content (13.9 MB)
20,461,140  AAF...      709...     same size, different content (20.5 MB)

Seven pairs have exactly the same size but different content. Three more differ by tens of bytes (signature blob length jitter). This pattern is the classic fingerprint of Mach-O binaries with deterministic structure but embedded randomness: LC_UUID (16 bytes ld64 generates per link unless -reproducible is passed) and the ad-hoc LC_CODE_SIGNATURE blob. The 13.9 MB and 20.5 MB sizes are consistent with swift-syntax macro plugin executables; the smaller pairs are the smaller plugins or auxiliary tools the project uses.
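You can check the LC_UUID part of this hypothesis directly. Pull the UUID out of the same plugin from each build and compare the two values. The sketch below walks the load commands of a thin, little-endian 64-bit Mach-O (the arm64 and x86_64 case) and returns the LC_UUID payload. It restates the MH_MAGIC_64 and LC_UUID values from <mach-o/loader.h> so it builds off macOS. It does not handle universal (fat) binaries. On macOS, `dwarfdump --uuid` reports the same value without any code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Values from <mach-o/loader.h>, restated so the sketch builds off macOS.
constexpr uint32_t kMHMagic64 = 0xfeedfacf; // MH_MAGIC_64
constexpr uint32_t kLCUUID = 0x1b;          // LC_UUID
constexpr std::size_t kHeaderSize64 = 32;   // sizeof(mach_header_64)

// arm64 and x86_64 Mach-O files are little-endian.
static uint32_t readLE32(const std::vector<uint8_t> &b, std::size_t off) {
  return uint32_t(b[off]) | uint32_t(b[off + 1]) << 8 |
         uint32_t(b[off + 2]) << 16 | uint32_t(b[off + 3]) << 24;
}

// Returns the 16-byte LC_UUID payload of a thin 64-bit Mach-O image, or
// std::nullopt if the buffer is not one or has no LC_UUID command.
std::optional<std::array<uint8_t, 16>>
findUUID(const std::vector<uint8_t> &image) {
  if (image.size() < kHeaderSize64 || readLE32(image, 0) != kMHMagic64)
    return std::nullopt;
  const uint32_t ncmds = readLE32(image, 16); // mach_header_64::ncmds
  std::size_t off = kHeaderSize64;            // load commands follow header
  for (uint32_t i = 0; i < ncmds; ++i) {
    if (off + 8 > image.size())
      return std::nullopt;
    const uint32_t cmd = readLE32(image, off);
    const uint32_t cmdsize = readLE32(image, off + 4);
    if (cmdsize < 8 || off + cmdsize > image.size())
      return std::nullopt; // malformed or truncated command
    if (cmd == kLCUUID && cmdsize >= 24) {
      std::array<uint8_t, 16> uuid{};
      for (std::size_t j = 0; j < 16; ++j)
        uuid[j] = image[off + 8 + j];
      return uuid;
    }
    off += cmdsize;
  }
  return std::nullopt;
}
```

If the two builds' plugins report different UUIDs, LC_UUID is contributing. If the UUIDs match, the divergence is elsewhere, for example in the signature blob.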

How the binary content enters the cache key

Tracing the code path:

  1. The compile-job cache key is computed in lib/Frontend/CompileJobCacheKey.cpp:38-91. It hashes the rendered command line plus the Swift version. Plugin paths appear here as -load-resolved-plugin <lib>#<exec>#<modules> string args. The binary contents do not enter through this function.

  2. Binary content enters via the CAS include tree. The dependency scanner builds a CAS tree of every tracked file; its root ID is referenced from the command line as -clang-include-tree-root <id>, transitively content-hashing every tracked file.

  3. The plugin gets tracked in lib/DependencyScan/ScanDependencies.cpp:464-467:

llvm::for_each(dependencyInfoCopy.getMacroDependencies(),
               [this](const auto &entry) {
                 tracker->trackFile(entry.second.LibraryPath);
               });

That is the single seam where LC_UUID randomness propagates into thousands of compile cache keys. The size-comparison verification at lib/AST/PluginLoader.cpp:140-142 confirms LibraryPath is what the scanner treats as the plugin’s identity.
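The propagation described above can be modeled in a few lines. The compile key covers the command line, the command line names a tree root, and the root covers the content of every tracked file. The sketch below is a toy model of that chain, not Swift's implementation. FNV-1a stands in for the real CAS hash, and the tree encoding is invented for illustration. It shows that two compiles with an identical command line and identical sources still get different keys when only the tracked plugin bytes differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// 64-bit FNV-1a; a short stand-in for the real CAS hash function.
uint64_t fnv1a(const std::string &data, uint64_t h = 0xcbf29ce484222325ULL) {
  for (unsigned char c : data) {
    h ^= c;
    h *= 0x100000001b3ULL;
  }
  return h;
}

using TrackedFiles = std::map<std::string, std::string>; // path -> content

// Toy tree root: folds each tracked path and a digest of its content.
// std::map iterates in sorted path order, so discovery order does not matter.
uint64_t treeRoot(const TrackedFiles &tracked) {
  uint64_t h = fnv1a("tree-v1");
  for (const auto &[path, content] : tracked) {
    h = fnv1a(path + '\n', h);
    h = fnv1a(std::to_string(fnv1a(content)) + '\n', h);
  }
  return h;
}

// Toy compile key: the rendered command line, which references the root ID
// (the role -clang-include-tree-root <id> plays in the real command line).
uint64_t compileKey(const std::string &cmdline, const TrackedFiles &tracked) {
  return fnv1a(cmdline + " -root " + std::to_string(treeRoot(tracked)));
}
```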

Why prefix-mapping does not help here

The reproducer is on a single machine with stable absolute paths. SWIFT_ENABLE_PREFIX_MAPPING is on, which is why all the PCMs (system inputs) reproduce. SWIFT_ENABLE_PROJECT_PREFIX_MAPPING is also irrelevant here because the paths to the plugin binary do not vary build-to-build. The variation is purely in the bytes ld64 and codesign embed.

Why “make the producer deterministic” is not a satisfying fix

The obvious knob is to make the plugin link reproducible. LD_DETERMINISTIC_MODE = YES is already the swift-build default and emits -Xlinker -reproducible, which fixes LC_UUID. The signing nonce is harder: disabling signing on host build tools (CODE_SIGNING_ALLOWED = NO) breaks execution on Apple Silicon, since the kernel refuses unsigned Mach-O binaries. Producing deterministic ad-hoc signatures from the spec layer alone is awkward.

So the fix has to move to the layer that consumes the plugin. Three candidates:

  1. Canonicalize the Mach-O before hashing. Mask LC_UUID and LC_CODE_SIGNATURE, hash the rest. This is the narrowest semantic fix, but it is fragile: even after masking, LC_BUILD_VERSION SDK timestamps, N_OSO DWARF paths to intermediate .o files, __LINKEDIT interior ordering, and function-starts table ordering can all vary. Whack-a-mole, with new fields appearing in each linker and compiler release.

  2. Hash plugin identity, not plugin content. A plugin’s behavior is determined by its source files, compiler version, compile flags, and import graph, all of which the build system already knows (it just orchestrated the plugin’s compile). When caching is on, the plugin’s own swift-emit-module cache key is, by construction, a stable digest of exactly that. Propagate that ID as the plugin’s contribution to consumer compile keys instead of hashing the produced binary.

  3. Status quo plus better docs. Tell users to enable LD_DETERMINISTIC_MODE and accept that ad-hoc signing nonce variation degrades hit rate. I do not think this is acceptable for cross-machine caching, but listing it for completeness.
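Option 1 in its simplest form could look like the sketch below. It zeroes the LC_UUID payload and the LC_CODE_SIGNATURE datasize field, then drops the signature blob before hashing. It assumes a thin little-endian 64-bit Mach-O and restates the load-command constants from <mach-o/loader.h>. It deliberately stops there, which shows the fragility argument in practice. The code signature lives inside __LINKEDIT, so when the blob changes length, the __LINKEDIT segment's filesize changes with it. This sketch does not mask that field or any of the others listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Values from <mach-o/loader.h>, restated so the sketch builds off macOS.
constexpr uint32_t kMHMagic64 = 0xfeedfacf; // MH_MAGIC_64
constexpr uint32_t kLCUUID = 0x1b;          // LC_UUID
constexpr uint32_t kLCCodeSignature = 0x1d; // LC_CODE_SIGNATURE
constexpr std::size_t kHeaderSize64 = 32;   // sizeof(mach_header_64)

static uint32_t readLE32(const std::vector<uint8_t> &b, std::size_t off) {
  return uint32_t(b[off]) | uint32_t(b[off + 1]) << 8 |
         uint32_t(b[off + 2]) << 16 | uint32_t(b[off + 3]) << 24;
}

static void writeLE32(std::vector<uint8_t> &b, std::size_t off, uint32_t v) {
  for (std::size_t i = 0; i < 4; ++i)
    b[off + i] = uint8_t(v >> (8 * i));
}

// Returns a copy of a thin 64-bit Mach-O with the LC_UUID payload zeroed, the
// LC_CODE_SIGNATURE datasize zeroed, and the signature blob cut off. Input
// that does not parse as such an image is returned unchanged.
std::vector<uint8_t> canonicalizeForHash(std::vector<uint8_t> img) {
  if (img.size() < kHeaderSize64 || readLE32(img, 0) != kMHMagic64)
    return img;
  const uint32_t ncmds = readLE32(img, 16);
  std::size_t off = kHeaderSize64;
  std::size_t keep = img.size();
  for (uint32_t i = 0; i < ncmds; ++i) {
    if (off + 8 > img.size())
      break;
    const uint32_t cmd = readLE32(img, off);
    const uint32_t cmdsize = readLE32(img, off + 4);
    if (cmdsize < 8 || off + cmdsize > img.size())
      break;
    if (cmd == kLCUUID && cmdsize >= 24) {
      for (std::size_t j = 0; j < 16; ++j)
        img[off + 8 + j] = 0; // per-link UUID
    } else if (cmd == kLCCodeSignature && cmdsize >= 16) {
      const uint32_t dataoff = readLE32(img, off + 8);
      if (dataoff >= kHeaderSize64 && dataoff <= img.size())
        keep = dataoff;            // signature blob starts here
      writeLE32(img, off + 12, 0); // datasize jitters with blob length
    }
    off += cmdsize;
  }
  // Not handled: __LINKEDIT's segment filesize, which covers the signature.
  img.resize(keep);
  return img;
}
```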

I believe (2) is the right shape. It is strictly stronger than (1): two plugins with identical sources and compile inputs hash the same even if the binary changes for unrelated reasons (linker upgrade, signing tooling change). It is also producer-honest: the build system is the authority on plugin identity, not the linker’s metadata.
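Here is a minimal sketch of "identity, not content" in code. The digest is folded from the inputs that determine the plugin's behavior. The linked binary is not an input, so LC_UUID or signature changes cannot reach it. The `PluginInputs` struct and its fields are hypothetical, chosen to match the list in option 2. In the actual proposal, the plugin's own swift-emit-module cache key would play this role. FNV-1a is only a stand-in hash.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// 64-bit FNV-1a, used here only as a short stand-in for a real digest.
uint64_t fnv1a(const std::string &data, uint64_t h = 0xcbf29ce484222325ULL) {
  for (unsigned char c : data) {
    h ^= c;
    h *= 0x100000001b3ULL;
  }
  return h;
}

// Hypothetical description of what determines a macro plugin's behavior.
struct PluginInputs {
  std::string compilerVersion;
  std::vector<std::string> flags;             // order is significant
  std::map<std::string, std::string> sources; // path -> file content
  std::set<std::string> dependencyIdentities; // identities of imported modules
};

// Mixes a tagged, NUL-terminated field into the running digest so that, for
// example, a flag value cannot be mistaken for a source path.
static uint64_t mix(uint64_t h, const std::string &tag,
                    const std::string &value) {
  const std::string sep(1, '\0');
  return fnv1a(tag + sep + value + sep, h);
}

uint64_t pluginIdentity(const PluginInputs &in) {
  uint64_t h = fnv1a("plugin-identity-v1");
  h = mix(h, "compiler", in.compilerVersion);
  for (const auto &flag : in.flags)
    h = mix(h, "flag", flag);
  // std::map and std::set iterate in sorted order, so the result does not
  // depend on the order in which sources or dependencies were discovered.
  for (const auto &[path, content] : in.sources) {
    h = mix(h, "source", path);
    h = mix(h, "content", std::to_string(fnv1a(content)));
  }
  for (const auto &dep : in.dependencyIdentities)
    h = mix(h, "dep", dep);
  return h;
}
```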

A minimum-impact spike

The narrowest landable change that introduces this contract is roughly 10 lines in one file. At the tracking call site above:

llvm::for_each(dependencyInfoCopy.getMacroDependencies(),
               [this](const auto &entry) {
                 SmallString<256> idPath{entry.second.LibraryPath};
                 idPath += ".cachekey";
                 // If the producer has declared a stable identity, hash that
                 // instead of the binary. Fall back to current behavior.
                 if (tracker->fileExists(idPath))
                   tracker->trackFile(idPath);
                 else
                   tracker->trackFile(entry.second.LibraryPath);
               });

(Method names are sketches; the real patch needs to match the tracker’s actual API, proper Expected<> plumbing, and a lit test under test/CAS/.)

Properties:

  • No new flag, no Options.td change, no driver change, no swift-build change required for v1.

  • Behavior is exactly preserved when no sidecar is present.

  • Build systems opt in when they are ready by emitting <plugin>.cachekey next to each plugin. swift-build can use the producer target’s swift-emit-module CAS key. SwiftPM can do the same. External tooling (Bazel, Buck, Tuist) can use their own recipe hashes.

  • False-positive-safe: if the producer does not declare an identity, current behavior holds.
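On the producer side, opting in would mean writing the sidecar after the plugin links. Here is a minimal sketch, assuming the build system already has the identity string in hand. For swift-build, per the bullet above, that would be the producer target's swift-emit-module key. The sketch writes to a temporary file and renames it, so a concurrently running scan never reads a half-written sidecar. It does not address a sidecar that is complete but stale relative to the binary. Nothing in this scheme ties the two together.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes "<plugin>.cachekey" next to the plugin binary. The identity goes to
// a temporary file first and is then renamed into place, so a reader sees
// either the previous sidecar or the complete new one, never a partial file.
bool writeCacheKeySidecar(const fs::path &pluginPath,
                          const std::string &identity) {
  fs::path sidecar = pluginPath;
  sidecar += ".cachekey";
  fs::path tmp = sidecar;
  tmp += ".tmp";
  {
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    if (!out)
      return false;
    out << identity;
    if (!out.flush())
      return false;
  } // stream is closed here, before the rename
  std::error_code ec;
  fs::rename(tmp, sidecar, ec); // replaces an existing sidecar
  return !ec;
}
```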

Questions

  1. Is “hash plugin identity, not plugin content” the direction you would want this to go, or do you prefer a Mach-O canonicalization approach inside the include-tree tracker?

  2. If the sidecar approach is acceptable, is <plugin>.cachekey next to the binary the right discovery mechanism, or would you prefer it embedded in the MacroPluginDependency struct and surfaced through a new scanner JSON field?

  3. Is there appetite for a follow-up that teaches the scanner to track ExecutablePath as well as LibraryPath? Today only LibraryPath is tracked; for -load-plugin-executable plugins this means the executable’s content can vary without affecting the cache key at all, which is a separate latent issue (false-positive cache hits across plugin recompilations).

  4. If the answer to (1) is “we would rather fix it deeper in the compiler,” is there an existing direction in lib/CAS/ for normalized hashing of executable inputs that I should follow?
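For question 3, the change amounts to feeding both paths of each macro dependency to the tracker, not only LibraryPath. The sketch below models that with a hypothetical `MacroPluginPaths` record holding just the two fields named above. The real scanner type has more fields and its own tracker call. Treating an empty string as "not set" is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the scanner's per-macro record; only the two path
// fields named in the question are modeled.
struct MacroPluginPaths {
  std::string LibraryPath;
  std::string ExecutablePath;
};

// Collects every non-empty plugin path, without duplicates, so that a change
// to either file can reach the consumer's cache key.
std::vector<std::string>
pathsToTrack(const std::vector<MacroPluginPaths> &deps) {
  std::vector<std::string> out;
  auto add = [&out](const std::string &path) {
    if (path.empty())
      return;
    for (const auto &seen : out)
      if (seen == path)
        return;
    out.push_back(path);
  };
  for (const auto &dep : deps) {
    add(dep.LibraryPath);
    add(dep.ExecutablePath);
  }
  return out;
}
```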

Happy to write the patch and lit test against whichever shape the team prefers. Just want to avoid building against the wrong contract.


Can you share a repro case demonstrating this? Historically the default UUID behavior has been to base the value on the output file's content, which is therefore reproducible (even without the -reproducible flag). This is still what the man page for ld says as well:

 -random_uuid
         Generate a random LC_UUID load command in the output file. By default the linker generates the UUID of the output file based on a hash of the output file's
         content. [snip]

In my experience ad-hoc signatures are reproducible as well. Bazel relies on this reproducible behavior too, so I'd be very interested to know if something changed.

In the past I've found the best way to debug this case is to pass -Wl,-no_uuid -Wl,-no_adhoc_codesign to the link command to eliminate those two factors. That produces a binary that doesn't run on arm64 macOS, but if you build it that way twice and the two outputs still differ, the culprit is elsewhere.
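For that before/after comparison, a byte-level diff is usually enough. Reporting the first mismatch and how many bytes differ shows whether anything besides the UUID and signature varies, roughly what `cmp -l` shows on the command line. A small self-contained C++ version follows. The file paths passed to `readFile` are up to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

struct ByteDiff {
  std::size_t sizeA = 0;
  std::size_t sizeB = 0;
  std::size_t differingBytes = 0;       // mismatches within the shared length
  std::optional<std::size_t> firstDiff; // first mismatch offset, if any
};

ByteDiff compareBytes(const std::vector<uint8_t> &a,
                      const std::vector<uint8_t> &b) {
  ByteDiff d;
  d.sizeA = a.size();
  d.sizeB = b.size();
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] != b[i]) {
      if (!d.firstDiff)
        d.firstDiff = i;
      ++d.differingBytes;
    }
  }
  // Same prefix but different lengths: the first difference is at the end.
  if (!d.firstDiff && a.size() != b.size())
    d.firstDiff = n;
  return d;
}

// Reads a whole file as raw bytes; an unreadable file yields an empty vector.
std::vector<uint8_t> readFile(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  return std::vector<uint8_t>(std::istreambuf_iterator<char>(in),
                              std::istreambuf_iterator<char>());
}
```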

Also note that since Xcode 26.4.1 it looks like LD_DETERMINISTIC_MODE is enabled by default:

{   Name = LD_DETERMINISTIC_MODE;
    Type = Boolean;
    DefaultValue = YES;
    CommandLineArgs = {
        YES = (
            "-Xlinker",
            "-reproducible",
        );
        NO = ();
    };
    SupportedVersionRanges = ( "804" );
},

I think your investigation basically covers the situation. When the -reproducible linker flag is used, the plugin library is not really non-deterministic. Code signing might be an issue, especially since we want to do distributed caching, where the signing key can be different.

> Hash plugin identity, not plugin content. A plugin’s behavior is determined by its source files, compiler version, compile flags, and import graph, all of which the build system already knows (it just orchestrated the plugin’s compile). When caching is on, the plugin’s own swift-emit-module cache key is, by construction, a stable digest of exactly that. Propagate that ID as the plugin’s contribution to consumer compile keys instead of hashing the produced binary.

I think this is generally the direction I would like to see this go, but the actual fix is a lot more complicated than the patch you provided. Nothing creates a .cachekey file today, and we cannot rely on a single build system to create one, since this needs to work universally.

Also remember that the correctness of the cache is the most important thing, so there are a few issues with your design:

  • The .cachekey file is a standalone file that can get out of sync during the build and might not match the identity of the plugin library.
  • We also want to verify that the identity of the plugin library found by the dependency scanner is the actual plugin library that is loaded. We can do that with a full CAS entry of the binary, but cannot do so with a cache key.
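The second bullet can be made concrete. If the scanner records a digest of the plugin's bytes, the loader can recompute it over whatever it actually loads and reject a mismatch. An identity derived from the plugin's inputs cannot be checked this way, because the loaded file alone does not let you recompute it. Below is a minimal sketch, with FNV-1a standing in for the CAS hash and a hypothetical `ScannedPlugin` record.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a; a short stand-in for the CAS content hash.
uint64_t fnv1a(const std::string &data) {
  uint64_t h = 0xcbf29ce484222325ULL;
  for (unsigned char c : data) {
    h ^= c;
    h *= 0x100000001b3ULL;
  }
  return h;
}

// Hypothetical record the scanner could keep for each macro plugin.
struct ScannedPlugin {
  std::string path;
  uint64_t contentDigest; // digest of the bytes seen at scan time
};

ScannedPlugin recordAtScan(const std::string &path, const std::string &bytes) {
  return {path, fnv1a(bytes)};
}

// At load time: true only if the file being loaded has the scanned content.
bool loadedMatchesScan(const ScannedPlugin &scanned,
                       const std::string &loadedBytes) {
  return fnv1a(loadedBytes) == scanned.contentDigest;
}
```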

The actual design might be something in the middle, like embedding the identity of the plugin library in the library itself.

If you have a complete design, feel free to write a proposal or write a PR to review. Thanks for looking into this.