Modifying debug info for hermetic/distributable builds

Swift currently embeds absolute paths and some other non-hermetic information in the DWARF section of objects and in .swiftmodule files (SR-5694):

  • DWARF contains the absolute paths to the compilation directory and to the compiled files, and the full command line used to compile the code.
  • Swift modules contain the arguments passed to ClangImporter, and the absolute paths of the header search paths.

This causes a couple problems for our distributed builds at Google. The absolute paths differ based on the destination machine where the code is being compiled. Usually there’s a hash in the path, so a typical absolute path would be something like /build/0123456789abcdef/path/to/my/code.swift. This means that code compiled remotely can’t be debugged (the source locations don’t match up to what the user has on their workstation), code coverage suffers the same problem, and the compile outputs can’t be reliably cached.

We solve this problem for C++ by using clang’s -fdebug-prefix-map flag to remap the paths that are written into the DWARF section, so I’d like to add this to Swift. I’ve staged an implementation here, which seems to work well for the DWARF paths.
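For readers unfamiliar with prefix maps, here is a minimal Python sketch of the idea. It is an illustration only, not Clang's or Swift's implementation: the name `remap_path`, the plain string-prefix match, the first-match-wins rule, and the choice of `.` as the replacement are all assumptions made for the example.

```python
def remap_path(path, prefix_map):
    """Rewrite the leading part of `path` using the first matching (old, new) pair."""
    for old, new in prefix_map:
        if path.startswith(old):
            return new + path[len(old):]
    # No entry matched, so the path is left untouched.
    return path

# Hypothetical mapping: the hashed remote build root is rewritten to ".".
prefix_map = [("/build/0123456789abcdef", ".")]
remapped = remap_path("/build/0123456789abcdef/path/to/my/code.swift", prefix_map)
print(remapped)  # ./path/to/my/code.swift
```

With a mapping like this, every remote machine would record the same relative path regardless of the hash in its build directory.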

Aside from the obvious question of “do you want this?”, I have a couple other questions:

  1. Swift unconditionally writes the command line used to compile the app into DWARF (TAG_compile_unit/AT_APPLE_flags). Clang, on the other hand, only appears to do this for Mach-O, and only if the environment variable RC_DEBUG_OPTIONS is set. Can we update Swift to replicate Clang’s behavior, or does Swift unconditionally need this information? Removing it would get rid of another source of non-hermeticity, especially since remapping paths in the raw flags is a significantly harder problem because it requires understanding the semantics of each individual flag (e.g., handling -Ifoo as if it were -I foo).

  2. From looking over lib/Serialization/Serialization.cpp, the XCC, SDKPath, and SearchPaths sections of a Swift module appear to only be used for debugging purposes. Remapping SDKPath and SearchPaths in the same way as DWARF paths seems straightforward. Is XCC used during debugging in a way that would make it difficult to remove? Could we conditionalize it the same way proposed above (RC_DEBUG_OPTIONS) to remove the non-hermeticity unless explicitly requested?

I had to set this aside for a while, but I was finally able to revisit it.

Regarding item 2 above, it appears that the compiler uses those debugging-only search paths even during regular compilation actions, because ASTContext (and subsequent module loading) does not distinguish between "this AST context is being used by the compiler" and "this AST context is being used by the debugger". Thus, import paths and ClangImporter options written into .swiftmodule files when -serialize-debugging-options is specified cause downstream dependencies to compile with different search paths depending on whether you're doing a debug build or a non-debug build. In other words, it's possible to craft imports that succeed or fail to compile solely based on whether debugging options are present, which I presume is not by design. I've filed SR-7845 with a reproducible example.

I encountered this problem when trying to remap the search paths. If you compile ModuleA with debugging options on, -I foo, and remap foo=bar, then foo gets added to the AST context's search paths and then they are remapped when the module is serialized (so bar gets written into the file), as you would expect. But then if you compile ModuleB that depends on ModuleA with the same options, foo gets added to the search paths, then ModuleA is loaded and bar is added to the search paths, and the compiler doesn't realize they're the same. So they get remapped when ModuleB is serialized—foo remaps to bar, the other bar stays as bar, and you end up with duplicate search paths in the module file.
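The duplication described above can be reproduced with a toy model in Python. This is a sketch of the behavior as I understand it from the description, not the compiler's code; `compile_module` and its parameters are hypothetical names, and the model assumes that paths are remapped only at serialization time.

```python
def remap(path, prefix_map):
    for old, new in prefix_map:
        if path.startswith(old):
            return new + path[len(old):]
    return path

def compile_module(own_search_paths, imported_modules, prefix_map):
    """Toy model: return the search paths that get serialized into the module file."""
    # The AST context starts with the paths from this compile's own -I flags.
    context_paths = list(own_search_paths)
    # Loading a dependency appends the (already remapped) paths it serialized.
    for serialized_paths in imported_modules:
        context_paths.extend(serialized_paths)
    # Remapping is applied only when the module is written out.
    return [remap(p, prefix_map) for p in context_paths]

prefix_map = [("foo", "bar")]
module_a = compile_module(["foo"], [], prefix_map)
module_b = compile_module(["foo"], [module_a], prefix_map)
print(module_a)  # ['bar']
print(module_b)  # ['bar', 'bar']
```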

Rather than try to be clever and remap the paths before serialization (that's technically not the right thing to do, because they're being written for the debugger, not for general use), I think it's more important to fix the underlying problem: the compiler shouldn't be using search paths from other modules' debug options at all.

Can we update Swift to replicate Clang’s behavior, or does Swift unconditionally need this information?

I believe the answer is yes, Swift can be updated to match Clang here. Early in Swift's development, LLDB used DW_AT_APPLE_flags to extract Clang header search paths, but these are now also serialized in the .swiftmodule files. I'd be happy to review a patch that brings Swift in line with Clang's behavior.

Is XCC used during debugging in a way that would make it difficult to remove?

The serialized XCC options are used to initialize the ClangImporter used by the Swift compiler embedded in LLDB. Removing them would break debugging of Swift programs that use (Obj)C interoperability. The correct fix for the problem would be to provide some form of path remapping. (See also https://github.com/apple/swift-lldb/pull/503 for how this is done in .dSYM bundles on Apple platforms for example).

Thanks! That should be a fairly straightforward one that I can put together in the near future.

Thanks for confirming—I've dug around in the code some more since my original post and thought that was the case. And the PR you linked confirms another one of my concerns, that if I want to remap the paths for arbitrary ClangImporter arguments at the time they're serialized into the .swiftmodule file, I need to handle the joined form of some flags like -I correctly as you've done here.

In my case, since I need to remap the paths before serialization not only for debugging but also for hermeticity/reproducibility, I have to handle more cases than just the ones LLDB cares about. So if we're remapping /foo=/bar, then I'd cover:

  • Any bare string with a path prefix matching one in the prefix map, which could be an input file or a non-joined search path (e.g., /foo/main.c -I /foo/includes -> /bar/main.c -I /bar/includes)
  • Any joined argument that can be followed by a path prefix matching the map (e.g., -I/foo/includes -F/foo/frameworks -> -I/bar/includes -F/bar/frameworks)
  • The right-hand side of any argument joined by = that has a path prefix matching the map (e.g., -fmodule-map-file=/foo/my.map -> -fmodule-map-file=/bar/my.map)
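The three cases above could be sketched in Python roughly as follows. This is a hedged illustration rather than the staged implementation: `remap_arg`, `remap_prefix`, and `JOINED_FLAGS` are hypothetical names, the list of joined flags is deliberately incomplete, and the match is a plain string-prefix comparison.

```python
# Hypothetical, incomplete list of flags that can be joined directly to a path.
JOINED_FLAGS = ("-I", "-F")

def remap_prefix(path, prefix_map):
    for old, new in prefix_map:
        if path.startswith(old):
            return new + path[len(old):]
    return path

def remap_arg(arg, prefix_map):
    # Joined form, e.g. -I/foo/includes: remap the text after the flag.
    for flag in JOINED_FLAGS:
        if arg.startswith(flag) and len(arg) > len(flag):
            return flag + remap_prefix(arg[len(flag):], prefix_map)
    # Form joined by "=", e.g. -fmodule-map-file=/foo/my.map: remap the right-hand side.
    if arg.startswith("-") and "=" in arg:
        flag, _, value = arg.partition("=")
        return flag + "=" + remap_prefix(value, prefix_map)
    # Bare word: an input file, or the path following a separate flag such as "-I".
    return remap_prefix(arg, prefix_map)

prefix_map = [("/foo", "/bar")]
args = ["/foo/main.c", "-I", "/foo/includes", "-I/foo/includes",
        "-F/foo/frameworks", "-fmodule-map-file=/foo/my.map"]
remapped_args = [remap_arg(a, prefix_map) for a in args]
print(remapped_args)
```

Note that a plain string-prefix match would also rewrite a sibling directory such as /foobar; whether that matters depends on how the real prefix map is meant to behave.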

Does that sound correct? Is there anything I'm forgetting?

These examples sound fine, but I'm sure that there are many more clang options that take paths (-working-directory comes to mind).
If your platform also supports header maps, you will also need to rewrite any paths in there.
What I expect to be most problematic is dealing with .swiftmodules. You need to rewrite the paths as you serialize each .swiftmodule but when you import them (during the same compilation, not just in the debugger) you need to apply the inverse transformation.
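To illustrate the round trip, here is a small Python sketch that remaps a path on serialization and applies the inverse mapping on import. The names `remap` and `invert`, and the /remapped prefix, are hypothetical; the inverse is only well defined when each replacement prefix is unique and cannot be confused with a real path.

```python
def remap(path, prefix_map):
    for old, new in prefix_map:
        if path.startswith(old):
            return new + path[len(old):]
    return path

def invert(prefix_map):
    """Swap each (old, new) pair so that serialized paths map back to local ones."""
    return [(new, old) for old, new in prefix_map]

# Hypothetical mapping from a machine-specific build root to a stable prefix.
prefix_map = [("/build/0123456789abcdef", "/remapped")]

original = "/build/0123456789abcdef/include"
serialized = remap(original, prefix_map)          # written into the .swiftmodule
restored = remap(serialized, invert(prefix_map))  # recovered when the module is imported
print(serialized)  # /remapped/include
print(restored)    # /build/0123456789abcdef/include
```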

Right—in this case, it can appear in two forms: -working-directory /foo or -working-directory=/foo. The first situation would be handled by the first bullet above (the path would be its own bare word in the args array) and the second would be handled by the third bullet point. I think that covers everything so I just need to make sure to collect the correct list of joined flags for the second bullet.

Getting more information on this is why I filed SR-7845. From a bit of exploration, I couldn't see why the compiler would need those search paths and ClangImporter options (otherwise it would be possible to construct compilations that behave differently in debug vs. release mode, right?). Do you have some additional insight into this?