LLDB performance with Clang modules and Swift

Our project has about 1,000 Clang modules and about 600 Swift modules. We experience a severe delay when attaching lldb to a process: after emptying ~/Library/Developer/Xcode/DerivedData/ModuleCache.noindex, it takes about 20 minutes to hit a breakpoint set at application start. Profiling showed that this time is spent repopulating the module cache in order to build the module in whose context lldb console expressions are evaluated, which is expected.

However, what was not expected is the following line in the lldb log:

Extra clang arguments        : (28814 items)

It turned out that lldb collects the clang arguments from every Swift module passed to the linker through the -add_ast_path flag and concatenates them, ignoring the fact that the vast majority of them are duplicates. To check whether this could affect module cache population time, I did the following:

  1. I made a fake Swift module consisting of a single Swift source file that imports every clang module we use, with the paths to the clang module maps passed via the -fmodule-map-file flag;
  2. I passed only the Swift module from step 1 through the -add_ast_path linker argument, and did not pass any other modules;
  3. I didn't link this Swift module into the final executable.
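
The fake module from step 1 could be generated rather than written by hand. Below is a minimal sketch, not the script actually used: the function names are hypothetical, and it assumes the module map flags are forwarded to clang through swiftc's -Xcc option, which the post does not state.

```python
# Sketch: generate the single "umbrella" Swift source file and the
# module map flags for the fake module described in step 1.
# Assumption: flags reach clang via swiftc's -Xcc forwarding option.

def umbrella_source(module_names):
    """Return Swift source that imports each clang module exactly once."""
    return "".join(f"import {name}\n" for name in sorted(set(module_names)))


def modulemap_flags(modulemap_paths):
    """Return one -fmodule-map-file flag per unique module map path."""
    flags = []
    for path in sorted(set(modulemap_paths)):
        flags += ["-Xcc", f"-fmodule-map-file={path}"]
    return flags


if __name__ == "__main__":
    # Hypothetical module names and paths, for illustration only.
    print(umbrella_source(["Networking", "Analytics", "Networking"]), end="")
    print(" ".join(modulemap_flags(["/src/Analytics/module.modulemap"])))
```

Because every module and module map path appears only once, the argument list recorded for this single module stays proportional to the number of clang modules instead of growing with the number of Swift modules.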

After these changes, debugging Swift code worked just fine (no issues with lldb expressions in Swift), the time to the first breakpoint after cleaning the module cache dropped from about 20 minutes to about 6 minutes, and the lldb log showed this:

Extra clang arguments        : (962 items)

These facts raise two questions:

  1. Why did reducing the number of extra clang arguments, while keeping the same number of clang modules, affect module cache population time so significantly? Does clang rebuild a module for each -fmodule-map-file argument it encounters?
  2. Can this fix (argument deduplication) be built into lldb?

@Adrian_Prantl maybe you could help here

It would not be surprising if 28814 options do cause performance issues. However, multiple mentions of the same -fmodule-map-file option do not seem to trigger multiple module rebuilds.

As an experiment I created a test program with one module and passed the option twice:

clang -fmodules -fmodule-map-file=module.modulemap -fmodule-map-file=module.modulemap t.m -fmodules-cache-path=./cache -c -Rmodule-build
t.m:1:9: remark: building module 'A' as '/tmp/x/cache/3Q6AU0RME0VKV/A-BY6Q3D3ZTGS6.pcm' [-Rmodule-build]
@import A;
^
t.m:1:9: remark: finished building module 'A' [-Rmodule-build]

It looks like the module is built once and then cached. Re-running the invocation doesn't build any modules at all. You might want to add -Rmodule-build to your Clang options yourself to see what exactly is happening in terms of module build actions. When you sample or profile lldb, does it spend most of its time rebuilding Clang modules?

As for your example, I'd say it could work in the trivial case. But with a complex dependency graph (and there were other flags alongside -fmodule-map-file; I'm not sure whether they affect the .pcm file path in the cache), the same module may end up being compiled differently for the same cache path, which leads to inefficient cache rebuilding.

When you sample or profile lldb, does it spend most of its time rebuilding Clang modules?

Yes, about 90% of the time

Thanks for your suggestion about -Rmodule-build, I will try it out! In any case, if you agree that the number of options could cause performance issues, what do you think about adding flag deduplication to lldb?

That sounds like a perfectly reasonable feature to me. If you are interested, you'd probably want to add an llvm::DenseSet or something similar to SwiftASTContext::AddExtraClangArgs() (SwiftASTContext.cpp at commit 3a104b0029ce95a4338abeaff82421db511b99f7 in apple/llvm-project on GitHub).

Otherwise you can also file a bug on JIRA.

You would also need to be careful to handle multi-word parameters, such as -I dir, correctly.
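
To illustrate the multi-word concern, here is a minimal Python sketch of order-preserving deduplication that treats a flag and its separate value as one unit. It is not the code from lldb; the function name and the set of flags treated as taking a separate value are assumptions chosen for the example.

```python
# Sketch: deduplicate clang arguments while keeping "-I dir" style pairs intact.
# Assumption: this illustrative set covers the flags whose value is passed
# as a separate following argument; a real implementation needs a fuller list.
SEPARATE_VALUE_FLAGS = {"-I", "-F", "-isystem", "-iquote", "-include"}


def dedupe_clang_args(args):
    """Drop repeated arguments, keeping the first occurrence and its position."""
    seen = set()
    result = []
    i = 0
    while i < len(args):
        if args[i] in SEPARATE_VALUE_FLAGS and i + 1 < len(args):
            unit = (args[i], args[i + 1])  # flag plus its value
            i += 2
        else:
            unit = (args[i],)  # self-contained argument
            i += 1
        if unit not in seen:
            seen.add(unit)
            result.extend(unit)
    return result


if __name__ == "__main__":
    print(dedupe_clang_args(["-I", "dir", "-DFOO", "-I", "dir", "-I", "other", "-DFOO"]))
```

Deduplicating word by word would drop the second -I and leave other without its flag. Keeping only the first occurrence is itself a simplification, since for some options the relative order of repeated arguments can matter.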

Thanks! I'll open a pull request for this.


This is great, thanks for working on this - I think we might also be affected by this. I have one question though - is there any harm in just embedding the main swift module in the linked binary? From my understanding this will require that you have all build/compiler inputs and outputs available locally to debug?

Not exactly. Consider the following case:

Main Swift module:
    import FirstObjCModule
SecondObjCModule (in an .m file, i.e. an implementation-only import):
    #import <OtherSwiftModule/OtherSwiftModule-Swift.h>
OtherSwiftModule:
    import SecondObjCModule

The main Swift module doesn't need SecondObjCModule, but it is required for debugging to work.

For those interested, this is the lldb PR: "Reduce number of redundant args for ClangImporter" by DeeSee (Pull Request #1438, apple/llvm-project on GitHub).
