IR-level PGO Instrumentation in Swift

Hi everyone,

We’ve been exploring how to enable PGO for iOS apps as part of ongoing work on binary size and launch-time improvements. These apps mix Swift and ObjC code and link with LLD (since ldprime, which is now just ld, doesn’t have PGO support), so we’d like profiling data from both languages to flow consistently into the IR, letting the linker and optimizer make better decisions.

  1. For this to work well, we need IR-level instrumentation for profile collection. Clang does that today when we pass -fprofile-generate or -fcs-profile-generate.
  2. In Swift, though, -profile-generate is frontend/SIL-level instrumentation rather than LLVM IR-level instrumentation. As a result, it doesn’t contribute the same kind of profile data that Clang-compiled code does.

Following the guidance I got in this Discourse thread, I’ve started some early plumbing to experiment with this:

With these changes in place, passing -Xllvm -pgo-temporal-instrumentation produced a sample .profraw file with an IR instrumentation level and temporal profile traces:

xcrun llvm-profdata show --temporal-profile-traces current.profraw
Instrumentation level: IR  entry_first = 0
Total functions: 0
Maximum function count: 0
Maximum internal block count: 0
Temporal Profile Traces (samples=1 seen=1):
  Temporal Profile Trace 0 (weight=1 count=7):
    main
    $sSa9_getCountSiyFSS_Tg5
    $sSa11_checkIndexyySiFSS_Tg5
    $ss12_ArrayBufferVys06_SliceB0VyxGSnySiGcigSS_Tg5Tf4nn_g
    $s8profdemo4workyySiF
    $s8profdemo3fibyS2iF
    __swift_instantiateConcreteTypeFromMangledName
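Since the immediate goal is symbol ordering, it may help to see how directly this trace maps to an order file. The Python sketch below is deliberately naive and rests on assumptions: it parses the `llvm-profdata show --temporal-profile-traces` text shown above, concatenates traces while keeping each symbol's first appearance, and adds the leading underscore that Mach-O symbol names carry. The helper names `parse_traces` and `order_file_lines` are hypothetical. Recent llvm-profdata versions also provide an `order` subcommand for producing function orders from temporal traces, which is probably the better tool in practice.

```python
import re

# Output of `xcrun llvm-profdata show --temporal-profile-traces current.profraw`,
# copied from the run above.
SAMPLE_OUTPUT = """\
Instrumentation level: IR  entry_first = 0
Total functions: 0
Maximum function count: 0
Maximum internal block count: 0
Temporal Profile Traces (samples=1 seen=1):
  Temporal Profile Trace 0 (weight=1 count=7):
    main
    $sSa9_getCountSiyFSS_Tg5
    $sSa11_checkIndexyySiFSS_Tg5
    $ss12_ArrayBufferVys06_SliceB0VyxGSnySiGcigSS_Tg5Tf4nn_g
    $s8profdemo4workyySiF
    $s8profdemo3fibyS2iF
    __swift_instantiateConcreteTypeFromMangledName
"""

# Matches a per-trace header such as "Temporal Profile Trace 0 (weight=1 count=7):".
TRACE_HEADER = re.compile(r"^\s*Temporal Profile Trace \d+ \(")


def parse_traces(text):
    """Return one list of function names per trace, in first-call order."""
    traces = []
    current = None
    for line in text.splitlines():
        if TRACE_HEADER.match(line):
            current = []
            traces.append(current)
        elif current is not None and line.startswith("    ") and line.strip():
            # Function names are indented one level deeper than the trace header.
            current.append(line.strip())
    return traces


def order_file_lines(traces, mach_o_underscore=True):
    """Naive ordering: concatenate traces, keeping each symbol's first appearance."""
    seen = set()
    lines = []
    for trace in traces:
        for name in trace:
            if name in seen:
                continue
            seen.add(name)
            # Mach-O symbol names carry a leading underscore ("main" -> "_main").
            lines.append("_" + name if mach_o_underscore else name)
    return lines


if __name__ == "__main__":
    print("\n".join(order_file_lines(parse_traces(SAMPLE_OUTPUT))))
```

With a single trace this simply reproduces first-call order; the interesting decisions only appear once many traces from different launches have to be reconciled.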

Before I go further with unit tests, diagnostics, and docs, I wanted to check in and get consensus from the community:

  1. Does it feel like this is going in the right direction?
  2. Are there known limitations, design trade-offs, or things I might be overlooking?
  3. One small naming detail I’m not sure about: I called the new option -ir-profile-generate. Since -profile-generate in Swift is a frontend/SIL thing and not IR, I thought it made sense to give the IR-level version a distinct prefix for now, but I’m open to suggestions if there’s a clearer or more consistent way to surface it.

The end goal is for Swift code to emit profiles that integrate cleanly with Clang’s, so we can merge .profraw/.profdata, feed that into ThinLTO and other PGO components, and get the size/startup wins we’re aiming for.
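As a mental model of the merge step, here is a toy Python sketch. It assumes that Swift and Clang both emit IR-level records keyed by function name plus a CFG hash, so merging amounts to summing the counters of matching records, optionally scaled per input (loosely in the spirit of llvm-profdata merge’s -weighted-input=<weight>,<file> option). This is a simplification, not llvm-profdata’s actual algorithm or file format; the function names, hashes, and counts are made up.

```python
from typing import Dict, List, Tuple

# Toy profile: (function name, CFG hash) -> counter values.
Key = Tuple[str, int]
Profile = Dict[Key, List[int]]


def merge_profiles(inputs: List[Tuple[int, Profile]]) -> Profile:
    """Sum counters for matching (name, hash) keys, scaling each input by its weight."""
    merged: Profile = {}
    for weight, profile in inputs:
        for key, counters in profile.items():
            scaled = [weight * c for c in counters]
            if key not in merged:
                merged[key] = scaled
            elif len(merged[key]) != len(scaled):
                # Same function and hash but a different number of counters:
                # the inputs disagree, so refuse rather than guess.
                raise ValueError(f"counter count mismatch for {key}")
            else:
                merged[key] = [a + b for a, b in zip(merged[key], scaled)]
    return merged


# Hypothetical inputs: two Swift runs and one run of ObjC code built with Clang.
swift_run_1: Profile = {("$s8profdemo3fibyS2iF", 0x1111): [10, 3]}
swift_run_2: Profile = {("$s8profdemo3fibyS2iF", 0x1111): [4, 1]}
clang_run: Profile = {("hypothetical_objc_helper", 0x2222): [5]}

merged = merge_profiles([(1, swift_run_1), (1, swift_run_2), (2, clang_run)])
```

The point of the toy is that merging only works cleanly when both languages produce the same kind of record, which is exactly what frontend-level Swift profiles and IR-level Clang profiles don’t do today.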

I’d love to hear whether this direction sounds right, or whether there are reasons not to pursue it, or to pursue it differently.

Thanks!


It makes sense to me to add support to Swift for LLVM's IR level instrumentation.

As pointed out in the post, the existing Swift profile generation (-profile-generate) is analogous to clang's "frontend" based instrumentation.

Here is the relevant section from clang's documentation (Clang Compiler User’s Manual — Clang 22.0.0git documentation) describing the two approaches:

Clang supports two types of instrumentation: frontend-based and IR-based. Frontend-based instrumentation can be enabled with the option -fprofile-instr-generate, and IR-based instrumentation can be enabled with the option -fprofile-generate. For best performance with PGO, IR-based instrumentation should be used. It has the benefits of lower instrumentation overhead, smaller raw profile size, and better runtime performance. Frontend-based instrumentation, on the other hand, has better source correlation, so it should be used with source line-based coverage testing.
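One reason IR-based instrumentation can have lower overhead, as I understand LLVM’s PGO instrumentation pass (its CFGMST utility), is that it doesn’t place a counter on every CFG edge. It builds a spanning tree over the CFG, preferring edges estimated to be hot, instruments only the edges outside the tree, and recovers the remaining counts from flow conservation when the profile is used. The Python sketch below is a simplified model of that idea, not LLVM’s implementation: the CFG, edge order, and counts are made up, and the real pass also has to deal with things like critical edges and block frequency estimates.

```python
def find(parent, node):
    """Union-find root lookup with path halving."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def choose_instrumented_edges(nodes, edges):
    """Kruskal's algorithm over the undirected CFG.

    Edges are assumed to be listed hottest-first (e.g. by static branch-weight
    estimates), so hot edges tend to land in the spanning tree and stay
    uninstrumented. Any edge that would close a cycle gets a counter.
    """
    parent = {n: n for n in nodes}
    tree, instrumented = [], []
    for src, dst in edges:
        root_src, root_dst = find(parent, src), find(parent, dst)
        if root_src == root_dst:
            instrumented.append((src, dst))
        else:
            parent[root_src] = root_dst
            tree.append((src, dst))
    return tree, instrumented


def reconstruct_counts(nodes, edges, measured):
    """Recover every edge count from the counters using flow conservation.

    Requires an artificial exit->entry edge so that inflow == outflow holds at
    every node. Parallel edges between the same pair of blocks are not handled.
    """
    counts = dict(measured)
    while len(counts) < len(edges):
        progress = False
        for node in nodes:
            incident = [e for e in edges if node in e]
            missing = [e for e in incident if e not in counts]
            if len(missing) != 1:
                continue
            inflow = sum(counts[e] for e in incident if e[1] == node and e in counts)
            outflow = sum(counts[e] for e in incident if e[0] == node and e in counts)
            edge = missing[0]
            # Solve inflow == outflow for the single unknown edge at this node.
            counts[edge] = outflow - inflow if edge[1] == node else inflow - outflow
            progress = True
        if not progress:
            raise ValueError("counters do not determine all edge counts")
    return counts


# A diamond-shaped function: entry branches to A or B, both fall through to exit.
nodes = ["entry", "A", "B", "exit"]
# Hottest-first; ("exit", "entry") is the artificial edge closing the circulation.
edges = [("entry", "A"), ("entry", "B"), ("A", "exit"), ("B", "exit"), ("exit", "entry")]
tree, instrumented = choose_instrumented_edges(nodes, edges)
# Suppose the function ran 10 times and took the B branch 3 times.
measured = {("B", "exit"): 3, ("exit", "entry"): 10}
counts = reconstruct_counts(nodes, edges, measured)
```

With 4 blocks and 5 edges, the spanning tree has 3 edges, so only 2 counters are needed; the other three counts follow from inflow == outflow at each block.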


I think this is great work, and so far it’s implemented the way I expected. Thanks for the contributions!

The main limitation I’m aware of for using LLVM-based PGO in Swift is that we break a SIL module into multiple LLVM IR modules, one for each original .swift file, to process them in parallel. In experiments using sampling-based PGO (-profile-sample-use=) in Swift, I’ve seen that LLVM is more effective at optimization (even without PGO data) when using a single LLVM module with -Xfrontend -enable-single-module-llvm-emission. Of course, compile-times suffer as a result.

Your examples seem to suggest a desire to apply temporal ordering to Swift object files, so I wouldn’t expect this limitation to affect you now, but for those seeking profile-guided compiler optimizations, it’s something to keep in mind. Is symbol ordering your only goal?


Thanks, Arnold, for looking into this and for sharing the Clang references. I’d really appreciate some guidance here:

I’m trying to find a Swift counterpart to Clang’s IR-level flag -fprofile-generate, but Swift already uses -profile-generate for frontend instrumentation. As I understand it, Swift’s -profile-generate (frontend) corresponds more closely to Clang’s -fprofile-instr-generate (frontend).

Since we can’t change the semantics of Swift’s existing -profile-generate, it seems we’d need a new flag for IR-level instrumentation to keep the distinction clear. Are there any recommendations for what this flag should be called? In my commits above I used -ir-profile-generate, but that doesn’t pair well with Clang’s naming, and I’d like it to align with existing conventions if there’s a better precedent.

Thanks a lot, Kavon!

Yes, our immediate goal is symbol ordering: generating order files and then supplying the merged profdata back to lld and future compilations for ThinLTO and PGO.

Yes, we did explore single-module LLVM emission when looking into full LTO, and you’re right, compile times were definitely an issue. We haven’t experimented with it in our current context yet, though, and since you called it out, it could be worth revisiting. Thanks for sharing!