Pitch: Support LTO for Swift

elsh · October 3, 2023, 12:44am

You have to check out and build swiftpm first, then run .build/[config]/swift-build with --experimental-lto-mode full. You will also need to add -lto=llvm-full to your package's linker settings. To run aggressive DCE along with -O or -Osize, you'll need to build and run a modified swiftpm; the PR will be linked here once posted.
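For reference, a minimal sketch of what the linker-settings part could look like in a Package.swift manifest. The package and target name MyApp is a placeholder and the tools version is an assumption; only the -lto=llvm-full flag comes from the steps above.

```swift
// swift-tools-version:5.9
import PackageDescription

// "MyApp" is a placeholder name used for illustration only.
let package = Package(
    name: "MyApp",
    targets: [
        .executableTarget(
            name: "MyApp",
            linkerSettings: [
                // Passed to the link step; meant to be paired with
                // `--experimental-lto-mode full` on the swift-build command line.
                .unsafeFlags(["-lto=llvm-full"])
            ]
        )
    ]
)
```

Note that unsafeFlags is intended for root packages; SwiftPM generally refuses to build versioned dependencies that use it.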

rauhul · October 3, 2023, 3:36pm

FWIW I already have a PR for this missing flag and a test fixture in SPM, but haven't had time to chase down CI failures: "Add test fixture for experimental-lto-mode" (Pull Request #6891, apple/swift-package-manager).

If you build your own copy of swift-build from this branch you can omit the manual use of -Xlinker -lto=llvm-full.

Dmitriy_Ignatyev · March 31, 2024, 3:07pm

Thanks for exploring this area and summarizing it in a pitch.

While there are currently ways to optimize compilation speed (build graph, parallelization, XCfrontend flags, the type checker, etc.), there is a lack of tools for reducing binary size, specifically LTO with dead code stripping. We currently use the available LTO settings, but I expect the pitched changes could reduce binary size even further (up to 70-75%) in my specific case.

I would also suggest providing an option to generate a report listing all stripped symbols per module. This can be done today, but not in a user-friendly way.
Such a report would help a lot with finding and removing unused code until we have an IDE feature for it.

restermans · November 19, 2024, 12:51pm

Hi.

We are seeing very good results in terms of app's binary size reduction from applying LTO.

For thin LTO the app's binary size reduction in our case is ~7%, while for full LTO the reduction is ~20%.

Thin LTO increases build time by ~10 min in our case, while full LTO increases build time by a whopping ~4 hours.

While it's very tempting for us to go for full LTO, we currently have to opt for thin LTO, because we simply cannot afford 4+ hour builds at the moment.

I would therefore like to ask whether -lto=llvm-thin and -lto=llvm-full are the only available options, or whether there is a way to tune things more finely, for example by selecting the number of iterations performed by the outliner. Basically, is there anything that can be adjusted per use case to balance size reduction against build time?

nocchijiang · November 22, 2024, 6:07am

You may be interested in this RFC: [RFC] Enhanced Machine Outliner – Part 2: ThinLTO/NoLTO - Code Generation - LLVM Discussion Forums

Unfortunately, you have to wait for the next rebasing of Swift's LLVM fork and the release of Swift that carries the changes if you are using Apple's toolchain.

You can tune MachineOutliner parameters. In the case of LTO, you provide the parameters as LLVM arguments to your linker invocation. For the number of iterations you can try -machine-outliner-reruns. See llvm-project/llvm/lib/CodeGen/MachineOutliner.cpp at 2b61a59b5961794ce1a7b51a31a67053c2c4334f · swiftlang/llvm-project · GitHub for other available options you might be interested in.

restermans · November 26, 2024, 7:51pm

Thank you very much for sharing the RFC. It looks very promising, and we would indeed be very interested in using the implementation when it's out.

As for tuning MachineOutliner parameters, I am using Apple's toolchain, and the latest release of swiftlang/llvm-project (5.10.1) only has enable-linkonceodr-outlining and machine-outliner-reruns.

I don't really understand what enable-linkonceodr-outlining does, but for machine-outliner-reruns, if my understanding is correct, since the default value is already 0, it could only be beneficial with thin-LTO to possibly improve size reduction by bumping it up, while it can't really help with reducing build time for full-LTO.

However, I wasn't able to get ld64 to pick up the machine-outliner-reruns param. What I tried was -Wl -mllvm -machine-outliner-reruns=1, but it looks like it's just ignored. I would appreciate any suggestions on this.

nocchijiang · November 27, 2024, 1:59am

The latest release is 6.0.2 and you can check out the source via the release tag.

enable-linkonceodr-outlining allows the outliner to perform optimizations on linkonceodr functions. linkonceodr is a special linkage type which basically tells the linker to deduplicate globals with identical contents. For no-LTO builds, it may actually increase the final code size, because local outlining decisions vary across modules, so previously identical linkonceodr functions may end up different and can no longer be deduped by the linker. For FullLTO builds it is definitely beneficial.

machine-outliner-reruns is generally beneficial for code size regardless of the LTO kind: it just tells the outliner to rerun the algorithm N times on the results of previous runs. Of course, it comes with a build-time cost, so it will not help reduce build time.

I believe you should pass it as a whole like -Wl,-mllvm,-machine-outliner-reruns=1.
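If the link step goes through swiftc (for example in a SwiftPM package) rather than clang, the same two linker arguments can be forwarded with -Xlinker instead of -Wl. A minimal sketch, assuming a placeholder target named MyApp and the reruns value of 1 from above; whether the option is honored still depends on the LLVM version in your toolchain.

```swift
// swift-tools-version:5.9
import PackageDescription

// "MyApp" is a placeholder name used for illustration only.
let package = Package(
    name: "MyApp",
    targets: [
        .executableTarget(
            name: "MyApp",
            linkerSettings: [
                // Each -Xlinker forwards exactly one argument to the linker,
                // so the linker receives: -mllvm -machine-outliner-reruns=1
                .unsafeFlags([
                    "-Xlinker", "-mllvm",
                    "-Xlinker", "-machine-outliner-reruns=1",
                ])
            ]
        )
    ]
)
```

This mirrors what the comma-separated -Wl form does for clang: the option and its value must both reach the linker, which is why the space-separated attempt was ignored.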

restermans · November 27, 2024, 5:06pm

"The latest release is 6.0.2 and you can check out the source via the release tag."

I only checked the Releases tab, my bad. Thanks for pointing out the tags.

"I believe you should pass it as a whole like -Wl,-mllvm,-machine-outliner-reruns=1."

This seems to work.

Do you know if there is any way to collect stats, e.g. the number of instructions that were outlined, or the number of bytes that were saved?