[GSoC2020] LTO support for swift

Hi, I am interested in the LTO support for the swift project.

My rough understanding of the project is the following:

  • lld is the ld replacement from llvm which is way faster
  • lld LTO optimizations can reduce the size of the resulting binary and speed it up
  • proposal is to integrate lld + LTO and measure the improvements

@compnerd could you answer my questions? They would greatly improve my understanding:

The lld plugin would deserialize the LLVM IR emitted by the compiler, invoke the Swift LTO pipeline

  • but from my understanding, lld, or any linker for that matter, works with raw shared library or executable files and not with LLVM IR
  • is lld utilized in the project? is it used for the compilation of the swift files and just LTO "activation" is missing?
  • could you tell me any parts of the codebase one would need to know in order to be able to work on this?
  • any further steps you'd recommend me to do in order to get the full picture of the project?

Hi Denis,

I wouldn't necessarily describe it that way. Yes, it is an alternative implementation of a linker that supports ELF and COFF currently, and it has made a different set of trade-offs, which makes it faster in some cases, but it has other drawbacks as well.

Yes, linkers operate on object files. However, when performing LTO (or LTCG in Microsoft's parlance), the operation is delayed: instead, the linker will actually perform a callback into the compiler to late-transform some alternative representation (C2 tuples, LLVM IR, or in this case possibly even SIL!) to get the opportunity to do more aggressive IPO.

lld is just a linker - there is no need for that directly in the sense that the compiler will not invoke the linker. However, when the user builds a shared library or an executable, they will do so from the driver, which will invoke the linker on the user's behalf to link the object files. There already exists the -use-ld flag to the driver that allows you to switch the linker that is in use.

It is not really "activation" but rather development of a linker plugin (see LLVMGold as an example) that allows the linker to perform code generation at link time (from whence originates the term LTCG - link time code generation).
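
To make the callback flow described above concrete, here is a minimal toy model in Python. It is only a sketch of the idea and not the actual API of lld or LLVMGold: `InputFile`, `link`, and `toy_codegen` are hypothetical names, and "code generation" is reduced to merging symbol lists.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InputFile:
    name: str
    kind: str           # "object" = native code, "ir" = deferred representation
    symbols: List[str]  # symbols the file defines


def toy_codegen(ir_files: List[InputFile]) -> InputFile:
    """Stand-in for the compiler side: turn all IR inputs into one object."""
    merged = sorted(s for f in ir_files for s in f.symbols)
    return InputFile("lto-merged.o", "object", merged)


def link(inputs: List[InputFile],
         codegen: Callable[[List[InputFile]], InputFile]) -> List[str]:
    """Stand-in for the linker: defer IR inputs to `codegen`, then link."""
    objects = [f for f in inputs if f.kind == "object"]
    deferred = [f for f in inputs if f.kind == "ir"]
    if deferred:
        # The "callback into the compiler": code generation at link time.
        objects.append(codegen(deferred))
    # "Linking" here is just concatenating the defined symbols.
    return [s for f in objects for s in f.symbols]


inputs = [
    InputFile("runtime.o", "object", ["rt_init"]),
    InputFile("b.bc", "ir", ["helper"]),
    InputFile("a.bc", "ir", ["main"]),
]
print(link(inputs, toy_codegen))
```

The point of the model is that the linker sees all deferred inputs at once before any native code exists for them, which is what gives the compiler the whole-program view it needs for IPO.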

I would recommend that you take a look at the swift driver (lib/Driver). LLVMGold would also be interesting to look at since that is an example of how one can set up the LTO pipeline for C/C++ code.

IMO, a great starting point to get a better understanding of this would be to set up a build of a small C/C++ project with LTO and see what the flow for that is.

Additionally, I would highly recommend that you get swift building from source so that you can follow how to build the project and change it to be able to run with your improvements.

Thanks for the comprehensive answer.

I did my "homework" and have some further questions:

I would recommend that you take a look at the swift driver ( lib/Driver ). LLVMGold would also be interesting to look at since that is an example of how one can set up the LTO pipeline for C/C++ code.

  • do I understand correctly, that I would need to implement something similar to LLVMGold, but for lld? Basically implementing this interface for lld
  • and create a new lld-LTO action inside Driver.cpp, like a custom LinkJobAction, if a certain flag is passed? (where the FIXME: what about LTO is written?)

a great starting point to get a better understanding of this would be to set up a build of a small C/C++ project with LTO and see what the flow for that is.

AFAIU this is relatively easy: basically installing llvm-gold and running clang -flto on the input files. At least this was sufficient for me to run it and see (by running objdump) unnecessary functions being eliminated. Or were you talking about using lld instead?

I would highly recommend that you get swift building from source so that you can follow how to build the project and change it to be able to run with your improvements.

With your help, I managed to do this as well.

There exists some infrastructure in lld to support LTO (see https://github.com/llvm/llvm-project/blob/master/lld/ELF/Driver.h#L37 and https://github.com/llvm/llvm-project/blob/master/lld/ELF/Driver.h#L45-L46). The idea would be to restructure that and extend it to support other compilers as well.

I don't think that you would need to write a new action given that there already is the idea integrated into lld.

Right, now, extend that concept further - swiftc will need to learn how to invoke the linker with LTO enabled, and it should be possible to invoke it as swiftc -lto ... to compile+link with LTO.

Saleem

  • The lld plugin would deserialize the LLVM IR emitted by the compiler, invoke the Swift LTO pipeline, and perform at least one optimization pass over the unified LLVM module.
  • Integrate the LTO optimizations into the compilation pipeline under a flag

After watching a video about ThinLTO, I have a better picture of how things work in clang, but I still have trouble understanding what exactly ^that should mean, or rather, what exactly has to be implemented.

Yes, linkers operate on object files. However, when performing LTO (or LTCG in Microsoft's parlance), the operation is delayed: instead, the linker will actually perform a callback into the compiler to late-transform some alternative representation (C2 tuples, LLVM IR, or in this case possibly even SIL!) to get the opportunity to do more aggressive IPO.

But according to the interface of BitcodeCompiler in LTO.h, there are only two methods, add bitcode file and compile; no callbacks or anything.

Or is this exactly what you meant by extending it to support other compilers? If so, what should this look like, and how should these callbacks be called?

The lld plugin would deserialize the LLVM IR emitted by the compiler, invoke the Swift LTO pipeline

What is meant by Swift LTO? I was assuming that it works with LLVM IR data which is input language agnostic, it could be C++ or Rust, or anything else.

Integrate the LTO optimizations into the compilation pipeline under a flag

Does this imply adding a custom LinkAction or "passing a flag" to an existing LinkAction?

The exact things that need to be implemented are roughly:

Phase 1

  • swift driver changes for providing flags for LTO
  • swift driver changes to invoke the clang linker driver properly for enabling LTO
  • swift frontend changes to emit LLVM BC instead of object files

Measurements for this phase would be interesting as they would identify the benefits of extra IPO of the IRGen (generic, non-language specific optimizations)

Phase 2

  • changes to LLD to add support for setting up multiple compiler pipelines for LTO
  • changes to swift frontend to support multiple pipelines
  • changes to swift frontend to emit SIB instead of IRGen
  • changes to LLD to setup a swift pipeline if a SIB is encountered, call back into swift to SILGen
  • changes to the pipeline to do IPO with language specific considerations - e.g. late monomorphisation of generics, late devirt of calls

Measurements of this phase would be interesting as they identify the benefits of IPO of SILGen (language specific optimizations).

The latter should actually be interesting as doing LTO on SIL should even allow us to consider dropping VWTs.
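
As a rough illustration of the "multiple pipelines" idea in Phase 2, the toy Python sketch below groups link inputs by format and hands each group to a matching pipeline. Everything here is an assumption made for illustration: the format tags ("llvm-bc", "sib"), the `PIPELINES` registry, and the pipeline functions are hypothetical and do not correspond to existing lld or Swift interfaces.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (file name, format) pairs; the format tags are illustrative only.
Input = Tuple[str, str]
Pipeline = Callable[[List[str]], str]


def llvm_pipeline(files: List[str]) -> str:
    """Language-agnostic pipeline: generic IPO over LLVM bitcode."""
    return "llvm-lto(" + ",".join(files) + ")"


def swift_pipeline(files: List[str]) -> str:
    """Language-specific pipeline: SIL-level IPO, then lowering."""
    return "swift-lto(" + ",".join(files) + ")"


# Hypothetical registry mapping an input format to its LTO pipeline.
PIPELINES: Dict[str, Pipeline] = {
    "llvm-bc": llvm_pipeline,
    "sib": swift_pipeline,
}


def run_lto(inputs: List[Input]) -> List[str]:
    """Group inputs by format and run the matching pipeline once per group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, fmt in inputs:
        if fmt not in PIPELINES:
            raise ValueError(f"no LTO pipeline registered for {fmt!r}")
        groups[fmt].append(name)
    # Dicts preserve insertion order, so pipelines run in first-seen order.
    return [PIPELINES[fmt](names) for fmt, names in groups.items()]


print(run_lto([("a.sib", "sib"), ("c.bc", "llvm-bc"), ("b.sib", "sib")]))
```

Each pipeline receives all inputs of its format together, which is the property the language-specific optimizations (late specialization, late devirtualization) would rely on.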

It's the implementation of those that is interesting. The invocation of the "bitcode compiler" is the callback.

Yes, extending that to support swift would be an extension, and what I was referring to. They would look like setting up a swift pipeline and executing it.

Yes, there are things you can do with language agnostic optimizations, but doing this at the SIL level is also interesting with the ability to internalize functions and VWTs.

Yes, the LinkAction would need to grow some fields in order to support the new behaviors.

Hi, I'm also interested in this topic and planning to apply.

Recently I've been working on supporting the WebAssembly target for Swift.
The size of the executable file is really important, especially in a web context.

Currently, the binary is larger than 10MB even for a "Hello, world!" program, because the whole stdlib is linked statically into the executable even though most of it is never used at runtime.

Even after applying language-agnostic LTO, the binary is still larger than 4MB.
When I investigated why the binary is still so large after LTO has removed unused code, I found that most of the remaining functions that are not actually used at runtime are referenced from protocol conformances.

Protocol conformances are embedded in the swift5_protocol_conformances section and are scanned at runtime by the runtime library, so they are always marked as llvm.used.

They are tightly coupled to the Swift runtime implementation, so language-agnostic LTO can't eliminate them.

So I expect that language-specific LTO can achieve really valuable binary size reduction.
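
The effect can be sketched with a toy reachability pass in Python. This is only a model of the argument above: the reference graph and symbol names are made up, and the "language-aware" case simply assumes that a Swift-specific pass could prove that no conformance record is needed in this tiny program.

```python
from typing import Dict, Iterable, List, Set


def live_symbols(refs: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """Mark phase of dead-code elimination: everything reachable from roots."""
    live: Set[str] = set()
    worklist = list(roots)
    while worklist:
        sym = worklist.pop()
        if sym in live:
            continue
        live.add(sym)
        worklist.extend(refs.get(sym, []))
    return live


# Hypothetical reference graph for a tiny program.
refs = {
    "main": ["print"],
    "conformance(Int: Hashable)": ["Int.hash"],
    "conformance(Array: Codable)": ["Array.encode"],
    "Array.encode": ["JSONEncoder.write"],
}
conformances = [s for s in refs if s.startswith("conformance(")]

# Language-agnostic LTO: conformance records are forced roots (llvm.used),
# so everything they reference stays alive.
agnostic = live_symbols(refs, ["main"] + conformances)

# Language-aware LTO (assumed): only conformances proven necessary are roots;
# in this toy program none are.
aware = live_symbols(refs, ["main"])

print(sorted(agnostic - aware))
```

The printed difference is the set of symbols that survive only because the conformance records are treated as unconditional roots.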

@compnerd
Just to confirm about Phase 2: does "multiple compiler pipelines" mean that lld invokes a language-specific LTCG pipeline for each language-specific intermediate file?
My rough design for this is to provide a stable interface for language-specific LTCG plugins, load a plugin dynamically for each language, and call each language's compiler through that interface.
Is this idea close to yours?

I would be really happy to help evolve the Swift language and make it available on more platforms so that more people can use it.

Thanks.

Correct, and this is related to my comment about phase 2 of this approach, which would allow specializing and monomorphizing the generics. That allows the VWT to be DCEd after specialization, which in turn allows further function DCE. Static linking with full thin LTO (language agnostic + language dependent) should yield a binary that is exactly the size of the alive code.

Yes, that is correct, although for the language dependent portion currently that would be swift specific, so it should not be extending the existing interface much since we do not have much to compare it against.

No, this isn't exactly what I have in mind. There is no way to do a stable interface with a single language. Unless you plan on adding support for at least rust to have a point of comparison, we should be aiming to have this just fit into the existing structure.

Thanks for the detailed description.
I'm wondering whether there is a difference between language-specific LTO and WMO for a dynamic shared library. I know that they run at different times, but I'm curious whether the produced binaries are the same.
Do you think there is any optimization for a dynamic library that WMO can't do but language-specific LTO can?

Understood. For now, supporting other languages is out of scope.

First, I'm going to start developing a prototype of the language-agnostic LTO ("Phase 1"), and I'll post a more detailed plan for this project.
Thanks!

LTO should be able to do things across Swift modules when dealing with a library that is statically linked and is built for LTO which WMO would not be able to do. I've not given it sufficient thought, but, cross-language optimizations should also be possible.
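
A toy Python model of that difference, under heavy simplification: WMO is modeled as having to keep every public symbol as a root, since other modules might use it, while LTO over a statically linked program is modeled as having only the entry point as a root. The module layout and symbol names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# symbol -> (module, is_public, referenced symbols); all names hypothetical.
Program = Dict[str, Tuple[str, bool, List[str]]]

PROGRAM: Program = {
    "App.main":   ("App", True,  ["Lib.parse"]),
    "Lib.parse":  ("Lib", True,  ["Lib.scan"]),
    "Lib.scan":   ("Lib", False, []),
    "Lib.format": ("Lib", True,  ["Lib.pad"]),  # public, never called by App
    "Lib.pad":    ("Lib", False, []),
    "Lib.unused": ("Lib", False, []),
}


def reachable(program: Program, roots: List[str]) -> Set[str]:
    """Collect every symbol reachable from the given roots."""
    live: Set[str] = set()
    worklist = list(roots)
    while worklist:
        sym = worklist.pop()
        if sym not in live:
            live.add(sym)
            worklist.extend(program[sym][2])
    return live


def wmo(program: Program) -> Set[str]:
    """Per-module view, modeled as one traversal rooted at all public symbols."""
    roots = [s for s, (_, public, _) in program.items() if public]
    return reachable(program, roots)


def lto(program: Program, entry: str) -> Set[str]:
    """Whole-program view: only the entry point is a root."""
    return reachable(program, [entry])


print(sorted(wmo(PROGRAM) - lto(PROGRAM, "App.main")))
```

In this model both approaches drop `Lib.unused`, but only the whole-program view can also drop the public `Lib.format` and its helper, because it can see that nothing in the final binary calls them.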
