I wouldn't necessarily describe it that way. Yes, it is an alternative linker implementation that currently supports ELF and COFF, and it has made a different set of trade-offs that make it faster in some cases, but it has other drawbacks as well.
Yes, linkers operate on object files. However, when performing LTO (or LTCG in Microsoft's parlance), that work is delayed: the linker will actually call back into the compiler to late-transform some alternative representation (C2 tuples, LLVM IR, or in this case possibly even SIL!) in order to get the opportunity to do more aggressive IPO.
lld is just a linker - there is no need for that directly, in the sense that the compiler will not invoke the linker itself. However, when the user builds a shared library or an executable, they do so through the driver, which invokes the linker on the user's behalf to link the object files. There is already a -use-ld flag on the driver that allows you to switch which linker is used.
It is not really "activation" but rather the development of a linker plugin (see LLVMGold as an example) that allows the linker to perform code generation at link time (whence the term LTCG - link-time code generation).
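To make the plugin mechanism concrete, here is a minimal sketch of the hook-based interface that LLVMGold implements against the GNU gold/BFD plugin API (binutils' plugin-api.h). It is illustrative only: it claims nothing, registers no symbols, and omits the actual LTO work.

```cpp
// Minimal sketch of a gold/BFD-style linker plugin (the shape of LLVMGold).
#include <plugin-api.h> // ships with binutils

namespace {

// Called for every linker input; a real plugin sniffs for LLVM bitcode and
// claims those files so the linker leaves them to the plugin.
ld_plugin_status claimFile(const ld_plugin_input_file *file, int *claimed) {
  (void)file;
  *claimed = 0; // placeholder: claim nothing
  return LDPS_OK;
}

// Called once symbol resolution is complete; this is where link-time code
// generation happens and native objects are handed back to the linker.
ld_plugin_status allSymbolsRead() {
  // run the LTO pipeline over the claimed modules here
  return LDPS_OK;
}

} // namespace

// Entry point the linker calls when loading the plugin; the transfer vector
// carries the registration callbacks used to install the hooks above.
extern "C" ld_plugin_status onload(ld_plugin_tv *tv) {
  for (; tv->tv_tag != LDPT_NULL; ++tv) {
    switch (tv->tv_tag) {
    case LDPT_REGISTER_CLAIM_FILE_HOOK:
      tv->tv_u.tv_register_claim_file(claimFile);
      break;
    case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK:
      tv->tv_u.tv_register_all_symbols_read(allSymbolsRead);
      break;
    default:
      break;
    }
  }
  return LDPS_OK;
}
```

The split of responsibilities is the interesting part: claim_file lets the plugin take ownership of bitcode inputs, and all_symbols_read is where code generation actually runs once the linker has a whole-program view.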
I would recommend that you take a look at the swift driver (lib/Driver). LLVMGold would also be interesting to look at, since it is an example of how one can set up the LTO pipeline for C/C++ code.
IMO, a great starting point to get a better understanding of this would be to set up a build of a small C/C++ project with LTO and see what the flow for that is.
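For example, something as small as the following is enough to watch the flow; the build commands in the comment are the usual clang spellings and only a sketch, not a prescribed setup.

```cpp
// lto_demo.cpp -- a tiny experiment for seeing the LTO flow end to end.
// Illustrative build/run steps (usual clang flags; -fuse-ld=lld is optional):
//   clang++ -flto -c lto_demo.cpp -o lto_demo.o    # "object" file is LLVM bitcode
//   clang++ -flto -fuse-ld=lld lto_demo.o -o demo  # code generation at link time
//   objdump -d demo                                # unused_helper should be gone
#include <cstdio>

// Externally visible but never called: a plain link keeps it, while LTO's
// whole-program view can internalize it and then dead-code-eliminate it.
int unused_helper(int x) { return x * 42; }

int main() {
  std::puts("hello, LTO");
  return 0;
}
```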
Additionally, I would highly recommend that you get swift building from source, so that you can learn how the project is built and modify it to run with your improvements.
I did my "homework" and have some further questions:
I would recommend that you take a look at the swift driver (lib/Driver). LLVMGold would also be interesting to look at, since it is an example of how one can set up the LTO pipeline for C/C++ code.
Do I understand correctly that I would need to implement something similar to LLVMGold, but for lld? Basically implementing this interface for lld
and creating a new lld-LTO action inside Driver.cpp, like a custom LinkJobAction, if a certain flag is passed? (where the "FIXME: what about LTO" comment is written?)
a great starting point to get a better understanding of this would be to set up a build of a small C/C++ project with LTO and see what the flow for that is.
AFAIU this is relatively easy: basically installing llvm-gold and running clang -flto on the input files. At least this was sufficient for me to run it and see (via objdump) unnecessary functions being eliminated. Or were you talking about using lld instead?
I would highly recommend that you get swift building from source, so that you can learn how the project is built and modify it to run with your improvements.
I don't think that you would need to write a new action, given that the idea is already integrated into lld.
Right. Now extend that concept further: swiftc will need to learn how to invoke the linker with LTO enabled, and it should be possible to invoke it as swiftc -lto ... to compile+link with LTO.
The lld plugin would deserialize the LLVM IR emitted by the compiler, invoke the Swift LTO pipeline, and perform at least one optimization pass over the unified LLVM module.
Integrate the LTO optimizations into the compilation pipeline under a flag
After watching a video about ThinLTO I have a better picture of how things work in clang, but I still have trouble understanding what exactly ^that means, or, better said, what exactly has to be implemented.
Yes, linkers operate on object files. However, when performing LTO (or LTCG in Microsoft's parlance), that work is delayed: the linker will actually call back into the compiler to late-transform some alternative representation (C2 tuples, LLVM IR, or in this case possibly even SIL!) in order to get the opportunity to do more aggressive IPO.
But according to the interface of BitcodeCompiler in LTO.h, there are only two methods - add a bitcode file and compile - no callbacks or anything.
Or is this exactly what you meant by extending it in order to support other compilers? If yes, then what should this look like, and how should these callbacks be called?
The lld plugin would deserialize the LLVM IR emitted by the compiler, invoke the Swift LTO pipeline
What is meant by Swift LTO? I was assuming that it works with LLVM IR, which is input-language agnostic - it could be C++, Rust, or anything else.
Integrate the LTO optimizations into the compilation pipeline under a flag
Does this imply adding a custom LinkAction or "passing a flag" to an existing LinkAction?
The exact things that need to be implemented are roughly:
Phase 1
swift driver changes for providing flags for LTO
swift driver changes to invoke the clang linker driver properly for enabling LTO
swift frontend changes to emit LLVM BC instead of object files
Measurements for this phase would be interesting, as they would identify the benefits of extra IPO on the IRGen output (generic, non-language-specific optimizations)
Phase 2
changes to LLD to add support for setting up multiple compiler pipelines for LTO
changes to swift frontend to support multiple pipelines
changes to swift frontend to emit SIB instead of IRGen
changes to LLD to set up a swift pipeline if a SIB is encountered and call back into swift to run SILGen (see the sketch after this list)
changes to the pipeline to do IPO with language-specific considerations - e.g. late monomorphization of generics, late devirtualization of calls
Measurements of this phase would be interesting, as they would identify the benefits of IPO on the SILGen output (language-specific optimizations).
The latter should actually be interesting as doing LTO on SIL should even allow us to consider dropping VWTs.
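As a purely hypothetical illustration of the phase 2 item about routing SIB inputs to a separate pipeline: everything in this sketch (LinkInput, looksLikeBitcode, runLLVMLTO, runSwiftLTO) is an invented name, none of it is existing lld code, and it only shows the shape of the dispatch.

```cpp
// Hypothetical routing step inside an LTO-aware linker (not real lld code).
#include <cstdint>
#include <vector>

struct LinkInput {
  std::vector<uint8_t> bytes; // raw contents of a claimed link-time input
};

// Stub pipelines standing in for "compile the merged module(s) to objects".
static void runLLVMLTO(const LinkInput &) {}  // phase 1: IR-level IPO
static void runSwiftLTO(const LinkInput &) {} // phase 2: call back into swift

// LLVM IR bitcode starts with the magic 'B' 'C' 0xC0 0xDE; a real
// implementation would likewise check the SIB signature rather than treating
// "everything else" as Swift, as this sketch does.
static bool looksLikeBitcode(const LinkInput &in) {
  return in.bytes.size() >= 4 && in.bytes[0] == 'B' && in.bytes[1] == 'C' &&
         in.bytes[2] == 0xC0 && in.bytes[3] == 0xDE;
}

void routeInputs(const std::vector<LinkInput> &inputs) {
  for (const LinkInput &in : inputs) {
    if (looksLikeBitcode(in))
      runLLVMLTO(in);  // language-agnostic pipeline over LLVM IR
    else
      runSwiftLTO(in); // SIL-level IPO, then IRGen, then codegen
  }
}
```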
It's the implementation of those that is interesting. The invocation of the "bitcode compiler" is the callback.
Yes, extending that to support swift is the extension I was referring to. The callbacks would look like setting up a swift pipeline and executing it.
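To make "setting up a pipeline and executing it" slightly more concrete, here is a rough sketch using LLVM's bitcode reader and the new pass manager. It is a sketch under assumptions: it handles a single module, skips merging and the final code generation, and the exact PassBuilder signatures and headers vary a bit across LLVM versions.

```cpp
#include "llvm/Bitcode/BitcodeReader.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Support/MemoryBuffer.h"

using namespace llvm;

// Deserialize one bitcode buffer and run an LTO-style module pipeline over
// it -- roughly the "set up a pipeline and execute it" step, minus the
// merging of multiple inputs and the native code generation.
Expected<std::unique_ptr<Module>> optimizeBitcode(MemoryBufferRef Buf,
                                                  LLVMContext &Ctx) {
  Expected<std::unique_ptr<Module>> M = parseBitcodeFile(Buf, Ctx);
  if (!M)
    return M.takeError();

  // Standard new-pass-manager boilerplate.
  LoopAnalysisManager LAM;
  FunctionAnalysisManager FAM;
  CGSCCAnalysisManager CGAM;
  ModuleAnalysisManager MAM;
  PassBuilder PB;
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  // The generic full-LTO pipeline at -O2.
  ModulePassManager MPM =
      PB.buildLTODefaultPipeline(OptimizationLevel::O2, /*ExportSummary=*/nullptr);
  MPM.run(**M, MAM);
  return M;
}
```

A Swift pipeline would be set up and executed analogously, except that it would run SIL passes over the deserialized SIL first and only then hand the result to IRGen and a pipeline like the one above.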
Yes, there are things you can do with language-agnostic optimizations, but doing this at the SIL level is also interesting, given the ability to internalize functions and VWTs.
Yes, the LinkAction would need to grow some fields in order to support the new behaviors.
Hi, I'm also interested in this topic and planning to apply.
Recently I have been working on supporting the WebAssembly target for Swift.
The size of the executable file is really important, especially in a web context.
Currently, the binary is larger than 10 MB even for a "Hello, world!" program, because the whole stdlib is statically linked into the executable even though most of it is not used at runtime.
Even after applying language-agnostic LTO, the binary is still larger than 4 MB.
When I investigated why the produced binary is so large even after LTO removes unused code, I found that most of the remaining functions that are not actually used at runtime are referenced from protocol conformances.
Protocol conformance records are embedded in the swift5_protocol_conformances section and are scanned at runtime by the runtime library, so they are always marked as llvm.used.
They are tightly tied to the Swift runtime implementation, so language-agnostic LTO can't eliminate them.
So I expect that language-specific LTO could achieve a really valuable binary size reduction.
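A small C/C++ analogue of the problem, as a sketch: the attribute below lowers to the llvm.used list just like the conformance records do, though the section name here is made up (and uses the ELF spelling).

```cpp
// Why "used" symbols defeat generic dead-code elimination: llvm.used pins the
// global even though nothing in the program references it. The records in
// swift5_protocol_conformances behave the same way because the Swift runtime
// discovers them by scanning the section, not through direct references, so
// only a pass that understands the language/runtime can prove them removable.
__attribute__((used, section("metadata_like"))) // hypothetical section name
static const int conformance_like_record = 42;

int main() { return 0; } // the record above is retained even under LTO
```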
@compnerd
Just to confirm about Phase 2: does "multiple compiler pipelines" mean that lld invokes a language-specific LTCG pipeline for each kind of language-specific intermediate file?
My rough design for this is to provide a stable interface for language-specific LTCG plugins, load a plugin dynamically for each language, and call each language's compiler through that interface.
Is this idea close to yours?
I would be really happy to help evolve the Swift language and make it available on many platforms so that it can be used by more people.
Correct, and this is related to my comment about phase 2 of this approach, which would allow specializing and monomorphizing the generics, letting the VWTs be DCE'd after specialization, which in turn allows further function DCE. Static linking with full ThinLTO (language-agnostic + language-dependent) should yield a binary that is exactly the size of the live code.
Yes, that is correct, although the language-dependent portion would currently be swift-specific, so it should not extend the existing interface much, since we do not have much to compare it against.
No, this isn't exactly what I have in mind. There is no way to do a stable interface with a single language. Unless you plan on adding support for at least rust to have a point of comparison, we should be aiming to have this just fit into the existing structure.
Thanks for the detailed description.
I'm wondering whether or not there is a difference between language-specific LTO and WMO for a dynamic shared library. I know that they run at different times, but I'm curious whether the produced binaries are the same or not.
Do you think there is any optimization for a dynamic library that WMO can't do but language-specific LTO can?
Understood. At this time, handling other languages is out of scope.
First, I'm going to start developing a prototype of the language-agnostic LTO "Phase 1", and I'll post a more detailed plan for this project.
Thanks!
LTO should be able to do things across Swift modules, when dealing with a statically linked library built for LTO, that WMO would not be able to do. I've not given it sufficient thought, but cross-language optimizations should also be possible.