Reading the LLD and Swift codebases to get a deeper understanding of each architecture.
Started implementing the libswiftLTO pipeline (WIP).
I'm spending a lot of time investigating which information in ASTContext is required after SILGen.
Since ASTContext is used throughout the compilation process, it accumulates a lot of state. But the LTO plugin runs in the linker process, so it can't inherit ASTContext state from the compiler process and has to create and set up an ASTContext again.
For this reason, I'm investigating which information in ASTContext should be serialized into SIB.
Thanks for the update and for sharing the detailed plan @kateinoigakukun. I can't wait to have proper LTO working for all platforms that Swift supports, producing smaller Swift binaries! (Especially as this is critical for the WebAssembly target.)
After #31146 is merged, as far as I understand, one could start using the -lto=llvm flag with the master nightlies. Or would it need lld built in some special way? Also, does it already yield some reduction in the size of produced binaries, or do we need to wait for language-specific LTO to kick in before we see any noticeable reduction?
@Max_Desiatov wasm-ld already supports bitcode LTO, so we can use it without any special work once the PR is merged. I haven't tried it for wasm yet, but I think LLVM-level LTO can reduce the size of produced binaries more than wasm-opt can.
Last week, I opened a PR to bootstrap the LTO pipeline on the compiler side. It contains just the basis for transforming SIL into LLVM IR and does not include any optimizations at this point.
(And I'm sorry that I couldn't do much for GSoC because of too many assignments from my university.)
This week, I'll address the review comments on the PR and fix some remaining issues.
One of the issues is that my current implementation depends on the order of the input modules.
In the usual compilation process, dependent modules are loaded on demand after the main module that uses them.
The current implementation, on the other hand, loads each input module immediately, so it depends on the order of the input modules and fails if the dependent libraries are not loaded before the user module.
So I need to implement a lazy loading mechanism for serialized in-memory modules.
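The idea can be sketched roughly as follows. This is a minimal Python sketch of the lazy-loading concept, not the actual compiler code; all names here are hypothetical.

```python
# Illustrative sketch: a registry that defers deserializing each input
# module until something asks for it by name, so the order in which
# inputs are registered no longer matters.

class LazyModuleRegistry:
    def __init__(self):
        self.pending = {}   # module name -> serialized bytes
        self.loaded = {}    # module name -> deserialized module

    def register(self, name, serialized_bytes):
        # Record the buffer without deserializing it yet.
        self.pending[name] = serialized_bytes

    def load(self, name):
        # Deserialize on first use; dependencies would be resolved
        # recursively through the same registry, by name.
        if name not in self.loaded:
            data = self.pending.pop(name)
            self.loaded[name] = self.deserialize(data)
        return self.loaded[name]

    def deserialize(self, data):
        # Placeholder for real SIB deserialization.
        return {"contents": data}


registry = LazyModuleRegistry()
# Inputs can be registered in any order...
registry.register("UserModule", b"user")
registry.register("DependencyLib", b"dep")
# ...because nothing is deserialized until load() is called.
module = registry.load("DependencyLib")
```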
Hello, everyone.
Last week I got some feedback on the prototype implementation, and #32429 was merged into the master branch.
In addition, the driver part of the LLVM LTO PR is now under review.
This week, I'll work on getting it merged and run a binary size benchmark of the prototype implementation to confirm that it performs better than the existing optimizations.
Last week, I spent a lot of time trying to benchmark LTO on the stdlib, but I ran into some difficulties.
The main problem is that compiling SIB to an object file is not well supported: compiling the stdlib's SIB to an object file fails even without LTO.
In addition, there were many false-positive eliminations causing assertion errors, so I fixed them.
I can't spend as much time on this project as usual because I have university final exams for about two weeks starting this week. However, summer vacation starts two weeks from now, which will let me spend more time on the project.
Last weekend, I succeeded in optimizing some popular Swift libraries with the prototype optimizer on this branch, for benchmarking.
Here is a summary of the results so far.
SwiftyJSON

| Variant | non-LTO | Swift LTO | LLVM LTO | Swift & LLVM LTO |
| --- | --- | --- | --- | --- |
| Onone | 306.4 KB | 250.5 KB | 234.0 KB | 202.2 KB |
| O | 310.6 KB | 253.6 KB | 299.2 KB | 233.1 KB |
| Osize | 278.3 KB | 221.2 KB | 251.8 KB | 203.0 KB |
SwiftSyntax

| Variant | non-LTO | Swift LTO | LLVM LTO | Swift & LLVM LTO |
| --- | --- | --- | --- | --- |
| Onone | 16.1 MB | 10.4 MB | 8.2 MB | 5.6 MB |
| O | 6.9 MB | 5.9 MB | 6.9 MB | 5.0 MB |
| Osize | 5.6 MB | 5.1 MB | 5.3 MB | 3.9 MB |
RxSwift

| Variant | non-LTO | Swift LTO | LLVM LTO | Swift & LLVM LTO |
| --- | --- | --- | --- | --- |
| Onone | 2.8 MB | 2.0 MB | 1.8 MB | 1.4 MB |
| O | 1.6 MB | 1.4 MB | 1.6 MB | 1.3 MB |
| Osize | 1.5 MB | 1.3 MB | 1.5 MB | 1.2 MB |
Right now, the optimizer is conservative about witness table elimination, so it doesn't show a significant reduction. Once that elimination is made more aggressive, the results should improve.
Last week, I worked on more aggressive dead table elimination based on type reference information.
The optimization eliminates vtables and witness tables whose conforming types are not referenced by any instruction.
This yields further binary size reductions:
stdlib: -5%
SwiftyJSON: -13%
RxSwift: -8%
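At its core, the elimination is a filter over the module's tables keyed by type references. Here is a minimal Python sketch of that idea, not the actual SIL optimizer; the type and table names are made up for illustration.

```python
# Illustrative sketch: keep only the vtables/witness tables whose
# conforming type is referenced by at least one instruction in the
# module; everything else is dead and can be dropped.

def eliminate_dead_tables(tables, referenced_types):
    """tables: dict mapping a conforming type name to its table;
    referenced_types: set of type names some instruction uses."""
    return {ty: table for ty, table in tables.items()
            if ty in referenced_types}

tables = {
    "JSONValue": "witness_table JSONValue: Codable",
    "LegacyParser": "vtable LegacyParser",
}
# Suppose only JSONValue is referenced by an instruction somewhere.
live = eliminate_dead_tables(tables, referenced_types={"JSONValue"})
# LegacyParser's vtable is eliminated; JSONValue's table survives.
```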
In addition, I implemented elimination of KeyPath accessors, but it was not very effective for binary size reduction.
To find heavy live functions, I implemented a call graph visualizer and a dominator-tree-based analyzer. (But I found that the call graph is too big to view at once.)
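The question such an analyzer answers is: how much code does a single function hold live? A minimal Python sketch of that idea, under the assumption of a toy call graph (this is not the actual tool; a real dominator-tree analysis computes the same answer far more efficiently):

```python
# Illustrative sketch: estimate how much code a function holds live by
# comparing reachability from the entry point with and without it.

def reachable(graph, entry, skip=None):
    # Simple DFS over a dict-of-lists call graph, optionally
    # pretending one function does not exist.
    seen, stack = set(), [entry]
    while stack:
        f = stack.pop()
        if f in seen or f == skip:
            continue
        seen.add(f)
        stack.extend(graph.get(f, []))
    return seen

def held_live_by(graph, entry, f):
    # Functions reachable only through f: if f were removed, the
    # optimizer could drop all of these as well.
    return reachable(graph, entry) - reachable(graph, entry, skip=f)

# Hypothetical toy call graph.
call_graph = {
    "main": ["parse", "log"],
    "parse": ["lex", "emit"],
    "lex": [],
    "emit": ["log"],
    "log": [],
}
heavy = held_live_by(call_graph, "main", "parse")
```

Here `parse` holds `lex` and `emit` live (but not `log`, which `main` also reaches directly), which is the kind of attribution that makes a heavy function stand out without staring at the whole graph.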
Last week, I spent a lot of time adding an LTO build variant to apple/swift's benchmark system. The benchmark system reported the results below, comparing -Osize and -Osize with LTO.
Code size: -Osize vs. -Osize with LTO
| Regression | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| RandomShuffle.o | 10692 | 11036 | +3.2% | 0.97x |
| SortArrayInClass.o | 8910 | 9151 | +2.7% | 0.97x |
| StringMatch.o | 7480 | 7659 | +2.4% | 0.98x |
| StringReplaceSubrange.o | 7010 | 7173 | +2.3% | 0.98x |
| Diffing.o | 10331 | 10566 | +2.3% | 0.98x |
| Array2D.o | 13683 | 13967 | +2.1% | 0.98x |
| Substring.o | 31123 | 31655 | +1.7% | 0.98x |
| DropLast.o | 44229 | 44751 | +1.2% | 0.99x |
| StringWalk.o | 51054 | 51627 | +1.1% | 0.99x |

| Improvement | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| PrimsNonStrongRef.o | 194994 | 159624 | -18.1% | 1.22x |
| NIOChannelPipeline.o | 4219 | 3647 | -13.6% | 1.16x |
| PolymorphicCalls.o | 7677 | 6959 | -9.4% | 1.10x |
| BucketSort.o | 31344 | 28799 | -8.1% | 1.09x |
| COWTree.o | 18837 | 17508 | -7.1% | 1.08x |
| Queue.o | 28705 | 27205 | -5.2% | 1.06x |
| Exclusivity.o | 5483 | 5284 | -3.6% | 1.04x |
| Phonebook.o | 37757 | 36615 | -3.0% | 1.03x |
| WordCount.o | 82401 | 80738 | -2.0% | 1.02x |
| CSVParsing.o | 80620 | 79231 | -1.7% | 1.02x |
| SortIntPyramids.o | 39449 | 38975 | -1.2% | 1.01x |
| DictOfArraysToArrayOfDicts.o | 38540 | 38097 | -1.1% | 1.01x |
| FloatingPointParsing.o | 68406 | 67629 | -1.1% | 1.01x |
Some object files show code size regressions, but I couldn't find out why the LTO path increases instruction size.
I also prototyped SwiftPM LTO support. It lets us try the LTO easily by adding a --lto=swift option.
In addition, I wrote up the final evaluation report for this GSoC project.
Even though the Swift-LTO part is not complete, you can enable LLVM-level ThinLTO/full LTO. That part of the work is complete.
That being said, I am not sure how production-ready it is. I recently added support for building the Swift stdlib with ThinLTO/full LTO and found bugs in the stdlib. But the stdlib is a bit of a special case, so your mileage may vary.