for as long as i can remember, exposing generics in public APIs was something to be avoided if possible, and it was necessary to devote an enormous amount of thought and planning towards architecting libraries to not rely on copious amounts of @inlinable.
is cross-module optimization even a net win? back when it was gated by a feature flag, the near-uniform recommendation was not to use it, because it significantly worsened performance.
what was the rationale for making this the new default, when its performance impact was understood to be unclear at best to negative at worst? has anything changed since it was gated by the feature flag?
has anyone studied the impact on compilation times? in the event that it is significant, are there ways to limit cross-module inlining to a select group of modules within a package? can cross-module inlining take place across package boundaries?
assuming neither of the above two issues is relevant anymore, are there any downsides to vending generics as part of public API, in modules intended to be built from source? is there ever a reason to manually wrap/specialize to "hide" generics in the post-5.7 world?
Just to make one short comment: It's unclear to me whether it's "everything" that is cross-module-optimized by default in 5.8. Looking at some of the other commits, it rather seemed that a less aggressive mode was enabled by default, and the test where I previously saw the regression in performance was with the more aggressive mode (which AFAIU can still be enabled with the feature flag) - can't say I know the exact difference.
I'd be super happy to hear more from someone in a position to elucidate though.
In general, CMO is a significant performance win. But (as with most optimizations) there can be corner cases where you see a degradation.
The critical problem with CMO is code size. Therefore the CMO which is enabled by default is much more conservative than the "aggressive" CMO, which must be explicitly enabled with -cross-module-optimization.
impact on compilation times
We didn't see any significant impact on compilation times. Especially with the default CMO which has only a relatively small impact on size/complexity in the optimization pipeline.
ways to limit cross-module inlining to a select group of modules within a package
The compiler option -disable-cmo disables the default CMO.
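To make the per-module control concrete, here is a minimal sketch of how these flags could be passed per target in a SwiftPM package via `unsafeFlags`. This is an assumption about usage, not from the thread: the target names are hypothetical, and depending on the toolchain version `-disable-cmo` may need to be forwarded with `-Xfrontend`.

```swift
// swift-tools-version:5.8
// Package.swift - hypothetical sketch of limiting CMO to select targets.
import PackageDescription

let package = Package(
    name: "MyLibrary",
    targets: [
        // Opt this (hypothetical) size-sensitive target out of default CMO.
        .target(
            name: "SizeSensitiveModule",
            swiftSettings: [.unsafeFlags(["-Xfrontend", "-disable-cmo"])]
        ),
        // Opt this (hypothetical) hot-path target into the aggressive mode.
        .target(
            name: "HotPathModule",
            swiftSettings: [.unsafeFlags(["-cross-module-optimization"])]
        ),
    ]
)
```

Note that `unsafeFlags` makes a package ineligible for use as a dependency by name, so this pattern fits application packages better than published libraries.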
can cross-module inlining take place across package boundaries
yes
is there ever a reason to manually wrap/specialize to "hide" generics in the post-5.7 world?
It really depends. CMO makes it less likely that generic APIs will have a negative performance impact. But it still can happen (CMO is an optimization based on heuristics).
i'm not sure i understand the tradeoffs here correctly; inlinability shouldn't impact code size, only inlining should. based on my (very limited!) understanding of the optimizer, i would expect there to be a lot of optimization passes (e.g. ARC optimizations) that the compiler should be able to apply by analyzing inlinable code without actually inlining it.
can you give a brief overview of what those heuristics are? is there a good workflow for inspecting if CMO has taken place?
inlinability shouldn't impact code size, only inlining should
not exactly. First, more functions available for inlining will also result in more inlining (the inliner is selecting functions based on a heuristic, too). Second, more function specialization is done. This can have a negative or positive effect on code size.
can you give a brief overview of what those heuristics are?
It's mainly based on the function size.
is there a good workflow for inspecting if CMO has taken place?
It's possible to look at the generated swiftmodule file with swiftc -sil-opt and look what functions have a SIL function body. But that's more a tool for compiler engineers.
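For a client-side sanity check that doesn't require compiler-engineer tooling, one option (a sketch, not from the thread: the flags are standard swiftc options, but the search path and the grep pattern are assumptions) is to dump the optimized SIL of the calling code and look for specialized copies of the library's generic functions:

```shell
# Dump optimized SIL for a client file; -I points at wherever the library's
# .swiftmodule lives (this path is hypothetical). If CMO made the generic
# bodies available, specialized copies show up in the SIL with "specialized"
# in their demangled names/comments.
swiftc -O -emit-sil Client.swift -I .build/release/Modules | grep "specialized"
```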
let's say, as a thought experiment, i took a codebase with fifty modules, and then refactored it so that all of the code lived in one oversized module and every declaration had internal or lower access control. wouldn't that also result in more inlining?
it seems to me that there would be two possibilities:
the optimizer currently performs too much inlining, and this is unrelated to CMO, because all CMO is doing is just making external code susceptible to the same overinlining problem that internal code already suffers from.
the optimizer currently strikes the right balance for intra-module inlining, but for some reason is more aggressive when inlining things that originate from outside the module than it otherwise would be.
Inlining decisions are probably the most complicated thing in the optimizer.
The problem is that inlining can have a negative or positive effect on code size.
The reason to limit making functions inlinable with CMO is mainly to keep additional (code size) churn to a minimum compared to not using CMO at all.
Larger binary size isn't necessarily bad - what matters is the working set size of instructions for any performance-sensitive code. I've seen real-world binaries that were approaching a gigabyte in TEXT size (C++ templates, yay) yet were super fast because any given core tended to nest in relatively tiny working subsets of the code.
PGO (Profile-Guided Optimisation) is really helpful in this regard for helping the compiler know which parts of the code benefit from being small [enough to fit into L1 icache] (among other things, like how symbols should be arranged to minimise icache fragmentation and prefetch misses).
In my experience, most code (by machine instruction count) isn't sensitive to size and actually does benefit from aggressive inlining (for reasons less clear to me - perhaps many compounding consequences such as better elimination of redundant or unreachable code).
I mention this because WMO comes up relatively often but PGO rarely gets mentioned, and I suspect they really should go hand-in-hand (for non-trivial codebases). It looks like PGO is supported in Swift projects (in Xcode: Product > Perform Action > Generate Optimization Profile…) though I haven't tried it. It used to work quite well for Clang-based projects, at least.
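Outside Xcode, the clang-style PGO flow can also be driven from the command line. A sketch, under the assumption that swiftc's `-profile-generate`/`-profile-use` flags behave like their clang counterparts (the file names and the workload invocation are hypothetical):

```shell
# 1. Build an instrumented binary.
swiftc -O -profile-generate -o app_instrumented main.swift

# 2. Run it on a representative workload; this writes a .profraw file.
./app_instrumented

# 3. Merge raw profiles into a single .profdata file.
xcrun llvm-profdata merge -output=default.profdata default.profraw

# 4. Rebuild with the profile, letting the optimizer weight hot paths.
swiftc -O -profile-use=default.profdata -o app main.swift
```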
Code size still matters on mobile devices, both for the actual space on disk and for the bandwidth it takes to download. For desktop platforms it's not as bad, but still somewhat a concern. I agree that for servers it basically doesn't matter these days.
assuming neither of the above two issues is relevant anymore, are there any downsides to vending generics as part of public API, in modules intended to be built from source? is there ever a reason to manually wrap/specialize to "hide" generics in the post-5.7 world?
I've been experimenting with a library with heavy generics recently. The class signature looks like this:
public final class SimulationKD<NodeID, V>
where NodeID: Hashable, V: SIMD, V.Scalar: SimulatableFloatingPoint {
}
and in one of my test cases the generic version with V == simd_double2 takes ~0.59s. Turning on cross-module-optimization brings it down to ~0.17s.
By manually inlining with V = simd_double2, it takes ~0.05s with cmo disabled, and ~0.04s with cmo enabled.
So I guess at this time (Swift 5.9), generics are still something to avoid in public API.
I'm new to Swift so I don't know the compilation details very well. I started this library non-generic and then refactored it to be generic, with a huge performance downgrade.
I did some experiments, and from my observation @inlinable doesn't work very well, but I'm not sure if I'm using it correctly. And by globally replacing V.Scalar: SimulatableFloatingPoint with V.Scalar == Double (still inside the where clause), I get about 20% of the speed back. Then I tried manually inlining and it got the speed back.
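For readers wondering what "manually inlining" a type parameter looks like in practice: a common pattern is a thin concrete wrapper that fixes the generic parameters, so everything behind it compiles as fully concrete code. A minimal sketch - SimulationKD's actual API isn't shown in the thread, so the initializer, the step() method, and the choice of NodeID == Int are all hypothetical:

```swift
import simd

// Hypothetical concrete facade over the generic SimulationKD<NodeID, V>.
// Both generic parameters are pinned to concrete types, so calls from here
// into the implementation carry no generic abstraction overhead.
public final class Simulation2D {
    // Assumed: Int and simd_double2 satisfy the class's constraints.
    private let base: SimulationKD<Int, simd_double2>

    public init(nodeCount: Int) {
        // Hypothetical initializer; substitute the library's real one.
        self.base = SimulationKD(nodeCount: nodeCount)
    }

    // Forwarding a hypothetical method; the body is concrete at compile time.
    public func step() {
        base.step()
    }
}
```

The cost is one such wrapper per concrete instantiation you want to vend, which is essentially the pre-5.7 "hide the generics" approach the earlier question asked about.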
@inlinable is hard; i used to ship very large modules because i did not understand how @inlinable works. for such a fundamental building block of the language, resources for learning how to use it are dreadfully sparse.
one reason @inlinable might not be working for you is that you haven't @inlinabled the entire call stack; if you only @inlinable the outer generic call, you will still have generic abstraction overhead in all the places where you call generic functions inside the outer function.
Right, to add some colour: You need to make every generic public function as well as any function that the public API calls @inlinable. This includes anything that's called transitively. You can (but don't have to) stop adding inlinables once you hit a function that isn't generic.
There might be places where you only want specialisation (but not actual inlining). In those cases use @inlinable @inline(never) func iWantYouToBeSpecialisedButNotInlined<Foo: Bar>(_ foo: Foo). And yes, that's @inlinable @inline(never), which essentially means "specialisable".
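Putting the two previous points together, here is a minimal sketch of what "annotate the whole call stack" means; the protocol and function names are made up for illustration:

```swift
// A hypothetical protocol used as the generic constraint.
public protocol Bar {
    func value() -> Int
}

// Internal generic helpers reachable from the public API must be @inlinable
// too (on an internal declaration, @inlinable implies @usableFromInline).
@inlinable
internal func helper<Foo: Bar>(_ foo: Foo) -> Int {
    foo.value() &+ 1
}

// Public entry point: @inlinable, and every generic function it calls
// transitively is @inlinable as well, so clients can fully specialize it.
@inlinable
public func outer<Foo: Bar>(_ foo: Foo) -> Int {
    helper(foo)
}

// "Specialisable but not inlined": clients may emit a specialized copy of
// the body, but the call itself is never inlined at the call site.
@inlinable
@inline(never)
public func iWantYouToBeSpecialisedButNotInlined<Foo: Bar>(_ foo: Foo) -> Int {
    helper(foo)
}
```

The chain can stop at the first non-generic function it reaches, since a non-generic body has no abstraction overhead left to specialize away.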
An interesting find in swift/include/swift/Option/Options.td:
def EnableCMOEverything : Flag<["-"], "enable-cmo-everything">,
  Flags<[HelpHidden, FrontendOption]>,
  HelpText<"Perform cross-module optimization on everything (all APIs). "
           "This is the same level of serialization as Embedded Swift.">;

def CrossModuleOptimization : Flag<["-"], "cross-module-optimization">,
  Flags<[HelpHidden, FrontendOption]>,
  HelpText<"Perform cross-module optimization">;
Is there special risk, beyond code size increase, when using -enable-cmo-everything with regular Swift projects? If not, it seems like an incredible option for many projects - so significant that it should be included in a new -O2 optimization level for better visibility.