After having just finished moving some code into a Swift package, we are now seeing benchmarks taking almost 20 times longer to complete as measured in Instruments using signposts. What used to take around 3 seconds is now taking just over 1 minute to complete.
The code in question manipulates a bunch of structs that are stored in Arrays and then accumulates the results and returns them. Most of the structs include at least one generic property.
I assume we're running into the performance limitations of cross module optimization, though I wasn't aware it could be this significant. I see some older PRs and discussions around this topic but was curious what the current state of affairs was.
How are others managing performance sensitive code in packages, especially code that involves generics?
From what I can tell, the Swift Algorithms and Swift Collections packages make very liberal use of @inlinable almost everywhere.
Our package is not available to the public and is used only in applications whose source code we fully control. Should we really be trying to inline as many of the package's hot-path methods as possible?
Snippet of a trace that used to take 3.31 seconds but now takes 1.09 min:
@inlinable only allows the compiler to inline the function; it does not guarantee it. However, doing so also provides the visibility through the module boundary that the compiler needs in order to do specialization of generics and similar optimizations.
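As a rough illustration of what that looks like in a package, here is a minimal sketch. The `Accumulator` type and its members are hypothetical names made up for this example, not anything from the original code. The point is that the public generic entry points are marked @inlinable, and any internal storage they touch is marked @usableFromInline so that the inlinable bodies are allowed to reference it.

```swift
// Hypothetical package type; all names here are illustrative only.
public struct Accumulator<Value: AdditiveArithmetic> {
    // Internal storage must be @usableFromInline so that the
    // @inlinable bodies below are allowed to refer to it.
    @usableFromInline
    internal var storage: [Value]

    // The body is exposed to client modules, so the compiler may
    // specialize it for the concrete `Value` used at the call site.
    @inlinable
    public init(_ values: [Value]) {
        self.storage = values
    }

    // Also exposed to clients. The compiler may still choose not to
    // inline it, but it can now see the body and specialize it.
    @inlinable
    public func total() -> Value {
        var sum = Value.zero
        for value in storage {
            sum += value
        }
        return sum
    }
}

// Usage, as a client module might call it.
let example = Accumulator([1.5, 2.5, 3.0]).total()
print(example)
```

Without the attributes, a client in another module would typically have to call the unspecialized generic entry point, which is the kind of frame that shows up in the trace without the "specialized" label.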
When something turns out to be unexpectedly slow, I usually start by perusing the stack looking for generic functions not marked “specialized” (the way Array.subscript.modify is in your screenshot), and make them @inlinable. I am assuming that as a package, your module is only vended as source, in which case the implications for ABI stability are irrelevant.
There is also -cross-module-optimization, which basically erases the module boundary altogether. If you control all the clients, it might be easier to just change the build invocations. But it is usually not feasible to train external clients to do that. There was talk of automating this flag within SwiftPM, but I don’t think it has been done yet. See this older thread for more details.
I had seen that mentioned in other threads but wasn't sure whether it was "production ready" or not. One or two posts mentioned some potential app stability issues when that flag was used, though those posts might be out of date now.
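For a package you control, one way to pass the flag without touching every build invocation is through the manifest. This is only a sketch under several assumptions: the package and target name `LayoutEngine` is hypothetical, the toolchain in use has to accept `-cross-module-optimization` (it may be hidden or experimental), and my understanding is that the flag belongs on the library module whose function bodies should become visible to clients. As far as I know, SwiftPM also restricts packages that use `unsafeFlags` from being consumed as ordinary versioned dependencies, so this mainly suits local or in-house packages.

```swift
// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "LayoutEngine", // hypothetical package name
    products: [
        .library(name: "LayoutEngine", targets: ["LayoutEngine"]),
    ],
    targets: [
        .target(
            name: "LayoutEngine",
            swiftSettings: [
                // Assumption: the compiler accepts this flag.
                // Limited to release builds to keep debug builds fast.
                .unsafeFlags(
                    ["-cross-module-optimization"],
                    .when(configuration: .release)
                ),
            ]
        ),
    ]
)
```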
I don’t know off the top of my head exactly how far it has come, and I don’t have the latest release of Swift on hand at the moment. I suggest doing swiftc --help and looking for it in the list. If you see it there, then the flag is officially supported; if you don’t, then it is still experimental.
The option is not present with the following Swift compiler:
Apple Swift version 5.5 (swiftlang-1300.0.31.1 clang-1300.0.29.1)
Target: arm64-apple-darwin21.1.0
Earlier discussions on this issue appear to have died off. I'm surprised to not see more current discussions of this performance limitation given how prevalent Swift Packages have become in the last year or two. (Especially when you consider recent trends to hyper-modularize applications into numerous micro packages.)
Our use case was to share a new layout engine across our applications using SPM but this performance issue is too big to ignore. We'll investigate inlining a bit more but I guess the fallback is to just fold the code directly into the main build for releases.
Cross-module optimization pretty much eliminates every form of runtime overhead packages have, right? Assuming you can stomach immense compilation times, it’s probably worth using for release binaries you don’t need to build too often. I would appreciate a straight answer on whether it is officially supported, though.
My understanding is that CMO is equivalent to WMO, except it treats everything as @inlinable. You should still mark things as @inlinable where appropriate, to benefit non-CMO builds. I use it for anything that has no conceivable alternate behavior, where any future implementation would inevitably have the same effects.
Micro packages should use @inlinable extremely liberally, since they usually have plenty of methods with no room for interpretation. That’s why packages like Swift Algorithms use it everywhere: there are plenty of ways to do a stable partition, but the end result is always the same.
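To make the stable partition point concrete, here is a deliberately naive sketch. The method name `simpleStablePartition(by:)` is hypothetical and this is not how Swift Algorithms implements it. Whatever the implementation, the observable result is fixed by the contract (relative order is preserved on both sides), so exposing the body with @inlinable does not lock in any behavior that a future version would want to change.

```swift
extension Array {
    /// Hypothetical helper, not the Swift Algorithms implementation.
    /// Moves every element matching the predicate to the end of the
    /// array, keeping relative order within both groups, and returns
    /// the index of the first element of the second group.
    @inlinable
    public mutating func simpleStablePartition(
        by belongsInSecondPartition: (Element) throws -> Bool
    ) rethrows -> Int {
        var first: [Element] = []
        var second: [Element] = []
        for element in self {
            if try belongsInSecondPartition(element) {
                second.append(element)
            } else {
                first.append(element)
            }
        }
        self = first + second
        return first.count
    }
}

// Usage: even numbers move to the back, order is otherwise kept.
var numbers = [1, 2, 3, 4, 5, 6]
let pivot = numbers.simpleStablePartition { $0 % 2 == 0 }
print(numbers, pivot)
```

A smarter in-place algorithm could replace this body later and callers would see exactly the same array and pivot.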
Is there a way to enable cross-module optimization from Xcode? I've tried searching but I couldn't find anything.
Edit: it seems this can be done by adding "-cross-module-optimization" to "Other Swift Flags" in the build settings. Can someone confirm this is the best way to do this?