i believe i may have found a cousin of this issue: in the swift 6.2 compiler, code of the following form can take an extremely long time to compile when optimizations are enabled:
```swift
final class A {}
final class B {
    let prop1: A
    let prop2: A
    ...
    let propN: A

    init() {
        self.prop1 = switch () { default: A() }
        self.prop2 = switch () { default: A() }
        ...
        self.propN = switch () { default: A() }
    }
}
```
when compiling with an invocation like:
```
swiftc -swift-version 5 -O -emit-sil <file>
```
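for reference, the N-property repro sources can be produced with a short throwaway script along these lines (this generator is illustrative, with a placeholder property count — not the exact script used for the measurements):

```swift
// generates the N-property reproducer; prints the Swift source to
// stdout, so it can be redirected into a file and compiled with the
// invocation above
let n = 512  // property count; vary to reproduce the scaling table
var src = "final class A {}\nfinal class B {\n"
for i in 1...n {
    src += "    let prop\(i): A\n"
}
src += "    init() {\n"
for i in 1...n {
    src += "        self.prop\(i) = switch () { default: A() }\n"
}
src += "    }\n}"
print(src)
```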
this code exhibits the following scaling in the 6.0 and 6.2 compilers:
| property count | 6.0 compile time (s) | 6.2 compile time (s) |
|---:|---:|---:|
| 256 | <1 | 1.4 |
| 512 | <1 | 18 |
| 1024 | 1 | 226 |
| 2048 | 4.6 | 2629 |
sampling the 6.2 compiler process while building yields something that looks like this (truncated):
so far, the only workaround i've found is to compile the code with the flag -Xfrontend -enable-copy-propagation=false. this brings the compilation time back to a reasonable duration (under 1 second for the 1024-property case).
i'm curious to better understand the possible cause of this behavior, and what can be done about it. my current intuition is that this loop is for some reason running a lot, and it may have something to do with enabling OSSA modules by default.
i'd also like to know:

- are there downsides to explicitly disabling the copy propagation pass?
- are there ways of structuring source code, or other flags, that can be used to avoid this performance issue?
- what explains why the 6.2 compiler behaves differently than the 6.0 compiler on these examples?
so this does look possibly similar to the DI issue i linked above – there are various points in the 'pruned liveness' calculations that seem like they may repeatedly walk through the same, large basic blocks to compute liveness states for individual instructions. a few more precise thoughts/questions now that i've looked a bit closer at what's going on here:
i've tried reproducing this issue with the 6.0 compiler by explicitly enabling the copy propagation pass and OSSA modules via command line options, but it doesn't seem to have the same behavior. the 6.1 compiler does appear to have similar scaling with the flags on, though not quite as bad as 6.2[1]. why would that be?
the documentation for the pass says it is supposed to remove unneeded copy_value and destroy_value instructions. in the SIL output for examples like this, i've seen very few such instructions produced by either SILGen or the resulting optimized SIL, with the pass both on and off. is there a good way to verify whether the pass is meaningfully doing anything to the code it operates on?
was the copy propagation pass typically running on code before the 6.1/6.2 compilers? i tried to poke through the git history for what looked like the relevant flags, but it wasn't entirely clear to me.
lastly, would you consider this a report-worthy bug (and is a GH issue or radar the more appropriate reporting channel)?
cc { @Erik_Eckstein, @Andrew_Trick, @nate_chandler } – as i've seen your names associated with the relevant parts of the codebase, your thoughts would be greatly appreciated when you have the time!
e.g. 98s (vs 226s) for the 1024 property example ↩︎
Thanks for your detailed report, @jamieQ. As always, please feel free to report such issues on GitHub.
As you've noticed, the behavior you're seeing is connected with the copy propagation pass and with OSSA modules. The enabling of OSSA modules exposed OSSA passes to more SIL. In the case of copy propagation, it revealed an issue where a lifetime would be shortened invalidly and in a program-visible way. The immediate fix was to prevent such invalid lifetime shortening. The longer term fix will be to enable "complete OSSA lifetimes" throughout the pipeline. At that point, this fix (and a fair amount of special case code throughout the optimizer) will be deleted, and the compile-time performance will be recovered.
As with a number of issues in OSSA, the invalid-lifetime-shortening issue was the result of lacking "complete OSSA lifetimes". In OSSA, a value must be consumed exactly once on every path. ...with one significant exception: a value need not be consumed on paths that enter "dead-end blocks". A "dead-end block" is one from which there are no paths which exit the function: a block from which all paths lead to unreachable-terminated blocks or infinite loops. We have been and will continue making incremental progress towards the goal of having complete OSSA lifetimes throughout the OSSA pipeline. Currently, we complete lifetimes in a mandatory pass that runs after SILGen. Subsequent passes, however, leave some lifetimes incomplete. Eventually, these passes will be updated not to do this, the SIL verifier will enforce that every OSSA value is consumed on every path, and special case handling of such blocks in OSSA passes will be deleted.
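To make the "dead-end block" definition above concrete, here is a toy reachability check over a made-up CFG. The block names and the data structures are purely illustrative, not the compiler's actual `DeadEndBlocks` analysis: a block is a dead end exactly when no path from it reaches a function exit.

```swift
// toy model: a block is a dead-end block iff no path from it exits the
// function (all paths end in unreachable terminators or infinite loops)
struct CFG {
    var successors: [String: [String]]  // empty list = no successors
    var exits: Set<String>              // blocks that return normally
}

func deadEndBlocks(in cfg: CFG) -> Set<String> {
    // a block "escapes" if it is an exit or can reach one; iterate to
    // a fixed point, then everything else is a dead end
    var escaping = cfg.exits
    var changed = true
    while changed {
        changed = false
        for (block, succs) in cfg.successors where !escaping.contains(block) {
            if succs.contains(where: { escaping.contains($0) }) {
                escaping.insert(block)
                changed = true
            }
        }
    }
    return Set(cfg.successors.keys).subtracting(escaping)
}

// entry → body → exit, plus a branch into an unreachable-terminated trap
let cfg = CFG(
    successors: [
        "entry": ["body"],
        "body": ["exit", "trap"],
        "trap": [],   // e.g. a fatalError path
        "exit": [],
    ],
    exits: ["exit"]
)
print(deadEndBlocks(in: cfg).sorted())  // ["trap"]
```

In OSSA terms, a value live in `body` need not be consumed on the path into `trap`; completing lifetimes means inserting the missing consumes on exactly such paths.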
To reiterate: the reason that you're seeing this compile-time increase in 6.2 is that OSSA lifetimes are not yet complete throughout the pipeline but OSSA modules being enabled required that the issue mentioned above be fixed and that fix incurs this increased compile-time cost. Once OSSA lifetimes are complete throughout the pipeline, we will be able to recover this compile-time performance.
One final note: Copy propagation is an important optimization for eliminating spurious copies. It has been enabled since Swift 5.7.
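As a rough illustration of the kind of copy this pass targets (example mine, not from the thread; the SIL-level commentary is approximate and depends on optimization level):

```swift
// a pattern where copy propagation can fold away a spurious copy:
// binding a class reference typically lowers to a copy_value in naive
// SIL, with the matching destroy_value at end of scope
final class Buffer {
    var bytes: [UInt8] = [1, 2, 3]
}

func checksum(_ buffer: Buffer) -> Int {
    let local = buffer   // copy_value of the incoming reference
    var sum = 0
    for b in local.bytes { sum += Int(b) }   // last use of `local`
    // copy propagation canonicalizes `local`'s lifetime to its last
    // use, letting the copy_value/destroy_value pair cancel out
    return sum
}

print(checksum(Buffer()))  // 6
```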
thank you for your time and explanations Nate. to restate my understanding based on your feedback and what i've investigated so far:
- in the process of enabling OSSA modules everywhere, some issues with existing behavior during optimization were discovered
- specifically within copy propagation, there was a problem where lifetimes could be shortened in 'dead-end' blocks, leading to incorrect behavior (i assume that's the issue to which these comments refer)
- to deal with this, OSSA lifetimes are 'extended' to fix up the liveness boundaries at various points (this appears to be done in multiple spots during copy propagation). specifically, liveness is extended into dead-end blocks to avoid the aforementioned issue.
- in the 6.2 compiler, enabling OSSA modules by default means that copy propagation and the associated lifetime-extension logic run more than they used to (e.g. copy propagation in particular appears in the pass pipeline numerous times)
- IIUC, in the 6.1 compiler, OSSA modules were still not enabled by default, but more logic was added for the dead-end lifetime handling, like this, which might explain the scaling differences when compared to the 6.0 compiler
after having had a chance to run Instruments on an example like this, it seems the time is split roughly evenly between the two flavors of dead-end liveness extension[1]. in this class of examples, however, there may not actually be any dead-end blocks, so it seems conceivable this work could be avoided entirely.
would it be reasonable to perform a check for the existence of dead-end blocks before invoking the visitAvailabilityBoundary function in the relevant methods, and skip the boundary visitation if we expect the callback will never be invoked? or does visiting the boundary perform side effects that are expected to occur regardless of whether lifetimes are actually extended? anecdotally, i tried adding logic to bail out early when the dead-end block analysis reported no relevant dead-end blocks; the SILOptimizer tests still appeared to pass locally, and the compilation time for the large examples was greatly reduced.
e.g. in the 512 property example, extendLexicalLivenessToDeadEnds and extendLivenessToDeadEnds each account for ~44% of the total time ↩︎
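the proposed bail-out can be modeled with a minimal sketch like the following. all types and member names here (e.g. `hasDeadEnds`) are stand-ins shaped loosely after the real utilities, not the actual SILOptimizer API:

```swift
// illustrative stand-ins for the compiler's own types
struct DeadEndBlocks {
    let blocks: Set<String>
    var hasDeadEnds: Bool { !blocks.isEmpty }  // hypothetical query
}

struct Liveness {
    let boundaryInstructions: [String]
    // stand-in for visitAvailabilityBoundary: walks the (potentially
    // large) boundary, invoking the callback per boundary instruction
    func visitAvailabilityBoundary(_ visit: (String) -> Void) {
        for inst in boundaryInstructions { visit(inst) }
    }
}

// returns the number of boundary instructions visited, as a proxy for
// the work performed
func extendLivenessToDeadEnds(_ liveness: Liveness,
                              deadEnds: DeadEndBlocks) -> Int {
    // proposed mitigation: if the function has no dead-end blocks, the
    // boundary walk can never extend anything, so skip it entirely
    guard deadEnds.hasDeadEnds else { return 0 }
    var visited = 0
    liveness.visitAvailabilityBoundary { _ in visited += 1 }
    return visited
}
```

the question above still stands: this is only sound if the boundary visitation has no required side effects beyond the extension itself.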
Yes, this sounds like a reasonable mitigation. Please raise a PR with your patch and we can iterate on GitHub. In particular, I imagine we'll want to check in CopyPropagation itself (which doesn't introduce dead-ends) and pass the result down to the canonicalization utility.
with the changes to skip the lifetime extension logic for dead-ends if we can deduce there are no dead-end blocks in a function, the scaling for these examples is improved to something like:
| property count | compile time (s) |
|---:|---:|
| 256 | <1 |
| 512 | <1 |
| 1024 | 2.4 |
| 2048 | 11 |
still a bit slower than it was in earlier compiler versions, but that's perhaps to be expected since more work has to be done until the OSSA lifetimes are complete throughout the pipeline. certainly good enough to fix the 'real world' code that was bumping into this though – thanks for all your help with this @nate_chandler!