Given a storage location, we're trying to determine whether the location is stored to before being loaded from within a given basic block. If not, the value is "live" on entry to the block, meaning it must already be known to be initialized. After computing this "local" information, we run an analysis over the control flow graph to determine whether the value is in fact initialized on entry to the block, regardless of which control flow path was taken to get there.
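To make that concrete, here's a minimal sketch of the per-location version in Python. The names and IR shape (instructions as `("store", loc)` / `("load", loc)` tuples) are purely illustrative, not what the compiler actually does: each block gets a local summary from a single scan, then a forward "must be initialized" dataflow meets (intersects) the facts over predecessors.

```python
def block_summary(block, location):
    """Single scan of a block: does it store `location` at all, and is
    there a load of it before the first store (i.e. the value must be
    initialized on entry for this block to be correct)?"""
    stores = False
    load_before_store = False
    for op, loc in block:
        if loc != location:
            continue
        if op == "store":
            stores = True
        elif op == "load" and not stores:
            load_before_store = True
    return stores, load_before_store

def definitely_initialized_on_entry(cfg, entry, location):
    """cfg maps block name -> (instructions, predecessor names).
    Returns the set of blocks on whose entry `location` is definitely
    initialized. Must-analysis: IN[b] is true only if OUT[p] is true
    for every predecessor p; OUT[b] = IN[b] or "b stores locally".
    Starts optimistically and iterates to a fixpoint."""
    out = {b: True for b in cfg}
    inn = {b: False for b in cfg}
    changed = True
    while changed:
        changed = False
        for b, (insts, preds) in cfg.items():
            new_in = bool(preds) and all(out[p] for p in preds)
            if b == entry:
                new_in = False  # nothing is initialized at function entry
            stores, _ = block_summary(insts, location)
            new_out = new_in or stores
            if (new_in, new_out) != (inn[b], out[b]):
                inn[b], out[b] = new_in, new_out
                changed = True
    return {b for b in cfg if inn[b]}
```

For a diamond CFG where both branches store before the merge block loads, the merge's entry comes out initialized; drop the store from either branch and it doesn't.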
There are several ways to improve it, I think. The fundamental issue is that instead of computing the liveness of all storage locations simultaneously, we perform a separate dataflow analysis for each location. So even if the dataflow analysis requires no fixpoint iteration (so it's O(n) in the number of basic blocks), you still get O(mn) running time, where n is the number of blocks and m is the number of storage locations. This is completely unnecessary and could be eliminated with better bookkeeping.
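The simultaneous version can be sketched with bitvectors (here Python ints stand in for them; again, all names and the toy IR are illustrative): each block's local facts for all m locations are computed in one scan, and the fixpoint iterates over whole bitvectors at once, so the per-location factor drops out of the propagation step.

```python
def summarize(block, loc_index):
    """One scan per block. Bit i of `gen` means location i is stored
    somewhere in the block; bit i of `upward_loads` means location i is
    loaded before its first store here, so it must be live on entry."""
    gen = 0
    upward_loads = 0
    for op, loc in block:
        bit = 1 << loc_index[loc]
        if op == "store":
            gen |= bit
        elif op == "load" and not (gen & bit):
            upward_loads |= bit
    return gen, upward_loads

def initialized_on_entry(cfg, entry, loc_index):
    """Forward must-analysis over all locations at once:
    IN[b] = AND of OUT[p] over predecessors, OUT[b] = IN[b] | gen[b].
    Returns the IN bitvector for each block."""
    m = len(loc_index)
    all_bits = (1 << m) - 1
    gen = {b: summarize(insts, loc_index)[0] for b, (insts, _) in cfg.items()}
    out = {b: all_bits for b in cfg}  # optimistic start
    inn = {b: 0 for b in cfg}
    changed = True
    while changed:
        changed = False
        for b, (_, preds) in cfg.items():
            new_in = 0 if b == entry else (all_bits if preds else 0)
            for p in preds:
                new_in &= out[p]
            new_out = new_in | gen[b]
            if (new_in, new_out) != (inn[b], out[b]):
                inn[b], out[b] = new_in, new_out
                changed = True
    return inn
```

The meet over predecessors is a single bitwise AND regardless of how many locations there are, which is exactly the bookkeeping win.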
A smaller change would perhaps be to amend LiveOutBlockState to record the indices of the store instructions that appear in the block, instead of a single HasNonLoadUse boolean. After all, we already scan each basic block once to compute the LiveOutBlockState, and then we scan each basic block again for each value.
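Roughly what I have in mind, as a standalone sketch (the helper name and instruction encoding are hypothetical, not the compiler's): cache the store indices per block during the single up-front scan, so the later per-value pass can jump straight to the candidate stores instead of rescanning every instruction.

```python
def scan_block(block):
    """One pass over a block's instructions (as (op, location) tuples);
    returns a map from each stored-to location to the list of
    instruction indices that store to it."""
    stores = {}
    for i, (op, loc) in enumerate(block):
        if op == "store":
            stores.setdefault(loc, []).append(i)
    return stores
```

A per-value query then becomes a dictionary lookup plus a comparison of indices, rather than another linear scan of the block.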
As you observed, in the limiting case where each "basic block" is a single instruction, there's no bottleneck, so nothing inherent in this algorithm forces a quadratic amount of computation. It's just an artifact of the implementation.
As far as I know, neither the implementation nor the user-visible behavior of definite initialization is documented anywhere outside the source code. However, it is an instance of dataflow analysis, a topic covered in any compiler textbook, such as Engineering a Compiler. Dataflow analyses are often used to eliminate redundant loads and stores, detect loads of uninitialized memory, and so on.