I noticed that integer overflow checks in Swift sometimes cause Swift code to "miss out" on certain compiler optimizations that Clang or GCC perform for "equivalent" C/C++ code (equivalent, that is, if it weren't for the overflow checks).
I don't know anything about the existing optimization passes in the compiler, but I was wondering if it's possible to write an optimizer pass that eliminates some overflow checks (because it can prove that a particular value will never overflow/underflow) and thus unlock later optimizations that only work without the overflow check.
Example: popcount / nonzeroBitCount
Here's a recent example from Matt Godbolt:
Compiler Explorer workspace for the following example (covering both Swift and C++:
Matt’s example is a C++ function that uses a loop to count the number of non-zero bits in an integer:
#include <cstdint>
unsigned population_count(std::uint64_t value) {
unsigned result = 0;
while (value) {
// Clear bottom set bit
value &= value - 1;
++result;
}
return result;
}
Matt shows that the compiler (both GCC and Clang) recognizes what this loop does and replaces the loop with a popcount instruction that modern CPUs have:
population_count(unsigned long):
fmov d0, x0
cnt v0.8b, v0.8b
addv b0, v0.8b
fmov w0, s0
ret
(Matt uses GCC and Intel assemly in his article. I'm using Clang and ARM assembly. It doesn't really matter.)
If we translate the C++ code to Swift, we might expect the compiler to produce identical assembly. Here’s my Swift translation:
// With overflow checking (default)
func populationCount1(_ value: UInt) -> Int {
var value = value
var result = 0
while (value != 0) {
value &= value - 1
// Implicit overflow check here
result += 1
}
return result
}
(Note: Don’t use this in production. Swift has this operation built-in as nonzeroBitCount in the standard library.)
But, when compiling this code with optimizations (-O), Swift 6.2 produces the following assembly:
output.populationCount1(Swift.UInt) -> Swift.Int:
cbz x0, .LBB1_5
mov x9, xzr
.LBB1_2:
adds x8, x9, #1
b.vs .LBB1_6
sub x10, x0, #1
add x9, x9, #1
ands x0, x10, x0
b.ne .LBB1_2
mov x0, x8
ret
.LBB1_5:
mov x0, xzr
ret
.LBB1_6:
; Raise a CPU exception, i.e. trap
brk #0x1
Notes:
- This code is much longer than what Clang generated. The Swift compiler hasn't eliminated the
whileloop. - The
b.vs .LBB1_6instruction in line 6 is the overflow check: a conditional jump to thebrkinstruction at the end if the overflow flag is set.
We can prove that the overflow check is the “culprit” here by replacing the += operator with the wrapping variant &+=. Everything else stays the same:
// Without overflow checking
func populationCount2(_ value: UInt) -> Int {
var value = value
var result = 0
while (value != 0) {
value &= value - 1
// Use wrapping operator to remove the overflow check
result &+= 1
}
return result
}
With this change, Swift generates the identical code as Clang (see on Compiler Explorer).
Could we eliminate the overflow check?
I was wondering if the optimizer could handle this better: As far as I can tell, it's possible to prove that the result variable can never overflow/underflow in this particular case:
resultis initialized to 0.resultis only ever incremented by 1 (in the loop body).- The loop body won’t ever execute more than
UInt.bitWidthtimes.
Questions:
- Am I correct or did I overlook something?
- Would it be possible in principle to write a compiler optimization pass that eliminates the overflow check here? I imagine this would suffice for the later loop deletion pass to recognize the structure of the loop and eliminate that.
- Is this a good idea?