Missed optimizations due to integer overflow checks

I noticed that integer overflow checks in Swift sometimes cause Swift code to "miss out" on certain compiler optimizations that Clang or GCC perform for "equivalent" C/C++ code (equivalent, that is, if it weren't for the overflow checks).

I don't know anything about the existing optimization passes in the compiler, but I was wondering if it's possible to write an optimizer pass that eliminates some overflow checks (because it can prove that a particular value will never overflow/underflow) and thus unlock later optimizations that only work without the overflow check.

Example: popcount / nonzeroBitCount

Here's a recent example from Matt Godbolt:

Compiler Explorer workspace for the following example (covering both Swift and C++:

Matt’s example is a C++ function that uses a loop to count the number of non-zero bits in an integer:

#include <cstdint>

unsigned population_count(std::uint64_t value) {
  unsigned result = 0;
  while (value) {
    // Clear bottom set bit
    value &= value - 1;
    ++result;
  }
  return result;
}

Matt shows that the compiler (both GCC and Clang) recognizes what this loop does and replaces the loop with a popcount instruction that modern CPUs have:

population_count(unsigned long):
        fmov    d0, x0
        cnt     v0.8b, v0.8b
        addv    b0, v0.8b
        fmov    w0, s0
        ret

(Matt uses GCC and Intel assemly in his article. I'm using Clang and ARM assembly. It doesn't really matter.)

If we translate the C++ code to Swift, we might expect the compiler to produce identical assembly. Here’s my Swift translation:

// With overflow checking (default)
func populationCount1(_ value: UInt) -> Int {
    var value = value
    var result = 0
    while (value != 0) {
        value &= value - 1
        // Implicit overflow check here
        result += 1
    }
    return result
}

(Note: Don’t use this in production. Swift has this operation built-in as nonzeroBitCount in the standard library.)

But, when compiling this code with optimizations (-O), Swift 6.2 produces the following assembly:

output.populationCount1(Swift.UInt) -> Swift.Int:
        cbz     x0, .LBB1_5
        mov     x9, xzr
.LBB1_2:
        adds    x8, x9, #1
        b.vs    .LBB1_6
        sub     x10, x0, #1
        add     x9, x9, #1
        ands    x0, x10, x0
        b.ne    .LBB1_2
        mov     x0, x8
        ret
.LBB1_5:
        mov     x0, xzr
        ret
.LBB1_6:
        ; Raise a CPU exception, i.e. trap
        brk     #0x1

Notes:

  • This code is much longer than what Clang generated. The Swift compiler hasn't eliminated the while loop.
  • The b.vs .LBB1_6 instruction in line 6 is the overflow check: a conditional jump to the brk instruction at the end if the overflow flag is set.

We can prove that the overflow check is the “culprit” here by replacing the += operator with the wrapping variant &+=. Everything else stays the same:

// Without overflow checking
func populationCount2(_ value: UInt) -> Int {
    var value = value
    var result = 0
    while (value != 0) {
        value &= value - 1
        // Use wrapping operator to remove the overflow check
        result &+= 1
    }
    return result
}

With this change, Swift generates the identical code as Clang (see on Compiler Explorer).

Could we eliminate the overflow check?

I was wondering if the optimizer could handle this better: As far as I can tell, it's possible to prove that the result variable can never overflow/underflow in this particular case:

  • result is initialized to 0.
  • result is only ever incremented by 1 (in the loop body).
  • The loop body won’t ever execute more than UInt.bitWidth times.

Questions:

  1. Am I correct or did I overlook something?
  2. Would it be possible in principle to write a compiler optimization pass that eliminates the overflow check here? I imagine this would suffice for the later loop deletion pass to recognize the structure of the loop and eliminate that.
  3. Is this a good idea?
12 Likes

I just took a quick look out of curiosity.

Swift produces pretty much the same code if you use -Ounchecked instead of -O.

I'm not too familiar with clang. It's very possible -O in clang is simply -Ounchecked in Swift compiler.

No, C and C++ simply don't have overflow checks. Any math operator on an unsigned variable in either language is defined to wrap.

5 Likes

I believe that with -Ounchecked, integer overflow triggers undefined behavior instead of just wrapping around. So both signed and unsigned non-wrapping integer arithmetic in Swift would be analogous to signed integer arithmetic in C.

I don't work on the compiler at all, but I do know that there's an optimization pass dedicated to removing overflow checks: RedundantOverflowCheckRemoval.cpp. It appears (with my very limited knowledge) to only have knowledge of arithmetic operations like + and *, not bitwise operations.

Although (again, with my very limited knowledge) it could be that if the compiler ever becomes smart enough to realize that value &= value - 1 always zeroes out exactly one bit of value, it'd also be smart enough to realize that the entire operation is just nonzeroBitCount.

6 Likes

Thanks for the link to the existing optimization pass, that's very interesting.

1 Like