Agree that popcount can be better for bignums (with a large amount of bits representation). However, it may introduce the following problems:
[1] For some hardwares without the popcnt like instructions, it probably ends up counting the 1-bits naively. It would be an overkill, especially for bignums.
[2] The popcount intrinsic requires the number of bits determined at compile time. As a result, it would narrow down the scope of the functionality from BinaryInteger to FixedWidthInteger.
If we use the short pass / long pass implementation as shown below, querying for a constant 2 can be as efficient as isPowerOf2, on condition that the type Self is obvious enough (e.g., Int, Uint) to compiler to eliminate the other == 2 check. It may be challenging when Self is some complicated types.
@inlinable
public func isPower(of other: Self) -> Bool {
if other == 2 {
// short path
return self > 0 && self & (self - 1) == 0
}
// long path
...
}
If you define isPowerOf2 instead of isPower(of:), you can define it as a computed property, since it has little cost and doesn't require any arguments. Defining it on non-integers may not have as much value, either, at least not for the use cases I envision.
I have done some experiments to compare the performance of different strategies, including isPowerOf2 (the straight forward solution), isPowerOf2_ctpop (the popcount implementation), and isPower(of:) (the more general interface). The table below shows the execution time (in sec) for type Int under the -O optimization level across a range of devices.
.
iMac (3.8G Core i5)
Macbook Pro (2.9G Core i7)
iPhone X Max
iPhone X
iPhone 6S
iPad Air (A1474)
isPowerOf2
0.083
0.098
0.121
0.148
0.194
0.305
isPowerOf2_ctpop
0.135
0.154
0.121
0.148
0.222
0.343
isPower(of: 2)
0.082
0.097
0.121
0.148
0.193
0.305
isPower(of: 3)
1.330
1.472
1.442
1.603
3.102
3.775
It can be seen that:
The popcount implementation can be as efficient as isPowerOf2 on some latest devices (e.g., iPhone X and iPhone X Max), but overall is less performant.
In addition, isPower(of: 2) (i.e., the slow path of the general interface isPower(of:)) is as efficient as isPowerOf2.
Moreover, the complexity of isPower(of: non2) is O(log(self) with base non2), and it can be an order of magnitude slower than the other three O(1) scenarios. I think it is acceptable because checking for powers of non-2 bases is much less commonly used, and there isn't any known efficient algorithm.
The results for other built-in integer types have a similar pattern to this one, so they are not given in this table. The code of such experiment is shown below:
import Foundation
extension BinaryInteger {
// Proposed solution
@inlinable public var isPowerOf2: Bool {
return self > 0 && self & (self - 1) == 0
}
// Alternative solution #1
//@{
@inlinable public func isPower(of other: Self) -> Bool {
if other == 2 { return self.isPowerOf2 } // fast path
return self._isPower(of: other) // slow path
}
@usableFromInline internal func _isPower(of other: Self) -> Bool {
// If self is 1, then it is always true.
if self == 1 { return true }
// Now self cannot be 1, so it is false when other is 1.
if other == 1 { return false }
// If other is 0, then self must be 0; if other is -1, then self must be -1.
if (other == 0) || (other < 0 && other + 1 == 0) { return self == other }
// At this point, we have abs(other) > 1.
guard self % other == 0 else { return false }
let bound = self / other
var x: Self = 1
while (x > 0 ? x : 0 - x) < (bound > 0 ? bound : 0 - bound) { x *= other }
return x == bound
}
//@}
}
extension FixedWidthInteger {
// Alternative solution #2
@inlinable public var isPowerOf2_ctpop: Bool {
return self > 0 && self.nonzeroBitCount == 1
}
}
func timing(title: String, op: ()->()) {
let t1 = CFAbsoluteTimeGetCurrent() // <------ Start timing
op()
let t2 = CFAbsoluteTimeGetCurrent() // <------ End timing
let d = String(format: "%.3f", t2 - t1)
print("\(title): execution time is \(d) s.")
}
func measure<T: FixedWidthInteger>(_ :T.Type, _ repeating: Int = 100_000_000) {
timing(title: "\(T.self).isPowerOf2 ", op: {
var n: T = T(1) << (T.bitWidth - 2)
for _ in 0..<repeating {
if n.isPowerOf2 {
n += 1
} else {
n -= 1
}
}
guard n != 0 else { exit(-1) }
})
timing(title: "\(T.self).isPowerOf2_ctpop", op: {
var n: T = T(1) << (T.bitWidth - 2)
for _ in 0..<repeating {
if n.isPowerOf2_ctpop {
n += 1
} else {
n -= 1
}
}
guard n != 0 else { exit(-1) }
})
timing(title: "\(T.self).isPower(of: 2) ", op: {
var n: T = T(1) << (T.bitWidth - 2)
for _ in 0..<repeating {
if n.isPower(of: 2) {
n += 1
} else {
n -= 1
}
}
guard n != 0 else { exit(-1) }
})
timing(title: "\(T.self).isPower(of: 3) ", op: {
var n: T = T(1) << (T.bitWidth - 2)
for _ in 0..<repeating {
if n.isPower(of: 3) {
n += 1
} else {
n -= 1
}
}
guard n != 0 else { exit(-1) }
})
}
measure(UInt8.self)
measure(UInt16.self)
measure(UInt32.self)
measure(UInt64.self)
measure(Int8.self)
measure(Int16.self)
measure(Int32.self)
measure(Int64.self)
Agree that when it comes to the isPowerOf2 form, we'd go with a computed property. I think isPower(of:) appears to have better consistency with the existing isMultiple(of:), though.
Hi Steve, I did some experiments regarding the performance. The result shows popcount == 1 can be as efficient as n & (n - 1) == 0 at some latest iPhone devices, even for small fixed-width integers.
Hi Nevin, according to the experiments, isPower(of: 2) is as performant as isPowerOf2 for those built-in integer types. I think it is because the compiler does a good job eliminating the other == 2 check in isPower(of:), and thus generates code identical to isPowerOf2.
while x.magnitude < bound.magnitude { x *= other }
• • •
Somewhat off-topic, but on platforms where popCount is fast, remainders modulo certain integers (including most small integers) can be calculated quickly as well. For example:
extension UInt64 {
func mod3() -> UInt64 {
let mask: UInt64 = 0xAAAA_AAAA_AAAA_AAAA // bits: 1010...
let p1 = self.nonzeroBitCount
let p2 = (self & mask).nonzeroBitCount
let p = UInt64(p1 &+ p2) // max 96, then 7
return (p < 3) ? p : (p &- 3).mod3() // max 3 recursions
}
}
magnitude looks much better, and it boosts performance to some extent as well. As for the remainders modulo algorithm, it took me some time to figure it out how it works; and yes, it is brilliant to use the fact (4 * x + y) % 3 == (x + y) % 3 to quickly converge the recursion.
If your base is itself a power of two—such as in isPower(of: 4) and isPower(of: 8)—I believe you can first check for isPowerOf2 and then bit-mask with a pattern of the form 0b...1010101010101 (for base 4) or 0b...1001001001001 (for base 8), etc. I'm not too sure how to create the pattern without a loop, but with a constant base the pattern could end up optimized to a compile-time constant.
Huh? Where are you getting this from? The complexity of testing if an integer is a perfect power is something like O(n log^2 n), where n is the number of bits in self (i.e. essentially O(log(self))). I'm not doing the full analysis right now, but basically you can estimate the exponent it would have if it were a perfect power very quickly (count the bits in self and other, divide), then compute that power via repeated squaring (dominated by the cost of the final multiplication, so instead O(n^2 some_log_nonsense) if you use naive multiplication), and test for equality, which is O(n).
For small fixed-width integers, you can do better, but in those cases it's O(1) anyway.
Apologies for the handwavy analysis, but the complexity of this operation certainly is not exponential. There may be an even better option than the algorithm I sketched out above, but I would need to check in my copy of Computer Arithmetic.
I have no comment about the implementation, but if we end up exposing a power of two computed property, the spelling isPowerOfTwo seems more consistent with existing names than isPowerOf2. Compare BinaryFloatingPoint property trailingZeroBitCount and FloatingPoint properties isZero and ulpOfOne, all of which spell out the number instead of using a numeral.
I am thinking of if we should also have a dedicated optimization when base is 10, which may be a bit more commonly used. However, I am not aware of any efficient implementation in this case.
So the algorithm is to look at the number of trailing zeros to get the exponent. Then we make sure column 2 matches the last two bits of the exponent and column 3 matches 0b01. Since column 1 is left unchecked, we don't know for sure we have a power of 10.
But do we have a fast way to compute a power of 10? Since we can easily find the exponent from the bit pattern we could just compute 10^exponent and that'd give us an exact result to check for. That'd be better than just knowing you may have a power of 10.
There aren't that many powers of 10 that can fit in 64 bits anyway. Let's make a lookup table with 20 entries that'll give us a very quick check for powers of 10 in the 64-bit range. Some other algorithm can be used above that.
Full implementation (fast path up to 64-bit)
var isPowerOf10: Bool {
let exponent = trailingZeroBitCount
switch exponent {
case 0: return self == 1 as UInt8
case 1: return self == 10 as UInt8
case 2: return self == 100 as UInt8
case 3: return self == 1000 as UInt16
case 4: return self == 10000 as UInt16
case 5: return self == 100000 as UInt32
case 6: return self == 1000000 as UInt32
case 7: return self == 10000000 as UInt32
case 8: return self == 100000000 as UInt32
case 9: return self == 1000000000 as UInt32
case 10: return self == 10000000000 as UInt64
case 11: return self == 100000000000 as UInt64
case 12: return self == 1000000000000 as UInt64
case 13: return self == 10000000000000 as UInt64
case 14: return self == 100000000000000 as UInt64
case 15: return self == 1000000000000000 as UInt64
case 16: return self == 10000000000000000 as UInt64
case 17: return self == 100000000000000000 as UInt64
case 18: return self == 1000000000000000000 as UInt64
case 19: return self == 10000000000000000000 as UInt64
default:
// if this is 64-bit or less we can't have a higher power of 10
if self.bitWidth <= 64 { return false }
// fail early it not matching the easy-to-check bit pattern
if !self.perhapsIsPowerOf10 { return false }
// now time for the slow path
return self._isPower(of: 10)
}
}
Note: Beyond 64 bits, the performance characteristics will be uneven.
Generalizing for all bases multiple of 10
You can also generalize the code above to other bases that are a multiple of 10 by adding a precheck of the base-10 exponent found in the bit pattern:
func isPower(ofMultipleOf10 base: Int) -> Bool {
precondition(base.isMultiple(of: 10))
let exponentOf10 = trailingZeroBitCount // exponent for base 10
let exponentOfBase = base.trailingZeroBitCount
if !exponentOf10.isMultiple(of: exponentOfBase) { return false }
return self.isPowerOf10
}
To generalize this further (for math fun), you're observing that 10^x = 2^x * 5^x, and that 5^x % 4 is always 1. So for any isPower(of:) argument, you should be able to do something like this:
func isPower(of base: Int) -> Bool {
// some limitation on negative numbers I haven't thought through
let trailingZeroMultiplier = base.trailingZeroBitCount
guard trailingZeroMultiplier != 0 else {
return self.isPowerSlow(of: base)
}
let exponent = self.trailingZeroBitCount / trailingZeroMultiplier
return (self >> self.trailingZeroBitCount).isParticularPower(of: base, exponent: exponent)
}
For the switch statement, we test trailingZeroBitCount in the first place, and then check if self equals a particular 10^n (up to 64-bit). Can we get rid of trailingZeroBitCount to make it straight forward? It could be something like:
var isPowerOf10: Bool {
if self == 1 as UInt8 ||
self == 10 as UInt8 ||
self == 100 as UInt8 ||
...
self == 10000000000000000000 as UInt64 { return true }
// if this is 64-bit or less we can't have a higher power of 10
if self.bitWidth <= 64 { return false }
// fail early it not matching the easy-to-check bit pattern
if !self.perhapsIsPowerOf10 { return false }
// now time for the slow path
return self._isPower(of: 10)
}