Help me maximize the performance of some code translated from C to Swift

"Computing the number of digits of an integer even faster": https://twitter.com/lemire/status/1400533024488443906

His implementation in C:

int int_log2(uint32_t x) { return 31 - __builtin_clz(x | 1); }

int fast_digit_count(uint32_t x) {
  static uint64_t table[] = {
      4294967296,  8589934582,  8589934582,  8589934582,  12884901788,
      12884901788, 12884901788, 17179868184, 17179868184, 17179868184,
      21474826480, 21474826480, 21474826480, 21474826480, 25769703776,
      25769703776, 25769703776, 30063771072, 30063771072, 30063771072,
      34349738368, 34349738368, 34349738368, 34349738368, 38554705664,
      38554705664, 38554705664, 41949672960, 41949672960, 41949672960,
      42949672960, 42949672960};
  return (x + table[int_log2(x)]) >> 32;
}
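Before worrying about the Swift port, it may be worth sanity-checking the trick itself. Here's a small harness (mine, not from the tweet) that compares fast_digit_count against a naive snprintf-based count around every power of 10 and at the extremes of the 32-bit range:

```c
// Sanity check (not from the original tweet): compare fast_digit_count
// against a naive snprintf-based digit count at the interesting boundaries.
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int int_log2(uint32_t x) { return 31 - __builtin_clz(x | 1); }

int fast_digit_count(uint32_t x) {
  static uint64_t table[] = {
      4294967296,  8589934582,  8589934582,  8589934582,  12884901788,
      12884901788, 12884901788, 17179868184, 17179868184, 17179868184,
      21474826480, 21474826480, 21474826480, 21474826480, 25769703776,
      25769703776, 25769703776, 30063771072, 30063771072, 30063771072,
      34349738368, 34349738368, 34349738368, 34349738368, 38554705664,
      38554705664, 38554705664, 41949672960, 41949672960, 41949672960,
      42949672960, 42949672960};
  return (x + table[int_log2(x)]) >> 32;
}

int naive_digit_count(uint32_t x) {
  char buf[16];
  // snprintf returns the number of characters written, i.e. the digit count.
  return snprintf(buf, sizeof buf, "%u", x);
}

void check_boundaries(void) {
  // Probe a window around every power of 10, where the count changes.
  uint64_t p = 1;
  for (int d = 0; d < 10; d++, p *= 10) {
    for (int64_t off = -2; off <= 2; off++) {
      int64_t v = (int64_t)p + off;
      if (v < 0 || v > UINT32_MAX) continue;
      assert(fast_digit_count((uint32_t)v) == naive_digit_count((uint32_t)v));
    }
  }
  assert(fast_digit_count(0) == 1);
  assert(fast_digit_count(UINT32_MAX) == 10);
}
```

The boundaries are the only interesting inputs: within a given power-of-10 range, every value takes the same table entry and the same shift, so checking the edges covers the whole domain in practice.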

I translate that into Swift:

extension UInt32 {
    func log2() -> Int {
        bitWidth - 1 - (self | 1).leadingZeroBitCount
    }

    var digits: Int {
        // The lookup table only covers UInt32, not 64-bit integers
        let table: [UInt64] = [
            4294967296,  8589934582,  8589934582,  8589934582,  12884901788,
            12884901788, 12884901788, 17179868184, 17179868184, 17179868184,
            21474826480, 21474826480, 21474826480, 21474826480, 25769703776,
            25769703776, 25769703776, 30063771072, 30063771072, 30063771072,
            34349738368, 34349738368, 34349738368, 34349738368, 38554705664,
            38554705664, 38554705664, 41949672960, 41949672960, 41949672960,
            42949672960, 42949672960
        ]

        return Int((UInt64(self) + table[log2()]) >> 32)
    }
}

I want to know: did I do it right?

In Swift, what can be done to maximize performance (such as adding @inline)?

BTW, the (self | 1) in log2() is there to avoid an undefined value for zero.

Edit: I just realized his code only works for 32-bit integers; anyway, please ignore that for the purposes of my question.


The (self | 1) trick is not necessary in Swift for that reason: leadingZeroBitCount is fully defined for all input values, including zero (keeping it will save a test and branch when targeting older x86 CPUs, however).
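For reference, the C side genuinely needs that guard: __builtin_clz(0) is undefined behavior, and or-ing in the low bit forces the argument nonzero without changing the answer for any other input. A minimal demonstration:

```c
#include <assert.h>
#include <stdint.h>

// __builtin_clz(0) is undefined behavior in C, so `| 1` forces the
// argument nonzero. For x >= 2 the low bit never changes the position
// of the highest set bit, so the result is unaffected; for x == 0 and
// x == 1 it yields 0, which is the value the digit-count table expects.
int int_log2(uint32_t x) { return 31 - __builtin_clz(x | 1); }
```

In Swift, UInt32(0).leadingZeroBitCount is simply 32, so keeping the guard there is about making log2() return 0 instead of -1 for zero (and keeping the table index in bounds), not about avoiding undefined behavior.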

The issue that you're going to have in Swift is around initialization and bounds checks for table. You can work around the initialization problems by defining the table in a .h file instead, but it makes your code a bit grosser if you do this; this is something that the language and optimizer should make easier to do.

You may be interested in reading various proposals for fixed size arrays (and/or homogeneous aggregates / tuples), which would hopefully provide a good means to resolve the problems here.

In the short term, the best option for this specific operation is to put the known-good C code in a .h file and use it directly from Swift.

// fastDigitCount.h:
#include <stdint.h>

static inline __attribute__((always_inline))
int int_log2(uint32_t x) { return 31 - __builtin_clz(x | 1); }

static inline __attribute__((always_inline))
int fast_digit_count(uint32_t x) {
  static uint64_t table[] = {
      4294967296,  8589934582,  8589934582,  8589934582,  12884901788,
      12884901788, 12884901788, 17179868184, 17179868184, 17179868184,
      21474826480, 21474826480, 21474826480, 21474826480, 25769703776,
      25769703776, 25769703776, 30063771072, 30063771072, 30063771072,
      34349738368, 34349738368, 34349738368, 34349738368, 38554705664,
      38554705664, 38554705664, 41949672960, 41949672960, 41949672960,
      42949672960, 42949672960};
  return (x + table[int_log2(x)]) >> 32;
}
// Swift: import the bridging header or a module containing fastDigitCount.h
extension UInt32 {
  var digits: Int { Int(fast_digit_count(self)) }
}

Swift seems to have gotten a little better at constant-propagating `let` bindings, but it's still not very good at it. Something like C++20's constinit would be nice here.
