Why is someArray.forEach { _ in ... } faster than someRange.forEach { _ in ... } for the same number of elements?

young · June 6, 2021, 9:47pm

I'm trying to measure the performance of some func:

    let randomNumbers = (1...10_000_000).map { _ in Int.random(in: 0...999) }
    let number = Int.max

    // Time: 2.466 sec
    self.measure {
        randomNumbers.forEach { _ in
            let _ = number.digits2
            let _ = number.digits2
            let _ = number.digits2
        }
    }

    // vs.

    // Time: 4.389 sec
    self.measure {
        (1...10_000_000).forEach { _ in
            let _ = number.digits2
            let _ = number.digits2
            let _ = number.digits2
        }
    }

The second measurement is much slower.

Int.digits2

extension Int {
    var digits: Int {
        // Call this C func
        Int(count_digit(Int64(self)))
    }

    var digits2: Int {
        self < 10 ? 1 :
        self < 100 ? 2 :
        self < 1_000 ? 3 :
        self < 10_000 ? 4 :
        self < 100_000 ? 5 :
        self < 1_000_000 ? 6 :
        self < 10_000_000 ? 7 :
        self < 100_000_000 ? 8 :
        self < 1_000_000_000 ? 9 :
        self < 10_000_000_000 ? 10 :
        self < 100_000_000_000 ? 11 :
        self < 1_000_000_000_000 ? 12 :
        self < 10_000_000_000_000 ? 13 :
        self < 100_000_000_000_000 ? 14 :
        self < 1_000_000_000_000_000 ? 15 :
        self < 10_000_000_000_000_000 ? 16 :
        self < 100_000_000_000_000_000 ? 17 :
        self < 1_000_000_000_000_000_000 ? 18 :
        19
    }
}

Ben_Cohen · June 6, 2021, 11:31pm

Are you asking about debug performance not release?

young · June 6, 2021, 11:35pm

Oh, is the difference because of the debug build?

I thought the two should generate the same code. But even if it's different, the Range one should be faster, but it's actually slower.

Ben_Cohen · June 6, 2021, 11:40pm

In debug, the compiler won't specialize the underlying implementation of generic types, or eliminate bounds checks or other redundant work. This is massively inefficient for supposed "zero-cost" abstractions like 0...n or forEach (btw, I'd suggest you don't use forEach for this, just use a for loop - not a performance thing, just a style thing).

It so happens the abstractions for ClosedRange<Int> underperform those for Array<Int> in these circumstances. The compiler does actually have some hacks in it that "pre-specialize" certain optimizations even in debug, and it may be the array version is benefiting from these.

If you turn on optimizations, you'll probably see the range example outperform the array example – though in your sample code the range example constant folds away completely. You'd need to do some opaque work with the integers inside the loop to measure the true performance difference.
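
A minimal sketch of what "opaque work" could look like, assuming a hypothetical blackHole helper and a shortened digitCount stand-in for digits2 (neither is from the original post). The loop body depends on each element, and the total is handed to a function the optimizer is asked not to inline, so the work is less likely to be folded away:

```swift
extension Int {
    // Shortened stand-in for the thread's digits2; assumes a non-negative value.
    var digitCount: Int {
        var n = self
        var count = 1
        while n >= 10 {
            n /= 10
            count += 1
        }
        return count
    }
}

// Hypothetical helper (an assumption, not from the thread): @inline(never)
// asks the optimizer not to inline the call, so the argument must be computed.
@inline(never)
func blackHole<T>(_ value: T) {}

// Sums the digit counts of 1...count, so every iteration uses its element.
func sumDigits(overRange count: Int) -> Int {
    var total = 0
    for i in 1...count {
        total &+= i.digitCount
    }
    return total
}

// Same work, but the inputs come from an array that was built beforehand.
func sumDigits(overArray values: [Int]) -> Int {
    var total = 0
    for v in values {
        total &+= v.digitCount
    }
    return total
}

let values = Array(1...1_000)
blackHole(sumDigits(overRange: 1_000))
blackHole(sumDigits(overArray: values))
```

Timing each call in an optimized build should then compare iteration cost rather than how well the compiler can delete the loop.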

young · June 7, 2021, 12:19am

I don't know how to run the test target in "release" mode. When I edit the Test scheme to "Release" Xcode complains my app module is not compiled for test...:(

I changed to for-in:

// Time: 1.689 sec
for _ in randomNumbers { ... }
// Time: 2.705 sec
for _ in 1...10_000_000 { ... }

So for-in array is still faster. But this is in "debug" mode.

How can I run my test in "release"?

Even though I am not testing in "release" mode, I am pretty sure my Swift digits2 is faster than the C++ count_digit(), based on the time of each in the debug-mode test, because I don't think there is much difference between debug and release compilation for the two.

andreas66 · June 7, 2021, 7:32am

I guess testing requires a debug build. You can build for release by changing the Xcode scheme to release from the scheme settings. Or, if using the Swift package manager, build from the command line:

swift build -c release

And measure the performance:

  // Start the timing
  let start = Date()
  // Do your thing here
  let duration = start.distance(to: Date())
  print(" >>>> Time \(duration) secs.")

Performance testing with debug produces really different numbers from release.
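
Date tracks wall-clock time, which can be adjusted while the program runs, so for short intervals a monotonic clock may be the safer choice. A small sketch using Dispatch's uptime counter; the timeIt helper and its label parameter are hypothetical names, not part of any library:

```swift
import Dispatch

// Hypothetical helper: runs `body`, prints the elapsed time, returns the result.
func timeIt<T>(_ label: String, _ body: () -> T) -> T {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = body()
    let end = DispatchTime.now().uptimeNanoseconds
    let seconds = Double(end - start) / 1_000_000_000
    print(" >>>> \(label): \(seconds) secs.")
    return result
}

// Usage: returning the value keeps the computed result in use.
let sum = timeIt("sum of 1...1_000_000") {
    (1...1_000_000).reduce(0, +)
}
print(sum)
```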

lukasa · June 7, 2021, 10:07am

You can compile this in release mode in Godbolt by passing -O to the compiler. I've taken the liberty of updating your Godbolt sample to split the two chunks of code into functions and then pass -O: Compiler Explorer. This code also actually does something with the computation to ensure that the compiler doesn't entirely optimise the code away.

You'll notice that the result of the first change here is that, in release mode, the range-based version is vastly better than the Array-based one. Here is the complete assembly code for the range-based version:

output.withRange() -> Swift.Int:
  mov eax, 570000000
  ret

That is, the compiler has observed that the result is entirely constant: the loop iterations are constant, the input number (Int.max) is constant, and so the result is statically known. The compiler cannot do this with the Array-based implementation and so it is hilariously slow in comparison.
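
The constant is consistent with simple arithmetic, assuming the linked sample adds up the three digits2 results on every iteration (the sample itself isn't reproduced here, so that is a guess). The names below are illustrative only:

```swift
// Int.max is 9223372036854775807 on 64-bit platforms, i.e. 19 digits.
let digitsOfIntMax = String(Int.max).count
let callsPerIteration = 3
let iterations = 10_000_000

// 19 * 3 * 10_000_000 on 64-bit platforms
let folded = digitsOfIntMax * callsPerIteration * iterations
print(folded)
```

On a 64-bit platform this gives 570000000, the value loaded into eax above.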

But that's not really fair, so let's refactor again and pass both the loop iterations and the input number in separately. This version is here: Compiler Explorer. Here both versions are very similar, but the Array-based version first spends its time allocating and populating an Array that it does not need. The range-based implementation again performs better.
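
A rough sketch of what that refactoring might look like; this is a reconstruction under assumptions, not the code behind the link. The function names, parameter labels, and the shortened digitCount stand-in for digits2 are all illustrative:

```swift
extension Int {
    // Shortened stand-in for digits2; assumes a non-negative value.
    var digitCount: Int {
        var n = self
        var count = 1
        while n >= 10 {
            n /= 10
            count += 1
        }
        return count
    }
}

// Neither the iteration count nor the number is a literal here, so when the
// function is compiled on its own the result is not statically known.
func withRange(iterations: Int, number: Int) -> Int {
    var total = 0
    for _ in 0..<iterations {
        total &+= number.digitCount
    }
    return total
}

// Same loop body, but an array must be allocated and filled first.
func withArray(iterations: Int, number: Int) -> Int {
    let values = (0..<iterations).map { _ in Int.random(in: 0...999) }
    var total = 0
    for _ in values {
        total &+= number.digitCount
    }
    return total
}

print(withRange(iterations: 10, number: Int.max))
print(withArray(iterations: 10, number: Int.max))
```

If these are called with literal arguments from the same module, the optimizer may still inline and fold them, so in a real measurement the inputs would need to come from somewhere it cannot see.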

young · June 7, 2021, 4:16pm

Thank you!

I'm surprised -O generates vastly different code for digits2. So I was wrong in assuming debug and -O should not be much different for such code.

lukasa · June 7, 2021, 4:18pm

In Swift, debug code and optimised code are almost completely unrelated. As @Ben_Cohen said above, "zero-cost" abstractions in Swift are only zero-cost in release mode. In debug mode there is much more state being kept around.

This has been said on the forums before, but I'll say it again here: it is never worth profiling debug-mode code unless the specific thing you care about is how your code runs in debug mode. This is rarely the thing you care about.

Karl · June 7, 2021, 4:39pm

You can run tests in release mode in Xcode by clicking the scheme (the thing at the top to the right of the start/stop buttons), selecting "Edit scheme", and changing the test action's build configuration to "Release".

young · June 7, 2021, 7:20pm

I did that, but then Xcode complains "app module is not compiled for testing".

I want to compare C++ vs. Swift. So to get around this error, I thought I could just add my code to the test target's membership. But I can see no option to do this for C++ code.

Karl · June 7, 2021, 7:44pm

It sounds like your test code is using a @testable import. You’ll need to use a regular import if you build tests in release mode.

C++ code can only be accessed via a C interface.

QuinceyMorris · June 7, 2021, 7:58pm

That's not what the Test "action" is for. It's for automated integration testing.

Instead, create a copy of this scheme. In the copy, change the Run action (immediately above the Test action) from Debug configuration to Release configuration.

Then choose whichever scheme you want to evaluate your code performance in, and use the regular Run command/button/menu item.

Alternatively, if you want to use Instruments to measure performance, use the Profile action (immediately below the Test action). Again, you can choose whether to use the Debug or Release configuration for this, and you can have multiple schemes with different choices. In this case, start the measurement with Profile instead of Run.

young · June 7, 2021, 8:33pm

I did make an extern "C" count_digit(...) that calls the rest of the C++ code.