How to handle warm-up cost of a benchmark function

I have a benchmark function as defined below. It reports times in milliseconds (ms).

import Foundation

func benchmark(_ label: String, niter: Int = 5, operation: () -> ()) {
    print("\n\(label)")
    var times = [Double](repeating: 0.0, count: niter)

    for i in 1...niter {
        let tic = Date.now
        operation()
        // timeIntervalSinceNow is negative for a past date; magnitude gives elapsed seconds
        let toc = tic.timeIntervalSinceNow.magnitude
        times[i - 1] = toc
    }

    let totalTime = times.reduce(0, +)
    let avgTime = totalTime / Double(niter)

    print("""
    Min time (ms) ..... \(String(format: "%.4f", times.min()! * 1000))
    Max time (ms) ..... \(String(format: "%.4f", times.max()! * 1000))
    Total time (ms) ... \(String(format: "%.4f", totalTime * 1000))
    Avg time (ms) ..... \(String(format: "%.4f", avgTime * 1000))
    """)
}

Here I run three consecutive benchmarks. I expect these benchmarks to report similar times since each of them is timing the same operation.

benchmark("Array repeating") {
    let arr = [Double](repeating: 1.0, count: 20_000_000)
    print("First item in array:", arr[0])
}

benchmark("Array repeating") {
    let arr = [Double](repeating: 1.0, count: 20_000_000)
    print("First item in array:", arr[0])
}

benchmark("Array repeating") {
    let arr = [Double](repeating: 1.0, count: 20_000_000)
    print("First item in array:", arr[0])
}

Below is the output from running the three consecutive benchmarks. Notice that the max time for the first benchmark is about 21 ms, but the max time for the following benchmarks is about 7 ms. This makes the first benchmark appear slower. Is there a warm-up cost associated with calling my benchmark function? How can I avoid this, or at least account for this effect, in the benchmark function?

Array repeating
First item in array: 1.0
First item in array: 1.0
First item in array: 1.0
First item in array: 1.0
First item in array: 1.0
Min time (ms) ..... 7.9551
Max time (ms) ..... 21.8320  <-- Larger than the other max times
Total time (ms) ... 56.9900
Avg time (ms) ..... 11.3980

Array repeating
First item in array: 1.0
First item in array: 1.0
First item in array: 1.0
First item in array: 1.0
First item in array: 1.0
Min time (ms) ..... 6.8300
Max time (ms) ..... 7.3930  <-- Max time around 7 ms
Total time (ms) ... 36.0560
Avg time (ms) ..... 7.2112

Array repeating
First item in array: 1.0
First item in array: 1.0
First item in array: 1.0
First item in array: 1.0
First item in array: 1.0
Min time (ms) ..... 6.4270
Max time (ms) ..... 7.1180  <-- Max time around 7 ms
Total time (ms) ... 33.3430
Avg time (ms) ..... 6.6686

The usual way to deal with this is to run a few warm-up iterations, to make sure any initial setup cost is removed from the measurements (and that the CPU caches are primed with instructions and data).

This is something we do by default in e.g. ordo-one/package-benchmark (a Swift benchmark runner with many performance metrics and great CI support), so you don't have to think about it.
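A benchmark there looks roughly like this (a sketch along the lines of the package README; warmupIterations, scaledIterations, and blackHole are that package's API, so check its docs for the exact current form):

import Benchmark

let benchmarks = {
    Benchmark("Array repeating",
              configuration: .init(warmupIterations: 3)) { benchmark in
        // The runner decides how many scaled iterations to run;
        // blackHole keeps the optimizer from discarding the work.
        for _ in benchmark.scaledIterations {
            blackHole([Double](repeating: 1.0, count: 20_000_000))
        }
    }
}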

I added a little warm-up loop to the benchmark (see below), but I still get results where the first benchmark takes longer than the subsequent benchmarks.

func benchmark(_ label: String, niter: Int = 5, operation: () -> ()) {
    print("\n\(label)")
    var times = [Double](repeating: 0.0, count: niter)

    // Warm-up code
    for _ in 1...4 {
        operation()
    }

    // Time code for benchmark
    for i in 1...niter {
        let tic = Date.now
        operation()
        let toc = tic.timeIntervalSinceNow.magnitude
        times[i - 1] = toc
    }

    let totalTime = times.reduce(0, +)
    let avgTime = totalTime / Double(niter)

    print("""
    Min time (ms) ..... \(String(format: "%.4f", times.min()! * 1000))
    Max time (ms) ..... \(String(format: "%.4f", times.max()! * 1000))
    Total time (ms) ... \(String(format: "%.4f", totalTime * 1000))
    Avg time (ms) ..... \(String(format: "%.4f", avgTime * 1000))
    """)
}

You want to warm up the whole benchmark function, not just the operation: e.g. there might be some one-time setup cost for getting the timestamp the first time.

How would I do that without calling the function multiple times? I could do what I did above, which is

benchmark("Array repeating") { ... }

benchmark("Array repeating") { ... }

and ignore the result of the first benchmark. But I don't want to have to explicitly call the benchmark function at least twice, and I don't understand how I could change the existing benchmark function to remove that requirement.

Why don’t you want to do that?

Try capturing the timestamp before running the benchmark and see if it helps, or use a standard benchmark package like Google-benchmark, package-benchmark, or others depending on your needs, which deal with and hide all of that for you.
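For example, a hypothetical helper like this, called at the top of your benchmark function before any measuring, would pay that one-time cost up front:

import Foundation

// Hypothetical helper (not part of the code above): take and discard
// one timestamp so any one-time setup cost in Date and
// timeIntervalSinceNow is paid before the measured loop, not inside
// the first timed iteration.
func primeTimer() {
    let tic = Date.now
    _ = tic.timeIntervalSinceNow.magnitude
}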

The function below seems to work: running consecutive benchmarks now produces similar results. The only problem is that when the operation closure contains a print statement, the print output is also shown during the warm-up. Is there a way to suppress the print output during the warm-up?

func benchmark(_ label: String, warmup: Int = 3, niter: Int = 5, operation: () -> ()) {
    print("\n\(label)")

    // Warm up the entire benchmarking logic; measurements are discarded
    for _ in 0..<warmup {
        _ = [Double](repeating: 0.0, count: niter)
        for _ in 1...niter {
            let tic = Date.now
            operation()
            _ = tic.timeIntervalSinceNow.magnitude
        }
    }

    // Actual benchmark
    var times = [Double](repeating: 0.0, count: niter)

    for i in 1...niter {
        let tic = Date.now
        operation()
        let toc = tic.timeIntervalSinceNow.magnitude
        times[i - 1] = toc
    }

    let totalTime = times.reduce(0, +)
    let avgTime = totalTime / Double(niter)

    print("""
    Min time (ms) ..... \(String(format: "%.4f", times.min()! * 1000))
    Max time (ms) ..... \(String(format: "%.4f", times.max()! * 1000))
    Total time (ms) ... \(String(format: "%.4f", totalTime * 1000))
    Avg time (ms) ..... \(String(format: "%.4f", avgTime * 1000))
    """)
}
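
One idea I had is to redirect stdout to /dev/null around the warm-up calls. An untested, POSIX-only sketch:

import Foundation

// Untested sketch: run body with stdout pointed at /dev/null,
// then restore the original stdout afterwards.
func silenced(_ body: () -> ()) {
    fflush(stdout)                      // flush any pending output first
    let saved = dup(STDOUT_FILENO)      // keep a copy of the real stdout
    let devNull = open("/dev/null", O_WRONLY)
    dup2(devNull, STDOUT_FILENO)        // point fd 1 at /dev/null
    close(devNull)
    body()
    fflush(stdout)                      // flush body's output into /dev/null
    dup2(saved, STDOUT_FILENO)          // restore the real stdout
    close(saved)
}

The warm-up loop would then call silenced { operation() } instead of operation() directly, so only the timed iterations print.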