Benchmark stable release


We're happy to announce that Benchmark has now shipped with what we believe is a stable API (following semantic versioning), with significantly improved capabilities compared to the initial release last fall (thanks in large part to the new underlying Histogram foundation).

Shiny new documentation is available outlining both setup and how to write benchmarks, largely thanks to @Joseph_Heck and the DocC team (and SPI for hosting!).

There is also sample code available on GitHub.

For those with existing benchmarks, there are a few (mostly search-and-replace) changes to the API between 0.9.0 and 1.2.0, which is the latest release - please see the release notes for migration guidance (or check the changes to the sample code project above).

Thanks to the use of a build tool plugin, the boilerplate needed is now trivial:

import Benchmark

let benchmarks = {
    Benchmark("MyBenchmark") { benchmark in
        // Something to measure here
    }

    Benchmark("MyOtherBenchmark") { benchmark in
        // Something to measure here
    }
}
Also new is the ability to run benchmarks on a platform that doesn't have jemalloc available, by setting an environment variable that disables the jemalloc requirement:

> BENCHMARK_DISABLE_JEMALLOC=true swift package benchmark

Each benchmark is now run completely isolated, so resident memory statistics etc. should be consistent across runs.

It's also possible to use --filter, --skip, --target and --skip-target with regexes to choose which benchmarks should be run.
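For example (benchmark and target names here are hypothetical, and the exact regex syntax accepted is whatever the plugin's help text documents):

```shell
# Run only benchmarks whose names match the pattern
swift package benchmark --filter ".*Parsing.*"

# Run a specific target, skipping benchmarks matching a pattern
swift package benchmark --target MyBenchmarks --skip ".*Slow.*"
```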

Multiple output formats are now supported, including raw percentile data as well as data suitable for visualisation with online tools such as JMH and HDR Histogram plots, or even piping the output to e.g. YouPlot.
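As a sketch, selecting an output format looks something like this (the available format names are listed in the plugin's help output; `jmh` is assumed here based on the JMH visualisation support mentioned above):

```shell
# Export results in a JMH-compatible format for visualisation
swift package benchmark --format jmh
```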

There's also support for grouping output per metric, as well as for delta comparisons.

There is also extensive support for automating benchmarks in CI, where two branches (e.g. main and a PR) can be compared, with custom deviation thresholds (absolute and/or relative) configurable per metric supported by Benchmark.
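A typical CI comparison flow, using the plugin's baseline subcommands (branch names here are illustrative), would look roughly like:

```shell
# Record a baseline on the main branch
git checkout main
swift package benchmark baseline update main

# Switch to the PR branch and compare against the recorded baseline
git checkout my-feature-branch
swift package benchmark baseline compare main
```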

Additionally, it's now possible to check benchmark results against an absolute set of values (useful for reducing the build matrix when the number of toolchain/OS/repo permutations is large), although for most projects automated PR-vs-main checks would be recommended.
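Per-metric thresholds are configured on the benchmark itself; a minimal sketch (the exact initializer shape is an assumption based on the documentation - here allowing up to 5% relative deviation at p90 for wall clock time) might look like:

```swift
import Benchmark

let benchmarks = {
    Benchmark("MyBenchmark",
              configuration: .init(
                  metrics: [.wallClock, .mallocCountTotal],
                  // Assumed threshold API: 5.0 means 5% allowed relative deviation at p90
                  thresholds: [.wallClock: .init(relative: [.p90: 5.0])])) { benchmark in
        // Something to measure here
    }
}
```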

Many thanks to everyone who has provided feedback - please file issues / PRs (or DM me here if you want to give feedback offline).




We just added a significant new feature with 1.3.0 - the ability to track ARC traffic (thanks to those who helped make it happen!).

These metrics are very useful for a few interesting use cases:

  • Zero-ARC-traffic regression checks (code that should never emit any ARC traffic can now easily be tested in CI, either for zero ARC traffic or for an expected X...)
  • Optimization of ARC traffic (quick iterations without having to run Instruments, also on Linux)
  • Checking that the retain/release delta is zero (for most benchmarks that would be the expected result; if you have a retain cycle it will quickly add up, so good to check in CI too)
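A zero-ARC-traffic check from the list above can be sketched like this (the metric names `.retainCount` and `.releaseCount` are assumed from the ARC tracking feature; the benchmark name is hypothetical):

```swift
import Benchmark

let benchmarks = {
    // Opt in to the ARC metrics so CI can flag any regression from zero
    Benchmark("NoARCTraffic",
              configuration: .init(metrics: [.retainCount, .releaseCount])) { benchmark in
        // Code that should not emit any retains/releases goes here
    }
}
```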

We find this a great complement to the memory allocation stats already in place - hope it'll be useful!


Thanks a lot for adding the ARC tracking @hassila, this is looking very good :partying_face:

I'm actually wondering if you'd like a category under Related Projects - Swift Forums for the project, rather than re-using the same "initial announcement" thread? I think benchmarking is an important topic and the project is shaping up very well, so I'd be happy to assist in getting that set up :slight_smile:


Would be super, thanks!


It's already helped find one funny thing.



Added a convenience benchmark creation command in the spirit of swift package init, which creates the initial boilerplate and adds the target for you (the dependency still must be added manually).

So getting started is faster than ever:

  1. Add a dependency on package-benchmark
  2. Run swift package --allow-writing-to-package-directory benchmark init MyNewBenchmark
  3. Make any changes to the boilerplate so you measure the right stuff
  4. Run it with swift package benchmark
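Step 1 above amounts to adding something like the following to your Package.swift (the package name and version shown are illustrative; the init command from step 2 adds the benchmark target itself):

```swift
// swift-tools-version: 5.7
import PackageDescription

let package = Package(
    name: "MyPackage",
    dependencies: [
        // Version is illustrative - pick the latest release
        .package(url: "https://github.com/ordo-one/package-benchmark", from: "1.3.0"),
    ]
)
```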