Benchmark package initial release

I'm happy to announce that the first release of the Benchmark SwiftPM command plugin is available. It supports both macOS and Linux (tested on Ubuntu) with Swift 5.6 or later.

There's a sample project showing various uses of the Benchmark API, along with ready-to-use GitHub CI workflow samples (it is highly recommended to use dedicated GitHub runners for any real benchmarking in a CI pipeline).

There's also good support for local developer workflows with comparisons against various baselines.

It's intended to be suitable both for smaller ad-hoc benchmarks that primarily care about runtime, and for more extensive benchmarks that care about additional metrics such as memory allocations, syscalls, and thread usage; custom metrics are also supported.

A sample benchmark can simply be:

import BenchmarkSupport                 // import supporting infrastructure
@main extension BenchmarkRunner {}      // required for main() definition to avoid linker errors

@_dynamicReplacement(for: registerBenchmarks) // register benchmarks
func benchmarks() {

    Benchmark("Small benchmark", throughputScalingFactor: .mega) { benchmark in
        for x in 0..<benchmark.throughputScalingFactor.rawValue {
            blackHole(x) // keep the loop from being optimized away
        }
    }
}
and give output similar to:

It also provides support for delta comparisons:

Looking forward to people trying it out; any feedback and PRs are very much welcome.




wow, great work Joakim! i will definitely be integrating this into my libraries when i get a chance!


This is very cool. I love that it can track mallocs too!


Thanks! Yeah, malloc tracking is very important for us - it does require you to install jemalloc to facilitate that, but I thought it was a reasonable tradeoff. There are tons of memory allocator stats available; it'd be possible to surface additional ones if any make sense in the future (basically anything in the "mallctl" namespace from jemalloc).


Very nice work! Super happy that you followed through on the promise for the plugin :tada:

This looks like a great foundation we can keep building on...

Quick skim feedback:

  • I like the deltas, nice work!
  • I'm hopeful for more output formats in the future; it would be nice if we could get JMH-compatible output (because nice visualizers exist for those)
  • the way to declare benchmarks is a bit weird, have you considered something more along the lines of what multi-node tests do in distributed actors?
    • inherit from a protocol that makes it easier to find all the customization points (func configure...)
    • then have let benchmarkSomething declarations which are discovered, rather than building them inside a function func benchmarks() { (the dynamic replacement is pretty weird to be honest; a "let decl as test-case/benchmark" is more natural), you can look at the plugin below to find out how to discover those.
    • i.e. swift-distributed-actors/MultiNode+ReceptionistTests.swift at main · apple/swift-distributed-actors · GitHub
    • alternatively, perhaps a let benchmarks = .make { ... } or something if you still want the result builder?
  • should warmup be warmupIterations and not just a boolean?
  • love the allocation tracking integration, didn't get to check for correctness yet but that's awesome to have <3
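The let-declaration style suggested above could look roughly like this. Everything in this sketch is hypothetical: a minimal stand-in Benchmark type and a reflection-based discovery function, not the package's actual API.

```swift
// Hypothetical sketch only: a minimal stand-in Benchmark type and a
// protocol-based suite whose benchmarks are stored `let` declarations.
struct Benchmark {
    let name: String
    let body: () -> Void

    init(_ name: String, body: @escaping () -> Void) {
        self.name = name
        self.body = body
    }
}

protocol BenchmarkSuite {
    // customization points (e.g. a configure() requirement) could live here
}

struct DateBenchmarks: BenchmarkSuite {
    // discovered as a stored property rather than registered inside a function
    let creation = Benchmark("Date creation") { }
}

// One way a runner could discover the `let` declarations, via reflection:
func discoverBenchmarks<S: BenchmarkSuite>(in suite: S) -> [Benchmark] {
    Mirror(reflecting: suite).children.compactMap { $0.value as? Benchmark }
}
```

This keeps the customization points visible on the protocol while the benchmarks themselves read as plain declarations.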

Overall looking great and I can't wait to put it to good use :+1:


Only had a short look at JMH; it looks feasible and could be a future addition - except for the "raw data" part - but maybe the visualisation tool can work without that. I don't keep the raw samples around, as memory consumption would be prohibitive for many of our tests, so the implementation does linear bucketing (10K buckets, for whatever units are in use) and falls back on power-of-two buckets for anything outside that range. (Opened Investigate supporting JMH format for output · Issue #4 · ordo-one/package-benchmark · GitHub for that.)

Yeah, I spent (way too much) time with different approaches; I don't remember all the details, but I ran into a few problems with Swift Argument Parser integration and the protocol approach that I couldn't get working. May revisit later, but I couldn't spend more time there at the moment (as the current declaration is similar to e.g. Google Benchmark, I'd expect it to be fairly acceptable). It's not truly a result builder, but just faking it with a discardable-result init. The good thing is that no special hooks are needed for shared setup; it can just be done outside the benchmarks themselves and only run once.
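The discardable-result-init pattern described here can be illustrated with a small self-contained sketch (a stand-in type, not the package's actual implementation): the initializer's side effect does the registration, so a bare Benchmark(...) { ... } statement is enough.

```swift
// Minimal sketch of registration via a discardable-result initializer.
final class Benchmark {
    static var registered: [Benchmark] = []

    let name: String
    let body: (Benchmark) -> Void

    @discardableResult
    init(_ name: String, _ body: @escaping (Benchmark) -> Void) {
        self.name = name
        self.body = body
        Benchmark.registered.append(self) // side effect performs registration
    }
}

// Reads like a result builder at the use site, but is just an init call
// whose return value may be discarded:
Benchmark("Example") { _ in }
```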

Currently it runs 3 iterations if you want warmup - I don't particularly mind making it configurable, but I'm actually curious whether there are any reasons to run more than a few iterations (basically to fill any caches) when using a 'proper' compiled language?

Thanks! Let me know if you see any incorrect behaviour (the repo is open for filing issues) - it took some digging through the jemalloc stats until I got what I believe are the correct counters to reflect what one expects.


For anyone playing with it, you might want to update to 0.3.3, just shipped, which fixes a few niggles (and allows free naming of benchmark targets as long as they are under Benchmarks).


I've been using it a little and have some more feedback, though sadly can't open issues on the tracker directly right now.

I find it very confusing that each benchmark independently decides what unit to report results in. It's a pain in the neck to have to manually remember and multiply between ns, us, and ms because the benchmark infra reports one test as ns and another as us when I want to compare them. Sure, one result is far larger than the other, but I really don't want to do additional math on results: they should be plain as day, n > m, with no other math I need to apply to compare two results.

Can we add some kind of "resultsTimeUnit" for an entire benchmark suite run?

More feedback coming soon, I like the library but have some small things that annoy during normal usage here and there. Overall great start though, very happy about the effort. :+1:

Issues should be open for creating - or is there a reason you can't open issues?

But specifying time units for a test is already possible, see e.g. this sample:


    Benchmark("Force milliseconds time unit",
              timeUnits: .milliseconds) { benchmark in
        // benchmark workload here
    }

Or am I missing something?


Or re-reading, you'd like a convenience to set it for multiple benchmarks in one go?


Thanks, I don't think I noticed this.

Yes, for an entire suite; it's annoying to have to remember to set it on every benchmark I'm adding.

Minor bug as well: if you register two benchmarks with the same name, one of them is silently swallowed and never runs -- this can happen when you duplicate benchmarks and just tweak them a bit -- instead this should result in a crash please :slight_smile:
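The requested behaviour could be as simple as a precondition at registration time; a hypothetical sketch follows (the package's actual registration code may look nothing like this):

```swift
// Sketch: trap on duplicate benchmark names instead of silently replacing.
var registeredBenchmarks: [String: () -> Void] = [:]

func register(_ name: String, _ body: @escaping () -> Void) {
    precondition(registeredBenchmarks[name] == nil,
                 "Duplicate benchmark name: \(name)")
    registeredBenchmarks[name] = body
}

register("Foundation Date()") { }
// register("Foundation Date()") { } // would now crash with a clear message
```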


Have a spin of 0.3.4, which addresses both issues:

@_dynamicReplacement(for: registerBenchmarks)
func benchmarks() {

    Benchmark.defaultBenchmarkTimeUnits = .nanoseconds // make all benchmarks in the suite use these

    Benchmark("Foundation Date()") { _ in
        blackHole(Date())
    }
}

Nice, thanks. Small little improvements adding up :+1:

I look forward to polishing up this plugin :+1:


Some more polish (and a source-breaking type change: warmups -> warmupIterations):

Major cleanup: it's now possible to set all parameters as defaults for a whole benchmark suite; also cleaned up the delta output so it's easier to understand which thresholds were broken for a PR.


And 0.4.1/0.4.2 were released - if you are measuring very short time periods and don’t measure malloc or OS metrics, you’ll want to update, as the overhead of measuring has been significantly reduced (this wouldn’t impact the quality of measurements, but would impact the real-world time spent waiting for a benchmark run in such setups).

Also fixed a bug for absolute thresholds where the units would be the same as measured (there are now instead concrete helpers for defining them, e.g. .mega(3) or .milliseconds(5)).
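Presumably helpers like these just expand a count in the given unit into the metric's base unit; a rough guess at the semantics (assumed, not taken from the package):

```swift
// Assumed semantics: mega(3) → 3_000_000 of whatever the metric counts,
// milliseconds(5) → 5_000_000 nanoseconds.
func mega(_ count: Int) -> Int { count * 1_000_000 }
func milliseconds(_ ms: Int) -> Int { ms * 1_000_000 } // expressed in ns
```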

Release notes for both are at:


hi Joakim, i’m having trouble compiling the plugin on Amazon Linux 2.

there were no instructions for installing libjemalloc-dev on Amazon Linux, so i installed it with the following:

$ sudo amazon-linux-extras install -y epel
$ sudo yum install -y jemalloc-devel

which installed

  jemalloc-devel.x86_64 0:3.6.0-1.el7                                                                                                   

Dependency Installed:
  jemalloc.x86_64 0:3.6.0-1.el7         

when i try to compile the plugin i get:

.build/checkouts/package-benchmark/Sources/BenchmarkSupport/MallocStats/MallocStatsProducer+jemalloc.swift:109:59: error: cannot find 'MALLCTL_ARENAS_ALL' in scope
            let result = mallctlnametomib("stats.arenas.\(MALLCTL_ARENAS_ALL).small.nrequests",
.build/checkouts/package-benchmark/Sources/BenchmarkSupport/MallocStats/MallocStatsProducer+jemalloc.swift:119:59: error: cannot find 'MALLCTL_ARENAS_ALL' in scope
            let result = mallctlnametomib("stats.arenas.\(MALLCTL_ARENAS_ALL).large.nrequests",
[12/15] Emitting module BenchmarkSupport

Hmm, not at a computer right now, but I checked our CI and we seem to have a 5.x.x version of jemalloc, so I would guess that symbol is missing in 3.x. Any way to get a newer version installed?

Haven’t tried with anything except Ubuntu unfortunately yet (and macOS of course).

Might also be an issue with not finding the jemalloc header file, but then SwiftPM should have given you a warning IIRC.

A quick Google search gave one hint that a source install might be needed (with steps):


building from source worked for me, here is what i put in my dockerfile if it helps anyone:

RUN sudo yum -y install bzip2 make
RUN curl -L -o jemalloc-5.3.0.tar.bz2 https://github.com/jemalloc/jemalloc/releases/download/5.3.0/jemalloc-5.3.0.tar.bz2
RUN tar -xf jemalloc-5.3.0.tar.bz2
RUN cd jemalloc-5.3.0 && ./configure && make && sudo make install

after all the usual swift toolchain dependencies.

make install puts the libraries in /usr/local/lib, which the plugin can’t find by default, so you also have to run:

$ sudo ldconfig /usr/local/lib