0.4.1/0.4.2 have been released. If you measure very short time periods and don't measure malloc or OS metrics, you'll want to update, as the measurement overhead has been significantly reduced (this doesn't affect the quality of the measurements, but it does reduce the real-world time spent waiting for benchmark runs in such setups).
Also fixed a bug for absolute thresholds where the units were assumed to be the same as the measured metric's; there are now concrete helpers for defining them instead, e.g. .mega(3) or .milliseconds(5).
```
.build/checkouts/package-benchmark/Sources/BenchmarkSupport/MallocStats/MallocStatsProducer+jemalloc.swift:109:59: error: cannot find 'MALLCTL_ARENAS_ALL' in scope
let result = mallctlnametomib("stats.arenas.\(MALLCTL_ARENAS_ALL).small.nrequests",
.build/checkouts/package-benchmark/Sources/BenchmarkSupport/MallocStats/MallocStatsProducer+jemalloc.swift:119:59: error: cannot find 'MALLCTL_ARENAS_ALL' in scope
let result = mallctlnametomib("stats.arenas.\(MALLCTL_ARENAS_ALL).large.nrequests",
[12/15] Emitting module BenchmarkSupport
```
Building jemalloc from source worked for me; here is what I put in my Dockerfile, in case it helps anyone:
```
RUN sudo yum -y install bzip2 make
RUN curl https://github.com/jemalloc/jemalloc/releases/download/5.3.0/jemalloc-5.3.0.tar.bz2 -L -o jemalloc-5.3.0.tar.bz2
RUN tar -xf jemalloc-5.3.0.tar.bz2
RUN cd jemalloc-5.3.0 && ./configure && make && sudo make install
```
after all the usual swift toolchain dependencies.
make install puts the libraries in /usr/local/lib, which the plugin can't find, so you also have to do:
Had a few questions come up while taking a look through the plugin:
google/swift-benchmark had an option where it would try to obtain a statistically meaningful performance value if no number of iterations was provided. Is that something that could be interesting here?
Would adding a standard deviation be interesting? (I only mention it since google/swift-benchmark originally displayed that as well.)
Could benchmarks be configured to live in folders other than Benchmarks as well, i.e. in a per-target folder for example? I have a large project where a lot of modules are grouped together with custom paths for most targets; it would be very convenient to put benchmarks into these groupings too.
I’ve never seen much use for ‘auto iterations’ in practice, as we’d usually tune it with a combination of runtime and number of iterations, which also gives more comparable test runs. But maybe I’m missing something and would be happy to be convinced otherwise.
With regard to SD, I actually think it was an error to include it in Google benchmark in the first place. See e.g. “React San Francisco 2014: Gil Tene - Understanding Latency” on YouTube (or many other good talks from Gil Tene), around the 30-minute mark: performance measurements aren’t normally distributed in practice, so it’s not a good model to fit them to, IMHO.
It’d make sense to support more flexibility for benchmark placement for more complex project layouts, maybe an optional prefix for the executable targets could be one way to do it (there’s no way to mark up targets with metadata as far as I know that we could use) - PR:s are welcome!
Yes, I’ve found performance tends to be multi-modal (as they mention in the video), and this defies easy summarization with a statistic like standard deviation. In my opinion you really have to view the histogram to read these sorts of measurements. Like:
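The multi-modal point can be shown with a toy example (plain Swift, no benchmark API involved, and the sample values are made up for illustration): two latency clusters whose mean and standard deviation describe a value that never actually occurs, while even a crude histogram shows both modes immediately.

```swift
// Synthetic bimodal latencies: a fast path around 1 ms (90% of samples)
// and a slow path around 10 ms (10% of samples).
let samples: [Double] = Array(repeating: 1.0, count: 90)
                      + Array(repeating: 10.0, count: 10)

let mean = samples.reduce(0, +) / Double(samples.count)
let variance = samples.map { ($0 - mean) * ($0 - mean) }.reduce(0, +)
             / Double(samples.count)
let stddev = variance.squareRoot()
// mean == 1.9 ms, stddev == 2.7 ms: "1.9 ± 2.7 ms" suggests a spread of
// values around 1.9 ms, yet no sample is anywhere near 1.9 ms.

// A simple bucketed histogram makes the two modes obvious:
var buckets: [Int: Int] = [:]
for s in samples { buckets[Int(s), default: 0] += 1 }
for (bucket, count) in buckets.sorted(by: { $0.key < $1.key }) {
    print("\(bucket) ms: \(String(repeating: "#", count: count / 5))")
}
```

This is also why percentile-based reporting (p50/p90/p99) tends to be more honest for latency data than mean and standard deviation.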
That's exactly the response I was hoping for, haha. Thanks! I don't have much experience beyond basic performance testing, so that's good to know; thanks for sharing.
I'll try to give a better example of a setup once I know how I'd ideally incorporate it into our project structure, thanks! Then we might be able to get to a PR at some point :)
Thanks for the response.
We haven’t, although the intention was to make it possible by pulling the data out of the JSON export format into some external system like Grafana if you want to plot performance over time.
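As a sketch of what such an export pipeline could look like: decode the exported results and emit one metric line per benchmark for a time-series store. Note that the record shape here (name, metric, p50/p90/p99 fields) is an assumption for illustration, not the plugin's actual JSON schema.

```swift
import Foundation

// Hypothetical shape of an exported benchmark result; the real JSON
// produced by package-benchmark's export may differ.
struct BenchmarkRecord: Codable {
    let name: String
    let metric: String
    let p50: Double
    let p90: Double
    let p99: Double
}

let json = """
[{"name": "parsePackets", "metric": "wallClock",
  "p50": 120.0, "p90": 145.0, "p99": 210.0}]
"""

// try! for brevity in this sketch; handle decoding errors properly in practice.
let records = try! JSONDecoder().decode([BenchmarkRecord].self,
                                        from: Data(json.utf8))
for r in records {
    // In a real pipeline this line would be shipped to e.g. Graphite or
    // Prometheus so Grafana can plot the trend over time.
    print("benchmark.\(r.name).\(r.metric).p99 \(r.p99)")
}
```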
Our primary use cases are a) validating PR performance against main to avoid merging in regressions, and b) providing a convenient workflow for engineers to improve key metrics, such as mallocs, memory footprint, and context switches, against baselines when actively working on performance.
Is there a convenient way to run a single benchmark? It would be cool if we could somehow run benchmarks the same way as tests (buttons to run individual ones in Xcode). Not sure if that's possible, however.
It doesn’t work, as XCTest crashes with the jemalloc that is used for the malloc counters. That is only a problem with the proprietary XCTest on macOS; the open source one on Linux works fine. I have a feedback open with Apple that was closed because they thought it was a problem with jemalloc, but I had jemalloc engineers debug it, and it seems XCTest on macOS passes jemalloc a pointer that was not allocated with jemalloc, so that would need to be fixed first.
It also doesn’t build optimized for xctest targets as far as I understand? But maybe that is possible somehow.
There are probably some other issues with regard to integration with e.g. Swift Argument Parser (we need to be able to run from the command line on Linux too), but that might be possible to split out, perhaps.