I have run Benchmark_O with --num-iters=100 on my machine for the
whole performance test suite, to get a feeling for the distribution of
benchmark samples, because I also want to move the Benchmark_Driver to
use MEAN instead of MIN in the analysis.
I'm concerned about that, especially for microbenchmarks; it seems to me
as though MIN is the right measurement. Can you explain why MEAN is
Using MEAN wasn’t part of the aforementioned SR-4669. The purpose of that
task is to reduce the time CI takes to get useful results (e.g. by using 3
runs as a baseline). MEAN isn’t useful if you’re only gathering 3 data points.
The current approach to detecting performance changes is fragile for tests
that have a very low absolute runtime, because they easily exceed the 5%
improvement/regression threshold when the test machine gets a little bit
noisy. For example, in the benchmark on PR #9806 ("[stdlib] String index
interchange, etc." by dabrahams, apple/swift):
TEST                    OLD   NEW   DELTA    SPEEDUP
BitCount                 12    14   +16.7%   0.86x
SuffixCountableRange     10    11   +10.0%   0.91x
MapReduce               303   331    +9.2%   0.92x
These are all false changes (and there are quite a few more there).
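A quick Python sketch (Python because Benchmark_Driver is written in it) makes the arithmetic concrete. It recomputes DELTA as (NEW - OLD) / OLD and SPEEDUP as OLD / NEW, which reproduces the numbers above; the helper name `compare` and the `THRESHOLD` constant are illustrative assumptions, not code taken from the driver.

```python
# Hypothetical sketch: recompute DELTA and SPEEDUP from OLD and NEW
# and apply a fixed 5% improvement/regression threshold.
THRESHOLD = 0.05

def compare(old, new, threshold=THRESHOLD):
    """Return (delta, speedup, flagged) for one benchmark."""
    delta = (new - old) / old   # relative change, e.g. +0.167 for 12 -> 14
    speedup = old / new         # > 1.0 means NEW is faster
    return delta, speedup, abs(delta) > threshold

results = [("BitCount", 12, 14),
           ("SuffixCountableRange", 10, 11),
           ("MapReduce", 303, 331)]

for name, old, new in results:
    delta, speedup, flagged = compare(old, new)
    print(f"{name:<22}{old:>5}{new:>5}{delta:+8.1%}{speedup:7.2f}x"
          f"  {'flagged' if flagged else ''}")
```

With OLD values of 10 to 12, a change of a single unit is already 8% to 10%, so the smallest possible jitter crosses the 5% threshold, whereas MapReduce needs a shift of more than 15 units to be flagged.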
To partially address this issue (I'm guessing), the last column, SPEEDUP,
sometimes features a mysterious question mark in brackets. It is emitted
when the new MIN falls inside the (MIN..MAX) range of the OLD baseline. It
is not checked the other way around.
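The `(?)` rule as described can be sketched like this. The `Stats` tuple and both function names are hypothetical, and treating (MIN..MAX) as an open interval is my assumption; the second function adds the missing check in the other direction.

```python
from typing import NamedTuple

class Stats(NamedTuple):
    """MIN and MAX of one benchmark's samples (hypothetical container)."""
    min: float
    max: float

def is_dubious_one_sided(old: Stats, new: Stats) -> bool:
    # As described above: only NEW's MIN is tested against OLD's open range.
    return old.min < new.min < old.max

def is_dubious_symmetric(old: Stats, new: Stats) -> bool:
    # Also test the other way around: OLD's MIN inside NEW's open range.
    return is_dubious_one_sided(old, new) or new.min < old.min < new.max

old, new = Stats(10, 11), Stats(9, 30)
print(is_dubious_one_sided(old, new), is_dubious_symmetric(old, new))
# prints: False True
```

Here a tightly clustered OLD series (10..11) compared with a noisy NEW series (9..30) looks like a clean improvement to the one-sided check, because the new MIN of 9 lies below the old range; the symmetric check marks it as dubious because the old MIN of 10 falls inside the new range.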
I'm suggesting that we use the MEAN value in combination with SD (standard
deviation) to detect changes (improvements/regressions). At the moment this
is hard to do, because the aggregate test results reported by Benchmark_O
(and co.) can include anomalous results in the sample population that skew
both the MEAN and the SD. Currently this is visible only as a large sample
range, i.e. the difference between the reported MIN and MAX. But it is not
clear how many results are anomalous.
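To illustrate why anomalous samples get in the way, here is a sketch using Python's `statistics` module. The decision rule in `is_significant_change` (the means must differ by more than `k` times the larger SD, with `k=3.0`) is a hypothetical choice for illustration, since no particular rule is fixed above, and the sample values are made up.

```python
import statistics

def summarize(samples):
    """Return (MEAN, SD) of a series of benchmark samples."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_significant_change(old, new, k=3.0):
    # Hypothetical rule: a change counts only if the means differ by more
    # than k standard deviations of the noisier of the two series.
    mean_old, sd_old = summarize(old)
    mean_new, sd_new = summarize(new)
    return abs(mean_new - mean_old) > k * max(sd_old, sd_new)

# Made-up samples: a clean baseline, and the same run with one anomaly.
clean = [12, 12, 13, 12, 12, 13, 12, 12]
anomalous = clean[:-1] + [120]
slower = [s + 2 for s in clean]   # a genuine regression of about 16%

print(summarize(clean))      # MEAN 12.25, SD about 0.46
print(summarize(anomalous))  # MEAN 25.75, SD about 38
print(is_significant_change(clean, slower))      # True
print(is_significant_change(anomalous, slower))  # False: the outlier masks it
```

A single anomalous sample inflates the SD by roughly two orders of magnitude and even pushes the MEAN above that of the genuinely slower run, which is why the samples need filtering before MEAN and SD become usable.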
I'm currently working on an improved sample filtering algorithm. Stay tuned
for a demonstration in Benchmark_Driver (Python); if it pans out, it might
be time to change the adaptive sampling in DriverUtil.swift.
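The filtering algorithm itself isn't described here, so as a stand-in, here is a common textbook technique rather than the one being developed: Tukey's fences, which drop samples lying more than `k` interquartile ranges outside the first and third quartiles. The function name and `k=1.5` are assumptions; `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics

def filter_outliers(samples, k=1.5):
    """Keep samples within Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Stand-in technique for illustration, not the algorithm from the thread.
    """
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if low <= s <= high]

samples = [12, 12, 13, 12, 12, 13, 12, 120]   # made-up series with one anomaly
kept = filter_outliers(samples)
print(kept)                   # [12, 12, 13, 12, 12, 13, 12]
print(statistics.mean(kept), statistics.stdev(kept))
```

One caveat: benchmark timings are often quantized to whole units, so the IQR can collapse to zero; the fences then reject every sample that differs from the quartiles, and a practical filter would need to guard against that.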
On Tue, May 16, 2017 at 9:10 PM, Dave Abrahams via swift-dev < firstname.lastname@example.org> wrote:
on Thu May 11 2017, Pavol Vaskovic <swift-dev-AT-swift.org> wrote:
On Wed, May 17, 2017 at 1:26 AM, Andrew Trick <email@example.com> wrote: