I'd like to share some exciting news about benchmarking:
We've made some significant improvements to running the benchmarks in pull requests:
- It's now a lot faster: down from 2 h to 30 min (including the compiler build time)
- Reduced noise: almost no false alarms anymore
- Code size differences are now reported, both for the benchmark object files and for the Swift standard library files
- Some improvements to the report table format. For example, improvements are no longer folded by default but are shown in the same table as regressions (we should be proud of improvements and not hide them!)
Currently the new features can be tested with "@swift-ci smoke benchmark staging", and they will go live under "@swift-ci smoke benchmark" soon.
You can look at a test PR to see some sample output: [Do not merge] Test benchmark runs by eeckstein · Pull Request #18876 · apple/swift · GitHub
Now what about the non-smoke "@swift-ci benchmark"? Currently the only difference between smoke and non-smoke is the number of iterations. But as the new method reduces noise anyway, I'm actually thinking of making "smoke benchmark" the default, i.e. just having "@swift-ci benchmark", which does the new thing.
We hope that this is much more usable than before and will enable everyone to run the benchmarks for every (non-trivial) pull request.
If everything goes well, we are eventually planning to run the benchmarks by default for every "@swift-ci test" run. Because it's so fast now, it will not add any time overhead.
If you have any comments or questions, please let me know.
Erik
PS: Credit goes to @palimondo, who initiated this effort in "Towards Robust Performance Measurement".