Towards Robust Performance Measurement

Actually, the biggest problem with "noisy" benchmarks was the code alignment problem, which is fixed now (see "IRGen, benchmarks: add an option -align-module-to-page-size for benchmarking", Pull Request #18318 by eeckstein in apple/swift).

I suggest we see how the new benchmarking method goes (Improved benchmarking for pull requests). If there are still noisy benchmarks because their iteration time is too long, let's fix those benchmarks.

Mass-renaming or changing benchmarks creates a lot of burden for our performance tracking, and I don't see how we can spend time working on this.