Baseline and thresholds commands difference

Kyle-Ye · October 13, 2024, 11:27am

I'm a little confused about how the baseline and thresholds commands are meant to be used.

For example, I've created some predefined thresholds here on Thresholds/DemoKit/1.0/DemoBenchmark.A.example1.p90.json

{
  "cpuTotal" : 409599,
  "instructions" : 6467583,
  "mallocCountTotal" : 4002,
  "peakMemoryResident" : 1680867327,
  "throughput" : 2677
}

And I can also update/create it using swift package --allow-writing-to-package-directory benchmark thresholds update --target DemoBenchmark --path ./Thresholds/DemoKit/1.0

After that, I can read it using the following command.

swift package benchmark thresholds read --path ./Thresholds/DemoKit/1.0

But I can't read it using either of the following commands.

swift package benchmark baseline read --path ./Thresholds/DemoKit/1.0
swift package benchmark baseline read --check-absolute-path ./Thresholds/DemoKit/1.0

However, I can check them with both of the following commands.

swift package benchmark baseline check --target ProtobufKitBenchmarks --check-absolute-path ./Thresholds/DemoKit/1.0
swift package benchmark thresholds check --target ProtobufKitBenchmarks --path ./Thresholds/DemoKit/1.0

Is there any difference between baseline check and thresholds check here?
And how can I use baseline read in this case?

hassila · October 14, 2024, 6:30am

Hi Kyle,

Basically the new thresholds support formalizes the support for static thresholds, effectively replacing --check-absolute-path - see some more details here in the release notes (it has been removed from the latest documentation, but still works for backwards compatibility reasons):

I just checked, and for some reason up-to-date documentation has not been generated by SPI; please build it in Xcode and check the following updated documentation:

You can't use the baseline functionality to read static thresholds; let me try to clarify:

Baselines are the named, stored results of a benchmark run, allowing comparisons between different versions of the code base (e.g. a PR vs. main, or for continuous local development/optimization). The thresholds defined in the code are really threshold tolerances: they specify what deviation, in absolute and relative terms, is allowed for a given benchmark metric between two baselines when using baseline check. The naming is a bit unfortunate, but it can't be changed now without breaking people's code. For a baseline check operation that specifies only a single baseline, the benchmarks will be run and you effectively compare the single named baseline with a "current run". NB: the baseline format is an internal implementation detail and should not be checked into a repo.

The new thresholds command, on the other hand, allows storing the results of a benchmark as a static threshold level that can be checked into your repo. Either a current run or a named baseline can then be compared to this static threshold (with the same threshold tolerances applied as defined by the benchmark code). So reading the thresholds is only supported by the new thresholds subcommand; fundamentally, --check-absolute-path should not be used. The naming of the code-defined thresholds is unfortunate, but try to think of them as "threshold tolerances" and it will be clearer.
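To make the static-threshold workflow concrete, here is a minimal sketch that wraps the three thresholds subcommands quoted earlier in this thread. The function names and the two variables are hypothetical and only for illustration; the path and target are taken from the question (the check commands in the question use a different target name, so adjust as needed).

```shell
# Assumed values from the question above; adjust to your own package.
THRESHOLDS_PATH="./Thresholds/DemoKit/1.0"
BENCHMARK_TARGET="DemoBenchmark"

# Run the benchmarks and store the results as static thresholds
# (JSON files under THRESHOLDS_PATH that can be committed to the repo).
update_thresholds() {
  swift package --allow-writing-to-package-directory benchmark thresholds update \
    --target "$BENCHMARK_TARGET" --path "$THRESHOLDS_PATH"
}

# Print the stored static thresholds.
read_thresholds() {
  swift package benchmark thresholds read --path "$THRESHOLDS_PATH"
}

# Run the benchmarks and compare the current run to the stored thresholds,
# applying the threshold tolerances defined in the benchmark code.
check_thresholds() {
  swift package benchmark thresholds check \
    --target "$BENCHMARK_TARGET" --path "$THRESHOLDS_PATH"
}
```

A typical use would presumably be to call update_thresholds once, commit the generated files, and then call check_thresholds in CI.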

If we do an API-breaking release sometime, we'd definitely want to clean that up (as well as likely moving to using macros similar to swift-testing), but that's not for now.

I will file a case with SPI to see why the docs weren't generated after release 125 (the main documentation is also stale) - in the meantime, please use the Xcode-generated docs by checking out package-benchmark and running "build documentation".

Hope that clarifies.

Kyle-Ye · October 20, 2024, 8:21am

Thanks for the clarification.

I now understand the difference between baselines and thresholds in SwiftBenchmark.

Another question: I have a use case where I want to compare my package's latest version against its v1.0 and v1.1, and also against another package's v3.0, v3.1, etc. (e.g. ProtobufKit vs swift-protobuf and OpenCombine vs Combine).

See Add Benchmark support by Kyle-Ye · Pull Request #4 · OpenSwiftUIProject/ProtobufKit · GitHub

Is there a suggested way of doing/modeling this? Currently I am using 2 different targets and manually comparing the results.

hassila · October 20, 2024, 3:35pm

Sure, that is straightforward - run the benchmarks with one release, save a baseline, check out the other release (switch branch/tag) and save another baseline, then just use baseline check and compare the two.
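The steps described here can be sketched as a small script. This is only an illustration: the function names, git refs, and baseline names are hypothetical, and it assumes the benchmark target exists in both releases and that baselines are saved with the baseline update subcommand.

```shell
# Check out a release and save its benchmark results as a named baseline.
# $1 = git tag or branch, $2 = baseline name (both chosen by you).
save_baseline() {
  git checkout "$1" &&
  swift package --allow-writing-to-package-directory benchmark baseline update "$2"
}

# Save one baseline per release, then check the second against the first.
# $1/$2 = ref and baseline name for the old release, $3/$4 = same for the new one.
compare_releases() {
  save_baseline "$1" "$2" &&
  save_baseline "$3" "$4" &&
  swift package benchmark baseline check "$2" "$4"
}

# Example invocation (hypothetical tags and names):
#   compare_releases 1.0.0 release_1_0 1.1.0 release_1_1
```

The threshold tolerances defined in the benchmark code decide how much deviation between the two baselines is accepted by the final check.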

Kyle-Ye · October 21, 2024, 2:20am

run the benchmarks with one release, save a baseline, check out the other release (switch branch/tag) and save another baseline, then just use baseline check and compare the two.

It seems this only works for benchmarks of the same package.

My use case also includes benchmarking 2 different packages that have a similar API. (See the example above.)

hassila · October 21, 2024, 5:00am

You can have the benchmark in a subdirectory with a separate Package.swift and pin the version (basically just don’t update it, just change branch for the parent directory) - having a separate benchmark subdirectory with a separate Package.swift is common, to avoid adding dependencies to the top-level project. Check out how swift-foundation does it for a good example.

For different projects with a similar API, perhaps the easiest option is to export to JMH - IIRC the JMH analyzer web page supports loading multiple result files for comparison - but you need to check; I'm on a mobile device right now and can’t verify.

hassila · October 21, 2024, 11:33am

Also, for multi-package comparisons it would perhaps be even better to completely break out the benchmark, depend on both packages you want to benchmark, and design benchmarks that are directly comparable - then you'll get output that is easily comparable with e.g. JMH, or even just simple metric grouping of output (swift package benchmark --grouping metric) for easy comparison.
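As a rough sketch of the two output options mentioned for a combined benchmark package: the --grouping metric flag is quoted from the post above, while the --format jmh invocation is an assumption about the export option and should be checked against the package-benchmark documentation. The function names are hypothetical.

```shell
# Print results grouped by metric, so directly comparable benchmarks
# (e.g. one per package under test) appear next to each other.
compare_by_metric() {
  swift package benchmark --grouping metric
}

# Export the results in JMH format for loading into a JMH visualizer.
# Assumption: the export is selected with --format jmh and needs write access.
export_jmh() {
  swift package --allow-writing-to-package-directory benchmark --format jmh
}
```

Both functions are meant to be run from the directory of the standalone benchmark package that depends on the packages being compared.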