Synthesized Equatable conformance for enum with associated value may have performance dependent on parameter order

I think static analysis and the optimizer are defeating your attempts to accurately benchmark.

The code that's synthesized to compare two enum values when some cases have associated values looks something like this:

switch (lhs, rhs) {
case let (.number(lhs_0, lhs_1), .number(rhs_0, rhs_1)):
  guard lhs_0 == rhs_0 else { return false }
  guard lhs_1 == rhs_1 else { return false }
  return true
case let (.atomic(lhs_0), .atomic(rhs_0)):
  guard lhs_0 == rhs_0 else { return false }
  return true
// ...
case (.prompt, .prompt): return true
case (.plus, .plus): return true
// ...
default: return false
}
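To make that shape concrete, here is a minimal, self-contained sketch of a hand-written comparison that mirrors it. The case names come from the snippet above; the enum name `Token`, the payload types, and the function name `isEqual` are assumptions made up for illustration, and this is not the compiler's actual output.

```swift
// Hypothetical enum: case names are taken from the snippet above,
// payload types are assumed for the sake of a runnable example.
enum Token {
  case number(Int, Int)
  case atomic(String)
  case prompt
  case plus
}

// Hand-written comparison with the same structure as the synthesized one:
// pair up the two values, compare payloads case by case, and fall
// through to `false` when the cases differ.
func isEqual(_ lhs: Token, _ rhs: Token) -> Bool {
  switch (lhs, rhs) {
  case let (.number(l0, l1), .number(r0, r1)):
    return l0 == r0 && l1 == r1
  case let (.atomic(l0), .atomic(r0)):
    return l0 == r0
  case (.prompt, .prompt), (.plus, .plus):
    return true
  default:
    return false
  }
}

let x = Token.number(1, 2)
print(isEqual(x, .plus))          // false
print(isEqual(.plus, x))          // false
print(isEqual(x, .number(1, 2)))  // true
```

Both argument orders go through the same switch, so any difference between them would have to come from how the optimizer lowers the tuple match.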

It's ugly, but since, IIRC, multi-payload enums partition their cases into those with associated values and those without, the optimizer seems to do a pretty good job of turning that into a two-level jump table based on the enum discriminator (godbolt link). So parameter order might affect whether you end up making one hop or two, but I think we're talking about a very small difference here.

But the results of your benchmark are suspicious, because minor tweaks to it change the results; for example, just changing var x into let x shortens the time of the first trial and lengthens the time of the second trial (both substantially). That makes me think other things are affecting it, such as mutability changing the way the closure captures the variable.
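As a small illustration of why var versus let can matter to a closure, the sketch below shows the semantic difference: a closure that captures a var sees later mutations, so the variable has to be captured by reference, while a let can simply be copied. The function and variable names are hypothetical, and this only demonstrates capture semantics, not what the optimizer actually did in your benchmark.

```swift
// Hypothetical demo of closure capture semantics for `var` vs `let`.
func captureDemo() -> (Int, Int) {
  var mutable = 1
  let readMutable = { mutable }    // captures the variable itself
  mutable = 2                      // the closure observes this change

  let constant = 1
  let readConstant = { constant }  // the value can never change

  return (readMutable(), readConstant())
}

let result = captureDemo()
print(result)  // (2, 1)
```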

I ran some trials using google/swift-benchmark (a Swift library to benchmark code snippets) and used some helper functions (thanks to @dabrahams) that make sure the optimizer doesn't undo what we're trying to measure. The code is in this gist.
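For readers who don't open the gist, the general idea behind such helpers can be sketched like this. These are not the helpers from the gist: `opaque` and `sink` are hypothetical names, the enum is an assumed stand-in, and whether `@inline(never)` alone is enough to stop the optimizer depends on the compiler version and build settings.

```swift
// Assumed stand-in enum; Equatable conformance is synthesized.
enum Token: Equatable {
  case number(Int, Int)
  case atomic(String)
  case prompt
  case plus
}

// Hypothetical helper: returns its argument unchanged. Because it is not
// inlined, the caller is less likely to constant-fold the comparison.
@inline(never)
func opaque<T>(_ value: T) -> T { value }

// Hypothetical helper: accepts a result so the work that produced it
// is less likely to be discarded as dead code.
@inline(never)
func sink<T>(_ value: T) {}

let x = Token.number(1, 2)
var matches = 0
for _ in 0..<1_000 {
  // Both argument orders go through the opaque helper.
  if opaque(x) == opaque(Token.plus) { matches += 1 }
  if opaque(Token.plus) == opaque(x) { matches += 1 }
}
sink(matches)
print(matches)  // 0
```

In a real measurement, the loop body would go inside whatever closure the benchmarking library times.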

Using swift run -c release, the outcomes were nearly identical:

running x == .plus... done! (1263.63 ms)
running .plus == x... done! (1244.43 ms)

name       time      std        iterations
------------------------------------------
x == .plus 27.000 ns ± 354.22 %   10000000
.plus == x 26.000 ns ± 324.12 %   10000000

The only thing that's a little worrisome there is the very high standard deviation. But the total execution time of both trials was nearly the same.

I'd be interested to hear from folks with more familiarity around enum layout and the optimizer, though, since there are probably a lot of subtleties I've missed here too!
