Synthesized Equatable conformance for enum with associated value may have performance dependent on parameter order

RonAvitzur · November 22, 2021, 5:50pm

While profiling my app, one of the top functions is OpCode.__derived_enum_equals(::) for the enum with associated values below. I was surprised to discover a 5.9x performance difference in the synthesized function depending on parameter order.

Running the code below in a command-line app, compiled optimized with safety checks disabled on my M1 mini gives:

.plus == x returns 1000000000 in 0.32s
x == .plus returns 1000000000 in 1.88s

The asymmetry is dependent on the number of cases in the enum. Remove one case and then both parameter orders are fast.

import Foundation

enum OpCode : Equatable {
    case number(Double, String = "")
    case atomic(String)
    case string(String)
    case matrix(_ rows: Int, _ cols: Int)
    case prompt, plus, minus, times, divide, power, list, equal, notequal, less, leq, greater, geq, elementof,
         sin,  cos,  tan,  csc,  sec,  cot,  asin,  acos,  atan,  acsc,  asec,  acot,
         sinh, cosh, tanh, csch, sech, coth, asinh, acosh, atanh, acsch, asech, acoth,
         index, function, substitution, factorial, superscriptT, plusorminus, minusorplus, sgn, log, sqrt,
         grad, div, curl, lap, optotal, oppartial, sum, product, id, abs, overline, sn, choice, interval,
         dot, cross, floor, ceil, ln, exp, arg, real, imag, Re, Im, prime, hat, besselj, conditional, piecewise,
         integral, mod, min, max, clamp, Count, Sum, Mean, StdDev, slider, set, Transpose, Trace, Det,
         Rotate, Interpolate, And, Or, Not, Inf, Sup, lim, partial, total, dataIndex, atan2, square, powfrac,
         piecewiseConditional, labeledPiecewise(Int)
}

func time(_ description: String, function: () -> Int) {
    let iterations = 1000000000
    let start = DispatchTime.now().uptimeNanoseconds
    var sum = 0
    for _ in 1 ... iterations { sum += function() }
    let end = DispatchTime.now().uptimeNanoseconds
    print("\(description) returns \(sum) in \(String(format: "%.2f", Double(end - start)/1.0e9))s")
}

var x = OpCode.string("x")
time(".plus == x") { .plus == x ? 0 : 1}
time("x == .plus") { x == .plus ? 0 : 1}

Reported in FB9775661

Any suggestions for a work-around other than writing func == out the long way?

tera · November 22, 2021, 9:11pm

e.g. have OpCode as a struct with the union of all fields (tag + 2 x Int + Double + String). Worth to make String optional in this case. It's less type safe but hopefully more efficient. I guess that'd require many changes to the app, so I'd probably go with the manual func == implementation.

In general case `a == b` and `b == a` execution times can be very different. a trivial example inside.

struct CrazyInt: Equatable {
    var int: Int
    init(_ int: Int) { self.int = int }
    public static func == (a: Self, b: Self) -> Bool {
        if (a.int % 2) == 1 {
            sleep(1)
        }
        return a.int == b.int
    }
}

let a = CrazyInt(0)
let b = CrazyInt(1)
a == b // quick
b == a // slow

allevato · November 22, 2021, 9:14pm

I think static analysis and the optimizer are defeating your attempts to accurately benchmark.

The code that's synthesized to compare two enum values when some cases have associated values looks something like this:

switch (lhs, rhs) {
case let (.number(lhs_0, lhs_1), .number(rhs_0, rhs_1)):
  guard lhs_0 == rhs_0 else { return false }
  guard lhs_1 == rhs_1 else { return false }
  return true
case let (.atomic(lhs_0), .atomic(rhs_0)):
  guard lhs_0 == rhs_0 else { return false }
  return true
// ...
case (.prompt, .prompt): return true
case (.plus, .plus): return true
// ...
default: return false
}

It's ugly, but since IIRC multi-payload enums partition their cases into those with associated values and those without, it looks like the optimizer does a pretty good job of turning that into a two-level jump table based on the enum discriminator (godbolt link). So, it looks like parameter order might have an effect on whether you end up making one hop or two, but I think we're talking about a very small difference here.

But the results of your benchmark are suspicious, because minor tweaks to them are affecting the results; for example, just changing var x into let x shortens the time of the first trial and increases the time of the second trial (both substantially). So that makes me think other things are affecting it, like mutability affecting the way the closure captures the variable.

I ran some trials using GitHub - google/swift-benchmark: A swift library to benchmark code snippets. and used some helper functions (thanks to @dabrahams) that make sure the optimizer doesn't undo what we're trying to measure. The code is in this gist.

Using swift run -c release, the outcomes were nearly identical:

running x == .plus... done! (1263.63 ms)
running .plus == x... done! (1244.43 ms)

name       time      std        iterations
------------------------------------------
x == .plus 27.000 ns ± 354.22 %   10000000
.plus == x 26.000 ns ± 324.12 %   10000000

The only thing that's a little worrisome there is the very high standard deviation. But the total execution time of both trials was the nearly the same.

I'd be interested to hear from folks with more familiarity around enum layout and the optimizer, though, since there's probably a lot of subtleties I've missed here too!

RonAvitzur · November 22, 2021, 10:29pm

Ah, thank you. That makes a lot more sense.