I'm bumping this thread because:
- this is an interesting performance issue,
- it still affects me, and it has gotten worse with Xcode 9.4, and
- I'd also like to ask a couple of more general questions (about performance issues and the benchmark suite) for which this particular issue serves as a good example.
First: as noted in SR-6983, this issue has gotten worse (with the default toolchain of Xcode 9.4).
The "2nd use of closure" case is now about 10 times slower than the others, instead of 3 times:
Click to see the same example compiled & run as before, but with Xcode 9.4
› swiftc --version
Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)
Target: x86_64-apple-darwin17.5.0
› swiftc -O -static-stdlib test.swift
› ./test
1st use of closure:
time: 0.0632649020117242 seconds (checksum: 581423683991283777)
time: 0.0617133129999274 seconds (checksum: -7306424433621360502)
time: 0.0625146849924931 seconds (checksum: 7081978004570742532)
time: 0.0610871280077845 seconds (checksum: -4065774701487485870)
time: 0.0624201809987426 seconds (checksum: -2784397570979284854)
minimum time: 0.0610871280077845 seconds
1st use of func:
time: 0.0610684939892963 seconds (checksum: 264423982952719367)
time: 0.0615423239942174 seconds (checksum: 2237211590060124107)
time: 0.061124823987484 seconds (checksum: -4267020249627113289)
time: 0.0619161030044779 seconds (checksum: 1792111283221657060)
time: 0.063269748003222 seconds (checksum: -1589300676560776483)
minimum time: 0.0610684939892963 seconds
2nd use of closure:
time: 0.699463352997554 seconds (checksum: -3918483693787300236)
time: 0.667777561000548 seconds (checksum: 1137324794268155767)
time: 0.664340454997728 seconds (checksum: -7811258734802726978)
time: 0.661137435992714 seconds (checksum: 1048546319275511909)
time: 0.666032235996681 seconds (checksum: 1052028149280526971)
minimum time: 0.661137435992714 seconds
2nd use of func:
time: 0.0614374390133889 seconds (checksum: 6639092839694586210)
time: 0.0611787349916995 seconds (checksum: -8533491229019487789)
time: 0.060995906998869 seconds (checksum: -4113116250669794077)
time: 0.0605206390027888 seconds (checksum: -4439758878194388265)
time: 0.0613778190017911 seconds (checksum: 1964471734991009996)
minimum time: 0.0605206390027888 seconds
So “2nd use of closure” is now 0.66 seconds; it was 0.18 seconds when I reported the issue (see the compiler version in the example run in the report). The other cases are still 0.06 seconds.
Click to see test.swift (same as in the original post, repeated here for completeness)
import AppKit

func measureExecutionTime(label: String, subject: (Int) -> Int) {
    let operationsPerTrial = 100_000_000
    var randomIntValues = [Int](repeating: 0, count: operationsPerTrial)
    var minTime = Double.infinity
    print("\(label):")
    for _ in 0 ..< 5 {
        randomIntValues.withUnsafeMutableBytes { (ptr) -> Void in
            let r = SecRandomCopyBytes(nil, ptr.count, ptr.baseAddress!)
            precondition(r == errSecSuccess)
        }
        var checksum = Int(0)
        let t0 = CACurrentMediaTime()
        for randomInput in randomIntValues {
            let r = subject(randomInput)
            checksum ^= r
        }
        let t1 = CACurrentMediaTime()
        let time = t1 - t0
        print(" time:", time, "seconds (checksum: \(checksum))")
        minTime = min(minTime, time)
    }
    print(" minimum time:", minTime, "seconds\n")
}

let identityClosure: (Int) -> Int = { return $0 }
func identityFunc(input: Int) -> Int { return input }

// Note that both have the same execution time here:
measureExecutionTime(label: "1st use of closure", subject: identityClosure)
measureExecutionTime(label: "1st use of func", subject: identityFunc)

// Note that identityClosure is slower this time:
measureExecutionTime(label: "2nd use of closure", subject: identityClosure)
measureExecutionTime(label: "2nd use of func", subject: identityFunc)

// Why?
Questions (about "non-benchmark-able" special cases):
We'll need some background in the form of a conversation that took place in another thread some time ago:
OK:
I'd say most of the performance issues I'm affected by are "non-benchmark-able special cases" as @moiseev calls them above, just like the one described in this thread.
I keep running into them, and most of the time it turns out to be one that I have already reported. But it takes time to isolate them, and it's impossible (as a programmer) to build a strategy for avoiding them: they're just too complex, especially since in real code there are often several of them interacting.
Also, if you are working on something where you don't know what the performance could be, you have no idea whether you are affected by these issues or not; a safe guess is that you probably are. So if you care about performance (I guess most people don't, but I have to, or the project will be unusable), you develop a habit of writing the same thing in many different ways, using fewer abstractions, etc., just so that you might discover that what you have now is actually 100 times slower than it could be, provided that the code is written in this magic way. And who knows, maybe you're still affected by some other issue and your computation could actually be 1000 times faster!
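To illustrate what I mean by "writing the same thing in many different ways", here is a rough, hypothetical sketch (the timeIt helper and the constants are made up for illustration, not taken from my real code) of the kind of experiment I end up doing: the same trivial transform written as a closure, as a free function, and inlined, each timed separately to see which variant happens to hit a slow path:

import Dispatch
import Foundation

// Hypothetical timing helper, just for this illustration.
func timeIt(_ label: String, _ body: () -> Int) {
    let t0 = DispatchTime.now().uptimeNanoseconds
    let checksum = body()
    let t1 = DispatchTime.now().uptimeNanoseconds
    print("\(label): \(Double(t1 - t0) / 1e9) seconds (checksum: \(checksum))")
}

let input = Array(0 ..< 10_000_000)

// Variant 1: the transform passed around as a closure.
let closureTransform: (Int) -> Int = { $0 &* 3 }
timeIt("closure") {
    var sum = 0
    for x in input { sum &+= closureTransform(x) }
    return sum
}

// Variant 2: the same transform as a free function.
func funcTransform(_ x: Int) -> Int { return x &* 3 }
timeIt("func") {
    var sum = 0
    for x in input { sum &+= funcTransform(x) }
    return sum
}

// Variant 3: the same logic written inline, with no abstraction at all.
timeIt("inline") {
    var sum = 0
    for x in input { sum &+= x &* 3 }
    return sum
}

Which of these variants ends up fastest is exactly the kind of thing I find I cannot predict.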
I am often unable to reason about or develop an intuition for the performance of my code, and these inconsistent performance issues easily eat up 80% of my work day.
And according to the conversation I quoted, the way forward seems to be to just continue like this: writing my performance-sensitive code in multiple ways to see whether I manage to hit or miss any of these issues; if so, try to isolate them, check whether I have already reported them, and if not, report them. And then (while I continue like this), I should trust that …
… or if it's non-benchmark-able, then …
(Please note that the following questions are sincere, even though I might seem a little bit upset :) .)
@moiseev: What are these other ways?
@palimondo: Is there any way that issues like this (i.e. SR-6983) can be added to the benchmark suite, now or in the future? I suppose it would require that the test be compiled to, and run as, a standalone executable (since a harness would inevitably modify the context of, and possibly interfere with, the tested code), and that the test exchange information (the compiler flags to use and the result) with some test driver (the program that compiles, runs and analyzes the test) via a protocol expressive enough to handle issues like SR-6983. (A rough sketch of what I mean is at the end of this post.)
Also note that, afaics, SR-6983 might be affecting the benchmark suite itself, since the test is passed as a closure, depending on exactly how the harness and the test happen to be written (please correct me if I'm wrong).
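To make the question more concrete, here is a purely hypothetical sketch of the kind of test driver I'm imagining (none of the type names, file names, paths or the one-line output format are real parts of the benchmark suite, and a real driver would of course need much more than this): each test is a separate .swift file that is compiled to a standalone executable with its own compiler flags, and it reports its result back to the driver by printing a single line, here taken to be the minimum time in seconds:

import Foundation

// Hypothetical description of one standalone test.
struct StandaloneTest {
    let source: String          // e.g. "SR6983.swift"
    let compilerFlags: [String] // e.g. ["-O", "-static-stdlib"]
}

// Run an executable and capture its standard output.
func run(_ launchPath: String, _ arguments: [String]) -> String {
    let process = Process()
    process.launchPath = launchPath
    process.arguments = arguments
    let pipe = Pipe()
    process.standardOutput = pipe
    process.launch()
    process.waitUntilExit()
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    return String(data: data, encoding: .utf8) ?? ""
}

func measure(_ test: StandaloneTest) -> Double? {
    // Compile the test on its own, so that the driver cannot change the
    // context in which the measured code gets optimized.
    let name = (test.source as NSString).deletingPathExtension
    let binary = FileManager.default.currentDirectoryPath + "/" + name
    _ = run("/usr/bin/swiftc", test.compilerFlags + [test.source, "-o", binary])
    // Run the executable and parse the single line it prints
    // (the minimum time in seconds); that one line is the whole "protocol" here.
    let output = run(binary, [])
    return Double(output.trimmingCharacters(in: .whitespacesAndNewlines))
}

if let seconds = measure(StandaloneTest(source: "SR6983.swift",
                                        compilerFlags: ["-O", "-static-stdlib"])) {
    print("minimum time:", seconds, "seconds")
}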