I'm bumping this thread because:
- this is an interesting performance issue,
- it still affects me, and it has gotten worse with Xcode 9.4, and
- I'd also like to ask a couple of more general questions (about performance issues and the benchmark suite) for which this particular issue serves as a good example.
First: as noted in SR-6983, this issue has gotten worse (with the default toolchain of Xcode 9.4).
The "2nd use of closure" case is now about 10 times slower than the others, instead of 3 times:
Click to see the same example compiled & run as before, but with Xcode 9.4
› swiftc --version
Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)
Target: x86_64-apple-darwin17.5.0
› swiftc -O -static-stdlib test.swift
› ./test
1st use of closure:
time: 0.0632649020117242 seconds (checksum: 581423683991283777)
time: 0.0617133129999274 seconds (checksum: -7306424433621360502)
time: 0.0625146849924931 seconds (checksum: 7081978004570742532)
time: 0.0610871280077845 seconds (checksum: -4065774701487485870)
time: 0.0624201809987426 seconds (checksum: -2784397570979284854)
minimum time: 0.0610871280077845 seconds
1st use of func:
time: 0.0610684939892963 seconds (checksum: 264423982952719367)
time: 0.0615423239942174 seconds (checksum: 2237211590060124107)
time: 0.061124823987484 seconds (checksum: -4267020249627113289)
time: 0.0619161030044779 seconds (checksum: 1792111283221657060)
time: 0.063269748003222 seconds (checksum: -1589300676560776483)
minimum time: 0.0610684939892963 seconds
2nd use of closure:
time: 0.699463352997554 seconds (checksum: -3918483693787300236)
time: 0.667777561000548 seconds (checksum: 1137324794268155767)
time: 0.664340454997728 seconds (checksum: -7811258734802726978)
time: 0.661137435992714 seconds (checksum: 1048546319275511909)
time: 0.666032235996681 seconds (checksum: 1052028149280526971)
minimum time: 0.661137435992714 seconds
2nd use of func:
time: 0.0614374390133889 seconds (checksum: 6639092839694586210)
time: 0.0611787349916995 seconds (checksum: -8533491229019487789)
time: 0.060995906998869 seconds (checksum: -4113116250669794077)
time: 0.0605206390027888 seconds (checksum: -4439758878194388265)
time: 0.0613778190017911 seconds (checksum: 1964471734991009996)
minimum time: 0.0605206390027888 seconds
So “2nd use of closure” is now 0.66 seconds; it was 0.18 seconds when I reported the issue (see the compiler version in the example run in the report). The other cases are still 0.06 seconds.
Click to see test.swift (same as in the original post, repeated here for completeness)
import AppKit

func measureExecutionTime(label: String, subject: (Int) -> Int) {
    let operationsPerTrial = 100_000_000
    var randomIntValues = [Int](repeating: 0, count: operationsPerTrial)
    var minTime = Double.infinity
    print("\(label):")
    for _ in 0 ..< 5 {
        randomIntValues.withUnsafeMutableBytes { (ptr) -> Void in
            let r = SecRandomCopyBytes(nil, ptr.count, ptr.baseAddress!)
            precondition(r == errSecSuccess)
        }
        var checksum = Int(0)
        let t0 = CACurrentMediaTime()
        for randomInput in randomIntValues {
            let r = subject(randomInput)
            checksum ^= r
        }
        let t1 = CACurrentMediaTime()
        let time = t1 - t0
        print(" time:", time, "seconds (checksum: \(checksum))")
        minTime = min(minTime, time)
    }
    print(" minimum time:", minTime, "seconds\n")
}

let identityClosure: (Int) -> Int = { return $0 }
func identityFunc(input: Int) -> Int { return input }

// Note that both have the same execution time here:
measureExecutionTime(label: "1st use of closure", subject: identityClosure)
measureExecutionTime(label: "1st use of func", subject: identityFunc)

// Note that identityClosure is slower this time:
measureExecutionTime(label: "2nd use of closure", subject: identityClosure)
measureExecutionTime(label: "2nd use of func", subject: identityFunc)

// Why?
Questions (about "non-benchmark-able" special cases):
We'll need some background in the form of a conversation that took place in another thread some time ago:
OK:
I'd say most of the performance issues I'm affected by are "non-benchmark-able special cases" as @moiseev calls them above, just like the one described in this thread.
I keep running into them, and most of the time it turns out to be one that I have already reported. But it takes time to isolate them, and it's impossible (as a programmer) to build a strategy for avoiding them: they're just too complex, especially since in real code there are often several of them interacting.
Also, if you are working on something where you don't know what the performance could be, you have no idea whether you are affected by these issues or not; a safe guess is that you probably are. So if you care about performance (I guess most people don't, but I have to, or the project will be unusable), you develop a habit of writing the same thing in many different ways, using fewer abstractions, etc., just so that you might discover that what you have now is actually 100 times slower than it could be, provided that the code is written in this magic way. And who knows, maybe you're still affected by some other issue and your computation could actually be 1000 times faster!
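To illustrate what I mean by "writing the same thing in many different ways", here is a rough, hypothetical sketch (the timeIt helper and the constants are made up for illustration, not taken from my real code) of the kind of experiment I end up doing: the same trivial transform written as a closure, as a free function, and inlined, each timed separately to see which variant happens to hit a slow path:

import Dispatch
import Foundation

// Hypothetical timing helper, just for this illustration.
func timeIt(_ label: String, _ body: () -> Int) {
    let t0 = DispatchTime.now().uptimeNanoseconds
    let checksum = body()
    let t1 = DispatchTime.now().uptimeNanoseconds
    print("\(label): \(Double(t1 - t0) / 1e9) seconds (checksum: \(checksum))")
}

let input = Array(0 ..< 10_000_000)

// Variant 1: the transform passed around as a closure.
let closureTransform: (Int) -> Int = { $0 &* 3 }
timeIt("closure") {
    var sum = 0
    for x in input { sum &+= closureTransform(x) }
    return sum
}

// Variant 2: the same transform as a free function.
func funcTransform(_ x: Int) -> Int { return x &* 3 }
timeIt("func") {
    var sum = 0
    for x in input { sum &+= funcTransform(x) }
    return sum
}

// Variant 3: the same logic written inline, with no abstraction at all.
timeIt("inline") {
    var sum = 0
    for x in input { sum &+= x &* 3 }
    return sum
}

Which of these variants ends up fastest is exactly the kind of thing I find I cannot predict.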
I am often unable to reason about or develop an intuition for the performance of my code, and these inconsistent performance issues easily eat up 80% of my work day.
And according to the conversation I quoted, the way forward seems to be to just continue like this: writing my performance-sensitive code in multiple ways to see whether I manage to hit or miss any of these issues; if so, try to isolate them, check whether I have already reported them, and if not, report them. And then (while I continue like this), I should trust that …
… or if it's non-benchmark-able, then …
(Please note that the following questions are sincere, even though I might seem a little bit upset :) .)
@moiseev: What are these other ways?
@palimondo: Is there any way that issues like this (i.e. SR-6983) can be added to the benchmark suite, now or in the future? I suppose it would require that the test be compiled to, and run as, a standalone executable (since a harness would inevitably modify the context of, and possibly interfere with, the tested code), and that the test exchange information (the compiler flags to use and the result) with some test driver (the program that compiles, runs and analyzes the test) via a protocol expressive enough to handle issues like SR-6983. (A rough sketch of what I mean is at the end of this post.)
Also note that, afaics, SR-6983 might be affecting the benchmark suite itself, since the test is passed as a closure, depending on exactly how the harness and the test happen to be written (please correct me if I'm wrong).
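To make the question more concrete, here is a purely hypothetical sketch of the kind of test driver I'm imagining (none of the type names, file names, paths or the one-line output format are real parts of the benchmark suite, and a real driver would of course need much more than this): each test is a separate .swift file that is compiled to a standalone executable with its own compiler flags, and it reports its result back to the driver by printing a single line, here taken to be the minimum time in seconds:

import Foundation

// Hypothetical description of one standalone test.
struct StandaloneTest {
    let source: String          // e.g. "SR6983.swift"
    let compilerFlags: [String] // e.g. ["-O", "-static-stdlib"]
}

// Run an executable and capture its standard output.
func run(_ launchPath: String, _ arguments: [String]) -> String {
    let process = Process()
    process.launchPath = launchPath
    process.arguments = arguments
    let pipe = Pipe()
    process.standardOutput = pipe
    process.launch()
    process.waitUntilExit()
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    return String(data: data, encoding: .utf8) ?? ""
}

func measure(_ test: StandaloneTest) -> Double? {
    // Compile the test on its own, so that the driver cannot change the
    // context in which the measured code gets optimized.
    let name = (test.source as NSString).deletingPathExtension
    let binary = FileManager.default.currentDirectoryPath + "/" + name
    _ = run("/usr/bin/swiftc", test.compilerFlags + [test.source, "-o", binary])
    // Run the executable and parse the single line it prints
    // (the minimum time in seconds); that one line is the whole "protocol" here.
    let output = run(binary, [])
    return Double(output.trimmingCharacters(in: .whitespacesAndNewlines))
}

if let seconds = measure(StandaloneTest(source: "SR6983.swift",
                                        compilerFlags: ["-O", "-static-stdlib"])) {
    print("minimum time:", seconds, "seconds")
}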