I do see an issue when running this code on godbolt itself... the "non-final class" version times out, while the "final class" and "struct" versions work OK.
Are you testing this on Linux? (I'm on macOS and on a slightly older compiler than the one on godbolt.) To reiterate, there is no timing difference on ARM macOS between class and final class in this test.
I’m testing on macOS. The issue was also reproduced by the original author of the benchmarks, who also ran it on a MacBook and saw the same numbers for the class and struct versions that I’m seeing.
The point is that they are I/O operations that depend on the result of the computations. If you eliminate them, the compiler is free to just delete code which would otherwise be left in the binary in order to compute the printed values.
The point is not timing the prints. The point is seeing the difference between the struct and class versions (which both contain the prints, so there should be no issue with that).
Yeah, but print?! Its timing could vary for a zillion reasons.
Try this:
var results: [String] = []
...
//print(String(out))
results.append(String(out))
...
a.solve(hard20[j]);
// print();
...
at the end:
print("done")
print(results.count)
However, even with those prints removed, there's still a difference between "final class" and "class" that I can see on godbolt... But not on my machine... go figure.
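The idea of storing results instead of printing them inside the timed loop can be sketched in a self-contained form. This is only an illustration: `solve` and `puzzles` are hypothetical stand-ins, not the benchmark's solver or its `hard20` data.

```swift
// Hypothetical stand-in for the solver: any function whose result
// depends on its input will do for this illustration.
func solve(_ puzzle: String) -> String {
    return String(puzzle.reversed())
}

// Hypothetical input; the real benchmark uses its own puzzle list.
let puzzles = ["53..7....", "6..195...", ".98....6."]

var results: [String] = []
results.reserveCapacity(puzzles.count)

for puzzle in puzzles {
    // Store the value instead of printing it, so no I/O happens
    // inside the loop while the result stays observable afterwards.
    results.append(solve(puzzle))
}

// A single, cheap piece of output after the loop.
print("done")
print(results.count)
```

Printing only the count is a weaker dependency than printing every value, so if there is any doubt that the work is still being done, printing a checksum over `results` is a safer choice.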
Open this directory in Terminal and first run the following so that sudo caches your password:
sudo echo
And then this loop to count how many times swift_retain is called.
for ver in sudoku-struct sudoku-class sudoku-final-class; do
  echo -e "\n=========\nVERSION: $ver\n=========\n"
  swiftc -Ounchecked "$ver.swift"
  time sudo dtrace -c "./$ver" -n 'pid$target:libswiftCore.dylib:swift_retain:entry { @[probefunc] = count(); } profile:::tick-30s { printf("\nTIMEOUT\n"); exit(0); }'
done
I’ve added a 30s timeout, because the class version, when instrumented like that, seems happy to just go on forever. So each test finishes either when the program successfully exits or when the 30s timeout is reached, whichever comes first.
My results:
=========
VERSION: sudoku-struct
=========
dtrace: system integrity protection is on, some features will not be available
dtrace: description 'pid$target:libswiftCore.dylib:swift_retain:entry ' matched 2 probes
done
4000
dtrace: pid 58283 has exited
swift_retain 292003
sudo dtrace -c "./$ver" -n 1.99s user 0.79s system 42% cpu 6.577 total
=========
VERSION: sudoku-class
=========
dtrace: system integrity protection is on, some features will not be available
dtrace: description 'pid$target:libswiftCore.dylib:swift_retain:entry ' matched 2 probes
CPU ID FUNCTION:NAME
4 140 :tick-30s
TIMEOUT
swift_retain 10632060
sudo dtrace -c "./$ver" -n 2.79s user 27.38s system 99% cpu 30.366 total
=========
VERSION: sudoku-final-class
=========
dtrace: system integrity protection is on, some features will not be available
dtrace: description 'pid$target:libswiftCore.dylib:swift_retain:entry ' matched 2 probes
done
4000
dtrace: pid 58403 has exited
swift_retain 288003
sudo dtrace -c "./$ver" -n 2.31s user 0.78s system 93% cpu 3.317 total
The struct version makes 292003 calls to swift_retain in 6.577s and exits. The final class version makes 288003 calls to swift_retain in 3.317s and exits. The class version makes 10632060 calls to swift_retain in 30s and hits timeout (without the timeout it was happy to go on for even 20 minutes with no end in sight).
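To put those counts side by side, here is a small sketch that computes each version's retain count relative to the final class version. The numbers are copied from the dtrace output above; everything else (names, layout) is just for illustration. Note that the class figure is a lower bound, since that run was cut off by the timeout, and that the wall-clock times include dtrace overhead.

```swift
import Foundation

// Counts and wall-clock seconds copied from the dtrace runs above.
let runs: [(name: String, retains: Int, seconds: Double, finished: Bool)] = [
    ("sudoku-struct", 292_003, 6.577, true),
    ("sudoku-class", 10_632_060, 30.366, false),   // stopped by the 30s timeout
    ("sudoku-final-class", 288_003, 3.317, true),
]

// Use the final class version as the baseline.
let baseline = Double(runs[2].retains)

for run in runs {
    let ratio = Double(run.retains) / baseline
    let note = run.finished ? "" : " (lower bound, run did not finish)"
    print("\(run.name): \(String(format: "%.1f", ratio))x the baseline retains\(note)")
}

// Ratio for the non-final class version on its own.
let classRatio = Double(runs[1].retains) / baseline
```

So even within the 30 seconds it was allowed, the non-final class version made roughly 37 times as many swift_retain calls as the final class version.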
swiftc version:
$ swiftc --version
swift-driver version: 1.87.3 Apple Swift version 5.9.2 (swiftlang-5.9.2.2.56 clang-1500.1.0.2.5)
Target: arm64-apple-macosx13.0
Since yesterday I have exactly the same Swift version (installed by Xcode 15.1 (15C65)). Could the speed difference be just because retains/releases are much slower on the M1 (2020) than on the M1 Pro (2021)? However... this wouldn't explain why "final" makes a difference for you but not for me.
A quick test for those who want to try retain/release overhead:
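// main.swift
import Foundation

// NOTE: retain(_:) and release(_:) are not declared in this snippet;
// they are assumed to be bindings to the runtime's retain/release entry points.
class C {
    @discardableResult init() {
        let o = unsafeBitCast(self, to: Int.self)
        let start = Date()
        for _ in 0 ..< 10_000_000 {
            retain(o)
            release(o)
        }
        let elapsed = Date().timeIntervalSince(start)
        print("elapsed: \(elapsed)")
    }
}

C()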
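The snippet above calls retain and release helpers that are not shown. A self-contained variant, as a minimal sketch, can use the standard library's `Unmanaged` to issue explicit retain/release pairs instead. `Box` and `measureRetainRelease` are hypothetical names introduced here, and whether this matches what the original helpers did is an assumption.

```swift
import Foundation

// Hypothetical empty class; only its reference count matters here.
final class Box {}

// Hypothetical helper: times `iterations` balanced retain/release pairs
// on a single object and returns the elapsed wall-clock time in seconds.
func measureRetainRelease(iterations: Int) -> TimeInterval {
    let object = Box()
    let ref = Unmanaged.passUnretained(object)

    let start = Date()
    for _ in 0 ..< iterations {
        _ = ref.retain()   // +1 on the strong reference count
        ref.release()      // -1, so the count is balanced after each pair
    }
    let elapsed = Date().timeIntervalSince(start)

    // Keep the object alive until after the loop has finished.
    withExtendedLifetime(object) {}
    return elapsed
}

let elapsed = measureRetainRelease(iterations: 10_000_000)
print("elapsed: \(elapsed)")
```

The optimizer may be able to remove some balanced pairs in optimized builds, so it is worth comparing the numbers from a build with -O against one with -Onone before drawing conclusions.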
Then I tried "swift -O sudoku-class.swift" instead of "swiftc -O sudoku-class.swift" - the result was significantly faster.
Previously I used Xcode to build the binary; regardless of whether I ran the built app from Xcode or from the terminal, it was fast.
At least this explains the timing difference we observe. But the open question is why building with "swiftc" produces a slower executable than the ones produced by "swift" and by Xcode.
Both "swiftc --version" and "swift --version" give this result:
swift-driver version: 1.87.3 Apple Swift version 5.9.2 (swiftlang-5.9.2.2.56 clang-1500.1.0.2.5)
Target: arm64-apple-macosx13.0
BTW, after I installed the new (to me) Xcode 15.1 I did not install the corresponding command line tools afterwards... Should I?