I’ve tried doing that, but Xcode invokes swift-frontend rather than swiftc, and many of the flags it passes are invalid for swiftc. The ones I tried that swiftc did accept made no difference.
I’ve also discovered that in the Xcode build the compiler must be noticing that it’s effectively a final class: the number of calls to swift_retain is exactly the same as in the binary compiled with swiftc when the class is marked final. I just don’t know why the compiler notices this when invoked from Xcode but not when invoked from swiftc.
I made a subclass of the Sudoku class (and used it, so it isn't stripped out as unused) to force Sudoku to be "actually" non-final, but that didn't change the timing of the binary built by Xcode. Likewise, using the derived class in place of Sudoku didn't affect the timing either.
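A minimal sketch of that experiment, using hypothetical stand-ins (the `Sudoku` body, `DerivedSudoku`, `value(at:)` and `sum(_:)` are illustrative, not the real project code): a subclass that is actually instantiated and passed through the same code path, so the compiler cannot treat `Sudoku` as having no subclasses.

```swift
// Hypothetical stand-in for the thread's Sudoku class.
class Sudoku {
    var cells: [Int]
    init(cells: [Int]) { self.cells = cells }
    func value(at i: Int) -> Int { cells[i] }
}

// Hypothetical subclass. It overrides a method and is really used below,
// so Sudoku is not "effectively final" anywhere in the module.
final class DerivedSudoku: Sudoku {
    override func value(at i: Int) -> Int { super.value(at: i) }
}

// Calls through the base-class type, so value(at:) needs dynamic dispatch.
func sum(_ s: Sudoku) -> Int {
    var total = 0
    for i in 0..<s.cells.count { total += s.value(at: i) }
    return total
}

let base = Sudoku(cells: Array(1...81))
let derived: Sudoku = DerivedSudoku(cells: Array(1...81))
print(sum(base), sum(derived))
```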
Do you reckon it could affect the timing that much? In the standalone test above, the extra time for 10M additional retain/release pairs is just 0.04 seconds, but maybe we aren't accounting for some other difference.
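For anyone who wants to reproduce a number like that, here is a rough sketch (not the standalone test referred to above) that issues explicit retain/release pairs through `Unmanaged` and times them with `DispatchTime`. The `Box` class is hypothetical. In an optimized build the ARC optimizer may be able to cancel matched pairs, so it's worth checking the disassembly for actual swift_retain/swift_release calls before trusting the figure.

```swift
import Dispatch

// Hypothetical object whose reference count we bump.
final class Box { var value = 0 }

let box = Box()
let unmanaged = Unmanaged.passUnretained(box)
let iterations = 10_000_000

let start = DispatchTime.now().uptimeNanoseconds
for _ in 0..<iterations {
    // One explicit retain followed by one release: net refcount unchanged.
    _ = unmanaged.retain()
    unmanaged.release()
}
let elapsed = Double(DispatchTime.now().uptimeNanoseconds - start) / 1e9

print("\(iterations) retain/release pairs took \(elapsed) s")
print("about \(Double(iterations) / elapsed / 1e6) M pairs per second")
```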
We don’t actually know how many retain calls the class version makes, because instrumentation slows it down too much. I suspect that if I let this program run under DTrace without the timeout, it would run for about 2 days, which wouldn’t be fun. (Counting with breakpoints in lldb is even slower, so forget about it.) Also, the profile in Instruments.app is pretty clear that those calls are the bottleneck.
That’s for the class version. For the struct version it’s 1463; for the final class it’s 1443.
Anyway, as I said in my earlier comment, the profile shows those calls accounting for about 65% of the runtime of the class version, while they're insignificant in the other versions.
Well, at least that checks out: 6918135 * 200 = 1383627000, and at 230M retain/release per second that would add about 6 seconds, which is indeed the difference you observe.
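The back-of-the-envelope estimate can be checked directly. The two inputs (6918135 and the factor of 200) and the 230M-per-second rate are taken as given from the thread; the variable names are hypothetical.

```swift
// Figures quoted in the thread; names are illustrative only.
let countedRetainCalls = 6_918_135
let factor = 200
let retainReleasePerSecond = 230_000_000.0

// Total calls and the time they would add at the measured rate.
let totalCalls = countedRetainCalls * factor
let extraSeconds = Double(totalCalls) / retainReleasePerSecond

print(totalCalls)      // 1383627000
print(extraSeconds)    // roughly 6 seconds
```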