We have a very weird performance observation that we'd be super happy if someone could shed any light on.
As we toil on with performance optimisations, we see a significant performance difference depending on how an optimised build is run: launched from Xcode (no debugger attached) or from Finder, versus run under Instruments.
Running the exact same workload with CPU profiling enabled changes the runtime by ~60% for us - which would be understandable if not for the fact that the workload consistently runs FASTER while being measured.
So we can see ~5 seconds runtime for the optimised binary in Xcode (or started from Finder, no difference) and ~3 seconds when running under Instruments with CPU profiling.
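For reference, a quick sketch of how those two timings relate, since "~60%" can be read a few ways. The 5 s and 3 s figures are the approximate ones quoted above; everything else is plain arithmetic.

```python
# Approximate runtimes quoted in this post (seconds).
slow_s = 5.0  # optimised build launched from Xcode or Finder
fast_s = 3.0  # same workload under Instruments with CPU profiling

speedup = slow_s / fast_s                         # how many times faster the profiled run is
slower_pct = (slow_s - fast_s) / fast_s * 100.0   # extra time relative to the fast run
saved_pct = (slow_s - fast_s) / slow_s * 100.0    # time saved relative to the slow run

print(f"speedup: {speedup:.2f}x")
print(f"slow run takes {slower_pct:.0f}% longer than the fast run")
print(f"fast run takes {saved_pct:.0f}% less time than the slow run")
```

So the gap is roughly 1.67x, whichever way the percentage is expressed.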
What gives?
We'd rather not have to recommend that our users run our app under Instruments with a profile build. :-)
Dunno if this is the solution you're looking for, but Xcode started gathering code coverage by default a while back and it tripped me up because Instruments doesn't. I've been diligent about turning it off whenever I want to measure things in Xcode.
Yeah, we do - the data sets we test with now make it a prerequisite to maintain sanity ;-)
The other major issue related to that is that Xcode always builds for both x86 and arm when building optimized / profiling (no way to turn that off) - so we get to wait 2x the time for builds even though we’ll never ship on x86 (and the debug builds run too slow), so we've got to pick our poison there - but I digress.
It’s really appropriate, we don’t use any x86 hardware and all our customers will be on M3+ in production. But even if I disable it in the project settings, it will still build for both - there really seems to be no way around it...
I think my first step for figuring out what's going on would be to use the sample command-line tool to sample the slow version of the app, then compare that profile to the fast version's profile in Instruments. In theory a 1.66x difference should show up pretty nicely.
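A minimal sketch of that sampling step, for anyone following along. It only assembles and prints the command line rather than running it; the process name MyApp, the 10-second duration and the output path are placeholders, not values from this thread.

```python
import shlex


def sample_command(process, seconds, out_path):
    """Build a macOS `sample` invocation.

    `process` is a pid or (partial) executable name, `seconds` is the
    sampling duration, and `-file` writes the report to `out_path`.
    """
    return ["sample", str(process), str(seconds), "-file", out_path]


# Placeholder values; substitute the real app name while the slow run is executing.
cmd = sample_command("MyApp", 10, "/tmp/slow-run.sample.txt")
print(shlex.join(cmd))

# On macOS the list can be handed to subprocess.run(cmd, check=True).
```

The resulting text report can then be read side by side with the call tree Instruments shows for the fast run.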
Thanks David, makes sense, will try that when back in the office tomorrow - just have this feeling that it will boost and go faster if I turn on sampling (JK)
Ok, maybe I'm starting to understand something - the resulting .app is arm64 only - it seems that the unnecessary compiling may come from SPM dependencies that aren't picking up the build architectures. Unfortunately we have quite a few such dependencies (the Xcode project is a small part of the overall code base built, the majority by far is SPM).
I believe you can have swift build compile a package for only a single architecture from the command line, but I don't think you can configure a Package that way. @NeoNacho?
The only way we have for that is using xcodebuild on the CLI. If you pass the excluded arches build setting as an override there, it'll apply to packages as well. In the UI, there's no facility to do this.
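A rough sketch of what such an override could look like, assuming the setting meant here is EXCLUDED_ARCHS. The scheme name MyApp and the Release configuration are placeholders; the snippet only prints the command line so it can be pasted into a terminal or CI script.

```python
import shlex


def xcodebuild_command(scheme, configuration="Release", excluded_archs=("x86_64",)):
    """Build an xcodebuild invocation with a build-setting override.

    Settings passed as KEY=value on the xcodebuild command line override
    the project's own values for that invocation.
    """
    override = "EXCLUDED_ARCHS=" + " ".join(excluded_archs)
    return ["xcodebuild", "-scheme", scheme,
            "-configuration", configuration, override, "build"]


# Hypothetical scheme name; run from the directory containing the project.
cmd = xcodebuild_command("MyApp")
print(shlex.join(cmd))
```

Whether this actually halves the build time will depend on how much of the work is in the package dependencies.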
Aha, ouch, thanks. That’s ok for CI, but a bit painful for normal workflow - hope it’ll be possible somehow in the future - at least before x86 is deprecated in macOS :-)
So back from the digression, will try the sample to understand the root issue of the performance difference tomorrow when in office.
Ok, I've sampled and compared the samples - in one of them one particular path is executed several times more often. The data set they operate on is identical and the order they operate in is the same, so it's still very unclear why there's a difference. (My first guess would be a degenerate hashing function combined with the data set arriving in a bad order or something, but that is unfortunately not the issue.)
It's reproducible even when starting the application from the Finder, and the performance difference is definitely there depending on whether it was built for profiling or just optimised.
Suffice it to say, quite strange. I will continue to dig for the root cause (and will also look at the Xcode build options to see if it's possible to compare how the builds are done, as something obviously is different between these two modes!).
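One possible way to do that comparison, as a sketch: dump the build settings for each mode (for example with xcodebuild -showBuildSettings, redirected to a file) and diff them key by key. The two dumps below are made up purely to show the mechanics and are not a claim about what actually differs in this project.

```python
def parse_build_settings(text):
    """Parse 'KEY = value' lines as printed by xcodebuild -showBuildSettings."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        # Skip headers and blank lines; keep only NAME_LIKE_THIS keys.
        if sep and key.replace("_", "").isalnum():
            settings[key] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Invented example dumps (hypothetical target name and values).
RUN_DUMP = """Build settings for action build and target MyApp:
    ARCHS = arm64
    SWIFT_OPTIMIZATION_LEVEL = -O
    SWIFT_COMPILATION_MODE = singlefile
"""
PROFILE_DUMP = """Build settings for action build and target MyApp:
    ARCHS = arm64
    SWIFT_OPTIMIZATION_LEVEL = -O
    SWIFT_COMPILATION_MODE = wholemodule
"""

changed = diff_settings(parse_build_settings(RUN_DUMP),
                        parse_build_settings(PROFILE_DUMP))
for key, (run_value, profile_value) in changed.items():
    print(f"{key}: run={run_value!r} profile={profile_value!r}")
```

If the settings turn out identical, the compiler invocations in the two build logs would be the next thing to compare.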