We have a very weird performance observation that we'd be super happy if someone could shed any light on.
As we are toiling on with performance optimisations, we see a significant performance difference when running an optimised build in Xcode (no debugger attached) or running that same executable from Finder.
Running the exact same workload with CPU profiling enabled gives a ~60% different runtime for us - which would be understandable if not for the fact that the workload consistently runs FASTER when it is being measured.
So we can see ~5 seconds runtime for the optimised binary in Xcode (or started from Finder, no difference) and ~3 seconds when running under Instruments with CPU profiling.
We'd rather not have to recommend that our users run our app under Instruments with a profile build. :-)
Dunno if this is the solution you're looking for, but Xcode started gathering code coverage by default a while back and it tripped me up because Instruments doesn't. I've been diligent about turning it off whenever I want to measure things in Xcode.
Just to double-check the obvious, you do actually have optimization enabled when building the "optimized binary", right?
Yeah, we do - the data sets we test with now make it a prerequisite to maintain sanity ;-)
Major other issue related to that is that Xcode always builds for both x86 and arm when building optimized / profiling (no way to turn that off) - so we get to wait 2x the time for builds even though we’ll never ship on x86 (and the debug builds run too slow to be an alternative), so we've got to pick our poison there - but I digress.
That’s only in debug builds, no?
You ought to be able to turn this off by disabling support for x86 in the project, if that's really appropriate for your case.
It’s really appropriate, we don’t use any x86 hardware and all our customers will be on M3+ in production. But even if I disable it in the project settings, it will still build for both - seems really to be no way to get around it….
How are you trying to do it?
(no luck there, also tried to add x86_64 to excluded architectures, but same, still builds for both)
It affects release build performance too, at least if you're running an xctestplan that has it enabled.
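For anyone wanting to rule coverage out from the command line, a minimal sketch follows. It assumes the tests are run through xcodebuild and uses a made-up scheme name (`MyApp`); the `run` helper is just an illustration that prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical scheme name; replace with your own.
SCHEME="MyApp"

# Helper for this sketch: print the command by default,
# execute it only when DRY_RUN=0 is set (on a Mac with Xcode installed).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run the scheme's tests with code coverage gathering turned off.
run xcodebuild test -scheme "$SCHEME" -enableCodeCoverage NO
```

This only affects the xcodebuild invocation itself; a scheme or test plan opened in the Xcode UI keeps whatever coverage setting it has.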
Congratulations on your trust level promotion! — Fake News™
I think my first step for figuring out what's going on would be to use sample to sample the slow version of the app, then compare the profile to the fast version in Instruments. In theory a 1.66x difference should show up pretty nicely.
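A rough sketch of what that could look like with the macOS sample tool. The process name, duration and output path are placeholders, not anything from this thread, and the `run` helper only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Placeholder values for illustration only.
APP_NAME="MyApp"                      # name (or PID) of the running slow build
DURATION=10                           # seconds to sample
OUT="/tmp/myapp-slow.sample.txt"      # where to write the call-tree report

# Helper for this sketch: print the command by default,
# execute it only when DRY_RUN=0 is set (on a Mac).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Sample the process for DURATION seconds and write the report to OUT.
run sample "$APP_NAME" "$DURATION" -file "$OUT"
```

Start the workload first, run this while it is busy, then compare the heaviest call paths in the report against the same workload's time profile in Instruments.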
Thanks David, makes sense, will try that when back in the office tomorrow - just have this feeling that it will boost and go faster if I turn on sampling (JK)
That's at the project level, what does it show at the target level?
Same (here with trying to exclude x86_64, but no difference there, Architectures still arm64 only):
Ok, maybe I start to understand something - the resulting .app is arm64 only - it seems that the unnecessary compiling may be SPM dependencies that aren't picking up the build architectures. Unfortunately we have quite a few such dependencies (the Xcode project is a small part of the overall code base built, the majority by far is SPM).
So the follow-up question - is there any way to force SPM dependencies to a single architecture?
I believe you can make swift build only a single architecture for a package from the command line, but I don't think you can configure a Package that way. @NeoNacho?
The only way we have for that is using xcodebuild on the CLI. If you pass the excluded arches build setting as an override there, it'll apply to packages as well. In the UI, there's no facility to do this.
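A minimal sketch of the two command-line routes mentioned above, assuming a scheme called `MyApp` (hypothetical) and that `EXCLUDED_ARCHS` is the "excluded arches" setting meant here. The `run` helper prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical scheme name; replace with your own.
SCHEME="MyApp"

# Helper for this sketch: print the command by default,
# execute it only when DRY_RUN=0 is set (on a Mac with Xcode installed).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Xcode project: pass the build setting as a command-line override,
# so that it applies to package dependencies as well.
run xcodebuild build -scheme "$SCHEME" -configuration Release EXCLUDED_ARCHS=x86_64

# Standalone package: ask SwiftPM for a single architecture.
run swift build -c release --arch arm64
```

The first command is run from the directory containing the Xcode project, the second from the root of a Swift package.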
Aha, ouch, thanks. That’s ok for CI, but a bit painful for normal workflow - hope it’ll be possible somehow in the future - at least before x86 is deprecated in macOS :-)
So back from the digression, will try the sample to understand the root issue of the performance difference tomorrow when in office.
Ok, I've sampled and compared the samples - in one of them one particular path is executed several times more often. The data set they operate on is identical, the order they operate on is the same, so it's very unclear why there's a difference yet. (First guess would be that it was some degenerate hashing function and that the data set came in a bad order or something, but that is unfortunately not the issue.)
It's reproducible even when starting the application from the Finder, and the performance difference is definitely there depending on whether it was built for profiling or just optimised.
Suffice to say, quite strange, will continue to dig for root cause (also will look at the Xcode build options and see if it's possible to compare how the builds are done, something obviously is different between these two modes!).