I'm trying to optimize a CPU-intensive operation, and perplexingly, it runs slower when I parallelize it across multiple cores. The single-core implementation takes 3.1 milliseconds to perform one iteration over 16400 records, while the concurrent implementation takes 10.7 milliseconds.
Currently I'm using DispatchQueue.concurrentPerform for parallelization, with a stride of 256. I've experimented with other stride values, and they don't seem to have any effect.
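For reference, the dispatch pattern looks roughly like the sketch below. This is a simplified stand-in rather than my actual code: processAll, process, and the Double records are hypothetical names chosen for illustration, and the only parts taken from my setup are DispatchQueue.concurrentPerform and the stride of 256.

```swift
import Dispatch

// Hypothetical per-record work; the real operation is more expensive.
func process(_ value: Double) -> Double {
    return value * 2.0
}

// Splits `records` into chunks of `chunkSize` (the "stride") and processes
// each chunk in one concurrentPerform iteration.
func processAll(_ records: [Double], chunkSize: Int) -> [Double] {
    var results = [Double](repeating: 0.0, count: records.count)
    // Round up so that a final, partial chunk is still processed.
    let chunkCount = (records.count + chunkSize - 1) / chunkSize

    results.withUnsafeMutableBufferPointer { buffer in
        let out = buffer
        DispatchQueue.concurrentPerform(iterations: chunkCount) { chunk in
            let start = chunk * chunkSize
            let end = min(start + chunkSize, records.count)
            // Each iteration writes only to its own index range,
            // so the chunks never touch the same element.
            for i in start..<end {
                out[i] = process(records[i])
            }
        }
    }
    return results
}

let output = processAll([Double](repeating: 1.0, count: 16_400), chunkSize: 256)
```

With 16400 records and a stride of 256 this gives 65 iterations, the last one covering only 16 records.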
While profiling, I can see that for the working threads, 53% of the CPU time is being spent on swift_unownedRetainStrong, and 28% of the CPU time is being spent on swift_release_(swift::HeapObject*). These costs are not present in the single-threaded version, so I believe this explains the difference in performance.
I am explicitly passing all references into the closure using [unowned] because I want to avoid memory-management and reference counting overhead. My assumption was that this option would just skip reference counting, but this appears not to be the case.
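The capture pattern is roughly the one sketched below. Again, this is an illustrative reduction under assumed names: Model, weights, and sum(using:chunkSize:) are hypothetical and not from my real code. My guess, which I have not confirmed, is that the profiled calls come from reads like the one inside the inner loop, since that is where the unowned capture is accessed.

```swift
import Dispatch

// Hypothetical class standing in for the objects my closure references.
final class Model {
    let weights: [Double]
    init(weights: [Double]) { self.weights = weights }
}

// Sums the model's weights in parallel, one partial sum per chunk.
func sum(using model: Model, chunkSize: Int) -> Double {
    let count = model.weights.count
    let chunkCount = (count + chunkSize - 1) / chunkSize
    var partials = [Double](repeating: 0.0, count: chunkCount)

    partials.withUnsafeMutableBufferPointer { buffer in
        let out = buffer
        // The reference is captured as unowned, as in my real code.
        DispatchQueue.concurrentPerform(iterations: chunkCount) { [unowned model] chunk in
            let start = chunk * chunkSize
            let end = min(start + chunkSize, count)
            var total = 0.0
            for i in start..<end {
                // Every pass through the loop reads through the unowned capture.
                total += model.weights[i]
            }
            out[chunk] = total
        }
    }
    return partials.reduce(0.0, +)
}

let model = Model(weights: [Double](repeating: 1.0, count: 16_400))
let total = sum(using: model, chunkSize: 256)
```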
So what I am wondering is, what exactly does swift_unownedRetainStrong mean, and is this avoidable?