Hi there,
tl;dr Writing to an array in a tight loop gets really slow because of memory exclusivity checks. What's the recommended way to avoid that?
I implemented a decoder/encoder for an image format called QOI.
My naive implementation just used an array of UInt8 to store information about each pixel colour (one array slot = one pixel channel value):
public class Image {
    ...
    public var pixels: [UInt8]
    ...
}
The decoder was about an order of magnitude (10x) slower than the reference C implementation running on the same machine (a MacBook Pro M1 with 8 CPU cores).
Since I didn't expect such a huge slowdown, I ran Instruments on the benchmark tool to see where the problem was.
In the profiler, I noticed that decoder time was dominated by calls to swift_beginAccess and swift_endAccess.
I found a post on the Swift blog explaining that those runtime calls have been enabled by default in release builds since Swift 5.
After I disabled the exclusivity checks (--enforce-exclusivity=none), decoder performance was significantly better and in the range I expected from Swift (2x slower than the reference C implementation).
What is the recommended way of dealing with these situations?
I'm leaning towards allocating a Data object with enough space to store the pixel information and doing the writes directly via an UnsafeMutableRawPointer. It's not as convenient as writing directly to the array of pixels, but I'm guessing it will be faster.
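For reference, here is a minimal sketch of the pointer-based idea, next to the naive pattern it would replace. It assumes the dynamic checks come from pixels being a stored property of a class, which is one of the cases the Swift blog post lists as checked at run time. Only the names Image and pixels come from my code; init(byteCount:), fillNaive, fillViaBuffer and makePixelData are hypothetical helpers made up for illustration. I have not measured this version, so treat any speed-up as a guess.

```swift
import Foundation

// Hypothetical stand-in for the Image class; only `pixels` is from the post.
public final class Image {
    public var pixels: [UInt8]

    public init(byteCount: Int) {
        pixels = [UInt8](repeating: 0, count: byteCount)
    }
}

// Naive pattern: each `image.pixels[i] = ...` is a separate access to a
// class property, which may be guarded by a dynamic exclusivity check.
func fillNaive(_ image: Image, with value: UInt8) {
    for i in 0..<image.pixels.count {
        image.pixels[i] = value
    }
}

// Pointer-based pattern: the property is accessed once for the whole loop,
// and the writes inside the closure go through the buffer pointer.
func fillViaBuffer(_ image: Image, with value: UInt8) {
    image.pixels.withUnsafeMutableBufferPointer { buffer in
        // Do not touch `image.pixels` in here: that would overlap with the
        // access already in progress.
        for i in buffer.indices {
            buffer[i] = value
        }
    }
}

// The Data variant mentioned above: allocate once, then write raw bytes.
func makePixelData(byteCount: Int, value: UInt8) -> Data {
    var data = Data(count: byteCount) // zero-filled
    data.withUnsafeMutableBytes { (raw: UnsafeMutableRawBufferPointer) in
        for i in 0..<raw.count {
            raw[i] = value
        }
    }
    return data
}
```

The array version keeps [UInt8] as the public storage type, so callers would not need to change. The Data version changes the storage type but may be handier if the bytes are written to a file afterwards.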
How to replicate the measurements taken
All the code can be found on GitHub at track-5/SwiftQOI (QOI image decoder/encoder written in Swift).
Clone the repo and check out the branch benchmark-repro.
Create a release build with the command below:
swift build --product SwiftQOIBenchmark -c release
You can turn off the exclusivity checks by editing the Package.swift file and adding the flags given in the comments there.
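As a rough illustration of what such a manifest change can look like, here is a sketch that passes a compiler flag through SwiftPM's unsafeFlags setting. The package and target names are placeholders, not the actual layout of the repo, and the flag shown is the spelling documented on the Swift blog (-enforce-exclusivity=unchecked), which may differ from what is in the repo's comments.

```swift
// swift-tools-version:5.5
import PackageDescription

// Hypothetical manifest: names below are placeholders for illustration.
let package = Package(
    name: "SwiftQOI",
    targets: [
        .target(
            name: "SwiftQOI",
            swiftSettings: [
                // Disables run-time exclusivity checking for this target.
                .unsafeFlags(["-enforce-exclusivity=unchecked"])
            ]
        )
    ]
)
```

This is meant for local measurements only, since it removes a safety check for the whole target.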