I was curious about the overhead of ARC and started profiling some
benchmarks from the Computer Language Benchmarks Game.
So far, it seems that ARC sequence optimization is surprisingly good and
most benchmarks don't have to perform ARC operations as often as I
expected. I have some questions regarding this finding.
I compiled all benchmarks with the "-O -wmo" flags and counted the number
of calls into the ARC runtime (e.g., swift_rt_swift_retain) using Pin.
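For concreteness, the setup looked roughly like this (the Pintool name `arc_count.so` is a placeholder for whatever counting tool is used):

```shell
# Build each single-file benchmark with optimizations and
# whole-module optimization
swiftc -O -wmo nbody.swift -o nbody

# Run it under Pin with a custom tool that counts calls into
# ARC runtime entry points such as swift_rt_swift_retain
pin -t obj-intel64/arc_count.so -- ./nbody
```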
1. Reference counting is generally considered to have high overhead because
of its frequent counting operations, which also have to be atomic. At least
for the benchmarks I tested, this is not the case: there is almost no
overhead. Is this expected behavior, or is it just that the benchmarks are
too simple (they are all single-file programs)? How large would you
estimate ARC's overhead to be in general?
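For context, this is the kind of pattern I mean — a minimal sketch (the `Node` type and `sum` function are illustrative, not from any benchmark) of code where the optimizer can elide most ARC traffic:

```swift
final class Node {
    var value: Int
    init(_ value: Int) { self.value = value }
}

// Naively, each iteration would retain and release `n`. With -O, the
// ARC optimizer can prove the array keeps every element alive for the
// duration of the loop and elide the retain/release pairs, so almost
// no runtime ARC calls remain in the hot loop.
func sum(_ nodes: [Node]) -> Int {
    var total = 0
    for n in nodes {
        total += n.value
    }
    return total
}
```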
2. I also compiled the same benchmarks with "-Xfrontend
-assume-single-threaded" to measure the overhead of atomic operations.
Judging from the source code of this experimental pass and the SIL
optimizer's statistics, the pass seems to work as expected and converts
all ARC operations in user code to nonatomic ones. However, even with
this flag, some atomic ARC runtime functions are still called from the
user code (not from the library). Stranger still, the SIL output
indicates that all ARC operations in the user code have been converted
to nonatomic. The documentation says ARC operations are never implicit
in SIL, so if there are no atomic ARC operations at the SIL level, I
would expect the user code to never call the atomic ARC runtime. Am I
missing something?
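To double-check, I compared the SIL against the symbols the binary actually references — a sketch of that workflow, assuming (as I read the pass) that converted operations carry a `[nonatomic]` attribute in SIL:

```shell
# Emit canonical SIL and look for ARC operations that are NOT
# marked [nonatomic]
swiftc -O -wmo -Xfrontend -assume-single-threaded -emit-sil bench.swift \
  | grep -E 'strong_retain|strong_release|retain_value|release_value'

# Then compare against the retain/release symbols the final binary
# actually references
swiftc -O -wmo -Xfrontend -assume-single-threaded bench.swift -o bench
nm bench | grep -i retain
```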
3. Are there more realistic benchmarks available? The programs in Swift's
official benchmark suite also seem fairly small.