Copy-on-write and M1 optimizations

There are reports that the M1 makes reference counting faster than it was on Intel platforms. Some people have reported benchmarks in which refcount increment/decrement operations specifically are up to 5-6 times faster.

The short version of my question is: is isKnownUniquelyReferenced() affected by this optimization? In other words, does copy-on-write (CoW) benefit from the M1 or not?

The long version: I don't know exactly how Apple's M1 makes atomic increments/decrements so fast, but my hypothesis is this: because the value of the reference counter is only ever needed after an atomic decrement, the atomic operations could bypass the caches and modify the counters directly in memory. That is, if the caches contain a stale value, that should be fine, since nobody is interested in the value itself before decrementing it. As I understand it, this would allow bypassing normal memory ordering and maintaining the refcounts without causing CPU cache shake-ups.

But then there's isKnownUniquelyReferenced(), which seems to be exactly the case that needs the actual value of the counter, and therefore should issue a memory fence, i.e. be inefficient.
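
For reference, the copy-on-write pattern in question usually looks something like the sketch below. The names `Storage` and `CowBox` are hypothetical and exist only for illustration; the only standard library call is `isKnownUniquelyReferenced(_:)`, which is checked on every mutation.

```swift
// Hypothetical reference-type storage shared between copies of the value.
final class Storage {
    var values: [Int]
    init(values: [Int]) { self.values = values }
}

// Hypothetical value type with copy-on-write semantics.
struct CowBox {
    private var storage: Storage

    init(_ values: [Int]) { storage = Storage(values: values) }

    var values: [Int] { storage.values }

    mutating func append(_ x: Int) {
        // The uniqueness check under discussion: copy the storage
        // only if some other value still refers to it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(values: storage.values)
        }
        storage.values.append(x)
    }
}

var a = CowBox([1, 2])
var b = a      // a and b share one Storage object
b.append(3)    // storage is shared, so b copies before mutating
a.append(4)    // a's storage is unique again, so no copy is needed
```

So the cost of the check is paid on every mutating call, which is why its speed matters for CoW types.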

I may be totally wrong on any or all of the above, so I'd appreciate it if someone could clarify this. Thanks!

isKnownUniquelyReferenced should be a read-only operation, which I would expect to be fast [when not contended by multiple threads] on both Apple's processors and x86_64. I have not, however, benchmarked it specifically.

The unusually fast (relatively speaking) operation is an uncontended atomic compare-and-swap with acquire-release (or weaker) ordering. Swift refcounting doesn't use atomic increment/decrement, because it stores additional values in the same memory word as the refcount.
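
To illustrate why a packed word leads to compare-and-swap loops rather than plain increments, here is a rough single-threaded sketch. Everything in it is an assumption made for illustration: the bit layout (bit 0 as a "deiniting" flag, the remaining bits as the strong count), the type name `PackedRefCount`, and the `compareExchange` helper, which is a non-atomic stand-in for the hardware atomic. It is not the Swift runtime's actual layout or code.

```swift
// Hypothetical layout, NOT the real Swift runtime layout:
// bit 0 = "deiniting" flag, bits 1 and up = strong reference count.
struct PackedRefCount {
    static let deinitingFlag: UInt64 = 1
    static let countShift: UInt64 = 1
    static let oneStrong: UInt64 = 2   // 1 << countShift

    // Starts with a single strong reference and no flags set.
    private(set) var word: UInt64 = PackedRefCount.oneStrong

    // Non-atomic stand-in for an atomic compare-exchange.
    // A real implementation would use a hardware atomic here.
    mutating func compareExchange(expected: UInt64, desired: UInt64) -> Bool {
        guard word == expected else { return false }
        word = desired
        return true
    }

    var strongCount: UInt64 { word >> PackedRefCount.countShift }
    var isDeiniting: Bool { word & PackedRefCount.deinitingFlag != 0 }

    // A uniqueness check only reads the word; it never writes to it.
    var isUniquelyReferenced: Bool { strongCount == 1 && !isDeiniting }

    // Retain must inspect the flag and bump the count as one step,
    // which a bare fetch-and-add cannot express.
    @discardableResult
    mutating func retain() -> Bool {
        while true {
            let old = word
            if old & PackedRefCount.deinitingFlag != 0 { return false }
            if compareExchange(expected: old, desired: old + PackedRefCount.oneStrong) {
                return true
            }
        }
    }

    // Returns true when the last strong reference was released.
    @discardableResult
    mutating func release() -> Bool {
        while true {
            let old = word
            precondition(old >> PackedRefCount.countShift > 0, "over-release")
            var desired = old - PackedRefCount.oneStrong
            if desired >> PackedRefCount.countShift == 0 {
                desired |= PackedRefCount.deinitingFlag
            }
            if compareExchange(expected: old, desired: desired) {
                return desired & PackedRefCount.deinitingFlag != 0
            }
        }
    }
}

var rc = PackedRefCount()
rc.retain()                            // count goes from 1 to 2
let sharedIsUnique = rc.isUniquelyReferenced
rc.release()                           // count goes back to 1
let soleIsUnique = rc.isUniquelyReferenced
```

In this model the uniqueness check is a single load plus a comparison, while retain and release are the operations that need the read-modify-write loop.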

Additionally, isKnownUniquelyReferenced doesn't need to perform any sort of fence.

Any idea where I can read about how Swift's ARC works? I thought it was common knowledge that it uses atomic operations.

It does use atomic operations; they're just not as simple as atomic increment/decrement. The implementation is here.
