The reality remains that almost all CoW-elimination optimizations are potentially fragile. The only way to know if they're in your code is to attempt to measure them, and the only way to be confident that your refactors don't regress them is to measure your code.
On top of that, regression testing allocations has been the single most powerful performance improvement practice that the NIO team has used. In the past I've described it as "pulling the magic go-faster lever": add a test, find the allocations, justify them all to yourself, make anything that looks unnecessary go away, repeat. Even with the `_modify` accessor present you still have to validate that you're hitting that code path.
There's a more serious fundamental limitation: the law of exclusivity.
In the first example, you're in a `mutating func`, which is a single long mutating access. The law of exclusivity forbids anyone else from reading `Entry` during that access, so the original value of `self` can be considered dead once we've entered the `case` statement.
In the second example, `a` is an internal `var` on a `class`. The `switch` is therefore not a long write access to `a` but a `get` followed by a `set`, with some code in between. During that time it is possible that something else in your program might observe the value of `a`, because the law of exclusivity does not prevent it from happening. As a result, the original value of `a` is still alive up until the `set`, and so the modification of `rest` must CoW.
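To make the contrast concrete, here is a minimal sketch of the two shapes being compared. It is a hypothetical reconstruction, not the code from earlier in the thread: the cases of `Entry`, the `Holder` class, and the `append` methods are all assumptions made for illustration. Whether the copy is actually elided depends on the optimizer, so treat the comments as describing what is permitted rather than what is guaranteed.

```swift
// Hypothetical types for illustration only.
enum Entry: Equatable {
    case empty
    case values([Int])

    // Shape 1: a mutating func is one long mutating access to `self`.
    // Exclusivity means nothing else can read `self` while this runs.
    mutating func append(_ value: Int) {
        switch self {
        case .empty:
            self = .values([value])
        case .values(var rest):
            // The old `self` is unobservable here, so the optimizer is
            // allowed (but not required) to mutate the buffer in place.
            rest.append(value)
            self = .values(rest)
        }
    }
}

final class Holder {
    var a: Entry = .empty

    // Shape 2: `switch a` is a get, and `a = ...` is a separate, later set.
    func append(_ value: Int) {
        switch a {
        case .empty:
            a = .values([value])
        case .values(var rest):
            // `a` still holds the original array until the set below,
            // so the buffer is shared and this append has to copy it.
            rest.append(value)
            a = .values(rest)
        }
    }
}
```

Both versions produce the same result; the difference is only in whether the append can avoid copying, which is exactly the kind of thing you have to measure.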
A "sufficiently smart compiler" could perform a whole-program analysis and attempt to prove that no access to `a` can possibly exist. This works better with lower and lower visibility: for example, making `a` `private` can often trigger the optimization to reappear. But it's much harder.
In this instance I think the lesson to learn is not that one is dumb and the other is clever, but that `array.append` and a `switch` are not the same unless you tell Swift that the law of exclusivity applies. You'll tend to find that NIO embeds its state machine `enum`s inside `struct`s to help make this perform better.
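As a rough sketch of that pattern (the names `ByteStateMachine`, `Handler`, and `receive` are made up for this example and are not NIO API), the enum lives inside a struct, all transitions go through a `mutating func`, and the class only ever calls that method on its stored property:

```swift
// Hypothetical state machine for illustration only.
struct ByteStateMachine {
    private enum State {
        case idle
        case buffering([UInt8])
    }

    private var state: State = .idle

    init() {}

    // All transitions happen inside one mutating access to `self`,
    // so the law of exclusivity applies for the whole switch.
    mutating func receive(_ byte: UInt8) {
        switch self.state {
        case .idle:
            self.state = .buffering([byte])
        case .buffering(var bytes):
            bytes.append(byte)
            self.state = .buffering(bytes)
        }
    }

    var bufferedBytes: [UInt8] {
        switch self.state {
        case .idle:
            return []
        case .buffering(let bytes):
            return bytes
        }
    }
}

final class Handler {
    private var stateMachine = ByteStateMachine()

    func handle(_ byte: UInt8) {
        // Calling a mutating method is a single modify access to the
        // stored property, not a get followed by a set.
        self.stateMachine.receive(byte)
    }

    var bufferedBytes: [UInt8] {
        return self.stateMachine.bufferedBytes
    }
}
```

The class still exists, but it no longer switches over the enum directly, which gives the optimizer the exclusivity guarantee it needs. You still have to measure to confirm the copy is gone.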
Another way to get this optimization to trigger more often is to use fewer classes.