I took my quick & dirty chess example and used it as a benchmark, experimenting with struct vs class types and different value storage methods.
```
// M1 Pro, release build, maxDepth = 6; time for the first move with randomisation switched off.
type    storage         time  ARC
struct  unsafe pointer  1.5s  no
class   unsafe pointer  1.9s  yes (via class)
struct  1D tuple        4.2s  no
class   1D tuple        7.1s  yes (via class)
struct  1D log tuple    4.7s  no
class   1D log tuple    4.9s  yes (via class)
struct  2D tuple        9.7s  no
class   2D tuple       12.3s  yes (via class)
struct  1D array        5.1s  yes (via array)
class   1D array        2.4s  yes (via class and array)
struct  2D array       17.2s  yes (via array)
class   2D array        3.0s  yes (via class and array)
```
In the table: "unsafe pointer" is a linear malloc'ed area used as a 1D array; "1D tuple" is a tuple of 64 cells; "2D tuple" is a tuple of 8 tuples of 8 cells each; "1D / 2D array" is the same setup built from arrays. "1D log tuple" is the same as "1D tuple", except that its subscript does a "logarithmic lookup" (drilling down to the relevant cell step by step) rather than relying on the built-in behavior of a switch statement. The 2D lookup is `elements[y][x]` and the 1D lookup is `elements[y*8 + x]`.
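To make the storage variants above concrete, here is a minimal sketch of two of them: the "unsafe pointer" board with the 1D `y*8 + x` lookup, and switch-based versus "logarithmic" lookup into a tuple. It assumes a hypothetical `Cell = UInt8` cell type, and the tuple is scaled down to one 8-cell row (`Row8`) so the switch stays short; `PointerBoard`, `switchLookup` and `logLookup` are illustrative names, not the post's actual code.

```swift
// Hypothetical cell type: the post doesn't show what a board cell holds.
typealias Cell = UInt8

// "unsafe pointer" storage: one linear, manually allocated block of 64 cells
// used as a 1D array. Copies of the struct share the same memory and no ARC
// is involved, so the owner must call `deallocate()` exactly once.
struct PointerBoard {
    let cells: UnsafeMutablePointer<Cell>

    init() {
        cells = UnsafeMutablePointer<Cell>.allocate(capacity: 64)
        cells.initialize(repeating: 0, count: 64)
    }

    func deallocate() {
        cells.deallocate()
    }

    // 1D lookup: elements[y*8 + x]
    subscript(x: Int, y: Int) -> Cell {
        get { cells[y * 8 + x] }
        nonmutating set { cells[y * 8 + x] = newValue }
    }
}

// One row of a tuple board; the "1D tuple" variant uses 64 cells, scaled down here.
typealias Row8 = (Cell, Cell, Cell, Cell, Cell, Cell, Cell, Cell)

// Switch-based lookup, as in the "1D tuple" variant.
func switchLookup(_ row: Row8, _ i: Int) -> Cell {
    switch i {
    case 0: return row.0
    case 1: return row.1
    case 2: return row.2
    case 3: return row.3
    case 4: return row.4
    case 5: return row.5
    case 6: return row.6
    case 7: return row.7
    default: preconditionFailure("index \(i) out of range")
    }
}

// "Logarithmic" lookup: halve the index range at each step,
// so 8 cells take 3 comparisons (64 cells would take 6).
func logLookup(_ row: Row8, _ i: Int) -> Cell {
    precondition((0..<8).contains(i), "index \(i) out of range")
    if i < 4 {
        if i < 2 { return i < 1 ? row.0 : row.1 }
        return i < 3 ? row.2 : row.3
    } else {
        if i < 6 { return i < 5 ? row.4 : row.5 }
        return i < 7 ? row.6 : row.7
    }
}
```

Whether the comparison tree beats the switch depends on how the compiler lowers the switch (jump table or otherwise); in the table above the "log" variant was slightly slower for struct and noticeably faster for class.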
There is a notable anomaly at the end of the table: the struct + array versions are slower than the corresponding class + array versions (5.1s vs 2.4s for 1D, 17.2s vs 3.0s for 2D; TBD why). In all other cases the struct version outperformed the class version, quite possibly due to ARC overhead (though switching from struct to class can change performance in other ways too). Also notably, the tuple versions are not that fast: even the best of them (struct 1D tuple, 4.2s) is almost three times slower than the struct unsafe pointer version (1.5s).
This is not the most direct benchmark, though. A better one would be to run the ARC-triggering versions of this app in single-threaded mode and toggle the above-mentioned "atomic ARC" switch.
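For that more direct comparison, both variants can live in one binary and be timed the same way, with the binary then built twice: once with the "atomic ARC" switch on and once with it off. A minimal skeleton, assuming hypothetical `StructBoard` / `ClassBoard` types and a dummy `workload` standing in for the real first-move search (none of these come from the original code):

```swift
import Dispatch

// Hypothetical minimal boards standing in for the real chess code.
struct StructBoard {
    var cells = [UInt8](repeating: 1, count: 64)
}

final class ClassBoard {
    var cells = [UInt8](repeating: 1, count: 64)
}

// Hypothetical workload: read every cell `rounds` times via the 1D lookup.
// Whether passing the class reference here causes retain/release traffic
// depends on what the optimizer does; plug the real search in instead.
func workload(_ board: StructBoard, rounds: Int) -> Int {
    var total = 0
    for _ in 0..<rounds {
        for y in 0..<8 {
            for x in 0..<8 { total &+= Int(board.cells[y * 8 + x]) }
        }
    }
    return total
}

func workload(_ board: ClassBoard, rounds: Int) -> Int {
    var total = 0
    for _ in 0..<rounds {
        for y in 0..<8 {
            for x in 0..<8 { total &+= Int(board.cells[y * 8 + x]) }
        }
    }
    return total
}

// Times one closure with a monotonic clock. The checksum is printed so the
// optimizer cannot discard the work being measured.
func measure(_ label: String, _ body: () -> Int) -> (seconds: Double, checksum: Int) {
    let start = DispatchTime.now().uptimeNanoseconds
    let checksum = body()
    let end = DispatchTime.now().uptimeNanoseconds
    let seconds = Double(end - start) / 1_000_000_000
    print("\(label): \(seconds)s, checksum \(checksum)")
    return (seconds, checksum)
}
```

Running the same `measure` calls from both builds keeps everything but the ARC setting identical, so any difference in the class timings should come mostly from atomic vs non-atomic reference counting.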
Is it possible to override retain/release globally somehow? This is for diagnostic purposes only: I'm not going to ship it to a store, just use it locally to measure the overhead of ARC operations in a single-threaded app.