[Pitch] Core team publishes results of performance study: Cooperative Scheduler just introduced, plus compile for non-atomic ARC

tera · June 21, 2023, 11:35am

I took my quick & dirty chess example and used it as a benchmark, experimenting with struct vs class types and different value storage methods.

// M1 Pro, release, maxDepth=6, duration of the first move with randomisation switched off.

type    storage         time    ARC

struct  unsafe pointer  1.5s    no
class   unsafe pointer  1.9s    yes (via class)

struct  1D tuple        4.2s    no
class   1D tuple        7.1s    yes (via class)

struct  1D log tuple    4.7s    no
class   1D log tuple    4.9s    yes (via class)

struct  2D tuple        9.7s    no
class   2D tuple        12.3s   yes (via class)

struct  1D array        5.1s    yes (via array)
class   1D array        2.4s    yes (via class and array)

struct  2D array        17.2s   yes (via array)
class   2D array        3.0s    yes (via class and array)

In the table: "unsafe pointer" is a linear "malloced" area used as a 1D array; "1D tuple" is a tuple of 64 cells; "2D tuple" is a tuple of 8 tuples of 8 cells each; "1D / 2D array" use the same layouts built from arrays. "1D log tuple" is the same as "1D tuple", except that its subscript does a "logarithmic lookup" to drill down to the relevant cell rather than relying on the built-in behavior of the switch statement. A 2D lookup is "elements[y][x]" and a 1D lookup is "elements[y*8 + x]".
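To make the difference between the "1D tuple" and "1D log tuple" subscripts concrete, here is a minimal sketch scaled down to 8 cells (the benchmark uses 64). `Cell`, `Row8`, `switchLookup` and `logLookup` are hypothetical names, not the original code. The switch version leaves dispatch to the compiler; the log version halves the index range at each step, so 8 cells take three nested comparisons (64 cells would take six).

```swift
// Hypothetical cell type; the benchmark's real cell type is not shown.
enum Cell: UInt8 {
    case empty, pawn, knight, bishop, rook, queen, king
}

// Scaled-down stand-in for the 64-cell tuple: 8 cells stored inline.
typealias Row8 = (Cell, Cell, Cell, Cell, Cell, Cell, Cell, Cell)

// "1D tuple" style: a single switch over the index; the compiler decides
// how to dispatch it (jump table, chain of comparisons, ...).
func switchLookup(_ t: Row8, _ i: Int) -> Cell {
    switch i {
    case 0: return t.0
    case 1: return t.1
    case 2: return t.2
    case 3: return t.3
    case 4: return t.4
    case 5: return t.5
    case 6: return t.6
    case 7: return t.7
    default: preconditionFailure("index out of range")
    }
}

// "1D log tuple" style: binary drill-down, three nested comparisons for 8 cells.
func logLookup(_ t: Row8, _ i: Int) -> Cell {
    precondition((0..<8).contains(i), "index out of range")
    if i < 4 {
        if i < 2 {
            return i < 1 ? t.0 : t.1
        } else {
            return i < 3 ? t.2 : t.3
        }
    } else {
        if i < 6 {
            return i < 5 ? t.4 : t.5
        } else {
            return i < 7 ? t.6 : t.7
        }
    }
}

let row: Row8 = (.rook, .knight, .bishop, .queen, .king, .bishop, .knight, .rook)
for i in 0..<8 {
    // Both strategies must agree on every index.
    precondition(switchLookup(row, i) == logLookup(row, i))
}
print(logLookup(row, 4))  // king
```

Which of the two is faster depends on the code the compiler generates for the switch, which is exactly what the table compares.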

There is a notable anomaly at the end of the table: the struct + array versions run slower than the corresponding class + array versions (TBD why). In the other cases the struct version outperformed the class version, quite possibly due to ARC overhead (though other differences can also affect performance when you switch from struct to class). Also notable: the tuple versions are not that fast.
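One factor that may be worth checking (an assumption, not a confirmed explanation of the anomaly) is where retains and copy-on-write happen in each variant. In this minimal sketch with hypothetical `StructBoard` and `ClassBoard` types, copying the struct only retains the array buffer, but the first mutation of the copy duplicates the whole buffer; copying a class reference retains the object and both variables keep sharing the same storage. `bufferAddress` is a hypothetical diagnostic helper.

```swift
// Hypothetical board types; the benchmark's real types are not shown.
struct StructBoard {
    var cells = [UInt8](repeating: 0, count: 64)
}

final class ClassBoard {
    var cells = [UInt8](repeating: 0, count: 64)
}

// Address of an array's storage, used only to observe whether two arrays
// share one buffer (a diagnostic trick, not something to rely on).
func bufferAddress(_ a: [UInt8]) -> UInt {
    a.withUnsafeBufferPointer { UInt(bitPattern: $0.baseAddress) }
}

let s1 = StructBoard()
var s2 = s1              // struct copy: the array buffer is retained, not copied
let sharedBefore = bufferAddress(s1.cells) == bufferAddress(s2.cells)
s2.cells[0] = 1          // copy-on-write: s2 now gets its own copy of the 64 cells
let sharedAfter = bufferAddress(s1.cells) == bufferAddress(s2.cells)

let c1 = ClassBoard()
let c2 = c1              // reference copy: one retain, same object
c2.cells[0] = 1          // mutates the shared storage in place
print(sharedBefore, sharedAfter, c1.cells[0])  // true false 1
```

If the search copies boards per move, the struct + 2D array variant would pay this buffer copy for the outer array and for every inner row that is mutated, which the class variants avoid.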

This is not the most direct benchmark, though. Better would be to run the ARC-triggering versions of this app in single-threaded mode and toggle the "atomic ARC" switch mentioned above.
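A rough sketch of how such an A/B run could be timed: build the same single-threaded workload twice (default atomic ARC vs. the non-atomic switch) and compare the results of a small harness. `bestTime`, `Node` and `workload` are hypothetical; the workload merely stands in for the ARC-triggering chess search, and taking the best of several runs reduces noise from warm-up and scheduling.

```swift
import Dispatch

// Runs `body` `runs` times and returns the fastest wall-clock duration in seconds.
func bestTime(runs: Int = 5, _ body: () -> Void) -> Double {
    var best = Double.infinity
    for _ in 0..<runs {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        best = min(best, Double(end - start) / 1_000_000_000)
    }
    return best
}

// Hypothetical stand-in workload: iterating over class references can
// generate retain/release traffic, though the optimizer may remove some of it.
final class Node { var value = 0 }

func workload() {
    let nodes = (0..<1_000).map { _ in Node() }
    var sum = 0
    for _ in 0..<1_000 {
        for node in nodes {
            node.value &+= 1
            sum &+= node.value
        }
    }
    precondition(sum > 0)  // keeps the loop from being optimized away entirely
}

let seconds = bestTime { workload() }
print("best of 5: \(seconds)s")
```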

Is it possible to override retain/release globally somehow? This is for diagnostic purposes only: I'm not going to ship it to a store, just use it locally to measure the overhead of ARC operations in a single-threaded app.