Continuing the discussion from Is this the right way to do copy-on-write / use isKnownUniquelyReferenced?:

Finally, after an extra 1.5 months, I made all the new streams and new tests.

Let's move to the results. I used Xcode's testing framework on a 2014 MacBook Air. The number in parentheses is the relative standard deviation.

## Tests with the Wheel Factorization Framework

Test | `Int8` |
`UInt8` |
`Int16` |
`UInt16` |
---|---|---|---|---|

Trial Division, no caching | 181µs (78.4%) | 301µs (63.0%) | 46ms (2.96%) | 82ms (1.75%) |

Trial Division, caching in array | 161µs (47.3%) | 314µs (25.2%) | 41ms (4.66%) | 84ms (2.21%) |

Trial Division, caching in linked list | 261µs (42.9%) | 399µs (21.5%) | 65ms (3.00%) | 141ms (1.94%) |

Sieve of Eratosthenes, priority queue | 506µs (43.2%) | 1.18ms (27.9%) | 775ms (7.59%) | 2.83s (74.2%) |

Sieve of Eratosthenes, dictionary of sets | 276µs (73.8%) | 547µs (63.7%) | 103ms (2.76%) | 241ms (3.20%) |

Sieve of Eratosthenes, dictionary of linked lists | 188µs (60.8%) | 3.69µs (34.7%) | 76ms (2.47%) | 189ms (3.36%) |

Sieve of Sundaram, i/j visitation | 9.41ms (18.7%) | 364ms (1.40%) | - | - |

Sieve of Sundaram, odd multiple visitation | 3.25ms (27.4%) | 8.38ms (20.0%) | 8.35s (2.24%) | 27.2s (4.56%) |

## Tests with Custom Sequences

Test | `Int8` |
`UInt8` |
`Int16` |
`UInt16` |
---|---|---|---|---|

Trial Division, no caching | 256µs (73.5%) | 370µs (20.7%) | 57ms (3.18%) | 103ms (1.93%) |

Trial Division, caching in array | 350µs (59.0%) | 565µs (18.6%) | 61ms (3.65%) | 119ms (1.40%) |

Trial Division, caching in linked list | 441µs (29.0%) | 798µs (22.1%) | 540ms (2.66%) | 1.66s (0.426%) |

Sieve of Eratosthenes, priority queue | 2.50ms (32.7%) | 5.46ms (30.1%) | 1.45s (1.31%) | 3.43s (5.68%) |

Sieve of Eratosthenes, dictionary of sets | 638µs (34.5%) | 1.11ms (16.9%) | 90ms (1.83%) | 195ms (6.12%) |

Sieve of Eratosthenes, dictionary of linked lists | 430µs (35.7%) | 676µs (19.9%) | 41ms (3.58%) | 84ms (1.94%) |

Sieve of Sundaram, i/j visitation | 5.01ms (24.6%) | 13.6ms (12.7%) | - | - |

Sieve of Sundaram, odd multiple visitation | 2.92ms (25.5%) | 8.17ms (20.1%) | 8.48s (1.33%) | 27.7s (2.48%) |

The 16-bit tests for the regular `i`

/`j`

-method for the Sieve of Sundaram never finished. I gave up after 3+ hours. Worse, I found out it wasn't 3+ hours for a round of 10 tests, but for a *single* test! If anyone has a 2019 Mac with 64+ GiB of RAM, I could send over a copy of my playground for you to test if they ever finish. (Make sure to activate the `#if`

to `true`

.) My MBA only has 8 GiB of RAM, so I may have been thrashing virtual memory.

Before my last custom sequence, I decided to try out a custom framework. That framework exploits wheel factorization with a 2/3/5 wheel. Such a wheel reduces the numbers to check to 27% of a full check. This doesn't really help my Sieve of Sundaram much because it always goes through every product of greater-than-1-odd-integers (even to find the next product that isn't affected by the wheel).

None of the custom sequences use wheel factorization. (Not counting that the Sieve of Sundaram never goes over even numbers.)

In trial division, caching with a custom linked list was always worse than using an `Array`

. Sometimes much worse. But **not caching at all** was almost always the best, even with the extra division operations. Not caching and using an `Array`

cache were always close together.

We can see why primality sieve arguments pit the Sieve of Eratosthenes versus the Sieve of Atkin, and never include the Sieve of Sundaram. However, my judgement may be invalid since the playground tests my streaming adaptations of these sieves, while the algorithms are naturally supposed to be applied on a locked-size grid. I haven't groked the Sieve of Atkin enough to attempt a streaming version of it.

My and @lukasa's hope seems to be debunked. Although the priority queue-based version of the Sieve of Eratosthenes should have better cache locality, it's blown away by the dictionary versions. But the queue's `Array`

still has to grow every time a new prime is discovered; maybe that (occasional) reallocation & movement slows the priority-queue versions overall. Or maybe it's the constant re-heaping.

The filtering effect of wheel factorization always helped the trial division methods and the priority queue Sieve of Eratosthenes. For the dictionary versions of the sieve, wheel factorization helped the 8-bit versions but hindered the 16-bit versions! Maybe this is because the density of composites increases as we go towards positive infinity, so a fixed size wheel progressively gets less effective. I wonder if a 2/3/5/7 wheel would shift the balance. Wheel factorization almost always hindered the Sieve of Sundaram.