Computations, and where to save resources, are hard

CTMacUser · November 26, 2019, 2:16am

Continuing the discussion from Is this the right way to do copy-on-write / use isKnownUniquelyReferenced?:

Finally, after an extra 1.5 months, I made all the new streams and new tests.
Let's move to the results. I used Xcode's testing framework on a 2014 MacBook Air. The number in parentheses is the relative standard deviation.

Tests with the Wheel Factorization Framework

Test	`Int8`	`UInt8`	`Int16`	`UInt16`
Trial Division, no caching	181µs (78.4%)	301µs (63.0%)	46ms (2.96%)	82ms (1.75%)
Trial Division, caching in array	161µs (47.3%)	314µs (25.2%)	41ms (4.66%)	84ms (2.21%)
Trial Division, caching in linked list	261µs (42.9%)	399µs (21.5%)	65ms (3.00%)	141ms (1.94%)
Sieve of Eratosthenes, priority queue	506µs (43.2%)	1.18ms (27.9%)	775ms (7.59%)	2.83s (74.2%)
Sieve of Eratosthenes, dictionary of sets	276µs (73.8%)	547µs (63.7%)	103ms (2.76%)	241ms (3.20%)
Sieve of Eratosthenes, dictionary of linked lists	188µs (60.8%)	3.69µs (34.7%)	76ms (2.47%)	189ms (3.36%)
Sieve of Sundaram, i/j visitation	9.41ms (18.7%)	364ms (1.40%)	-	-
Sieve of Sundaram, odd multiple visitation	3.25ms (27.4%)	8.38ms (20.0%)	8.35s (2.24%)	27.2s (4.56%)

Tests with Custom Sequences

Test	`Int8`	`UInt8`	`Int16`	`UInt16`
Trial Division, no caching	256µs (73.5%)	370µs (20.7%)	57ms (3.18%)	103ms (1.93%)
Trial Division, caching in array	350µs (59.0%)	565µs (18.6%)	61ms (3.65%)	119ms (1.40%)
Trial Division, caching in linked list	441µs (29.0%)	798µs (22.1%)	540ms (2.66%)	1.66s (0.426%)
Sieve of Eratosthenes, priority queue	2.50ms (32.7%)	5.46ms (30.1%)	1.45s (1.31%)	3.43s (5.68%)
Sieve of Eratosthenes, dictionary of sets	638µs (34.5%)	1.11ms (16.9%)	90ms (1.83%)	195ms (6.12%)
Sieve of Eratosthenes, dictionary of linked lists	430µs (35.7%)	676µs (19.9%)	41ms (3.58%)	84ms (1.94%)
Sieve of Sundaram, i/j visitation	5.01ms (24.6%)	13.6ms (12.7%)	-	-
Sieve of Sundaram, odd multiple visitation	2.92ms (25.5%)	8.17ms (20.1%)	8.48s (1.33%)	27.7s (2.48%)

The 16-bit tests for the regular i/j-method for the Sieve of Sundaram never finished. I gave up after 3+ hours. Worse, I found out it wasn't 3+ hours for a round of 10 tests, but for a single test! If anyone has a 2019 Mac with 64+ GiB of RAM, I could send over a copy of my playground for you to test if they ever finish. (Make sure to activate the #if to true.) My MBA only has 8 GiB of RAM, so I may have been thrashing virtual memory.

Before my last custom sequence, I decided to try out a custom framework. That framework exploits wheel factorization with a 2/3/5 wheel. Such a wheel reduces the numbers to check to 27% of a full check. This doesn't really help my Sieve of Sundaram much because it always goes through every product of greater-than-1-odd-integers (even to find the next product that isn't affected by the wheel).

None of the custom sequences use wheel factorization. (Not counting that the Sieve of Sundaram never goes over even numbers.)

In trial division, caching with a custom linked list was always worse than using an Array. Sometimes much worse. But not caching at all was almost always the best, even with the extra division operations. Not caching and using an Array cache were always close together.

We can see why primality sieve arguments pit the Sieve of Eratosthenes versus the Sieve of Atkin, and never include the Sieve of Sundaram. However, my judgement may be invalid since the playground tests my streaming adaptations of these sieves, while the algorithms are naturally supposed to be applied on a locked-size grid. I haven't groked the Sieve of Atkin enough to attempt a streaming version of it.

My and @lukasa's hope seems to be debunked. Although the priority queue-based version of the Sieve of Eratosthenes should have better cache locality, it's blown away by the dictionary versions. But the queue's Array still has to grow every time a new prime is discovered; maybe that (occasional) reallocation & movement slows the priority-queue versions overall. Or maybe it's the constant re-heaping.

The filtering effect of wheel factorization always helped the trial division methods and the priority queue Sieve of Eratosthenes. For the dictionary versions of the sieve, wheel factorization helped the 8-bit versions but hindered the 16-bit versions! Maybe this is because the density of composites increases as we go towards positive infinity, so a fixed size wheel progressively gets less effective. I wonder if a 2/3/5/7 wheel would shift the balance. Wheel factorization almost always hindered the Sieve of Sundaram.

CTMacUser · November 26, 2019, 3:12am

Here's the test class.

/// Test class for the prime-sequence primality checker types.
public final class PrimeSequnceTimingTests<Integer: PrimePrefixSampleProvider & FixedWidthInteger>: XCTestCase {

    /// Returns the odd number the argument is the generator for.
    ///
    /// (Technically, this can work for any `FixedWidthInteger`.)
    static func oddNumber(for x: Integer) -> Integer? {
        guard let one = Integer(exactly: 1), let two = Integer(exactly: 2) else { return nil }

        let (product, multiplyfailed) = x.multipliedReportingOverflow(by: two)
        guard !multiplyfailed else { return nil }

        let (sum, addfailed) = product.addingReportingOverflow(one)
        guard !addfailed else { return nil }

        return sum
    }

    /// Determines if the argument is prime via repeated trial division.
    ///
    /// (Technically, this can work for any `BinaryInteger`.)
    static func isPrime(_ x: Integer) -> Bool {
        guard x > 1 else { return false }

        var n: Integer = 2
        while n < x {
            let (q, r) = x.quotientAndRemainder(dividingBy: n)
            guard r != 0 else { return false }  // Got composite
            guard q > n else { break }  // Reached square root of self

            n += 1
        }
        return true
    }

    /// Test using wheel factorization by otherwise exhaustive trial division.
    func test01aUncachedTrialDivsionTests() {
        let primes = Integer.primeSequence(withPrimeBasisCount: 3, checkingWith: TrialDivisionPrimeChecker.self)!
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test with exhaustive trial division.
    func test01bUncachedTrialDivsionTests() {
        let primes = LimitedArithmeticProgression<Integer>(from: 2, by: +1)!.filter(Self.isPrime)
        measure {
            XCTAssertTrue( primes.starts(with: Integer.primePrefix) )
        }
    }

    /// Test using wheel factorization by trial division while caching the
    /// post-wheel primes in an array.
    func test02aArrayCachedTrialDivisonTests() {
        let primes = Integer.primeSequence(withPrimeBasisCount: 3, checkingWith: ArrayCachingTrialDivisionPrimeChecker.self)!
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test with trial divsion while caching discovered primes in an array.
    func test02bArrayCachedTrialDivisonTests() {
        let primes = LimitedPrimalityViaTrialDivisionSequence<Integer>().filter { $0.1 }.map { $0.0 }
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test using wheel factorization by trial division while caching the
    /// post-wheel primes in a linked list.
    func test03aListCachedTrialDivsionTests() {
        let primes = Integer.primeSequence(withPrimeBasisCount: 3, checkingWith: ListCachingTrialDivisionPrimeChecker.self)!
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test with trial division while caching discovered primes in a linked
    /// list.
    func test03bListCachedTrialDivsionTests() {
        let primes = LimitedPrimalityViaTrialDivisionNodesSequence<Integer>().filter { $0.1 }.map { $0.0 }
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test using wheel factorization with the Sieve of Eratosthenes while
    /// tracking prime multiples with an array-based priority queue.
    func test04aPriorityQueueEratosthenesSieveTests() {
        let primes = Integer.primeSequence(withPrimeBasisCount: 3, checkingWith: LimitedPriorityQueueEratosthenesSieve.self)!
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test with the Sieve of Eratosthenes while tracking prime multiples with
    /// an array-based priority queue.
    func test04bPriorityQueueEratosthenesSieveTests() {
        let primes = LimitedPrimalitySequence<Integer>().filter { $0.1 }.map { $0.0 }
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test using wheel factorization with the Sieve of Eratosthenes while
    /// tracking values with a dicitionary from composites to a set of their
    /// prime factors.
    func test05aSetDictionaryEratosthenesSieveTests() {
        let primes = Integer.primeSequence(withPrimeBasisCount: 3, checkingWith: LimitedSetDictionaryEratosthenesSieve.self)!
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test with the Sieve of Eratosthenes while tracking values with a
    /// dictionary from composites to a set of their prime factors.
    func test05bSetDictionaryEratosthenesSieveTests() {
        let primes = LimitedPrimalityViaSetsSequence<Integer>().filter { $0.1 }.map { $0.0 }
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test using wheel factorization with the Sieve of Eratosthenes while
    /// tracking values with a dictionary from composites to a linked list of
    /// their (minimally allocated) prime factors.
    func test06aListDictionaryEratosthenesSieveTests() {
        let primes = Integer.primeSequence(withPrimeBasisCount: 3, checkingWith: LimitedListDictionaryEratosthenesSieve.self)!
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test with the Sieve of Eratosthenes while tracking values with a
    /// dictionary from composites to a linked list of their (minimally
    /// allocated) prime factors.
    func test06bListDictionaryEratosthenesSieveTests() {
        let primes = LimitedPrimalityViaNodesSequence<Integer>().filter { $0.1 }.map { $0.0 }
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    #if false
    /// Test using wheel factorization with the Sieve of Sundaram while tracking
    /// the progression of `(i, j)` pairs with a priority queue.
    func test07aPriorityQueueSundaramSieveTests() {
        let primes = Integer.primeSequence(withPrimeBasisCount: 3, checkingWith: LimitedPriorityQueueSundaramSieve.self)!
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test with the Sieve of Sundaram while tracking the progression of
    /// `(i, j)` pairs with a priority queue.
    func test07bPriorityQueueSundaramSieveTests() {
        let one = Integer(1), two = Integer(2)
        let firstElement = CollectionOfOne(two)
        let secondElementFirstSource = LimitedArithmeticProgression(from: one, by: +1)!
        let secondElementSecondSource = LimitedOddCompositesHalvedFlooredSequence<Integer>()
        let secondElementMergedSource = secondElementFirstSource.mergeSortedAsSets(with: secondElementSecondSource, keeping: .exclusivesFromFirst)
        let secondElements = secondElementMergedSource.compactMap(Self.oddNumber(for:))
        let primes = firstElement.cojoin(with: secondElements)
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }
    #endif

    /// Test using wheel factorization with the Sieve of Sundaram while tracking
    /// odd multiples directly with a priority queue.
    func test07cPriorityQueueSundaramSieveTests() {
        let primes = Integer.primeSequence(withPrimeBasisCount: 3, checkingWith: LimitedPriorityQueueSundaramSieveToo.self)!
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

    /// Test with the Sieve of Sundaram while tracking odd multiples directly
    /// with a priority queue.
    func test07dPriorityQueueSundaramSieveTests() {
        let two = CollectionOfOne<Integer>(2)
        let odds = LimitedArithmeticProgression<Integer>(from: 3, by: +2)!
        let oddComposites = LimitedOddCompositeSequence<Integer>()
        let oddPrimes = odds.mergeSortedAsSets(with: oddComposites, keeping: .exclusivesFromFirst)
        let primes = two.cojoin(with: oddPrimes)
        measure {
            XCTAssertTrue(primes.starts(with: Integer.primePrefix))
        }
    }

}

The PrimePrefixSampleProvider protocol lets me test against all the primes:

/// Vends the first few primes to test prime-generating sequences.
public protocol PrimePrefixSampleProvider: BinaryInteger {

    /// Provides the first few primes.
    static var primePrefix: [Self] { get }

}

I made complete lists for Int8, UInt8, Int16 and UInt16 (from downloading lists from the Internet). Since it's a prefix, we don't need an exhaustive list for 32 bits and above.

The "b" (and one "d") tests use custom sequences to generate prime numbers, either directly or after some sequence operations. Towards the end, I wondered if a specialized custom protocol and sequence would work better; those are run by the "a" (and one "c") tests.

/// A type that performs incremental primality tests.
///
/// If a conforming checker doesn't unconditionally succeed in its initializer,
/// it should document what values of a basis make it fail.
///
/// Due to the timing on how comformers to this type are supposed to interact
/// with `LimitedPrimeSequence`, conforming instances shouldn't be used except
/// through `LimitedPrimeSequence` and any user-created work-alikes.
public protocol WheeledPrimeChecker {

    /// The type of integers tested by the checker.
    associatedtype Integer: BinaryInteger

    /// Creates a checker over the positive integers coprime to the integers
    /// in the given sequence.
    ///
    /// - Precondition: If not empty, `basis` stores the first few primes,
    ///   starting from 2, without gaps.
    ///
    /// - Parameter basis: A list of the first few primes.  May be empty.
    /// - Postcondition: The checker may read the `basis` and cache it or any
    ///   other information from it that may help during calls to `isNextPrime`.
    ///   If `basis` has an unacceptable value, emit `nil`.
    init?<S: Sequence>(basis: S) where S.Element == Integer

    /// Returns whether the given value is prime, while allowing the check to
    /// mutate internal state.
    ///
    /// - Precondition:
    ///    - `coprime > 1`.
    ///    - Subsequent submitted values are strictly increasing.
    ///    - `coprime` is coprime to all the integers in the sequence submitted
    ///      to the initializer.
    ///
    /// - Parameter coprime: The value to test for primality.
    /// - Returns: `true` if `coprime` is prime; otherwise, `false`.
    /// - Postcondition: This checker may adjust its internal state to record
    ///   `coprime` and/or facilitate future calls to this method.
    mutating func isNextPrime(_ coprime: Integer) -> Bool

}

/// A sequence of the first few primes.
public struct LimitedPrimeSequence<Checker: WheeledPrimeChecker>: LazySequenceProtocol where Checker.Integer: FixedWidthInteger { /*...*/ }

extension FixedWidthInteger {
    /// Returns a sequence of all the primes that fit into this type, aided by a
    /// factorization wheel of the given length and a primality checker of the
    /// given type.
    ///
    /// If the method can determine if the computed factorization wheel basis is
    /// long enough that all the coprimes are prime (determined when the largest
    /// base is greater than the square root of `Self.max`), use of a primality
    /// checker is skipped.
    ///
    /// - Precondition:
    ///    - `count >= 0`.
    ///    - `count` isn't greater than the number of primes that fit within
    ///       the bounds of `Self.max`.
    ///
    /// - Parameter count: The number of initial primes used in the basis.
    /// - Parameter type: A metatype specifier for the predicate that tests if
    ///   given strictly-increasing values are prime.
    /// - Returns: A sequence of all the prime supported by this type, starting
    ///   with those generated for the basis, then all of the integers coprime
    ///   to the basis's elements, after filtering out the composite values.
    ///
    /// - Complexity: O(*n*!), where *n* is the last prime in the generated
    ///   basis (or 1 when said basis is empty).  However, the calculations will
    ///   most likely finish early since factorials quickly surpass
    ///   `Self.max`.
    public static func primeSequence<C: WheeledPrimeChecker>(withPrimeBasisCount count: Int, checkingWith type: C.Type) -> LimitedPrimeSequence<C>? where C.Integer == Self
}

Factorization wheels, which are used by the framework, are generated incrementally:

/// A sequence of factorization wheels for the first few primes.
public struct LimitedPrimeFactorizationWheelSequence<Integer: FixedWidthInteger>: LazySequenceProtocol { /*...*/ }

// The element type of the above is LimitedWheelFactorizedCoprimeSequence below.

Where a wheel is expressed as an arithmetic progression that filters out all the numbers that are multiples of at least one basis prime:

/// A sequence of the integers filtered through a prime factorization wheel.
public struct LimitedWheelFactorizedCoprimeSequence<Integer: FixedWidthInteger> { /*...*/ }

There are methods to return a wheel based on the desired length, or an optimized one.

extension FixedWidthInteger {
    /// Returns a factorization wheel of the given length, along with that
    /// length of initial primes used as the wheel's basis.
    ///
    /// If the returned basis is not empty, and its last element is greater than
    /// the square root of `Self.max`, then all of the returned wheel's elements
    /// after the first are prime.
    ///
    /// - Precondition:
    ///    - `length >= 0`.
    ///    - `length` isn't greater than the number of primes that fit within
    ///       the bounds of `Self.max`.
    ///
    /// - Parameter length: The number of initial primes used in the basis.  The
    ///   wheel's circumference will be the product of those primes, and its
    ///   (initial) spokes will be the integers less than the circumference that
    ///   are coprime to all of the primes in the basis.
    /// - Returns: A tuple whose first member is an array of the first `length`
    ///   primes, and whose second member is a sequence the vends every positive
    ///   integer that is coprime to the values in the basis array.  This method
    ///   will return `nil` instead if `length` is out-of-range.
    public static func primeFactorizationWheel(ofLength length: Int) -> (basis: [Self], wheel: LimitedWheelFactorizedCoprimeSequence<Self>)?

    /// Returns a factorization wheel sized for this integer type, along with
    /// the basis for said wheel.
    ///
    /// Due to diminishing returns, the 2/3/5/7 wheel is the highest one that
    /// can be returned.
    ///
    /// - Returns: A tuple whose first member is the first few primes, up to 7,
    ///   but not exceeding what `Self` can support.  The second member is a
    ///   sequence that vends every positive integer that is coprime to the
    ///   values in the basis array.
    public static func bestPrimeFactorizationWheel() -> (basis: [Self], wheel: LimitedWheelFactorizedCoprimeSequence<Self>)
}

The LimitedArithmeticProgression generic type is a lazy Sequence (and Collection) that takes the given FixedWidthInteger starting value and a Stride between each new value until .min or .max is reached (or would be surpassed).

The LimitedPrimalityViaTrialDivisionSequence and LimitedPrimalityViaTrialDivisionNodesSequence generic types are the custom sequences to test primality via trial division, but cache any discovered primes with an Array or custom linked list type, respectively.

The LimitedPrimalitySequence, LimitedPrimalityViaSetsSequence, and LimitedPrimalityViaNodesSequence generic types are custom sequences to test primality with a custom streaming variant of the Sieve of Eratosthenes. The first version uses a custom priority queue type, storing nodes containing a pair of a composite value and one of its prime factors. The queue is a minimum heap on the composite member. The other two sequences store a dictionary where the key is a composite integer and the value is a container of the composite's prime factors up to its square root. One design stores the primes in a Set, the other in a custom linked list type. An advantage of the linked list is that once a node for a prime integer is made, it's simply passed up to the next multiple; the Set design has to recreate nodes for each prime every time one of its multiples is processed.

The LimitedOddCompositesHalvedFlooredSequence generic type stores a priority queue on the i and j values for each combination of 1 <= i <= j. The queue is a minimum heap on the generator, which is i + j + 2 * i * j. The first node, with values i = 1, j = 1, and cached = 4 is preloaded. Every time a node's cached value is vended, we make new nodes for i -> i + 1 and j -> j + 1, so the total nodes stored goes up by 1 almost every time.

There are types to provide a set-subtraction (or other set combinations) or join two arbitrary sequences sharing an Element type.

The LimitedOddCompositeSequence generic type stores a priority queue to directly vend each odd composite integer in order. The nodes are arithmetic progressions for each odd number greater than 1, each inner sequence vending its target odd multiples starting with the target number times 3. Upon each vending, the currently smallest of all the first elements of each queued progression is returned, then that progression is updated to pop the just returned element.

CTMacUser · November 26, 2019, 7:12am

As soon as I wrote that last sentence, I wondered if that could be fixed. Whenever I add to the priority queue of the wheeled Sieve of Sundaram, that element is an arithmetic progression of the (odd) multiples of next odd number. If said odd number is divisible by one of the basis primes, then that odd number and all of its multiples will never show up as an argument (since the wheel already filtered them out). So why bother adding that progression to the priority queue? I altered the wheeled SoS that uses arithmetic progressions to skip pushing non-coprime progressions to the queue. The results:

Test	`Int8`	`UInt8`	`Int16`	`UInt16`
Before	3.31ms (33.3%)	8.20ms (16.4%)	8.47s (0.657%)	28.8s (5.43%)
After	2.40ms (49.8%)	4.89ms (18.9%)	3.77s (5.29%)	10.1s (4.72%)

It looks like it trims the time down by a third to a half.

This trick removes progressions where the multiplicand is not co-prime to a basis. Every odd multiple of the surviving odd numbers still is processed, even if the multiplier is not co-prime to the basis. Trimming there would require comparing the wheel co-primes to each progression element's index.