Is this the right way to do copy-on-write / use isKnownUniquelyReferenced?

CTMacUser · October 1, 2019, 8:50am

extension DistinctPrimeFactorizationSequence.Iterator: IteratorProtocol {

    public mutating func next() -> Set<Integer>? {
        defer { upcoming += 1 }

        // Check if we got a prime or not.
        var primeLink = taggedComposites.removeValue(forKey: upcoming)
        if primeLink == nil {
            // We did get a new prime!
            primeCallback?(upcoming)
            primeLink = Prime(prime: upcoming)
        }

        // (Otherwise,) move all the factors to their next multiples.
        var primeFactors = Set<Integer>(minimumCapacity: 2)
        while var hardPrime = primeLink {
            // Lock the results.
            let prime = hardPrime.value
            primeFactors.update(with: prime)

            // Re-do prime connections
            primeLink = primeLink?.next
            if !isKnownUniquelyReferenced(&hardPrime) {
                // Do copy-on-write
                hardPrime = Prime(prime: prime)
            }
            hardPrime.next = nil

            // Do composite connections
            let nextUpcoming = upcoming + prime
            hardPrime.next = taggedComposites[nextUpcoming]
            taggedComposites[nextUpcoming] = hardPrime
        }
        return primeFactors
    }

}

upcoming is of Integer, a generic BinaryInteger type. taggedComposites is a [Integer: Prime], where Prime is a final class with an Integer value and an Optional link to the next node, as a crappy linked list.

In the middle section of the while loop, am I doing that right? I move primeLink to the next node to ensure hardPrime is the last reference, unless I value-copied the iterator object out-of-band. Then I clone hardPrime if needed and cut off its link to its original down-node.

The last part of the while loop is right too? It should work whether to establish hardPrime as the sole Prime node for a new dictionary entry or to insert it as the head node to an existing entry.

(I originally had a [Integer: Set<Integer>], then I worried about the prime tags going across by value, which probably makes new Set-internal nodes, copies the values, then removes the old ones. Moving to a class directly lets me directly transfer nodes across dictionary entries, only creating new ones when I find a new prime number (or do a COW cloning).)

lukasa · October 2, 2019, 9:11am

Kind of? It's really hard to know.

The general (and probably cleaner) approach is for Prime to become a struct backed by a class, and to have all the CoW management within that struct. That way Prime will simply behave like a struct and exhibit CoW behaviours accordingly, and your algorithms don't have to worry about exactly how that happens. That avoids the need to carefully examine this code for correctness.

CTMacUser · October 4, 2019, 4:01am

I had something like that:

I deliberately took out the layer of struct-COW-class to minimize allocations, especially since I wouldn't be copying the outer iterator type in general.

lukasa · October 4, 2019, 8:11am

I mean, without seeing the entire code I can only provide feedback in the dark, so excuse me if this is a bad question: I'm working with what I have. But: why are you allocating more with a class-backed-struct?

You should only be allocating more if you're mutating the struct. If you're mutating the struct, you must also be mutating the class, and because the class is referential you're changing it everywhere. Is that the behaviour you are actually intending to have? It doesn't seem to be, because you're retrofitting CoW-logic back on top of it.

So perhaps I'm not seeing the whole picture, but I am a bit confused as to why you think swapping class-backed-struct for class should reduce allocations, unless you're also fundamentally changing the semantics of your program.

CTMacUser · October 5, 2019, 10:43pm

Let's iterate when the counter is 6. I remove the key-value pair for 6. It's composite, so there was actually a value. It was first created when 3 was processed as a new prime. It also has a tag for 2 after 4 was processed. Then the tags for 2 and 3 are resolved for keys 8 and 9, respectively. Both are new nodes since they're powers of a prime.

When the value was a Set, the value for a new key-value pair is an empty set, then the tagged primes are added (from the user perspective) by value to the set. This should allocate an internal Set node per prime. The nodes, AFAIK, aren't reused; new internal nodes are created for 2 and 3 when keys 8 and 9 are processed, and the ones for key 6 are destroyed.

With the custom Prime class, the node is created when the prime is first discovered. It first goes into the dictionary as the value part of a key-value pair where the key is double the new prime. When a composite is processed, each prime tag is transferred to the next multiple's key as its value. The tag has a link to the key's other tags, done a part of the linked list insertion. Primes are now allocated only once; processing from the current composite to the next multiple only juggles links around, with no allocation besides the new multiple's keys (if it didn't already exist).

lukasa · October 7, 2019, 10:58am

Ok, that makes sense. Given that Prime is a linked-list-node, and can by definition only be in one linked list at a time, would you ever actually trigger the copy on write code?

CTMacUser · October 8, 2019, 7:51am

Prime is a reference type, but DistinctPrimeFactorizationSequence.Iterator is a value type, as is the type of its taggedComposites stored property. If I do make a copy of the iterator, its dictionary is copied. When one of the copies extracts an existing composite, that composite is removed from that dictionary but not from the others (triggering copy-on-write on the dictionary's node links), which means those others indirectly still have a link to any Prime instances.

I'm not expecting anyone to copy the iterators around, but I still have to do my due diligence to make sure it's safe.

lukasa · October 8, 2019, 8:57am

In that case yes, your code above looks ok.

I suggest you benchmark the difference between that implementation and simply using [Integer: [Integer]] though. If you judiciously use reserveCapacity on your backing sets, I suspect you'll find the moderate increase in number of allocations could well be entirely outweighed by the improvement in cache locality.

(The digression: a linked list is an incredibly cache-unfriendly data structure, as the program jumps are essentially random and the odds of two nodes sharing a cache line are basically zero. This causes a lot of cache misses and CPU stalls. Moving to a linear data structure like Array, even though you will need to allocate slightly more, may well return its gains in locality. You could also try Set<Integer> to avoid needing to construct one, though Set is a bit less space-efficient and a lot less access efficient.)

CTMacUser · October 11, 2019, 7:49pm

[Integer: Set<Integer>] is what I originally had. The list of primes is only added, while composites are removed once they're counted past. Maybe it'd be easier to have an array of the primes that points to free-node composites, but those composites could somehow point back at which primes refer to them (to prevent a linear search every time a composite is iterated over).

...

Lately, I've been working on making an iterator for the Sieve of Sundaram, and right now I'm thinking about heaps and priority queues. What if we did that here? Make a min-heap priority queue on (composite: Integer, prime: Integer) tuples, where the composite member is compared.

Find 2 as prime: Heap{c4p2}
FInd 3 as prime: Heap{c4p2, c6p3}
Update 4 as composite: Heap{c6p2, c6p3}
Find 5 as prime: Heap{c6p2, c6p3, c10p5}
Update 6 as composite: Heap{c8p2, c9p3, c10p5}
Find 7 as prime: Heap{c8p2, c9p3, c10p5, c14p7}

We keep one element for each prime, when a composite is found, we pop the lead, update the composite member to the next multiple, then push the updated tuple back into the heap. The list is kept as a single (contiguous) block; only growing when a new prime is found. When processing a composite, elements are just moved around.

prime: one insert + one heap adjustment
composite: (pop-heap + update value + push-heap) * ω(composite)

And the heap adjustment methods take O(log count) time. We trade better locality for more processing time. And actually, it may not take more time if the time to allocate memory is sufficiently large! The heap method does one large allocation every few cycles instead my current code doing a bunch of little allocations (almost) every cycle.

Maybe I should switch to this.