Reusing existing storage for deep copy

Nevin · January 24, 2018, 9:42pm

In brief outline, I want to keep a cache of some data, update the original data, then possibly revert to the cached version. And this will happen a large number of times, so I don’t want to allocate a new array or other heap storage every time, I just want one single allocation which gets reused.

In particular, I need to make a copy of an object, mutate it in various ways, and choose one of those mutations to apply. For example, let’s call the object a state machine, and we want to try its available transitions until we find an acceptable result:

protocol StateMachine {
    associatedtype State
    associatedtype Transition
    var observableState: State { get }
    mutating func transition(_ t: Transition)
    func availableTransitions() -> [Transition]
}

class Controller<Machine: StateMachine> {
    var machine: Machine
    init(machine: Machine) { self.machine = machine }
    func evaluate(_ state: Machine.State) -> Bool { … }
    
    func attemptTransition() -> Bool {
        let backup = machine
        for t in machine.availableTransitions() {
            machine.transition(t)
            if evaluate(machine.observableState) { return true }
            machine = backup
        }
        return false
    }
}

In the attemptTransition() function, we make a single copy of the machine and revert to it as needed. Sounds good, right? (Well, as long as the StateMachine has value semantics anyway.)

Not so fast. If the StateMachine stores data in an array (or other heap-allocated storage), then as I understand it copy-on-write means a new array will be allocated every time we hit “machine.transition(t)”, which mutates said array.

Is that correct?

To avoid such extra allocations, is the best approach to give the StateMachine protocol a mutating “copy from” method? That would certainly work, and it would let reference types conform properly as well, but it seems un-Swifty—especially since many state machines don’t even use heap storage.

Karl · January 26, 2018, 1:37am

Not if the machine is the only one with a reference to its data. Standard library types avoid copying their backing storage when it isn't needed via the isKnownUniquelyReferenced function; if your own types allocate heap storage, they can use that too to guard against mutations which would break value semantics.

Consider if somebody else down the chain was still processing the previous data (and thus holding an additional reference, beyond the one in the StateMachine itself), and then the StateMachine undergoes this transition. Either a copy will need to be made, or the downstream code will have to accept that the data can shift under its feet (as it is for reference-types).

Nevin · January 26, 2018, 2:24am

The additional reference is created first by the line “let backup = machine”, and subsequently by “machine = backup” every time through the loop.

Outside interference and external references are not important here. We can just as easily write “var backup = machine” and do the transitions on backup until we find a good one, so the actual machine only transitions once at the end.

The goal is to make *exactly one* copy of the data in the original machine, which requires allocations if the machine uses heap storage, and thereafter make zero additional heap allocations for the remainder of the function.

Assuming that the machine’s transition() method modifies values stored in an array, but does not change the length of that array, this should be possible. Essentially, we want to make a “scratchpad” which *is* uniquely referenced.

At the top of the loop, machine and backup necessarily store the same data. In order to avoid a new allocation though, we want them to have *separate* storage. Whichever one is being modified inside the loop needs to have a unique reference to its heap storage.

At the bottom of the loop though, the assignment makes them share a reference, which means there will be a whole new allocation on the next pass. We can avoid this with a “copy from” method (or equivalently a “greedy assignment” operator), but I’m wondering if there is a better, Swiftier solution.

nnnnnnnn · January 26, 2018, 4:16am

I think you’re describing the situation accurately. If machine.transition() creates a new buffer, there’s no way to get the machine to keep using it when you restore it to backup.

I think your copy(from:) method is the right approach (I might call it restore(_:) instead). The Swifty part would be that you could provide a default implementation that would just do a simple assignment, and then recommend that conforming types with heap-based storage should implement it to maintain the buffer.

Nevin · January 26, 2018, 3:56pm

Thanks Nate, good idea with the default implementation.