Converting Class objects into Data to serialise as .dat file

I will be using XOR to store the difference in byte state for the entire object.

This difference (the resulting XOR bytes) will be stored in an object and added to an undo/redo stack. Then, at the user's discretion, they will be able to move between states. With each move I would apply the XOR data stored in the object.

A lot of the data being modified may be 100 MB+, so I do not want to store the entire object at that specific point in time. Instead I'll store the difference (XOR data) and compress the byte data (run-length encoding). This will greatly reduce the footprint of changes.
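
A minimal sketch of the idea (illustrative only; a real implementation would chunk and stream this rather than hold whole snapshots in memory):

func xorDelta(_ old: [UInt8], _ new: [UInt8]) -> [UInt8] {
    precondition(old.count == new.count, "this sketch assumes equal-sized snapshots")
    // XOR leaves zeros wherever the bytes are unchanged, so the delta is mostly zero runs.
    return zip(old, new).map { $0.0 ^ $0.1 }
}

// Naive run-length encoding: (runLength, value) pairs; long zero runs collapse to a single pair.
func runLengthEncode(_ bytes: [UInt8]) -> [(count: Int, value: UInt8)] {
    var result: [(count: Int, value: UInt8)] = []
    for byte in bytes {
        if let last = result.last, last.value == byte {
            result[result.count - 1].count += 1
        } else {
            result.append((count: 1, value: byte))
        }
    }
    return result
}

// Applying the delta is the same XOR again, which is why one stored delta serves both undo and redo.
func apply(delta: [UInt8], to bytes: [UInt8]) -> [UInt8] {
    zip(bytes, delta).map { $0.0 ^ $0.1 }
}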

Thank you for the link, will take a look now :). I watched Mike's talk on memory introspection yesterday and totally missed this blog post; looks like it will be helpful.

1 Like

Here are a few ideas for you:

  1. do not compare two representations; track changes instead
  2. compare the two high-level structures (your Node / array / dictionary structures)
  3. convert to a JSON string (sorted + prettyPrinted) and diff the two strings
  4. convert to a byte representation and diff the two byte representations

Note that in 3 and 4 you don't want a situation where you insert a single byte at the beginning (or middle) of your file and then have the whole file (or half of the file) as the "delta" you have to record: either you keep some kind of journal by appending such changes to the end of the file, or your diff mechanism is robust enough to keep the deltas small. Note that I mentioned using sortedKeys + prettyPrinted for the JSONEncoder case - those (especially the first one) will greatly help keep the text diffs small. For 4, most likely some paging scheme would be needed.
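
For option 3, the encoder setup could look like this (just a sketch; the Node type and the naive line comparison are only for illustration):

import Foundation

struct Node: Codable {
    var name: String
    var position: [Float]
}

// sortedKeys keeps the key order stable between runs; prettyPrinted puts keys and
// array elements on separate lines - both keep a line-based text diff small and local.
let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys, .prettyPrinted]

let before = try! encoder.encode(Node(name: "Node", position: [0, 0, 0]))
let after  = try! encoder.encode(Node(name: "Node", position: [1, 0, 0]))

// Any real diff algorithm can go here; this naive comparison just shows that the
// change stays confined to a couple of lines of the pretty-printed, key-sorted text.
let oldLines = String(decoding: before, as: UTF8.self).split(separator: "\n")
let newLines = String(decoding: after, as: UTF8.self).split(separator: "\n")
for (old, new) in zip(oldLines, newLines) where old != new {
    print("- \(old)\n+ \(new)")
}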

Personally I lean towards methods 1 and 2, where you either don't compare the two versions at all, or you compare the original high-level representations of your data structures. Automerge / CRDTs do something similar.

3 Likes

Thank you for the ideas and all the help.

Will let you know what I end up with.

Hi Tera,

I have been doing some investigation and implementations.

For simple alterations to objects I have managed to implement your first suggestion.
This works well for changes like matrices and vertex positions, where it is a simple value to store and apply.

I am currently looking into topological changes (undo/redo), which are very class/pointer centric, using the HalfEdge data structure. I started looking into Codable and will continue to look into it, but was wondering about the impact of alterations for files that are 50 MB / 100 MB / 250 MB: with every small adjustment one would be serialising the entire object, which could be very processor intensive.

By any chance might you know of a technique that could help with this, that I could read up on?

I need to refresh my memory on this; it may take some time.

In the meantime I can say that, both consciously and subconsciously, I lean toward designs that either don't have to compare (a lot, or at all) or somehow do those comparisons quickly (read: definitely quicker than O(n)). It's very hard (perhaps impossible?) to achieve this goal once you are in (3) or (4) land. Realistically you can get 100 MB/s from an encoder (more likely 30 MB/s) - so indeed that's a huge tax you don't want to pay too often.

Beyond that, your specific case of the HalfEdge data structure might allow for some specific quick diffing; I'd look into that closely.

There are a bunch of clever people here; someone might have a better idea.

1 Like

Can't you just track / record changes as part of your undo/redo mechanism?

Quick & dirty (and untested!) sketch:

enum Op {
    case insert(index: Int, value: Double)
    case remove(index: Int)
    case change(index: Int, value: Double)
    
    var name: String {
        switch self {
        case .insert(let index, let value):
            return "insert \(value) at \(index)"
        case .remove(let index):
            return "remove at \(index)"
        case .change(let index, let value):
            return "set \(value) at \(index)"
        }
    }
}

struct TheState {
    var elements: [Double] = []
    var undoStack: [Op] = []
    var redoStack: [Op] = []
    
    func reverse(of op: Op) -> Op {
        switch op {
        case let .insert(index: index, value: _):
            return .remove(index: index)
        case let .remove(index: index):
            let value = elements[index]
            return .insert(index: index, value: value)
        case let .change(index: index, value: _):
            let value = elements[index]
            return .change(index: index, value: value)
        }
    }
    
    mutating func perform(_ op: Op, isUndoRedo: Bool = false) {
        if !isUndoRedo {
            undoStack.append(reverse(of: op))
            redoStack = []
        }
        switch op {
        case .insert(let index, let value):
            elements.insert(value, at: index)
        case .remove(let index):
            elements.remove(at: index)
        case .change(let index, let value):
            elements[index] = value
        }
        draw("after \(op.name)")
    }
    
    mutating func undo() {
        guard !undoStack.isEmpty else {
            print("can't undo")
            return
        }
        let op = undoStack.removeLast()
        redoStack.append(reverse(of: op))
        print("undo: ", terminator: "")
        perform(op, isUndoRedo: true)
    }
    
    mutating func redo() {
        guard !redoStack.isEmpty else {
            print("can't redo")
            return
        }
        let op = redoStack.removeLast()
        undoStack.append(reverse(of: op))
        print("redo: ", terminator: "")
        perform(op, isUndoRedo: true)
    }
    
    func draw(_ title: String) {
        print(title, terminator: ": [")
        var i = 0
        elements.forEach { v in
            if i != 0 {
                print(", ", terminator: "")
            }
            i += 1
            print(v, terminator: "")
        }
        print("]")
    }
}

func testState() {
    var state = TheState()
    state.perform(.insert(index: 0, value: 0))  // [0.0]
    state.perform(.insert(index: 1, value: 1))  // [0.0, 1.0]
    state.perform(.insert(index: 2, value: 2))  // [0.0, 1.0, 2.0]
    state.perform(.insert(index: 3, value: 3))  // [0.0, 1.0, 2.0, 3.0]
    state.perform(.remove(index: 1))            // [0.0, 2.0, 3.0]
    state.perform(.insert(index: 0, value: 4))  // [4.0, 0.0, 2.0, 3.0]
    state.redo()                                // can't redo
    state.undo()                                // [0.0, 2.0, 3.0]
    state.redo()                                // [4.0, 0.0, 2.0, 3.0]
    state.undo()                                // [0.0, 2.0, 3.0]
    state.undo()                                // [0.0, 1.0, 2.0, 3.0]
    state.undo()                                // [0.0, 1.0, 2.0]
    state.undo()                                // [0.0, 1.0]
    state.undo()                                // [0.0]
    state.undo()                                // []
    state.undo()                                // can't undo
    print("end")                                // end
    state.perform(.insert(index: 0, value: 9))  // [9.0]
    state.undo()                                // []
    state.undo()                                // can't undo
    state.redo()                                // [9.0]
    state.undo()                                // []
    state.undo()                                // can't undo
}

testState()

Outputs:

after insert 0.0 at 0: [0.0]
after insert 1.0 at 1: [0.0, 1.0]
after insert 2.0 at 2: [0.0, 1.0, 2.0]
after insert 3.0 at 3: [0.0, 1.0, 2.0, 3.0]
after remove at 1: [0.0, 2.0, 3.0]
after insert 4.0 at 0: [4.0, 0.0, 2.0, 3.0]
can't redo
undo: after remove at 0: [0.0, 2.0, 3.0]
redo: after insert 4.0 at 0: [4.0, 0.0, 2.0, 3.0]
undo: after remove at 0: [0.0, 2.0, 3.0]
undo: after insert 1.0 at 1: [0.0, 1.0, 2.0, 3.0]
undo: after remove at 3: [0.0, 1.0, 2.0]
undo: after remove at 2: [0.0, 1.0]
undo: after remove at 1: [0.0]
undo: after remove at 0: []
can't undo
end
after insert 9.0 at 0: [9.0]
undo: after remove at 0: []
can't undo
redo: after insert 9.0 at 0: [9.0]
undo: after remove at 0: []
can't undo

Thank you for the help tera.

The way I have implemented the HalfEdge data structure (my implementation is very textbook, using references), I don't think it would work quite so easily without heavily coupling the undo module into the half edge module.

So I am going to look at making the data for the half edge more flat, allowing me to store all the geometric metadata in a flat array, which would then be easily converted into Data for comparison or serialisation.

Thank you for all the help, I will post my results.

The link doesn't point to your implementation (it points to an assignment) - without seeing your implementation it's hard to comment. Other than that, good luck with your implementation!

I believe it would be possible to have the half edge module "as is" and have a wrapper around it - then it would be just a matter of changing:

halfEdge.someCommand(parameters)

to something like:

   halfEdge.someCommand(parameters)
   halfEdge.undo()
   halfEdge.redo()

Wait a second. Did I just say change? But there's no change other than the ability to use undo / redo? Right! Same API. Here's how; this is the "bird's-eye view" overview of this approach.

A generic "Operation" command.

struct Op {
    let name: String
    let doStep: () -> Void
    let undo: () -> Void
    
    init(name: String, doStep: @escaping () -> Void, undo: @escaping () -> Void) {
        self.name = name
        self.doStep = doStep
        self.undo = undo
    }
    
    var reversed: Op {
        Op(name: name, doStep: undo, undo: doStep)
    }
}

Nothing scary so far, right?

A generic undo manager itself (removing some unimportant bits)

class UndoManager {
    var undoStack: [Op] = []
    var redoStack: [Op] = []

    func perform(_ op: Op, isUndoRedo: Bool = false) {
        if !isUndoRedo {
            undoStack.append(op.reversed)
            redoStack = []
        }
        op.doStep()
    }
    
    func undo() {
        let op = undoStack.removeLast()
        redoStack.append(op.reversed)
        perform(op, isUndoRedo: true)
    }
    
    func redo() {
        let op = redoStack.removeLast()
        undoStack.append(op.reversed)
        perform(op, isUndoRedo: true)
    }
}

Have you noticed any Half Edge dependency so far? Right, there was none so far. Here it is:

class HalfEdge: UndoManager {
    var state: ....

    // example command implementation
    func someOperation(parameters: ....) {
        // you may need to remember some state bits here
        perform(
            Op(name: "someCommand") {
                // change self.state accordingly
                // using your existing half edge module API
            } undo: {
                // to use the remembered bits here
                // to change self.state "back" accordingly 
                // using your existing half edge module API
            }
        )
    }
}

That's basically it. And your data structure can stay whatever it is, no need to make it linear, etc.

1 Like

Hi tera,

Thank you for the UndoManager explanation :slight_smile:

I have implemented an undo manager in nearly the same way as your example; I just use a single stack and an index for the undo location in the stack.
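
Roughly, the single stack + index variant looks like this (a simplified sketch rather than my actual code):

// Ops below `cursor` are currently applied; undo steps the cursor back, redo steps it forward.
struct UndoHistory<Op> {
    private var stack: [Op] = []
    private var cursor = 0          // number of ops currently applied

    mutating func record(_ op: Op) {
        stack.removeSubrange(cursor...)   // recording a new op drops any redoable tail
        stack.append(op)
        cursor = stack.count
    }

    // Returns the op to revert (the caller applies its inverse), or nil if there is nothing to undo.
    mutating func undo() -> Op? {
        guard cursor > 0 else { return nil }
        cursor -= 1
        return stack[cursor]
    }

    // Returns the op to re-apply, or nil if there is nothing to redo.
    mutating func redo() -> Op? {
        guard cursor < stack.count else { return nil }
        defer { cursor += 1 }
        return stack[cursor]
    }
}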

I suppose what I was trying to avoid are the custom undo functions (in your example, the state bits in someOperation), as this would result in each half edge operator having a custom undo operator (each operator would have to handle all the special cases that could occur when editing 3D topology). That is doable, but for the longevity of the project and the amount of code upkeep I was hoping to steer away from this sort of undo setup.

I may be overthinking this or thinking about it incorrectly :sweat_smile:

This is what I was hoping to try and avoid by storing the byte data for the object.

Here is a little pseudo-ish code example. There is a possibility of references breaking as they could be getting deleted (I was thinking about the flat data structure and using index positions). The pseudo code below also shows one of the simplest special cases; some are much more complicated.

func deleteEdgeOperation(mesh:HE_Mesh, edge:HE_Edge ) {
    
    // Vertices that make up each side of the edge; these would need to be stored to recreate the edge.
    // This might have to be implemented in another way, in case one of these vertices gets deleted and the reference in the undo breaks.
    let vertices:Array<HE_Vertex> = [edge.half_edge!.vert!, edge.half_edge!.twin!.vert!]
    
    // Mesh to apply all the modification to
    let mesh: HE_Mesh = mesh
    
    // Special Case Data
    // If the edge being deleted is on a boundary then the polygon that touched this edge
    // will also be deleted. So we will need to track the polygon's vertices,
    // as we cannot rely on references in case a vert is deleted.
    var fill_hole_vertices: Array<simd_float3>? = nil
    
    // As the edge will be getting recreated, we need to store the new
    // edge reference, in case it needs to be undone.
    var temp_edge: HE_Edge? = edge
    
    // Special case
    if edge.onBoundary() {
        // pseudo-ish: gets all the vertex positions for the face
        fill_hole_vertices = edge.half_edge?.face?.vertices()
    }
    
    Op(name: "Delete Edge") {
        // Deletes the edge, and resets the reference to the edge.
        mesh.deleteEdge(temp_edge)
        temp_edge = nil
    }
    undo: {
        
        // Creates a new edge between vertices. This could fail if the vertex references no longer
        // exist because they have been recreated. We could store the vertex positions and search
        // through all available vertices, but that is not a sound solution, as it is
        // possible for more than two vertices to exist at the same position
        // (up to the artist's interpretation/needs).
        let new_edge = mesh.create_edge(vertices)
        
        // Special case has occurred: a face needs to be generated
        if let fill_hole_vertices = fill_hole_vertices {
            if new_edge.half_edge?.face?.vertices() == fill_hole_vertices {
                mesh.fill_hole(new_edge.half_edge)
            }            
        }
    }
}

While writing the above code, I realised that references in the undo hierarchy could become a very big problem.

This is a bird's-eye view of the main half edge structure and the data that will be getting modified.

class HE_Mesh {
    var Vertices:Array<HE_Vertex> = []
    var Edges:Array<HE_Edge> = []
    var Faces: Array<HE_Face> = []
    var HalfEdges: Array<HalfEdge> = []
    var Boundaries: Array<HE_Face> = []
}

class HalfEdge {
    var vert: HE_Vertex?
    var edge: HE_Edge?
    var face: HE_Face?
    var twin: HalfEdge?
    var next: HalfEdge?
}

class HE_Vertex {
    weak var half_edge: HalfEdge?
    var pos: simd_float3 = [0, 0, 0]
}

class HE_Edge {
    weak var half_edge: HalfEdge?
    
    public func onBoundary() -> Bool {
        return half_edge!.isBoundary() || half_edge!.twin!.isBoundary()
    }
}

class HE_Face {
    weak var half_edge:HalfEdge?
    var boundary:Bool = false
}

Thanks again for taking a look. At times like these it's great to be able to discuss the problem.

Welcome to the pain of the reference-semantics world! These kinds of issues are why people tend to use value semantics - much fewer gotchas like that down the road.

Ok, let's stay with reference semantics for now. And let's simplify what you are saying a bit to see if we are on the same page. Pseudo code (showing the do / undo operations):

  1. do: {new node()} undo: {...}
  2. do: { something that uses node reference } undo: { something that uses node reference }
  3. do: { delete node reference } undo: { recreate the node reference }

when you delete the node at step 3 but then undo the operation and recreate the node, you'll have a new node reference, so if you further "undo" to step (2) and try to use that reference - that would be the wrong reference to deal with. You'd better use some node ID instead and look up the node reference by that ID.

Ditto if you want your undo stack to survive an app relaunch. Not many apps do that, but when they do, it's really cool. Same answer - use IDs, not references.

Would it be slow to find a node reference by ID? Searching a linear list is O(n), so that would indeed be slow for large lists. Naturally you may come up with some [ID: Node] dictionary to speed this lookup up and keep that dictionary up to date. Although if you do that, you'll probably start asking yourself at some point why you have this:

class CNode { // reference type node
    var id: ID
    var essentialContent
    var children: [CNode]
    var parents: [CNode] // make this weak somehow
}

var nodes: [ID: CNode]

and deal with the complexities of having both references and IDs (also dealing with weak pointers, which for a graph-like structure would not be easy) while you could use a simpler:

struct VNode { // value type node
    var essentialContent
}

var nodes: [ID: VNode]

and encode node relationships with something like this (showing only one possible approach out of dozens!):

var parents: [ID: [ID]] // if a node can have more than one parent
var parents: [ID: ID] // if a node can have just one parent
var children: [ID: [ID]]
var next: [ID: ID]
var prev: [ID: ID]

Of course you also need to keep in mind "the target user" of your data structure (Metal?) and its peculiar preferences (e.g. it will need to see an array at the end of the day), and adjust accordingly.

I'm merely thinking out loud now, and don't want to bias you in any way. Perhaps what you had in mind is also doable in the form and shape you envisioned. Still, hope the above helps.
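
To make the ID idea concrete: a recorded operation can then refer to nodes only by ID and by captured values, never by reference (a rough sketch with made-up names):

import Foundation

typealias ID = UUID

struct VNode {                          // value type node, as above
    var position: SIMD3<Float> = [0, 0, 0]
}

struct Model {
    var nodes: [ID: VNode] = [:]
    var parents: [ID: ID] = [:]
}

// An undoable step that touches the model only through IDs and captured values,
// so it stays valid even if nodes are deleted and later recreated.
struct Step {
    let name: String
    let apply: (inout Model) -> Void
    let revert: (inout Model) -> Void
}

func deleteNodeStep(id: ID, in model: Model) -> Step {
    let savedNode = model.nodes[id]     // capture the value, not a reference
    let savedParent = model.parents[id]
    return Step(
        name: "Delete Node",
        apply: {
            $0.nodes[id] = nil
            $0.parents[id] = nil
        },
        revert: {
            $0.nodes[id] = savedNode    // recreating under the same ID keeps older steps valid
            $0.parents[id] = savedParent
        }
    )
}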

Thanks tera, that does help. I had not thought about storing the IDs of the objects in that way, and the scenario you mention reminds me of an API I used years ago. I think I will give that a try :slight_smile:

I have setup ids already on each object and use it for comparisons and look up in some scenarios.

You are correct that all this data gets converted into a triangle mesh, which then gets converted into a vertex struct buffer and a vertex index struct buffer and passed off to the GPU. I still have a lot of work to do on some of the GPU integration, but one step at a time :)

This is how it is currently looking

1 Like

Hi Simon,
you could try a sub-library of my project, which can archive and de-archive your data in binary format. It's more low-level than Codable, but it's also orders of magnitude faster.

Most system types are already implemented, and it's very easy to add implementations of others anyway.

Your code becomes:

import Foundation
import simd

class Node: BinaryIOType {
	public var position : simd_float3 = [0, 0, 0]
	public var name 	: String = "Node"
	public var vertices : Array<simd_float3> = []
	
	init() {}
	
	//	Just implement these two methods of the BinaryIOType protocol:
	func write(to writer: inout BinaryWriter) throws {
		try position.write(to: &writer)
		try name.write(to: &writer)
		try vertices.write(to: &writer)
	}
	// read each variable in the same order it was written
	required init( from reader: inout BinaryReader ) throws {
		self.position	= try simd_float3(from: &reader)
		self.name		= try String(from: &reader)
		self.vertices	= try Array<simd_float3>(from: &reader)
	}
}


/// Default instance
var newClassNode = Node()
/// Modified instances
var moved_newClassNode = Node()

moved_newClassNode.name		= "Updated_Node_Position"
moved_newClassNode.position = [1, 1, 1]
moved_newClassNode.vertices = [ [0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4],
								[5, 5, 5], [6, 6, 6], [7, 7, 7], [8, 8, 8], [9, 9, 9], ]

// You can even use Data here, but [UInt8] is faster:
var data_test	= try newClassNode.binaryData() as [UInt8]
var data_test2	= try moved_newClassNode.binaryData() as [UInt8]

print( "newClassNode data: \(data_test)\n" )
print( "moved_newClassNode data: \(data_test2)\n" )

// etc....

// Of course you can recreate the value from the data:

var decoded_newClassNode 		= try Node(binaryData: data_test)
var decoded_moved_newClassNode 	= try Node(binaryData: data_test2)


extension Node: Equatable {
	static func == (lhs: Node, rhs: Node) -> Bool {
		return lhs.position == rhs.position && lhs.name == rhs.name && lhs.vertices == rhs.vertices
	}
}

print("-- EQUALITY CHECK: ----------------------------")
print("\t• decoded newClassNode equality check       = \( newClassNode == decoded_newClassNode)")
print("\t• decoded moved_newClassNode equality check = \( moved_newClassNode == decoded_moved_newClassNode)")

This is the output:

newClassNode data: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 78, 111, 100, 101, 0, 0, 0, 0, 0, 0, 0, 0, 0]

moved_newClassNode data: [0, 0, 128, 63, 0, 0, 128, 63, 0, 0, 128, 63, 85, 112, 100, 97, 116, 101, 100, 95, 78, 111, 100, 101, 95, 80, 111, 115, 105, 116, 105, 111, 110, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 128, 63, 0, 0, 128, 63, 0, 0, 128, 63, 0, 0, 0, 64, 0, 0, 0, 64, 0, 0, 0, 64, 0, 0, 64, 64, 0, 0, 64, 64, 0, 0, 64, 64, 0, 0, 128, 64, 0, 0, 128, 64, 0, 0, 128, 64, 0, 0, 160, 64, 0, 0, 160, 64, 0, 0, 160, 64, 0, 0, 192, 64, 0, 0, 192, 64, 0, 0, 192, 64, 0, 0, 224, 64, 0, 0, 224, 64, 0, 0, 224, 64, 0, 0, 0, 65, 0, 0, 0, 65, 0, 0, 0, 65, 0, 0, 16, 65, 0, 0, 16, 65, 0, 0, 16, 65]

-- EQUALITY CHECK: ----------------------------
	• decoded newClassNode equality check       = true
	• decoded moved_newClassNode equality check = true

If you want to give it a try, go here, select the branch IdentitySupportForValueTypes, download the files contained in GraphCodable/Sources/GraphCodable/BinaryIO/ and add them to your project.

I concur with that observation. In some of my tests Codable-based serialisation was the slowest, Mirror-based serialisation was about an order of magnitude quicker than Codable, and manual serialisation was an order of magnitude quicker than Mirror, making it two orders of magnitude faster than Codable.
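
By "Mirror based serialisation" I mean something along these lines (an illustrative sketch, not the actual test code; it only handles Float and String leaves and ignores collection counts):

import Foundation

// Walk the value's stored properties reflectively and append their bytes.
func serialise(_ value: Any, into data: inout Data) {
    switch value {
    case let v as Float:
        withUnsafeBytes(of: v) { data.append(contentsOf: $0) }
    case let v as String:
        let utf8 = Array(v.utf8)
        withUnsafeBytes(of: UInt32(utf8.count)) { data.append(contentsOf: $0) }
        data.append(contentsOf: utf8)
    default:
        // Recurse into whatever stored properties Mirror reports.
        for child in Mirror(reflecting: value).children {
            serialise(child.value, into: &data)
        }
    }
}

struct Vertex { var x: Float; var y: Float; var name: String }
var out = Data()
serialise(Vertex(x: 1, y: 2, name: "v0"), into: &out)
print(out.count)   // 4 + 4 + (4 + 2) = 14 bytes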

1 Like

Two orders of magnitude:

•••••• LargeTest TEST ••••••••••••••••••••••••••
BinaryIO           7.75ms (σ/avg =  9.1%) Data size = 3603100
Codable (JSON)   730.87ms (σ/avg =  1.0%) Data size = 1662739
Codable (PLIST)  481.92ms (σ/avg =  1.0%) Data size = 1246961
•••••• LargeTest2 TEST ••••••••••••••••••••••••••
BinaryIO          14.41ms (σ/avg = 14.3%) Data size = 876808
Codable (JSON)   483.32ms (σ/avg =  0.5%) Data size = 1465132
Codable (PLIST)  491.60ms (σ/avg =  0.9%) Data size = 2411269

Thanks Loooop,

I will take a look into it :slight_smile:

I debugged my test and found one of the major contributors to slowness in the Codable-based serialiser:

class MyEncoder: Encoder {
    ...
    func encode(_ value: Float) {
        ...
    }
    func encode<T: Encodable>(_ val: T) {
        try! val.encode(to: self)
    }
    ....

    private struct KeyedContainer<Key: CodingKey>: KeyedEncodingContainerProtocol {
        var myEncoder: MyEncoder
        
        func encode<T: Encodable>(_ value: T, forKey key: Key) {
            myEncoder.encode(value)
        }
        ...
    }
    ...
}

Stepping through the line "myEncoder.encode(value)" where the value passed was a Float, the "encode" method being called was not the specialised "encode(_ value: Float)" but the generic "encode<T: Encodable>(_ val: T)" (why it is this way is discussed here). I put in the workaround, and interestingly that one change sped the serialiser up 10 times, so now my Codable-based serialiser is just 10x slower than the manual serialiser.
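
A standalone illustration of the issue and of the dynamic-cast workaround (not my actual encoder, just the shape of the problem):

struct Sink {
    func write(_ value: Float)           { print("fast Float path") }
    func write<T: Encodable>(_ value: T) { print("generic Encodable path") }

    func writeAny<T: Encodable>(_ value: T) {
        // In a generic context the call resolves to the generic overload at compile time,
        // even when T happens to be Float at runtime...
        write(value)
        // ...so the workaround is to check the dynamic type and forward explicitly:
        if let float = value as? Float { write(float) }
    }
}

Sink().writeAny(Float(1.5))
// generic Encodable path
// fast Float path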

1 Like

I found this great talk on a data structure that I think would work for my needs; not sure if any of you have seen it.

This would be great to have in the Swift standard library, as an alternative array type.

Besides how value semantics rocks, he's talking about a flavour of B-tree (a data structure we have known and loved since the 70's). Indeed, a ready-made structure like this with an Array-like API could be useful to have in the standard library.

Another thing I'd recommend you review is BSP trees.

1 Like

Thanks for the BSP tree suggestion. I have implemented a KD-tree for scene and object interaction, but have not looked into tying that into a way of temporarily storing the undo structure. Something to look into.

Update on my side: I did some testing using an index lookup table for all the components (vertices, edges, faces); this allowed me to remove the referencing in my arrays and allows the arrays to be converted into Data very easily. I can confirm that this setup works with the XOR byte setup that I mentioned early on.
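
For reference, this is roughly how a flat component array turns into bytes for the XOR comparison (a sketch; it assumes the element type is a trivial value type such as simd_float3):

import Foundation
import simd

let positions: [simd_float3] = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]

// Copy the raw element bytes into Data; simd_float3 has a 16-byte stride, so this is 48 bytes.
let bytes = positions.withUnsafeBufferPointer { Data(buffer: $0) }

// A second snapshot with one moved vertex; the XOR of the two buffers is mostly zeros,
// which is what makes the run-length compressed delta small.
var moved = positions
moved[1] = [1, 2, 1]
let movedBytes = moved.withUnsafeBufferPointer { Data(buffer: $0) }
let delta = zip(bytes, movedBytes).map { $0.0 ^ $0.1 }
print(delta.filter { $0 != 0 }.count)   // only a handful of non-zero bytes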

Going to look into the immer repo mentioned in the YouTube video; it looks like someone has added a Swift Package for it, so I will give it a try.