Converting Class objects into Data to serialise as .dat file

Hi everyone,

I am trying to convert a class object into Data and then serialise it into a dat file, and then be able to convert the .dat file back into data and then back into an object to use.

I have been working with Metal and alot of the GRPU buffer implementation, but was wondering if anyone has some examples of how this could be done in Swift, and not have to use the Metal module.

One of my reasons for this is also comparing byte data of two instances and recording the byte changes. utilising this in an undo/redo mechanism.

Thank you for the help
Simon

2 Likes

I did some investigating into Data, and started to get the results I was looking for but was wondering

  • Does Swift have a function that can retrieve all the byte data for a class instance?

As my current investigation found that the byte data that the pointers are pointing to are not included in my xor.

Below is the Playground code I used for my investigation.

The final result I would like to achieve is to be able to XOR all the byte data of two instances.

import Foundation
import simd

class Node {
    public var position : simd_float3 = [0, 0, 0]
    public var name : String = "Node"
    public var vertices : Array<simd_float3> = []
}

/// Default instance
var newClassNode = Node()
/// Modified instances
var moved_newClassNode = Node()

moved_newClassNode.name = "Updated_Node_Position"
moved_newClassNode.position = [1, 1, 1]
moved_newClassNode.vertices = [ [0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], 
                                [5, 5, 5], [6, 6, 6], [7, 7, 7], [8, 8, 8], [9, 9, 9], ]

/// Size of the MemoryLayout class
let mem_size = MemoryLayout<Node>.size

var data_test = Data(bytes: &newClassNode, count: mem_size)
var data_test2 = Data(bytes: &moved_newClassNode, count: mem_size)

let d_ints = data_test.map { UInt8($0) }
let d2_ints = data_test2.map { UInt8($0) }

/// The XOR byte data.
var xor_bytes : Array<UInt8> = []

/// The byte data that has had the xor bytes applied.
var applied_xor : Array<UInt8> = []

// XOR comparison

/// The following is performing the byte XOR comparison for each byte found in the Data object.
for i in 0..<mem_size {
    xor_bytes.append(d_ints[i] ^ d2_ints[i])
}

// Applies the XOR bytes, and stores the result
for i in 0..<mem_size {
    applied_xor.append(d_ints[i] ^ xor_bytes[i])
}

/// Assigns the byte data to a Data variable
var xor_data = Data(applied_xor)

/// Result of the applied xor data cast back into the original object type.
let appliedNode = xor_data.withUnsafeBytes { $0.load(as: Node.self) }

// Updating a variable after the XOR should not appear in the final result, unless its using pointers and not fully capturing all the bytedata
moved_newClassNode.vertices[1] = [101,101,101]

/// == LOGGING ==

memory_debug(data_test)
memory_debug(data_test2)

print("\n-- XOR Numeric --")
for i in 0..<mem_size {
    print("\(d_ints[i]) ^ \(d2_ints[i]) = \(xor_bytes[i])")
}

print("\n-- BINARY representation --")
for i in 0..<mem_size {
    print("\( pad(string:String(d_ints[i], radix:2), toSize:8) ) ^ \( pad(string:String(d2_ints[i], radix:2), toSize:8 )) = \( pad(string:String(xor_bytes[i], radix:2), toSize:8))")
}

print("\n-- xor applied --")
for i in 0..<mem_size {
    print("\(d_ints[i]) ^ \(xor_bytes[i]) = \(applied_xor[i])")
}

print("\n -- Converted applied xor byte data to object --")
print(appliedNode.name)
print(appliedNode.position)
print(appliedNode.vertices)


/* ============
 *   Functions
 * ============
 */

func memory_debug(_ input_data:Data) {
    print("\n-- Data_Test Debug --")
    print(input_data)
    print(input_data as NSData)
    print("count : \(input_data.count)")
    let hex_string = input_data.map { String(format: "%02x", $0) }.joined() 
    print(hex_string) // hex value
    
    //print(String(Int(hex_string, radix:16)!, radix:2)) // hex to binary value
    
    print(input_data.map { String($0, radix:2) } )
}

func pad(string : String, toSize: Int) -> String {
    var padded = string
    for _ in 0..<(toSize - string.count) {
        padded = "0" + padded
    }
    return padded
}

Don't do that. Here you are taking the bytes that represent the class reference itself – 8 bytes on 64-bit CPU's. You'd need to drill into the class contents itself, but there would be other complications (the class instance header, variables like String or Array that point to other objects). I'd answer these first:

  • can you use Codable?
  • can you use value types instead of reference types?
  • do you need Array in there specifically, or can you simplify it (say by using a tuple of a fixed size).
3 Likes

The underlying bytes of a class instance are really not guaranteed to be stable across builds of your project, so I would highly highly recommend doing an explicit serialization/deserialization step.

Doing otherwise opens your project up to severe security vulnerabilities where you would have mismatched expectations when deserializing instances of your class that were created by an older version of your project.

2 Likes

Thank you for the quick response.

  • Yes I can use codable. ( I have in the proper application. It works great for serialising my scenes data, so it can be loaded again in another session.)

  • I have used reference types as there is a lot of data manipulation occurring behind the scenes. (This class is minimalist view from a 3D geometry app I am writing)

  • Arrays are needed as the data will be getting modified continuously during a session.

Does codable have a method to output byte data?

Will this work for you?

class Node: Codable {
    public var position : simd_float3 = [0, 0, 0]
    public var name : String = "Node"
    public var vertices : Array<simd_float3> = []
}

let nodes = [Node()]
let data = try! JSONEncoder().encode(nodes)
print("encoded data size \(data.count)")
let newNodes = try! JSONDecoder().decode([Node].self, from: data)
print(newNodes)

Instead of JSONEncoder / Decoder you may use some other encoder.

Thanks harlanhaskins,

What you mentioned regarding writing out data files is good to know.

In the final application I have used Codable to write out my scene data, and have a versioned schema to make sure I don't run into the error you mentioned.

Codable may work as long as there is an encoder that encodes the data into Byte information, and then I can use that Byte information to perform an XOR.

I do wonder if that would increase memory usage, by doubling up on the data that is being read and then encoded.

I was originally going the route of accessing the bytes / pointers as I wanted to read the object in place, and not double up on any data.

Do you think that Codable will suite these needs?

What is the reason for XOR? Note that if you XOR the bytes representing a float number you'd get some bogus coordinates (most likely NAN). Would you XOR the string content as well? And if there's an enumeration variable - XORing the underlying bytes would make it totally invalid. It feels like this XOR (if needed) should be applied at a higher level rather on the raw bytes level.

You may find this interesting if you want to go the binary encoding route.

1 Like

I will be using the XOR, to store the difference in byte state, for the entire object.

This difference (resulting xor bytes) will be stored in an object and added to an undo/redo stack. Then at the users discretion they will be able to move between states. With each move I would apply the XOR data stored in the object.

A lot of the data that is being modified may be 100megs+, so I do not want to store the entire object at that specific period in time. Instead store the difference (xor data) and compress the byte data (run-length). This will greatly reduce the footprint of changes.

Thank you for the link, will take a look now :), I watched Mikes talk on memory introspection yesterday, and totally missed this blog post, Looks like it will be helpful.

1 Like

Here are a few ideas for you:

  1. do not compare two representations, track changes instead
  2. compare the two high level structures (your Node / arrays dictionaries)
  3. convert to JSON string (sorted + prettyPrinted) and diff the two strings
  4. convert to byte representation and diff the two byte representations

Note that in 3 and 4 you don't want to have a situation when you insert a single byte into the beginning (or middle) of your file and then have the whole file (or half of the file) as a "delta" you have to record: either you do some kind of journal by appending such changes to the file end or your diff mechanism is robust enough to keep the deltas small. Note that I mentioned using sortedKeys + prettyPrinted for the JSONEncoder case - those (especially the first one) will greatly help keeping the text diffs small. For 4 most likely some paging scheme would be needed.

Personally I lean towards methods 1 and 2 where you either don't compare the two versions at all, or you compare the original high level representations of your data structures. Automerge / CRDT is doing something similar.

3 Likes

Thank you for the ideas and all the help.

Will let you know what I end up with.

Hi Tera,

I have been doing some investigation and implementations.

For simple alterations to objects I have managed to implement your first suggestion.
This works well for changes like matrices and vertex positions. where is is a simple variable to store and apply.

I am currently looking into topological changes (undo/redo), which is very class/pointer centric, using the HalfEdge data structure. I started looking into codable and will continue to look into it but was wondering the impact of alterations for files that are 50megs / 100megs / 250megs, every small adjustment one would be serialising the entire object, which could be very processor intensive.

By any chance might you know of a technique that could help with this, that I could read up on?

I need to refresh my memory on this, may take some time.

In the meantime can say that: both consciously and subconsciously I lean toward designs that either don't have to compare (a lot, or at all) or somehow do those comparisons quick (read: definitely quicker than O(n)). It's very hard (perhaps impossible?) to achieve this goal once you are in the (3) or (4) land. Realistically you can get 100MB/s from an encoder (more likely 30MB/s) - so indeed that's a huge tax you don't want to pay too often.

Beyond that your specific case of HalfEdge data structure might allow for some specific quick diffing, I'd look into that closely.

A bunch of clever people here, someone might have a better idea.

1 Like

Can't you just track / record changes as part of your undo/redo mechanism?

Quick & dirty (and untested!) sketch:

enum Op {
    case insert(index: Int, value: Double)
    case remove(index: Int)
    case change(index: Int, value: Double)
    
    var name: String {
        switch self {
        case .insert(let index, let value):
            return "insert \(value) at \(index)"
        case .remove(let index):
            return "remove at \(index)"
        case .change(let index, let value):
            return "set \(value) at \(index)"
        }
    }
}

struct TheState {
    var elements: [Double] = []
    var undoStack: [Op] = []
    var redoStack: [Op] = []
    
    func reverse(of op: Op) -> Op {
        switch op {
        case let .insert(index: index, value: _):
            return .remove(index: index)
        case let .remove(index: index):
            let value = elements[index]
            return .insert(index: index, value: value)
        case let .change(index: index, value: _):
            let value = elements[index]
            return .change(index: index, value: value)
        }
    }
    
    mutating func perform(_ op: Op, isUndoRedo: Bool = false) {
        if !isUndoRedo {
            undoStack.append(reverse(of: op))
            redoStack = []
        }
        switch op {
        case .insert(let index, let value):
            elements.insert(value, at: index)
        case .remove(let index):
            elements.remove(at: index)
        case .change(let index, let value):
            elements[index] = value
        }
        draw("after \(op.name)")
    }
    
    mutating func undo() {
        guard !undoStack.isEmpty else {
            print("can't undo")
            return
        }
        let op = undoStack.removeLast()
        redoStack.append(reverse(of: op))
        print("undo: ", terminator: "")
        perform(op, isUndoRedo: true)
    }
    
    mutating func redo() {
        guard !redoStack.isEmpty else {
            print("can't redo")
            return
        }
        let op = redoStack.removeLast()
        undoStack.append(reverse(of: op))
        print("redo: ", terminator: "")
        perform(op, isUndoRedo: true)
    }
    
    func draw(_ title: String) {
        print(title, terminator: ": [")
        var i = 0
        elements.forEach { v in
            if i != 0 {
                print(", ", terminator: "")
            }
            i += 1
            print(v, terminator: "")
        }
        print("]")
    }
}

func testState() {
    var state = TheState()
    state.perform(.insert(index: 0, value: 0))  // [0.0]
    state.perform(.insert(index: 1, value: 1))  // [0.0, 1.0]
    state.perform(.insert(index: 2, value: 2))  // [0.0, 1.0, 2.0]
    state.perform(.insert(index: 3, value: 3))  // [0.0, 1.0, 2.0, 3.0]
    state.perform(.remove(index: 1))            // [0.0, 2.0, 3.0]
    state.perform(.insert(index: 0, value: 4))  // [4.0, 0.0, 2.0, 3.0]
    state.redo()                                // can't redo
    state.undo()                                // [0.0, 2.0, 3.0]
    state.redo()                                // [4.0, 0.0, 2.0, 3.0]
    state.undo()                                // [0.0, 2.0, 3.0]
    state.undo()                                // [0.0, 1.0, 2.0, 3.0]
    state.undo()                                // [0.0, 1.0, 2.0]
    state.undo()                                // [0.0, 1.0]
    state.undo()                                // [0.0]
    state.undo()                                // []
    state.undo()                                // can't undo
    print("end")                                // end
    state.perform(.insert(index: 0, value: 9))  // [9.0]
    state.undo()                                // []
    state.undo()                                // can't undo
    state.redo()                                // [9.0]
    state.undo()                                // []
    state.undo()                                // can't undo
}

testState()

Outputs:

after insert 0.0 at 0: [0.0]
after insert 1.0 at 1: [0.0, 1.0]
after insert 2.0 at 2: [0.0, 1.0, 2.0]
after insert 3.0 at 3: [0.0, 1.0, 2.0, 3.0]
after remove at 1: [0.0, 2.0, 3.0]
after insert 4.0 at 0: [4.0, 0.0, 2.0, 3.0]
can't redo
undo: after remove at 0: [0.0, 2.0, 3.0]
redo: after insert 4.0 at 0: [4.0, 0.0, 2.0, 3.0]
undo: after remove at 0: [0.0, 2.0, 3.0]
undo: after insert 1.0 at 1: [0.0, 1.0, 2.0, 3.0]
undo: after remove at 3: [0.0, 1.0, 2.0]
undo: after remove at 2: [0.0, 1.0]
undo: after remove at 1: [0.0]
undo: after remove at 0: []
can't undo
after insert 9.0 at 0: [9.0]
undo: after remove at 0: []
can't undo
redo: after insert 9.0 at 0: [9.0]
undo: after remove at 0: []
can't undo
end

Thank you for the help tera.

The way I have implemented the Half Edge (my implementation is very textbook using references ) data structure, I dont think it would work quite so easily, without heavily coupling the undo module into the half edge module.

So I am going to look at making the data for the halfedge more flat, allowing me to store all the geometric meta data in a flat array, which would then be easily converted into Data for comparison, or serialisation.

Thank you for all the help, I will post my results

The link doesn't point to your implementation (it points to an assignment) - without seeing your implementation it's hard to comment. Other thank that, good luck with your implementation!

I believe it would be possible to have the half edge module "as is" and have a wrapper around it - then it would be just a matter of changing:

halfEdge.someCommand(parameters)

to something like:

   halfEdge.someCommand(parameters)
   halfEdge.undo()
   halfEdge.redo()

Wait a second. Did I just say change? But there's no change other then ability to use undo / redo? Right! Same API. Here's how, this is the "bird view" overview of this approach.

A generic "Operation" command.

struct Op {
    let name: String
    let doStep: () -> Void
    let undo: () -> Void
    
    init(name: String, doStep: @escaping () -> Void, undo: @escaping () -> Void) {
        self.name = name
        self.doStep = doStep
        self.undo = undo
    }
    
    var reversed: Op {
        Op(name: name, doStep: undo, undo: doStep)
    }
}

Nothing scary so far, right?

A generic undo manager itself (removing some unimportant bits)

class UndoManager {
    var undoStack: [Op] = []
    var redoStack: [Op] = []

    func perform(_ op: Op, isUndoRedo: Bool = false) {
        if !isUndoRedo {
            undoStack.append(op.reversed)
            redoStack = []
        }
        op.doStep()
    }
    
    func undo() {
        let op = undoStack.removeLast()
        redoStack.append(op.reversed)
        perform(op, isUndoRedo: true)
    }
    
    func redo() {
        let op = redoStack.removeLast()
        undoStack.append(op.reversed)
        perform(op, isUndoRedo: true)
    }
}

Have you noticed any Half Edge dependency so far? Right, there was no so far. Here it is:

class HalfEdge: UndoManager {
    var state: ....

    // example command implementation
    func someOperation(parameters: ....) {
        // you may need to remember some state bits here
        perform(
            Op(name: "someCommand") {
                // change self.state accordingly
                // using your existing half engine module API
            } undo: {
                // to use the remembered bits here
                // to change self.state "back" accordingly 
                // using your existing half engine module API
            }
        )
    }
}

That's basically it. And your data structure can stay whatever it is, no need to make it linear, etc.

1 Like

Hi tera,

Thank you for the UndoManager explanation :slight_smile:

I have implemented an undo manager in nearly the same way as your example, I just use a single stack and an index for undo location in the stack.

I suppose what I was trying to avoid are the custom undo functions (in your example the state bits in someOperation) as this would result in each half edge operator having a custom undo operator ( each operator would have to handle all the special cases that could occur when editing 3D topology), which is doable, but for longevity of the project and the amount of code upkeep I was hoping to steer away from this sort of undo setup.

I may be over thinking this or thinking about it incorrectly :sweat_smile:

This is what I was hoping to try an avoid by storing the byte data for the object.

Here is a little sudo-ish code example. There is possibility of references breaking as they could be getting deleted ( was thinking about the flat data structure and using index positions ). In the sudo code below this also shows one of the simplest special cases, some are much more complicated.

func deleteEdgeOperation(mesh:HE_Mesh, edge:HE_Edge ) {
    
    // Vertices that make up each side of the edge, these would need to be stored to recreate the edge.
    // This might have to be implemented in another way, incase one of these vertices gets deleted, the reference in the undo will break.
    let vertices:Array<HE_Vertex> = [edge.half_edge!.vert!, edge.half_edge!.twin!.vert!]
    
    // Mesh to apply all the modification to
    let mesh: HE_Mesh = mesh
    
    // Special Case Data
    // If the edge being deleted is on a boundary then the polygon that touched this edge
    // will also be deleted. So we will need to track the polygons vertices, 
    // as we cannot rely on references incase a vert is deleted.
    let fill_hole_vertices:Array<simd_float3>?
    
    // As the edge will be getting recreated, we need to store the new 
    // edge reference, incase it needs to be undone.
    var temp_edge: HE_Edge? = edge
    
    // Special case
    if edge.onBoundary{
        // sudo-ish gets all the vertex positions for the face
        non_boundary_edge = edge.half_edge?.face.vertices()
    }
    
    Op(name: "Delete Edge") {
        // Deletes the edge, and resets the reference to the edge.
        mesh.deleteEdge(temp_edge)
        temp_edge = nil
    }
    undo: {
        
        // Creates a new edge between vertices. This could fail if vertices references no longer 
        // exist, as they have been recreated, Can store the vertex positions and search 
        // through all available vertices. This is not a sound solution, as it is 
        // possible for more then two vertices to exist at the same position. 
        // ( up to the artists interpretation/needs)
        let new_edge = mesh.create_edge(vertices)
        
        // Special case, has occurred, a face needs to be generated 
        if fill_hole_vertices {
            if new_edge.half_edge.face.vertices == fill_hole_vertices {
                mesh.fill_hole(new_edge.half_edge)
            }            
        }
    }
}

While writing the above code, it made me realise that references in the undo hierarchy could become a very big problem.

This is a bird eye view of the main half edge structure, and the data that will be getting modified.

class HE_Mesh {
    var Vertices:Array<HE_Vertex> = []
    var Edges:Array<HE_Edge> = []
    var Faces: Array<HE_Face> = []
    var HalfEdges: Array<HalfEdge> = []
    var Boundaries: Array<HE_Face> = []
}

class HalfEdge {
    var vert: HE_Vertex?
    var edge: HE_Edge?
    var face: HE_Face?
    var twin: HalfEdge?
    var next: HalfEdge?
}

class HE_Vertex {
    weak var half_edge: HalfEdge?
    var pos: simd_float3 = [0, 0, 0]
}

class HE_Edge {
    weak var half_edge: HalfEdge?
    
    public func onBoundary() -> Bool {
        return half_edge!.isBoundary() || half_edge!.twin!.isBoundary()
    }
}

class HE_Face {
    weak var half_edge:HalfEdge?
    var boundary:Bool = false
}

Thank again for taking a look, Times like these its great to have be able to discuss the problem

Welcome to the pain of reference semantic world! These kind of issues is why people tend to use value semantic - much fewer gotchas like that down the road.

Ok, let's stay with the reference semantic for now. And let's simplify what you are saying a bit to see if we are on the same page. Pseudo code (showing the do / undo operations):

  1. do: {new node()} undo: {...}
  2. do: { something that uses node reference } undo: { something that uses node reference }
  3. do: { delete node reference } undo: { recreate the mode reference }

when you delete the node at step 3 but then undo the operation and recreate the node - you'll have a new node reference, so if you further "undo" to step (2) and try to use that reference - that would be the wrong reference to deal with. You better use some node ID instead and lookup for node reference by that ID.

Ditto would be the case if you want your undo stack to survive app relaunch. Not many apps doing that, but when they do that's really cool. Same answer - use IDs, not references.

Would it be slow to find a node reference by ID? Searching in a linear list that's O(n) - so would be slow indeed for large lists. Naturally you may come up with some [ID: Node] dictionary to speed this lookup up and keep that dictionary up to date. Although if you do that - you'll probably start asking yourself at some point why to have this:

class CNode { // reference type node
    var id: ID
    var essentialContent
    var children: [Node]
    var parents: [Node] // make this weak somehow
}

var nodes: [ID: CNode]

and deal with the complexities of having both references and IDs (also deal with weak pointers, which for a graph like structure would be non easy) while you could use a simpler:

struct VNode { // value type node
    var essentialContent
}

var nodes: [ID: VNode]

and encode node relationships with something like this (showing only one possible approach out of dozens!):

var parents: [ID: [ID]] // if a node can have more than one parent
var parents: [ID: ID] // if a node can have just one parent
var children: [ID: [ID]]
var next: [ID: ID]
var prev: [ID: ID]

Of course you also need to keep in mind "the target user" of your data structure (Metal?), and it's peculiar preferences (e.g. it will need to see an array at the end of the day) and adjust accordingly.

I'm merely thinking out loud now, and don't want to bias you in wany way. Perhaps what you had in mind is also doable in the form and shape your envisioned. Still hope the above helps.