NSKeyedArchiver, CoreData and other storage solutions

On the face of it, NSKeyed{A,Una}rchiver doesn't sound like it's going to buy you much over the Codable options which already exist, though which solution to go with specifically will depend on the rigidity of your requirements.

Primarily, you stand to benefit from NSKeyedArchiver if you're encoding object graphs and not just trees of objects: once you have circular references in what you encode and decode, NSKeyedArchiver offers tools which are easier to work with than Codable currently does. The caveat is that to take advantage of those tools you have to adopt NSCoding rather than just use the Codable support layer, so take that as you will. (A minimal sketch of what that looks like follows.)
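To make the circular-reference point concrete, here's a small sketch of classic NSCoding handling a cycle. The `Node` class and its fields are invented for illustration; the key point is that keyed archivers encode object identity, so the back-reference round-trips instead of recursing forever (which is what a naïvely synthesized Codable conformance would do here):

```swift
import Foundation

final class Node: NSObject, NSSecureCoding {
    static var supportsSecureCoding: Bool { true }

    let name: String
    weak var parent: Node?        // back-reference, forming a cycle
    var children: [Node] = []

    init(name: String) { self.name = name }

    func encode(with coder: NSCoder) {
        coder.encode(name, forKey: "name")
        coder.encode(parent, forKey: "parent")
        coder.encode(children, forKey: "children")
    }

    required init?(coder: NSCoder) {
        guard let name = coder.decodeObject(of: NSString.self, forKey: "name") as String? else {
            return nil
        }
        self.name = name
        super.init()
        parent = coder.decodeObject(of: Node.self, forKey: "parent")
        children = coder.decodeObject(of: [NSArray.self, Node.self], forKey: "children") as? [Node] ?? []
    }
}

let root = Node(name: "root")
let child = Node(name: "child")
child.parent = root
root.children = [child]

// The keyed archiver records object identity, so the cycle is encoded once
// and restored on decode: decoded?.children.first?.parent === decoded
let data = try NSKeyedArchiver.archivedData(withRootObject: root, requiringSecureCoding: true)
let decoded = try NSKeyedUnarchiver.unarchivedObject(ofClass: Node.self, from: data)
```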


In terms of your requirements:

    1. open / save archives to a given file location on a local file system <snip>
    2. open / save should be "quick" (see 3)

    Both archival solutions offer the features you need here, but you'll need to benchmark to see whether either meets your performance requirements. With Codable, the specific encoder and decoder you pick may swing the results in different directions; with NSKeyedArchiver you just have the one implementation. Either way, it's worth a test; a rough harness is sketched below.
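    If it helps, a minimal benchmark harness might look like the following. The `Record` type is a stand-in; substitute something shaped like your real data, and prefer averaging several runs over trusting a single wall-clock sample:

    ```swift
    import Foundation

    // Stand-in payload; replace with something shaped like your real data.
    struct Record: Codable {
        var id: Int
        var name: String
        var values: [Double]
    }

    let payload = (0..<10_000).map {
        Record(id: $0, name: "record-\($0)", values: [1.5, 2.5, 3.5])
    }

    // Coarse wall-clock timing; average several runs for anything serious.
    func measure(_ label: String, _ work: () throws -> Void) rethrows {
        let start = Date()
        try work()
        print(label, Date().timeIntervalSince(start))
    }

    try measure("JSONEncoder") {
        _ = try JSONEncoder().encode(payload)
    }

    try measure("PropertyListEncoder (binary)") {
        let encoder = PropertyListEncoder()
        encoder.outputFormat = .binary
        _ = try encoder.encode(payload)
    }

    try measure("NSKeyedArchiver (Codable support layer)") {
        let archiver = NSKeyedArchiver(requiringSecureCoding: true)
        try archiver.encodeEncodable(payload, forKey: NSKeyedArchiveRootObjectKey)
        archiver.finishEncoding()
        _ = archiver.encodedData
    }
    ```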

    3. changing a small portion of a data graph should change a small number of pages on disk to minimise I/O.

    Depending on what you mean specifically here, you're likely out of luck with all of the solutions presented so far, since none of them really supports updating a file in place in any meaningful way; more on this below.

    4. should support Codable types
    5. should support value types (like dictionaries or arrays of Ints or Strings or other arrays / dictionaries, possibly custom Codable "structs" with those types in them).
    6. secure coding would be nice to have out of the box, although this is not a show stopper.

    Both options support the types you need, and Codable obviates the need for NSSecureCoding (you always decode into a concrete type you name at the call site, so there's no archive-supplied class name to validate), so you're covered there too.
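    For instance, a payload like the one you describe round-trips with any of the Codable encoders; the `Settings` type here is invented for illustration:

    ```swift
    import Foundation

    // A custom Codable struct built from value types: dictionaries,
    // arrays, and nested containers of Ints/Strings/Doubles.
    struct Settings: Codable {
        var counts: [String: Int]
        var tags: [String]
        var nested: [[Double]]
    }

    let settings = Settings(counts: ["a": 1, "b": 2],
                            tags: ["x", "y"],
                            nested: [[1.0], [2.0, 3.0]])

    let data = try PropertyListEncoder().encode(settings)
    // Decoding into a concrete type is what makes NSSecureCoding moot here:
    // there's no archive-supplied class name to validate.
    let restored = try PropertyListDecoder().decode(Settings.self, from: data)
    ```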

    7. nice to have (although not a show stopper) if this storage is fault tolerant / atomic <snip>

    On all filesystems and OSes I'm aware of, this would be in conflict with (3). If you're looking to optimize writes to the disk by doing something like memory-mapping the file and updating only portions of it in place, you'd have to sacrifice atomicity: there's nothing you can do if the user pulls the power cord after you've started writing to the mapped memory but before you're done, leading to data corruption that may be neither detectable nor recoverable.

    On all systems I'm aware of, atomic writes are done by writing to a temporary file and having the filesystem atomically replace the existing file with the new one. When doing this, you're pretty much always writing the temporary file out from scratch, which means writing all of the data from start to finish, and that precludes pretty much any I/O wins.
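    For completeness, this is the pattern Foundation already gives you out of the box; `archiveURL` below is a placeholder path:

    ```swift
    import Foundation

    let archiveURL = URL(fileURLWithPath: "/tmp/example.archive")
    let data = Data("payload".utf8)

    // `.atomic` writes to a temporary file and renames it over the original.
    try data.write(to: archiveURL, options: .atomic)

    // The same dance spelled out with FileManager, if you need more control:
    let tempURL = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
    try data.write(to: tempURL)
    _ = try FileManager.default.replaceItemAt(archiveURL, withItemAt: tempURL)
    ```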

(3) is the real sticking point here, depending on what exactly you mean by "change a small number of pages on disk".

  • If you're looking to do something like mmap the file into memory and update portions of it in place, none of the solutions mentioned here is going to work well for you: NSKeyedArchiver (and CoreData, which is based on it) only supports writing out the entire archive at once, and while Codable could theoretically support this with an Encoder explicitly written to do so, I'm not sure how well it could work.

    You'd likely need something written with this use case in mind: something that keeps a reference to the existing file open, reading it as it encodes so that it writes out only the pages which have actually changed... which seems pretty niche. (A rough sketch of what that might look like follows this list.)

  • You also need to keep in mind that it's pretty difficult to keep archives binary-stable, which you'd need in order to benefit from any optimization that skips writing pages whose contents match:

    1. The data produced by NSKeyedArchiver (and every Encoder I know of) is not guaranteed to be stable: e.g., reading an archive into memory and writing it back out can produce a different binary blob, because dictionaries are ordered based on how their keys hash, and that hashing can change from launch to launch
    2. Inserting data anywhere in the archive necessarily "pushes" all subsequent data outward, which requires shifting everything after it in the file and defeats any I/O optimizations

    So, you'd need an archiver of some sort which is guaranteed to be binary-stable and is also append-only for changes, if you wanted to truly benefit from such a scheme (if I'm understanding your desires as written correctly). Both points are illustrated below.
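To make the ordering problem concrete, here's a small demonstration; JSONEncoder is used only because its output is easy to eyeball, and .sortedKeys is one mitigation (for JSON specifically):

```swift
import Foundation

let dict = ["alpha": 1, "beta": 2, "gamma": 3, "delta": 4]

// Dictionary iterates in hash order, and the hash seed is randomized per
// process launch, so these bytes can differ from run to run of the program.
let unstable = try JSONEncoder().encode(dict)
print(String(decoding: unstable, as: UTF8.self))

// .sortedKeys buys byte-stable output for the same values, at some cost.
let sortedEncoder = JSONEncoder()
sortedEncoder.outputFormatting = .sortedKeys
let stable = try sortedEncoder.encode(dict)
print(String(decoding: stable, as: UTF8.self))
```

And here's a rough, entirely hypothetical sketch of the page-level writer described in the first bullet. It assumes you already have a byte-stable encoder, and note that it still re-encodes the whole archive in memory; only the unchanged writes are skipped:

```swift
import Foundation

// Hypothetical helper: compare the freshly-encoded archive against the
// file on disk in page-sized chunks, rewriting only chunks that differ.
func writeChangedPages(of newData: Data, to url: URL, pageSize: Int = 4096) throws {
    guard FileManager.default.fileExists(atPath: url.path) else {
        try newData.write(to: url)      // no previous file; write everything
        return
    }
    let oldData = try Data(contentsOf: url)
    let handle = try FileHandle(forWritingTo: url)
    defer { try? handle.close() }

    var offset = 0
    while offset < newData.count {
        let end = min(offset + pageSize, newData.count)
        let newPage = newData[offset..<end]
        let oldPage = offset < oldData.count
            ? oldData[offset..<min(end, oldData.count)]
            : Data()
        if newPage != oldPage {
            try handle.seek(toOffset: UInt64(offset))
            try handle.write(contentsOf: newPage)
        }
        offset = end
    }
    // Shrink the file if the new archive is smaller than the old one.
    try handle.truncate(atOffset: UInt64(newData.count))
}
```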

Given this, it seems unlikely that this is truly a hard requirement, in which case "fast enough not to be noticeable" may be good enough... and if so: benchmark, benchmark, benchmark. You may discover that existing Encoders are fast enough for the job, in which case you don't need to think about any of this at all. (And if they're not, you can figure out where the bottlenecks are and go from there; maybe explore something like protobuf as a next step, to see whether it has the tools and performance characteristics you're looking for.)
