In SE-0167 we find this statement about NSKeyedArchiver and NSKeyedUnarchiver:
Although our primary objectives for this new API revolve around Swift, we would like to make it easy for current consumers to make the transition to Codable where appropriate. As part of this, we would like to bridge compatibility between new Codable types (or newly-Codable-adopting types) and existing NSCoding types.
To do this, we want to introduce changes to NSKeyedArchiver and NSKeyedUnarchiver in Swift that allow archival of Codable types intermixed with NSCoding types
(Yes, I’m squarely in Cocoa-land, because that’s one prominent scenario where the problem shows up. However, this is definitely a Swift issue, not a Cocoa one.)
The problem is that this stated goal isn’t achievable by encodeEncodable and decodeDecodable as currently implemented. As a practical exercise, I took a very ordinary, simple data model that was previously being archived for NSDocument, and changed all the NSCoding conformances to Codable. When I tried to save the document, my app crashed, and crashed big, with infinite recursion leading to a stack overflow.
It turns out that archiving/unarchiving via Codable through keyed archivers/unarchivers doesn’t respect the normal archiver convention of object reference identity. That is, reference type instances in NSCoding are unique within the archive as a whole. Codable, on the other hand, archives or unarchives a new instance at every reference encountered in the object graph. It crashes because typical data models in Cocoa apps have circular chains of references. (For example, there is a circular chain between a parent object with an owning reference to a child object that has a [weak] back reference to the parent.) These are unproblematic in NSCoding, but fatal in Codable.
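To make the identity problem concrete, here is a minimal sketch. It uses JSONEncoder and JSONDecoder rather than the keyed archiver, purely to keep the example small and portable, and the Tag and Document types are hypothetical. It avoids a circular graph on purpose (that would recurse without end) and instead shows the milder symptom: one shared instance comes back as two.

```swift
import Foundation

// Hypothetical model types, used only for illustration.
final class Tag: Codable {
    var name: String
    init(name: String) { self.name = name }
}

struct Document: Codable {
    var first: Tag
    var second: Tag
}

let shared = Tag(name: "shared")
let original = Document(first: shared, second: shared)

// Before encoding, both properties refer to the same instance.
let sameBefore = original.first === original.second

let data = try JSONEncoder().encode(original)
let decoded = try JSONDecoder().decode(Document.self, from: data)

// After the round trip, each reference has been decoded as its own instance.
let sameAfter = decoded.first === decoded.second

print(sameBefore, sameAfter)
```

If `second` pointed back at an object that was still in the middle of being encoded, the encoder would have no way to notice, which is the infinite recursion described above.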
This particular crash isn’t very difficult to solve, at least in principle. It’s possible to write archive encoders and decoders that check for reference types, and maintain reference identity in the archive.
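As a rough sketch of what "maintaining reference identity" could look like on the encoding side, the snippet below keeps a table keyed by ObjectIdentifier and writes each object once, referring to it by an integer ID afterwards. ReferenceTable, Node, and flatten are hypothetical names invented for this illustration; this is not how NSKeyedArchiver or any Codable encoder is actually implemented, and a real encoder would need to handle arbitrary types rather than a single linked class.

```swift
// Hypothetical helper: assigns a stable integer ID to each distinct object.
struct ReferenceTable {
    var ids: [ObjectIdentifier: Int] = [:]

    /// Returns the ID for `object` and whether this is its first appearance.
    mutating func id(for object: AnyObject) -> (id: Int, isNew: Bool) {
        let key = ObjectIdentifier(object)
        if let existing = ids[key] { return (existing, false) }
        let new = ids.count
        ids[key] = new
        return (new, true)
    }
}

// Hypothetical reference type with a link that may form a cycle.
final class Node {
    var name: String
    var next: Node?
    init(name: String) { self.name = name }
}

// Follows the `next` chain, emitting each object once and
// replacing repeated references with the ID already assigned.
func flatten(_ root: Node) -> [String] {
    var table = ReferenceTable()
    var records: [String] = []
    var current: Node? = root
    while let node = current {
        let (id, _) = table.id(for: node)
        var nextDescription = "nil"
        current = nil
        if let next = node.next {
            let (nextID, isNew) = table.id(for: next)
            nextDescription = "#\(nextID)"
            // Only descend into objects that have not been seen before.
            if isNew { current = next }
        }
        records.append("#\(id) \(node.name) -> \(nextDescription)")
    }
    return records
}

// A two-object cycle: a -> b -> a.
let a = Node(name: "a")
let b = Node(name: "b")
a.next = b
b.next = a
let records = flatten(a)
```

The traversal terminates on the cycle because the second encounter with `a` is recorded as an ID rather than being encoded again.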
Unfortunately, there is a second problem that’s not so easy to deal with. Reference identity is fine for encoding/archiving, but it isn't sufficient for decoding/unarchiving, because of Swift’s initializer rules. An object being decoded, A, might need to store a reference to another object, B, but, if there is a chain of references from B back to A, B cannot be initialized because references to A are not available until A returns from its initializer — which it can’t generally do without the reference to B.
(Obj-C doesn’t have this problem, because it happily passes around references to partially constructed instances.)
I thought at first this was insoluble, but I realized that the only way for a circular chain of references to exist is if at least one of them is of optional type. AFAICT, a circular chain of non-optional references cannot be created in Swift at all.
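A small sketch of the parent/child case, with hypothetical Parent and Child types, shows why the optional link is what lets the cycle be built at all: the non-optional reference is set inside the initializer, and the back reference can only be filled in once `self` is fully initialized.

```swift
final class Parent {
    let child: Child              // non-optional owning reference

    init() {
        // `self` cannot be handed to Child yet, because `child`
        // has not been initialized at this point.
        child = Child()
        // All stored properties are set, so `self` is now usable
        // and the back reference can be connected.
        child.parent = self
    }
}

final class Child {
    // Optional back reference; it is nil until Parent connects it.
    weak var parent: Parent?
}

let parent = Parent()
```

If Child instead declared `let parent: Parent`, neither initializer could run first, since each would need a finished instance of the other.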
That opens up the possibility for a two-pass unarchiving process. In the first pass, objects would be allowed only to set stored properties (typically, non-optional references, along with non-optional value types that contain non-optional references), but not to use any stored properties (since they're not fully initialized yet). In the second pass, objects would set the rest of the stored properties (including optional references), and finish any other calculations and setup normally done in an initializer.
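The two-pass idea can be sketched by hand, without any compiler support, for a deliberately simplified case. Everything here is an assumption made for illustration: the Record archive format, the Person type, and the `init(firstPass:)` / `finishDecoding(_:lookup:)` names are hypothetical, and the sketch only covers optional references. Non-optional references set in the first pass would additionally need objects to be created in dependency order, which is not shown.

```swift
// Hypothetical flat archive entry: references are stored as IDs.
struct Record {
    let id: Int
    let name: String
    let friendID: Int?
}

final class Person {
    let name: String
    var friend: Person?   // optional reference, may be part of a cycle

    // Pass 1: set only the non-optional stored properties.
    init(firstPass record: Record) {
        name = record.name
    }

    // Pass 2: resolve optional references now that every object exists.
    func finishDecoding(_ record: Record, lookup: [Int: Person]) {
        if let friendID = record.friendID {
            friend = lookup[friendID]
        }
    }
}

func decodeGraph(_ records: [Record]) -> [Int: Person] {
    var objects: [Int: Person] = [:]
    // First pass: create every instance exactly once.
    for record in records {
        objects[record.id] = Person(firstPass: record)
    }
    // Second pass: connect the instances to each other.
    for record in records {
        objects[record.id]?.finishDecoding(record, lookup: objects)
    }
    return objects
}

// Two records that refer to each other.
let graph = decodeGraph([
    Record(id: 0, name: "A", friendID: 1),
    Record(id: 1, name: "B", friendID: 0),
])
```

Because each ID maps to exactly one instance, the decoded graph keeps both reference identity and the cycle.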
Clearly, this would require compiler support, but before venturing into that territory I’d like to ask if anyone has looked into this problem before, and whether there are any easier solutions. I’ve tried searching the forum for existing threads on the subject, but haven’t found any.
I’ve also spent some time on an experimental implementation of two-pass decoding (using init to simulate the first pass, and a decode(from:) method for the second pass), but I got stuck at the point where I found I couldn't retroactively impose two-pass decoding on types that currently conform to Decodable.
So, anyone got any thoughts on this matter?