Note: This came out nothing like I intended. I've been working on it for days, and it's either a feasible insight, or the product of a crazed mind. I can't tell the difference any more.
Purpose
There needs to be a Swift-native replacement for NSCoding-based NSKeyedArchiver and NSKeyedUnarchiver, for encoding and decoding object graphs with reference identity preserved. The Swift solution should in straightforward cases be synthesizable by the compiler, to abolish the boilerplate code found in the NSCoding mechanism, but should be simple enough to code manually when necessary, just like Codable.
TL;DR
-
The Swift language cannot currently meet this need.
-
Multi-pass decoding typically won't help.
-
A feasible solution could be based on separating object allocation from initialization.
General Approach
My belief is that a natural solution would introduce a new pair of encoder and decoder protocols, which I'll refer to as ReferenceEncoder and ReferenceDecoder, implemented by ArchiveEncoder and ArchiveDecoder classes, with an associated pair of encoding and decoding protocols, ReferenceEncodable and ReferenceDecodable, and various …Reference[En/De]codingContainer protocols. All of these protocols would be largely identical to the existing protocols of similar names (without the Reference… prefix). ReferenceEncodable and ReferenceDecodable are named differently to act as markers for different synthesis semantics.
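To make the shape of this concrete, here is a minimal sketch of how the protocol pair might be declared. Only the protocol names come from the text above. The members, the ReferenceCodable alias, and the Node class are assumptions modeled on the existing Encoder, Decoder, and Codable declarations, and the container protocols are left out entirely.

```swift
// Hypothetical declarations; none of these exist in the standard library.
protocol ReferenceEncoder {
    var codingPath: [CodingKey] { get }
    // Container-vending methods would mirror those on Encoder.
}

protocol ReferenceDecoder {
    var codingPath: [CodingKey] { get }
    // Container-vending methods would mirror those on Decoder.
}

protocol ReferenceEncodable {
    func encode(to encoder: ReferenceEncoder) throws
}

protocol ReferenceDecodable {
    init(from decoder: ReferenceDecoder) throws
}

// Assumed convenience alias, by analogy with Codable.
typealias ReferenceCodable = ReferenceEncodable & ReferenceDecodable

// A manually written conformance, standing in for what synthesis might produce.
final class Node: ReferenceCodable {
    var label = ""
    var next: Node?   // may point back at an earlier node in the graph

    init(from decoder: ReferenceDecoder) throws {
        // A real implementation would request a keyed container here and
        // decode `label` and `next`; that container API is not specified.
    }

    func encode(to encoder: ReferenceEncoder) throws {
        // Likewise, `label` and `next` would be encoded through a container.
    }
}
```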
The reference-identity aspect of both encoding and decoding (assigning each archived object a unique ID, and representing object references in the archive using unique IDs) seems straightforward, and I don't think it needs to be discussed here.
I also believe that a reasonable Swift-native archiving scheme is impossible without enhancements to the language itself. Swift's current initialization rules are a bit too restrictive for archive decoding to work. The rest of this post is an attempt to justify this claim.
The Problem
Previously, I've kept making the mistake of thinking that the hard part of decoding archives with reference-identity semantics is the formation and storage of object references. I finally realized this isn't true. Object references are central to a solution, but they are not themselves the problem.
The problem is how to determine a correct order of initialization for objects being unarchived.
The order matters because the initialization of one object, in general, depends on the prior initialization of other objects. In the simplest case, it would be possible to choose any archived object and begin decoding it. This would invoke init(from:) on its type, and let that initializer trigger the decoding of the objects and values on which it depends, recursively.
When an object is encountered which has no further dependencies, that chain of decoding ends, and the process restarts from a different, as-yet-undecoded object in the archive. When no more undecoded objects remain, decoding of the archive is finished.
Obviously, this strategy fails if there are any circular chains of dependencies in the archive.
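The strategy just described, and its failure mode, can be modeled in a few lines. This is only a sketch of the ordering problem, not real decoding: the archive is reduced to a table mapping each object ID to the IDs its initializer depends on, and NaiveDecoder, CycleError, and the integer IDs are all made up for illustration.

```swift
// A toy archive: each object ID maps to the IDs its initializer depends on.
// All names are hypothetical; this models initialization order only.
struct CycleError: Error {
    let path: [Int]   // chain of object IDs that loops back on itself
}

final class NaiveDecoder {
    let archive: [Int: [Int]]
    private(set) var initOrder: [Int] = []   // order in which inits completed
    private var inProgress: [Int] = []       // inits currently on the call stack
    private var done: Set<Int> = []

    init(archive: [Int: [Int]]) { self.archive = archive }

    // Stands in for invoking init(from:) on the object with this ID.
    func decode(_ id: Int) throws {
        if done.contains(id) { return }
        if inProgress.contains(id) {
            // The object is needed before its own init has finished.
            throw CycleError(path: inProgress + [id])
        }
        inProgress.append(id)
        for dependency in archive[id] ?? [] {
            try decode(dependency)
        }
        inProgress.removeLast()
        done.insert(id)
        initOrder.append(id)
    }

    // Restart from each as-yet-undecoded object until none remain.
    func decodeAll() throws {
        for id in archive.keys.sorted() {
            try decode(id)
        }
    }
}

let acyclic = NaiveDecoder(archive: [1: [2, 3], 2: [3], 3: []])
try acyclic.decodeAll()
print(acyclic.initOrder)   // [3, 2, 1]

let cyclic = NaiveDecoder(archive: [1: [2], 2: [1]])
do {
    try cyclic.decodeAll()
} catch let error as CycleError {
    print(error.path)      // [1, 2, 1]
}
```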
Breaking the Chains
With the current Swift initialization mechanism, there is no general way of resolving circularity in a single decoding pass (that is, by just invoking init(from:) for each object in the archive, in some order or other). A circular chain will have at least one object that cannot finish initialization until a second one finishes, and the second object cannot finish without the first.
At first, it seems hopeful that a two-pass decoding mechanism would work. Basically, this strategy would break all circular chains by ignoring some of the dependencies, initializing the objects in the remaining non-circular chains in a first pass, then dealing with the ignored dependencies in a second pass.
If "initialization" meant "setting the values of stored properties", then this might be a feasible solution. Some stored properties would be stored in pass 1, and the rest in pass 2. (In fact, this is not so easy, and I wasted a lot of time trying to taxonomize this strategy. It turns out to be irrelevant.)
In general, though, initialization of one object also involves dereferencing (invoking methods on, or accessing properties of) the objects it depends on. For that to be safe, those objects must have completed their initialization. A multi-pass strategy doesn't help with that; it just extends initialization (for at least some objects) over all of the passes.
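A small sketch of the two-pass idea shows where it works and where it stops. The Record and Person types and the flat record list are assumptions invented for the example. Note that the reference property has to be weakened to an optional var so that pass 1 can leave it unset.

```swift
// Hypothetical flat archive record: an ID, a plain value, and a reference by ID.
struct Record {
    let id: Int
    let name: String
    let partnerID: Int?
}

final class Person {
    let name: String
    // Must be optional and mutable so that pass 1 can leave it unset.
    var partner: Person?
    init(name: String) { self.name = name }
}

func decodeTwoPass(_ records: [Record]) -> [Int: Person] {
    var objects: [Int: Person] = [:]
    // Pass 1: initialize every object, ignoring reference dependencies.
    for record in records {
        objects[record.id] = Person(name: record.name)
    }
    // Pass 2: store the references that pass 1 skipped.
    for record in records {
        if let partnerID = record.partnerID {
            objects[record.id]?.partner = objects[partnerID]
        }
    }
    return objects
}

let people = decodeTwoPass([
    Record(id: 1, name: "Alice", partnerID: 2),
    Record(id: 2, name: "Bob", partnerID: 1),
])
let alice = people[1]!
let bob = people[2]!
print(alice.partner === bob, bob.partner === alice)   // true true
```

This only works because Person.init never touches its partner. If the initializer had to read partner.name to compute some other stored property, pass 1 could not run it, and deferring that work to pass 2 just means the object is still not fully initialized after pass 1.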
So, we're stuck.
But, Objective-C, Dammit
It's worth asking why NSCoding-based unarchiving isn't faced with the same dilemma.
In part, the answer is that the NSCoding pattern isn't entirely safe. In particular, Obj-C object initialization conventions are quite happy to vend object references outside the initialization method before the object is fully initialized. Developers using NSCoding learn to avoid relying on objects being in a consistent/complete state during unarchiving, except perhaps in a few carefully curated cases.
This is not foolproof. The typical symptom of getting it wrong is crashing, or encountering an unexpected nil reference during decoding. Sometimes, the symptom is a subtle bug that manifests much later, and takes a lot of debugging to find.
It's obviously not an option for Swift to let objects be used in an invalid state, but I think the Obj-C pattern points in the correct direction for Swift.
How Swift Can Win
Decoding typically involves a lot more setting of properties within an initializer than dereferencing object dependencies. If we could safely allow stored properties to be set, even when their values are references to objects that haven't completed their initialization (or sometimes, in some sense, haven't started), then there's a pretty good chance that the remaining dependencies could be resolved without circularity. After all, decades of Obj-C programming have relied on just this (though without language-enforced safety guarantees).
Safety could be provided by a few simple rules, which would apply to code executing during an init(from:) method:
-
If the code attempts to decode an archived value of value type, the init(from:) initializer for that type is run to completion.
-
If the code attempts to decode an archived object of reference type, and the object's init(from:) has already run to completion, the object reference is simply returned.
-
If the code attempts to decode an archived object of reference type, and the object does not yet exist, the object is allocated (that is, the memory to which the object reference points is allocated) but not initialized, and the object reference is returned.
-
If the code attempts to decode an archived object of reference type, and the object's init(from:) is partially completed, the object reference is simply returned.
Rule 4 applies in the case of a circular chain of dependencies of stored references. It would be unsafe, except that there are more rules:
-
If the code attempts to dereference an object reference, and the object's init(from:) has already run to completion, the invocation or access is allowed.
-
If the code attempts to dereference an object reference, and the object's init(from:) is partially completed, the program is aborted with an error describing the chain of coding keys.
Rule 6 describes the case of a circular chain of dependencies on initialization completion. If it's encountered, the archive is undecodable by the ReferenceDecodable types provided for the decoding.
-
If the code attempts to dereference an object reference, and the object's init(from:) has not begun to execute, the initialization should be run to completion, then the invocation or access should be allowed.
Rule 7 may sometimes lead to a failure in the dereferenced object's initialization, because of rule 6.
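The interplay of rules 2 through 7 can be tried out in a toy model written in today's Swift. It is only a simulation under stated assumptions: a Slot object stands in for allocated-but-uninitialized memory, a Spec lists which references a pretend init(from:) merely stores and which it dereferences, and rule 6 throws an error instead of aborting the program. Rule 1 (value types) is not modeled. All type and member names are hypothetical.

```swift
// A toy model of rules 2 through 7. Every name here is hypothetical.
enum InitState { case uninitialized, initializing, initialized }

struct DecodingAbort: Error {
    let chain: [String]   // coding keys leading to the offending dereference
}

// Stands in for object memory that can exist before init(from:) has run.
final class Slot {
    let key: String
    var state = InitState.uninitialized
    var storedRefs: [Slot] = []
    init(key: String) { self.key = key }
}

// Describes what an object's init(from:) would do with its references.
struct Spec {
    var stores: [String] = []   // decoded and stored, never dereferenced
    var uses: [String] = []     // decoded, stored, and dereferenced during init
}

final class DecoderModel {
    let specs: [String: Spec]
    private var slots: [String: Slot] = [:]
    private var chain: [String] = []
    private(set) var completed: [String] = []

    init(specs: [String: Spec]) { self.specs = specs }

    // Rules 2 to 4: decoding a reference never forces initialization.
    func decodeReference(_ key: String) -> Slot {
        if let slot = slots[key] { return slot }   // rules 2 and 4
        let slot = Slot(key: key)                  // rule 3: allocate only
        slots[key] = slot
        return slot
    }

    // Rules 5 to 7: dereferencing requires a fully initialized object.
    func dereference(_ slot: Slot) throws {
        switch slot.state {
        case .initialized:
            return                                           // rule 5
        case .initializing:
            throw DecodingAbort(chain: chain + [slot.key])   // rule 6
        case .uninitialized:
            try runInit(slot)                                // rule 7
        }
    }

    // Stands in for running init(from:) on already-allocated memory.
    func runInit(_ slot: Slot) throws {
        slot.state = .initializing
        chain.append(slot.key)
        let spec = specs[slot.key] ?? Spec()
        for key in spec.stores {
            slot.storedRefs.append(decodeReference(key))
        }
        for key in spec.uses {
            let ref = decodeReference(key)
            slot.storedRefs.append(ref)
            try dereference(ref)
        }
        chain.removeLast()
        slot.state = .initialized
        completed.append(slot.key)
    }

    func decodeAll() throws {
        for key in specs.keys.sorted() {
            let slot = decodeReference(key)
            if slot.state == .uninitialized { try runInit(slot) }
        }
    }
}

// A circular chain of stored references decodes fine (rule 4).
let storedCycle = DecoderModel(specs: ["A": Spec(stores: ["B"]), "B": Spec(stores: ["A"])])
try storedCycle.decodeAll()
print(storedCycle.completed)   // ["A", "B"]

// A circular chain of dereferences cannot be decoded (rule 6).
let useCycle = DecoderModel(specs: ["A": Spec(uses: ["B"]), "B": Spec(uses: ["A"])])
do {
    try useCycle.decodeAll()
} catch let abort as DecodingAbort {
    print(abort.chain)         // ["A", "B", "A"]
}
```

In the first case each object ends up holding a reference to the other, even though B had only been allocated when A stored it. In the second, the thrown chain corresponds to the "chain of coding keys" that rule 6 says the error should describe.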
What Swift Needs
The above rules cannot be implemented in the Swift language as it presently stands. However, the needed changes seem fairly simple, at least in overview:
-
The ReferenceDecoder object or compiler-generated code would need to maintain the initialization state of every object in the archive as one of three cases: .uninitialized, .initializing, and .initialized.
-
The ReferenceDecoder object or compiler-generated code would need a way to get a reference to an object's memory without actually initializing the object. (Obj-C already has an alloc/init distinction. This would be something similar. Presumably Swift does actually allocate memory before initializing it.)
-
The ReferenceDecoder object or compiler-generated code would need a way of running initialization using a reference to previously-allocated object memory.
-
Compiler-generated code would need to check the initialization state of the reference being used for method dispatch and property access during init(from:).
An actual implementation would likely be more complicated than this, but I don't see any obvious technical barrier to the general approach.