Codable with references?

BigZaphod · June 24, 2018, 4:52pm

I've run into a problem I can't figure out how to solve here. Example structure:

class Second {
  unowned let parent: First
}

class First {
  var children: [Second]
}

When attempting to unarchive in a situation like this, it might start with unarchiving an instance of First which then attempts to unarchive the children array which in turn attempts to unarchive a reference to the instance of First that is still unarchiving. This results in bad things.

The solution that I think would work would be if I could allocate memory for an instance of First, cache it, and use it when unarchiving child objects like Second that refer back to it. No problem there in Objective-C because I could just do something like [First alloc], store the resulting pointer, and hand it off to child objects as they do their decoding. Then later call -initWithCoder: or whatever on that allocated instance so it gets properly initialized. So for a while things might be in an undefined state, but at least the pointers are where they should be and as long as nothing tricky is done in the various decoding functions, it should be okay.

I can't for the life of me figure out how to do this in Swift because Swift seems to pretty strongly bind allocating with initializing. I played around with the various Unsafe stuff where I can allocate a buffer of bytes or whatever, bind it to a type, etc., but I can't figure out how to lie to Swift that it's initialized when it actually isn't, just for a little while.

MLewin · June 24, 2018, 5:12pm

Could you make parent optional to achieve what you want?

BigZaphod · June 24, 2018, 5:18pm

I don't think that'd solve anything without also having a second phase during the decoding process where parent could be reconnected to the proper instance later. Something like a func awake(after decoding: Decoder) or similar. That'd technically maybe work, but it's not in the Coding protocols, and it also puts the burden of knowing when you might accidentally create a reference cycle like this into the encoding/decoding logic of the type. You'd never know if you really needed to do it until you ran into problems decoding at runtime. (if that makes sense)

MLewin · June 24, 2018, 5:25pm

It makes perfect sense. In reading the prior post in the thread, I felt like you'd already resigned yourself to that second phase. I then made the apparently incorrect assumption that there was a fairly simple way to make that second phase work. While making assumptions, it also was sounding to me like the implementation you were working on was custom enough that you would know about that reference cycle.

I suppose this is what I get for trying to grok something while my son calls me three times in rapid succession.

QuinceyMorris · June 24, 2018, 7:32pm

You can't solve this in the general case. As you say, it needs a 3-pass initialization process, not the current 2-pass process (although it's not entirely clear how best to break it down into three passes).

The only current solution is to use some kind of optional and set the actual value "later", which typically involves some customization of the decoding process (and possibly preventing the synthesis of Decodable conformance). You can do this in your example, too, by noting that the children array is kinda like an optional when left empty. You can create the First object first with no children, then create all the Second objects with references to their parents, then add all the children to the parent.

The huge drawback (apart from having to refactor code you think of differently) is that the extra step has to involve making something public that probably should be private to First. (Edit: Plus, it tends to expose objects that are in an intermediate/incomplete state, which isn't good either.)
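The "parent first with no children, then the children, then attach them" ordering can be sketched roughly as below. This is only an illustration under some assumptions: the name property, the ChildRecord helper type, and the JSON shape are all made up for the example, and Second is deliberately not Decodable on its own.

```swift
import Foundation

final class Second {
    unowned let parent: First
    let name: String

    // Only code in this file can create a Second, and only with a parent in hand.
    fileprivate init(parent: First, name: String) {
        self.parent = parent
        self.name = name
    }
}

final class First: Decodable {
    // Starts out empty, which plays the role of the "optional" here.
    private(set) var children: [Second] = []

    private enum CodingKeys: String, CodingKey { case children }

    // Hypothetical archived shape of one child.
    private struct ChildRecord: Decodable { let name: String }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let records = try container.decode([ChildRecord].self, forKey: .children)
        // Every stored property already has a value, so self may be used here.
        children = records.map { Second(parent: self, name: $0.name) }
    }
}

let json = Data(#"{"children":[{"name":"a"},{"name":"b"}]}"#.utf8)
let first = try! JSONDecoder().decode(First.self, from: json)
```

The cost is visible in the sketch: First has to know how its children are stored, and the synthesized conformance is gone.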

Anyway, this is the point that I gave up (as I said in the original thread) and decided to let @itaiferber go first into this dark valley. Currently, I think he's too optimistic about the solvability of the problem, but if there's a solution I trust him to find it, or invent it.

itaiferber · June 25, 2018, 9:24pm

Thanks for weighing in here, and sorry that the previous thread fell off my Radar; lots to do!

I won't rehash everything discussed in Codable != Archivable but I think it's a good basis for discussion here.

I am still not convinced that a solution would need to be invented here that doesn't already exist. The same initialization rules that already apply to First and Second are no different whether you write the initializer yourself, or implement init(from:).

@BigZaphod, in your example, how would you actually write initializers for First and Second yourself today? I see a few possible options:

Make parent optional and patch it up in an addChild() method:

class Second {
    unowned var parent: First?
}

class First {
    private var children: [Second] = []

    func addChild(_ child: Second) {
        child.parent = self
        children.append(child)
    }
}

This isn't great as Second objects without parents might not be useful, and now parent has to be a var (and exposed)

Create a Second with a First and patch up children:

class Second {
    unowned let parent: First
    init(_ parent: First) {
        self.parent = parent
        self.parent.addChild(self)
    }
}

class First {
    private var children: [Second] = []
    func addChild(_ child: Second) {
        self.children.append(child)
    }
}

This also isn't great: Second needs to know to patch up its parent upon initialization, and First needs to expose addChild, which may or may not need to also verify that child.parent == self, which is redundant in many cases

Create a Second in First using a helper method:
```
class Second {
    unowned let parent: First
    fileprivate init(_ parent: First) {
        self.parent = parent
    }
}

class First {
    private var children: [Second] = []
    func addChild() -> Second {
        let child = Second(self)
        children.append(child)
        return child
    }
}
```
This is getting better as there's now no way to construct a Second from outside of the type, and it doesn't need to know anything about the structure of First; you can only construct one from First, which also keeps the structure of First and relationship between these types consistent. The downside is that you can't create a Second on its own and have to go through First

There are a few other isomorphic cases (we can do more fun stuff with visibility), but each of these cases can be represented by an init(from:) — it largely comes down to: does Second decode First and patch up its list of children, or does First decode Second and patch up its parent reference?

Whatever rule you were using in your existing initializer is the same rule you'd use here. Because the reference is unowned, I would say that Second doesn't even necessarily need to encode its parent, as, well, it doesn't own it! It's up to the parent to own and fix up the child.

So: can you share the real-world use-case here? We can take a look at the existing methods for creating the object graph in the first place, and try to replicate them in init(from:). Again, I'd love to be proven wrong here, but haven't yet seen something to convince me that this is broken.

(@QuinceyMorris I didn't get a chance to respond to this, but your final example changes the relationship between X and Y to be one-to-one, and also, can't be represented with pure inits — you have to pull the code out into a static function to hold on to an X long enough for it to not be deallocated.)

Separately, we have considered adding a post-decode patch-up phase to Codable (we can add it today with a default implementation that does nothing), but I'm not convinced that this would be necessary to fix this up. Either object A needs to patch up object B or vice versa; that can happen in init(from:) or later. It might be more ergonomic in an awaken phase or similar, but should be possible in init.

BigZaphod · June 25, 2018, 9:55pm

The project I've been working on uses approach #3 to make the children, so it was neat and tidy and initializers were all private and restricted and happy. The problem with having the parent patch it up later is that unowned let parent: First is a let so I can't ignore it in Second's implementation of init(from decoder: Decoder). The only workaround for that I can think of would be to make that property into a weak var parent: First! to get basically the same use-site behavior that unowned has - and that'd allow me to skip initialization and rely on First to patch it as it initializes itself from the archive.

That said, though, that'd mean having to manually implement the encode/decode to handle these cases and thus giving up the free compiler-generated implementations.
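For what it's worth, here is a minimal sketch of that weak var workaround, assuming a made-up name property and JSON as the format. In this particular shape only First needs a hand-written init(from:); Second keeps its synthesized conformance because parent is left out of CodingKeys and defaults to nil.

```swift
import Foundation

final class Second: Codable {
    var name: String
    // Not listed in CodingKeys, so the synthesized code neither encodes nor decodes it.
    weak var parent: First!

    init(name: String) { self.name = name }

    private enum CodingKeys: String, CodingKey { case name }
}

final class First: Codable {
    private(set) var children: [Second]

    init(children: [Second] = []) {
        self.children = children
        for child in children { child.parent = self }
    }

    private enum CodingKeys: String, CodingKey { case children }

    // Hand-written so the back-references can be patched up after decoding.
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        children = try container.decode([Second].self, forKey: .children)
        for child in children { child.parent = self }
    }
    // encode(to:) is still synthesized.
}

let original = First(children: [Second(name: "a"), Second(name: "b")])
let data = try! JSONEncoder().encode(original)
let decoded = try! JSONDecoder().decode(First.self, from: data)
```

A Second decoded on its own would still have a nil parent, so this only holds up when children are always reached through their First.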

itaiferber · June 25, 2018, 10:18pm

Ah, yes — this is the case in which you cannot construct children without an already constructed parent.

The rules here are consistent: in the same way that you cannot construct children from within First.init() itself (something external has to call addChild() after the fact), you would need to recreate the same state here. Something from outside of First would need to help decode the children once it is constructed. For instance, if something owns First (like something has to call addChild() on it), you can do something like:

init(from decoder: Decoder) throws {
    let container = ...
    first = try container.decode(First.self, forKey: .first) // no children yet
    first.decodeChildren(using: container /* or decoder */) // first can get a nested container and decode its children now that it fully exists
    // ...
}

You can add additional methods to do essentially what addChild() does in a variety of ways, but if your encoder/decoder natively supports references, first should just be able to decode its children and their references will be well-constructed.

That's another way to do it — this transforms the problem into case #1 (with some hopefully nicer restrictions).

Yes, there's no real way around this, unfortunately. In the same way that you need to write manual inits/methods elsewhere to form the relationships between these values, you'll need to do the same here; there's no way for the compiler to guess at the semantics here and find a way to do it automatically.
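A fuller sketch of the owner-driven variant above, under a few assumptions: the owner is a hypothetical Document type, the children are stored next to first in the owner's container, and title and name are made-up properties. Second keeps its unowned let parent because it is only ever created once First fully exists.

```swift
import Foundation

final class Second {
    unowned let parent: First
    let name: String

    private enum CodingKeys: String, CodingKey { case name }

    // Not Decodable: a Second can only be decoded with a parent in hand.
    fileprivate init(parent: First, from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.parent = parent
        self.name = try container.decode(String.self, forKey: .name)
    }
}

final class First: Decodable {
    let title: String
    private(set) var children: [Second] = []

    // children is skipped on purpose; it is filled in by decodeChildren.
    private enum CodingKeys: String, CodingKey { case title }

    func decodeChildren<Key: CodingKey>(
        using container: KeyedDecodingContainer<Key>, forKey key: Key
    ) throws {
        var list = try container.nestedUnkeyedContainer(forKey: key)
        while !list.isAtEnd {
            // superDecoder() hands back a decoder for the next element.
            children.append(try Second(parent: self, from: list.superDecoder()))
        }
    }
}

struct Document: Decodable {
    let first: First

    private enum CodingKeys: String, CodingKey { case first, children }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        first = try container.decode(First.self, forKey: .first) // no children yet
        try first.decodeChildren(using: container, forKey: .children)
    }
}

let json = Data(#"{"first":{"title":"root"},"children":[{"name":"a"},{"name":"b"}]}"#.utf8)
let document = try! JSONDecoder().decode(Document.self, from: json)
```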

BigZaphod · June 26, 2018, 11:00pm

For the moment, I've given up on my own Encoder/Decoder pair and ended up rolling my own thing. I've learned a ton in the process and I think this includes some pretty darned clever code if I do say so myself.

The code is here: GitHub - BigZaphod/Archivable: A custom Swift 4.2 Archiver that supports reference types

My project has a 2-phase initialization, so you have to have an empty init() which happens first and is immediately followed by a read(from:) to get around the issues with references. For my purposes, this just seemed easier. We'll see what happens in the long run as I use it more, I suppose. It also internalizes strings so any given string is only encoded once. The resulting archives are pretty small and seem to compress really well.

The best part, though, is that it uses KeyPath to specify which properties of your type you want to serialize and everything else just happens automatically. It's almost as good as having compiler-synthesized Codable support. For example, this is all it takes to encode/decode my Location structure that has x,y,z properties:

extension Location: Archivable {
	static var archivingKeyPaths = [
		Archive(\.x),
		Archive(\.y),
		Archive(\.z),
	]
}

I should note that my project is unkeyed since that was simpler, faster, and smaller. This is obviously not as robust as Codable can be, but it has the huge benefit of me understanding how it all works - which counts for something, I think!
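To give a rough idea of how a key-path-driven, unkeyed archiver with an empty init() followed by a read step can work, here is a much simplified sketch. It is not the code from the repository: Field, KeyPathArchivable, archived() and unarchived(from:) are hypothetical names, and it only handles Double properties.

```swift
// Type-erased reader/writer for one Double property of Root.
struct Field<Root> {
    let write: (Root, inout [Double]) -> Void
    let read: (inout Root, inout ArraySlice<Double>) -> Void

    init(_ keyPath: WritableKeyPath<Root, Double>) {
        write = { root, output in output.append(root[keyPath: keyPath]) }
        read = { root, input in root[keyPath: keyPath] = input.removeFirst() }
    }
}

protocol KeyPathArchivable {
    init() // phase 1: an "empty" value
    static var fields: [Field<Self>] { get }
}

extension KeyPathArchivable {
    // Unkeyed: values are written in the order the fields are listed.
    func archived() -> [Double] {
        var output: [Double] = []
        for field in Self.fields { field.write(self, &output) }
        return output
    }

    // Phase 2: fill the empty value in, field by field, in the same order.
    static func unarchived(from values: [Double]) -> Self {
        var result = Self()
        var input = values[...]
        for field in fields { field.read(&result, &input) }
        return result
    }
}

struct Location: Equatable {
    var x = 0.0, y = 0.0, z = 0.0
}

extension Location: KeyPathArchivable {
    static let fields: [Field<Location>] = [Field(\.x), Field(\.y), Field(\.z)]
}

let location = Location(x: 1, y: 2, z: 3)
let restored = Location.unarchived(from: location.archived())
```

Because the value exists before it is read, a reference type handled this way can be registered in a lookup table before its properties are filled in, which is what sidesteps the cycle problem.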

Maybe the code will be interesting for someone.

cherrywoods · June 27, 2018, 9:20am

Hi,
I wrote a framework for this case, that should keep you from implementing a full encoder and decoder.... I'm a bit late with this, but maybe it can still help.

If you are still interested, I think it should be able to save you a lot of work (it did for me once, but I have not gathered any in-depth feedback on whether it also did for others).

I think you could use the reference table itaiferber mentioned above without giving up synthesized Codable conformance. I don't have too much time right now, so this is really just an idea that might not work at all, but I would try to implement it this way:

First serialize to an enum-based format (similar to the JSONValue enum) that works with references and a reference table:

enum Refless: Meta {
    case int(Int)
    case string(String)
    // Double, whatever...
    case ref(Ref) // points to another Refless in a ReferenceTable
    case array([Refless])
    case dict([String: Refless])
}

In the end, you want to serialize to a tuple (Refless, [Ref:Refless]) where the first element is the encoded top level instance and the second is the Reference Table. You also have to implement Ref somehow to uniquely identify your objects.
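Independent of the framework, the target shape (Refless, [Ref: Refless]) can be illustrated with a hand-written flattener. This is only a sketch under assumptions: Ref is a plain Int, Node is a made-up class, Flattener is a hypothetical helper, and identity is tracked with ObjectIdentifier rather than hash values.

```swift
typealias Ref = Int // assumption: any unique, hashable identifier would do

enum Refless: Equatable {
    case int(Int)
    case string(String)
    case ref(Ref) // points to another Refless in the reference table
    case array([Refless])
    case dict([String: Refless])
}

final class Node {
    let name: String
    var next: Node?
    init(name: String) { self.name = name }
}

struct Flattener {
    private(set) var table: [Ref: Refless] = [:]
    private var refs: [ObjectIdentifier: Ref] = [:]

    mutating func flatten(_ node: Node) -> Refless {
        let identity = ObjectIdentifier(node)
        if let known = refs[identity] { return .ref(known) }
        let ref = refs.count
        refs[identity] = ref // register before recursing so cycles terminate
        var fields: [String: Refless] = ["name": .string(node.name)]
        if let next = node.next { fields["next"] = flatten(next) }
        table[ref] = .dict(fields)
        return .ref(ref)
    }
}

// Two nodes pointing at each other (a retain cycle, acceptable for a demo).
let a = Node(name: "a")
let b = Node(name: "b")
a.next = b
b.next = a

var flattener = Flattener()
let root = flattener.flatten(a)
```

After this, root is a reference to entry 0, and the table holds one dict per object with the cycle expressed purely through .ref values.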

To implement the encoding, implement a MetaSupplier from the framework I linked to above that creates the lookup table and serializes. To do this I would implement wrap this way:

var lookupTable: [Ref: Refless] = [:] // I think if all your objects are hashable, you can use a type + hash value combination as the unique identifier and use [Ref: Refless] here. Otherwise, I think you need something more complex that keeps all instances it sees and compares each instance you look up with all other instances for identity.
func wrap<T>(_ value: T, for encoder: MetaEncoder) throws -> Meta? where T: Encodable {
    if let int = value as? Int {
        return Refless.int(int)
    } // else if ... for string and the other primitives
    else {
        // this is a more complex instance
        // now I want to look up this instance in the lookup table
        // if it is present, I want to return Refless.ref
        // if it is not, I want to call encoder.encode(value), store the returned value in lookupTable
        // and then return Refless.ref
    }
    // it is also necessary to override keyedContainer and unkeyedContainer, because I want to use Refless here too.
    // so Refless needs to conform to EncodingKeyedContainerMeta and EncodingUnkeyedContainerMeta 
    // and needs to implement the required function that will only work, if self is dict/array
    // otherwise they may call fatalError.
}

Calling MetaEncoder(metaSupplier: yourMetaSupplier).encode(yourObject) will then return the top level instance, and yourMetaSupplier's lookupTable is the reference table we want.
Then we can use JSONEncoder to encode this tuple to JSON as our storage format.
We use JSONDecoder to decode from this JSON to (Refless, [Ref:Refless]) (implementing decoding similarly to JSONValue).
Implement Unwrapper to convert Refless back to swift types:

let lookupTable: [Ref: Refless] // set on init with the second element of the decoded tuple.
func unwrap<T>(meta: Meta, toType type: T.Type, for decoder: MetaDecoder) throws -> T? where T: Decodable {
    switch (meta as! Refless) {
        case .int(let value):
              return value as? T
        // same for string, etc.
        case .ref(let ref):
             // lookup ref in the lookup table, and then decode T using
             // MetaDecoder(unwrapper: self).decode(type: T.Type, from: lookedupRefless)
        default:
            return nil
    }
}

Refless also needs to implement DecodingKeyedContainerMeta and DecodingUnkeyedContainerMeta

I hope this isn't a completely biased approach (this happens to me quite often).

jamesmoschou · September 7, 2018, 6:57am

itaiferber:

The rules here are consistent: in the same way that you cannot construct children from within First.init() itself (something external has to call addChild() after the fact), you would need to recreate the same state here. Something from outside of First would need to help decode the children once it is constructed. For instance, if something owns First (like something has to call addChild() on it), you can do something like:
init(from decoder: Decoder) throws {
    let container = ...
    first = try container.decode(First.self, forKey: .first) // no children yet
    first.decodeChildren(using: container /* or decoder */) // first can get a nested container and decode its children now that it fully exists
    // ...
}

Isn't this just an awake phase by another name?

itaiferber · September 7, 2018, 4:05pm

These are isomorphic concepts, yes. Adding awake() or similar would formalize the concept more fully, but the core concept is possible today.

jamesmoschou · September 7, 2018, 7:08pm

How would you handle this case where First is the top level object being decoded? This is the issue I'm running into. I don't see how it is possible without cooperation from the Decoder implementation.

The object graph I'm working with is actually the Xcode project file. Starting with the top level PBXProject object, the coding path targets.n.dependencies.n.targetProxy.containerPortal cycles back to the top. It is impossible to obtain the instance of PBXProject from within any init(from:) initialisers.

The only ways forward I can see are:

1. Model containerPortal as an ID rather than the object.
2. Assume it points to the root project and either don't explicitly model it, or patch it up heuristically.

Number 1 I don't like because it doesn't generalise to archive formats that autogenerate internal IDs that aren't meant to be exposed to the outside world.

Number 2 doesn't work for object graphs that can't be patched up heuristically. If I had a coding path child.child.child.ancestor, which ancestor the final component refers to is only known by the archive.

QuinceyMorris · September 7, 2018, 8:11pm

See Codable with references? - #26 by QuinceyMorris above.

I still believe this is an insoluble problem, in general**, given Swift's current initialization methodology. @itaiferber is on record as being optimistic, but I haven't seen any reason to be optimistic.

————
** Insoluble without either (a) severely restricting the kinds of object reference graphs that can be encoded and decoded, or (b) exposing objects publicly in a partially constructed state, or otherwise leaking private implementation details.

itaiferber · September 7, 2018, 8:38pm

In the general case, I do agree with you that there are object graphs unrepresentable in Swift due to their construction without either:

1. Something to help at least temporarily break a dependency chain (e.g. making something Optional, whether truly optional or an IUO), or
2. Changes to Swift's rules about how allocation and initialization can happen

I don't see #2 changing given the safety guarantees Swift makes, but #1 is within the control of any developer. Within the existing rules of Swift's initialization, I am confident that Codable can represent any object graph which you can create yourself today in code, whether using a technique like #1, manually patching up values after the fact, etc.

As for making this easier with Codable, the answer in general comes down to (usually) needing some object which helps facilitate the buildup of these graphs (a translation from object-graph to tree on encode, and a translation from the similar tree to a graph on decode). I haven't had the bandwidth to sit down and design this API in fullness, but there are a few things to get down:

Where would a helper like this live? There are a few options:
- We provide an implementation of, say, a CodableReferenceTable which has a simple representation of references (one easy scheme is object ↔︎ UID) — if you want to use one of these you pass one in through the userInfo dictionary of any given Encoder/Decoder and it would be possible to graft this on to any format. You would then need to override init(from:) and encode(to:) to customize behavior to do this. Or,
- We extend the Encoder/Decoder requirements to explicitly require supporting object graphs and this becomes "automatic" for any format as a requirement (with the Encoder/Decoder translating into a representation suitable for the format). Simply encoding objects will create references (as currently happens with, say, NSKeyedArchiver/NSKeyedUnarchiver, which automatically translates object ↔︎ UID). [Note: this would be behavior-breaking and cannot be changed at this point, both in terms of requirements on written Encoder/Decoder pairs, but format-wise as well, potentially.] Or,
- We extend the Encoder/Decoder requirements to optionally allow requesting reference semantics (either with a toggle, or allowing the Encoder/Decoder to itself simply signal whether it supports the operation) and types decide how they want to make use of things. Or,
- We do some combination of the above, allowing one to set a CodableReferenceTable on an Encoder/Decoder as a way to signal that you want reference semantics and the setter has an option of rejecting the assignment if the Encoder/Decoder doesn't support those semantics. Or,
- We don't offer any help, as things are today, and encourage folks to roll their own solution (as it is possible, though nontrivial). Or,
- We do that but make it somewhat easier by adding an awaken(from decoder: Decoder) which allows after-the-fact patching up of references. Or,
- We do something not mentioned here
As part of the above, whose responsibility is it to decide how references are represented, if at all? What does it mean for a type to require reference semantics but for an Encoder/Decoder to not support it? How is that communicated and what do you do in that case? There is a spectrum of semantics to figure out here ranging from "if you need this feature you're on your own" to "we're going to change the semantics of all existing Encoder/Decoder pairs to require this and invalidate existing code" (neither end is good, FWIW) and there are decisions to be made that we have not yet had the bandwidth to think all the way through.
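To make the first option a bit more concrete, here is one possible sketch of a reference table passed through userInfo. Nothing like it ships with Foundation: ReferenceTable, PersonRef, the referenceTable key, and the payload layout (objects stored once in a flat list, references stored as integer UIDs) are all assumptions made for the example.

```swift
import Foundation

// Hypothetical UID <-> object table, shared through the decoder's userInfo.
final class ReferenceTable {
    private var objects: [Int: AnyObject] = [:]
    func register(_ object: AnyObject, for uid: Int) { objects[uid] = object }
    func object<T: AnyObject>(for uid: Int, as type: T.Type) -> T? { objects[uid] as? T }
}

extension CodingUserInfoKey {
    static let referenceTable = CodingUserInfoKey(rawValue: "referenceTable")!
}

extension Decoder {
    func referenceTable() throws -> ReferenceTable {
        guard let table = userInfo[.referenceTable] as? ReferenceTable else {
            throw DecodingError.dataCorrupted(.init(
                codingPath: codingPath, debugDescription: "No reference table in userInfo"))
        }
        return table
    }
}

final class Person: Decodable {
    let uid: Int
    let name: String
}

// A reference is archived as a bare UID and resolved against the table.
struct PersonRef: Decodable {
    let person: Person
    init(from decoder: Decoder) throws {
        let uid = try decoder.singleValueContainer().decode(Int.self)
        guard let person = try decoder.referenceTable().object(for: uid, as: Person.self) else {
            throw DecodingError.dataCorrupted(.init(
                codingPath: decoder.codingPath, debugDescription: "Unknown UID \(uid)"))
        }
        self.person = person
    }
}

struct Team: Decodable {
    let lead: Person
    let members: [Person]

    private enum CodingKeys: String, CodingKey { case people, lead, members }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let table = try decoder.referenceTable()
        // Decode each object once, then resolve every reference to it by UID.
        for person in try container.decode([Person].self, forKey: .people) {
            table.register(person, for: person.uid)
        }
        lead = try container.decode(PersonRef.self, forKey: .lead).person
        members = try container.decode([PersonRef].self, forKey: .members).map { $0.person }
    }
}

let json = Data(#"{"people":[{"uid":1,"name":"Ann"},{"uid":2,"name":"Bob"}],"lead":1,"members":[1,2]}"#.utf8)
let decoder = JSONDecoder()
decoder.userInfo[.referenceTable] = ReferenceTable()
let team = try! decoder.decode(Team.self, from: json)
```

The lead and the first member end up being the same instance, which a plain tree decode would not give you. The open questions in the post still apply: the format knows nothing about the table, and a type that needs it can only fail at runtime when it is missing.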

I think there is good work to be done in this area and I am receptive to further suggestions.

itaiferber · September 7, 2018, 8:42pm

In terms of your specific questions here:

As with First above, you would need something to contain First to help facilitate this. Either as a helper type which can reach in and patch up relationships, or at least as something to manage First and help set up links to it through some helper type present in the Decoder's userInfo that you write.

What do you mean by this? Why is maintaining your own IDs in conflict with how formats maintain their representations of things? (NSKeyedArchiver does this entirely successfully while sitting atop the property list format, for instance.)

I'm not entirely sure what you mean by "heuristically" here. Can you give a more concrete example?
Coding paths are also required to be entirely unique such that a given coding path can point to only a single location within an archive. I'm not sure exactly what you mean by this here, but just noting the fact.

QuinceyMorris · September 8, 2018, 12:39am

I think #1 is a solved problem. It is nearly impossible** to create a circular chain of references without at least one of the reference types being some kind of optional (if for no other reason than one of them is likely to be weak to avoid a retain cycle). I would be happy to live with this being a requirement for a built-in reference-coding scheme for Swift.

For #2, I don't want to weaken any of the existing guarantees. Instead, I think the correct solution would be to allow references to partially-created objects to exist, but not be dereferenceable until initialization is complete.

The way I see this working is that decoding an object would:

return a regular reference to the object, if it had already been fully decoded, or
instantiate and initialize the object if it hadn't been decoded yet, or
return a non-dereferenceable reference if the object had begun decoding but hadn't finished.

The last of these 3 conditions would occur when the referenced object was already partway through its own init when the chain of decodings led to this object being decoded.

The non-dereferenceable reference would automatically become referenceable as soon as it completed its init.

As I said once before, this is basically the same idea as an unowned variable referencing a non-existing or existing object at different times. (Not the same implementation, though, since the current implementation rests on a reference count being or not being 0.)
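A library-level approximation of such a reference, checked at runtime rather than by the language, might look like the sketch below. Deferred is a hypothetical wrapper and not an existing Swift feature; it traps if dereferenced before the target has finished its init.

```swift
// Hypothetical handle to an object that may not have finished initializing.
final class Deferred<Object: AnyObject> {
    private weak var storage: Object?

    var isResolved: Bool { storage != nil }

    // Traps when dereferenced too early (or after the target has gone away).
    var value: Object {
        guard let object = storage else {
            preconditionFailure("Reference dereferenced before initialization finished")
        }
        return object
    }

    func resolve(_ object: Object) { storage = object }
}

final class Child {
    let parent: Deferred<Parent>
    init(parent: Deferred<Parent>) { self.parent = parent }
}

final class Parent {
    let child: Child

    init() {
        let placeholder = Deferred<Parent>()
        // The child receives a handle it can store but must not dereference yet.
        child = Child(parent: placeholder)
        // All stored properties are set, so self can now be handed out.
        placeholder.resolve(self)
    }
}

let parent = Parent()
```

This keeps the child's property a let, but the handle is a separate type from a plain reference, so use sites have to go through value.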

An alternative would be to add a second decoding pass from the Decoder object, which would be invoked after all decoded-object inits had been invoked. However, this would need some kind of guarantee that the second pass couldn't be used maliciously by a bad actor, and that classes would need to track manually whether they were in some kind of intermediate state.

The trouble with a helper object is that it can stand in for a referenced object, but a helper object reference can't stand in for a referenced object reference. Helper objects might be one approach to serializing an object graph, but I don't think they help solve the order-of-initialization problem that we're trying to solve (and, to my reading, is the essence of @jamesmoschou's concerns).

————
** I should qualify: unless a type exposes some structural API publicly, that could be used to mess with the object graph. At least, I think this is nearly impossible.

itaiferber · September 8, 2018, 3:32am

If you don't mind, I'm going to re-arrange your points slightly to respond to them in a different order:

QuinceyMorris:

For #2, I don't want to weaken any of the existing guarantees. Instead, I think the correct solution would be to allow references to partially-created objects to exist, but not be dereferenceable until initialization is complete.

The way I see this working is that decoding an object would:

return a regular reference to the object, if it had already been fully decoded, or

instantiate and initialize the object if it hadn't been decoded yet, or

return a non-dereferenceable reference if the object had begun decoding but hadn't finished.

The last of these 3 conditions would occur when the referenced object was already partway through its own init when the chain of decodings led to this object being decoded.

The non-dereferenceable reference would automatically become referenceable as soon as it completed its init.

As I said once before, this is basically the same idea as an unowned variable referencing a non-existing or existing object at different times. (Not the same implementation, though, since the current implementation rests on a reference count being or not being 0.)

I agree in that I wouldn't want to weaken any guarantees Swift makes here, but I'm curious as to how you see this working out as far as the type system goes. What would prevent one from dereferencing a non-dereferenceable object? If it's a runtime guarantee, then this seems not much safer than an IUO (which is okay, but we may have the tools to do this without requiring new language features). If it's the type system, we'd be in a position where we'd have to enforce something like

init() {
    // type(of: self) == UninitializedReference<Self>
    /* initialize all vars */
    // type(of: self) == Self
}

or similar, i.e. we'd have to ensure that prior to full initialization, a self reference would have some unusable type, while afterwards it would be easily referenceable. We don't currently have anything like this in Swift — the same var inside the same scope always has the same type.

Save for my lack of imagination here, I'm not sure how feasible this would be, especially if a method like decodeReference<T : AnyObject & Decodable>(to type: T.Type, forKey: ...) throws -> ??? could return either an uninitialized reference or an initialized reference dynamically. I'm eager to hear more about how you imagine the design of this to look.

However, I'm not quite sure if that's necessary:

EDIT: As @Nevin points out below it is possible to do this with post-initialization fixup, but I agree that in the general case, it is impossible to construct an object Foo which depends on the existence of an object Bar when the initialization of Bar depends on the existence of Foo without some form of escape hatch (in the form of an Optional, Unsafe*Pointer, or other stand-in object). The following are possible:

// Escape hatch is an Optional
class Bar1 { var foo: Foo1? }
class Foo1 { var bar = Bar1() }
let foo1 = Foo1()
foo1.bar.foo = foo1

// @Nevin's example below -- escape hatch is an object with no requirements.
class SuperFoo {}
class Bar2 { var foo = SuperFoo() }
class Foo2: SuperFoo { var bar = Bar2() }

let bar2 = Bar2()
let foo2 = Foo2()
bar2.foo = foo2
foo2.bar = bar2

but this is not:

// Neither of these is truly constructible.
class Bar3 {
    var foo: Foo3
    init(foo: Foo3) { self.foo = foo }
}

class Foo3 {
    var bar: Bar3
    init(bar: Bar3) { self.bar = bar }
}

Assuming this axiom (and that it is impossible to rely on the behavior of Foo3 and Bar3), then this

is not quite true. To elaborate further on what I had in mind here. Let's assume point #1 stands, and we have a structure which looks like

class Parent : Codable {
    var child: Child
}

class Child: Codable {
    weak var parent: Parent?
}

Child here will be breaking the reference cycle. Assuming this, and assuming that for the moment we are willing to ignore setting up the weak link between a Child back to Parent, then I hope you agree with me that in order to decode a Parent p and its Child c, the following is true:

1. If we attempt to decode p, it will require that we decode c first. Assuming we are willing to decode c with a broken link to p, decoding c will succeed and we can finish decoding p
2. If we attempt to decode c first, assuming we are willing to decode c with a broken link to p, decoding will succeed. If we find p in the payload later, we can successfully decode its reference to c

At the end of either of these steps, we should have the following:

p.child == c
c.parent == nil

I hope you also agree that p can patch up c.parent = self at the end of its init(from:) in both cases 1 and 2, or if we had a stable reference to p held on to that was accessible, c could patch up self.parent = p during a second pass (see: awaken(from:) comments above). In both scenarios, we can have

p.child == c
c.parent == p

The requirements for doing this assume a shared way to reference these objects once they're decoded, so let's for the moment assume we're using a UID scheme here: every reference is translated to a unique ID for that reference and any object can be looked up via UID. Assuming this, let's take a look at one possible scheme for representing decoding.

Potential steps in scenario 1 above:

1. Something calls decode(Parent.self, ...) first: the decoder finds a UID at this coding path and goes to look it up in the UID table
2. The decoder finds no cached object in the table, so goes to actually decode a Parent from the payload at the given UID location
3. It sets up Parent.init(from:), which calls decode(Child.self, ...): the decoder finds a UID at this coding path and goes to look it up in the UID table
4. The decoder finds no cached object in the table, so goes to actually decode a Child from the table at the given UID location
5. It sets up Child.init(from:), which leaves parent == nil for now (as above) and returns
6. decode(Child.self, ...) caches and returns Child c
7. Parent p stores c, patches up c.parent = self, and returns
8. decode(Parent.self, ...) caches and returns Parent p, which now has the right child. Any further calls to decode p or c's UID simply return the already decoded objects

Potential steps in scenario 2 above:

Something calls decode(Child.self, ...) first — the decoder finds a UID at this coding path and goes to look it up in the UID table
The decoder finds no cached object in the table, so goes to actually decode a Child from the payload at the given UID location
It sets up Child.init(from:), which leaves parent == nil for now (as above) and returns
decode(Child.self, ...) caches and returns Child c
Later on, something calls decode(Parent.self, ...) — the decoder finds a UID at this coding path and goes to look it up in the UID table
The decoder finds no cached object in the table, so goes to actually decode a Parent from the payload at the given UID location
It sets up Parent.init(from:), which calls decode(Child.self, ...) — the decoder finds a UID at this coding path and goes to look it up in the UID table; it finds Child c and returns it
Parent p stores c, patches up c.parent = self, and returns
decode(Parent.self, ...) caches and returns Parent p, which now has the right child. Any further calls to decode p's or c's UID simply return the already decoded objects

In both scenarios, regardless of the order in which the Parent and Child are decoded, the weak reference allows breaking the ordering constraint until it can be patched up. Just like in the regular initialization scenario, Parent and Child must cooperate to patch up the cyclical reference, but this works just fine. [Just like in regular initialization, if you just decode Child without decoding its Parent, it will have a parent of nil.]
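To make the two walkthroughs concrete, here is a minimal, self-contained sketch of the UID-table idea. It deliberately does not use Codable at all: ReferenceTable, Record, decodeParent(uid:), and decodeChild(uid:) are made-up names standing in for whatever a real decoder would do, and the "payload" is just a dictionary keyed by UID.

```swift
// Hypothetical sketch of a UID-keyed reference table; not a real Codable API.
final class Parent {
    let child: Child
    init(child: Child) {
        self.child = child
        child.parent = self          // Parent patches up the back-reference
    }
}

final class Child {
    weak var parent: Parent?         // stays nil until a Parent claims it
}

// Flattened payload: each object is stored once, under its UID.
enum Record {
    case parent(childUID: Int)
    case child
}

final class ReferenceTable {
    private let payload: [Int: Record]
    private var cache: [Int: AnyObject] = [:]
    private(set) var decodeCount = 0  // how many objects were actually built

    init(payload: [Int: Record]) { self.payload = payload }

    func decodeChild(uid: Int) -> Child? {
        if let cached = cache[uid] as? Child { return cached }
        guard case .child? = payload[uid] else { return nil }
        decodeCount += 1
        let c = Child()
        cache[uid] = c
        return c
    }

    func decodeParent(uid: Int) -> Parent? {
        if let cached = cache[uid] as? Parent { return cached }
        guard case .parent(let childUID)? = payload[uid],
              let c = decodeChild(uid: childUID) else { return nil }
        decodeCount += 1
        let p = Parent(child: c)
        cache[uid] = p
        return p
    }
}

let payload: [Int: Record] = [1: .parent(childUID: 2), 2: .child]

// Scenario 1: the parent is requested first and pulls in its child.
let t1 = ReferenceTable(payload: payload)
let p1 = t1.decodeParent(uid: 1)!

// Scenario 2: the child is requested first, with parent == nil for a while.
let t2 = ReferenceTable(payload: payload)
let c2 = t2.decodeChild(uid: 2)!
let parentBeforePatch = c2.parent    // nil at this point
let p2 = t2.decodeParent(uid: 1)!    // finds c2 in the cache and patches it
```

In both orders each object is built exactly once, and the child ends up pointing back at the parent only after the parent's initializer has run.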

In terms of making some pretend syntax that looks more concrete, we can imagine an implementation for both which looks like:

final class Parent : Decodable {
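    var child: Child

    // Not synthesized here, because init(from:) is written by hand.
    private enum CodingKeys: String, CodingKey { case child }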

    init(child: Child) {
        self.child = child
        child.parent = self
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)

        // Pretend new method which finds a UID and does a translation to the reference table.
        let child = try container.decodeReference(to: Child.self, forKey: .child)
        
        // Just like init(child:):
        self.child = child
        child.parent = self
    }
}

final class Child : Decodable {
    weak var parent: Parent? = nil

    // Assuming Child had other properties, these methods wouldn't be empty.
    // The point is we do nothing with the relation to Parent.
    init() {}
    init(from decoder: Decoder) {
        // same as init()
    }
}

This can work just fine without even requiring a second pass, and this is how NSKeyedUnarchiver works. The key difference is that instead of requiring these classes to cooperate to patch up the reference, things "just work" as long as you cross your fingers and neither class tries to touch either instance it owns during initialization, lest it unknowingly call a method on an uninitialized object. (NSKeyedUnarchiver also includes an awaken step, but this is needed relatively rarely. It likely can't hurt to formalize this step with Codable too [with a default implementation that does nothing], but it's not strictly necessary in this case unless I'm missing something egregious.)

jamesmoschou · September 8, 2018, 5:28am

The parent-child examples only work because you can say "the parent of my child is me". You can patch up the object graph merely by inspecting it and making guesses about what to do. This is what I meant by "heuristically"; apologies if it's not the correct term.

I'm imagining a scenario like:

class Node: Codable {
    var children: [Node] = []
    weak var someAncestor: Node? = nil

    // Needed so the Node() calls below compile.
    init() {}

    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.children = try container.decode([Node].self, forKey: .children)

        // This is not necessarily valid
        self.children.forEach { $0.someAncestor = self }
    }
}

let top = Node()

let child1 = Node()
top.children.append(child1)

let child2 = Node()
child1.children.append(child2)

// any one of these could have occurred
child2.someAncestor = top
child2.someAncestor = child1
// child2.someAncestor = <some other node>  (technically could be any node, not necessarily a cyclic reference)

Here you cannot just patch it up. You must query the archive to do it correctly, and today the only time to do so is while decoding the child as the requisite information is stored in that child's container.
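As a rough illustration of what "querying the archive" means here, this is a stripped-down sketch that skips Codable entirely. NodeRecord and decodeGraph are hypothetical names, and the archive is assumed to be a flat dictionary in which every node records its children and its ancestor by UID. It is only meant to show why the ancestor has to come from the stored data rather than from the shape of the graph.

```swift
// Hypothetical archive format: each node is stored by UID, and its
// children and optional ancestor are recorded as UIDs.
struct NodeRecord {
    var childUIDs: [Int]
    var ancestorUID: Int?
}

final class Node {
    var children: [Node] = []
    weak var someAncestor: Node?
}

func decodeGraph(_ archive: [Int: NodeRecord]) -> [Int: Node] {
    // Phase 1: create every node so that each UID has a stable instance.
    var nodes: [Int: Node] = [:]
    for uid in archive.keys {
        nodes[uid] = Node()
    }

    // Phase 2: resolve references by looking them up in the archive,
    // not by guessing from the parent/child structure.
    for (uid, record) in archive {
        guard let node = nodes[uid] else { continue }
        node.children = record.childUIDs.compactMap { nodes[$0] }
        if let ancestorUID = record.ancestorUID {
            node.someAncestor = nodes[ancestorUID]
        }
    }
    return nodes
}

// top (0) -> child1 (1) -> child2 (2), with child2.someAncestor = top.
let archive: [Int: NodeRecord] = [
    0: NodeRecord(childUIDs: [1], ancestorUID: nil),
    1: NodeRecord(childUIDs: [2], ancestorUID: nil),
    2: NodeRecord(childUIDs: [], ancestorUID: 0),
]
let nodes = decodeGraph(archive)
```

Here child2's ancestor resolves to top rather than to its direct parent child1, which no amount of inspecting the finished children arrays could have told you.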

I have a working custom encoder/decoder pair that handles references. To solve this particular problem it supports an awaken phase that objects opt into via protocol conformance. Then I'm using the following helper type to facilitate patching up the references without requiring the types themselves to think about it too hard (@QuinceyMorris I think this is similar to what you're seeking, but without any special pointer magic):

final class Unowned<Instance>: TwoPhaseDecodable where Instance: Decodable & AnyObject {
    
    private struct Box {
        unowned let value: Instance
    }
    
    private enum State {
        case uninitialized
        case initialized(Box)
    }

    var value: Instance {
        switch state {
        case .initialized(let box):
            return box.value
        default:
            fatalError("Cannot access value until decoding has finished.")
        }
    }

    private var state: State

    init(value: Instance) {
        self.state = .initialized(Box(value: value))
    }
    
    init(from decoder: Decoder) throws {
        self.state = .uninitialized
    }
    
    func awake(from decoder: Decoder) throws {
        let value = try decoder.singleValueContainer().decode(Instance.self)
        state = .initialized(Box(value: value))
    }
    
}

So instead of writing unowned let x: T or weak var x: T? you write let x: Unowned<T> and decoding just works.

At this stage I'm trying to refine the decoder implementation to move it closer to what might be provided by Swift itself in the future, so I'm interested in understanding what that would look like and how far it would go. I realise this is more of an academic exercise; if this were a real-world problem I would have moved on already.

@itaiferber thank you for detailing those options, it's certainly a lot to think about. I think I'll post what I have in a different thread. I don't presume to have the answers but at least it might be a starting point for others who want to roll their own.

Nevin · September 8, 2018, 1:58pm

Quincey and I had this same discussion months ago. The answer remains unchanged: it is entirely possible to create a cycle of strong references with no optionals and no unsafe pointers involved. Here is an example:

class SuperFoo {}
class Bar { var foo = SuperFoo() }
class Foo : SuperFoo { var bar = Bar() }

let foo = Foo()
let bar = Bar()
foo.bar = bar
bar.foo = foo