Codable != Archivable

itaiferber · March 27, 2018, 4:44pm

Thanks for bringing this topic up! Some notes:

QuinceyMorris:

The problem is that this stated goal isn’t achievable by encodeEncodable and decodeDecodable as currently implemented. As a practical exercise, I took a very ordinary, simple data model that was previously being archived for NSDocument, and changed all the NSCoding conformances to Codable. When I tried to save the document, my app crashed — crashed big, with infinite recursion leading to a stack overflow.

It turns out that archiving/unarchiving via Codable through keyed archivers/unarchivers doesn’t respect the normal archiver convention of object reference identity. That is, reference type instances in NSCoding are unique within the archive as a whole. Codable, on the other hand, archives or unarchives a new instance at every reference encountered in the object graph. It crashes because typical data models in Cocoa apps have circular chains of references. (For example, there is a circular chain between a parent object with an owning reference to a child object that has a [weak] back reference to the parent.) These are unproblematic in NSCoding, but fatal in Codable.

You can consider this a bug. Although not explicitly called out in the Codable proposals, we planned on supporting and maintaining reference semantics both through our own encoders (JSONEncoder, PropertyListEncoder) and through NSKeyedArchiver support. However, we didn't manage to finish this aspect of the feature in time, and have not yet had the bandwidth to follow through, unfortunately. I'm planning on incorporating this explicitly in the next update to the Codable feature.
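To make the identity problem concrete, here is a minimal sketch using JSONEncoder and JSONDecoder. Node and Pair are hypothetical types invented for the illustration; the only point is that one shared object comes back as two.

```swift
import Foundation

// Hypothetical model: two properties that may point at the same object.
final class Node: Codable {
    var label: String
    init(label: String) { self.label = label }
}

struct Pair: Codable {
    var first: Node
    var second: Node
}

let shared = Node(label: "shared")
let original = Pair(first: shared, second: shared)

// Round-trip the value through JSON.
let data = try JSONEncoder().encode(original)
let decoded = try JSONDecoder().decode(Pair.self, from: data)

// In memory, both properties reference one object.
let sharedBefore = original.first === original.second
// After the round trip, each property holds its own freshly decoded instance.
let sharedAfter = decoded.first === decoded.second
print(sharedBefore, sharedAfter)
```

The same duplication applied to a cyclic graph is what turns into unbounded recursion on encode.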

Yes, that's correct. Because Objective-C has two separate initialization steps (+alloc, -init...), it's possible to break reference cycles by returning allocated but uninitialized objects. I don't know if this is necessarily a model to emulate, though — it's terribly unsafe to do this, because it might seem completely reasonable to be able to depend on objects being initialized once you -decodeObjectOfClass:forKey:. If you do have a reference cycle, whether the object you get back is valid or not depends on where you are in decoding and in what order the objects were encoded in the first place.

This goes both ways:

- I had a discussion on Twitter a few weeks back in which some folks were disappointed in the Swift model because it's difficult to achieve this in Swift
- A week later, I got an unrelated Radar from a surprised developer that ran into this (quite painfully) on their own

Not necessarily. Nevin had a good example of this, but there are other examples which are less surprising. Consider the following:

class Post {
    let author: Author
    let content: String
}

class Author {
    var posts: [Post]
}

Every Post must have an Author, but every Author owns its posts. Since a Post's author shouldn't be optional, how can we design an API around this cycle? The following, for example, would not work:

class Post {
    // ...
    init(author: Author, content: String) {
        self.author = author
        self.content = content
    }
}

class Author {
    let name: String
    private(set) var posts: [Post]
    init(name: String, posts: [Post]) { ... }
}

You wouldn't be able to create an Author who has Posts because those Posts require an existing Author. You can, however, do this:

class Author {
    let name: String
    private(set) var posts: [Post] = []

    init(name: String) { ... }

    func add(post: Post) {
        posts.append(post)
    }
}

You can create an Author with no Posts, then add more later on. This is better, but requires a runtime check that, when you call add(post:), post.author === self. Alternatively, this design might be better approached as such:

class Post {
    init(author: Author, content: String) {
        self.author = author
        self.content = content
        self.author.posts.append(self)
    }
}

class Author {
    let name: String
    fileprivate(set) var posts: [Post] = []
    init(name: String) { ... }
}

In this case, the initializer of Post encapsulates this runtime linking of the object graph. No optionals are necessary because [Post] acts somewhat like Optional in that it can start off empty and get added to later. (Of course, you'd actually need a weak in here somewhere to later break up the reference cycle, but this'll let you build it up.)
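As a rough sketch of where that weak could go, assuming Posts own their Authors: the Author keeps only weak references to its posts. WeakBox and weakPosts are hypothetical names introduced for this example, and the computed posts property replaces the stored array shown above.

```swift
// Hypothetical helper: holds a Post without keeping it alive.
final class WeakBox {
    weak var post: Post?
    init(_ post: Post) { self.post = post }
}

final class Author {
    let name: String
    // Weak storage, so an Author never retains its Posts.
    fileprivate var weakPosts: [WeakBox] = []
    // Only the posts that are still alive.
    var posts: [Post] { weakPosts.compactMap { $0.post } }

    init(name: String) { self.name = name }
}

final class Post {
    let author: Author   // strong: a Post keeps its Author alive
    let content: String

    init(author: Author, content: String) {
        self.author = author
        self.content = content
        // Link the graph once every stored property is initialized.
        author.weakPosts.append(WeakBox(self))
    }
}

let author = Author(name: "Alice")
var post: Post? = Post(author: author, content: "Hello")
let linkedCount = author.posts.count      // the post registered itself

post = nil                                // drop the only strong reference
let remainingCount = author.posts.count   // the weak reference is now nil
```

Whether the weak side belongs on Author or on Post depends on which object should own the other; this sketch only shows one of the two choices.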

This all goes to say that I don't actually see this initialization problem as any different than building the reference cycle as above — there's a general way to do this in Swift, and I don't see much of a difference between doing this in init(author:content:) and init(from:).

The "linking" of this object graph on decode could do the same thing (assuming that the decoder supported the reference semantics that we're looking to embody):

class Post : Decodable {
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.author = try container.decode(Author.self, forKey: .author) // assuming reference here
        self.content = try container.decode(String.self, forKey: .content)
        self.author.posts.append(self)
    }
}

class Author : Decodable {
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.name = try container.decode(String.self, forKey: .name)

        // _Don't_ decode Posts because every post on decode will add itself here
        self.posts = []
    }
}

As in all reference cycles, in order for the cycle to make sense, one object must be the true "owner" of the other, and in this case, Posts own their Authors and not the other way around. You could switch the relationship around by making Post.author be Optional, but this might be slightly more ergonomic depending on how you construct Posts and Authors.
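Until decoders preserve reference identity, the same linking idea can be tried from the other direction. The sketch below is not the design described above but a substitute that works with today's JSONDecoder: Author is the archive root, stores only each post's content, and rebuilds the Posts in init(from:) once its own stored properties are set. The JSON layout and the coding keys are assumptions made for the example.

```swift
import Foundation

final class Post {
    let author: Author
    let content: String

    init(author: Author, content: String) {
        self.author = author
        self.content = content
        author.posts.append(self)   // a Post registers itself with its Author
    }
}

final class Author: Codable {
    let name: String
    fileprivate(set) var posts: [Post] = []

    private enum CodingKeys: String, CodingKey { case name, posts }

    init(name: String) { self.name = name }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String.self, forKey: .name)
        // Every stored property is initialized now, so `self` may be handed out.
        let contents = try container.decode([String].self, forKey: .posts)
        for content in contents {
            _ = Post(author: self, content: content)
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(name, forKey: .name)
        // Store only the content; the back reference is rebuilt on decode.
        try container.encode(posts.map { $0.content }, forKey: .posts)
    }
}

let json = Data(#"{"name":"Alice","posts":["First","Second"]}"#.utf8)
let author = try JSONDecoder().decode(Author.self, from: json)
```

Both references are strong here, so the cycle is never broken; a real model would still need a weak or unowned reference on one side.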

With all of this, I don't know how much the compiler could help recognize that "hey, you've got a reference cycle here, better break that up" and break the cycle for you. It's possible theoretically to break up the process into a two-phase one like in Objective-C, but that would require either:

1. Breaking Swift's strong initialization requirements and allowing us to pass around uninitialized objects like in Objective-C, or
2. Breaking up the Decodable requirements into init() and decode(from:)

Besides the backwards incompatibility of option 2 at this point, we considered it during the design of Codable. Unfortunately, not all objects can be default-initialized, and making that a requirement would be a non-starter for a lot of types.
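For a sense of what option 2 would cost, here is a hypothetical sketch. TwoPhaseDecodable and Shell are invented names and are not part of the standard library or of any proposal; Shell exists only so the example can run on top of the existing Decodable machinery.

```swift
import Foundation

// Hypothetical protocol: create first, fill in afterwards.
protocol TwoPhaseDecodable: AnyObject {
    init()
    func decode(from decoder: Decoder) throws
}

// Hypothetical adapter that drives both phases from a regular Decodable.
struct Shell<T: TwoPhaseDecodable>: Decodable {
    let value: T

    init(from decoder: Decoder) throws {
        let object = T()                   // phase 1: default-initialize
        try object.decode(from: decoder)   // phase 2: populate
        value = object
    }
}

final class Author: TwoPhaseDecodable {
    // Forced to be `var` with a placeholder value, because init() must
    // produce a complete object before any data has been read.
    var name: String = ""

    private enum CodingKeys: String, CodingKey { case name }

    init() {}

    func decode(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String.self, forKey: .name)
    }
}

let data = Data(#"{"name":"Alice"}"#.utf8)
let decodedAuthor = try JSONDecoder().decode(Shell<Author>.self, from: data).value
```

Note how name can no longer be a let constant and needs a meaningless default; a type with no sensible default value has nothing to return from init().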