Codable with references?

BigZaphod · June 21, 2018, 5:20am

Does the automatic Codable synthesis not support references? I just built out a very large graph I was going to serialize using the automatic Codable stuff, but near as I can tell it ends up in an infinite loop when attempting to encode it. I'm assuming this is probably because of the various references and back references going on in the data. Does this mean I need to write my own Encoder/Decoder for some made up binary format that can handle references or something?

BigZaphod · June 21, 2018, 3:42pm

I wrote this late last night, so maybe it's confusing - but what I mean is that it seems the included encoders in Swift (JSON and PropertyList) do not understand how to encode reference types as references. Instead they seem to encode everything as if they were values. This is not at all what I expected - although it sort of makes sense if I stop and think about it really hard (at least for JSON and plists). I tried NSKeyedArchiver, but none of my objects are Objective-C objects - they're all pure Swift - and it still seemed to end up stuck in an infinite loop. (I haven't tried making all of my objects @objc, but I really don't want to have to do that.)
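To make the "encoded as values" behaviour concrete, here is a minimal sketch. `Player` and `Team` are hypothetical types made up for illustration; only `JSONEncoder` and `JSONDecoder` come from Foundation. The same object is referenced twice, and after a round trip the two properties no longer share an instance.

```swift
import Foundation

// Hypothetical model types, used only for illustration.
final class Player: Codable {
    var name: String
    init(name: String) { self.name = name }
}

struct Team: Codable {
    var captain: Player
    var topScorer: Player
}

let alice = Player(name: "Alice")
let team = Team(captain: alice, topScorer: alice)  // one object, two references

let data = try JSONEncoder().encode(team)
let decoded = try JSONDecoder().decode(Team.self, from: data)

// Before encoding, both properties point at the same instance.
let sharedBefore = team.captain === team.topScorer
// After the round trip, each property holds its own separate copy.
let sharedAfter = decoded.captain === decoded.topScorer
print(sharedBefore, sharedAfter)  // prints "true false"
```

With a back reference (A points at B, B points back at A) the encoder keeps descending into the "value" it was handed, which would explain the endless recursion described above.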

I'm assuming that the only way I could do this would be to implement my own encoder/decoder? Is there a nice simple template for getting started with this? It seems like there are a lot of things to implement.

itaiferber · June 21, 2018, 4:06pm

There was some discussion of this in Codable != Archivable - #5 by itaiferber, and specifically:

Not all formats support reference semantics, and one of the things we planned was a way to express whether a given format/encoder supports them or not. JSON, for instance, does not support references natively, so we'd have to add an additional encoding layer on top of JSON to support it.

This is something that's planned, but we didn't have time to push it through in the Swift 4 timeframe.

In the meantime, there are two main approaches you can take to cover your needs:

1. If you really want to keep synthesized conformances and make support for this automatic for your format, you can implement an encoder/decoder pair which does the work to implement references. The specifics of how this is done or represented depend on the format you're looking to support; you can look at the implementation of JSONEncoder and JSONDecoder as a basis for how to get started. There isn't an easy template at the moment, but this is something we're also looking into potentially supporting.
2. If you are willing to give up synthesized conformances in order to avoid the work needed to implement an encoder/decoder, you can add your own compatibility layer here: you can create a ReferenceTable type (or similar) that you give reference types to; it can give each value (by identity) an identifier which you encode in place of the object. After you've encoded all the UIDs, the last step can be to encode the ReferenceTable itself, which encodes the actual objects you have, once. On decode, you decode the ReferenceTable first, then any object UID types you find can go through the reference table to give you objects back out.

Approach #2 is what JSONEncoder/JSONDecoder would essentially do on your behalf, but unfortunately, we haven't gotten there yet. If you want to do that yourself at the moment, you'll likely need buy-in from your types, which isn't great.
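A rough sketch of the reference-table idea, under some loud assumptions: `Node`, `ReferenceTable`, `Archive`, `save` and `load` are all hypothetical names, and for brevity the graph is walked outside of `encode(to:)` rather than by handing the table to each type. It is only meant to show the shape of "assign a UID by identity, write each object once, resolve UIDs on decode".

```swift
import Foundation

// Hypothetical graph node; `next` may point back at an earlier node.
final class Node {
    let name: String
    var next: Node?
    init(name: String) { self.name = name }
}

// Hypothetical table that hands out one integer UID per object identity.
final class ReferenceTable {
    private var uids: [ObjectIdentifier: Int] = [:]
    private(set) var objects: [Node] = []

    func uid(for node: Node) -> Int {
        let key = ObjectIdentifier(node)
        if let existing = uids[key] { return existing }
        let uid = objects.count
        uids[key] = uid
        objects.append(node)
        return uid
    }
}

// What actually gets written: each node once, with links stored as UIDs.
struct Archive: Codable {
    struct Record: Codable {
        var name: String
        var next: Int?
    }
    var root: Int
    var records: [Record]
}

func save(_ root: Node) throws -> Data {
    let table = ReferenceTable()
    let rootUID = table.uid(for: root)
    var records: [Archive.Record] = []
    var index = 0
    // `table.objects` grows as new nodes are discovered, so they get visited too.
    while index < table.objects.count {
        let node = table.objects[index]
        let nextUID = node.next.map { table.uid(for: $0) }
        records.append(Archive.Record(name: node.name, next: nextUID))
        index += 1
    }
    return try JSONEncoder().encode(Archive(root: rootUID, records: records))
}

func load(_ data: Data) throws -> Node {
    let saved = try JSONDecoder().decode(Archive.self, from: data)
    // First pass: create every object. Second pass: turn UIDs back into references.
    let nodes = saved.records.map { Node(name: $0.name) }
    func resolve(_ uid: Int) throws -> Node {
        guard nodes.indices.contains(uid) else {
            throw DecodingError.dataCorrupted(
                DecodingError.Context(codingPath: [], debugDescription: "Unknown UID \(uid)"))
        }
        return nodes[uid]
    }
    for (current, record) in zip(nodes, saved.records) {
        if let next = record.next { current.next = try resolve(next) }
    }
    return try resolve(saved.root)
}

// Usage: a two-node cycle survives the round trip with identity intact.
// (The strong cycle is deliberate here; a real model would likely use weak back references.)
let a = Node(name: "a")
let b = Node(name: "b")
a.next = b
b.next = a
let restored = try load(save(a))
print(restored.next?.next === restored)  // prints "true"
```

The two-pass decode is what makes cycles workable: every object exists before any reference is resolved.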

[If you're interested in following either of these approaches, I can give more concrete information about how to get started/what the process might look like.]

BigZaphod · June 21, 2018, 4:24pm

Thanks for the reply!

I was really hoping I could use the automatic synthesis to easily and quickly (and basically freely) implement game save/restore, but my game state is big and complicated and has a lot of references between things. It's possible I'm thinking about all of this all wrong, though, (including the idea of representing my save-game this way) but it seems like giving up the automatic synthesis would be more work in this case than implementing a custom encoder/decoder for a simple made up file format or something. Looking through the JSONEncoder/JSONDecoder code you linked, it seems to get pretty complicated, unfortunately.

While searching around, I ran across this project which might be the simplest reference point for implementing an Encoder and Decoder, but I haven't dug into it yet: GitHub - Flight-School/Codable-DIY-Kit: A template for creating your own Swift Codable encoders and decoders

itaiferber · June 21, 2018, 4:28pm

At first glance, that looks like a great resource — I've also been planning on writing a longer-form Swift.org blog post on how you might approach doing this, but that's likely a great place to start. Mike Ash also wrote a blog post a while back exploring how to implement an Encoder, which can be a handy reference too.

If this is the direction you're going, feel free to ask questions if anything is unclear and I'll be happy to help point you in the right direction.

BigZaphod · June 21, 2018, 4:37pm

I'll be looking at this more later, but when thinking about trying to implement my own reference-supporting encoder, I'm wondering how subclasses are handled. For example say I have something structured like this:

class Base: Codable {}

class Subclass: Base {}

class Root: Codable {
  let value: Base
}

What happens when the reference in value is actually an instance of Subclass? Is that something my encoder would have to understand and encode somehow? I assume when decoding normally, the automatic synthesis would be doing something like: let value = Base.init(from:...) which would lose the subclass.
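A quick way to check this assumption with JSONEncoder/JSONDecoder. The sketch below adds a stored property and initializers to the classes above so there is something to encode; those additions are mine, not part of the original example.

```swift
import Foundation

class Base: Codable {
    var id: Int
    init(id: Int) { self.id = id }
}

// Adds no stored properties, so it inherits Base's initializers and Codable behaviour.
class Subclass: Base {}

class Root: Codable {
    let value: Base
    init(value: Base) { self.value = value }
}

let original = Root(value: Subclass(id: 1))
let data = try JSONEncoder().encode(original)
let decoded = try JSONDecoder().decode(Root.self, from: data)

print(type(of: original.value))  // prints "Subclass"
print(type(of: decoded.value))   // prints "Base"
```

The synthesized `init(from:)` for Root asks for `Base.self`, and nothing in the JSON payload says otherwise, so the dynamic type is lost.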

itaiferber · June 21, 2018, 4:59pm

This specific use-case is something that Codable does not explicitly support in the general case, but is something you can support if you want (with additional work).

The essence of the issue here is that in order to decode value, you need to decode(Base.self, forKey: .value) — there is no indication that value may be a Subclass. That info either needs to come from the decode call itself, or from the payload you're decoding from. NSKeyedArchiver, for instance, makes this work by encoding the name of the actual class you're trying to encode into the archive, so that on decode, with NSClassFromString, you get back a Subclass; this approach has shortcomings (primarily that in order to decode polymorphic values, you have to largely trust the information that's in the archive, which requires a lot of validation) which are discussed in Data You Can Trust.

Obj-C has it easy, though — the above works primarily because class names must be unique, and we don't have nested classes, private types, generic types, etc. You can't rely on class names in Swift to do this for a few reasons, stemming primarily from the fact that class names need not be unique:

Class names are qualified in the runtime. You can have Foundation.NSObject and MyModule.NSObject just fine; in order to disambiguate, you need to use the fully qualified name. This means, though, that names are more fragile — renaming your module changes the name of your class
Classes can be nested, and the fully qualified names need to reflect that. MyModule.ClassA.NestedClassA is different from MyModule.ClassB.NestedClassA. This introduces another layer of fragility: moving nested classes out of scope or into a different scope changes their name
Swift has generic classes, whose names are determined at runtime by their concrete type arguments. MyGenericClass<T> has a different runtime name from MyGenericClass<U>, so depending on how you ask for the type, you can get a different name
Classes can have the same name at the same scope if they are in different files and are fileprivate. private and fileprivate classes have their runtime names mangled to represent the file/scope they come from, which means that changing that scope/file can change the name. Name mangling has also not been stable over time (but will be once we hit ABI stability), which introduces complexity over time

Depending on your exact needs, these things may or may not be relevant to you — but in general, if you want to support this, you'll likely need a mechanism other than class names to identify types in archives stably over time. Because you're not looking to handle this in the general case, your data and usage patterns may lend themselves to one approach over another.

In the general case, this is something that you'd leave up to individual types to solve. If it makes sense for your use-case, you can have your Base class know about its subclasses and include specific identifiers in the payload to decode the right type; alternatively, you can encode a wrapping enum with one case (and associated value) for each of the types you expect to decode, which encodes/decodes a marker identifying the case.
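One possible shape for the wrapping-enum idea, as a sketch rather than a recommendation. `AnyBase`, its `kind`/`payload` keys and the `Kind` marker are hypothetical names chosen for this example; the marker is a hand-picked string rather than a class name, so renaming or moving the classes does not change the archive format.

```swift
import Foundation

class Base: Codable {
    var id: Int
    init(id: Int) { self.id = id }
}
class Subclass: Base {}

// Hypothetical wrapper: one case per concrete type expected in the payload.
enum AnyBase: Codable {
    case base(Base)
    case subclass(Subclass)

    private enum CodingKeys: String, CodingKey { case kind, payload }
    private enum Kind: String, Codable { case base, subclass }

    init(_ value: Base) {
        // Check the most-derived type first.
        if let sub = value as? Subclass { self = .subclass(sub) } else { self = .base(value) }
    }

    var value: Base {
        switch self {
        case .base(let b): return b
        case .subclass(let s): return s
        }
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // The marker decides which concrete type to decode.
        switch try container.decode(Kind.self, forKey: .kind) {
        case .base:
            self = .base(try container.decode(Base.self, forKey: .payload))
        case .subclass:
            self = .subclass(try container.decode(Subclass.self, forKey: .payload))
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .base(let b):
            try container.encode(Kind.base, forKey: .kind)
            try container.encode(b, forKey: .payload)
        case .subclass(let s):
            try container.encode(Kind.subclass, forKey: .kind)
            try container.encode(s, forKey: .payload)
        }
    }
}

struct Root: Codable {
    var value: AnyBase
}

let data = try JSONEncoder().encode(Root(value: AnyBase(Subclass(id: 7))))
let decoded = try JSONDecoder().decode(Root.self, from: data)
print(type(of: decoded.value.value))  // prints "Subclass"
```

The cost is the buy-in mentioned earlier: Root has to store (or manually encode through) the wrapper, and every new subclass needs a new case.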

BigZaphod · June 21, 2018, 5:51pm

Thanks for the detailed information! Disappointing that I can't get this all "for free" yet, but now I have a pretty good idea where to start messing around.

itaiferber · June 21, 2018, 6:09pm

I agree, and this is something we want to do well for API consumers in the future. Let me know if you run into issues in the meantime.

BigZaphod · June 21, 2018, 7:10pm

I took Swift's JSONEncoder/Decoder and removed everything from it to try to extract a minimal "shell" of an Encoder/Decoder. It is rather overwhelmingly big - especially the need to have both keyed and unkeyed variants of everything. Eep!

import Foundation

open class ArchiveEncoder {
    open var userInfo: [CodingUserInfoKey : Any] = [:]

    public init() {}

    open func encode<T : Encodable>(_ value: T) throws -> Data {
		fatalError()
    }
}

fileprivate class _Encoder : Encoder {
    public var codingPath: [CodingKey]
    public var userInfo: [CodingUserInfoKey : Any]

    fileprivate init(userInfo: [CodingUserInfoKey : Any], codingPath: [CodingKey] = []) {
        self.userInfo = userInfo
        self.codingPath = codingPath
    }

    public func container<Key>(keyedBy: Key.Type) -> KeyedEncodingContainer<Key> {
		fatalError()
    }

    public func unkeyedContainer() -> UnkeyedEncodingContainer {
		fatalError()
    }

    public func singleValueContainer() -> SingleValueEncodingContainer {
        return self
    }
}

fileprivate struct _KeyedEncodingContainer<Key : CodingKey> : KeyedEncodingContainerProtocol {
    private(set) public var codingPath: [CodingKey]

    public mutating func encodeNil(forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: Bool, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: Int, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: Int8, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: Int16, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: Int32, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: Int64, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: UInt, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: UInt8, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: UInt16, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: UInt32, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: UInt64, forKey key: Key) throws {
		fatalError()
    }
    public mutating func encode(_ value: String, forKey key: Key) throws {
		fatalError()
    }
    
    public mutating func encode(_ value: Float, forKey key: Key) throws {
		fatalError()
    }

    public mutating func encode(_ value: Double, forKey key: Key) throws {
		fatalError()
    }

    public mutating func encode<T : Encodable>(_ value: T, forKey key: Key) throws {
		fatalError()
    }

    public mutating func nestedContainer<NestedKey>(keyedBy keyType: NestedKey.Type, forKey key: Key) -> KeyedEncodingContainer<NestedKey> {
		fatalError()
    }

    public mutating func nestedUnkeyedContainer(forKey key: Key) -> UnkeyedEncodingContainer {
		fatalError()
    }

    public mutating func superEncoder() -> Encoder {
		fatalError()
    }

    public mutating func superEncoder(forKey key: Key) -> Encoder {
		fatalError()
    }
}

fileprivate struct _UnkeyedEncodingContainer : UnkeyedEncodingContainer {
    private(set) public var codingPath: [CodingKey]

    public var count: Int {
		fatalError()
    }

    public mutating func encodeNil() throws {
		fatalError()
	}

	public mutating func encode(_ value: Bool) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: Int) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: Int8) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: Int16) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: Int32) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: Int64) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: UInt) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: UInt8) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: UInt16) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: UInt32) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: UInt64) throws {
		fatalError()
	}
	
	public mutating func encode(_ value: String) throws {
		fatalError()
	}

    public mutating func encode(_ value: Float) throws {
		fatalError()
	}

    public mutating func encode(_ value: Double) throws {
		fatalError()
	}

    public mutating func encode<T : Encodable>(_ value: T) throws {
		fatalError()
    }

    public mutating func nestedContainer<NestedKey>(keyedBy keyType: NestedKey.Type) -> KeyedEncodingContainer<NestedKey> {
		fatalError()
    }

    public mutating func nestedUnkeyedContainer() -> UnkeyedEncodingContainer {
		fatalError()
    }

    public mutating func superEncoder() -> Encoder {
		fatalError()
	}
}

extension _Encoder : SingleValueEncodingContainer {
    public func encodeNil() throws {
		fatalError()
	}

    public func encode(_ value: Bool) throws {
		fatalError()
    }

    public func encode(_ value: Int) throws {
		fatalError()
    }

    public func encode(_ value: Int8) throws {
		fatalError()
    }

    public func encode(_ value: Int16) throws {
		fatalError()
    }

    public func encode(_ value: Int32) throws {
		fatalError()
    }

    public func encode(_ value: Int64) throws {
		fatalError()
    }

    public func encode(_ value: UInt) throws {
		fatalError()
    }

    public func encode(_ value: UInt8) throws {
		fatalError()
    }

    public func encode(_ value: UInt16) throws {
		fatalError()
    }

    public func encode(_ value: UInt32) throws {
		fatalError()
    }

    public func encode(_ value: UInt64) throws {
		fatalError()
    }

    public func encode(_ value: String) throws {
		fatalError()
    }

    public func encode(_ value: Float) throws {
		fatalError()
    }

    public func encode(_ value: Double) throws {
		fatalError()
    }

    public func encode<T : Encodable>(_ value: T) throws {
		fatalError()
    }
}

//===----------------------------------------------------------------------===//
// Decoder
//===----------------------------------------------------------------------===//

open class ArchiveDecoder {
    open var userInfo: [CodingUserInfoKey : Any] = [:]
    public init() {}
    open func decode<T : Decodable>(_ type: T.Type, from data: Data) throws -> T {
		fatalError()
    }
}

fileprivate class _Decoder : Decoder {
    fileprivate(set) public var codingPath: [CodingKey]
    public var userInfo: [CodingUserInfoKey : Any]

    fileprivate init(codingPath: [CodingKey] = [], userInfo: [CodingUserInfoKey : Any]) {
        self.codingPath = codingPath
		self.userInfo = userInfo
    }

    public func container<Key>(keyedBy type: Key.Type) throws -> KeyedDecodingContainer<Key> {
		fatalError()
    }

    public func unkeyedContainer() throws -> UnkeyedDecodingContainer {
		fatalError()
    }

    public func singleValueContainer() throws -> SingleValueDecodingContainer {
        return self
    }
}

fileprivate struct _KeyedDecodingContainer<Key : CodingKey> : KeyedDecodingContainerProtocol {
    private(set) public var codingPath: [CodingKey]

	fileprivate init(codingPath: [CodingKey]) {
		self.codingPath = codingPath
    }

    public var allKeys: [Key] {
		fatalError()
    }

    public func contains(_ key: Key) -> Bool {
		fatalError()
    }

    public func decodeNil(forKey key: Key) throws -> Bool {
		fatalError()
    }

    public func decode(_ type: Bool.Type, forKey key: Key) throws -> Bool {
		fatalError()
    }

    public func decode(_ type: Int.Type, forKey key: Key) throws -> Int {
		fatalError()
    }

    public func decode(_ type: Int8.Type, forKey key: Key) throws -> Int8 {
		fatalError()
    }

    public func decode(_ type: Int16.Type, forKey key: Key) throws -> Int16 {
		fatalError()
    }

    public func decode(_ type: Int32.Type, forKey key: Key) throws -> Int32 {
		fatalError()
    }

    public func decode(_ type: Int64.Type, forKey key: Key) throws -> Int64 {
		fatalError()
    }

    public func decode(_ type: UInt.Type, forKey key: Key) throws -> UInt {
		fatalError()
    }

    public func decode(_ type: UInt8.Type, forKey key: Key) throws -> UInt8 {
		fatalError()
    }

    public func decode(_ type: UInt16.Type, forKey key: Key) throws -> UInt16 {
		fatalError()
    }

    public func decode(_ type: UInt32.Type, forKey key: Key) throws -> UInt32 {
		fatalError()
    }

    public func decode(_ type: UInt64.Type, forKey key: Key) throws -> UInt64 {
		fatalError()
    }

    public func decode(_ type: Float.Type, forKey key: Key) throws -> Float {
		fatalError()
    }

    public func decode(_ type: Double.Type, forKey key: Key) throws -> Double {
		fatalError()
    }

    public func decode(_ type: String.Type, forKey key: Key) throws -> String {
		fatalError()
    }

    public func decode<T : Decodable>(_ type: T.Type, forKey key: Key) throws -> T {
		fatalError()
    }

    public func nestedContainer<NestedKey>(keyedBy type: NestedKey.Type, forKey key: Key) throws -> KeyedDecodingContainer<NestedKey> {
		fatalError()
    }

    public func nestedUnkeyedContainer(forKey key: Key) throws -> UnkeyedDecodingContainer {
		fatalError()
    }

    public func superDecoder() throws -> Decoder {
		fatalError()
    }

    public func superDecoder(forKey key: Key) throws -> Decoder {
		fatalError()
    }
}

fileprivate struct _UnkeyedDecodingContainer : UnkeyedDecodingContainer {
    private(set) public var codingPath: [CodingKey]

    public var count: Int? {
		fatalError()
    }

	public var currentIndex: Int {
		fatalError()
	}

    public var isAtEnd: Bool {
		fatalError()
    }

    public mutating func decodeNil() throws -> Bool {
		fatalError()
    }

    public mutating func decode(_ type: Bool.Type) throws -> Bool {
		fatalError()
    }

    public mutating func decode(_ type: Int.Type) throws -> Int {
		fatalError()
    }

    public mutating func decode(_ type: Int8.Type) throws -> Int8 {
		fatalError()
    }

    public mutating func decode(_ type: Int16.Type) throws -> Int16 {
		fatalError()
    }

    public mutating func decode(_ type: Int32.Type) throws -> Int32 {
		fatalError()
    }

    public mutating func decode(_ type: Int64.Type) throws -> Int64 {
		fatalError()
    }

    public mutating func decode(_ type: UInt.Type) throws -> UInt {
		fatalError()
    }

    public mutating func decode(_ type: UInt8.Type) throws -> UInt8 {
		fatalError()
    }

    public mutating func decode(_ type: UInt16.Type) throws -> UInt16 {
		fatalError()
    }

    public mutating func decode(_ type: UInt32.Type) throws -> UInt32 {
		fatalError()
    }

    public mutating func decode(_ type: UInt64.Type) throws -> UInt64 {
		fatalError()
    }

    public mutating func decode(_ type: Float.Type) throws -> Float {
		fatalError()
    }

    public mutating func decode(_ type: Double.Type) throws -> Double {
		fatalError()
    }

    public mutating func decode(_ type: String.Type) throws -> String {
		fatalError()
    }

    public mutating func decode<T : Decodable>(_ type: T.Type) throws -> T {
		fatalError()
    }

    public mutating func nestedContainer<NestedKey>(keyedBy type: NestedKey.Type) throws -> KeyedDecodingContainer<NestedKey> {
		fatalError()
    }

    public mutating func nestedUnkeyedContainer() throws -> UnkeyedDecodingContainer {
		fatalError()
    }

    public mutating func superDecoder() throws -> Decoder {
		fatalError()
    }
}

extension _Decoder : SingleValueDecodingContainer {
    public func decodeNil() -> Bool {
		fatalError()
    }

    public func decode(_ type: Bool.Type) throws -> Bool {
		fatalError()
    }

    public func decode(_ type: Int.Type) throws -> Int {
		fatalError()
    }

    public func decode(_ type: Int8.Type) throws -> Int8 {
		fatalError()
    }

    public func decode(_ type: Int16.Type) throws -> Int16 {
		fatalError()
    }

    public func decode(_ type: Int32.Type) throws -> Int32 {
		fatalError()
    }

    public func decode(_ type: Int64.Type) throws -> Int64 {
		fatalError()
    }

    public func decode(_ type: UInt.Type) throws -> UInt {
		fatalError()
    }

    public func decode(_ type: UInt8.Type) throws -> UInt8 {
		fatalError()
    }

    public func decode(_ type: UInt16.Type) throws -> UInt16 {
		fatalError()
    }

    public func decode(_ type: UInt32.Type) throws -> UInt32 {
		fatalError()
    }

    public func decode(_ type: UInt64.Type) throws -> UInt64 {
		fatalError()
    }

    public func decode(_ type: Float.Type) throws -> Float {
		fatalError()
    }

    public func decode(_ type: Double.Type) throws -> Double {
		fatalError()
    }

    public func decode(_ type: String.Type) throws -> String {
		fatalError()
    }

    public func decode<T : Decodable>(_ type: T.Type) throws -> T {
		fatalError()
    }
}

itaiferber · June 21, 2018, 7:13pm

One thing Swift allows you to do, at the cost of needing to handle the switch yourself:

public mutating func encode<T : Encodable>(_ value: T, forKey key: Key) throws { /* ... */ }

can, on its own, satisfy the protocol requirements for all of the individual method overloads above it (last I checked). So if you want, you can avoid implementing the overloads in favor of having one big func encode<T>(...). It will be less efficient, as you'll need to switch on T.self yourself (and you run the risk of forgetting a type), but it should be possible to trim things down if you prefer shorter code over efficiency.

I wouldn't necessarily recommend it, but it should be possible.
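For a feel of what "switch on T.self yourself" involves, here is a small standalone sketch. `storageKind(of:)` is a hypothetical helper, not part of any container protocol; a real single generic `encode<T>(_:forKey:)` would do a similar dispatch and then write the value in whatever way the format needs.

```swift
// Hypothetical helper: classifies a value by its static type, the way one
// catch-all generic encode method would have to before writing it out.
func storageKind<T: Encodable>(of value: T) -> String {
    switch T.self {
    case is Bool.Type:
        return "bool"
    case is Int.Type, is Int8.Type, is Int16.Type, is Int32.Type, is Int64.Type:
        return "signed integer"
    case is UInt.Type, is UInt8.Type, is UInt16.Type, is UInt32.Type, is UInt64.Type:
        return "unsigned integer"
    case is Float.Type, is Double.Type:
        return "floating point"
    case is String.Type:
        return "string"
    default:
        // Anything else would be asked to encode itself via encode(to:).
        return "nested"
    }
}

print(storageKind(of: true))       // prints "bool"
print(storageKind(of: UInt8(1)))   // prints "unsigned integer"
print(storageKind(of: [1, 2, 3]))  // prints "nested"
```

Forgetting a case here fails silently (the value falls through to `default`), which is the risk mentioned above; with the separate overloads the compiler checks the list for you.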

BigZaphod · June 21, 2018, 7:24pm

As I step back and look at it, most of the methods and functions should be pretty straightforward since they are just type conversions, basically. For now, it seems to me that the tricky bit is understanding what to do with things like nestedContainer and nestedUnkeyedContainer and superDecoder etc. I can easily leave most of the data types as fatalError until I actually encounter them as I build something up, but I need to minimally understand the nesting and stuff first, I think, before anything is going to work.

Is there a WWDC video about any of this, by chance?

itaiferber · June 21, 2018, 7:50pm

Unfortunately, we didn't get a chance to do a talk about this more advanced topic, but hopefully in the (near?) future...

Topics like nesting are explained in the original proposal, so it might offer some helpful reference (as might the actual JSONEncoder/PropertyListEncoder implementation). Anything specific I can help cover?
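As a small illustration of what nesting means from the caller's side, the sketch below has a type lay out its own nested structure and feeds it to JSONEncoder. `SaveGame` and its keys are hypothetical; the point is only which container call ends up as which kind of JSON value.

```swift
import Foundation

// Hypothetical save-game type that builds its nested layout by hand.
struct SaveGame: Encodable {
    var level: Int
    var playerName: String
    var inventory: [String]

    private enum CodingKeys: String, CodingKey { case level, player, inventory }
    private enum PlayerKeys: String, CodingKey { case name }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(level, forKey: .level)

        // A keyed container nested under "player": JSONEncoder emits a JSON object.
        var player = container.nestedContainer(keyedBy: PlayerKeys.self, forKey: .player)
        try player.encode(playerName, forKey: .name)

        // An unkeyed container nested under "inventory": JSONEncoder emits a JSON array.
        var items = container.nestedUnkeyedContainer(forKey: .inventory)
        for item in inventory { try items.encode(item) }
    }
}

let save = SaveGame(level: 3, playerName: "Zaphod", inventory: ["towel", "guide"])
let data = try JSONEncoder().encode(save)
print(String(data: data, encoding: .utf8) ?? "")
```

The output is an object with a "level" number, a "player" object containing "name", and an "inventory" array (key order is not guaranteed). A custom encoder has to hand back containers that write into the right slot of the parent in the same way.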

BigZaphod · June 21, 2018, 7:52pm

Oh - I should have thought to check that document. Duh. I'll do some studying and try to stop bothering you. Thanks for your help so far!

itaiferber · June 21, 2018, 7:54pm

Happy to help! Writing an encoder/decoder pair is significantly more complex than using Codable, of course, but part of the goal was keeping that as ergonomic as possible too, given the size of the task. The up-front complexity is definitely a pain point, but I think that as you design this, more will start falling into place.

BigZaphod · June 21, 2018, 9:45pm

Does any of the automatically synthesized encoding/decoding use unkeyed containers? If I were to be, say, super duper lazy, could I reasonably safely fatalError() the unkeyedContainer() function and ignore all of that entirely?

itaiferber · June 21, 2018, 9:46pm

Nothing synthesized does; it's always keyed. Array uses an unkeyed container, though (and so does Dictionary when its keys aren't String or Int), so it's not likely that you're going to be able to avoid them unless you don't have any arrays (sounds unlikely).
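One way to see which container a standard library collection asks for is to look at what JSONEncoder produces, since keyed containers come out as JSON objects and unkeyed containers as JSON arrays. `jsonString` is a hypothetical helper; as far as I know, Dictionary only uses a keyed container when its keys are String or Int.

```swift
import Foundation

// Hypothetical helper: encode a value and return the JSON text.
func jsonString<T: Encodable>(_ value: T) throws -> String {
    let data = try JSONEncoder().encode(value)
    return String(data: data, encoding: .utf8) ?? ""
}

let list = try jsonString([1, 2, 3])               // Array: unkeyed, a JSON array
let stringKeyed = try jsonString(["hp": 10])       // String keys: keyed, a JSON object
let doubleKeyed = try jsonString([1.5: "potion"])  // other keys: unkeyed, a flat array of key, value

print(list)
print(stringKeyed)
print(doubleKeyed)
```

So any synthesized type with an array property will reach `unkeyedContainer()` (or `nestedUnkeyedContainer`) as soon as that property is encoded.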

BigZaphod · June 21, 2018, 9:46pm

Darn. Okay, thanks.

BigZaphod · June 22, 2018, 2:51am

Another question! I'm working on a KeyedEncodingContainerProtocol implementation and there's a method for encoding super that doesn't use a key. That seems odd since this is a keyed encoder. The documentation states: Equivalent to calling superEncoder(forKey:) with Key(stringValue: "super", intValue: 0). Although apparently there isn't a default implementation for this like there is for some of the other methods in this protocol. Should there be? It looks like the JSON implementation just does what the documentation says - calls through to superEncoder(forKey:), so that's what I'll do, but I'm wondering why this even exists in the first place, and if it needs to exist, why doesn't it just have the default implementation that the docs seem to indicate it should have?

itaiferber · June 22, 2018, 3:04am

The reason is that keyed containers are generic on the key type the user requests of them, e.g. KeyedDecodingContainer<MyType.CodingKeys>. The issue is that MyType.CodingKeys doesn’t have to have a .super key, which means that MyType.CodingKeys(stringValue: "super", intValue: nil) returns nil in most cases.

A default implementation would have to pass something in to call into superEncoder(forKey:), but there’s very often no key to pass in.
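A small sketch of the problem, using the failable `init?(stringValue:)` that CodingKey requires. `Player` and `AnyKey` are hypothetical; `AnyKey` plays the role of an encoder-internal key type that accepts any string, similar in spirit to the private key type JSONEncoder uses.

```swift
// Hypothetical user type with the usual closed set of keys.
struct Player: Codable {
    var name: String
    var score: Int

    enum CodingKeys: String, CodingKey { case name, score }
}

// Hypothetical encoder-internal key that can represent any string or index.
struct AnyKey: CodingKey {
    var stringValue: String
    var intValue: Int?

    init?(stringValue: String) {
        self.stringValue = stringValue
        self.intValue = nil
    }

    init?(intValue: Int) {
        self.stringValue = String(intValue)
        self.intValue = intValue
    }

    static let superKey = AnyKey(stringValue: "super")!
}

// The user's key type has no case for "super", so there is nothing to pass along.
let userKey = Player.CodingKeys(stringValue: "super")
print(userKey == nil)                   // prints "true"
print(AnyKey.superKey.stringValue)      // prints "super"
```

A generic default implementation only knows about `Key`, so it is stuck with the first case; the concrete encoder can fall back to its own key type.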

The fact that keyed containers are generic on a key type is mostly a benefit for the API consumer, though — under the hood, the encoder/decoder can do whatever it wants. JSONEncoder creates a new _JSONReferencingEncoder with the _JSONKey.super key, and can do so because the _JSONReferencingEncoder initializer is not generic on a specific key type — it’ll accept any old CodingKey.

Note that on decode, things work a little bit differently. JSONDecoder has to work around this using a private _superDecoder(forKey:) which is similarly not generic, and the other generic methods call into it.
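From the API consumer's side, the keyless variants are what a subclass typically calls when it encodes its superclass. The sketch below uses hypothetical `Entity` and `Monster` classes with JSONEncoder/JSONDecoder, and shows the superclass data landing under a "super" key even though `Monster.CodingKeys` has no such case.

```swift
import Foundation

class Entity: Codable {
    var id: Int
    init(id: Int) { self.id = id }
}

final class Monster: Entity {
    var hitPoints: Int

    private enum CodingKeys: String, CodingKey { case hitPoints }

    init(id: Int, hitPoints: Int) {
        self.hitPoints = hitPoints
        super.init(id: id)
    }

    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        hitPoints = try container.decode(Int.self, forKey: .hitPoints)
        // Keyless variant: the decoder decides where the superclass data lives.
        try super.init(from: container.superDecoder())
    }

    override func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(hitPoints, forKey: .hitPoints)
        // Keyless variant: the encoder picks the key on our behalf.
        try super.encode(to: container.superEncoder())
    }
}

let data = try JSONEncoder().encode(Monster(id: 1, hitPoints: 10))
print(String(data: data, encoding: .utf8) ?? "")

let decoded = try JSONDecoder().decode(Monster.self, from: data)
print(decoded.id, decoded.hitPoints)  // prints "1 10"
```

The JSON contains "hitPoints" at the top level and a nested "super" object holding "id" (key order is not guaranteed).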