Does the automatic Codable synthesis not support references? I just built out a very large graph I was going to serialize using the automatic Codable stuff, but near as I can tell it ends up in an infinite loop when attempting to encode it. I'm assuming this is probably because of the various references and back references going on in the data. Does this mean I need to write my own Encoder/Decoder for some made up binary format that can handle references or something?
I wrote this late last night, so maybe it's confusing - but what I mean is that it seems the included encoders in Swift (JSON and PropertyList) do not understand how to encode reference types as references. Instead they seem to encode everything as if they were values. This is not at all what I expected - although it sort of makes sense if I stop and think about it really hard (at least for JSON and plists). I tried NSKeyedArchiver, but none of my objects are Objective-C - they're all pure Swift objects and it still seemed to end up spinning forever stuck in an infinite loop. (I haven't tried making all of my objects @objc, but I really don't want to have to do that.)
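A small sketch of the "encoded as values" behavior described above, using hypothetical `Node` and `Pair` types that are not from the original post. It deliberately avoids a cycle (which would recurse without end) and only shows that a shared reference comes back as two separate objects:

```swift
import Foundation

// Hypothetical types for illustration only.
final class Node: Codable {
    var name: String
    init(name: String) { self.name = name }
}

struct Pair: Codable {
    var first: Node
    var second: Node
}

let shared = Node(name: "shared")
let original = Pair(first: shared, second: shared)

let data = try JSONEncoder().encode(original)
let decoded = try JSONDecoder().decode(Pair.self, from: data)

// Before encoding, both properties point at the same object...
let sharedBefore = original.first === original.second
// ...after the round trip they are two independent copies.
let sharedAfter = decoded.first === decoded.second
print(sharedBefore, sharedAfter) // prints "true false"
```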
I'm assuming that the only way I could do this would be to implement my own encoder/decoder? Is there a nice simple template for getting started with this? It seems like there are a lot of things to implement.
There was some discussion of this in Codable != Archivable - #5 by itaiferber, and specifically:
Not all formats support reference semantics, and one of the things we planned was a way to express whether a given format/encoder supports them or not. JSON, for instance, does not support references natively, so we'd have to add an additional encoding layer on top of JSON to support it.
This is something that's planned, but we didn't have time to push it through in the Swift 4 timeframe.
In the meantime, there are two main approaches you can take to cover your needs:
- If you really want to keep synthesized conformances and make support for this automatic for your format, you can implement an encoder/decoder pair which does the work to implement references. The specifics of how this is done or represented depend on the format you're looking to support; you can look at the implementation of JSONEncoder and JSONDecoder as a basis for how to get started. There isn't an easy template at the moment, but this is something we're also looking into potentially supporting.
- If you are willing to give up synthesized conformances in order to avoid the work needed to implement an encoder/decoder, you can add your own compatibility layer here: you can create a ReferenceTable type (or similar) that you give reference types to; it can give each value (by identity) an identifier which you encode in place of the object. After you've encoded all the UIDs, the last step can be to encode the ReferenceTable itself, which encodes the actual objects you have, once. On decode, you decode the ReferenceTable first, then any object UID types you find can go through the reference table to give you objects back out.
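One possible shape for the second approach, as a rough sketch rather than a recommended design. Everything here (`Node`, `ReferenceTable`, `NodeRecord`, `archive`, `unarchive`) is a made-up name, and this variant flattens the graph into plain records instead of making `Node` itself Codable:

```swift
import Foundation

// Hypothetical model type with references to other nodes.
final class Node {
    var name: String
    var neighbors: [Node] = []
    init(name: String) { self.name = name }
}

// Hypothetical table that hands out one integer UID per object identity.
final class ReferenceTable {
    private var ids: [ObjectIdentifier: Int] = [:]
    private(set) var objects: [Node] = []

    func uid(for node: Node) -> Int {
        let key = ObjectIdentifier(node)
        if let existing = ids[key] { return existing }
        let new = objects.count
        ids[key] = new
        objects.append(node)
        return new
    }
}

// Flat record: the node's own data plus the UIDs of what it references.
struct NodeRecord: Codable {
    var name: String
    var neighborUIDs: [Int]
}

func archive(root: Node) throws -> Data {
    let table = ReferenceTable()
    _ = table.uid(for: root)
    var records: [NodeRecord] = []
    var index = 0
    // `objects` grows as new nodes are discovered, so each node is visited once.
    while index < table.objects.count {
        let node = table.objects[index]
        let uids = node.neighbors.map { table.uid(for: $0) }
        records.append(NodeRecord(name: node.name, neighborUIDs: uids))
        index += 1
    }
    return try JSONEncoder().encode(records)
}

func unarchive(_ data: Data) throws -> Node? {
    let records = try JSONDecoder().decode([NodeRecord].self, from: data)
    // First pass creates every object; second pass wires up the references.
    let nodes = records.map { Node(name: $0.name) }
    for (node, record) in zip(nodes, records) {
        node.neighbors = record.neighborUIDs.map { nodes[$0] }
    }
    return nodes.first
}

// A two-node cycle, which a by-value encoding could not terminate on.
let a = Node(name: "a")
let b = Node(name: "b")
a.neighbors = [b]
b.neighbors = [a]
let restored = try unarchive(try archive(root: a))
```

The trade-off is the one mentioned above: the types have to cooperate (here by being converted to records), so the synthesized conformance on the model classes is given up.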
Approach #2 is what JSONEncoder/JSONDecoder would essentially do on your behalf, but unfortunately, we haven't gotten there yet. If you want to do that yourself at the moment, you'll likely need buy-in from your types, which isn't great.
[If you're interested in following either of these approaches, I can give more concrete information about how to get started/what the process might look like.]
Thanks for the reply!
I was really hoping I could use the automatic synthesis to easily and quickly (and basically freely) implement game save/restore, but my game state is big and complicated and has a lot of references between things. It's possible I'm thinking about all of this all wrong, though, (including the idea of representing my save-game this way) but it seems like giving up the automatic synthesis would be more work in this case than implementing a custom encoder/decoder for a simple made up file format or something. Looking through the JSONEncoder/JSONDecoder code you linked, it seems to get pretty complicated, unfortunately.
While searching around, I ran across this project which might be the simplest reference point for implementing an Encoder and Decoder, but I haven't dug into it yet: GitHub - Flight-School/Codable-DIY-Kit: A template for creating your own Swift Codable encoders and decoders
At first glance, that looks like a great resource. I've also been planning on writing a longer-form Swift.org blog post on how you might approach doing this, but that's likely a great place to start. Mike Ash also wrote a blog post a while back exploring how to implement an Encoder, which can be a handy reference too.
If this is the direction you're going, feel free to ask questions if anything is unclear and I'll be happy to help point you in the right direction.
I'll be looking at this more later, but when thinking about trying to implement my own reference-supporting encoder, I'm wondering how subclasses are handled. For example say I have something structured like this:
class Base: Codable {}
class Subclass: Base {}
class Root: Codable {
    let value: Base
}
What happens when the reference in value is actually an instance of Subclass? Is that something my encoder would have to understand and encode somehow? I assume when decoding normally, the automatic synthesis would be doing something like let value = Base.init(from:...), which would lose the subclass.
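A quick way to check that assumption with the stock JSONEncoder/JSONDecoder. This is only a sketch: the `id` property and the explicit initializers are additions of mine so the example can construct instances, and are not part of the structure in the question:

```swift
import Foundation

class Base: Codable {
    var id = 1          // added so there is something to encode
    init() {}
}
class Subclass: Base {}

class Root: Codable {
    let value: Base
    init(value: Base) { self.value = value }
}

let root = Root(value: Subclass())
let data = try JSONEncoder().encode(root)
let decoded = try JSONDecoder().decode(Root.self, from: data)

// The synthesized init(from:) asks for Base.self, so that is what comes back.
let wasSubclass = root.value is Subclass
let isSubclass = decoded.value is Subclass
print(wasSubclass, isSubclass) // prints "true false"
```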
This specific use-case is something that Codable does not explicitly support in the general case, but is something you can support if you want (with additional work).
The essence of the issue here is that in order to decode value, you need to decode(Base.self, forKey: .value); there is no indication that value may be a Subclass. That info either needs to come from the decode call itself, or from the payload you're decoding from. NSKeyedArchiver, for instance, makes this work by encoding the name of the actual class you're trying to encode into the archive, so that on decode, with NSClassFromString, you get back a Subclass; this approach has shortcomings (primarily that in order to decode polymorphic values, you have to largely trust the information that's in the archive, which requires a lot of validation) which are discussed in Data You Can Trust.
Obj-C has it easy, though — the above works primarily because class names must be unique, and we don't have nested classes, private types, generic types, etc. You can't rely on class names in Swift to do this for a few reasons, stemming primarily from the fact that class names need not be unique:
- Class names are qualified in the runtime. You can have Foundation.NSObject and MyModule.NSObject just fine; in order to disambiguate, you need to use the fully qualified name. This means, though, that names are more fragile: renaming your module changes the name of your class.
- Classes can be nested, and the fully qualified names need to reflect that. MyModule.ClassA.NestedClassA is different from MyModule.ClassB.NestedClassA. This introduces another layer of fragility: moving nested classes out of scope or into a different scope changes their name.
- Swift has generic classes, whose names are determined at runtime by their concrete type arguments. MyGenericClass<T> has a different runtime name from MyGenericClass<U>, so depending on how you ask for the type, you can get a different name.
- Classes can have the same name at the same scope if they are in different files and are fileprivate. private and fileprivate classes have their runtime names mangled to represent the file/scope they come from, which means that changing that scope/file can change the name. Name mangling has also not been stable over time (but will be once we hit ABI stability), which introduces complexity over time.
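To see some of these effects directly, `String(reflecting:)` gives the qualified name of a type. The types below are hypothetical, and the exact strings depend on the module the code is compiled into, so no particular output is claimed here:

```swift
// Hypothetical types for illustration.
class ClassA { class Nested {} }
class ClassB { class Nested {} }
class Box<T> {}

// String(reflecting:) includes the enclosing module and types in the name.
let nestedA = String(reflecting: ClassA.Nested.self)
let nestedB = String(reflecting: ClassB.Nested.self)

// Generic classes get a different name per concrete type argument.
let boxOfInt = String(reflecting: Box<Int>.self)
let boxOfString = String(reflecting: Box<String>.self)

print(nestedA, nestedB, boxOfInt, boxOfString)
```

Two classes both called `Nested` end up with different names, and moving either one to another scope or module would change the string an archive had stored for it.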
Depending on your exact needs, these things may or may not be relevant to you — but in general, if you want to support this, you'll likely need a mechanism other than class names to identify types in archives stably over time. Because you're not looking to handle this in the general case, your data and usage patterns may lend themselves to one approach over another.
In the general case, this is something that you'd leave up to individual types to solve. If it makes sense for your use-case, you can have your Base class know about its subclasses and include specific identifiers in the payload to decode the right type; alternatively, you can encode a wrapping enum with associated values for the individual types you expect to decode, which encodes/decodes a marker to tell them apart.
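A minimal sketch of the wrapping-enum idea, assuming only two concrete classes. `AnyBase`, the `kind`/`payload` keys, and the marker strings are all invented for this example; the point is just that the marker is chosen by hand and does not depend on runtime class names:

```swift
import Foundation

class Base: Codable {
    var id = 1
    init() {}
}
class Subclass: Base {}

// Hypothetical wrapper: one case per concrete class expected in the payload.
enum AnyBase: Codable {
    case base(Base)
    case subclass(Subclass)

    private enum Keys: String, CodingKey { case kind, payload }
    // Hand-picked, stable markers stored in the archive.
    private enum Kind: String, Codable { case base, subclass }

    init(_ object: Base) {
        // Check the most derived class first.
        if let sub = object as? Subclass { self = .subclass(sub) } else { self = .base(object) }
    }

    var object: Base {
        switch self {
        case .base(let b): return b
        case .subclass(let s): return s
        }
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: Keys.self)
        // Read the marker first, then decode the payload as the matching type.
        switch try c.decode(Kind.self, forKey: .kind) {
        case .base: self = .base(try c.decode(Base.self, forKey: .payload))
        case .subclass: self = .subclass(try c.decode(Subclass.self, forKey: .payload))
        }
    }

    func encode(to encoder: Encoder) throws {
        var c = encoder.container(keyedBy: Keys.self)
        switch self {
        case .base(let b):
            try c.encode(Kind.base, forKey: .kind)
            try c.encode(b, forKey: .payload)
        case .subclass(let s):
            try c.encode(Kind.subclass, forKey: .kind)
            try c.encode(s, forKey: .payload)
        }
    }
}

let data = try JSONEncoder().encode(AnyBase(Subclass()))
let decoded = try JSONDecoder().decode(AnyBase.self, from: data)
let survived = decoded.object is Subclass
```

A property declared as `AnyBase` instead of `Base` would then keep its dynamic type across a round trip, at the cost of listing every expected subclass in one place.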
Thanks for the detailed information! Disappointing that I can't get this all "for free" yet, but now I have a pretty good idea where to start messing around.
I agree, and this is something we want to do well for API consumers in the future. Let me know if you run into issues in the meantime.
I took Swift's JSONEncoder/Decoder and removed everything from it to try to extract a minimal "shell" of an Encoder/Decoder. It is rather overwhelmingly big - especially the need to have both keyed and unkeyed variants of everything. Eep!
import Foundation
open class ArchiveEncoder {
    open var userInfo: [CodingUserInfoKey : Any] = [:]
    public init() {}
    open func encode<T : Encodable>(_ value: T) throws -> Data {
        fatalError()
    }
}
fileprivate class _Encoder : Encoder {
    public var codingPath: [CodingKey]
    public var userInfo: [CodingUserInfoKey : Any]
    fileprivate init(userInfo: [CodingUserInfoKey : Any], codingPath: [CodingKey] = []) {
        self.userInfo = userInfo
        self.codingPath = codingPath
    }
    public func container<Key>(keyedBy: Key.Type) -> KeyedEncodingContainer<Key> {
        fatalError()
    }
    public func unkeyedContainer() -> UnkeyedEncodingContainer {
        fatalError()
    }
    public func singleValueContainer() -> SingleValueEncodingContainer {
        return self
    }
}
fileprivate struct _KeyedEncodingContainer<Key : CodingKey> : KeyedEncodingContainerProtocol {
    private(set) public var codingPath: [CodingKey]
    public mutating func encodeNil(forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: Bool, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: Int, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: Int8, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: Int16, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: Int32, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: Int64, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: UInt, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: UInt8, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: UInt16, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: UInt32, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: UInt64, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: String, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: Float, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode(_ value: Double, forKey key: Key) throws {
        fatalError()
    }
    public mutating func encode<T : Encodable>(_ value: T, forKey key: Key) throws {
        fatalError()
    }
    public mutating func nestedContainer<NestedKey>(keyedBy keyType: NestedKey.Type, forKey key: Key) -> KeyedEncodingContainer<NestedKey> {
        fatalError()
    }
    public mutating func nestedUnkeyedContainer(forKey key: Key) -> UnkeyedEncodingContainer {
        fatalError()
    }
    public mutating func superEncoder() -> Encoder {
        fatalError()
    }
    public mutating func superEncoder(forKey key: Key) -> Encoder {
        fatalError()
    }
}
fileprivate struct _UnkeyedEncodingContainer : UnkeyedEncodingContainer {
    private(set) public var codingPath: [CodingKey]
    public var count: Int {
        fatalError()
    }
    public mutating func encodeNil() throws {
        fatalError()
    }
    public mutating func encode(_ value: Bool) throws {
        fatalError()
    }
    public mutating func encode(_ value: Int) throws {
        fatalError()
    }
    public mutating func encode(_ value: Int8) throws {
        fatalError()
    }
    public mutating func encode(_ value: Int16) throws {
        fatalError()
    }
    public mutating func encode(_ value: Int32) throws {
        fatalError()
    }
    public mutating func encode(_ value: Int64) throws {
        fatalError()
    }
    public mutating func encode(_ value: UInt) throws {
        fatalError()
    }
    public mutating func encode(_ value: UInt8) throws {
        fatalError()
    }
    public mutating func encode(_ value: UInt16) throws {
        fatalError()
    }
    public mutating func encode(_ value: UInt32) throws {
        fatalError()
    }
    public mutating func encode(_ value: UInt64) throws {
        fatalError()
    }
    public mutating func encode(_ value: String) throws {
        fatalError()
    }
    public mutating func encode(_ value: Float) throws {
        fatalError()
    }
    public mutating func encode(_ value: Double) throws {
        fatalError()
    }
    public mutating func encode<T : Encodable>(_ value: T) throws {
        fatalError()
    }
    public mutating func nestedContainer<NestedKey>(keyedBy keyType: NestedKey.Type) -> KeyedEncodingContainer<NestedKey> {
        fatalError()
    }
    public mutating func nestedUnkeyedContainer() -> UnkeyedEncodingContainer {
        fatalError()
    }
    public mutating func superEncoder() -> Encoder {
        fatalError()
    }
}
extension _Encoder : SingleValueEncodingContainer {
    public func encodeNil() throws {
        fatalError()
    }
    public func encode(_ value: Bool) throws {
        fatalError()
    }
    public func encode(_ value: Int) throws {
        fatalError()
    }
    public func encode(_ value: Int8) throws {
        fatalError()
    }
    public func encode(_ value: Int16) throws {
        fatalError()
    }
    public func encode(_ value: Int32) throws {
        fatalError()
    }
    public func encode(_ value: Int64) throws {
        fatalError()
    }
    public func encode(_ value: UInt) throws {
        fatalError()
    }
    public func encode(_ value: UInt8) throws {
        fatalError()
    }
    public func encode(_ value: UInt16) throws {
        fatalError()
    }
    public func encode(_ value: UInt32) throws {
        fatalError()
    }
    public func encode(_ value: UInt64) throws {
        fatalError()
    }
    public func encode(_ value: String) throws {
        fatalError()
    }
    public func encode(_ value: Float) throws {
        fatalError()
    }
    public func encode(_ value: Double) throws {
        fatalError()
    }
    public func encode<T : Encodable>(_ value: T) throws {
        fatalError()
    }
}
//===----------------------------------------------------------------------===//
// Decoder
//===----------------------------------------------------------------------===//
open class ArchiveDecoder {
    open var userInfo: [CodingUserInfoKey : Any] = [:]
    public init() {}
    open func decode<T : Decodable>(_ type: T.Type, from data: Data) throws -> T {
        fatalError()
    }
}
fileprivate class _Decoder : Decoder {
    fileprivate(set) public var codingPath: [CodingKey]
    public var userInfo: [CodingUserInfoKey : Any]
    fileprivate init(codingPath: [CodingKey] = [], userInfo: [CodingUserInfoKey : Any]) {
        self.codingPath = codingPath
        self.userInfo = userInfo
    }
    public func container<Key>(keyedBy type: Key.Type) throws -> KeyedDecodingContainer<Key> {
        fatalError()
    }
    public func unkeyedContainer() throws -> UnkeyedDecodingContainer {
        fatalError()
    }
    public func singleValueContainer() throws -> SingleValueDecodingContainer {
        return self
    }
}
fileprivate struct _KeyedDecodingContainer<Key : CodingKey> : KeyedDecodingContainerProtocol {
    private(set) public var codingPath: [CodingKey]
    fileprivate init(codingPath: [CodingKey]) {
        self.codingPath = codingPath
    }
    public var allKeys: [Key] {
        fatalError()
    }
    public func contains(_ key: Key) -> Bool {
        fatalError()
    }
    public func decodeNil(forKey key: Key) throws -> Bool {
        fatalError()
    }
    public func decode(_ type: Bool.Type, forKey key: Key) throws -> Bool {
        fatalError()
    }
    public func decode(_ type: Int.Type, forKey key: Key) throws -> Int {
        fatalError()
    }
    public func decode(_ type: Int8.Type, forKey key: Key) throws -> Int8 {
        fatalError()
    }
    public func decode(_ type: Int16.Type, forKey key: Key) throws -> Int16 {
        fatalError()
    }
    public func decode(_ type: Int32.Type, forKey key: Key) throws -> Int32 {
        fatalError()
    }
    public func decode(_ type: Int64.Type, forKey key: Key) throws -> Int64 {
        fatalError()
    }
    public func decode(_ type: UInt.Type, forKey key: Key) throws -> UInt {
        fatalError()
    }
    public func decode(_ type: UInt8.Type, forKey key: Key) throws -> UInt8 {
        fatalError()
    }
    public func decode(_ type: UInt16.Type, forKey key: Key) throws -> UInt16 {
        fatalError()
    }
    public func decode(_ type: UInt32.Type, forKey key: Key) throws -> UInt32 {
        fatalError()
    }
    public func decode(_ type: UInt64.Type, forKey key: Key) throws -> UInt64 {
        fatalError()
    }
    public func decode(_ type: Float.Type, forKey key: Key) throws -> Float {
        fatalError()
    }
    public func decode(_ type: Double.Type, forKey key: Key) throws -> Double {
        fatalError()
    }
    public func decode(_ type: String.Type, forKey key: Key) throws -> String {
        fatalError()
    }
    public func decode<T : Decodable>(_ type: T.Type, forKey key: Key) throws -> T {
        fatalError()
    }
    public func nestedContainer<NestedKey>(keyedBy type: NestedKey.Type, forKey key: Key) throws -> KeyedDecodingContainer<NestedKey> {
        fatalError()
    }
    public func nestedUnkeyedContainer(forKey key: Key) throws -> UnkeyedDecodingContainer {
        fatalError()
    }
    public func superDecoder() throws -> Decoder {
        fatalError()
    }
    public func superDecoder(forKey key: Key) throws -> Decoder {
        fatalError()
    }
}
fileprivate struct _UnkeyedDecodingContainer : UnkeyedDecodingContainer {
    private(set) public var codingPath: [CodingKey]
    public var count: Int? {
        fatalError()
    }
    public var currentIndex: Int {
        fatalError()
    }
    public var isAtEnd: Bool {
        fatalError()
    }
    public mutating func decodeNil() throws -> Bool {
        fatalError()
    }
    public mutating func decode(_ type: Bool.Type) throws -> Bool {
        fatalError()
    }
    public mutating func decode(_ type: Int.Type) throws -> Int {
        fatalError()
    }
    public mutating func decode(_ type: Int8.Type) throws -> Int8 {
        fatalError()
    }
    public mutating func decode(_ type: Int16.Type) throws -> Int16 {
        fatalError()
    }
    public mutating func decode(_ type: Int32.Type) throws -> Int32 {
        fatalError()
    }
    public mutating func decode(_ type: Int64.Type) throws -> Int64 {
        fatalError()
    }
    public mutating func decode(_ type: UInt.Type) throws -> UInt {
        fatalError()
    }
    public mutating func decode(_ type: UInt8.Type) throws -> UInt8 {
        fatalError()
    }
    public mutating func decode(_ type: UInt16.Type) throws -> UInt16 {
        fatalError()
    }
    public mutating func decode(_ type: UInt32.Type) throws -> UInt32 {
        fatalError()
    }
    public mutating func decode(_ type: UInt64.Type) throws -> UInt64 {
        fatalError()
    }
    public mutating func decode(_ type: Float.Type) throws -> Float {
        fatalError()
    }
    public mutating func decode(_ type: Double.Type) throws -> Double {
        fatalError()
    }
    public mutating func decode(_ type: String.Type) throws -> String {
        fatalError()
    }
    public mutating func decode<T : Decodable>(_ type: T.Type) throws -> T {
        fatalError()
    }
    public mutating func nestedContainer<NestedKey>(keyedBy type: NestedKey.Type) throws -> KeyedDecodingContainer<NestedKey> {
        fatalError()
    }
    public mutating func nestedUnkeyedContainer() throws -> UnkeyedDecodingContainer {
        fatalError()
    }
    public mutating func superDecoder() throws -> Decoder {
        fatalError()
    }
}
extension _Decoder : SingleValueDecodingContainer {
    public func decodeNil() -> Bool {
        fatalError()
    }
    public func decode(_ type: Bool.Type) throws -> Bool {
        fatalError()
    }
    public func decode(_ type: Int.Type) throws -> Int {
        fatalError()
    }
    public func decode(_ type: Int8.Type) throws -> Int8 {
        fatalError()
    }
    public func decode(_ type: Int16.Type) throws -> Int16 {
        fatalError()
    }
    public func decode(_ type: Int32.Type) throws -> Int32 {
        fatalError()
    }
    public func decode(_ type: Int64.Type) throws -> Int64 {
        fatalError()
    }
    public func decode(_ type: UInt.Type) throws -> UInt {
        fatalError()
    }
    public func decode(_ type: UInt8.Type) throws -> UInt8 {
        fatalError()
    }
    public func decode(_ type: UInt16.Type) throws -> UInt16 {
        fatalError()
    }
    public func decode(_ type: UInt32.Type) throws -> UInt32 {
        fatalError()
    }
    public func decode(_ type: UInt64.Type) throws -> UInt64 {
        fatalError()
    }
    public func decode(_ type: Float.Type) throws -> Float {
        fatalError()
    }
    public func decode(_ type: Double.Type) throws -> Double {
        fatalError()
    }
    public func decode(_ type: String.Type) throws -> String {
        fatalError()
    }
    public func decode<T : Decodable>(_ type: T.Type) throws -> T {
        fatalError()
    }
}
One thing Swift allows you to do, at the cost of needing to handle the switch yourself:
public mutating func encode<T : Encodable>(_ value: T, forKey key: Key) throws { /* ... */ }
is actually a valid candidate on its own (last I checked) for all of the individual method overloads above it with regards to protocol requirements. So if you want, you can avoid implementing overloads in favor of having one big func encode<T>(...). It will be less efficient as you'll need to switch on T.self yourself (and you run the risk of forgetting a type), but it should be possible to trim down if you prefer shorter code over efficiency.
I wouldn't necessarily recommend it, but it should be possible.
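For a feel of what "handling the switch yourself" might look like, here is a sketch of just the dispatch step as a free function. `primitiveDescription` is a made-up stand-in for the body of a single generic `encode<T>`; it is not wired into a real container and says nothing about whether the protocol conformance itself compiles that way:

```swift
// Hypothetical helper standing in for the body of one generic `encode<T>`.
func primitiveDescription<T: Encodable>(_ value: T) -> String {
    switch value {
    case let v as Bool:   return "bool:\(v)"
    case let v as Int:    return "int:\(v)"
    case let v as Double: return "double:\(v)"
    case let v as String: return "string:\(v)"
    // Every other primitive has to be listed by hand; a forgotten type
    // silently falls through to the generic path below.
    default:              return "nested:\(T.self)"
    }
}

let results = [
    primitiveDescription(true),     // "bool:true"
    primitiveDescription(42),       // "int:42"
    primitiveDescription("hi"),     // "string:hi"
    primitiveDescription([1, 2]),   // "nested:Array<Int>"
]
```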
As I step back and look at it, most of the methods and functions should be pretty straightforward since they are just type conversions, basically. For now, it seems to me that the tricky bit is understanding what to do with things like nestedContainer and nestedUnkeyedContainer and superDecoder etc. I can easily leave most of the data types as fatalError until I actually encounter them as I build something up, but I need to minimally understand the nesting and stuff first, I think, before anything is going to work.
Is there a WWDC video about any of this, by chance?
Unfortunately, we didn't get a chance to do a talk about this more advanced topic, but hopefully in the (near?) future...
Topics like nesting are explained in the original proposal, so it might offer some helpful reference (as might the actual JSONEncoder/PropertyListEncoder implementation). Anything specific I can help cover?
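As a rough illustration of what the nesting calls look like from the caller's side (which is what a custom encoder ends up having to service), here is a hand-written encode(to:) run through the stock JSONEncoder. The `Player` type and its keys are invented for the example:

```swift
import Foundation

// Hypothetical type with a hand-written encode(to:) to show the nesting calls.
struct Player: Encodable {
    var name: String
    var x: Int
    var y: Int
    var inventory: [String]

    enum CodingKeys: String, CodingKey { case name, position, inventory }
    enum PositionKeys: String, CodingKey { case x, y }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(name, forKey: .name)

        // A keyed container nested under "position", with its own key type.
        var position = container.nestedContainer(keyedBy: PositionKeys.self, forKey: .position)
        try position.encode(x, forKey: .x)
        try position.encode(y, forKey: .y)

        // An unkeyed (array-like) container nested under "inventory".
        var items = container.nestedUnkeyedContainer(forKey: .inventory)
        for item in inventory { try items.encode(item) }
    }
}

let encoder = JSONEncoder()
encoder.outputFormatting = .sortedKeys
let data = try encoder.encode(Player(name: "Ada", x: 1, y: 2, inventory: ["map", "key"]))
let json = String(data: data, encoding: .utf8)!
print(json)
// prints {"inventory":["map","key"],"name":"Ada","position":{"x":1,"y":2}}
```

Each nested container is a view onto a sub-object or sub-array of the parent, so an encoder implementation needs some storage that the parent and its nested containers can both write into.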
Oh - I should have thought to check that document. Duh. I'll do some studying and try to stop bothering you. Thanks for your help so far!
Happy to help! Writing an encoder/decoder pair is significantly more complex than using Codable, of course, but part of the goal was keeping that as ergonomic as possible too, given the size of the task. The up-front complexity is definitely a pain point, but I think that as you design this, more will start falling into place.
Does any of the automatically synthesized encoding/decoding use unkeyed containers? If I were to be, say, super duper lazy, could I reasonably safely fatalError() the unkeyedContainer() function and ignore all of that entirely?
Nothing synthesized does; it's always keyed. Array uses an unkeyed container, though (as does Dictionary when its keys are neither String nor Int), so it's not likely that you're going to be able to avoid them unless you don't have any arrays or such dictionaries (sounds unlikely).
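To make "unkeyed" concrete: it is the array-shaped counterpart of a keyed container. A small sketch with a hypothetical `Inventory` type, whose hand-written encode(to:) produces the same JSON as a plain array:

```swift
import Foundation

// Hypothetical wrapper that encodes itself as a bare sequence of values.
struct Inventory: Encodable {
    var items: [Int]

    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        for item in items { try container.encode(item) }
    }
}

let data = try JSONEncoder().encode(Inventory(items: [1, 2, 3]))
let json = String(data: data, encoding: .utf8)!   // [1,2,3]

// A plain array produces identical output.
let sameAsArray = try JSONEncoder().encode([1, 2, 3]) == data
```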
Darn. Okay, thanks.
Another question! I'm working on a KeyedEncodingContainerProtocol implementation and there's a method for encoding super that doesn't use a key. That seems odd since this is a keyed encoder. The documentation states: Equivalent to calling superEncoder(forKey:) with Key(stringValue: "super", intValue: 0).
Although apparently there isn't a default implementation for this like there is for some of the other methods in this protocol. Should there be? It looks like the JSON implementation just does what the documentation says - calls through to superEncoder(forKey:), so that's what I'll do, but I'm wondering why this even exists in the first place, and if it needs to exist, why doesn't it just have the default implementation that the docs seem to indicate it should have?
The reason is that keyed containers are generic on the key type the user requests of them, e.g. KeyedDecodingContainer<MyType.CodingKeys>. The issue is that MyType.CodingKeys doesn't have to have a .super key, which means that MyType.CodingKeys(stringValue: "super", intValue: nil) returns nil in most cases.
A default implementation would have to pass something in to call into superEncoder(forKey:), but there's very often no key to pass in.
The fact that keyed containers are generic on a key type is mostly a benefit for the API consumer, though; under the hood, the encoder/decoder can do whatever it wants. JSONEncoder creates a new _JSONReferencingEncoder with the _JSONKey.super key, and can do so because the _JSONReferencingEncoder initializer is not generic on a specific key type; it'll accept any old CodingKey.
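A small sketch of why the user's key type usually can't produce a "super" key, and of the kind of type-erased key an encoder can keep privately instead. `PlayerKeys` and `AnyKey` are hypothetical names; `AnyKey` is only similar in spirit to the private key type mentioned above, not a copy of it:

```swift
// A typical user-facing key enum has no "super" case.
enum PlayerKeys: String, CodingKey { case name, score }

// Hypothetical type-erased key an encoder could use internally.
struct AnyKey: CodingKey {
    var stringValue: String
    var intValue: Int?

    init?(stringValue: String) {
        self.stringValue = stringValue
        self.intValue = nil
    }

    init?(intValue: Int) {
        self.stringValue = String(intValue)
        self.intValue = intValue
    }

    // Always constructible, unlike a key derived from the user's enum.
    static let superKey = AnyKey(stringValue: "super")!
}

let missing = PlayerKeys(stringValue: "super")   // nil: no such case
let erased = AnyKey.superKey.stringValue         // "super"
```

Because the internal machinery only needs some CodingKey, not the caller's specific Key type, the keyless superEncoder() can be implemented inside a concrete encoder even though a generic default implementation cannot.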
Note that on decode, things work a little bit differently. JSONDecoder has to work around this using a private _superDecoder(forKey:) which is similarly not generic, and the other generic methods call into it.