Serializing a dictionary with any codable values

Loooop · October 4, 2018, 7:15am

How do I serialize a dictionary with arbitrary Codable values?
This code doesn't work:

var codableDict = [String:Codable]()
codableDict["A"] = 31    // Int value (codable)
codableDict["B"] = "Donald Duck"  // String value (codable)

print(codableDict)    // ok

let encoder = JSONEncoder()
encoder.outputFormatting = .prettyPrinted

let json = try encoder.encode( codableDict )    // Generic parameter 'T' could not be inferred
print( String(data: json, encoding: .utf8)! )

let decoder = JSONDecoder()
let decodedDict = try decoder.decode( type(of:codableDict), from: json) // Generic parameter 'T' could not be inferred

print(decodedDict)

benrimmington · October 4, 2018, 7:41am

I think you need a [String: AnyCodable] dictionary, where the value is a type-erased wrapper.
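
As a rough illustration of what a type-erased wrapper can look like, here is a minimal encode-only sketch. The name AnyEncodable and its closure-based design are assumptions made for this example, not the API of any particular AnyCodable package, and decoding is deliberately left out because it still requires knowing the target type.

```swift
import Foundation

// Hypothetical encode-only wrapper (an assumption of this sketch).
struct AnyEncodable: Encodable {
    // Captures the concrete value's encode(to:) so its static type can be forgotten.
    private let encodeValue: (Encoder) throws -> Void

    init<T: Encodable>(_ value: T) {
        encodeValue = { encoder in try value.encode(to: encoder) }
    }

    func encode(to encoder: Encoder) throws {
        try encodeValue(encoder)
    }
}

struct Test: Encodable {
    let pippo = 3
}

let wrappedDict: [String: AnyEncodable] = [
    "A": AnyEncodable(31),
    "B": AnyEncodable("Donald Duck"),
    "C": AnyEncodable(Test()),
]

let sketchEncoder = JSONEncoder()
sketchEncoder.outputFormatting = .sortedKeys   // deterministic key order
let encodedJSON = String(decoding: try sketchEncoder.encode(wrappedDict), as: UTF8.self)
print(encodedJSON)   // {"A":31,"B":"Donald Duck","C":{"pippo":3}}
```

Because the wrapper calls encode(to:) on the wrapped value directly, it only covers the encoding half of the problem.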

Loooop · October 4, 2018, 10:05am

Neither solution allows encoding (or decoding) an "unexpected" type:

struct Test : Codable {
    let pippo = 3
}

var codableDict = [String:Any]() // or [String:AnyCodable]()
codableDict["A"] = 31
codableDict["B"] = "Donald Duck"
codableDict["C"] = Test()   // <-- This causes the problem

jrose · October 4, 2018, 3:57pm

This has been asked before. Check out

Loooop · October 5, 2018, 5:25am

I see. Codable can't encode and decode a dictionary that contains heterogeneous Codable elements.

This code should have worked by default:

struct Test : Codable { // or class
	let pippo = 3
}

var dict = [String:Codable]()
dict["A"] = 31
dict["B"] = "Donald Duck"
dict["C"] = Test()

let encoder = JSONEncoder()
encoder.outputFormatting = .prettyPrinted

let json = try encoder.encode( dict )
print( String(data: json, encoding: .utf8)! )

let decoder = JSONDecoder()
let decodedDict = try decoder.decode( type(of:dict), from: json)

But it doesn't work, and I can't figure out how to make it work without building knowledge of "struct Test" into the "AnyCodable" hacks.
So, compared to NSCoding, we have a mechanism that makes serializing simple things simpler but makes it virtually impossible to serialize slightly more complicated things, such as collections of heterogeneous Codable elements or reference graphs.

zoul · October 5, 2018, 7:44am

I don’t think that’s fair. One conscious design choice was that you have to know the particular type you are (de)coding. That makes sense to me, after the initial adjustment phase. If you don’t know the particular type, there are reasonable workarounds, such as (shameless plug) my JSON library:

let json: JSON = [
    "foo": "foo",
    "bar": 1,
]

// "{"bar":1,"foo":"foo"}"
let str = try String(data: try JSONEncoder().encode(json), encoding: .utf8)!

Wouldn’t that help?

jrose · October 5, 2018, 3:42pm

Tomáš is right. Codable isn't a replacement for NSCoding; it's a different mechanism that often makes more sense for fixed-schema tree-structure formats.

itaiferber · October 5, 2018, 4:18pm

To give you some background on why this doesn't work, a few notes (I've written about this previously, but a short recap here):

In order to decode a value, you have to know its type. On some level you can guess (e.g. "this looks like a number, so maybe Int?"), but between Swift's strong static typing philosophy (e.g. numeric types are not implicitly convertible) and the fact that this falls apart for complex types ("is this [String : Any], or a struct with properties?"), type information about values has to go somewhere.
NSCoding places that type information in the archived data itself — if you look in an NSKeyedArchiver archive, you'll see various class and type names in there. This appears to "just work" in most cases, but has implications for security (see Data You Can Trust from this year's WWDC), and more importantly, works because of the relative simplicity of Obj-C's type system:
- In Obj-C, class names are unique
- Obj-C has no namespacing, nor nesting of types
- All Obj-C types are effectively public via the dynamic type system (e.g. you can NSClassFromString any class at runtime and get at it)
In contrast, this would not work as-is in Swift:
- Class names are not unique — classes with the same name can coexist across modules, or even within modules, given proper nesting
- Swift allows types to be nested
- Swift allows types to be private or fileprivate
All of these points contribute to the fact that type names in Swift are not as stable as class names in Obj-C:
- Because class names are not unique, to uniquely refer to a type by its name, you need to refer to its fully disambiguated name, e.g. MyModule.ParentClass.NestedType. If anything in that disambiguated name changes (module name changes, parent class name changes), the type name changes
- Because types can be nested, the nesting is part of the type name. Changing any nesting (hoisting a type out, or nesting it deeper in somewhere else) changes the type name
- Because you can have multiple classes within the same module with the same name as long as they are fileprivate, fileprivate types include the name of their containing file in their type name. Changing the name of the file changes the name of the type
Codable then cannot reasonably offer placing type information into archives because even minor-seeming changes (changing the name of a file!) can cause data to no longer be readable or decodable. (And again, this is aside from the security implications of doing this — you'll note that NSSecureCoding moves NSCoding more toward Codable in that you have to know type information up-front.)
There are other possible mechanisms for uniquely identifying types (e.g. Java's place-a-UUID-in-the-type-and-never-change-it-again scheme), but there are no stable guarantees over time. So instead, Codable opts to place the type information elsewhere: in the hands of the developers working with these decoded values. Indeed, this is a tradeoff: when you don't know what types you might expect, it's hard to decode those values. You have to have at least some inkling of a notion and be able to express that.
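
To make the point about unstable type names concrete, the following small sketch prints the short and fully qualified names of a nested type and of a fileprivate type. Outer, Inner and Hidden are hypothetical types invented for the illustration, and the exact qualified strings depend on the module and file they are compiled in, which is precisely the problem described above.

```swift
// Hypothetical types, only to show how nesting and visibility feed into the qualified name.
struct Outer {
    struct Inner {}
}

fileprivate struct Hidden {}

// Unqualified name: just the type's own identifier.
let shortName = String(describing: Outer.Inner.self)
// Qualified name: prefixed with the module name and the enclosing type.
let qualifiedName = String(reflecting: Outer.Inner.self)
// Qualified name of a fileprivate type: carries extra context beyond the module name.
let hiddenName = String(reflecting: Hidden.self)

print(shortName)
print(qualifiedName)
print(hiddenName)
```

Renaming the module, moving Inner out of Outer, or moving Hidden to another file would change the qualified strings, so an archive that stored them would stop matching.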

In most cases, this means writing a type which can express "I am an Int or a String or Double or a Foo" (or writing something more general, like AnyCodable/JSON/etc.)

For the record, in most cases, Codable can be just as dynamic as NSCoding is — the difference is in how you express that dynamism statically.

beccadax · October 6, 2018, 4:48am

One point I’d add to Itai’s excellent explanation is that because Codable doesn’t contain type information, it is much better suited to loading data from REST APIs written in common server-side languages. NSCoding is only really capable of reading data written by NSCoding. Codable can often communicate with systems you don’t even control at all.

It’s best to think of them as two alternatives which are useful in different situations, rather than old-and-busted vs. new-hotness.

itaiferber · October 6, 2018, 5:58am

Yep, that’s the other major reason. If type information has to go in the archive, it has to be present in the serialized representation. That means no clean interop with any schema that you don’t own, e.g. the vast majority of web APIs out there.

Loooop · October 7, 2018, 2:57am

Brent Royal-Gordon, it is clear that, as it stands, Codable is not a replacement for NSCoding.
But NSCoding requires that the model consist of NSObject subclasses, which is a terrible limitation in Swift.
Please give us a SwiftEncoder/Decoder that:

- stores type information
- can properly encode/decode collections of heterogeneous Codable elements
- can properly encode/decode reference graphs

itaiferber, thanks for your explanation. My thoughts (sorry for my bad English):

NSSecureCoding does not prevent you from storing and retrieving collections of heterogeneous objects. Therefore, either NSSecureCoding is not secure, or collections of heterogeneous objects can be encoded and decoded in a secure way.

import Foundation

class Pippo : NSObject, NSSecureCoding {
	static var supportsSecureCoding = true
	
	var name : String
	
	init( name:String ) {
		self.name = name
	}
	
	required init?(coder aDecoder: NSCoder) {
		guard let name = aDecoder.decodeObject(of:NSString.self, forKey: "name") as String? else { return nil }
		self.name = name
	}

	func encode(with aCoder: NSCoder) {
		aCoder.encode(name as NSString, forKey: "name")
	}

	override var description: String {
		return "Pippo { name = \(name) }"
	}
}

let array : [Any] = [ Pippo(name: "pippo"), "pizza", 31, 25.3 ]
print(array) // --> [Pippo { name = pippo }, "pizza", 31, 25.3]

guard let data = try? NSKeyedArchiver.archivedData(withRootObject: array as NSArray, requiringSecureCoding: true)
	else { preconditionFailure("archiving failure") }
print(data) // --> the archive's size in bytes, not its contents
guard let oarray = (try? NSKeyedUnarchiver.unarchiveTopLevelObjectWithData( data )) as? [Any]
	else { preconditionFailure("unarchiving failure") }
print(oarray) // --> [Pippo { name = pippo }, pizza, 31, 25.3]

On the 'names' question:

As far as I know, unlike Objective-C, Swift cannot instantiate a type from its name as a string; there is no replacement for NSClassFromString(). So even if type names were unique, Swift couldn't reify a type from its string name without some compiler magic.

Why not ask the user for an 'archiveTypeName'?

protocol Codable {
//...
	static var	archiveTypeName : String? { get }
}
extension Codable {
	static var	archiveTypeName : String? { return nil }
}

If the user 'overrides' archiveTypeName:

struct Foo : Codable {
//...
	static let archiveTypeName : String? = "com.xxxx.Foo"
}

store it in the archived data itself on encoding.

On decoding you need a table that associates archiveTypeNames with types:

class CodableTable {
	enum Errors : Error {
		case archiveTypeNameNotFound,archiveTypeNotFound,archiveTypeNameAlreadyExists
	}
	
	private var table = [String : Codable.Type]()
	
	private init(){}
	
	static let shared = CodableTable()

	func type( archiveTypeName:String ) -> Codable.Type? {
		return table[archiveTypeName]
	}

	func register( _ t:Codable.Type ) throws {
		guard let archiveTypeName = t.archiveTypeName else { throw Errors.archiveTypeNameNotFound }
		guard type( archiveTypeName:archiveTypeName ) == nil else { throw Errors.archiveTypeNameAlreadyExists }
		table[archiveTypeName] = t
	}
	
	func unarchive( archiveTypeName:String,decoder: Decoder ) throws -> Any {
		guard let type = type( archiveTypeName:archiveTypeName ) else { throw Errors.archiveTypeNotFound }
		return try type.init( from:decoder )
	}
}

So you can register the types that can be unarchived from their type names (types that go into heterogeneous collections, etc.).

let codableTable = CodableTable.shared
try codableTable.register( Foo.self )
try codableTable.register( Foo2.self )
// etc.... I know, it's tedious

and then pass codableTable to the decoder:

let decoder = SwiftDecoder( codableTable:codableTable )

Then add to the decoder a non-generic decode function that returns 'Any'.
It checks for the existence of an archiveTypeName string: if one exists, the value is decoded with:

let decoded = try self.codableTable.unarchive( archiveTypeName:archiveTypeName,decoder: decoder )

otherwise it is decoded in the standard way.
Eventually, some compiler 'magic' could automatically register types that declare archiveTypeName != nil.
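
A variant of the table idea above can be approximated with Codable as it exists today. This is only a sketch under stated assumptions: the ArchiveNamed protocol, the Tagged wrapper, the "type"/"payload" keys and the com.example names are all invented here, and the list of registered types is fixed in code rather than discovered at runtime.

```swift
import Foundation

// Hypothetical protocol: each participating type supplies its own stable name.
protocol ArchiveNamed: Codable {
    static var archiveTypeName: String { get }
}

struct Foo: ArchiveNamed, Equatable {
    static let archiveTypeName = "com.example.Foo"
    var count: Int
}

struct Foo2: ArchiveNamed, Equatable {
    static let archiveTypeName = "com.example.Foo2"
    var title: String
}

// Wrapper that writes {"type": <name>, "payload": <value>} and reads it back.
struct Tagged: Codable {
    // The "table": every type allowed to appear in a heterogeneous collection.
    static let registered: [ArchiveNamed.Type] = [Foo.self, Foo2.self]

    enum CodingKeys: String, CodingKey { case type, payload }

    let value: ArchiveNamed

    init(_ value: ArchiveNamed) { self.value = value }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let name = try container.decode(String.self, forKey: .type)
        // Look the stored name up in the table; unknown names are rejected.
        guard let matched = Tagged.registered.first(where: { $0.archiveTypeName == name }) else {
            throw DecodingError.dataCorruptedError(
                forKey: .type, in: container,
                debugDescription: "Unregistered type name: \(name)")
        }
        value = try matched.init(from: container.superDecoder(forKey: .payload))
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(type(of: value).archiveTypeName, forKey: .type)
        try value.encode(to: container.superEncoder(forKey: .payload))
    }
}

let mixed = [Tagged(Foo(count: 1)), Tagged(Foo2(title: "two"))]
let archive = try JSONEncoder().encode(mixed)
let restored = try JSONDecoder().decode([Tagged].self, from: archive)
print(restored.map { $0.value })
```

Nothing here stops two types from declaring the same name, and a type from another framework cannot be decoded unless it is added to the list by hand.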

zoul · October 7, 2018, 6:34am

My impression is that what you want is technically complex and would fail in many interesting non-obvious ways in practice. IMHO we have already tried this approach, it turned out to be pretty painful and the lessons we have learned are embodied in the design of Codable.

I know it’s tempting to say “I just want to serialize this object graph, what’s so hard about it?”, but I think this is exactly the difference between Simple and Easy. Codable is simple, NSCoding is easy.

itaiferber · October 8, 2018, 4:26pm

Some thoughts here:

For the record, Codable is not meant to be a replacement for NSCoding. Both exist in the same space in a meaningfully complementary way: their goals are different, as are their approaches for reaching those goals.

Our intention is to continue offering both APIs so you can benefit from both approaches where relevant.

There is nothing preventing any of these from being done today (and all encoders already can encode and decode collections of heterogeneous elements), but what prevents you from encoding [String : Any] is the generic requirement on encode<T : Codable>(_ value: T, ...): if you made [String : Any] Codable (though I don't necessarily recommend this), you could just encode your dictionary.

What's not reasonable to do outside of the scope of your specific application is make the claim that all [String : Any] : Codable, because clearly, it's possible to construct a dictionary with non-Codable elements. You can do this today and it will just work, but the standard library will never offer this.

Preventing the storage and retrieval of heterogeneous collections has nothing at all to do with security: preventing it does not make an archive more secure, and allowing it does not make an archive less secure. Nor is this NSSecureCoding's goal. The goal of NSSecureCoding is rather limited in scope, given the backwards compatibility requirements imposed by NSCoding: preventing arbitrary code execution from happening inside of apps based on trust of malicious archives.

Again, my talk from this year's WWDC covers this in more detail, but the goal there is to prevent arbitrary trust of class names already in the archive: if I ask to decode an NSArray containing NSStrings, I shouldn't get back an NSMachPort. The dynamic design of NSCoding within the context of Objective-C makes it easier to be able to ask for an NSArray containing both NSStrings and NSNumbers, but this is no more or less secure.

In fact, the implication here is reversed. Because the problems with putting class names in archives led us to put type information elsewhere (i.e. in code), we are able to avoid trusting the contents of the archive (since there is nothing to trust). The additional security is a benefit that we get, not a reason to make it more difficult to express heterogeneity.

I meant to mention this above — although it's currently not possible to do, I suspect that Swift already embeds enough metadata in applications that this should be possible for at least public types. One leading question here is: should you be able to look up internal, private, and fileprivate types by name? And if so, from where? Would we start enforcing visibility at runtime? Not clear.

This has been considered, but there are a few obvious limitations:

What, if anything, prevents two types from claiming the same archiving name? This is not possible to prevent at compile time (unlike Objective-C which makes this easier: using the class name ensures uniqueness; if you've got two classes with the same name at runtime, all bets are already off anyway)
Up-front registration does not solve all problems: what if your [String : Any] contains an unnamed type vended by a different framework whose type and archiving name you know nothing about? How do you know to request to register their identifier, and how do you ask the type to do so?

FWIW, this exact registration problem is non-trivial to solve, and we're currently dealing with the challenge in some new API design internally. Swift currently does not offer an easy solution like Objective-C's +load, which would allow arbitrary frameworks to register their types in a global table at load time, but there are ways around this. Not all generalize to what you're looking for here.

All in all, the design decisions made for Codable came from years of experience with the NSCoding and NSSecureCoding APIs, their flaws, and their benefits. These APIs will continue to coexist, and if you truly need polymorphism or heterogeneity in a way that is impossible for Swift to represent (which I don't believe is the case), you always have those APIs to fall back on.

But, from that experience, I maintain that, rather than going with named types, it is significantly easier to express (and benefit from the type safety of)

enum StringOrInt : Codable {
    case string(String)
    case int(Int)

    init(from decoder: Decoder) throws { ... }
    func encode(to encoder: Encoder) throws { ... }
}

let myCollection: [String : StringOrInt] = /* ... */
let data = try JSONEncoder().encode(myCollection)

over the NSSecureCoding analogue.
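
For readers who want to see the elided bodies spelled out, here is one way init(from:) and encode(to:) might be written. It is a sketch, not the author's implementation: trying Int before String, the Equatable conformance and the error message are all choices made for this example.

```swift
import Foundation

enum StringOrInt: Codable, Equatable {
    case string(String)
    case int(Int)

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        // Try each expected type in turn; the order is an assumption of this sketch.
        if let intValue = try? container.decode(Int.self) {
            self = .int(intValue)
        } else if let stringValue = try? container.decode(String.self) {
            self = .string(stringValue)
        } else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Expected a String or an Int")
        }
    }

    func encode(to encoder: Encoder) throws {
        // Write the payload directly, so the JSON holds a plain string or number.
        var container = encoder.singleValueContainer()
        switch self {
        case .string(let value): try container.encode(value)
        case .int(let value): try container.encode(value)
        }
    }
}

let myCollection: [String: StringOrInt] = ["A": .int(31), "B": .string("Donald Duck")]
let encodedCollection = try JSONEncoder().encode(myCollection)
let roundTripped = try JSONDecoder().decode([String: StringOrInt].self, from: encodedCollection)
print(roundTripped)
```

The JSON carries no type tag at all; the enum itself is where the knowledge of the possible types lives.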

I've mentioned this in other threads, but what I'd really love to eventually see is variadic generics so we could offer

enum OneOf<T... : Codable> : Codable {
    case t0(T[0])
    case t1(T[1])
    // ...
}

and everyone could benefit from [String : OneOf<String, Int>] rather than having to write their own type.
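
Until something like that exists, a fixed-arity generic can cover part of the ground, and nesting it emulates longer type lists. The OneOf2 name, its case names and the Equatable constraints below are assumptions of this sketch, not a proposed standard library type.

```swift
import Foundation

// Hypothetical two-type stand-in for a variadic OneOf.
enum OneOf2<A: Codable & Equatable, B: Codable & Equatable>: Codable, Equatable {
    case first(A)
    case second(B)

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        // A is tried before B, so if both could decode the payload, A wins.
        if let a = try? container.decode(A.self) {
            self = .first(a)
        } else if let b = try? container.decode(B.self) {
            self = .second(b)
        } else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Value matches neither \(A.self) nor \(B.self)")
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
        case .first(let a): try container.encode(a)
        case .second(let b): try container.encode(b)
        }
    }
}

// Nesting gives "Int or String or Bool" without writing a new enum.
typealias IntStringOrBool = OneOf2<Int, OneOf2<String, Bool>>

let mixedJSON = Data("[1, \"a\", true]".utf8)
let decodedValues = try JSONDecoder().decode([IntStringOrBool].self, from: mixedJSON)
print(decodedValues)
```

The nested spelling gets unwieldy quickly, which is part of why a variadic version would be nicer.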

Jon889 · August 6, 2019, 2:44am

It wouldn't have hurt to include AnyCodable in Swift though?

pavm035 · February 2, 2021, 8:59am

I'm also facing the same issue and am planning to use AnyCodable. It would be nice if it were part of the framework itself.