Decode a JSON object of unknown format into a Dictionary with Decodable in Swift 4

dlbuckley · March 16, 2018, 8:04am

@itaiferber no problem, I'm sure you guys and girls have plenty of other more important things to keep you occupied.

@valeriomazzeo and I created something very similar to Unevaluated yesterday called AnyCodable, with an equivalent strategy of fuzzy decoding to primitive types. It was originally an enum with all of the primitive types as cases, but we refactored that down to using Any, as in your original suggestion. It's far from a perfect solution, but it will do what we need for now. You can find the source on GitHub at asensei/AnyCodable ("Generic Any? data encapsulation meant to facilitate the transformation of loosely typed objects using Codable").

I do think that one problem with the current JSONDecoder is that once you tell it to decode(data) there isn't an easy way to get the data back out without knowing about the type you are expecting beforehand. Internally it's using JSONSerialization which decodes to NSNull, NSNumber, NSString, NSArray, NSDictionary. Instead of the fuzzy decoding that we are having to do to get any values out with AnyCodable, it would be useful to get to these raw values. Hence why I suggested my previous solution.

If we had access to the raw value provided by JSONSerialization (JSONSerialization is simply an implementation detail here, but somewhere a decoder will have some sort of raw value it can pull from), the user of JSONDecoder could implement whatever mapping they wanted, which would hand control back from JSONDecoder to the user.

valeriomazzeo · March 16, 2018, 1:33pm

itaiferber · March 16, 2018, 4:47pm

Getting back the underlying data from JSONSerialization is the exact idea behind Unevaluated — the difference between AnyCodable and Unevaluated is that AnyCodable attempts to decode types that it knows about on its own and is performing conversions; on the other hand, Unevaluated would be a marker type which asks the Decoder to stick whatever existing representation it has of the underlying data into its .value and returns that. If the Decoder supports Unevaluated (e.g. in formats where it's possible to grab an underlying representation like NSNulls, Strings, etc.), it can do that; if the Decoder does nothing special to recognize Unevaluated and ends up calling its init(from:), Unevaluated will just throw a .typeMismatch letting you know that the Decoder doesn't support it.
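A minimal sketch of the marker type described above, to show the fallback behaviour with today's JSONDecoder. The shape of Unevaluated here (a single `value: Any` property and an `init(_:)`) is an assumption taken from this discussion, not an existing standard library API; `Envelope` and `tryDecodingEnvelope` are hypothetical names used only for illustration.

```swift
import Foundation

// Hypothetical marker type; the name and shape are assumptions based on the
// description in this thread, not an existing standard library API.
struct Unevaluated: Decodable {
    let value: Any

    init(_ value: Any) { self.value = value }

    // A decoder with no special support for the marker ends up here,
    // so the type refuses to decode itself.
    init(from decoder: Decoder) throws {
        throw DecodingError.typeMismatch(
            Unevaluated.self,
            DecodingError.Context(
                codingPath: decoder.codingPath,
                debugDescription: "This decoder does not support Unevaluated."))
    }
}

// Hypothetical payload type with one field of unknown structure.
struct Envelope: Decodable {
    let id: Int
    let payload: Unevaluated
}

// Reports which kind of error (if any) decoding produced.
func tryDecodingEnvelope(_ json: String) -> String {
    do {
        _ = try JSONDecoder().decode(Envelope.self, from: Data(json.utf8))
        return "decoded"
    } catch DecodingError.typeMismatch(_, _) {
        return "typeMismatch"
    } catch {
        return "other error"
    }
}
```

Because JSONDecoder does not recognize the marker today, decoding `Envelope` reaches `Unevaluated.init(from:)` and fails with a type mismatch, which is the "decoder doesn't support it" signal described above.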

To make this concrete, the following implementation of unbox is how JSONDecoder handles taking an existing container and coercing it into the value you've asked for:

fileprivate func unbox<T : Decodable>(_ value: Any, as type: T.Type) throws -> T? {
    if type == Date.self || type == NSDate.self {
        return try self.unbox(value, as: Date.self) as? T
    } else if type == Data.self || type == NSData.self {
        return try self.unbox(value, as: Data.self) as? T
    } else if type == URL.self || type == NSURL.self {
        guard let urlString = try self.unbox(value, as: String.self) else {
            return nil
        }

        guard let url = URL(string: urlString) else {
            throw DecodingError.dataCorrupted(DecodingError.Context(codingPath: self.codingPath,
                                                                    debugDescription: "Invalid URL string."))
        }

        return url as! T
    } else if type == Decimal.self || type == NSDecimalNumber.self {
        return try self.unbox(value, as: Decimal.self) as? T
    } else {
        self.storage.push(container: value)
        defer { self.storage.popContainer() }
        return try type.init(from: self)
    }
}

The value passed in to the method is the value returned from JSONSerialization (e.g. NSDictionary containing NSString and NSArray); today, JSONDecoder knows about a few special types and intercepts them to reinterpret the data. If we were to call unbox(value, as: Unevaluated.self) today, we'd fall into that last case:

self.storage.push(container: value)
defer { self.storage.popContainer() }
return try type.init(from: self)

That last line would end up calling Unevaluated.init(from:), which would just throw a .typeMismatch. In order to support Unevaluated, we'd expand unbox to do this:

fileprivate func unbox<T : Decodable>(_ value: Any, as type: T.Type) throws -> T? {
    if type == Unevaluated.self {
        return Unevaluated(value) as? T
    } else if ... {
        // ...
    } else {
        self.storage.push(container: value)
        defer { self.storage.popContainer() }
        return try type.init(from: self)
    }
}

This would just return an Unevaluated instance whose contents are exactly what JSONSerialization returned: the collection of NS values it decoded. When you get back the Unevaluated type, its .value contains exactly what you're getting at. (This is the raw value access you're looking for.)
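The dispatch can be tried outside of JSONDecoder with a toy stand-in. This is only a sketch under stated assumptions: `toyUnbox` is a hypothetical free function rather than the private method quoted above, `Unevaluated` is the assumed marker type, and every non-marker type is handled with a plain cast instead of the real decoding path.

```swift
import Foundation

// Assumed marker type from this discussion (not a real stdlib type).
struct Unevaluated {
    let value: Any
    init(_ value: Any) { self.value = value }
}

// Hypothetical, simplified stand-in for JSONDecoder's private unbox:
// the marker type short-circuits and wraps the raw value untouched.
func toyUnbox<T>(_ value: Any, as type: T.Type) -> T? {
    if type == Unevaluated.self {
        return Unevaluated(value) as? T
    }
    // The real decoder would push a container and call init(from:) here;
    // this sketch only casts.
    return value as? T
}

// The raw value is whatever JSONSerialization produced.
let parsed = try JSONSerialization.jsonObject(
    with: Data(#"{"name": "shirt", "sizes": ["S", "M"]}"#.utf8))
let unevaluated = toyUnbox(parsed, as: Unevaluated.self)
let members = unevaluated?.value as? [String: Any]
```

The wrapped `.value` is the same object graph JSONSerialization returned, so no per-type probing is needed to get at it.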

valeriomazzeo · March 16, 2018, 4:50pm

I don't see how we could implement that without recompiling Swift. The whole Encoder/Decoder API is very closed and almost impossible to extend from outside.

AnyCodable is a temporary solution until this is supported natively in Swift.

itaiferber · March 16, 2018, 4:52pm

That's the point — we'd be adding Unevaluated to the standard library itself; it would be the "blessed" marker type which all Encoders and Decoders can refer to for performing this (rather than each one reinventing the wheel).

valeriomazzeo · March 16, 2018, 4:54pm

That's great. You previously mentioned it definitely won't make it into Swift 5. We have to have something working well before this makes it into Swift officially.

itaiferber · March 16, 2018, 4:55pm

Yes, I agree! AnyCodable will work in general for now; I'm just detailing what this might look like when we do have the time to fully design the API, and how it will improve the efficiency of doing it.

dlbuckley · March 16, 2018, 4:55pm

OK, now I see how you intend the deeper integration to work rather than just the addon like we did with AnyCodable, and I agree that this would be a workable solution.

I know you said that something like this won't go in for Swift 5, but I would imagine it would have to go through the formal proposal process anyway, right?

If I can take some of the load and start writing up the proposal would that be useful?

itaiferber · March 16, 2018, 5:06pm

Adding a type to the stdlib would indeed require a proposal, yes, and I appreciate the offer. We do, however, have quite a few enhancements that we're interested in making to Codable, and from a design perspective, I think they would benefit most from being designed together.

The implementation of this will be really easy (it's almost 100% complete above!), but getting the naming right and making sure the ergonomics are there (I'm going to respond to @hooman above regarding dynamic lookup) is going to take time, which I don't think we'll have for Swift 5. I, for one, would actually be opposed to adding a bare type called Unevaluated to the standard library because it is vague and unhelpful; on the other hand, we cannot nest it within Encodable or Decodable (because they're protocols, and also we don't want to split the type up between them) nor Codable (because it's just a composition of protocols).

In all, I think there's a lot more exploration to be done in the Codable space and this is just one piece of what I've been thinking about; in the meantime, AnyCodable should certainly work for you.

For what it's worth, I'd like to push out pitches at the very least for a few such enhancements toward the end of Swift 5's release cycle (or the beginning of Swift 6), time permitting. Feedback on such pitches would be very helpful.

dlbuckley · March 16, 2018, 5:13pm

OK, that's fair enough, I suppose since it's something that's very immediate for us right now it's very easy to get blinkered on the task at hand rather than looking at the bigger picture.

There are actually a few other things we would like to see in Codable, so I'm looking forward to seeing what else you have in mind to see if anything overlaps.

Thanks for taking the time over this issue.

itaiferber · March 16, 2018, 5:19pm

We can certainly add some dynamic lookup features to Unevaluated to make it more accessible, and I think that would be a great extension to the feature (and a good use-case for dynamic lookup)! However, I think we'd rather avoid adding JSON-specific types for this.

When designing Codable, one of the big decisions we made early on was that we wanted Codable to be as format-agnostic as we could make it (without making it so abstract as to no longer be helpful); our goal was to help abstract away just enough details to make a given Codable implementation work with many different formats. As such, many existing Codable implementations would work as-is whether encoded through JSONEncoder, PropertyListEncoder, or any other Encoder you might write.

With this, I recognize that the #1 use-case by far for Codable at the moment is for serializing to and from JSON; it's a really popular format at the moment, and when most folks write their Codable implementations, they've got JSON in mind. However, what we'd like to avoid is facilitating writing something like

init(from decoder: Decoder) throws {
    let container = try decoder.singleValueContainer()
    let contents = try container.decode(JSONDecoder.UnevaluatedJSON.self)
    // ... inspect contents
}

Because the Unevaluated type is essentially just a marker type, scoping such a type too closely to one format makes the Codable implementation unusable for other formats. Specializing parts of a Codable implementation is possible, but error-prone:

init(from decoder: Decoder) throws {
    if decoder is JSONDecoder {
        // do JSON stuff
    } else {
        // do general stuff
    }
}

The above will fail because the Decoder passed in is not a JSONDecoder; JSONDecoder uses a private class _JSONDecoder to do the decoding work, so this is easy to get wrong. At the moment, the right way to do this would be to pass a known value through the JSONDecoder's userInfo (which you control) and check for that type through decoder.userInfo. (There's also currently no way to query a decoder for its format, so the solution there is the same. One of the enhancements I want to make is along this vein, to improve the ergonomics of checking this.)
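A minimal sketch of the userInfo approach mentioned above. The key name `sourceFormat`, its raw value, the `"json"` marker string and the `Note` type are all hypothetical; only `CodingUserInfoKey`, `JSONDecoder.userInfo` and `Decoder.userInfo` are existing API.

```swift
import Foundation

extension CodingUserInfoKey {
    // Hypothetical key; any unique raw value would do.
    static let sourceFormat = CodingUserInfoKey(rawValue: "example.sourceFormat")!
}

struct Note: Decodable {
    let text: String
    let decodedFromJSON: Bool

    private enum CodingKeys: String, CodingKey { case text }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        text = try container.decode(String.self, forKey: .text)
        // Check the value the caller placed in userInfo instead of testing
        // `decoder is JSONDecoder`, which is never true inside init(from:).
        decodedFromJSON = (decoder.userInfo[.sourceFormat] as? String) == "json"
    }
}

let input = Data(#"{"text": "hi"}"#.utf8)

// The caller controls the decoder, so it can tag it with the format.
let taggedDecoder = JSONDecoder()
taggedDecoder.userInfo[.sourceFormat] = "json"
let taggedNote = try taggedDecoder.decode(Note.self, from: input)

// Without the tag, the type falls back to its general path.
let untaggedNote = try JSONDecoder().decode(Note.self, from: input)
```

The check only works if every call site remembers to set the key, which is part of why better ergonomics for querying the format would help.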

In any case, our philosophy on this is generally that if JSON can benefit from such a feature, other formats should be able to benefit as well. We'd rather not gate this on JSON adoption specifically, especially as we're looking to add new encoders and decoders ourselves to support further formats.

Is there something specific you'd like to see from dynamic lookup that would make the ergonomics of Unevaluated better? Happy to get input here!

itaiferber · March 16, 2018, 5:23pm

Happy to hear about other features that you'd like to see from Codable. Some might be already tracked internally, some might not; input is always appreciated.

DeFrenZ · March 16, 2018, 5:41pm

Could Unevaluated be an associatedtype within both, and redefine Codable as Encodable & Decodable where Encodable.Unevaluated == Decodable.Unevaluated? I'm not even sure that's valid Swift though

chrisanderson · March 16, 2018, 5:41pm

Aside from my original meta request way back when at the start of this thread (and while I'm sad that I haven't been able to use Codable yet until my original issue is resolved, definitely appreciate that discussion continues), I'd love to see better ways of handling inconsistent JSON data formats without resorting to an intermediate type or some other go-between.

E.g. I have the JSON

{
  "id": "4yq6txdpfadhbaqnwp3",
  "name": "Large shirt",
  "quantity": "200",
  "active": "true"
}

with the struct

struct Product: Codable {
  var id: String
  var name: String
  var quantity: Int
  var active: Bool
}

Attempting to convert the active and quantity attributes fails due to a type mismatch.

I'd like either some way to allow JSONDecoder to be more 'aggressive' in its conversions (is it an Int? No? Is it a string that can be converted to an Int? Yes? Great!) or support for additional custom decoding strategies (a la Apple Developer Documentation). If the conversion then fails, it can throw an error, but nine times out of ten the issue is a string value in place of a primitive type and can be handled easily.

Many JSON APIs are incredibly and frustratingly inconsistent, so allowing for greater flexibility in dealing with "similar" types such as these is one of my biggest stumbling blocks when working with Codable.

hooman · March 16, 2018, 6:04pm

In practice, this Unevaluated value is not a black box for me. I know it should most probably have a certain "property" of a certain type, or I know it should have either of a handful of "properties". I want to be able to just treat those keys like properties the way dynamic lookup allows. But I also can't fully decode it to a struct or object because it does not have a fully specified or fixed layout.

itaiferber · March 16, 2018, 6:23pm

I like this direction, but you can't constrain typealiases in this way:

protocol P1 { associatedtype Unevaluated }
protocol P2 { associatedtype Unevaluated }
typealias P3 = P1 & P2 where P1.Unevaluated == P2.Unevaluated
// => error: 'where' clause cannot be attached to a non-generic declaration
//    typealias P3 = P1 & P2 where P1.Unevaluated == P2.Unevaluated
//                           ^

You would also need specific instantiations of P1 and P2 (Encodable and Decodable) whose associated types are given concrete values.

This is possible:

struct _Unevaluated {}

protocol P1 {
    typealias Unevaluated = _Unevaluated
}

protocol P2 {
    typealias Unevaluated = _Unevaluated
}

typealias P3 = P1 & P2
print(P3.Unevaluated.self) // => _Unevaluated

but works in a surprising way. P3 actually shows two different overloads for Unevaluated; they just happen to both be the same. You can actually do this:

struct _Unevaluated1 {}
struct _Unevaluated2 {}

protocol P1 {
    typealias Unevaluated = _Unevaluated1
}

protocol P2 {
    typealias Unevaluated = _Unevaluated2
}

typealias P3 = P1 & P2
print(P3.Unevaluated.self) // => _Unevaluated2

I'm not sure how Swift chooses which one "wins", but this appears to return _Unevaluated2 regardless of whether you write P1 & P2 or P2 & P1 (it's possible the composition is always sorted and the last definition wins out; I'm just surprised there are no warnings or anything). That's neither here nor there though.

Seems like we can expand the protocols to contain this (and can potentially include a matching _'d type as the actual underlying type for the protocols). @jrose How does adding typealiases to protocols interact with the ABI?

itaiferber · March 16, 2018, 6:27pm

This is tracked by SR-5249, though I can't speak to us adding a switch you can flip to allow JSONDecoder to just perform these conversions on your behalf — there are plenty of edge cases where JSONDecoder couldn't decide on a result for you, and it'd be best for you to do it. Instead, I think we'd prefer a more explicit and strongly-typed solution; we've considered transformers (or similar) in the past, but we'd still need to figure out the details.

Of course, you don't have to use intermediate types to do this — you can always implement init(from:) yourself to provide the conversions you need.
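As a rough illustration of the init(from:) route, here is one way the Product example from earlier in the thread could accept string-encoded values. The `lenient` helper is a hypothetical name, and the sample JSON assumes the second key is `name` so that it matches the struct.

```swift
import Foundation

struct Product: Decodable {
    var id: String
    var name: String
    var quantity: Int
    var active: Bool

    private enum CodingKeys: String, CodingKey {
        case id, name, quantity, active
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        id = try container.decode(String.self, forKey: .id)
        name = try container.decode(String.self, forKey: .name)
        quantity = try Product.lenient(Int.self, forKey: .quantity, in: container)
        active = try Product.lenient(Bool.self, forKey: .active, in: container)
    }

    // Hypothetical helper: try the native type first, then fall back to a
    // string that can be parsed into that type.
    private static func lenient<T: Decodable & LosslessStringConvertible>(
        _ type: T.Type,
        forKey key: CodingKeys,
        in container: KeyedDecodingContainer<CodingKeys>
    ) throws -> T {
        if let value = try? container.decode(T.self, forKey: key) {
            return value
        }
        let text = try container.decode(String.self, forKey: key)
        guard let value = T(text) else {
            throw DecodingError.dataCorruptedError(
                forKey: key, in: container,
                debugDescription: "Cannot convert \"\(text)\" to \(T.self).")
        }
        return value
    }
}

let productJSON = Data("""
{
  "id": "4yq6txdpfadhbaqnwp3",
  "name": "Large shirt",
  "quantity": "200",
  "active": "true"
}
""".utf8)
let product = try JSONDecoder().decode(Product.self, from: productJSON)
```

The conversion rules stay explicit and local to the type, at the cost of listing every property by hand.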

chrisanderson · March 16, 2018, 6:30pm

The downside of the init(from:) approach is that it (as far as I can tell) requires me to manually specify every single attribute, when I really only want to customize one or two. No idea how this would work in practice, but I'd love some way of implying which properties on a struct should use the default decodable case (and thus don't need to be explicitly repeated in init(from:)) versus the ones that I am manually overriding. Most of these are referenced in SR-5249, but it's good to restate that they'd be amazing to have.
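One workaround that is possible today, sketched here with hypothetical names (`LenientInt`, `StockItem`), is to move the conversion into a small wrapper type for just the affected property. The synthesized init(from:) then still covers every other property, though the wrapper does leak into the model's property types.

```swift
import Foundation

// Hypothetical wrapper: decodes an Int from either a JSON number or a
// string containing digits.
struct LenientInt: Decodable {
    let value: Int

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if let number = try? container.decode(Int.self) {
            value = number
            return
        }
        let text = try container.decode(String.self)
        guard let number = Int(text) else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Cannot convert \"\(text)\" to Int.")
        }
        value = number
    }
}

// Only `quantity` opts in to the lenient behaviour; the conformance for
// the remaining properties is synthesized by the compiler.
struct StockItem: Decodable {
    var id: String
    var quantity: LenientInt
}

let fromString = try JSONDecoder().decode(
    StockItem.self, from: Data(#"{"id": "a1", "quantity": "200"}"#.utf8))
let fromNumber = try JSONDecoder().decode(
    StockItem.self, from: Data(#"{"id": "a2", "quantity": 7}"#.utf8))
```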

And regarding the switch on JSONDecoder, what I really think I mean is allowing users to subclass JSONDecoder and instruct it how to handle specific cases, e.g. the string -> number case.

itaiferber · March 16, 2018, 6:30pm

If you're aware of the structure of the JSON you're looking at, is there anything preventing you from casting the .value to the structure you want? (Say, [String : [Any]] for a dictionary containing arrays of heterogeneous values.)
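For illustration, the cast could look like the following, using JSONSerialization output directly as a stand-in for what a `.value` would hold. `elementCounts` is a hypothetical helper name.

```swift
import Foundation

// Stand-in for the raw value an Unevaluated instance would carry.
let payload = Data(#"{"odd": [1, "three", null], "even": [2, true]}"#.utf8)
let rawValue = try JSONSerialization.jsonObject(with: payload)

// Cast once to the expected shape; returns nil if the shape differs.
func elementCounts(in raw: Any) -> [String: Int]? {
    guard let groups = raw as? [String: [Any]] else { return nil }
    return groups.mapValues { $0.count }
}
```

A single conditional cast checks the whole outer structure, and a mismatch surfaces as nil rather than as a crash.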

I think the ergonomics of adding dynamic lookups would be nice, but I think using the facilities that Swift already affords you would be even better, no?

cherrywoods · March 16, 2018, 6:32pm

I do not like the idea of such an abstract Unevaluated type, although I think it is in general the right way to solve the problem discussed in this thread.

I think it would be better to add something like this specifically for JSON. I furthermore think this could be a great place to provide some more functionality for accessing "raw" JSON by making Unevaluated more than just a marker type.

The problem I see is this: you somehow find out that the format you are currently decoding from is JSON. Then you decode such an Unevaluated value from the decoder. Now you have to do something with either the NSDictionary, or NSArray, or whatever you get as .value. Those types are, as I see it, an implementation detail of JSONDecoder, and what you get with Unevaluated is actually a "raw" view into the decoder's storage. This has, in my opinion, several disadvantages:

It would be just annoying to refactor the custom decoding code, if JSONDecoder started to use Dictionary instead of NSDictionary, Array instead of NSArray, and so on.
The resulting code won't be easily understandable, I guess, because you won't be able to figure out right away that this code is working with "raw" JSON; I would not immediately connect NSDictionary with JSON's "{ }". At least it won't be self-documenting.

I'm sticking a bit too closely to the implementation example from above. Beyond this, an abstract implementation would leave us with these problems:

There is no indication that there is just a limited set of options, but there is actually just a limited set of possible types for this Any value, because it is JSON, as we found out.
What is the decoder's internal representation if it isn't JSON?

Also, I do dislike the concept of an abstract type for all formats, because I feel like it does not match the design of Decoder. I would consider it better to have some method like abstractRepresentation alongside keyedContainer, unkeyedContainer and singleValueContainer. The problem with this is that the return type is still open...

If we had a type such as RawJSON (I'm really not good at naming), we could add general working-with-JSON capabilities and not show the NSDictionarys under our clothes to the world. One could ask this type whether it has a keyed container for us, or a number, and so on. We could also tell it that we want a number even if there is a string (only if it is convertible, of course). These examples are certainly not well designed right now, but I think they show what I mean. This new type could take a much larger role in the JSON decoding game. We could start with such a RawJSON, do some lookup and extraction (like resolving the upper container levels that may contain information we don't really want to have as a type on its own; I don't know if this happens often, but I have seen something similar with another format), and then at some other point tell JSONDecoder to decode the remaining JSON for us. If we need to handle something like the metadata from this issue, we could go on manually working with the JSON, and if it contained other data that we would rather have as an instance of one of our custom types, we could just go on as we started: we would tell JSONDecoder to decode the remaining RawJSON extracted before.
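To make the idea a little more tangible, here is a rough sketch of what such a type might look like if built on top of today's Decodable. Everything in it is an assumption for illustration: the case names, the subscript, and the `lenientInt` accessor are hypothetical, and it does not cover handing the remaining JSON back to JSONDecoder.

```swift
import Foundation

// Hypothetical JSON-specific value type with a closed set of cases.
enum RawJSON: Decodable {
    case null
    case bool(Bool)
    case number(Double)
    case string(String)
    case array([RawJSON])
    case object([String: RawJSON])

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if container.decodeNil() {
            self = .null
        } else if let flag = try? container.decode(Bool.self) {
            self = .bool(flag)
        } else if let number = try? container.decode(Double.self) {
            self = .number(number)
        } else if let text = try? container.decode(String.self) {
            self = .string(text)
        } else if let items = try? container.decode([RawJSON].self) {
            self = .array(items)
        } else if let members = try? container.decode([String: RawJSON].self) {
            self = .object(members)
        } else {
            throw DecodingError.dataCorruptedError(
                in: container, debugDescription: "Unsupported JSON value.")
        }
    }

    // Keyed lookup; nil when this value is not an object or the key is absent.
    subscript(key: String) -> RawJSON? {
        if case .object(let members) = self { return members[key] }
        return nil
    }

    // "Give me a number, even if there is a string", when convertible.
    var lenientInt: Int? {
        switch self {
        case .number(let number): return Int(exactly: number)
        case .string(let text): return Int(text)
        default: return nil
        }
    }
}

let document = try JSONDecoder().decode(
    RawJSON.self,
    from: Data(#"{"quantity": "200", "count": 3, "meta": {"note": null}}"#.utf8))
```

Because the set of cases is closed, a switch over a RawJSON value documents that only JSON's own kinds of values can appear, which addresses the "no indication of a limited set of options" concern raised above.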

I don't think we should miss this chance by adding an abstract type. Other implementations for other formats could follow the same way, but would not be required to do so.