"Escaping" Decoder from TopLevelDecoder (JSONDecoder)

cocoaphony · August 18, 2022, 12:10am

Studying the docs and implementations, my understanding is that Decoders are immutable value type objects. That means that it should be safe to make copies of them. I'd like to extract the Decoder instance so that I can make some more convenient top-level functions.

Extracting it is easy (and could be made even easier with a tiny change to stdlib). The question is whether this is safe (and whether Decoder promises enough that I can expect it to stay safe).

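import Foundation

extension JSONDecoder {
    // Decodable shim whose only job is to capture the Decoder it is handed.
    private struct DecoderCloner: Decodable {
        var decoder: Decoder
        init(from decoder: Decoder) throws {
            self.decoder = decoder
        }
    }

    // Runs a throwaway decode and returns the captured Decoder.
    func decoder(for data: Data) throws -> Decoder {
        try decode(DecoderCloner.self, from: data).decoder
    }
}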

With that, I can create more powerful versions of Decodable that pass parameters (as with DecodingConfiguration). I currently have to create top-level wrapper objects to get the process started, just to get a Decoder.

Is there a reason this would be unwise?

itaiferber · August 18, 2022, 12:38am

If you're just looking to extract a Decoder instance from a TopLevelDecoder: it might be possible in certain instances, but I wouldn't expect it to be safe in the general case.

In general:

- There is no promise that Decoder instances are either immutable or value types, and in general you can actually expect most Decoder instances to be reference types with mutable internal state
  - For instance, JSONDecoder's internal __JSONDecoder (the actual class conforming to Decoder) is such a class, with mutable internal state
- This means that having a Decoder "escape" an init(from:) can lead to unpredictable results; in general, you should never hold on to a Decoder or its containers past the end of an init(from:), nor an Encoder or its containers past the end of an encode(to:)
- It's tempting to look past this because, in spite of it, the Decoder you get at the top level of a TopLevelDecoder is typically pretty self-contained, and usually maintains its state separately from the TopLevelDecoder; but this is far from guaranteed
  - For instance, JSONDecoder isn't exactly a paragon of efficiency, and if you wanted to squeeze a lot more juice out of it, you could imagine the __JSONDecoder mutable state that exists right now living inside of the top-level JSONDecoder itself, with unowned references to various leaves of data instead of spreading that state all over the place

Decoder is intentionally meant to be opaque to give implementations as much flexibility as possible to be performant, which means that you might not be holding on to the type of object you may think you are.

(While originally working on this, one ideal goal was to have the compiler prevent escaping references to Encoder/Decoder and their containers past the end of the method [which would much more strongly codify that this shouldn't be possible], but to this day I don't know how feasible that is; I'm not certain that even move-only types would be enough to express this restriction.)

I originally wrote a response to this post having misunderstood the intention here. Leaving the old comment content around:

Old Comment

While in some cases they might be, I don't think this is the norm. Many Decoders are neither immutable nor value types, since they can have reference semantics and mutable internal state. JSONDecoder's internal __JSONDecoder, for instance, is a class with exactly such state:

private class __JSONDecoder : Decoder {
    // MARK: Properties

    /// The decoder's storage.
    var storage: _JSONDecodingStorage

    // ...
}

UnkeyedDecodingContainers also explicitly have internal mutable state, and they typically rely on/affect the internal state of the Decoder they come from.
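
To make the point about unkeyed containers concrete, here is a minimal sketch that records the container's cursor while decoding. The type name FirstTwo is hypothetical and exists only for this illustration; currentIndex and the mutating decode(_:) are part of the UnkeyedDecodingContainer protocol.

```swift
import Foundation

// Hypothetical type, used only to observe the container's cursor.
struct FirstTwo: Decodable {
    var indices: [Int] = []
    var values: [Int] = []

    init(from decoder: Decoder) throws {
        // `var` is required: decode(_:) is a mutating method on unkeyed containers.
        var container = try decoder.unkeyedContainer()
        indices.append(container.currentIndex)        // cursor before any decode
        values.append(try container.decode(Int.self))
        indices.append(container.currentIndex)        // cursor after one decode
        values.append(try container.decode(Int.self))
        indices.append(container.currentIndex)        // cursor after two decodes
    }
}

let firstTwo = try JSONDecoder().decode(FirstTwo.self, from: Data("[10, 20, 30]".utf8))
print(firstTwo.indices, firstTwo.values)
```

Each decode call moves the cursor forward, so a container that outlives its init(from:) carries position state that later code could silently depend on or disturb.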

No, this is neither safe in the general case today, nor recommended. You should not hold on to an Encoder or its containers past the end of an encode(to:) call, nor a Decoder or its containers past the end of an init(from:) call; doing so can actually affect the results of encoding and decoding in unpredictable ways.

(In an ideal world, I would have made the compiler prevent this from being possible — but I don't believe even move-only types would be enough to express this limitation.)

cocoaphony · August 18, 2022, 1:02am

Thanks. I'd been studying JSONDecoderImpl instead of the Darwin version; I thought it was more indicative of the intended promises than it is. (But the lack of explicit promises is why I posted the question. :D)

itaiferber · August 18, 2022, 1:10am

Yeah, I'd love for the documentation to be clearer about this! Even saying "don't rely on anything" is better than not saying anything at all.

(FWIW, the swift-corelibs-foundation implementation used to be identical to what's on Darwin — but there's clearly room for evolution; it may be possible to make better promises over time.)

Out of curiosity, what were the types of additions you were looking to make with an escaped Decoder? There might be other options that fit within the current promises that Codable does make. (e.g., it's sadly not strongly typed, but: usually threading additional data in can be done with userInfo, and there's even room for fairly flexible communication through that, depending on what you're looking for.)
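
As a rough sketch of the userInfo route mentioned above: the key name nameSuffix and the type Tagged are made up for this example. Only CodingUserInfoKey, JSONDecoder.userInfo, and Decoder.userInfo are real API.

```swift
import Foundation

// Hypothetical key, for illustration only.
extension CodingUserInfoKey {
    static let nameSuffix = CodingUserInfoKey(rawValue: "nameSuffix")!
}

// Hypothetical payload type that reads one parameter from userInfo.
struct Tagged: Decodable {
    var name: String

    private enum CodingKeys: String, CodingKey { case name }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let raw = try container.decode(String.self, forKey: .name)
        // userInfo values are untyped: the cast can fail, and nothing forces
        // the caller to supply the value, so a fallback is needed.
        let suffix = decoder.userInfo[.nameSuffix] as? String ?? ""
        name = raw + suffix
    }
}

let configuredDecoder = JSONDecoder()
configuredDecoder.userInfo[.nameSuffix] = " (imported)"
let tagged = try configuredDecoder.decode(Tagged.self, from: Data(#"{"name": "Alice"}"#.utf8))
print(tagged.name)
```

The conditional cast and the fallback value show the lack of strong typing in practice.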

cocoaphony · August 18, 2022, 3:03am

Sure, consider this kind of common response:

let json = Data(#"""
{
    "response": {
        "results": [{
            "name": "Alice",
            "age": 43
        }],
        "count": 1
    },
    "status": 200
}
"""#.utf8)

struct Person: Decodable {
    var name: String
    var age: Int
}

We just want the results, which is [Person]. We can't change how [Person] is decoded. So the common approach is to make a wrapper (AnyCodingKey is ExpressibleByStringLiteral):

struct PersonResponse: Decodable {
    var results: [Person]
    init(from decoder: Decoder) throws {
        self.results = try decoder.container(keyedBy: AnyCodingKey.self)
            .nestedContainer(keyedBy: AnyCodingKey.self, forKey: "response")
            .decode([Person].self, forKey: "results")
    }
}

This doesn't exist for any reason except to pull apart the response. It is completely possible to make this generic if there are several responses with the same structure, but if the structures are more ad hoc, it's kind of annoying to make the extra layer.
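
For readers following along, a generic version of that wrapper could look roughly like the sketch below. The thread does not show the definition of AnyCodingKey, so the one here is an assumed shape, and the name Response is hypothetical.

```swift
import Foundation

// Assumed shape of AnyCodingKey; the thread does not show its definition.
struct AnyCodingKey: CodingKey, ExpressibleByStringLiteral {
    var stringValue: String
    var intValue: Int?

    init(_ string: String) { stringValue = string; intValue = nil }
    init?(stringValue: String) { self.init(stringValue) }
    init?(intValue: Int) { stringValue = String(intValue); self.intValue = intValue }
    init(stringLiteral value: String) { self.init(value) }
}

struct Person: Decodable {
    var name: String
    var age: Int
}

// Hypothetical generic wrapper for any payload found at response.results.
struct Response<Payload: Decodable>: Decodable {
    var results: Payload

    init(from decoder: Decoder) throws {
        results = try decoder.container(keyedBy: AnyCodingKey.self)
            .nestedContainer(keyedBy: AnyCodingKey.self, forKey: "response")
            .decode(Payload.self, forKey: "results")
    }
}

let responseJSON = Data(#"""
{
    "response": { "results": [{ "name": "Alice", "age": 43 }], "count": 1 },
    "status": 200
}
"""#.utf8)

let people = try JSONDecoder().decode(Response<[Person]>.self, from: responseJSON).results
print(people)
```

This covers every response that shares the response.results layout, but each differently shaped response still needs its own wrapper.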

By escaping the Decoder, the init(from:) can become just a function rather than a whole type.

let decoder = try JSONDecoder().decoder(for: json)

try decoder.container(keyedBy: AnyCodingKey.self)
    .nestedContainer(keyedBy: AnyCodingKey.self, forKey: "response")
    .decode([Person].self, forKey: "results")

I have other approaches that I'm exploring to improve ad-hoc decoding, but often a handy thing was to get my hands on a Decoder, and the only way I can do that is with a top-level type. That said, the wrapper type hasn't been my biggest problem. It was just something I was exploring how to remove.

Passing parameters is a very early exploration into tracking recovered errors, and providing configuration (for example, formatters). userInfo is very ugly to use. It has no type safety and there's no way to make values required. I don't have any clear question here; I'm still exploring. I just didn't want to get too far down the "escape a Decoder" road without checking its legitimacy.

-Rob

cocoaphony · August 18, 2022, 4:13am

BTW, this does point to another question I should ask. Is it safe to fetch more than one keyed container from the same Decoder? I'd assumed "sure, that's fine" but I suddenly realized it may not be fully safe.

Consider JSON like this, with a struct in a different format:

let json = Data(#"""
{
   "type" : 1,
   "name" : "name",
   "attribute1" : "One",
   "attribute2" : "Two"
}
"""#.utf8)

struct Event {
    var type: Int
    var name: String
    var attributes: [String: String]
}

I've explored several ways to encode and decode this, but I'm now questioning whether they're all legal. For example, this first creates a container keyed by CodingKeys, and then a separate one keyed by AnyCodingKey (subtracting out the explicit keys).

    init(from decoder: Decoder) throws {
        let explicitContainer = try decoder.container(keyedBy: CodingKeys.self)
        self.type = try explicitContainer.decode(Int.self, forKey: .type)
        self.name = try explicitContainer.decode(String.self, forKey: .name)

        let attributeContainer = try decoder.container(keyedBy: AnyCodingKey.self)

        let allKeys = attributeContainer.allKeys.map(\.stringValue)
        let explicitKeys = CodingKeys.allCases.map(\.stringValue)
        let attributeKeys = Set(allKeys).subtracting(explicitKeys)

        let keyValues = try attributeKeys.map {
            ($0, try attributeContainer.decode(String.self, forKey: AnyCodingKey($0)))
        }

        self.attributes = Dictionary(uniqueKeysWithValues: keyValues)
    }

The fix is trivial (just use AnyCodingKey, which also shortens the code a little and is probably better anyway), but the question is whether having two containers is legal.
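
A possible form of that single-container fix is sketched below. It assumes an AnyCodingKey along the lines described earlier in the thread (its definition is not shown there), and it hard-codes the explicit key names rather than deriving them from CodingKeys.

```swift
import Foundation

// Assumed shape of AnyCodingKey; the thread does not show its definition.
struct AnyCodingKey: CodingKey, ExpressibleByStringLiteral {
    var stringValue: String
    var intValue: Int?

    init(_ string: String) { stringValue = string; intValue = nil }
    init?(stringValue: String) { self.init(stringValue) }
    init?(intValue: Int) { stringValue = String(intValue); self.intValue = intValue }
    init(stringLiteral value: String) { self.init(value) }
}

struct Event: Decodable {
    var type: Int
    var name: String
    var attributes: [String: String]

    init(from decoder: Decoder) throws {
        // One container, keyed by AnyCodingKey, serves both purposes.
        let container = try decoder.container(keyedBy: AnyCodingKey.self)
        type = try container.decode(Int.self, forKey: "type")
        name = try container.decode(String.self, forKey: "name")

        // Everything that is not an explicit property becomes an attribute.
        let explicitKeys: Set<String> = ["type", "name"]
        var attributes: [String: String] = [:]
        for key in container.allKeys where !explicitKeys.contains(key.stringValue) {
            attributes[key.stringValue] = try container.decode(String.self, forKey: key)
        }
        self.attributes = attributes
    }
}

let eventJSON = Data(#"""
{ "type": 1, "name": "name", "attribute1": "One", "attribute2": "Two" }
"""#.utf8)

let event = try JSONDecoder().decode(Event.self, from: eventJSON)
print(event)
```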

cocoaphony · August 19, 2022, 1:53pm

The fix is trivial (just use AnyCodingKey, which also shortens the code a little and is probably better anyway), but the question is whether having two containers is legal.

I believe I have my own answer from the docs. A keyed decoding container is documented to be "a view" over the decoder's storage. That seems pretty explicit that I'm free to create different views over the same storage.

ole · August 19, 2022, 2:46pm

Another tidbit from the docs that seems to support your view: KeyedDecodingContainer.allKeys:

Different keyed containers from the same decoder may return different keys here

itaiferber · August 19, 2022, 4:15pm

Yeah, that's a fair reason to want to pull the Decoder out, though I think the better solution would be for TopLevelDecoders to offer a way to start decoding at a given CodingPath, instead of at the root of the data — e.g. you could just ask for a [Person].self at [.response, .results] and skip the wrapper type altogether.

(You can also imagine the possibility of this being more performant in the ideal case, too, as it might be possible to forgo fully parsing irrelevant portions of the data.)

This was something that was discussed a long time ago as an enhancement to the APIs, and I believe there's a Radar floating around for it, though it might not hurt to file duplicate feedback.
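
No such API exists on JSONDecoder today. As a sketch of how something similar can be approximated within the current rules, the helper below threads a path through userInfo and does all of its decoding inside a private wrapper's init(from:). The names decode(_:from:atPath:), PathDecoded, and decodePath are hypothetical, and AnyCodingKey is an assumed definition. It still parses the whole document, so it gains convenience only, not performance.

```swift
import Foundation

// Assumed shape of AnyCodingKey; the thread does not show its definition.
struct AnyCodingKey: CodingKey {
    var stringValue: String
    var intValue: Int?

    init(_ string: String) { stringValue = string; intValue = nil }
    init?(stringValue: String) { self.init(stringValue) }
    init?(intValue: Int) { stringValue = String(intValue); self.intValue = intValue }
}

extension CodingUserInfoKey {
    // Hypothetical key used to hand the path to the wrapper.
    static let decodePath = CodingUserInfoKey(rawValue: "example.decodePath")!
}

// Hypothetical wrapper: walks the keyed path, then decodes the value there.
private struct PathDecoded<Value: Decodable>: Decodable {
    let value: Value

    init(from decoder: Decoder) throws {
        guard let path = decoder.userInfo[.decodePath] as? [String],
              let last = path.last else {
            throw DecodingError.dataCorrupted(.init(
                codingPath: decoder.codingPath,
                debugDescription: "No decoding path was supplied"))
        }
        var container = try decoder.container(keyedBy: AnyCodingKey.self)
        for key in path.dropLast() {
            container = try container.nestedContainer(
                keyedBy: AnyCodingKey.self, forKey: AnyCodingKey(key))
        }
        value = try container.decode(Value.self, forKey: AnyCodingKey(last))
    }
}

extension JSONDecoder {
    // Hypothetical convenience. It temporarily mutates userInfo, so it is
    // not safe to call concurrently on a shared decoder instance.
    func decode<Value: Decodable>(_ type: Value.Type, from data: Data,
                                  atPath path: [String]) throws -> Value {
        userInfo[.decodePath] = path
        defer { userInfo[.decodePath] = nil }
        return try decode(PathDecoded<Value>.self, from: data).value
    }
}

struct Person: Decodable {
    var name: String
    var age: Int
}

let nestedJSON = Data(#"""
{ "response": { "results": [{ "name": "Alice", "age": 43 }], "count": 1 }, "status": 200 }
"""#.utf8)

let results = try JSONDecoder().decode([Person].self, from: nestedJSON,
                                       atPath: ["response", "results"])
print(results)
```

The Decoder never leaves init(from:), so this stays within what Codable currently promises. The sketch handles keyed paths only and ignores array indices.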

Exactly. Containers are intended to be views into the data the decoder is holding at the current coding path, and conforming implementations should allow you to ask for any type of container you want, as many times as you want (and just be prepared to handle a type mismatch for keyed vs. unkeyed containers, if relevant).
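
A small sketch of both halves of that statement, using a hypothetical Probe type: two keyed containers requested from the same Decoder, followed by an unkeyed request that is expected to fail because the data at this path is a JSON object.

```swift
import Foundation

// Hypothetical type that records what each container request produced.
struct Probe: Decodable {
    var name: String
    var type: Int
    var nameViewKeys: [String]
    var typeViewKeys: [String]
    var unkeyedRequestFailed: Bool

    private enum NameKey: String, CodingKey { case name }
    private enum TypeKey: String, CodingKey { case type }

    init(from decoder: Decoder) throws {
        // Two independent keyed views over the same object.
        let nameView = try decoder.container(keyedBy: NameKey.self)
        let typeView = try decoder.container(keyedBy: TypeKey.self)
        name = try nameView.decode(String.self, forKey: .name)
        type = try typeView.decode(Int.self, forKey: .type)

        // Each view reports only the keys its key type can represent.
        nameViewKeys = nameView.allKeys.map(\.stringValue)
        typeViewKeys = typeView.allKeys.map(\.stringValue)

        // Asking for the wrong container shape should throw, not crash.
        do {
            _ = try decoder.unkeyedContainer()
            unkeyedRequestFailed = false
        } catch is DecodingError {
            unkeyedRequestFailed = true
        }
    }
}

let probeJSON = Data(#"{ "type": 1, "name": "name", "attribute1": "One" }"#.utf8)
let probe = try JSONDecoder().decode(Probe.self, from: probeJSON)
print(probe)
```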

cocoaphony · August 20, 2022, 2:58pm

I've been experimenting with exactly those kinds of things, by first extracting the relevant section of the data without fully parsing it:

let scanner = JSONScanner()
// Extract the subdata of the second element under "groups"
let groupJSON = try scanner.extractData(from: Data(jsonString.utf8), 
                                        forPath: ["groups", 1])
let group = try JSONDecoder().decode(Group.self, from: groupJSON)

The non-Darwin JSONParser is particularly amenable to being hacked on and experimented with. :D

Stan_Smida · August 21, 2022, 11:07am

What you are trying to solve has been a common problem since Codable was introduced. My rule of thumb is to avoid going against the design and to play by the rules instead. With Codable, that is fairly feasible now that property wrappers exist. In your case, something like @Contained<[Person]>(in: "response", "result"). It can get noisy, but it is solid. I know that I'm not going to get into trouble with this in the future. I use these a lot to deal with many problems that I'm otherwise unable to solve naturally on Coders, for instance to support different internet time formats. Actually, I've needed so many of them (correctors, dealing with optionals, enums resilient to unknowns, ...) that I've found it better to have a general @Transcoded<Transformer> to keep all those "Codable add-ons" under a single namespace.
This isn't a direct answer to your question, though. I don't have one. I just feel safer making these adjustments on Codables instead of Coders.
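
To illustrate the property-wrapper style in general terms, here is a minimal sketch of a wrapper that handles one time format. It is not the @Contained or @Transcoded wrapper mentioned above, whose implementations are not shown; the names EpochSeconds and Meeting are hypothetical.

```swift
import Foundation

// Hypothetical wrapper: decodes a Unix timestamp in seconds into a Date.
@propertyWrapper
struct EpochSeconds: Decodable {
    var wrappedValue: Date

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        let seconds = try container.decode(Double.self)
        wrappedValue = Date(timeIntervalSince1970: seconds)
    }
}

// The synthesized init(from:) decodes the wrapper under the key "start",
// so no hand-written decoding is needed in the model itself.
struct Meeting: Decodable {
    var title: String
    @EpochSeconds var start: Date
}

let meetingJSON = Data(#"{ "title": "Sync", "start": 1660000000 }"#.utf8)
let meeting = try JSONDecoder().decode(Meeting.self, from: meetingJSON)
print(meeting.title, meeting.start)
```

The adjustment lives entirely on the Codable type, and the Decoder is only ever used inside init(from:).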