Decoder codingPath incorrect for nested containers

I discovered a confusing state of Decoder.codingPath when working with the nestedContainer APIs.

Here's a happy path example that prints the codingPath when decoding some nested structs.

struct KeyedContainers: Decodable {
    let a: A
    struct A: Decodable { let b: B }
    struct B: Decodable { let c: PrintCodingPath }
}

struct PrintCodingPath: Decodable {
    init(from decoder: Decoder) throws { print(decoder.codingPath) }
}

let data = Data(#"{ "a": { "b": { "c": 42 } } }"#.utf8)
let decoder = JSONDecoder()

try decoder.decode(KeyedContainers.self, from: data)
// outputs: ["a", "b" "c"]

This works as I'd expect. But if you try decoding the same input but instead use nestedContainer, you get a different codingPath.

struct NestedKeyedContainers: Decodable {
    enum RootKeys: CodingKey { case a }
    enum AKeys: CodingKey { case b }
    enum BKeys: CodingKey { case c }
    init(from decoder: Decoder) throws {
        let rootContainer = try decoder.container(keyedBy: RootKeys.self)
        let aContainer = try rootContainer.nestedContainer(keyedBy: AKeys.self, forKey: .a)
        let bContainer = try aContainer.nestedContainer(keyedBy: BKeys.self, forKey: .b)
        try bContainer.decode(PrintCodingPath.self, forKey: .c)
    }
}

try decoder.decode(NestedKeyedContainers.self, from: data)
// actual: ["c"]
// expected: ["a", "b" "c"]

The issue seems to be that nested containers don't capture the state of the codingPath. You can see where an attempt is made in the JSONDecoder to record it, but calls to nestedContainer return immediately without decoding anything so the effect only lasts until the end of the function call and is lost after.

public func nestedContainer<NestedKey>(keyedBy type: NestedKey.Type, forKey key: Key) throws -> KeyedDecodingContainer<NestedKey> {
    self.decoder.codingPath.append(key)
    defer { self.decoder.codingPath.removeLast() }
    // ...
    let container = _JSONKeyedDecodingContainer<NestedKey>(referencing: self.decoder, wrapping: dictionary)
    return KeyedDecodingContainer(container)
}

https://github.com/apple/swift/blob/0366e61/stdlib/public/Darwin/Foundation/JSONEncoder.swift#L1621-L1637

SR-6294 might be reporting the same issue.


I did a quick prototype of what could be a solution. These calls could capture the state in a new __JSONDecoder struct, which all nested containers would point to. This seems pretty low impact and would only affect nested container usage. The general code paths would not need to make any new codingPath copies and maintain their delegation approach.

https://github.com/apple/swift/compare/master...josh:jsondecoder-nestedcontainer-codingpath

I'm new to the Swift contribution process, so I figured I'd post here first.

Thanks,
Josh

CC @itaiferber

1 Like

Yup, this is a problem because JSONDecoder is extremely stateful (I've experienced it with the superDecoder methods). The main thing I'd be curious about is how your change affects performance (if at all). Making decoding slower across the board seems like a large price to pay to solve this issue.

I don't have time to look into this at the moment, but I think this might warrant some deeper investigation — I'm surprised that this is the case given that the implementation of _JSONKeyedDecodingContainer.init(referencing:wrapping:) explicitly captures the given decoder's codingPath directly:

init(referencing decoder: __JSONDecoder, wrapping container: [String : Any]) {
    self.decoder = decoder
    // <snip>
    self.codingPath = decoder.codingPath
}

Once the codingPath is captured, further modification in the __JSONDecoder's defer should CoW to leave the nested container with the previously captured path. Your suggested approach is a trivial change but I'm surprised it would be necessary.

Pretty much every reference to the codingPath inside _JSONKeyedDecodingContainer references self.decoder.codingPath directly, not the container's own self.codingPath. I found that odd as well.

I tried a simple find and replace for self.decoder.codingPath -> self.codingPath which ran into a mutating issue. Many of the methods KeyedDecodingContainerProtocol protocol are not marked as mutating. We'd need a work around for that if we wanted to rely on _JSONKeyedDecodingContainer.codingPath. Turning it into a class seemed like a big change.

This sounds like the core of the issue, then: self.codingPath should be the only path referred to. I would be curious to see the list of methods specifically requiring mutating as a consequence of this, if you have the time to put that together. One less invasive change might be to wrap self.codingPath in a reference-based wrapper (because the container can't trivially be changed into a class, IIRC), but the best solutions would be to create local copies of codingPath, mutate them locally, and throw away the changes if need be.

I would be curious to see the list of methods specifically requiring mutating as a consequence of this, if you have the time to put that together.

I don't think any of them in the keyed containers can be mutating. There are a few in unkeyed though. So it'd affect all decode* methods.

One less invasive change might be to wrap self.codingPath in a reference-based wrapper (because the container can't trivially be changed into a class, IIRC), but the best solutions would be to create local copies of codingPath , mutate them locally, and throw away the changes if need be.

I haven't tested it, but maybe we could update all of decode(type:forKey) call sites defer to the container's own captured self.codingPath and use decoder.codingPath's mutable reference as storage. So something like:

// instead of:
// self.decoder.codingPath.append(key)
// defer { self.decoder.codingPath.removeLast() }

// always use the container's captured codingPath
let newCodingPath = self.codingPath + [key]

// push codingPath onto decoder stack
let oldDecoderCodingPath = self.decoder.codingPath
self.decoder.codingPath = newCodingPath

// pop stack
defer { self.decoder.codingPath = oldDecoderCodingPath }

It'd be a much larger diff for sure. But the container's self.codingPath is not directly mutated so we're allowed to do this. It's a bit messy but maybe some of this could be pushed down to the unbox function. I suppose there's a number of ways this could be factored depending on your preference.

I did think there was something elegant about just creating a new __JSONDecoder to act as a base for the cases that actually needed. If we want that direction, we could also remove the storage for _JSONKeyedDecodingContainer.codingPath and always delegate to decoder.codingPath.

I'm not sure this is actually an issue. Calling decode(_:forKey:) on a container doesn't need to update the container's codingPath, since the type being decoded sees a new Decoder instance altogether. You hit on this below:

This is likely what needs to happen — because self.decoder.codingPath doesn't necessarily contain all of the keys from self.codingPath, pushing on only the latest key is not enough. Overwriting it with a new path temporarily is the way to go.

This is possible — it's definitely possible for a keyed container to only store the last key in the path it was created with, and dynamically create a codingPath with self.decoder.codingPath + [key]; I would be mostly concerned about two things:

  1. Any performance implications for repeated access to codingPath, if meaningful. I suspect codingPath is accessed more often internally than externally, so it would be good to keep an eye on the tradeoff between improvements to memory usage vs. the costs of recreating codingPath with every access
  2. Semantically, I'd be a little concerned at the possibility of creating a container, then somehow reusing it after having modified self.decoder.codingPath. Off the top of my head, the only way to really make this happen is by doing something squirrely, but at the moment, at least this is guarded against

I'd say that getting rid of this storage is likely orthogonal to the underlying issue; it'd be good to give the above changes a shot first. Likely, all encode and decode calls should be audited and updated.

Hey @itaiferber, thanks for waiting, I've got some updates.

I added some test coverage in https://github.com/josh/swift/commit/3796247. It turns out the JSONEncoder container already has good coverage around this. I basically just copied it's test case and applied it to the Decoder instead. My original proposal does fix the tests.

I started on an alternative proposal that does not create a new base decoder, but changes the logic around reseting the current decoder.codingPath in every decode case. It feels like it adds a lot more boilerplate. Any thoughts on a simpler way to express these 4 lines. Would a helper method be worth adding?

https://github.com/apple/swift/compare/master...josh:jsondecoder-nestedcontainer-codingpath2

Oh, one more question. Why is TestJSONEncoder.swift disabled with rdar55727144? The git blame doesn't provide any context.

https://github.com/apple/swift/blob/fa1e9df207a00a2deddd59819e4e1c8d605d844c/test/stdlib/TestJSONEncoder.swift#L12

Great!

I originally had a helper method which did approximately this, but removed it because it led to overlapping exclusive accesses. If you can find a reasonable way to reintroduce that without the overlapping accesses (may be as easy as pulling self out into a local var to pass into the closure, but I don't have the setup to check at the moment), then I think it'd be helpful.

I don't remember off the top of my head; maybe @bendjones can give you some more context.

Pushing up another alternative refactor at https://github.com/apple/swift/compare/master...josh:jsondecoder-nestedcontainer-codingpath3.

The change requires each container pass along an explicit codingPath to __JSONDecoder.unbox. The cool thing about this approach is it makes it clear we only need to mutate __JSONDecoder.codingPath when self.storage.push(container: value) occurs. So that's kinda a nice symmetry. The downside is that it adds codingPath passing arguments everywhere. I was hoping we'd only need this for the decode<T : Decodable> case and not primitives. But even primitives need the correct path in unbox to throw a typeMismatch. Even if it's not worth doing, it does shine a light on the code paths that actually need to manipulate decoder.codingPath for reference purposes.

Besides being verbose, are there any correctness or performance concerns with approach #2 https://github.com/apple/swift/compare/master...josh:jsondecoder-nestedcontainer-codingpath2?

Thanks for your thoughts so far!

What do you think about using a more lightweight data structure for the codingPath? As you mentioned, the happy path mainly appends to the list and we only really use the Array value when throwing and error or exposed through codingPath: [CodingKey]. Internally, the path could be represented as a stack and lazily convert it to an Array when needed.

enum CodingPath {
    case empty
    indirect case cons(CodingKey, Self)
    mutating func append(_ key: CodingKey) {
        self = .cons(key, self)
    }
}

self.decoder.codingPath.append(_JSONKey(index: self.currentIndex))

I don't have empirical evidence either way, but I suppose the counter point is that Array already performs well as a stack. If this is true, hopefully they aren't too many concerns with changing these existing codingPath call sites.

Terms of Service

Privacy Policy

Cookie Policy