JSONDecoder, keyDecodingStrategy issue with Dictionary

Hi @itaiferber,

I have an implementation, but it did turn out to be more difficult than I originally imagined, and perhaps it is too messy.

My challenges were:

  1. Checking if the current type conforms to the marker protocol is not quite enough. Once that you know that the type is a string keyed dictionary you still have to be able to create new instances and you need to know that the type of the value is Decodable. So instead of a pure 'marker' protocol, the protocol I ended up with has both an init and a decode function. Then I can use casting of conditional conforming types of Swift 4.2 to use these functions.

  2. I tried any number of ways to 'short circuit' the boxing and unboxing without having go through the existing APIs - in order to avoid the encoding and decoding of the keys. For decoding I created a Dictionary manually, and unboxed each value of the container manually too. But all attempts to do this meant that the [CodingKey] path is not being updated, and so I would break error handling. I tried taking this a step further, reproducting the box_ and unbox behaviors, but these have overrides for many specific types, to I would end up duplicating a whole lot of code.

So my suggestion for a solution (which I do find suboptimal) is:

  1. Create a CodingKey conforming type intended for string keyed dictionaries.
  2. In my 'marker' protocol I have alternative versions of init(from decoder:) and encode(to encoder:) that performs the exact same code as the current Dictionary Decodable and Encodable implementations (where the key is a String).
  3. These implementations, however, use my new CodingKey conforming type (_JSONStringDictionaryCodingKey).
  4. Where the key encoding and key decoding is performed, I check if the type of the key is _JSONStringDictionaryCodingKey and skip the key coding if it is.

This implementation is very similar to a strategy where one could opt out from key encoding / decoding by conforming to a specific marker protocol. If such a strategy was implemented, then the existing _DictionaryCodingKey could just conform to this protocol and most of the code in my current fix could be deleted.

I'd be glad to make a PR, but right now I have only tried out the implementation in a copy of the JSONEncoder/JSONDecoder classes, so it would need a bit of cleaning up + official tests. Perhaps you would like to comment if this is a desired direction before I make the PR?

Extensions to JSONEncoder.swift:

fileprivate protocol _JSONStringDictionaryMarker {
    init(_from decoder: _StructuralDecoder) throws
    func encode_(to encoder: _StructuralEncoder) throws
}

/// A wrapper for dictionary keys which are Strings or Ints.
@usableFromInline // FIXME(sil-serialize-all)
@_fixed_layout // FIXME(sil-serialize-all)
internal struct _JSONStringDictionaryCodingKey : CodingKey {
    @usableFromInline // FIXME(sil-serialize-all)
    internal let stringValue: String
    @usableFromInline // FIXME(sil-serialize-all)
    internal let intValue: Int?

    @inlinable // FIXME(sil-serialize-all)
    internal init?(stringValue: String) {
        self.stringValue = stringValue
        self.intValue = Int(stringValue)
    }

    @inlinable // FIXME(sil-serialize-all)
    internal init?(intValue: Int) {
        self.stringValue = "\(intValue)"
        self.intValue = intValue
    }
}

extension Dictionary : _JSONStringDictionaryMarker where Key == String, Value: Decodable, Value: Encodable {
    fileprivate init(_from decoder: _StructuralDecoder) throws {
        self.init()
        let container = try decoder.container(keyedBy: _JSONStringDictionaryCodingKey.self)
        for key in container.allKeys {
            let value = try container.decode(Value.self, forKey: key)
            self[key.stringValue] = value
        }
    }

    fileprivate func encode_(to encoder: _StructuralEncoder) throws {
        var container = encoder.container(keyedBy: _JSONStringDictionaryCodingKey.self)
        for (key, value) in self {
            let codingKey = _JSONStringDictionaryCodingKey(stringValue: key)!
            try container.encode(value, forKey: codingKey)
        }
    }
}

In _JSONKeyedEncodingContainer:

    private func _converted(_ key: CodingKey) -> CodingKey {

        let useKeyEncoding = !(key is _JSONStringDictionaryCodingKey)
        guard useKeyEncoding else {
            return key
        }
        ...
    }

In _JSONKeyedDecodingContainer:

    /// Initializes `self` by referencing the given decoder and container.
    fileprivate init(referencing decoder: _StructuralDecoder, wrapping container: [String : Any]) {
        self.decoder = decoder

        let useKeyEncoding = !(Key.self is _JSONStringDictionaryCodingKey.Type)

        switch (decoder.options.keyDecodingStrategy, useKeyEncoding) {
        case (.useDefaultKeys, _), (_, false):
            self.container = container
        case (.convertFromSnakeCase, true):
            ...
        case (.custom(let converter), true):
            ...
        }
        ...
    }

In _JSONEncoder, box_:

        ...
        // The value should request a container from the _StructuralEncoder.
        let depth = self.storage.count
        do {
            if let stringKeyedDictionary = value as? _JSONStringDictionaryMarker {
                try stringKeyedDictionary.encode_(to: self)
            } else {
                try value.encode(to: self)
            }
        } catch {
        ...

In _JSONDecoder, unbox:

        ...
        } else if let stringKeyedDictType = type as? _JSONStringDictionaryMarker.Type {
            self.storage.push(container: value)
            defer { self.storage.popContainer() }
            return try stringKeyedDictType.init(_from: self) as? T
        } else {
            self.storage.push(container: value)
            defer { self.storage.popContainer() }
            return try type.init(from: self)
        }

As I said - I'm not too happy about this - it feels a bit hacky. But if it were a general protocol that CodingKey conformers could also conform to, then most of this could go away and the existing _DictionaryCodingKey could just be made to conform to this.

What do you think?

True. Seems like I was a bit confused there.

I agree, that‘s why I think there is a need for customization there and I would clearly favor another custom key decoding strategy that also gets the type. To implement this, there are some changes required, too. It would be necessary to remember types for containers on the _JSONDecoder storage stack. Then you can pass the type to the keyed container initalizer and it can apply the new key decoding strategy then.

If I am not missing something here, I think this should work. The default key decoding strategies could then be modified to find Dictionaries (by a marker) and would then not apply their strategies for them. The Dictionary decoding code would stay the same.

While writing this down, I just noticed that my original aproach was not working and this approach may have some obvious faults to, I just don‘t see

That's certainly a valid approach; I have a hunch that there's likely a way to implement this a bit more simply, though. Let me look into this a bit later today (I've got a few other things going on in parallel) and we might be able to find a simpler direction to take this into.

Off the top of my head (but not verified at the moment), regarding your challenges:

  1. Instead of replicating init and encode, you could solve this issue with two approaches. On encode, all you need to do is cast the existing value to a [String : Encodable] and simply iterate over the values to call encode(to:) on them directly. Since the specific type doesn't need to be known, this should suffice. On decode, it should be enough to add a static var that returns the value type (e.g. static var elementType: Decodable.Type { return Element.self } for Decodable dictionaries); you can then use the returned type dynamically to initialize, e.g. let value = try dict.elementType.init(from: self); here you can also take into account the actual type in order to call the right unbox method. (For simplicity, it's also possible to remove the generic restriction from the generic unbox method to make it dynamically callable with a Decodable metatype you know nothing about otherwise)
  2. Since codingPath is mutable, you should be able to push it onto the end of the path before calling unbox

I agree that this is a pain, but it could help keep the implementation more self-contained. I'll try to put together an example snippet of what I'm thinking with this. If there's something clearly wrong with my approach, though, we can see how to work around it.

1 Like

Hi again,

Casting to [String : Encodable] will not work, since you cannot call encode or box_ on an Encodable type directly, but only on a type conforming to Encodable. E.g. Cannot invoke 'box_' with an argument list of type '(Encodable)'.

I am going on vacation the next few days. Would it be alright timing-wise if I continue to look at this next week?
I think that I can do something in between what I tried already, and what you are suggesting once I get back.

If I may ask a few related questions (should perhaps be new topics, but I don't want to create a lot of noise if now is not a good time):

  1. With regards to my comments about the current key coding strategies being implemented as a pair of mapping functions instead of one mapping function:
    I could rephrase my points to deal only with the structure of the CodingKey. I know that you commented, that this could be revisited in case more encoders and decoders with key coding strategies are added, but my point is that this is also an issue for custom developed encoders and decoders.
    You will always need pairs of mapping functions when the CodingKey is not enumerable (because the way to get rid of the inverse map would be to enumerate all keys, perform the mapping function and create a lookup dictionary from the result).
    Would it be worth bringing this topic up for discussion?

  2. It has previously been discussed that the current internals of the JSONEncoder and JSONDecoder could have value to end users with the JSONSerialization steps removed, so that encoding gives you an dictionary/array/value hierachy of type Any, and decoding takes the same as input.
    The name StructuralEncoder has been mentioned.
    JSONEncoder and StructuralEncoder could share most implementation details until the time when JSONEncoder could possibly be re-implemented by some more advanced streaming serialization mechanism. Ditto for the decoders.
    When might be a good time to bring this topic up for discussion again? :slight_smile:

What about implementing this the same way CodingKey enums are automatically created?
This would be my approach to solve the double mapping problem.
The main problem I see is that we would need to include some meta information inside the code to make clear which kind of coding keys we want to generate, but I'm pretty sure there are more problems I can't see.

We should be able to change box_ to be non-generic to fix this.

Yes, that's quite alright! I myself don't currently have the time to give this the full thought it deserves.

You are always welcome to bring things up for discussion. :slightly_smiling_face: I think I missed the specifics of this point (and I don't think I quite grok the explanation above); do you mind recapping this for me?

And again, you're always welcome to bring the topic up for discussion again, but on this, I think the answer has not yet changed; we simply haven't had the time to discuss and review what we'd like to do internally. This is a change I'm committed to making (since it will simplify a lot for us), but like any other API, we need to make sure the API naming and semantics are appropriate.

I'm not sure I follow what you mean by this. Can you give an example?

Sure. Instead of @Morten_Bek_Ditlevsen approach of having a protocol for enumerated coding key types, so (every) encoder and decoder can check for this and see that it‘s a finite and well defined list of coding keys that‘s then iteratable so we need only the conversion from CodingKey to string and not the other way around, my idea was to just generate snake case coding keys right away in autogenerated CodingKey enums at choice. It looks a bit too much like java for me and I have no idea whether it’s actually valid swift, but I think of something like this:

@snakeCaseKeyedCodable
struct CustomStructWithAutoGeneratedSnakeCaseCodingKeys: Codable {
    let incredibleValue: Int
    let anotherValue: Int
}

I don‘t know how the thing is called that generates auto derived Codable conformance code, but this code generator would then insert instead of this

...
    enum CodingKeys: CodingKey {
        case incredibleValue
        case anotherValue
    }
...

this snake case keyed version:

...
    enum CodingKeys: CodingKey {
        case incredibleProperty = ”incredible_property”
        case anotherProperty = ”another_property”
    }
...

I don‘t quite know how to implement it in Sourcery (if its possible at all there), but I guess you need a SnakeCaseCodingKeyedCodable marker protocol.

It would also be possible to run key conversion strategies at runtime in init(stringValue:) and stringValue of the CodingKeys enum.


A short list of advantages and disadvantages of both approaches, I see right now:

  • :+1: No need to reimplement key conversion support in all custom encoders and decoders
  • :-1: therefor no straight forward way to use diffrent keys for different encoder/decoders (but not that hard if it was possible to mark a coding key enum directly, so the cases are inserted into them)
  • :+1: however always available for all de- and encoders (I think keys beeing in snake case is also a language interference thing and therefor not necessaryly limited to JSON)
  • Version 1: Direct snake case auto generation:
    • :+1: More reliable. Only conversion from CodingKey to String, never the other way around * :+1: aligns writing special coding keys on your own and using key conversion
    • :-1: Requires a plugin for the code generator for custom conversion strategies. Or one would need to use Sourcery.
  • Version 2: Conversion in init and property:
    • :-1: does actually not solve the problem of conversion from CodingKey to String and vice versa.

Hi Itai,
I now have a working example with a copy of the JSONDecoder and JSONEncoder with a working change based on your suggestions - changing box_ to be non-generic.
It certainly seems to be a much simpler and a much more contained way to implement the fix than in my original attempt.

Next up I will try moving this to the swift master branch and adding the appropriate tests.

Will get back to you once it's done. :slight_smile:

Thanks for taking the time to put this together! I'll be happy to review when more is in place.

There is also one final dead-simple approach to this that we have the option of taking, but I'm a little bit wary... We could expose _DictionaryCodingKey from the stdlib (it's currently internal) and simply check for that in the key conversion functions. At first glance, this would be maybe a 5-line change.

But, while this would make the transformation super simple, but I'd be concerned about:

  1. Exposing _DictionaryCodingKey as API forever (especially once ABI stable)
  2. Having others accidentally or intentionally misuse _DictionaryCodingKey to avoid CodingKey conversions

Given this, I think we should continue taking a look at the current approach and improving it as much as we can. If this really becomes a much larger issue in the future, I think we can consider exposing more of this as API.

Thanks for elaborating here! My apologies for not getting to respond to this last week :slightly_frowning_face:

This is an option, but a few things immediately come to mind:

  1. What about CodingKeys which already have custom values? (e.g. case incrediblyProperty = "somethingElseAltogether") We'd need to keep all versions of the string around somewhere
  2. How would we handle other, or custom key conversion strategies? Snake case to camel case (and vice versa) is only one transformation
  3. How would this help us avoid the current issue here? (I recognize that this doesn't directly answer the specific need here, but how would we be able to use this to solve the problem when the issue is recognizing which keys should be transformed and which shouldn't?)
  4. What about custom types which conform to CodingKey manually (like the actual _DictionaryCodingKey type)?

While this would be nice, again, I've yet to see other encoders/decoders need this feature (and implement it themselves).

This is another thing which came to mind — in order to support compiler-generated conversions like this, we'd need to put the case conversion logic into the compiler, where I don't really think it belongs.

Building this into the types directly has another problem, too — if the compiler generates methods to implement this, different CodingKey types from different modules can become out of date, which can lead to inconsistency. If you rely on a pre-compiled framework and a bug were fixed in the conversion strategy, types from the framework would be inconsistent with the other types in your application.

Hi Itai,
Yes, that would be a really simple fix. Alternatively add a public protocol and make the _DictionaryCodingKey conform to this. Then your worry about misuse could become proper supported use? :-)

But that goes back to your argument that key coding is currently a JSONEncoder/Decoder specific thing and it would be weird to have a public protocol to opt out from this in general.

An implementation for your suggested approach can be found here:

You can compare against swift/master. Since last merge of master into 4.2 has already been made, I assume that any actual PR should be made against swift-4.2-branch and then subsequently cherry-picked to master, right?

I have added a few tests - I am still lacking tests for correct coding paths in case of an error while encoding or decoding data inside of one of these dictionaries.

I have changed a few things to being dynamic instead of generic. Will this have any effects on performance? I'm guessing not since the generic functions were also doing dynamic casting, but I am not certain.

Please let me know if my implementation is way off compared to your thoughts.

Sincerely,
/morten

1 Like

Hi again @itaiferber,
I went ahead and made a [WIP] PR - just to make the issue a bit more visible:

Sincerely,
/morten

Thanks for putting this up, and apologies for the delay again; currently a lot on my plate. I'm going to try to review the PR soon and give my feedback in there.

One additional note about exposing _DictionaryCodingKey in some way — we can always make it public in the future if we see fit, so worst comes to worst we go with an intermediate solution like this and tighten it up later. We just can't do things the other way. :slight_smile:

1 Like