New "Unevaluated" type for Decoder to allow later re-encoding of data with unknown structure

cherrywoods · March 18, 2018, 5:44pm

I think dynamic lookup for Unevaluated would be a truly great thing. However, I have some concerns related to the implementation and the usability of the Codable environment for Decoder implementors.

My main question here is: How will this dynamic lookup on Unevaluated be implemented? Since .value would be specific to the actual decoder, it would be hard or even impossible to know anything about it and provide dynamic lookup on it. We could guess here, or require that the representation matches the one used by JSONDecoder and PlistDecoder. I don't think that would be good.

I see three ways to work around this right now:

1. Let the implementor of the Decoder take care of this.
2. Require that .value will (not literally) be a KeyedDecodingContainer, an UnkeyedDecodingContainer or a SingleValueDecodingContainer. With something like that, we could do a lot of dynamic lookup (I guess).
3. Refer to the decoder that was asked to return Unevaluated (use the decoder as a delegate). NOTE: It is actually a bit more complicated than I first thought: Unevaluated needs some sort of immutable snapshot of the decoder, because the decoder's storage will change.

In Case 1, the implementor would, as far as I can see, in essence have to write just another decoder. They would need to supply pretty much the same functionality twice, because they would need to write a Decoder and then something similar that works as a delegate for the dynamic lookup of Unevaluated. Case 3 resolves this.

Case 2 is an approach to standardizing the storage of a Decoder. Unevaluated would require that .value conforms to certain protocols like KeyedContainer, UnkeyedContainer and SingleValueContainer. I think this would be super cool, because it makes writing a Decoder easier: you can rely on Unevaluated and implement KeyedDecodingContainer, etc. on top of it, instead of implementing that logic yourself. Seen from another perspective, though, it could also look like you have to implement another keyed container thing instead of another decoder here. Interestingly, this correlates to some extent with what I think is a way to make Decoder simpler to implement (I pointed this out a bit here; it's already implemented, please see the v2 branch of https://github.com/cherrywoods/swift-meta-serialization).
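A minimal sketch of how Case 2 could look. `Storage` and its three cases are hypothetical names chosen for illustration (nothing like this exists in the standard library), and the sketch assumes `@dynamicMemberLookup` is available (Swift 4.2 or later). The idea is only that once every decoder exposes its raw data in one of three standard shapes, dynamic lookup can be written once, independent of the format:

```swift
// Hypothetical standardized storage: a decoder would expose its raw data in
// one of these three shapes, mirroring the three container kinds.
@dynamicMemberLookup
enum Storage {
    case keyed([String: Storage])
    case unkeyed([Storage])
    case single(Any)

    // Keyed lookup, e.g. `storage.otherStuff`.
    // Returns nil if this is not a keyed storage or the key is missing.
    subscript(dynamicMember key: String) -> Storage? {
        guard case .keyed(let members) = self else { return nil }
        return members[key]
    }

    // Unkeyed lookup, e.g. `storage[1]`.
    // Returns nil if this is not an unkeyed storage or the index is out of range.
    subscript(index: Int) -> Storage? {
        guard case .unkeyed(let elements) = self,
              elements.indices.contains(index) else { return nil }
        return elements[index]
    }

    // Single-value lookup. Returns nil on a type mismatch.
    func get<T>(as type: T.Type) -> T? {
        guard case .single(let value) = self else { return nil }
        return value as? T
    }
}

// Example data in the shape used by the decoding code in this post.
let storage = Storage.keyed([
    "value": .single(42),
    "otherStuff": .unkeyed([.single("one"), .single(true)])
])

let value = storage.value?.get(as: Int.self)
let one = storage.otherStuff?[0]?.get(as: String.self)
let two = storage.otherStuff?[1]?.get(as: Bool.self)
let missing = storage.doesNotExist
```

A Decoder implementor could then build KeyedDecodingContainer and friends on top of such a type instead of on a private representation, which is the simplification described above.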

Case 3 would be the easiest for the implementor of a Decoder, I guess: you would not create an Unevaluated with the content on top of the storage; instead you would pass (EDIT: an immutable snapshot of) self to init. Unevaluated would implement the dynamic lookup by somehow using the "traditional" methods, like the container method. The encoder would use the decoder and look at its storage when re-encoding. Dynamic lookup would look something like this:

unevaluated.myCodingKey? as String?
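A rough sketch of Case 3, under some assumptions: `DecoderBackedUnevaluated`, `AnyKey` and `lookup(_:as:)` are hypothetical names, and the snapshot problem is sidestepped by using the decoder only while `init(from:)` is running. The point is only that the lookup can be built on the existing container methods, so the Decoder implementor writes nothing extra. A dynamic member syntax like the one above could be layered on top of `lookup`.

```swift
import Foundation

// A coding key that can represent any string, so keys can be looked up by name.
struct AnyKey: CodingKey {
    let stringValue: String
    let intValue: Int? = nil
    init?(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
}

// Hypothetical stand-in for the Unevaluated of Case 3: it holds on to the
// decoder and answers lookups through the ordinary container methods.
struct DecoderBackedUnevaluated {
    let decoder: Decoder

    // Returns nil if there is no keyed container, the key is missing,
    // or the stored value cannot be decoded as T.
    func lookup<T: Decodable>(_ key: String, as type: T.Type = T.self) -> T? {
        guard let codingKey = AnyKey(stringValue: key),
              let container = try? decoder.container(keyedBy: AnyKey.self) else {
            return nil
        }
        return try? container.decode(T.self, forKey: codingKey)
    }
}

// Usage with an existing decoder; JSONDecoder is only an example here.
struct Example: Decodable {
    let value: Int?
    let name: String?

    init(from decoder: Decoder) throws {
        let unevaluated = DecoderBackedUnevaluated(decoder: decoder)
        self.value = unevaluated.lookup("value")
        self.name = unevaluated.lookup("name")
    }
}

let exampleJSON = Data("{\"value\": 7, \"name\": \"seven\"}".utf8)
let example = try JSONDecoder().decode(Example.self, from: exampleJSON)
```

Keeping such a value around after `init(from:)` returns is exactly where the immutable snapshot would be needed; this sketch does not attempt that.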

However, this leads to another (pretty interesting, I think) question: Would Unevaluated actually be an alternative to Decoder for the user?
Consider this "traditional" decoding code:

init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: SomeCodingKeys.self)
    self.value = try container.decode(Int.self, forKey: .value)
    var nestedContainer = try container.nestedUnkeyedContainer(forKey: .otherStuff) // var: decoding from an unkeyed container is mutating
    self.one = try nestedContainer.decode(String.self)
    self.two = try nestedContainer.decode(Bool.self)
}

Will this do the same?

init(from decoder: Decoder) throws {
    let unevaluated = decoder.unevaluatedData()
    self.value = unevaluated.value // The property access can't throw, the lookup would return nil, I omit the handling here
    self.one = unevaluated.otherStuff.0 // if .0 will be possible, it could also be:
    self.two = unevaluated.otherStuff[1]
}

In my opinion, adding dynamic lookup would make a decoder (and also an encoder) more usable. But I do think that this step should be part of a larger redesign of decoder and encoder; it should not be added in parallel, if I am not mistaken about its capabilities. It could also be added to decoder directly.

For Unevaluated in general, I see just one minor issue.
From now on I will treat Unevaluated just as a way to get "raw" storage data that will simply be re-encoded, and assume that Unevaluated is a struct like this one:

Decode a JSON object of unknown format into a Dictionary with Decodable in Swift 4

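public struct Unevaluated : Codable {
    public let value: Any
    public init(_ value: Any) { self.value = value }

    public init(from decoder: Decoder) throws {
        // Throw a type mismatch: only a decoder that special-cases Unevaluated can create one.
        throw DecodingError.typeMismatch(Unevaluated.self, DecodingError.Context(
            codingPath: decoder.codingPath,
            debugDescription: "Unevaluated must be produced by the decoder itself."))
    }

    public func encode(to encoder: Encoder) throws {
        // Throw an invalid value error: only an encoder that special-cases Unevaluated can write one.
        throw EncodingError.invalidValue(self, EncodingError.Context(
            codingPath: encoder.codingPath,
            debugDescription: "Unevaluated must be handled by the encoder itself."))
    }
}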

The abstract concept somehow suggests to me that I can also use this with encoders other than the one related to the decoder I got the Unevaluated from. I should of course not do this, but I could mix JSONDecoder and PlistEncoder and succeed in doing so, if the JSON only contained Dictionaries, Arrays, Strings and Numbers and the implementation was this one:

Decode a JSON object of unknown format into a Dictionary with Decodable in Swift 4

fileprivate func unbox<T : Decodable>(_ value: Any, as type: T.Type) throws -> T? {
    if type == Unevaluated.self {
        return Unevaluated(value)

I don't think that this is a totally unrealistic scenario. If we have some structure that re-encodes something when it is encoded, and does this on a totally generic level (not specific to any serialization format), why not give it to another encoder? Also, the unknown structure issue can come up if we want to convert to another format and not back to the original one.

For those reasons, I would prefer Unevaluated to be a protocol rather than a struct.

With a protocol the implementor of decoder could also connect this with format specific lookup and manipulation support, e.g. with a JSON enum that conforms to Unevaluated (although format specific lookup seems not to be necessary to me, if there is dynamic lookup). If passing this to e.g. a PlistEncoder, one would get a clear error here, or JSON could even support such a cross over by implementing Encodable and encoding the way it is implemented in itaiferber's gist. Unevaluated could be documented as a good point to implement format specific lookups and manipulation.
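To illustrate the protocol idea, here is a minimal sketch. `UnevaluatedValue` and the `JSON` enum are hypothetical names (I avoid reusing `Unevaluated` so it does not clash with the struct above), and the case list is just one plausible way to model JSON. Because the enum encodes itself through the generic Encoder API, the same value can be handed to encoders of different formats:

```swift
import Foundation

// Hypothetical protocol version of Unevaluated: each decoder would supply its
// own conforming type for the raw data it hands back.
protocol UnevaluatedValue: Encodable {}

// Hypothetical format-specific representation a JSON decoder could return.
enum JSON: UnevaluatedValue {
    case null
    case bool(Bool)
    case number(Double)
    case string(String)
    case array([JSON])
    case object([String: JSON])

    // Encoding goes through the generic Encoder API only, which is what
    // permits the cross-over to encoders of other formats.
    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
        case .null: try container.encodeNil()
        case .bool(let value): try container.encode(value)
        case .number(let value): try container.encode(value)
        case .string(let value): try container.encode(value)
        case .array(let elements): try container.encode(elements)
        case .object(let members): try container.encode(members)
        }
    }
}

let raw: JSON = .object([
    "name": .string("example"),
    "tags": .array([.string("a"), .bool(true), .number(1.5)])
])

// The same value re-encoded with two unrelated encoders.
let plistData = try PropertyListEncoder().encode(raw)
let jsonData = try JSONEncoder().encode(raw)
```

A type that does not want to support the cross-over could instead throw from `encode(to:)` when it meets an encoder it does not know, which would give the clear error mentioned above.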

One disadvantage of a protocol is that it won't be possible to call decode(Unevaluated.self) to get the raw storage of a decoder in order to encode it later, as far as I can see. A method returning Unevaluated would work, but this would require all decoders to support it. However, all decoders should have some sort of data they are working on that they can pass back here.

Another disadvantage is that Case 1 from above now applies. Format-specific lookup code would still look very similar for similar formats (e.g. JSON and msgpack), I think.