New "Unevaluated" type for Decoder to allow later re-encoding of data with unknown structure

Currently it is impossible to ask a Decoder for unevaluated data from its storage. This problem arises when someone needs that data to re-encode it later but does not know how it is structured. The problem was described in SR-53112.

An Unevaluated type has also been discussed in relation to this question.

I am opening this topic because, to me, the discussion no longer seems directly relevant to that other problem.

Dynamic Member Lookup

I think dynamic lookup for Unevaluated would be a truly great thing. However, I have some concerns related to the implementation and the usability of the Codable environment for Decoder implementors.

My main question here is: How will this dynamic lookup on Unevaluated be implemented? Since .value would be specific to the actual decoder, it would be hard or even impossible to tell anything about it and provide dynamic lookup on it. We could guess here, or require that the representation match the one from JSONDecoder and PlistDecoder. I don't think that would be good.

Right now I see three ways to work around this:

  1. Let the implementor of the Decoder take care of this.
  2. Require that .value be (not literally) a KeyedDecodingContainer, an UnkeyedDecodingContainer or a SingleValueDecodingContainer. With something like that, we could do a lot of dynamic lookup (I guess).
  3. Refer to the decoder that was asked to return Unevaluated (use the decoder as a delegate). NOTE: This is actually a bit more complicated than I first thought: Unevaluated needs some sort of immutable snapshot of the decoder, because the decoder's storage will change.

In case 1, the implementor would, as far as I can see, in essence have to write just another decoder. They would need to supply pretty much the same functionality twice: once as a Decoder, and once as something similar that works as a delegate for dynamic lookup on Unevaluated. Case 3 resolves this.

Case 2 is an approach to standardizing the storage of a Decoder. Unevaluated would require that .value conform to certain protocols like KeyedContainer, UnkeyedContainer and SingleValueContainer. I think this would be super cool, because it gets easier to write a Decoder if you rely on Unevaluated and implement KeyedDecodingContainer, etc. on top of it, instead of implementing that logic yourself. Seen from a different perspective, though, it could also look like you have to implement another keyed container thing instead of another decoder. Interestingly, this correlates to some extent with what I think is a way to make Decoder simpler to implement (I pointed this out a bit here; it's already implemented, please see the v2 branch of https://www.github.org/cherrywoods/swift-meta-serialization)

Case 3 would be easiest for the implementor of a Decoder, I guess: you would not create an Unevaluated with the content on top of the storage; instead you would pass (EDIT: an immutable snapshot of) self to init. Unevaluated would implement the dynamic lookup by somehow using the "traditional" methods, like the container method. The Encoder would use the decoder and look at its storage when re-encoding. Dynamic lookup would look something like this:

unevaluated.myCodingKey? as String?
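
To make the idea a bit more concrete, here is a minimal sketch of how such a lookup could behave. It assumes a hypothetical Snapshot enum standing in for the immutable snapshot of the decoder's storage; neither that type nor its case names exist in the standard library, and a real Unevaluated might look quite different.

```swift
// Hypothetical stand-in for an immutable snapshot of a decoder's storage.
@dynamicMemberLookup
enum Snapshot {
    case null
    case bool(Bool)
    case number(Double)
    case string(String)
    case array([Snapshot])
    case dictionary([String: Snapshot])

    // unevaluated.someKey: nil when self is not keyed or the key is absent.
    subscript(dynamicMember key: String) -> Snapshot? {
        guard case .dictionary(let dictionary) = self else { return nil }
        return dictionary[key]
    }

    // unevaluated[1]: nil when self is not unkeyed or the index is out of range.
    subscript(index: Int) -> Snapshot? {
        guard case .array(let array) = self, array.indices.contains(index) else { return nil }
        return array[index]
    }

    // Typed access stays explicit, so a failed cast surfaces as nil.
    var stringValue: String? {
        if case .string(let value) = self { return value }
        return nil
    }
}

let unevaluated = Snapshot.dictionary([
    "myCodingKey": .string("hello"),
    "otherStuff": .array([.string("one"), .bool(true)]),
])

let value = unevaluated.myCodingKey?.stringValue      // "hello"
let first = unevaluated.otherStuff?[0]?.stringValue   // "one"
let missing = unevaluated.doesNotExist                // nil, no error thrown
```

Note that every lookup returns an optional, so a missing key and a type mismatch are indistinguishable here, unlike with the throwing container API.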

However, this leads to another (pretty interesting, I think) question: Would Unevaluated actually be an alternative to Decoder for the user?
Considering this "traditional" decoding code:

init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: SomeCodingKeys.self)
    self.value = try container.decode(Int.self, forKey: .value)
    var nestedContainer = try container.nestedUnkeyedContainer(forKey: .otherStuff) // var: decoding from an unkeyed container is mutating
    self.one = try nestedContainer.decode(String.self)
    self.two = try nestedContainer.decode(Bool.self)
}

Will this do the same?

init(from decoder: Decoder) throws {
    let unevaluated = decoder.unevaluatedData()
    self.value = unevaluated.value // The property access can't throw, the lookup would return nil, I omit the handling here
    self.one = unevaluated.otherStuff.0 // if .0 will be possible, it could also be:
    self.two = unevaluated.otherStuff[1]
}

In my opinion, adding dynamic lookup would make a decoder (and also an encoder) more usable. But I do think that this step should be part of a larger redesign of decoder and encoder; it should not be added in parallel, if I am not mistaken about its abilities. It could also be added to decoder directly.

For Unevaluated in general, I see just a minor issue.
From now on I will see Unevaluated just as a way to get "raw" storage data that will just be re-encoded, and assume that Unevaluated is a struct like this one:

The abstract concept somehow suggests to me that I can also use this with encoders other than the one related to the decoder I got the Unevaluated from. I should of course not do this, but I could mix JSONDecoder and PlistEncoder and succeed in doing so, if the JSON only contained Dictionaries, Arrays, Strings and Numbers and the implementation was this one:

I don't think that this is a totally unrealistic scenario. If we have some structure that re-encodes something when it is encoded, and does this on a totally generic level (not specific to any serialization format), why not give it to another encoder? Also, the unknown structure issue can come up if we want to transfer to another format rather than back to the original one.

For those reasons, I would prefer Unevaluated to be a protocol rather than a struct.

With a protocol, the implementor of a decoder could also connect this with format-specific lookup and manipulation support, e.g. with a JSON enum that conforms to Unevaluated (although format-specific lookup does not seem necessary to me if there is dynamic lookup). If this were passed to e.g. a PlistEncoder, one would get a clear error, or JSON could even support such a crossover by implementing Encodable and encoding the way it is implemented in itaiferber's gist. Unevaluated could be documented as a good point to implement format-specific lookups and manipulation.
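
A rough sketch of the protocol variant, under a few assumptions: the Unevaluated protocol below is hypothetical, I let it refine Encodable so that any encoder can at least attempt a re-encode, and the JSON enum with its case names is only one possible conformer. It shows the "crossover" option, where the same preserved value is handed to both a JSONEncoder and a PropertyListEncoder.

```swift
import Foundation

// Hypothetical protocol, not part of the standard library.
// Refining Encodable is an assumption made for this sketch.
protocol Unevaluated: Encodable {}

// Assumed format-specific conformer: a JSON value tree.
enum JSON: Unevaluated {
    case null
    case bool(Bool)
    case number(Double)
    case string(String)
    case array([JSON])
    case object([String: JSON])

    // Encodes through a single value container, so it is not tied to one format.
    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
        case .null: try container.encodeNil()
        case .bool(let value): try container.encode(value)
        case .number(let value): try container.encode(value)
        case .string(let value): try container.encode(value)
        case .array(let value): try container.encode(value)
        case .object(let value): try container.encode(value)
        }
    }
}

// Generic helper: works with any conformer and any format's encode closure.
func reEncode<U: Unevaluated>(_ value: U, with encode: (U) throws -> Data) rethrows -> String {
    String(decoding: try encode(value), as: UTF8.self)
}

let preserved = JSON.object(["id": .number(7), "tags": .array([.string("a"), .string("b")])])

let jsonEncoder = JSONEncoder()
jsonEncoder.outputFormatting = .sortedKeys
let json = try reEncode(preserved) { try jsonEncoder.encode($0) }

let plistEncoder = PropertyListEncoder()
plistEncoder.outputFormat = .xml
let plist = try reEncode(preserved) { try plistEncoder.encode($0) }
```

This only works here because the payload contains nothing a property list cannot represent; a .null value is exactly the kind of case where a format-specific conformer might prefer to throw a clear error instead.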

One disadvantage of a protocol is that, as far as I can see, it won't be possible to call decode(Unevaluated.self) to get the raw storage of a decoder in order to encode it later. A method returning Unevaluated would work, but this would require all decoders to support it. However, all decoders should have some sort of data they are working on that they can pass back here.

Another disadvantage is that Case 1 from above now applies. Format-specific lookup code would still look very similar for similar formats (e.g. JSON and msgpack), I think.

Thanks for putting together some thoughts on the issue, and apologies on the delay — I had most of this typed out yesterday but didn't get a chance to finish. There's a lot to unpack here, so let's take a step back for a moment.

I think I should have gated this statement a bit better, and with some thought, I've changed my mind a little. I think adding dynamic lookup to Unevaluated would be a great addition if we

  1. Identify a clear use-case for adding the feature
  2. Decide on beneficial semantics of the implementation that would integrate well with the existing API

Before tackling #2 here, I think we need to figure out #1 — is there indeed a use-case here that merits the potential complexity of the feature?

To reiterate what I said in the other thread,

The primary goal for introducing Unevaluated is to solve the issue of not being able to decode data you know nothing about for the purpose of preserving it. At its core, the representation of this data will be opaque to you, since the purpose is not to consume the data but to preserve it for future re-encodes.

So before we decide on dynamic lookup or anything similar, we need to decide on whether making Unevaluated consumable in a reasonable way is something we want to do or not; all API needs motivation for its introduction, not motivation against its introduction. Questions:

  1. What use-case does making Unevaluated consumable solve? Is there something you can do by getting an Unevaluated that you can't by using existing APIs directly?
  2. What sort of patterns might we enable by making Unevaluated consumable?

I think the answer to #1 is "no" at the moment. Right now, I can't think of anything you can't decode by writing your own AnyCodable/JSON/what-have-you enum to decode arbitrary contents of a payload in a way that lets you inspect them in a type-safe way.

As for #2, I think your example shows how we might expect some folks to use the feature:

Error-handling and casting aside, I don't think it would be unreasonable for many developers to flock to a potentially "easier" and less verbose way of getting at their values. Given two ways of doing the same thing with one of them being easier at the cost of some safety, I think many would understandably go with the easier option. In general, we try to avoid offering two ways of doing the same thing, especially when a major goal of the Codable design is to offer strong type safety for working with data that's generally not typed; undermining that goal won't be much help.

So, if we're looking to add dynamic lookup to something like Unevaluated, let's motivate it — is there a problem that we would be solving (in a format-agnostic way)?


Keep in mind, again, that this is in contrast to offering your own, say AnyCodable type (possible today) which would allow you to do something like this:

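init(from decoder: Decoder) throws {
    let container = try decoder.singleValueContainer()
    let stuff = try container.decode(AnyCodable.self)
    // The .string and .bool cases are assumed here; they depend on how your AnyCodable is defined.
    guard case .dictionary(let dictionary) = stuff,
          case .array(let array) = dictionary["otherStuff"],
          array.count >= 2,
          case .string(let one) = array[0],
          case .bool(let two) = array[1] else {
        throw DecodingError.dataCorruptedError(in: container, debugDescription: "Unexpected payload structure")
    }

    self.one = one
    self.two = two
}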

If you do offer your own type, you can also definitely add dynamic lookup to make it more dynamic too.
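
For reference, here is a minimal sketch of what such a user-defined type could look like. The name AnyCodable and all of its case names are assumptions; this is one possible shape, decode-only for brevity, and it is not an existing library type.

```swift
import Foundation

// One possible user-defined AnyCodable; the case names are assumptions.
enum AnyCodable: Decodable {
    case null
    case bool(Bool)
    case number(Double)
    case string(String)
    case array([AnyCodable])
    case dictionary([String: AnyCodable])

    // Tries each primitive in turn, then falls back to the two container shapes.
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if container.decodeNil() {
            self = .null
        } else if let value = try? container.decode(Bool.self) {
            self = .bool(value)
        } else if let value = try? container.decode(Double.self) {
            self = .number(value)
        } else if let value = try? container.decode(String.self) {
            self = .string(value)
        } else if let value = try? container.decode([AnyCodable].self) {
            self = .array(value)
        } else {
            self = .dictionary(try container.decode([String: AnyCodable].self))
        }
    }
}

let data = Data(#"{"value": 1, "otherStuff": ["a", true]}"#.utf8)
let stuff = try JSONDecoder().decode(AnyCodable.self, from: data)

// Type-safe inspection: every step is an explicit pattern match.
var one: String?
var two: Bool?
if case .dictionary(let dictionary) = stuff,
   case .array(let array) = dictionary["otherStuff"],
   array.count == 2,
   case .string(let first) = array[0],
   case .bool(let second) = array[1] {
    one = first
    two = second
}
```

The try-each-type approach is simple but not the fastest; a format-specific value type supplied by the decoder's author could avoid the repeated attempts.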

No need to apologise, I'm not in a hurry :).
Thank you for giving my thoughts API-relevant context!

I actually can't see a specific use case. This is also part of what I wanted to point out: I think we can do all this with a decoder or a container.

Right now, I would like to ask another question about Unevaluated in general:

What I have in mind here is motivated by this thread: Using Unsafe pointers to manipulate a JSON Encoder

The question is: Using Unevaluated, would there be a way to get only the parts of the underlying data as Unevaluated, that I haven't evaluated? Less abstract:

init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: /*some CodingKey enum*/)
    let name = try container.decode(String.self, forKey: /*some key*/)
    let remaining = try container.remainingUnevaluatedData() // would this be possible?
}

If not, this would not be an unsolvable issue; it could be solved like this:

  • ask the keyed container for all coding keys, via .allKeys
  • decode Unevaluated for all the keys
  • store the results in a dictionary and use it for re-encoding
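
The steps above can be sketched with today's API as follows. Since Unevaluated does not exist, a hypothetical JSONValue enum stands in for it, and AnyKey and Person are illustrative names too; this is a sketch under those assumptions, not a proposal for the final shape.

```swift
import Foundation

// Stand-in for Unevaluated (which does not exist yet): a hypothetical value tree.
enum JSONValue: Codable, Equatable {
    case null, bool(Bool), number(Double), string(String)
    case array([JSONValue]), object([String: JSONValue])

    init(from decoder: Decoder) throws {
        let c = try decoder.singleValueContainer()
        if c.decodeNil() { self = .null }
        else if let v = try? c.decode(Bool.self) { self = .bool(v) }
        else if let v = try? c.decode(Double.self) { self = .number(v) }
        else if let v = try? c.decode(String.self) { self = .string(v) }
        else if let v = try? c.decode([JSONValue].self) { self = .array(v) }
        else { self = .object(try c.decode([String: JSONValue].self)) }
    }

    func encode(to encoder: Encoder) throws {
        var c = encoder.singleValueContainer()
        switch self {
        case .null: try c.encodeNil()
        case .bool(let v): try c.encode(v)
        case .number(let v): try c.encode(v)
        case .string(let v): try c.encode(v)
        case .array(let v): try c.encode(v)
        case .object(let v): try c.encode(v)
        }
    }
}

// A CodingKey that accepts any string, so allKeys reports every key in the payload.
struct AnyKey: CodingKey {
    let stringValue: String
    var intValue: Int? { nil }
    init(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
}

struct Person: Codable {
    var name: String
    var remaining: [String: JSONValue]  // everything this type did not evaluate

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: AnyKey.self)
        name = try container.decode(String.self, forKey: AnyKey(stringValue: "name"))
        remaining = [:]
        // Steps 1 to 3: ask for all keys, decode the unknown ones, store them.
        for key in container.allKeys where key.stringValue != "name" {
            remaining[key.stringValue] = try container.decode(JSONValue.self, forKey: key)
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: AnyKey.self)
        try container.encode(name, forKey: AnyKey(stringValue: "name"))
        // Re-encode the preserved values next to the evaluated ones.
        for (key, value) in remaining {
            try container.encode(value, forKey: AnyKey(stringValue: key))
        }
    }
}

let input = Data(#"{"name": "Ada", "age": 36, "tags": ["x", "y"]}"#.utf8)
let person = try JSONDecoder().decode(Person.self, from: input)
let output = try JSONEncoder().encode(person)
let roundTripped = try JSONDecoder().decode(Person.self, from: output)
```

The bookkeeping of which keys were already evaluated is done by hand here, which is the part a remainingUnevaluatedData()-style method would take over.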

I see that this is not a big thing, since it is possible to do. However, to me it seems like this is part of the use case for which Unevaluated is actually meant (various examples like the one in the thread above).

To ask an even more general question: all cases I have seen so far for Unevaluated were specific to a concrete format (JSON, actually). Is there a use case so generic that it cannot be solved by a JSONValue (or MessagePackValue, or whatever)?
The one I can see is transferring to another format, which could be very easy with a thing like Unevaluated, but the current Unevaluated isn't usable for it.

The question is whether this kind of need-to-re-encode-later issue only comes up in format-specific decoding code, or whether there are truly format-independent structures (like Dictionary, Array, Set, Float, LinkedLists, other data structures) that need this. It is of course nice to keep custom decoding code as generic as possible, but to me it seems as if you could make the necessary assumptions to use something like JSONValue in all the cases I have seen for Unevaluated. The assumptions I mean are: what are the possible structurings (this is actually set by the decoder: keyed and unkeyed containers), and what are the "primitives" of the data? In the case of a JSON-specific API, these would be Strings, Bools, Numbers and Null. In msgpack you would additionally have binary data and extension values. Maybe the API sets some restrictions on the possible primitives too. In my current view, you would only really need Unevaluated if you have an API that is not bound to any format, or is bound to a format you can put data into in ways not even the format knows. I don't know if something like this exists. Maybe I'm mistaken here.

If this just isn't clear to me and we are talking about adding Unevaluated because it makes it easier to handle such format-specific cases, I would be fine with it.