Proper way to structure containers in new coders?

itaiferber · April 6, 2018, 8:51pm

Sorry for the delay in responding to this — I just got back from vacation, so I’m catching up on a lot of threads here.

Values whose contents are expressed in terms of Dictionary (with String or Int keys), Array, numbers, Strings, and null values can be represented in the vast majority of formats. It's also entirely possible for an Encoder to collect an object graph in this representation and then transform it into a representation more natural to the format; there isn't a strict requirement that the object graph passed in is serialized in the same order it was given.
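
To make the "collect, then transform" idea concrete, here is a minimal sketch. The `Node` enum and `serialize` function are made up for illustration (they are not part of the Codable API); the point is only that the writer is free to emit keys in its own order rather than insertion order.

```swift
// Hypothetical intermediate tree an Encoder might build before serializing.
enum Node {
    case dictionary([String: Node])
    case array([Node])
    case number(Int)
    case string(String)
    case null
}

// Serialize to a JSON-like string, emitting dictionary keys in sorted order
// regardless of the order in which they were collected.
// Note: no string escaping is performed; this is a sketch only.
func serialize(_ node: Node) -> String {
    switch node {
    case .dictionary(let dict):
        let members = dict.keys.sorted().map { "\"\($0)\":\(serialize(dict[$0]!))" }
        return "{" + members.joined(separator: ",") + "}"
    case .array(let items):
        return "[" + items.map(serialize).joined(separator: ",") + "]"
    case .number(let n):
        return String(n)
    case .string(let s):
        return "\"\(s)\""
    case .null:
        return "null"
    }
}

let graph = Node.dictionary([
    "value": .number(42),
    "child": .dictionary(["childValue": .string("aValue")]),
])
print(serialize(graph))  // {"child":{"childValue":"aValue"},"value":42}
```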

But yes, if you have a format which is truly incompatible with this model, then it might not be well-suited to the Codable API. This is a tradeoff we made in the design: if you try to capture something representative of all serialization formats, you end up with a very weak and vague API, because there are ends of the spectrum which don't align with one another at all. But I would say that ~90% of serialization formats are extremely similar to one another and can be captured well here.

DeFrenZ:

> XML
>
> With the XML value I don't see issues on the decoding, but I think the encoding is ambiguous: how does the model give information to the Encoder whether a property will go into an XML attribute or a child XML element?
>
> Given how Codable is structured I don't think this should come from info custom fed into the Encoder (e.g. through userInfo), but should come from the model itself. The suitable place for this seems to be CodingKeys, which specify where the properties get coded.
>
> Though the protocol doesn't leave options to define additional details about the keys, so I believe the option here is only to define something like protocol XMLCodingKey: CodingKey which allows for that. But Encoder will still work with a keyed container which will be defined with a CodingKey... should the encoder fail (i.e. throw) at runtime if the key is not XMLCodingKey?
>
> Also this means that you have to give up the synthesised implementation, but I don't think there are ways around it...

This is the type of decision that I would posit is really up to the Encoder, not what's being encoded. XML has many, many ways to represent similar concepts, and I think the specific format choices should be up to, and controlled by, the Encoder. For instance, the XML you give above could be represented in at least a few ways off the top of my head:

Your original (properties in XML attributes, except for complex values):
```
<PARENT value="42">
    <CHILD childValue="aValue" />
</PARENT>
```

Properties in child nodes with custom node types:
```
<parent>
    <value>42</value>
    <child>
        <childValue>1</childValue>
    </child>
</parent>
```

Properties in child nodes with generalized node types (e.g. property-value pairs):
```
<property>parent</property>
<value>
    <property>value</property>
    <value>42</value>

    <property>child</property>
    <value>
        <property>childValue</property>
        <value>1</value>
    </value>
</value>
```

In all, this is up to the DTD/schema you're working with, and how the Encoder chooses to support these representations. This isn't a choice that the actual type should control (or even care about); a flag on the Encoder could potentially allow you to choose the representation you'd want.
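
As a rough illustration of what such a flag could look like, here is a toy sketch. `XMLPropertyStyle` and `renderElement` are names I'm inventing for this example, not an existing XML encoder API, and the renderer only handles flat string properties with no escaping.

```swift
// Hypothetical representation flag, similar in spirit to the strategy
// options that JSONEncoder exposes.
enum XMLPropertyStyle {
    case attributes      // properties become XML attributes
    case childElements   // properties become child nodes
}

// Toy renderer for a flat list of string properties. A real Encoder would
// drive this from its keyed container instead of an array of pairs.
func renderElement(name: String,
                   properties: [(key: String, value: String)],
                   style: XMLPropertyStyle) -> String {
    switch style {
    case .attributes:
        let attrs = properties.map { " \($0.key)=\"\($0.value)\"" }.joined()
        return "<\(name)\(attrs) />"
    case .childElements:
        let children = properties.map { "<\($0.key)>\($0.value)</\($0.key)>" }.joined()
        return "<\(name)>\(children)</\(name)>"
    }
}

let props = [(key: "value", value: "42")]
print(renderElement(name: "parent", properties: props, style: .attributes))
// <parent value="42" />
print(renderElement(name: "parent", properties: props, style: .childElements))
// <parent><value>42</value></parent>
```

The type being encoded is identical in both calls; only the option passed to the renderer changes.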

The manual declaration of CodingKeys is indeed the answer here. We didn't want the compiler to unwittingly introduce fragility into your CodingKeys enum by deciding on integer values on your behalf. Since all enum cases have names, string values are easy; integer values fall prey to reordering and naming concerns. String backing is relatively safe (short of renaming your property and the associated key); Int backing you'd have to do yourself.
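
A small sketch of the difference, using standalone enums (`StringKeys` and `IntKeys` are illustrative names only): with String backing the raw values fall out of the case names, while with Int backing each number is written by hand and so survives reordering of the cases.

```swift
// String-backed keys: raw values default to the case names.
enum StringKeys: String, CodingKey {
    case value
    case child
}

// Int-backed keys: every number is assigned explicitly, so moving or
// inserting cases does not change the number tied to an existing key.
enum IntKeys: Int, CodingKey {
    case child = 2   // listed first on purpose; its number is still 2
    case value = 1
}

print(StringKeys.value.stringValue)                 // value
print(StringKeys.value.intValue == nil)             // true
print(IntKeys.value.intValue == 1)                  // true
print(IntKeys(intValue: 2) == IntKeys.child)        // true
```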

Again, I think that depending on the model you're going for, this can be solved either by the Encoder itself, or by the type directly without need for anything special:

- If the desired format is such that the produced protobuf output is consistent in wire types across the board (e.g. use varints everywhere, or use fixed-width values everywhere), then there is an easy solution: the Encoder applies the same strategy across the board. The types themselves don't need to care about the wire formats of their individual properties, since they're all the same.
- If different types care about the wire formats, then the protobuf Encoder can offer specific types for those wire formats; e.g., Parent.value would be of type ProtobufEncoder.FixedInt64 (or something like that). FixedInt64 would just wrap an Int64 (and would encode as a regular Int64 through all other encoders), but would be special for ProtobufEncoder: the encoder could intercept this type specifically to get its value and write it out in the fixed format as needed. ProtobufEncoder would also offer VarInt64, and the Protobuf compiler could generate the different types based on your schema.
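
The second option could be sketched roughly as follows. Everything here is hypothetical: `FixedInt64` is a standalone stand-in for the suggested `ProtobufEncoder.FixedInt64`, and `WireFormat` / `wireFormat(for:)` only mimic the type check an encoder might perform before writing a value.

```swift
// Hypothetical wrapper type around Int64.
struct FixedInt64: Codable, Equatable {
    var value: Int64
    init(_ value: Int64) { self.value = value }

    // Use a single-value container so that encoders which do not know
    // about this type simply see an ordinary Int64.
    init(from decoder: Decoder) throws {
        value = try decoder.singleValueContainer().decode(Int64.self)
    }
    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(value)
    }
}

// Hypothetical per-value decision a protobuf encoder might make.
enum WireFormat { case varint, fixed64, delegated }

// Sketch of the interception: look for the marker type first, then fall
// back to a default strategy or to the value's own encode(to:).
func wireFormat<T: Encodable>(for value: T) -> WireFormat {
    switch value {
    case is FixedInt64: return .fixed64
    case is Int64:      return .varint
    default:            return .delegated
    }
}

print(wireFormat(for: FixedInt64(7)))  // fixed64
print(wireFormat(for: Int64(7)))       // varint
```

This is the same kind of check an encoder can use to special-case any known type before falling back to the generic path.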

To get at the core of this: I don't think refining the CodingKey protocol is necessary given the tools we already have at hand. Besides the additional complexity (conceptual, in implementation, etc.), I think that by splitting the responsibilities here between the Encoder and the types being encoded, it's possible to do everything here today.

With a definition of

```
struct Parent : Codable {
    let value: ProtobufEncoder.FixedInt64
    let child: Child

    private enum CodingKeys : Int, CodingKey {
        case value = 1
        case child = 2
    }
}

struct Child : Codable {
    let childValue: String

    private enum CodingKeys : Int, CodingKey {
        case childValue = 1
    }
}
```

it should be possible to write Parent to JSON, XML, and Protobuf as-is today.
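
For the JSON case, a quick way to check this is a round trip through JSONEncoder and JSONDecoder. Since no ProtobufEncoder exists here, the sketch below assumes a stand-in namespace with a `FixedInt64` that encodes as a plain Int64; that stand-in is my assumption, not a real library type. I've also added `Equatable` to the types purely so the result can be compared.

```swift
import Foundation

// Stand-in namespace so the definitions above compile. Hypothetical:
// a real ProtobufEncoder would be an actual encoder type.
enum ProtobufEncoder {
    struct FixedInt64: Codable, Equatable {
        var value: Int64
        init(_ value: Int64) { self.value = value }
        init(from decoder: Decoder) throws {
            value = try decoder.singleValueContainer().decode(Int64.self)
        }
        func encode(to encoder: Encoder) throws {
            var container = encoder.singleValueContainer()
            try container.encode(value)
        }
    }
}

struct Child: Codable, Equatable {
    let childValue: String

    private enum CodingKeys: Int, CodingKey {
        case childValue = 1
    }
}

struct Parent: Codable, Equatable {
    let value: ProtobufEncoder.FixedInt64
    let child: Child

    private enum CodingKeys: Int, CodingKey {
        case value = 1
        case child = 2
    }
}

// Encode to JSON and decode again; the Int-backed keys need no special
// handling from the type itself.
let original = Parent(value: ProtobufEncoder.FixedInt64(42),
                      child: Child(childValue: "aValue"))
let data = try JSONEncoder().encode(original)
let decoded = try JSONDecoder().decode(Parent.self, from: data)
print(decoded == original)  // true
```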