Writing Encoders and Decoders (different question)

(I’m starting a new thread for this, because the question is unrelated to the one in the other similarly-named thread.)

I’m having trouble with semantics of the protocol requirements in (say) KeyedEncodingContainerProtocol:

/// Stores a new nested container for the given key and returns A new encoder
/// instance for encoding `super` into that container.
///
/// - parameter key: The key to encode `super` for.
/// - returns: A new encoder to pass to `super.encode(to:)`.
public mutating func superEncoder(forKey key: Self.Key) -> Encoder

Yes, but what nested container? Keyed or unkeyed? If keyed, what is the generic’s specialization type? Ultimately, a keyed container is going to show up in the superclass via the returned encoder and this Encoder API:

public func container<Key>(keyedBy type: Key.Type) -> KeyedEncodingContainer<Key> where Key : CodingKey

which returns one of these:

public struct KeyedEncodingContainer<K> : KeyedEncodingContainerProtocol where K : CodingKey {
    public typealias Key = K
    /// Creates a new instance with the given container.
    /// - parameter container: The container to hold.
    public init<Container>(_ container: Container) where K == Container.Key, Container : KeyedEncodingContainerProtocol

where container is presumably the container “stored” as a result of the superEncoder call (above). But K here is the superclass’s CodingKey-conforming type, and that is generally not Container.Key, because the type for K isn’t known to the superEncoder method.

IOW, the container returned to the superclass seems to be lying about its (specialization) type. It makes no difference for getting the string keys, since both specialization types conform to CodingKey, but isn’t this going to blow up if anyone decides to add unique behavior to the superclass’s concrete coding key type?

Or am I connecting the wrong dots?

Well, this function doesn't return a keyed, unkeyed, or single-value container. It returns an Encoder, that is, an object whose methods return keyed and unkeyed containers.

I think this function is intended to be used like this:

override func encode(to encoder: Encoder) throws {
  var keyedContainer = //...
  // superEncoder(forKey:) hands back an Encoder for the superclass to use
  try super.encode(to: keyedContainer.superEncoder(forKey: .myKeyToStoreSuperIn))
  // encode new properties
}

According to the code comment, the returned encoder is for “encoding super into that container”, previously said to be a new nested container. That’s the container I’m asking about.

In your code fragment, .myKeyToStoreSuperIn belongs to the subclass’s CodingKey type, not the superclass’s, which is not known at that point in the code. It seems to me you can’t create a new, nested, keyed container without knowing the coding key type. (Or you can, but I just can’t see it.)

FWIW, I did misread the requirements on init<Container>. The equality is of associated types, not specialization types, so the apparent conflict is a protocol conformance lying about its associated type, not a generic lying about its specialization type.

After playing around with this, I think the problem is mainly the comment:

/// Stores a new nested container for the given key and returns A new encoder
/// instance for encoding `super` into that container.

Here’s what I think it should say:

/// Returns a new encoder configured to store its created container within *this* container 
/// as a nested container for the given key.
///
/// - parameter key: The key to encode `super` for.
/// - returns: A new encoder to pass to `super.encode(to:)`.
public mutating func superEncoder(forKey key: Self.Key) -> Encoder

In other words, this is similar to returning a nested container, except that creation of the nested container is deferred until it’s known what kind of container will be needed.

AFAICT, this is not quite what JSONEncoder does. It seems to create (basically) both an unkeyed container and a type-erased keyed container, and lets the super encoder choose one of them to use. It’s not yet clear why it creates the container storage immediately, rather than waiting to see which one is going to be used.

It’s also not clear what happens if the super encoder tries to use a singleValue container.

Some background here about the purpose of this method and how it can be implemented:

When you have an object which inherits from a Codable superclass, it’s generally useful (and often necessary) to encode that superclass’s contents as well as your own. One way to do this is to simply encode your superclass’s properties however you’d like, with your own keys; generally, however, this isn’t recommended since there should be a separation of concerns between how you encode your properties, and how your superclass encodes its properties (in the same way that a subclass’s initializer should call into its superclass’s initializer so it can decide what to do for itself).

The correct way to do this is to call super.encode(to:) to allow your parent class to encode however it likes. However, encode(to:) takes an Encoder, not a specific container, since every type can choose its own container to encode into. There are two options here too:

  1. Pass in the Encoder you got
  2. Produce another Encoder to give to the superclass

Option 1 is problematic because of exactly that separation of concerns. If your class chooses to encode as a keyed object (AKA producing a dictionary) but your superclass chooses to encode as an unkeyed object (AKA producing an array), you can’t encode into the same Encoder — every Encoder can only produce one container (keyed, unkeyed, or single-value).

The other option is to create a new Encoder which your superclass can encode into; whatever container is produced can be stored into your container as a nested container.

In other words, superEncoder(...) creates a new Encoder which wraps a new nested container (of any type) that super can encode to without concern for how you encoded your properties.
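
As a rough illustration of that, here is a minimal sketch run against Foundation's JSONEncoder. The Base and Derived classes and their key enums are made up for this example; only encode(to:), container(keyedBy:) and superEncoder(forKey:) come from the standard library.

```swift
import Foundation

// Hypothetical superclass: it picks its own container kind and key type.
class Base: Encodable {
    var id = 1

    private enum BaseKeys: String, CodingKey { case id }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: BaseKeys.self)
        try container.encode(id, forKey: .id)
    }
}

// Hypothetical subclass: it reserves one of its own keys for whatever Base encodes.
final class Derived: Base {
    var name = "derived"

    private enum DerivedKeys: String, CodingKey { case name, base }

    override func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: DerivedKeys.self)
        try container.encode(name, forKey: .name)
        // Base never sees DerivedKeys; it only receives an Encoder.
        try super.encode(to: container.superEncoder(forKey: .base))
    }
}

let data = try JSONEncoder().encode(Derived())
let json = try JSONSerialization.jsonObject(with: data) as! [String: Any]
print(json)
```

With this setup, Base's output should end up nested under the "base" key of Derived's dictionary, and neither class needs to know the other's key type.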

There are at least two ways to do this:

  1. Insert a new container for the given key (keyed container)/at the current index (unkeyed container) when superEncoder is called
  2. Defer the insertion until after encoding is done

To answer your first question regarding option 1:

Yes, but what nested container? Keyed or unkeyed? If keyed, what is the generic’s specialization type? Ultimately, a keyed container is going to show up in the superclass via the returned encoder and this Encoder API:

You can insert any dummy value you like, as long as you end up replacing it with the container that the superEncoder ends up producing. The newly created Encoder should not be limited to whatever dummy object you create; it can maintain its own stack of containers and you should have a way of getting back the final produced value.

AFAICT, this is not quite what JSONEncoder does. It seems to create (basically) both an unkeyed container and a type-erased keyed container, and lets the super encoder choose one of them to use. It’s not yet clear why it creates the container storage immediately, rather than waiting to see which one is going to be used.

It’s also not clear what happens if the super encoder tries to use a singleValue container.

JSONEncoder, BTW, does not do this — it goes with option 2. When you ask for a superEncoder(forKey:), it doesn’t insert anything — it returns a new _JSONReferencingEncoder which keeps a reference to the key/index you wanted to insert the container into:

public mutating func superEncoder(forKey key: Key) -> Encoder {
    return _JSONReferencingEncoder(referencing: self.encoder, key: key, convertedKey: _converted(key), wrapping: self.container)
}

_JSONReferencingEncoder then maintains its own container stack — this is easily done by subclassing _JSONEncoder itself to get most of the implementation for free. The key difference is that at deinit time, the referencing encoder then writes out the contents of what super encoded back into whatever container it was referencing.


As a brief example of this, let’s pretend that you have an object which encodes into an unkeyed container; after 3 encode(...) calls, the container might (abstractly) look like this:

┌───────Unkeyed Container───────┐
│ ┌───────┐ ┌───────┐ ┌───────┐ │
│ │ prop1 │ │ prop2 │ │ prop3 │ │
│ └───────┘ └───────┘ └───────┘ │
└──────────────────────────────▲┘
                               │
                               │
                            Current
                             Index

If you then call superEncoder(), you get the following:

  ┌───────Unkeyed Container───────┐
  │ ┌───────┐ ┌───────┐ ┌───────┐ │
  │ │ prop1 │ │ prop2 │ │ prop3 │ │
  │ └───────┘ └───────┘ └───────┘ │
  └───▲──────────────────────────▲┘
      │                          │
      │                          │
Referencing                   Current
    │                          Index
    │
  ┌─┴────Referencing Encoder──────┐
  │ ┌────────┐                    │
  │ │Index: 3│                    │
  │ └────────┘                    │
  └───────────────────────────────┘

You can then encode more things into the original container:

  ┌─────────────────Unkeyed Container─────────────────┐
  │ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ │
  │ │ prop1 │ │ prop2 │ │ prop3 │ │ prop4 │ │ prop5 │ │
  │ └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ │
  └───▲──────────────────────────────────────────────▲┘
      │                                              │
      │                                              │
Referencing                                       Current
    │                                              Index
    │
  ┌─┴────Referencing Encoder──────┐
  │ ┌──────────┐                  │
  │ │ Index: 3 │                  │
  │ └──────────┘                  │
  └───────────────────────────────┘

super then encodes some of its properties into the Encoder you gave it:

  ┌─────────────────Unkeyed Container─────────────────┐
  │ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ │
  │ │ prop1 │ │ prop2 │ │ prop3 │ │ prop4 │ │ prop5 │ │
  │ └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ │
  └───▲──────────────────────────────────────────────▲┘
      │                                              │
      │                                              │
Referencing                                       Current
    │                                              Index
    │
  ┌─┴────Referencing Encoder─────────┐
  │ ┌──────────┐ ┌───────┐ ┌───────┐ │
  │ │ Index: 3 │ │ prop1 │ │ prop2 │ │
  │ └──────────┘ └───────┘ └───────┘ │
  └──────────────────────────────────┘

At the end of the scope of your encode(to:), the produced superEncoder() is cleaned up; at deinit time, it inserts its contents into the referenced container, and you end up with the final product as expected:

┌────────────────────────────Unkeyed Container───────────────────────────────┐
│                                ┌─────────────────────┐                     │
│ ┌───────┐ ┌───────┐ ┌───────┐  │ ┌───────┐ ┌───────┐ │ ┌───────┐ ┌───────┐ │
│ │ prop1 │ │ prop2 │ │ prop3 │  │ │ prop1 │ │ prop2 │ │ │ prop4 │ │ prop5 │ │
│ └───────┘ └───────┘ └───────┘  │ └───────┘ └───────┘ │ └───────┘ └───────┘ │
│                                └─────────────────────┘                     │
└────────────────────────────────────────────────────────────────────────────┘

In this example, both containers are unkeyed, but that was for ease of ASCII art. :) There’s no limitation at all on what values super can encode into its own encoder.
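
The walkthrough above can be modeled with a toy sketch. This is not Foundation's code: Storage and ReferencingWriter are hypothetical stand-ins for the parent container and the referencing encoder, and the only point is the write-back at deinit time.

```swift
// Hypothetical stand-in for the parent's unkeyed container storage.
final class Storage {
    var items: [Any] = []
}

// Hypothetical stand-in for a referencing encoder: it remembers where its
// output belongs and writes it back when it is deallocated.
final class ReferencingWriter {
    private let target: Storage
    private let index: Int
    var contents: [Any] = []   // what `super` encodes

    init(referencing target: Storage, at index: Int) {
        self.target = target
        self.index = index
    }

    deinit {
        // Insert everything `super` produced as ONE nested element.
        target.items.insert(contents as Any, at: index)
    }
}

let parent = Storage()
parent.items = ["prop1", "prop2", "prop3"]

do {
    // Remember the current index (3), but insert nothing yet.
    let writer = ReferencingWriter(referencing: parent, at: parent.items.count)
    parent.items.append("prop4")
    parent.items.append("prop5")
    writer.contents.append("superProp1")
    writer.contents.append("superProp2")
}   // writer is released by here, so its contents land at index 3

print(parent.items)
```

Relying on deinit keeps the call site simple, at the cost of making the final result depend on when the referencing object is released.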


So, to circle back around to the original question regarding the doc comment:

/// Stores a new nested container for the given key and returns a new encoder
/// instance for encoding `super` into that container.
///
/// - parameter key: The key to encode `super` for.
/// - returns: A new encoder to pass to `super.encode(to:)`.

The comment is meant to be purposefully abstract; it’s more of a doc comment for consumers of the API than for Encoder/Decoder writers. “Container” here is being used very abstractly: what sort of container is being inserted? The one produced by super. :) [Consumers of the API don’t care about the specifics of how this container might be inserted or the semantics of that, so the comment is somewhat hand-wavy. I do agree, though, that it would be helpful to have documentation aimed at those writing Encoders and Decoders.]


Thanks for the extended explanation.

“Container” isn’t the word I have a problem with. “Stores” is. The comment is describing what the “superEncoder” method is expected to do, and (as you just finished explaining) storing cannot possibly be one of those things.

This is intensified by “new”, which suggests that the “superEncoder” method is expected to create the container, which it can’t.

Instead, storing and newing are the job of the returned super-encoder, which I tried to express this way:

/// Returns a new encoder configured to store its created container within 
/// *this* container as a nested container for the given key.

For an unkeyed container’s “superEncoder” method, this would need to say “…at the current index.” instead of “…for the given key”.

Finally, if the superclass decides to encode into a single-value container, I’m guessing the super-encoder has to store the value at the given key or current index, not a nested container?

The “disappearing” of single-value containers appears to happen automatically elsewhere (there is no method to get a nested single-value container), but this is something the super-encoder has to do manually, I’m guessing.

You can indeed create and store a new container in this spot; _JSONEncoder just chooses option 2. Option 1, as mentioned above, is totally valid; you can store anything you want in the spot where superEncoder(...) is called, as long as it gets appropriately replaced by what super wants to encode.

For instance, there's nothing stopping _JSONEncoder from storing an unkeyed container wherever superEncoder(...) is called. If super requests an unkeyed container, then it can just get whatever container was just created; otherwise, if it wants a new keyed container, _JSONReferencingEncoder could create a new keyed container and replace the unkeyed container with it.

In fact, there's nothing preventing you from implementing an abstract Container which can dynamically change from keyed to unkeyed to single-value depending on whether you encode a value to it through a KeyedEncodingContainer, UnkeyedEncodingContainer, or SingleValueEncodingContainer interface; in that case, you could also totally create an empty container in-place and wait for super to request an interface for it to encode through.
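
A minimal sketch of such a shape-shifting container, assuming a simple enum is enough to hold the storage. Slot and its methods are hypothetical names invented for this example and do not correspond to anything in Foundation.

```swift
// Hypothetical storage slot that takes its shape from the first write.
enum Slot {
    case empty
    case keyed([String: Any])
    case unkeyed([Any])
    case single(Any)

    // Used by a keyed interface.
    mutating func set(_ value: Any, forKey key: String) {
        switch self {
        case .empty:
            self = .keyed([key: value])
        case .keyed(var dict):
            dict[key] = value
            self = .keyed(dict)
        default:
            preconditionFailure("slot already has a different shape")
        }
    }

    // Used by an unkeyed interface.
    mutating func append(_ value: Any) {
        switch self {
        case .empty:
            self = .unkeyed([value])
        case .unkeyed(var items):
            items.append(value)
            self = .unkeyed(items)
        default:
            preconditionFailure("slot already has a different shape")
        }
    }

    // Used by a single-value interface.
    mutating func setSingle(_ value: Any) {
        guard case .empty = self else {
            preconditionFailure("slot already has a different shape")
        }
        self = .single(value)
    }
}

// The slot can be created in place when superEncoder(...) is called and
// only commits to a shape once `super` starts writing.
var slot = Slot.empty
slot.append("prop1")
slot.append("prop2")
```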

But, all of this is implementation detail — how and when the container is created is not interesting to a consumer of the API, whom the doc comment targets. "This creates a container that you can encode into which happens to look like an Encoder" (whether or not that's how things are actually implemented) is somewhat conceptually simpler for a consumer than "this creates an Encoder which will later write its container into here".

Single value containers are an interesting beast in that they are not actually "containers" in the sense that keyed and unkeyed containers are. When you create a keyed container, internally, you get an empty dictionary to store things into; when you create an unkeyed container, you get an empty array. When you create a single value container... nothing happens:

public func singleValueContainer() -> SingleValueEncodingContainer {
    return self
}

_JSONEncoder actually conforms to SingleValueEncodingContainer itself, and is written in a way such that no additional work needs to be done for single values.

_JSONEncoder maintains a container stack such that encoding a value places it on the stack. "Container" here is being used differently than above — here it means "boxed value", e.g. KeyedEncodingContainer becomes an NSDictionary on the stack; UnkeyedEncodingContainer becomes an NSArray on the stack; numeric values become NSNumbers on the stack; strings become NSStrings on the stack; etc.

When you create a KeyedEncodingContainer, it pushes a new NSMutableDictionary onto the stack — any values written into the keyed container are encoded on top of the container stack (i.e. boxed), then popped off and written into the contents of the dictionary. The same happens with an UnkeyedEncodingContainer — values encoded into the container are encoded on top of the container stack, then popped off and appended to the array.

With a SingleValueEncodingContainer, you get neither a new array nor a new dictionary, and the stack is left untouched: when you encode into the container, it simply encodes the value directly onto the stack, with no additional outer container. That means that wherever you created the SingleValueEncodingContainer from gets just a single result (whatever was just popped off the stack), and no unwrapping is necessary.

Hence why the "disappearing" is automatic — you don't need to get rid of something that was never there. :)
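
The stack behavior described above can be mimicked with a toy model. ToyStack and DictBox are hypothetical names; the real _JSONEncoder storage differs in detail, so treat this only as a sketch of the push/pop idea.

```swift
// Hypothetical reference box standing in for the NSMutableDictionary
// that a keyed container writes into.
final class DictBox {
    var dict: [String: Any] = [:]
}

// Hypothetical stack of boxed values.
struct ToyStack {
    private(set) var boxes: [Any] = []

    var count: Int { return boxes.count }

    mutating func push(_ box: Any) {
        boxes.append(box)
    }

    mutating func pop() -> Any {
        return boxes.removeLast()
    }
}

var stack = ToyStack()

// Requesting a keyed container pushes an empty dictionary.
let keyed = DictBox()
stack.push(keyed)

// A primitive the encoder knows about is written directly by key.
keyed.dict["identifier"] = "foo"

// A single-value encode pushes the bare value, with no wrapper around it...
stack.push("a")

// ...so the caller just pops one result and stores it under its key.
keyed.dict["value"] = stack.pop()

print(keyed.dict)
```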


Another small concrete example of this:

struct Foo<T : Encodable> : Encodable {
    let identifier: String
    let value: T
}

enum MySingleValueThing : String, Encodable {
    case a, b, c
}

When you want to encode a Foo<MySingleValueThing>, a sequence of things happens:

  1. Foo requests a KeyedEncodingContainer, since, by default, that's what Encodable synthesis produces. An empty dictionary is placed on the stack:

          ┌───────────────────┐
    Stack │    Dictionary     │
          └───────────────────┘
    
  2. Foo encodes its identifier for key .identifier. Since identifier is a String and there is an overload for encode(_: String, forKey: Key), that gets called. _JSONEncoder knows about Strings and can just write them directly into the dictionary (since there's no point in pushing them onto the stack and then popping them right off again for insertion):

          ┌───────────────────┐  ┌──────────────────────┐
    Stack │    Dictionary     │─▶│ "identifier" : "foo" │
          └───────────────────┘  └──────────────────────┘
    
  3. Foo encodes its value, which happens to be MySingleValueThing. Since there's no overload for MySingleValueThings, this falls into the generic encode<T : Encodable>(_: T, forKey: Key) call. _JSONEncoder calls value.encode(to: self) in preparation for it to encode its values onto the stack; once encoding is done, it will grab those values off of the stack for insertion into the dictionary.

  4. MySingleValueThing requests a SingleValueEncodingContainer from its Encoder (this is provided by a default implementation given to RawRepresentable types whose raw value is String); _JSONEncoder just returns self.

  5. MySingleValueThing encodes its rawValue into the SingleValueEncodingContainer; this places the raw value onto the stack:

          ┌───────────────────┐
          │        "a"        │
          └───────────────────┘
          ┌───────────────────┐  ┌──────────────────────┐
    Stack │    Dictionary     │─▶│ "identifier" : "foo" │
          └───────────────────┘  └──────────────────────┘
    
  6. MySingleValueThing is finished and returns. _JSONEncoder (continuing what it was doing in step 3) then takes the value off of the stack and inserts it into the dictionary at the key that it was given:

          ┌───────────────────┐  ┌──────────────────────┐  ┌───────────────────┐
    Stack │    Dictionary     │─▶│ "identifier" : "foo" │─▶│   "value" : "a"   │
          └───────────────────┘  └──────────────────────┘  └───────────────────┘
    

Encoding is now finished, and _JSONEncoder grabs the container stack and passes it over to JSONSerialization, which produces

{
    "identifier": "foo",
    "value": "a"
}
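
For anyone who wants to check this, the example should be reproducible as a short script. The types are the ones from the walkthrough; the JSONSerialization round trip at the end is only there to make the result easy to inspect.

```swift
import Foundation

struct Foo<T : Encodable> : Encodable {
    let identifier: String
    let value: T
}

enum MySingleValueThing : String, Encodable {
    case a, b, c
}

// Encode a Foo<MySingleValueThing> and parse the bytes back for inspection.
let foo = Foo(identifier: "foo", value: MySingleValueThing.a)
let data = try JSONEncoder().encode(foo)
let object = try JSONSerialization.jsonObject(with: data) as! [String: Any]

// "value" holds the bare raw value, not a wrapper object.
print(object)
```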

All in all, this is no extra work for superEncoder because _JSONEncoder is structured in a way that makes single values not special at all.
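
Here is a small sketch of the single-value case going through superEncoder(forKey:), assuming JSONEncoder behaves as described above. Base and Derived are hypothetical classes written for this example.

```swift
import Foundation

// Hypothetical superclass that encodes itself as a single value.
class Base: Encodable {
    var raw = 42

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(raw)
    }
}

// Hypothetical subclass that encodes as a keyed container.
final class Derived: Base {
    var name = "derived"

    private enum DerivedKeys: String, CodingKey { case name, base }

    override func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: DerivedKeys.self)
        try container.encode(name, forKey: .name)
        // Base writes one bare value into the encoder it is handed.
        try super.encode(to: container.superEncoder(forKey: .base))
    }
}

let data = try JSONEncoder().encode(Derived())
let json = try JSONSerialization.jsonObject(with: data) as! [String: Any]
print(json)
```

The value for "base" should come out as the bare number rather than a nested dictionary or array, which matches the "nothing to unwrap" behavior described for single-value containers.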


Thanks, I think I’ve got it all now. I’m off to finish writing my encoder using this information.

I strongly urge the rewording of the comments for the “superEncoder” methods. If nothing else, the next person who writes an encoder shouldn’t have to struggle over the strange specificity of what is actually a very general injunction.

This is all in preparation for examining an unrelated issue in decoding, so there is more coming in the future.
