New {De,En}codingContainer types to support enums with associated values (and tuples)

George · February 10, 2021, 10:59pm

The issue of automatic Codable synthesis for enums with associated values has come up several times in this forum, most recently here: SE-0295: Codable synthesis for enums with associated values (this was recently returned for revision). It seems the main issues with this and prior proposals were that they were not customizable enough to support the full range of coding strategies folks were interested in, and that there wasn't agreement on a good global default.

I think we can solve both of these issues by introducing four new codable container types: Tagged{En,De}codingContainer<Tag>, and Tuple{En,De}codingContainer<Tag>. Here is an example of what this could look like for Decodable:

enum Foo: Codable {
  case int(Int)
  case string(string: String)
  case tuple(Int, String)
  /// Cases with identical labels will prevent automatic synthesis
  /// case tuple(String, Int)
  case tupleWithLabel(Int, label: String)

  /// Synthesized
  enum CodingTag: String, Swift.CodingTag { /// This is synthesized similar to CodingKey (the raw type must come first)
    /// We could eventually expand the `CodingTag` protocol to handle cases which only differ by value labels
    case int, string, tuple = "tupleWithoutLabel", tupleWithLabel
  }
  init(from decoder: Decoder) throws {
    let container = try decoder.container(taggedBy: CodingTag.self)
    switch container.tag {
    /// Enums with a single associated value will decode via `singleValueDecodingContainer`
    /// The inability to have a single-value tuple already has precedent in the language
    case .int:
      self = try .int(container.valueDecoder.singleValueDecodingContainer().decode(Int.self))
    /// Everything else will decode as a tuple
    case .string:
      var valueContainer = container.valueDecoder.tupleContainer(count: 1)
      self = try .string(string: valueContainer.decode(String.self, label: "string"))
    case .tuple:
      var valueContainer = container.valueDecoder.tupleContainer(count: 2)
      let value_0 = try valueContainer.decode(Int.self, label: nil)
      let value_1 = try valueContainer.decode(String.self, label: nil)
      self = .tuple(value_0, value_1)
    case .tupleWithLabel:
      var valueContainer = container.valueDecoder.tupleContainer(count: 2)
      /// Values are decoded in declaration order: the unlabeled Int, then the labeled String
      let value_0 = try valueContainer.decode(Int.self, label: nil)
      let value_1 = try valueContainer.decode(String.self, label: "label")
      self = .tupleWithLabel(value_0, label: value_1)
    }
  }
}

With this infrastructure in place, we can provide some of the more popular defaults as decoding strategies. We would still have a true default, but it would be much easier to customize with something like TupleEncodingStrategy. Critically, the default would be per-decoder, and the generated initializer would preserve the full type information. This way we can have different defaults for JSON and property lists if that makes sense. The synthesis for tuples could also extend to struct stored properties with tuple types.
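For context on the last point, here is roughly what has to be written by hand today for a struct with a tuple-typed stored property. This is only a sketch: the Marker type is made up for illustration, and storing the tuple as a two-element array is my assumption, not something the pitch prescribes.

```swift
import Foundation

// Hypothetical model: a struct with a tuple-typed stored property.
// Tuples are not Codable, so synthesis is unavailable and both
// directions must be written manually.
struct Marker: Codable {
  var name: String
  var position: (Int, Int)

  private enum CodingKeys: String, CodingKey { case name, position }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    name = try container.decode(String.self, forKey: .name)
    // Assumption: the tuple is stored as a two-element array.
    var tuple = try container.nestedUnkeyedContainer(forKey: .position)
    let x = try tuple.decode(Int.self)
    let y = try tuple.decode(Int.self)
    position = (x, y)
  }

  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: CodingKeys.self)
    try container.encode(name, forKey: .name)
    var tuple = container.nestedUnkeyedContainer(forKey: .position)
    try tuple.encode(position.0)
    try tuple.encode(position.1)
  }
}

let json = Data(#"{"name": "home", "position": [3, 4]}"#.utf8)
let marker = try JSONDecoder().decode(Marker.self, from: json)
// Encode and decode again to check that both directions agree.
let roundTripped = try JSONDecoder().decode(
  Marker.self, from: JSONEncoder().encode(marker))
```

A tuple container would let the compiler generate the equivalent of these two methods while leaving the array-versus-object choice to the coder.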

I haven't made any significant contributions to Swift yet, but I'm confident I could handle the Swift parts like modifying JSONDecoder if someone would like to collaborate on the compiler bits for a reference implementation.

Postscript: While I think something like this could fit nicely into Codable as it currently stands, it exacerbates some of the problems with Codable, most significantly the sheer amount of code it takes to create a custom Decoder. I'd love to eventually see an approach that simplifies creating custom Decoders, and maybe even addresses some of the underlying performance issues of Codable, but I wouldn't want to wait for that to have a good story for tuples and enums with associated values.

xwu · February 10, 2021, 11:07pm

The core team’s decision on SE-0295 isn’t even available yet: further discussion should wait for that feedback, after which it would be polite for anyone who wants to contribute any revisions (if any are requested by the core team) to collaborate with the original author.

George · February 10, 2021, 11:11pm

Whoops, thanks for the clarification. I will post in that thread.

UPDATE: It looks like that proposal was just returned for revision.

rpsm · February 20, 2021, 12:44pm

This is unfortunate, as it is actually quite a common scenario. I've got a project where associated values are extremely useful, but the entire model has to be codable. I've implemented the protocol and, yes, it's not difficult, but it's a lot of useless boilerplate code. My concern is that when this is finally adopted, it will be incompatible with my implementation and I'll have to handle migration.

George · February 20, 2021, 4:31pm

Can you elaborate on what you mean by "incompatible"? One of the ways in which this pitch differs from previous pitches is that it would allow us to have a TaggedContainerDecodingStrategy which can be set on JSONDecoder (or other Decoders). To see how this helps, assume two projects exist and have manually implemented coding for enums with associated values: Project A encodes them as { "<type>": <payload> } and Project B encodes them as { "type": "<type>", "payload": <payload> }. With the new container types, we would be able to define two strategies (pardon the naming), tagAsKey and typeAndPayload, which would allow the same synthesized initializer to work in both projects (with each project explicitly setting its coding style on the JSONDecoder it uses).
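To make the two layouts concrete, here is a rough emulation using only today's API. It is a sketch under stated assumptions: TaggedStrategy, the taggedStrategy user-info key, AnyKey, and Shape are all hypothetical names, and I am using the decoder's userInfo dictionary as a stand-in for a real strategy property on JSONDecoder.

```swift
import Foundation

// Strategy names are taken from the post; the type itself is hypothetical.
enum TaggedStrategy { case tagAsKey, typeAndPayload }

extension CodingUserInfoKey {
  // Hypothetical key; `userInfo` stands in for a real decoder setting.
  static let taggedStrategy = CodingUserInfoKey(rawValue: "taggedStrategy")!
}

// A coding key that accepts any string.
struct AnyKey: CodingKey {
  var stringValue: String
  var intValue: Int? { nil }
  init(_ string: String) { stringValue = string }
  init?(stringValue: String) { self.stringValue = stringValue }
  init?(intValue: Int) { return nil }
}

enum Shape: Decodable, Equatable {
  case circle(Double)
  case label(String)

  // One initializer serves both layouts; only the decoder setting differs.
  init(from decoder: Decoder) throws {
    let strategy = decoder.userInfo[.taggedStrategy] as? TaggedStrategy ?? .tagAsKey
    let container = try decoder.container(keyedBy: AnyKey.self)
    let tag: String
    let payloadKey: AnyKey
    switch strategy {
    case .tagAsKey:  // { "<type>": <payload> }
      guard container.allKeys.count == 1, let key = container.allKeys.first else {
        throw DecodingError.dataCorrupted(.init(
          codingPath: decoder.codingPath, debugDescription: "Expected exactly one key"))
      }
      tag = key.stringValue
      payloadKey = key
    case .typeAndPayload:  // { "type": "<type>", "payload": <payload> }
      tag = try container.decode(String.self, forKey: AnyKey("type"))
      payloadKey = AnyKey("payload")
    }
    switch tag {
    case "circle": self = try .circle(container.decode(Double.self, forKey: payloadKey))
    case "label": self = try .label(container.decode(String.self, forKey: payloadKey))
    default:
      throw DecodingError.dataCorrupted(.init(
        codingPath: decoder.codingPath, debugDescription: "Unknown tag \(tag)"))
    }
  }
}

let decoderA = JSONDecoder()  // Project A
decoderA.userInfo[.taggedStrategy] = TaggedStrategy.tagAsKey
let decoderB = JSONDecoder()  // Project B
decoderB.userInfo[.taggedStrategy] = TaggedStrategy.typeAndPayload

let a = try decoderA.decode(Shape.self, from: Data(#"{"circle": 2.5}"#.utf8))
let b = try decoderB.decode(
  Shape.self, from: Data(#"{"type": "circle", "payload": 2.5}"#.utf8))
```

With a tagged container, the switch over the strategy would move out of the initializer and into the decoder, which is what would let the synthesized code stay identical across projects.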

Also, manually implemented initializers should always take precedence over compiler-synthesized ones so no solution to this issue should require folks to migrate existing implementations (unless they want to transition to synthesized initializers).

rpsm · February 22, 2021, 1:25pm

Yes, I understand, but no doubt choosing a strategy is more prone to error, and remember there are other encoders, not just JSON, so this now becomes an obligation on each to expose in the same way. From what I can see, this is not the same with other strategies, like date, where the Property List encoder doesn't need to specify how dates are stored. Here, if I use one strategy now, on a property list, it means all encoders will need to forever specify strategies. This looks like a problem to me, as it blurs the separation between the data being encoded and the format of the encoder.

George · February 22, 2021, 5:02pm

Codable is already designed in a way that prioritizes ease of use during coding over the complexity of implementing a custom coder (because the former happens much more frequently), so I don't see a major issue with adding more complexity there. I agree that not all coders will include this strategy: some serialization formats might have explicit support for tagged unions and would thus benefit from the type information being preserved until it gets to the custom coder.

From what I've seen, a strategy is the correct level for making this kind of decision, since most likely all tagged unions within a particular serialization (for instance, a single JSON response) are encoded the same way.

We do have an issue with existing coders, which would not implement the container(taggedBy:) method and associated functionality. This can be pretty easily addressed by providing a default implementation that falls back to KeyedDecodingContainer or throws an error.
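As a rough idea of what the keyed fallback could look like, here is a sketch written against today's Decoder protocol. Everything named here is hypothetical: this CodingTag protocol is a local stand-in for the pitched one, fallbackTaggedContainer is a made-up helper rather than the real container(taggedBy:), and the { "<tag>": <payload> } layout is just one possible default.

```swift
import Foundation

// Hypothetical stand-in for the pitched `CodingTag` protocol.
protocol CodingTag {
  init?(stringValue: String)
  var stringValue: String { get }
}

// A coding key that accepts any string, so the tag can be read from the key.
struct AnyKey: CodingKey {
  var stringValue: String
  var intValue: Int? { nil }
  init?(stringValue: String) { self.stringValue = stringValue }
  init?(intValue: Int) { return nil }
}

extension Decoder {
  // Hypothetical fallback: expects { "<tag>": <payload> } and returns the tag
  // together with a decoder for the payload.
  func fallbackTaggedContainer<Tag: CodingTag>(
    taggedBy type: Tag.Type
  ) throws -> (tag: Tag, valueDecoder: Decoder) {
    let keyed = try container(keyedBy: AnyKey.self)
    guard keyed.allKeys.count == 1, let key = keyed.allKeys.first,
          let tag = Tag(stringValue: key.stringValue) else {
      throw DecodingError.dataCorrupted(.init(
        codingPath: codingPath,
        debugDescription: "Expected exactly one key naming a known tag"))
    }
    return try (tag, keyed.superDecoder(forKey: key))
  }
}

// Usage with a hand-written initializer shaped like the synthesized one.
enum Foo: Decodable, Equatable {
  case int(Int)
  case string(String)

  enum Tag: String, CodingTag {
    case int, string
    init?(stringValue: String) { self.init(rawValue: stringValue) }
    var stringValue: String { rawValue }
  }

  init(from decoder: Decoder) throws {
    let (tag, valueDecoder) = try decoder.fallbackTaggedContainer(taggedBy: Tag.self)
    let value = try valueDecoder.singleValueContainer()
    switch tag {
    case .int: self = try .int(value.decode(Int.self))
    case .string: self = try .string(value.decode(String.self))
    }
  }
}

let foo = try JSONDecoder().decode(Foo.self, from: Data(#"{"int": 42}"#.utf8))
```

A coder that knows about tagged unions would replace this default with its native representation, while older coders would keep working through the keyed path.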