Codable synthesis for enums with associated values

drexin · October 27, 2020, 12:17am

Introduction

Codable was introduced in SE-0166
with support for synthesizing Encodable and Decodable conformance for
class and struct types that contain only values that also conform
to the respective protocols.

Motivation

Currently auto-synthesis does not work for enums with associated values.
There have been discussions about it in the past, but the concrete structure
of the encoded values was never agreed upon. We believe that having a solution
for this is an important quality of life improvement.

Proposed solution

The following enum with associated values

enum Command: Codable {
  case load(key: String)
  case store(key: String, value: Int)
}

would be encoded to

{
  "load": {
    "key": "MyKey"
  }
}

and

{
  "store": {
    "key": "MyKey",
    "value": 42
  }
}

The top-level container contains a single key that matches the name of the enum
case. That key points to a nested container that holds the associated values,
encoded as they would be for structs and classes.
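
To make the proposed layout concrete, here is a minimal hand-written sketch of a conformance that produces the JSON shown above. It is only an illustration of the structure, not the code the compiler would generate; the nested key types LoadKeys and StoreKeys are names assumed for this example.

```swift
import Foundation

enum Command: Codable, Equatable {
  case load(key: String)
  case store(key: String, value: Int)

  // Top-level keys: one per enum case.
  enum CodingKeys: String, CodingKey { case load, store }
  // Keys for each case's associated values (names assumed for this sketch).
  enum LoadKeys: String, CodingKey { case key }
  enum StoreKeys: String, CodingKey { case key, value }

  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: CodingKeys.self)
    switch self {
    case .load(let key):
      var nested = container.nestedContainer(keyedBy: LoadKeys.self, forKey: .load)
      try nested.encode(key, forKey: .key)
    case .store(let key, let value):
      var nested = container.nestedContainer(keyedBy: StoreKeys.self, forKey: .store)
      try nested.encode(key, forKey: .key)
      try nested.encode(value, forKey: .value)
    }
  }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    // Exactly one key naming the case is expected at the top level.
    guard container.allKeys.count == 1, let caseKey = container.allKeys.first else {
      throw DecodingError.dataCorrupted(DecodingError.Context(
        codingPath: decoder.codingPath,
        debugDescription: "Expected exactly one key naming the enum case"))
    }
    switch caseKey {
    case .load:
      let nested = try container.nestedContainer(keyedBy: LoadKeys.self, forKey: .load)
      self = .load(key: try nested.decode(String.self, forKey: .key))
    case .store:
      let nested = try container.nestedContainer(keyedBy: StoreKeys.self, forKey: .store)
      self = .store(key: try nested.decode(String.self, forKey: .key),
                    value: try nested.decode(Int.self, forKey: .value))
    }
  }
}
```

Rejecting input with more than one top-level key is a choice made for this sketch; the pitch itself does not say how such input should be handled.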

Associated values can also be unnamed, in which case they will be encoded into an
array instead (this needs to happen even if only one of the values is unnamed):

enum Command: Codable {
  case load(String)
  case store(String, value: Int)
}

would be encoded to

{
  "load": [
    "MyKey"
  ]
}

and

{
  "store": [
    "MyKey",
    42
  ]
}
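
The array form can be sketched by hand in the same way, using a nested unkeyed container under the case key. Again, this is only an illustration of the layout described above under my own assumptions, not the synthesized implementation.

```swift
import Foundation

enum Command: Codable, Equatable {
  case load(String)
  case store(String, value: Int)

  // Top-level keys: one per enum case.
  enum CodingKeys: String, CodingKey { case load, store }

  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: CodingKeys.self)
    switch self {
    case .load(let key):
      var nested = container.nestedUnkeyedContainer(forKey: .load)
      try nested.encode(key)
    case .store(let key, let value):
      // The label on `value` is dropped: positions identify the values.
      var nested = container.nestedUnkeyedContainer(forKey: .store)
      try nested.encode(key)
      try nested.encode(value)
    }
  }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    if container.contains(.load) {
      var nested = try container.nestedUnkeyedContainer(forKey: .load)
      self = .load(try nested.decode(String.self))
    } else {
      // Throws keyNotFound if neither case key is present.
      var nested = try container.nestedUnkeyedContainer(forKey: .store)
      let key = try nested.decode(String.self)
      let value = try nested.decode(Int.self)
      self = .store(key, value: value)
    }
  }
}
```

Values are decoded in declaration order, so reordering the associated values of a case would change the encoded format.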

This solution closely follows the default behavior of the Rust library serde.

itaiferber · October 27, 2020, 1:06am

Happy to see a pitch in this area! I think this is a good start in the direction of having something in place for enums which has been missing for a long time. I'm wondering about a few cases here which are not explicitly discussed in the pitch:

What about enum cases without associated values? Are those still encoded as keyed containers, or single values? For instance, .load in
```
enum Command: Codable {
    case load
    case store(key: String, value: Int)
}
```
What about enum cases that share a name? For instance, both cases of
```
enum Foo {
    case bar(String, id: Int)
    case bar(String, value: Double)
}
```
could end up with the same encoded representation ({"bar": ["abc", 123]} could match either). Would the synthesized initializer try cases in order, or how would that work?
It would be nice to see some Alternatives Considered/Future Directions with some discussion about other approaches. For instance, from a lot of discussion in Automatic Codable conformance for enums with associated values that themselves conform to Codable and similar threads, we know that different folks have differing opinions on what they would expect to be a default implementation based on their needs.
- As part of that, I'd be curious to see some discussion about potentials for customization of synthesis (whether planned/declared to be an anti-goal/etc.), primarily, strategies for using a discriminator vs. not

drexin · October 27, 2020, 6:59pm

Those are very good points, thanks for bringing them up.

For cases without associated values, I think right now we have the following options:

- Use the same structure as for the other cases, i.e. { "load": {} } or { "load": [] }
- Make it a string, i.e. "load"

Using the default structure would certainly make code generation easier. Also, there is currently no way to check whether a key within a container points to a value or a nested container, so in this case we would need to call container.nestedContainer and catch the error if the nested container doesn't exist. It would certainly be nicer to have a version of this function that returns an optional, so maybe that is something to consider adding.
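
As a rough sketch of what such an optional-returning variant could look like: nestedContainerIfPresent below is a hypothetical helper, not an existing API, and Probe is a made-up type that only exists to exercise it.

```swift
import Foundation

extension KeyedDecodingContainer {
  /// Hypothetical helper (not part of the standard library): returns nil
  /// instead of throwing when no nested keyed container can be obtained.
  func nestedContainerIfPresent<NestedKey: CodingKey>(
    keyedBy type: NestedKey.Type,
    forKey key: Key
  ) -> KeyedDecodingContainer<NestedKey>? {
    try? nestedContainer(keyedBy: type, forKey: key)
  }
}

// Made-up type that records whether "load" points to a nested container.
struct Probe: Decodable {
  let hasNestedLoad: Bool

  enum CodingKeys: String, CodingKey { case load }
  enum LoadKeys: String, CodingKey { case key }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    hasNestedLoad = container.nestedContainerIfPresent(
      keyedBy: LoadKeys.self, forKey: .load) != nil
  }
}
```

Note that try? makes a missing key and a value of the wrong kind look the same, so a real API would probably want to distinguish the two.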

I think one thing that speaks against using raw values as the default behavior is consistency.

For cases that share a name, I'm not sure there is a good way to represent this, and I'm also not sure this is a very common case for serialization. I'm open to suggestions on this one, but I am leaning towards disallowing this for auto-synthesis for now.
I'll add some thoughts on the alternatives to the pitch later. Thanks for pointing this out.

itaiferber · October 27, 2020, 8:20pm

Thanks for the consideration! I think there's a lot of potential here. Some additional thoughts:

I'm not entirely sure what you mean by there being no way to check whether a key points to a value or a nested container; do you mind elaborating? As in, determining the difference between "load" and {"load": ...} directly from the Decoder instance itself, or something else?

Although I agree that consistency is definitely nice, one thing to consider is the evolution of enums over time. Code which uses an enum like

enum Foo: Codable {
    case bar
    case baz
}

If it later adds a

case quux(String)

it would no longer be able to decode previously-encoded .bar and .baz values, which could have encoded as single values. It might be surprising that the addition of one enum case with associated values would change the encoded representation of all of the other values too (and potentially silently at that).

Although I've seen enums like this in the wild, I mostly brought this up as something worth explicitly calling out in the pitch and the implementation. I don't necessarily think there needs to be a different representation for these two cases, just clearly spelled-out rules about what would happen (e.g., enum cases like this are attempted in order, always). I think disallowing these altogether might introduce a bit more pain than is needed, but just my opinion.

drexin · October 27, 2020, 9:15pm

Yes, sorry. I mean that if I have a KeyedDecodingContainer, I can't determine in a non-throwing manner whether there is a nested container under a given key. For values there are the decodeIfPresent functions, but for nested containers there is only the throwing function.

itaiferber:

Although I agree that consistency is definitely nice, one thing to consider is the evolution of enums over time. Code which uses an enum like
enum Foo: Codable {
    case bar
    case baz
}
If it later adds a
case quux(String)
it would no longer be able to decode previously-encoded .bar and .baz values, which could have encoded as single values. It might be surprising that the addition of one enum case with associated values would change the encoded representation of all of the other values too (and potentially silently at that).

Yes, that is another point in favor of always using containers instead of raw values.

I think disallowing this would be better than silently running into the wrong case. If we can find a good way to represent these cases in the future, support can still be added, but it's hard to change once it's in.

itaiferber · October 28, 2020, 3:14pm

Got it, agreed. The way to do this would be to attempt to fetch a nested container and catch an error if that failed. Synthesis could at least be smart about this and attempt only the containers expected based on the types of enum cases (e.g. if all enum cases are labeled, there's no need to attempt an unkeyed container).

I was going to say that I'm not sure I agree (I was trying to make the case for the opposite side) but realized that my concern is unfounded. To be a bit clearer, I was concerned that

Attempting to encode enum Foo: Codable today would encode .bar and .baz as simple strings, but with this proposal, the encoding format would change
Along those lines, adding another case with associated values (not possible today) could risk changing the encoding format

I realized after posting that enum Foo: Codable as expressed above doesn't work today because Foo is not RawRepresentable, and if Foo were made RawRepresentable to compile today, you couldn't add a case with associated values without dropping the raw value anyway.
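
For reference, a small sketch of the raw-value behavior referred to here: once Foo is given a String raw type, the existing synthesis encodes each case as a single value. The values are wrapped in an array only so that the example does not depend on top-level fragment support in JSONEncoder.

```swift
import Foundation

// Foo made RawRepresentable so that it gets Codable synthesis today.
enum Foo: String, Codable {
  case bar
  case baz
}

// Each case is encoded as its raw value, i.e. a plain string.
let data = try! JSONEncoder().encode([Foo.bar, Foo.baz])
let json = String(decoding: data, as: UTF8.self)
print(json)  // ["bar","baz"]
```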

So to sum up:

Having cases with no associated values encode as keyed containers won't change any behavior from today
Having them encode as keyed containers wouldn't change the encoding format if you add additional cases with associated values

I'm on board with this direction.

swhitty · November 14, 2020, 12:05am

Considering the similarities to tuples, it may make sense to reduce the scope to only provide synthesis for cases with single associated values. I would still find this very useful.

enum Action: Codable {
  case close
  case web(Auth)
  case web(link: Link)

  // do not support until tuples also have a synthesis solution 
  case both(auth: Auth, link: Link)
}

struct Auth: Codable { }
struct Link: Codable { }

Developers would naturally expect to be able to override the synthesis of CodingKeys like they can today with structs:

enum Action: Codable {
  case close
  case web(Auth)
  case web(link: Link)

  enum CodingKeys: String, CodingKey {
    case web = "auth"
    case webLink = "link"
  }
}