SE-0295: Codable synthesis for enums with associated values

Dante-Broggi · December 2, 2020, 6:02pm

Do keep in mind that we still ought to distinguish:
(foo: "bar", 42) and (foo: "bar", `_`: 42)
and the enum cases based on these.

wowbagger · December 2, 2020, 6:09pm

I suppose

(foo: "bar", `_`: 42)

can be encoded to

[
    {"foo": "bar"},
    {"`_`": 42}
]

Although, this does make it inconsistent with the synthesis for other types that don't need to worry about labels.

EDIT: Just gave it another thought,

(foo: "bar", 42)

encoding to

[
    {"foo": "bar"},
    {"": 42}
]

and

(foo: "bar", _: 42)

encoding to

[
    {"foo": "bar"},
    {"_": 42}
]

is better

George · December 2, 2020, 6:16pm

We would also need to distinguish between the following:

  case foo(_: Int)
  case foo(bar: Int)

which wouldn't be a problem in my reduced-scope suggestion since you can't write

  case foo(Foo)
  case foo(Bar)

as it is considered a redeclaration of an enum case

drexin · December 2, 2020, 7:16pm

JSONSerialization has a an option to allow fragments, but yes, PropertyListSerialization does not allow that. So I think keeping it an object is the more compatible variant.

beccadax:

This, on the other hand, will definitely break with some coders. For example, the plist format doesn't have a way to write null, so this:
import Foundation
struct HasOptional: Encodable { var value: Int? = nil }
let plist = PropertyListEncoder()
plist.outputFormat = .xml
print(String(decoding: plist.encode(HasOptional()), as: UTF8.self))
Prints this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict/>
</plist>
I don't think any coder would have trouble distinguishing true from a missing key; that's why I chose that value. (I could totally imagine coders having trouble with false, null, 0, "", or even [] or {}, though.)

That is not the same structure es we would use for the enums, though. That would be:

import Foundation

enum Foo: Encodable {
  case bar

  enum CodingKeys: CodingKey {
    case bar
  }

  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: CodingKeys.self)
    try container.encodeNil(forKey: .bar)
  }
}

let plist = PropertyListEncoder()

plist.outputFormat = .xml

print(String(decoding: try! plist.encode(Foo.bar), as: UTF8.self))

which results in the following output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>bar</key>
	<string>$null</string>
</dict>
</plist>

Since nil values are allowed in these positions in Codable, wouldn't it be the codec's responsibility to handle it properly?

drexin · December 2, 2020, 9:57pm

George:

We would also need to distinguish between the following:
  case foo(_: Int)
  case foo(bar: Int)
which wouldn't be a problem in my reduced-scope suggestion since you can't write
  case foo(Foo)
  case foo(Bar)
as it is considered a redeclaration of an enum case

As mentioned above, I would prefer to explicitly disallow duplicate case names for the synthesis. The goal of the proposal is to cover the most common cases in a reasonable way. That means finding the balance between restricting it to special cases and covering all possible cases.

drexin · December 2, 2020, 10:15pm

Thank you

The KeyEncodingStrategy and KeyDecodingStrategy are specific to JSONEncoder and JSONDecoder and apply in the same way as they do for other types. Codable itself (or the synthesis) does not have any influence on that, so I don't think it would contribute to the clarity of the proposal to talk about features of specific codec implementations. The important part is that case names and labels (for cases that have no unlabeled parameters) are represented as CodingKeys, so the same rules as for the existing cases apply.

Morten_Bek_Ditlevsen:

I would like to echo a few of the comments of @brentdax and throw in a small question:

In case of:
enum Command: Codable {
  case load
}
If this were to encode as in one of the suggestions by @brentdax as:
{
  "load": true
}
(here I agree that true is the correct choice of value for the same reasons that Brent gives)

Then load is clearly a key and the encoding of the key should follow the keyEncodingStrategy of the (json) encoder.

If we chose to encode it without a keyed container as simply:
"load"
Then it is no longer a key and would not respect the keyEncodingStrategy.
It might be confusing for users to understand when a case might be a value and when it might be a key.

To me this confusion suggests the following rule:
If an enum is made Codable, then all of it's cases should encode (and decode) to (and from) keyed containers using the case name as the key.

Yes, I absolutely agree with this. I think there could be more customization options in the future to allow for the other case as well, but for this particular proposal I think consistency is the better choice.

Morten_Bek_Ditlevsen:

A question (where I feel I know the answer, but would like it clarified):
In case of the following:
enum Command: String, Codable {
  case load
}
Here the compiler would use the existing Codable conformance of RawRepresentable (String) instead of synthesizing this newly suggested conformance, right? Otherwise this would be a behavior breaking change.

Correct. The existing cases are not affected in any way.

Morten_Bek_Ditlevsen:

I'd very much like to echo the suggestion from others that single unlabeled associated values should encode directly as values with the case as the key:
enum Command: Codable {
  case load(String)
}
encodes as:
{
  "load": "my associated value"
}
Like @George , I've written many, many manual Codable enum conformances exactly in this way.

If this should not be the default, then I'd at least hope for some customization point that can give this behavior.

Yes, I also think that this is reasonable.

I still think that falling back to UnkeyedContainer for the mixed label case is reasonable. Is there a specific reason you'd prefer it to be rejected?

For the duplicated names I agree that it should be rejected and it should have been part of the proposal, but it slipped my mind while writing the proposal.

Morten_Bek_Ditlevsen · December 3, 2020, 7:57am

Regarding the KeyEncodingStrategy (etc), you're of course right that this isn't affected by the proposal, and that it will be used 'automatically', so I agree that there's no reason to point it out in the proposal.

Well, this is just my own humble opinion of where to 'draw the line'.

My reasoning is:

There is a choice between trying to handle any enum or a subset of all enums. Trying to handle any enum would likely result in a very complex encoded structure, that would not feel 'natural' for many users.

So it appears that the choice of only handling a subset of all enums has already been made (leaving out duplicate case names, for instance).

And when this choice has been made, then the next choice is where to 'draw the line'.

And this is where my personal taste / gut feeling comes in:

I don't think that I would ever write a custom encoding of an enum case with a mix of labelled and unlabelled values into an UnkeyedContainer. I just don't think it would be natural for me to do. If I encountered an enum case with this mix, I think that I would 'key' them in the output - most likely explicitly be adding an intermediate Codable struct carrying all the values and thus having it encoded into a KeyedContainer. Doing this manually of course allows me to give unique and meaningful names to the keys, so it's not something that can likely be done automatically.

I also don't think that I have encountered many enums with a mix of labelled and unlabelled values in the code bases I have been working on. Perhaps it would be worth a survey to try and figure out how many of them are used in general, how many of these are Codable and how the Codable respresentation tends to be implemented. I think someone built a tool to automatically test for occurrences of patterns in swift code across many public swift projects on github, but I guess it would be a huge task to do.

If I did have an enum with this mix, and I found that it would encode it's associated values into an array (or UnkeyedContainer), then I think I would prefer to 'fix' the enum case and add labels to all values and thus have it encoded into a KeyedContainer instead. And in that respect it would actually be more helpful to me if the enum could not have Codable synthesized before it was 'fixed'.

Ok, I know that the above is very highly a matter of personal taste, but there is definitely a balance between a completely general format that can encode any enum (though because of the generality perhaps not in the most 'natural' way) and a more specialized format that only works for a subset of enums (but allows encodings in a way that may feel more natural in each of the specialized, supported cases).

Would it be worth to try and formalize this 'balance' into a Goal section of the proposal?

Is it a goal to synthesize Codable support for any enum?
Is it a goal to solve many use cases by using a very general format? (in this case I think that the UnkeyedContainer approach is the most general and ought to be applied for all supported cases, but I am personally not a fan of the encoded output.)
Is it a goal to solve fewer specific use cases with specific rules? (perhaps like the three cases I tried implementing in the Codable conformance in my previous post )
Is it a goal to solve some specific use cases with special formats and then 'fall back' to the more generalized format in the rest of the cases? (in other words, 3. but falling back to 2. for cases with a mix of labelled an unlabelled values).
What might future customization points look like (other than the key-mapping ones presented in the proposal)?

I know that these are not easy questions and many people may have different preferences.

Apologies for the long rant.

drexin · December 3, 2020, 6:46pm

Morten_Bek_Ditlevsen:

Would it be worth to try and formalize this 'balance' into a Goal section of the proposal?

Is it a goal to synthesize Codable support for any enum?

Is it a goal to solve many use cases by using a very general format? (in this case I think that the UnkeyedContainer approach is the most general and ought to be applied for all supported cases, but I am personally not a fan of the encoded output.)

Is it a goal to solve fewer specific use cases with specific rules? (perhaps like the three cases I tried implementing in the Codable conformance in my previous post )

Is it a goal to solve some specific use cases with special formats and then 'fall back' to the more generalized format in the rest of the cases? (in other words, 3. but falling back to 2. for cases with a mix of labelled an unlabelled values).

What might future customization points look like (other than the key-mapping ones presented in the proposal)?

I know that these are not easy questions and many people may have different preferences.

Apologies for the long rant.

No worries, this is exactly why we are doing the reviews - to get feedback from the community . I really appreciate you and everyone else taking the time to comment here.

I think it makes sense to add this information to the proposal, but let me try to answer your questions:

The goal is to find a reasonable default, meaning that not all cases (e.g. duplicate names) will be supported, but as many cases as we can reasonably support will be. This also means finding reasonable structures for these cases. If all parameters are labeled, I think most users would expect the labels to be used as keys. If there are no labels, it seems reasonable to use an UnkeyedContainer. Since we can't make up keys for any unlabeled parameters, but we still want to support as many cases as possible, it seems reasonable to fall back to the UnkeyedContainer for the mixed case. I don't think there is one layout that will fit everyone's use cases / preferences, but the proposed layout (with the changes already discussed) should cover many of them.

As for future customization points, I'd be happy to discuss that, but since it's out of scope of this proposal, I think that should be a separate discussion, so we don't distract from the topic at hand.

George · December 3, 2020, 7:33pm

I don't think this is true in the single unlabeled parameter case. I've written and seen many implementations of Codable for enums and not one of them encoded case foo(Bar) as "foo": [ <encoding of Bar> ], instead providing the encoding of Bar directly under the key ("foo": <encoding of Bar>). I don't think case foo(<fields of Bar>) is a "good enough" representation of this very frequent use case, since having a Bar type is often very helpful (for instance, when customizing decoding for just that one case using init(from decoder: Decoder)).

For the record, I think this was acknowledged this in an earlier post, but I wanted to make sure this wasn't being walked back. If I'm understanding correctly, is the current proposal that enums with a single associated type be encoded directly (i.e. load(String) encodes as "load": "string"), but with multiple unlabled associated types encodes as an array (i.e. load(String, String) encodes as "load": ["string", "string"])?

Here, I would still prefer the multiple-unlabeled case to simply prevent automatic synthesis as it seems a little weird to me that changing load(String) to load(String, Int) is a more breaking change than changing load(String, Int) to load(String, Int, Int). Though this isn't the biggest deal for me, as I don't expect I would ever use that form.

drexin · December 3, 2020, 8:38pm

George:

drexin:

If there are no labels, it seems reasonable to use an UnkeyedContainer .

I don't think this is true in the single unlabeled parameter case. I've written and seen many implementations of Codable for enums and not one of them encoded case foo(Bar) as "foo": [ <encoding of Bar> ], instead providing the encoding of Bar directly under the key ("foo": <encoding of Bar>). I don't think case foo(<fields of Bar>) is a "good enough" representation of this very frequent use case, since having a Bar type is often very helpful (for instance, when customizing decoding for just that one case using init(from decoder: Decoder)).

For the record, I think this was acknowledged this in an earlier post, but I wanted to make sure this wasn't being walked back. If I'm understanding correctly, is the current proposal that enums with a single associated type be encoded directly (i.e. load(String) encodes as "load": "string"), but with multiple unlabled associated types encodes as an array (i.e. load(String, String) encodes as "load": ["string", "string"])?

Sorry for being imprecise in that response. Yes, I agree that representing single unlabeled parameter cases as the raw value instead of an array makes sense.

As I said earlier, I think there is no one solution that fits all use cases. I think the multiple unlabeled case is useful and I would prefer to keep it in.

anandabits · December 4, 2020, 2:42am

I like most of the details of this proposal quite well. Thank you for working on it!

I was thinking the same when I read the proposal. +1 to this idea.

+1 to this idea as well. The _caseName convention does not feel right for Swift.

The other point of feedback I have is that I do not think unkeyed containers should be used for associated values. Instead, I think we should establish a convention for handling them and recommend use of the coding keys to override the default. The first idea that comes to mind for the convention is to use implicit positional names such as _0, _1, etc for unlabeled associated values.

By default, this enum:

enum Command: Codable {
  case load(String)
  case store(String, value: Int)
}

would encode a store value as:

{
    “store”: {
        “_0”: “some string”,
        “value”: 42
    }
}

To override this, you would declare:

enum CodingKeys: CodingKey {
    case load
    case store

    enum Store: String, CodingKey {
        case _0 = “keyForEncoding”
        case value
    }
}

Morten_Bek_Ditlevsen · December 4, 2020, 7:09am

I really like this approach to always encoding multiple-value cases into KeyedContainers!
I especially like how this allows for customization of key names for unlabelled cases. Nice!

In this case the synthesis 'rules' would be:

enums with multiple cases with the same case names will not have Codable conformance synthesized.
enums with cases with associated values where multiple labels have the same name will not have Codable conformance synthesized.
Any enum instance will encode into a KeyedContainer using the case name as the key - meaning that all cases will have a 'value' encoded (even cases with no associated values).
Cases with no associated values encode the value as a fixed value (true, [ ], {}, null, false, 1, 0)
Cases with one (unlabelled or both labelled and unlabelled?) value encode the value directly.
Cases with multiple values encode into KeyedContainers with unlabelled values mapping to the value position prefixed with an underscore - allowing easy customization of encoding and decoding of these keys.

I really like that!

5 should probably just be the unlabelled single value, but it could apply to all single value cases whether they have values or not.

For 4 I'd definitely prefer true as the fixed value following the reasoning of @beccadax.

Morten_Bek_Ditlevsen · December 4, 2020, 7:18am

In the above, 2 should of course also disallow enums with cases where there is a clash between the synthesized key names for unlabelled parameters and actual parameter labels:

enum Invalid {
  case a(String, _0: String)
}

Morten_Bek_Ditlevsen · December 4, 2020, 7:24am

Sample synthesized conformance:
Note that allLabelled, mix and singleLabelled are all synthesized by the same 'rule'. They are just all provided for clarification.

enum Value: Codable {
  case noValues
  case allLabelled(a: String, b: String)
  case singleUnlabelled(String)
  case singleLabelled(label: String)
  case mix(String, b: String, String, d: String)
}

Synthesizes something similar to the following:


extension Value {
  enum CodingKeys: String, CodingKey {
    case noValues
    case allLabelled
    case singleUnlabelled
    case singleLabelled
    case mix
  }

  enum AllLabelledCodingKeys: String, CodingKey {
    case a
    case b
  }

  enum MixCodingKeys: String, CodingKey {
    case _0
    case b
    case _2
    case d
  }

  enum SingleLabelledCodingKeys: String, CodingKey {
    case label
  }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    if container.contains(.noValues) {
      self = .noValues
    } else if container.contains(.singleUnlabelled) {
      self = try .singleUnlabelled(container.decode(String.self, forKey: .singleUnlabelled))
    } else if container.contains(.allLabelled) {
      let nested = try container.nestedContainer(keyedBy: AllLabelledCodingKeys.self, forKey: .allLabelled)
      self = try .allLabelled(
        a: nested.decode(String.self, forKey: .a),
        b: nested.decode(String.self, forKey: .b)
      )
    } else if container.contains(.mix) {
      let nested = try container.nestedContainer(keyedBy: MixCodingKeys.self, forKey: .mix)
      self = try .mix(
        nested.decode(String.self, forKey: ._0),
        b: nested.decode(String.self, forKey: .b),
        nested.decode(String.self, forKey: ._2),
        d: nested.decode(String.self, forKey: .d)
      )
    } else if container.contains(.singleLabelled) {
      let nested = try container.nestedContainer(keyedBy: SingleLabelledCodingKeys.self, forKey: .singleLabelled)
      self = try .singleLabelled(
        label: nested.decode(String.self, forKey: .label)
      )
    } else {
      throw NSError(domain: "", code: 0, userInfo: nil)
    }
  }

  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: CodingKeys.self)
    switch self {
    case .noValues:
      try container.encode(true, forKey: .noValues)
    case let .allLabelled(a, b):
      var nested = container.nestedContainer(keyedBy: AllLabelledCodingKeys.self, forKey: .allLabelled)
      try nested.encode(a, forKey: .a)
      try nested.encode(b, forKey: .b)
    case .singleUnlabelled(let value):
      try container.encode(value, forKey: .singleUnlabelled)
    case let .singleLabelled(label):
      var nested = container.nestedContainer(keyedBy: SingleLabelledCodingKeys.self, forKey: .singleLabelled)
      try nested.encode(label, forKey: .label)
    case let .mix(_0, b, _2, d):
      var nested = container.nestedContainer(keyedBy: MixCodingKeys.self, forKey: .mix)
      try nested.encode(_0, forKey: ._0)
      try nested.encode(b, forKey: .b)
      try nested.encode(_2, forKey: ._2)
      try nested.encode(d, forKey: .d)
    }
  }
}

hisekaldma · December 4, 2020, 1:49pm

I’m a strong -1 on this proposal as it is written currently.

The reason that SE-0166 left out encoding of enums with associated values is that unlike structs and RawRepresentable enums there is no obviously correct format for them. There have been a number of discussions about possible formats, e.g. in the thread linked to from the proposal, without a clear consensus emerging. As such, the problem to be solved here isn’t just to introduce a format for enums with associated values – but to introduce a format and show why that format is a better default than the many alternative formats that have been discussed, given the trade-offs involved. This proposal doesn’t do that.

One of the specific problems that I see with the suggested format is that it makes it hard to evolve schemas over time. In my experience it’s really common to want go from ”this can only be A” to ”this can be A or B”, i.e. from a struct to an enum. With the suggested format, this type of change is neither backward compatible nor forward compatible. With a format where the case name is encoded as its own key, this type of change can instead be made both backward compatible and forward compatible.

To illustrate, lets say we have a struct:

struct Foo: Codable {
  var a: Int
  var b: Int
}

This gets encoded like so:

{
  "a": 1,
  "b": 2
}

Now we want to evolve this to be an enum where the old Foo is one of several cases:

enum Bar: Codable {
  case foo(a: Int, b: Int)
  case baz(b: String)
}

With the proposed format, this would get encoded like so:

{
  "foo": {
    "a": 1,
    "b": 2
  }
}

This means that existing Foos encoded with earlier versions can’t be decoded as Bar.foos by the new version (the new version isn’t backward compatible) and that Bar.foos encoded with the new version can’t be decoded as Foos by earlier versions (earlier versions aren’t forward compatible).

If we instead encoded the name of the case as its own key, like so...

{
  "$case": "foo",
  "a": 1,
  "b": 2
}

...then Bar.foos encoded with the new version can automatically be decoded as Foos by earlier versions (they’re forward compatible). And if we default to the .foo case when the $case key is missing, then Foos encoded with earlier versions can be decoded as Bar.foos by the new version (it’s backward compatible).

It may well be that encoding the case name as its own key has other problems that the format proposed here doesn’t have. Perhaps the proposed format is really common among existing JSON APIs, and interoperability with those APIs is more important than the ability to evolve schemas over time? I don’t know. But whatever format we end up choosing will, by virtue of being the default, be used by many file formats and web services for years to come. That type of decision should rest on much stronger reasoning about the technical merits and trade-offs of the format than the proposal currently has.

At the very least, I think the proposal should compare the major alternative formats that have been suggested, and describe a) how easy or hard they make it to evolve schemas over time, and b) how common they are in the wild (both in handwritten Swift and in other languages’ serialization libraries, as well as in common APIs that Swift code might interoperate with).

Morten_Bek_Ditlevsen · December 4, 2020, 3:44pm

I really appreciate that the resulting encoding formats are challenged. As you note, the formats will be the natural first choice for many developers going forward.

I do not, however, think that forward/backwards compatibility between the two rather arbitrary formats in your example should influence the decision. One might easily come up with a structure that encodes exactly as an enumeration case with multiple values and thus conclude the opposite: namely that the chosen format is good for structural evolution. In truth neither are better for evolution because structures can evolve in any way imaginable - and thus it seems like it does not make a lot of sense to consider this case.

But I definitely agree that it would be good with some ‘evidence’ that the chosen encodings are better than others. I like the chosen ones since they overlap with my own experiences encoding enums, but that’s just a single personal observation.

hisekaldma · December 4, 2020, 4:18pm

I disagree that the example is arbitrary; in my experience this specific change – going from "only this" to "this or that" – is one of the most common changes made to data formats over time. But perhaps the stripped-down example that I gave makes it feel more arbitrary than it is? A more life-like example would be to go from:

struct Image: Codable {
  var resource: ImageResource
}

...to:

enum Media: Codable {
  case image(resource: ImageResource)
  case video(resource: VideoResource, startTime: Double)
}

I would also dispute this:

Both backward compatibility and forward compatibility are often important, and they both put hard constraints on how formats can evolve. For example, backward compatibility is often important when evolving file formats, since people don’t like losing their data. And forward compatibility is typically important when evolving web APIs where older clients must be able to communicate with newer versions of the API. As such, formats don’t tend to evolve in any way imaginable – they tend to evolve specifically in the ways that preserve compatibility, e.g. by adding optional keys or keys with default values, or by going from "only this" to "this or that". Swift’s own features for library evolution are a great example of this.

Morten_Bek_Ditlevsen · December 4, 2020, 4:23pm

Apologies, I didn’t mean ‘arbitrary’ as in ‘unrealistic’, but rather as just one of several ways in which structures may evolve. I’m not at a computer right now but I will give other examples later.

Jumhyn · December 4, 2020, 4:27pm

I think this strikes at the core of my discomfort with this proposal. Thank you for articulating all that, @hisekaldma. Historically, we've reserved protocol requirement synthesis for only those which have such "obviously correct" implementations. This philosophy guided the synthesis of Equatable and Hashable implementations, and precluded the inclusion of other protocols like AdditiveArithmetic and Comparable. I'm wary about including synthesis for implementations that have the goal of "just do it some way, doesn't matter which."

IMO, such synthesized implementations are better suited for a more generalized synthesis feature that would allow authors to specify exactly how a conformance should be derived (discussed some here for AdditiveArithmetic, but the ideas are more general than that one protocol, and have cropped up in other threads over the years).

drexin · December 4, 2020, 9:59pm

Thanks for your feedback. My thoughts on your proposed format are:

How would the synthesized decoder decide which case to pick as the default one?
What if the labels and types are the same for multiple cases and only the case name is different?
The format you propose would make it look identical to the old code and could cause unwanted effects.

I agree that evolution is important, but correctness is more important for the default behavior.