SE-0295: Codable synthesis for enums with associated values (second review)

jayton · January 25, 2021, 9:25am

woolsweater:

Along with everything else that's been said, the chosen encoding is highly ambiguous; unless I am misreading, these two have identical encodings!
enum E { case a(Bool), b([Int]) }
enum F { case a, b(UInt8, UInt8, UInt8) }
That kind of thing is a trap laid waiting for a program ingesting serialized data even from a nominally trusted source.

I’m generally dubious about this proposal for reasons that have already been stated, but this isn’t an actual problem. Codable is designed for duck-typed serialization formats; the fact that one representation can map to multiple types is inherent. For example, 3 in JSON might be decoded as an Int or a Double, while "http://example.com/"can be decoded as a String or a URL.

In this case, E is more general than F; an encoded E should be decodable as an F if it happened to have three integers in the range 0...255, and not otherwise.

Max_Desiatov · January 25, 2021, 9:42am

I hope that any new additions to Codable would avoid the strategies pattern. In my experience it has proven to be critically inflexible, even when a strategy is set up per encoder/decoder instance, not a static property.

It's relatively frequent for APIs to return dates in different formats even for different properties of the same entity, or for different entities. With strategies it's very cumbersome to customize coding per property or per type. The strategies pattern could have made sense when Codable was designed (what was it? Swift 2? Swift 3?), but right now in 2021 property wrappers are a much better approach in my opinion (see the BetterCodable library).

Now when reading your code again, I see that static property is defined on the codable type itself, I still think it's relatively cumbersome. While property wrappers can't be attached to enum cases or associated values, I wonder if there could be a better way to specify customizations with them for this case.

gwendal.roue · January 25, 2021, 9:55am

I agree this is a real question, and that years have shaped our experience of Codable. You can see the follow-up question, and I'm very interested in the point of view of the designers of Codable, who did a tremendous job. On the topic of property wrappers: they sure are handy. But don't they have their own limitations? For example, since they are usually embedded as a property of a container type, how can we deal with a top-level value?

Max_Desiatov · January 25, 2021, 9:56am

And yet, sadly, the proposal and both review threads focused mostly if not exclusively on JSON.

I agree that setting these properties on coder instances is probably right in principle for complete separation of "model" types and coding formats. And yet, I don't see it working really well in practice. Especially when there's a need to customize a single property of a single type in a deeply nested tree of coded entities. In my practice this is much more common than sharing the same codable type between multiple target formats

Doing that with the current strategies implementation is incredibly cumbersome. Your approach with a static property on codable types is actually much better from this perspective.

gwendal.roue · January 25, 2021, 10:45am

Thank you ;-) But I wouldn't want anyone to think that I'm keen to throw a core design decision about Codable away, and call it a day. Since the beginning, the language, the standard library, and Foundation have spoken together with a single voice. We can't introduce a discordant api lightly, because the consequences of a paradigm schism are difficult to foresee.

itaiferber · January 25, 2021, 2:44pm

There are a few things to unpack in this thread, so I hope I manage to get them all.

Not a direct response to what you wrote, but for context: I think it's important to call out that what this pitch is proposing is a strategy for Codable synthesis for enums, so we're talking about compile-time behavior and not necessarily runtime behavior (which is separate, and not incompatible). In scope, we're looking at a way to influence how a type implements init(from:) and encode(to:), and are not able to influence how a property encodes — property wrappers (and strategies) are a nice solution, but out of scope here.

To discuss synthesis specifically — a while back I had a similar suggestion, along slightly different lines:

Automatic Codable conformance for enums with associated values that themselves conform to Codable

For instance, one straw-man approach to this is to generate both implementations and allow someone to use either approach via a property, e.g.
enum SynthesizedEnumDiscriminatorType {
    // No discriminator -- attempt cases in order single-value
    case none

    // Use the case name as the value-containing key, e.g.
    // {"album": {"id": "abc"}} or {"podcast": {"id": "xyz"}}
    case discriminateByCaseName

    // Use the case name as a discriminator keyed by the given key, e.g.
    // with disciminateByCaseNameForKey(CodingKeys.object), {"object": "album", "id": "abc"}
    case discriminateByCaseNameForKey(CodingKey)
}

enum MyEnum {
    case string(String)
    case double(Double)

    // Synthesized
    static var codableDiscriminatorType: SynthesizedEnumDiscriminatorType {
        return .none
    }

    init(from decoder: Decoder) throws {
        switch MyEnum.codableDiscriminatorType {
            // ... use the strategies
        }
    }
}
Essentially, add another customization point by using a property similar to how CodingKeys influences the representation of a structured type. If you want to override the discriminator strategy used, provide a discriminator type via the enum (you can even make it mutable if we decide to generate all implementations and choose dynamically).

Given that enums span a wide range of use cases, I've always been in favor of offering customization for those use-cases. The (understandable) lack of consensus among how folks would prefer an enum encode was something that always prevented an easy implementation of this feature, and while I sympathize with the desire to get a default in place, I wish the pitch called this out a little more explicitly.

A static property would give us an "in" to inform the compiler of how you might expect the enum to encode and decode, in the same way that CodingKeys allow us to inform the compiler of expected synthesis for structural types.

So, my thoughts on this are: given thoughtful design of what customization options are possible, I'm obviously in favor. This design would effectively depart entirely from the pitch, which is, of course, entirely out of my hands.

Now, this design gives customization for a compile-time implementation of init(from:)/encode(to:), but that is entirely orthogonal to runtime overrides on an encoder/decoder or property level. Individual encoders and decoders could look for some info in .userInfo to override specific types (already possible today, done on a type-by-type basis with buy-in from the encoder/decoder), and property wrappers can be written to override enum properties. These are already possible today, though, and still require an existing Codable implementation, which the pitch intends to offer a default for.

A few unrelated thoughts on some things called out here:

Edge cases not covered by the pitch, such as case bar(number: Int, number: Int, number: String), case barbar(Void, (Void, Void)), and others: enums offer a surface area that other structural types do not (e.g. you can't have overloaded property names on a struct, but you can have repeated labels), and I think that regardless of future directions, we can come up with solutions to these, either by defining behavior, or preventing synthesis. Either way, it'd be nice to collect these edge cases in the proposal and explicitly call out what is to be done in each case
For "ambiguous" encoding of something like
woolsweater:
```
enum E { case a(Bool), b([Int]) }
enum F { case a, b(UInt8, UInt8, UInt8) }
```
@jayton hits the nail on the head: if E and F output the same serialized representation, you should be able to decode either from that representation. This matches existing behavior for all other types, e.g., [1, 42] can be decoded as any of [Int]/[Int8]/.../[Float]/[Double], but also CGPoint and CGSize, which encode two numeric properties in an unkeyed container. This is an intentional part of Codable design: the format should not prescribe what is possible to decode — that's part of the code itself
For historic reference, Codable was released with Swift 4:

Max_Desiatov:

The strategies pattern could have made sense when Codable was designed (what was it? Swift 2? Swift 3?), but right now in 2021 property wrappers are a much better approach in my opinion (see the BetterCodable library).

I'm happy to discuss this elsewhere as I don't want to derail conversation here, but property wrappers solve an entirely different problem from strategies — the difficulty you note in solving one problem with the other alludes to that. Had property wrappers existed in that time frame I'm sure Codable would have made thorough use of them, but I feel fairly confident we would have introduced strategies along with them.

hisekaldma · January 25, 2021, 6:35pm

-1. I’m still of the opinion that if Swift is going to embrace a single default for encoding enums a ”case as value” approach is a better default than a ”case as key” approach. The changes to the proposal since the first review don’t really change that. (And some great arguments have been made both in this thread and the previous thread that we in fact shouldn’t embrace a single default approach, but rather make it something that the user must choose explicitly.)

However, since the core team has specifically asked for input on the updated proposal from a schema evolution standpoint, I want to do a deep dive on that. Regardless of what approach we adopt, I think the out-of-the-box behavior should support changing schemas in backward/forward compatible ways as much as possible. If you slap Codable on an enum, you should get something that you can use immediately, but it should also be something that you can change when you have data or clients out there using it. And the proposed format is not it.

It’s a common principle when evolving schemas that adding a member with a default value is a backwards compatible change. In the proposed format, this isn’t generally true for associated values.

Let’s look at some scenarios in detail:

A. Adding an unlabeled value to a case without associated values

Not backwards compatible. The container doesn't change, but the value changes from a Bool to whatever type the associated value is. Data encoded with the old version will (probably?) throw if decoded with the new version.

	Swift	JSON
Old	`enum E { case test }`	`{ "test": true }`
New	`enum E { case test(Double = 0.0) }`	`{ "test": 2.0 }`

B. Adding an unlabeled value to a case with a single unlabeled value

Not backwards compatible. The container changes from a single value container to an unkeyed container, so data encoded with the old version will throw if decoded with the new version.

	Swift	JSON
Old	`enum E { case test(Int) }`	`{ "test": 1.0 }`
New	`enum E { case test(Int, Double = 0.0) }`	`{ "test": [1.0, 2.0] }`

C. Adding an unlabeled value to a case with multiple unlabeled values

Maybe backwards compatible. It depends on whether the synthesized init(from:) handles unkeyed containers with fewer elements than expected when some or all associated values have default values. The behavior here is unspecified in the proposal, so it’s hard to tell.

	Swift	JSON
Old	`enum E { case test(Int, Int) }`	`{ "test": [1.0, 1.0] }`
New	`enum E { case test(Int, Int, Double = 0.0) }`	`{ "test": [1.0, 1.0, 2.0] }`

D. Adding an unlabeled value to a case with labeled values

Not backwards compatible. The container changes from a keyed container to an unkeyed container, so data encoded with the old version will throw if decoded with the new version.

	Swift	JSON
Old	`enum E { case test(name: String) }`	`{ "test": { "name": "abc" } }`
New	`enum E { case test(name: String, Double = 0.0) }`	`{ "test": ["abc", 2.0] }`

E. Adding a labeled value to a case without associated values

Not backwards compatible. The container changes from a single value container to an keyed container, so data encoded with the old version will throw if decoded with the new version.

	Swift	JSON
Old	`enum E { case test }`	`{ "test": true }`
New	`enum E { case test(name: String = "") }`	`{ "test": { "name": "abc" } }`

F. Adding a labeled value to a case with a single unlabeled value

Not backwards compatible. The container changes from a single value container to an unkeyed container, so data encoded with the old version will throw if decoded with the new version.

	Swift	JSON
Old	`enum E { case test(Int) }`	`{ "test": 1.0 }`
New	`enum E { case test(Int, name: String = "") }`	`{ "test": [1.0, "abc"] }`

G. Adding a labeled value to a case with multiple unlabeled values

Maybe backwards compatible. It depends on whether the synthesized init(from:) handles unkeyed containers with fewer elements than expected when some or all associated values have default values. The behavior here is unspecified in the proposal, so it’s hard to tell.

	Swift	JSON
Old	`enum E { case test(Int, Int) }`	`{ "test": [1.0, 1.0] }`
New	`enum E { case test(Int, Int, name: String = "") }`	`{ "test": [1.0, 1.0, "abc"] }`

H. Adding a labeled value to a case with labeled values

Backwards compatible.

	Swift	JSON
Old	`enum E { case test(name: String) }`	`{ "test": { "name": "abc" } }`
New	`enum E { case test(name: String, category: String = "") }`	`{ "test": { "name": "abc", "category": "things" } }`

These scenarios highlight a couple of things:

Changing the type of container based on the payload is a really bad idea from a compatibility standpoint. The changes in the scenarios above would all be backward compatible if the synthesis always used a keyed container, and used "$0" or "_0" as keys for the unlabeled values, as has been suggested multiple times.
There is no underlying principle or general rule for which changes you can make to enums and maintain backward compatibility with the proposed format. In the previous version, you could at least go by ”adding a labeled value is always backward compatible”, but now that cases without associated values are encoded as true, even that doesn’t work.

In fact, if you’re working on something where backwards compatibility is an absolute must, the only sound advice seems to be ”if an enum has associated values, always implement Codable manually.” I can definitely see that becoming a lint rule in some projects.
The proposal still doesn’t specify how the synthesis will actually work in enough detail (e.g. how does decoding of unlabeled values with default values work?).

Drew_Crawford1 · January 25, 2021, 7:03pm

I am -1 on this proposal as written for reasons best expressed by @Max_Desiatov, this is not widely applicable enough to be a default.

I do feel the solution is strong enough we should ship it as an option. @sveinhal discusses how to use strategies to pick between these sorts of option, which at a very high level seems like the right approach to me, although since strategy is a runtime API configured at the decoder level this has many practical limitations.

However if the goal is to configure the autosynthesis at the type level, a natural way might be struct Foo: CaseAndValuesCodable { ... }. Alternatively, a "type wrapper" like @CaseAndValuesCodable struct Foo { ... } could work. (We'd implement @CaseAndValuesCodable like a builtin attribute now, but someday you could write your own typewrapper and get your own autosynthesis flavor, and we could continue to consider new builtin flavors in new proposals).

IMO getting some consensus around shipping this as an option and how like options are specified would be the best way to move this proposal forward.

aerobounce · January 27, 2021, 7:56am

-1 if the proposal would be accepted without modification.

Although I +1 ed before in this thread, seeing people's opinions and points, and re-reading the proposal again and again, now -1.

Some opinions observing ideas posted by people

Drop support for unnamed value

Given CodingKeys is auto-synthesized, it doesn't have to support cases that contain unnamed value. If it doesn't support unnamed value, discussion may go forward since we don't have to define what default style it should be. First of all, I don't see a big benefit supporting it personally.
If unnamed value have to be supported, default style has to be defined - and currently there is no single style that everybody agrees nor likely there will be in the future. That kind of architecting should be left to end users. In other words those should be implemented by individuals.
And I suggest discussing about unnamed value to be Codable in the other pitch/proposal, if needed.

Drop support for repeated named value

A case such as below should not be supported.

enum Foo {
    case bar(number: Int, number: Int, number: String)
}

Support only associated values whose are `Codable`

If a case contains a value type such as Void, that should not be supported.
If, types whose are not Codable become Codable only when it's enum's an associated value... — That does not make sense and confusing.

Drop support for an enum case without associated values / mixed cases

We don't have one concrete idea for this just like unnamed value doesn't.
If this have to be implemented, then that should be discussed in the other pitch or proposal as well, as this proposal states enums with associated values.
I guess this had to be included in the proposal since it's quite easy to imagine enums with mixed cases (case with values and simple case without any values)

Briefly dividing, there are problems like:

How to handle unnamed value
How to handle enum with mixed cases (one with value, other without)
How to handle a case has values whose are not Codable already

I'm starting to feel that that's too much for one proposal to define them all.

I'm basically suggesting to make it like struct conformed to Codable, by dropping features what struct Example: Codable cannot do.
I think, basically, situations that classes or structs cannot conform to Codable should not be supported in this proposal neither.

So, IMO:

enum with associated values should be Codable only when:

Every associated values in an enum conform to Codable
Every associated values in an enum are named / labeled
Associated value names in a case must not be duplicated / repeated
An enum must not have a case without associated values (every cases must have associated values)

wowbagger · January 27, 2021, 8:56am

I assume this is in some way related to this edge case I posted upthread:

enum Foo {
    case bar(Void)
    case barbar(Void, (Void, Void))
}

I should've clarified that the edge here is that case bar(Void), case bar(), and case bar are practically the same thing. Given how similar associated values are with tuples, and since tuples like (Void) and () are treated the same in some places with tuple splat, I find it worth discussing how Codable synthesis for enums should handle it.

jayton · January 27, 2021, 9:17am

I think it’s simpler than that: those cases shouldn’t be supported because Void is not Codable. If Void (and other tuples) were codable, the representation of the enum would follow from the representation of the tuples.

Special rules for Void would be a bad idea, and a bit silly since there are few if any non-generic uses of Void associated values. For generic cases, special rules wouldn’t work:

enum Bloptional<T> {
    case some(T)
    case none
}

// This kind of extension (in the same module) works for structs
extension Bloptional: Codable where T: Codable {}

Bloptional<Void> can’t have special behaviour in Swift’s generic model; its behaviour must be a composition of the Codable behaviour of Bloptional with the (hypothetical) Codable behaviour of Void.

Realistically, if tuples are extended with conditional Codable support they will be represented as unkeyed containers, and the representation for Void will be equivalent to [] while (Void, Void) will be equivalent to [[], []].

drexin · January 27, 2021, 7:07pm

Thanks for your feedback!

JSON is being used for the examples because it's one of the most widely used formats and easily readable. The wording in the text should have been along the lines of Codable, I agree, but the JSON types map directly to the respective containers in Codable.

Related: @Max_Desiatov do you have any specific examples of formats that would not work with this proposal? Since everything here uses the standard Codable interface, I would be surprised if anything in this proposal specifically would cause compatibility issues that are not already observable.

wowbagger:

The proposal has no discussion about enum cases with repeated labels:
enum Foo {
    case bar(number: Int, number: Int, number: String)
}
Although JSON allows repeated keys in a dictionary/object, many serialisation formats do not. It also doesn't seem to work with Codable's containers.

The above example is not valid in Swift. It is currently possible to define this case because of a compiler quirk, you will however get a compiler error when you are trying to create the value. From 5.4 onwards even the definition will cause a compile error.

As @jayton already mentioned, Void does not conform to Codable and it it were, the format would be defined by that conformance and is not relevant for this proposal.

wowbagger:

The proposed solution for enum cases without associated values is actively harmful, because it's both surprising and confusing:
enum Command: Codable {
  case dumpToDisk
}
would encode to:
{
  "dumpToDisk": true
}
There is no obvious reason why true is used instead of null or {} or [].

There is no way for a reader of the serialised data to tell whether the true is a placeholder, or whether it contains actual information.

I agree that the format is not great. In the original proposal it was {}, but I received feedback that true would be less biased.

Max_Desiatov · January 27, 2021, 7:27pm

My point is not that it wouldn't work at all, but rather that the proposed default synthesized conformance doesn't necessarily make sense for all formats. Consider this XHTML:

<table>
  <tr>
    <th>A0</th>
    <td>B0</td>
  </tr>
  <tr>
    <td>A1</td>
    <td>B1</td>
  </tr>
  <tr>
    <td>A2</td>
    <td>B2</td>
  </tr>
</table>

I hope that a "natural" Codable model type for it would look like this:

struct Table: Codable {
  let tr: [TableRow]
}

struct TableRow: Codable {
  let cells: [Cell]
}

enum Cell: Codable {
  case th(String)
  case td(String)
}

I don't think that this proposal allows for that. But this is only my inference from the current proposal text. Instead of giving examples of the synthesized Codable conformance code, or explaining what would happen in terms of keyed and unkeyed containers, with only JSON examples provided I can only guess what would (or would not) happen when/if the implementation lands.

Additionally, with XML we need a way to specify which keys are encoded as attributes, and which as elements. We can already do that for structs with property wrappers, but is there a way to implement this for enums at all? Is this even considered as a future direction? I don't see how the proposal as written can be expanded in the future to allow this.

drexin · January 27, 2021, 9:14pm

Max_Desiatov:

drexin:

Related: @Max_Desiatov do you have any specific examples of formats that would not work with this proposal? Since everything here uses the standard Codable interface, I would be surprised if anything in this proposal specifically would cause compatibility issues that are not already observable.

My point is not that it wouldn't work at all, but rather that the proposed default synthesized conformance doesn't necessarily make sense for all formats. Consider this XHTML:
<table>
  <tr>
    <th>A0</th>
    <td>B0</td>
  </tr>
  <tr>
    <td>A1</td>
    <td>B1</td>
  </tr>
  <tr>
    <td>A2</td>
    <td>B2</td>
  </tr>
</table>
I hope that a "natural" Codable model type for it would look like this:
struct Table: Codable {
  let tr: [TableRow]
}

struct TableRow: Codable {
  let cells: [Cell]
}

enum Cell: Codable {
  case th(String)
  case td(String)
}
I don't think that this proposal allows for that.

Cell would encode to <th>value</th> and <td>value</td> respectively. So I think it does exactly what you want.

That is a good point. I will add that to the proposal, thanks.

I don't quite understand how this is relevant to this proposal. Property wrappers don't work for associated values, but if they did, they should behave the same as they do on properties today. The wrapper would be passed to encode(_:), which should then invoke the same logic in your encoder as it does for class/struct properties. If you are asking for an alternative way to achieve on enums what you are doing for properties today, I think that is out of the scope of this proposal.