SE-0295: Codable synthesis for enums with associated values (second review)

compnerd · January 21, 2021, 4:43pm

The second review of SE-0295 - Codable synthesis for enums with associated values begins now and runs through January 31, 2021.

Insights from the first review have been incorporated into the current revision by the author, @drexin. For the second review, the goal is not to re-litigate the entire proposal, but to focus on reviewing the details around the schema evolution and handling of overloaded cases. There were small improvements made to the synthesized enum names and handling of unvalued and singled valued cases.

As always, thank you for your contributions towards making Swift a better language.

Saleem Abdulrasool
Review Manager

Dante-Broggi · January 21, 2021, 5:26pm

At a first pass, I believe that enums cases that mix labeled and unlabeled cases, like this example from the proposal:

enum Command: Codable {
  case load(String)
  case store(String, value: Int) // *
}

should not encode as if it were entirely unlabeled: (as from the proposal)

{
  "store": [
    "MyKey",
    42
  ]
}

It should either be unsupported or have different behavior.

Nevin · January 21, 2021, 6:50pm

At a first pass, I believe that associated values should always be encoded without labels. If a case has multiple associated values, they should be listed sequentially in an array.

The synthesized encoding format should be as simple as possible. If two cases share the same base name, then no synthesis should occur.

Jumhyn · January 21, 2021, 6:59pm

@compnerd I have a couple of meta-process questions:

Was there ever a decision rendered on the previous review? I don't see one in Announcements.

Also, is this note:

the goal is not to re-litigate the entire proposal

an indication that the Core Team has endorsed the basic premise of the proposal, that there should be some default synthesis of Codable for enums with associated values? I still feel as though the diverse opinions about the "correct" way to encode these enums is an indication that they are unsuitable for conformance synthesis, and I don't really see that addressed in this revision of the proposal.

compnerd · January 21, 2021, 7:13pm

Ah, @Douglas_Gregor was review manager for the first round. I think that we missed messaging the conclusion of the first cycle, I am sorry about that.

Correct, the Core Team discussed this and determined that the premise that there should be some synthesis was sound, but that there were still some further refinements which warranted that the proposal go through another round of review. Some of the changes in the proposal are around the handling of the encoding.

Jumhyn · January 21, 2021, 7:25pm

Got it, thanks!

In that case, I remain -1 on this proposal on the basis that, IMO, synthesized conformances should be 'obviously correct', a bar which I don't believe this synthesis scheme meets.

Morten_Bek_Ditlevsen · January 21, 2021, 8:49pm

A very big +1 from me.
Thanks again for all your efforts in making this a part of Swift, @drexin!

I very much agree with most points in the proposal.

The only comment I have is with regards to cases where not all associated values are labelled.

There was an idea by @anandabits in the previous review discussion that keys are auto-generated for unlabelled values. This has the great advantage that you can then customize the key mapping:

SE-0295: Codable synthesis for enums with associated values

The other point of feedback I have is that I do not think unkeyed containers should be used for associated values. Instead, I think we should establish a convention for handling them and recommend use of the coding keys to override the default . The first idea that comes to mind for the convention is to use implicit positional names such as _0 , _1 , etc for unlabeled associated values.

By default, this enum:
enum Command: Codable {
  case load(String)
  case store(String, value: Int)
}
would encode a store value as:
{
    “store”: {
        “_0”: “some string”,
        “value”: 42
    }
}
To override this, you would declare:

copied!
enum CodingKeys: CodingKey {
    case load
    case store

    enum Store: String, CodingKey {
        case _0 = “keyForEncoding”
        case value
    }
}

I think that this is a really elegant solution that provides great flexibility.

Personally I also think that it could be nice with a 'Future directions' where we could consider the similarity between tuples and enum cases with multiple and/or labelled values - and then consider allowing encoding of structs and classes with values that are tuples (of Codable values). We could do this by borrowing the result of this proposal, and use the same strategy for supplying named coding keys for these tuples. But this is a minor thing that could be added in the future.

Is the problem being addressed significant enough to warrant a change to Swift?
Most definitely.
Does this proposal fit well with the feel and direction of Swift?
Yes. I am very excited by the proposal and I feel that it will be a great addition to the language.
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I participated quite a bit in the first review discussion.

aerobounce · January 23, 2021, 5:36pm

I fully, strongly agree with the proposal.
This eliminates lots of boilerplates and would be huge improve when you architect a model.
And I believe this feels so natural for what Swift would do. Vice versa, I don't see reasons we don't have this today.

...Except — A case with an unnamed value would be encoded as an array. That feels a bit awkward and unstable.
I agree with the style quoted in this post: SE-0295: Codable synthesis for enums with associated values (second review) - #7 by Morten_Bek_Ditlevsen

enum Command: Codable {
  case load(String)
  case store(String, value: Int)
}

would encode a store value as:

{
    “store”: {
        “_0”: “some string”,
        “value”: 42
    }
}

Because I see something common with unnamed Tuples.

xwu · January 23, 2021, 5:54pm

At the risk of repetition, the argument against this feature (as already described by @Jumhyn) is illustrated precisely by your point here:

After multiple rounds of discussion, even on this second and likely final review, there is no agreement on what the synthesized implementation should actually be. Based on the extensive discussion thus far and again illustrated by this comment, there has been no convergence on this question and, on the evidence, never will be.

Simply put, there is a broad appetite for some default synthesis of Codable, but since no one agrees on what that should be, by construction, it cannot be said that there is appetite for any one default. And make no mistake: this is the only feature in question here, a single default. There is no mechanism as in Rust's Serde package to select one of a number of options.

Addressing the revisions on this review specifically:

I am very glad that this revision has adopted Serde's default; I think that speaks to the reasonableness of what is chosen, all things considered. I am worried about the choice of encoding non-payload cases as true in part for implementation reasons and in divergence from Serde.

Having not used the Rust package in question: what is the default used by Serde for non-payload cases (and can we survey some other similar tools also) and are there non-implementation-specific reasons not to adopt such precedent here?

Overall though, I have to agree with @Jumhyn: I do not see evidence that there is broad consensus for any one synthesized conformance or any movement towards convergence.

Although I like the move towards a more precedented default by looking to Serde, I would also note that Serde's documentation of its corresponding feature almost immediately goes into discussion of overriding the default that it provides given its unsuitability in a variety of circumstances.

If we are to hold that library up as useful precedent, we ought to consider seriously that the absence of any obviously correct default pushed Serde to offer options for encoding which we do not contemplate here. I do not see reason to rush this now as opposed to when we can address the concern. It is not merely an additive future direction but quite fundamental to the question of a default if it will often need overriding.

Put another way, I am concerned that making the default easy but not making any tweaks similarly easy in the context of something that needs tweaking could turn out to be an anti-feature.

Max_Desiatov · January 23, 2021, 6:02pm

I keep the -1 from my review for the original proposal. The revised proposal is still focused on JSON exclusively and ignores the fact that Codable is supposed to work for other serialization formats.

As other reviewers have pointed out multiple times, there's no sane default here that can work well for all (or at least the majority of those widely available, like XML) formats. The inability to sufficiently customize it is a deal-breaker in my opinion. I don't see that much has changed from this perspective when comparing it to the original proposal.

aerobounce · January 23, 2021, 6:58pm

Well, although I still agree with the proposal, I guess I have to I admit you really have a point.

Jumhyn · January 23, 2021, 8:24pm

I wholeheartedly agree with everything you've written here, @xwu. Thank you for explicating those thoughts more fully than I was able to!

MutatingFunk · January 23, 2021, 10:55pm

Big +1 for having synthesised Codable for enums. I do have a few concerns having read the proposal.

I feel that the encoding for enums with no associated values as properties set to true is an anti-feature. Instead, encoding the case name into a single-value container as a String would both match my expectations of a default, and cater for every use-case in my experience. I'd prefer to get the error in this case, which currently reminds me to add the String backing, or else just get that encoding method by default.

My expectation (and strong preference) is that all-labelled associated values would encode and decode similar to struct properties, meaning the addition of new Optional values, or removal of any values, are both backwards-compatible operations. The proposal doesn't specifically call this out, but it probably should.

For enums with semi-labelled or unlabelled cases, the logic seems rather more fragile - encoding to a fixed-length, ordered list of values. Is adding a new, Optional value at the end of the enum's associated value list a breaking change is this scenario? How about in the middle of the list?

It's also worth noting that this means cases with multiple, unlabelled or semi-labelled associated values, are potentially equivalent to those with a single associated value that happens to be an Array of a homogenous or type-union-style enum type. If these do indeed encode to the same thing, with the same handling of Optionals, I'd strongly suggest we consider simplifying the mental model, and always encode to a keyed container where the proposal currently encodes to an array. Aside from being more backward-compatible, this has the added benefit of allowing more flexibility for reordering of the labelled cases, which as a user I'd naively expect to be possible.

wowbagger · January 24, 2021, 5:48am

Enums can have cases with repeated labels:

enum Foo {
    case bar(number: Int, number: Int, number: String)
}

Whereas structs can not have repeated variable names.

wowbagger · January 24, 2021, 6:56am

Strong -1, for the following reasons:

The proposal only proposes the solution in terms of JSON, instead of containers. Although the implementation uses containers, it's the written proposal we are evaluating here. It's impossible to for me to agree with the proposal without seeing discussions on encoding/decoding for other serialisation formats.
The proposal has no discussion about enum cases with repeated labels:
```
enum Foo {
    case bar(number: Int, number: Int, number: String)
}
```
Although JSON allows repeated keys in a dictionary/object, many serialisation formats do not. It also doesn't seem to work with Codable's containers.
The proposal does not address some other edge cases such as Void associated values:
```
enum Foo {
    case bar(Void)
    case barbar(Void, (Void, Void))
}
```
The proposed solution for enum cases without associated values is actively harmful, because it's both surprising and confusing:
```
enum Command: Codable {
  case dumpToDisk
}
```
would encode to:
```
{
  "dumpToDisk": true
}
```
There is no obvious reason why true is used instead of null or {} or [].

There is no way for a reader of the serialised data to tell whether the true is a placeholder, or whether it contains actual information.
The proposed solution is not forward-compatible.

There are many unsupported cases in the proposed solution. However, there is no apparent and non-source-breaking way for any future improvement to support this proposal's unsupported cases.

Enum cases are more like a combination of tuples and static methods than they are like class or struct instance variables. It would be much more rational to have auto-synthesised Codable conformance for tuples first, than to skip over it for enum cases first.

Not sure if it's worth anything, but I've implemented Codable conformance for an enum to encode to and decode from nested JSON object with arbitrary depth (stackoverflow answer, better implementation in a project). The experience of implementing it has given me some strong opinions on Codable conformance for enums.

woolsweater · January 24, 2021, 7:42pm

It's great to have this worked on; it would be a nice thing to have. But I don't think this proposal should be accepted. The inflexible nature of Codable synthesis means that having a bad default would be worse than having no default at all. Having the default makes it too easy to adopt an implementation that breaks later and requires more work to clean up than it would have been to write manually in the first place.

And I strongly agree with what others have said that this particular default is not obviously the one true scheme. Along with everything else that's been said, the chosen encoding is highly ambiguous; unless I am misreading, these two have identical encodings!

enum E { case a(Bool), b([Int]) }
enum F { case a, b(UInt8, UInt8, UInt8) }

That kind of thing is a trap laid waiting for a program ingesting serialized data even from a nominally trusted source.

I think this feature might have to remain in the "nice to have" column until Codable synthesis moves further away from its all-or-nothing nature. At the least, the encoding needs to be much more robust than what's proposed.

benrimmington · January 25, 2021, 6:52am

Enum cases with associated values aren't documented in LibraryEvolution.rst, or the Library Evolution in Swift blog post. But as far as I can tell, most changes aren't binary or source compatible:

adding a new associated value (regardless of its label, type, or default argument).
removing, reordering, or relabelling an existing associated value.

I think the only possible changes are:

adding, changing, or removing an internal parameter name.
adding (but not changing or removing) a default argument.

Therefore, a public enum case can't evolve, even if a keyed container is used for its associated values.

I also question the importance of compatibility with the Rust library. Surely, the ability to easily map to other existing encodings is more useful?

In the previous review thread, there was:

a counter-proposal, which mentioned the OpenAPI specification.
a common practices post (on single vs many associated values).

If we wanted a simple encoding (in JSON at least), and didn't care about compatibility, we could use unkeyed containers in all cases.

For example, the following cases:

let commands: [Command] = [
  .dumpToDisk,
  .load1st("MyKey"),
  .load2nd(key: "MyKey"),
  .store3rd("MyKey", value: 42),
  .store4th(key: "MyKey", value: 42)
]

would encode their case names and associated values (but without any other labels):

[
  ["dumpToDisk"],
  ["load1st", "MyKey"],
  ["load2nd", "MyKey"],
  ["store3rd", "MyKey", 42],
  ["store4th", "MyKey", 42]
]

Customization should be possible via CodingKeys, but not JSON conversion to/from snake-case. (I guess this is similar to RawValue == String enums.)

sveinhal · January 25, 2021, 8:15am

One of the main use cases for Codable is to support integration of network (often JSON) APIs into client side apps with minimal code. Auto synthesis of Codable conformances make it even easier to do so, by removing boilerplate.

Auto synthesis of structs, combined with JSONDecoder.keyDecodingStrategy means we often need to write zero addition lines of code, apart from simply declaring members, and Swift simply does the right thing.

I realise that Codable can also be used for many other things, and sometimes the underlying encoded bytes doesn't really matter, as long as encoding and decoding is idempotent up to Equality.

However, when I do care about how types are encoded, is when they match existing APIs.

And I still haven't seen much evidence that the proposed encoding scheme is commonly used for one-of types in real world APIs seen around the world. Can we please see more evidence of this? If the automatically synthesised comnformance, more often than not needs to be overridden, this proposal has little to no value.

gwendal.roue · January 25, 2021, 8:56am

So Serde supports various formats, and it looks like we have a desire for a similar customization.

Did we consider passing such customization into the userInfo: [CodingUserInfoKey: Any] property of Decoder and Encoder, or into the static property of a new protocol?

Considering that there exists apis that do not provide easy access to some inner raw Decoder and Encoder instances, a static property right on the Codable enum might be a better choice. For example:

In the std lib:

public protocol CustomEnumEncodable: Encodable {
    // Customization point
    static let enumEncodingStategy: EnumEncodingStategy { get }
}
extension CustomEnumEncodable {
    // Default implementation provides a default encoding
    public static var enumEncodingStategy: EnumEncodingStategy { .default }
}

public protocol CustomEnumDecodable: Decodable {
    // Customization point
    static let enumDecodingStategy: EnumDecodingStategy { get }
}
extension CustomEnumDecodable {
    // Default implementation provides a default decoding
    public static var enumDecodingStategy: EnumDecodingStategy { .default }
}

// Convenience typealias
public typealias CustomEnumCodable = CustomEnumEncodable & CustomEnumDecodable

Application code:

enum MyEnum: CustomEnumCodable {
    static var enumEncodingStategy: EnumEncodingStategy { .someStategy }
    static var enumDecodingStategy: EnumDecodingStategy { .someStategy }
    
    case foo(...)
    case bar(...)
    case baz(...)
}

The standard lib would ship with some built-in strategies.

gwendal.roue · January 25, 2021, 9:07am

I wonder what @itaiferber thinks about this sentence. Codable was designed in order to decouple the coding format from the Codable types. Customization, when possible, is only available via decoder or encoder instances (userInfo, or properties as in Foundation encoders). In my previous message, I'm suggesting that the enum type itself becomes responsible for its formatting, and this clearly goes against the initial design choices. Hence this genuine question :-)