SE-0295: Codable synthesis for enums with associated values (third review)

The third review of SE-0295 - Codable synthesis for enums with associated values begins now and runs through March 5, 2021.

For the third review, the goal is not to re-litigate the entire proposal, but to focus on reviewing the additional details around the schema evolution. The updates for handling certain overloaded enum cases and the choice to exclude certain enums from auto-synthesis due to evolution restrictions, as specified in the proposal, are also in scope.

As always, thank you for your contributions towards making Swift a better language.

Saleem Abdulrasool
Review Manager


The proposal text does not include the word “schema”. Could you clarify what is meant by “the additional details around the schema evolution”?

Also, is there a diff between this and the previous version of the proposal, or at least a summary of changes?

I simply meant the evolution around the shape of the enum. "Evolution" tends to be used in the context of language design around API/ABI, and I was trying to make that difference explicit.

I don't know how to generate a diff via GitHub. You would be able to do so with:

git diff 71f3668cf96692e119953056f8ea205e70333030...d62d4ef738acbc7da0dcf94f1d0b832c42215fba proposals/

The summary of changes is effectively what the focus is on:

  • additional details around how to deal with changes to the shape of the enum
  • what cases can be auto-synthesized
  • why certain cases are not auto-synthesized

I am negative on this proposal over the same concerns I've raised in the previous rounds: it aims to make a single coding format a de facto standard where several other coding formats are possible.

I strongly believe that auto synthesis of protocol conformances should only happen when there is a single obviously correct way of conforming a type to a protocol. This proposal does not meet that bar.

In fact, I think it is actively harmful in some cases, where the automatic behaviour does not match the programmer's expectations or project requirements. In those cases, it is far preferable to get a compilation error for not conforming to Codable rather than silently getting an incorrect implementation.



(I think this comment falls under the category of schema evolution, sorry if that isn't the case)

This is a problem that needs to be solved, but the surprising way of encoding case load(String) is unfortunate.
A major use case of serialization is to interface with other ecosystems and I have not yet encountered one where "load": { "_0": "<value>" } is the way data like this would be encoded. The vast majority I have written/seen use something like "load": "<value>".
Another anecdotal observation is that frequently I have needed to refer to the type of the associated arguments in other parts of the code. To use the example in the proposal, a Codable Command will likely participate in a switch and then call a function. Either the arguments there need to be duplicated (think func store(key: String, _ value: Int)) or you use a composite type like StoreParameters. The latter becomes exceedingly likely as the number of parameters increases, and unfortunately under this proposal case store(StoreParameters) has the unexpected serialization style of "store": { "_0": <encoded parameters> } (as opposed to "store": <encoded parameters>).
Finally, I think this fails on the grounds of progressive disclosure since the simplest form of enum-with-associated-values (an enum where each case has a single associated type) forces the user to reason about "_0" and tuple-style arguments.
My suggestion for solving this issue is to special-case the single-unlabeled-associated-value case, the same way we special-case a single-element tuple, and have the examples above encode without the "_0" wrapper. This way we end up with a bit more complexity around the transition from a single unlabeled associated value to anything else (since that introduces a wrapping container), but I think it is worth the tradeoff to make the common case simpler.
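To make the suggestion concrete, here is a hand-written sketch of the flattened encoding being proposed for a single unlabeled associated value. Nothing here is from the proposal's synthesis; the enum and all names are illustrative.

```swift
import Foundation

// Illustrative only: a hand-written Codable conformance producing the
// suggested flattened encoding for a single unlabeled associated value,
// i.e. {"load":"file.txt"} rather than {"load":{"_0":"file.txt"}}.
enum Command: Codable, Equatable {
    case load(String)

    private enum CodingKeys: String, CodingKey {
        case load
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .load(let value):
            // Encode the payload directly, without an "_0" wrapper.
            try container.encode(value, forKey: .load)
        }
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self = .load(try container.decode(String.self, forKey: .load))
    }
}

let data = try! JSONEncoder().encode(Command.load("file.txt"))
print(String(data: data, encoding: .utf8)!)  // {"load":"file.txt"}
```

The cost noted above shows up in `init(from:)`: if a second value is ever added to the case, the decoder must start looking for a nested container where a bare value used to be.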


I would go even further:

"_0" is ugly, and we should not introduce it as the default encoding form at all, in any context whatsoever.


+1 to the review

While I understand the points that @sveinhal makes, I think the value to the community is quite large, and (as I understand it) the synthesis doesn't prevent or make it any more difficult for you to provide your own custom decoders to match serialized formats from external systems and libraries.

In addition, I think it will be a notable boon for developers wanting serialization for their own document formats, and a consistent way - even if there are other possible ways of doing the same thing - is a benefit for multiple teams interacting based on the same source code.

In addition, this is a common pain point that would be resolved with this proposal - how to add your own encoding to an enumeration with associated values.


Unlabeled associated values

It is unclear to me how one would distinguish an unlabeled associated value from a labeled associated value where the label is _0, etc.
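For concreteness, Swift does accept an explicit associated-value label spelled `_0`, so under the proposed scheme the two declarations below would presumably synthesize the same key. This is a hypothetical illustration, not something the proposal spells out.

```swift
// An explicit label "_0" is a legal identifier, so it would collide with
// the key synthesized for an unlabeled value under the proposed scheme.
enum E {
    case labeled(_0: Int)   // explicit label "_0"
    case unlabeled(Int)     // unlabeled; the proposed synthesized key is also "_0"
}

let x = E.labeled(_0: 1)
let y = E.unlabeled(2)
```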

If it's a non-goal to make such a distinction, then there is no need to choose a rather aesthetically unpleasing scheme such as _0. If it's a goal actually to make such a distinction, then it would be reasonable to choose $0, which at the very least has precedent in the language and has the virtue of not clashing with possible label names.

Of course, $0 could not be used with the proposed customization feature, but I do not believe the proposed customization to be a good fit with the current design or direction of Swift either (see below), so it's no drawback in my book.

User customization

My (and others') point about the lack of customization has to do with the shape of the encoded form, not with the names of the associated values. This particular aspect of user customization does nothing really for the former concern but it offers an axis of customization that hasn't been asked for (to my recollection) and which could always be accomplished, of course, with a manual implementation of Codable.

Now, I would not necessarily be against additional customizability per se, but in this case it leans on (a) the CodingKeys design that already hasn't aged well according to those who designed and implemented it (I don't have the luxury of time to dig up past posts about this); and--more concerningly, (b) appears (without text to actually confirm that this is what's going on) to work by taking case names and magically uppercasing them to form a type name after suffixing with CodingKeys. This does not seem to be a principled design and is unprecedented in Swift--which, again, might be justifiable if it were to offer something equally unprecedented in value, but I don't see how this axis of customization carries enough weight for such justification.

If we are to proceed with this proposal, taking everything else as settled but the most recent changes, then my suggestion would be to replace _0 with $0 and to jettison the *CodingKeys associated value customization feature, leaving any such customization to a manual implementation. To emphasize, again, I think the most important customization that users could use here has to do with the shape of the encoded form, which is not tackled here. For that reason also, I would urge the core team to re-evaluate this proposal holistically and consider whether it ought to move forward at this point.


I only have time for a short post now, but I will expand on it later.

First of all again many thanks to the authors for tackling this!

The brief version of my comments:

  • I did prefer the special casing for single, unlabelled values, but I partly agree with the uniformity and comments about evolution in the current version of the proposal.
  • I think that an aspect of 'evolution' is to go from case lineTo(x: Double, y: Double) to case lineTo(Point). This direction of evolution is just as valid as adding more values in my opinion. I even think there's a SwiftLint rule for suggesting converting cases with too many values to one where the values are moved to their own type (but I may be misremembering this rule, it might actually be about tuples - I will return after investigating).

Perhaps most importantly, a comment that also requires some investigation by myself or others:
I like how 'verbose' the current proposal makes the generated structure, because this means that no structural information is discarded.
This could mean that with a bit of extra synthesized annotation, the Encoder/Decoder might actually be able to transform the verbose structure to and from a flattened one.

The vague idea is that the synthesized CodingKeys for the enum is made to conform to a new protocol, EnumCodingKey, and the synthesized coding keys for the enum cases conform to EnumCaseCodingKey.

Then the encoder may be able to be configured with an enumStructureStrategy that could, for instance, 'flatten' unlabelled single-value cases to skip the extra structural level. For instance, if EnumCaseCodingKey were also made CaseIterable, the encoder could check for a count of 1 with the case being _0, and in that case it could 'cheat' and not return a nested keyed container, but return the same level again.

This could perhaps even be used to transform the structure into the 'discriminator' approach that other people have requested.

I mention CaseIterable above, but the special cases that I could imagine needing extra handling could also be marked explicitly by conforming to SingleUnlabelledEnumCaseCodingKey and NoAssociatedValueEnumCaseCodingKey.
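A minimal compilable sketch of the marker-protocol idea above. All protocol and type names are the poster's suggestions, not part of the proposal, and the synthesized key enum is written by hand here.

```swift
// Empty marker protocols layered on CodingKey; an encoder could inspect
// these at runtime to pick an alternative structure for enum payloads.
protocol EnumCodingKey: CodingKey {}
protocol EnumCaseCodingKey: CodingKey {}
protocol SingleUnlabelledEnumCaseCodingKey: EnumCaseCodingKey {}
protocol NoAssociatedValueEnumCaseCodingKey: EnumCaseCodingKey {}

// What synthesis might emit for `case load(String)` under this idea:
enum LoadCodingKeys: String, CodingKey, SingleUnlabelledEnumCaseCodingKey {
    case _0
}

// An encoder could then branch on the marker at runtime:
let key: CodingKey = LoadCodingKeys._0
print(key is SingleUnlabelledEnumCaseCodingKey)  // true
```

Because the markers have no requirements, adopting them costs the synthesizer nothing beyond emitting the conformance, which is what makes the "transformation lives in the coder" approach possible.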

Again: I will try to explore this myself, but it would be great if other people could give feedback to whether they think this is feasible - and whether it's the right place to have this transformation in the encoder/decoder.

Apologies for the messy post, I hope to clean it up when I have more time.

Very positive +1 with a modification under certain circumstance.

  • The circumstance is this: if "supporting unlabeled values" would be the only reason for this proposal to be returned for revision again or, worse, rejected, then that support should be dropped.

  • At the third iteration, how unlabeled values should be treated is still being discussed. If there is no concrete agreement in the end, dropping the feature is a more constructive choice than dropping the whole proposal.

  • If merging every proposed feature is not possible, accepting at least what is concrete is valuable enough, especially for a proposal like this (unless the omitted feature is a must-have).
    Reason explained below:

  • Although I feel fine with how unlabeled values would be synthesized, I also don't see much lost by not supporting them. The following sounds natural and reasonable, and being strict is not a bad thing:

Compile error: Associated values must be labeled.

  • It is confusing that the proposal does not explicitly mention how repeated labels in a case would be treated:
case store(key: String, value: Int, value: Int)

edit: simplified and clarified expressions.


I think this was discussed previously. Apparently you can define such an enum but you can’t instantiate it, so it’s likely a bug that you can define it.


Yes, I know this has been discussed. That's the reason it was a bit surprising to find it is not mentioned in the latest proposal, while overloaded cases — which also had many opinions from community — are mentioned. (Sorry if I'm missing something)

To clarify, automatic synthesis does not prevent using custom decoding and encoding methods, but it does effectively prevent (or makes it much more difficult for) us from ever introducing a different default or behavior and as such I don't think simply adding value is a high enough bar for acceptance.

This is similar to the idea I presented here: New {De,En}codingContainer types to support enums with associated types (and tuples). The benefit of a new container type vs translating via a custom encoder/decoder is that you maintain even more information about the structure of the types being coded (for instance, whether or not "_0" was an explicit label or generated for an unlabeled argument). Implementing this would be a bit more work, but doable and I'm happy to work on the Swift parts if someone can help me through the updates needed to the compiler.


That's a neat idea.
That will give even more structure as you say, but I think it would be possible to suggest the same structure using protocols.

As a proof-of-concept I tried adding functionality to JSONEncoder and JSONDecoder for different enum coding strategies.

The code can be found here and explored by running the tests:

My example starts out with the synthesized Codable conformance for the Command enum in the example.

I added an unlabelled single value case and a case with no values - and applied the code that would be synthesized in this proposal.

Then I added the following three (empty) marker protocols: EnumCodingKey, SingleUnlabelledEnumCaseCodingKey and NoValueEnumCaseCodingKey.

The (would be synthesized) CodingKeys is made to conform to EnumCodingKey, the unlabelled single value case keys are made to conform to SingleUnlabelledEnumCaseCodingKey and cases without associated values are made to conform to NoValueEnumCaseCodingKey.

In the JSONEncoder and Decoder I added the following encoding and decoding strategies:
For the encoder:

    public enum EnumEncodingStrategy {
        case useDefaultEncoding
        case useDiscriminator(String)
        case flattenUnlabelledSingleValues
    }

    public enum NoValueEnumEncodingStrategy {
        case useDefaultEncoding
        case useBool(Bool)
        case useInt(Int)
        case useString(String)
    }

and similarly for the decoder:

    public enum EnumDecodingStrategy {
        case useDefaultDecoding
        case useDiscriminator(String)
        case flattenUnlabelledSingleValues
    }

    public enum NoValueEnumDecodingStrategy {
        case useDefaultDecoding
        case useBool(Bool)
        case useInt(Int)
        case useString(String)
    }

The .useDiscriminator(String) flattens the two levels of keyed containers into one - using the supplied String as the discriminator. Similar to what other people have requested support for in previous discussions. The result is:

    func testDiscriminatorCoding() {
        let encoder = EnumCoding.JSONEncoder()
        encoder.enumEncodingStrategy = .useDiscriminator("_discrim")
        let data = try! encoder.encode(Command.store(key: "a", value: 42))

        XCTAssertEqual(String(bytes: data, encoding: .utf8), """
        {"_discrim":"store","key":"a","value":42}
        """)

        let decoder = EnumCoding.JSONDecoder()
        decoder.enumDecodingStrategy = .useDiscriminator("_discrim")
        let command = try! decoder.decode(Command.self, from: data)
        XCTAssertEqual(command, Command.store(key: "a", value: 42))
    }

Similarly, the .flattenUnlabelledSingleValues does the following:

        let encoder = EnumCoding.JSONEncoder()
        encoder.enumEncodingStrategy = .flattenUnlabelledSingleValues
        let data = try! encoder.encode(Command.single(Person(name: "Jane Doe")))

        XCTAssertEqual(String(bytes: data, encoding: .utf8), """
        {"single":{"name":"Jane Doe"}}
        """)

        let decoder = EnumCoding.JSONDecoder()
        decoder.enumDecodingStrategy = .flattenUnlabelledSingleValues
        let command = try! decoder.decode(Command.self, from: data)
        XCTAssertEqual(command, Command.single(Person(name: "Jane Doe")))

It required a small bit of hackery in the JSONEncoder, but it's really not too bad, and it was basically just to prove that with a little bit of extra synthesized information, we could put these transformations in the coders themselves.

But I guess that your suggestion and this protocol-based suggestion could both be add-ons to the current proposal.


Ok, with the small exploration from my previous post (see GitHub - mortenbekditlevsen/EnumCoding), I am now ready to give my review.

Overall a big +1 - even if this is not the style that I am using myself (I mostly use single unlabelled values in my enum cases), I am much in favor of having a default rather than having nothing at all - and I think that the selected default is easy to understand - and the arguments about evolution of types are easy to follow.

Since I can see a future where further customization may be added without greatly changing the synthesized code - and rather putting the extra work in the en-/de-coders, I don't have any worries about this proposal missing out on the possibility of customization.

That said, here are my comments - and I may be repeating myself from my first post in this topic:

  • The proposal puts some focus on the possibility of evolution (say, from a case without associated values to one with associated values). I would have liked a mention of the evolution that goes from .lineTo(x: Double, y: Double) to .lineTo(Point) - or a similar choice that could be made when the number of values becomes too large.
  • It is mentioned that a future proposal could add customization. This could be fleshed out a bit more to give an example of how it might be done - perhaps inspired by the ideas from @George - or the proof-of-concept I posted that uses marker protocols.
  • I would of course love it even more if these marker protocols were already added to the synthesis, so that I could implement customization in my own code straight away. :slight_smile:
  • I like the customization of both labelled and unlabelled key names - but as noted, there may be clashes with the form that is used (_0, _1, etc.). Maybe a mention in the proposal about how small or big an issue this could be expected to be?


This removes the clear footgun that I detailed in the previous review thread. Adding a member with a default value is now always a backward compatible change, whether you’re working with a struct, a class or an enum. That’s a clear and important improvement over the previous iteration. I think the proposal has finally gotten to a place where it isn’t actively harmful. :clap:

At the same time, removing the footgun also means that the design has ended up in a weird middle-ground in the design space that I think will be unintuitive to everyone. It obviously doesn’t align with the expectations of those who prefer the case-as-value approach, since the case isn’t encoded as a value. But it doesn’t align with the expectations of those that prefer the case-as-key approach either, because they expect single unlabelled values to be encoded without a nested container. (As @Morten_Bek_Ditlevsen mentions above.) I don’t think anyone will find this encoding of Either<L,R> intuitive or elegant:

{
  "left": {
    "_0": 1
  }
}
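To pin down the shape being criticized, here is a hand-written approximation of the proposed case-as-key synthesis for a minimal Either. The synthesis does not exist yet, so this is written manually; only encoding is shown, and decoding would be symmetric.

```swift
import Foundation

// Hand-written approximation of the proposed case-as-key synthesis.
enum Either<L: Encodable, R: Encodable>: Encodable {
    case left(L)
    case right(R)

    private enum CodingKeys: String, CodingKey {
        case left, right
    }
    private enum LeftCodingKeys: String, CodingKey { case _0 }
    private enum RightCodingKeys: String, CodingKey { case _0 }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .left(let value):
            // The case name becomes a key; the payload lands in a nested
            // container under the positional key "_0".
            var nested = container.nestedContainer(keyedBy: LeftCodingKeys.self, forKey: .left)
            try nested.encode(value, forKey: ._0)
        case .right(let value):
            var nested = container.nestedContainer(keyedBy: RightCodingKeys.self, forKey: .right)
            try nested.encode(value, forKey: ._0)
        }
    }
}

let data = try! JSONEncoder().encode(Either<Int, String>.left(1))
print(String(data: data, encoding: .utf8)!)  // {"left":{"_0":1}}
```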

It seems like there are two design goals at play here: one is supporting backward compatibility, and the other is aligning with user expectations for what the encoded data is going to look like. (We clearly can’t align with everyone’s expectations, but we could at least align with a wide subset of users.)

What this iteration of the proposal clearly demonstrates is that there is no way to satisfy both design goals with the case-as-key approach. We either have to sacrifice backward compatibility, as earlier versions of the proposal did, or we have to sacrifice alignment with the expectations of those used to the case-as-key approach, as this version does. Compare this to the case-as-value approach: it doesn’t have to make that trade-off. It naturally supports backward compatibility and aligns with the expectations of those used to that approach. (It also has a number of other desirable properties, mentioned previously, like avoiding duplicated CodingKeys enums, clearly encoding the fact that there can only be one case, matching how unions are commonly encoded in JSON APIs that use the OpenAPI spec, and potentially letting you evolve a struct into an enum if all of its properties have default values.)

So given those design goals, and the sacrifices needed to make the case-as-key approach backward compatible, what motivation is left for choosing it instead of the case-as-value approach? If the encoded data isn’t going to match the expectations of people used to the case-as-key approach, why not at least let it match the expectations of people used to the case-as-value approach and reap the other benefits of that approach?

What is gained by stubbornly holding on to the case-as-key approach, even though by supporting backward compatibility we’ve lost what made it elegant in the first place?


Apologies for the late review. I did not have much free time this past week.

I think this version is an improvement from the previous one. The encoding format is now more consistent regardless of the count and labels of associated values.

I agree with @xwu's position on unlabeled associated values (replacing _0 with $0).

one for each case that contains the keys for the associated values, prefixed with the capitalized case name, e.g. LoadCodingKeys for case load.

This feels very anglocentric to me.

Some languages that use Latin script have letters that clash when capitalized using English rules. For example, in Turkish there is a dotted "i" and a dotless "ı":

            lowercase   uppercase
  dotted    i           İ
  dotless   ı           I

Under the proposed rule, a Turkish word such as "iki" (two) will be capitalised incorrectly to "Iki". Also if there are both cases iki and ıki in the same enum, the code synthesis will fail because they both uppercase to Iki.
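A small, locale-free demonstration of the clash. The first-letter capitalization function below is a stand-in for whatever the compiler would do, which the proposal does not specify.

```swift
// Default (locale-insensitive) Unicode uppercasing maps both the dotted
// "i" (U+0069) and the dotless "ı" (U+0131) to "I", so two case names
// differing only in that letter would yield the same *CodingKeys name.
func capitalizeFirst(_ name: String) -> String {
    guard let first = name.first else { return name }
    return first.uppercased() + name.dropFirst()
}

print(capitalizeFirst("iki"))  // Iki
print(capitalizeFirst("ıki"))  // Iki - identical, so the synthesized type names clash
```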

I believe all languages should be treated equally, and a better rule should be found that does not force some languages' speakers to find workarounds for enum case naming.

I agree with @xwu's position on user customization too.

There are two models of evolution, the one designated by the language and one that is useful in the regular use of enumerations.

What does this mean? Is it referring to SE-192?

I understand that the core team has agreed that there should be auto-synthesised Codable conformance for enums. I'm not going to argue against it, because I agree with it too. However, I do wonder if this proposal should be deferred until after we have Codable conformance for tuples, since associated values are built on top of tuples. If this proposal is accepted before we have Codable tuples, then the rules for Codable enums will need to be considered for consistency, which is kind of like trying to run before we can walk.


A small comment about Codable tuples:

I think that this proposal paves the way for a strategy to make tuples Codable - not on their own, but when contained in a struct or a class.

The coding key naming scheme from this proposal could be borrowed:

struct A: Codable {
   var b: (c: Int, d: String)
}

Could synthesize both CodingKeys and BCodingKeys just as in this proposal.
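As a hypothetical sketch, the borrowed naming scheme might synthesize keys like these for the struct above. Nothing here is specified anywhere; it is just the proposal's convention transplanted to a tuple property.

```swift
// Illustrative only: possible synthesized keys for `struct A` above,
// following the proposal's <CapitalizedName>CodingKeys convention.
enum CodingKeys: String, CodingKey {
    case b
}
enum BCodingKeys: String, CodingKey {
    case c, d  // one key per tuple element label
}
```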

There’s of course the question about allowing such a definition recursively with tuples containing other tuples, but that could probably be skipped.

Swift is a formal language that leans heavily on the natural language English. It shows in keywords such as func, throws, try, return, while, async, etc., as well as in standard library type names such as CaseIterable, Collection, and CustomStringConvertible. It is true for annotations like @propertyWrapper and @inlinable, and for diagnostics and command-line parameters. Both the language itself and all the available tooling around it are either in English outright or heavily based on it.

I’m not a native speaker of English, but I strongly feel that keeping Swift anglocentric should be a goal in and of itself.


Yes, reading Swift is intended to be similar to reading English. It's been stated in the past on these forums, and it's also a theme in the API design guidelines.

I'm not saying that Swift should stop being designed like English, but I'm saying that Swift should continue accommodating other languages (in file names, identifiers etc.).

There is a difference between being English-based and being anglocentric. Swift can be based in English without disregarding other natural languages.

Swift as far as I know has not had rules that restricted the usage of any other natural languages in identifiers. The proposed design, however, restricts some languages (or some of their glyphs/characters) from being used in enum case names (or identifier-head) if the enum is to have automatic Codable conformance.

I consider this side effect of the proposed design as a regression. If it is desired or acceptable, then it should at least be discussed in the unsupported cases section.
