SE-0295: Codable synthesis for enums with associated values

Douglas_Gregor · December 1, 2020, 9:05pm

Hello Swift community,

The review of SE-0295 "Codable synthesis for enums with associated values"" begins now and runs through Dec 11, 2020.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to the review manager. When emailing the review manager directly, please keep the proposal link at the top of the message.

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift. When writing your review, here are some questions you might want to answer in your review:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

More information about the Swift evolution process is available at

https://github.com/apple/swift-evolution/blob/master/process.md

Thank you,

Doug Gregor
Review Manager

Dante-Broggi · December 1, 2020, 9:29pm

How does the proposal encode enums which overload on base names, possibly with unlabeled cases?:
e.g. how would each of the cases of the following be encoded?

enum Command: Codable {
  case load(key: String)
  case load(index: Int)
  case store(key: String, Int)
  case store(index: Int, Int)
}

sveinhal · December 1, 2020, 9:40pm

I’d like to see some stronger motivation for the proposed solution. Is this coding common? Have we observed this in the wild, either in custom Codable implementations in open source Swift projects? Or in JSON APIs in the wild?

It seems strange to me to default to using keys to differentiate between values. Is this coding scheme idiomatic Swift, or idiosyncratic to the author of the proposal?

drexin · December 1, 2020, 9:47pm

Dante-Broggi:

How does the proposal encode enums which overload on base names, possibly with unlabeled cases?:
e.g. how would each of the cases of the following be encoded?
enum Command: Codable {
  case load(key: String)
  case load(index: Int)
  case store(key: String, Int)
  case store(index: Int, Int)
}

Ah, good point. I have thought about this, but it must have slipped my mind while writing the proposal. It would be rather difficult to differentiate between the cases while keeping a sane structure, so I would make this case explicitly unsupported for auto-synthesis. I am however open to suggestions.

Jumhyn · December 1, 2020, 9:51pm

We already have a way to name such cases: load(key:), load(index:), store(key:_:), store(index:_:). I believe that that should properly unique enum cases (since we disallow things like case load(index: Int), load(index: String)), but that naming pattern probably causes issues for the CodingKey situation in this proposal...

Lantua · December 1, 2020, 9:55pm

We probably want to be careful about using these names. Since it's not a valid enum identifier, it'd be tricky to change the CodingKeys.

enum CodingKeys: CodingKey {
case load(key:) // bad
...
}

drexin · December 1, 2020, 10:10pm

As stated in the proposal, this is the same structure that Rust's serde library uses.

With Swift's enums being tagged unions, I think it is fair to view the cases as sort of properties, but only one can be present. The same is true for union types in C/C++, with the difference that Swift's enums are safer, because they don't allow direct, unchecked access to the data.

The basic idea here is to use the same mapping that is used for structs and classes, where we map from the property name to the value.

What would you prefer the structure to be and why?

drexin · December 1, 2020, 10:12pm

Lantua:

Jumhyn:

We already have a way to name such cases: load(key:) , load(index:) , store(key:_:) , store(index:_:) .

We probably want to be careful about using these names. Since it's not a valid enum identifier, it'd be tricky to change the CodingKeys.
enum CodingKeys: CodingKey {
case load(key:) // bad
...
}

Yes, that was exactly my thinking as well. Even if we tried to map them to valid names, it would be difficult to avoid collisions.

beccadax · December 1, 2020, 10:45pm

I think it's great that we're planning to define this stuff, but I'd like to suggest a couple refinements to the behavior.

Case with a single unlabeled associated value

The proposal suggests:

enum Command: Codable {
  case load(String)
  case store(String, value: Int)
}

would encoded to

{
  "load": [
    "MyKey"
  ]
}

Instead, I would recommend encoding this without the array wrapping it:

{
  "load": "MyKey"
}

This is a closer match for Serde's behavior (see the E::Y example). It also makes more sense with Swift's type system—the surrounding array implies that the associated value of Command.load is a 1-tuple, but there's no such thing as a 1-tuple in Swift. And it is just a cleaner representation of the value generally—internal objects, numbers, strings, or variable-length collections won't need to be wrapped in square brackets.

Case with no associated value

The proposal says:

An enum case without associated values would encode the same as one where all values have labels,
i.e.
enum Command: Codable {
  case dumpToDisk
}
would encode to:
{
  "dumpToDisk": {}
}

You justify this design by saying:

This is done for compatibility reasons. If associated values are added to a case later on, the structure
would not change, unless those values are unlabeled.

But I find this justification pretty thin. There is no particular reason to favor labeled values over unlabeled ones, and I don't think you've proposed any fallback behavior that would allow e.g. case dumpToDisk(volume: String) to decode { "dumpToDisk": {} } without an error.

Instead, I would recommend encoding it as a single flat string, not in a keyed container at all:

"dumpToDisk"

This again matches Serde's behavior more closely (see the E::Z example) and again is a cleaner expression of the value generally. However, it does have a couple of problems: it can't be used at the top level of a JSON document (which would need an object or array) and it wouldn't be expressed by CodingKeys. If you think you need something that works with a keyed container, an alternative would be:

{
  "dumpToDisk": true
}

The true here is a bit odd, as we will never write false there instead, but I think it captures the intent better than empty-object or empty-array.

Sub-coding keys

The proposal says:

// contains keys for all cases of the enum
enum CodingKeys: CodingKey {
  case load
  case store
}

// contains keys for all associated values of `case load`
enum CodingKeys_load: CodingKey {
  case key
}

// contains keys for all associated values of `case store`
enum CodingKeys_store: CodingKey {
  case key
  case value
}

I'm not sure if the compiler has the right capitalization routines to pull this off, but I'd love it if we could instead do:

// contains keys for all cases of the enum
enum CodingKeys: CodingKey {
  case load
  case store

  // contains keys for all associated values of `case load`
  enum Load: CodingKey {
    case key
  }

  // contains keys for all associated values of `case store`
  enum Store: CodingKey {
    case key
    case value
  }
}

Or at least:

// contains keys for all cases of the enum
enum CodingKeys: CodingKey {
  case load
  case store
}

// contains keys for all associated values of `case load`
enum LoadCodingKeys: CodingKey {
  case key
}

// contains keys for all associated values of `case store`
enum StoreCodingKeys: CodingKey {
  case key
  case value
}

My reasons here are purely aesthetic: I think these names are more like the names I would choose if I were writing the synthesized code by hand.

Lantua · December 1, 2020, 11:25pm

Since there are many common ways to define enums coding structure, would it be better if we wrap it inside a protocol?

protocol CaseAsKey: CodingKey { }
protocol CaseAsValue: CodingKey { }

/*
  { "a": 3 }
  { "b": "Test" }
 */
enum X: Codable {
    case a(Int), b(String)
    
    enum CodingKeys: CaseAsKey {
        case a, b
    }
}

/*
  { "key": "a", "value": 3 }
  { "key": "b", "value": "Test" }
 */
enum Y: Codable {
    case a(Int), b(String)
    
    enum CaseValues: Codable {
        case a, b
    }
    
    enum CodingKeys: CaseAsValue {
        case key = "key", value = "value"
    }
}

Ofc the default can be any of them.

George · December 1, 2020, 11:33pm

Very big +1 for having a solution to this problem.

I didn't see in "Alternatives Considered" the much simpler formulation that only enums with a single associated type get automatic Codable synthesis. It is very clear how things would be encoded for

enum Command: Codable {
  struct Load {
    let key: String
  }
  case load(Load)

  struct Store: Codable {
    let key: Key
    let value: Int
  }
  case store(Store)
}

@Dante-Broggi's example would simply not compile:

enum Command: Codable {
  case load(key: String)
  case load(index: Int)
  case store(key: String, Int)
  case store(index: Int, Int)
}

Two of @beccadax's points would also be trivially addressed: the inner object structure is defined by the nested type and the nested coding keys would use the same mechanism as before.

For what it is worth, I've manually written Codable conformance for an uncountable number of enums, and I've always done it this way.

ktoso · December 1, 2020, 11:43pm

Great small improvement ideas @beccadax! Love them all.

beccadax:

There is no particular reason to favor labeled values over unlabeled ones, and I don't think you've proposed any fallback behavior that would allow e.g. case dumpToDisk(volume: String) to decode { "dumpToDisk": {} } without an error.

Instead, I would recommend encoding it as a single flat string, not in a keyed container at all:
"dumpToDisk"

I think that's pretty good actually, +1.

If someone wanted to force the "dumpToDisk": {} anyway it's possible without manually en/decoding, by declaring:

case dumpToDisk(Empty)
struct Empty {}

I guess, which is a simple and natural enough workaround if someone needs this.

That's actually fine even in toplevel and we fixed this recently ( [5.2] SR-12275: JSONEncoder on Linux can't encode number JSON fragments by ktoso · Pull Request #2840 · apple/swift-corelibs-foundation · GitHub backport PR to 5.2, works fine on 5.3 ):

  3> String(data: JSONEncoder().encode("hello"), encoding: .utf8)
$R1: String? = "\"hello\""

(related thread Allowing top-level fragments in JSONDecoder )

drexin · December 2, 2020, 12:19am

beccadax:

I think it's great that we're planning to define this stuff, but I'd like to suggest a couple refinements to the behavior.

Case with a single unlabeled associated value

The proposal suggests:
enum Command: Codable {
  case load(String)
  case store(String, value: Int)
}
would encoded to
{
  "load": [
    "MyKey"
  ]
}
Instead, I would recommend encoding this without the array wrapping it:
{
  "load": "MyKey"
}
This is a closer match for Serde's behavior (see the E::Y example). It also makes more sense with Swift's type system—the surrounding array implies that the associated value of Command.load is a 1-tuple, but there's no such thing as a 1-tuple in Swift. And it is just a cleaner representation of the value generally—internal objects, numbers, strings, or variable-length collections won't need to be wrapped in square brackets.

I think that makes sense and should be straightforward to implement.

beccadax:

Case with no associated value

The proposal says:
An enum case without associated values would encode the same as one where all values have labels,
i.e.
enum Command: Codable {
  case dumpToDisk
}
would encode to:
{
  "dumpToDisk": {}
}
You justify this design by saying:

This is done for compatibility reasons. If associated values are added to a case later on, the structure
would not change, unless those values are unlabeled.

But I find this justification pretty thin. There is no particular reason to favor labeled values over unlabeled ones, and I don't think you've proposed any fallback behavior that would allow e.g. case dumpToDisk(volume: String) to decode { "dumpToDisk": {} } without an error.

Instead, I would recommend encoding it as a single flat string, not in a keyed container at all:
"dumpToDisk"
This again matches Serde's behavior more closely (see the E::Z example) and again is a cleaner expression of the value generally. However, it does have a couple of problems: it can't be used at the top level of a JSON document (which would need an object or array) and it wouldn't be expressed by CodingKeys. If you think you need something that works with a keyed container, an alternative would be:
{
  "dumpToDisk": true
}
The true here is a bit odd, as we will never write false there instead, but I think it captures the intent better than empty-object or empty-array.

The plain value would indeed be problematic. Not necessarily because of top-level values, because those are actually allowed since RFC-8259. The bigger problem is that Decoder does not define functions to check for this. We have to know upfront whether we have a KeyedContainer, UnkeyedContainer, or SingleValueContainer. Adding the required functionality would break ABI.

Another alternative would be to use null as the value, i.e.

{
  "dumpToDisk": null
}

This is unbiased and not less confusing than a boolean value.

beccadax:

Sub-coding keys

The proposal says:

// contains keys for all cases of the enum
enum CodingKeys: CodingKey {
  case load
  case store
}

// contains keys for all associated values of `case load`
enum CodingKeys_load: CodingKey {
  case key
}

// contains keys for all associated values of `case store`
enum CodingKeys_store: CodingKey {
  case key
  case value
}

I'm not sure if the compiler has the right capitalization routines to pull this off, but I'd love it if we could instead do:

// contains keys for all cases of the enum
enum CodingKeys: CodingKey {
  case load
  case store

  // contains keys for all associated values of `case load`
  enum Load: CodingKey {
    case key
  }

  // contains keys for all associated values of `case store`
  enum Store: CodingKey {
    case key
    case value
  }
}

Or at least:

// contains keys for all cases of the enum
enum CodingKeys: CodingKey {
  case load
  case store
}

// contains keys for all associated values of `case load`
enum LoadCodingKeys: CodingKey {
  case key
}

// contains keys for all associated values of `case store`
enum StoreCodingKeys: CodingKey {
  case key
  case value
}

My reasons here are purely aesthetic: I think these names are more like the names I would choose if I were writing the synthesized code by hand.

I think this makes sense. I slightly prefer the last version, because that still has the CodingKeys part in it, which is less likely to to collide with user defined names.

beccadax · December 2, 2020, 1:19am

The JSON and plist decoders both throw DecodingError.typeMismatch if you call container(keyedBy:) when the thing you're decoding isn't an object/dictionary. This behavior was explicitly specified in a doc comment in SE-0166, although that part of the doc comment didn't make it into the final documentation, so I'm not sure if coders are required to behave like this.

This, on the other hand, will definitely break with some coders. For example, the plist format doesn't have a way to write null, so this:

import Foundation
struct HasOptional: Encodable { var value: Int? = nil }
let plist = PropertyListEncoder()
plist.outputFormat = .xml
print(String(decoding: plist.encode(HasOptional()), as: UTF8.self))

Prints this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict/>
</plist>

I don't think any coder would have trouble distinguishing true from a missing key; that's why I chose that value. (I could totally imagine coders having trouble with false, null, 0, "", or even [] or {}, though.)

wowbagger · December 2, 2020, 7:22am

I think maybe we could think of case dumpToDisk as case dumpToDisk(). The latter is invalid in Swift today, so it can probably used as a conceptual stand-in.

I agree with everything else you pointed out. They are very good improvements to the proposal's consistency and complexity.

wowbagger · December 2, 2020, 7:44am

Associated values can also be unlabeled, in which case they will be encoded into an
array instead (that needs to happen even if only one of the value does not have a label):
enum Command: Codable {
  case load(String)
  case store(String, value: Int)
}
would encoded to
{
  "load": [
    "MyKey"
  ]
}
and
{
  "store": [
    "MyKey",
    42
  ]
}

I think it would be better if we try to preserve all the labels, for example:

{
  "store": {
    "": "MyKey",
    "value": 42
  }
}

or,

{
  "store": [
    {"": "MyKey"},
    {"value": 42}
  ]
}

If we choose the latter option, we can keep everything consistent regardless of the number of associated values and labels:

With associated values and a label for each value/parameter:

enum Command: Codable {
  case store(key: String, value: Int)
}

would be encoded to

{
  "store": [
    {"key": "MyKey"},
    {"value": 42}
  ]
}

With associated values and omitted labels:

enum Command: Codable {
  case store(String, value: Int)
}

would be encoded to

{
  "store": [
    {"": "MyKey"},
    {"value": 42}
  ]
}

No associated values:

enum Command: Codable {
  case dumpToDisk
}

would be encoded to:

{
  "dumpToDisk": []
}

I think this will help with overloaded cases as well. Also, it makes the structure more clear when the associated values are of collection types.

I missed the pitch train, so I'm not sure if this has been suggested previously.

EDIT: Replaced "_" with empty string "" as key for associated values without labels. This way it also allows encoding of escaped underscores:

enum Command: Codable {
  case store(`_`: String, value: Int)
}

would be encoded to

{
  "store": [
    {"_": "MyKey"},
    {"value": 42}
  ]
}

EDIT 2: @drexin Just a friendly ping. Do you have any thought/response to this post?

Morten_Bek_Ditlevsen · December 2, 2020, 8:58am

Thank you for the proposal - it's a complex topic, so kudos for trying to solve it for us all.

Also a very big +1 for providing a solution to the problem.

Perhaps the proposal could mention that all keys (from enum case names and from associated value label names) will have the keyEncodingStrategy and keyDecodingStrategy of the JSONEncoder and JSONDecoder applied?

I would like to echo a few of the comments of @beccadax and throw in a small question:

In case of:

enum Command: Codable {
  case load
}

If this were to encode as in one of the suggestions by @beccadax as:

{
  "load": true
}

(here I agree that true is the correct choice of value for the same reasons that Brent gives)

Then load is clearly a key and the encoding of the key should follow the keyEncodingStrategy of the (json) encoder.

If we chose to encode it without a keyed container as simply:

"load"

Then it is no longer a key and would not respect the keyEncodingStrategy.
It might be confusing for users to understand when a case might be a value and when it might be a key.

To me this confusion suggests the following rule:
If an enum is made Codable, then all of it's cases should encode (and decode) to (and from) keyed containers using the case name as the key.

A question (where I feel I know the answer, but would like it clarified):
In case of the following:

enum Command: String, Codable {
  case load
}

Here the compiler would use the existing Codable conformance of RawRepresentable (String) instead of synthesizing this newly suggested conformance, right? Otherwise this would be a behavior breaking change.

I'd very much like to echo the suggestion from others that single unlabeled associated values should encode directly as values with the case as the key:

enum Command: Codable {
  case load(String)
}

encodes as:

{
  "load": "my associated value"
}

Like @George , I've written many, many manual Codable enum conformances exactly in this way.

If this should not be the default, then I'd at least hope for some customization point that can give this behavior.

I think it would be ok to have codable synthesis for a number of well defined situations and refuse to synthesize conformance for other situations.

For instance perhaps don't try to synthesize the conformance when the associated values are a mix of labelled and unlabelled values?

Also perhaps don't try to synthesize the conformance when multiple cases share the same name?

I think the feature will provide a great value even with those two restrictions.

My personal wish list for the encoding of associated values:

No values: encodes as true (using the reasoning that the case name must be a key and it thus must have a value - and the value should not be empty/null/falsey).
Single unlabelled value: encodes the value directly
Any number of labelled values (where all labels are unique): the values encode into a keyed container where the keys are the labels - and the keys can be customized as you suggest.

Thanks again - it will be awesome to have a solution for this in the language!

And also apologies from me for not having commented during the pitch fase.

Morten_Bek_Ditlevsen · December 2, 2020, 9:20am

Idea: Could the proposal perhaps include sample synthesized code that would be generated?

For instance (and this only works with the limitations that I suggested in the post above):

enum Value: Codable {
  case noValues
  case allLabelled(a: String, b: String)
  case singleUnlabelled(String)
}

Could have the following conformance synthesized:

  enum CodingKeys: String, CodingKey {
    case noValues
    case allLabelled
    case singleUnlabelled
  }

  enum AllLabelledCodingKeys: String, CodingKey {
    case a
    case b
  }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    if container.contains(.noValues) {
      self = .noValues
    } else if container.contains(.singleUnlabelled) {
      self = try .singleUnlabelled(container.decode(String.self, forKey: .singleUnlabelled))
    } else if container.contains(.allLabelled) {
      let nested = try container.nestedContainer(keyedBy: AllLabelledCodingKeys.self, forKey: .allLabelled)
      self = try .allLabelled(
        a: nested.decode(String.self, forKey: .a),
        b: nested.decode(String.self, forKey: .b)
      )
    } else {
      throw ..
    }

  }

  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: CodingKeys.self)
    switch self {
    case .noValues:
      try container.encode(true, forKey: .noValues)
    case .allLabelled(let a, let b):
      var nested = container.nestedContainer(keyedBy: AllLabelledCodingKeys.self, forKey: .allLabelled)
      try nested.encode(a, forKey: .a)
      try nested.encode(b, forKey: .b)
    case .singleUnlabelled(let value):
      try container.encode(value, forKey: .singleUnlabelled)
    }
  }

George · December 2, 2020, 5:28pm

I've been thinking about this a bit more, and the "how do associated values get encoded" seems isomorphic to "how do tuples get encoded". This supports my alternative proposal above, since we can separate the problem in three:

How do enums with a zero or one associated types get coded?
How do tuples get coded?
How do those to systems interact when an enum's "associated type" is a tuple (this is hand-wavy because caseName(Tuple) is not strictly equivalent to caseName(Tuple.0, Tuple.1...))

The third question seems extremely thorny to me, so I'd be supportive of moving down this list one at a time, and possibly arriving at a state where we decide the complexity of (3) outweigh the benefits.

wowbagger · December 2, 2020, 5:50pm

I don't think tuples will get the auto-synthesis, because they can't conform to Codable (yet).

However, if we consider an enum case's associated values equivalent to a tuple, then I think we can perhaps borrow from this proposal to create compiler-synthesised Codable conformance for tuples. Similar to what I suggested up-thread for enum cases, it's probably the most consistent to encode tuples as arrays of single-keyed dictionaries:

(foo: "bar", 42)

encodes to

[
    {"foo": "bar"}
    {"": 42}
]

("singleValueNoLabel")

encodes to

[
    {"": "singleValueNoLabel"}
]

Void()

encodes to

[
    {}
]

EDIT: Replaced "_" with empty string "" as key for associated values without labels.