Do keep in mind that we still ought to distinguish:
(foo: "bar", 42)
and (foo: "bar", `_`: 42)
and the enum cases based on these.
I suppose
(foo: "bar", `_`: 42)
can be encoded to
[
{"foo": "bar"},
{"`_`": 42}
]
Although, this does make it inconsistent with the synthesis for other types that don't need to worry about labels.
EDIT: Just gave it another thought,
(foo: "bar", 42)
encoding to
[
{"foo": "bar"},
{"": 42}
]
and
(foo: "bar", _: 42)
encoding to
[
{"foo": "bar"},
{"_": 42}
]
is better
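For what it's worth, here is a minimal hand-written sketch (ignoring the enclosing case key, and with made-up names) that produces the second shape above with today's Codable:
import Foundation

// A minimal hand-written sketch (not the proposed synthesis): the payload of a
// hypothetical case carrying (foo: "bar", 42) is encoded as an array of
// single-key objects, with the empty string standing in for the missing label.
enum Example: Encodable {
    case pair(foo: String, Int)

    func encode(to encoder: Encoder) throws {
        switch self {
        case let .pair(foo, second):
            var container = encoder.unkeyedContainer()
            try container.encode(["foo": foo]) // {"foo": "bar"}
            try container.encode(["": second]) // {"": 42}
        }
    }
}

print(String(decoding: try! JSONEncoder().encode(Example.pair(foo: "bar", 42)), as: UTF8.self))
// prints: [{"foo":"bar"},{"":42}]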
We would also need to distinguish between the following:
case foo(_: Int)
case foo(bar: Int)
which wouldn't be a problem in my reduced-scope suggestion since you can't write
case foo(Foo)
case foo(Bar)
as it is considered a redeclaration of an enum case
JSONSerialization has an option to allow fragments, but yes, PropertyListSerialization does not allow that. So I think keeping it an object is the more compatible variant.
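For reference, a quick sketch of the fragment options in question (option names are from Foundation; the writing option is relatively recent, and PropertyListSerialization has no counterpart):
import Foundation

// JSONSerialization can write and read top-level fragments when asked to.
let fragment = try! JSONSerialization.data(withJSONObject: "just a string",
                                           options: .fragmentsAllowed)
print(String(decoding: fragment, as: UTF8.self)) // "just a string"

let value = try! JSONSerialization.jsonObject(with: fragment,
                                              options: .allowFragments)
print(value) // just a string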
That is not the same structure as we would use for the enums, though. That would be:
import Foundation
enum Foo: Encodable {
case bar
enum CodingKeys: CodingKey {
case bar
}
func encode(to encoder: Encoder) throws {
var container = encoder.container(keyedBy: CodingKeys.self)
try container.encodeNil(forKey: .bar)
}
}
let plist = PropertyListEncoder()
plist.outputFormat = .xml
print(String(decoding: try! plist.encode(Foo.bar), as: UTF8.self))
which results in the following output:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>bar</key>
<string>$null</string>
</dict>
</plist>
Since nil values are allowed in these positions in Codable, wouldn't it be the codec's responsibility to handle it properly?
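To illustrate, the same Foo run through JSONEncoder (redeclared here so the snippet stands on its own) simply yields a JSON null for the missing payload:
import Foundation

// Same shape as the Foo example above; JSONEncoder represents the
// encodeNil(forKey:) call as a JSON null under the case key.
enum Foo: Encodable {
    case bar
    enum CodingKeys: CodingKey { case bar }
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encodeNil(forKey: .bar)
    }
}

print(String(decoding: try! JSONEncoder().encode(Foo.bar), as: UTF8.self))
// prints: {"bar":null}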
As mentioned above, I would prefer to explicitly disallow duplicate case names for the synthesis. The goal of the proposal is to cover the most common cases in a reasonable way. That means finding the balance between restricting it to special cases and covering all possible cases.
Thank you
The KeyEncodingStrategy and KeyDecodingStrategy are specific to JSONEncoder and JSONDecoder and apply in the same way as they do for other types. Codable itself (or the synthesis) does not have any influence on that, so I don't think it would contribute to the clarity of the proposal to talk about features of specific codec implementations. The important part is that case names and labels (for cases that have no unlabeled parameters) are represented as CodingKeys, so the same rules as for the existing cases apply.
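For instance (shown with a plain struct, since the enum synthesis doesn't exist yet; the type and property names are made up), the strategy is applied by the encoder itself:
import Foundation

// Key strategies are a JSONEncoder feature; any CodingKey, including the ones
// the proposed synthesis would generate for case names and labels, goes
// through the same transformation.
struct LoadCommand: Codable {
    var resourceName: String
}

let encoder = JSONEncoder()
encoder.keyEncodingStrategy = .convertToSnakeCase
print(String(decoding: try! encoder.encode(LoadCommand(resourceName: "disk")), as: UTF8.self))
// prints: {"resource_name":"disk"}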
Yes, I absolutely agree with this. I think there could be more customization options in the future to allow for the other case as well, but for this particular proposal I think consistency is the better choice.
Correct. The existing cases are not affected in any way.
Yes, I also think that this is reasonable.
I still think that falling back to UnkeyedContainer for the mixed label case is reasonable. Is there a specific reason you'd prefer it to be rejected?
For the duplicated names I agree: they should be rejected, and that should have been part of the proposal, but it slipped my mind while writing it.
Regarding the KeyEncodingStrategy (etc), you're of course right that this isn't affected by the proposal, and that it will be used 'automatically', so I agree that there's no reason to point it out in the proposal.
Well, this is just my own humble opinion of where to 'draw the line'.
My reasoning is:
There is a choice between trying to handle any enum or a subset of all enums. Trying to handle any enum would likely result in a very complex encoded structure that would not feel 'natural' for many users.
So it appears that the choice of only handling a subset of all enums has already been made (leaving out duplicate case names, for instance).
And when this choice has been made, then the next choice is where to 'draw the line'.
And this is where my personal taste / gut feeling comes in:
I don't think that I would ever write a custom encoding of an enum case with a mix of labelled and unlabelled values into an UnkeyedContainer. It just wouldn't feel natural to me to do so. If I encountered an enum case with this mix, I think that I would 'key' the values in the output - most likely explicitly by adding an intermediate Codable struct carrying all the values and thus having it encoded into a KeyedContainer. Doing this manually of course allows me to give unique and meaningful names to the keys, so it's not something that can likely be done automatically.
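A sketch of what I mean by the intermediate struct (all names here are made up):
// Hand-written workaround for a case that would otherwise mix labelled and
// unlabelled values: wrap the payload in a struct with hand-picked key names.
struct StorePayload: Codable {
    var key: String   // a meaningful name for what would have been an unlabelled value
    var value: Int
}

enum Action: Codable {
    case store(StorePayload)

    enum CodingKeys: CodingKey { case store }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self = try .store(container.decode(StorePayload.self, forKey: .store))
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .store(let payload):
            try container.encode(payload, forKey: .store)
        }
    }
}
// .store(StorePayload(key: "primary", value: 42)) encodes as
// {"store":{"key":"primary","value":42}}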
I also don't think that I have encountered many enums with a mix of labelled and unlabelled values in the code bases I have been working on. Perhaps it would be worth a survey to try and figure out how many of them exist in general, how many of these are Codable, and how the Codable representation tends to be implemented. I think someone built a tool to automatically test for occurrences of patterns in Swift code across many public Swift projects on GitHub, but I guess it would be a huge task to do.
If I did have an enum with this mix, and I found that it would encode its associated values into an array (or UnkeyedContainer), then I think I would prefer to 'fix' the enum case by adding labels to all values and thus have it encoded into a KeyedContainer instead. And in that respect it would actually be more helpful to me if the enum could not have Codable synthesized before it was 'fixed'.
OK, I know that the above is very much a matter of personal taste, but there is definitely a balance between a completely general format that can encode any enum (though, because of the generality, perhaps not in the most 'natural' way) and a more specialized format that only works for a subset of enums (but allows encodings in a way that may feel more natural in each of the specialized, supported cases).
Would it be worth trying to formalize this 'balance' into a Goals section of the proposal?
1. Is it a goal to synthesize Codable support for any enum?
2. Is it a goal to solve many use cases by using a very general format? (In this case I think that the UnkeyedContainer approach is the most general and ought to be applied for all supported cases, but I am personally not a fan of the encoded output.)
3. Is it a goal to solve fewer specific use cases with specific rules? (Perhaps like the three cases I tried implementing in the Codable conformance in my previous post.)
4. Is it a goal to solve some specific use cases with special formats and then 'fall back' to the more generalized format in the rest of the cases? (In other words, 3. but falling back to 2. for cases with a mix of labelled and unlabelled values.)
5. What might future customization points look like (other than the key-mapping ones presented in the proposal)?
I know that these are not easy questions and many people may have different preferences.
Apologies for the long rant.
No worries, this is exactly why we are doing the reviews - to get feedback from the community. I really appreciate you and everyone else taking the time to comment here.
I think it makes sense to add this information to the proposal, but let me try to answer your questions:
The goal is to find a reasonable default, meaning that not all cases (e.g. duplicate names) will be supported, but as many cases as we can reasonably support will be. This also means finding reasonable structures for these cases. If all parameters are labeled, I think most users would expect the labels to be used as keys. If there are no labels, it seems reasonable to use an UnkeyedContainer. Since we can't make up keys for any unlabeled parameters, but we still want to support as many cases as possible, it seems reasonable to fall back to the UnkeyedContainer for the mixed case. I don't think there is one layout that will fit everyone's use cases / preferences, but the proposed layout (with the changes already discussed) should cover many of them.
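To make the fallback concrete, here is a hand-written approximation (not the actual synthesized code, and with made-up names) of the structure I have in mind for the mixed case:
import Foundation

// A case mixing labelled and unlabelled parameters falls back to encoding its
// payload into an unkeyed container under the case-name key.
enum Mixed: Encodable {
    case entry(String, count: Int)

    enum CodingKeys: CodingKey { case entry }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case let .entry(name, count):
            var nested = container.nestedUnkeyedContainer(forKey: .entry)
            try nested.encode(name)
            try nested.encode(count)
        }
    }
}

print(String(decoding: try! JSONEncoder().encode(Mixed.entry("a", count: 1)), as: UTF8.self))
// prints: {"entry":["a",1]}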
As for future customization points, I'd be happy to discuss that, but since it's out of scope of this proposal, I think that should be a separate discussion, so we don't distract from the topic at hand.
I don't think this is true in the single unlabeled parameter case. I've written and seen many implementations of Codable for enums, and not one of them encoded case foo(Bar) as "foo": [ <encoding of Bar> ], instead providing the encoding of Bar directly under the key ("foo": <encoding of Bar>). I don't think case foo(<fields of Bar>) is a "good enough" representation of this very frequent use case, since having a Bar type is often very helpful (for instance, when customizing decoding for just that one case using init(from decoder: Decoder)).
For the record, I think this was acknowledged in an earlier post, but I wanted to make sure this wasn't being walked back. If I'm understanding correctly, is the current proposal that enums with a single associated type be encoded directly (i.e. load(String) encodes as "load": "string"), but that cases with multiple unlabeled associated types encode as an array (i.e. load(String, String) encodes as "load": ["string", "string"])?
Here, I would still prefer the multiple-unlabeled case to simply prevent automatic synthesis, as it seems a little weird to me that changing load(String) to load(String, Int) is a more breaking change than changing load(String, Int) to load(String, Int, Int). Though this isn't the biggest deal for me, as I don't expect I would ever use that form.
Sorry for being imprecise in that response. Yes, I agree that representing single unlabeled parameter cases as the raw value instead of an array makes sense.
Here, I would still prefer the multiple-unlabeled case to simply prevent automatic synthesis, as it seems a little weird to me that changing load(String) to load(String, Int) is a more breaking change than changing load(String, Int) to load(String, Int, Int). Though this isn't the biggest deal for me, as I don't expect I would ever use that form.
As I said earlier, I think there is no one solution that fits all use cases. I think the multiple unlabeled case is useful and I would prefer to keep it in.
I like most of the details of this proposal quite well. Thank you for working on it!
Instead, I would recommend encoding this without the array wrapping it:
{ "load": "MyKey" }
I was thinking the same when I read the proposal. +1 to this idea.
I'm not sure if the compiler has the right capitalization routines to pull this off, but I'd love it if we could instead do:
+1 to this idea as well. The _caseName convention does not feel right for Swift.
The other point of feedback I have is that I do not think unkeyed containers should be used for associated values. Instead, I think we should establish a convention for handling them and recommend use of the coding keys to override the default. The first idea that comes to mind for the convention is to use implicit positional names such as _0, _1, etc. for unlabeled associated values.
By default, this enum:
enum Command: Codable {
case load(String)
case store(String, value: Int)
}
would encode a store value as:
{
"store": {
"_0": "some string",
"value": 42
}
}
To override this, you would declare:
enum CodingKeys: CodingKey {
case load
case store
enum Store: String, CodingKey {
case _0 = "keyForEncoding"
case value
}
}
I really like this approach to always encoding multiple-value cases into KeyedContainers!
I especially like how this allows for customization of key names for unlabelled cases. Nice!
In this case the synthesis 'rules' would be:
1. Enums with multiple cases with the same case names will not have Codable conformance synthesized.
2. Enums with cases with associated values where multiple labels have the same name will not have Codable conformance synthesized.
3. Any enum instance will encode into a KeyedContainer using the case name as the key - meaning that all cases will have a 'value' encoded (even cases with no associated values).
4. Cases with no associated values encode the value as a fixed value (true, [ ], {}, null, false, 1, 0).
5. Cases with one (unlabelled or both labelled and unlabelled?) value encode the value directly.
6. Cases with multiple values encode into KeyedContainers with unlabelled values mapping to the value position prefixed with an underscore - allowing easy customization of encoding and decoding of these keys.
I really like that!
5 should probably just cover the unlabelled single value, but it could apply to all single-value cases whether the value is labelled or not.
For 4 I'd definitely prefer true as the fixed value, following the reasoning of @beccadax.
In the above, 2 should of course also disallow enums with cases where there is a clash between the synthesized key names for unlabelled parameters and actual parameter labels:
enum Invalid {
case a(String, _0: String)
}
Sample synthesized conformance:
Note that allLabelled, mix and singleLabelled are all synthesized by the same 'rule'. They are just all provided for clarification.
enum Value: Codable {
case noValues
case allLabelled(a: String, b: String)
case singleUnlabelled(String)
case singleLabelled(label: String)
case mix(String, b: String, String, d: String)
}
Synthesizes something similar to the following:
import Foundation

extension Value {
enum CodingKeys: String, CodingKey {
case noValues
case allLabelled
case singleUnlabelled
case singleLabelled
case mix
}
enum AllLabelledCodingKeys: String, CodingKey {
case a
case b
}
enum MixCodingKeys: String, CodingKey {
case _0
case b
case _2
case d
}
enum SingleLabelledCodingKeys: String, CodingKey {
case label
}
init(from decoder: Decoder) throws {
let container = try decoder.container(keyedBy: CodingKeys.self)
if container.contains(.noValues) {
self = .noValues
} else if container.contains(.singleUnlabelled) {
self = try .singleUnlabelled(container.decode(String.self, forKey: .singleUnlabelled))
} else if container.contains(.allLabelled) {
let nested = try container.nestedContainer(keyedBy: AllLabelledCodingKeys.self, forKey: .allLabelled)
self = try .allLabelled(
a: nested.decode(String.self, forKey: .a),
b: nested.decode(String.self, forKey: .b)
)
} else if container.contains(.mix) {
let nested = try container.nestedContainer(keyedBy: MixCodingKeys.self, forKey: .mix)
self = try .mix(
nested.decode(String.self, forKey: ._0),
b: nested.decode(String.self, forKey: .b),
nested.decode(String.self, forKey: ._2),
d: nested.decode(String.self, forKey: .d)
)
} else if container.contains(.singleLabelled) {
let nested = try container.nestedContainer(keyedBy: SingleLabelledCodingKeys.self, forKey: .singleLabelled)
self = try .singleLabelled(
label: nested.decode(String.self, forKey: .label)
)
} else {
throw NSError(domain: "", code: 0, userInfo: nil)
}
}
func encode(to encoder: Encoder) throws {
var container = encoder.container(keyedBy: CodingKeys.self)
switch self {
case .noValues:
try container.encode(true, forKey: .noValues)
case let .allLabelled(a, b):
var nested = container.nestedContainer(keyedBy: AllLabelledCodingKeys.self, forKey: .allLabelled)
try nested.encode(a, forKey: .a)
try nested.encode(b, forKey: .b)
case .singleUnlabelled(let value):
try container.encode(value, forKey: .singleUnlabelled)
case let .singleLabelled(label):
var nested = container.nestedContainer(keyedBy: SingleLabelledCodingKeys.self, forKey: .singleLabelled)
try nested.encode(label, forKey: .label)
case let .mix(_0, b, _2, d):
var nested = container.nestedContainer(keyedBy: MixCodingKeys.self, forKey: .mix)
try nested.encode(_0, forKey: ._0)
try nested.encode(b, forKey: .b)
try nested.encode(_2, forKey: ._2)
try nested.encode(d, forKey: .d)
}
}
}
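For completeness, a quick round-trip check of the sample above (just my own verification; .sortedKeys is only used to get stable key order in the output):
// Continuing the Value sample above (Foundation imported).
let jsonEncoder = JSONEncoder()
jsonEncoder.outputFormatting = .sortedKeys
let data = try! jsonEncoder.encode(Value.mix("x", b: "y", "z", d: "w"))
print(String(decoding: data, as: UTF8.self))
// {"mix":{"_0":"x","_2":"z","b":"y","d":"w"}}
print(try! JSONDecoder().decode(Value.self, from: data))
// mix("x", b: "y", "z", d: "w")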
I'm a strong -1 on this proposal as it is written currently.
The reason that SE-0166 left out encoding of enums with associated values is that unlike structs and RawRepresentable enums, there is no obviously correct format for them. There have been a number of discussions about possible formats, e.g. in the thread linked to from the proposal, without a clear consensus emerging. As such, the problem to be solved here isn't just to introduce a format for enums with associated values - but to introduce a format and show why that format is a better default than the many alternative formats that have been discussed, given the trade-offs involved. This proposal doesn't do that.
One of the specific problems that I see with the suggested format is that it makes it hard to evolve schemas over time. In my experience it's really common to want to go from "this can only be A" to "this can be A or B", i.e. from a struct to an enum. With the suggested format, this type of change is neither backward compatible nor forward compatible. With a format where the case name is encoded as its own key, this type of change can instead be made both backward compatible and forward compatible.
To illustrate, let's say we have a struct:
struct Foo: Codable {
var a: Int
var b: Int
}
This gets encoded like so:
{
"a": 1,
"b": 2
}
Now we want to evolve this to be an enum where the old Foo is one of several cases:
enum Bar: Codable {
case foo(a: Int, b: Int)
case baz(b: String)
}
With the proposed format, this would get encoded like so:
{
"foo": {
"a": 1,
"b": 2
}
}
This means that existing Foo values encoded with earlier versions can't be decoded as Bar.foo by the new version (the new version isn't backward compatible), and that Bar.foo values encoded with the new version can't be decoded as Foo by earlier versions (earlier versions aren't forward compatible).
If we instead encoded the name of the case as its own key, like so...
{
"$case": "foo",
"a": 1,
"b": 2
}
...then Bar.foo values encoded with the new version can automatically be decoded as Foo by earlier versions (they're forward compatible). And if we default to the .foo case when the $case key is missing, then Foo values encoded with earlier versions can be decoded as Bar.foo by the new version (it's backward compatible).
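Concretely, a hand-written sketch (not part of the proposal) of the decoding side for Bar, including the backward-compatible default to .foo when the key is missing:
// "$case" acts as a discriminator; when it is absent we default to .foo so that
// payloads written by the old struct-based code still decode.
enum Bar: Decodable {
    case foo(a: Int, b: Int)
    case baz(b: String)

    enum CodingKeys: String, CodingKey {
        case caseName = "$case"
        case a, b
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let caseName = try container.decodeIfPresent(String.self, forKey: .caseName) ?? "foo"
        switch caseName {
        case "foo":
            self = try .foo(a: container.decode(Int.self, forKey: .a),
                            b: container.decode(Int.self, forKey: .b))
        case "baz":
            self = try .baz(b: container.decode(String.self, forKey: .b))
        default:
            throw DecodingError.dataCorruptedError(forKey: .caseName, in: container,
                                                   debugDescription: "Unknown case '\(caseName)'")
        }
    }
}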
It may well be that encoding the case name as its own key has other problems that the format proposed here doesn't have. Perhaps the proposed format is really common among existing JSON APIs, and interoperability with those APIs is more important than the ability to evolve schemas over time? I don't know. But whatever format we end up choosing will, by virtue of being the default, be used by many file formats and web services for years to come. That type of decision should rest on much stronger reasoning about the technical merits and trade-offs of the format than the proposal currently has.
At the very least, I think the proposal should compare the major alternative formats that have been suggested, and describe a) how easy or hard they make it to evolve schemas over time, and b) how common they are in the wild (both in handwritten Swift and in other languages' serialization libraries, as well as in common APIs that Swift code might interoperate with).
I really appreciate that the resulting encoding formats are challenged. As you note, the formats will be the natural first choice for many developers going forward.
I do not, however, think that forward/backward compatibility between the two rather arbitrary formats in your example should influence the decision. One might easily come up with a structure that encodes exactly like an enumeration case with multiple values and thus conclude the opposite: namely that the chosen format is good for structural evolution. In truth, neither is better for evolution, because structures can evolve in any way imaginable - and thus it seems like it does not make a lot of sense to consider this case.
But I definitely agree that it would be good to have some 'evidence' that the chosen encodings are better than others. I like the chosen ones since they overlap with my own experiences encoding enums, but that's just a single personal observation.
I do not, however, think that forward/backward compatibility between the two rather arbitrary formats in your example should influence the decision. One might easily come up with a structure that encodes exactly like an enumeration case with multiple values and thus conclude the opposite: namely that the chosen format is good for structural evolution.
I disagree that the example is arbitrary; in my experience this specific change - going from "only this" to "this or that" - is one of the most common changes made to data formats over time. But perhaps the stripped-down example that I gave makes it feel more arbitrary than it is? A more life-like example would be to go from:
struct Image: Codable {
var resource: ImageResource
}
...to:
enum Media: Codable {
case image(resource: ImageResource)
case video(resource: VideoResource, startTime: Double)
}
I would also dispute this:
In truth, neither is better for evolution, because structures can evolve in any way imaginable - and thus it seems like it does not make a lot of sense to consider this case.
Both backward compatibility and forward compatibility are often important, and they both put hard constraints on how formats can evolve. For example, backward compatibility is often important when evolving file formats, since people don't like losing their data. And forward compatibility is typically important when evolving web APIs where older clients must be able to communicate with newer versions of the API. As such, formats don't tend to evolve in any way imaginable - they tend to evolve specifically in the ways that preserve compatibility, e.g. by adding optional keys or keys with default values, or by going from "only this" to "this or that". Swift's own features for library evolution are a great example of this.
Apologies, I didn't mean "arbitrary" as in "unrealistic", but rather as just one of several ways in which structures may evolve. I'm not at a computer right now but I will give other examples later.
The reason that SE-0166 left out encoding of enums with associated values is that unlike structs and RawRepresentable enums, there is no obviously correct format for them.
I think this strikes at the core of my discomfort with this proposal. Thank you for articulating all that, @hisekaldma. Historically, we've reserved protocol requirement synthesis for only those which have such "obviously correct" implementations. This philosophy guided the synthesis of Equatable and Hashable implementations, and precluded the inclusion of other protocols like AdditiveArithmetic and Comparable. I'm wary about including synthesis for implementations that have the goal of "just do it some way, doesn't matter which."
IMO, such synthesized implementations are better suited for a more generalized synthesis feature that would allow authors to specify exactly how a conformance should be derived (discussed some here for AdditiveArithmetic, but the ideas are more general than that one protocol, and have cropped up in other threads over the years).
Thanks for your feedback. My thoughts on your proposed format are:
- How would the synthesized decoder decide which case to pick as the default one?
- What if the labels and types are the same for multiple cases and only the case name is different? The format you propose would make it look identical to the old code and could cause unwanted effects.
I agree that evolution is important, but correctness is more important for the default behavior.