SE-0239: Add Codable conformance to Range types

anandabits · December 19, 2018, 5:11pm

Thank you for responding @itaiferber. I really appreciate you engaging with this discussion.

I wouldn't say JSON is the crux of my argument. As you pointed out, it is only the format of the moment. I have emphasized JSON in my argument because it is a concrete use case everyone can understand. Everything I have been saying can be generalized to any format that conventionally prioritizes ease of human consumption above compactness.

One of the fundamental points I am making is that for aggregate types I don't think it is possible to define a default that is suitable for all formats, yet Codable requires a type to attempt to do just that.

The mantra we like to repeat in the Swift community is that protocols are not just about syntax, but also about semantics. There is a significant semantic conflict between encoding to formats that prioritize compactness and encoding to formats that prioritize human consumption and / or self-documentation.

There is also a significant semantic conflict evidenced by the discussion of whether the encoded representation of a type should be considered a private implementation detail or a public specification that receives community input. I think it is appropriate for the representation to be considered private for binary formats prioritizing compactness but not for textual formats that prioritize human consumption and self-documentation. If human consumption is a priority then the format must be specified and it is reasonable to request input by the community of humans contributing to Swift's evolution.

For these reasons, Codable is starting to feel to me like a problematic overgeneralization that doesn't take the semantics of encoding and serialization formats seriously enough. Fortunately we have options that don't involve tying ourselves to specific formats. We can model the semantic patterns we see in the domain of encoding formats.

This feels like an ad-hoc solution. The example of URL is a good one. There are a lot of formats where URL should be encoded as a String. If this behavior lives in JSONEncoder then any other encoders for similar textual formats will not benefit from that behavior by default and will need to re-implement that behavior. Worse, there are a lot of types the implementation of any given encoder won't know about.

I can understand this perspective and don't disagree within the context of the design as the team at Apple intended it. If anything, this thread has highlighted the fact that at least some of us in the community want Swift to be good out of the box at targeting textual formats which prioritize human usability (of which JSON is currently most popular). Secondarily, this thread has highlighted the fact that Codable was not designed to meet this goal. I think many of us would also like Swift to be good out of the box at targeting formats that prioritize compactness. These are not specific formats, but specific priorities that matter a lot in practice when we serialize data.

I am arguing that these defaults are not actually format agnostic. You even state here that they choose an efficient and concise representation. This is fine for some formats, but also not the right choice for many other formats. In fact, it is not great for the formats that happen to be some of the most pervasive formats in the world. JSON may fall out of fashion, but for the foreseeable future I would bet quite heavily that if something new gains favor it would also be a textual format that is reasonably self-documenting.

That's fair. This thread has moved in a direction I didn't anticipate when I first replied. I still do feel it is relevant to the current review though. I would support this proposal as-is if it becomes clear that the semantics of Codable choose to prioritize compactness and we agree that given these semantics Codable is not an adequate solution for formats such as JSON (which should lead to work exploring improved support for textual, self-documenting formats).

As I mentioned above, this feels like a pretty unfortunate ad-hoc solution tied to a specific encoder. Whatever override we would do for JSON would likely be applicable to a very broad range of formats that share similar characteristics (XML, YAML, etc).

I'm speaking more broadly than the current design of Codable. Imagine using attributes or other annotations at the property level that specify encoding policies. I have written a Sourcery template that supports a broad family of annotations for this purpose. This template very intentionally lowers all dates to specific string or numeric representations in order to ensure consistent serialization of values regardless of the date encoding strategy specified on the encoder.

In my experience, encoder level strategies are something to avoid. I don't want the encoded representation of my type changing based on the configuration of the encoder / decoder. This allows incompatible encoded representations to be produced by a single Codable conformance. For this reason, it feels like it goes too far in favoring convenience over precision.
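A minimal sketch of that concern, assuming a hypothetical `Event` type: the same synthesized conformance yields two payloads that cannot be decoded interchangeably, depending only on how the encoder is configured.

```swift
import Foundation

// Hypothetical model type, used only for illustration.
struct Event: Codable {
    var when: Date
}

let event = Event(when: Date(timeIntervalSince1970: 0))

// Same type, same synthesized conformance, two encoder configurations.
let secondsEncoder = JSONEncoder()
secondsEncoder.dateEncodingStrategy = .secondsSince1970

let isoEncoder = JSONEncoder()
isoEncoder.dateEncodingStrategy = .iso8601

let secondsData = try secondsEncoder.encode(event)
let isoData = try isoEncoder.encode(event)

let asSeconds = String(decoding: secondsData, as: UTF8.self)
let asISO = String(decoding: isoData, as: UTF8.self)

print(asSeconds) // the date as a numeric timestamp
print(asISO)     // the date as an ISO 8601 string

// A decoder configured for one strategy rejects the other payload.
let secondsDecoder = JSONDecoder()
secondsDecoder.dateDecodingStrategy = .secondsSince1970
let mismatched = try? secondsDecoder.decode(Event.self, from: isoData)
print(mismatched == nil) // true: the string payload is not a number
```

Nothing in the `Event` declaration records which of the two representations a consumer should expect.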

This is good to hear!

I agree. My argument is that the current design doesn't have strong enough semantics and that there are at least two broad families of serialization formats that have conflicting semantics. Codable has in some sense chosen one set of serialization semantics without explicitly stating this and is used for formats where this set of semantics produces out of the box results that are not great (i.e. encoding ranges as an array). So my argument is fundamentally about correctness, not convenience.

I would modify this to say that sometimes a custom encoding strategy is the way to go. Wrapper types are only one way of getting there. But I do agree that a custom strategy is sometimes required.

There is a third case here: I care, but the default is specified and happens to align with my needs. And a fourth case: I do care, but there is a mechanism to specify the policy / strategy I want to use without writing my own custom implementation.

itaiferber · December 19, 2018, 7:06pm

Thank you for the detailed response! I had my own response sitting here that addressed your points one-by-one, but I don't think that this will actually serve us all that well.

If I may, I think I understand your argument better now to be "given the current limitations of Codable, are we interested in forever committing to a default representation for types which is human readable, or optimized for performance and conciseness?", siding with human readability. Is this correct?

If so, I'd like to express the fact that we're well aware of the current limitations of the Codable implementation, which stem from current language limitations and time constraints. I agree that this thread has moved in a different direction than originally intended, and while we don't agree on all of the specifics, I think we both want to make Codable better. Codable's limitations don't need to stay, and we actively want to evolve the API and language to a point where we can make it the best it can be: an API which does the right thing in the majority of cases will benefit everyone.

Things that are clear from this thread we're lacking here, among others:

A convenient way to provide easier property-by-property overriding of representations, without having to lose synthesis (e.g. via adaptors/annotations/etc.)
An easier way to express for a given type T across all usages of that type that a human-readable form is preferred over a concise form, without necessarily having to use a format-specific strategy (e.g. a way for Range to potentially offer both representations and choose between them, so all Encoders potentially benefit)

It sounds to me like these two would go a long way toward making this issue less prevalent.
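One way the first item can be approximated in later Swift versions is a property wrapper (Swift 5.1 and up, so not available when this thread was written). The sketch below is an illustration only, not a design anyone in the thread proposed; `KeyedBounds` and `Window` are hypothetical names.

```swift
import Foundation

// Hypothetical adaptor: encodes a Range<Int> with named bounds.
@propertyWrapper
struct KeyedBounds: Codable {
    var wrappedValue: Range<Int>

    init(wrappedValue: Range<Int>) {
        self.wrappedValue = wrappedValue
    }

    private enum CodingKeys: String, CodingKey {
        case lowerBound, upperBound
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(wrappedValue.lowerBound, forKey: .lowerBound)
        try container.encode(wrappedValue.upperBound, forKey: .upperBound)
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let lower = try container.decode(Int.self, forKey: .lowerBound)
        let upper = try container.decode(Int.self, forKey: .upperBound)
        // Forming lower..<upper traps if the bounds are reversed, so validate first.
        guard lower <= upper else {
            throw DecodingError.dataCorruptedError(
                forKey: .upperBound, in: container,
                debugDescription: "upperBound is less than lowerBound")
        }
        wrappedValue = lower..<upper
    }
}

// The enclosing type keeps its synthesized Codable conformance.
struct Window: Codable {
    var name: String
    @KeyedBounds var values: Range<Int>
}

let jsonEncoder = JSONEncoder()
jsonEncoder.outputFormatting = .sortedKeys // deterministic key order for the demo

let window = Window(name: "test", values: 36..<43)
let windowData = try jsonEncoder.encode(window)
let windowJSON = String(decoding: windowData, as: UTF8.self)
print(windowJSON) // {"name":"test","values":{"lowerBound":36,"upperBound":43}}

let decodedWindow = try JSONDecoder().decode(Window.self, from: windowData)
print(decodedWindow.values == window.values) // true
```

The override lives on the property, so adding further properties to `Window` does not require touching any hand-written `encode(to:)`.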

So, can we work together to help design new language and library features to address these constraints, rather than giving up on Codable altogether? We'd like community feedback and involvement here, and I think starting a new thread for this is a good idea. Internally, we've definitely got ideas about where to take things, but haven't had the bandwidth to push them forward yet.

To briefly summarize my thoughts about this matter directly:

I think the human-readability-vs.-performance axis is more interesting to discuss than JSON-vs.-other-formats, and I generally object to the idea that "the right choice" will be obvious, even for formats like JSON. Not everyone writing JSON needs to target human readability, nor is JSON the most human-readable format out there. Even within the collection of folks who have to write JSON, performance and compactness can be very reasonable goals — sometimes, you have to serialize to JSON because the target endpoint requires it; sometimes, it's the format that all the devs on the team know best; sometimes, it's just an easy default to reach for because the server platform has easy-to-use JSON tooling. A lot of work has gone into optimizing JSON parsing and efficiency throughout various frameworks and platforms, and it's not unreasonable to prefer targeting that as a default
However, we should work toward providing clear support for both, in my opinion, and make it significantly easier to express a preference for human readability over performance in a way that still allows types like Range to offer one or both without having to sacrifice either. This is possible today, but more cumbersome than we can likely do in the future
As such, we should invest our efforts into improving language features and tooling around where we'd like Codable to go such that types like Range can give a more performant default, without sacrificing readability where it matters
Thus, Range's default implementation being less readable is reasonable (and IMO preferable in more common cases), and we should aim not at forcing it to be human readable, but offer ways to improve this for formats we care about, out of the box

As a final point, the guidelines we've followed for all currently offered Codable types are as follows, in order of highest to lowest priority:

Backwards compatibility: might the type ever evolve in the future? If so, keyed containers are desirable so we have the most flexibility in offering compatible implementations. For instance, most of our types (e.g. DateComponents) use keyed containers for the future addition of properties; others have a very specific definition which will never change (e.g. CGPoint will never add a 3rd value; Date will only ever need to be represented by its time interval; Data is conceptually always going to be an array of bytes), and in these cases, we stray from keyed containers for:
Performance: within the constraints of compatibility, how can we best represent the type? This is relatively simple: keyed containers are obvious; multiple non-keyed values go in an unkeyed container; single values encode as themselves
Readability: within the constraints of the above two, how do we make the types more readable? (This affects names for key values in keyed containers, and ordering in unkeyed containers)

Hence, following these same guidelines, it's not unreasonable for Range types to use non-keyed containers by default: their representations will never change, and for performance, they may be represented in unkeyed/single-value containers. The goal is now to also offer readability for those who need it.
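To make the three container kinds in these guidelines concrete, here is a small sketch. The types `Profile`, `GridPoint`, and `UserID` are hypothetical and are not how any Foundation type is actually implemented.

```swift
import Foundation

// May gain properties later: the synthesized conformance uses a keyed container.
struct Profile: Encodable {
    var name: String
    var age: Int
}

// Fixed shape with several values: an unkeyed container.
struct GridPoint: Encodable {
    var x: Int
    var y: Int

    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        try container.encode(x)
        try container.encode(y)
    }
}

// Exactly one value: a single-value container, so it encodes as itself.
struct UserID: Encodable {
    var rawValue: Int

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(rawValue)
    }
}

// Helper for the demo; sorted keys keep the keyed output deterministic.
func json<T: Encodable>(_ value: T) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    let data = try encoder.encode(value)
    return String(decoding: data, as: UTF8.self)
}

let keyedJSON = try json(Profile(name: "Ann", age: 30))
let unkeyedJSON = try json(GridPoint(x: 3, y: 4))
// Wrapped in an array so the single value is not a top-level JSON fragment.
let singleJSON = try json([UserID(rawValue: 42)])

print(keyedJSON)   // {"age":30,"name":"Ann"}
print(unkeyedJSON) // [3,4]
print(singleJSON)  // [42]
```

The unkeyed form is the shortest for a two-value type, but a reader of `[3,4]` has to know the ordering, which is the trade-off being debated for Range.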

benrimmington · December 20, 2018, 8:35am

What is your evaluation of the proposal?

+1

Each type of range could also encode/decode an operator string to/from its unkeyed container.

 extension ClosedRange: Encodable where Bound: Encodable {
   public func encode(to encoder: Encoder) throws {
     var container = encoder.unkeyedContainer()
     try container.encode(self.lowerBound)
+    try container.encode("...")
     try container.encode(self.upperBound)
   }
 }

For example, a closed range of Date.distantPast ... Date.distantFuture would be:

[-63114076800,"...",63113904000] or
["0001-12-30T00:00:00Z","...","4001-01-01T00:00:00Z"]

using a JSONEncoder.
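A decoding counterpart could check the operator string as it reads the container. The sketch below uses a hypothetical wrapper, `TaggedClosedRange`, because code outside the standard library cannot redeclare the conformance on `ClosedRange` itself once the standard library provides one.

```swift
import Foundation

// Hypothetical wrapper illustrating the operator-string layout.
struct TaggedClosedRange<Bound: Comparable & Codable>: Codable {
    var range: ClosedRange<Bound>

    init(_ range: ClosedRange<Bound>) {
        self.range = range
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        try container.encode(range.lowerBound)
        try container.encode("...")
        try container.encode(range.upperBound)
    }

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        let lower = try container.decode(Bound.self)
        let op = try container.decode(String.self)
        let upper = try container.decode(Bound.self)
        // Reject payloads written for a different kind of range.
        guard op == "..." else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Expected the closed-range operator \"...\"")
        }
        // lower...upper traps on reversed bounds, so validate first.
        guard lower <= upper else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "upperBound is less than lowerBound")
        }
        range = lower...upper
    }
}

let taggedData = try JSONEncoder().encode(TaggedClosedRange(1...5))
let taggedJSON = String(decoding: taggedData, as: UTF8.self)
print(taggedJSON) // [1,"...",5]

let roundTrip = try JSONDecoder().decode(TaggedClosedRange<Int>.self, from: taggedData)
print(roundTrip.range) // 1...5

// A half-open payload is rejected instead of being silently reinterpreted.
let halfOpenPayload = Data("[1,\"..<\",5]".utf8)
let rejected = try? JSONDecoder().decode(TaggedClosedRange<Int>.self, from: halfOpenPayload)
print(rejected == nil) // true
```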

Is the problem being addressed significant enough to warrant a change to Swift?

Yes.

Does this proposal fit well with the feel and direction of Swift?

Yes.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

A quick reading.

anandabits · December 20, 2018, 9:19pm

With regards to this proposal, yeah. If we are willing to:

commit to a direction that says the semantics of Codable target compact / performant representations
work towards something else (possibly a layer on top of Codable?) with semantics targeting self-documenting representations
provide fully-specified (via SE) default self-documenting representations for the necessary standard library and Foundation types
agree that JSONEncoder and JSONDecoder should prefer the self-documenting representation by default when it becomes available (with a migration for code that relies on current behavior preferring the concise representation)

then I think this proposal is well-aligned with the long term direction and would support it.

I agree, we do share the same goal! I hope you feel this discussion has been helpful in identifying some improvements we need to make to get there.

Right, I think it's fine to make all representations available to all compatible encoders, but we need a way to specify a default that is intended for encoders targeting self-documenting serialization formats, and we need our encoders targeting such formats to use that default.

And ideally it would be possible to specify distinct property-level overrides for both the compact default and the self-documenting default.
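With today's API, one rough approximation of a type offering both representations is to consult the encoder's `userInfo`. This is a sketch only: the key `prefersSelfDocumenting` and the `Bounds` type are hypothetical, and no Foundation encoder sets or honors such a key.

```swift
import Foundation

// Hypothetical key; an encoder for a self-documenting format would have to opt in.
extension CodingUserInfoKey {
    static let prefersSelfDocumenting =
        CodingUserInfoKey(rawValue: "prefersSelfDocumenting")!
}

// Hypothetical type that offers both a compact and a self-documenting form.
struct Bounds: Encodable {
    var range: Range<Int>

    private enum CodingKeys: String, CodingKey {
        case lowerBound, upperBound
    }

    func encode(to encoder: Encoder) throws {
        let wantsReadable = (encoder.userInfo[.prefersSelfDocumenting] as? Bool) == true
        if wantsReadable {
            var container = encoder.container(keyedBy: CodingKeys.self)
            try container.encode(range.lowerBound, forKey: .lowerBound)
            try container.encode(range.upperBound, forKey: .upperBound)
        } else {
            var container = encoder.unkeyedContainer()
            try container.encode(range.lowerBound)
            try container.encode(range.upperBound)
        }
    }
}

// Demo helper: the caller, not the type, picks the representation.
func encodeBounds(_ bounds: Bounds, selfDocumenting: Bool) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    encoder.userInfo[.prefersSelfDocumenting] = selfDocumenting
    let data = try encoder.encode(bounds)
    return String(decoding: data, as: UTF8.self)
}

let compactForm = try encodeBounds(Bounds(range: 1..<5), selfDocumenting: false)
let readableForm = try encodeBounds(Bounds(range: 1..<5), selfDocumenting: true)
print(compactForm)  // [1,5]
print(readableForm) // {"lowerBound":1,"upperBound":5}
```

A real design would also need the matching decoding path and a way for each encoder to declare its default, which is the part that `userInfo` does not provide.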

These would solve the pressing semantic issue and the steep cliff encountered when non-default representations are necessary. The third issue is the need to have fully specified representations that can be relied upon for our own fully specified formats. Self-documenting representations should be fully specified, as the ability to easily correlate the representation with a value of a type is a significant motivator for choosing self-documenting representations in the first place.

In the fullness of time, some other things are also necessary. Most prominent is support for a range of encoding strategies for enums with associated values. The generator I use does this using annotations and it is extremely useful. There are a handful of other policy annotations it supports that would also be good candidates for eventual Codable features. My template also supports coding key annotations which allow coding key overrides to be specified inline with the property instead of having to explicitly declare a CodingKeys enum.

Absolutely! I'm not giving up on Codable - I just want to see its semantics clarified and see us commit to addressing use cases that fall outside of those semantics. New threads to discuss these topics sound like a good idea to me. Would you be willing to start them based on the thoughts of the internal team?

Agreed. JSON is only the best contemporary concrete example.

I am certainly not arguing that we should require prioritizing human usability when working with JSON but I do think it's the appropriate default. And of course we want the best performance possible given the constraints imposed by higher priorities so it is great that a lot of work has gone into optimizing JSON performance! But I don't think the existence of that work is an argument for making performance the top priority when choosing what representation is used for JSON by default.

This sounds like a great direction!

I agree, as long as we clarify that the semantics of Codable imply prioritizing performance over human usability.

These guidelines make a lot of sense given the semantics you are assigning to Codable alongside a goal of supporting readability in some other way. Thank you for laying them out so clearly! I agree that the implementation chosen

At this point, I have come to the conclusion that I support this proposal on the condition that we clearly specify Codable as having semantics described by the guidelines @itaiferber provided above. Thanks to everyone who contributed to the discussion!

Karl · December 20, 2018, 11:47pm

What is your evaluation of the proposal?

+1

Is the problem being addressed significant enough to warrant a change to Swift?

Yes. As the proposal mentions, the standard library defines/owns both the Range type and Codable protocols. Nobody else can add a conformance without risking conflicts.

Does this proposal fit well with the feel and direction of Swift?

Other standard library types which can support Codable do, so I think it makes sense for Range to do so as well. AFAIK, the standard library makes no guarantees about how these types will be encoded. The only expectation I have is that the encoded representations will continue to be decodable in future versions of the standard library (not documented anywhere AFAIK).

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

N/A

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Read the proposal and the discussion in this thread. Considered it a couple of times.

If you really care about how your type is represented when encoded (e.g. because you have some API specification to match with), you should not be afraid to write your own conformances to Codable. I can't believe people are talking about it like it's the worst/most difficult thing in the world. For sure, compiler synthesis is convenient when it suits your needs, but sometimes programming involves actually writing some code.

If the default implementation is not right for you, it's very simple to work around:

struct ABC: Codable {
  var name: String
  var values: Range<Int>

  enum JSONKeys: String, CodingKey {
    case name = "name"
    case minVal = "minVal"
    case maxVal = "maxVal"
  }

  func encode(to encoder: Encoder) throws {
    var box = encoder.container(keyedBy: JSONKeys.self)
    try box.encode(name, forKey: .name)
    try box.encode(values.lowerBound, forKey: .minVal)
    try box.encode(values.upperBound, forKey: .maxVal)
  }

  init(from decoder: Decoder) throws {
    let box = try decoder.container(keyedBy: JSONKeys.self)
    name   = try box.decode(String.self, forKey: .name)
    values = try box.decode(Int.self, forKey: .minVal)..<box.decode(Int.self, forKey: .maxVal)
  }

  init(name: String, values: Range<Int>) {
    self.name = name; self.values = values
  }
}

let test = ABC(name: "test", values: 36..<43)
let data = try! JSONEncoder().encode(test)
print(String(bytes: data, encoding: .utf8)!) // {"name":"test","maxVal":43,"minVal":36}

plorenzi · December 21, 2018, 12:19am

Ideally there shouldn't be any distinction between a fast and a readable coding.

Either an object has properties, in which case it uses a keyed container, or it is a list, in which case it uses an unkeyed container. If we keep it that way, there is no question of JSON or binary file, just a data representation which is simple and fairly format agnostic.

If we want to improve performance, isn't there any solution other than changing the coding? For example, if we have an object and we are sure that its properties won't change, we can optimize the coding by using an unkeyed container behind the scenes, ordering the keys one way or another. We would make this work by declaring the coding keys in a frozen enum.
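A hand-written approximation of this idea, assuming a hypothetical `Size` type: the declaration order of a `CaseIterable` key enum fixes the positional layout of an unkeyed container. Nothing in Codable does this automatically today.

```swift
import Foundation

// Hypothetical type whose set of properties is assumed never to change.
struct Size: Codable {
    var width: Int
    var height: Int

    // Declaration order of the cases defines the positional layout.
    enum CodingKeys: String, CodingKey, CaseIterable {
        case width, height
    }

    init(width: Int, height: Int) {
        self.width = width
        self.height = height
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        for key in CodingKeys.allCases {
            switch key {
            case .width: try container.encode(width)
            case .height: try container.encode(height)
            }
        }
    }

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        var width = 0
        var height = 0
        // Values are read back in the same key order they were written.
        for key in CodingKeys.allCases {
            switch key {
            case .width: width = try container.decode(Int.self)
            case .height: height = try container.decode(Int.self)
            }
        }
        self.init(width: width, height: height)
    }
}

let sizeData = try JSONEncoder().encode(Size(width: 3, height: 4))
let sizeJSON = String(decoding: sizeData, as: UTF8.self)
print(sizeJSON) // [3,4]

let decodedSize = try JSONDecoder().decode(Size.self, from: sizeData)
print(decodedSize.width, decodedSize.height) // 3 4
```

Reordering or adding cases changes the layout, which is why the keys would have to be frozen for this to stay compatible.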

anandabits · December 21, 2018, 12:35am

Certainly in some cases this is true, but the current proposal is a great example of where this distinction matters a lot. This comes up often enough that it warrants being designed for.

MattSeaman · December 21, 2018, 9:08pm

+1. This was a notable omission to the original set of Codable stdlib types.

This warrants a change to Swift. As a consumer of Codable, I can reasonably expect that ranges will be Codable by default.

This is an intuitive next step for codability in Swift. Range is one of the only remaining Codable holdouts. I had assumed this would breeze through review but it seems there are a lot of concerns about implementation and compatibility.
I’d just like to point out that keeping these things as a private implementation detail has been the norm for most stdlib proposals. This is indeed what enables us to change things in the future in a backwards-compatible, non–ABI-breaking way. These theoretical future proposals can address human-readability, conditional synthesis, etc. but I think this proposal fits very well with the stated goals of Codable and the stdlib so far. To change those goals is out of scope for this proposal.

Regarding other languages, NSRange does not appear to conform to NSCoding, but many specific range types like SKRange do.

I read the proposal and this review thread.

tkremenek · December 22, 2018, 5:44pm

While the review period has technically ended, please continue discussion on this proposal. The Core Team will be reviewing this proposal in the new year.

tkremenek · January 11, 2019, 6:30pm

Proposal Accepted

Thank you everyone for the very insightful feedback provided during this review. It provided some great insights into the Swift community's uses and desires for Codable.

Regarding the proposal itself, the Core Team has decided to accept it, with the proposal amended to include the details of the encoding format chosen. The data encoding chosen is an important semantic invariant of the API that is potentially observable by users and important for binary compatibility. Future proposals like this one that discuss adopting Codable should include details and, when necessary, rationale on the encoding chosen. Further, it would be valuable if the current chosen encodings for Standard Library types were also documented so that users using the default encodings for those types can either rely upon those encodings or know when they need to customize their encoding logic for a specific task.

In addition, the Core Team decided to extend the proposal to include Codable conformance for ContiguousArray, which was similarly missing from the Standard Library. This felt like a case that required no additional review discussion.

For Range types, the Core Team concluded from the review conversation that an unkeyed container representation seems like a reasonable default, but also recognized from the discussion on this thread that there are cases where a keyed representation is preferred. What this review thread further illustrated is the need to make Codable more flexible to provide the encoding customizations needed to service different tasks. There is no set of defaults for encoding that will satisfy every use-case. Some use-cases will prioritize serialization performance over human readability, and quite often the inverse. The Core Team would like to encourage the community to channel the energy that appeared in this thread to discussing general enhancements to Codable to make it more flexible and amenable to serving more use-cases more easily, and @itaiferber has initiated a thread to do just that.
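For reference, a short sketch of what the unkeyed default looks like through JSONEncoder in a toolchain that ships these conformances (Swift 5 or later is assumed here):

```swift
import Foundation

let halfOpen: Range<Int> = 1..<5
let closed: ClosedRange<Int> = 1...5

let halfOpenData = try JSONEncoder().encode(halfOpen)
let closedData = try JSONEncoder().encode(closed)

let halfOpenJSON = String(decoding: halfOpenData, as: UTF8.self)
let closedJSON = String(decoding: closedData, as: UTF8.self)

// Both kinds of range use the same two-element layout.
print(halfOpenJSON) // [1,5]
print(closedJSON)   // [1,5]

// Round trip back to the original value.
let roundTripped = try JSONDecoder().decode(Range<Int>.self, from: halfOpenData)
print(roundTripped == halfOpen) // true

// Reversed bounds are rejected with a thrown error rather than a trap.
let reversed = Data("[5,1]".utf8)
let invalid = try? JSONDecoder().decode(Range<Int>.self, from: reversed)
print(invalid == nil) // true
```

Because `1..<5` and `1...5` produce identical payloads, the consumer has to know from context which range type was encoded.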

Thank you again to everyone who participated in this review!