I suspected you would say this. I would agree if `Codable` were primarily used for encoding to a proprietary binary format. However, in reality it is primarily used to encode and decode JSON. That JSON is often transmitted across networks to programs written in other languages. Therefore it cannot be a purely private implementation detail. It is visible to the world, and people will depend on it.
Like it or not, people will write code that implicitly depends on the serialization formats used by standard library types. Further, these formats will end up being used by default in JSON data exposed by REST APIs (as Swift on the server grows).
You can argue this is a bad thing and people get what they deserve if they aren't thoughtful and explicit about their own data formats. On the other hand, you can also argue that in practice the decisions made about Swift standard library and Foundation types will have real world consequences in the JSON exposed by a lot of REST APIs.
My position is that we should choose to be a good citizen of the internet by thinking carefully about the consequences of the choices we make regarding JSON serialization of our standard library and Foundation types. The best place to do that is on Swift evolution, not in pull request discussions.
That said, thank you for the link to the pull request discussions. I read them, and some of the issues I am concerned with were indeed discussed there, particularly in this thread. @dlbuckley shared an illustrative example:
> But lets say that we did use an un-keyed or single value container, this is what it would look like in terms of a struct with a date range property converted into JSON (just stating this for keeping all discussions clear in black and white):
>
> Closed Range: `{"myRange" : ["1970-01-01T02:46:40Z", "1970-01-01T05:33:20Z"]}`
> Range: `{"myRange" : ["1970-01-01T02:46:40Z", "1970-01-01T05:33:20Z"]}`
> PartialRangeUpTo: `{"myRange" : "1970-01-01T02:46:40Z"}`
> PartialRangeThrough: `{"myRange" : "1970-01-01T02:46:40Z"}`
> PartialRangeFrom: `{"myRange" : "1970-01-01T02:46:40Z"}`
>
> From the JSON perspective the only way you would be able to distinguish between the range types would be documentation for the payload. Compared to:
>
> Closed Range: `{"myRange" : {"from": "1970-01-01T02:46:40Z", "through": "1970-01-01T05:33:20Z"}}`
> Range: `{"myRange" : {"from": "1970-01-01T02:46:40Z", "upTo": "1970-01-01T05:33:20Z"}}`
> PartialRangeUpTo: `{"myRange" : {"upTo": "1970-01-01T05:33:20Z"}}`
> PartialRangeThrough: `{"myRange" : {"through": "1970-01-01T05:33:20Z"}}`
> PartialRangeFrom: `{"myRange" : {"from": "1970-01-01T05:33:20Z"}}`
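To make the first case concrete, here is a minimal sketch of how that unkeyed payload would surface in ordinary application code, assuming the conformance lands with the unkeyed design chosen in the pull request (the `Event` type and its `myRange` property are hypothetical, not anything from the PR):

```swift
import Foundation

// Hypothetical model type holding a date range, used only to reproduce
// the first payload shown above.
struct Event: Codable {
    let myRange: Range<Date>
}

let event = Event(
    myRange: Date(timeIntervalSince1970: 10_000)..<Date(timeIntervalSince1970: 20_000)
)

let encoder = JSONEncoder()
encoder.dateEncodingStrategy = .iso8601

do {
    let data = try encoder.encode(event)
    print(String(data: data, encoding: .utf8)!)
    // Expected (whitespace aside), assuming the unkeyed-container design:
    // {"myRange":["1970-01-01T02:46:40Z","1970-01-01T05:33:20Z"]}
} catch {
    print("Encoding failed: \(error)")
}
```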
@Tony_Parker argued for the chosen design (unkeyed containers) with:
> I don't believe that we should impose the cost of carrying documentation strings in the archive itself as a primary goal. String keys can have a documentation as a side benefit. I don't think for mathematically closed types, which can gain no new keys, the documentation side benefit is worth the tradeoff.
and later:
> I'm sure we all have experience with under-documented JSON output, and that is why the idea of putting these strings in the archive is attractive. However, my opinion is that these situations are a failure of specifying the JSON correctly. An application level bug, really.
>
> We can't really fix that by using string keys for this one type. There will always be more cases where you can't figure out what the intended purpose of a value in JSON is, so this would be at best a partial solution anyway.
I agree that we cannot fix under-documented JSON. But we can acknowledge that this is a significant problem in the real world. We can also acknowledge that even when JSON is documented, choices made in the schema design have consequences for how easy or difficult that JSON is to work with (especially from other languages).
I also agree that this discussion is beyond the scope of a single type. However, I do think that acknowledging these realities should influence the choices we make about how our types are serialized to JSON. This is a complex tradeoff, but I feel the community should play a role in making the decision.
My overarching point here is that when it comes to the JSON serialization format used by Swift types I feel we have a responsibility to consider not just the experience of Swift programmers, but also the broader internet community which will inevitably encounter JSON produced using whatever serialization format we choose. This format should not be considered a private implementation detail.
One unfortunate tension that I noticed in reading the code review discussion is that there are real competing goals in different serialization contexts. Sometimes you want an optimized (perhaps binary) serialization. In other cases (such as public JSON APIs) you often want a format that is human readable and relatively clear about meaning, even though supplemental documentation is always going to be necessary.
`Codable` requires types to make a single choice about serialization, which is necessarily going to be sub-optimal in one of these contexts. One way to resolve this is to choose the optimized format and use wrappers at a higher level in the system when human-readable formats are necessary. Is this the informal policy being adopted by the standard library and Foundation?
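For clarity, here is a rough sketch of the kind of higher-level wrapper I have in mind, assuming the standard library keeps the compact unkeyed format. `KeyedRange` and its `from`/`upTo` keys are hypothetical names for illustration, not an existing API:

```swift
// A hypothetical wrapper (a sketch, not an existing API) that re-encodes a
// Range with explicit keys for use at a public API boundary, while internal
// code can keep the compact standard-library encoding.
struct KeyedRange<Bound: Comparable & Codable>: Codable {
    var range: Range<Bound>

    private enum CodingKeys: String, CodingKey {
        case from
        case upTo
    }

    init(_ range: Range<Bound>) {
        self.range = range
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let lower = try container.decode(Bound.self, forKey: .from)
        let upper = try container.decode(Bound.self, forKey: .upTo)
        guard lower <= upper else {
            throw DecodingError.dataCorruptedError(
                forKey: .upTo, in: container,
                debugDescription: "upTo must not be less than from")
        }
        range = lower..<upper
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(range.lowerBound, forKey: .from)
        try container.encode(range.upperBound, forKey: .upTo)
    }
}
```

An API layer could encode `KeyedRange(range)` at its boundary to get the self-describing keyed payload shown earlier, while everything internal keeps the compact format.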
Have you given any thought to other ways of resolving the tension between the goals of self-documenting data formats and more optimized serialization formats?