SE-0239: Add Codable conformance to Range types

While I'm sympathetic, I'm not sure it's possible to create encoded representations that meet the best practices for every serialization format. And even though I've long been a critic of Codable when it comes to JSON serialization, as well as of the core team's view of Codable and JSON, I've come to realize that it makes even more sense for consuming and producing applications to largely separate their network representations from their local Swift types. In my own code, and I've seen this elsewhere, I separate encoding and decoding for the network out into RawEncodable and RawDecodable protocols.

protocol RawDecodable {
    associatedtype RawType: Decodable
    
    init(_ rawValue: RawType) throws
}

protocol RawEncodable {
    associatedtype RawType: Encodable
    
    func asRawValue() throws -> RawType
}

This way the request-to-response flow goes through the raw types: MyRequest -> RawMyRequest -> Network -> RawMyResponse -> MyResponse. The network representation of types is completely separated from their local representation. The raw types can match the expected JSON exactly (and perhaps even be generated from a spec), and transforms can be handled in a single place which can throw errors. This can all be composed generically, and your local types don't even have to be Codable at all.
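
To make that composition concrete, here is a minimal sketch of the decoding half of such a pipeline. The RawDecodable protocol is repeated from above for self-containment; RawUser, User, and decodeRaw are hypothetical names of my own, purely for illustration.

```swift
import Foundation

protocol RawDecodable {
    associatedtype RawType: Decodable
    init(_ rawValue: RawType) throws
}

// The raw type matches the wire format exactly.
struct RawUser: Decodable {
    let user_name: String
}

// The local type is not Codable at all; the transform lives in one place
// and can throw if the raw value is invalid.
struct User: RawDecodable {
    let name: String
    init(_ raw: RawUser) throws {
        self.name = raw.user_name
    }
}

// One generic entry point: decode the raw type, then transform.
func decodeRaw<T: RawDecodable>(_ type: T.Type, from data: Data) throws -> T {
    let raw = try JSONDecoder().decode(T.RawType.self, from: data)
    return try T(raw)
}

let json = Data(#"{"user_name": "grace"}"#.utf8)
let user = try decodeRaw(User.self, from: json)
print(user.name) // grace
```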

So while we should always push to improve the treatment of JSON in Swift, I'm not sure it's worthwhile to consider every built-in representation in that light.

3 Likes

Sometimes this makes sense and sometimes it doesn't, but I will agree that everyone should take responsibility for having both a good internal model and a good serialization format, and often this means having separate types. In fact, I do something quite similar to what you posted in my primary client project.

However, the reality is that many developers do not think much about the wire format if they don't have to. That is why I think languages should provide sensible defaults. Using identical JSON arrays to encode subtly different types of ranges just doesn't seem like a good default to me.

We are not just letting Swift developers down when we do this but also the rest of the developer community that ever encounters REST / JSON APIs written in Swift. If Codable isn't the right answer for providing reasonable default JSON representations in Swift then I think we should start talking about what the right solution is.

4 Likes

On that specific point, I have to agree. I also don't see the point of optimizing this particular design for performance and then suggesting most users create a wrapper type, destroying any performance you may have gained. Finally, I can't agree with @Tony_Parker's characterization of Range as a mathematical type, given that it's not constrained to numeric types. @dlbuckley's example of Range<Date> is rather compelling to me, and without proper labeling, use of the raw encoding of the Range types will be severely limited. Literally the only users of that representation will be those who don't need to transmit or receive the data (e.g. local archives). So what's the principle behind the design here? Mathematical (for some definition of) types should have the simplest representations possible?

4 Likes

Sure. What I mean by "can't use the stdlib type anymore" (serialization-wise) is that they have to use MyRangeWrapper for all of their Codable structures. E.g.

struct X: Codable {
  // Have to use range wrapper instead of stdlib Range
  // because of custom serialization format
  var location: MyRangeWrapper
}

let decodedX = ...
// Access stdlib Range through .range property on MyRangeWrapper
print(decodedX.location.range)

This pattern of wrapping Range for a specific serialization can lead to many projects having their own specific range types, all trying to capture the same semantic concept but forced into a new type because Range's serialization format doesn't match their needs. Like you said, this is a trivial workaround, but it is very inconvenient in that developers can't use "common currency" types from the stdlib in their serialization structures.
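
For concreteness, here is one hedged sketch of what such a wrapper might look like. The keyed names lowerBound/upperBound are my own choice, not anything from the proposal.

```swift
import Foundation

// Hypothetical wrapper giving Range<Int> a keyed JSON representation.
struct MyRangeWrapper: Codable {
    var lowerBound: Int
    var upperBound: Int

    init(_ range: Range<Int>) {
        self.lowerBound = range.lowerBound
        self.upperBound = range.upperBound
    }

    // Recover the stdlib type for application logic.
    var range: Range<Int> { lowerBound..<upperBound }
}

struct X: Codable {
    var location: MyRangeWrapper
}

let data = Data(#"{"location":{"lowerBound":3,"upperBound":9}}"#.utf8)
let decodedX = try JSONDecoder().decode(X.self, from: data)
print(decodedX.location.range) // 3..<9
```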

I will also add that the proposal feels insufficiently motivated. What problems is it trying to solve? It almost reads as if the proposal found a stdlib type that wasn't conforming to Codable and simply added the conformance, without stating an actual issue caused by this type not conforming to Codable.

I still agree that this is useful for solving awkward situations like this, but the proposal doesn't state this as a motivating cause. I'd much rather see language-level solutions where separate modules can define Codable conformance for Range (module-specific conformances?) so that they can use common currency types and don't have to create inconvenient situations where they need to wrap stdlib types. Holding off on proposals like this in favor of such a language feature down the road would allow developers to conform Range to Codable themselves; accepting this proposal takes that possible future ability away, because the stdlib would already declare the conformance (I'm not sure how a module A conformance would interact with module Swift's conformance). This level of friction with the language that the proposal somewhat encourages, plus the insufficient motivation, is why I'm not in favor of this proposal.

You should definitely not store the serialization wrapper in your model type! Unfortunately, if you don’t do that then you don’t benefit from Codable synthesis. That’s a bummer, but my experience has been that a metaprogramming solution like Sourcery is often the right choice when it comes to synthesizing serialization code for many other reasons anyway.

There are a broad range of impedance mismatches we have to deal with if we want idiomatic Swift models to interoperate with the kind of JSON that exists in real world contexts. Codable provides a workable target to build on but higher-level solutions are really warranted (things like policy / strategy annotations for types and properties / associated values).

Perhaps this is another reason we should look at providing a better out-of-the-box solution for working with data formats intended for communication on public networks. If we focus on this specific domain instead of trying to provide an abstraction supporting a wide variety of encoding back-ends, we will end up with a much stronger solution for what is certainly one of the most important serialization domains. That would allow Codable to focus on domains where the serialization format can more reasonably be considered an implementation detail (and therefore something where optimizing for size makes more sense).

2 Likes

Maybe you consider it inconvenient; I consider it a core part of the problem you are trying to solve. If you don't have a specific way to encode/decode your data, the compiler will synthesise an implementation for you. In your case, if your Range has specific keys defined by your API response, the most direct way to solve it is to write your own Encodable/Decodable conformances to fit your API (in my experience, there isn't much standardisation around keys for range names - you might see startDate/endDate, startTime/endTime, minValue/maxValue, etc.). We can't match everybody's APIs and everybody's serialisation formats.

As @Jon_Shier said, I find the best approach is to have a "raw" data type, close to the metal. In real life, you almost always encounter weird values/encodings which require a bit of fudging to get right, so this lets you encapsulate that logic nicely. If your API has a common pattern, like a specific encoding for ranges, make a MyRange wrapper type and add your own CodingKeys - the compiler will conveniently fill in the rest for you.
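
As a sketch of that suggestion: a wrapper whose keys are renamed through CodingKeys to match a hypothetical API using startTime/endTime. Everything except the key names is still synthesized by the compiler.

```swift
import Foundation

struct MyRange: Codable {
    var lower: Double
    var upper: Double

    // Only the key names are customized; encode(to:) and init(from:)
    // remain compiler-synthesized.
    private enum CodingKeys: String, CodingKey {
        case lower = "startTime"
        case upper = "endTime"
    }
}

let data = Data(#"{"startTime":0.5,"endTime":2.0}"#.utf8)
let r = try JSONDecoder().decode(MyRange.self, from: data)
print(r.lower, r.upper) // 0.5 2.0
```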

But now you have a problem - your application logic is using "raw" types, and you have to write ugly things like dateRange.underlyingRange to get the deserialised values. So you wrap that raw type in another type which provides a convenient, single-level interface (.dateRange) to be used in the application domain. You won't pay any performance cost for that - the compiler will strip it all away.

There is a problem - see the part about SwiftPM. Since you don't own the type or the protocol, it's not safe for anybody else to add this conformance. Your applications shouldn't do it, and the libraries you use shouldn't do it. The only safe way to do it is... you guessed it! Wrapper types! Even if you don't care about the specific serialisation format and would be happy with some kind of stable default strategy.

3 Likes

There are obviously some strong viewpoints here and I do think a number of issues have been highlighted with Codable as a whole that reside outside of this proposal.

The first point I want to address is motivation; personally it was as simple as needing to encode a few range types to JSON and then send them over the network to and from a client. I was using the ranges to describe workout targets, e.g. complete this leg of the workout between X and Y seconds. I was a bit confused as to why the Range types had missed out on getting some Codable love, and thought it might be a small and easy way to contribute to Swift, as I was sure that others would also need this at some point in the future. As it turned out, there are many people implementing their own version all over the place, most notably in SwiftPM. This would now turn into a unification of all those custom implementations so everyone could predictably use the range types across projects.

Secondly, the use of keyed vs unkeyed: I was, and still am, on the side of using a keyed container over the unkeyed container for JSON. JSON is supposed to be descriptive and understandable by a human as well as a machine. By omitting the keys the ranges become smaller, but confusing to understand; they simply become values with no human reference point to decipher reliably. If you also look back at the original proposal for Codable, it states:

Unkeyed encoding is fragile and generally not appropriate for archival without specific format guarantees, so keyed encoding remains the recommended approach.

That statement is the main reason I originally proposed using keyed vs unkeyed, it holds true then and it holds true now.
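
To illustrate the trade-off being debated, here is a sketch contrasting the two candidate JSON shapes for a Range<Int>. The from/to key names in the keyed variant are illustrative assumptions, not from the proposal; sortedKeys is set only to make the output deterministic.

```swift
import Foundation

// Unkeyed: compact, but the values carry no human-readable meaning.
struct UnkeyedRange: Encodable {
    var range: Range<Int>
    func encode(to encoder: Encoder) throws {
        var c = encoder.unkeyedContainer()
        try c.encode(range.lowerBound)
        try c.encode(range.upperBound)
    }
}

// Keyed: larger, but self-describing.
struct KeyedRange: Encodable {
    var range: Range<Int>
    enum CodingKeys: String, CodingKey { case from, to }
    func encode(to encoder: Encoder) throws {
        var c = encoder.container(keyedBy: CodingKeys.self)
        try c.encode(range.lowerBound, forKey: .from)
        try c.encode(range.upperBound, forKey: .to)
    }
}

let enc = JSONEncoder()
enc.outputFormatting = .sortedKeys
print(String(data: try enc.encode(UnkeyedRange(range: 1..<5)), encoding: .utf8)!) // [1,5]
print(String(data: try enc.encode(KeyedRange(range: 1..<5)), encoding: .utf8)!)   // {"from":1,"to":5}
```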

But I think @anandabits is correct in highlighting the problem we are disagreeing on here: Codable is independent of JSON, plist, XML, etc., and expects one way to encode/decode for all serialisation methods. The original motivation of Codable was unifying a method of archival, not necessarily creating pleasant API payloads; it just appears to have evolved into that over the course of its introduction. Due to that I think there is merit on both sides of the argument, but the problem is how do we solve it for both parties with different motivations?

I don't think adding more strategies to the Encoder/Decoder types is a good idea, due to the reasons previously discussed. I'm wondering (thinking out loud) if there could be a way to do it by using protocol conditional conformance, but that would mean a fair bit of rework elsewhere to make that possible, and thinking about it I'm not even sure if it's possible. I don't even know if we want to get into discussions about that here, as it's probably a topic for another thread, but it's worth noting.

2 Likes

I don’t think conditional conformance would solve the fundamental issue, although I am curious to hear more about what you have in mind. My instinct is that @Ben_Cohen is right and that it would really require a new protocol. It sounds to me like the semantics of Codable are fundamentally incompatible with encouraging types to have a good JSON representation by default. Perhaps this could build on top of Codable in some way. If there is enough interest, maybe we should start a thread to discuss options.

2 Likes

-1

I don't think this makes sense as an addition to Codable. This assumes a structure for ranges, which I don't think fits in with the suggested uses of Codable.

There are two ways to use Codable:

  1. A roundtrippable format where the specific format doesn't matter
  2. An interchange format where something else is using it and the format must work with other tooling

Codable has been widely suggested—by Apple and the Swift team—as an answer to (2), specifically for JSON encoding/decoding. (Not to say that (1) has been discouraged; just that (2) has been explicitly encouraged.)

The problem is that not all formats have a clear way to encode ranges. JSON is the primary example of that. There's no standard JSON range format. That means that this additional conformance doesn't have a clear encoding—which is why it's not specified in the proposal. Because Codable has aimed to address all formats, I think it should be limited to the current set of "primitive" types.

I believe that making Range conform to Codable will cause problems. IME people will assume this will work because it typechecks—even though most other languages and tools will not encode ranges in a similar fashion. Because of this, I don't believe it's a good idea.

However, there is some prior art for this in the design of Codable: the encoding of Dictionary. If the key isn't a String or Int, then Swift will encode the dictionary as a flat array of alternating keys and values. So I wouldn't say that this doesn't fit. But I think it leans into a bad design.
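
That prior art can be observed directly. In this sketch, Point is a hypothetical non-String/Int key type; sortedKeys is set only so the output is deterministic.

```swift
import Foundation

// A Codable key type that is neither String nor Int.
struct Point: Codable, Hashable {
    let x: Int
    let y: Int
}

let dict = [Point(x: 1, y: 2): "origin"]

let enc = JSONEncoder()
enc.outputFormatting = .sortedKeys

// The dictionary encodes as a flat array alternating key, value.
let data = try enc.encode(dict)
print(String(data: data, encoding: .utf8)!) // [{"x":1,"y":2},"origin"]
```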

I do think that this is an example of how the design of Codable doesn't work very well in practice. Ideally there'd be a way to extend this without making these assumptions. In particular, Codable assumes that a type is encoded uniformly across all formats; the reality is that types usually want a per-format encoding for interchange.

If Codable were designed in a more extensible way—where types or formats could opt into these extensions—then I think it'd make sense (e.g. with a final tagless approach; but then some people would probably want JSON to support ranges, so we'd be back in the same boat).

I'd probably be -1 on any addition to Codable that didn't address these limitations by making Codable types more flexible generally.

I read through the proposal, pitch thread, and review thread. I've also used Codable quite a bit.

6 Likes

I want to clarify one thing about the design of Codable and specific formats. We designed two escape hatches to handle the scenario where the data in the archive doesn't match what a type expects to decode.

  1. Specific coders can use the "strategy" concept to allow specialization for a particular format. JSONEncoder does this for URL, for example, because the overwhelming majority of JSON uses one string to represent a URL.

  2. If that is insufficient, use a simple wrapper struct type when the format of their encoded data dramatically differs from the format expected by a library type. For example, let's say I had this odd format for URLs in my JSON and I wanted to decode it:

let json = """
[{ "scheme": "http://", "theRest": "www.swift.org" }]
""".data(using: .utf8)!

let decoder = JSONDecoder()

Here is how I could handle that:

struct URLWrapper: Decodable {
    let scheme: String
    let theRest: String
}

extension URL {
    init?(_ wrapper: URLWrapper) {
        self.init(string: wrapper.scheme + wrapper.theRest)
    }
}

let result = try decoder.decode([URLWrapper].self, from: json)
print(result.map(URL.init))

Bonus: you get to use all of the automatically generated synthesis for your wrapper type.

Personally, I think this is a reasonable compromise between the opposite design goals of preserving encapsulation and allowing complete customization.

It looks like many have come to appreciate this pattern as well (here in this thread and elsewhere), so it feels like the right solution for me for Range too.

Personally, I've found this to be clumsy in practice. Codable is built on the idea that types have encodings. But the strategies move some of the encoding information out of the type. There's no longer a central source of truth, so:

  1. You must be careful to duplicate that information across all uses of that type. These are easy to forget. And even if you have some tests around this code, it can be difficult to exhaustively test.

  2. All uses of Date, e.g., must use the same encoding within a JSON document (You can use a custom decoding strategy that tries a series of decoders, but there's not a great encoding analog). And while it's logical to think that a JSON document would be consistent in its encoding, I've found real-world JSON APIs to be illogical and inconsistent.
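
The "try a series of decoders" strategy mentioned above can be sketched like this; the two formats chosen (ISO 8601 string, then UNIX timestamp) are assumptions for illustration.

```swift
import Foundation

let iso = ISO8601DateFormatter()

let decoder = JSONDecoder()
decoder.dateDecodingStrategy = .custom { decoder in
    let container = try decoder.singleValueContainer()
    // First try an ISO 8601 string, then fall back to a UNIX timestamp.
    if let s = try? container.decode(String.self), let d = iso.date(from: s) {
        return d
    }
    if let t = try? container.decode(Double.self) {
        return Date(timeIntervalSince1970: t)
    }
    throw DecodingError.dataCorruptedError(
        in: container,
        debugDescription: "Unrecognized date format")
}

// A document that is inconsistent about how it encodes dates.
let json = Data(#"["2019-01-08T12:00:00Z", 1546948800]"#.utf8)
let dates = try decoder.decode([Date].self, from: json)
print(dates.count) // 2
```

Note that, as the post says, there is no good analog on the encoding side: a custom encoding strategy still has to pick one representation per document.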

Absolutely, and I do that quite a bit. What I dislike about this proposal is that it adds a default wrapper for ranges—a default that's undocumented and unjustified. Your URL example is quite a bit different IMO because there is a very sensible default for encoding URLs.

I think it would be better to leave Range as-is, without default encoding, because IMO it's better not to add a default where a sensible one doesn't exist. Then people won't mistakenly expect the default encoding and will think about what encoding makes the most sense for them.

I've found this to be very little solace. My URLWrapper type, which has 2 fields, can use the synthesized encode/decode. But every type that uses it—which often has many more than 2 fields—is now unable to implement the synthesized encode/decode. (Or you pay the even higher price of using URLWrapper everywhere.)

It would make more sense if Swift had generalized annotations where you could specify a wrapping decoder type inline—while maintaining both synthesized encodes/decodes. But it's much harder to propose and implement general solutions. :man_shrugging:t2:

6 Likes

I concur with everything @mdiep says above.

If we want Swift to be a convenient language for working with the kind of JSON that actually exists in the wild, we have a long way to go. Encoder/Decoder-level strategies are simply not effective in dealing with the JSON produced by many APIs, which often requires property- and value-level strategies. It also means that the JSON encoding specified by a type is not stable if it directly encodes values where the strategy may be overridden. I have never encountered a use case where this is actually necessary, so it seems like an unfortunate point of fragility in the encoded representation of such types.

As has been pointed out a few times already, wrappers get the job done but also require the user to forgo synthesis. This effectively means that in many contexts the only sensible thing to do is use an extra-linguistic code generator. This is unfortunate for everyone, but especially for beginners trying to learn by working with a public API of interest.

Further, I want to point out that @Tony_Parker’s comments are too exclusively focused on consuming JSON and ignore altogether the fact that many of the criticisms in this thread are focused on the JSON that is produced by Swift, especially the JSON that is produced by default by aggregate standard library and Foundation types. The simple fact is that the design philosophy espoused by the team for Codable is fundamentally incompatible with producing good JSON out of the box.

Stepping back, it is fair to ask whether producing good JSON out of the box should be a goal or not (by the conventional expectations of the web community, not an idiosyncratic Swift community standard). I am arguing that it should be a goal, especially if we want Swift to be a good language for writing web APIs. To repeat what I have said before, we should not need to use wrappers or annotations of any kind at all just to get a reasonable JSON representation of standard library and Foundation types (if we have a goal of being a good language for working with JSON).

As I have also said before, this is especially the case for less experienced or thoughtful developers who will write APIs using the simplest code possible without giving a thought to the JSON representation. We should take responsibility at the language and library level to ensure that the JSON produced by our types is reasonable and composed in a reasonable way. This will benefit everyone across the web who consumes APIs written in Swift.

JSON is inherently a rather messy domain. We can either embrace that and provide solutions that meet real world needs or we can punt and leave users to fend for themselves. The current approach of Codable tries to take a middle path while also generalizing to support other serialization formats. I think the feedback in this thread is that this approach isn’t sufficient to make Swift a good language for working with JSON. It also makes it hard to evaluate a proposal like this which would be a no-brainer +1 if Codable was strictly focused on compact, often proprietary, binary formats.

7 Likes

I'm struggling to understand this point. What does it mean to “mistakenly expect the default encoding”? They get an encoding for a variety of formats that will successfully round trip through the decoder, which is the goal of the Codable protocol as I understand it. A hypothetical JSONCodable or HumanReadableCodable protocol would/should have a very different design.

The reason I requested the unkeyed representation for Range in the first place is because I believe the standard library should be producing concise JSON output for Range that does not impose an unavoidable overhead of encoding the same string key N times, which is important in the cases where you encode many Range values at a time (like a big array).
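
The size overhead being described can be measured with a quick sketch; the lowerBound/upperBound field names of the keyed variant are hypothetical.

```swift
import Foundation

// Keyed representation: every element repeats both string keys.
struct Keyed: Encodable {
    let lowerBound: Int
    let upperBound: Int
}

let pairs = (0..<1000).map { ($0, $0 + 10) }
let enc = JSONEncoder()

let keyed = try enc.encode(pairs.map { Keyed(lowerBound: $0.0, upperBound: $0.1) })
let unkeyed = try enc.encode(pairs.map { [$0.0, $0.1] })

// The keyed form pays for "lowerBound"/"upperBound" once per element.
print(keyed.count > unkeyed.count) // true
print(keyed.count, "vs", unkeyed.count, "bytes")
```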

In any case, I support this proposal because I think the best way to improve the interop of JSON and Swift is to continue to make focused, iterative refinements. That will move us forward. We can have conversations about language improvements which can help Codable in separate proposals. There are a lot of great ideas out there, but we don't need to sink this one because we don't have the perfect solution for all other scenarios.

This seems odd to mention as a Codable escape hatch, as it really has nothing to do with Codable at all. It's just that JSONEncoder/Decoder had to adopt the pattern (or something like it) in order to work with JSON at all.

1 Like

I think we have a fundamental disagreement here: if compactness is the priority, then JSON is probably not the right choice in the first place. JSON by its very nature often repeats the same string key N times. This is not usually considered a liability; in fact, it is usually considered an advantage of the JSON format, as it makes the format a lot easier for humans to work with.

My intent is not to sink this proposal or to be opposed to focused, iterative refinements. In fact, I have intentionally not even offered a clear +1 or -1 on the proposal itself. My posts are intended to call attention to what I think are important issues, the most fundamental of which is that I think if standard library and Foundation types that don't have a trivial or obvious JSON representation are going to have a JSON representation at all that representation should be considered public and should go through the evolution process.

4 Likes

I've been meaning to respond to this thread for a while now and haven't quite had the time to summarize my thoughts — I'll be responding to a few points here (@anandabits, I'm not trying to pick on your responses, but you do a great job of hitting all of the points that I'm trying to address from throughout the thread).

I'd like to address the topic of JSON here up front, because it seems to me that this is the crux of your argument (if I am mistaken, please correct me!). In doing this, I'd like to separate the layers of the Codable feature as we designed it, to indicate why I think that focusing on any specific target format is explicitly a non-goal of the Codable feature (at the compiler and stdlib layers) itself, and is instead entirely within the realm of the Encoders and Decoders that implement it.

To summarize the goals we had in mind here:

  • There is inherently a push and pull between the need for a given type T to produce a given encoding E1, and a consumer of T to receive a given encoding E2. In the lucky cases, E1 and E2 align perfectly, but as this whole thread indicates, E1 and E2 are often different
  • Thus, someone has to be able to control the output encoding on both the producer and consumer layer, be it the type T itself, the Encoder responsible, or a third party in charge of coordinating the whole process
  • As such, we designed the feature to offer the following layers of matching E1 to E2:
    1. Type T offers its preferred, and default representation in the form of an implementation of init(from:) and encode(to:). This acts as a suitable default that is not intended to target any individual format in existence, present or future. I stress this point because this is by design
    2. One layer above, the Encoder may choose to ignore this default representation in favor of one that it decides is more suitable for the specific format the Encoder targets. Format-specific customizations apply here, and they may be opt-in (e.g. encoding strategies, which are not formalized or mandated by the Codable protocols, as noted by @Jon_Shier)
    3. The final arbiter is the entity performing the encoding, be it an app or a framework or some other frontend communicating with the service in question. This layer may choose to override the final representation on a case-by-case basis, by either using wrapper types, or strategies if offered, or extension shenanigans to override the implementation of a given type within the current module (which is possible)

[There are other layers which may be injected here — for instance, a type U which contains a T may choose to forgo encoding the T altogether and encode something manually instead; in this instance, it can win out over other strategies as well.]

To be very specific here: the goal is to allow layer 1 to offer a format-agnostic default representation of the type; layer 2 may offer a format-specific refinement of the type representation (whether overridden with a strategy or silently); layer 3 can offer endpoint-specific refinements suitable for a specific API or endpoint.

So within the context of this design, I would say that I personally do not believe that Swift the language, or Swift the standard library should care about producing JSON out of the box, good or bad. The standard library implementations currently deal exclusively with layer 1 of the hierarchy: providing defaults which are format-agnostic, and provide an efficient and concise representation that can work with many varying formats. (See also: XML was really popular 10–15 years ago; JSON is really popular now; format Z is likely to be popular in 5–10 years as well; we should be in the business of looking to the far future as well as to the present.) If we decide to move format-specific implementations into the stdlib, that will be a time to decide on a policy here.

Instead, if we want to focus on JSON, we should limit discussion exclusively to whether a tool like JSONEncoder can and should override the default implementation of Range (which we can do!). I don't think we should be throwing the baby out with the bathwater because of JSON — we can have the best of both worlds, by design: the standard implementation for Range can do something conservative and efficient, and JSONEncoder can do something else, with or without a customizable strategy.

I think this goes to invalidate some of the arguments that you're making: property- and value-level choices can only be made by overriding init(from:) and encode(to:), and in this case, it doesn't matter what implementation exists for T as it's being overridden.

I harp on this because I see strategies being contested often: strategies are meant to help offer consistent formatting for specific types, which may be necessary due to limitations in, say, the parser sitting at the API endpoint. If your API (at the design layer, ignoring the actual server implementation) requires a specific property-by-property format, none of this discussion really applies: you're going to need to offer that override, either with wrapper types sprinkled through (for less granular changes) or by implementing init(from:) and encode(to:) manually (for super granular decisions).

On this note, I'd like to say that we've put some thought into making this easier, and I think it can be. Although we haven't had the bandwidth to implement some of what we've discussed in threads here on the forums about adaptors/annotations which would allow you to customize properties without giving up synthesis, it's something I'd like to see.

That being said, correctness here should, in my mind, trump convenience: convenience can be improved incrementally, but in the arena of serialization formats, we've usually got only one shot at getting things right. Sometimes, wrapper types are simply the way to go by design. As shown in this thread or other discussions, there are varying opinions on what the implementation here should look like, and no matter what we decide, someone will have to adapt the type to their specific endpoint. This is what layer 3 is for.


To leave a final note on my conclusions here: either you care about the particular representation of type T (you need to override the default representation for an endpoint-specific representation, in which case the default representation doesn't matter), or you don't (in which case the default representation doesn't matter). So this is a qualitative matter of "how do we decide on a format-specific default", and "how do we make the default most applicable so we can get folks to fall into the latter case most often".

In my mind, the stdlib should not care about format-specific defaults, and instead, we should pull this JSON discussion into a separate thread and review what we think should be done at the JSON level. This should be done with some amount of internal review and discussion as well. JSONEncoder can offer a different representation, whether overridable via a strategy (e.g. Date) or not (e.g. URL). [And if a strategy is not sufficient for your use-case, none of this matters since you're in the business of overriding the implementation somewhere, no matter what.]

19 Likes

A conditional +1 if we also add a strategy to JSONEncoder/Decoder with something like:

enum RangeStrategy {
    case array
    case dictionary(start: String, end: String)
}

It would be nice to offer a case for start/length as well, but I am not sure how that would work in the general case which might not support the necessary transforms...

If standard library types provide string keys, should they also provide integer keys?

For example:

extension ClosedRange: Codable where Bound: Codable {
  private enum CodingKeys: Int, CodingKey {
    case from = 0x66726f6d
    case thru = 0x74687275
  }
}
  • JSONEncoder would use "from" and "thru" string keys.
  • Other encoders could use 'from' and 'thru' integer keys.

Leaving aside the question of what the ideal representation should be…

…this does not make any sense to me. I don’t see how it can be logically coherent to consider the serialization of a type to be an implementation detail, and also provide encoders for interchange formats. Regardless of what the representation is, it should be specified – for the benefit of JSONEncoder but also for any other encoders for interchange formats that may exist out there.

If it is truly the belief of the core team that the standard library can and should add new opaque, unspecified representations, JSONEncoder should be deprecated and Codable should be discouraged except in language- or framework-specific formats like NSKeyedArchiver.

(I also believe that NSKeyedArchiver is a horrible trap that shouldn’t be exposed as API, and doesn’t motivate a special language feature like Codable synthesis, but that’s a tangent.)

1 Like