Codable Improvements and Refinements

I’m a huge fan of SE-0030 for this kind of language extension.

But I think SE-0030 needs to be extended:

  • Allow arguments to behaviors (shouldn’t be a problem, init is already in SE-0030):

    var [CodableKey("last_seen")] lastSeen: Date
    
  • Allow having more than one behavior (needs some evaluation, might result in some kind of multi-inheritance):

    var [CodableKey("last_seen"),
         CodableDateFormat(.iso8601),
         observable] lastSeen: Date
    
  • Some kind of runtime reflection will be useful:

    var format = .deferredToDate
    if let [CodableDateFormat] v = x.lastSeen {
       format = v.format
    }
    

If SE-0030 (and the extensions above) were already in place, these improvements could be delivered as a library-only change, without being limited to Codable.

2 Likes

If I could change any one thing about Codable, I'd generalize the compiler magic that makes it possible.

Focusing on Decodable for a minute, if you could take a type, get a list of KeyPaths, get the name of each, and construct an instance of the type by providing values for each key path, then:

  1. Decodable could probably be defined in the stdlib instead of the compiler. The rest of what it does—decoding values from formats—is already in the stdlib.

  2. The community could experiment with wrappers inside init(from: Decoder) that help with the pain points experienced here. It'd be possible to add a way to get what's currently the default synthesized init and override the defaults of some properties. I think this experimentation could lead to better ideas for the stdlib.

  3. People could explore other solutions for problems that Codable isn't a great fit for. Maybe decoding JSON APIs would be better served by an API less focused on round-tripping values, for example.
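To make the shape of this idea concrete, here's a rough sketch. Everything in it is invented for illustration—`ConstructibleFromProperties`, `propertyNames`, and the `User` conformance are hypothetical, with the idea being that the compiler would synthesize the parts written by hand here:

```swift
// Hypothetical sketch — none of these names are a real API.
// The compiler would ideally synthesize both requirements per type.
protocol ConstructibleFromProperties {
    static var propertyNames: [String] { get }
    init(propertyValues: [String: Any]) throws
}

struct KeyMissing: Error { let name: String }

// With that in place, a generic "decode" could live in a library:
func decode<T: ConstructibleFromProperties>(_ type: T.Type,
                                            from dict: [String: Any]) throws -> T {
    for name in T.propertyNames where dict[name] == nil {
        throw KeyMissing(name: name)
    }
    return try T(propertyValues: dict)
}

// A conformance the compiler would ideally write for us:
struct User: ConstructibleFromProperties {
    var id: Int
    var name: String
    static let propertyNames = ["id", "name"]
    init(propertyValues: [String: Any]) throws {
        // A synthesized version would cast safely; forced casts keep the sketch short.
        id = propertyValues["id"] as! Int
        name = propertyValues["name"] as! String
    }
}
```

A real design would of course be typed via key paths rather than `[String: Any]`, but even this crude shape shows how the format-agnostic half of Decodable could move out of the compiler.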

Encodable could have a similar treatment.


One small change I think would help would be to add return type inference to the decode methods:

func decode<T>(
  _ type: T.Type = T.self, // Add this default
  forKey key: KeyedDecodingContainer<K>.Key
) throws -> T where T : Decodable

I understand the limitations and downsides of return type inference generally, but 99% of the time when I write a Decodable implementation, I'm either assigning to a property or passing a value to an init. A default would remove a lot of tedium.
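For illustration, here's what that default would save at a typical call site (the `User` type here is just an example):

```swift
import Foundation

struct User: Decodable {
    let name: String
    let age: Int

    enum CodingKeys: String, CodingKey { case name, age }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Today: the type must be spelled out, even though the
        // property being assigned already determines it.
        name = try container.decode(String.self, forKey: .name)
        age = try container.decode(Int.self, forKey: .age)
        // With the proposed `= T.self` default, these could read:
        //   name = try container.decode(forKey: .name)
        //   age = try container.decode(forKey: .age)
    }
}

let data = Data(#"{"name": "Ada", "age": 36}"#.utf8)
let user = try! JSONDecoder().decode(User.self, from: data)
```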

3 Likes

+1, but please start a new thread so others who are interested in the topic have a chance to notice it, thanks!

Just to respond to a few things upthread:

  1. You're right, there would have to be a step three, actually applying these by defining attributes to customize codable, equatable, hashable, etc...

  2. It isn't clear to me what the right design and scope for user-defined attributes are. I'm not a strong advocate for unconditionally reflecting attributes into the runtime reflection metadata, but that is done in other systems and seems to work for them. Even if that were the model, we wouldn't be blocked on providing an actual new runtime reflection API (though we really need one :-)

  3. The default implementation could either be dynamic or static, and if it is static, we have the choice of baking it into the compiler (like today, but listening to new attributes) or making it a macro system that "statically reflects" over the attributes and properties of a type (e.g. have a #for loop that can iterate over lists of declaration properties, which stamps out code). I'm a little nervous about an actual dynamic-reflection-based system, out of concern that it would produce brutally slow code for important things like equatable/hashable, but perhaps that's premature optimization. It would definitely be easier to design than a static-reflection-based system.

  4. Each of these points has lots of interesting tradeoffs, and it would be great to pull in other folks who are interested in these sorts of topics by starting threads dedicated to them, cross-referencing this thread for context.

1 Like

We could start with a dynamic solution (ease of use for developers), and wait until we have the macro system to use it statically and only then rewrite Hashable, Equatable, and Codable using them.

It's unfortunate how quickly we jumped into the dream syntax we'd like to see in the far-off space future, because I'm really interested in making refinements to what we already have. :confused:

Two notes around the desire for something annotation-heavy or reflection-heavy:

  1. Regardless of whether it brings you joy or not, the type-safe approach of Codable today is highly desirable for making incorrect programs clearly look incorrect and usually not compile. I would find moving to something unnecessarily based in dynamism and typo-able annotations to be a severe regression. Even if the hypothetical macros were to lean heavily on constexpr, it'd still be switching away from something that is verified at compile time to something that isn't.
  2. The discoverability and maintenance stories for "I just want to customize one behavior!" are really under-defined (nb: my one is always different from your one). Having "synthesize everything" vs. "implement everything yourself" as the two options is extremely clear and easy to teach. The current behavior inarguably fulfills the goal of having the simplest case be the easiest and the most complex cases be possible. I have no idea how someone would get familiar with annotation customization points other than just dumping a big list of syntax on them and hoping they figure it out.

Separately, I don't understand the overriding need to kill CodingKeys. The benefit of some yet-unscoped syntax involving magic and key-paths is unclear, except for it involving a construct that is newer to the language. The language already has a feature for defining enumerated lists of things. Similarly, we don't need bespoke syntax for defining an error domain just to write 20 characters less. Moreover, the keys pulled out of an archive are often orthogonal to the properties of a type.
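For reference, the existing feature being defended here is just an ordinary enum; this `Player` type is an illustrative example of the key-mapping it already handles:

```swift
import Foundation

struct Player: Codable {
    var lastSeen: Date

    // CodingKeys is a plain enum conforming to CodingKey: no new
    // syntax is needed to map a Swift property onto an archive key.
    enum CodingKeys: String, CodingKey {
        case lastSeen = "last_seen"
    }
}
```

A typo in `"last_seen"` is a value bug, but a typo in the case name `lastSeen` fails to compile, which is the compile-time verification the post refers to.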

8 Likes

IMO, I think it's important to discuss what a "perfect" or "natural" solution to this would be. Most of the alternatives I've seen feel like a bolt-on, and are still filled with annoying boilerplate.

The end goal of this compiler-heavy solution is to move as much of the special built-in synthesis out of the compiler and into the standard library. I would much rather spend time building a new system that simplifies the existing solution, rather than keep bolting onto the existing one.

These would still be done at compile time, afaiu. The annotations would be available inside of the macro, so that it could do manipulation on them, and generate the default implementation. You're not losing any type-safety with this solution.

We're not going to change the fact that users should be consulting the documentation for Codable. A sufficiently comprehensive guide to Codable should provide all the needed documentation to use an annotation based solution.

5 Likes

Yes, that approach would make sense.

-Chris

So, I think we all want user-defined attributes. There's lots of precedent in other languages and they can be used for some really cool things. On the other hand, if we really want arbitrary, user-defined attributes, we'd need some kind of reflection API to access them.

Personally, I'm not sure that coding-keys belong as part of the model type, particularly when you have some kind of external API and multiple serialisation formats to contend with. I think this is really an Encoder/Decoder-level detail.

I think the best thing would be to create some kind of Schema type which represents how to encode/decode a particular type in a particular format. For example, if you're encoding in to some format which has a native representation for Range, you could choose that schema in place of the JSON-compatible two-unlabelled-values approach.
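A rough sketch of that idea, with heavy caveats: `Schema` and the two conformances below are invented names modeling the shape of the proposal, not anything real.

```swift
// Hypothetical sketch — `Schema` and these conformances are invented.
// A schema describes how one type is represented in one format.
protocol Schema {
    associatedtype Value
    func encode(_ value: Value) -> [String: Any]
}

// A format with native range support could supply its own schema…
struct NativeRangeSchema: Schema {
    func encode(_ value: Range<Int>) -> [String: Any] {
        ["range": [value.lowerBound, value.upperBound]]  // placeholder shape
    }
}

// …while a JSON-oriented encoder keeps a JSON-friendly shape, standing in
// for today's two-unlabelled-values approach.
struct JSONRangeSchema: Schema {
    func encode(_ value: Range<Int>) -> [String: Any] {
        ["lower": value.lowerBound, "upper": value.upperBound]
    }
}
```

The point is that the choice of representation lives with the Encoder/Decoder (pick the schema per format), not with the model type.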

That's the "Enhanced Contextual Overriding of Encoded Representations" option, basically.

  1. Can only speak for the annotation-heavy approach here, none of the suggestions I've seen would sacrifice type safety. I'm assuming the implementation would require importing a static annotation type similar to how other languages handle it. See GSON's @SerializedName annotation.

  2. as the two options is extremely clear and easy to teach

    Easy according to you, but that has not been my experience browsing StackOverflow and /r/Swift compared to other approaches. Part of the problem is the locality of the init(from:) and CodingKey abstractions. Users are working at the property-level and suddenly have to shift to implement an initializer or entirely new CodingKey type. A lot of new developers have a hard time making the mental connection.

    From a mile-high view, annotations can be thought of like the decorator pattern, which IMO is a much cleaner and easier abstraction to understand. It's also closely coupled to the actual property that you're working with which reduces the mental load.

Similarly, we don't need bespoke syntax for defining an error domain just to write 20 characters less.

I think that's oversimplifying things a bit. And even so, I'd say it's in the best interest of every language/library to monitor/eliminate boilerplate code for the sake of self-preservation.

1 Like

While all the changes mentioned here would be great, I want to throw out something that hasn't been brought up yet.

I'd like for enums to get free Codable synthesis when they're not RawRepresentable. Right now, if you have an enum that isn't RawRepresentable (that conformance is usually achieved with a String or Int raw value), you have to write the full encoding and decoding functions yourself, which can be very painful. This is even worse when you have associated values on your enum: if you have associated values, you can't get the Codable method synthesis at all.

Not allowing non-RawRepresentable enums to get Codable synthesis is one of those technically-correct-yet-practically-frustrating things in Swift that I'd like to see go away. I think we could choose some decent defaults that, even if they wouldn't work great for JSON decoding from web services, would be excellent for simple stuff like writing an object to disk and reading it back out again.
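To show how painful it is today, here's the conformance you currently have to write by hand for a two-case enum with associated values (`Shape` is an illustrative example; a synthesized version could pick some shape like this as its default):

```swift
import Foundation

enum Shape {
    case circle(radius: Double)
    case square(side: Double)
}

// All of this is boilerplate that tracks the enum's structure exactly.
extension Shape: Codable {
    private enum CodingKeys: String, CodingKey { case circle, square }
    private enum NestedKeys: String, CodingKey { case radius, side }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        if let nested = try? container.nestedContainer(keyedBy: NestedKeys.self,
                                                       forKey: .circle) {
            self = .circle(radius: try nested.decode(Double.self, forKey: .radius))
        } else {
            let nested = try container.nestedContainer(keyedBy: NestedKeys.self,
                                                       forKey: .square)
            self = .square(side: try nested.decode(Double.self, forKey: .side))
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .circle(let radius):
            var nested = container.nestedContainer(keyedBy: NestedKeys.self,
                                                   forKey: .circle)
            try nested.encode(radius, forKey: .radius)
        case .square(let side):
            var nested = container.nestedContainer(keyedBy: NestedKeys.self,
                                                   forKey: .square)
            try nested.encode(side, forKey: .side)
        }
    }
}
```

Every new case or associated value means touching four places, which is exactly the kind of mechanical work synthesis exists for.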

3 Likes

I wonder... could this be done in the form of compile-time evaluation? I haven’t given it enough thought but it strikes me as an interesting avenue to explore. If we passed the AST in and out it gives us “macros” but without token pasting.

1 Like

There has been some talk about that (and why it's kind of difficult) here:

Imo this is one of the limitations for which a solution would really benefit from custom attributes to allow the user to express how exactly their enum cases and associated values should be en/decoded.

1 Like

In general this is true. My team uses a Sourcery template which supports a large number of sum type encodings. This is unfortunately necessary as long as the industry uses serialization formats that do not specify a canonical way of encoding sum types and, often, server-side languages that also do not support sum types. API developers find many different ways of expressing data that is really best modeled as a sum type.

While that is the case, @soroush specifically says:

I think we could choose some decent defaults that, even if they wouldn't work great for JSON decoding from web services, would be excellent for simple stuff like writing and object to disk and reading it back out again.

The experience my team has gained with a plethora of sum type encodings has led us to prefer one encoding over the rest. It's hard to say whether the community could agree on a canonical default, but if we can, maybe it's OK to support synthesizing Codable for enums with associated values in the canonical format. Annotations could be added to support more formats in the future.
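As one illustration of what a "canonical" shape could look like, here's an internally tagged encoding: a field names the case, and associated values sit alongside it. This particular shape is just an example, not anything the thread has agreed on, and `Payment` is an invented type:

```swift
import Foundation

enum Payment {
    case cash
    case card(number: String)
}

extension Payment {
    // Internally tagged candidate shape:
    //   {"case": "cash"}  or  {"case": "card", "number": "4242"}
    var canonicalJSONObject: [String: Any] {
        switch self {
        case .cash:
            return ["case": "cash"]
        case .card(let number):
            return ["case": "card", "number": number]
        }
    }
}
```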

2 Likes

Even if we could get the Swift community to agree, we'd have to convince all of our api engineers, which would be even more impossible, haha.

I also want to mention Swift on the server projects, where using Codable on both sides means that the exact interchange format doesn't even really matter, as long as we do pick a default.

2 Likes

I wanted to chime in with an option that I'm not sure has been mentioned, maybe rightfully so. I think the compiler synthesizing the encoder and decoder currently is fantastic, but there are times when you need to deviate for only one or two keys/properties that behave in a funky way, and that's when you have to add a lot of boilerplate. I'd like to propose an alternative option that I hope is viable, though I haven't dug in enough to understand how it would be accomplished.


Create a closure-based API that allows me to customize some values. I would like to take the concept from KeyDecodingStrategy and apply it to the actual type, assuming the default implementation if no override is specified.
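A very rough sketch of the storage such an API might use. To be clear, `DecodingOverrides` and everything around it is invented here; it only models the shape of "key path in, closure out", not a workable decoder integration:

```swift
// Hypothetical sketch — not a real API. Key paths identify the property
// to override; the closure says how to produce its value from raw input.
struct DecodingOverrides<T> {
    private var overrides: [PartialKeyPath<T>: (String) -> Any] = [:]

    subscript(keyPath: PartialKeyPath<T>) -> ((String) -> Any)? {
        get { overrides[keyPath] }
        set { overrides[keyPath] = newValue }
    }
}
```

Usage would then look something like `decoderConfig[\Event.date] = { raw in parseLegacyDate(raw) }`, with synthesis handling every property that has no override, mirroring how `keyDecodingStrategy` layers on top of the default behavior.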

2 Likes

This.

There have been so many times when I have a struct where all the properties can be synthesized except one (something like a tuple or enum without a RawValue). It's super annoying to have to write the entire Codable conformance for everything just because of a single variable.
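As a concrete (invented) example of that situation: one tuple property below blocks synthesis, so the entire conformance has to be written out even though every other field is trivial:

```swift
import Foundation

struct Pin {
    var title: String
    var coordinate: (lat: Double, lon: Double)  // tuples aren't Codable
}

// One non-Codable property forces *all* of this by hand.
extension Pin: Codable {
    enum CodingKeys: String, CodingKey { case title, lat, lon }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        title = try c.decode(String.self, forKey: .title)
        coordinate = (lat: try c.decode(Double.self, forKey: .lat),
                      lon: try c.decode(Double.self, forKey: .lon))
    }

    func encode(to encoder: Encoder) throws {
        var c = encoder.container(keyedBy: CodingKeys.self)
        try c.encode(title, forKey: .title)
        try c.encode(coordinate.lat, forKey: .lat)
        try c.encode(coordinate.lon, forKey: .lon)
    }
}
```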

2 Likes

This thread is about soliciting and comparing ideas to address precisely this issue. How do you feel about the various solutions presented?

@xwu Thank you for keeping this thread focused and I apologize for my unproductive reply in the thread :slight_smile:


My personal views reflect those of several others upthread: a compiler-centric solution would be optimal over the few library-centric solutions proposed. I'm a lazy person, and I'd rather not have to do more work to get my types conforming to Codable. I never really liked having to build an entire CodingKeys enum to get Codable conformance, so I agree that using annotations would be a good solution that would feel like less boilerplate; with the pitch for user-defined attributes, I feel things are tipping toward attribute annotations anyway.

Some have posited that the CodingKeys enum should be replaced, and I am inclined to agree, though I also see the benefits the enum has over simple annotations. Although I feel that for most use cases attribute annotations would be adequate, perhaps it would be best to allow one or the other instead of wholly replacing the enum with something else. Then developers can pick the right tool for the job, and not all devs are punished with a large amount of boilerplate for simple customizations.

I think the CodingKeys enum is useful (I use it in test JSON for example). It definitely isn't actively harmful, so I think it's here to stay. But I agree that it is highly desirable to have an annotation-based solution used by the compiler's synthesized CodingKeys enum.