Codable Improvements and Refinements

Hello, everyone! As part of the review thread for SE-0239, we received a lot of helpful feedback that's highlighted areas where we think Codable can improve, and we wanted to take the time after the holidays to split that conversation aside and help carry it along in a more targeted thread. We think there are a lot of potential improvements to be made, and we're really interested in getting community feedback and contributions to help us get closer to where we've always wanted Codable to be.

As part of this, I wanted to gather some of my own thoughts about various improvements that can be made, to help seed some discussion about priorities, and what we'd like to see happen in the next few releases:

  • [Compiler- & Library-Level] Per-Property Refinement of Encoding/Decoding without Abandoning Synthesis: This is probably one of the larger topics to discuss, and potentially one of the more impactful ones. One of the more helpful aspects of the Codable feature is the compiler synthesis of init(from:) and encode(to:), and likewise, one of the more painful aspects is the loss of synthesis due to the need to customize behavior for just one or two properties, out of, say, 19. I've seen developers use Sourcery or similar to produce the equivalent of what the compiler would synthesize, and then use that as a starting point for customization; of course, this is sub-optimal, and it would be nice to be able to address this.

    I've discussed this briefly in other threads (largely under the description of Codable "adaptors"), but to re-cap some potential directions to take this in:

    1. Compiler-heavy solution: some form of property-level annotations that indicate how to synthesize init(from:) and encode(to:) by adjusting how the property is assigned to during encoding/decoding. This could take the form of something similar to SE-0030 Property Behaviors (which was discussed but deferred), or actual user-level annotations (e.g. @codableAdaptor(...) var myProperty: Foo). This subject has been discussed at length, but I'd like to note that it's unlikely we'd want to implement a feature like this just for Codable, so this direction would likely require revisiting this topic and re-generalizing it
    2. Library-heavy solution: some way to describe, via, say, a static func, what adaptor to use for a given Codable property. Adaptors would likely just be functions, and you can imagine this conceptually as just being a mapping from property -> adaptor. Difficulties here lie in representing this in the type system without higher-kinded-types, or more generalized generic functions: because the properties on a type T can have all different types themselves (Int and String and Date and...), there is no way to generalize over all of the adaptors you'd want to use on T without losing type information. This means storage in an actual [KeyPath : Adaptor] is likely out, so some creativity in how to perform this mapping cleanly might be helpful. It's likely that we'll need additional compiler support to get this to work
    3. Tooling-heavy solution: it's also possible (though perhaps less desirable) to punt this in favor of having the language provide easier support for simply doing what developers are doing now: allow the compiler to export the synthesized implementations themselves so Xcode can splat that into your file (rather than going through, say, Sourcery). [Internal: I filed 30472233 a while back for this]
    4. Something else? There are plenty of directions this can go in, so creative feedback is appreciated
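
    To make the pain point concrete, here is a minimal sketch of what customizing a single property costs today. The Article type, its properties, and the epoch-seconds requirement are all made up for illustration; only createdAt needs special treatment, yet every other property has to be spelled out by hand as well.

    ```swift
    import Foundation

    // Hypothetical model: only `createdAt` needs custom handling.
    struct Article {
        var title: String
        var author: String
        var wordCount: Int
        var createdAt: Date
    }

    // The conformance lives in an extension so the memberwise initializer survives.
    extension Article: Codable {
        // Written out explicitly so the hand-written methods below can refer to it.
        enum CodingKeys: String, CodingKey {
            case title, author, wordCount, createdAt
        }

        init(from decoder: Decoder) throws {
            let container = try decoder.container(keyedBy: CodingKeys.self)
            // Boilerplate that synthesis would otherwise have provided:
            title = try container.decode(String.self, forKey: .title)
            author = try container.decode(String.self, forKey: .author)
            wordCount = try container.decode(Int.self, forKey: .wordCount)
            // The one line that is actually custom: seconds since 1970.
            let seconds = try container.decode(Double.self, forKey: .createdAt)
            createdAt = Date(timeIntervalSince1970: seconds)
        }

        func encode(to encoder: Encoder) throws {
            var container = encoder.container(keyedBy: CodingKeys.self)
            // Boilerplate again:
            try container.encode(title, forKey: .title)
            try container.encode(author, forKey: .author)
            try container.encode(wordCount, forKey: .wordCount)
            // The one custom line:
            try container.encode(createdAt.timeIntervalSince1970, forKey: .createdAt)
        }
    }

    let article = Article(title: "Codable", author: "Someone", wordCount: 1200,
                          createdAt: Date(timeIntervalSince1970: 100))
    let articleData = try JSONEncoder().encode(article)
    let decodedArticle = try JSONDecoder().decode(Article.self, from: articleData)
    ```

    Any of the directions listed above would aim to shrink this down to the two custom lines.
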
  • [Library-Level] Enhanced Contextual Overriding of Encoded Representations: Some of this already exists in the form of encoding and decoding strategies, which exist on the Encoder/Decoder level. It's possible to expand these strategies to cover more types, or to expand strategies to apply replacements for all types on an archive-wide level. There's also room here for potential language enhancements to somehow inject scoped replacements for various types without needing Encoder/Decoder involvement, but that hasn't been sketched out or explored much.

    There's room here for exploration of what we consider reasonable and principled, while making pragmatic decisions about how to encode values. (Originally, we decided against exposing strategies that allowed replacement of all types for the benefit of encapsulation, but perhaps that decision should be revisited.)
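
    For reference, this is roughly what the existing strategies look like in use with JSONEncoder. The Event type is a made-up example; keyEncodingStrategy and dateEncodingStrategy are existing Foundation API. Note that the strategies are configured once on the encoder and apply to the whole archive, and that they only cover a fixed set of cases such as keys and Date.

    ```swift
    import Foundation

    // Hypothetical model used only to show the strategies in action.
    struct Event: Codable {
        var eventName: String
        var startDate: Date
    }

    let strategyEncoder = JSONEncoder()
    // Rewrites every synthesized key: eventName -> event_name.
    strategyEncoder.keyEncodingStrategy = .convertToSnakeCase
    // Replaces the encoded representation of every Date in the payload.
    strategyEncoder.dateEncodingStrategy = .secondsSince1970

    let event = Event(eventName: "launch", startDate: Date(timeIntervalSince1970: 0))
    let eventData = try strategyEncoder.encode(event)
    let eventJSON = String(data: eventData, encoding: .utf8)!
    print(eventJSON)
    ```
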

  • [Library-Level] Encoder/Decoder Exposition of Capabilities: This can cover a lot of little improvements, but the gist here is to find a way to expose format-level information on Encoders and Decoders that can help provide even more context for how types can encode.

    We already provide userInfo and codingPath to help pass along information to a given type (userInfo is helpful in passing along context that the Encoder/Decoder doesn't know about; codingPath is helpful for telling where in a payload you currently are); we can imagine an additional .capabilities property or similar which exposes information about the Encoder/Decoder itself — what format it is encoding to, whether it supports certain things like reference semantics (which is a concept we'd originally included in the Codable library support, but had to drop for time), and whether it prefers certain representations (e.g. a minimal representation over a human-readable one, or vice versa).

    There's room for a grab bag of improvements here that we might be able to gather in one spot, and I'm happy to elaborate on thoughts I've had here.
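
    As a rough illustration of the kind of decision a capabilities API could inform, here is a sketch that approximates it with userInfo today. The prefersCompactOutput key and the Color type are assumptions invented for this example; nothing like them exists in the standard library or Foundation, and a real .capabilities property would presumably be populated by the Encoder rather than by the caller.

    ```swift
    import Foundation

    extension CodingUserInfoKey {
        // Hypothetical key standing in for an encoder-provided capability.
        static let prefersCompactOutput = CodingUserInfoKey(rawValue: "prefersCompactOutput")!
    }

    struct Color: Encodable {
        var red: UInt8
        var green: UInt8
        var blue: UInt8

        enum CodingKeys: String, CodingKey { case red, green, blue }

        func encode(to encoder: Encoder) throws {
            let compact = encoder.userInfo[.prefersCompactOutput] as? Bool ?? false
            if compact {
                // Minimal representation: one packed integer.
                var container = encoder.singleValueContainer()
                try container.encode(UInt32(red) << 16 | UInt32(green) << 8 | UInt32(blue))
            } else {
                // Human-readable representation: one key per channel.
                var container = encoder.container(keyedBy: CodingKeys.self)
                try container.encode(red, forKey: .red)
                try container.encode(green, forKey: .green)
                try container.encode(blue, forKey: .blue)
            }
        }
    }

    let compactEncoder = JSONEncoder()
    compactEncoder.userInfo[.prefersCompactOutput] = true
    // Wrapped in an array so the top-level JSON value is not a bare number.
    let compactData = try compactEncoder.encode([Color(red: 255, green: 0, blue: 0)])
    let compactJSON = String(data: compactData, encoding: .utf8)!
    ```
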

  • Others: of course, there's a lot more that can be proposed and done, but I'd like to offer these as starter ideas for where we can make improvements. If some of the bigger ideas solidify, we can also take discussion of those into separate threads as well to target specific concepts.

We'd love to get further feedback on limitations and pain points that you would find helpful in addressing, and more so, would find thoughts about how to address those points extremely helpful in providing targeted solutions. Thanks!

20 Likes

While I'd love to bring up property behaviors again now that Swift 5 is out of the way, I'm not sure they offer the exact kind of solution we're looking for as that proposal is written. So we'd need someone to pick that back up and incorporate feedback from the discussion around this feature.

If we could come up with a generalization of annotations, I feel like that would offer us the best user experience. But the downside I see is coming up with a design and implementation would take far longer and potentially have a higher probability of being rejected. But again, I think this would be the best solution vs a library one.

I would say this should be abandoned in favor of one of the other two compiler-heavy solutions. I don't really understand the benefit of synthesis over things like property behaviors and custom annotations.

1 Like

One thing I’d like to add is to allow for post-decoding customization. Oftentimes it’s enough to just fetch some delegate from userInfo, but that’d require me to write the whole decoder myself. A customization point would be nice.
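
Until something like that exists, one workaround is a small wrapper that runs the synthesized decoding first and then calls a hook. This is only a sketch: PostDecodable, PostDecoded, didDecode(from:), the defaultLocale key, and the Profile type are all names invented here, not existing API.

```swift
import Foundation

// Hypothetical protocol: a post-decoding customization point.
protocol PostDecodable: Decodable {
    mutating func didDecode(from decoder: Decoder) throws
}

// Wrapper that decodes Value with its synthesized init(from:), then runs the hook.
struct PostDecoded<Value: PostDecodable>: Decodable {
    var value: Value

    init(from decoder: Decoder) throws {
        value = try Value(from: decoder)
        try value.didDecode(from: decoder)
    }
}

extension CodingUserInfoKey {
    // Hypothetical key used to pass context through userInfo.
    static let defaultLocale = CodingUserInfoKey(rawValue: "defaultLocale")!
}

struct Profile: PostDecodable {
    var name: String
    var locale: String?

    // init(from:) is still synthesized; only the hook is hand-written.
    mutating func didDecode(from decoder: Decoder) throws {
        if locale == nil {
            locale = decoder.userInfo[.defaultLocale] as? String
        }
    }
}

let profileDecoder = JSONDecoder()
profileDecoder.userInfo[.defaultLocale] = "en_US"
let profileJSON = Data(#"{"name":"Ada"}"#.utf8)
let profile = try profileDecoder.decode(PostDecoded<Profile>.self, from: profileJSON).value
```

The drawback is that callers have to decode the wrapper rather than the type itself, which is why a built-in customization point would be nicer.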

I think we should avoid cluttering up variable declarations with attributes as much as possible. They make code harder to read and reason about, and would scatter the information relevant to Codable throughout the type declaration.

(One exception is a @transient attribute, to exclude a property from automatic synthesis for Equatable, Hashable, and Codable alike.)

For more fine-grained control over encoding and decoding, I would greatly prefer to encapsulate the annotations within the type’s CodingKeys, and let the compiler synthesize the rest. Here is a brief sketch of what that might look like:

First, make it possible to declare CodingKeys as conforming to a marker protocol (eg. Synthesized or Automatic or suchlike), so it can be present without removing compiler synthesis.

Next, introduce annotations for customizing the implementation of encoding / decoding. I don’t know exactly what this would look like, but here’s a conceptual idea:

struct Foo {
  var bar: Int
  var baz: Double
  @transient var cache: String?
  
  enum CodingKeys: Synthesized {
    case bar(VerboseKey("monkey"))
  }
}

From a user perspective, this should act like there is a nested enum, eg. Foo.CodingKeys.RealCodingKeys. Or maybe instead of declaring CodingKeys, one instead declares CodingKeyCustomization and the compiler synthesizes CodingKeys. In any case, the compiler synthesizes the coding keys that actually get used.

The example uses VerboseKey, which is my first attempt at allowing different encoders (eg. JSON) to choose keyed vs. unkeyed, rather than having Foo make a single decision for all encodings. So perhaps the compiler synthesizes two enums, one for verbose keys and one not.

JSON could prefer verbose keys when they are available, and binary formats could prefer non-verbose. This might be the wrong approach here, I don’t know. But the idea is to let the author of Foo provide information about its coding keys that encoders can utilize, while still benefitting from compiler synthesis.

The specifics about what can be customized and how, would need to be nailed down. However, I think it is best to keep all such customization grouped together in one place, rather than scattered throughout the type itself.

5 Likes

Hopefully this isn't too much of a tangent, but one small change I'd really like to see is for the container creation methods on Encoder to be throwing. For example, Encoder.singleValueContainer().

I work with a lot of encoders that don't support the full range of encoding containers and, in order to throw a Swift error, I must create and return a dummy container that has all throwing methods. Either this, or just fatal error. Both of these solutions are a pain.

8 Likes

Would any of these refinements be able to improve the ergonomics of conforming enums?

I'm a fan of this approach since it would generally improve code reuse and abstraction in the standard library and user-land code in many ways.

1 Like

The customization currently supported for synthesis is in the CodingKeys enum, which lies between these two options: it's a compiler feature, but rather than using new syntax, you follow certain conventions with existing syntax. I think we could do something similar here.

For instance, suppose that if you had a type like this:

struct Selection {
  var text: String
  var indices: Range<Int>
}

You could extend its CodingKeys enum with static methods matching a compiler-known convention:

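extension Selection.CodingKeys {
  // Decodes `indices` from a {"lower": ..., "upper": ...} dictionary.
  static func indices(from container: KeyedDecodingContainer<Selection.CodingKeys>) throws -> Range<Int> {
    let dict = try container.decode([String: Int].self, forKey: .indices)
    // Throw instead of crashing on missing or inverted bounds.
    guard let lower = dict["lower"], let upper = dict["upper"], lower <= upper else {
      throw DecodingError.dataCorruptedError(forKey: .indices, in: container,
        debugDescription: "Expected \"lower\" and \"upper\" with lower <= upper")
    }
    return lower ..< upper
  }
  // Encodes `indices` as a {"lower": ..., "upper": ...} dictionary.
  static func encode(indices: Range<Int>, to container: inout KeyedEncodingContainer<Selection.CodingKeys>) throws {
    let dict = ["lower": indices.lowerBound, "upper": indices.upperBound]
    try container.encode(dict, forKey: .indices)
  }
}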

I think you could add these methods as a third layer of synthesis between the two existing layers:

  • The synthesized init(from:)/encode(to:) members on the parent type include property x if CodingKeys has a static x(from:)/encode(x:to:) method with a matching type.
  • A default x(from:)/encode(x:to:) method is synthesized if CodingKeys does not have one but there is a case by that name.
  • enum CodingKeys is implicitly declared with a list of cases matching the properties of the parent type if it is not explicitly declared.
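
To show how the layers would fit together, here is a hand-written approximation of what the compiler might produce for Selection under this convention. Everything is spelled out manually because the convention does not exist today; the per-property hooks are declared inside an explicit CodingKeys, and the default text hooks from the second layer are inlined as plain decode/encode calls.

```swift
import Foundation

struct Selection: Equatable {
  var text: String
  var indices: Range<Int>
}

extension Selection: Codable {
  // Layer 3: the key list (written by hand here so the hooks can live in it).
  enum CodingKeys: String, CodingKey {
    case text, indices

    // Layer 2: user-provided hooks for `indices`.
    static func indices(from container: KeyedDecodingContainer<CodingKeys>) throws -> Range<Int> {
      let dict = try container.decode([String: Int].self, forKey: .indices)
      guard let lower = dict["lower"], let upper = dict["upper"], lower <= upper else {
        throw DecodingError.dataCorruptedError(forKey: .indices, in: container,
          debugDescription: "Expected \"lower\" and \"upper\" with lower <= upper")
      }
      return lower ..< upper
    }

    static func encode(indices: Range<Int>, to container: inout KeyedEncodingContainer<CodingKeys>) throws {
      let dict = ["lower": indices.lowerBound, "upper": indices.upperBound]
      try container.encode(dict, forKey: .indices)
    }
  }

  // Layer 1: what the synthesized members might expand to.
  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    text = try container.decode(String.self, forKey: .text)   // default hook, inlined
    indices = try CodingKeys.indices(from: container)         // custom hook
  }

  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: CodingKeys.self)
    try container.encode(text, forKey: .text)                 // default hook, inlined
    try CodingKeys.encode(indices: indices, to: &container)   // custom hook
  }
}

let selection = Selection(text: "hello", indices: 1 ..< 3)
let selectionData = try JSONEncoder().encode(selection)
let decodedSelection = try JSONDecoder().decode(Selection.self, from: selectionData)
```
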
8 Likes

Hi Itai,

It has been my dream for some time that we tackle this with a two step process.

  1. Introduce user-defined attributes. These would get @foo syntax (potentially with a prefix or some other distinguished syntax to get them out of the system attribute namespace). Such attributes would get encoded into the runtime metadata systems for decls for use by a future awesome reflection API. Ideally, some attributes such as the IB and core data attributes would switch to being library defined attributes instead of builtin (probably not possible for all of them until other language features come in).

  2. Introduce a way to define default protocol implementations as a macro, and allow that macro to iterate over stored properties (incl. their attributes). All of the synthesis stuff in the compiler (codable, but also equatable, hashable, and the stuff in the S4TF branch) could switch to using this, leading to a great reduction of special cases in the compiler, but also allowing more flexible patterns.

Allowing the macros to be conditionalized on the user defined attributes would allow you to achieve the stuff you're seeking without hacking it all into the compiler.

-Chris

19 Likes

Incidentally, I also hope that some day we revive the old property behaviors proposal, simplify it, and turn it into a 'property macros' proposal. How cool would it be to be able to write:

#delayed var foo : NSFoo

to get a delayed variable instead of IUO.

I believe that rebranding such a feature as a "property macro" would give it much more obvious syntax at point of use (specifically using the # namespace) and would make the resilience story much more obvious: macros aren't ever resilient.

-Chris

12 Likes

Totally unrelated to what @beccadax suggests, I focus on this single line of code he wrote:

I, too, wish that one could extend a type's CodingKeys enum. Even the private synthesized one.

One way is to relax the rules on extensions of private types:

struct Selection: Codable { ... }
extension Selection.CodingKeys: SomeOtherProtocol { ... }

Currently this code fails to compile with error: 'CodingKeys' is inaccessible due to 'private' protection level.

Another way is to allow nested extensions:

struct Selection: Codable {
    ....
    extension CodingKeys: SomeOtherProtocol { ... }
}

The reason why this is useful is that synthesized CodingKeys are our only safe (compiler-blessed) way to use external APIs that are fundamentally string-based. I know strings are not popular, but JSON, SQL, HTTP, etc, are here to stay. And they are string-based. One way not to fight Swift and those APIs is the very very precious synthesized CodingKeys enum. Allowing one to extend CodingKeys makes it easier to talk to external string-based APIs.

4 Likes

I should explain more why I think extending some particular CodingKeys is desirable in order to support string-based APIs.

  1. Using a string-based API does not imply that there are free strings scattered everywhere. A swift wrapper may want to define a SQLColumn or HTTPParameterKey protocol, for example.

  2. Codable can not be assumed to be the alpha and omega of coding and decoding. My own experience writing the SQLite library GRDB shows that Codable has a high runtime cost, mainly due to the heavy use of runtime type checks. (Edit: decoding Decodable records is twice as slow, according to the Performance page of the groue/GRDB.swift wiki on GitHub.)

  3. The consequence is that it is useful to extend a string-based API with support for Codable, instead of rooting that string-based API on Codable. With such a design, one can use the efficient low-level string-based API, or use the handy Codable support, without any raw string in sight, but with a slight performance cost, depending on one's constraints.

  4. The consequence is that CodingKey (the protocol) can not be the root protocol for keys in the string-based API. There really must exist another protocol, SQLColumn, HTTPParameterKey, whatever, that some CodingKeys can be extended with.

  5. QED: one needs to extend some CodingKeys.

On that very topic, I'd like JSONEncoder to be able to output nulls for nil properties:

struct Player: Codable {
    var name: String?
}

try JSONEncoder().encode(Player()) // "{}" NOOOOOOO!

This would remove the majority of cases where I need to write a custom encoding method.

The use case is telling a server: "please set this property to null", instead of "since the key is missing, don't change the value of this property" (in a PUT/PATCH request).
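
For anyone hitting this today, the custom encoding method in question can at least be kept short. A minimal sketch: the synthesized encode(to:) uses encodeIfPresent for optional properties and so drops the key, while calling encode(_:forKey:) with the optional itself writes an explicit null.

```swift
import Foundation

struct Player: Codable {
    var name: String?

    // Hand-written only to force explicit nulls; init(from:) and
    // CodingKeys are still synthesized.
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        // Encoding the Optional itself emits null when it is nil.
        try container.encode(name, forKey: .name)
    }
}

let playerData = try JSONEncoder().encode(Player())
let playerJSON = String(data: playerData, encoding: .utf8)!
print(playerJSON) // {"name":null}
```

This still means giving up synthesis for encode(to:) on every such type, which is why an encoder-level option would be preferable.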

4 Likes

I'm pretty interested in exploring this domain. What possible mechanisms would allow for custom user defined attributes to be able to affect how Codable would work? I'm mostly getting hung up on how exactly you could define these in a way that lets them be flexible/powerful enough that defining custom attributes is actually worthwhile, and not just a way of changing a select few things about a decl.

Would these user attributes in effect be hygienic macros that would allow the macro to actually find and change the structure of generated Codable methods/CodingKeys/etc., or the structure of the AST in general? Or would this strictly be a runtime reflection based thing only?

I'm certainly no expert in these things, but it feels like a sufficiently powerful hygienic macro system could be used to solve both custom user attributes, as well as property behaviors.

Assuming the attributes can be parameterized, this is very similar to how my Sourcery template works. So an initial +1 to this general direction from me. I haven’t had a chance yet to think through in detail what kind of language support would cover all the use cases my team has encountered and supports with Sourcery but plan to do that soon now that this thread has appeared.

My template supports a wide range of attributes, some of which are only available in specific contexts (i.e. some only apply at the type level on enums with associated values, others only apply on Optional properties, etc). Another example is that we have some annotations that must reference a static function with a specific signature. Do you envision a way to validate correct usage of user-defined attributes and report compiler errors when they are applied incorrectly?

One limitation we have encountered with the Sourcery approach is that we have had to make some properties internal that really should be private. Would the synthesized default implementations be able to interact with private properties?

Finally, I imagine the macro would be able to synthesize types to meet associated type requirements, is that correct? I would definitely need to do that for some of the use cases I have.

The way I interpreted Chris’s post is that these features would hang together in a way that allows you to do all the things you can do with Sourcery, but in a lot better and even more powerful way.

Sourcery provides a generator template with metadata about all the types in your project and supports “user-defined attributes” on declarations via comment annotations (i.e. // sourcery: myCustomAttribute). Templates can use this metadata to determine which types to generate code for, and use type, property, case, and associated value annotations to guide synthesis.

I wrote a template my team uses to synthesize our Codable implementations and this has worked out extremely well for us. Our library interacts with a broad family of APIs (hundreds of endpoints) that are all very inconsistent in how they handle various encoding circumstances and some of which produce rather awkward JSON. The template has allowed us to just sprinkle a few policy annotations on our models and get the correct encoding behavior. This has a lot of advantages, but most importantly, the encoding behavior is consistent across our entire suite of models and is immediately clear for developers who learn how the annotations behave.

The way I see this translating to Chris’s idea is that instead of a template, you have a default macro implementation of the protocol which receives the metadata for the type in question. It would use the metadata (and any other relevant metadata it needs to query from the compiler) to synthesize the conformance. So the default implementation itself is the macro and the user-defined attributes only attach metadata to declarations - they are not themselves macros.

Of course Chris left part 3 off, which is the most important part in the context of this thread: designing the attributes that Codable’s default implementation macro would recognize and use when synthesizing conformance. The features Chris describes are only the prerequisites to adding this kind of capability to Codable. For example, without giving it too much thought I imagine a family of EncodingStrategy, DecodingStrategy and CodingStrategy protocols along with @encodingStrategy, @decodingStrategy, and @codingStrategy attributes may turn out to be something we would want to explore. The attributes could be applied to a property with an argument that specifies a type conforming to the relevant strategy protocol which supports the property’s type.

3 Likes

One idea I just had that might help with per-property Codable customization, while keeping automatic synthesis, is to be able to define a computed property by a (self-based) key path reference. If it does help, it would also help with other simple adaptor types.

Somewhat like this:
(Though this in particular would probably be ambiguous with default value definitions)

struct S {
  private var storage: MyCodableCustomizer<SomeType>
  public var property: SomeType = \.storage.value
}

The way I interpreted Chris' post is that these features would allow you to do most of the things you can do with Sourcery only once step 2 is complete. The first step, user-defined attributes, would only give access to attributes at runtime, very similarly to C#'s Custom Attribute feature. That would be a wonderful feature for many library writers, and the Codable implementation could use them at compile-time, but it would have to continue to be hand-written in the compiler. Only once step 2 introduces a macro system to implement protocol conformances would Codable be able to be removed from the compiler and transferred to the Standard Library.

Right, the features can't "hang together" at all when one of the most crucial features is entirely absent! :wink:

I would love to start discussing the first step of this plan, Custom Attributes, but the big unknown for me is what reflection API those attributes would be available on. Mirror always felt a bit hacky to me so I think that the real initial step is to design new reflection APIs to be able to fit attributes on.

Thoughts?

1 Like

I agree that a new reflection API should come before attributes are exposed via reflection. But I'm not really interested in reflection and would hate to see a design of a new reflection API hold up work on custom attributes and macros. IMO that's an orthogonal use of attributes and shouldn't be considered a prerequisite. On the other hand, attributes aren't that useful without a way to code against them so it does seem like something else should proceed in parallel.