[Pitch] Roundtripping key coding strategies

I have committed new implementations, new samples and a new pitch text.

The updated pitch text is included below - the updates all relate to the suggestions by @hisekaldma


Roundtripping key coding strategies

Introduction

Many encoders and decoders that can be used with the Codable system embrace the concept of key coding strategies. They were introduced with JSONEncoder from Foundation, but the general concept is so useful that it has been adopted by many other encoders and decoders.

Here is a brief list of encoders and decoders that adopt the concept of key coding strategies:

Many of these implementations - including JSONEncoder - contain a flaw where not all encoded keys roundtrip correctly when using a key coding strategy.

For instance the key imageURL will encode to "image_url" when used with the convertToSnakeCase keyEncodingStrategy from JSONEncoder. But that encoded key will be transformed to "imageUrl" when applying the convertFromSnakeCase keyDecodingStrategy from JSONDecoder. This means that it will not match the original imageURL key, and the value will not roundtrip.

The underlying issue is the presence of two separate transformations. If there were only one transformation - from key to transformed key - then the issue could be fixed.

But isn't this just an issue with JSONEncoder and JSONDecoder from Foundation that is actually out of scope on Swift Evolution?

No, there is unfortunately an underlying reason why two transformations exist today:

The KeyedDecodingContainer and KeyedDecodingContainerProtocol contain an API called allKeys that is supposed to return all the keys of the container. But internally these keys may be transformed, so a 'reverse' transformation must be applied in order to map to the key type of the KeyedDecodingContainer.

This basically means that any attempt at implementing key coding strategies must include a 'reverse' transformation, which in turn leads to the issue described.

Of course one could argue that this system is already broken by the introduction of key coding strategies (and I will argue that below), so something else is required if we want to fix the situation.

This proposal intends to provide alternative API to perform key coding strategies and also API to avoid key coding strategies for custom CodingKey types in situations where the encoding/decoding would otherwise break.

Swift-evolution thread: [Pre-pitch] Roundtripping key coding strategies

Motivation

Today the Codable system has a leaky abstraction if used with encoders and decoders that perform transformations on their keys.

In Foundation this is currently only present in JSONEncoder and JSONDecoder, but the issue described would be the same for any other encoder/decoder that attempts to do something similar to the keyEncodingStrategy of JSONEncoder or the keyDecodingStrategy of JSONDecoder.

The issue is that there is currently a pair of transformations present - and since the transformations are lossy, you don't necessarily get to the source key by encoding and decoding it.

For instance if you set the keyEncodingStrategy of a JSONEncoder to .convertToSnakeCase and a similar keyDecodingStrategy to .convertFromSnakeCase and use it with the following struct:

struct Person: Codable {
  var imageURL: URL
}

Then the encoding transform will produce:

{
  "image_url": "..."
}

But the decoding transform will go from snake case to camel case, trying to look up the key imageUrl, which does not exist.
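The failure can be reproduced with the strategies that ship in Foundation today. The following is a minimal sketch; the helper name personRoundTrips and the example URL are made up for illustration.

```swift
import Foundation

struct Person: Codable {
  var imageURL: URL
}

// Hypothetical helper: returns true if a Person survives encoding and
// decoding with Foundation's snake case strategies.
func personRoundTrips() -> Bool {
  let encoder = JSONEncoder()
  encoder.keyEncodingStrategy = .convertToSnakeCase
  let decoder = JSONDecoder()
  decoder.keyDecodingStrategy = .convertFromSnakeCase

  do {
    let person = Person(imageURL: URL(string: "https://example.com/a.png")!)
    let data = try encoder.encode(person) // the key is written as "image_url"
    _ = try decoder.decode(Person.self, from: data)
    return true
  } catch DecodingError.keyNotFound(let key, _) {
    // The decoder has no entry matching the original key.
    print("No value found for key:", key.stringValue)
    return false
  } catch {
    print("Unexpected error:", error)
    return false
  }
}

print(personRoundTrips())
```

With the current Foundation implementation I would expect this to report a missing key rather than a successful round trip.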

This is a common source of bugs when using key coding strategies, and at least in code bases that I am familiar with, the workaround is often to add custom coding keys like:

enum CodingKeys: String, CodingKey {
  case imageURL = "imageUrl"
}

This allows the imageURL property to roundtrip when used with the snake case key encoding and decoding, but this is a leaky abstraction, since the developer needs to be aware of the necessity for adding this key - and also this specific key is there to support a specific configuration option of a specific encoder/decoder pair.
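As a sketch of why this workaround works: the raw value "imageUrl" is what convertFromSnakeCase produces from "image_url", so the two lossy transformations happen to meet. The roundTrip helper below is hypothetical and only exists to make the behavior easy to inspect.

```swift
import Foundation

struct Person: Codable, Equatable {
  var imageURL: URL

  // Workaround: spell the key the way convertFromSnakeCase will produce it.
  enum CodingKeys: String, CodingKey {
    case imageURL = "imageUrl"
  }
}

// Hypothetical helper: encodes and decodes with the snake case strategies
// and returns both the JSON text and the decoded value.
func roundTrip(_ person: Person) throws -> (json: String, decoded: Person) {
  let encoder = JSONEncoder()
  encoder.keyEncodingStrategy = .convertToSnakeCase
  let data = try encoder.encode(person)

  let decoder = JSONDecoder()
  decoder.keyDecodingStrategy = .convertFromSnakeCase
  let decoded = try decoder.decode(Person.self, from: data)

  return (String(decoding: data, as: UTF8.self), decoded)
}
```

Note that the same type used with an encoder that has no key strategy configured would now write "imageUrl" rather than "imageURL", which is the coupling described above.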

Codable entities and the encoder/decoder they are used with are supposed to be decoupled, but in this situation, the developer needs to know if the codable entity is used with an encoder/decoder pair that uses key transformations - and also needs to remember to map the key correctly, so that it will be 'found' when converting back from snake case to camel case.

Often I have seen attempts to 'fix' the behavior with the notation you would use if you didn't apply a key coding strategy:

enum CodingKeys: String, CodingKey {
  case imageURL = "image_url"
}

which of course is no good when used with snake case conversion, since the key that will be looked up is "imageUrl".

Other times I have seen developers thinking that the custom CodingKey implementation must be a mistake and removing it entirely, because unless you are very familiar with both the use case and the peculiarity of this mapping, then the code does look a bit 'off'.

Finally having this custom coding key also means that you are in trouble if you wish to encode/decode the same entity with an encoder/decoder where you are not using a similar key transform.

As described in the introduction, this issue is not specific to JSONEncoder and JSONDecoder, since all encoder/decoder pairs are basically forced to provide two transformations in order to support the allKeys API on KeyedDecodingContainer. As soon as you have the two transformations, they are basically required to be 'lossless' in order to have any key roundtrip correctly.

An attempt to analyze the allKeys API

In order to figure out how to propose an alternative to the allKeys API, we must first analyze its use cases.

When encoding a simple fixed struct with synthesized CodingKeys, there is usually no use for allKeys.

One use case could be to count all keys to ensure that only the explicitly handled keys are present in the input. For this use case you only need the count of keys.
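A minimal sketch of that use case, using only existing API. The types AnyKey and StrictUser and the helper decodes are hypothetical names. Note that the count has to be taken through a key type that accepts any string, since allKeys silently drops keys that cannot be converted to the container's Key type.

```swift
import Foundation

// A key type that accepts any string, so that no key is filtered out of allKeys.
struct AnyKey: CodingKey {
  var stringValue: String
  var intValue: Int? { nil }
  init(stringValue: String) { self.stringValue = stringValue }
  init?(intValue: Int) { return nil }
}

struct StrictUser: Decodable {
  var name: String

  enum CodingKeys: String, CodingKey, CaseIterable {
    case name
  }

  init(from decoder: Decoder) throws {
    // Count every key in the payload, not just the ones CodingKeys knows about.
    let everything = try decoder.container(keyedBy: AnyKey.self)
    guard everything.allKeys.count == CodingKeys.allCases.count else {
      throw DecodingError.dataCorrupted(.init(
        codingPath: decoder.codingPath,
        debugDescription: "Unexpected keys in payload"))
    }
    let container = try decoder.container(keyedBy: CodingKeys.self)
    name = try container.decode(String.self, forKey: .name)
  }
}

// Returns true if the JSON text decodes as a StrictUser.
func decodes(_ json: String) -> Bool {
  (try? JSONDecoder().decode(StrictUser.self, from: Data(json.utf8))) != nil
}
```

Here only the number of keys matters, not how each key maps back to a CodingKey.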

Another use case is when the keys are dynamic - in the sense that they are perhaps not fully known by the author of the Codable type, but can be extended later on.

One such implementation can be seen with AttributedString here:

https://github.com/apple/swift-corelibs-foundation/blob/2db661061615dc366bd31af779d6f4551cb3197d/Sources/Foundation/AttributedString/AttributedStringCodable.swift#L493

The type of the CodingKey used for this KeyedDecodingContainer is AttributeKey, and it is precisely 'dynamic' in the sense that it can represent any String as its key value, and the exact use cases are unknown to the implementation since the AttributedString functionality contains an aspect of extensibility.

So what happens when encoding and decoding AttributedString using JSONEncoder and JSONDecoder with snake case key coding strategies? As an example, it fails to roundtrip text marked up with the .imageURL property. This property is implemented using a key named NSImageURL. This is encoded to "n_s_image_url" and upon decoding this will look for a key named "NSImageUrl", which does not exist.

https://forums.swift.org/t/pre-pitch-roundtripping-key-coding-strategies/52777/4
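The effect on allKeys can be observed with a small dynamic key type. This is a sketch using existing Foundation API; DynamicKey, KeyCollector and collectedKeys are hypothetical names.

```swift
import Foundation

// A dynamic key that can represent any string, similar in spirit to AttributeKey.
struct DynamicKey: CodingKey {
  var stringValue: String
  var intValue: Int? { nil }
  init(stringValue: String) { self.stringValue = stringValue }
  init?(intValue: Int) { return nil }
}

// Records the keys that the decoder reports through allKeys.
struct KeyCollector: Decodable {
  var keys: [String]

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: DynamicKey.self)
    keys = container.allKeys.map(\.stringValue).sorted()
  }
}

func collectedKeys(from json: String) throws -> [String] {
  let decoder = JSONDecoder()
  decoder.keyDecodingStrategy = .convertFromSnakeCase
  return try decoder.decode(KeyCollector.self, from: Data(json.utf8)).keys
}
```

For the payload {"image_url":"a"} the reported key is "imageUrl", so a type that relies on allKeys has no way of recovering an original key such as "imageURL".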

Proposed Solution

The proposal is to introduce three changes. One of these is in the domain of Foundation, so it is out of scope for discussion in this forum. I do, however, feel that it is necessary to understand the complete picture, and I think that we could limit discussion about it on the forums to: 'do you think that it would be a good idea to create a PR containing these changes to swift-corelibs-foundation', and of course then let the engineers at Apple decide whether or not to accept the changes.

Here are the proposed changes:

  1. Introduce an allRawKeys: [CodingKey] API on KeyedDecodingContainer and KeyedDecodingContainerProtocol.

    In order to not break backwards compatibility, KeyedDecodingContainerProtocol will supply a default implementation of allRawKeys that just returns allKeys, but authors of types conforming to KeyedDecodingContainerProtocol are advised to implement allRawKeys explicitly.

    Create a PR against swift-corelibs-foundation that adds support for allRawKeys for JSONDecoder.

  2. Add a static requirement to the CodingKey protocol: static var isPreformatted: Bool { get } with a default implementation returning false. Encoder and Decoder implementations that support some form of key coding strategy would be advised to implement opting out of key coding strategies for CodingKey types that return true for isPreformatted.

    Create a PR against swift-corelibs-foundation that respects the isPreformatted property for JSONEncoder and JSONDecoder.

  3. Create a PR against swift-corelibs-foundation that deprecates JSONEncoder's keyEncodingStrategy as well as JSONDecoder's keyDecodingStrategy and introduces a common keyCodingStrategy that is a transformation in the direction from a CodingKey to an encoded key.

How do these changes help?

For use cases where the coding keys are completely dynamic, any key coding strategy will have the possibility of transforming the keys into a shape that cannot be recognized upon decoding. In that situation it could be relevant to let the CodingKey in question return true for isPreformatted in order to completely opt-out of having the keys transformed upon encoding and decoding.

In order to get a peek into the decoding process, or perhaps check the number of keys, the allRawKeys API on KeyedDecodingContainer could be a solution.

In order to have your synthesized CodingKeys round trip correctly without any manual key mapping or knowledge about how keys are transformed during encoding, use a keyCodingStrategy like useSnakeCase.
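To illustrate why a single transformation direction is enough for lookups, here is a minimal sketch. Both toSnakeCase and value(forKey:in:) are hypothetical helpers, and the conversion is a simplified stand-in rather than the algorithm used by Foundation. The point is that the decoder transforms the requested key forward and looks it up among the untouched raw keys, so no reverse transformation is needed.

```swift
// Simplified forward transformation (an assumption for illustration only).
// An underscore is inserted before an uppercase letter that either follows
// a lowercase letter or starts a new word after a run of uppercase letters.
func toSnakeCase(_ key: String) -> String {
  let chars = Array(key)
  var result = ""
  for (i, c) in chars.enumerated() {
    if c.isUppercase && i > 0 {
      let previousIsLower = chars[i - 1].isLowercase
      let nextIsLower = i + 1 < chars.count && chars[i + 1].isLowercase
      if previousIsLower || nextIsLower { result.append("_") }
    }
    result.append(contentsOf: c.lowercased())
  }
  return result
}

// Decoding side: transform the key we are asked for, then look it up
// in the raw container exactly as it appears in the payload.
func value(forKey key: String, in rawContainer: [String: String]) -> String? {
  rawContainer[toSnakeCase(key)]
}

let raw = ["image_url": "a"]
print(value(forKey: "imageURL", in: raw) ?? "not found")
```

Since the same function is applied on both encoding and decoding, the transformation may be as lossy as it likes without breaking the round trip.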

Examples

Here is a repository demonstrating a version of JSONEncoder and JSONDecoder that deprecates keyEncodingStrategy and keyDecodingStrategy respectively and introduces a shared keyCodingStrategy instead.

It also respects CodingKey types that return true for isPreformatted (although this is made using a hack as I cannot demonstrate adding stuff to the Swift standard library)

https://github.com/mortenbekditlevsen/JSONCoder

Here are some of the included tests:


final class JSONEncoderTests: XCTestCase {
  func testUseSnakeCase() throws {
    struct Model: Codable {
      var imageURL: String
    }

    let encoder = JSONEncoder()
    encoder.keyCodingStrategy = .useSnakeCase
    let data = try encoder.encode(Model(imageURL: "a"))

    let expectedString = "{\"image_url\":\"a\"}"
    XCTAssertEqual(String(data: data, encoding: .utf8), expectedString)

    let decoder = JSONDecoder()
    decoder.keyCodingStrategy = .useSnakeCase
    let model = try decoder.decode(Model.self, from: data)

    XCTAssertEqual(model.imageURL, "a")
  }
  
  // NOTE: This only works in tests and is only included for
  // illustrative purposes.
  func testCustom() throws {
    struct Model: Codable {
      var imageURL: String
    }
    struct MyCodingKey: CodingKey {
      var stringValue: String
      var intValue: Int? { nil }
      init(stringValue: String) {
        self.stringValue = stringValue
      }
      init?(intValue: Int) {
        self.stringValue = "\(intValue)"
      }
    }

    let encoder = JSONEncoder()
    encoder.keyCodingStrategy = .custom({ codingPath in
      MyCodingKey(stringValue: "\(codingPath.last?.stringValue.hash ?? 0)")
    })
    let data = try encoder.encode(Model(imageURL: "a"))    

    let expectedString = "{\"3520785955319405054\":\"a\"}"
    XCTAssertEqual(String(data: data, encoding: .utf8), expectedString)
    
    let decoder = JSONDecoder()
    decoder.keyCodingStrategy = encoder.keyCodingStrategy
    let model = try decoder.decode(Model.self, from: data)
    
    XCTAssertEqual(model.imageURL, "a")
  }
  
  func testPreformattedKey() throws {

    struct MyPreformattedCodingKey: CodingKey {
      var stringValue: String
      var intValue: Int? { nil }
      init(stringValue: String) {
        self.stringValue = stringValue
      }
      init?(intValue: Int) {
        self.stringValue = "\(intValue)"
      }
      static var isPreformatted: Bool { true }
    }

    struct Model: Codable {
      var imageURL: String
      func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: MyPreformattedCodingKey.self)
        try container.encode(imageURL, forKey: .init(stringValue: "imageURL"))
      }
      init(imageURL: String) {
        self.imageURL = imageURL
      }
      init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: MyPreformattedCodingKey.self)
        self.imageURL = try container.decode(String.self, forKey: .init(stringValue: "imageURL"))
      }
    }

    let encoder = JSONEncoder()
    encoder.keyCodingStrategy = .useSnakeCase
    let data = try encoder.encode(Model(imageURL: "a"))

    let expectedString = "{\"imageURL\":\"a\"}"
    XCTAssertEqual(String(data: data, encoding: .utf8), expectedString)

    let decoder = JSONDecoder()
    decoder.keyCodingStrategy = .useSnakeCase
    let model = try decoder.decode(Model.self, from: data)
    XCTAssertEqual(model.imageURL, "a")
  }
}

Detailed Design

Adding isPreformatted to CodingKey

The proposed solution adds a new static requirement to CodingKey, isPreformatted:

public protocol CodingKey: Sendable,
                           CustomStringConvertible,
                           CustomDebugStringConvertible {
  ...

  /// Suggests to `Codable` encoders and decoders that no key encoding or
  /// decoding ought to be performed on `CodingKey`s of this type.
  @available(SwiftStdlib 5.6, *)
  static var isPreformatted: Bool { get }
}

extension CodingKey {
  ...
  
  @available(SwiftStdlib 5.6, *)
  public static var isPreformatted: Bool { return false }
}

A branch with the implementation can be found here:

https://github.com/mortenbekditlevsen/swift/tree/codable_preformattedcodingkey

Handle isPreformatted coding keys in JSONEncoder

private func _converted(_ key: Key) -> CodingKey {
    // Use the plain key if it is preformatted
    if Key.isPreformatted {
        return key
    }

    switch encoder.options.keyEncodingStrategy {
    case .useDefaultKeys:
        return key
    case .convertToSnakeCase:
        let newKeyString = JSONEncoder.KeyEncodingStrategy._convertToSnakeCase(key.stringValue)
        return _JSONKey(stringValue: newKeyString, intValue: key.intValue)
    case .custom(let converter):
        return converter(codingPath + [key])
    }
}

Handle isPreformatted coding keys in JSONDecoder

private struct _JSONKeyedDecodingContainer<K : CodingKey> : KeyedDecodingContainerProtocol, TestKeyedDecodingContainerProtocol {
...
    /// Initializes `self` by referencing the given decoder and container.
    init(referencing decoder: __JSONDecoder, wrapping container: [String : Any]) {
        self.decoder = decoder
        self.codingPath = decoder.codingPath

        // Use the plain container if the keys are preformatted
        guard !Key.isPreformatted else {
            self.container = container
            return
        }

        switch decoder.options.keyDecodingStrategy {
        case .useDefaultKeys:
            self.container = container
        case .convertFromSnakeCase:
            // Convert the snake case keys in the container to camel case.
            // If we hit a duplicate key after conversion, then we'll use the first one we saw. Effectively an undefined behavior with JSON dictionaries.
            self.container = Dictionary(container.map {
                key, value in (JSONDecoder.KeyDecodingStrategy._convertFromSnakeCase(key), value)
            }, uniquingKeysWith: { (first, _) in first })
        case .custom(let converter):
            self.container = Dictionary(container.map {
                key, value in (converter(decoder.codingPath + [_JSONKey(stringValue: key, intValue: nil)]).stringValue, value)
            }, uniquingKeysWith: { (first, _) in first })
        }
    }

A branch with support for respecting isPreformatted for JSONEncoder and JSONDecoder can be found here:

https://github.com/mortenbekditlevsen/swift-corelibs-foundation/tree/codable_preformattedcodingkey

Handling allRawKeys

Introduce new API on KeyedDecodingContainerProtocol

  /// All the raw keys the `Decoder` has for this container.
  ///
  /// Different keyed containers from the same `Decoder` may return different
  /// keys here. This should report all raw keys present in the
  /// container without attempting to convert to the `Key` type of the container.
  var allRawKeys: [CodingKey] { get }

Default implementation returning allKeys

public extension KeyedDecodingContainerProtocol {
  // We need a default implementation in order to not break API
  // This default implementation just returns the existing `allKeys`
  var allRawKeys: [CodingKey] {
    return allKeys
  }
}

New API on the KeyedDecodingContainer

  /// All the raw keys the `Decoder` has for this container.
  ///
  /// Different keyed containers from the same `Decoder` may return different
  /// keys here. This should report all raw keys present in the
  /// container without attempting to convert to the `Key` type of the container.
  public var allRawKeys: [CodingKey] {
    return _box.allRawKeys
  }

A branch with the implementation can be found here:

https://github.com/mortenbekditlevsen/swift/tree/keyeddecodingcontainer_allkeys

A branch with support for allRawKeys for JSONDecoder and PlistDecoder can be found here:

https://github.com/mortenbekditlevsen/swift-corelibs-foundation/tree/codable_allrawkeys

Handling useSnakeCase

A branch with deprecated keyEncodingStrategy and keyDecodingStrategy and the introduction of keyCodingStrategy can be found here:

https://github.com/mortenbekditlevsen/swift-corelibs-foundation/tree/codable_keycodingstrategy

The branch contains the following changes from the existing implementation:

  • Introduce keyCodingStrategy on JSONEncoder and JSONDecoder in both Darwin Foundation overlay and swift-corelibs-foundation.
  • Deprecate keyEncodingStrategy and keyDecodingStrategy
  • Implement strategies. If a default keyCodingStrategy is used, there should be a fallback to the deprecated encoding and decoding strategies.

Impact on Existing Code

The allRawKeys API is additive and comes with a default implementation, so all existing KeyedDecodingContainerProtocol conformers will continue to compile, although it would be advisable to implement a specialized version.

There is also no direct impact for the isPreformatted static requirement to the CodingKey protocol, since the default implementation does not opt out of key coding strategies.

Note that returning true for isPreformatted for an existing CodingKey will change its encoding and decoding behavior, so this must be done with careful thought about how it interacts with current and future use of key coding strategies.

There will be deprecation warnings for existing keyEncodingStrategy and keyDecodingStrategy, but opting in to a keyCodingStrategy can be done at the leisure of the user.

With regard to any current Decodable conforming type that uses allKeys from the KeyedDecodingContainer upon decoding, I have demonstrated above that it is not reliable when used together with convertFromSnakeCase or any similar transformation. Deprecating this API will let this fact be known and allow the author to take steps towards using allRawKeys or alternatively let the CodingKey in use return true for isPreformatted.

Alternatives Considered

Using Dictionary instead of isPreformatted

You can already today ensure that key coding will not be performed on your CodingKey by leveraging the fact that keys in Dictionary are treated as data and not as CodingKeys.

This knowledge can be used to circumvent key coding strategies today. As can be seen in the following gist, the ergonomics are quite horrible, so returning true for isPreformatted on your CodingKey seems like a great win.

Don't touch my keys:
https://gist.github.com/mortenbekditlevsen/7918fb98638f8a9e2b017f0fad12da0b
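A much smaller sketch of the underlying behavior (not the code from the gist): a String-keyed Dictionary nested in a Codable type keeps its keys untouched, while the property name itself is still transformed. Payload and encodeAndDecode are hypothetical names.

```swift
import Foundation

struct Payload: Codable, Equatable {
  // The keys of this dictionary are treated as data, not as CodingKeys.
  var userInfo: [String: String]
}

// Hypothetical helper: encodes and decodes with the snake case strategies
// and returns both the JSON text and the decoded value.
func encodeAndDecode(_ payload: Payload) throws -> (json: String, decoded: Payload) {
  let encoder = JSONEncoder()
  encoder.keyEncodingStrategy = .convertToSnakeCase
  let data = try encoder.encode(payload)

  let decoder = JSONDecoder()
  decoder.keyDecodingStrategy = .convertFromSnakeCase
  let decoded = try decoder.decode(Payload.self, from: data)

  return (String(decoding: data, as: UTF8.self), decoded)
}
```

With Payload(userInfo: ["imageURL": "a"]) the property name becomes "user_info" in the output, but the dictionary key stays "imageURL" and the value round trips.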

Graceful fallback for allKeys

Even when useSnakeCase is used as the key coding strategy, allKeys on KeyedDecodingContainer could still apply the convertFromSnakeCase transformation to recover the same keys as it does today when using the convertFromSnakeCase key decoding strategy. There is, however, no obvious choice for the custom case here, and I guess that in the long run it could easily cause more confusion than benefit.

Full allKeys support for simple enum backed CodingKeys

I did a small hack based on the great work by @stephencelis and @mbrandonw in their swift-case-paths library.

This hack basically allows you to generate an array of all cases of an enum without associated values. In other words, it queries the runtime to return information that is comparable to what the CaseIterable conformance gives us at compile time.

Using that hack, the allKeys implementation could iterate over all cases of your CodingKey when that CodingKey is an enum like the synthesized ones.

Having access to these cases allows you to perform the key coding transformation correctly and return a list of keys present in the KeyedDecodingContainer.
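The idea can be sketched as follows. This is not the runtime hack itself: CaseIterable is used here as a stand-in for the runtime query, and toSnakeCase, presentKeys and PersonKeys are hypothetical names, with a simplified conversion that is not Foundation's algorithm.

```swift
// Simplified forward transformation (an assumption for illustration only).
func toSnakeCase(_ key: String) -> String {
  let chars = Array(key)
  var result = ""
  for (i, c) in chars.enumerated() {
    if c.isUppercase && i > 0 {
      let previousIsLower = chars[i - 1].isLowercase
      let nextIsLower = i + 1 < chars.count && chars[i + 1].isLowercase
      if previousIsLower || nextIsLower { result.append("_") }
    }
    result.append(contentsOf: c.lowercased())
  }
  return result
}

// Computes "all keys" by transforming every known case forward and keeping
// the ones whose encoded form is present among the raw keys of the payload.
func presentKeys<Key: CodingKey & CaseIterable>(
  of type: Key.Type,
  in rawKeys: Set<String>,
  transform: (String) -> String
) -> [Key] {
  Key.allCases.filter { rawKeys.contains(transform($0.stringValue)) }
}

enum PersonKeys: String, CodingKey, CaseIterable {
  case imageURL, name, nickname
}

let found = presentKeys(of: PersonKeys.self,
                        in: ["image_url", "name"],
                        transform: toSnakeCase)
print(found.map(\.stringValue))
```

Because only the forward transformation is applied, imageURL is reported correctly even though "image_url" could not be mapped back to it.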

A fallback, in case your CodingKey is not a plain enum-backed version, could be to attempt initializing the CodingKey directly from the encoded key in the KeyedDecodingContainer - or the fallback could even be to attempt the graceful fallback described in the section above.

While fun to play around with, this solution seems a bit strange, and as it mainly only fully works with plain enums, I don't consider it fit for actual use.

Adding a new protocol PreformattedCodingKey to the Swift standard library

The initial revision of this pitch text suggested adding a new protocol, PreformattedCodingKey to the standard library in place of the static requirement for isPreformatted on CodingKey in this revision.

The current solution was suggested by @hisekaldma, and I agree that it is a much nicer solution than adding a new protocol without requirements to the standard library.

Acknowledgements

A huge thanks to @norio_nomura for the original PR to include useSnakeCase as a key coding strategy.

Thanks to @hisekaldma for suggesting adding a requirement to CodingKey rather than introducing a new protocol.

Many thanks to everyone providing feedback on the pre pitch discussion.

Revision history

  • Initial version
  • Changed addition of a new protocol PreformattedCodingKey to being a static requirement on the CodingKey protocol.