[Pre-pitch] Roundtripping key coding strategies

Hi all,
Here's a preliminary pitch text. I have working code, but I haven't yet polished it into the series of individual PRs that I would like. There are also still two TODOs in the Detailed Design section.

But if anyone has feedback for the current state of the document, it would be very welcome!

Roundtripping key coding strategies

Introduction

Many encoders and decoders that can be used with the Codable system embrace the concept of key coding strategies. They were introduced with JSONEncoder from Foundation, but the general concept is so useful that it has been adopted by many other encoders and decoders.

A brief list of encoders and decoders that adopt the concept of key coding strategies:

Many of these implementations, including JSONEncoder, contain a flaw: not all encoded keys roundtrip correctly when a key coding strategy is in use.

For instance the key imageURL will encode to image_url when used with the convertToSnakeCase keyEncodingStrategy from JSONEncoder. But that same key will be transformed to imageUrl when applying the convertFromSnakeCase keyDecodingStrategy from JSONDecoder.

The underlying issue is that there are two separate transformations involved. If there were only one transformation - from key to transformed key - then the issue could be fixed.

But isn't this just an issue with JSONEncoder and JSONDecoder from Foundation that is actually out of scope on Swift Evolution?

No, unfortunately there is an underlying reason why two transformations exist today:

KeyedDecodingContainer contains an API called allKeys that returns all the keys of the container. But internally these keys may be transformed, so a 'reverse' transformation must be applied in order to map to the key type of the KeyedDecodingContainer.

This basically means that any attempt at implementing key coding strategies must include a 'reverse' transformation, which again leads to the issue described.

Of course one could argue that this system is already broken by the introduction of key coding strategies (and I will argue that below), so something else is required if we want to fix the situation.

This proposal intends to provide an alternative API for key coding strategies, as well as an API that lets custom types opt out of key coding strategies in situations where the encoding/decoding would otherwise break.

Swift-evolution thread: [Pre-pitch] Roundtripping key coding strategies

Motivation

Today the Codable system has a leaky abstraction if used with encoders and decoders that perform transformations on their keys.

In Foundation this is currently only present in JSONEncoder and JSONDecoder, but the issue described is the same for any other encoder/decoder that attempts to do something similar to the keyEncodingStrategy and keyDecodingStrategy of JSONEncoder and JSONDecoder.

The issue is that there is currently a pair of transformations present - and since the transformations are lossy, you don't necessarily get to the source key by encoding and decoding it.

For instance if I set the keyEncodingStrategy of a JSONEncoder to .convertToSnakeCase and a similar keyDecodingStrategy to .convertFromSnakeCase and use it with the following struct:

struct Person: Codable {
  var imageURL: URL
}

Then the encoding transform will produce:

{
  "image_url": "..."
}

But the decoding transform goes from snake case to camel case and tries to look up the key imageUrl, which does not exist.
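The failure described above can be reproduced with the current Foundation API. This is a minimal sketch: it uses a String property instead of a URL to keep the payload short, and it relies on `try?` to turn the expected decoding error into nil.

```swift
import Foundation

struct Person: Codable {
  var imageURL: String
}

// Encode with the snake case strategy: imageURL becomes image_url.
let encoder = JSONEncoder()
encoder.keyEncodingStrategy = .convertToSnakeCase
let data = try encoder.encode(Person(imageURL: "a"))
let json = String(decoding: data, as: UTF8.self)
print(json) // {"image_url":"a"}

// Decode with the matching strategy: image_url becomes imageUrl,
// which is not the key that Person asks for.
let decoder = JSONDecoder()
decoder.keyDecodingStrategy = .convertFromSnakeCase
let roundTripped = try? decoder.decode(Person.self, from: data)
print(roundTripped == nil)
```

The encoder and decoder are configured symmetrically, yet the value does not survive the round trip.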

This is a common source of bugs when using key coding strategies, and at least in code bases that I am familiar with, the workaround is often to add custom coding keys like:

enum CodingKeys: String, CodingKey {
  case imageURL = "imageUrl"
}

This allows the imageURL property to roundtrip when used with the snake case encoding and decoding, but this is a 'leaky abstraction', since the developer needs to be aware of the necessity for adding this key - and also this specific key is there to support a specific configuration option of a specific encoder/decoder pair.
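As a rough illustration of that coupling, the following sketch applies the workaround and then encodes the same type twice, once with the snake case strategy and once without any strategy. The variable names are only for illustration.

```swift
import Foundation

struct Person: Codable {
  var imageURL: String

  // Workaround: spell the key the way convertFromSnakeCase will produce it.
  enum CodingKeys: String, CodingKey {
    case imageURL = "imageUrl"
  }
}

let snakeEncoder = JSONEncoder()
snakeEncoder.keyEncodingStrategy = .convertToSnakeCase
let snakeDecoder = JSONDecoder()
snakeDecoder.keyDecodingStrategy = .convertFromSnakeCase

// With the strategies in place, the value now round trips.
let snakeData = try snakeEncoder.encode(Person(imageURL: "a"))
let snakeJSON = String(decoding: snakeData, as: UTF8.self)
let decoded = try snakeDecoder.decode(Person.self, from: snakeData)

// Without a strategy, the workaround spelling leaks into the output.
let plainData = try JSONEncoder().encode(Person(imageURL: "a"))
let plainJSON = String(decoding: plainData, as: UTF8.self)
print(snakeJSON, plainJSON, decoded.imageURL)
```

The type now produces "imageUrl" for every encoder that does not transform keys, which is the cost of tailoring the key to one specific configuration.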

Codable entities and the encoder/decoder they are used with are supposed to be decoupled, but in this situation, the developer needs to know whether the codable entity is used with an encoder/decoder pair that uses key transformations, and also needs to remember to map the key correctly, so that it will be 'found' when converting back from snake case to camel case.

Often I have seen attempts to 'fix' the behavior with the notation you would use if you didn't apply a key coding strategy:

enum CodingKeys: String, CodingKey {
  case imageURL = "image_url"
}

which of course is no good when used with snake case conversion, since the key that will be looked up is "imageUrl".
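A small sketch of why this attempted fix fails with the current Foundation strategies: the encoded JSON looks right, but the decoder converts the JSON key before the lookup, so the explicit "image_url" key no longer matches. The same data decodes fine once the decoding strategy is left at its default.

```swift
import Foundation

struct Person: Codable {
  var imageURL: String

  // The attempted fix: the notation you would use without a strategy.
  enum CodingKeys: String, CodingKey {
    case imageURL = "image_url"
  }
}

let encoder = JSONEncoder()
encoder.keyEncodingStrategy = .convertToSnakeCase
let data = try encoder.encode(Person(imageURL: "a"))
let json = String(decoding: data, as: UTF8.self)

// The JSON key is converted to imageUrl before lookup, so decoding fails.
let snakeDecoder = JSONDecoder()
snakeDecoder.keyDecodingStrategy = .convertFromSnakeCase
let withStrategy = try? snakeDecoder.decode(Person.self, from: data)

// A decoder without a strategy finds the key as written.
let withoutStrategy = try? JSONDecoder().decode(Person.self, from: data)
print(json, withStrategy == nil, withoutStrategy?.imageURL ?? "nil")
```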

Other times I have seen developers thinking that the custom CodingKey implementation must be a mistake and removing it entirely, because unless you are very familiar with both the use case and the peculiarity of this mapping, then the code does look a bit 'off'.

Finally having this custom coding key also means that you are in trouble if you wish to encode/decode the same entity with an encoder/decoder where you are not using a similar key transform.

As described in the introduction, this issue is not specific to JSONEncoder and JSONDecoder, since all encoder/decoder pairs are basically forced to provide two transformations in order to support the allKeys API on KeyedDecodingContainer. As soon as you have the two transformations, they are basically required to be 'lossless' in order to have any key roundtrip correctly.

An attempt to analyze the allKeys API

In order to figure out how to propose an alternative to the allKeys API, we must first analyze its use cases.

When decoding a simple fixed struct with synthesized CodingKeys, there is usually no use for allKeys.

One use case could be to count all keys to ensure that only the explicitly handled keys are present in the input. For this use case you only need the count of keys.

Another use case is where the keys are dynamic - in the sense that they are perhaps not fully known by the author of the Codable type, but can be extended later on.

One such implementation can be seen with AttributedString here:

https://github.com/apple/swift-corelibs-foundation/blob/2db661061615dc366bd31af779d6f4551cb3197d/Sources/Foundation/AttributedString/AttributedStringCodable.swift#L493

The key type used for this KeyedDecodingContainer is AttributeKey, and it is precisely 'dynamic' in the sense that it can represent any String as its key value, and the exact use cases are unknown to the implementation since the AttributedString functionality contains an aspect of extensibility.

So what happens when encoding and decoding AttributedString using JSONEncoder and JSONDecoder with snake case key coding strategies? It fails to roundtrip text marked up with the .imageURL property. This property appears to be marked up using a key named NSImageURL. This is encoded to n_s_image_url, and upon decoding the reverse transformation turns it into nSImageUrl, which does not match any known attribute.

https://forums.swift.org/t/pre-pitch-roundtripping-key-coding-strategies/52777/4
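The same effect can be shown without AttributedString. The sketch below uses a hypothetical DynamicKey type as a stand-in for AttributeKey, and a hypothetical Attributes type that decodes whatever allKeys reports. Decoding does not throw, but the key that comes back is not the key that went in.

```swift
import Foundation

// Hypothetical stand-in for a dynamic key type such as AttributeKey.
struct DynamicKey: CodingKey {
  var stringValue: String
  var intValue: Int? { nil }
  init(stringValue: String) { self.stringValue = stringValue }
  init?(intValue: Int) { return nil }
}

// Hypothetical type whose keys are only known at runtime.
struct Attributes: Codable {
  var values: [String: String]
  init(values: [String: String]) { self.values = values }

  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: DynamicKey.self)
    for (name, value) in values {
      try container.encode(value, forKey: DynamicKey(stringValue: name))
    }
  }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: DynamicKey.self)
    var values: [String: String] = [:]
    // allKeys reports the keys after the reverse transformation.
    for key in container.allKeys {
      values[key.stringValue] = try container.decode(String.self, forKey: key)
    }
    self.values = values
  }
}

let encoder = JSONEncoder()
encoder.keyEncodingStrategy = .convertToSnakeCase
let data = try encoder.encode(Attributes(values: ["NSImageURL": "x"]))
let json = String(decoding: data, as: UTF8.self)

let decoder = JSONDecoder()
decoder.keyDecodingStrategy = .convertFromSnakeCase
let decoded = try decoder.decode(Attributes.self, from: data)
print(json, decoded.values.keys.sorted())
```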

Proposed Solution

The proposal is to introduce three changes. One of these is in the domain of Foundation, so it is out of scope for discussion in this forum. I do, however, feel that it is necessary to understand the complete picture, and I think that we could limit discussion about it on the forums to the question: 'Do you think that it would be a good idea to create a PR containing these changes to swift-corelibs-foundation?' It would then of course be up to Apple to decide whether or not to accept the changes.

Here are the proposed changes:

  1. Introduce an allRawKeys: [CodingKey] API on KeyedDecodingContainer and KeyedDecodingContainerProtocol.

    In order to not break backwards compatibility, KeyedDecodingContainerProtocol will supply a default implementation of allRawKeys that just returns allKeys, but authors of types conforming to KeyedDecodingContainerProtocol are advised to implement allRawKeys explicitly.

    Create a PR against swift-corelibs-foundation that adds support for allRawKeys for JSONDecoder.

  2. Introduce a protocol in the standard library named PreformattedCodingKey. Encoder and Decoder implementations that support some form of key coding strategy would be advised to implement opting out of key coding strategies for CodingKey types that conform to PreformattedCodingKey.

    Create a PR against swift-corelibs-foundation that respects the PreformattedCodingKey for JSONEncoder and JSONDecoder.

  3. Create a PR against swift-corelibs-foundation that deprecates JSONEncoder's keyEncodingStrategy as well as JSONDecoder's keyDecodingStrategy and introduces a common keyCodingStrategy that is a transformation in the direction from a CodingKey to an encoded key.

How do these changes help?

For use cases where the coding keys are completely dynamic, any key coding strategy will have the possibility of transforming the keys into a shape that cannot be recognized upon decoding. In that situation it could be relevant to let the CodingKey in question conform to PreformattedCodingKey in order to completely opt-out of having the keys transformed upon encoding and decoding.

In order to get a peek into the decoding process, or perhaps check the number of keys, the allRawKeys API on KeyedDecodingContainer could be a solution.

In order to have your synthesized CodingKeys round trip correctly without any manual key mapping or knowledge about how keys are transformed during encoding, use a keyCodingStrategy like useSnakeCase.

Examples

Here is a repository demonstrating a version of JSONEncoder and JSONDecoder that deprecates keyEncodingStrategy and keyDecodingStrategy respectively and introduces a shared keyCodingStrategy instead.

It also respects conformance to the included PreformattedCodingKey protocol (although this pitch proposes that this protocol is added to the Swift standard library and not to Foundation).

https://github.com/mortenbekditlevsen/JSONCoder

Here are some of the included tests:


import XCTest

final class JSONEncoderTests: XCTestCase {
  func testUseSnakeCase() throws {
    struct Model: Codable {
      var imageURL: String
    }

    let encoder = JSONEncoder()
    encoder.keyCodingStrategy = .useSnakeCase
    let data = try encoder.encode(Model(imageURL: "a"))

    let expectedString = "{\"image_url\":\"a\"}"
    XCTAssertEqual(String(data: data, encoding: .utf8), expectedString)

    let decoder = JSONDecoder()
    decoder.keyCodingStrategy = .useSnakeCase
    let model = try decoder.decode(Model.self, from: data)

    XCTAssertEqual(model.imageURL, "a")
  }
  
  // NOTE: This only works in tests and is only included for
  // illustrative purposes.
  func testCustom() throws {
    struct Model: Codable {
      var imageURL: String
    }
    struct MyCodingKey: CodingKey {
      var stringValue: String
      var intValue: Int? { nil }
      init(stringValue: String) {
        self.stringValue = stringValue
      }
      init?(intValue: Int) {
        self.stringValue = "\(intValue)"
      }
    }

    let encoder = JSONEncoder()
    encoder.keyCodingStrategy = .custom({ codingPath in
      MyCodingKey(stringValue: "\(codingPath.last?.stringValue.hash ?? 0)")
    })
    let data = try encoder.encode(Model(imageURL: "a"))    

    let expectedString = "{\"3520785955319405054\":\"a\"}"
    XCTAssertEqual(String(data: data, encoding: .utf8), expectedString)
    
    let decoder = JSONDecoder()
    decoder.keyCodingStrategy = encoder.keyCodingStrategy
    let model = try decoder.decode(Model.self, from: data)
    
    XCTAssertEqual(model.imageURL, "a")
  }
  
  func testPreformattedKey() throws {

    struct MyPreformattedCodingKey: PreformattedCodingKey {
      var stringValue: String
      var intValue: Int? { nil }
      init(stringValue: String) {
        self.stringValue = stringValue
      }
      init?(intValue: Int) {
        self.stringValue = "\(intValue)"
      }
    }

    struct Model: Codable {
      var imageURL: String
      func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: MyPreformattedCodingKey.self)
        try container.encode(imageURL, forKey: .init(stringValue: "imageURL"))
      }
      init(imageURL: String) {
        self.imageURL = imageURL
      }
      init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: MyPreformattedCodingKey.self)
        self.imageURL = try container.decode(String.self, forKey: .init(stringValue: "imageURL"))
      }
    }

    let encoder = JSONEncoder()
    encoder.keyCodingStrategy = .useSnakeCase
    let data = try encoder.encode(Model(imageURL: "a"))

    let expectedString = "{\"imageURL\":\"a\"}"
    XCTAssertEqual(String(data: data, encoding: .utf8), expectedString)

    let decoder = JSONDecoder()
    decoder.keyCodingStrategy = .useSnakeCase
    let model = try decoder.decode(Model.self, from: data)
    XCTAssertEqual(model.imageURL, "a")
  }
}

Detailed Design

Adding PreformattedCodingKey

The proposed solution adds a new protocol, PreformattedCodingKey:

/// Suggests to `Codable` encoders and decoders that no key encoding or
/// decoding ought to be performed on `CodingKey`s of this type.
@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
public protocol PreformattedCodingKey: CodingKey { }

Handle PreformattedCodingKey conforming keys in JSONEncoder

private func _converted(_ key: CodingKey) -> CodingKey {
        // Use the plain key if it is preformatted
        if key is PreformattedCodingKey {
            return key
        }

        switch encoder.options.keyEncodingStrategy {
        case .useDefaultKeys:
            return key
        case .convertToSnakeCase:
            let newKeyString = JSONEncoder.KeyEncodingStrategy._convertToSnakeCase(key.stringValue)
            return _JSONKey(stringValue: newKeyString, intValue: key.intValue)
        case .custom(let converter):
            return converter(codingPath + [key])
        }
    }

Handle PreformattedCodingKey conforming keys in JSONDecoder

private struct _JSONKeyedDecodingContainer<K : CodingKey> : KeyedDecodingContainerProtocol, TestKeyedDecodingContainerProtocol {
...
    /// Initializes `self` by referencing the given decoder and container.
    init(referencing decoder: __JSONDecoder, wrapping container: [String : Any]) {
        self.decoder = decoder
        self.codingPath = decoder.codingPath

        // Use the plain container if the keys are preformatted
        guard !(Key.self is PreformattedCodingKey.Type) else {
            self.container = container
            return
        }

        switch decoder.options.keyDecodingStrategy {
        case .useDefaultKeys:
            self.container = container
        case .convertFromSnakeCase:
            // Convert the snake case keys in the container to camel case.
            // If we hit a duplicate key after conversion, then we'll use the first one we saw. Effectively an undefined behavior with JSON dictionaries.
            self.container = Dictionary(container.map {
                key, value in (JSONDecoder.KeyDecodingStrategy._convertFromSnakeCase(key), value)
            }, uniquingKeysWith: { (first, _) in first })
        case .custom(let converter):
            self.container = Dictionary(container.map {
                key, value in (converter(decoder.codingPath + [_JSONKey(stringValue: key, intValue: nil)]).stringValue, value)
            }, uniquingKeysWith: { (first, _) in first })
        }
    }
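The two checks used above, one on a key instance and one on the key type, can be exercised in isolation. This is only a sketch: the protocol is declared locally as a stand-in (it is not part of the standard library today), and PlainKey, RawKey, encodedName and skipsKeyConversion are hypothetical names introduced for illustration.

```swift
// Local stand-in for the proposed protocol; assumed to refine CodingKey.
protocol PreformattedCodingKey: CodingKey {}

struct PlainKey: CodingKey {
  var stringValue: String
  var intValue: Int? { nil }
  init(stringValue: String) { self.stringValue = stringValue }
  init?(intValue: Int) { return nil }
}

struct RawKey: PreformattedCodingKey {
  var stringValue: String
  var intValue: Int? { nil }
  init(stringValue: String) { self.stringValue = stringValue }
  init?(intValue: Int) { return nil }
}

// Encoder side: check the key instance and skip the transform if it opts out.
func encodedName(for key: CodingKey, transform: (String) -> String) -> String {
  if key is PreformattedCodingKey {
    return key.stringValue
  }
  return transform(key.stringValue)
}

// Decoder side: only the key type is known when the container is created.
func skipsKeyConversion<K: CodingKey>(_ keyType: K.Type) -> Bool {
  keyType is PreformattedCodingKey.Type
}

// uppercased() is just a placeholder for a real key transformation.
let plain = encodedName(for: PlainKey(stringValue: "imageURL")) { $0.uppercased() }
let raw = encodedName(for: RawKey(stringValue: "imageURL")) { $0.uppercased() }
print(plain, raw, skipsKeyConversion(RawKey.self), skipsKeyConversion(PlainKey.self))
```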

Handling allRawKeys

TODO: Add actual suggested code here:

  • Introduce new API on KeyedDecodingContainerProtocol
  • Default implementation returning allKeys
  • New API on the KeyedDecodingContainer
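Until the actual code is added, the following is a rough sketch of the default behavior only, not the suggested implementation. It adds allRawKeys through a protocol extension, which means it is not a real protocol requirement and conforming containers cannot override it. AnyKey and Strict are hypothetical names; Strict shows the key counting use case from the analysis above.

```swift
import Foundation

// Sketch of the default: fall back to allKeys.
// A real implementation would add this as a protocol requirement.
extension KeyedDecodingContainerProtocol {
  var allRawKeys: [CodingKey] { allKeys }
}

// Hypothetical dynamic key so that every key in the payload is reported.
struct AnyKey: CodingKey {
  var stringValue: String
  var intValue: Int? { nil }
  init(stringValue: String) { self.stringValue = stringValue }
  init?(intValue: Int) { return nil }
}

// Hypothetical type that rejects payloads containing unexpected keys.
struct Strict: Decodable {
  let name: String

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: AnyKey.self)
    // Only the number of keys matters here, not their spelling.
    guard container.allRawKeys.count == 1 else {
      throw DecodingError.dataCorrupted(
        .init(codingPath: decoder.codingPath, debugDescription: "Unexpected keys"))
    }
    name = try container.decode(String.self, forKey: AnyKey(stringValue: "name"))
  }
}

let accepted = try? JSONDecoder().decode(Strict.self, from: Data(#"{"name":"a"}"#.utf8))
let rejected = try? JSONDecoder().decode(Strict.self, from: Data(#"{"name":"a","extra":"b"}"#.utf8))
print(accepted?.name ?? "nil", rejected == nil)
```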

Handling useSnakeCase

TODO: Add actual suggested code here:

  • Introduce keyCodingStrategy on JSONEncoder and JSONDecoder in both Darwin Foundation overlay and swift-corelibs-foundation.
  • Deprecate keyEncodingStrategy and keyDecodingStrategy
  • Implement strategies. If a default keyCodingStrategy is used, there should be a fallback to the deprecated encoding and decoding strategies.
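To illustrate the idea of a single transformation, here is a minimal sketch that is independent of Foundation. KeyCodingStrategy, encodedKey(for:), snakeCased and ModelKeys are hypothetical names, the custom case is simplified to a String transform, and the snake case function is a simple approximation rather than Foundation's algorithm. The point is that encoding and decoding both apply the same forward transformation to the CodingKey, so no reverse mapping is needed.

```swift
// Simple snake case conversion; an approximation for illustration only.
func snakeCased(_ key: String) -> String {
  let chars = Array(key)
  var result = ""
  for (index, char) in chars.enumerated() {
    if char.isUppercase && index > 0 {
      let previous = chars[index - 1]
      let nextIsLowercase = index + 1 < chars.count && chars[index + 1].isLowercase
      // Break before a capital that starts a word or ends an acronym.
      if previous.isLowercase || (previous.isUppercase && nextIsLowercase) {
        result.append("_")
      }
    }
    result += char.lowercased()
  }
  return result
}

// Hypothetical single-direction strategy: CodingKey to encoded key.
enum KeyCodingStrategy {
  case useDefaultKeys
  case useSnakeCase
  case custom((String) -> String)

  func encodedKey(for key: CodingKey) -> String {
    switch self {
    case .useDefaultKeys: return key.stringValue
    case .useSnakeCase: return snakeCased(key.stringValue)
    case .custom(let transform): return transform(key.stringValue)
    }
  }
}

enum ModelKeys: String, CodingKey {
  case imageURL
}

let strategy = KeyCodingStrategy.useSnakeCase
// Encoding writes the transformed key.
let payload = [strategy.encodedKey(for: ModelKeys.imageURL): "a"]
// Decoding transforms the requested key the same way before the lookup.
let value = payload[strategy.encodedKey(for: ModelKeys.imageURL)]
print(payload, value ?? "nil")
```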

Impact on Existing Code

The allRawKeys API is additive and comes with a default implementation, so all existing KeyedDecodingContainerProtocol conformers will continue to compile, although it would be advisable to implement a specialized version.

The PreformattedCodingKey protocol has no direct impact on existing code either, since adoption of this protocol is additive.

Note that conforming an existing CodingKey to PreformattedCodingKey will change its encoding and decoding behavior, so this must be done with careful thought about how it intersects with current and future use of key coding strategies.

There will be deprecation warnings for existing keyEncodingStrategy and keyDecodingStrategy, but opting in to a keyCodingStrategy can be done at the leisure of the user.

With regards to any current Decodable conforming type that uses allKeys from the KeyedDecodingContainer upon decoding, I have demonstrated above that it is not reliable when used together with convertFromSnakeCase. Deprecating this API will let this fact be known and allow the author to take steps to using allRawKeys or alternatively let the CodingKey in use conform to PreformattedCodingKey.

Alternatives Considered

Using Dictionary instead of PreformattedCodingKey

You can already today ensure that key coding will not be performed on your CodingKey by leveraging the fact that keys in Dictionary are treated as data and not as CodingKeys.

This knowledge can be used to circumvent key coding strategies today. As can be seen in the following gist, the ergonomics are quite horrible, so conforming your CodingKey to the PreformattedCodingKey seems like a great win.

Don't touch my keys:
https://gist.github.com/mortenbekditlevsen/7918fb98638f8a9e2b017f0fad12da0b
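The Dictionary behavior itself is easy to observe with the current Foundation API. In this small sketch (Model and extraAttributes are illustrative names), the property name is converted to snake case while the String keys of the dictionary are left alone in both directions.

```swift
import Foundation

struct Model: Codable {
  var extraAttributes: [String: String]
}

let encoder = JSONEncoder()
encoder.keyEncodingStrategy = .convertToSnakeCase
let data = try encoder.encode(Model(extraAttributes: ["NSImageURL": "x"]))
let json = String(decoding: data, as: UTF8.self)
print(json) // {"extra_attributes":{"NSImageURL":"x"}}

let decoder = JSONDecoder()
decoder.keyDecodingStrategy = .convertFromSnakeCase
let decoded = try decoder.decode(Model.self, from: data)
print(decoded.extraAttributes)
```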

Graceful fallback for allKeys

Even when using useSnakeCase, the allKeys method on KeyedDecodingContainer could still use the convertFromSnakeCase transformation to recover the same keys as it does today when using the convertFromSnakeCase key decoding strategy. There is, however, no obvious choice for the custom case here, and I guess that in the long run it could easily cause more confusion than benefit.

Full allKeys support for simple enum backed CodingKeys

I did a small hack based on the great work by @stephencelis and Brandon Williams in their swift-case-paths library.

This hack basically allows you to generate an array of all cases of an enum without associated values. In other words, it queries the runtime to return information that is comparable to what the CaseIterable conformance gives us at compile time.

Using that hack, the allKeys implementation could iterate over all cases of your CodingKey when that CodingKey is an enum like the synthesized ones.

Having access to these cases allows you to perform the key coding transformation correctly and return a list of keys present in the KeyedDecodingContainer.

A fallback, in case your CodingKey is not a plain enum-backed version, could be to attempt initializing the CodingKey directly from the encoded key in the KeyedDecodingContainer - or the fallback could even be to attempt the graceful fallback described in the section above.
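The idea can be sketched with an explicit CaseIterable conformance in place of the runtime hack, which is not reproduced here. presentKeys, snakeCased and ModelKeys are hypothetical names, and the snake case function is a simple approximation rather than Foundation's algorithm.

```swift
// Simple snake case conversion; an approximation for illustration only.
func snakeCased(_ key: String) -> String {
  let chars = Array(key)
  var result = ""
  for (index, char) in chars.enumerated() {
    if char.isUppercase && index > 0 {
      let previous = chars[index - 1]
      let nextIsLowercase = index + 1 < chars.count && chars[index + 1].isLowercase
      if previous.isLowercase || (previous.isUppercase && nextIsLowercase) {
        result.append("_")
      }
    }
    result += char.lowercased()
  }
  return result
}

// Apply the forward transformation to every case and keep the ones
// whose encoded form is present in the container.
func presentKeys<K: CodingKey & CaseIterable>(
  of type: K.Type,
  in encodedKeys: Set<String>,
  transform: (String) -> String
) -> [K] {
  K.allCases.filter { encodedKeys.contains(transform($0.stringValue)) }
}

enum ModelKeys: String, CodingKey, CaseIterable {
  case imageURL, name, nickname
}

let found = presentKeys(of: ModelKeys.self, in: ["image_url", "name"], transform: snakeCased)
print(found.map(\.stringValue))
```

Because only the forward transformation is used, imageURL is recovered with its original spelling.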

While fun to play around with, this solution seems a bit strange, and as it mainly only fully works with plain enums, I don't consider it fit for actual use.

Acknowledgements

A huge thanks to @norio_nomura for the original PR to include useSnakeCase as a key coding strategy.

Many thanks to everyone providing feedback on the pre-pitch discussion.

Revision history

  • Initial version

EDIT:
Removed a leftover suggestion from an earlier draft about also deprecating allKeys. Following a suggestion from Itai Ferber above, I am currently not suggesting to deprecate it.
