I have committed new implementations, new samples and a new pitch text.
The updated pitch text is included below - the updates all relate to the suggestions by @hisekaldma
Roundtripping key coding strategies
- Proposal: SE-NNNN
- Author: Morten Bek Ditlevsen
- Review Manager: TBD
- Status: Awaiting implementation
Introduction
Many encoders and decoders that can be used with the Codable system embrace the concept of key coding strategies. They were introduced with JSONEncoder from Foundation, but the general concept is so useful that it has been adopted by many other encoders and decoders.
Here is a brief list of encoders and decoders that adopt the concept of key coding strategies:
- https://github.com/apple/swift-corelibs-foundation/blob/main/Darwin/Foundation-swiftoverlay/JSONEncoder.swift
- https://github.com/MaxDesiatov/XMLCoder/blob/main/Sources/XMLCoder/Encoder/XMLEncoder.swift
- https://github.com/firebase/firebase-ios-sdk/blob/master/FirebaseDatabaseSwift/Sources/third_party/RTDBEncoder/RTDBEncoder.swift
- https://github.com/groue/GRDB.swift/blob/master/README.md#column-names-coding-strategies
Many of these implementations - including JSONEncoder - contain a flaw where not all encoded keys roundtrip correctly when using a key coding strategy.
For instance, the key imageURL will encode to "image_url" when used with the convertToSnakeCase keyEncodingStrategy from JSONEncoder. But that encoded key will be transformed to "imageUrl" when applying the convertFromSnakeCase keyDecodingStrategy from JSONDecoder. This means that it will not match the original imageURL key, and the value will not roundtrip.
The underlying issue is the presence of two separate transformations. If there were only one transformation - from key to transformed key - then the issue could be fixed.
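To make the difference concrete, here is a minimal sketch using hand-written, simplified converters. Note that toSnakeCase and fromSnakeCase are hypothetical helpers written for this illustration, not Foundation's implementation, which differs in details such as how runs of capitals are split. The sketch compares a lookup that goes through a reverse transformation of the encoded keys with a lookup that applies a single forward transformation to the requested key.

```swift
// Hypothetical, simplified converters for illustration only.
func toSnakeCase(_ key: String) -> String {
    let chars = Array(key)
    var result = ""
    for (i, c) in chars.enumerated() {
        if c.isUppercase && i > 0 {
            let previousIsLower = chars[i - 1].isLowercase
            let nextIsLower = i + 1 < chars.count && chars[i + 1].isLowercase
            // Start a new word at a lower-to-upper boundary or at the end of an acronym.
            if previousIsLower || nextIsLower { result.append("_") }
        }
        result.append(contentsOf: c.lowercased())
    }
    return result
}

func fromSnakeCase(_ key: String) -> String {
    let parts = key.split(separator: "_").map(String.init)
    guard let first = parts.first else { return key }
    let rest = parts.dropFirst().map { $0.prefix(1).uppercased() + String($0.dropFirst()) }
    return first + rest.joined()
}

let encoded = ["image_url": "a"]   // what an encoder using toSnakeCase would produce
let key = "imageURL"               // the key the Swift type asks for

// Two transformations: convert the encoded keys back, then look up the original key.
let reversed = Dictionary(encoded.map { (fromSnakeCase($0.key), $0.value) },
                          uniquingKeysWith: { first, _ in first })
let viaReverse = reversed[key]                 // nil: the container now holds "imageUrl"

// One transformation: convert the requested key forward and look it up as-is.
let viaForward = encoded[toSnakeCase(key)]     // "a"
```

With only the forward transformation in play, the lossiness of the conversion no longer matters, because the same function is applied to the same input on both the encoding and the decoding side.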
But isn't this just an issue with JSONEncoder and JSONDecoder from Foundation that is actually out of scope on Swift Evolution?
No, there is unfortunately an underlying reason why two transformations exist today:
KeyedDecodingContainer and KeyedDecodingContainerProtocol contain an API called allKeys that is supposed to return all the keys of the container. But internally these keys may be transformed, so a 'reverse' transformation must be applied in order to map to the key type of the KeyedDecodingContainer.
This basically means that any attempt at implementing key coding strategies must include a 'reverse' transformation, which in turn leads to the issue described.
Of course one could argue that this system is already broken by the introduction of key coding strategies (and I will argue that below), so something else is required if we want to fix the situation.
This proposal intends to provide alternative API for performing key coding strategies, and also API for opting out of key coding strategies for custom CodingKey types in situations where the encoding/decoding would otherwise break.
Swift-evolution thread: [Pre-pitch] Roundtripping key coding strategies
Motivation
Today the Codable system has a leaky abstraction if used with encoders and decoders that perform transformations on their keys.
In Foundation this is currently only present in JSONEncoder and JSONDecoder, but the issue described would be the same for any other encoder/decoder that attempts to do something similar to the keyEncodingStrategy of JSONEncoder or the keyDecodingStrategy of JSONDecoder.
The issue is that there is currently a pair of transformations present - and since the transformations are lossy, you don't necessarily get to the source key by encoding and decoding it.
For instance, if you set the keyEncodingStrategy of a JSONEncoder to .convertToSnakeCase and the keyDecodingStrategy of a JSONDecoder to .convertFromSnakeCase and use them with the following struct:
struct Person: Codable {
    var imageURL: URL
}
Then the encoding transform will produce:
{
    "image_url": "..."
}
But the decoding transform will go from snake case to camel case, trying to look up the key imageUrl, which does not exist.
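The failure can be reproduced with the existing Foundation API. The following is a small sketch; the property is typed as String instead of URL purely to keep the sample short, and the variable names are my own.

```swift
import Foundation

struct Person: Codable {
    var imageURL: String   // String rather than URL, to keep the sample minimal
}

let encoder = JSONEncoder()
encoder.keyEncodingStrategy = .convertToSnakeCase
let data = try encoder.encode(Person(imageURL: "a"))
let json = String(decoding: data, as: UTF8.self)   // {"image_url":"a"}

let decoder = JSONDecoder()
decoder.keyDecodingStrategy = .convertFromSnakeCase

var missingKey: String? = nil
do {
    _ = try decoder.decode(Person.self, from: data)
} catch DecodingError.keyNotFound(let key, _) {
    // The decoder reports the key it could not find among the converted keys.
    missingKey = key.stringValue
} catch {
    // Any other error would be unexpected for this input.
}
```

Decoding throws a keyNotFound error for imageURL, even though the data was produced by the matching encoder configuration a few lines earlier.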
This is a common source of bugs when using key coding strategies, and at least in code bases that I am familiar with, the workaround is often to add custom coding keys like:
enum CodingKeys: String, CodingKey {
    case imageURL = "imageUrl"
}
This allows the imageURL property to roundtrip when used with the snake case key encoding and decoding, but this is a leaky abstraction, since the developer needs to be aware of the necessity for adding this key - and also this specific key is there to support a specific configuration option of a specific encoder/decoder pair.
Codable entities and the encoder/decoder they are used with are supposed to be decoupled, but in this situation, the developer needs to know if the codable entity is used with an encoder/decoder pair that uses key transformations - and also needs to remember to map the key correctly, so that it will be 'found' when converting back from snake case to camel case.
Often I have seen attempts to 'fix' the behavior with the notation you would use if you didn't apply a key coding strategy:
enum CodingKeys: String, CodingKey {
    case imageURL = "image_url"
}
which of course is no good when used with snake case conversion, since the key that will be looked up is "imageUrl".
Other times I have seen developers thinking that the custom CodingKey implementation must be a mistake and removing it entirely, because unless you are very familiar with both the use case and the peculiarity of this mapping, then the code does look a bit 'off'.
Finally, having this custom coding key also means that you are in trouble if you wish to encode/decode the same entity with an encoder/decoder where you are not using a similar key transform.
As described in the introduction, this issue is not specific to JSONEncoder and JSONDecoder, since all encoder/decoder pairs are basically forced to provide two transformations in order to support the allKeys API on KeyedDecodingContainer. As soon as you have the two transformations, they are basically required to be 'lossless' in order to have any key roundtrip correctly.
An attempt to analyze the allKeys API
In order to figure out how to propose an alternative to the allKeys API, we must first analyze its use cases.
When decoding a simple fixed struct with synthesized CodingKeys, there is usually no use for allKeys.
One use case could be to count all keys to ensure that only the explicitly handled keys are present in the input. For this use case you only need the count of keys.
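As an illustration of the counting use case, here is a minimal sketch of a type that rejects input containing keys it does not handle. AnyKey and User are hypothetical names made up for this example; the sketch relies on the fact that a key type accepting any string lets allKeys report every key in the input.

```swift
import Foundation

// A key type that accepts any string, so `allKeys` reports every key in the input.
struct AnyKey: CodingKey {
    var stringValue: String
    var intValue: Int? { nil }
    init(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
}

struct User: Decodable {
    var name: String

    enum CodingKeys: String, CodingKey { case name }

    init(from decoder: Decoder) throws {
        // Only the count of keys is needed to detect unexpected input.
        let everything = try decoder.container(keyedBy: AnyKey.self)
        guard everything.allKeys.count == 1 else {
            throw DecodingError.dataCorrupted(.init(
                codingPath: decoder.codingPath,
                debugDescription: "Unexpected keys in input"))
        }
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String.self, forKey: .name)
    }
}

let decoder = JSONDecoder()
let accepted = try? decoder.decode(User.self, from: Data(#"{"name":"a"}"#.utf8))
let rejected = try? decoder.decode(User.self, from: Data(#"{"name":"a","extra":1}"#.utf8))
```

Note that the type never inspects the values of the keys, which is why a count of raw keys would serve this use case equally well.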
Another use case is when the keys are dynamic - in the sense that they are perhaps not fully known by the author of the Codable type, but can be extended later on.
One such implementation can be seen with AttributedString
here:
The type of the CodingKey used for this KeyedDecodingContainer is AttributeKey, and it is precisely 'dynamic' in the sense that it can represent any String as its key value, and the exact use cases are unknown to the implementation since the AttributedString functionality contains an aspect of extensibility.
So what happens when encoding and decoding AttributedString using JSONEncoder and JSONDecoder with snake case key coding strategies? As an example it fails to roundtrip text marked up with the .imageURL property. This property is implemented using a key named NSImageURL. This is encoded to "n_s_image_url", which is converted to "nSImageUrl" upon decoding, so the lookup of the NSImageURL key finds nothing.
https://forums.swift.org/t/pre-pitch-roundtripping-key-coding-strategies/52777/4
Proposed Solution
The proposal is to introduce three changes. One of these is in the domain of Foundation, so it is out of scope for discussion in this forum. I do, however, feel that it is necessary to understand the complete picture, and I think that we could limit discussion about it on the forums to be around: 'do you think that it would be a good idea to create a PR containing these changes to swift-corelibs-foundation?' and of course then let the engineers at Apple decide on whether or not to accept the changes.
Here are the proposed changes:
- Introduce an allRawKeys: [CodingKey] API on KeyedDecodingContainer and KeyedDecodingContainerProtocol.
  In order to not break backwards compatibility, KeyedDecodingContainerProtocol will supply a default implementation of allRawKeys that just returns allKeys, but authors of types conforming to KeyedDecodingContainerProtocol are advised to implement allRawKeys explicitly.
  Create a PR against swift-corelibs-foundation that adds support for allRawKeys for JSONDecoder.
- Add a static requirement to the CodingKey protocol: static var isPreformatted: Bool { get } with a default implementation returning false.
  Encoder and Decoder implementations that support some form of key coding strategy would be advised to implement opting out of key coding strategies for CodingKey types that return true for isPreformatted.
  Create a PR against swift-corelibs-foundation that respects the isPreformatted property for JSONEncoder and JSONDecoder.
- Create a PR against swift-corelibs-foundation that deprecates JSONEncoder's keyEncodingStrategy as well as JSONDecoder's keyDecodingStrategy and introduces a common keyCodingStrategy that is a transformation in the direction from a CodingKey to an encoded key.
How do these changes help?
For use cases where the coding keys are completely dynamic, any key coding strategy will have the possibility of transforming the keys into a shape that cannot be recognized upon decoding. In that situation it could be relevant to let the CodingKey in question return true for isPreformatted in order to completely opt out of having the keys transformed upon encoding and decoding.
In order to get a peek into the decoding process, or perhaps check the number of keys, the allRawKeys API on KeyedDecodingContainer could be a solution.
In order to have your synthesized CodingKeys round trip correctly without any manual key mapping or knowledge about how keys are transformed during encoding, use a keyCodingStrategy like useSnakeCase.
Examples
Here is a repository demonstrating a version of JSONEncoder and JSONDecoder that deprecates keyEncodingStrategy and keyDecodingStrategy respectively and introduces a shared keyCodingStrategy instead.
It also respects CodingKey types that return true for isPreformatted (although this is done using a hack, as I cannot demonstrate adding requirements to the Swift standard library).
https://github.com/mortenbekditlevsen/JSONCoder
Here are some of the included tests:
final class JSONEncoderTests: XCTestCase {
    func testUseSnakeCase() throws {
        struct Model: Codable {
            var imageURL: String
        }
        let encoder = JSONEncoder()
        encoder.keyCodingStrategy = .useSnakeCase
        let data = try encoder.encode(Model(imageURL: "a"))
        let expectedString = "{\"image_url\":\"a\"}"
        XCTAssertEqual(String(data: data, encoding: .utf8), expectedString)

        let decoder = JSONDecoder()
        decoder.keyCodingStrategy = .useSnakeCase
        let model = try decoder.decode(Model.self, from: data)
        XCTAssertEqual(model.imageURL, "a")
    }

    // NOTE: This only works in tests and is only included for
    // illustrative purposes.
    func testCustom() throws {
        struct Model: Codable {
            var imageURL: String
        }
        struct MyCodingKey: CodingKey {
            var stringValue: String
            var intValue: Int? { nil }
            init(stringValue: String) {
                self.stringValue = stringValue
            }
            init?(intValue: Int) {
                self.stringValue = "\(intValue)"
            }
        }
        let encoder = JSONEncoder()
        encoder.keyCodingStrategy = .custom({ codingPath in
            MyCodingKey(stringValue: "\(codingPath.last?.stringValue.hash ?? 0)")
        })
        let data = try encoder.encode(Model(imageURL: "a"))
        let expectedString = "{\"3520785955319405054\":\"a\"}"
        XCTAssertEqual(String(data: data, encoding: .utf8), expectedString)

        let decoder = JSONDecoder()
        decoder.keyCodingStrategy = encoder.keyCodingStrategy
        let model = try decoder.decode(Model.self, from: data)
        XCTAssertEqual(model.imageURL, "a")
    }

    func testPreformattedKey() throws {
        struct MyPreformattedCodingKey: CodingKey {
            var stringValue: String
            var intValue: Int? { nil }
            init(stringValue: String) {
                self.stringValue = stringValue
            }
            init?(intValue: Int) {
                self.stringValue = "\(intValue)"
            }
            static var isPreformatted: Bool { true }
        }
        struct Model: Codable {
            var imageURL: String
            func encode(to encoder: Encoder) throws {
                var container = encoder.container(keyedBy: MyPreformattedCodingKey.self)
                try container.encode(imageURL, forKey: .init(stringValue: "imageURL"))
            }
            init(imageURL: String) {
                self.imageURL = imageURL
            }
            init(from decoder: Decoder) throws {
                let container = try decoder.container(keyedBy: MyPreformattedCodingKey.self)
                self.imageURL = try container.decode(String.self, forKey: .init(stringValue: "imageURL"))
            }
        }
        let encoder = JSONEncoder()
        encoder.keyCodingStrategy = .useSnakeCase
        let data = try encoder.encode(Model(imageURL: "a"))
        // The preformatted key is left untouched by the snake case strategy.
        let expectedString = "{\"imageURL\":\"a\"}"
        XCTAssertEqual(String(data: data, encoding: .utf8), expectedString)

        let decoder = JSONDecoder()
        decoder.keyCodingStrategy = .useSnakeCase
        let model = try decoder.decode(Model.self, from: data)
        XCTAssertEqual(model.imageURL, "a")
    }
}
Detailed Design
Adding isPreformatted to CodingKey
The proposed solution adds a new static requirement to CodingKey, isPreformatted:
public protocol CodingKey: Sendable,
                           CustomStringConvertible,
                           CustomDebugStringConvertible {
    ...
    /// Suggests to `Codable` encoders and decoders that no key encoding or
    /// decoding ought to be performed on `CodingKey`s of this type.
    @available(SwiftStdlib 5.6, *)
    static var isPreformatted: Bool { get }
}

extension CodingKey {
    ...
    @available(SwiftStdlib 5.6, *)
    public static var isPreformatted: Bool { return false }
}
A branch with the implementation can be found here:
https://github.com/mortenbekditlevsen/swift/tree/codable_preformattedcodingkey
Handle isPreformatted coding keys in JSONEncoder
private func _converted(_ key: Key) -> CodingKey {
    // Use the plain key if it is preformatted
    if Key.isPreformatted {
        return key
    }
    switch encoder.options.keyEncodingStrategy {
    case .useDefaultKeys:
        return key
    case .convertToSnakeCase:
        let newKeyString = JSONEncoder.KeyEncodingStrategy._convertToSnakeCase(key.stringValue)
        return _JSONKey(stringValue: newKeyString, intValue: key.intValue)
    case .custom(let converter):
        return converter(codingPath + [key])
    }
}
Handle isPreformatted coding keys in JSONDecoder
private struct _JSONKeyedDecodingContainer<K : CodingKey> : KeyedDecodingContainerProtocol, TestKeyedDecodingContainerProtocol {
    ...
    /// Initializes `self` by referencing the given decoder and container.
    init(referencing decoder: __JSONDecoder, wrapping container: [String : Any]) {
        self.decoder = decoder
        self.codingPath = decoder.codingPath
        // Use the plain container if the keys are preformatted
        guard !Key.isPreformatted else {
            self.container = container
            return
        }
        switch decoder.options.keyDecodingStrategy {
        case .useDefaultKeys:
            self.container = container
        case .convertFromSnakeCase:
            // Convert the snake case keys in the container to camel case.
            // If we hit a duplicate key after conversion, then we'll use the first one we saw. Effectively an undefined behavior with JSON dictionaries.
            self.container = Dictionary(container.map {
                key, value in (JSONDecoder.KeyDecodingStrategy._convertFromSnakeCase(key), value)
            }, uniquingKeysWith: { (first, _) in first })
        case .custom(let converter):
            self.container = Dictionary(container.map {
                key, value in (converter(decoder.codingPath + [_JSONKey(stringValue: key, intValue: nil)]).stringValue, value)
            }, uniquingKeysWith: { (first, _) in first })
        }
    }
    ...
}
A branch with support for respecting isPreformatted in JSONEncoder and JSONDecoder can be found here:
https://github.com/mortenbekditlevsen/swift-corelibs-foundation/tree/codable_preformattedcodingkey
Handling allRawKeys
Introduce new API on KeyedDecodingContainerProtocol
/// All the raw keys the `Decoder` has for this container.
///
/// Different keyed containers from the same `Decoder` may return different
/// keys here. This should report all raw keys present in the
/// container without attempting to convert to the `Key` type of the container.
var allRawKeys: [CodingKey] { get }
Default implementation returning allKeys
public extension KeyedDecodingContainerProtocol {
    // We need a default implementation in order to not break API
    // This default implementation just returns the existing `allKeys`
    var allRawKeys: [CodingKey] {
        return allKeys
    }
}
New API on the KeyedDecodingContainer
/// All the raw keys the `Decoder` has for this container.
///
/// Different keyed containers from the same `Decoder` may return different
/// keys here. This should report all raw keys present in the
/// container without attempting to convert to the `Key` type of the container.
public var allRawKeys: [CodingKey] {
    return _box.allRawKeys
}
A branch with the implementation can be found here:
https://github.com/mortenbekditlevsen/swift/tree/keyeddecodingcontainer_allkeys
A branch with support for allRawKeys for JSONDecoder and PropertyListDecoder can be found here:
https://github.com/mortenbekditlevsen/swift-corelibs-foundation/tree/codable_allrawkeys
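The backwards-compatibility approach can be tried out in isolation with a toy protocol. In this sketch ToyKeyedContainer, LegacyContainer and SnakeCaseContainer are hypothetical stand-ins that I made up for illustration, keys are plain strings rather than CodingKey values, and the reverse conversion is a simplified one, not Foundation's.

```swift
// Toy stand-in for KeyedDecodingContainerProtocol; not the standard library protocol.
protocol ToyKeyedContainer {
    var allKeys: [String] { get }
    var allRawKeys: [String] { get }
}

extension ToyKeyedContainer {
    // Default implementation: existing conformers keep compiling and report allKeys.
    var allRawKeys: [String] { allKeys }
}

// A conformer written before allRawKeys existed; it needs no changes.
struct LegacyContainer: ToyKeyedContainer {
    var storage: [String: String]
    var allKeys: [String] { storage.keys.sorted() }
}

// A conformer that transforms keys, and therefore implements allRawKeys explicitly.
struct SnakeCaseContainer: ToyKeyedContainer {
    var storage: [String: String]   // keys exactly as found in the payload

    // Lossy reverse conversion from snake case to camel case (simplified).
    var allKeys: [String] {
        storage.keys.sorted().map { raw in
            let parts = raw.split(separator: "_").map(String.init)
            let rest = parts.dropFirst().map { $0.prefix(1).uppercased() + String($0.dropFirst()) }
            return parts.prefix(1).joined() + rest.joined()
        }
    }

    // The untouched keys, with no attempt to map them back to a key type.
    var allRawKeys: [String] { storage.keys.sorted() }
}

let legacy = LegacyContainer(storage: ["name": "a"])
let snake = SnakeCaseContainer(storage: ["image_url": "a"])
```

Here legacy.allRawKeys falls back to allKeys, while snake.allRawKeys reports "image_url" even though snake.allKeys can only offer the lossy "imageUrl".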
Handling useSnakeCase
A branch with deprecated keyEncodingStrategy and keyDecodingStrategy and the introduction of keyCodingStrategy can be found here:
https://github.com/mortenbekditlevsen/swift-corelibs-foundation/tree/codable_keycodingstrategy
The branch contains the following changes from the existing implementation:
- Introduce keyCodingStrategy on JSONEncoder and JSONDecoder in both the Darwin Foundation overlay and swift-corelibs-foundation.
- Deprecate keyEncodingStrategy and keyDecodingStrategy.
- Implement the strategies. If a default keyCodingStrategy is used, there should be a fallback to the deprecated encoding and decoding strategies.
Impact on Existing Code
The allRawKeys API is additive and comes with a default implementation, so all existing KeyedDecodingContainerProtocol conformers will continue to compile, although it would be advisable to implement a specialized version.
There is also no direct impact from adding the isPreformatted static requirement to the CodingKey protocol, since the default implementation does not opt out of key coding strategies.
Note that returning true for isPreformatted for an existing CodingKey will change its encoding and decoding behavior, so this should only be done after considering how it interacts with current and future use of key coding strategies.
There will be deprecation warnings for existing uses of keyEncodingStrategy and keyDecodingStrategy, but opting in to a keyCodingStrategy can be done at the leisure of the user.
With regards to any current Decodable conforming type that uses allKeys from the KeyedDecodingContainer upon decoding, I have demonstrated above that it is not reliable when used together with convertFromSnakeCase or any similar transformation. Deprecating this API will let this fact be known and allow the author to take steps towards using allRawKeys, or alternatively let the CodingKey in use return true for isPreformatted.
Alternatives Considered
Using Dictionary instead of isPreformatted
You can already today ensure that key coding will not be performed on your CodingKey by leveraging the fact that keys in Dictionary are treated as data and not as CodingKeys.
This knowledge can be used to circumvent key coding strategies today. As can be seen in the following gist, the ergonomics are quite horrible, so returning true for isPreformatted on your CodingKey seems like a great win.
Don't touch my keys:
https://gist.github.com/mortenbekditlevsen/7918fb98638f8a9e2b017f0fad12da0b
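The Dictionary behavior itself can be observed with the existing Foundation API. The following is a minimal sketch, separate from the gist above; Payload and its properties are hypothetical names chosen for this example.

```swift
import Foundation

struct Payload: Codable {
    var displayName: String            // a regular property key, subject to the strategy
    var attributes: [String: String]   // dictionary keys are treated as data
}

let encoder = JSONEncoder()
encoder.keyEncodingStrategy = .convertToSnakeCase
encoder.outputFormatting = .sortedKeys
let data = try encoder.encode(Payload(displayName: "a", attributes: ["NSImageURL": "b"]))
let json = String(decoding: data, as: UTF8.self)

let decoder = JSONDecoder()
decoder.keyDecodingStrategy = .convertFromSnakeCase
let decoded = try decoder.decode(Payload.self, from: data)
```

The displayName property is written as "display_name", while the "NSImageURL" dictionary key passes through both the encoder and the decoder unchanged and therefore roundtrips.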
Graceful fallback for allKeys
Even when using useSnakeCase for a JSONEncoder, the allKeys property on KeyedDecodingContainer could still use the convertFromSnakeCase transformation to recover the same keys as it does today when using the convertFromSnakeCase key decoding strategy. There is, however, no obvious choice for the custom case here, and I guess that in the long run it could easily cause more confusion than benefit.
Full allKeys support for simple enum backed CodingKeys
I did a small hack based on the great work by @stephencelis and @mbrandonw in their swift-case-paths library.
This hack basically allows you to generate an array of all cases of an enum without associated values. In other words, it queries the runtime to return information that is comparable to what the CaseIterable conformance gives us at compile time.
Using that hack, the allKeys implementation could iterate over all cases of your CodingKey when that CodingKey is an enum like the synthesized ones.
Having access to these cases allows you to perform the key coding transformation correctly and return a list of keys present in the KeyedDecodingContainer.
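The idea can be sketched with an explicit CaseIterable conformance standing in for the runtime hack. In this sketch ModelKeys is a hypothetical key type, and toSnakeCase is a simplified forward transformation written for this example, not Foundation's implementation.

```swift
// Hypothetical, simplified forward transformation for illustration only.
func toSnakeCase(_ key: String) -> String {
    let chars = Array(key)
    var result = ""
    for (i, c) in chars.enumerated() {
        if c.isUppercase && i > 0 {
            let previousIsLower = chars[i - 1].isLowercase
            let nextIsLower = i + 1 < chars.count && chars[i + 1].isLowercase
            if previousIsLower || nextIsLower { result.append("_") }
        }
        result.append(contentsOf: c.lowercased())
    }
    return result
}

// CaseIterable plays the role of the runtime query described above.
enum ModelKeys: String, CodingKey, CaseIterable {
    case imageURL
    case displayName
    case nickname
}

// The raw keys found in the encoded payload.
let rawKeys: Set<String> = ["image_url", "display_name"]

// Apply the forward transformation to every case and keep the ones that are present.
let presentKeys = ModelKeys.allCases.filter { rawKeys.contains(toSnakeCase($0.stringValue)) }
```

Since only the forward transformation is used, imageURL is recognized as present even though the reverse conversion of "image_url" would not produce it.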
A fallback, in case your CodingKey is not a plain enum-backed version, could be to attempt initializing the CodingKey directly from the encoded key in the KeyedDecodingContainer - or the fallback could even be to attempt the graceful fallback described in the section above.
While fun to play around with, this solution seems a bit strange, and as it mainly only fully works with plain enums, I don't consider it fit for actual use.
Adding a new protocol PreformattedCodingKey to the Swift standard library
The initial revision of this pitch text suggested adding a new protocol, PreformattedCodingKey, to the standard library in place of the static isPreformatted requirement on CodingKey that is used in this revision.
The current solution was suggested by @hisekaldma, and I agree that it is a much nicer solution than adding a new protocol without requirements to the standard library.
Acknowledgements
A huge thanks to @norio_nomura for the original PR to include useSnakeCase as a key coding strategy.
Thanks to @hisekaldma for suggesting adding a requirement to CodingKey rather than introducing a new protocol.
Many thanks to everyone providing feedback on the pre-pitch discussion.
Revision history
- Initial version
- Changed the addition of a new protocol, PreformattedCodingKey, to a static requirement on the CodingKey protocol.