RFC: Can this `Codable` bug still be fixed?


(Adrian Zubarev) #1

When Decodable and the other archiving protocols were introduced, there was a nasty oversight that persists to this day. Some nested dictionaries encode in an inconvenient flat-array format and require that same format to decode correctly (natural nested dictionaries fail to decode when the key type is a custom CodingKey).

In the following example, json_2 shows the JSON in this unexpected array format (encoding SomeType would probably produce something like it).

import Foundation

enum Key : String, CodingKey, Decodable {
  case a
}

struct Inner : Decodable {
  var test: String
}

struct SomeType<T> : Decodable where T : Decodable {

  enum CodingKey : String, Swift.CodingKey {
    case something
  }

  let storage: T

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKey.self)
    storage = try container.decode(T.self, forKey: .something)
  }
}

func decode<T>(_ value: String, _: T.Type) where T : Decodable {
  let decoder = JSONDecoder()
  guard let data = value.data(using: .utf8) else { return }
  do {
    let test = try decoder.decode(SomeType<T>.self, from: data)
    print(test)
  } catch {
    print(error)
  }
}

let json_1: String = """
{
  "something": {
    "a": {
      "0001": {
        "test": "Hello"
      },
      "0002": {
        "test": "Swift"
      }
    }
  }
}
"""

let json_2: String = """
{
  "something": [
    "a", {
      "0001": {
        "test": "Hello"
      },
      "0002": {
        "test": "Swift"
      }
    }
  ]
}
"""

// The following fails with:
//typeMismatch(
//  Swift.Array<Any>,
//  Swift.DecodingError.Context(
//    codingPath: [CodingKey(stringValue: "something", intValue: nil)],
//    debugDescription:
//      "Expected to decode Array<Any> but found a dictionary instead.",
//    underlyingError: nil
//  )
//)
decode(json_1, [Key: [String: Inner]].self)

// The following works
decode(json_1, [String: [String: Inner]].self)

// The array format decodes successfully and prints:
//SomeType(
//  storage: [
//    Key(stringValue: "a", intValue: nil): [
//      "0002": Inner(test: "Swift"),
//      "0001": Inner(test: "Hello")
//    ]
//  ]
//)
decode(json_2, [Key: [String: Inner]].self)

To work around the issue in my codebase, I had to create a wrapper type around dictionaries that encodes and decodes them manually through a keyed container.

struct DictionaryCoder<Key, Value> where Key : Hashable {
  let dictionary: [Key: Value]
  init(_ dictionary: [Key: Value]) {
    self.dictionary = dictionary
  }
}

extension DictionaryCoder : Decodable where Key: CodingKey, Value: Decodable {
  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: Key.self)
    let values = try container.allKeys.map {
      ($0, try container.decode(Value.self, forKey: $0))
    }
    dictionary = Dictionary(uniqueKeysWithValues: values)
  }
}

extension DictionaryCoder : Encodable where Key: CodingKey, Value: Encodable {
  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: Key.self)
    try dictionary.forEach {
      try container.encode($0.value, forKey: $0.key)
    }
  }
}

decode(json_1, DictionaryCoder<Key, [String: Inner]>.self)

However, I would prefer a general solution.


(Jordan Rose) #2

This was a very deliberate decision, since it's unclear what stringification rules to use for non-string types, especially for types that don't conform to LosslessStringConvertible.

This could probably be handled on an encoder-by-encoder basis as a nonStringKeyEncodingStrategy or similar, probably with LosslessStringConvertible and/or RawRepresentable-with-String-RawValue serving as the backing behavior. As noted, though, for backwards-compatibility with existing archives we'd need to keep the default behavior the same as it is today. It's also a little unfortunate that this would get replicated across JSONEncoder and PlistEncoder, but not every encoder would necessarily make the same decisions (or have the same restrictions).

cc @itaiferber


(Itai Ferber) #3

Indeed, as @jrose notes, this is neither an oversight nor a bug, but the intended behavior. See SR-9023, for instance, for a discussion about encoding [UUID : String]. As an example, let's use UUID.

UUID can choose to encode as anything that it wants (and chooses to encode as a String in the common case), but Dictionary has no knowledge of that, especially not prior to actually encoding the value. In order to encode its contents, Dictionary must choose a container, and at that, it has to choose between something keyed and something unkeyed. Keyed containers can only accept CodingKeys as keys, and CodingKeys can only be formed from Strings and Ints; you cannot form a CodingKey from a UUID without stringifying the UUID first, and Dictionary has no knowledge of UUID specifically to be able to do that.

As such, the only thing that Dictionary can fall back to doing is encoding key-value pairs into an unkeyed container.
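To make the fallback concrete, here is a minimal sketch comparing a String-keyed dictionary with one keyed by a CodingKey enum. The Color type is a made-up name for illustration, and the output assumes the key type does not opt into any key conversion of its own.

```swift
import Foundation

// Hypothetical key type for illustration: a CodingKey enum backed by String.
enum Color: String, CodingKey, Codable {
  case red
}

let encoder = JSONEncoder()

// String keys: Dictionary picks a keyed container, so the result is a JSON object.
let stringKeyed = try encoder.encode(["red": 1])
print(String(decoding: stringKeyed, as: UTF8.self)) // {"red":1}

// Any other key type: Dictionary picks an unkeyed container and writes
// key, value, key, value, ... so the result is a flat JSON array.
let enumKeyed = try encoder.encode([Color.red: 1])
print(String(decoding: enumKeyed, as: UTF8.self)) // ["red",1]
```

Even though Color is itself a CodingKey, Dictionary treats it like any other non-String, non-Int key type.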

This is true for all types which are not Int or String as far as Dictionary is concerned — if it can't go into a CodingKey, it can't be used as a native dictionary key.


Note that SR-7788 tracks the implicit conversion of things which are RawRepresentable as String or Int (this is currently not being done), and you can imagine something similar for things which are LosslessStringConvertible. However, we would have to be very careful about backwards compatibility here, as something as innocuous as SR-1858 ("UUID should conform to RawRepresentable") would suddenly break backwards compatibility for everyone.

So, as usual, in cases of ambiguity like this, the right solution is to indicate exactly what you want by writing the code: if you expect the UUID to be encoded as a String, you will have to perform that conversion.
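As a rough sketch of performing that conversion by hand: map the UUID keys to strings before encoding, and map them back after decoding. The InvalidKey error type and the variable names are made up for this example.

```swift
import Foundation

let id = UUID(uuidString: "E621E1F8-C36C-495A-93FC-0C247A3E6E5F")!
let names: [UUID: String] = [id: "Swift"]

// Encode: convert the keys to String explicitly, so Dictionary can use a keyed container.
let stringKeyed = Dictionary(uniqueKeysWithValues: names.map { ($0.key.uuidString, $0.value) })
let data = try JSONEncoder().encode(stringKeyed)
print(String(decoding: data, as: UTF8.self))
// {"E621E1F8-C36C-495A-93FC-0C247A3E6E5F":"Swift"}

// Hypothetical error for keys that are not valid UUID strings.
struct InvalidKey: Error { let key: String }

// Decode: read [String: String], then convert the keys back, rejecting malformed ones.
let decoded = try JSONDecoder().decode([String: String].self, from: data)
let restored = try Dictionary(uniqueKeysWithValues: decoded.map { pair -> (UUID, String) in
  guard let uuid = UUID(uuidString: pair.key) else { throw InvalidKey(key: pair.key) }
  return (uuid, pair.value)
})
```

The conversion rule (here uuidString) lives in your code, so the archive format stays under your control.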

One thing we can do to at least make this easier, for instance, is offer an adaptor type which takes a strategy for converting from arbitrary types to CodingKeys for use as dictionary keys, and you can use that rather than having to write it yourself.
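One possible shape for such an adaptor, purely as a sketch: the strategy is expressed as a protocol with two static conversion functions, and the adaptor encodes through a keyed container. None of these names (DictionaryKeyStrategy, KeyedDictionary, UUIDStringKeys) exist in the standard library or Foundation; they are assumptions for illustration.

```swift
import Foundation

// Hypothetical strategy: how to turn a dictionary key into a String and back.
protocol DictionaryKeyStrategy {
  associatedtype Key: Hashable
  static func string(from key: Key) -> String
  static func key(from string: String) -> Key?
}

// Hypothetical adaptor: encodes its dictionary through a keyed container,
// using the strategy to convert keys to and from CodingKeys.
struct KeyedDictionary<Strategy: DictionaryKeyStrategy, Value: Codable>: Codable {
  var dictionary: [Strategy.Key: Value]

  init(_ dictionary: [Strategy.Key: Value]) { self.dictionary = dictionary }

  // A CodingKey that can hold any string.
  private struct AnyKey: CodingKey {
    var stringValue: String
    var intValue: Int? { nil }
    init(_ string: String) { stringValue = string }
    init?(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
  }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: AnyKey.self)
    var result: [Strategy.Key: Value] = [:]
    for codingKey in container.allKeys {
      // Reject keys the strategy cannot convert back.
      guard let key = Strategy.key(from: codingKey.stringValue) else {
        throw DecodingError.dataCorruptedError(
          forKey: codingKey, in: container,
          debugDescription: "Cannot convert key \(codingKey.stringValue)")
      }
      result[key] = try container.decode(Value.self, forKey: codingKey)
    }
    dictionary = result
  }

  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: AnyKey.self)
    for (key, value) in dictionary {
      try container.encode(value, forKey: AnyKey(Strategy.string(from: key)))
    }
  }
}

// Example strategy: UUID keys stored as their uuidString.
enum UUIDStringKeys: DictionaryKeyStrategy {
  static func string(from key: UUID) -> String { key.uuidString }
  static func key(from string: String) -> UUID? { UUID(uuidString: string) }
}

let id = UUID(uuidString: "E621E1F8-C36C-495A-93FC-0C247A3E6E5F")!
let wrapped = KeyedDictionary<UUIDStringKeys, String>([id: "Swift"])
let data = try JSONEncoder().encode(wrapped)
print(String(decoding: data, as: UTF8.self))
// {"E621E1F8-C36C-495A-93FC-0C247A3E6E5F":"Swift"}

let roundTripped = try JSONDecoder().decode(
  KeyedDictionary<UUIDStringKeys, String>.self, from: data)
```

A static strategy type is used here because a Decodable initializer cannot receive a closure; other designs (for example, passing the strategy through userInfo) would work as well.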


Note also that Encoders and Decoders may choose to intercept specific types and encode them in a different format (e.g. JSONEncoder/JSONDecoder encode URLs as strings rather than dictionaries); Dictionary can't make any static assumptions about arbitrary types, since the specific Encoder/Decoder might have a different representation in mind.
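A small sketch of both effects at once, using a dictionary keyed by URL: Dictionary falls back to the array format, while JSONEncoder still writes each URL key as a plain string. The variable names are made up for the example.

```swift
import Foundation

let url = URL(string: "https://swift.org")!
let data = try JSONEncoder().encode([url: "home"])

// Dictionary saw a non-String key, so the top level is a JSON array ...
let json = try JSONSerialization.jsonObject(with: data)
let array = json as? [Any]

// ... and JSONEncoder intercepted the URL, so the key element is a string,
// not a nested dictionary.
let firstElement = array?.first as? String
print(firstElement ?? "not a string")
```

A different encoder is free to represent URL another way, which is why Dictionary cannot decide up front that a URL key is safe to use in a keyed container.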