Encoding int-keyed dictionaries

I'm creating a custom encoder backed by a custom encoding container, and want to distinguish the case of a string-keyed dictionary vs. an int-keyed dictionary:

try! ["123" : 0].encode(to: MyEncoder()) // does one thing
try! [123 : 0].encode(to: MyEncoder())   // does another thing

To see whether the key is an integer or a string, my container uses a simple check on whether intValue is nil or non-nil:

struct MyKeyedEncodingContainer<Key: CodingKey>: KeyedEncodingContainerProtocol {
    mutating func encode<T: Encodable>(_ value: T, forKey key: Key) throws {
        if let intKey = key.intValue {
            // make an int-key dictionary
        } else {
            // make a string-key dictionary
        }
        ...
    }
}

However, this doesn't work reliably, because a key created from a string like "123" returns a non-nil intValue. The following minimal test demonstrates this:

struct MyEncoder: Encoder {
    var codingPath: [CodingKey] { fatalError() }
    var userInfo: [CodingUserInfoKey : Any] { fatalError() }
    
    func container<Key: CodingKey>(keyedBy keyType: Key.Type) -> KeyedEncodingContainer<Key> {
        print("keyType", keyType) // _DictionaryCodingKey
        print("abc key", keyType.init(stringValue: "abc")!.intValue) // nil – expected
        print("int key", keyType.init(stringValue: "123")!.intValue) // 123 - unexpected
        fatalError()
    }
    func unkeyedContainer() -> UnkeyedEncodingContainer { fatalError() }
    func singleValueContainer() -> SingleValueEncodingContainer { fatalError() }
}

It feels like a bug (or at least a misfeature) that keyType.init(stringValue: string)!.intValue could return a non-nil result depending on the string.


However, short of fixing that misfeature, is there some other check I could use to tell int keys from string keys?

Wait, the framework treats "123" as an integer? What if I really need a string key with the characters "123"? It’s the YAML Norway problem all over again.

…but at least it depends on the type of CodingKey, right?

intValue is an API available on NSString, no? It's probably getting bridged to String here.

It would probably be better to attempt to decode as one type first, then the other: e.g. call container.decode(Int.self), catching any DecodingError.typeMismatch in order to fall back to decoding a String.
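
A minimal standalone sketch of that fallback idea on the decoding side (my own example, not code from the thread): try Int first, and fall back to String when the decoder reports a type mismatch.

```swift
import Foundation

// Sketch of the decode-one-type-then-the-other idea: try Int first,
// and fall back to String on a DecodingError.typeMismatch.
enum IntOrString: Decodable, Equatable {
    case int(Int)
    case string(String)

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        do {
            self = .int(try container.decode(Int.self))
        } catch DecodingError.typeMismatch {
            self = .string(try container.decode(String.self))
        }
    }
}

let data = Data(#"[123, "123", "abc"]"#.utf8)
let values = try JSONDecoder().decode([IntOrString].self, from: data)
assert(values == [.int(123), .string("123"), .string("abc")])
```

Note that, unlike the intValue check, this keeps the JSON string "123" as a string, because the distinction is driven by the serialized type rather than by the characters of the key.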

A few things here:

  1. In the general case, CodingKeys are not guaranteed to preserve how they were created (and for maximum flexibility, ideally would always offer both String and Int values) — so inspecting the presence of intValue isn't guaranteed to help you make any form of distinction

  2. _DictionaryCodingKey specifically will offer an intValue if the key was created with a numeric string, to maximize compatibility for round-tripping: if you have [Int: Foo] and your target data format doesn't support Int keys, the encoder can use the keys' stringValue to encode [String: Foo]; then at decode time, you can request [Int: Foo], the decoder can read the string keys, and offer you an [Int: Foo] back out

    • It's necessary to expose this via CodingKeys so that you can request a KeyedDecodingContainer's allKeys property and get something meaningful back out
    • Because keys have no state for encoding vs. decoding, yes, it is then possible to create a key for encoding that has an intValue, and it would be up to the Encoder to choose appropriately between intValue and stringValue as needed
  3. Either way, checking CodingKey values for stringValue vs. intValue is already "too far down": you need to check for encode([String: Foo], forKey: ...) vs. encode([Int: Foo], forKey: ...) one level up, since at encode(Foo, forKey: ...) time you don't know that you're actually in the context of encoding a value that's part of a dictionary

  4. This isn't what's coming into play here: CodingKey has an intValue property, and the concern is that the coding key type that Dictionary uses allows String keys to be coerced into Int keys, making the distinction between them difficult at such a low layer
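
For illustration of the round-tripping described in point 2, using Foundation's JSONEncoder/JSONDecoder, which behave exactly this way for Int-keyed dictionaries:

```swift
import Foundation

// [Int: Foo] round-tripped through a format with no integer keys:
// the encoder writes the Int keys via their stringValue, and the
// decoder converts them back via the keys' intValue.
let original = [1: "one", 2: "two"]
let data = try JSONEncoder().encode(original)
// The encoded form uses string keys, e.g. {"1":"one","2":"two"}

let decoded = try JSONDecoder().decode([Int: String].self, from: data)
assert(decoded == original) // Int keys recovered on decode
```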

Thank you for the clarification and correction, Itai. I understand the issue now.

Question: Would it be the case that all keys in a dictionary are String, or all are Int? Or would your solution need to handle keys of mixed types in the same dictionary?

Thanks for your answer!

However, I do not quite understand how to fix it!

I am converting arbitrary encodable values into the closely matching dictionary/array counterparts. Given this input:

struct S: Codable {
    let intValues: [Int: Int]
    let stringValues: [String: Int]
}
struct Test: Codable {
    var items: [[String : S]]
}

let value = Test(items: [["item" : S(intValues: [1 : 1, 2: 2], stringValues: ["1" : 1, "2": 2])]])

I am getting this output:

stringDictionary(
    [
        "items": TheValue.array(
            [
                TheValue.stringDictionary(
                    [
                        "item": TheValue.stringDictionary(
                            [
                                "intValues": TheValue.stringDictionary(
                                    ["1": TheValue.int(1), "2": TheValue.int(2)]
                                ),
                                "stringValues": TheValue.stringDictionary(
                                    ["1": TheValue.int(1), "2": TheValue.int(2)]
                                )
                            ]
                        )
                    ]
                )
            ]
        )
    ]
)

Whilst obviously it should be this:

                                "intValues": TheValue.intDictionary(
                                    [1: TheValue.int(1), 2: TheValue.int(2)]
                                ),
                                "stringValues": TheValue.stringDictionary(
                                    ["1": TheValue.int(1), "2": TheValue.int(2)]
                                ),

How do I do this properly?

Attaching the whole test app
import Foundation
import Combine

enum TheValue {
    case null
    case bool(Bool)
    case string(String)
    case int(Int)
    case stringDictionary([String: TheValue])
    case intDictionary([Int: TheValue])
    case array([TheValue])
    
    var stringDictionary: [String: TheValue]? {
        switch self {
            case .stringDictionary(let value): value
            default: nil
        }
    }
    var intDictionary: [Int: TheValue]? {
        switch self {
            case .intDictionary(let value): value
            default: nil
        }
    }
    var array: [TheValue] {
        switch self {
            case .array(let value): value
            default: fatalError()
        }
    }
}

class Enc: Encoder {
    var value: TheValue = .null
    var codingPath: [CodingKey]
    var userInfo: [CodingUserInfoKey : Any] { fatalError() }
    
    init(codingPath: [CodingKey] = []) {
        self.codingPath = codingPath
    }
    func container<Key: CodingKey>(keyedBy keyType: Key.Type) -> KeyedEncodingContainer<Key> {
        value = .stringDictionary([:])
        let container = MyKeyedEncodingContainer<Key>(codingPath: codingPath, encoder: self)
        return KeyedEncodingContainer(container)
    }
    func unkeyedContainer() -> UnkeyedEncodingContainer {
        value = .array([])
        return MyUnkeyedEncodingContainer(encoder: self)
    }
    func singleValueContainer() -> SingleValueEncodingContainer {
        value = .null
        return MySingleValueEncodingContainer(encoder: self)
    }
}

struct MyKeyedEncodingContainer<Key: CodingKey>: KeyedEncodingContainerProtocol {
    let codingPath: [CodingKey]
    let encoder: Enc

    mutating func encodeNil(forKey key: Key) throws { fatalError() }
    mutating func encode<T: Encodable>(_ value: T, forKey key: Key) throws {
        var dictionary = encoder.value.stringDictionary ?? [:]
        encoder.value = .null
        try value.encode(to: encoder)
        dictionary[key.stringValue] = encoder.value
        encoder.value = .stringDictionary(dictionary)
    }
    mutating func nestedContainer<NestedKey: CodingKey>(keyedBy keyType: NestedKey.Type, forKey key: Key) -> KeyedEncodingContainer<NestedKey> { fatalError() }
    mutating func nestedUnkeyedContainer(forKey key: Key) -> UnkeyedEncodingContainer { fatalError() }
    mutating func superEncoder() -> Encoder { fatalError() }
    mutating func superEncoder(forKey key: Key) -> Encoder { fatalError() }
}

struct MyUnkeyedEncodingContainer: UnkeyedEncodingContainer {
    let encoder: Enc
    var codingPath: [any CodingKey] { fatalError() }
    var count: Int { fatalError() }
    mutating func encodeNil() throws { fatalError() }
    mutating func encode<T: Encodable>(_ value: T) throws {
        var array = encoder.value.array
        encoder.value = .null
        try value.encode(to: encoder)
        array.append(encoder.value)
        encoder.value = .array(array)
    }
    mutating func nestedContainer<NestedKey>(keyedBy keyType: NestedKey.Type) -> KeyedEncodingContainer<NestedKey> where NestedKey : CodingKey { fatalError() }
    mutating func nestedUnkeyedContainer() -> any UnkeyedEncodingContainer { fatalError() }
    mutating func superEncoder() -> any Encoder { fatalError() }
}

struct MySingleValueEncodingContainer: SingleValueEncodingContainer {
    let encoder: Enc
    var codingPath: [CodingKey] { fatalError() }
    
    mutating func encodeNil() throws { encoder.value = .null }
    mutating func encode(_ value: Bool) throws { encoder.value = .bool(value) }
    mutating func encode(_ value: String) throws { encoder.value = .string(value) }
    mutating func encode(_ value: Int) throws { encoder.value = .int(value) }
    mutating func encode<T>(_ value: T) throws where T : Encodable {
        try value.encode(to: encoder)
    }
}

class MyTopLevelEncoder: TopLevelEncoder {
    func encode<T: Encodable>(_ value: T) throws -> TheValue {
        print(type(of: value))
        let enc = Enc()
        try value.encode(to: enc)
        return enc.value
    }
}

struct S: Codable {
    let intValues: [Int: Int]
    let stringValues: [String: Int]
}
struct Test: Codable {
    var items: [[String : S]]
}

let value = Test(items: [["item" : S(intValues: [1 : 1, 2: 2], stringValues: ["1" : 1, "2": 2])]])
let e = MyTopLevelEncoder()
let res = try! e.encode(value)
print(res)

Edit: the fatalError() calls above are for the paths that are not triggered in this minimal example (an approach I call "fatalError-driven development").

To put my question slightly differently: suppose I want to implement something like JSON, but allowing integer dictionary keys. I am not after the text form specifically, but it makes a great analogy. How would I implement that? The desired output for the above input example, in terms of this hypothetical JSON², would be:

{
	"items": [
		{
			"item" : {
				"intValues": {1: 1, 2: 2},
				"stringValues": {"1" : 1, "2": 2} // these stay being strings!
			}
		}
	]
}

Briefly: the entry-point for customizing the encoding behavior of a specific type is in your implementations of

  • TopLevelEncoder.encode<T>(_:)
  • SingleValueEncodingContainer.encode<T>(_:),
  • UnkeyedEncodingContainer.encode<T>(_:), and
  • KeyedEncodingContainer.encode<T>(_:forKey:)

These are the calls in which someone has handed you a value and requested that you encode it, and it's your opportunity to inspect the value and decide what to do with it. If you call value.encode(to:), you are then deferring to the type itself to request a specific encoded representation — but you don't have to.

For example, when S.encode(to:) calls through to MyKeyedEncodingContainer.encode(intValues, forKey: .intValues), you could inspect the type of value and create an .intDictionary instead of a .stringDictionary. On the marked line here, you're calling Dictionary<Int, Int>.encode(to:) directly, instead of handling the Int keys yourself:

struct MyKeyedEncodingContainer<Key: CodingKey>: KeyedEncodingContainerProtocol {
    mutating func encode(_ value: some Encodable, forKey key: Key) throws {
        var dictionary = encoder.value.stringDictionary ?? [:]
        encoder.value = .null
/* --> */ try value.encode(to: encoder) /* <-- */
        dictionary[key.stringValue] = encoder.value
        encoder.value = .stringDictionary(dictionary)
    }
}

How this could look instead:

struct MyKeyedEncodingContainer<Key: CodingKey>: KeyedEncodingContainerProtocol {
    mutating func encode(_ value: some Encodable, forKey key: Key) throws {
        if let intKeyedDictionary = value as? [Int: any Encodable] {
            var encoded = [Int: TheValue]()
            for (k, v) in intKeyedDictionary {
                encoder.value = .null
                try v.encode(to: encoder)
                encoded[k] = encoder.value
            }

            encoder.value = .intDictionary(encoded)
        } else {
            // current impl
        }
    }
}

// Repeat for `MyUnkeyedEncodingContainer`, `MySingleValueEncodingContainer`

The above code still has a (hint: recursive) problem — see if you can spot it.

Answer

This code is still calling try v.encode(to: encoder) directly, so if you were to try to encode [Int: [Int: String]], the inner [Int: String] values would be encoded incorrectly — this would call Dictionary<Int, String>.encode(to:), which you don't want.

You need to make sure that all calls that could encode a value have a chance to check the type of the value and intercept it. One way to do this:

for (k, v) in intKeyedDictionary {
    encoder.value = .null
    try MySingleValueEncodingContainer(encoder: encoder).encode(v)
    encoded[k] = encoder.value
}

Since MySingleValueEncodingContainer.encode<T>(_:) should also handle [Int: ...] specially, you would correctly intercept recursive values this way.

Thank you, will try that approach.

    func encode(_ value: some Encodable, forKey key: Key) {
        if let intKeyedDictionary = value as? [Int: any Encodable] {

Just to confirm: would this line take O(n) in both time and space?
Or is it O(1)? E.g., we know all keys have the same type, so we check the first one, it's Int, and there's no need to check the rest; and since the value was some Encodable, there's no need to convert it to an any Encodable box.

I kept it short for simplicity, but I think that, as written, this will construct a wholly new dictionary: the key-type check should be "free", but I believe this will still box up the values.
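
A quick standalone check of that cast (my own sketch, with a hypothetical helper name): the conditional cast to [Int: any Encodable] succeeds for a concrete [Int: String], but it produces a new dictionary whose values are boxed existentials, which is the O(n) cost being discussed.

```swift
// Hypothetical helper: does this Encodable value have Int keys?
// The `as?` cast succeeds for any dictionary with Int keys and
// Encodable values, producing a new [Int: any Encodable] with
// boxed values as a side effect.
func isIntKeyedDictionary(_ value: some Encodable) -> Bool {
    (value as? [Int: any Encodable]) != nil
}

assert(isIntKeyedDictionary([1: "one", 2: "two"])) // [Int: String]
assert(!isIntKeyedDictionary(["1": 1]))            // [String: Int]
```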

When you get to the stage of optimizing this, it should in practice be better to use a protocol with a concrete conformance and offload to that:

private protocol MyIntKeyedDictionary {
    func encode(to encoder: Enc) throws
}

extension Dictionary: MyIntKeyedDictionary where Key == Int, Value: Encodable {
    func encode(to encoder: Enc) throws {
        var encoded = [Int: TheValue]()
        for (k, v) in self {
            encoder.value = .null
            try MySingleValueEncodingContainer(encoder: encoder).encode(v)
            encoded[k] = encoder.value
        }

        encoder.value = .intDictionary(encoded)
    }
}

struct MyKeyedEncodingContainer<Key: CodingKey>: KeyedEncodingContainerProtocol {
    mutating func encode(_ value: some Encodable, forKey key: Key) throws {
        if let intKeyedDictionary = value as? MyIntKeyedDictionary {
            try intKeyedDictionary.encode(to: encoder)
        } else {
            // current impl
        }
    }
}

// etc.

Great, will try that!


I have a tangentially related question. Not that this is frequently encountered, but it still looks somewhat odd that dictionaries with integer-but-not-Int keys don't use a keyed representation. What harm would there be if they did?

let v: [Int32: String] = [1 : "one", 2 : "two", 3: "three"]
let s = String(data: try! JSONEncoder().encode(v), encoding: .utf8)!
print(s)

outputs:

[1,"one",2,"two",3,"three"]

In other words, why don't integer types conform to CodingKeyRepresentable, like this:

extension Int32: @retroactive CodingKeyRepresentable {
    public init?(codingKey: some CodingKey) {
        guard let intValue = codingKey.intValue else { return nil }
        self = Int32(intValue)
    }
    public var codingKey: CodingKey {
        IntCodingKey(intValue: Int(self))!
    }
}

where:

struct IntCodingKey: CodingKey {
    var stringValue: String
    var intValue: Int?
    
    init(stringValue: String) {
        self.stringValue = stringValue
        self.intValue = Int(stringValue)
    }
    init?(intValue: Int) {
        self.intValue = intValue
        self.stringValue = "\(intValue)"
    }
}

which would make more standard-looking dictionary containers out of the box:

{"2":"two","3":"three","1":"one"}

The primary risk is a backwards-incompatible behavior change for these types. When CodingKeyRepresentable was introduced, String- and Int-keyed dictionaries already behaved as they do now, but other integer-keyed dictionaries did not. Given their very niche usage, it's not clear the benefit would be worth that risk.