Dictionary's encoding strategy

cherrywoods · April 18, 2018, 8:23am

I don't quite know where to put my question; it's not really a usage question, but a question about the reasoning behind an implementation.

Dictionaries encode as keyed containers only if their keys are Strings or Ints. As soon as the key type is something like Int8 or UInt, they encode as unkeyed containers.
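The behavior described above can be observed with Foundation's JSONEncoder. This is only a minimal sketch: the sample dictionaries are made up for illustration, and each holds a single entry so the output does not depend on dictionary ordering.

```swift
import Foundation

let encoder = JSONEncoder()

// String and Int keys go through a keyed container, so JSON sees an object.
let byString = try! encoder.encode(["a": 1])
let byInt = try! encoder.encode([1: "one"])

// Int8 keys fall back to an unkeyed container: a flat array of
// alternating keys and values.
let byInt8 = try! encoder.encode([Int8(1): "one"])

let stringKeyed = String(data: byString, encoding: .utf8)!
let intKeyed = String(data: byInt, encoding: .utf8)!
let int8Keyed = String(data: byInt8, encoding: .utf8)!

print(stringKeyed) // {"a":1}
print(intKeyed)    // {"1":"one"}
print(int8Keyed)   // [1,"one"]
```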

Why just String and Int? I think all types conforming to LosslessStringConvertible could be transformed into "lossless, unique" string keys, as its documentation states. Or am I getting the LosslessStringConvertible protocol wrong?

If not, the implementation should be pretty simple:

Key.self is LosslessStringConvertible.Type does the check.
(key as! LosslessStringConvertible).description returns an identifying string that can be used as the key.
(Key.self as! LosslessStringConvertible.Type).init(codingKey.stringValue) converts from coding keys back to dictionary keys. If the initializer returns nil, it's a type mismatch.
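Put together, the three steps might look roughly like the sketch below. The helper names stringKey(for:) and dictionaryKey(_:from:) are hypothetical and not part of the standard library, and conditional casts (as?) are used in place of the forced casts so that a non-conforming type yields nil instead of a crash.

```swift
// Hypothetical helpers illustrating the proposal; not standard library API.

/// Returns a string key for `key` if its type is LosslessStringConvertible.
func stringKey<Key>(for key: Key) -> String? {
    guard let convertible = key as? LosslessStringConvertible else { return nil }
    return convertible.description
}

/// Rebuilds a dictionary key of type `Key` from a coding key's string value.
/// Returns nil if `Key` is not LosslessStringConvertible or if parsing fails.
func dictionaryKey<Key>(_ type: Key.Type, from stringValue: String) -> Key? {
    guard let convertibleType = type as? LosslessStringConvertible.Type else { return nil }
    return convertibleType.init(stringValue) as? Key
}

let encodedKey = stringKey(for: Int8(42))                 // "42"
let decodedKey = dictionaryKey(Int8.self, from: "42")     // 42
let outOfRange = dictionaryKey(Int8.self, from: "300")    // nil: 300 does not fit in Int8
```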

Is this maybe for compatibility with older versions, where these casts didn't work? I can't imagine another good reason. I would be happy to hear about other reasons, if they exist.

itaiferber · April 18, 2018, 4:33pm

There are two answers to your questions here:

There are corner cases here around integer coding keys which don't always have good answers. The biggest issue to deal with is conversion between integer sizes:

It's always safe to promote an integer type smaller than Int to an Int, so it would be possible to encode those values as CodingKeys. On decode, though, you could get a CodingKey whose integer value does not fit in, say, an Int8. You could consider this to be a type mismatch, potentially, but it's one thing for an integer value to not map to a real CodingKey, and another for the integer value to be completely out of bounds. Given the API contract of CodingKeys, it's reasonable to assume that they should be able to use the entire range of Int values; it's a different matter to expect to decode an Int8 value from an unkeyed container and get back something that would require an Int64 or similar.
A bigger problem is what happens for types like UInt: if a dictionary has UInt keys which all happen to fit in an Int, should it be encoded as a keyed container? And if one of the keys does not fit in an Int? There is an inconsistency in formats here based on the actual values of the keys, and not the types themselves.
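Both range problems can be seen with the standard library's failable init(exactly:) conversions. The values below are arbitrary examples chosen for illustration.

```swift
// Promoting a smaller integer to Int never fails.
let small: Int8 = 100
let promoted = Int(small)

// Going back can fail: 300 is a valid Int but does not fit in an Int8.
let wide: Int = 300
let narrowed = Int8(exactly: wide)

// For UInt, whether a key is representable as an Int depends on the
// value, not on the type.
let fits = Int(exactly: UInt(42))
let overflow = Int(exactly: UInt.max)

print(promoted)           // 100
print(narrowed as Any)    // nil
print(fits as Any)        // Optional(42)
print(overflow as Any)    // nil
```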

It is considerably simpler to define the behavior here around the CodingKeys contract itself: all keys have a String value, and potentially an Int value. Those types map losslessly into CodingKeys; otherwise, we do the safe thing of not making assumptions.
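For reference, the CodingKey contract mentioned here requires a stringValue and an optional intValue, plus a failable initializer for each. A minimal conforming type could look like the following; AnyKey is a hypothetical name used only for this sketch, and deriving intValue by parsing the string is an illustrative choice.

```swift
// Hypothetical key type showing the shape of the CodingKey contract.
struct AnyKey: CodingKey {
    let stringValue: String   // every key has a string value
    let intValue: Int?        // and may additionally have an Int value

    init?(stringValue: String) {
        self.stringValue = stringValue
        self.intValue = Int(stringValue)   // nil unless the string parses as an Int
    }

    init?(intValue: Int) {
        self.stringValue = String(intValue)
        self.intValue = intValue
    }
}

let nameKey = AnyKey(stringValue: "name")!   // intValue is nil
let sevenKey = AnyKey(intValue: 7)!          // stringValue is "7"
```

Any String or Int round-trips through such a key without loss, which is the property the two supported dictionary key types rely on.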

The casts would work, certainly, but the question is one of semantics. Just because a type conforms to LosslessStringConvertible, does it mean that it's safe/reasonable to convert it into a string key and back?

There are types for which the string representation might be significantly larger or more difficult to represent than the underlying Codable representation.
Consider also the interaction with the types above: Int8 is LosslessStringConvertible, which means that in a format that differentiates between Int and String keys, encoding [1: "one", 2: "two"] as [Int: String] would encode as [1: "one", 2: "two"], while encoding the same value as [Int8: String] would encode it as ["1": "one", "2": "two"]; the differences here might be non-obvious if you don't know the original types.

From a semantic standpoint, though, it's not necessarily clear what the intent is behind conforming a type to LosslessStringConvertible; just because a type is indeed convertible doesn't necessarily mean we should prioritize that format over its own Codable format.

None of these are insurmountable differences, by the way; it's possible to come up with semantics that make sense for all of these cases. It is, however, significantly simpler to express that "String- and Int-keyed dictionaries use CodingKeys; everything else uses the unkeyed format".