Let’s see… I need to page some of this back into memory.
Re _DictionaryCodingKey: it has existed in its current form since it was first introduced.
Why does _DictionaryCodingKey.init(stringValue:) attempt to parse the string value into an Int?
This is primarily for the benefit of decoding. Some formats (like JSON) only support String keys for dictionaries, which means that on encode, we have to convert Int keys to Strings. On decode, however, we have to try to go the other way:
- We create a KeyedDecodingContainer keyed by a key type whose init(stringValue:) attempts to parse each String key as an Int
- We access allKeys on that container, and if any key couldn't be parsed as an Int, we throw an error (a rough sketch of this follows the list)
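To make that concrete, here's a rough sketch of that decode path. IntParsingKey and decodeIntKeyedDictionary are illustrative names standing in for the internal _DictionaryCodingKey and Dictionary's actual init(from:); this is not the real implementation, just the shape of it:

```swift
// Illustrative stand-in for _DictionaryCodingKey (the real type is internal
// to the standard library).
struct IntParsingKey: CodingKey {
    let stringValue: String
    let intValue: Int?

    init?(stringValue: String) {
        self.stringValue = stringValue
        self.intValue = Int(stringValue) // attempt the String → Int parse up front
    }

    init?(intValue: Int) {
        self.stringValue = String(intValue)
        self.intValue = intValue
    }
}

// Rough shape of the decode path for an Int-keyed dictionary.
func decodeIntKeyedDictionary<V: Decodable>(
    of valueType: V.Type,
    from decoder: Decoder
) throws -> [Int: V] {
    let container = try decoder.container(keyedBy: IntParsingKey.self)
    var result = [Int: V]()
    for key in container.allKeys {
        // If a key in the payload couldn't be parsed as an Int, then [Int: V]
        // isn't a faithful representation of the data, so we throw.
        guard let intKey = key.intValue else {
            throw DecodingError.typeMismatch(
                Int.self,
                DecodingError.Context(
                    codingPath: container.codingPath,
                    debugDescription: "Expected Int key but found '\(key.stringValue)'."
                )
            )
        }
        result[intKey] = try container.decode(V.self, forKey: key)
    }
    return result
}
```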
This does mean that there is a performance hit on encode, yes, and it's absolutely worth considering whether splitting the encoding behavior from the decoding behavior would improve performance. Note that it is possible (though I think very unlikely) that something could rely on this parsing on encode:
- Ostensibly, you can imagine an Encoder/Decoder which would prefer to use Int keys where possible for efficiency (or as a requirement of the format, the opposite of something like JSON, which requires String keys), and which, given a String-keyed dictionary whose keys could all be parsed as Ints, would prefer to convert them… But at this point, I think this is a pretty big stretch; I haven't seen any Int-preferring encoders or formats in the wild that would benefit from this
- Potentially more usefully, you can imagine a migration scenario in which existing code that encoded Int-keyed dictionaries has now been expanded to encode String-keyed dictionaries instead. This allows the old Int-keyed values to still be accessible as if they were originally String-keyed, and vice versa: introducing new data into an old version of an app (demonstrated in the snippet after this list). But, this is a bit of a strange use-case
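For what it's worth, that round-tripping is observable with the Foundation JSON coders today. This is just a demonstration of the current behavior, not a guarantee the API makes:

```swift
import Foundation

// Old model: data encoded from an Int-keyed dictionary. In JSON, the Int
// keys necessarily become Strings in the resulting object.
let oldData = try JSONEncoder().encode([1: "one", 2: "two"])

// New model: the same payload can be read back as String-keyed…
let asStringKeyed = try JSONDecoder().decode([String: String].self, from: oldData)

// …and the reverse also holds: String keys that happen to be numeric can
// still be read by Int-keyed code, thanks to the String → Int parse.
let newData = try JSONEncoder().encode(["3": "three"])
let asIntKeyed = try JSONDecoder().decode([Int: String].self, from: newData)
```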
Should _DictionaryCodingKey.init(intValue:) store the Int key as a String?
As alluded to above, I would say that many (if not most) formats do not support Int values as keys (e.g. JSON). Conversion to a String has to happen somewhere, and at the moment, it's simplest to store the resulting value. Could we benefit from recalculating the value every time (as opposed to converting once and storing)? Potentially, but I think it might be difficult to measure.
The conversion has to happen at least once for every key on encode and decode, so doing it lazily doesn't inherently offer a benefit beyond storage. To see a real benefit to storing the resulting value as we do now, you’d need to access the key’s stringValue at least twice, and it’s hard to tell how often that might happen. It at least depends on how the Encoder/Decoder touches the keys, and how often user code accesses them either via allKeys or codingPath.
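If someone did want to measure that, the lazy alternative would look something like this hypothetical LazyDictionaryCodingKey, which stores only the Int in init(intValue:) and converts on access:

```swift
// Hypothetical lazy counterpart to the eager key sketched earlier: store the
// Int and synthesize stringValue on each access instead of in init(intValue:).
struct LazyDictionaryCodingKey: CodingKey {
    let intValue: Int?
    private let storedString: String?

    // Every access pays for a fresh String conversion (and allocation) when
    // the key was created from an Int; intValue is always non-nil in that
    // case, so the force-unwrap is safe. This only helps if stringValue ends
    // up being read zero or one times per key; with two or more accesses,
    // the eager, stored conversion wins.
    var stringValue: String {
        storedString ?? String(intValue!)
    }

    init?(stringValue: String) {
        self.storedString = stringValue
        self.intValue = Int(stringValue)
    }

    init?(intValue: Int) {
        self.storedString = nil
        self.intValue = intValue
    }
}
```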
So does that mean we should convert to an enum? Also hard to say: the main benefit of the enum would be making a _DictionaryCodingKey small enough to fit in an existential box, and to hit that case, you'd need to want to use such a key in the first place. But, an EitherCodingKey type (i.e. either a String or an Int) is less generally helpful as public API than, say, an AnyCodingKey which takes (String, Int?) directly, in which case you don't benefit from the conversion to an enum anyway. We could expose both types of keys and let users choose, but it feels like muddying the waters a bit…
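To illustrate that trade-off, here's roughly what the two shapes would look like. Neither type exists today; both names and their exact shapes are hypothetical:

```swift
// Enum form: either a String key or an Int key, never both. The value can be
// small enough to fit in an existential's inline buffer (e.g. when stored in
// a [CodingKey] codingPath), but an Int-backed key has to synthesize its
// stringValue on every access.
enum EitherCodingKey: CodingKey {
    case string(String)
    case int(Int)

    var stringValue: String {
        switch self {
        case .string(let s): return s
        case .int(let i): return String(i)
        }
    }

    var intValue: Int? {
        switch self {
        case .string(let s): return Int(s) // parse on demand, mirroring _DictionaryCodingKey
        case .int(let i): return i
        }
    }

    init?(stringValue: String) { self = .string(stringValue) }
    init?(intValue: Int) { self = .int(intValue) }
}

// Struct form: stores (String, Int?) directly. More generally useful as a
// public "any key" type, but it gains nothing from the enum's compact layout.
struct AnyCodingKey: CodingKey {
    let stringValue: String
    let intValue: Int?

    init(stringValue: String, intValue: Int?) {
        self.stringValue = stringValue
        self.intValue = intValue
    }

    init?(stringValue: String) { self.init(stringValue: stringValue, intValue: nil) }
    init?(intValue: Int) { self.init(stringValue: String(intValue), intValue: intValue) }
}
```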
In all, my gut feeling is that the performance impact is likely negligible in the vast majority of use-cases, and I think we might be hard-pressed to see real benefits either way.
I think that regarding both questions, it would be interesting to get some real-world performance measurements to see how big an impact either change makes (potentially wasteful String → Int conversion on encode; unnecessary memory storage on encode/decode). With dynamic code like this, it can be really difficult to capture a wide swath of real-world scenarios, but it would be helpful if we can find something.