SE-0320: Coding of non String / Int keyed Dictionary into a KeyedContainer

I'm not sure whether @TaskLocal is suitable, nor whether this kind of scoped configuration is necessary.

If you wanted something like a DictionaryKeyEncodingStrategy, but for all encoders, Dictionary.encode(to:) could look for a specific CodingUserInfoKey.
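A rough sketch of that idea, with a hypothetical key name (nothing here is existing API):

import Foundation

// Hypothetical opt-in key; the name and semantics are illustrative only.
extension CodingUserInfoKey {
  static let keyedDictionaryEncoding = CodingUserInfoKey(
    rawValue: "keyedDictionaryEncoding")!
}

// Dictionary.encode(to:) could then branch on its presence:
//   if encoder.userInfo[.keyedDictionaryEncoding] as? Bool == true {
//     // encode into a keyed container instead of an unkeyed one
//   }

let encoder = JSONEncoder()
encoder.userInfo[.keyedDictionaryEncoding] = true  // configured at the use site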

Foundation's new @CodableConfiguration also looks interesting, but it seems to be used only by attributed strings.


Will there be default implementations for raw-value enums?

extension CodingKeyRepresentable
where Self: RawRepresentable, RawValue == String {
  public var codingKey: CodingKey {
    _AnyCodingKey(stringValue: rawValue)! // FIXME
  }
  public init?(codingKey: CodingKey) {
    self.init(rawValue: codingKey.stringValue)
  }
}

extension CodingKeyRepresentable
where Self: RawRepresentable, RawValue == Int {
  public var codingKey: CodingKey {
    _AnyCodingKey(intValue: rawValue)! // FIXME
  }
  public init?(codingKey: CodingKey) {
    guard let intValue = codingKey.intValue ??
            Int(codingKey.stringValue) else {
      return nil
    }
    self.init(rawValue: intValue)
  }
}

(FIXME: _AnyCodingKey and _DictionaryCodingKey can have non-failable initializers.)
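If those defaults existed, opting in would be a one-liner for raw-value enums. A sketch (Planet is a hypothetical type):

enum Planet: String, Codable, CodingKeyRepresentable {
  case mercury, venus, earth
}

// A [Planet: Int] dictionary could then encode into a keyed container,
// e.g. {"venus": 2} in JSON, rather than the flat array ["venus", 2].
let distances: [Planet: Int] = [.venus: 2]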

Hi @benrimmington ,
Your default implementations seem like a perfect idea: the end user still has to opt in to being CodingKeyRepresentable, but now it's possible to do so without a lot of boilerplate. Really awesome!

This kind of defaulting (where Self: ...) is not a technique I have been using a lot myself, so I completely overlooked this possibility.

Thanks for bringing this up - I think that will be a natural part of the implementation, should the proposal be accepted! And as you say, the internal _AnyCodingKey can just have non-failable initializers instead of failable ones.

I'll have to look more into @CodableConfiguration - I haven't been able to find much documentation for the use cases it solves.

And you're right, the CodingUserInfoKey would be another way of opting in to a specific behavior, but this again places the configuration point at the use site (when initializing the encoders and decoders) rather than at the type level. There may also be an argument that the 'user' part of the name CodingUserInfoKey suggests it is something to be used by the 'end user', and not something that the standard library itself should use? I'm not sure.

Thanks for your excellent feedback!

The default implementations for Encodable & Decodable enums are similar, except for some reason they are in extensions of RawRepresentable.


Again, I am so thrilled about this suggestion, as it takes the boilerplate out of all of my personal use cases for this proposal (types conforming to RawRepresentable with String or Int raw values).

Could a similar default implementation be done for one of the protocols that Int8, Int16, etc. conform to - for instance FixedWidthInteger? I haven't tried it out myself, but I assume that it would work like the defaulting you presented, @benrimmington.

I think only RawValue == String and RawValue == Int should be supported by default.

  1. The same default behavior (e.g. for both UInt8 and RawValue == UInt8 keys) is easier to explain.

  2. The containers only support a limited set of primitive types, and intentionally don't offer generic APIs (using FixedWidthInteger, etc.)

  3. The integer-indexed feature seems to be underused (and untested?), so it might be difficult to support correctly.


If there won't be a public AnyCodingKey type, could the CodingKey protocol be updated, to allow easier conformance? For example, the init(intValue:) and var intValue requirements might have default implementations (that return nil).
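A sketch of those defaults (not something the standard library currently provides):

extension CodingKey {
  // Default: the key has no integer representation.
  public var intValue: Int? { nil }
  public init?(intValue: Int) { return nil }
}

// A string-only key would then need just two requirements:
struct NameKey: CodingKey {
  let stringValue: String
  init?(stringValue: String) { self.stringValue = stringValue }
}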

Otherwise, AnyCodingKey could be an enum with stringValue(String) and intValue(Int) cases, to fit within the existential container's inline buffer.

_DictionaryCodingKey.init(stringValue:) attempts to parse an integer value from any string. Can it be changed to self.intValue = nil instead?

Hey Ben - thanks for the feedback. I definitely follow you on the FixedWidthInteger suggestion; it's not a use case that I have myself either - I just thought I would bring it up.

Good idea with defaults for CodingKey. Are protocol extensions containing defaults backwards compatible? Meaning: if I build an app against iOS 16+ and take advantage of this particular defaulting, will it run fine with a stdlib from iOS 15?

Do you mean in case AnyCodingKey were to be made a public type in the standard library? I'm afraid I don't know what it means to fit within the existential container's inline buffer. Does it have something to do with optimizing the representation of the type at runtime?

And as for the last question:

I'm afraid I don't have an answer to that. Could there be an encoder/decoder that took advantage of the presence of the intValue even though it was parsed from a String?

No, the default implementations would need @available — unless @_alwaysEmitIntoClient can be used instead?

(Library Evolution: blog post and documentation.)

Yes, the proposed codingKey property is an existential value (of any type conforming to the CodingKey protocol), so an existential container is returned.

If the size of the struct/enum is 3 words or less (i.e. 24 bytes on 64-bit) then it will be stored inline; otherwise an external ref-counted buffer will be allocated.

A struct holding both String and Int? is over the limit; an enum holding either String or Int is under the limit (on 64-bit):

struct StructCodingKey: CodingKey {
  let stringValue: String
  let intValue: Int?
}

enum EitherCodingKey: CodingKey {
  case stringValue(String)
  case intValue(Int)
}

MemoryLayout<CodingKey>.size        //-> 40 bytes (inline buffer, etc.)
MemoryLayout<StructCodingKey>.size  //-> 25 bytes
MemoryLayout<EitherCodingKey>.size  //-> 17 bytes
MemoryLayout<String>.size           //-> 16 bytes
MemoryLayout<Int?>.size             //->  9 bytes
MemoryLayout<Int>.size              //->  8 bytes

The proposed initializer could be generic, to avoid the overhead of an existential value.


Should the initializer and property be requirements of separate protocols?

public protocol DecodableKey: Decodable {
  init?<T: CodingKey>(decodingKey: T)
  // or
  init?(decodingKey: EitherCodingKey)
}

public protocol EncodableKey: Encodable {
  var encodingKey: Any & CodingKey { get }
  // or
  var encodingKey: EitherCodingKey { get }
}

A type that's only concerned with decoding wouldn't need to provide its own coding key representation.
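For example, a decode-only identifier type might conform like this (UserID is hypothetical, using the generic variant of DecodableKey):

struct UserID: Hashable, DecodableKey {
  let rawValue: String  // Decodable conformance can be synthesized
  init?<T: CodingKey>(decodingKey: T) {
    self.rawValue = decodingKey.stringValue
  }
}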

Cool, yes - I thought that was what you were describing, I just wasn't 100% certain. It sounds like it could be quite a big performance improvement when encoding and decoding dictionaries.

So I just need to digest the suggestions:
Do you suggest that an enum like EitherCodingKey should replace the existing _DictionaryCodingKey? I don't know the answer to that, as it comes down to whether or not it's ok to nil out the intValue instead of trying to parse the String to an Int, as the existing implementation does. There may be a use case for preserving both representations that I am failing to see.


Really nice trick with the generic initializer - that basically avoids an indirection at runtime, right? Is there some annotation regarding generic specialization that should be added too? Akin to this: https://github.com/apple/swift/blob/main/docs/Generics.rst#specialization
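For reference, the linked document covers the underscored @_specialize attribute. A sketch of how it could be applied to a generic function over coding keys (illustrative only; the attribute is unofficial and subject to change):

struct StringKey: CodingKey {
  let stringValue: String
  var intValue: Int? { nil }
  init?(stringValue: String) { self.stringValue = stringValue }
  init?(intValue: Int) { self.stringValue = String(intValue) }
}

// Emit a concrete entry point for StringKey alongside the generic one.
@_specialize(where T == StringKey)
func label<T: CodingKey>(for key: T) -> String {
  key.stringValue
}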


Regarding whether or not this should be modelled as one or two protocols:
If it's a question of 'ergonomics' for the end user, then I don't think it's 'worth' the cost of explaining why two protocols are needed vs. just implementing conformance to the one protocol.

My intuition is that the types that conform to CodingKeyRepresentable must be very plain types - most likely always wrappers around String or Int somehow - and as such it would be no trouble to add conformance to 'both directions' of coding in one go, even though you may in fact only be using the system for, say, encoding.

If there's some other reason for splitting it up than ergonomics, like performance, somehow, then I'm all for it.

Hi @jrose,
Just returning to your comment with an extra question:

Might it be perceived as 'polluting' Int and String to give them a .codingKey property (and the extra initializer)?
Or do you think that it's quite uncontroversial to add this extra conformance?

There has been a fair amount of discussion around that lately, hasn’t there. :-/ I think it’s a drawback of our model, sure, but personally I think the cost of not being able to use String in a CodingKeyRepresentable context is worse than having some additional APIs that aren’t applicable in a concrete one. We should continue discussing how to improve that situation, sure, but it’s a general problem and to me it deserves a general solution, not one-off compromises with each new protocol.


Thanks @jrose,
I've updated the implementation PR to add conformance for String and Int.

I have a few unanswered questions left that I would very much like comments on from @Tony_Parker and @itaiferber , if at all possible:

The first is about the suggestion from @benrimmington that the current _DictionaryCodingKey be changed to an enum carrying either a String or an Int. The argument is that the enum would fit into the inline buffer of the existential container and thus bring a performance improvement to encoding and decoding of Dictionary values.

The current implementation tries to parse the stringValue to an Int in the initializer, which the enum version would not do. Furthermore, the stringValue must always be present, so it would have to be a computed property that converts the Int value to a String again and again whenever the enum holds an Int.
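Concretely, the enum-based key's stringValue would be recomputed on every access, along these lines (a sketch based on the EitherCodingKey idea above):

enum EitherCodingKey: CodingKey {
  case stringValue(String)
  case intValue(Int)

  var stringValue: String {
    switch self {
    case .stringValue(let string): return string
    case .intValue(let int): return String(int)  // converted on each access
    }
  }
  var intValue: Int? {
    if case .intValue(let int) = self { return int }
    return nil
  }
  init?(stringValue: String) { self = .stringValue(stringValue) }
  init?(intValue: Int) { self = .intValue(intValue) }
}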

So the questions are: Do you know of any use cases where the parsing of the string to an int value is necessary or beneficial? Might it break anything to change this? In case the coding key has an Int value, do you think that the Int value would naturally be used first, or might we see a whole lot of converting it into a String?

A final question: If we change _DictionaryCodingKey into an enum, is it worth exposing this implementation publicly for others that may want to conform a type to CodingKeyRepresentable?
Note here that I added the default conformances for RawRepresentable, String and Int to the implementation PR, so for all such types you can already add conformance to CodingKeyRepresentable without any boilerplate at all. This may decrease the number of use cases for making the internal coding key implementation public.

Thanks! :-)

I'd also like to see more feedback, although the two-week review has ended now.

I think it's a question of convenience versus correctness. The parsing has to happen somewhere, in order to round-trip a Dictionary.Key == Int through a string-indexed container (e.g. a JSON object). However, for a Dictionary.Key == String with BigInt or Int128 strings, should the _DictionaryCodingKey be offering maybe-nil intValue keys? Or should the parsing happen later, during decoding?


In the current proposal, without an associated type, a key with unrelated string and integer values may not round-trip:

extension MyCodableType: CodingKeyRepresentable {

  var codingKey: Any & CodingKey {
    // This example uses a compiler-synthesized coding key.
    enum MyCodingKeys: Int, CodingKey { case answer = 42 }
    return MyCodingKeys.answer
    //-> MyCodingKeys(stringValue: "answer", intValue: 42)
  }

  init?<T: CodingKey>(codingKey: T) {
    print(codingKey)
    // Prints one of the following:
    //-> _DictionaryCodingKey(stringValue: "answer", intValue: nil)
    //-> _DictionaryCodingKey(stringValue: "42", intValue: 42)
    return nil
  }
}

Let’s see… I need to page some of this back into memory.

Re _DictionaryCodingKey: it has existed in its current form since it was first introduced.


Why does _DictionaryCodingKey.init(stringValue:) attempt to parse the string value into an Int?

This is primarily for the benefit of decoding. Some formats (like JSON) only support String keys for dictionaries, which means that on encode, we have to convert Int keys to Strings. On decode, however, we have to try to go the other way:

  1. We create a KeyedDecodingContainer keyed by a key type which attempts to parse Int keys
  2. We access allKeys on that container, and if any key couldn’t be parsed as an Int, we throw an error (sketched below)
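A condensed sketch of those two steps (IntParsingKey and the surrounding function are illustrative, not the actual standard-library code):

struct IntParsingKey: CodingKey {
  let stringValue: String
  let intValue: Int?
  init?(stringValue: String) {
    self.stringValue = stringValue
    self.intValue = Int(stringValue)  // attempt the parse, like _DictionaryCodingKey
  }
  init?(intValue: Int) {
    self.stringValue = String(intValue)
    self.intValue = intValue
  }
}

func decodeIntKeyedDictionary<V: Decodable>(
  _ valueType: V.Type, from decoder: Decoder
) throws -> [Int: V] {
  let container = try decoder.container(keyedBy: IntParsingKey.self)
  var result = [Int: V]()
  for key in container.allKeys {
    guard let int = key.intValue else {
      throw DecodingError.typeMismatch(Int.self, .init(
        codingPath: container.codingPath,
        debugDescription: "Expected Int key, found '\(key.stringValue)'."))
    }
    result[int] = try container.decode(V.self, forKey: key)
  }
  return result
}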

This does mean that there is a performance hit on encode, yes — and it’s absolutely worth considering improving performance by possibly splitting out encoding behavior from decoding behavior. Note that it is possible (though I think very unlikely) that something could rely on this on encode:

  • Ostensibly, you can imagine an Encoder/Decoder which would prefer to use Int keys where possible for efficiency (or as a requirement of the format — the opposite of something like JSON, which requires String keys), and given a dictionary created with keys that could all be Ints, would prefer to convert them… But at this point, I think this is a pretty big stretch. I haven’t seen any Int-preferring encoders/formats in the wild that would benefit from this.
  • Potentially more usefully, you can imagine a migration scenario in which existing code which encoded Int-keyed dictionaries has now been expanded to want to encode as String-keyed instead. This allows the old Int-keyed values to still be accessible as if they were originally String-keyed and vice versa: introducing new data into an old version of an app. But this is a bit of a strange use-case.

Should _DictionaryCodingKey.init(intValue:) store the Int key as a String?

As alluded to above, I would say that many (if not most) formats do not support Int values as keys (e.g. JSON). Conversion to a String has to happen somewhere, and at the moment, the resulting value is easier to store. Could we benefit from recalculating the value every time (as opposed to converting once and storing)? Potentially, but I think it might be difficult to measure.

The conversion has to happen at least once for every key on encode and decode, so doing it lazily doesn't inherently offer a benefit beyond storage. To see a real benefit to storing the resulting value as we do now, you’d need to access the key’s stringValue at least twice, and it’s hard to tell how often that might happen. It at least depends on how the Encoder/Decoder touches the keys, and how often user code accesses them either via allKeys or codingPath.

So does that mean we should convert to an enum? Also hard to say — in order to hit the case where you want to fit a _DictionaryCodingKey into an existential box, you’d need to want to use such a key. But, an EitherCodingKey type (i.e. either String or an Int) is less generally helpful as public API than, say, AnyCodingKey which takes (String, Int?) directly — in which case, you don’t benefit from the conversion to an enum anyway. We could expose both types of keys and let users choose, but it feels like muddying the waters a bit…
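For comparison, such an AnyCodingKey might look like this (a hypothetical sketch, using the non-failable initializers mentioned earlier in the thread):

struct AnyCodingKey: CodingKey {
  let stringValue: String
  let intValue: Int?

  // Non-failable initializers still satisfy the failable CodingKey requirements.
  init(stringValue: String) {
    self.stringValue = stringValue
    self.intValue = nil
  }
  init(intValue: Int) {
    self.stringValue = String(intValue)
    self.intValue = intValue
  }
}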

In all, my gut feeling is that the performance impact one way or another is likely negligible in the vast majority of use-cases, and I think we might be hard-pressed to see real benefits one way or another.


I think regarding both questions: it would be interesting to see some real-world measurements of performance to see how big of an impact either one makes (potentially-wasteful String → Int conversion on encode; unnecessary memory storage on encode/decode). With dynamic code like this, it can be really difficult to try to capture a wide swath of real-world scenarios, but it might be helpful if we can find something.


I have thought of one use case for Int-preferring keys - a CSV coder. It could also support string keys if the format has a header row, but if it doesn't, then the only way to index into the fields is by position on each line.
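A sketch of how such positional keys might look (hypothetical; relying on the compiler-synthesized CodingKey conformance for Int raw-value enums):

// Without a header row, fields are addressable only by column index.
enum PersonColumn: Int, CodingKey {
  case name = 0
  case age = 1
}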


Thank you so much for your comments, @itaiferber , @Tony_Parker and @benrimmington !

I think that it would probably be best not to touch _DictionaryCodingKey - and also to not expose any public CodingKey implementation.

You mention a non-roundtripping CodingKey implementation, @benrimmington :

First of all, I thought I understood enum CodingKey conformance. I thought it was all done through the RawRepresentable conformance, but as I gather from your comment and from the example, there's code synthesis involved. I did not know that. :-)

Secondly: Is this something that should be fixed? Or might this behavior be expected in case you craft a CodingKey so cunningly? Isn't it your own 'responsibility' that a CodingKey round-trips? I mean, it's easy to create CodingKeys that do not:

struct Test: CodingKey {
    // The string and integer representations are unrelated, and both
    // initializers always fail - so this key can never round-trip.
    var stringValue: String = "hello"
    init?(stringValue: String) {
        return nil
    }
    var intValue: Int? { return 23 }
    init?(intValue: Int) {
        return nil
    }
}

Hi @benrimmington ,
Would you happen to have any feedback for my latest question? :-)

I've already posted too much in this review, but it seems that a similar example was previously discussed in the pitch thread. (The feedback was that most types won't have this issue, so an associated type isn't worthwhile.)


Thank you so much, @benrimmington.

This review for SE-0320 has concluded and the proposal was returned for revision.

Thank you to everyone for the feedback and contributions to this proposal.
