Let’s see… I need to page some of this back into memory.
Re _DictionaryCodingKey: it has existed in its current form since it was first introduced.
Why does _DictionaryCodingKey.init(stringValue:) attempt to parse the string value into an Int?
This is primarily for the benefit of decoding. Some formats (like JSON) only support String keys for dictionaries, which means that on encode, we have to convert Int keys to Strings. On decode, however, we have to try to go the other way:
- We create a KeyedDecodingContainer keyed by a key type whose init(stringValue:) attempts to parse each String key as an Int
- We access allKeys on that container, and if any key couldn't be parsed as an Int, we throw an error (a rough sketch of this follows the list)
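To make that concrete, here's a rough sketch of that decode path. IntParsingKey and decodeIntKeyedDictionary are illustrative names standing in for the internal _DictionaryCodingKey and Dictionary's actual init(from:); this is not the real implementation, just the shape of it:

```swift
// Illustrative stand-in for _DictionaryCodingKey (the real type is internal
// to the standard library).
struct IntParsingKey: CodingKey {
    let stringValue: String
    let intValue: Int?

    init?(stringValue: String) {
        self.stringValue = stringValue
        self.intValue = Int(stringValue) // attempt the String → Int parse up front
    }

    init?(intValue: Int) {
        self.stringValue = String(intValue)
        self.intValue = intValue
    }
}

// Rough shape of the decode path for an Int-keyed dictionary.
func decodeIntKeyedDictionary<V: Decodable>(
    of valueType: V.Type,
    from decoder: Decoder
) throws -> [Int: V] {
    let container = try decoder.container(keyedBy: IntParsingKey.self)
    var result = [Int: V]()
    for key in container.allKeys {
        // If a key in the payload couldn't be parsed as an Int, then [Int: V]
        // isn't a faithful representation of the data, so we throw.
        guard let intKey = key.intValue else {
            throw DecodingError.typeMismatch(
                Int.self,
                DecodingError.Context(
                    codingPath: container.codingPath,
                    debugDescription: "Expected Int key but found '\(key.stringValue)'."
                )
            )
        }
        result[intKey] = try container.decode(V.self, forKey: key)
    }
    return result
}
```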
This does mean that there is a performance hit on encode, yes, and it's absolutely worth considering whether splitting the encoding behavior from the decoding behavior would improve performance. Note that it is possible (though I think very unlikely) that something could rely on this parsing on encode:
- Ostensibly, you can imagine an Encoder/Decoder which would prefer to use Int keys where possible for efficiency (or as a requirement of the format, the opposite of something like JSON, which requires String keys), and which, given a String-keyed dictionary whose keys could all be parsed as Ints, would prefer to convert them… But at this point, I think this is a pretty big stretch; I haven't seen any Int-preferring encoders or formats in the wild that would benefit from this
- Potentially more usefully, you can imagine a migration scenario in which existing code that encoded Int-keyed dictionaries has now been expanded to encode String-keyed dictionaries instead. This allows the old Int-keyed values to still be accessible as if they were originally String-keyed, and vice versa: introducing new data into an old version of an app (demonstrated in the snippet after this list). But, this is a bit of a strange use-case
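For what it's worth, that round-tripping is observable with the Foundation JSON coders today. This is just a demonstration of the current behavior, not a guarantee the API makes:

```swift
import Foundation

// Old model: data encoded from an Int-keyed dictionary. In JSON, the Int
// keys necessarily become Strings in the resulting object.
let oldData = try JSONEncoder().encode([1: "one", 2: "two"])

// New model: the same payload can be read back as String-keyed…
let asStringKeyed = try JSONDecoder().decode([String: String].self, from: oldData)

// …and the reverse also holds: String keys that happen to be numeric can
// still be read by Int-keyed code, thanks to the String → Int parse.
let newData = try JSONEncoder().encode(["3": "three"])
let asIntKeyed = try JSONDecoder().decode([Int: String].self, from: newData)
```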
Should _DictionaryCodingKey.init(intValue:) store the Int key as a String?
As alluded to above, I would say that many (if not most) formats do not support Int values as keys (e.g. JSON). Conversion to a String has to happen somewhere, and at the moment, it's simplest to store the resulting value. Could we benefit from recalculating the value every time (as opposed to converting once and storing)? Potentially, but I think it might be difficult to measure.
The conversion has to happen at least once for every key on encode and decode, so doing it lazily doesn't inherently offer a benefit beyond storage. To see a real benefit to storing the resulting value as we do now, you’d need to access the key’s stringValue at least twice, and it’s hard to tell how often that might happen. It at least depends on how the Encoder/Decoder touches the keys, and how often user code accesses them either via allKeys or codingPath.
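If someone did want to measure that, the lazy alternative would look something like this hypothetical LazyDictionaryCodingKey, which stores only the Int in init(intValue:) and converts on access:

```swift
// Hypothetical lazy counterpart to the eager key sketched earlier: store the
// Int and synthesize stringValue on each access instead of in init(intValue:).
struct LazyDictionaryCodingKey: CodingKey {
    let intValue: Int?
    private let storedString: String?

    // Every access pays for a fresh String conversion (and allocation) when
    // the key was created from an Int; intValue is always non-nil in that
    // case, so the force-unwrap is safe. This only helps if stringValue ends
    // up being read zero or one times per key; with two or more accesses,
    // the eager, stored conversion wins.
    var stringValue: String {
        storedString ?? String(intValue!)
    }

    init?(stringValue: String) {
        self.storedString = stringValue
        self.intValue = Int(stringValue)
    }

    init?(intValue: Int) {
        self.storedString = nil
        self.intValue = intValue
    }
}
```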
So does that mean we should convert to an enum? Also hard to say: the main benefit of the enum would be making a _DictionaryCodingKey small enough to fit in an existential box, and to hit that case, you'd need to want to use such a key in the first place. But, an EitherCodingKey type (i.e. either a String or an Int) is less generally helpful as public API than, say, an AnyCodingKey which takes (String, Int?) directly, in which case you don't benefit from the conversion to an enum anyway. We could expose both types of keys and let users choose, but it feels like muddying the waters a bit…
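To illustrate that trade-off, here's roughly what the two shapes would look like. Neither type exists today; both names and their exact shapes are hypothetical:

```swift
// Enum form: either a String key or an Int key, never both. The value can be
// small enough to fit in an existential's inline buffer (e.g. when stored in
// a [CodingKey] codingPath), but an Int-backed key has to synthesize its
// stringValue on every access.
enum EitherCodingKey: CodingKey {
    case string(String)
    case int(Int)

    var stringValue: String {
        switch self {
        case .string(let s): return s
        case .int(let i): return String(i)
        }
    }

    var intValue: Int? {
        switch self {
        case .string(let s): return Int(s) // parse on demand, mirroring _DictionaryCodingKey
        case .int(let i): return i
        }
    }

    init?(stringValue: String) { self = .string(stringValue) }
    init?(intValue: Int) { self = .int(intValue) }
}

// Struct form: stores (String, Int?) directly. More generally useful as a
// public "any key" type, but it gains nothing from the enum's compact layout.
struct AnyCodingKey: CodingKey {
    let stringValue: String
    let intValue: Int?

    init(stringValue: String, intValue: Int?) {
        self.stringValue = stringValue
        self.intValue = intValue
    }

    init?(stringValue: String) { self.init(stringValue: stringValue, intValue: nil) }
    init?(intValue: Int) { self.init(stringValue: String(intValue), intValue: intValue) }
}
```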
In all, my gut feeling is that the performance impact is likely negligible in the vast majority of use-cases, and I think we might be hard-pressed to see real benefits either way.
I think that regarding both questions, it would be interesting to get some real-world performance measurements to see how big an impact either change makes (potentially wasteful String → Int conversion on encode; unnecessary memory storage on encode/decode). With dynamic code like this, it can be really difficult to capture a wide swath of real-world scenarios, but it would be helpful if we can find something.