Bug or PEBKAC?

Runtime checking is basically what Dictionary already does for String and Int. But Dictionary in the standard library cannot attempt to downcast to your custom type, because it has no knowledge of it.

Hypothetically, it might work for Dictionary to check dynamically, by first encoding the key into a different encoder and inspecting what was actually produced. If the keys all happened to be compatible, it could then forcefully encode in dictionary form. The problem is that the coder itself can override things, and the Dictionary cannot reliably recreate either the coder's type or its state. If it used a default JSONEncoder and got something compatible, say "2020-02-13", the forced encoding into WhateverEncoder might fail if that encoder does something else with the key, such as ["year": "2020", "month": "02", "day": "13"].
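
For anyone who hasn't seen the behavior being discussed, here is a minimal sketch (the Version type is purely illustrative): a dictionary whose key type is neither String nor Int encodes as a flat array of alternating keys and values.

import Foundation

struct Version: Codable, Hashable {
	var raw: String
}

// Because Version is neither String nor Int, Dictionary falls back to
// an unkeyed container: key, value, key, value, ...
let data = try JSONEncoder().encode([Version(raw: "1.0"): "first"])
print(String(data: data, encoding: .utf8)!)
// [{"raw":"1.0"},"first"]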

Why can't that be asserted at runtime, just like how we assert expectations of types when decoding?

Here's my working example, for future lost readers.

import Foundation

struct UserDefaultsDomain: Hashable {
	var domain: String
	
	init(_ domain: String) { self.domain = domain }
}

extension UserDefaultsDomain: CodingKey {
	init(stringValue: String) { self.init(stringValue) }
	var stringValue: String { self.domain }
	
	init?(intValue: Int) { nil }
	var intValue: Int? { nil }
}

extension UserDefaultsDomain: Codable {
	init(from decoder: Decoder) throws {
		self.init(try decoder
			.singleValueContainer()
			.decode(String.self)
		)
	}

	func encode(to encoder: Encoder) throws {
		var container = encoder.singleValueContainer() 
		try container.encode(domain)
	}
}


struct UserDefaultsConfigurations {
	var configurations: [UserDefaultsDomain: [String: String]]
}
extension UserDefaultsConfigurations: Codable {
	init(from decoder: Decoder) throws {
		let container = try decoder.container(keyedBy: UserDefaultsDomain.self)

		self.configurations = Dictionary(uniqueKeysWithValues:
			try container.allKeys.map { domain in
				(domain, try container.decode([String: String].self, forKey: domain))
			}
		)
	}
	
	func encode(to encoder: Encoder) throws {
		var container = encoder.container(keyedBy: UserDefaultsDomain.self)
		
		for (domain, values) in self.configurations {
			try container.encode(values, forKey: domain)
		}
	}
}

let testData = UserDefaultsConfigurations(configurations: [
	UserDefaultsDomain("a.b.c"): [
		"key1": "value1",
		"key2": "value2",
		"key3": "value3",
	],
	UserDefaultsDomain("com.example"): [
		"key1": "value1",
		"key2": "value2",
		"key3": "value3",
	],
	UserDefaultsDomain("x.y.z"): [
		"key1": "value1",
		"key2": "value2",
		"key3": "value3",
	],
])

// Encode to JSON
let encoder = JSONEncoder()
encoder.outputFormatting = .prettyPrinted
let jsonData = try encoder.encode(testData)
let jsonString = String(data: jsonData, encoding: .utf8)!
print(jsonString)

// Decode back
let decoder = JSONDecoder()
let decoded = try decoder.decode(UserDefaultsConfigurations.self, from: jsonData)
dump(decoded)

do { // and back to JSON!
	let encoder = JSONEncoder()
	encoder.outputFormatting = .prettyPrinted
	let jsonData = try encoder.encode(decoded)
	let jsonString = String(data: jsonData, encoding: .utf8)!
	print(jsonString)
}

What do you expect the assertion to look like? Even if the Dictionary knows the type of the key (which it does), it still doesn't know whether the encode function will produce something that is key-compatible.

It knows that String is ok, and that Int is ok. It doesn't know if UserDefaultsDomain is ok.

The closest would be to check for CodingKey conformance. With that said, the ship has sailed; not much can be done about it.
Never mind, that wouldn't work.

Why isn't a check for conformance to CodingKey sufficient?

More generally, during the process of encoding itself, the Dictionary could produce a private encoder that it provides to its Key type (which is statically known). The Key would do its thing to encode itself, and the Dictionary could then verify that what was encoded into that encoder was a String, or one of the (U)Int(8|16|32|64) variants.

Better yet, this assertion could be delegated to the JSON/YAML/whatever encoder, letting each format define for itself what is or isn't valid as a key. A rough sketch of the probe idea follows.
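
As an illustration of that probe idea (the helper name and the array-wrapping trick are mine, not anything the stdlib actually does):

import Foundation

// Encode the key on its own and inspect what it produced. Wrapping it
// in a single-element array sidesteps top-level-fragment restrictions.
func encodesAsSingleString<Key: Encodable>(_ key: Key) -> Bool {
	guard let data = try? JSONEncoder().encode([key]) else { return false }
	let object = try? JSONSerialization.jsonObject(with: data)
	return (object as? [Any])?.first is String
}

print(encodesAsSingleString("a.b.c"))              // true
print(encodesAsSingleString(["nested": "object"])) // false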

That would work, but would go behind the type’s back. The type would lose the ability to customize its own encoding.

The encoder can already do that.

Ultimately I am not the one you have to convince, and I did not write any of the implementation for Codable. This is a “Using Swift” thread. You asked “why”, and I happened to be able to answer that first question generally. The separate question “Can we change it?” is for the “Evolution” section of the forum and for someone else to answer.

You can only check for conformance dynamically (let key = key as? Codable, and the like). This on its own is already a foot-gun if the static type is not Codable but each and every element happens to be Codable (via subtyping).

Let’s say it’s ok, and that we want to do keyed encoding in such a case. We still have more problems.

KeyedEncodingContainer requires you to know the static type, which raises more problems:

  • The static type may not be Codable.
  • The dynamic type may not be the same.
    • We may not have common type for all elements that also happens to be Codable.
    • Even if we have it, I don’t think we have any mechanism to figure out that type.

I may have missed something (like static conformance checking, Key.self as Codable.self), but I don't see a pure Swift implementation that would work unless you significantly change the coding model, at which point we're probably well outside the Using Swift tag. A sketch of the foot-gun mentioned above follows.
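
Here is a minimal sketch of the subtyping foot-gun (the ID protocol is purely illustrative):

protocol ID {}
extension String: ID {}

let keys: [ID] = ["a", "b"]

// Every element is dynamically Encodable, but the static type ID is
// not, so there is no Encodable key type to hand to a keyed container.
for key in keys {
	print(key is Encodable) // true for each element
}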

It has the same problem I mentioned above. Also, we probably want a proper round-trip story, so if one side couldn't do it, the other should follow suit.

Forgot to mention, but you can also use a property wrapper:

@propertyWrapper
struct CodableKey<Key, Value>: Codable where Key: CodingKey & Hashable, Value: Codable {
    var wrappedValue: [Key: Value]
    
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: Key.self)
        for (key, value) in wrappedValue {
            try container.encode(value, forKey: key)
        }
    }
    
    init(wrappedValue: [Key: Value]) {
        self.wrappedValue = wrappedValue
    }
    
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: Key.self) 
        wrappedValue = try Dictionary(uniqueKeysWithValues: container.allKeys.map {
            ($0, try container.decode(Value.self, forKey: $0))
        })
    }
}

Then you can do

struct AA: Codable {
    @CodableKey var tt: [Int8: String]
}
extension Int8: CodingKey { ... }

let value = AA(tt: [3: "44"])

let encoder = JSONEncoder()
print(try String(data: encoder.encode(value), encoding: .ascii)!)
/*
{
  "tt": {
    "3": "44"
  }
}
 */

With the only problem being that it doesn't support local/global variables (yet).


Could you elaborate on what you mean by this?

Yep, you sure did answer that! Now I'm trying to figure out what motivated the current implementation.

Not @SDGGiesbrecht (well, obviously). Anyhow, you'll notice that any implementation you can come up with doesn't call Key.encode. So Key has no say in how it wants to be encoded.


From the proposal:

They may or may not answer, but they are the only ones who will know for sure.

@itaiferber, @Tony_Parker

As the original author of the code, I can give some background as to why the implementation behaves this way (though as I no longer work at Apple, I can't speak to any decisions about changing this in the future, if at all).

Answering questions sequentially:

  • Why doesn't Dictionary assert at runtime that a given key encodes as either a String or an Int in the end? For several reasons:

    1. Knowing what a given key encodes as requires encoding it and then checking — there is no way to know up-front how an object prefers to encode because it can choose one of many options based on Encoder.codingPath, Encoder.userInfo, global state, etc. This means that the only way to know for sure is to encode the entire key object and then check, which means that you have to potentially set aside a container for it, encode a potentially large object graph (the object has control of program flow once you call its encode(to:) method so you have to wait until it's done), and then inspect. (A sketch of such a key type appears after this list.)

      For some encoders, this might be prohibitive: if you want to write an encoder that streams output as soon as objects hit your containers' encode(...) methods (e.g. you want to keep the potential memory footprint down as much as possible), then as soon as you hit a Dictionary, you can no longer stream keys.

    2. Even if you decide you want to encode a key and check what type it encoded after the fact, you don't have a guarantee that all keys in a dictionary will encode the same way. If I have a dictionary of 100 objects where the first 99 keys encode as strings but the last key encodes as an object, you have no way of knowing that would happen until after you encode the last key. This means that you have to potentially hold all encoded keys in memory until you know for sure what container you can use to hold them all, and if you've already made a decision about what container to use, it might be the case that you have to go back and change containers.

    3. Asserting that all encoded dictionaries must have String/Int keys, must be consistent between the two, or must have a structure representable natively by any given format is needlessly prohibitive. It's possible to allow any dictionary whose keys and values are encodable and decodable to encode and decode (assuming you know their static types), so there's no real reason to prohibit that.

  • Why doesn't Dictionary check for conformance to CodingKey and expand the number of types it supports beyond String and Int? For two reasons:

    1. Although it is unlikely, just because an object conforms to CodingKey doesn't necessarily mean that it would prefer to encode as its .stringValue or .intValue. Any object which conforms to Codable can choose its own representation, and although it is very unlikely that you'd be encoding a Codable & CodingKey object, it's not guaranteed.

    2. More importantly, we wanted to deter the potential usage of [CodingKey : <Codable type>] over the use of nested containers. Syntactically, it's really easy to reach for a Dictionary, but that forgoes the strong static typing that we built with containers; it's relatively rare that you want to encode a dictionary keyed by an arbitrary set of CodingKeys, but rather, a known set of CodingKeys. When you own the key type, it's much easier to reach for a nested container.

  • Why doesn't Dictionary check for conformance to RawRepresentable and expand the number of types it supports beyond String and Int?

    As far as I'm concerned, this is a bug, but one that is very unlikely to change. There's been a lot of discussion of this in the past (see Using RawRepresentable String and Int keys for Codable Dictionaries and JSON Encoding / Decoding weird encoding of dictionary with enum values) and unfortunately, it's a behavior-breaking change: Dictionary can't change its implementation without producing data which is not compatible with older versions of the stdlib.

    NOTE: Why is this case different from the CodingKey case above? Because the semantics of RawRepresentable indicate that the type is truly equivalent to its raw type.

    You can potentially make a similar claim about LosslessStringConvertible, but that one's a bit more iffy. Just because you can convert to a String and back doesn't mean that the type intends this.
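
To make the first reason concrete, here is a hypothetical key type (entirely my invention) whose representation cannot be known without running its encode(to:) to completion:

struct MoodyKey: Codable, Hashable {
	var name: String

	func encode(to encoder: Encoder) throws {
		// The representation depends on encoder state, so a caller
		// cannot predict it without letting this method finish.
		if encoder.userInfo[CodingUserInfoKey(rawValue: "expanded")!] != nil {
			var container = encoder.container(keyedBy: CodingKeys.self)
			try container.encode(name, forKey: .name) // encodes as an object
		} else {
			var container = encoder.singleValueContainer()
			try container.encode(name) // encodes as a plain string
		}
	}
}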


Since it's unlikely that Dictionary can change these things without massive breaking consequences, the best chance forward is for additional API on existing Encoders and Decoders to twiddle with these and allow dictionaries to be intercepted, like with existing encoding/decoding strategies. /cc @Morten_Bek_Ditlevsen


Wow, this property wrapper worked out really well. Reduced a bunch of code, focused the responsibilities of each of the pieces involved, and allowed me to remove an undesirable intermediate type (whose purpose was to just implement this custom dict coding logic).

It surprised me to learn that the dict type syntax in @CodableKey var tt: [Int8: String] is compatible with struct CodableKey<Key, Value>. Very neat!


If you are encoding a large object graph as the key, the overall result wouldn't be a String/Int (because each layer would need its own container), meaning it wouldn't be a valid JSON key and thus would fail anyway.

There would be a small cost for types which are known to behave correctly (like String/Int directly). Perhaps the special-case checks could be used to bypass those.

Yes, that's true. It would, however, require you to coordinate between containers and share state so you have the opportunity for a given container to signal its parent (or similar) to indicate that a non-trivial object graph is being encoded. (Far from impossible, but difficult to get right once you have to start spreading this sort of logic across multiple locations.)

It should be indistinguishable from a normal Dictionary stored property. That is more or less the point of a property wrapper.

The differences would be:

  • Custom get-set logic via wrappedValue (we're not using this, so it's just a stored property).

  • All conformance synthesis for the variable (AA.tt) is done via CodableKey, rather than Dictionary.

    If you also need Hashable or Equatable, you can just add them to CodableKey and let the synthesiser do the job (see the sketch after this list).

  • extra variables _tt, $tt.

Other than that, it should look and feel like a normal Dictionary: initialisation via literal, calling functions, assigning values, etc.
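
For instance, assuming the CodableKey wrapper from earlier (where Key is already constrained to Hashable), two conditional conformances are enough for the synthesiser to carry Equatable and Hashable through:

extension CodableKey: Equatable where Value: Equatable {}
extension CodableKey: Hashable where Value: Hashable {}

// AA now gets synthesized Equatable/Hashable, driven by the wrapper:
struct AA: Codable, Hashable {
    @CodableKey var tt: [Int8: String]
}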

Hi all,
As Itai mentions, the best chance forward is likely to extend the API on existing Encoders and Decoders.
I have tried to do that in this PR:

https://github.com/apple/swift/pull/26257

The PR introduces a DictionaryKeyEncodingStrategy on JSONEncoder (and a similar one on JSONDecoder) that allows opting in to encoding Dictionaries keyed by RawRepresentable types with String raw values as dictionaries instead of lists.

public enum DictionaryKeyEncodingStrategy {
    case `default`
    case rawRepresentableStringKeysAsDictionaries
}
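
Hypothetically, opting in might then look like this (the property name below is my guess based on the enum's name; see the PR for the actual spelling):

let encoder = JSONEncoder()
// Assumed property name, for illustration only:
encoder.dictionaryKeyEncodingStrategy = .rawRepresentableStringKeysAsDictionaries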

As can be seen in the PR discussion, @Tony_Parker mentions that he would like more discussion about the topic, so that the best possible solution can be selected going forward.

So please chime in about how you could see the issue solved in a non-behavior-breaking way, and perhaps we'll be able to land a solution in a future Swift version! 🙂


I'm not sure it needs to be that complicated.

Currently, if the JSONEncoder detects a [String: Encodable] dictionary, it manually iterates the key-value pairs, encodes the value, and places the result in an NSMutableDictionary.

It should be possible to instead:

  • iterate the key-value pairs
  • encode the key in to a separate container
  • validate that the container holds exactly one String (however deeply nested it may be)
  • associate that String with the value in the resulting NSMutableDictionary.

This would naturally allow all other primitive types (like Date), as well as single-item arrays of primitives, to be encoded when they are used as dictionary keys. The important thing is that your key type boils down into a single String when encoded. That seems like a more reasonable limitation, based on how your values encode relative to the JSON spec, rather than how you model them in the Swift type system (which really has nothing at all to do with JSON).
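
As an illustration of the payoff (the current behavior is real; the proposed output is hypothetical):

import Foundation

let encoder = JSONEncoder()
encoder.dateEncodingStrategy = .iso8601

// Under .iso8601 a Date encodes as a single JSON string, so it would
// qualify as an object key under the scheme above.
let data = try encoder.encode([Date(timeIntervalSince1970: 0): "epoch"])
print(String(data: data, encoding: .utf8)!)
// Today:    ["1970-01-01T00:00:00Z","epoch"]   (an array)
// Proposed: {"1970-01-01T00:00:00Z":"epoch"}   (an object)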

JSONDecoder similarly detects dictionaries, so we can create a custom Decoder which does the reverse: it only has a single String, so any object-graph you inflate from it can only decode a single value, however deeply-nested it may be. That's simple enough to validate.

I have a rough patch which does that, and it appears to work without regressing anything in the test suite. I'll need to check a bunch of edge-cases before it's ready for a PR though (not that I expect it to ever be merged).


If I've understood you correctly, this does sound like the original approach I tried to imply (and I believe I misunderstood your follow-up response): if you encode the key, it's trivial to check whether the encoded value boils down to a String/Int; you just have to check after it's done encoding. With additional state, it's possible to confirm this before it's done encoding, but this can get messy.

The original point in context was simply that there's no way to guess up-front: either way, you have to start encoding the key in order to find out (and at that point, you have to let the key finish encoding fully before checking further keys).


Just hit this bug today. So unfortunate that it's still unfixed. It really feels like a bug, even though I now know that encoding dictionaries as lists in some cases was implemented on purpose (IMHO that's unnecessary and plain wrong). I support the idea of having an opt-in for the correct behaviour in the form of a new "StringKeyRepresentable" protocol.

Hi @tera,

It would be great if you could voice your support on the pitch thread here:

The corresponding proposal is written, and a member of the Swift Core Team did assign themselves to the proposal PR. 🙂

Sincerely,
/Morten