Hello, I filed a bug here: "OrderedDictionary decoding" (Issue #310, apple/swift-collections on GitHub).
But now I'm curious: what is the correct approach to decoding here?
Should it be fully compatible with an original Dictionary, or be implemented as it is now (expecting an unkeyedContainer)?
This is probably related to the nature of the KeyedDecodingContainer's keys, which are also not ordered.
To me it sounds reasonable that encoding and decoding preserves the order as it does today, so my guess is that it is intentional and not a bug.
This is indeed expected, as it's technically correct that many encodings don't support ordered dictionaries (like JSON, technically). However, that makes the default coding behavior rather useless, as the vast majority of people would simply expect to be able to preserve ordering into a dictionary (like JSON, practically). The suggested workaround is to wrap your OrderedDictionary in a type that preserves order as expected.
It seems reasonable to me that it should use the same encoded form as Dictionary.
The JSON specification states:
The JSON syntax … does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.
Generic decoding can't make any assumptions about the use, but users of the JSON may need to re-encode it at some point and may require that order be preserved (per their "specific use" semantics, as the standard says), therefore OrderedDictionary must be an option (otherwise, what is?).
Ditto here, I'd vote for a change where the encoded representation of OrderedDictionary is a normal JSON object, so that this test works:
let x: [String: String] = ["foo" : "1", "bar" : "2", "baz" : "2"]
let data = try! JSONEncoder().encode(x)
let y = try! JSONDecoder().decode(OrderedDictionary<String, String>.self, from: data)
let data2 = try! JSONEncoder().encode(y)
let z = try! JSONDecoder().decode([String: String].self, from: data2)
Specifically, IMHO this should be true:
OrderedDictionary -> JSON -> OrderedDictionary // preserves order
OrderedDictionary -> JSON -> Dictionary // order is n/a obviously
Dictionary -> JSON -> OrderedDictionary // ordered "as per JSON"
If it's too late to change the current default behaviour maybe we can opt in by specifying an option on encoder/decoder object similar to how we do it with key decoding strategy and output formatting.
Keep in mind that OrderedDictionary's Codable conformance extends to all Codable formats, and not just JSON. Using an UnkeyedEncodingContainer is entirely correct for maintaining the order of the dictionary, even if in certain formats it produces something that doesn't look like a dictionary; the key is that the runtime representation of the OrderedDictionary is the same.
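To make the unkeyed approach concrete, here is a minimal sketch. It uses a hypothetical stand-in type, `PairList`, rather than the real OrderedDictionary, and only assumes Foundation's JSONEncoder; the alternating key/value layout mirrors the output shown later in this thread.

```swift
import Foundation

// Hypothetical stand-in for an ordered dictionary (not the swift-collections type):
// two parallel arrays keep the insertion order.
struct PairList: Encodable {
    var keys: [String]
    var values: [Int]

    func encode(to encoder: Encoder) throws {
        // An unkeyed container is a sequence, so the element order is part of the data.
        var container = encoder.unkeyedContainer()
        for (key, value) in zip(keys, values) {
            try container.encode(key)
            try container.encode(value)
        }
    }
}

let list = PairList(keys: ["one", "two", "three"], values: [1, 2, 3])
let listData = try JSONEncoder().encode(list)
let listJSON = String(decoding: listData, as: UTF8.self)
print(listJSON) // ["one",1,"two",2,"three",3]
```

Because the pairs are written as a flat array, a conforming encoder has no freedom to reorder them, whatever the output format is.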
For JSON specifically, the spec indeed places absolutely no restrictions on the meaning of key-value pairs in dictionaries, which means that in practice, most JSON parsers and encoders don't preserve order at all, because it usually requires more work to do so. This is at least partially due to the fact that in many languages, dictionaries are themselves unordered, and decoding is very often eager, rehydrating an entire payload into language-native collections. (This is part of the reason why, at the moment, JSONDecoder can't preserve ordering: because it uses JSONSerialization under the hood, it gets fully-hydrated Dictionary values back, so ordering has already been lost by the time that JSONDecoder gets the data. This is something that may change with the upcoming swift-foundation work and rewrite.)
This means that if you're using JSON to communicate with any sort of service, you have to be very careful, because you can't really rely on "ordered dictionary" → service → "ordered dictionary" producing the same results. (Unless you encode the dictionary as an array, in which case it can't ever reasonably be reordered, unless intentionally.)
This is actually possible today, without changing the encoded representation of OrderedDictionary at all:
OrderedDictionary → JSON → OrderedDictionary is obviously possible
OrderedDictionary → JSON → Dictionary is already possible: Dictionary already knows* how to decode from an array of key-value pairs, in the same way that OrderedDictionary encodes
Dictionary → JSON → OrderedDictionary is also possible, by having OrderedDictionary.init(from:) check for a keyed container, and if one is present, decode from that (i.e., the reverse of what Dictionary does). The caveat here is that OrderedDictionary has no choice but to trust that the order of the keys given by the keyed container is the same as it was in the original data, which currently is not the case (and in general, decoders for a variety of formats can't make that promise), but for JSONDecoder specifically, it could be
*The tricky thing is that at the moment, if you try to decode a dictionary keyed by either String or Int, Dictionary assumes that you must have a keyed container representation, so you'll get an error if you try to decode OrderedDictionary<String, ...> as Dictionary<String, ...>, but this can be loosened. You can also work around this by using a wrapper key type that encodes/decodes as a String (even a RawRepresentable type should work), so this does work:
import Foundation
import OrderedCollections
let orderedDict: OrderedDictionary = ["one": 1, "two": 2, "three": 3]
print(orderedDict) // => [one: 1, two: 2, three: 3]
let encoded = try! JSONEncoder().encode(orderedDict)
print(String(data: encoded, encoding: .utf8)!) // => ["one",1,"two",2,"three",3]
struct Key: RawRepresentable, Hashable, Decodable {
let rawValue: String
}
let decoded = try! JSONDecoder().decode([Key: Int].self, from: encoded)
print(decoded) // => [Key(rawValue: "three"): 3, Key(rawValue: "one"): 1, Key(rawValue: "two"): 2]
Also keep in mind that changing the encoded representation of a type isn't something you can do loosely (or in some cases, at all). Once the encoded representation of a type is public, that representation can be written out to disk and persisted indefinitely; changing the representation means that older software versions (the app/package/whatever), unless written in a very forward-thinking manner, will choke on the new data format.
The bar for breaking backwards compatibility in data serialization (in the general case, at least) is exceedingly high, because you risk throwing away support for otherwise perfectly valid software which can't necessarily be updated.
Maybe we are missing an abstraction here, like "SortedKeyedContainer". I see your point about how we got to what we have; it's just that the end result doesn't look ideal.
Please clarify: do you mean that this is possible to implement in the standard library but just isn't implemented now, and could be considered in the future, or do you mean something else?
Heh, "String" key was the thing I checked, so I didn't notice it could work.
OTOH, it does feel somewhat wrong that I can encode an array: ["foo", "1", "bar", "2"] into JSON and read it back as a Dictionary. I'd never assume that's possible and would have considered that a bug if you hadn't told me this is by design.
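That's why I said:
If it's too late to change the current default behaviour maybe we can opt in by specifying an option on encoder/decoder object similar to how we do it with key decoding strategy and output formatting.
IMHO, if we have facilities like changing the key names completely (or key capitalisation between snake/camel case), we could as well have this other option of what container to choose.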
Well, passing dictionary-formatted key-value pairs through a JSON text file and back would preserve the order, but it of course would be naïve to assume that passing it through an arbitrary "service" that could change the order by re-encoding JSON would keep the order intact. That's expected. That's where the option of array serialisation would be helpful (be it the default or the opt-in behaviour).
What I mean is that it's possible to update OrderedDictionary.init(from:) to try to fetch an unkeyed container, as it does today; if one is present, then it can decode with the same approach that it already does. However, if attempting to fetch an unkeyed container fails due to a type mismatch, it can attempt to fetch a keyed container instead, and attempt to decode from the keyed container (like Dictionary currently does).
Because OrderedDictionary lives in swift-collections, no update to the standard library is necessary; it would just be a package update.
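As a rough illustration of that fallback, here is a minimal sketch. `OrderedPairs` and `DynamicKey` are hypothetical stand-ins (this is not the swift-collections implementation), and the keyed branch simply trusts whatever order the decoder reports through `allKeys`, which is exactly the caveat discussed above.

```swift
import Foundation

// Hypothetical coding key that accepts any string key.
struct DynamicKey: CodingKey {
    var stringValue: String
    var intValue: Int? { nil }
    init?(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
}

// Hypothetical stand-in for OrderedDictionary<String, Int>.
struct OrderedPairs: Decodable {
    var keys: [String] = []
    var values: [Int] = []

    init(from decoder: Decoder) throws {
        // First try the existing representation: a flat array of alternating keys and values.
        let unkeyed: UnkeyedDecodingContainer?
        do {
            unkeyed = try decoder.unkeyedContainer()
        } catch DecodingError.typeMismatch(_, _) {
            unkeyed = nil // Not an array; fall back to a keyed container below.
        }

        if var container = unkeyed {
            while !container.isAtEnd {
                keys.append(try container.decode(String.self))
                values.append(try container.decode(Int.self))
            }
        } else {
            // The order of allKeys is whatever the decoder provides; it is not guaranteed.
            let container = try decoder.container(keyedBy: DynamicKey.self)
            for key in container.allKeys {
                keys.append(key.stringValue)
                values.append(try container.decode(Int.self, forKey: key))
            }
        }
    }
}

let fromArray = try JSONDecoder().decode(OrderedPairs.self, from: Data(#"["one",1,"two",2]"#.utf8))
print(fromArray.keys) // ["one", "two"]

let fromObject = try JSONDecoder().decode(OrderedPairs.self, from: Data(#"{"one":1,"two":2}"#.utf8))
print(fromObject.keys.sorted()) // ["one", "two"] once sorted; the decoded order itself may differ
```

Only a type mismatch triggers the fallback here; any other error from fetching the unkeyed container is propagated unchanged.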
That's why I said:
If it's too late to change the current default behaviour maybe we can opt in by specifying an option on encoder/decoder object similar to how we do it with key decoding strategy and output formatting.
Indeed, I just wanted to reinforce that.
IMHO, if we have facilities like changing the key names completely (or key capitalisation between snake/camel case), we could as well have this other option of what container to choose.
Sure, encoders and decoders can offer this option! Nothing is stopping that, though it's outside the purview of the standard library. Foundation can choose to offer toggles for what to do with ordered collections on JSONEncoder/JSONDecoder (though it does get a bit tricky, because Foundation can't import swift-collections to know about OrderedDictionary, so some protocol conformances might be necessary to make this happen).
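Until an encoder offers such a toggle, one way to approximate an opt-in is the existing `userInfo` dictionary, which a type can consult inside `encode(to:)`. The sketch below is only an illustration under that assumption: the key name `encodeAsObject` and the types `Scores` and `NameKey` are hypothetical and not part of Foundation or swift-collections, and the keyed output still carries no ordering guarantee.

```swift
import Foundation

// Hypothetical opt-in key. Declared as optional so the line compiles whether
// or not CodingUserInfoKey's initializer is failable in your toolchain.
let encodeAsObjectKey: CodingUserInfoKey? = CodingUserInfoKey(rawValue: "encodeAsObject")

// Hypothetical coding key that accepts any string key.
struct NameKey: CodingKey {
    var stringValue: String
    var intValue: Int? { nil }
    init(_ name: String) { stringValue = name }
    init?(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
}

// Hypothetical stand-in for an ordered dictionary.
struct Scores: Encodable {
    var keys: [String]
    var values: [Int]

    func encode(to encoder: Encoder) throws {
        // The caller opts in by setting the flag on the encoder's userInfo.
        let wantsObject = encodeAsObjectKey.flatMap { encoder.userInfo[$0] as? Bool } ?? false
        if wantsObject {
            // Dictionary-style output: looks like a JSON object, order not guaranteed.
            var container = encoder.container(keyedBy: NameKey.self)
            for (key, value) in zip(keys, values) {
                try container.encode(value, forKey: NameKey(key))
            }
        } else {
            // Default: flat array of alternating keys and values, order preserved.
            var container = encoder.unkeyedContainer()
            for (key, value) in zip(keys, values) {
                try container.encode(key)
                try container.encode(value)
            }
        }
    }
}

let scores = Scores(keys: ["b", "a"], values: [2, 1])
let scoreEncoder = JSONEncoder()

let arrayData = try scoreEncoder.encode(scores)
let asArray = String(decoding: arrayData, as: UTF8.self)
print(asArray) // ["b",2,"a",1]

if let key = encodeAsObjectKey { scoreEncoder.userInfo[key] = true }
let objectData = try scoreEncoder.encode(scores)
let asObject = try JSONDecoder().decode([String: Int].self, from: objectData)
print(asObject.count) // 2
```

The design trade-off is the one raised in this thread: the type, not the encoder, decides the container, so every ordered type would have to agree on the same key for the option to behave consistently.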
but it of course would be naïve to assume that passing it through an arbitrary "service" that could change the order by re-encoding JSON would keep the order intact. That's expected.
Assertions about naïveté aside, the definition of "service" here can be pretty wide. Passing the data to older versions of your app using an stdlib/Foundation that doesn't support ordered containers, passing your ordered dictionary to any encoder/decoder that doesn't care to preserve order, storing ordered data on disk but later attempting to manipulate it with a tool that doesn't support preserving ordering — all are implicit ways of unexpectedly losing ordering without necessarily being aware of it.
It's not a bad thing, necessarily, and it doesn't at all mean that maintaining ordering during decoding shouldn't be supported, but it does mean you need to be very careful: you can only really assume your OrderedDictionary means anything if you know that you wrote the data out with ordered keys and are reading it back as-is, without anything else having touched it. Pretty much any other scenario is up for grabs.
I'd love to reopen this discussion if possible. As a maintainer of the server-side GraphQL implementation (which does have a requirement for specific ordering of encoded JSON fields), I'd love to have a more standard solution. Currently, our solution has involved copying a large portion of the JSONEncoder code to avoid any usage of unordered dictionaries, which has obvious maintainability problems.
If it's too late to change the current default behaviour maybe we can opt in by specifying an option on encoder/decoder object similar to how we do it with key decoding strategy and output formatting.
Is this seen as the most viable path forward for control over encoding order? I'd be happy to contribute if so. Or do you see alternatives for my use case?