OrderedDictionary decoding

Hello, I filed a bug here: "OrderedDictionary decoding" (Issue #310, apple/swift-collections on GitHub).
But now I'm curious: what is the correct approach to decoding here?
Should it be fully compatible with the original Dictionary, or stay as implemented now (expecting an unkeyed container)?
This is probably related to the nature of the KeyedDecodingContainer's keys, which are also not ordered.

To me it sounds reasonable that encoding and decoding preserves the order as it does today, so my guess is that it is intentional and not a bug.

This is indeed expected, as it's technically correct that many encodings don't support ordered dictionaries (like JSON, technically). However, that makes the default coding behavior rather useless, as the vast majority of people would simply expect to be able to preserve ordering into a dictionary (like JSON, practically). Suggested workaround is to wrap your OrderedDictionary into a type that preserves order as expected.

It seems reasonable to me that it should use the same encoded form as Dictionary.

The JSON specification states:

The JSON syntax … does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

Generic decoding can't make any assumptions about the use, but users of the JSON may need to re-encode it at some point and may require that order be preserved (per their "specific use" semantics, as the standard says), therefore OrderedDictionary must be an option (otherwise, what is?).

Ditto here. I'd vote for a change where the encoded representation of OrderedDictionary is a normal JSON object, so that this test works:

import Foundation
import OrderedCollections

let x: [String: String] = ["foo" : "1", "bar" : "2", "baz" : "2"]
let data = try! JSONEncoder().encode(x) // a JSON object
// Fails today: OrderedDictionary expects an unkeyed container (a JSON array)
let y = try! JSONDecoder().decode(OrderedDictionary<String, String>.self, from: data)
let data2 = try! JSONEncoder().encode(y)
let z = try! JSONDecoder().decode([String: String].self, from: data2)

Specifically, IMHO this should be true:

OrderedDictionary -> JSON -> OrderedDictionary // preserves order
OrderedDictionary -> JSON -> Dictionary // order is n/a obviously
Dictionary -> JSON -> OrderedDictionary // ordered "as per JSON"

If it's too late to change the current default behaviour maybe we can opt in by specifying an option on encoder/decoder object similar to how we do it with key decoding strategy and output formatting.
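For reference, here is a small sketch of the kind of existing JSONEncoder options I'm comparing this to. The `User` type is made up for illustration. Note that `.sortedKeys` only gives you alphabetical order, not insertion order, and as far as I know there is no comparable option today for choosing the container used by ordered collections.

```swift
import Foundation

// Hypothetical model type, only here to have something to encode.
struct User: Codable {
    var lastName: String
    var firstName: String
}

let encoder = JSONEncoder()
// Existing option: rewrite camelCase property names as snake_case keys.
encoder.keyEncodingStrategy = .convertToSnakeCase
// Existing option: emit object keys in sorted (not insertion) order.
encoder.outputFormatting = [.sortedKeys]

let userData = try! encoder.encode(User(lastName: "Appleseed", firstName: "Johnny"))
let userJSON = String(data: userData, encoding: .utf8)!
print(userJSON) // => {"first_name":"Johnny","last_name":"Appleseed"}
```

An opt-in for ordered containers would presumably sit next to these as another property on the encoder/decoder, but that part is the proposal, not an existing API.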

Keep in mind that OrderedDictionary's Codable conformance extends to all Codable formats, and not just JSON. Using an UnkeyedEncodingContainer is entirely correct for maintaining the order of the dictionary, even if in certain formats it produces something that doesn't look like a dictionary; the key is that the runtime representation of the OrderedDictionary is the same.

For JSON specifically, the spec indeed places absolutely no restrictions on the meaning of key-value pairs in dictionaries, which means that in practice, most JSON parsers and encoders don't preserve order at all, because it usually requires more work to do so. This is at least partially due to the fact that in many languages, dictionaries are themselves unordered, and decoding is very often eager, rehydrating an entire payload into language-native collections. (This is part of the reason why, at the moment, JSONDecoder can't preserve ordering: because it uses JSONSerialization under the hood, it gets back fully-hydrated Dictionary values, so ordering has already been lost by the time that JSONDecoder gets the data. This is something that may change with the upcoming swift-foundation work and rewrite.)

This means that if you're using JSON to communicate with any sort of service, you have to be very careful, because you can't really rely on "ordered dictionary" → service → "ordered dictionary" producing the same results. (Unless you encode the dictionary as an array, in which case it can't ever reasonably be reordered, unless intentionally.)
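As a minimal sketch of the array case: with the current Codable conformance, an OrderedDictionary round-trips through JSONEncoder/JSONDecoder with its order intact, because the payload is a JSON array rather than an object. The keys below are arbitrary and deliberately not in alphabetical order.

```swift
import Foundation
import OrderedCollections

// Keys chosen so that insertion order differs from sorted order.
let original: OrderedDictionary<String, Int> = ["zebra": 1, "apple": 2, "mango": 3]

// Encodes through an unkeyed container, i.e. a JSON array of alternating keys and values.
let roundTripData = try! JSONEncoder().encode(original)

// Decoding reads the array back in sequence, so the order survives.
let restored = try! JSONDecoder().decode(OrderedDictionary<String, Int>.self, from: roundTripData)
print(Array(restored.keys) == Array(original.keys)) // => true
```

This only holds as long as nothing between the encode and the decode rewrites the payload.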

This is actually possible today, without changing the encoded representation of OrderedDictionary at all:

  1. OrderedDictionary → JSON → OrderedDictionary is obviously possible
  2. OrderedDictionary → JSON → Dictionary is already possible: Dictionary already knows* how to decode from an array of key-value pairs, in the same way that OrderedDictionary encodes
  3. Dictionary → JSON → OrderedDictionary is also possible, by having OrderedDictionary.init(from:) check for a keyed container, and if one is present, decode from that (i.e., the reverse of what Dictionary does). The caveat here is that OrderedDictionary has no choice but to trust that the order of the keys given by the keyed container is the same as in the original data, which currently is not the case (and in general, decoders for a variety of formats can't make that promise), but for JSONDecoder specifically, it could be

*The tricky thing is that at the moment, if you try to decode a dictionary keyed by either String or Int, Dictionary assumes that you must have a keyed container representation, so you'll get an error if you try to decode OrderedDictionary<String, ...> as Dictionary<String, ...>, but this can be loosened. You can also work around this by using a wrapper key type that encodes/decodes as a String (even a RawRepresentable type should work), so this does work:

import Foundation
import OrderedCollections

let orderedDict: OrderedDictionary = ["one": 1, "two": 2, "three": 3]
print(orderedDict) // => [one: 1, two: 2, three: 3]

let encoded = try! JSONEncoder().encode(orderedDict)
print(String(data: encoded, encoding: .utf8)!) // => ["one",1,"two",2,"three",3]

struct Key: RawRepresentable, Hashable, Decodable {
    let rawValue: String
}

let decoded = try! JSONDecoder().decode([Key: Int].self, from: encoded)
print(decoded) // => [Key(rawValue: "three"): 3, Key(rawValue: "one"): 1, Key(rawValue: "two"): 2]

Also keep in mind that changing the encoded representation of a type isn't something you can do loosely (or in some cases, at all). Once the encoded representation of a type is public, that representation can be written out to disk and persisted indefinitely; changing the representation means that older software versions (the app/package/whatever), unless written in a very forward-thinking manner, will choke on the new data format.

The bar for breaking backwards compatibility in data serialization (in the general case, at least) is exceedingly high, because you risk throwing away support for otherwise perfectly valid software which can't necessarily be updated.

Maybe we are missing an abstraction here, like "SortedKeyedContainer". I see your point about how we got to what we have; it's just that the end result doesn't look ideal.

Please clarify: is it possible to implement this in the standard library, just not implemented now, so that it could be considered in the future? Or do you mean something else?

Heh, "String" key was the thing I checked, so I didn't notice it could work.

OTOH, it does feel somewhat wrong that I can encode an array: ["foo", "1", "bar", "2"] into JSON and read it back as a Dictionary. I'd never assume that's possible, and would have considered it a bug if you hadn't told me it's by design.

That's why I said:

IMHO, if we have facilities like changing the key names completely (or the key capitalisation between snake/camel case), we could as well have this other option of what container to choose.

Well, passing dictionary formatted key value pairs through a text JSON file and back would preserve the order, but it of course would be naïve to assume that passing it through an arbitrary "service" that could change the order by re-encoding JSON would keep the order intact. That's expected. That's where the option of array serialisation would be helpful (be it the default or the opt-in behaviour).

What I mean is that it's possible to update OrderedDictionary.init(from:) to try to fetch an unkeyed container, as it does today — if one is present, then it can decode with the same approach that it already does. However, if attempting to fetch an unkeyed container fails due to a type mismatch, it can attempt to fetch a keyed container instead, and attempt to decode from the keyed container (like Dictionary currently does).

Because OrderedDictionary lives in swift-collections, no update to the standard library is necessary; it would just be a package update.
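To make the fallback concrete, here is a rough sketch of that logic written as a separate wrapper type, since I can't replace OrderedDictionary.init(from:) from outside the package. `LenientOrderedDictionary` and `AnyStringKey` are hypothetical names, keys are restricted to String for brevity, and this is not how swift-collections actually implements anything.

```swift
import Foundation
import OrderedCollections

// Hypothetical coding key that accepts any string, used to read arbitrary object keys.
struct AnyStringKey: CodingKey {
    let stringValue: String
    var intValue: Int? { nil }
    init?(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
}

// Hypothetical wrapper: tries the unkeyed form first, then falls back to a keyed container.
struct LenientOrderedDictionary<Value: Decodable>: Decodable {
    var storage: OrderedDictionary<String, Value>

    init(from decoder: Decoder) throws {
        var result = OrderedDictionary<String, Value>()
        if var unkeyed = try? decoder.unkeyedContainer() {
            // Same layout OrderedDictionary encodes today: key, value, key, value, ...
            while !unkeyed.isAtEnd {
                let key = try unkeyed.decode(String.self)
                let value = try unkeyed.decode(Value.self)
                result[key] = value
            }
        } else {
            // Fallback: a keyed container, as Dictionary<String, Value> would use.
            // The order of allKeys is whatever the decoder reports; nothing here
            // guarantees it matches the order in the original payload.
            let keyed = try decoder.container(keyedBy: AnyStringKey.self)
            for key in keyed.allKeys {
                result[key.stringValue] = try keyed.decode(Value.self, forKey: key)
            }
        }
        storage = result
    }
}

let fromArray = try! JSONDecoder().decode(
    LenientOrderedDictionary<Int>.self, from: Data(#"["one",1,"two",2]"#.utf8))
let fromObject = try! JSONDecoder().decode(
    LenientOrderedDictionary<Int>.self, from: Data(#"{"one":1}"#.utf8))
print(fromArray.storage.count, fromObject.storage.count)
```

The array path keeps its order; the object path inherits whatever order the decoder hands back, which is exactly the caveat mentioned earlier.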

Indeed, I just wanted to reinforce that.

Sure, encoders and decoders can offer this option! Nothing stopping that — though it's outside the purview of the standard library. Foundation can choose to offer toggles for what to do with ordered collections on JSONEncoder/JSONDecoder (though it does get a bit tricky, because Foundation can't import swift-collections to know about OrderedDictionary, so some protocol conformances might be necessary to make this happen).

Assertions about naïveté aside, the definition of "service" here can be pretty wide. Passing the data to older versions of your app using an stdlib/Foundation that doesn't support ordered containers, passing your ordered dictionary to any encoder/decoder that doesn't care to preserve order, storing ordered data on disk but later attempting to manipulate it with a tool that doesn't support preserving ordering — all are implicit ways of unexpectedly losing ordering without necessarily being aware of it.

It's not a bad thing, necessarily, and it doesn't at all mean that maintaining ordering during decoding shouldn't be supported, but it does mean you need to be very careful: you can only really assume your OrderedDictionary means anything if you know that you wrote the data out with ordered keys and are reading it back as-is, without anything else having touched it. Pretty much any other scenario is up for grabs.

I'd love to reopen this discussion if possible. As a maintainer of the server-side GraphQL implementation (which does have a requirement for specific ordering of encoded JSON fields), I'd love to have a more standard solution. Currently, our solution has involved copying a large portion of the JSONEncoder code to avoid any usage of unordered dictionaries, which has obvious maintainability problems.

If it's too late to change the current default behaviour maybe we can opt in by specifying an option on encoder/decoder object similar to how we do it with key decoding strategy and output formatting.

Is this seen as the most viable path forward for control over encoding order? I'd be happy to contribute if so. Or do you see alternatives for my use case?
