Decoding different versions of content structure

Stiivi · November 18, 2024, 6:03pm

I need to decode a JSON structure which might be present in different versions. The decoding is based on the version – different key names, different structures or objects present within.

The source data is assumed to be a dictionary which might contain a version number that determines how the rest of the decoding should happen. If the top-level dictionary does not have the version key, then some version, very likely the most recent one, is assumed and used.

The way I am doing it now is by using the sparsely documented protocol DecodableWithConfiguration. The documentation says that it is "used for types that require additional static information". I might be abusing it, since I set a value within the configuration while decoding (see below). I am not sure what the consequences might be.

Here follows a minimal example where just one value of an item is version-dependent.

The top level container with optional version key:

import Foundation

struct ThingContainer: DecodableWithConfiguration {
    public class DecodingConfiguration {
        var version: String
        init(version: String) {
            self.version = version
        }
    }

    let things: [Thing]
    
    enum CodingKeys: String, CodingKey {
        case version
        case things
    }
    
    public init(from decoder: any Decoder, configuration: DecodingConfiguration) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)

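        // A version found in the document overrides the one held by the
        // shared (mutable) configuration object.
        if let version = try container.decodeIfPresent(String.self, forKey: .version) {
            configuration.version = version
        }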

        things = try container.decode([Thing].self, forKey: .things, configuration: configuration)
    }
}

The item is version dependent: in the (fictional) past the value was encoded under the key value, while the current version uses the key number:

struct Thing: DecodableWithConfiguration {
    typealias DecodingConfiguration = ThingContainer.DecodingConfiguration
    
    let value: Int
    
    enum CodingKeys: String, CodingKey {
        case obsoleteValue = "value"
        case value = "number"
    }

    public init(from decoder: any Decoder, configuration: DecodingConfiguration) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        switch configuration.version {
        case "0": // old version
            value = try container.decode(Int.self, forKey: .obsoleteValue)
        default: // current and default
            value = try container.decode(Int.self, forKey: .value)
        }
    }
}

Example use:

let config = ThingContainer.DecodingConfiguration(version: "0")
let decoder = JSONDecoder()

let oldVersion = """
{
    "version": "0",
    "things": [ {"value": 10, "number": 20} ]
}
"""
let oldData = oldVersion.data(using: .utf8)!
let oldContainer = try decoder.decode(ThingContainer.self, from: oldData, configuration: config)
print("Old version value: \(oldContainer.things[0].value)")

let newVersion = """
{
    "version": "1",
    "things": [ {"value": 10, "number": 20} ]
}
"""
let newData = newVersion.data(using: .utf8)!
let newContainer = try decoder.decode(ThingContainer.self, from: newData, configuration: config)
print("New version value: \(newContainer.things[0].value)")

I am currently using the above; however, I wonder whether it is the designated way. If not, what are the potential issues with the above code? What would be some other Swift way to do it?

QuinceyMorris · November 18, 2024, 11:56pm

IDK, but it seems like DecodingConfiguration exists as supporting machinery for the CodableConfiguration property wrapper. However, your independent use of DecodingConfiguration seems uncontroversial.

I think you could do much the same thing with the userInfo property on the decoder, though with the same drawback — you're passing around an immutable collection value, which means you have to put a mutable object reference inside it if you want to change the controlling value during decoding.
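A minimal sketch of that userInfo variant, for illustration only. The VersionBox class and the versionBox key are hypothetical names made up for this sketch; the JSON shape is the one from the original post.

```swift
import Foundation

// Hypothetical mutable box: userInfo is immutable during decoding,
// so the changing value has to live behind a reference.
final class VersionBox {
    var version: String
    init(version: String) { self.version = version }
}

extension CodingUserInfoKey {
    // Hypothetical key name chosen for this sketch.
    static let versionBox = CodingUserInfoKey(rawValue: "versionBox")!
}

struct Thing: Decodable {
    let value: Int

    enum CodingKeys: String, CodingKey {
        case obsoleteValue = "value"
        case value = "number"
    }

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let box = decoder.userInfo[.versionBox] as? VersionBox
        if box?.version == "0" {
            // Old version: the value is stored under "value".
            value = try container.decode(Int.self, forKey: .obsoleteValue)
        } else {
            // Current and default: the value is stored under "number".
            value = try container.decode(Int.self, forKey: .value)
        }
    }
}

struct ThingContainer: Decodable {
    let things: [Thing]

    enum CodingKeys: String, CodingKey {
        case version, things
    }

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Record the document's version in the shared box, if both exist.
        if let version = try container.decodeIfPresent(String.self, forKey: .version),
           let box = decoder.userInfo[.versionBox] as? VersionBox {
            box.version = version
        }
        things = try container.decode([Thing].self, forKey: .things)
    }
}

let decoder = JSONDecoder()
decoder.userInfo[.versionBox] = VersionBox(version: "1")
let json = #"{"version": "0", "things": [{"value": 10, "number": 20}]}"#
let decoded = try decoder.decode(ThingContainer.self, from: Data(json.utf8))
print(decoded.things[0].value)
```

Because the box is a reference stored on the decoder, a version written by one document is still there for the next one. Installing a fresh box per decode is probably the safer habit.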

Personally, I prefer not to be too clever with stuff like this. (I'm the most likely person to get hurt if I try.) You could take 2 passes at it: decode once to get the version number (using a custom struct whose only member is version) and then a second decode using either a ThingV1 or a ThingV2 struct. That approach at least tends to let you use synthesized decoding, which has some advantages over being stuck with all the boilerplate in the code you posted.
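A rough sketch of the two-pass idea, assuming the JSON shape from the original post. VersionProbe, ThingV0, ThingV1, ContainerV0, ContainerV1 and decodeValues are all hypothetical names, numbered here to match the "0" and "1" version strings.

```swift
import Foundation

// First pass: read only the optional version key, ignore everything else.
struct VersionProbe: Decodable {
    let version: String?
}

// Second pass: one plain struct per version, all with synthesized decoding.
struct ThingV0: Decodable { let value: Int }
struct ThingV1: Decodable { let number: Int }

struct ContainerV0: Decodable { let things: [ThingV0] }
struct ContainerV1: Decodable { let things: [ThingV1] }

// Returns the version-dependent integer of every thing in the document.
func decodeValues(from data: Data) throws -> [Int] {
    let decoder = JSONDecoder()
    let probe = try decoder.decode(VersionProbe.self, from: data)
    if probe.version == "0" {
        return try decoder.decode(ContainerV0.self, from: data).things.map(\.value)
    } else {
        // Missing or unknown version: assume the current format.
        return try decoder.decode(ContainerV1.self, from: data).things.map(\.number)
    }
}

let oldData = Data(#"{"version": "0", "things": [{"value": 10, "number": 20}]}"#.utf8)
let newData = Data(#"{"things": [{"value": 10, "number": 20}]}"#.utf8)
let oldValues = try decodeValues(from: oldData)
let newValues = try decodeValues(from: newData)
print(oldValues, newValues)
```

The cost is parsing the data twice; in exchange, none of the per-version types needs a hand-written init(from:).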

Stiivi · November 21, 2024, 8:22am

Having a "decodable struct per version" seems reasonable, especially from a cleanliness perspective. I will consider it in my next refactoring.

My external data transformation flow is roughly:

Foreign data (various forms and their versions) → Codable representation of native yet still imprecise foreign data (with some ForeignThing protocol) → transformation from ForeignThing to the actual domain data model.

The "native yet still imprecise" representation is something that can exist as struct-per-version, as you suggested.
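Roughly, that middle stage could look like the sketch below. Only the ForeignThing protocol name comes from my actual flow; makeThing(), the two per-version structs, importThings and the simplified Thing domain type are made up here for illustration.

```swift
import Foundation

// Simplified stand-in for the domain model.
struct Thing: Equatable {
    let value: Int
}

// Every per-version foreign representation knows how to become a Thing.
// The requirement is throwing so that validation can reject bad input.
protocol ForeignThing: Decodable {
    func makeThing() throws -> Thing
}

// Hypothetical old format: the value lives under "value".
struct ForeignThingV0: ForeignThing {
    let value: Int
    func makeThing() -> Thing { Thing(value: value) }
}

// Hypothetical current format: the value lives under "number".
struct ForeignThingV1: ForeignThing {
    let number: Int
    func makeThing() -> Thing { Thing(value: number) }
}

// Decode an array of one foreign representation and convert it
// to the domain model in a single step.
func importThings<F: ForeignThing>(_ type: F.Type, from data: Data) throws -> [Thing] {
    try JSONDecoder().decode([F].self, from: data).map { try $0.makeThing() }
}

let imported = try importThings(ForeignThingV1.self, from: Data(#"[{"number": 20}]"#.utf8))
print(imported)
```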

QuinceyMorris · November 21, 2024, 5:16pm

One thing to consider is how this gets more complex in the future. It's in the nature of such problems that "versioning" itself gets imprecise. For example, if the data contains a version number, some details depend on the version number, but some depend on other ad-hoc factors, and the factors themselves get intertwined. Eventually, any structured approach gets messier and messier.

At a certain point, it may become preferable to decode with JSONSerialization first, which produces an essentially unstructured collection of JSON data. JSONSerialization is really a much older Obj-C class (NSJSONSerialization), which means the values are Obj-C "property list" types. A comprehensive solution may involve establishing proper Swift equivalents of the JSON types.

If the complexity is great enough to go to that trouble, you then have infinite freedom in the "still imprecise foreign data" phase of your workflow.

Usually, it's not worth the trouble, but keep it in mind as a possibility.
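For what it's worth, a "Swift equivalent of the JSON types" can be sketched on top of Decodable alone, without JSONSerialization. JSONValue below is a hypothetical type, not something Foundation provides, and it stores every number as a Double, which loses precision for very large integers.

```swift
import Foundation

// Hypothetical untyped JSON tree.
enum JSONValue: Decodable, Equatable {
    case null
    case bool(Bool)
    case number(Double)
    case string(String)
    case array([JSONValue])
    case object([String: JSONValue])

    init(from decoder: any Decoder) throws {
        let container = try decoder.singleValueContainer()
        // Try each JSON type in turn; Bool is tried before Double so that
        // true/false are never mistaken for numbers.
        if container.decodeNil() {
            self = .null
        } else if let value = try? container.decode(Bool.self) {
            self = .bool(value)
        } else if let value = try? container.decode(Double.self) {
            self = .number(value)
        } else if let value = try? container.decode(String.self) {
            self = .string(value)
        } else if let value = try? container.decode([JSONValue].self) {
            self = .array(value)
        } else {
            self = .object(try container.decode([String: JSONValue].self))
        }
    }
}

let rawJSON = #"{"things": [{"value": 10}]}"#
let raw = try JSONDecoder().decode(JSONValue.self, from: Data(rawJSON.utf8))

// With the whole tree in hand, any ad-hoc rule can be applied,
// for example checking whether a version key exists at all.
if case .object(let top) = raw {
    print(top["version"] == nil ? "no version key" : "has version key")
}
```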

tera · November 22, 2024, 3:15am

For simple cases there's a simpler ad-hoc approach that doesn't require custom codable support:

struct Thing: Codable {
    var number: Int? // old
    var value: Int?  // new
}

extension Thing {
    var theNumber: Int { value ?? number! }
}

or a variation of this that uses a minimal amount of custom codable support to rename the key in question from "number" to "value" while leaving the user-facing property unchanged, so that existing code keeps compiling.
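One possible reading of that variation, as a sketch: the property keeps its old name, and a small hand-written initializer accepts either key. It is decode-only here (Decodable rather than Codable) to stay short; the key names follow the snippet above.

```swift
import Foundation

struct Thing: Decodable {
    var number: Int  // user-facing name stays the same

    enum CodingKeys: String, CodingKey {
        case number  // old key
        case value   // new key
    }

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Prefer the new key and fall back to the old one; if neither is
        // present, the second decode throws keyNotFound instead of crashing.
        number = try container.decodeIfPresent(Int.self, forKey: .value)
            ?? container.decode(Int.self, forKey: .number)
    }
}

let fromNew = try JSONDecoder().decode(Thing.self, from: Data(#"{"value": 5}"#.utf8))
let fromOld = try JSONDecoder().decode(Thing.self, from: Data(#"{"number": 7}"#.utf8))
print(fromNew.number, fromOld.number)
```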

Stiivi · November 22, 2024, 11:19am

I agree with what you said.

I considered JSONSerialization; that was the reason for this question. It is still not in the new open-source Swift Foundation, though (which is another constraint).

In the future I would like to add reading of a wider variety of JSON structures. Having a data warehousing and data quality background, I do not trust anything that is foreign. Sometimes added complexity is necessary, either to make the reading more tolerant (with sensible defaults and default transformations) or at least to give me and the user more precise error information with potential contextual/semantic hints. Raw JSON gives that possibility at the cost of added complexity.

Codable is great for structures that I have full control of, especially if I am their originator. It is also great for ephemeral data that lives just long enough to be moved around. It is not so great for something that is intended for long-term storage where the original application is expected to evolve, or when there might be different "foreign" applications producing the data.

However, reading foreign structures is a tangential topic to this post; I diverted slightly.