DeepCodable: Encode and decode deeply-nested data into flat Swift objects

MPLewis · July 22, 2022, 6:46am

I recently built DeepCodable, a package to encode and decode arbitrarily-nested data into flat Swift structs, by defining the coding paths with a result builder. I personally have been wanting something like this for a long time when interacting with third-party APIs, so I decided to build it.

As a concrete example, if you wanted to decode some nested JSON like this:

{
	"a": {
		"b": {
			"c": {
				"d1": {
					"e1": "Some value" 
				},
				"d2": {
					"e2": {
						"f2": "Other value"
					}
				}
			}
		}
	}
}

Your only option today with normal Codable (without writing a custom init(from:) implementation) looks something like this:

struct SomeObject: Codable {
	struct A: Codable {
		struct B: Codable {
			struct C: Codable {
				struct D1: Codable {
					let e1: String
				}
	
				struct D2: Codable {
					struct E2: Codable {
						let f2: String
					}
	
					let e2: E2
				}
	
				let d1: D1
				let d2: D2
			}
			
			let c: C
		}
	
		let b: B
	}
	
	let a: A
}

That's certainly expressive of the underlying structure of the data, but it's a lot of nested type definitions and instances created for what amounts to decoding two values. And, in my experience with data like this, you often want to pull the actual values into a flat object anyways, meaning you end up having to write another layer of types to translate between the two structures.
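For comparison, here is a rough sketch of what the hand-written init(from:) route might look like for the same JSON. The FlatObject type and the per-level key enums are made up for illustration; only the standard Codable container calls are real API.

```swift
import Foundation

// Hypothetical flat model that walks the intermediate keys by hand.
struct FlatObject: Decodable {
	let e1: String
	let f2: String

	// One CodingKey enum per nesting level (illustrative names).
	private enum RootKeys: String, CodingKey { case a }
	private enum AKeys: String, CodingKey { case b }
	private enum BKeys: String, CodingKey { case c }
	private enum CKeys: String, CodingKey { case d1, d2 }
	private enum D1Keys: String, CodingKey { case e1 }
	private enum D2Keys: String, CodingKey { case e2 }
	private enum E2Keys: String, CodingKey { case f2 }

	init(from decoder: Decoder) throws {
		// Descend a -> b -> c, then branch into d1 and d2.
		let root = try decoder.container(keyedBy: RootKeys.self)
		let a = try root.nestedContainer(keyedBy: AKeys.self, forKey: .a)
		let b = try a.nestedContainer(keyedBy: BKeys.self, forKey: .b)
		let c = try b.nestedContainer(keyedBy: CKeys.self, forKey: .c)

		let d1 = try c.nestedContainer(keyedBy: D1Keys.self, forKey: .d1)
		e1 = try d1.decode(String.self, forKey: .e1)

		let d2 = try c.nestedContainer(keyedBy: D2Keys.self, forKey: .d2)
		let e2 = try d2.nestedContainer(keyedBy: E2Keys.self, forKey: .e2)
		f2 = try e2.decode(String.self, forKey: .f2)
	}
}

let json = #"{"a":{"b":{"c":{"d1":{"e1":"Some value"},"d2":{"e2":{"f2":"Other value"}}}}}}"#
let flat = try JSONDecoder().decode(FlatObject.self, from: Data(json.utf8))
print(flat.e1, flat.f2)
```

It avoids the nested types, but trades them for a pile of key enums and container calls that have to be kept in sync with the payload by hand.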

With DeepCodable, you can instead write something like this:

struct SomeObject: DeepCodable {
	static let codingTree = CodingTree {
		Key("a") {
			Key("b") {
				Key("c") {
					Key("d1") {
						Key("e1", containing: \._e1)
					}
					
					Key("d2") {
						Key("e2") {
							Key("f2", containing: \._f2)
						}
					}
				}
			}
		}
	}

	@Value var e1: String
	@Value var f2: String
}

To me, this is a lot more expressive of how the object should map to its serialized version, and lets you work in Swift with the decoded values much more easily.

This topic has come up many times over the years, and all the proposed or existing solutions I could find tried to overload the meaning of . in a coding key to mean a sub-key, where dots can absolutely be part of valid literal keys in JSON (and many other formats). In addition, if you have multiple fields you want to encode/decode at the bottom of some deep hierarchy, you end up rewriting the intermediate paths multiple times instead of expressing your coding paths in a format closer to the actual structure.
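To illustrate the point about dots, here is a small sketch with a made-up payload whose key contains a literal dot. Plain Codable handles it as a single key, which is exactly what a "dot means sub-key" convention would misread.

```swift
import Foundation

// Hypothetical payload: the key really is the single string "app.version".
struct Settings: Decodable {
	let appVersion: String

	enum CodingKeys: String, CodingKey {
		// A dot in the raw value is just part of the key name.
		case appVersion = "app.version"
	}
}

let json = #"{"app.version": "1.2.3"}"#
let settings = try JSONDecoder().decode(Settings.self, from: Data(json.utf8))
print(settings.appVersion)
```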

Here are some of the key features of DeepCodable:

- Provides custom init(from:) and encode(to:) implementations
  - Compatible with all existing encoders and decoders, not just JSON
- Actual values get decoded using completely normal Codable semantics
  - If you're trying to decode a normal Codable object at the bottom (or another DeepCodable object!), it will just work
- Using just DeepDecodable or DeepEncodable doesn't interfere with the other direction of coding (Encodable or Decodable, respectively)
  - You can absolutely decode something from a deeply nested representation and then encode it right back to the normal flattened Encodable representation, or vice-versa
- Conformance can be added by an inheriting protocol by providing the codingTree static property
  - For instance, I built a GitHub GraphQL client on top of DeepCodable, which can generate both a GraphQL query string and the codingTree from a result builder representation of the GraphQL query structure, allowing automatic decoding of GraphQL responses

Anyways, would love to see what the community thinks, and get any feedback you might have!

tera · July 22, 2022, 11:37am

MPLewis:

struct SomeObject: Codable {
	struct A: Codable {
		struct B: Codable {
			struct C: Codable {
				struct D1: Codable {
					let e1: String
				}
	
				struct D2: Codable {
					struct E2: Codable {
						let f2: String
					}
	
					let e2: E2
				}
	
				let d1: D1
				let d2: D2
			}
			
			let c: C
		}
	
		let b: B
	}
	
	let a: A
}

To me that's a totally fine struct. As a minor point, I'd like Codable to be implicit ("if all fields are Codable, the struct is Codable") to reduce some visual noise.

I wonder what would happen to that machinery if e2 or f2 above, both or individually, were named e1 (which is a totally valid use case with names like "id", "name", "type", "value", etc).

On that point I'd like to see something like this in the future:

struct S: Codable {
    @json("a different name here") var field: Int
}

Gero · July 22, 2022, 12:09pm

I think that is not the point. Of course the struct is fine, but the question is "For what purpose do I define a type?" In the scenario @MPLewis lays out you do not need it anywhere in your code (if you do, sure, use regular Codable!). It's a result of an API wrapping the data you are interested in into other objects you are never interested in.

So with plain Codable, you basically define a type just to get the compiler to synthesize the necessary CodingKeys enum and init(from:) for you (as that's easier to do and read than writing a custom init(from:) and have a bunch of non-obvious CodingKey conforming private enums).
In the end it's basically just a trick to get some autogenerated code, you wouldn't define these types otherwise. Sure, they don't hurt (besides a standard SwiftLint rule chastising you for too much nesting), but they don't really do anything for you besides taking up lines of code.

I've personally worked with unreasonably nested API responses, which is why I definitely think the library has merit and is very useful, so kudos from me!

@MPLewis: I especially like that you seem to try to stay within "regular" Codable as much as possible and not touch it. I haven't had the time to look into your code, but I assume your result builder/property wrapper basically acts like what the compiler synthesizes for Codables, just respecting the nesting structure you define with the Keys, correct?
I think you could then in the future perhaps provide another initializer for the CodingTree type that takes an array of strings or some other "format language" (think of the NSLayoutConstraint format strings) to construct the tree/chain of Keys needed, right? It's just an idea, but depending on how your payload looks this would spare you the literal syntax tree (i.e. line breaks and indents) if you want to avoid it and/or bracket hell.

samdeane · July 22, 2022, 4:10pm

This looks cool - thanks for building it.

I haven't thought this through, but might it be possible to collapse groups that only exist in order to access deeper structures, without needing to interpret something as a separator, using variadic arguments?:

struct SomeObject: DeepCodable {
    static let codingTree = CodingTree {
        Key("a", "b", "c") {
            Key("d1", "e1", containing: \._e1)
            Key("d2", "e2", "f2", containing: \._f2)
        }
    }
    
    
    @Value var e1: String
    @Value var f2: String
}

MPLewis · July 22, 2022, 7:35pm

@Gero hit the nail on the head here - if the existing Codable nested-type definitions work for you and express the data you're working with well, that's absolutely fine. DeepCodable really shines when the serialized data representation doesn't match your Swift object models, and you want a way of expressing "I want a flat(ter) Swift object because of the semantics of the data I'm operating on in Swift, but I'm also bound to some external API specification and need a way to convert between the two".

Dealing with the concrete example, if you truly only cared about e1 and f2 and decided to go with the normal Codable implementation, your options for accessing those properties are:

- object.a.b.c.d1.e1 and object.a.b.c.d2.e2.f2
- Writing computed properties to forward reads and writes down to the underlying values
- Writing some other layer of translation to a flatter Swift object model (maybe another type that can initialize itself from a SomeObject instance?)

This all is an incredible amount of boilerplate that isn't very expressive of your Swift object model - instead, you're binding your Swift types to some third party's decisions that they made for entirely different reasons. With DeepCodable, you can simply bridge to and from these external representations at the serialization boundary, which feels more natural (to me at least) and removes a lot of the boilerplate you'd have to do otherwise.
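As a small illustration of the computed-property option, here is a sketch using a shortened, made-up payload (the Wrapped type and its key names are hypothetical, chosen only to keep the example brief):

```swift
import Foundation

// Hypothetical two-level payload: {"a": {"b": {"value": "..."}}}
struct Wrapped: Codable {
	struct A: Codable {
		struct B: Codable {
			var value: String
		}

		var b: B
	}

	var a: A

	// Forwarding property so callers can skip the intermediate levels.
	// Computed properties are not part of the synthesized CodingKeys.
	var value: String {
		get { a.b.value }
		set { a.b.value = newValue }
	}
}

let json = #"{"a": {"b": {"value": "Some value"}}}"#
var wrapped = try JSONDecoder().decode(Wrapped.self, from: Data(json.utf8))
let original = wrapped.value
wrapped.value = "Changed"
print(original, wrapped.a.b.value)
```

Every flattened value needs its own forwarding property like this, on top of the nested types that still have to exist.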

MPLewis · July 22, 2022, 7:40pm

I'm using entirely native Codable containers when traversing the serialized data, so this isn't a problem at all - identical key names that have different parent nodes are parsed just as they would be with normal decoding.

Effectively, this entire package boils down to manipulating the underlying Codable containers for all the intermediate keys so you don't have to, then passing back off to the normal encoding/decoding for the actual values.
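A minimal sketch of that idea, not DeepCodable's actual code: a string-backed CodingKey plus a helper that walks a path of keys through nested keyed containers. AnyKey, decodeValue, and the payload are all hypothetical names for illustration. Note how the two "id" keys are kept apart by their parents.

```swift
import Foundation

// Hypothetical CodingKey that can represent any string key.
struct AnyKey: CodingKey {
	let stringValue: String
	var intValue: Int? { nil }

	init(_ string: String) { stringValue = string }
	init?(stringValue: String) { self.stringValue = stringValue }
	init?(intValue: Int) { return nil }
}

// Walk every key but the last as a nested container, then hand the
// final key to the normal Decodable machinery.
func decodeValue<T: Decodable>(_ type: T.Type, at path: [String], from decoder: Decoder) throws -> T {
	precondition(!path.isEmpty, "path needs at least one key")

	var container = try decoder.container(keyedBy: AnyKey.self)
	for key in path.dropLast() {
		container = try container.nestedContainer(keyedBy: AnyKey.self, forKey: AnyKey(key))
	}

	return try container.decode(T.self, forKey: AnyKey(path[path.count - 1]))
}

struct Ids: Decodable {
	let userId: Int
	let teamId: Int

	init(from decoder: Decoder) throws {
		// Same leaf key name, different parents.
		userId = try decodeValue(Int.self, at: ["user", "id"], from: decoder)
		teamId = try decodeValue(Int.self, at: ["team", "id"], from: decoder)
	}
}

let json = #"{"user": {"id": 1}, "team": {"id": 2}}"#
let ids = try JSONDecoder().decode(Ids.self, from: Data(json.utf8))
print(ids.userId, ids.teamId)
```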

MPLewis · July 22, 2022, 7:59pm

Much appreciated! Nested API responses are exactly what finally tipped me over to building this.

Exactly! I'm basically just translating the codingTree you define into a series of KeyedDecodingContainer.nestedContainer(keyedBy:forKey:) calls (and the corresponding call for encoding), and then when I finally reach something that contains a value I pass right back off to the normal Codable implementation for the actual decoding. It means you can do things like this if you want to decode, say, a list of some more complicated object that just happens to be buried below a bunch of other keys:

struct OtherObject: DeepDecodable {
	static let codingTree = CodingTree {
		Key("a") {
			Key("b") {
				Key("c", containing: \._children)
			}
		}
	}

	struct Child: Decodable {
		struct YetAnotherChild: Decodable {
			let property: String
		}

		let id: Int
		let yetAnotherChild: YetAnotherChild
	}

	@Value var children: [Child]
}

Absolutely. I've actually already done this in my GitHub GraphQL client, though admittedly you're specifying the coding tree in a very similar format. But the exact same principles can be applied to translate from one specification format into the CodingTree expected by DeepCodable, and it will decode just fine - all that has to be done is emit a tree of coding nodes, and everything else is taken care of. I'll definitely take a look at NSLayoutConstraint format strings and see how easy it would be to tack on.

MPLewis · July 22, 2022, 8:06pm

@samdeane thanks for the suggestion, that's a great ergonomic improvement! And it will also be very simple to tack on, just need to add an initializer to create a series of nested nodes in that tree instead of just one.
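Roughly, such an initializer could fold the list of keys from the innermost outwards. The Node type below is a stand-in invented for this sketch, not the package's real node type:

```swift
// Hypothetical tree node: a key plus its child nodes.
struct Node: Equatable {
	let key: String
	let children: [Node]

	init(key: String, children: [Node]) {
		self.key = key
		self.children = children
	}

	// Variadic form: Node("a", "b", "c", children: [...]) builds a -> b -> c -> children.
	init(_ path: String..., children: [Node] = []) {
		precondition(!path.isEmpty, "at least one key is required")

		// Start at the innermost key and wrap it once per remaining key.
		var node = Node(key: path[path.count - 1], children: children)
		for key in path.dropLast().reversed() {
			node = Node(key: key, children: [node])
		}

		self = node
	}
}

let collapsed = Node("a", "b", "c", children: [Node("d1")])
let expanded = Node("a", children: [Node("b", children: [Node("c", children: [Node("d1")])])])
print(collapsed == expanded)
```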

One other thing I've been thinking about: in Swift 5.7 (specifically, with buildPartialBlock available) I think I can reduce the intermediate keys to raw string literals, like:

struct SomeObject: DeepCodable {
	static let codingTree = CodingTree {
		"a" {
			"b" {
				"c" {
					"d1" {
						Key("e1", containing: \._e1)
					}
					
					"d2" {
						"e2" {
							Key("f2", containing: \._f2)
						}
					}
				}
			}
		}
	}

	@Value var e1: String
	@Value var f2: String
}

That would also help shave off some of the boilerplate - I'll be playing with that and your suggestion next (and seeing if there's even some way to combine the two into a super minimal syntax) as 5.7 gets closer to landing.

davdroman · July 23, 2022, 10:16am

I wasn't aware of this possibility. What would the buildPartialBlock implementation look like in order to enable that?

MPLewis · July 25, 2022, 12:53am

I just did some digging using the 5.7 nightly image and it doesn't actually look like this will work, with the above simplified result builder not even passing syntax checking. My previous hope with buildPartialBlock (which was not based on any actual testing, just a hope from reading the proposal) was that writing something like this (completely in the abstract, not thinking about DeepCodable specifics):

Tree {
	"a" {
		"1"
		"2"
	}
}

would get transformed into a series of calls like:

1. buildExpression(_ string: String) -> Node, with input parameter "a"
2. buildExpression(_ string: String) -> Node, with input parameter "1"
3. buildExpression(_ string: String) -> Node, with input parameter "2"
4. buildExpression(@Builder _ builder: () -> [Node]) -> ChildContainerNode, with a builder representing nodes "1" and "2"
5. buildPartialBlock(accumulated: [Node], next: ChildContainerNode) to combine the result of steps 1 and 4

and then I'd be able to do some transformation to attach an instance of ChildContainerNode to the last Node instance, even if that ends up being a little painful internally.

However, compiling this gets me error: consecutive statements on a line must be separated by ';', so it doesn't look like result builder syntax is taken into account at the time I had hoped it would be during compilation.

The closest I seem to be able to compile is:

Tree {
	"a"; {
		"1"
		"2"
	}
}

and this works as I had expected, with me being able to attach the child nodes to the intended parent in buildPartialBlock.

Whether any of that would be a good idea even if it worked is another question entirely, since I'd now have to account for someone doing something like:

Tree {
	"a"
}

without ever specifying children, and I no longer would be able to enforce at compile-time in DeepCodable that someone provides either a target KeyPath or a result builder for children like I can now - those checks would have to be postponed until runtime.

Anyways, looks like my original idea won't be possible in 5.7 after all, though I'd love to hear if anyone with more result builder experience than me has any other thoughts.
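For anyone curious about the "attach children to the previous node" part, here is a rough sketch that should compile on Swift 5.7 or later. It swaps the bare trailing braces for an explicit children { } call, and Node, Children, TreeBuilder, children, and tree are all hypothetical names, so treat it as an illustration of buildPartialBlock rather than anything DeepCodable does.

```swift
// Hypothetical node type for the sketch.
struct Node: Equatable {
	var key: String
	var children: [Node] = []
}

// Marker meaning "these nodes belong under the previous key".
struct Children {
	var nodes: [Node]
}

@resultBuilder
enum TreeBuilder {
	static func buildExpression(_ key: String) -> Node { Node(key: key) }
	static func buildExpression(_ children: Children) -> Children { children }

	// A block has to start with a key, so `accumulated` is never empty below.
	static func buildPartialBlock(first: Node) -> [Node] { [first] }

	static func buildPartialBlock(accumulated: [Node], next: Node) -> [Node] {
		accumulated + [next]
	}

	// Attach the child nodes to the most recently added key.
	static func buildPartialBlock(accumulated: [Node], next: Children) -> [Node] {
		var nodes = accumulated
		nodes[nodes.count - 1].children += next.nodes
		return nodes
	}
}

func children(@TreeBuilder _ content: () -> [Node]) -> Children {
	Children(nodes: content())
}

func tree(@TreeBuilder _ content: () -> [Node]) -> [Node] {
	content()
}

let built = tree {
	"a"
	children {
		"1"
		"2"
	}
}
print(built)
```

Because there is no buildPartialBlock(first:) overload taking Children, a block that starts with children { } should be rejected at compile time, but a key with no children at all still slips through, which is the enforcement gap described above.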

MPLewis · July 26, 2022, 9:14am

@samdeane I've implemented your suggestion, thanks again!

Now it's possible to specify a coding tree for the original example I gave exactly like you had suggested:

struct SomeObject: DeepCodable {
    static let codingTree = CodingTree {
        Key("a", "b", "c") {
            Key("d1", "e1", containing: \._e1)
            Key("d2", "e2", "f2", containing: \._f2)
        }
    }
    
    
    @Value var e1: String
    @Value var f2: String
}

I've added test cases to cover this new syntax, but feel free to let me know if there are any issues.

Also bumped the requirements down to Swift 5.4, if anyone was trying to use this in older projects. I could potentially also support back to 5.1 using the older @_functionBuilder attribute, but that would require a little bit of surgery for availability checks - I'm happy to implement it if anyone wants it, but won't if there's not a demand for it.

samdeane · July 26, 2022, 9:56am

Superb - thanks!

I don't have an excuse to use it right now, but I'm sure I will before long, so will shout if I hit any snags :)

Gero · July 27, 2022, 7:04am

Btw, that's also pretty much what I meant. The format strings were just an example for a way to "shorthand" this (I also thought using an array of strings is better -> no magic string(part)s).
It's nice that now you can "compact" the tree (visually) a bit, but also have the option to "write it out" so it's easier to recognize if you have, for example, seen a pretty-formatted JSON payload before.

Atm I unfortunately have no use case for this (I am involved in the definition of our current internal API and I am its only client, so I get a say in avoiding unnecessarily deep structures), but I will mark it down for future use for sure!

MPLewis · August 2, 2022, 9:34am

Circling back to making the syntax even more minimal, I realized I could solve some problems with a custom operator, and have a minimal example working like this in Swift 5.6:

struct SomeObject: DeepCodable {
	static let codingTree = CodingTree {
		"a" => "b" => "c" => {
			"d1" => "e1" => \Self._e1
			"d2" => "e2" => "f2" => \Self._f2
		}
	}
  
	@Value var e1: String
	@Value var f2: String
}

One drawback that I haven't been able to work around yet is needing to explicitly specify Self in the key path, but I'll keep poking at it and see if there's something I can do about it. I don't think there is much hope though - currently, Key is a pre-specialized typealias to work around the same type inference issue, but with this minimal syntax I don't have much in the way of tools to give the compiler additional type context.

So, is this a net improvement over the more verbose syntax? It does cut down on some of the boilerplate, but I want to make sure clarity isn't lost in the process. Both styles can be mixed as well, if you really wanted to for whatever reason.

Also, does anyone have any thoughts on bikeshedding the actual operator? I picked => due to its precedent in PHP as a "maps to" operator, and because it was one of the few relatively short operators that hadn't already been defined with a different associativity than the one I needed (right).
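For context on the associativity point, a right-associative custom operator can be declared along these lines. This is a simplified sketch with made-up names (KeyMappingPrecedence, Node) that only chains string keys, without the key path or closure overloads:

```swift
// Right associativity makes "a" => "b" => "c" parse as "a" => ("b" => "c").
precedencegroup KeyMappingPrecedence {
	associativity: right
}

infix operator => : KeyMappingPrecedence

// Hypothetical node type for the sketch.
struct Node: Equatable {
	let key: String
	let children: [Node]
}

// Innermost link: two plain keys.
func => (key: String, leaf: String) -> Node {
	Node(key: key, children: [Node(key: leaf, children: [])])
}

// Outer links: a key wrapping an already-built node.
func => (key: String, child: Node) -> Node {
	Node(key: key, children: [child])
}

let chain = "a" => "b" => "c"
print(chain)
```

With left associativity the first step would be "a" => "b", producing a node that then has to be extended downwards, so building from the innermost pair outwards is the simpler shape.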

jjrscott · August 3, 2022, 10:40am

Bit of an aside, but I really feel you wrt this comment/requirement:

public protocol DeepDecodable: _DeepCodingTreeDefiner, Decodable {
	/**
	Initialize an instance with only default values.
	This empty initializer is required for decoding, as the implementation uses `KeyPath`s to fill in the instance's properties as it decodes the encoded tree.
	This also precludes the use of `let` properties as the values must be modifiable by the decoding implementation during tree traversal.
	*/
	init()
}

I appreciate that it needs to be that way (right now) but still

MPLewis · August 3, 2022, 6:39pm

Yeah, definitely wish there were a way to drop that requirement, somehow being able to provide key paths and their values to an initializer instead. I also considered autogenerating some of that code with something like Sourcery, but in order for that to be anywhere near a good experience there would have to be good SPM plugins available and that doesn’t seem to be something that’s on the horizon yet.

Luckily the property wrappers take care of hiding away a lot of the side effects of needing an empty initializer, and I’ve only put that requirement in place where absolutely necessary (decoding only, encoding can simply use normal properties/initializers).
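To show why the empty initializer comes up at all, here is a tiny sketch of filling an instance through key paths. DefaultInitializable, Flat, and fill are invented for the example; the real package's types differ.

```swift
// Hypothetical stand-in for the init() requirement.
protocol DefaultInitializable {
	init()
}

struct Flat: DefaultInitializable {
	// `var` with defaults: the values are overwritten after creation.
	var e1 = ""
	var f2 = ""
}

// An instance must exist before a key path can write into it,
// hence the empty initializer and the mutable properties.
func fill<Root: DefaultInitializable>(_ values: [(WritableKeyPath<Root, String>, String)]) -> Root {
	var instance = Root()
	for (keyPath, value) in values {
		instance[keyPath: keyPath] = value
	}
	return instance
}

let flat = fill([(\Flat.e1, "Some value"), (\Flat.f2, "Other value")])
print(flat.e1, flat.f2)
```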

timothycosta · August 11, 2022, 3:22am

Just my 2c but the previous Key("a", "b", "c") version was clearer.

Though I don't have an immediate use for this project, I love what you've done!