Serialization in Swift

Another case to study: PotentCodables. I think it is a well-written Swift package bridging many formats together, even CBOR. Kudos to the author.

My main gripe with Codable is that it tries to be a mediator of multiple serialization formats, which is a feature nobody asked for. Creating a single protocol to abstract the different formats away (so that, say, PropertyListDecoder and JSONDecoder share one interface) is a neat exercise, but in practice it's never relevant. It also limits how you can design the decoder (something highly specific to the format) and leads to worse performance (arguably the most important metric), something we've already seen with JSONDecoder.

For developers this means we ended up with a "Jack of all trades, master of none" solution, when we'd rather have an "Expert in one" for each domain. Keep in mind that many developers are coming from backgrounds in other languages where this is largely a solved problem, so we had expectations that Codable would be a step forward and not backward.

At this point I'm afraid further changes will make the problem worse through fragmentation. Is it possible for future iterations to exist as a standalone library? I'd also prefer Apple to continue investing in language features (like property wrappers, code synthesis, etc.) that allow the community to write better serializers.


Yes, it is very valuable to give specific examples of how a certain format or optimization cannot be (efficiently) implemented in Codable, or how some additional customization options could improve them. That is exactly what we are looking for. I just saw some comments while scrolling through the thread that sounded more aimed at shortcomings of specific codec implementations than at Codable itself, and that's what my comment was about.

Maybe this is a feature that would be useful for serialization evolution


Codable is something I’ve worked with extensively, both from a client API/library POV and working through custom (En/De)coders, so I have some thoughts 🙂

Here is, from my user perspective, what I like about the current Codable approach and how I think those strengths can be improved.


1. The ideal use case is declarative and simple to implement. - Adopt Codable and I’m ready to go!

Ways to improve this:

A. Enable custom serialization without customizing (En/De)codable implementations.
Something as simple as converting an Int to a String should be possible with a declarative interface rather than implementing the (en/de)code manually.

The most accessible API choice for this at the moment is likely Annotations. I’ve been able to achieve much of this functionality with PropertyWrappers in CodableWrappers. However, PropertyWrappers have limitations, a small-but-unavoidable performance cost, and affect the ergonomics of those properties.

The idealized version of this in my head would be user-customizable and tie in directly to the generated (En/De)codable implementation and handle things like converting between Types, handling empty/error cases more gracefully, using custom Date encodings, etc.
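A minimal sketch of the property wrapper route described above, assuming a hypothetical wrapper named StringBackedInt (it is not part of CodableWrappers or Foundation). It lets the synthesized Codable conformance read and write an Int that the payload stores as a string:

```swift
import Foundation

/// Hypothetical wrapper: exposes an Int, but reads and writes it as a string.
@propertyWrapper
struct StringBackedInt: Codable {
    var wrappedValue: Int

    init(wrappedValue: Int) {
        self.wrappedValue = wrappedValue
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        let text = try container.decode(String.self)
        guard let value = Int(text) else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "\"\(text)\" is not an integer")
        }
        wrappedValue = value
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(String(wrappedValue))
    }
}

// No hand-written init(from:) or encode(to:) is needed on the model itself.
struct SearchResult: Codable {
    @StringBackedInt var count: Int
}

// try! keeps the sketch short; production code should handle the error.
let searchJSON = Data(#"{"count":"42"}"#.utf8)
let decodedResult = try! JSONDecoder().decode(SearchResult.self, from: searchJSON)
let reencoded = try! JSONEncoder().encode(decodedResult)
```

The trade-offs mentioned above still apply: the wrapper is part of the property's declaration, so it shows up in the type's storage and every access goes through it.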

B. Add declarative support for additional real world APIs currently impossible to represent without (en/de)coding manually
Some APIs don't follow best practices. The simplest examples are a property that can be an Int or Bool, or a mixed array: [0, "7 Results", false].

I would prefer to not deal with bad formats like that, but they exist. Codable handling this gracefully would be a big plus. There are a number of ways this could be added, e.g. with protocols, custom serialization of a property, or for an entire Type.

Again my idealized version would involve hooking into the generated (En/De)codable implementation.
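For reference, this is roughly what handling the mixed array costs today. The sketch below assumes a made-up enum, LooseValue, that tries each candidate type in turn when used with JSONDecoder; other decoders may be more or less strict about which types they accept:

```swift
import Foundation

/// Hypothetical type for a JSON slot that may hold a Bool, an Int, or a String.
enum LooseValue: Decodable, Equatable {
    case bool(Bool)
    case int(Int)
    case string(String)

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        // Try each candidate type in turn; the first successful decode wins.
        if let value = try? container.decode(Bool.self) {
            self = .bool(value)
        } else if let value = try? container.decode(Int.self) {
            self = .int(value)
        } else if let value = try? container.decode(String.self) {
            self = .string(value)
        } else {
            throw DecodingError.dataCorruptedError(
                in: container,
                debugDescription: "Expected a Bool, Int, or String")
        }
    }
}

// try! keeps the sketch short; production code should handle the error.
let mixedJSON = Data(#"[0, "7 Results", false]"#.utf8)
let mixedValues = try! JSONDecoder().decode([LooseValue].self, from: mixedJSON)
```

All of this is boilerplate that a declarative option could, in principle, generate.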

C. Allow customizing Keys without overriding the entirety of CodingKeys
This is something I and others have been wanting since day one!

Again Annotations are a good option here. Simply enabling options like @CodingKey("MyCodingKey") and @SnakeCaseCodingKey would replace almost all the custom CodingKeys implementations I’ve written.

And again, my idealized version would tie in directly to the generated CodingKeys implementation.
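For comparison, the closest existing option is decoder-wide rather than per property: JSONDecoder.keyDecodingStrategy. A small sketch (the Profile type is made up for illustration):

```swift
import Foundation

struct Profile: Decodable {
    let firstName: String
    let lastName: String
}

let profileJSON = Data(#"{"first_name": "Ada", "last_name": "Lovelace"}"#.utf8)

let snakeCaseDecoder = JSONDecoder()
// Applies to every key this decoder sees, not to one chosen property.
snakeCaseDecoder.keyDecodingStrategy = .convertFromSnakeCase

// try! keeps the sketch short; production code should handle the error.
let profile = try! snakeCaseDecoder.decode(Profile.self, from: profileJSON)
```

This covers the snake_case situation, but a single arbitrary rename still requires writing out the full CodingKeys enum.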

2. It’s agnostic towards Format. Nothing extra is required to support JSON/YAML/etc.

Ways to improve this:

A. Support serialization formats that (En/De)coder currently cannot support without non-standardized additions
My main experience with this is XML Attributes but I’m sure there are others.

Adding this would likely require additional container kinds that the relevant Serializers respect and that simpler Serializers treat as an existing Container such as KeyedEncodingContainer. Annotations and Protocols could both be used to indicate this in a declarative way. This would depend highly on the implementation.

B. Make implementing a new (En/De)coder simpler.

As others have pointed out, understanding the full API needed to implement MyDataFormatEncoder is pretty opaque without just copy-pasting an existing implementation and even then it took a while for me to grasp how it all fits together. I’m optimistic a simpler abstraction can be built on top of the existing Types, but I don’t understand the problem space well enough to know if there may be a better solution.

C. Official support for more formats
There are currently only 2 shipping with Foundation: JSON and property lists.

Whether included in Foundation or blessed as an approved/maintained serializer, there is a lack of availability (and/or visibility) of additional options, or even what the requirements would be for such options.


Other Thoughts

In terms of performance, that’s something I thankfully haven’t had to be too concerned about, but I know it’s something that has caused some to move away from using Codable at all. My primary question in this context is whether the performance issue is inherent to the current approach of Codable or something the individual (En/De)coders can optimize in the current environment.

Regardless, I doubt it can be optimized to the point where low-level or resource-sensitive environments will be able to use it, due to the relatively high level of abstraction. In that context my vote would be for a new API (or new (En/De)coders) that is ok with sacrificing some of the safety and abstraction for raw performance.


Wrapping Up

Those are some of my high-level thoughts on the pain points I've hit. I'm considering throwing together a broader, more detailed writeup. If anyone’s interested in working on something like that, feel free to DM me! 🙂


The most painful point I have with Decodable is that it's all or nothing. If a single property out of 10 mismatches the JSON format, or needs a default value, or is nested in a sub-dictionary, you have to override the whole init(from: Decoder) and re-implement everything.
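A short sketch of that all-or-nothing effect, using a hypothetical Account model. Only isAdmin needs special treatment (a default value when the key is missing), yet every property ends up being decoded by hand:

```swift
import Foundation

struct Account: Decodable {
    let id: Int
    let name: String
    let email: String
    let isAdmin: Bool

    private enum CodingKeys: String, CodingKey {
        case id, name, email, isAdmin
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // These three lines only repeat what the compiler would have synthesized.
        id = try container.decode(Int.self, forKey: .id)
        name = try container.decode(String.self, forKey: .name)
        email = try container.decode(String.self, forKey: .email)
        // The one line that actually differs: fall back to false when absent.
        isAdmin = try container.decodeIfPresent(Bool.self, forKey: .isAdmin) ?? false
    }
}

// try! keeps the sketch short; production code should handle the error.
let accountJSON = Data(#"{"id": 1, "name": "Ada", "email": "ada@example.com"}"#.utf8)
let account = try! JSONDecoder().decode(Account.self, from: accountJSON)
```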

Like @anandabits or @GetSwifty above, I ended up setting up a way to declaratively annotate my properties to customise the way they are decoded using Decodable.

In fact, I recently open-sourced BackedCodable, a single property wrapper allowing per-property customisation, so maybe I can list all the requirements I had and implemented in this property wrapper, and you could consider all these features as things I'd like in Decodable by default.

1. Powerful key management
I want to be able to specify a key path to a property instead of a single key. With JSON responses following the JSON API spec, most of the time you have the "id" and the "type" at the payload root, but everything else is in an "attributes" payload. In my models I want all properties flattened into a single type.

Also, it's quite common to have different keys / key paths for the same property across different endpoints. Ex: zip_code and postal_code.
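Without a key path feature, the flattening has to be written by hand with nestedContainer(keyedBy:forKey:). A sketch under assumed names (the Article type and its fields are hypothetical, not taken from BackedCodable):

```swift
import Foundation

// Hypothetical JSON:API-style payload: "id" and "type" at the root,
// everything else under "attributes".
struct Article: Decodable {
    let id: String
    let kind: String
    let title: String

    private enum RootKeys: String, CodingKey {
        case id
        case kind = "type"
        case attributes
    }

    private enum AttributeKeys: String, CodingKey {
        case title
    }

    init(from decoder: Decoder) throws {
        let root = try decoder.container(keyedBy: RootKeys.self)
        id = try root.decode(String.self, forKey: .id)
        kind = try root.decode(String.self, forKey: .kind)
        // Step into the nested object and flatten its fields into this type.
        let attributes = try root.nestedContainer(keyedBy: AttributeKeys.self,
                                                  forKey: .attributes)
        title = try attributes.decode(String.self, forKey: .title)
    }
}

// try! keeps the sketch short; production code should handle the error.
let articleJSON = Data(#"{"id":"1","type":"articles","attributes":{"title":"Codable"}}"#.utf8)
let article = try! JSONDecoder().decode(Article.self, from: articleJSON)
```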

2. Per property customization
With BackedCodable, not only can I specify a custom key path for each property, but I can also provide:

  • a date decoding strategy
  • a default value
  • decoding options (ex: .lossy to allow lossy collections decoding)

3. Delegate Decoding Strategy
Some types don't have a single intrinsic representation. Dates and Data already support delegate decoding strategies, but we should be able to do that with any type. For example, UIColor can be encoded in many ways:

{
    "title_color": "DF5C43",
    "foreground_color": {
        "hue": 255,
        "saturation": 128,
        "brightness": 128
    },
    "background_color": {
        "red": 255,
        "green": 128,
        "blue": 128
    }
}

and again, I'd like to be able to choose a different decoder for each property.
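Today, supporting more than one of these shapes means the decision has to live inside the type's own init(from:) rather than on the property. A rough sketch, using a made-up RGBColor struct in place of UIColor so it runs anywhere, that accepts either the hex string or the red/green/blue object from the payload above:

```swift
import Foundation

struct RGBColor: Decodable, Equatable {
    var red: Int
    var green: Int
    var blue: Int

    private enum CodingKeys: String, CodingKey { case red, green, blue }

    init(red: Int, green: Int, blue: Int) {
        self.red = red
        self.green = green
        self.blue = blue
    }

    init(from decoder: Decoder) throws {
        // Representation 1: a six-digit hex string such as "DF5C43".
        if let single = try? decoder.singleValueContainer(),
           let hex = try? single.decode(String.self) {
            guard hex.count == 6, let packed = UInt32(hex, radix: 16) else {
                throw DecodingError.dataCorruptedError(
                    in: single,
                    debugDescription: "Expected a six-digit hex string")
            }
            self.init(red: Int(packed >> 16) & 0xFF,
                      green: Int(packed >> 8) & 0xFF,
                      blue: Int(packed) & 0xFF)
            return
        }
        // Representation 2: an object with "red", "green" and "blue" keys.
        let keyed = try decoder.container(keyedBy: CodingKeys.self)
        self.init(red: try keyed.decode(Int.self, forKey: .red),
                  green: try keyed.decode(Int.self, forKey: .green),
                  blue: try keyed.decode(Int.self, forKey: .blue))
    }
}

struct Theme: Decodable {
    let titleColor: RGBColor
    let backgroundColor: RGBColor

    private enum CodingKeys: String, CodingKey {
        case titleColor = "title_color"
        case backgroundColor = "background_color"
    }
}

let themeJSON = Data(#"""
{
    "title_color": "DF5C43",
    "background_color": { "red": 255, "green": 128, "blue": 128 }
}
"""#.utf8)
// try! keeps the sketch short; production code should handle the error.
let theme = try! JSONDecoder().decode(Theme.self, from: themeJSON)
```

Note that this makes the type guess the representation at runtime, which is not the same as choosing a strategy per property; it only shows what is possible without extra machinery.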

That said, I agree with what has been said above: I don't think the property wrapper is the best solution - at least without additional features as I had to accept some trade-offs.

You can see all these features in action in this file


My use case is 3D games. I currently use Codable with the JSON serializer to import assets at runtime, but it’s too slow.
3D geometry ends up with bazillions of keys. There’s way too much work happening and I already know the type of data at compile time, because the same code encoded it.

I’d like to see more speed for high confidence data, like assets shipped with an application. A simple hash should be enough to maintain safety at runtime.

If Codable is kept and expanded, we should also have a binary serializer included with Swift. I need to save/load compact data that is verified at compile time. Doing this seems just as important as JSON, though I can see the argument being extended to XML and more, so maybe that's not the right solution.


JSON seems to be well suited for basic class/struct layouts that contain Integers, Strings, and Dates, where type doesn't have strong prominence, or at the very least where the types of the variables are exactly what will be encoded and decoded, with very little leniency. It's difficult, for example, to encode or decode 'Any' or an Array of Codables with varying types. My thesis: JSON is the wrong format to be focusing on.

Swift is strongly typed, JSON is not. We keep trying to figure out ways to make JSON fit in a Swift world -- and it almost works fine. As I have been moving from XML (which is way more expressive) to JSON which is more compact and less expressive, I have found it frustrating that there isn't something in-between which allows for us to specify the type when it can't easily be assumed.

I have looked into the SION project, which allows far more expressiveness with expectations very similar to those we have for JSON and the same (similar) compact format, the difference being that Swift literals are used. I can imagine that SION could be extended only slightly to include type data.

SION is as compact as JSON but more expressive. The developer has also gone the distance and generated JavaScript, Go, Swift, Rust, and Python 3 implementations.

Here is an example of SION from the developer, Dan Kogai:

[
    "array": [
        nil,
        true,
        1,      // Int in decimal
        1.0,    // Double in decimal
        "one",
        [1],
        ["one" : 1.0]
    ],
    "bool": true,
    "data": .Data("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"),
    "date": .Date(0x0p+0),
    "dictionary": [
        "array" : [],
        "bool" : false,
        "double" : 0.0,
        "int" : 0,
        "nil" : nil,
        "object" : [:],
        "string" : ""
    ],
    "double": 0x1.518f5c28f5c29p+5, // double in hex
    "ext": .Ext("1NTU"),            // 0xd4,0xd4,0xd4
    "int": -0x2a,                   // int in hex
    "nil": nil,
    "string": "漢字、カタカナ、ひらがなの入ったstring😇",
    "url": "https://github.com/dankogai/"
]

SION's expressiveness comes from values and keys being enum cases. So if you want to specify that a string should be interpreted as data, you write .Data("FF888AA"). This could easily be extended to allow for more typed data such as objects: .Object(type: "MyClass", init: [10, 15, 32]) or something similar, maybe even .MyClass([10, 15, 32]) or MyClass([10, 15, 32]).

JSON is really not suited for the complex issues we are trying to solve, because we badly want the type information to be divorced from the data graph. That mostly works, but it can't always work, and it leaves us stuck in a very rigid world where we are trying to find clever ways to store everything in String literals.

I hope that we consider the possibility that JSON might not be the right format and that something like SION should be adopted.


The Swift community cannot unilaterally change what serialization formats are in use by the wider ecosystem. Should Swift find a way to support things like SION? Sure, absolutely. Should Swift stop focusing on the JSON use-case? Absolutely not. JSON is currently the overwhelmingly most common serialization format on the web. An order of magnitude more users will use JSONEncoder/JSONDecoder than anything else. That use-case should still remain an extremely high priority.


I agree with @Jon_Shier here. Something like property wrappers for easy coding customization is desirable, but property wrappers themselves leak implementation details by transforming the types of your Codable object and that is not desirable, IMO.

I think something like Rust's #[attribute] system would be more fitting for this kind of metaprogramming. It's possible that this kind of support for extensibility could be added to Swift's @attributes, but there are a lot of open questions about how that could work in general and how it would work with the existing @attributes and @_attributes.


There are up to 4 "pieces of code" related to each property:

  1. The property declaration itself (which gives you the name, and type)
  2. The coding key definition (which duplicates the property name, and gives you the coding key string value)
  3. The decoding logic in init(from decoder: Decoder) (which duplicates the coding key name)
  4. The encoding logic in encode(to encoder: Encoder) (which duplicates the coding key name)

These 4 things can be really far apart when you have lots of properties, or complex encoding/decoding logic.

This makes understanding a particular property more difficult, and I think it adds to the surprising ease with which you can accidentally write an encoding/decoding pair which can't correctly round-trip a value, as others have mentioned.

Simple Example

(Scroll right to see my professional illustration)

struct Person: Codable  {
    let firstName: String
    let lastName: String
    let age: Int

    private enum CodingKeys: String, CodingKey {
        case firstName = "first_name" // ---------------------------------------------+
        case lastName = "last_name"   // ---------------------------------------------|---+
        case age // ------------------------------------------------------------------|---|---+
    }                                                                          //     |   |   |
                                                                               //     |   |   |
    init(from decoder: Decoder) throws {                                       //     |   |   |
        let container = try decoder.container(keyedBy: CodingKeys.self)        //     |   |   |
        self.firstName = try container.decode(String.self, forKey: .firstName) // <---+   |   |
        self.lastName = try container.decode(String.self, forKey: .lastName)   // <---|---+   |
        self.age = try container.decode(Int.self, forKey: .age)                // <---|---|---+
    }                                                                          //     |   |   |
                                                                               //     |   |   |
    func encode(to encoder: Encoder) throws {                                  //     |   |   |
        var container = encoder.container(keyedBy: CodingKeys.self)            //     |   |   |
        try container.encode(firstName, forKey: .firstName) // <----------------------+   |   |
        try container.encode(lastName, forKey: .lastName)   // <--------------------------+   |
        try container.encode(age, forKey: .age)             // <------------------------------+
    }
}

I really dislike the date format as property wrapper approach. In Lambda we have used this a couple of times and the only benefit I can see is less code to maintain...

Reasons against this approach:

  1. The property wrapper demands that the property be declared as var, which is not what is intended here.
  2. The date encoding cannot be changed without a new major version, since the property wrapper is part of the public API (leaking implementation details).
  3. Every access of eventTime has the overhead of going through the property wrapper.

Maybe it would be interesting to describe the coding keys through a sort of Tree.


struct Coordinate {
    var latitude: Double
    var longitude: Double
    var elevation: Double
}

{
    "latitude" : 39.73915360,
    "longitude" : -104.98470340,
    "additionalInfo" : {
        "elevation" : 1608.637939453125
    }
    
}

Tree {
    Property("latitude", \.latitude)
    Property("longitude", \.longitude)
    Container("additionalInfo") {
        Property("elevation", \.elevation)
    }
}

This tree could be automatically synthesized and mutable, so it wouldn't be necessary to create it from scratch to change something.

// Synthesized Tree for Coordinate Struct
Tree {
    Property("latitude", \.latitude)
    Property("longitude", \.longitude)
    Property("elevation", \.elevation)
}


extension Coordinate {
    func codingTree(tree: Tree<Self>) -> Tree<Self> {
        tree
            .replace("elevation") {
                Container("additionalInfo") {
                    Property("elevation", \.elevation)
                }
            }
    }
}


With reference types Codable is simply broken.
Easy example:

//	THE ELEPHANT IN THE ROOM

import Foundation

class Ref : CustomStringConvertible, Codable {
	var	name : String
	
	init( name: String ) {
		self.name	= name
	}
	
	var description: String {
		return "\(name) (\(String(format: "%p",unsafeBitCast( self, to: Int.self ))))"
	}
}

let ref		= Ref( name: "Ref" )
let encoded	= [ ref,ref ]
let data	= try! JSONEncoder().encode( encoded )
let decoded	= try! JSONDecoder().decode( [Ref].self, from: data )

print("Serialized:\n\t\( encoded ) --> same reference? \(encoded[0] === encoded[1]) ")
print("Deserialized:\n\t\( decoded ) --> same reference? \(decoded[0] === decoded[1]) ")

/*
OUTPUT:

Serialized:
	[Ref (0x10059eb20), Ref (0x10059eb20)] --> same reference? true
Deserialized:
	[Ref (0x10059bc10), Ref (0x10059bb20)] --> same reference? false
Program ended with exit code: 0
*/

After having written a number of encoders/decoders, these are the main issues I keep coming across. Many of these have already been brought up.

  1. Difficulty of applying different encoding strategies for different properties. The one many people have brought up is the date formatting, but nobody seems to have mentioned encoding strategies for Collection objects. When encoding XML there are multiple ways to encode an array or a dictionary. Many include adding additional nodes with custom names.
    Like many other people I have gone down the property wrapper route to define these. I'm not overly keen on this route, but it seems to be the only way without completely complicating the encoder/decoder. Property wrappers do have one major limitation when used in this way, though: you can't use them to store instance information about how to decode a particular variable. While

@ISO8601DateCoder var date: Date

would decode fine,

@DateCoder(formatter: ISO8601DateFormatter()) var date: Date

would not, as the DateCoder property wrapper has not been initialized at decode time.

  2. Decoding is brittle. As soon as a decoder fails to find a value it throws an error. The ability to provide an alternative when no value is found would go a long way towards resolving this.

  3. Writing encoders/decoders is painful. Maybe it is part and parcel of the process, but I don't feel Codable is helping. Generally the process involves managing a temporary tree structure, separate from the encoding/decoding containers, which you throw away at the end. And then once you've written your basic encoder that covers most cases, you end up rewriting it so you can support nestedContainers.

  4. This is a much more specific point, but the core Swift types that use unkeyed containers, i.e. Array and Set, do not use the specialized decode/encode functions for base types. Decoding a [String] uses UnkeyedDecodingContainer.decode<T>(_: T.Type), which then creates a SingleValueDecodingContainer for each String, instead of using UnkeyedDecodingContainer.decode(_: String.Type). This is a real performance sink when decoding large arrays of Int, String, etc.
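To make the last point concrete: a type can bypass the generic path by pulling elements out of the unkeyed container itself with the String-specific overload. A sketch with a made-up wrapper type, StringList; whether it is measurably faster depends on the decoder implementation:

```swift
import Foundation

// Hypothetical wrapper around [String] that decodes its elements by hand.
struct StringList: Decodable {
    var values: [String]

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        var result: [String] = []
        if let count = container.count {
            result.reserveCapacity(count)
        }
        while !container.isAtEnd {
            // Calls the non-generic decode(_: String.Type) requirement directly.
            result.append(try container.decode(String.self))
        }
        values = result
    }
}

// try! keeps the sketch short; production code should handle the error.
let listJSON = Data(#"["a", "b", "c"]"#.utf8)
let list = try! JSONDecoder().decode(StringList.self, from: listJSON)
```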

The main improvement I'd like to see though is some form of declarative syntax to define how variables should be encoded/decoded. Property wrappers are not powerful enough to provide this at this point in time and possibly in the end are not the best route to choose to provide this given their impact on the runtime. A declarative syntax would go a long way to resolving points 1 and 2.


It's not broken; it's how most serialization libraries / formats work, and it is also nothing that Codable itself has any control over: it depends solely on the codec / format. In your example you are using JSON, which does not have any support for references, so the encoded data contains every value as a separate entry. There is simply no way for the decoder to figure out whether they should be the same reference.
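Printing the encoded payload makes this visible. A small sketch (the Label class is made up for illustration) that encodes the same instance twice:

```swift
import Foundation

final class Label: Codable {
    var name: String

    init(name: String) {
        self.name = name
    }
}

let shared = Label(name: "Ref")
// try! keeps the sketch short; production code should handle the error.
let payload = try! JSONEncoder().encode([shared, shared])
let payloadText = String(decoding: payload, as: UTF8.self)
print(payloadText) // [{"name":"Ref"},{"name":"Ref"}]
```

Nothing in the output says the two entries came from one object, so a decoder reading it back has no basis for sharing an instance.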


I wish there was a way to better handle encoding/decoding arrays of generic types. For instance, if I have a GenericObject class, and then SpecificAObject: GenericObject and SpecificBObject: GenericObject, and I encode an array declared as let objects: [GenericObject], then I get some nice JSON. But there's no easy way for me to decode that JSON back into the SpecificAObject and SpecificBObject classes. I would love a way to tell the decoder which of my subclasses to use when decoding.
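One common workaround, sketched here under assumptions: the payload carries a discriminator (a "kind" field, which the synthesized encoding does not produce on its own), and a hypothetical wrapper, TaggedObject, reads it to pick the subclass. The stored properties are invented for the example:

```swift
import Foundation

class GenericObject: Decodable {
    let name: String
}

final class SpecificAObject: GenericObject {
    let count: Int
    private enum CodingKeys: String, CodingKey { case count }

    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        count = try container.decode(Int.self, forKey: .count)
        try super.init(from: decoder)
    }
}

final class SpecificBObject: GenericObject {
    let flag: Bool
    private enum CodingKeys: String, CodingKey { case flag }

    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        flag = try container.decode(Bool.self, forKey: .flag)
        try super.init(from: decoder)
    }
}

/// Hypothetical wrapper that reads the "kind" tag and picks the subclass.
struct TaggedObject: Decodable {
    let object: GenericObject
    private enum CodingKeys: String, CodingKey { case kind }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let kind = try container.decode(String.self, forKey: .kind)
        switch kind {
        case "a": object = try SpecificAObject(from: decoder)
        case "b": object = try SpecificBObject(from: decoder)
        default:
            throw DecodingError.dataCorruptedError(
                forKey: .kind, in: container,
                debugDescription: "Unknown kind \(kind)")
        }
    }
}

let taggedJSON = Data(#"""
[{"kind": "a", "name": "first", "count": 3},
 {"kind": "b", "name": "second", "flag": true}]
"""#.utf8)
// try! keeps the sketch short; production code should handle the error.
let objects = try! JSONDecoder().decode([TaggedObject].self, from: taggedJSON)
    .map { $0.object }
```

The mapping from tag to subclass has to be maintained by hand, which is exactly the part a built-in mechanism would take over.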

I often hear the comparison "in Objective-C use NSCoding, in Swift use Codable", but it solves dramatically fewer problems than NSCoding, inheritance being one of them.

Thanks for your work, I'm excited to see where all this goes!


To date, Codable supports two formats (JSON and PropertyList) and both exhibit the same problem.
Therefore it is correct to say that Codable does not support reference types, except for the most trivial applications.
Give me a third format capable of storing reference graphs and I'll change my judgment.
Objective-C solved this problem 30 years ago.

Again, this is not an inherent property of Codable. Codable is an API abstraction for serialization. We are not talking about specific formats here, but about the abstraction and how to improve it. Nothing in Codable prevents a library from implementing support for references. Please keep this thread on topic.

Edit: Codable supports many more formats. The two you mention are only the ones available in Foundation, but there are many more available as third party libraries.

FWIW, Un/keyedEncodingContainer has encodeConditional for references. Though, frankly speaking, I haven't seen any (third-party) coders deduplicate the encoded object.
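A sketch of what calling it looks like, with a made-up Employee class. Per the standard library documentation, encoders that don't support conditional encoding fall back to encoding the object unconditionally, which is assumed here to be what JSONEncoder does:

```swift
import Foundation

final class Employee: Encodable {
    let name: String
    let manager: Employee?

    init(name: String, manager: Employee? = nil) {
        self.name = name
        self.manager = manager
    }

    private enum CodingKeys: String, CodingKey { case name, manager }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(name, forKey: .name)
        if let manager = manager {
            // A hint to the encoder: only write `manager` here if it is also
            // encoded unconditionally elsewhere in the payload. Encoders without
            // reference support treat this as a plain encode.
            try container.encodeConditional(manager, forKey: .manager)
        }
    }
}

let grace = Employee(name: "Grace")
let ada = Employee(name: "Ada", manager: grace)
// try! keeps the sketch short; production code should handle the error.
let employeeJSON = String(decoding: try! JSONEncoder().encode(ada), as: UTF8.self)
```

An encoder with real reference support would need to track object identity across the whole payload to honor the hint, which is the deduplication work mentioned above.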
