The future of serialization & deserialization APIs

Developers might be willing to update their code to a new API, but changing the encoding scheme itself will in many cases be out of the question. There is already a tonne of serialized data out there, and for web APIs there are programs in other languages that already support that scheme and aren't going to be rewritten to accommodate changes in Swift.

Before finalising the design, perhaps you could create a repo for public submissions of existing JSON encoding schemes (similar to the Swift compatibility suite) that this system will need to continue to support?

Also (and related), some kind of built-in support for schema updates and migrations (similar to Core Data/SwiftData) would be a great feature, as this is another pain point with Codable.

Even just a way to specify a default value for new non-optional properties would reduce a lot of the need for adding manual decoder implementations to apps in post-1.0 releases.
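For context, here is a minimal sketch of the boilerplate this currently requires with today's Codable. The `Bookmark` type and its fields are hypothetical: a property added in a later release forces a hand-written `init(from:)` just to supply a default when older payloads lack the key.

```swift
import Foundation

// Version 1 of this (hypothetical) model only had `name`; version 2 adds `isFavorite`.
struct Bookmark: Codable {
    var name: String
    var isFavorite: Bool

    enum CodingKeys: String, CodingKey {
        case name, isFavorite
    }

    // Declaring init(from:) in the type body removes the memberwise init, so restore one.
    init(name: String, isFavorite: Bool = false) {
        self.name = name
        self.isFavorite = isFavorite
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String.self, forKey: .name)
        // Data written before `isFavorite` existed has no such key: fall back to a default.
        isFavorite = try container.decodeIfPresent(Bool.self, forKey: .isFavorite) ?? false
    }
}

// A payload produced by the 1.0 release.
let oldPayload = Data(#"{"name":"Swift Forums"}"#.utf8)
let bookmark = try! JSONDecoder().decode(Bookmark.self, from: oldPayload)
```

`encode(to:)` is still synthesized here. Only decoding needs the manual code, and a declarative default would let the compiler generate it.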


Seconded.

The point that the format is not going to change is critical. This new approach really has to accept and work with the reality that a large percentage of the time we're consuming data generated by server-side APIs, and that data is not going to change to meet the needs of a new Swift library (Swift being only one of the server's clients).

Likewise, cyclical references and polymorphic fields are not unheard of.
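As an illustration of the polymorphic case, here is a sketch of how a server payload with a `"type"` discriminator is typically decoded with today's Codable. The payload shape and the `Shape` type are hypothetical: the discriminator is read first, then the matching fields.

```swift
import Foundation

// Hypothetical server payloads, distinguished by a "type" field:
//   {"type":"circle","radius":2}
//   {"type":"rectangle","width":3,"height":4}
enum Shape: Decodable, Equatable {
    case circle(radius: Double)
    case rectangle(width: Double, height: Double)

    private enum CodingKeys: String, CodingKey {
        case type, radius, width, height
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Read the discriminator first, then decode the fields that belong to that case.
        switch try container.decode(String.self, forKey: .type) {
        case "circle":
            self = .circle(radius: try container.decode(Double.self, forKey: .radius))
        case "rectangle":
            self = .rectangle(
                width: try container.decode(Double.self, forKey: .width),
                height: try container.decode(Double.self, forKey: .height))
        case let other:
            throw DecodingError.dataCorruptedError(
                forKey: .type, in: container,
                debugDescription: "Unknown shape type \(other)")
        }
    }
}

let shapesJSON = #"[{"type":"circle","radius":2},{"type":"rectangle","width":3,"height":4}]"#
let shapes = try! JSONDecoder().decode([Shape].self, from: Data(shapesJSON.utf8))
```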

The other points are good as well.

I hope the "foundation" tag indicates that it'll be an open-source implementation like swift-foundation, so that more developers can help find and fix bugs. That said, the idea of making it a separate package seems worth considering.


In my eyes, any new solution should support Swift's Embedded mode. I don't see any blockers to supporting that with this approach, but I feel it's worth mentioning.

While trying to create very similar macros previously, I also found it hard to emit the right compiler errors/warnings into the source code. In particular, nested types would (likely) also need to conform to your protocols, but right now a macro has no way to know whether members conform to certain protocols. That makes debugging these macros in user apps more painful.
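One partial workaround, sketched here under assumptions: the macro cannot check conformances itself, but its expansion can call a generic helper constrained to the protocol, so the type checker reports a missing conformance as an ordinary error. `FormatCodable` and `_requireFormatCodable` are hypothetical names. This only works when the member has an explicit type annotation (the macro sees syntax, not inferred types), and the diagnostic still points into the expansion, so it doesn't fully solve the problem.

```swift
// Hypothetical stand-in for the new format-agnostic coding protocol.
protocol FormatCodable {}

// Does nothing at runtime; calling it forces a compile-time conformance check on T.
func _requireFormatCodable<T: FormatCodable>(_: T.Type) {}

struct Inner: FormatCodable {}

struct Outer: FormatCodable {
    var inner: Inner

    // The kind of member a macro might emit: one check per stored property's declared type.
    // If `Inner` did not conform, this line would fail to type-check.
    static func _checkMemberConformances() {
        _requireFormatCodable(Inner.self)
    }
}
```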


So, just touching on zero-copy representations for a minute (e.g. FlatBuffers/Cap'n Proto): was that considered at all?

For example, we currently define our data models in Swift code and use that as the basis for generating FlatBuffers representations (the schema), then glue everything back together with codegen. That gives us an abstract representation that can be either a zero-copy object (e.g. from an mmap() of a read-only memory page, or directly from the network) or an actual Swift type (struct) when it's only used locally. FlatBuffers really buys us zero-copy access and the ability to evolve the format.

We do want to support zero-copy, and we do want to support evolving the serialised data format without breaking things. It seems this new support will give us the latter, but not the former?
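To make the distinction concrete, here is a minimal sketch of a zero-copy "view" versus an owned struct. It assumes a hypothetical fixed-offset, little-endian layout. This is not the FlatBuffers format, which uses offset tables to support evolution. The view's accessors read fields directly from the buffer on access, while `Record` copies them out.

```swift
// Hypothetical layout, little-endian:
//   bytes 0..<4  id    (UInt32)
//   bytes 4..<8  score (Int32)
// The view must not outlive the memory `bytes` points into.
struct RecordView {
    let bytes: UnsafeRawBufferPointer

    // Each access reads straight from the buffer; nothing is decoded up front.
    var id: UInt32 {
        UInt32(littleEndian: bytes.loadUnaligned(fromByteOffset: 0, as: UInt32.self))
    }

    var score: Int32 {
        Int32(littleEndian: bytes.loadUnaligned(fromByteOffset: 4, as: Int32.self))
    }
}

// The owned alternative: a plain Swift struct holding copies of the fields.
struct Record: Equatable {
    var id: UInt32
    var score: Int32
}

extension Record {
    // Materialize (copy) a view into an owned value when it must escape the buffer.
    init(_ view: RecordView) {
        self.init(id: view.id, score: view.score)
    }
}
```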

Also, what level of flexibility is being considered? Would parsing/generating arbitrary formats (with proper macro annotations), similar to what e.g. Kaitai Struct supports, be in scope and something you'd try to support? For example, if you have a defined wire format that is fixed-layout, or uses key/value pairs (but not JSON), is this something that should work?
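For a sense of what such annotations would need to generate, here is a hand-written sketch for a hypothetical non-JSON key/value wire format (`key=value` pairs separated by `;`). The format, `DeviceConfig`, and the error cases are all illustrative assumptions.

```swift
// Hypothetical wire format: "name=sensor-1;interval=30"
struct DeviceConfig: Equatable {
    var name: String
    var interval: Int
}

enum KVParseError: Error {
    case malformedPair(Substring)
    case missingKey(String)
    case invalidValue(key: String)
}

// Split the text into a dictionary of raw string fields.
func parseKeyValues(_ text: String) throws -> [String: String] {
    var result: [String: String] = [:]
    for pair in text.split(separator: ";") {
        // Split on the first "=" only, so values may themselves contain "=".
        guard let eq = pair.firstIndex(of: "=") else {
            throw KVParseError.malformedPair(pair)
        }
        result[String(pair[..<eq])] = String(pair[pair.index(after: eq)...])
    }
    return result
}

extension DeviceConfig {
    // The per-type mapping a macro-driven system would presumably generate.
    init(wireFormat text: String) throws {
        let fields = try parseKeyValues(text)
        guard let name = fields["name"] else { throw KVParseError.missingKey("name") }
        guard let rawInterval = fields["interval"] else { throw KVParseError.missingKey("interval") }
        guard let interval = Int(rawInterval) else { throw KVParseError.invalidValue(key: "interval") }
        self.init(name: name, interval: interval)
    }
}
```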

Overall, I'm just trying to understand how wide a range of ser/des tasks this is expected to cover.


As happens with Codable today, we need the stdlib to be able to conform to the format-agnostic protocols, which means it isn't possible for that to live in a package.

As for where the JSONCodable protocols live, that is still TBD. I definitely see the opportunity here to lower this below Foundation to allow broader adoption. On the other hand, there may be an argument to continue to allow clients to set encoder-/decoder-wide encoding strategies for Foundation-only Data and Date types which can't be expressed outside of Foundation.

Perhaps there's some kind of "layered" approach where the core Data/Date-ignorant encoder/decoder live in a package, and another one in Foundation wraps it to add this support for Data/Date.
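A rough sketch of what that layering could look like, with every name hypothetical: a core writer that knows only strings and numbers (and imports nothing), and a Foundation-level writer that wraps it and adds an encoder-wide `Date` strategy. In a real split, the two layers would live in different modules, and only the second would `import Foundation`.

```swift
import Foundation // needed only by the Foundation layer below

// Core layer (hypothetical): could live in a package below Foundation.
struct CoreJSONWriter {
    var output = ""

    mutating func writeString(_ s: String) {
        output += "\""
        for ch in s {
            // Only quotes and backslashes are escaped here; control characters are omitted for brevity.
            switch ch {
            case "\"": output += "\\\""
            case "\\": output += "\\\\"
            default: output.append(ch)
            }
        }
        output += "\""
    }

    mutating func writeNumber(_ n: Double) {
        output += String(n)
    }
}

// Foundation layer (hypothetical): wraps the core writer and adds an
// encoder-wide strategy for Date, a type the core layer cannot see.
enum DateEncodingStrategy {
    case secondsSince1970
    case iso8601
}

struct FoundationJSONWriter {
    var core = CoreJSONWriter()
    var dateStrategy: DateEncodingStrategy = .iso8601

    mutating func writeDate(_ date: Date) {
        switch dateStrategy {
        case .secondsSince1970:
            core.writeNumber(date.timeIntervalSince1970)
        case .iso8601:
            core.writeString(ISO8601DateFormatter().string(from: date))
        }
    }
}
```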

Excellent question. I'm trying to figure out if there's some way to express in a macro that an annotation should only apply if the type is not directly supported by the encoder being used. For instance, if the same struct happens to be used for both JSONCodable and PropertyListCodable, only JSON would use the @CodingFormat(.iso8601) annotation.

I'm still figuring out the full extent of capabilities of macros, so I'd love to hear ideas in this vein.

Your design already wants to have format specializations. I imagine that right now it looks like this:

extension Date: JSONCodable {
	init(container: JSONContainer) throws { ... }
	func encode(to container: inout JSONContainer) throws { ... }
}

Instead you could implement specializations as their own type, still specific to both a data type and a container type:

struct ISO8601DateCoder: JSONCodingSpecialization { // "YYYY-MM-DD"
	func encode(_ date: Date, into container: inout JSONContainer) throws { ... }
	func decode(from container: JSONContainer) throws -> Date { ... }
}

struct UnitedStatesDateCoder: JSONCodingSpecialization { // "MM-DD-YYYY"
	func encode(_ date: Date, into container: inout JSONContainer) throws { ... }
	func decode(from container: JSONContainer) throws -> Date { ... }
}

and then @CodingFormat could take in a specialization.
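Here is a runnable sketch of the specialization-as-a-type idea. `JSONContainer` is a hypothetical stand-in that holds a single string, and `JSONCodingSpecialization`, `PatternDateCoder`, and `encodeField` are illustrative names, not a proposed API. Each coder is its own type tied to `Date` and the container, and a generic function stands in for what code generated from `@CodingFormat(...)` might call.

```swift
import Foundation

// Hypothetical stand-in for the proposed JSON container: it holds a single string value.
struct JSONContainer {
    var string: String?
}

struct InvalidDateString: Error {}

// Each specialization is its own type, specific to one value type and one container type.
protocol JSONCodingSpecialization {
    associatedtype Value
    func encode(_ value: Value, into container: inout JSONContainer) throws
    func decode(from container: JSONContainer) throws -> Value
}

// Fixed-pattern, UTC, locale-independent formatter. Note "yyyy", not "YYYY":
// in DateFormatter patterns "YYYY" means the week-based year.
func makeDateFormatter(_ pattern: String) -> DateFormatter {
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.timeZone = TimeZone(identifier: "UTC")
    formatter.dateFormat = pattern
    return formatter
}

// Shared implementation for specializations that map Date <-> a patterned string.
protocol PatternDateCoder: JSONCodingSpecialization where Value == Date {
    var formatter: DateFormatter { get }
}

extension PatternDateCoder {
    func encode(_ value: Date, into container: inout JSONContainer) throws {
        container.string = formatter.string(from: value)
    }

    func decode(from container: JSONContainer) throws -> Date {
        guard let text = container.string, let date = formatter.date(from: text) else {
            throw InvalidDateString()
        }
        return date
    }
}

struct ISO8601DateCoder: PatternDateCoder { // "YYYY-MM-DD"
    typealias Value = Date
    let formatter = makeDateFormatter("yyyy-MM-dd")
}

struct UnitedStatesDateCoder: PatternDateCoder { // "MM-DD-YYYY"
    typealias Value = Date
    let formatter = makeDateFormatter("MM-dd-yyyy")
}

// Roughly what code generated for a field annotated with a specialization might call.
func encodeField<S: JSONCodingSpecialization>(_ value: S.Value, using specialization: S) throws -> JSONContainer {
    var container = JSONContainer()
    try specialization.encode(value, into: &container)
    return container
}
```

Because the specialization is a value of a concrete type, the choice stays static and generic, with no casting on `Date` at runtime.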

Yes, supporting embedded mode is a desirable goal here. As we go through this process of designing the APIs, I will make sure it builds properly in embedded mode and will request feedback from experts in that area to avoid any potential pitfalls.

The design is currently free from existentials, and my understanding is that the limited use of metatype parameters as type hints is indeed compatible. This case is one reason why it will be very important to have a design that avoids the need for dynamic type casting, which Foundation's JSON and property list coders currently rely on.
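To illustrate the distinction, here is a small sketch with a hypothetical `TextDecodable` protocol. A metatype parameter used purely as a type hint feeds generic inference, so the call is resolved statically, with no `Any`, no existential boxes, and no `as?` casts.

```swift
// Hypothetical protocol for a value that can be built from a textual field.
protocol TextDecodable {
    init?(text: String)
}

extension Int: TextDecodable {
    init?(text: String) { self.init(text) }
}

extension Bool: TextDecodable {
    init?(text: String) {
        switch text {
        case "true": self = true
        case "false": self = false
        default: return nil
        }
    }
}

// `type` is only a hint that fixes T at the call site; the body never inspects it at runtime.
func decodeField<T: TextDecodable>(_ type: T.Type, from text: String) -> T? {
    T(text: text)
}

// By contrast, a dynamic-casting design would look roughly like:
//   func decodeField(_ type: Any.Type, from text: String) -> Any? {
//       if type == Int.self { return Int(text) }
//       ...
//   }
```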


I would encourage considering how the design would work for encoding/decoding CBOR, and in particular how COSE extends CBOR for subobject (partial object) encryption.

CBOR features heavily in forthcoming IETF standards and it would be great if Swift were a best fit for modern internet protocols.
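For anyone unfamiliar with CBOR's structure, here is a minimal sketch of an encoder for a few data item types, following RFC 8949. Each item starts with an initial byte combining a 3-bit major type with 5 bits of additional information, optionally followed by a 1, 2, 4, or 8 byte big-endian argument. COSE is out of scope for this sketch, and the function names are illustrative.

```swift
// Subset of CBOR major types (RFC 8949, section 3.1).
enum CBORMajorType: UInt8 {
    case unsignedInteger = 0
    case textString = 3
    case map = 5
}

func bigEndianBytes<T: FixedWidthInteger>(_ value: T) -> [UInt8] {
    withUnsafeBytes(of: value.bigEndian) { Array($0) }
}

// Encode the head: major type in the top 3 bits, argument encoded as compactly as possible.
func cborHead(_ major: CBORMajorType, _ argument: UInt64) -> [UInt8] {
    let initialByte = major.rawValue << 5
    switch argument {
    case 0..<24:
        return [initialByte | UInt8(argument)]
    case 24...0xFF:
        return [initialByte | 24, UInt8(argument)]
    case 0x100...0xFFFF:
        return [initialByte | 25] + bigEndianBytes(UInt16(argument))
    case 0x1_0000...0xFFFF_FFFF:
        return [initialByte | 26] + bigEndianBytes(UInt32(argument))
    default:
        return [initialByte | 27] + bigEndianBytes(argument)
    }
}

func cborEncode(_ value: UInt64) -> [UInt8] {
    cborHead(.unsignedInteger, value)
}

// Text strings: head carries the UTF-8 byte length, followed by the bytes themselves.
func cborEncode(_ text: String) -> [UInt8] {
    let utf8 = Array(text.utf8)
    return cborHead(.textString, UInt64(utf8.count)) + utf8
}

// Map with text keys and unsigned values; head carries the number of pairs.
// Keys are emitted in the given order (no deterministic-encoding sorting).
func cborEncode(_ pairs: [(String, UInt64)]) -> [UInt8] {
    var out = cborHead(.map, UInt64(pairs.count))
    for (key, value) in pairs {
        out += cborEncode(key)
        out += cborEncode(value)
    }
    return out
}
```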