The future of serialization & deserialization APIs

Developers might be willing to update their code to a new API, but updating the encoding scheme itself will in many cases be out of the question: there is already a tonne of serialized data out there, and for web APIs there are programs in other languages that already support that scheme and aren't going to be rewritten to accommodate changes in Swift.

Before finalising the design, perhaps you could create a repo for public submissions of existing JSON encoding schemes - similar to the Swift compatibility suite - that this system will need to be able to continue to support?

Also (related) some kind of built-in support for schema updates and migrations (similar to CoreData/SwiftData) would be a great feature, as this is another pain point in Codable.

Even just a way to specify a default value for new non-optional properties would reduce a lot of the need for adding manual decoder implementations to apps in post-1.0 releases.
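For a purely illustrative sketch of what that pain looks like today with Codable, and what a hypothetical default-value annotation (the name is made up here) could replace:

// What adding a non-optional property forces today: a hand-written decoder so
// that payloads produced before the property existed still decode.
struct Settings: Codable {
	var theme: String
	var retryCount: Int // added after 1.0

	init(from decoder: Decoder) throws {
		let container = try decoder.container(keyedBy: CodingKeys.self)
		theme = try container.decode(String.self, forKey: .theme)
		retryCount = try container.decodeIfPresent(Int.self, forKey: .retryCount) ?? 3
	}
}

// A hypothetical per-property annotation could express the same intent declaratively:
//
//     @DefaultValue(3) var retryCount: Int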

11 Likes

Seconded.

The "format is not going to change" point is critical. This new approach really has to accept and work with the reality that a large percentage of the time we're consuming data generated by server-side APIs, and that data is not going to change to meet the needs of a new Swift library (which is only one of the server's clients).

Likewise, cyclical references and polymorphic fields are not unheard of.

The other points are good as well.

I hope the "foundation" tag indicates that it'll be an open source implementation like swift-foundation so that more developers can help find bugs and fix them. Though the idea of it being a separate package seems worthy of consideration.

1 Like

In my eyes, any new solution should support Swift's Embedded mode. I don't see any blockers in supporting that with this approach, but feel it's warranted to mention.

While trying to create very similar macros previously, I've also found it hard to emit the right compiler errors/warnings into the source code. In particular, if you have nested types, those would (likely) need to conform to your protocols as well, yet right now a macro has no way to know whether members conform to certain protocols. That makes debugging these macros in user apps more painful.

5 Likes

So just touching on zero-copy representations for a minute (e.g. flatbuffers/capnproto) - was that considered at all?

E.g. we are currently using an in-code Swift definition of our data models, which we use as a basis for generating flatbuffers representations (the schema), and we glue that back together with codegen. That gives us an abstract representation that can either be a zero-copy object (from e.g. an mmap() on a read-only memory page, or directly from the network) or an actual Swift type (struct) when just used locally. Flatbuffers really buys us zero-copy and evolution capability.
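(For readers unfamiliar with the idea, here is a minimal hand-rolled illustration of what "zero-copy" means, not the actual flatbuffers API: a view type whose properties read straight out of the underlying bytes on access, so nothing is decoded or copied up front.)

// Minimal illustration only; flatbuffers generates much richer accessors.
struct PointView {
	let bytes: UnsafeRawBufferPointer   // e.g. an mmap()'d read-only region

	var x: Double { bytes.loadUnaligned(fromByteOffset: 0, as: Double.self) }
	var y: Double { bytes.loadUnaligned(fromByteOffset: 8, as: Double.self) }
}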

We do want to support zero-copy and we do want to support evolution of the serialised data format without breaking things - it seems this new support will give us the latter, but not the former?

Also, what level of flexibility is envisioned? Would parsing/generating arbitrary formats (with proper macro annotations), similar to what e.g. Kaitai Struct supports, be in scope and something this would try to support? E.g. if you have a defined wire format that is, say, fixed-layout or key/value-based (but not JSON), is that something that should work?

Overall, I'm just trying to understand how wide a range of ser/des tasks is expected to be covered.

5 Likes

As happens with Codable today, we need the stdlib to be able to conform to the format-agnostic protocols, which means it isn't possible for that to live in a package.

As for where the JSONCodable protocols live, that is still TBD. I definitely see the opportunity here to lower this below Foundation to allow broader adoption. On the other hand, there may be an argument to continue to allow clients to set encoder-/decoder-wide encoding strategies for Foundation-only Data and Date types which can't be expressed outside of Foundation.

Perhaps there's some kind of "layered" approach where the core Data/Date-ignorant encoder/decoder live in a package, and another one in Foundation wraps it to add this support for Data/Date.
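Roughly, and with every name below being a placeholder rather than proposed API, the layering might look like:

import Foundation

// Placeholder names throughout; this only sketches the layering idea.
protocol CoreJSONEncodable {}           // Date/Data-ignorant, lives in a package

struct CoreJSONEncoder {                // knows nothing about Foundation types
	func encode(_ value: some CoreJSONEncodable) throws -> [UInt8] { [] }
}

struct FoundationJSONEncoder {          // lives in (swift-)foundation
	var core = CoreJSONEncoder()
	var dateEncodingStrategy: JSONEncoder.DateEncodingStrategy = .iso8601
	// applies the Date/Data strategies, then delegates everything else to `core`
}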

Excellent question. I'm trying to figure out if there's some way to express in a macro that an annotation should only apply if the type is not directly supported by the encoder being used. So for instance, if the same struct happens to be used for both JSONCodable and PropertyListCodable, only JSON would use the CodingFormat(.iso8601) annotation.
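In other words, the situation is something like this (hypothetical syntax; nothing here is settled API):

@JSONCodable
@PropertyListCodable
struct Receipt {
	// Plist has a native date representation, so ideally this annotation would
	// only influence the @JSONCodable expansion and be ignored by the plist one.
	@CodingFormat(.iso8601)
	var purchased: Date

	var total: Double
}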

I'm still figuring out the full extent of capabilities of macros, so I'd love to hear ideas in this vein.

Your design already wants to have format specializations. I imagine that right now it looks like this:

extension Date: JSONCodable {
	init(container: JSONContainer) throws
	func encode(to container: inout JSONContainer) throws
}

Instead you could implement specializations as their own type, still specific to both a data type and a container type:

struct ISO8601DateCoder: JSONCodingSpecialization { // "YYYY-MM-DD"
	func encode(_ date: Date, into container: inout JSONContainer) throws
	func decode(from container: JSONContainer) throws -> Date
}

struct UnitedStatesDateCoder: JSONCodingSpecialization { // "MM-DD-YYYY"
	func encode(_ date: Date, into container: inout JSONContainer) throws
	func decode(from container: JSONContainer) throws -> Date
}

and then @CodingFormat could take in a specialization.
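Usage might then look something like this (hypothetical syntax, building on the sketch above):

@JSONCodable
struct Event {
	@CodingFormat(ISO8601DateCoder())       // "YYYY-MM-DD"
	var start: Date

	@CodingFormat(UnitedStatesDateCoder())  // "MM-DD-YYYY"
	var legacyDate: Date
}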

Yes, supporting embedded mode is a desirable goal here. As we go through this process of designing the APIs, I will make sure it builds properly in embedded mode and will request feedback from experts in that area to avoid any potential pitfalls.

The design is currently free from existentials, and my understanding is that the limited use of metatype parameters as type hints is indeed compatible. This case is one reason why it will be very important to have a design that avoids the need for the kind of dynamic type casting that the JSON and property list coders in Foundation currently rely on.
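For illustration only, here is the kind of pattern being avoided versus the cast-free shape, both heavily simplified:

import Foundation

// Simplified version of the pattern today's Foundation coders rely on:
// values boxed as `Any` and recovered with dynamic casts.
func encodeSpecialCases(_ value: Any) {
	if let date = value as? Date {
		_ = date // special-case Date handling
	}
}

// The cast-free shape: generic, with a metatype used only as a type hint.
func decodeHint<T: LosslessStringConvertible>(_ type: T.Type, from text: String) -> T? {
	T(text) // statically dispatched; no existentials, no `as?`
}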

1 Like

I would encourage considering the design for use with encoding/decoding CBOR and in particular how COSE extends that for subobject (partial object) encryption.

CBOR features heavily in forthcoming IETF standards and it would be great if Swift were a best fit for modern internet protocols.

There are two modes of "streaming" support to consider:

The first is what both Rust Serde and this design sans async will support: pulling bytes from something like a file descriptor into a buffer on demand through synchronous APIs like read().

The second is full-blown async support that can pull additional bytes from anywhere, including asynchronous sources.
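Sketched as protocol shapes (the names are placeholders, not proposed API):

// Mode 1: synchronous, pull-on-demand, e.g. backed by read() on a file descriptor.
protocol ByteSource {
	mutating func read(into buffer: UnsafeMutableRawBufferPointer) throws -> Int
}

// Mode 2: fully asynchronous; bytes can arrive from any async source.
protocol AsyncByteSource {
	mutating func read(into buffer: UnsafeMutableRawBufferPointer) async throws -> Int
}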

I'm open to the idea, but I hesitate a little because I presume there will be some inevitable overhead imposed by suspension points at essentially every function call boundary, and the involvement of the Concurrency library. Happy to be proven wrong, though; it shouldn't be too hard to sprinkle async throughout the prototype and make some measurements.

This should work just fine, as the expansions of @JSONCodable and @FooCodable should be completely separate and parallel extensions. One of the only intersection points to consider is the macro annotations on properties; ideally we'll be able to establish a common "vocabulary" of generic annotations that any format-specialized macro can pick up, like default-specifying macros or key-name-altering macros.
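For instance (every name here is hypothetical), the hope is that something like this works regardless of which format macros are applied:

@JSONCodable
@MessagePackCodable          // imagine a third-party format macro
struct User {
	@CodingKey("user_name")  // generic annotation: both expansions rename the key
	var name: String

	@DefaultValue(0)         // generic annotation: both expansions fall back to 0
	var loginCount: Int
}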

Perhaps it's poorly communicated, but this is one of the core tenets of the proposal: "format specialized" protocols. JSONCodable, PlistCodable, etc. should have full freedom to craft their interfaces around each format's individual needs and specialties.

At one stage, the "format specialized" protocols were the entirety of the design. However, while looking at adoption scenarios, I realized that this design presented a problem with "currency" types that are owned by frameworks/libraries, but used by application-level serializable types.

The concrete scenario that stuck out to me was Range (to get specific, let's say Range<UInt64>). It's perfectly reasonable for a client's JSONCodable-compliant struct to want to include a Range<UInt64> as one of its serializable properties. However, Range lives in the standard library, and it cannot conform to JSONCodable within the standard library. Well, then maybe the JSON package provides that conformance? It certainly could, since the package depends on the stdlib. But that is neither a sustainable nor a generally applicable strategy. Suppose the stdlib adds another currency type that clients want to encode? Or suppose a client wants to encode a CGRect: the JSON package can't provide that conformance, and neither can CoreGraphics.

Hence the introduction of the format-agnostic protocols in parallel with the format-specialized ones. Range and CGRect can, in similar fashion to Codable, describe their serializable members abstractly, allowing a specific encoder/decoder to interpret those instructions. The difference from Codable is that we avoid all the OTHER downsides of Codable that the OP describes.
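Very loosely, and with placeholder names throughout (including the container type), the idea is that a currency type describes its members once and each format's coder interprets that description:

extension Range: CommonCodable where Bound: CommonCodable {
	func encode(to container: inout some CommonKeyedContainer) throws {
		try container.encode(lowerBound, forKey: "lowerBound")
		try container.encode(upperBound, forKey: "upperBound")
	}
	// init(from:) would mirror this; a JSON or plist coder then maps these
	// abstract instructions onto its own representation.
}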

A new JSONEncoder and PropertyListEncoder would have no problem taking values/types conforming to this format-agnostic protocol (I've been referring to it personally as CommonCodable, but this is very much a placeholder). Some formats (XML? CSV?) might be able to do the same with some compromises. Other formats might not be able to handle it at all. And that's OK. A specific format's encoder/decoder is allowed to omit CommonCodable support if that makes sense, but it means that clients of that format may need to do some extra work to make the types they want to serialize compatible.

I'm confused about how this suggestion would fit into the overall design. Delegating directly to types the job of converting themselves to and from bytes seems like the opposite of what modern high-level, format-agnostic serialization APIs are trying to achieve.

1 Like

Apologies for the confusion with the Foundation tag.

The intent is to have the "format agnostic" protocols live in the stdlib, not a package, as stdlib types will want to adopt these protocols themselves.

However, it's certainly possible, and even likely, that PropertyListCodable ends up defined in swift-foundation, given how it defines the primitive types of Date and Data.

"Easy for third parties to write their own encoders and decoders" is certainly an important goal here. The macro reliance is a bit of a hurdle to deal with, but I'm hoping in common cases we'll be able to find ways to mitigate that.

It's only mentioned briefly in the OP, but I did reference something similar here:

I am developing generic Encoder, Decoder, and Container types that operate on format-specific primitive values, e.g. JSONPrimitive or PropertyListPrimitive.

This implies that under this design, formats are encouraged to provide their own JSONValue- or JSONPrimitive-esque types, one use of which is to support easier Codable compatibility. But they would certainly be usable in the scenario you describe here.

The catch is that using one of these in your type kicks you firmly out of "format-agnostic" mode and ties you further to a single specific format. And that's probably exactly what you expect and want in the case you're describing. This would be indicated by your type conforming to JSONDecodable instead of (placeholder name!) CommonDecodable, which unlocks your ability to use whatever JSON-specific features the JSON package provides, something that couldn't be done easily in the forcibly format-agnostic Codable world.
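As a hypothetical sketch of what that opt-in might look like:

// Conforming to the JSON-specific protocol (instead of the format-agnostic one)
// in order to keep a field as raw, schema-less JSON.
struct WebhookPayload: JSONDecodable {
	var id: String
	var extra: JSONPrimitive   // arbitrary JSON, preserved as-is
}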

This is a great suggestion. Would you mind sharing one example of something you'd expect to be submitted to such a suite?

It's worth noting that for JSON specifically (as well as other formats, like plist) we can and certainly should guarantee structural equivalence between Encodable and JSONEncodable on identical structs. However, it's not really tenable to ensure byte-level equivalence where key order comes into play. (The upside is that JSONEncodable should guarantee predictable key ordering, which Codable + present-day JSONEncoder does not.)