"New Codable" prototype available for feedback

(Apologies for the re-post. We thought it best to create a new topic for increased visibility since some might not be following the original thread.)

Hello again Swift Community!

Here we are, almost a full year after my “future of serialization & deserialization APIs” post. I’ve spent a lot of time since then working to make this into a reality, and I’m happy to announce that I have something concrete for you to try out!

Please take a look at the experimental/new-codable branch of swiftlang/swift-foundation on GitHub! You’ll find therein the core components of this design exposed in a new target temporarily called NewCodable:

  • CommonEncodable and CommonDecodable protocols. These are the format-agnostic protocols that are the closest relatives to Encodable and Decodable, but are designed in such a way that avoids many of the performance limitations of the latter.

  • CommonEncoder and CommonDecoder protocols. Similarly, these are format-agnostic protocols for encoders and decoders in the same vein as Encoder and Decoder, but with a better design.

  • JSONEncodable and JSONDecodable protocols. These are shaped similarly to their Common counterparts, but with modifications that are specific to JSON, including certain performance optimizations.

  • JSONDecoderProtocol (naming discrepancy acknowledged, but I can’t name it JSONDecoder!) with concrete implementations JSONParserDecoder and JSONPrimitiveDecoder.

  • JSONDirectEncoder (no protocol here yet, despite room for an alternative JSONElementEncoder to mirror JSONElementDecoder. Will discuss elsewhere.)

  • “Top-level” NewJSONEncoder and NewJSONDecoder (names TBD, since I similarly can’t name them JSONEncoder and JSONDecoder!)

  • JSONElement and other supporting protocols and types.

Please note that swift-foundation is acting as a convenient and temporary holding place for this project. We’ll evolve it on this branch for a while as we iterate on it and refine it. We’ll determine what the final locations for the various pieces will be as we get closer to its productization.

This is still very much a work in progress! Please excuse our dust! You’ll find plenty of TODOs scattered throughout, somewhat haphazard inlining of functions, and probably plenty of bugs and issues. I have ported over many of the tests from the existing JSON Codable-based unit tests in swift-foundation, but a few of those are still failing for various reasons.

Despite all of that, I hope some of you get an opportunity to take a look, try adopting the protocols, and use the new encoders/decoders with some of your own types. Let us know about any pain points or friction you’re experiencing with the API.

There are some benchmarks included (still built on bespoke infrastructure) which are quite promising. When decoding twitter.json, I personally observe that JSONDecodable and JSONParserDecoder provide a throughput improvement of approximately 6x over Decodable and JSONDecoder! I hope that with community involvement we’ll be able to improve both overall quality and performance to even higher levels.

To try using the “New Codable”, change your swift-foundation dependency to

    .package(
        url: "https://github.com/apple/swift-foundation.git",
        branch: "experimental/new-codable"
    )

and import NewCodable. Swift 6.3 is required.
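For reference, a complete manifest using this dependency might look like the sketch below. The package and target names are hypothetical, and it assumes the branch vends a library product named NewCodable and that a 6.3 tools version is appropriate; check the branch's own Package.swift before relying on either.

```swift
// swift-tools-version: 6.3
// Assumption: tools version matches the "Swift 6.3 is required" note above.
import PackageDescription

let package = Package(
    name: "NewCodablePlayground", // hypothetical package name
    dependencies: [
        .package(
            url: "https://github.com/apple/swift-foundation.git",
            branch: "experimental/new-codable"
        )
    ],
    targets: [
        .executableTarget(
            name: "NewCodablePlayground", // hypothetical target name
            dependencies: [
                // Assumption: the branch exposes a product called "NewCodable".
                .product(name: "NewCodable", package: "swift-foundation")
            ]
        )
    ]
)
```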

To run the tests, execute the following in a swift-foundation checkout of this branch:

    swift test --filter NewCodableTests

To run the benchmarks:

    swift test --configuration release --filter NewCodableBenchmarks

Please take note of this rough roadmap that lies ahead for this project:

  1. Finalize the Common* protocols

  2. Finalize the JSON* APIs

  3. Support Embedded Swift

  4. Add support for XML and Binary Property List

  5. Design and implement macros for @JSONCodable, @PropertyListCodable, @CommonCodable, and even @Codable.

  6. Publish a library for owners of other format-specific protocols to facilitate development of their own @<Format>Codable macros that use common patterns established by the above macros.

If you see an issue or have a question, please speak up! I’ll be happy to try to address any concerns. Feel free to create an Issue on the swift-foundation GitHub tagged with the new-codable label. If you want to fix something, feel free to create a Pull Request against this experimental/new-codable branch, again tagged with the new-codable label.

To help drive community involvement towards finalizing milestones #1 and #2, I will be starting threads that discuss different aspects of the prototype’s API with explanations of why that design was chosen, challenges we’re still facing, and alternatives we’ve considered. I hope to get your focused feedback in those discussions.

As a footnote, while the inspiration for the origins of this project was heavily rooted in Rust serde, I found that the general shape of the musli crate was a better fit for adapting to Swift. (Though, it’s quite a bit more reliant on closures with inout arguments rather than consumed and reborrowed mutable references. I’ll start a discussion about that as well.)

35 Likes

I am super excited about this, twice excited even!

On the one hand it's just great to see progress towards modern serialization in Swift - we need to catch up with today's performance expectations. It was my impression that good old Codable was increasingly becoming a real bottleneck in certain cases.

On the other hand I am super curious if/how this can fit into an Embedded WebAssembly in the Browser use case, where code size matters a lot and we are running in a JS runtime that ships JSON serialization for free.

I haven't had time to fully process the API shapes of it all, but I am curious if you have thoughts on this use case.

I see roughly three directions:

  • use "NewJSONEncoder/Decoder" in Swift, interop with copied strings/byte arrays
    (probably nicest experience in Swift, probably biggest code size cost)

  • create JSEncoder/JSDecoder (common? or JSON?) that encodes to JS objects and decodes from JS objects + JS-native JSON.parse/stringify
    more interop calls, but less data copying and no JSON parsing in Swift... not sure how this would perform

  • create specialized JS encoders + macros
    not quite sure what that would give us, but again the goal is to "reuse" the built-in JSON serialization that ships with JavaScript

Any thoughts or comments are appreciated!

Also, I noticed a lot of files start with "import Foundation(Essentials)" - I know this is meant to be temporary, but it would be great to start organizing the code in a way to contain the spread. As it is right now it looks pretty messy to try this out for Embedded Swift.

3 Likes

This is very exciting! It’s about time Swift got the fast and expressive serialization facilities it deserves.

After skimming the post and some of the code I have three questions/concerns:

  1. Why do we overload “visit(_: …)” in the decoder visitor protocols? Doesn’t that slow down type checking?
  2. I’m a little concerned about all these new protocols that most users won’t touch polluting the namespace. However, I understand it’s important for performance. Are there any mitigation strategies?
  3. You mentioned there’s a performance improvement of up to 6x over the current JSONDecoder, which is great. So I’m wondering if we plan to benchmark against other JSON serialization libraries like simdjson, which offers incredible performance.
1 Like

Could this use the default value provided normally? var tags: [String] = []

BTW, do you plan to provide a way to opt some fields out of being encoded/decoded, either on a per-field or on a per-type basis?

I wonder if this could be more generic like this (pseudocode):

    // generic (pseudocode: JSONEncoder2, encodeStructFields, and allFields are hypothetical)
    func encode(to encoder: inout JSONEncoder2) throws {
        try encoder.encodeStructFields { fieldEncoder in
            try fieldEncoder.allFields { field in
                try fieldEncoder.encode(field: field.title, value: field.value)
            }
        }
    }

in which case having this per type might not be required. Similarly for decoding.


I like the desire, but maybe we could invest in a more complete, robust and performant mirror machinery first... and reimplement new codable on top of that.

1 Like

Yeah, there are quite a few limitations here... Note how you had to put decoding Int before decoding Double. Or what if there was a Float in one case's payload and a Double in another, or a String in one and a StringProtocol type in another? The limitations are more serious than just that; e.g., you might have an enum with:

enum E {
    case a(A)
    case b(B)
    ...
}

where A and B are similar types as far as decoding is concerned:

// example
struct A { let x: UInt }
struct B { let x: Int }

or a more complicated case:

enum E {
    case a([A])
    case b([B])
    ...
}

with the payload to decode being an array of 100K positive elements and the last element being negative.
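To make the cost concrete, here is a minimal sketch using today's Decodable and JSONDecoder (not the new prototype API). The "try one case, then fall back" strategy is an assumption about how such an enum would be decoded by hand; the first attempt only fails when it reaches the negative element, so all the work done before that point is thrown away.

```swift
import Foundation

struct A: Decodable, Equatable { let x: UInt }
struct B: Decodable, Equatable { let x: Int }

enum E: Decodable, Equatable {
    case a([A])
    case b([B])

    init(from decoder: any Decoder) throws {
        let container = try decoder.singleValueContainer()
        // First attempt: every element must fit in UInt. A single negative
        // value anywhere in the array makes this whole attempt fail.
        if let values = try? container.decode([A].self) {
            self = .a(values)
        } else {
            // Fallback: decode the entire array a second time as [B].
            self = .b(try container.decode([B].self))
        }
    }
}
```

With a payload of 100K elements where only the last one is negative, the first pass does nearly all of its work before being discarded.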

Ideally, type information should be available during decoding one way or another... JSON won't be able to preserve that (aside from embedding extra information in some custom way), but other serialisation formats (like XML) will be able to preserve it.

2 Likes

@kperryua Amazing work, this looks all very good. I am playing with the new API and adopting it in some projects. One question though:

Shouldn’t the top-level decoder (and encoder) add a “span-based” API for coding? Currently you always need to go through Data, which seems … unnecessary.

public func decode<T: JSONDecodable & ~Copyable>(_ type: T.Type, from data: borrowing RawSpan) throws(CodingError.Decoding) -> T

?

Oh this is an interesting concept I hadn’t considered. The intersection of WebAssembly and JS runtimes in Swift is unknown to me. Is there already a bridge in this environment between JS objects and Swift values and a way to call JS functions from Swift?

I can definitely see the value in making encoder/decoder level interfaces that reuse Javascript’s native JSON (de)serialization. However, I think there’s also an interesting tradeoff to consider here. One of the advantages of a decodable type is that it specifies which fields of JSON the client cares about and which can be skipped entirely. JSON.parse is always going to deserialize the entire tree as an intermediate representation and then perform the decode on that structure (similar to the original implementation of JSONDecoder which used JSONSerialization to make a Foundation object structure). I’m sure that such an implementation would still be very performant overall, but wonder if there’s also still desire to reduce memory overhead by performing a selective parse.
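As a small illustration of "the type specifies which fields it cares about", here is a sketch using today's Decodable and JSONDecoder rather than the new prototype API. The type and field names are made up for the example. It only shows the API-level behavior (undeclared keys are ignored); whether the underlying parser also avoids materializing the skipped parts is exactly the implementation tradeoff discussed above.

```swift
import Foundation

// Only `id` and `name` are declared, so only those are decoded.
struct UserSummary: Decodable, Equatable {
    let id: Int
    let name: String
}

let json = Data("""
{"id": 7, "name": "Ada", "followers": [1, 2, 3], "profile": {"bio": "..."}}
""".utf8)

// "followers" and "profile" are present in the document but never requested.
let summary = try JSONDecoder().decode(UserSummary.self, from: json)
```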

Yes, I agree, we should start sooner rather than later to get this codebase ready for Embedded. I especially don’t want to get caught in a situation where the API itself accidentally makes that harder than necessary!

It’s primarily there today because there are some unanswered questions about what level of support we want in NewJSONDecoder for Data, Date, and other Foundation types that JSONDecoder enjoys, such as the various instance-global strategies. It’s hard to implement this in a way that allows Foundation to layer it on top of NewJSONDecoder via extension. I’ll go more into depth on this later.

3 Likes

There very much is!

Ideally the final code size of the new JSON serialization in an Embedded wasm binary is small enough that this can just be the "default" - but it is pretty hard to compete against "for free" ...

1 Like

The point of the visitor is that the decoder gets to inform the client which primitive type it has encountered next in the parse. This overload style is very very similar to both serde’s and musli’s visitor patterns.

We already have similar levels of overloading in present-day Encoder/Decoder and Codable container APIs. I haven’t seen any type checker performance issues reported for those.
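For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of an overloaded visitor. Every name in it (Token, PrimitiveVisitor, drive, Payload) is hypothetical and chosen for illustration; it is not the prototype's actual protocol, only the general shape shared with serde-style visitors.

```swift
// Hypothetical stand-in for the primitives a parser might encounter next.
enum Token {
    case integer(Int64)
    case floatingPoint(Double)
    case string(String)
}

// Hypothetical visitor protocol: one `visit` overload per primitive type.
protocol PrimitiveVisitor {
    associatedtype Output
    func visit(_ value: Int64) throws -> Output
    func visit(_ value: Double) throws -> Output
    func visit(_ value: String) throws -> Output
}

// The "decoder" side: it looks at the input and informs the visitor which
// primitive it found by calling the matching overload.
func drive<V: PrimitiveVisitor>(_ token: Token, with visitor: V) throws -> V.Output {
    switch token {
    case .integer(let value): return try visitor.visit(value)
    case .floatingPoint(let value): return try visitor.visit(value)
    case .string(let value): return try visitor.visit(value)
    }
}

enum Payload: Equatable {
    case count(Int64)
    case ratio(Double)
    case label(String)
}

// The client side: it reacts to whatever the parser reports, with no need
// to guess a type up front and retry on failure.
struct PayloadVisitor: PrimitiveVisitor {
    typealias Output = Payload
    func visit(_ value: Int64) throws -> Payload { .count(value) }
    func visit(_ value: Double) throws -> Payload { .ratio(value) }
    func visit(_ value: String) throws -> Payload { .label(value) }
}
```

Each call site passes a value of a concrete type, so overload resolution there is unambiguous.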

The only way I can see how to avoid “polluting” the namespace is by hoisting the protocols into some other type. I’m not aware of any common precedent for doing so, however. Doing so might also have the effects of increasing verbosity and decreasing discoverability.

I don’t have any immediate plans for formalizing this, but I’d welcome a community contribution. I have done these comparisons myself informally, and we still do currently fall somewhat short of simdjson and serde in terms of throughput. Some of the additional overhead is, in my estimation, unavoidable due to the additional safety and correctness guarantees of Swift. Some of it is just due to the fact that the benchmarks are still written to create the standard CoW’d arrays/dictionaries instead of ~Copyable variants. And various other things. Obviously simdjson uses highly advanced SIMD constructions, which I haven’t been able to replicate in this implementation (though I have done some experimentation in this area). This is another area I’d gladly welcome community involvement in!

1 Like

Have you looked at building JSON decoding on top of other JSON engines, not just the Swift native version? ZippyJSON was able to put simdjson under Decodable, and ReerJSON does something similar on top of yyjson. Have you seen what kind of overhead we're still looking at?

Short answer: Maybe. I think this touches on macro design decisions, which are far from set in stone.

Longer answer: One complication I foresee is in regards to decoded value initialization. Note that in this design we’re using static func decode(from:) -> Self instead of an init(from:). Therefore there is an implicit dependency on a memberwise initializer, or at least an initializer that takes all the decodable fields. Initializing the property directly like this has two implications: 1) it must be a var in order to be overwritten by an initializer (probably not a showstopper), and 2) such an initializer must take an Optional for this field and know to overwrite the value IFF the parameter is not nil. This is not the behavior of a standard memberwise initializer.

This question of initializers opens a can of worms I was intending to put off until we do macro design—do we require an extant memberwise initializer? Does the macro generate an initializer itself, called by decode(from:)? Something else?
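To illustrate the two implications, here is a sketch of the kind of initializer described above, written by hand in plain Swift. The Post type is hypothetical, and nothing here is generated by or taken from the prototype; it only shows the "overwrite IFF non-nil" behavior that a standard memberwise initializer does not provide.

```swift
struct Post: Equatable {
    var title: String
    var tags: [String] = []   // must be `var` so the initializer can overwrite it

    // Hand-written initializer: `tags` is Optional here, and the declared
    // default survives unless a value was actually decoded.
    init(title: String, tags: [String]?) {
        self.title = title
        if let tags {
            self.tags = tags
        }
    }
}
```

A decode(from:) implementation would collect each field as an Optional while parsing and then call an initializer of this shape.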

1 Like

Yes, this is straightforward to provide with macro attributes, though opted-out fields must be accounted for upon decoding in order to be able to instantiate the type (like, providing a default value).
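For comparison, this is how the same opt-out works with today's synthesized Codable (not the new prototype, whose macro attributes are not designed yet). The Session type is hypothetical. A property left out of CodingKeys is skipped in both directions, and the compiler requires it to have a default value so the type can still be instantiated on decode.

```swift
import Foundation

struct Session: Codable, Equatable {
    var user: String
    // Not listed in CodingKeys, so it is never encoded or decoded.
    // The default value is what lets decoding still produce a full instance.
    var isDirty: Bool = false

    enum CodingKeys: String, CodingKey {
        case user
    }
}
```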

2 Likes

I get the desire, but I only have so much influence on the language and runtime development, and have to work with what we have available today. I’ve not gotten the impression that “complete, robust and performant mirror machinery” is high on the priority list at the moment.

Also, there’s a hidden type system complication in your pseudocode. What is the type signature of the allFields() closure, and of field and .value? Is there a way to ensure it provides static dispatch and compatibility with Embedded? What does the decoding side look like, and how do you use this collection of decoded “fields” to initialize the type?

If your proposal is to use existentials, then, even with the new support for existentials in Embedded Swift, my understanding is that this kind of design would be outside the limitations for what it supports.

Happy to be proven wrong on this one though. I spent a fair bit of time trying various solutions to make this kind of thing work!

Consider this approach for both issues:

User(from: decoder, default: User(name: "customised"))

The default parameter is "defaulted", and if absent then the default memberwise initialiser is used. The idea is that it acts as a placeholder, some (var) fields of which are overwritten by the values from JSON/plist/etc.

Pseudocode for generic decoding that won't involve a temporary dictionary:

    init<T>(from decoder: JSONDecoder2, `default`: T = T()) {
        self = `default`
        ...
        T.allFields.forEach { field in
            if decoder.has(field: field.title) {
                self[field] = decoder.decode(field: field.title)
            }
        }
    }

This approach pushes away from lets toward vars, which could be somewhat undesirable, understandably, but hopefully not a show stopper.


So imagine we do what you propose first. And later (say in a year) we build a more complete, robust and performant mirror machinery that, if it existed today, we'd probably use for new codable. Will we change Codable at that point, risking breaking something?

A long time ago, I wrote a library called HappyCodable, and I added macro support as soon as it became available. Here are some of my experiences and thoughts:

@CodingKey should allow supporting multiple keys at the same time. For example, change it to CodingKeys, which is more intuitive—you can tell at a glance that it supports multiple keys:

@CodingKeys("keyA", "keyB")
let value: Int

Both of these can be decoded into a value.

{
    "keyA": 10
}

{
    "keyB": 10
}
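For reference, this is roughly what the multi-key behavior looks like when written by hand with today's Decodable (a sketch, not HappyCodable's or the prototype's generated code). The Reading type is hypothetical, and trying "keyA" before "keyB" is an assumed precedence.

```swift
import Foundation

struct Reading: Decodable {
    let value: Int

    private enum CodingKeys: String, CodingKey {
        case keyA, keyB
    }

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Try "keyA" first, then fall back to "keyB".
        if let value = try container.decodeIfPresent(Int.self, forKey: .keyA) {
            self.value = value
        } else {
            self.value = try container.decode(Int.self, forKey: .keyB)
        }
    }
}
```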

Or keep CodingKey but allow multiple usages (though this is less intuitive):

@CodingKey("keyA") @CodingKey("keyB")

@CodingDefault should support being simplified to direct usage with var like:

var value = xxxx

In a data type with lots of optionals, I would rather just use var directly.

Missing: @Uncoding should prevent this property from being encoded/decoded, and the developer needs to provide a default value, for example:

@Uncoding
let id = UUID()
@Uncoding
let id: String // error: not initialized

Missing: @ElementNullable allows the JSON to contain nullable elements, but filters them out during decoding. For example:

{
    "data": ["a", "b", null]
}

@ElementNullable
var data: [String] = [] // After decoding, data will be ["a", "b"]
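Written by hand with today's Decodable, the filtering behavior would look something like the sketch below. The Response type is hypothetical, and treating a missing "data" key as an empty array is an assumption of this example.

```swift
import Foundation

struct Response: Decodable {
    var data: [String] = []

    private enum CodingKeys: String, CodingKey {
        case data
    }

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Decode the elements as optionals so that JSON nulls are accepted...
        let elements = try container.decodeIfPresent([String?].self, forKey: .data) ?? []
        // ...then drop the nils. Note that this shifts the indices of later elements.
        data = elements.compactMap { $0 }
    }
}
```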
2 Likes

I like that!

Here you mean that the subsequent two values will use these names, right?

@CodingKeys("keyA", "keyB")
var x = 1 // keyA
var y = 2 // keyB
var z = 3 // z

For arrays this could be quite dangerous... as indices are shifted:

{
    "names": ["a", null, "b"],
    "values": [1, 2, 3]
}

An unorthodox idea (for future Swift? as it's probably impossible now):

// not current Swift
struct User: NewCodable {
    var id: UUID                             // normal case
    var userName: String & SnakeCased        // user_name
    var dob: Date & ISO8601                  // iso8601 date formatting
    var value: Double & ThrowOnNonConformingValues // throws on non-conforming numbers
    var extraData: Data & Base64             // base64 encoded
    var password: Optional<String & Uncoded> // won't be in JSON
    var picture: UIImage? & Uncoded          // or maybe this? (won't be in JSON)
    var wfh: (Bool & Uncoded) = true         // and this (won't be in JSON)
}

Somewhat similar to Objective-C's SomeClass<SomeProtocol>

Could possibly be used to support more advanced use cases, albeit somewhat less flexibly than what JSONEncoder/Decoder has. Example:

protocol CustomDateEncoding {
    static func custom(_ date: Date, encoder: any Encoder)
}
struct MyCustomDateEncoding: CustomDateEncoding {
    static func custom(_ date: Date, encoder: any Encoder) {
        // ...
    }
}

struct User {
    ...
    var dob: Date & MyCustomDateEncoding
    ...
}

Plus we'd still need something for the whole type and/or encoder (things like outputFormatting or sortedKeys).

Yes, absolutely! This is just one of those simple things that I hadn’t gotten around to yet. 😀

I made sure that we’ll keep track of this with this issue.

2 Likes

I haven’t. It’d be an interesting experiment to see what kind of boost it provides, if any.

My concern with this approach is that, at least as implemented in these projects, it instantiates the entire parsed JSON document in memory as an intermediate representation, then wraps the decoder around that. I’m aiming to try to avoid an IR whenever possible. In theory it seems like that should have the highest possible performance ceiling, in terms of both CPU and memory usage, especially for JSONDecodable types that don’t decode the entire JSON document. Maybe there’s a way to use these libraries in a way that’s driven by the decode requests so the IR isn’t necessary?

I would like to avoid requiring a “default-initialized” value for each client struct. This technique would work for an intermediate struct for which all the fields are optional, and which is then used to instantiate the real struct. (see the various JSONBuilder types in Twitter.swift in the project’s benchmarks for an example of something like what I’m imagining).

I still see a couple of roadblocks with your pseudocode:

  1. JSONParserDecoder cannot provide a .has(field:) call. It does not collect keys from JSON objects in advance, nor does it allow decoding values from the object in arbitrary orders. It only allows decoding in the order present in the JSON document. We could only implement this with JSONPrimitiveDecoder, which requires additional CPU and memory.
  2. In order for self[field] = decoder.decode(field: field.title) to maintain a compile-time value type, required by decoding, the type for field would need to be generic over the field’s type, e.g. struct Field<Value: JSONDecodable>. However, an allFields.forEach function would need to erase that type parameter to work (e.g. func forEach(_ closure: (AnyField) -> Void)), which blocks this. Perhaps with the improved reflection this could be expanded at compile time into separate type-specific invocations of the closure for each field. But it would still be type-driven decoding, and therefore incompatible with the parser-driven decode of this new design.
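The first roadblock can be sketched in plain Swift. ForwardOnlyFields below is a hypothetical stand-in for a parser that only moves forward through an object; it is not JSONParserDecoder's real interface, and values are restricted to Int to keep the example short. The point is that the client cannot ask "is field X present?" and instead reacts to each key in the order the document provides it.

```swift
// Hypothetical forward-only source of fields: no lookup by key, no rewinding.
struct ForwardOnlyFields {
    private var remaining: ArraySlice<(key: String, value: Int)>

    init(_ fields: [(key: String, value: Int)]) {
        remaining = fields[...]
    }

    // Hands out the next field in document order, or nil at the end.
    mutating func next() -> (key: String, value: Int)? {
        remaining.popFirst()
    }
}

struct Point: Equatable {
    var x: Int
    var y: Int
}

enum PointDecodingError: Error {
    case missingField(String)
}

// Parser-driven decode: the loop is driven by what the document contains,
// and each value is stashed in an Optional until the object ends.
func decodePoint(from fields: inout ForwardOnlyFields) throws -> Point {
    var x: Int?
    var y: Int?
    while let field = fields.next() {
        switch field.key {
        case "x": x = field.value
        case "y": y = field.value
        default: break // unknown key: skipped without being stored
        }
    }
    guard let x else { throw PointDecodingError.missingField("x") }
    guard let y else { throw PointDecodingError.missingField("y") }
    return Point(x: x, y: y)
}
```

Because each branch of the switch knows the static type of the field it handles, no type-erased field descriptor is needed, which is the property the second roadblock is about.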

These things are what make the decoding side much more difficult to model in this way than the encoding side. Definitely open to any creative solutions to these blockers!

Any reflection improvements would need to be compatible with parser-driven decoding in order to be useful to this project. If such improvements end up existing, which feels unlikely, my expectation is that we could use them with the proposed interfaces, or that we could easily extend those interfaces to work with them.

These are excellent suggestions for macros, thank you! These should be feasible. Please be sure to chime in again when we start the macro design in earnest.

1 Like