The future of serialization & deserialization APIs

Sure, I'll try to come up with something. I think there are actually two possible types of submission:

  1. Examples of complex real world API schemes that the system should be able to support

  2. Examples of actual Swift Codable object graphs whose serialized representation should not change under the new system

The former is easier to provide but requires more work on your part to verify. The latter is more useful to you, but also harder to extract cleanly from a project, so you might not get many submissions :sweat_smile:

Should that be with: [BlueskyDecodingScheme].self? (And if I'm on the right track there, more likely something like with: [BlueskyPost].self?)

I'm trying to figure out how the "decoding scheme" would represent recursive structure. A related question: if a Post has an author: User but we need to use the Bluesky-specific User decoding scheme, how do we spell that?

(I suggested something along these lines above as "format-shifting". One nice thing to notice: if the API separates the model type from the encoding-scheme type, the default could still be to use the same type for both, so getting involved with explicit schemes could be strictly opt-in.)

Thank you for sharing this thoughtful exploration of a new serialization and deserialization model for Swift. I appreciate the detailed breakdown of the limitations of Codable and the rationale behind a new API, particularly the emphasis on performance and flexibility. The visitor pattern is an intriguing approach with clear performance benefits.

I’d like to propose a slightly different, more dynamic API surface that could offer more flexibility while keeping the core serialization logic minimal.

I have concerns about a concrete @JSONCodable macro: it could make it harder for developers to opt into serialization capabilities when new serialization formats are introduced. Instead, I believe a more flexible approach, taking inspiration from libraries like Point-Free's Swift Snapshot Testing framework, would allow serialization strategies to be defined and extended independently by library authors.

@Codable(.json)
struct BlogPost {
    let id: Int
    let title: String
    let subtitle: String?
    @CodingKey("date_published") 
    @CodingFormat(.iso8601)
    let publishDate: Date
    let body: String
    @CodingDefault([])
    let tags: [String]
}

In this model, the @Codable macro would generate CodingFields and add conformances (e.g., JSONCodable), while leaving the implementation details open for different serialization strategies. This keeps the standard library API lightweight while allowing developers to introduce optimized encoders or new serialization formats independently.
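To make the division of labor concrete, a hypothetical expansion of the macro above might look something like this. None of these names (CodingField, codingFields, JSONCodable's shape) come from the pitch; they're purely illustrative:

```swift
// Hypothetical macro expansion -- all names here are assumptions.
extension BlogPost: JSONCodable {
    static var codingFields: [CodingField<BlogPost>] {
        [
            CodingField(name: "id", keyPath: \BlogPost.id),
            CodingField(name: "title", keyPath: \BlogPost.title),
            CodingField(name: "subtitle", keyPath: \BlogPost.subtitle),
            // @CodingKey and @CodingFormat fold into the field description:
            CodingField(name: "date_published", keyPath: \BlogPost.publishDate,
                        format: ISO8601Encoder()),
            CodingField(name: "body", keyPath: \BlogPost.body),
            // @CodingDefault supplies a fallback when the key is missing:
            CodingField(name: "tags", keyPath: \BlogPost.tags, default: []),
        ]
    }
}
```

The point of the sketch is that the macro only emits a *description* of the fields; any serializer (JSON, plist, a third-party MessagePack coder) could consume that description without the macro knowing about it.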

Handling non-native types

For handling types that are not natively representable in certain formats, I like your approach of using @CodingFormat macros or property wrappers to handle format conversions. Perhaps the compiler could also emit diagnostics to make developers aware of the available tooling.

@Codable(.json)
struct BlogPost {
    let publishDate: Date 
    // Error: Date does not conform to JSONCodable. 
    // Fix-it: Use a coding format like @CodingFormat(.iso8601) to represent it as a String.
}

Additionally, maybe encoding strategies could even be applied at a higher level:

@Codable(.json)
@CodingKeys(.snakeCased)
@CodingFormat(.defaultDateEncoding(.iso8601))
struct BlogPost {}

Or, alternatively, a more compact form:

@Codable(.json.snakeCasedKeys.defaultDateEncoding(.iso8601))
struct BlogPost {}

This would allow users to specify global encoding preferences directly on the type, similar to how JSONDecoder supports configuration at the instance level today.
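For comparison, the instance-level configuration that exists today looks like this; the macro spellings above would move the same information onto the type itself, where it can't drift out of sync with the call site:

```swift
import Foundation

// Today: configuration lives on the decoder instance, not the type.
let decoder = JSONDecoder()
decoder.keyDecodingStrategy = .convertFromSnakeCase
decoder.dateDecodingStrategy = .iso8601
```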

Supporting multiple formats

I like @tikitu's idea for handling multiple formats. This model could support it by stacking macros, one per encoding format:

@Codable(.json)
@Codable(.csv)
struct BlogPost { ... }

Slim core API with extensible strategies

A slim core API can live in the standard library with related macros, and CodingStrategies could be separate implementations. This approach would empower developers to implement their own encoders, decoders, and transformation strategies/formatters. A possible standard library surface might include:

macro @Codable<CodingStrategy>
macro @CodingKey
macro @CodingFormat

protocol CodingStrategy {
    associatedtype Input
    associatedtype Output
}

Maybe some built-in implementations

struct JSONCoder<Input: JSONCodable>: CodingStrategy {
    typealias Output = Data
    [...]
}

extension CodingStrategy {
    static func json<Input>() -> JSONCoder<Input> where Self == JSONCoder<Input>
}

extension String: JSONCodable {}
extension Int: JSONCodable {}
extension Bool: JSONCodable {}
extension Array: JSONEncodable where Element: JSONEncodable {}
extension CodingFormat: JSONEncodable where ConversionStrategy.Output: JSONEncodable {}

This could then be extended by Foundation adding support for Date and Data:

struct ISO8601Encoder: ConversionStrategy {
    typealias Input = Date
    typealias Output = String

    func encode(_: Input) -> Output { ... }
}

extension Date: PlistCodable {}

This structure would support the most important built-in serialization formats like JSON and Property List while enabling developers to define and implement custom formats such as CSV or MessagePack independently. These are just some initial thoughts — do you think this approach would be feasible?

1 Like

A huge +1 on this point. The Versioning Problem™ is a big one.

4 Likes

That depends on whether the API required one scheme object per Codable object, or one scheme per coder.

I imagined that the BlueskyCodingScheme might employ something like a visitor pattern where you can just add extra conversion methods for every type you support.

What an exciting thread! I'd like to put in a request to please consider error handling. A common source of grief for beginners is the difficulty of reading the error messages thrown by Codable: some information is missing, and it's formatted such that you really have to do some digging to understand it, which makes it especially unapproachable for beginners.

Here are some things I'd like to see in an improved tool. I'm mentioning them now in case it's useful to think about what kind of file/line metadata might be useful to carry along with the macro-annotated types and fields described above:

  • A clear, concise path from the root of the source data to the field that is causing a problem.
  • If possible, some context from the source data. Give me bytes, lines, characters, anything!
  • A more straightforward explanation of why the data that was encountered is not satisfactory for the decoding specified.
  • A reference to the line of code that defined the field that had a problem.
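As a straw man, the metadata in that wish list might hang together in an error type shaped something like this. Every name here is hypothetical; it's only meant to show what the macro would need to capture at declaration time:

```swift
// Hypothetical error payload -- a sketch of what the wish list implies.
struct DecodingFailure: Error {
    enum PathComponent {
        case key(String)
        case index(Int)
    }

    /// Path from the root of the source data, e.g. [0]/address/city/birds/[1]/name
    var codingPath: [PathComponent]
    /// A snippet of the surrounding source data (bytes, line/column, ...).
    var sourceContext: String?
    /// Why the encountered data didn't satisfy the requested decoding.
    var reason: String
    /// Where the failing field was declared -- captured by the macro,
    /// which is why it's worth deciding on this metadata now.
    var declarationSite: (file: String, line: Int)?
}
```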

Here's an example of what Codable does now:

valueNotFound(Swift.String, Swift.DecodingError.Context(codingPath: [_JSONKey(stringValue: "Index 0", intValue: 0), CodingKeys(stringValue: "address", intValue: nil), CodingKeys(stringValue: "city", intValue: nil), CodingKeys(stringValue: "birds", intValue: nil), _JSONKey(stringValue: "Index 1", intValue: 1), CodingKeys(stringValue: "name", intValue: nil)], debugDescription: "Expected String value but found null instead.", underlyingError: nil))

Quick, find what's wrong! :upside_down_face:

A while back, I wrote an experimental micro library, UsefulDecode, which attempted to format the error messages from Codable. Here is what UsefulDecode produces for the same error message:

Value not found: expected 'name' (String) at [0]/address/city/birds/[1]/name, got:
{
  "feathers" : "some",
  "name" : null
}

Per my last bullet point above, something like this in an error message would be useful:

Could not decode value "abc123" which was specified as 'Date' with format 'iso8601' at MyModel.swift:43

I hope the example illustrates why I think it's important to be thinking about this, and I hope it sparks some good conversations about what might be needed to produce such diagnostics.

34 Likes
  1. I think this is still doable within this design, though not without some compromises.

Rust Serde includes a macro attribute that it calls "flatten". When this is added to a nested {Des,S}erializable property, it changes the semantics of the encoding to put all the keys of the nested object on the same level as the one it contains. So in your case you could have this:

@JSONCodable
struct MyModel {
    @Coding(.flatten) // or whatever
    let apiProperties: StandardMetadata

    let modelSpecificProperty: String
}

This would put all the properties of StandardMetadata in the same JSON object as modelSpecificProperty, but at runtime it's accessed as a nested type in the model object (or you could do the fancy dynamicMemberLookup stuff you're doing in the other thread).

The compromise is that since the MyModel Visitor type doesn't know about the properties of StandardMetadata (and the macro certainly wouldn't be able to see into them to generate code for them), its implementation would actually need to gather up all the unknown keys it encounters into a dictionary with the values wrapped in generic "JSONValue" objects and use that dictionary to create the backing for another Decoder type that gets passed to StandardMetadata's decode function. (If you're curious, expanding the Serde generated code for this attribute is very illustrative.) Depending on the number and complexity of the unknown sub-values encountered, this is not terribly slow, but it's certainly not the optimal fast path for decoding.
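A rough sketch of that fallback path, with all names hypothetical: the generated visitor parks every key it doesn't recognize in a generic JSON representation, then replays the collected keys through a dictionary-backed decoder for the nested type.

```swift
// Hypothetical sketch of the @Coding(.flatten) slow path.
enum JSONValue {
    case null, bool(Bool), number(Double), string(String)
    case array([JSONValue]), object([String: JSONValue])
}

// While decoding MyModel, any key that isn't "modelSpecificProperty"
// is parked here instead of being rejected as unknown:
var unknownKeys: [String: JSONValue] = [:]

// ...and once the top-level object is consumed, the collected keys
// become the backing store for a second decoder (names invented):
// let nested = DictionaryBackedDecoder(storage: unknownKeys)
// let apiProperties = try StandardMetadata(decoding: nested)
```

The boxing into JSONValue and the second decoding pass are exactly where the "not terribly slow, but not the fast path" cost comes from.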

5 Likes
  1. My other niche interest is format-shifting […]

This is a good question. I just did a little research about how this is handled in Serde, and the answer appears to usually be "use different struct types".

I hope we can make this easier in Swift. I think either of these approaches has merit, but it'll depend greatly on how easily macros can implement them.

1 Like

Perhaps, but how would this look mechanically? Do we duplicate the API surface—one that is synchronous everywhere, and another one that has everything decorated with async?

One way to address the versioning problem could be similar to CoreData migration policy and staged migration. Essentially, you hand the decoder a closure that can operate on the raw object dict and apply changes before it is decoded.
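Spelled out, that might look something like the following. The migrating: parameter is purely hypothetical; nothing like it exists on today's JSONDecoder:

```swift
import Foundation

// Hypothetical "staged migration" hook, in the spirit of Core Data's
// staged migrations: mutate the raw object dictionary before decoding.
let decoder = JSONDecoder()
let user = try decoder.decode(User.self, from: data) { (raw: inout [String: Any]) in
    // Rename a v1 field so the v2 typed decode succeeds.
    if let name = raw.removeValue(forKey: "user_name") {
        raw["username"] = name
    }
}
```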

1 Like

I'm wondering… how committed do we need to be, exactly, to shipping this in the standard library and then tying the future evolution of this feature to Swift updates?

Why not a swift-coding package? Are there private APIs in Swift we expect to need in the new macro? Are there types in the standard library that need to adopt the new macro, where we'd have to defend against a circular dependency?

Ahh… this was addressed earlier. My mistake for missing this one:

4 Likes

Out-of-the-box support for encoding and decoding [String: Any] would be welcome.

Another issue that comes to mind is a way to model the semantic difference between a key being null/nil, and not being present at all.
For example, let’s say I want to update a user record on the server. I can encode the client version of the user record and send that to the server, but that will contain all fields, even those that haven’t changed. This increases the risk of clobbering changes on the backend that the client does not yet know about.
Conversely, if the backend sends changes to the client, there’s no (easy) way to differentiate between a field having been set to null, and a field being omitted because it hasn’t changed.
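One way to model that three-way distinction in plain Swift today is a dedicated wrapper enum; whether the new API could synthesize encoding/decoding support for something like this is exactly the open question:

```swift
/// Distinguishes "key absent" from "key present but null" from
/// "key present with a value" -- a three-state alternative to Optional.
enum FieldUpdate<Value> {
    case absent          // key not in the payload: leave the server value alone
    case null            // key explicitly null: clear the value
    case value(Value)    // key present: set the value
}

struct UserPatch {
    var nickname: FieldUpdate<String> = .absent
    var avatarURL: FieldUpdate<String> = .absent
}
```

Encoding `.absent` as "omit the key entirely" is the part Codable makes awkward today, since `encodeIfPresent` only collapses the two nil-ish states into one.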

7 Likes

@kperryua - I noticed you replied to most feedback, but so far not to the questions posed here - just so it doesn’t get dropped:

1 Like

Oh that's a nice thought, there's no reason the scheme has to be spelled similarly to the data model. My hunch is it might be quite challenging to encode an entire scheme covering a nested type structure in a single visitor-pattern type (thinking about weird complications like "both these properties decode to User, but their JSON representations are different: you need to know the property keypath to choose the right one"), but I guess not impossible.

Oh flatten is lovely -- and the performance compromise is exactly what you would expect. It's very cool that the design leaves the right space for this to exist!

Just to be really explicit about the "just use multiple model types" approach to format-shifting: the challenge is reliably keeping those types in sync. The problem is not so much the boilerplate (although that's burdensome) but the possibility of correctly-compiling code that misses copying over some properties. Swift is almost there with its definite-initialisation rules, but properties with default values can be skipped in an explicit initialiser with no compiler error... and optionals implicitly have the default value nil, which means every optional property is a chance to forget to sync across models.
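A concrete version of the pitfall: this compiles without any warning even though the hand-written initializer forgets to copy one property across.

```swift
struct APIUser {
    var name: String
    var bio: String?   // optional: implicitly defaults to nil
}

struct LocalUser {
    var name: String
    var bio: String?
}

extension LocalUser {
    init(from api: APIUser) {
        self.name = api.name
        // Oops: api.bio is never copied. Definite initialization is still
        // satisfied, because optional stored properties default to nil,
        // so the compiler stays silent.
    }
}
```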

This isn't necessarily the serialisation API's problem to solve! But it would be delightful if we did get space for a better solution as part of this design.

I think you'd do something like this:

struct EncodingScheme {

  func encode(_ user: User, coder: JSONCoder) {
    // encode user in normal way
  }

  func encode(_ userList: BSkyUserList, coder: JSONCoder) {
    for user in userList.users {
      // bsky-specific user encoding
    }
  }
}

So basically, the encode function for a type gets to decide how its child objects are encoded, and for any case where you don't need special behavior (e.g. a regular [User] array) you wouldn't add an encode function for it; the encoder would just use the default behavior of JSONEncoder.

2 Likes

I would love this to be natively supported, without the current hackarounds:

enum Beverage: Codable {
    case coffee
    case tea
    case juice

    @CodingEnumDefaultCase
    case unknown 
}

4 Likes

It would be nice if error handling had more options than the current approach. Currently, you can either return a correct object or fail with an (untyped) error. The default behavior is that if decoding of one nested structure fails, everything fails. In the general case that doesn't make much sense when the object is deeply nested (though maybe it does for sensitive or critical data).
It would be great if the init could be made failable, or if you could define the 'return nil' behavior in some other way.
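For illustration, here's roughly the shape of a workaround people use with today's Codable: a lossy wrapper that swallows per-element failures instead of failing the whole decode. Native support could make hand-rolling this unnecessary:

```swift
/// Decodes an array, dropping elements that fail to decode instead of
/// failing the whole decode. Implementable with today's Codable.
struct LossyArray<Element: Decodable>: Decodable {
    var elements: [Element] = []

    /// A type whose decode succeeds on any input and discards it.
    private struct AnyIgnored: Decodable {
        init(from decoder: Decoder) throws {}
    }

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        while !container.isAtEnd {
            if let element = try? container.decode(Element.self) {
                elements.append(element)
            } else {
                // A failed decode doesn't advance the container, so
                // consume the bad element with the accept-anything type.
                _ = try? container.decode(AnyIgnored.self)
            }
        }
    }
}
```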

3 Likes

Thanks! I haven't forgotten. Some replies just take longer than others :grinning_face:

3 Likes

Thanks for the detailed explanation. I agree with your points to a degree.

If I'm interfacing with another system that requires a specific format AND structure, then it's my responsibility to use and craft types that conform to that format's own protocol and describe the proper structure. I probably shouldn't want to encode a Swift.Range<UInt64> to send to some JSON web API that doesn't know what a Swift.Range is.

And if I'm encoding an entire top-level structure purely for storage or transmission for decoding by my own code, then yeah, I don't care which format that data is encoded in as long as it's fast, efficient, and secure.

However, I could argue that a logical corollary to your categorical statements is that someone should never want to encode (non-POD) currency types from the stdlib or another package/framework to a specific format like JSON or plist, etc., and that only case #2 ("encode yourself to some opaque Data") applies. I don't think that quite holds up to scrutiny. I think there's some wiggle room in between those in the real world.

  1. There is no single fastest, most efficient, most secure encoding for every struct. A client may choose to specifically encode their structs as binary plist for the de-duping it supports. Or maybe some other binary format that has a more space efficient representation of numerical values. Or other reasons I can't think of.
  2. Sometimes human readability is a legitimate factor. Maybe you want to encode your Range<UInt64> or your AttributedString in human-readable JSON, even if it's only your own code that actually deserializes it later.
  3. Some Swift-specific systems your application interacts with might require a specific format like JSON or plist, while also requiring a structure that matches the default Codable encoding of that type.

Thus, I feel it's still important to allow non-POD currency types to be able to describe themselves in generic terms (the format-agnostic protocols) so that JSONCodable/PropertyListCodable/etc. types can still freely embed these types in their structs, when appropriate.

That all said, I don't think this proposal at all precludes the ability to add something like

protocol OpaqueSerializable {
  func serialize(to span: inout MutableRawSpan) throws
  static func deserialize(from span: RawSpan) throws -> Self
}

or whatever you're envisioning for case #2.

(Just to respond to this at the same time, even though it's somewhat of a reiteration.)

I think format and structure are separable concepts in some cases. It would not be unheard of for a communication protocol to expect a format of JSON, but a structure that specifically matches what a Range<UInt64> would encode.
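Concretely, Range already conforms to Codable today; if I recall correctly it encodes its bounds into an unkeyed container, so a JSON-speaking protocol could legitimately expect exactly the shape that produces:

```swift
import Foundation

// Today's behavior: Range's Codable conformance encodes the two bounds
// into an unkeyed container.
let range: Range<UInt64> = 0..<5
let data = try! JSONEncoder().encode(range)
// The encoded JSON is an array of the two bounds, e.g. [0,5].
print(String(data: data, encoding: .utf8)!)
```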

1 Like