The future of serialization & deserialization APIs

Thanks Darko. I can certainly see how we could implement this for cases where the payload has an explicit null value or something else that doesn't match any of the expected cases of {"coffee", "tea", "juice"}. I believe this is equivalent to Serde's other attribute. This would indeed be a useful addition.

This may not be what you're intending, but I don't think we can make it work this simply in the case where the key for this value is missing in the payload. For instance:

@JSONCodable
struct Person {
    let name: String
    let favoriteBeverage: Beverage
}

Payload: { "name": "Kevin" } with the intended result of Person(name: "Kevin", favoriteBeverage: .unknown).

Since the macro on Person cannot see into the definition of Beverage, its expansion cannot generate .unknown as a default value here. We would need an explicit protocol requirement that expresses "default decodable value if missing (or throw an error)". Otherwise, the author of Person would need to decorate favoriteBeverage with an explicit @CodingDefault(.unknown).

If we think it's worthwhile, we can look into adding such a protocol requirement, but I haven't been able to find prior art for this pattern, so it would require some additional consideration.
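To make the idea concrete, here is a minimal sketch of what such a requirement might look like. The protocol name MissingValueDefaulting, its valueIfMissing requirement, and the decodeField helper are all hypothetical, and a plain dictionary stands in for the real payload:

```swift
/// Hypothetical requirement: a type that can supply its own value when its key is missing.
protocol MissingValueDefaulting {
    static var valueIfMissing: Self { get }
}

enum Beverage: String, MissingValueDefaulting {
    case coffee, tea, juice, unknown
    static var valueIfMissing: Beverage { .unknown }
}

/// Stand-in for what a macro expansion could emit for a single property.
/// Falls back to the type's own default when the key is missing or the raw value is unrecognized.
func decodeField<T>(_ key: String, from payload: [String: String]) -> T
where T: MissingValueDefaulting & RawRepresentable, T.RawValue == String {
    guard let raw = payload[key] else { return T.valueIfMissing }
    return T(rawValue: raw) ?? T.valueIfMissing
}

// The payload has no "favoriteBeverage" key, so the type's default is used.
let beverage: Beverage = decodeField("favoriteBeverage", from: ["name": "Kevin"])
```

With this shape the macro on Person never needs to see inside Beverage; it only needs to know that the property's type conforms.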

for what it’s worth, this is exactly the pattern that swift-json and swift-bson converged upon. the respective protocol hierarchies (BSONEncodable, JSONEncodable, etc) are very similar to one another, which helps facilitate Knowledge Transference between libraries, but they are still independently tailored to the idiosyncrasies of each serialization format, which i think is the right balance.

these libraries really ought to have macros, but the issues blocking macro adoption are well known at this point, so we really need SwiftPM to

  1. support prebuilt SwiftSyntax, and
  2. support multiple incompatible major versions of SwiftSyntax

for this to be viable. my understanding is a solution to the first problem is in the works, but i have not heard of any concrete plans to tackle the second problem yet.

for now i’m just relying on GitHub Copilot to generate the boilerplate for me.

This is a common request, but it's still fundamentally incompatible with compile-time type-driven serialization. This would take us back to the NSCoding scheme of encoding type information which has inherent security issues.

I'm aware this is a very common need! With the visitor model, this difference is certainly more visible than with the Codable model. Specifically, a struct visitor can see, after all the keys have been enumerated, which ones were NOT present. This allows the visitor code to behave differently in this case, compared to the value being present, but null.

That said, leveraging this pattern requires using something other than Optional<T> for the property in question, since it is a tri-state. An Optional<Optional<T>> could work, but those aren't always well-liked because of unwrapping ergonomics. I've also considered a type like this:

enum Patch<T> {
    case absent
    case null
    case value(T)
}

Whether that should be a standardized type or left to ad hoc designs, I'm undecided.
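As a rough illustration of how the three states could be told apart, here is a sketch that uses Foundation's JSONSerialization rather than the proposed visitor API. The readName and parseObject helpers and the PatchError type are made up for the example:

```swift
import Foundation

enum Patch<T> {
    case absent
    case null
    case value(T)
}
extension Patch: Equatable where T: Equatable {}

enum PatchError: Error { case typeMismatch(String) }

/// Reads the "name" key from an already-parsed JSON object,
/// keeping "key missing" and "key present but null" apart.
func readName(from object: [String: Any]) throws -> Patch<String> {
    guard let raw = object["name"] else { return .absent }  // key not in payload
    if raw is NSNull { return .null }                        // key present, value is null
    guard let text = raw as? String else { throw PatchError.typeMismatch("name") }
    return .value(text)
}

/// Parses a JSON string into a dictionary; anything that is not an object becomes an empty dictionary.
func parseObject(_ json: String) throws -> [String: Any] {
    let parsed = try JSONSerialization.jsonObject(with: Data(json.utf8))
    return parsed as? [String: Any] ?? [:]
}
```

Here `{}` maps to .absent, `{"name": null}` to .null, and `{"name": "Kevin"}` to .value("Kevin").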

Maybe this feature can define a standard pattern for implementing this in a type-safe way, similar to SwiftUI’s EnvironmentKeys. Maybe it could even provide a macro. :slight_smile:

Yes, I think macros could be a way to provide some interesting escape hatches for both of the problems I laid out, since it allows opting into it without affecting common case performance.

For example, a macro could provide a set property on a type that holds the list of keys that were actually present, allowing us to differentiate from keys that are null.

For [String: Any], a possible escape hatch would store the raw data and box it up in a way that lets us defer decoding until access time.
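A very rough sketch of that kind of box, built on today's JSONDecoder purely for illustration (the DeferredJSON name and its decode method are hypothetical, not part of any proposal):

```swift
import Foundation

/// Hypothetical box: keeps the raw bytes of a value and decodes them only when asked.
struct DeferredJSON {
    let raw: Data

    /// The caller picks the concrete type at access time.
    func decode<T: Decodable>(_ type: T.Type) throws -> T {
        try JSONDecoder().decode(type, from: raw)
    }
}

// The bytes are stored untouched; decoding happens on the call to decode(_:).
let box = DeferredJSON(raw: Data(#"["swift", "codable"]"#.utf8))
let tags = try? box.decode([String].self)
```

The cost of decoding is only paid by callers that actually touch the value, which is the opt-in property I'm after.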

Now that I’ve typed this out, it sounds like I’m just bolting KeyedDecodingContainer back onto this new proposal, which might be fine if it’s opt-in and solves real problems.

On the opposite end of the spectrum to what’s been discussed so far, today I wrote a little script to fetch and parse some JSON data from the web. I needed it to be short and snappy so a junior colleague could easily use and modify it. I just needed one or two values from the response. And I found myself wanting (for the first time since boarding the type safety train in fact), a quick-and-dirty way to pluck some values from the JSON without all the Decodable boilerplate.

What I was wishing for at the time is for something where I could specify some kind of path expression and a leaf type.

let authors = try JSONDecoder().decode([String].self, from: data, at: "data.articles[0].metadata.authors[*].name")

Perhaps there are already packages that do this sort of thing. But it would be super powerful to have something lightweight like that built in for scripting.
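For what it's worth, something close to this can be cobbled together today on top of JSONSerialization. The sketch below is only an approximation of the idea: the path grammar (dot-separated keys, [n] and [*] subscripts), the pluckStrings function, and the error type are all my own inventions, not an existing API.

```swift
import Foundation

/// One step of a path such as "data.articles[0].authors[*].name".
enum PathStep {
    case key(String)
    case index(Int)
    case wildcard
}

enum PluckError: Error { case badPath(String), noMatch(String) }

func parsePath(_ path: String) throws -> [PathStep] {
    var steps: [PathStep] = []
    for segment in path.split(separator: ".") {
        // "articles[0]" becomes ["articles", "0]"]; the first part is the key (possibly empty).
        let parts = segment.split(separator: "[", omittingEmptySubsequences: false)
        if !parts[0].isEmpty { steps.append(.key(String(parts[0]))) }
        for sub in parts.dropFirst() {
            guard sub.hasSuffix("]") else { throw PluckError.badPath(path) }
            let inner = sub.dropLast()
            if inner == "*" {
                steps.append(.wildcard)
            } else if let index = Int(inner) {
                steps.append(.index(index))
            } else {
                throw PluckError.badPath(path)
            }
        }
    }
    return steps
}

/// Walks the parsed JSON tree; a wildcard fans out over every array element.
func pluck(_ steps: ArraySlice<PathStep>, from value: Any) throws -> [Any] {
    guard let step = steps.first else { return [value] }
    let rest = steps.dropFirst()
    switch step {
    case .key(let key):
        guard let object = value as? [String: Any], let next = object[key] else {
            throw PluckError.noMatch(key)
        }
        return try pluck(rest, from: next)
    case .index(let index):
        guard let array = value as? [Any], array.indices.contains(index) else {
            throw PluckError.noMatch("[\(index)]")
        }
        return try pluck(rest, from: array[index])
    case .wildcard:
        guard let array = value as? [Any] else { throw PluckError.noMatch("[*]") }
        return try array.flatMap { try pluck(rest, from: $0) }
    }
}

/// Returns every string leaf matched by the path; non-string leaves are dropped.
func pluckStrings(from data: Data, at path: String) throws -> [String] {
    let root = try JSONSerialization.jsonObject(with: data)
    let leaves = try pluck(ArraySlice(parsePath(path)), from: root)
    return leaves.compactMap { $0 as? String }
}
```

It only handles string leaves and has none of the type safety of Decodable, which is sort of the point for a throwaway script.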

There are multiple scenarios where apps want to distinguish null and absent, depending on whether we're encoding or decoding:

  1. Encoding for a third party that only accepts value, or absent (and considers null as invalid). We do not need a tri-state, but we need to specify that Optional.nil is encoded as an absent key.

    Example: some OpenAPI specs as below:

    type: object
    properties:
      name:
        type: string
    
  2. Encoding for a third party that only accepts value, or null (and considers absent as invalid). We do not need a tri-state, but we need to specify that Optional.nil is encoded as an explicit null value.

    Example: some OpenAPI specs as below:

    type: object
    required:
      - name
    properties:
      name:
        type: string
        nullable: true
    
  3. Encoding for a third party that accepts value, null, or absent, and distinguishes the three cases (here we really need a tri-state).

    Example: some OpenAPI specs as below:

    type: object
    properties:
      name:
        type: string
        nullable: true
    
  4. Decoding an Optional where both absent and null are decoded as nil.

  5. Decoding an Optional where only value or absent are valid, and null should throw a decoding error.

  6. Decoding an Optional where only value or null are valid, and absent should throw a decoding error.

  7. Decoding a tri-state.

That's a lot of cases. And I probably have forgotten some.

In my humble opinion, the target API, where P means Programming, should allow the host app to program the exact desired behavior for both encoding and decoding. If the target API is too "descriptive", and hardcodes a limited set of possible handlings, with no possible extension, we developers will hit a blocker.

Patch would not need to be standardized if it were possible for the host app to write their custom null/absent handling code (ideally, for specific properties only, independently of the other properties).
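For reference, here is how a few of these cases can be programmed by hand with the current Codable API. This is only a sketch to show the level of control I mean; the type names are illustrative, and the new API would presumably look different.

```swift
import Foundation

/// Case 1: nil is encoded as an absent key.
struct OmitsNil: Encodable {
    var name: String?
    enum CodingKeys: String, CodingKey { case name }
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encodeIfPresent(name, forKey: .name)
    }
}

/// Case 2: nil is encoded as an explicit null.
struct WritesNull: Encodable {
    var name: String?
    enum CodingKeys: String, CodingKey { case name }
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(name, forKey: .name)
    }
}

/// Case 6: null decodes to nil, but an absent key throws a decoding error.
struct RequiresKey: Decodable {
    var name: String?
    enum CodingKeys: String, CodingKey { case name }
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String?.self, forKey: .name)
    }
}

/// Small helper to look at the encoded JSON as text.
func jsonString<T: Encodable>(_ value: T) throws -> String {
    String(decoding: try JSONEncoder().encode(value), as: UTF8.self)
}
```

OmitsNil(name: nil) encodes to `{}`, while WritesNull(name: nil) encodes to `{"name":null}`. The point is that each property picks its behavior in ordinary code, independently of the others.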

Off the top of my head, we sometimes need to customize the decoding of "blank values", for example:

  • Decoding a non-optional String in such a way that null, absent are decoded as "".
    This happens when the app wants to encode the lack of information as "".

  • Decoding a String? in such a way that null, absent, "", maybe even strings only made of white spaces, are decoded as nil.
    This happens when the app wants to encode the lack of information as nil.

Same for Int, Double, etc., where 0 could take the place of "".

All of that was customizable and possible with Decodable (often verbose, maybe with the help of a dedicated value type, but possible).

I wish it were possible for apps to provide custom decoding strategies without the need to write a macro.
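To illustrate, this is roughly what the two String cases look like with a hand-written init(from:) today. The Profile type and its property names are made up for the example:

```swift
import Foundation

struct Profile: Decodable {
    let nickname: String   // null or absent is decoded as ""
    let bio: String?       // null, absent, "" or whitespace-only is decoded as nil

    private enum CodingKeys: String, CodingKey { case nickname, bio }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)

        // decodeIfPresent returns nil for both a missing key and an explicit null.
        nickname = try container.decodeIfPresent(String.self, forKey: .nickname) ?? ""

        let rawBio = try container.decodeIfPresent(String.self, forKey: .bio)
        bio = rawBio.flatMap { text in
            text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty ? nil : text
        }
    }
}
```

It is verbose, but each property gets exactly the handling the app wants, written as plain code.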

Currently we solve it like this:

public protocol UnknownCaseRepresentable: RawRepresentable, CaseIterable where RawValue: Equatable {
    static var unknown: Self { get }
}

public extension UnknownCaseRepresentable {
    init(rawValue: RawValue) {
        if let value = Self.allCases.first(where: { $0.rawValue == rawValue }) {
            self = value
        } else {
            self = .unknown
        }
    }
}

Then the enum conforming to the protocol has to define an unknown case:

// A raw type (String here) is needed so that the enum is RawRepresentable.
enum Beverage: String, Codable, UnknownCaseRepresentable {
    case coffee
    case tea
    case juice

    case unknown
}

This is mostly used for the case where APIs extend the enum cases but old app versions don't know those cases. Without UnknownCaseRepresentable, older app versions would simply crash on decoding. We would have to limit ourselves to effectively "frozen" enums everywhere if we used only standard Swift decoding. But this is basically a problem for every API contract that uses enums. That's why I would love for this problem to be addressed natively by Swift's decoding (in any way).
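For comparison, the same fallback can also be spelled out per enum with a custom init(from:) on top of today's Decodable. This is just a sketch of the behavior we're after, not our production code:

```swift
import Foundation

enum Beverage: String, Decodable {
    case coffee
    case tea
    case juice
    case unknown

    init(from decoder: Decoder) throws {
        // Decode the raw string first, then fall back instead of throwing on unknown values.
        let raw = try decoder.singleValueContainer().decode(String.self)
        self = Beverage(rawValue: raw) ?? .unknown
    }
}

// "mate" is not a known case, so it decodes to .unknown instead of failing the whole payload.
let decoded = try? JSONDecoder().decode([Beverage].self, from: Data(#"["tea", "mate"]"#.utf8))
```

Having to repeat this for every enum in every API model is exactly the boilerplate a native solution could remove.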

There may well be other good reasons for this to live in the standard library, but this statement doesn't immediately strike me as a dealbreaker for having it live in a package. If it were otherwise possible for the new protocols to live in a package, that package could be responsible for providing the conformances of standard library types to its protocols.

I believe this would imply the protocol conformance must be retroactive? Is there precedent for Apple packages to ship retroactive conformances on standard library types when that conformance is then depended on by clients of that standalone package?

A retroactive conformance is specifically a conformance of a type you don't own to a protocol that you also don't own—the extension lives in neither of the original modules. They're risky because the owner of either the type or the protocol could want to one day provide their own.

In this case, a hypothetical swift-coding package would have no trouble conforming Int to one of its own protocols, because there's no risk that the standard library could ever provide the same conformance.
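As a tiny sketch of that situation (PackageDecodable and its requirement are invented names standing in for whatever swift-coding would define):

```swift
// Imagine this lives in the hypothetical swift-coding package.
protocol PackageDecodable {
    static var formatName: String { get }
}

// Not retroactive: the package owns the protocol, even though the
// standard library owns Int. No one else could sensibly provide this.
extension Int: PackageDecodable {
    static var formatName: String { "integer" }
}
```

A client module adding `extension Int: SomeOtherPackageProtocol` would be the retroactive case, since it owns neither side.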

A package can declare conformances to its own protocols without them being retroactive. There are other reasons that a package may not be practical here, though.

Ahh… correct… swift-coding depends on swift; swift can't depend on swift-coding. Makes sense. Thanks!

Is something like this possible for the generated visitor instead?

struct Visitor: JSONDecodingStructVisitor {
    typealias DecodedValue = BlogPost
    func visit(decoder: inout JSONDecoder2.StructDecoder) throws -> BlogPost {
        // Declared without an initial value; fields are filled in as they are encountered.
        var result: BlogPost
        while let field = try decoder.nextField(CodingFields.self) {
            switch field {
            case .title: result.title = try decoder.decodeValue(String.self)
            case .subtitle: result.subtitle = try decoder.decodeValue(String.self)
            case .publishDate:
                let formatted = try decoder.decodeValue(String.self)
                result.publishDate = try Date.ISO8601FormatStyle().parse(formatted)
            case .body: result.body = try decoder.decodeValue(String.self)
            case .tags: result.tags = try decoder.decodeValue([String].self)
            case .unknown: try decoder.skipValue()
            }
        }
        return result
    }
}

It would allow default values to be expressed in natural Swift instead of a special macro or property wrapper, like for tags below.

@JSONCodable
struct BlogPost {
    let title: String
    let subtitle: String?
    @CodingKey("date_published") @CodingFormat(.iso8601) // unrelated: too bad these must precede, not follow, oh well
    let publishDate: Date
    let body: String
    // was:
    //@CodingDefault([])
    //let tags: [String]
    // now:
    let tags: [String] = []
}

Is there some reason I'm missing why we wouldn't want that?

EDIT: Of course struct initialization rules don't allow initialization of BlogPost like this, as pointed out by @kperryua. See my followup post below.

Thanks for the feedback!

Error handling is still an area that needs some thorough consideration in this design.

Can you help clarify what your concrete expectations are in terms of the final result of decoding a partially-incorrect payload? Do you want a macro attribute that lets you define a default value instead? (including nil, if the property is Optional?) e.g.

@JSONDecodable
struct Person {
    @CodableDefault(nil)
    let name: String?

    @CodableDefault(42)
    let favoriteNumber: Int64
}

… or is this something you expect the decodable type itself to provide? e.g.

@JSONDecodable
@CodableDefault(.init(latitude: 0.0, longitude: 0.0))
struct Geoposition {
    let latitude: Double
    let longitude: Double
}
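To have something runnable to react to, the first option can be approximated today with a property wrapper on top of Codable. This is only a stand-in for the macro attribute, not how it would be implemented; DefaultSource, Default, and FortyTwo are hypothetical names:

```swift
import Foundation

/// Hypothetical: names the default value for a wrapped property.
protocol DefaultSource {
    associatedtype Value: Decodable
    static var defaultValue: Value { get }
}

@propertyWrapper
struct Default<Source: DefaultSource>: Decodable {
    var wrappedValue: Source.Value

    init(wrappedValue: Source.Value) { self.wrappedValue = wrappedValue }

    init(from decoder: Decoder) throws {
        wrappedValue = try decoder.singleValueContainer().decode(Source.Value.self)
    }
}

extension KeyedDecodingContainer {
    // Synthesized Decodable code calls decode(_:forKey:) for each property;
    // this overload substitutes the default when the key is absent or null.
    func decode<Source: DefaultSource>(
        _ type: Default<Source>.Type, forKey key: Key
    ) throws -> Default<Source> {
        try decodeIfPresent(type, forKey: key) ?? Default(wrappedValue: Source.defaultValue)
    }
}

enum FortyTwo: DefaultSource {
    static var defaultValue: Int64 { 42 }
}

struct Person: Decodable {
    let name: String?
    @Default<FortyTwo> var favoriteNumber: Int64
}
```

Decoding `{"name": "Kevin"}` with this yields a favoriteNumber of 42. Note that a value of the wrong type still throws here; whether that should also fall back to the default is part of what I'm asking.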

Absolutely, these are great suggestions. Some may even make an excellent PR against the existing DecodingError implementation!

Ooh, that sounds like something a macro could capture and communicate via errors. This would only be surfaced as a debugDescription of course, but it does seem super useful.

There's a lot I like about this POV. Before a conversation turns into "we block an infra before we implement x, y, and z"… we can have a conversation about "how do we ship an infra so that product engineers can bring x, y, and z".

Thanks for the exhaustive rundown of cases! That's certainly more than I anticipated.

What is the ideal way to express the desired strategy?

At the risk of referencing Serde too much, this sounds along the lines of what its serialize_with and deserialize_with field attributes allow. Basically, instead of using the standard Serializable or Deserializable implementation for the given type, it instructs the macro to use an alternate implementation. Does this sound like the desired tool here?

I would love this, but Swift's struct initialization rules don't allow it unless all of the properties on BlogPost are optional OR have explicit default values.
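One shape that does satisfy the initialization rules is to collect fields into local optionals and only construct the value at the end, applying defaults there. The sketch below uses a plain list of key/value pairs in place of the real decoder, and a trimmed-down BlogPost; all names other than BlogPost are placeholders:

```swift
struct BlogPost {
    let title: String
    let tags: [String]
}

enum FieldError: Error { case missing(String) }

/// Stand-in for a visitor: walks the fields once, then builds the struct in a single step.
func buildBlogPost(from fields: [(key: String, value: Any)]) throws -> BlogPost {
    var title: String?
    var tags: [String]?

    for field in fields {
        switch field.key {
        case "title": title = field.value as? String
        case "tags": tags = field.value as? [String]
        default: continue   // unknown fields are skipped
        }
    }

    // Required fields are checked here; defaults are applied for the rest.
    guard let finalTitle = title else { throw FieldError.missing("title") }
    return BlogPost(title: finalTitle, tags: tags ?? [])
}
```

The default for tags still has to be known at the point where the struct is constructed, which is why the macro needs an attribute (or some other signal) to find it.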
