[Pitch] JSONDecoder.nullDecodingStrategy

JSONDecoder currently does not differentiate between a payload containing an explicit null and one where the field is omitted. I would like to propose an additive change to include a decoding strategy for these situations:

class JSONDecoder {
    ...

    enum NullDecodingStrategy {
        // Default, today's behavior
        case implicit

        // If struct contains "var anInt: Int?", valid JSON payloads must explicitly contain either {'anInt': <integer>} or {'anInt': null}.  Not containing 'anInt' results in a decoding error.
        // If struct contains "var anInt: Int??", {'anInt': <integer>} decodes as .some(.some(<integer>)), {'anInt': null} decodes as .some(.none), and the field being omitted decodes as .none.
        case explicit
    }

    var nullDecodingStrategy: NullDecodingStrategy = .implicit

    ...
}

(DISCLAIMER: Might need some bikeshedding.)


JSONDecoder does actually differentiate between these cases; you can distinguish by calling decode(Int?.self, forKey: ...) (which necessitates that the field be present, but the value may be null) vs. decodeIfPresent(Int.self, forKey: ...) (which allows either the field to be omitted, or the value to be null):

import Foundation

struct Foo : Decodable {
    let a: Int?
    let b: Int?
    
    private enum CodingKeys: String, CodingKey {
        case a, b
    }
    
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        a = try container.decode(Int?.self, forKey: .a)
        b = try container.decodeIfPresent(Int.self, forKey: .b)
    }
}

let payloads = [
    // Succeeds
    """
    {"a": 5, "b": 10}
    """.data(using: .utf8)!,
    
    // Succeeds
    """
    {"a": null, "b": 10}
    """.data(using: .utf8)!,
    
    // Fails, "a" may not be missing
    """
    {"b": 10}
    """.data(using: .utf8)!,
    
    // Succeeds
    """
    {"a": 5, "b": null}
    """.data(using: .utf8)!,
    
    // Succeeds
    """
    {"a": 5}
    """.data(using: .utf8)!
]

let decoder = JSONDecoder()
for payload in payloads {
    do {
        let foo = try decoder.decode(Foo.self, from: payload)
        print(foo)
    } catch {
        print("Failed to parse '\(String(data: payload, encoding: .utf8)!)': \(error)")
    }
}

One thing to keep in mind when using synthesized conformances is that optional values get decodeIfPresent(...) called for them, rather than decode(...), to be more permissive.
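As a minimal sketch of that point (the `Bar` type is a hypothetical name, not from this thread), a type that relies on synthesized conformance treats a missing key and an explicit null identically:

```swift
import Foundation

// Hypothetical type relying entirely on synthesized Decodable conformance.
struct Bar: Decodable {
    let a: Int?   // the synthesized init(from:) decodes this with decodeIfPresent(_:forKey:)
}

let barDecoder = JSONDecoder()
let fromNull = try! barDecoder.decode(Bar.self, from: Data(#"{"a": null}"#.utf8))
let fromMissing = try! barDecoder.decode(Bar.self, from: Data("{}".utf8))

// Both payloads decode successfully, and both leave `a` as nil.
print(fromNull.a == nil, fromMissing.a == nil) // true true
```

A consumer of `Bar` therefore has no way to tell which of the two payloads it came from.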


This isn't unique to JSONDecoder, BTW, but a requirement of the KeyedDecodingContainer protocol:

Is there a use case you have in mind which this does not cover?


The problem with the Foo example above is that it still obfuscates from any consumer of Foo whether the value is present or nil, which is more aptly represented via nested Optionals (e.g. Optional<Optional>). While it's true the support can be implemented manually with additional legwork, the goal of NullDecodingStrategy is to streamline this and enable synthesized conformances to provide it for free. This is especially useful for data querying languages (such as GraphQL) that make a strong semantic distinction between a field in a returned payload being null and it being omitted altogether.

This would ideally look like:

struct Foo: Codable {
    var a: Int
    var b: Int?
    var c: Int??
}

let payloads = [
    // Succeeds, (a: 5, b: .some(10), c: .some(.some(15)))
    """
    {"a": 5, "b": 10, "c": 15}
    """.data(using: .utf8)!,

    // Succeeds, (a: 5, b: .some(10), c: .some(.none))
    """
    {"a": 5, "b": 10, "c": null}
    """.data(using: .utf8)!,

    // Succeeds, (a: 5, b: .some(10), c: .none)
    """
    {"a": 5, "b": 10}
    """.data(using: .utf8)!,

    // Succeeds, (a: 5, b: .none, c: .none)
    """
    {"a": 5, "b": null}
    """.data(using: .utf8)!,

    // Fails, "a" cannot be null
    """
    {"a": null, "b": 10, "c": 15}
    """.data(using: .utf8)!,

    // Fails, "a" may not be missing
    """
    {"b": 10}
    """.data(using: .utf8)!,

    // Fails, "b" may not be missing
    """
    {"a": 5}
    """.data(using: .utf8)!
]

let decoder = JSONDecoder()
decoder.nullDecodingStrategy = .explicit

for payload in payloads {
    do {
        let foo = try decoder.decode(Foo.self, from: payload)
        print(foo)
    } catch {
        print("Failed to parse '\(String(data: payload, encoding: .utf8)!)': \(error)")
    }
}

Having not worked with GraphQL directly, please enlighten me: in what use cases is it important for a consumer of Foo to know whether one of its fields was missing vs. null?

My concern about distinguishing between Optional and Optional<Optional> is multi-fold:

  1. The current behavior for handling of optionals falls out very nicely from a very simple implementation on Optional that follows from general rules: encoding an Optional<T> encodes the inner T if it is present, regardless of what T is. It could be a type like Int (if so, unwrapping Int? → Int), or it could be an optional itself. Encoding an Int?? encodes an Int?, which encodes an Int; the rule for the Optional itself is very simple. (Decoding is similar here: decoding an Int?? decodes an Int?, which decodes an Int, wrapping the values as we unwind.)

    However, when we give preferential treatment to Optional<T> where T == Optional<U>, we can no longer rely on these general rules and must care about the specifics of what T is. Besides the runtime hassle of unwrapping Optionals (because we can't yet ask if T is Optional because Optional is generic), we have to start checking this on every decode call, which is less efficient.

  2. Giving preferential treatment to such a type also raises more questions: what does Int??? then imply? Int???? (i.e. what makes Optionalⁿ<T> special for n == 2?)

  3. It's not always valid to read into the type annotation in this way. Consider the following:

    struct Field<T> {
        // We might have a T, who knows.
        let name: String
        let value: T?
    
        init(name: String, value: T? = nil) {
            self.name = name
            self.value = value
        }
    }
    
    struct MyType {
        let foo: Field<Int?>
    }
    

    Through an ostensibly reasonable composition of types, we've ended up with a Field whose value is an Int??. Is it reasonable to read into the meaning of that on its own? Maybe, maybe not, but the decoder certainly can't distinguish between what you meant and what it got.
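To make point 1 concrete, here is a small sketch (assuming a recent Foundation; the variable names are illustrative) of how the general Optional rule flattens an Int?? on the way out, so the two nil layers cannot be told apart on the way back in:

```swift
import Foundation

// Three distinct in-memory values of type Int??.
let original: [Int??] = [.some(.some(1)), .some(.none), .none]

// Each Optional layer encodes its wrapped value if present, or null otherwise,
// so both nil layers end up as a plain JSON null.
let encoded = try! JSONEncoder().encode(original)
print(String(decoding: encoded, as: UTF8.self)) // [1,null,null]

// On decoding, the outermost Optional sees the null first and becomes .none.
let roundTripped = try! JSONDecoder().decode([Int??].self, from: encoded)
let states: [String] = roundTripped.map { element in
    switch element {
    case .none: return "outer nil"
    case .some(.none): return "inner nil"
    case .some(.some(let n)): return "value \(n)"
    }
}
print(states)
```

The `.some(.none)` element does not survive the round trip, which is the information loss the proposed strategy would have to special-case.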

Given this, and the fact that T?? types can be difficult to consume correctly (if you're not careful about how you unwrap, you could end up throwing away information; this is just my personal opinion, though), would it not be nicer to work with a field type other than Optional<Optional<T>> which does actually semantically indicate the difference between the field being missing and the value being null? (You would lose automatic synthesis, yes, but you can add extensions to the container protocols that make it really easy to decode a Field<T> [where Field can be an enum that strongly distinguishes between .none and .missing].)
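One way that suggestion might look, as a rough sketch only: the `Field` enum, its case names, the `decodeField(_:forKey:)` helper, and the `Profile` type are all hypothetical names chosen for illustration, not existing API.

```swift
import Foundation

// Hypothetical three-state field type; the name and cases are assumptions.
enum Field<Value: Decodable> {
    case missing        // key absent from the payload
    case null           // key present, value is JSON null
    case value(Value)   // key present with a concrete value
}

extension KeyedDecodingContainer {
    // Hypothetical helper, not part of the standard library.
    func decodeField<T: Decodable>(_ type: T.Type, forKey key: Key) throws -> Field<T> {
        guard contains(key) else { return .missing }
        if try decodeNil(forKey: key) { return .null }
        return .value(try decode(T.self, forKey: key))
    }
}

struct Profile: Decodable {
    let nickname: Field<String>

    private enum CodingKeys: String, CodingKey { case nickname }

    // Written by hand: synthesis is not used with this approach.
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        nickname = try container.decodeField(String.self, forKey: .nickname)
    }
}

func nicknameState(in json: String) -> String {
    let profile = try! JSONDecoder().decode(Profile.self, from: Data(json.utf8))
    switch profile.nickname {
    case .missing: return "missing"
    case .null: return "null"
    case .value(let name): return "value(\(name))"
    }
}

print(nicknameState(in: "{}"))                     // missing
print(nicknameState(in: #"{"nickname": null}"#))   // null
print(nicknameState(in: #"{"nickname": "Sam"}"#))  // value(Sam)
```

Because the helper has its own name, nothing here depends on overload resolution; the cost is one line of manual decoding per property.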


I know this is old, but FWIW I'm currently consuming a feed where the publisher has documented that a key not being present is semantically different than where the key is present but the value is null, and I need to respond accordingly to those semantic differences.

Is creating my own keys and decoding field by field with all the boilerplate that @itaiferber proposes above still the only viable approach to manage this?

That will work. You might also encapsulate the same logic inside a property wrapper and apply that to your property.

Implementing init(from:) yourself is currently the only way to handle this, since neither decode(_:forKey:) nor decodeIfPresent(_:forKey:) will do what you want:

  1. decode(_:forKey:) will throw an error if the key is missing
  2. decodeIfPresent(_:forKey:) won't distinguish between a missing key and a null value

Distinguishing between a non-existent key and a null value requires using contains(_:), and you'd need to decide how to handle the result. This, for example, is one way to represent the data (but not the only way):

struct S: Decodable {
    // Double-Optional:
    // * Outer layer indicates key missing vs. present
    // * Inner layer indicates value `null` vs. present
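    var x: Int??

    // Explicit keys, since init(from:) is written by hand.
    private enum CodingKeys: String, CodingKey {
        case x
    }

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        if container.contains(.x) {
            x = try container.decode(Int?.self, forKey: .x)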
        } else {
            x = nil
        }
    }
}

FWIW, if your type is Encodable too, you could leverage the fact that letting the compiler synthesize Encodable conformance will generate the keys for you, and then use those keys in your init(from:).
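A sketch of that variant, under the assumption that encode(to:) is left to the compiler (the `Settings` type and its `volume` property are hypothetical names): no CodingKeys enum is written out, yet the hand-written init(from:) can still refer to it.

```swift
import Foundation

struct Settings: Codable {
    var volume: Int??

    // No CodingKeys enum is declared in this type. encode(to:) is synthesized,
    // and the keys generated for it are reused here.
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        if container.contains(.volume) {
            // Key present: the inner Optional records null vs. a value.
            volume = try container.decode(Int?.self, forKey: .volume)
        } else {
            // Key missing: the outer Optional is nil.
            volume = nil
        }
    }
}

func volumeState(in json: String) -> String {
    let settings = try! JSONDecoder().decode(Settings.self, from: Data(json.utf8))
    switch settings.volume {
    case .none: return "missing"
    case .some(.none): return "null"
    case .some(.some(let level)): return "value \(level)"
    }
}

print(volumeState(in: "{}"))                  // missing
print(volumeState(in: #"{"volume": null}"#))  // null
print(volumeState(in: #"{"volume": 7}"#))     // value 7
```

Note that the synthesized encode(to:) still follows the general Optional rules, so encoding does not preserve the missing vs. null distinction unless encode(to:) is written by hand as well.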


Unfortunately, you can't handle this with a property wrapper because of how property wrappers are represented under the hood, and how they interact with compiler synthesis.

A type like

struct S {
    @MyPropertyWrapper var x: Int?
}

generates code that looks approximately like

struct S {
    private var _x: MyPropertyWrapper<Int?>
    var x: Int? {
        get { _x.wrappedValue }
        set { _x.wrappedValue = newValue }
    }
}

This produces an init(from:) that looks like

init(from decoder: any Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    _x = try container.decode(MyPropertyWrapper<Int?>.self, forKey: .x)
}

Because of how the synthesized code is structured, an error would get thrown on a missing key before the property wrapper code would ever get called, one level deeper.
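A small sketch that demonstrates this failure mode (the `Lenient` wrapper and `Wrapped` type are hypothetical names used only for illustration):

```swift
import Foundation

// Hypothetical wrapper that decodes its value from a single-value container.
@propertyWrapper
struct Lenient<Value: Decodable>: Decodable {
    var wrappedValue: Value

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        wrappedValue = try container.decode(Value.self)
    }
}

struct Wrapped: Decodable {
    // Synthesized code decodes the backing storage with decode(_:forKey:),
    // not decodeIfPresent(_:forKey:), even though x is an Int?.
    @Lenient var x: Int?
}

func outcome(for json: String) -> String {
    do {
        _ = try JSONDecoder().decode(Wrapped.self, from: Data(json.utf8))
        return "decoded"
    } catch let error as DecodingError {
        if case .keyNotFound = error { return "keyNotFound" }
        return "other decoding error"
    } catch {
        return "other error"
    }
}

print(outcome(for: #"{"x": 5}"#)) // decoded
print(outcome(for: "{}"))         // keyNotFound
```

The second payload fails in the keyed container, before `Lenient.init(from:)` has any chance to supply a fallback.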

(A bit more info on this in Cleaner Way to Decode Missing JSON Keys? - #7 by itaiferber and [Pitch] Nullable and Non-Optional Nullable Types - #17 by itaiferber)


There's also quite a bit of prior discussion in threads like


Thanks. I'll give that a go and reach back if either results in questions. Appreciate your help.


Yeah, one needs to use an explicit custom wrapper type instead of a property wrapper to generalize this:

let value: ExplicitNullCodable<Int>?

Then in case of a missing field, value itself would be nil. And the wrapper type can be

struct ExplicitNullCodable<Value>: Codable where Value: Codable {
    let value: Value?

    init(from decoder: any Decoder) {
        // implement on your own 
    }
}

It would then most likely need a bunch of "expressible by literal" conformances to be convenient to use.

I think this is the thing to look at first:

[1, 2, nil, 4]

Convert this to json? Easy, it could be modelled with [Int?].

Now, what about this one:

[1, 2, nil, "4"]

:thinking:

How about:

enum Value: Equatable {
    case int(Int)
    case string(String)
    case null
}

struct Null {}

With some obvious boilerplate:

extension Value: Codable {
    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
            case .int(let v): try container.encode(v)
            case .string(let v): try container.encode(v)
            case .null: try container.encode(Null())
        }
    }
    
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if let v = try? container.decode(Int.self) {
            self = .int(v)
        } else if let v = try? container.decode(String.self) {
            self = .string(v)
        } else if container.decodeNil() {
            self = .null
        } else {
            throw DecodingError.dataCorruptedError(in: container, debugDescription: "Expected an Int, a String, or null")
        }
    }
}

extension Null: Codable {
    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encodeNil()
    }
    init(from decoder: Decoder) throws {
        fatalError("unused")
    }
}

Test app:

var items: [Value] = [.int(1), .int(2), .null, .string("4")]
let data = try! JSONEncoder().encode(items)
let s = String(data: data, encoding: .utf8)!
print(s)

outputs:

[1,2,null,"4"]

and back:

let items2 = try! JSONDecoder().decode([Value].self, from: data)
print(items2)
precondition(items == items2) // ok

This way looks more straightforward / obvious than playing tricks with "??"

Another take:

struct S: Codable, Equatable {
    let one: Int
    let two: String
    let three: MyOptional<Int>
    let four: MyOptional<Int>
}

Encoding this into JSON gives:

{"one":1,"three":null,"two":"2","four":4}

MyOptional type (quite small):

enum MyOptional<Wrapped: Codable & Equatable>: Codable & Equatable, ExpressibleByNilLiteral {
    init(nilLiteral: Void) { self = .none }
    case none, some(Wrapped)
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if container.decodeNil() {
            self = .none
        } else {
            self = .some(try container.decode(Wrapped.self))
        }
    }
    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
            case .none: try container.encodeNil()
            case .some(let wrapped): try container.encode(wrapped)
        }
    }
}

Test app:

var value = S(one: 1, two: "2", three: nil, four: .some(4))
let encoder = JSONEncoder()
let data = try! encoder.encode(value)
let s = String(data: data, encoding: .utf8)!
let value2 = try! JSONDecoder().decode(S.self, from: data)
precondition(value == value2)
print(s)

------

Edit: this fails to decode json with absent key:

{"one":1,"two":"2","four":4}

:slightly_frowning_face:

An extension on KeyedDecodingContainer is required to decode wrapped or custom optional types.

I use this property wrapper technique.


Yes, for the same reason above: compiler synthesis only generates decodeIfPresent() calls for Optional. We've, er, had this discussion before, on more than one occasion :sweat_smile:


Although this works, it's very important to understand why and how it works, and take into account that this is a very fragile solution that is hard to recommend.

An extension to KeyedDecodingContainer only works because:

  1. Codable compiler synthesis currently manipulates the AST directly, generating code as-if you had written the source code yourself; because of this, actual resolution of method overloads allows type checking to select more specific overloads for specific calls (e.g., those found in extensions)
  2. An extension in your module makes that overload available for the compiler to select within your module (and modules you make it visible to)

It's really important to keep in mind that this does not actually change the behavior of KeyedDecodingContainer at all, just that it monkey-patches a new overload of the method with the same name that makes it visible at synthesis time. This means that:

  1. Modules which don't have this extension at compile-time will silently behave differently from ones that do. This can happen when, e.g.,
    • Module A declares an internal extension, and Module B encodes and decodes the same type
    • Module A declares a public extension, but Module B is compiled before it and encodes and decodes the same type
  2. You can accidentally and silently end up in situation (1) by moving types/extensions between sibling modules
  3. You can't override behavior for modules you don't control
  4. If other modules also define a similar extension, their version will win out over yours when they compile
  5. If compiler synthesis changes (e.g., it starts performing Codable method lookup only in the stdlib), this is liable to silently break

I know, yet it surprises me every time! :rofl:


The warning about its limited usefulness is noted. I have seen bits and pieces here and there, but not a standalone sample that works out of the box. Could anyone show me the whole thing? And does it work only with keyed containers, and not with the singleValueContainer I am trying to use above?

There really isn't very much to this trick (so you've probably already seen all of it). Let's say you've got a type that looks like

struct S {
    static func foo<T>(_ t: T) {
        print("S.foo<\(T.self)>(\(t))")
    }
}

You can call it like so:

S.foo(42) // => S.foo<Int>(42)

struct MyType {}
S.foo(MyType()) // => S.foo<MyType>(MyType())

If you define an extension that overloads foo for your custom type,

extension S {
    static func foo(_ mt: MyType) {
        print("Custom S.foo<MyType>(\(mt))")
    }
}

then you get

S.foo(42) // => S.foo<Int>(42)
S.foo(MyType()) // => Custom S.foo<MyType>(MyType())

At the callsite of S.foo(MyType()), the compiler gets to choose between S.foo<T>(_: T) and S.foo(_: MyType), and it selects the more specific option when it can.


The principle is the same: if you define

extension KeyedDecodingContainer {
    func decode(_ t: MyType.Type, forKey key: Key) -> MyType {
        // ...
    }
}

then

init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    myVal = try container.decode(MyType.self, forKey: .myVal)
}

will call the more specific KeyedDecodingContainer.decode(_: MyType.Type, forKey: Key) instead of the generic one.

The "trick" is that Codable synthesis will output code as if you had written the above directly in a source file, so the compiler has the opportunity to select between the overloads; if it happened at a later stage in compilation, this wouldn't work.

This works with all types.

If you add an extension to singleValueContainer and call it from the same module, it will get picked up. The compiler only ever synthesizes code using keyed containers, though, so this would only be relevant to custom init(from:) implementations in your code which you call directly.
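Putting the pieces together for the missing vs. null case, a standalone sample might look like the sketch below. The `Tristate` and `Payload` names are hypothetical, and the fragility caveats discussed earlier in the thread apply in full, since this relies on overload resolution within the compiling module.

```swift
import Foundation

// Hypothetical three-state type; the name and cases are assumptions for illustration.
enum Tristate<Value: Decodable & Equatable>: Decodable, Equatable {
    case missing, null, value(Value)

    // Used when a Tristate is decoded outside a keyed container (e.g. inside an array).
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if container.decodeNil() {
            self = .null
        } else {
            self = .value(try container.decode(Value.self))
        }
    }
}

extension KeyedDecodingContainer {
    // More specific overload: a synthesized init(from:) compiled in this module
    // selects it over the generic decode(_:forKey:).
    func decode<T>(_ type: Tristate<T>.Type, forKey key: Key) throws -> Tristate<T> {
        guard contains(key) else { return .missing }
        if try decodeNil(forKey: key) { return .null }
        return .value(try decode(T.self, forKey: key))
    }
}

struct Payload: Decodable {
    let a: Tristate<Int>   // no custom init(from:); conformance is synthesized
}

func decodePayload(_ json: String) throws -> Payload {
    try JSONDecoder().decode(Payload.self, from: Data(json.utf8))
}

print(try! decodePayload("{}").a)              // missing
print(try! decodePayload(#"{"a": null}"#).a)   // null
print(try! decodePayload(#"{"a": 3}"#).a)      // value(3)
```

If the extension is removed, or is not visible where `Payload` is compiled, the first payload fails with a missing-key error instead, which is exactly the silent behavior change warned about above.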


Thank you, what you write makes sense... But KeyedDecodingContainer is a struct while SingleValueDecodingContainer is a protocol; is that what prevents me from using this approach?

enum MyType: Codable { case none, some }

struct S1: Codable {
    var a = 0
    var b: MyType = .some
}

typealias S2 = [MyType]

extension KeyedDecodingContainer {
    func decode(_ t: MyType.Type, forKey key: Key) -> MyType {
        self.contains(key) ? .some : .none
    }
}
extension SingleValueDecodingContainer {
    func decode(_ type: MyType.Type) throws -> MyType {
        fatalError("never here") // not hitting this 🛑
    }
}
let data1 = "{\"a\":1, \"b\":2}".data(using: .utf8)!
let value1 = try! JSONDecoder().decode(S1.self, from: data1) // ✅
print(value1)

let data2 = "[42]".data(using: .utf8)!
let value2 = try! JSONDecoder().decode(S2.self, from: data2) // 🛑 runtime trap
print(value2)

No, you're hitting a few unrelated issues here:

  1. Because MyType doesn't conform to RawRepresentable with a RawValue of one of the Codable primitives, it gets synthesized conformance, and synthesized conformance for enums produces dictionaries.

    The runtime trap you get when decoding value2 is actually a type mismatch, and your try! turns it into a trap:

    do {
        let value2 = try JSONDecoder().decode(S2.self, from: data2)
        print(value2)
    } catch {
        print(error)
        // => Swift.DecodingError.typeMismatch(Swift.Dictionary<Swift.String, Any>, Swift.DecodingError.Context(codingPath: [_CodingKey(stringValue: "Index 0", intValue: 0)], debugDescription: "Expected to decode Dictionary<String, Any> but found number instead.", underlyingError: nil))
    }
    
    
  2. Even if you give MyType a RawValue (e.g., enum MyType: Int, Codable { ... }) which will allow it to decode as an Int, neither JSONDecoder().decode(MyType.self, from: ...) nor JSONDecoder().decode([MyType].self, from: ...) will actually end up hitting your extension, because:

    1. decode(MyType.self, from: ...) calls MyType.init(from:) directly on the top-level data, and
    2. decode([MyType].self, from: ...) has no need to construct a single-value container internally for every value in the array; instead, it will call MyType.init(from:) directly for every contained value, bypassing your code

Here's an example that does hit this code path:

import Foundation

enum MyType: Int, Codable {
    case none, some
}

struct S: Decodable {
    let t: MyType

    init(from decoder: any Decoder) throws {
        let container = try decoder.singleValueContainer()
        t = try container.decode(MyType.self)
    }
}

// Comment this in or out to see the change in behavior.
/*
enum MyError: Error {
    case err
}

extension SingleValueDecodingContainer {
    func decode(_: MyType.Type) throws -> MyType {
        throw MyError.err
    }
}
*/

let json = "1".data(using: .utf8)!
do {
    let value = try JSONDecoder().decode(S.self, from: json)
    print(value) // without extension => S(t: MyType.some)
} catch {
    print(error) // with extension => err
}

Nice!

Although it looks like I'm still back to what we've discussed a few times before: Optional has superpowers that I can't simulate:

import Foundation

enum MyOptional: Codable { case none, some, someOther }

struct MyOptionalDoublyWrapped: Codable {
    let t: MyOptional
    init(_ t: MyOptional) { self.t = t }
    init(from decoder: any Decoder) throws {
        let container = try decoder.singleValueContainer()
        t = try container.decode(MyOptional.self)
    }
    func encode(to encoder: Encoder) throws {
        fatalError("TODO")
    }
}
struct SomethingReal: Codable {
    var x = 0
    var y: MyOptionalDoublyWrapped = .init(.some)
}

extension SingleValueDecodingContainer {
    func decode(_: MyOptional.Type) throws -> MyOptional {
        if decodeNil() {
            return .none
        } else if (try? decode(Int.self)) != nil {
            return .some
        } else {
            return .someOther
        }
    }
}
let data1 = ###"{"x":1, "y":42}"###.data(using: .utf8)!
let value1 = try JSONDecoder().decode(SomethingReal.self, from: data1)
print(value1) // ✅ .some

let data2 = ###"{"x":1, "y":null}"###.data(using: .utf8)!
let value2 = try JSONDecoder().decode(SomethingReal.self, from: data2)
print(value2) // ✅ .none

let data3 = ###"{"x":1}"###.data(using: .utf8)! 
let value3 = try JSONDecoder().decode(SomethingReal.self, from: data3)
// 🛑 "No value associated with key CodingKeys(stringValue: \"y\"
print(value3)