Decoding of optionals missing in json

Extracting this from another thread: I believe this behaviour is broken and needs fixing:

struct S: Decodable {
    var x: Int
    var y: Optional<Int>
}

let data = "{\"x\": 1}".data(using: .utf8)!
let s = try! JSONDecoder().decode(S.self, from: data)

This must fail, as y is not marked as having a default value. It currently doesn't fail in any form (including the short-hand Int?). The current behaviour doesn't let you opt out of this "no key ==> nil" behaviour. The fixed version would be more flexible and allow you to differentiate:

"{ x:1}"
    var: y: Int?       ===>  ERROR *** not the current behaviour
    var y: Int? = nil  ===>  y=nil
    var y: Int? = 3    ===>  y= Optional(3)  *** not the current behaviour

"{ x:1, y: null}"
    var: y: Int?       ===>  y=nil
    var y: Int? = nil  ===>  y=nil
    var y: Int? = 3    ===>  y=null

"{ x:1, y: 2}"
    var: y: Int?       ===>  y=Optional(2)
    var y: Int? = nil  ===>  y=Optional(2)
    var y: Int? = 3    ===>  y= Optional(2)

This is a breaking change. Cases similar to "var y: Int? = 3" would be rare in practice. Cases similar to "var y: Int?" would cause runtime errors when "y" is missing in the JSON (the desired behaviour).

Please note that this issue is separate from the infamous "Pre-pitch: remove the implicit initialization of Optional variables". Both the full Optional and short-hand T? forms are affected.

3 Likes

I don't think I love the idea that default values of properties would affect the behavior of the synthesized init(from:), personally. I've never had a situation where it was dire to differentiate the "present but null" and "not present" situations, but in such cases it seems like you should be able to write a custom @RequireExplicitNull property wrapper that achieves the desired behavior.
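For readers curious what such a wrapper could look like, here is a minimal sketch (the wrapper name comes from the post above; the implementation and the test types are my own assumptions). Because the stored property's type is the non-optional wrapper type, the synthesized init(from:) calls decode rather than decodeIfPresent, so a missing key throws while an explicit null still decodes to nil:

```swift
import Foundation

// Sketch only: RequireExplicitNull is the hypothetical wrapper named above.
@propertyWrapper
struct RequireExplicitNull<T: Decodable>: Decodable {
    var wrappedValue: T?

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if container.decodeNil() {
            wrappedValue = nil   // key present with an explicit null
        } else {
            wrappedValue = try container.decode(T.self)
        }
    }
}

struct Payload: Decodable {
    var x: Int
    @RequireExplicitNull var y: Int?
}

let decoder = JSONDecoder()

// Key present with null: decodes fine, y == nil.
let withNull = try! decoder.decode(Payload.self, from: Data("{\"x\": 1, \"y\": null}".utf8))
print(withNull.y as Any)

// Key missing entirely: decode(_:forKey:) on the wrapper type throws.
print((try? decoder.decode(Payload.self, from: Data("{\"x\": 1}".utf8))) == nil)  // true
```

Encoding support (and a nicer error than a bare keyNotFound) would need more work, but this covers the decoding distinction the thread is about.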

2 Likes

I do not understand the code block that follows. What language is the code written in, and what are you trying to demonstrate with it?

uncompressed version

func foo(_ json: String) {
    let data = json.data(using: .utf8)!
    let s = try! JSONDecoder().decode(S.self, from: data)  // S is the struct defined in each case below
    print(s)
}

---------
// case 1, no default value
struct S: Decodable {
	var y: Optional<Int>
}

// case 1.1, no y in json
foo("{}")
expected behaviour: runtime error "y key is absent"
actual behaviour: prints: { y: nil } 🐞🐞🐞🐞🐞🐞🐞🐞

// case 1.2, null y in json
foo("{\"y\": null }")
expected / actual behaviour: prints: { y: nil }

// case 1.3, y=2 in json
foo("{\"y\": 2 }")
expected / actual behaviour: prints: { y: Optional(2) }
---------
// case 2, default value = nil
struct S: Decodable {
	var y: Optional<Int> = nil
}

// case 2.1, no y in json
foo("{}")
expected / actual behaviour: prints: { y: nil }

// case 2.2, null y in json
foo("{\"y\":null }")
expected / actual behaviour: prints: { y: nil }

// case 2.3, y=2 in json
foo("{\"y\":2 }")
expected / actual behaviour: prints: { y: Optional(2) }

---------
// case 3, default value = 3
struct S: Decodable {
	var y: Optional<Int> = 3
}

// case 3.1, no y in json
foo("{}")
expected behaviour: prints: { y: Optional(3) }
actual behaviour: prints: { y: nil } 🐞🐞🐞🐞🐞🐞🐞🐞

// case 3.2, null y in json
foo("{\"y\":null }")
expected / actual behaviour: prints: { y: nil }

// case 3.3, y=2 in json
foo("{\"y\":2 }")
expected / actual behaviour: prints: { y: Optional(2) }

FWIW, this was an explicit design decision made when designing Codable: an Optional property indicates that a specific property need not be present in the encoded data — whether it is not present because the key was missing or the value was null doesn't factor into the default behavior.

You can override this behavior if need be by choosing decode(_:forKey:) over decodeIfPresent(_:forKey:) if you need to disambiguate, but the common case rarely cares and a more permissive default helps more code use a synthesized init(from:).
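As an illustration of that override (the type and property names here are mine), a hand-written init(from:) that uses decode(Int?.self, forKey:) makes an absent key an error while still accepting an explicit null:

```swift
import Foundation

struct Strict: Decodable {
    var x: Int
    var y: Int?

    private enum CodingKeys: CodingKey { case x, y }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        x = try container.decode(Int.self, forKey: .x)
        // decode (not decodeIfPresent) requires the "y" key to be present;
        // a JSON null still decodes to nil through Optional's conformance.
        y = try container.decode(Int?.self, forKey: .y)
    }
}

let decoder = JSONDecoder()
print(try! decoder.decode(Strict.self, from: Data("{\"x\": 1, \"y\": null}".utf8)).y as Any)
print((try? decoder.decode(Strict.self, from: Data("{\"x\": 1}".utf8))) == nil)  // true: keyNotFound
```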

As you note in your examples, too, default values for variables are never taken into account when synthesizing init(from:), and this was purposeful too: a default value applied to a variable is not necessarily a reliable indicator of explicit permission to ignore encoded data. For example, many properties may have sensible defaults outside of decoding data, and many developers might prefer to assign default values to those properties (e.g. to benefit from synthesized initializers), but those values might be different from what's expected during decoding. null values may be significant, even if the default is not nil.

So code synthesis is conservative and considers the data you're decoding from to be the source of truth, and if you'd like to customize the behavior, it's absolutely possible to do. (Synthesis can't write any code that you couldn't have written yourself.)

In any case, changing behavior here is likely to be massively behavior-breaking. This is something that we'd need to be extremely cautious about, because this would be behavior-breaking in code that developers did not write themselves.

9 Likes

The new behaviour can be an explicit opt-in, e.g. a new strategy for JSONDecoder; in this case it wouldn't break existing code. Symmetric to that could be a new opt-in strategy for JSONEncoder that would emit nil values as "a": nil rather than omitting them.

Combining this with the idea just discussed in another thread, the new opt-ins could look like this (all names are preliminary):

enum KeyDecodingStrategy {
    /// With this opt-in strategy, optional fields missing from the JSON will not get overridden with nil
    ///
    /// Example:
    ///  struct Foo {
    ///      var x: Int
    ///      var y: Int?
    ///      var z: Int! = 123
    ///      var w: Int? = 456
    ///  }
    ///
    ///  JSON: { "x": 1 }
    ///  Result with this strategy OFF: { x: 1, y: nil, z: nil, w: nil }
    ///  Result with this strategy ON: { x: 1, y: nil, z: Optional(123), w: Optional(456) }
    ///
    case dontOverrideFieldsThatHaveDefaultValues

    /// This strategy will require the presence of optional fields.
    ///
    /// Example:
    ///  struct Foo {
    ///      var x: Int?
    ///      var y: Int? = nil
    ///      var z: Int? = 123
    ///  }
    ///
    ///  JSON: {}
    ///  Result with this strategy OFF: { x: nil, y: nil, z: nil }   // dontOverrideFieldsThatHaveDefaultValues = off
    ///  Result with this strategy OFF: { x: nil, y: nil, z: Optional(123) }   // dontOverrideFieldsThatHaveDefaultValues = on
    ///  Result with this strategy ON: // runtime error, x & y fields are missing
    ///
    case requireFieldsThatDontHaveDefaultValues
}

enum KeyEncodingStrategy {

    /// Emit nil values
    ///
    /// Example:
    ///  struct Foo {
    ///      var x: Int
    ///      var y: Int?
    ///  }
    ///
    ///  Value { x: 1, y: nil }
    ///  Result with this strategy OFF: { "x": 1 }
    ///  Result with this strategy ON: { "x": 1, "y": nil }
    ///
    case emitExplicitNil
}

For encoding, it should be null, not nil, yes?

Sure, a typo: it should be null in the JSON form.

Unfortunately, with the current state of how Codable synthesis works, it's not quite as simple as introducing new strategies to various encoders and decoders — we would also need to update how the compiler synthesizes the implementation of init(from:) and encode(to:) to take advantage of this somehow. Let's take a look at why. [I'll be focusing largely on the decoding side of things here, but the same applies to possible new encoding strategies.]

Taking your original example of

struct S: Decodable {
    var x: Int
    var y: Optional<Int>
}

we can expand the Decodable conformance as the compiler would. To somewhat simplify the process, synthesizing Encodable/Decodable conformance does the following:

  1. Checks to see whether the type has a CodingKeys enum
    • If so, validates the type to ensure each CodingKeys case maps to a property by name, and that all non-computed properties map to a case in the CodingKeys enum
    • If not, synthesizes a CodingKeys enum which satisfies the above, if possible
  2. Implements init(from:)/encode(to:) by synthesizing a method which
    • Creates a keyed container
    • Iterates over each property in the type's CodingKeys enum, calling the matching encode(..., forKey:)/decode(..., forKey:) on the container for each key in the enum
      • Optional types have decodeIfPresent called for the type, rather than decode, as mentioned above

When I say "synthesize" here, I mean that the compiler inserts new nodes into the AST which represents the code as parsed from your file — and there is no magic here. The insertions the compiler performs behave exactly as if you had typed the lines of code into your source file directly.

Expanding the above rules for S, you get the equivalent of

struct S: Decodable {
    var x: Int
    var y: Optional<Int>

    private enum CodingKeys: CodingKey {
        case x
        case y
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        x = try container.decode(Int.self, forKey: .x)
        y = try container.decodeIfPresent(Int.self, forKey: .y)
    }
}

This behaves exactly as if you had typed this into a source file yourself, with nothing special about it. I don't have the tools to compile the above (and dump the AST for comparison), but you should be able to confirm this (barring any typos or minor details I've missed above).
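For what it's worth, the equivalence is easy to spot-check in a script; here the same payload runs through a synthesized conformance and the hand-expanded one (the type names are mine):

```swift
import Foundation

// Synthesized conformance: the compiler generates init(from:).
struct Synthesized: Decodable {
    var x: Int
    var y: Optional<Int>
}

// Hand-written equivalent of what the compiler is described as generating.
struct Manual: Decodable {
    var x: Int
    var y: Optional<Int>

    private enum CodingKeys: CodingKey { case x, y }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        x = try container.decode(Int.self, forKey: .x)
        y = try container.decodeIfPresent(Int.self, forKey: .y)
    }
}

let data = Data("{\"x\": 1}".utf8)
let a = try! JSONDecoder().decode(Synthesized.self, from: data)
let b = try! JSONDecoder().decode(Manual.self, from: data)
print(a.y == b.y)  // true: both decode the missing "y" as nil
```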

To the issue at hand: note that at no point in synthesis does the compiler take into consideration whether y has a default value, nor, for that matter, whether y is declared as Int? or Optional<Int>. It could, because it has access to the source in your file, but it doesn't at the moment. This means that right now, the specific spelling of the type of y is not present in any way in the implemented init(from:), nor is y's default value. This is an issue for your suggested strategy idea, because it means that JSONDecoder has no access to either that spelling or whether a default value exists — as is, at the moment, it would not be possible to implement this.

Could this be implemented? Absolutely. You can imagine new additions to the Decoder protocol which offer an opportunity for the compiler to pass this information over to a Decoder with something like:

protocol KeyedDecodingContainerProtocol {
    func decode<T: Decodable>(_ type: T.Type, forKey key: Key, defaultValue: T) throws -> T
    func decodeIfPresent<T: Decodable>(_ type: T.Type, forKey key: Key, defaultValue: T??, spelling: Spelling) throws -> T?
}

and in the case of S or Foo above, you'd see the compiler generate something like:

// S
x = try container.decode(Int.self, forKey: .x)
y = try container.decodeIfPresent(Int.self, forKey: .y, defaultValue: .none, spelling: .desugaredOptional)

// Foo
x = try container.decode(Int.self, forKey: .x)
y = try container.decodeIfPresent(Int.self, forKey: .y, defaultValue: .some(nil), spelling: .sugaredOptional)
z = try container.decodeIfPresent(Int.self, forKey: .z, defaultValue: 123, spelling: .sugaredIUO)
w = try container.decodeIfPresent(Int.self, forKey: .w, defaultValue: 456, spelling: .sugaredOptional)

Then, any decoder which decodes a Foo could decide what to do for each of those fields.

Some consequences of this approach:

  • The container types and protocols need overloads for all of these types, with default implementations in terms of the older methods which throw the info away
    • This muddies the water a bit for authors of custom init(from:) and encode(to:), because these methods are unlikely to provide much of a benefit for them (and I'll elaborate on this a bit below*) — this might significantly expand the API surface, but only really for the benefit of synthesized code
    • This makes life a bit more difficult for Encoder/Decoder authors, because they have a new set of methods they must implement, and if they forget an overload to one of the methods, the compiler cannot help them because it has a default implementation. (This leads to subtle broken behavior for clients)
  • The compiler implementation of synthesis needs to be updated to support this, and existing source code needs to be recompiled in order to re-synthesize implementations
    • If you're a consumer of a binary framework, you might be surprised to see existing types fail to respect these new options until you get a newly recompiled version of the framework (or vice versa — you may have to watch new versions of the framework to catch behavioral differences and account for them)

There are definitely other solutions out there which might work better, but about this one specifically: IMO, there's non-trivial risk in all of the potential corner cases; it also feels a bit strange and clunky that the compiler needs to "leak" source-specific information to Encoders and Decoders at runtime to implement this.

However: if we'd need to update the compiler to support this anyway, it's worth considering how far we could get if we just updated the compiler, without having to touch libraries at all. *Callout from above: if you already want the behavior of .dontOverrideFieldsThatHaveDefaultValues for a type like Foo, how might you write init(from:) yourself?

One possibility is

init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    x = try container.decode(Int.self, forKey: .x)
    y = try container.decodeIfPresent(Int.self, forKey: .y)

    // Don't overwrite `z` from its current default if no value is present.
    if let _z = try container.decodeIfPresent(Int.self, forKey: .z) {
        z = _z
    }

    // Don't overwrite `w` from its current default if no value is present.
    if let _w = try container.decodeIfPresent(Int.self, forKey: .w) {
        w = _w
    }
}

Note that the above is entirely possible with existing language features in source, without needing any library updates. In other words, if someone were already writing init(from:) themselves, the introduction of the decode*(..., defaultValue: T...) methods suggested above (or similar) might not be beneficial.

This solution still isn't perfect, as:

  • Changing this to become the new default behavior is extremely behavior-breaking without user intervention, which is a no-go
    • Likely, you'd want some way to opt in to this on a type- or property-basis, but there's no current spelling for annotating a property in this way. There have been a lot of suggestions for how to do this over the years, but no solution has felt unanimously "right" (and significantly and clearly better than just writing your own init(from:))
  • You can't override this for a specific Decoder, unlike the other solution (a pro in some cases, and a con in others)
  • Same binary framework concerns as above: without recompilation, code won't update to make use of this, and it may be problematic to mix-and-match code compiled with different compiler versions

To be clear, none of these problems are intractable, and I don't mean to imply as much (or that this isn't worth addressing in some way!) — with enough motivation, it's possible to come up with a clear story for all of this. But there are challenges which make this unfortunately not quite as easy as adding a strategy, and questions we'd need to answer before we can make this happen.

5 Likes

Wow, thanks a lot for looking at it and for such a detailed and deep analysis!
It is indeed more complicated than might appear on the surface.

Would it help if the decoder API had a new function that accepted a value instead of a type? Then the client would pass an already created and filled "template" value; some of that value's fields would get overridden from the JSON while the rest stay intact. Perhaps that should work only for fields declared with "var", not "let".

Below is a quick and dirty decoder prototype that explores this direction. It needs "offset_of" and "size_of", which I have no idea how to implement (short of generating them manually with a switch over MemoryLayout.offset / size calls... but that in turn would be the boilerplate we were trying to get rid of to begin with!)

// a quick & dirty decoder prototype. only POD types for now.
// here a value is passed as a parameter, not a type
// value is used as an initial state
// fields present in JSON will override fields those in value
// fields absent in json will be left intact

func decode<T>(_ value: T, from data: Data) -> T {
    let dict = try! JSONSerialization.jsonObject(with: data) as! [String: Any]

    var value = value
    withUnsafeMutableBytes(of: &value) {
        let pointer = $0.baseAddress!
        for child in Mirror(reflecting: value).children {
            let name = child.label!
            if let any = dict[name] {
                let offset = offset_of(field: name, inType: T.self)!
                let size = size_of(field: name, inType: T.self)!
                // NB: this should really have the field's type, not T;
                // glossed over in this sketch.
                var val: T = convert(fromAny: any)
                memmove(pointer + offset, &val, size)
            }
        }
    }
    return value
}

func offset_of<T>(field: String, inType: T.Type) -> Int? {
    fatalError("TODO")
}

func size_of<T>(field: String, inType: T.Type) -> Int? {
    fatalError("TODO")
}

func convert<T>(fromAny: Any) -> T {
    // convert any to T here.
    // e.g. "1.0" <-> 1, Int16 <-> Int, Float <-> Double, etc
    fatalError("TODO")
}

Instead of offset_of / size_of I could have used child.offset and child.size... if there was such a thing 🙂 and convert is trivial in comparison.

1 Like

I think as a whole, this might end up running into issues mentioned above (re: updating libraries, the compiler, etc.), but: this might be worth exploring on its own merits.

One often-requested feature that Codable doesn't currently offer is the ability to update an existing model with Codable data, rather than exclusively creating new instances (i.e., func update(from decoder: Decoder) throws). There's some overlap here — you can definitely imagine a type with default values for variables as a template value which can be updated rather than initialized.
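As a sketch of what that could look like today, entirely with existing language features (the protocol, the Updated wrapper, and the userInfo plumbing are all my own assumptions, not an existing API):

```swift
import Foundation

// Hypothetical "update in place" counterpart to init(from:).
protocol UpdatableDecodable: Decodable {
    mutating func update(from decoder: Decoder) throws
}

struct Settings: UpdatableDecodable {
    var host = "localhost"   // default values act as the "template" state
    var port = 8080

    private enum CodingKeys: CodingKey { case host, port }

    // Only keys present in the payload overwrite the current values.
    mutating func update(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        if let h = try container.decodeIfPresent(String.self, forKey: .host) { host = h }
        if let p = try container.decodeIfPresent(Int.self, forKey: .port) { port = p }
    }
}

// A wrapper that pulls the template value out of userInfo and updates it.
let templateKey = CodingUserInfoKey(rawValue: "template")!

struct Updated<T: UpdatableDecodable>: Decodable {
    var value: T
    init(from decoder: Decoder) throws {
        var template = decoder.userInfo[templateKey] as! T
        try template.update(from: decoder)
        value = template
    }
}

let decoder = JSONDecoder()
decoder.userInfo[templateKey] = Settings()
let updated = try! decoder.decode(Updated<Settings>.self, from: Data("{\"port\": 9000}".utf8))
print(updated.value.host, updated.value.port)  // localhost 9000
```

Note this sidesteps the initialization-rules problem only because the template is fully initialized up front, which is exactly the restriction described in (1) below.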

The main hurdle to overcome here would be Swift's initialization rules (which would not make sense to relax in the general case): you can't pass a partially-initialized value to a method, or call a method like update(from:) on it. We'd need to either:

  1. Restrict this feature to types with variables which all have default values (which isn't effectively different from the situation today: you still need to initialize all variables before returning from some init, and then update other fields β€” so this doesn't help a ton if values don't have a reasonable default value)
  2. Rely on unsafe behavior to pass partially-initialized values around, which I think we wouldn't want to do either. The implementation you offer assumes the Decoder would never try to read from any of the fields in value, but it's easy to imagine wanting to switch on some field in value to decide on how to decode; if value has uninitialized fields you try to read from, everything goes out the window
  3. Add some new language features that allow you to pass around uninitialized types, with a type-safe API that can tell you whether a field has been initialized or not (and prevents you from reading from fields which are not), e.g., you can imagine a PartiallyInitialized<T> with KeyPath-based accessors that access the underlying type
  4. Something else I'm not thinking of
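Option (3) above could be roughed out along these lines (a hypothetical sketch, not a real proposal; a language-level version would presumably avoid the Any-typed side storage):

```swift
// Hypothetical: a type-safe view of a not-yet-initialized T, tracking
// which fields have been assigned via key paths.
struct PartiallyInitialized<T> {
    private var storage: [PartialKeyPath<T>: Any] = [:]

    mutating func set<V>(_ keyPath: WritableKeyPath<T, V>, to value: V) {
        storage[keyPath] = value
    }

    // Reading an unset field yields nil instead of undefined behaviour.
    func get<V>(_ keyPath: KeyPath<T, V>) -> V? {
        storage[keyPath] as? V
    }

    func isInitialized(_ keyPath: PartialKeyPath<T>) -> Bool {
        storage[keyPath] != nil
    }
}

struct Point { var x = 0, y = 0 }

var partial = PartiallyInitialized<Point>()
partial.set(\.x, to: 1)
print(partial.isInitialized(\.x), partial.isInitialized(\.y))  // true false
print(partial.get(\.x) as Any)
```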

The initialization rule difficulty is a bit of a larger topic, and I think the rules hold well in general β€” we wouldn't want to relax or change them. But there's definitely room for thought and exploration around here that could drive this forward, I think.

(As an aside: a solution here may also align well with a post-init feature that folks have also requested in the past for Codable, e.g. for fixing up object graphs with circular references, and otherwise. Obj-C solves this with solution (2) above, but there might be a better solution to pull out and crystallize from this.)

2 Likes