Pitch: `UnkeyedDecodingContainer.moveNext()` to skip items in deserialization

igorkulman · March 26, 2019, 1:24pm

UnkeyedDecodingContainer.moveNext() to skip items in deserialization

When you use JSONDecoder to deserialize a heterogeneous array containing classes of multiple types, you use UnkeyedDecodingContainer in code like this:

struct Feed: Decodable {
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: FeedKeys.self)
        var messagesArrayForType = try container.nestedUnkeyedContainer(forKey: FeedKeys.messages)
        var messages = [Message]()

        var messagesArray = messagesArrayForType
        while(!messagesArrayForType.isAtEnd)
        {
            let message = try messagesArrayForType.nestedContainer(keyedBy: MessageTypeKey.self)
            // Assumes MessageType is a Decodable enum with cases such as .avatar and .add
            let type = try message.decode(MessageType.self, forKey: MessageTypeKey.type)
            switch type {
            case .avatar:
                messages.append(try messagesArray.decode(AvatarMessage.self))
            case .add:
                messages.append(try messagesArray.decode(AddMessage.self))
            }
        }
        self.messages = messages
    }
}

The problem

The problem arises when you decide to ignore an element of the JSON array:

struct Feed: Decodable {
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: FeedKeys.self)
        var messagesArrayForType = try container.nestedUnkeyedContainer(forKey: FeedKeys.messages)
        var messages = [Message]()

        var messagesArray = messagesArrayForType
        while(!messagesArrayForType.isAtEnd)
        {
            let message = try messagesArrayForType.nestedContainer(keyedBy: MessageTypeKey.self)
            // Assumes MessageType is a Decodable enum with cases such as .avatar, .add and .remove
            let type = try message.decode(MessageType.self, forKey: MessageTypeKey.type)
            switch type {
            case .avatar:
                messages.append(try messagesArray.decode(AvatarMessage.self))
            case .add:
                messages.append(try messagesArray.decode(AddMessage.self))
            case .remove:
                // skip, no longer needed in the app
                // how to move to the next item in the JSON array?
            }       
        }
        self.messages = messages
    }
}

There is currently no way to skip an item in the JSON array when you do not need it for some reason.

Current workarounds

The best you can currently do is create some kind of dummy type

private struct DummyCodable: Codable {}

and use it for all the items you want to skip

struct Feed: Decodable {
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: FeedKeys.self)
        var messagesArrayForType = try container.nestedUnkeyedContainer(forKey: FeedKeys.messages)
        var messages = [Message]()

        var messagesArray = messagesArrayForType
        while(!messagesArrayForType.isAtEnd)
        {
            let message = try messagesArrayForType.nestedContainer(keyedBy: MessageTypeKey.self)
            // Assumes MessageType is a Decodable enum with cases such as .avatar, .add and .remove
            let type = try message.decode(MessageType.self, forKey: MessageTypeKey.type)
            switch type {
            case .avatar:
                messages.append(try messagesArray.decode(AvatarMessage.self))
            case .add:
                messages.append(try messagesArray.decode(AddMessage.self))
            case .remove:
                _ = try? messagesArray.decode(DummyCodable.self)
            }       
        }
        self.messages = messages
    }
}
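
For anyone who wants to try the workaround end to end, here is a minimal self-contained sketch. The message types, the `MessageType` enum, the key enums and the JSON payload are all hypothetical stand-ins for the app's real model; only the two-cursor pattern and the `DummyCodable` trick come from the code above.

```swift
import Foundation

// All type names and the JSON payload below are hypothetical stand-ins.
protocol Message {}
struct AvatarMessage: Message, Decodable { let url: String }
struct AddMessage: Message, Decodable { let text: String }

enum MessageType: String, Decodable { case avatar, add, remove }

// Decoding this type reads nothing from the decoder, so it accepts any element.
private struct DummyCodable: Decodable {}

struct Feed: Decodable {
    let messages: [Message]

    private enum FeedKeys: String, CodingKey { case messages }
    private enum MessageTypeKey: String, CodingKey { case type }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: FeedKeys.self)
        // Cursor used only to peek at each element's "type" field.
        var peek = try container.nestedUnkeyedContainer(forKey: .messages)
        // Independent copy of the cursor, used to decode (or skip) the element.
        var items = peek
        var messages = [Message]()

        while !peek.isAtEnd {
            let element = try peek.nestedContainer(keyedBy: MessageTypeKey.self)
            switch try element.decode(MessageType.self, forKey: .type) {
            case .avatar:
                messages.append(try items.decode(AvatarMessage.self))
            case .add:
                messages.append(try items.decode(AddMessage.self))
            case .remove:
                // Advance the second cursor without keeping the value.
                _ = try items.decode(DummyCodable.self)
            }
        }
        self.messages = messages
    }
}

let json = Data("""
{ "messages": [
    { "type": "avatar", "url": "a.png" },
    { "type": "remove", "id": 7 },
    { "type": "add", "text": "hi" }
] }
""".utf8)

let feed = try JSONDecoder().decode(Feed.self, from: json)
print(feed.messages.count) // prints 2
```

The sketch relies on JSONDecoder's unkeyed container being a value type, so that copying it yields a second, independent position. Other decoders are not guaranteed to behave that way.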

Proposed solution

A better solution would be to add UnkeyedDecodingContainer.moveNext(), a new method that moves the index forward by one item, so that there is no need for a workaround

struct Feed: Decodable {
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: FeedKeys.self)
        var messagesArrayForType = try container.nestedUnkeyedContainer(forKey: FeedKeys.messages)
        var messages = [Message]()

        var messagesArray = messagesArrayForType
        while(!messagesArrayForType.isAtEnd)
        {
            let message = try messagesArrayForType.nestedContainer(keyedBy: MessageTypeKey.self)
            // Assumes MessageType is a Decodable enum with cases such as .avatar, .add and .remove
            let type = try message.decode(MessageType.self, forKey: MessageTypeKey.type)
            switch type {
            case .avatar:
                messages.append(try messagesArray.decode(AvatarMessage.self))
            case .add:
                messages.append(try messagesArray.decode(AddMessage.self))
            case .remove:
                messagesArray.moveNext()
            }       
        }
        self.messages = messages
    }
}

Other uses

This new method could also be useful if you want to ignore incomplete data

struct Feed: Decodable {
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: FeedKeys.self)
        var messagesArrayForType = try container.nestedUnkeyedContainer(forKey: FeedKeys.messages)
        var messages = [Message]()

        var messagesArray = messagesArrayForType
        // Only messagesArray advances in this loop, so it has to drive the loop condition
        while(!messagesArray.isAtEnd)
        {
            if let message = try? messagesArray.decode(Message.self) {
                 messages.append(message)
            } else {
                 messagesArray.moveNext()
            }
        }
        self.messages = messages
    }
}

Basically, this would add a way to solve SR-5953, Decodable: Allow "Lossy" Array Decodes (issue #4414 in apple/swift-corelibs-foundation on GitHub).

Avi · March 26, 2019, 1:28pm

In a stream format, such as JSON, how would the method know how many bytes to skip?

itaiferber · March 26, 2019, 4:02pm

The method wouldn't need to — we should be able to give this method a default implementation which does what folks are already doing w/ the equivalent of try decode(DummyClass.self).

I support this API addition, though I think moveNext() could just be called skip() or skipNext(), which might be a little bit clearer than "move"

Avi · March 26, 2019, 4:55pm

I don't understand how the decoder would know what to skip. Would the assumption be that it should skip a complete object? What if the element is a single value?

duan · March 26, 2019, 4:59pm

UnkeyedDecodingContainer is expected to be backed by a list of values with a "current position" state that indicates the decoding progress. A skip operator would just increase that value by 1.

Maybe the operator should be .skip(by:)?

Avi · March 26, 2019, 5:15pm

I missed that the discussion is about UnkeyedDecodingContainer. Silly me. I still don't understand how it would know the size of the next object if it doesn't know what type it is, but that's not a question for this forum.

igorkulman · March 26, 2019, 5:21pm

Looking at the source code in JSONEncoder.swift it already uses an internal currentIndex: Int to know the position.

So the implementation could probably be as simple as

public mutating func skip() throws {
    guard !self.isAtEnd else {
        throw DecodingError.valueNotFound(Any?.self, DecodingError.Context(codingPath: self.decoder.codingPath + [_JSONKey(index: self.currentIndex)], debugDescription: "Unkeyed container is at end."))    
    }

    self.currentIndex += 1
}

skip() seems to be a better name than my original moveNext() idea.

itaiferber · March 26, 2019, 6:01pm

This is left up to the format and the parser. All used formats that I can think of have enough information to be able to distinguish type information without requiring explicit input from a driver.

For instance, a JSON decoder needs to look ahead by one token to tell if it's looking at an object, an array, or a number/string/null/boolean value. Skipping an individual number/string/null/boolean value is easy. For arrays and dictionaries, the parser would need to parse until the end of the object (i.e., the closing } or ]) in order to skip the whole thing.

To give a concrete example:

Unkeyed container:
[ 42, "hello", [1, 2, 3], { "hello": "world" }, null ]
  ^~~ current index
      ^~ after 1 skip
               ^~~ 2 skips
                          ^~~3 skips
                                                ^~~ 4 skips

Keep in mind the unkeyed container does not represent the flattened linear representation of the above (i.e. [42, "hello", 1, 2, 3, "hello", "world", null]), so there's still structure to be able to skip.
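
A small sketch of that behavior with JSONDecoder, using the empty-struct decode as a stand-in for the proposed skip. `Empty` and `Probe` are hypothetical names introduced only for this illustration; the JSON is the array from the diagram above.

```swift
import Foundation

// Hypothetical helper: decoding it reads nothing, so the container just advances.
struct Empty: Decodable {}

// Hypothetical probe type that records where the container ends up.
struct Probe: Decodable {
    let indexAfterSkips: Int
    let fourth: [String: String]

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        // Skip 42, "hello", and the whole nested array [1, 2, 3].
        for _ in 0..<3 {
            _ = try container.decode(Empty.self)
        }
        indexAfterSkips = container.currentIndex
        // The next element is the dictionary, not a member of the nested array.
        fourth = try container.decode([String: String].self)
    }
}

let json = Data(#"[ 42, "hello", [1, 2, 3], { "hello": "world" }, null ]"#.utf8)
let probe = try JSONDecoder().decode(Probe.self, from: json)
print(probe.indexAfterSkips, probe.fourth) // prints 3 ["hello": "world"]
```

The third skip moves past the nested array as a single element, which is the structural behavior described above.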

Avi · March 26, 2019, 6:07pm

I have to deal with XDR for the Stellar blockchain. It encodes to a byte stream, and there is no way to know, from the data itself, what data type you are looking at. It could be anything from an Int32 to a UInt8 buffer to a user-defined type. There are no tags in the data to distinguish fields.

itaiferber · March 26, 2019, 6:12pm

How does your decoder currently handle

struct EmptyStruct : Codable {}

struct Container : Decodable {
    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        try container.decode(EmptyStruct.self)
        try container.decode(Int.self)
    }
}

? This is isomorphic to that (and in fact, the default implementation I'm imagining will actually simply decode an empty struct type like this).

The method here will need to look like

public mutating func skip(by count: Int = 1) throws

so the throws here could be one escape hatch for you.
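
One possible shape for such a default implementation, as a sketch only: `skip(by:)` is not part of the standard library, and `SkippedElement` and `LastOfThree` are hypothetical names made up for this example. The sketch simply repeats the empty-struct decode, so any error the underlying decoder throws (for instance at the end of the container) propagates to the caller.

```swift
import Foundation

// Hypothetical placeholder type; decoding it consumes one element and reads nothing.
private struct SkippedElement: Decodable {}

extension UnkeyedDecodingContainer {
    /// Sketch of the signature discussed in this thread; not a standard library API.
    mutating func skip(by count: Int = 1) throws {
        precondition(count >= 0, "count must not be negative")
        for _ in 0..<count {
            _ = try decode(SkippedElement.self)
        }
    }
}

// Hypothetical type that ignores the first two elements of a three-element array.
struct LastOfThree: Decodable {
    let value: Int

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        try container.skip(by: 2)
        value = try container.decode(Int.self)
    }
}

let result = try JSONDecoder().decode(LastOfThree.self, from: Data("[1, 2, 3]".utf8))
print(result.value) // prints 3
```

A decoder that cannot skip without type information could provide its own implementation that throws instead.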

jrose · March 26, 2019, 6:43pm

I think we have to choose whether it's okay for some decoders not to support all the features of Decodable, or whether things like XDR or @Mike_Ash's toy binary coder are "not real coders".

gwendal.roue · March 26, 2019, 6:50pm

I think it's common for decoders to fatalError in unsupported coding scenarios. For example, some decoders are "flat" and don't support nested objects (think a database row). They are still "real coders", but with a limited feature set.

Jumhyn · March 26, 2019, 6:55pm

Why not have the skip method accept a generic type argument?

public mutating func skip<T>(_ type: T.Type) throws

Avi · March 26, 2019, 6:56pm

I don't have a decoder based on Codable. One reason is history (Swift 4.1 broke what I had), and now it's due to the inability to encode into different representations with the same implementation. I need to encode into JSON, for display, and into XDR, for communication.

An EmptyStruct wouldn't be represented at all within XDR, if it had no encodable fields.

igorkulman · March 26, 2019, 7:02pm

A generic method would not really solve my initial problem. I would still need to have a dummy decodable class as in the current workaround to be able to skip data I am not interested in.

itaiferber · March 26, 2019, 8:20pm

Relatively up-front, we made the decision that very little of the Encoder/Decoder API would be optional — this is why, for instance, encoding container methods like container(keyedBy:)/unkeyedContainer()/singleValueContainer() don't throw: if you don't support all types of containers in one way or another, your format is likely sufficiently different from what Codable offers that it likely isn't a good fit for the infrastructure.

Sometimes this means that some encoders to some formats might need to do additional work to offer compatibility with Codable features — this might mean a format that doesn't natively support dictionaries would instead encode key-value pairs, or that a format like XDR (which offers no identifying tokens) would need to insert breadcrumbs to indicate some amount of type information. [I don't know enough about XDR to know whether this is feasible; I suspect that the answer is "no", but it's entirely possible that XDR, for instance, is not a good fit for `Codable`]

In general, contrary to @gwendal.roue's suggestion, I would say that fatalError is rarely the right answer — instead, encoders should do extra work to accommodate differences between the runtime representation and the encoded representation of their values. (The specifics of this vary by format, but that has always been our intention, at least. Nothing prevents you from fatalErroring, though.)

In this case at least, a default implementation should be reasonably possible.

igorkulman · March 28, 2019, 7:05pm

About the best way to implement this, I am thinking about

Adding mutating func skip() throws to the UnkeyedDecodingContainer protocol in Codable.swift.gyb
Adding an extension to the UnkeyedDecodingContainer protocol in Codable.swift.gyb that does the same thing I currently do, decoding an empty struct

// Default implementation of skip() in terms of decoding an empty struct
struct Empty: Decodable { }

extension UnkeyedDecodingContainer {
  public mutating func skip() throws {
    _ = try decode(Empty.self)
  }
}

The empty struct does not seem very elegant but this should be a reasonable default implementation that should work for the JSONDecoder and some other decoders.

Any custom decoder can then implement the method in a specific way as needed, or maybe call fatalError if this method is really not suitable for it.

itaiferber · March 28, 2019, 8:44pm

This was how I would implement it as well. JSONDecoder and PropertyListDecoder can then do something a bit more efficient by just incrementing their index (since at the moment, contents are already decoded up-front).

Pending going through API review here via Swift evolution, this sounds entirely reasonable.

igorkulman · April 5, 2019, 1:54pm

Created the evolution PR "Add skip() to UnkeyedDecodingContainer" (pull request #1012 in apple/swift-evolution) and the implementation PR "Add skip() to UnkeyedDecodingContainer" (pull request #23707 in apple/swift).

wvezey · October 2, 2021, 3:32am

In the case where decoding an unkeyed container throws an error, currentIndex does not increment and the iteration through the container effectively comes to a halt because the error caches; the next item in the array throws the error of the offending item, regardless of whether it is compliant with the struct keys. Of course, I am probably missing something obvious, and am open to guidance. A moveNext() method would come in quite handy in the catch block if you wanted to continue to inspect the items in the unkeyed container, provided that it cleared the container's memory of the previous error.

Allow me to add that the workaround of decoding a DummyCodable struct works in the case of a caught error. When executed in the catch block, the container moves forward to the next item in the array. Many thanks for posting that workaround.
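
A minimal sketch of that catch-block usage with JSONDecoder. `Item`, `LossyItems` and the JSON payload are hypothetical; the only technique used is the DummyCodable decode from earlier in the thread.

```swift
import Foundation

// Hypothetical element type and placeholder, for illustration only.
struct Item: Decodable, Equatable { let id: Int }
private struct DummyCodable: Decodable {}

struct LossyItems: Decodable {
    let items: [Item]
    let skipped: Int

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        var items = [Item]()
        var skipped = 0
        while !container.isAtEnd {
            do {
                items.append(try container.decode(Item.self))
            } catch {
                // The failed decode left the index where it was,
                // so consume the offending element before continuing.
                _ = try container.decode(DummyCodable.self)
                skipped += 1
            }
        }
        self.items = items
        self.skipped = skipped
    }
}

let json = Data(#"[ {"id": 1}, {"name": "no id"}, "not an object", {"id": 4} ]"#.utf8)
let decoded = try JSONDecoder().decode(LossyItems.self, from: json)
print(decoded.items.map(\.id), decoded.skipped) // prints [1, 4] 2
```

Without the decode in the catch block the loop would retry the same element forever, which matches the halt described above.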