nestedContainer in KeyedEncodingContainer and UnkeyedEncodingContainer should be throwing

In msgpack there is a size limit for arrays and maps, i.e. for unkeyed and keyed containers.
When implementing a msgpack Encoder, I would want to tell the user that they have hit the size limit of an unkeyed or keyed container not only when they call encode, but also when they request a nested container or a super encoder. I would want to do this by throwing an error from these methods, but they are currently not declared to throw.

Is there a reason for these methods not to throw?
I couldn't catch up on the full initial design discussion of these features, but I am very interested in detailed explanations of the design decisions.

I'm also not sure whether I am writing this in the right place. It seems a bit too "small" for this forum compared to the other threads here. Should I just have opened a pull request? I hope I'm in the right place; please tell me if I'm not.

Thanks for all comments on this!

I've had a similar problem in a project of mine: When encoding things into SQLite statements, I wanted to restrict nested containers to only single-value containers. I solved this by creating Throwing(...)Container classes that do nothing but throw an error when any method is called.

This is the source code. It's not very elegant, but it works. Note that some things like empty arrays will request containers, but then not do anything with them. If that is a concern for you, search the file I linked for nestedNonSingleValueContainerRequested for another workaround :crazy_face:.

Overall I agree these methods (probably all methods in Encoder and Decoder) should be allowed to throw, but simply changing this now would break source compatibility, so I'm not sure what the best way to fix this would be.

Thanks for your suggestion. I think I won't implement it that way because it is way simpler for me to check for array and dictionary sizes in the next encoding step, when all Codable types are already converted to containers.

Good point about source compatibility; I hadn't been thinking about it, but of course this is a bigger thing than I thought it was.

With all these methods not allowed to throw, and with the very limited set of EncodingError cases, Codable doesn't seem that well suited for formats with such restrictions (although I'm not fluent in SQL and don't know what exactly you are encoding there... and the size limit in msgpack is, from my point of view, unlikely to ever be hit, since you would need to encode more than 4,294,967,295 elements into an unkeyed container, and I don't know whether that many elements would even fit into an Array on 32-bit platforms, where Array.Index == Int is only 32 bits wide).

I'm curious what the original consideration was to not make these methods throw.

Sorry for the delay in responding to this — I just got back from vacation, so I'm catching up on a lot of threads here.


The design behind encoding containers intends for them to be lightweight. A request for a container in and of itself does not necessitate that the encoder do something; for instance, JSONEncoder.singleValueContainer() just returns self. There are very few cases in which just requesting a container would be a cause for error; for many (most?) implementations and formats, it's entirely reasonable that requesting a container but doing nothing with it is a no-op.

In that sense, having all container requests be non-throwing was better for ergonomics than having them throw; for the same reason that not all methods throw: if there isn't an expectation of failure, the additional annotation is just noise.

The solution that I was going to suggest is what @ahti has already offered: a good (and relatively simple) way to handle this is to return a container that signals this kind of error by throwing. A simple error flag can allow your Encoder and its containers to always return an ErrorContainer (or similar) which just throws. @ahti's solution is good, but can be shortened up a bit, too.
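
For illustration, here is a minimal sketch of the single-value variant of such a container (the names are placeholders, not anyone's actual code; the keyed and unkeyed variants follow the same pattern, just with more methods):

```swift
// Sketch only: a single-value container that does nothing but rethrow a stored error.
struct ErrorSingleValueContainer: SingleValueEncodingContainer {
    let codingPath: [CodingKey]
    let error: Error

    private func fail() throws -> Never { throw error }

    mutating func encodeNil() throws { try fail() }
    mutating func encode(_ value: Bool) throws { try fail() }
    mutating func encode(_ value: String) throws { try fail() }
    mutating func encode(_ value: Double) throws { try fail() }
    mutating func encode(_ value: Float) throws { try fail() }
    mutating func encode(_ value: Int) throws { try fail() }
    mutating func encode(_ value: Int8) throws { try fail() }
    mutating func encode(_ value: Int16) throws { try fail() }
    mutating func encode(_ value: Int32) throws { try fail() }
    mutating func encode(_ value: Int64) throws { try fail() }
    mutating func encode(_ value: UInt) throws { try fail() }
    mutating func encode(_ value: UInt8) throws { try fail() }
    mutating func encode(_ value: UInt16) throws { try fail() }
    mutating func encode(_ value: UInt32) throws { try fail() }
    mutating func encode(_ value: UInt64) throws { try fail() }
    mutating func encode<T: Encodable>(_ value: T) throws { try fail() }
}
```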

As for changing this — as you and @ahti noted, backwards compatibility is why we can't change this after the fact (but I think there are enough tools to solve this problem which make it workable overall).

I'm not so sure about this. I can imagine many reasons why a format could support, for example, limited nesting, size limits, only nesting under specific keys and so on.

My solution would work for that if merely requesting a container and then doing nothing with it carried no meaning, but a common type (Array) will request an unkeyed container and then do nothing with it when empty, which should definitely produce an empty array in the case of JSON. So my solution alone does not address this problem completely and requires another workaround somewhere.

I'm not sure this problem is significant enough to warrant a change right now (I can't think of a way to do it without breaking source compatibility, short of compiler magic), but I do think it is a weak point of the current design.

Could potentially, yes — but are there common formats which do, and in a way which necessarily requires throwing on a container request?

Note that it is not our intent to be able to represent all possible serialization formats and their limitations. From another thread:

It's possible to come up with formats which put such strong limitations on encoding, but I'm not sure how well they would fit the Codable model otherwise.

Indeed, the representation of Array cannot change, but that is to our benefit because we can rely on it not doing anything special. If you wish to allow Array and Dictionary to encode empty containers (while other containers become no-ops): both of those cases fall into encode<T : Encodable>(_: T...), wherein you can check whether the object is an empty array or dictionary and store [:] or [] directly instead of asking them to encode.
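
A rough standalone sketch of that check, with made-up names (not Foundation's actual code; in a real encoder the test would live inside the container's generic encode method, and the [String: Any] cast only catches string-keyed dictionaries):

```swift
// Returns the value to store directly if `value` is an empty array or dictionary,
// or nil if it should be encoded normally.
func emptyCollectionPlaceholder<T: Encodable>(for value: T) -> Any? {
    if let array = value as? [Any], array.isEmpty {
        return [Any]()          // store [] directly; Array never gets to request a container
    }
    if let dictionary = value as? [String: Any], dictionary.isEmpty {
        return [String: Any]()  // store [:] directly
    }
    return nil                  // not an empty collection; ask the value to encode itself
}

print(emptyCollectionPlaceholder(for: [Int]()) != nil)   // true
print(emptyCollectionPlaceholder(for: [1, 2, 3]) != nil) // false
```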

OK, I think that's the point. I had a wrong understanding of the whole thing: I thought of an encoding container as equivalent to the container type in the format, in the sense that requesting such a container was the same as inserting a new msgpack map into the msgpack output produced up to that point.

When thinking about this more as a help to structure your information in an abstract way rather than as a direct wire to the format, I don't think I would necessarily use an ErrorContainer; I would just throw the error later, in all cases, when converting the intermediate representation in the encoder's storage to the final format, because that seems a lot easier to me. In terms of JSONEncoder, it would not be JSONEncoder but JSONSerialization that throws the error.

For my msgpack case, I'm going to check the size limits in this second step anyway, because to me it is really an edge case, and I won't add new types just to replace one if clause in the later step when they will almost never be used. The reason I thought about doing this during the container request in the first place is that the errors I can throw from this later point are less informative: they can't say which exact instance encoded too many elements, because they won't contain the coding path or the unencodable instance.
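
Roughly what I have in mind for that second step, as a minimal sketch (the names are made up and the actual msgpack byte encoding is omitted):

```swift
// Intermediate representation built up by the containers in step one.
enum Intermediate {
    case single(Any)
    case unkeyed([Intermediate])
    case keyed([String: Intermediate])
}

// Note: this error cannot carry a coding path, which is exactly the downside mentioned above.
struct SizeLimitError: Error {
    let count: Int
}

func serialize(_ node: Intermediate) throws -> [UInt8] {
    switch node {
    case .unkeyed(let elements):
        // msgpack's array 32 format holds at most 2^32 - 1 elements.
        guard elements.count <= UInt32.max else { throw SizeLimitError(count: elements.count) }
        return try elements.flatMap(serialize)        // real header/value bytes omitted
    case .keyed(let entries):
        guard entries.count <= UInt32.max else { throw SizeLimitError(count: entries.count) }
        return try entries.values.flatMap(serialize)  // keys omitted in this sketch
    case .single:
        return []                                     // real value encoding omitted
    }
}
```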

For arrays you can also check the size limit in encode<T : Encodable>(_: T...), but when implementing something like this, I always think about what happens if someone calls encode(to:) on an array directly inside another implementation of encode(to:), for example because they only want to encode just this array and aren't aware of the .singleValueContainer method, or don't know why they should use it, etc. This also came up quite regularly when Array used to request super encoders and call encode(to:) directly (back in May 2017).
To me it seems like one should never call encode(to:) directly inside encode(to:), because it keeps the encoder from doing an essential part of its work. If I'm right about this, it would be good to have some documentation about it, maybe directly on encode(to:).

+1 on this. But I think we should make the top-level API throwing as well. I came up with the same solution of returning an ErrorContainer. In my case the Encoder is using a Stream, so it could be closed or run out of buffer capacity (because there could be thousands of them). I don't think we should freeze an intentionally limited API for eternity just because of possible noise, especially since almost all Codable APIs can throw.

As for backwards compatibility, it would only affect hand-written implementations of the Encodable protocol.

I wouldn't be so worried if we could change it some time later, but as I understand it we can't do anything about it after Swift 5?

But what if I don't support a single-value container or an unkeyed container?
And don't support nested containers at all, like a FormUrlEncodedEncoder?

Isn't try a no-op in Swift?

Yep, that's a design decision that you can make based on what makes most sense for the structure you prefer. Errors earlier on can be more informative, but that might not make any sort of measurable difference.

Yes, this is one of the reasons you shouldn't call encode(to:) directly on values but instead allow the encoder to intercept them (you can accidentally bypass encoding strategies this way too). It might be worth filing a Radar to track updating the documentation to reflect that it shouldn't be called directly.
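
A quick illustration of that bypass (Wrapper is just a made-up example type):

```swift
import Foundation

struct Wrapper: Encodable {
    let timestamp: Date

    enum CodingKeys: String, CodingKey { case timestamp }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        // Going through the container lets JSONEncoder apply its dateEncodingStrategy.
        try container.encode(timestamp, forKey: .timestamp)
        // Calling `timestamp.encode(to: container.superEncoder(forKey: .timestamp))`
        // instead would write Date's raw Double and ignore the strategy.
    }
}

let encoder = JSONEncoder()
encoder.dateEncodingStrategy = .iso8601
let json = try encoder.encode(Wrapper(timestamp: Date()))
print(String(data: json, encoding: .utf8)!)   // e.g. {"timestamp":"2018-08-01T12:00:00Z"}
```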

Unfortunately, we cannot make this change even for, or before, Swift 5. There are plenty of manually-written implementations out there (not to mention actual Encoder implementations), and we are past the point of being willing to make source-breaking changes to APIs without significant evidence of harm.

Normally, if we were to make changes, we would deprecate old API and introduce new API; however, because of the details of try and throwing in Swift, it is not possible to overload methods based on whether they throw, so we couldn't have a throwing version and a non-throwing version simultaneously.

The API contract requires you to support all kinds of containers; if you cannot support them (even by accepting contents through them and, say, translating values into dictionaries), then you're not meeting the API contract, and it's likely that your format is not a good match for Codable, unfortunately.

Yes — to clarify, I meant that requesting containers can (and should) be relatively cheap.

Is there some document where one can find all of those API requirements, or do I have to extract them from between the lines of the Encoder documentation?

I had quite a hard time understanding the concept behind Codable because I didn't find anything really useful on Google and ended up just reading the JSONEncoder source. And it seems I am still missing parts of the concept.

That's unfortunate. For me, being forced to hack around with an ErrorContainer is significant harm. I don't have any more evidence than that.

They won't be affected, since a throwing requirement can be satisfied by a non-throwing implementation.
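
For example (a toy protocol, just to illustrate the language rule):

```swift
protocol ToyContainer {
    mutating func encode(_ value: Int) throws
}

struct InMemoryContainer: ToyContainer {
    var values: [Int] = []
    // No `throws` here, yet this still satisfies the throwing requirement,
    // so existing non-throwing implementations keep compiling.
    mutating func encode(_ value: Int) { values.append(value) }
}
```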

Swift 5 already has a bunch of breaking changes that will produce compile errors. And fixing this one is simple: there is already a compiler fix-it to insert try.


Probably a bugs.swift.org issue for Encodable.encode(to:), right? Since those are in-source.

It would be nice to see the implementations and formats behind this statement (apart from Foundation). Even for JSON (including Foundation.JSONEncoder), requesting an unkeyed container should produce an empty array, e.g. { "items": [] }; it isn't a no-op, it's at least 9 bytes in the output stream. I'm going to search on GitHub.

I did some quick research: 3 of the first 4 packages have this issue:
https://github.com/IBM-Swift/Swift-Kuery-ORM/blob/master/Sources/SwiftKueryORM/DatabaseEncoder.swift#L64-L66 - could throw in unkeyed/keyed calls.

https://github.com/alickbass/CodableFirebase/blob/master/CodableFirebase/Encoder.swift#L56 -
https://github.com/ShawnMoore/XMLParsing/blob/master/XMLParsing/XML/Encoder/XMLEncoder.swift#L306 -

It turned out the last two actually inherited this from Foundation:
https://github.com/apple/swift-corelibs-foundation/blob/master/Foundation/JSONEncoder.swift#L217 -

I understand why an overflow or a force unwrap could crash, but... It doesn't actually matter whether Foundation's implementation prefers to crash; as you can see, it spreads. And we wouldn't have these copy/pasted failures if the encoder could throw. (Well, that's not an argument.)

But OK, let's assume it's a no-op; then Decoder should simply return an optional UnkeyedDecodingContainer? if the container isn't found or has an incompatible type. It shouldn't throw either, because that's also a no-op. What could go wrong? We either have a container or we don't.

See container(keyedBy:) | Apple Developer Documentation

Discussion

You must use only one kind of top-level encoding container. This method must not be called after a call to unkeyedContainer() or after encoding a value through a call to singleValueContainer()

So you should actually not throw an error there, because it is not a valid use of the API.
Using preconditionFailure is the right way to go in this case, because this is simply not a situation that can occur in correct code; it's a programming error.
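
For example, an encoder could enforce that rule with a small helper along these lines (a simplified, made-up sketch):

```swift
enum TopLevelContainerKind { case keyed, unkeyed, singleValue }

struct TopLevelContainerGuard {
    private var handedOut: TopLevelContainerKind?

    // Trap on a conflicting second request instead of throwing; it's a programming error.
    // (Simplified: the real rule for singleValueContainer() only forbids requests after
    // a value has actually been encoded through it.)
    mutating func willRequest(_ kind: TopLevelContainerKind) {
        precondition(handedOut == nil || handedOut == kind,
                     "Only one kind of top-level encoding container may be used per encoder")
        handedOut = kind
    }
}
```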


I didn't quite get why

should throw. This implementation just doesn't seem to be finished.


Is this "just" a code style and complexity consideration? I tried to imagine a scenario where using an ErrorContainer truly causes problems, e.g. because some information seemed to be validly encoded, e.g. because an empty Array requested an unkeyed container and an error occurs while this container is created and therefor an ErrorContainer returned, but the array never calls encode, but on the decoding side some undefined problem arrises nobody knows about, because array can not request an unkeyed container during decoding because there is none.
The thing here is that this array will has gone through some encode method from a container of from some top level API like JSONEncoder. You would need to check there that the container isn't an ErrorContainer and could throw there, so though it isn't straight forward, it still is possible.


I see an asymmetry between Encoder and Decoder here. I tend to see the encoding process as separated into two steps:

  • Converting Codable instances to some well processable intermediate format
  • Encoding the intermediate format to the final format

This works very well with the current API and you can throw your errors in the second step when the intermediate format violates the conditions of the final format.
During decoding this process is turned upside down: you can't tell whether the intermediate format you decoded matches the Swift type hierarchy you want to decode into until the (now second) step involving the decoder is running. Here it is much more "realistic" that a container does not match the requested container type, and throwing is also pretty much required for types like Dictionary that encode as either a keyed or an unkeyed container.

This layout isn't suited that well to encoding to a stream; that is what causes the problems there and makes such implementations more complicated. But if there is a second step, it doesn't seem unreasonable (and not only to me) to throw such errors there. So after all I'm not questioning the design decision that encoding container methods do not throw, because there seem to be good reasons for it, even though I still don't know the whole concept.

Yes.

The real problem is that we don't have anything else that encodes/decodes things automatically. These protocols were initially designed for Foundation (backing the JSON and property list coders) but were then moved into Swift so we could use them in other scenarios; still, they feel more like a Foundation thing than a Swift thing. Actually, I don't even need codingPath: [CodingKey] and userInfo: [CodingUserInfoKey : Any]; I don't see a use for them in general, and you can implement them in your own Encoder/Decoder. They are an implementation detail.

What I'm saying is that we should look at these protocols as a part of Swift 5 for general use, staying there forever. Imagine Foundation didn't exist: are these the protocols we want to freeze for eternity?

If there was a discussion in the core team and they decided that userInfo and codingPath stay, and that the Encoder protocol should crash instead of throw, then I'm OK with it. I just think that this feature hasn't had much attention and the vast majority are simply happy with Foundation.JSONEncoder. I'd love to be wrong.

I support this.


I think these things make sense. While the coding path is to me mainly about richer error descriptions, so you know which instance is causing problems in your code, I have indeed seen a use case for userInfo: if you want to detect which format you are encoding to inside encode(to:), and you are using JSONEncoder among others, setting a value in this dictionary is pretty much necessary to detect JSONEncoder, because the actual encoder implementation is hidden. I think it is good for a format to hide its encoder implementation, because that makes calling encode(to:) with it directly impossible in the first place.
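
For example (the CodingUserInfoKey and the Payload type here are just made up):

```swift
import Foundation

extension CodingUserInfoKey {
    static let isJSON = CodingUserInfoKey(rawValue: "isJSON")!
}

struct Payload: Encodable {
    let value: Int

    enum CodingKeys: String, CodingKey { case value }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        if encoder.userInfo[.isJSON] as? Bool == true {
            // JSON-specific representation, e.g. encode the number as a string.
            try container.encode(String(value), forKey: .value)
        } else {
            try container.encode(value, forKey: .value)
        }
    }
}

let encoder = JSONEncoder()
encoder.userInfo[.isJSON] = true                   // opt in before encoding
let data = try encoder.encode(Payload(value: 42))  // {"value":"42"}
```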

If you mean the Encoder's errors: that is an implementation detail, and this information can be put into the error itself. For Encodable errors you're already in control and can figure out at which step you got the error.

This is also a hack, so this will do as well:

if "\(type(of: encoder))" == "_JSONEncoder"

I mean EncodingError specifically. I'm not sure whether this is what you mean by Encodable errors, but in the case of EncodingErrors you are indeed not in control (in the sense of control flow). Sure, when you are debugging you know where it happened, but in code, when you are using e.g. JSONEncoder, you do not directly know where the error comes from, because it is passed from one encode(to:) implementation to the next via the method calls inside the encoder (you have effectively handed "control" over to the encoder implementation). You could of course also provide this information yourself there, but only in your own types, not in Array, and therefore the coding path is necessary for this in my understanding.

I don't think this is a hack, I got it from there:

It at least works better IMO, because "_JSONEncoder" is an internal name and there is no guarantee that it won't change. And if you think about a closed-source Encoder with a hidden implementation, it will definitely be harder to guess the name.