Encodable incorrect Json string with URL and Decimal existentials

itaiferber · November 30, 2023, 4:06pm

Yes, with AnyEncodable you need to be careful to ensure the right encoding/decoding strategies are used, but there's an even more fundamental constraint you need to watch out for.

In order to decode a value from data at runtime, information about what type of value to create needs to be present somewhere. For some serialization APIs, that information lives inside of the data itself; for others, that information lives externally (in code, in a schema, etc.). Storing type information in the data itself means the consumer doesn't need to know the type in order to read the data correctly, but it has two drawbacks: (1) the type might not be valid for the reader (e.g., it might not exist at runtime), and (2) the type information can be tampered with or corrupted (intentionally or unintentionally).

Codable, for security and interoperability with other consumers, leaves type information out of the encoded data — which means it needs to live somewhere; in this case, it's defined in code as the static type which you request to decode.

The benefit to this is that you can decode data that would be ambiguous otherwise; for example, the value 723052783.047189 in an archive could represent some Double value, but it can also be a Date encoded using its underlying floating-point representation. You can't work backwards from the value to figure out what the encoded type was, but if that type is in the code, this is trivial.
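To make that concrete, here's a small sketch that decodes the same JSON bytes twice, once as `[Double]` and once as `[Date]`. It assumes a default `JSONDecoder`, whose date decoding strategy is `.deferredToDate`; the variable names are just for illustration.

```swift
import Foundation

// One payload, with no type information in it.
let payload = Data("[723052783.047189]".utf8)

// The static type requested at the call site decides what comes back.
// A default JSONDecoder uses the .deferredToDate date decoding strategy,
// so the number is read as seconds since the reference date (Jan 1, 2001).
let asDoubles = try JSONDecoder().decode([Double].self, from: payload)
let asDates = try JSONDecoder().decode([Date].self, from: payload)

print(asDoubles[0])
print(asDates[0])

// Same underlying number, two different meanings.
print(asDates[0].timeIntervalSinceReferenceDate == asDoubles[0])
```

Both calls succeed, and nothing in the payload tells you which one was "right"; only the type in the code does.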

The drawback to this is that if the static type of the value isn't in the code, you can't decode the value at all. And this is the danger with AnyEncodable: once the values are type-erased, it's not always possible to work backwards and decode the data again. Your [AnyEncodable] containing only Double values could look identical to an [AnyEncodable] containing only Date values encoded with the .deferredToDate encoding strategy. If you don't know what the types were at encode time, it's highly unlikely that you'll actually know them at decode time.
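Here's a rough sketch of that collision. The `AnyEncodable` below is a minimal stand-in I'm assuming for illustration (the wrapper you're actually using may be implemented differently), and the encoder is a default `JSONEncoder`, i.e. `.deferredToDate`.

```swift
import Foundation

// Hypothetical minimal type-erasing wrapper, for illustration only.
struct AnyEncodable: Encodable {
    private let value: any Encodable

    init(_ value: any Encodable) {
        self.value = value
    }

    func encode(to encoder: Encoder) throws {
        // Forwards to the wrapped value's own encode(to:).
        try value.encode(to: encoder)
    }
}

let interval = 723052783.047189

// An array that holds only Doubles, and one that holds only Dates.
let doubles = [AnyEncodable(interval)]
let dates = [AnyEncodable(Date(timeIntervalSinceReferenceDate: interval))]

let doublesJSON = try JSONEncoder().encode(doubles)
let datesJSON = try JSONEncoder().encode(dates)

// Date's own encode(to:) writes its time interval since the reference date
// as a Double, so the two payloads come out byte-for-byte identical.
print(doublesJSON == datesJSON)
```

Given only the output, there's no way to tell which array produced it.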

If you know for certain that you'll never ever need to decode the values (e.g., you're writing something which is an export-only tool by definition), then this isn't something you need to think about. But requirements can change over time, and you may find yourself in a situation where you do actually need to be able to read the data back, and need to find another encoding scheme that allows you to do that.
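As one possible shape for such a scheme (a sketch, not the only way to do it, and not something Codable gives you automatically): if the set of types is closed, you can replace the type-erased wrapper with an enum that writes an explicit tag next to each value. `TaggedValue`, its cases, and the `type`/`value` key names below are all made up for the example.

```swift
import Foundation

// Hypothetical closed set of value kinds; all names here are illustrative.
enum TaggedValue: Codable, Equatable {
    case number(Double)
    case date(Date)

    private enum CodingKeys: String, CodingKey { case type, value }
    private enum Kind: String, Codable { case number, date }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Read the tag first, then decode the payload as the matching static type.
        switch try container.decode(Kind.self, forKey: .type) {
        case .number:
            self = .number(try container.decode(Double.self, forKey: .value))
        case .date:
            self = .date(try container.decode(Date.self, forKey: .value))
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .number(let number):
            try container.encode(Kind.number, forKey: .type)
            try container.encode(number, forKey: .value)
        case .date(let date):
            try container.encode(Kind.date, forKey: .type)
            try container.encode(date, forKey: .value)
        }
    }
}

let original: [TaggedValue] = [
    .number(723052783.047189),
    .date(Date(timeIntervalSinceReferenceDate: 723052783.047189)),
]

let encoded = try JSONEncoder().encode(original)
let decoded = try JSONDecoder().decode([TaggedValue].self, from: encoded)

// The tag lets the decoder tell the two apart again.
print(decoded == original)
```

The tag is still interpreted by your code against a fixed list of cases, so an unknown or tampered tag fails to decode instead of creating an arbitrary type. The tradeoff is that the format is now specific to your app, and other consumers need to know about the tag.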