Encodable incorrect Json string with URL and Decimal existentials

Dmitriy_Ignatyev · November 29, 2023, 5:54pm

Hi. I've run into weird behavior when encoding URL and Decimal values as existentials. When instances of these types are wrapped as any Encodable, the resulting JSON string is incorrect.

Here is minimal playground code to reproduce it:

import Foundation

let url = URL(string: "www.apple.com")!
let decimal = Decimal(string: "10.1")!

let urlJsonString: String // ok, == "www.apple.com"
let decimalJsonString: String // ok, == "10.1"
do {
  let encoder = JSONEncoder()
  
  let urlData = try! encoder.encode(url)
  let decimalData = try! encoder.encode(decimal)
  
  urlJsonString = String(decoding: urlData, as: UTF8.self)
  decimalJsonString = String(decoding: decimalData, as: UTF8.self)
}

let urlJsonStringAny: String // invalid, == "{"relative":"www.apple.com"}"
let decimalJsonStringAny: String // invalid, == "{"exponent":-1,"length":1,"isCompact":true,"isNegative":false,"mantissa":[101,0,0,0,0,0,0,0]}"
do {
  let encoder = JSONEncoder()
  
  let urlData = try! encoder.encode(AnyEncodable(url as any Encodable))
  let decimalData = try! encoder.encode(AnyEncodable(decimal as any Encodable))
  
  urlJsonStringAny = String(decoding: urlData, as: UTF8.self)
  decimalJsonStringAny = String(decoding: decimalData, as: UTF8.self)
}

public struct AnyEncodable: Encodable {
  private let encodable: any Encodable

  public init(_ encodable: any Encodable) {
    self.encodable = encodable
  }

  public func encode(to encoder: any Encoder) throws {
    try encodable.encode(to: encoder)
  }
}

The existential values give "{"relative":"www.apple.com"}" for the URL instance and "{"exponent":-1,"length":1,"isCompact":true,"isNegative":false,"mantissa":[101,0,0,0,0,0,0,0]}" for the Decimal instance.

This seems to be a bug, but maybe I'm missing something.

itaiferber · November 29, 2023, 6:25pm

This isn't a bug, but expected behavior because of how AnyEncodable encodes the underlying value; specifically, because the contents of encodable are being encoded directly into the Encoder, the Encoder never sees the outer type, and can't intercept with strategies that apply in those instances.

e.g., when encodable is a URL, this is calling URL.encode(to: encoder) directly, and control passes to URL's method; the encoder itself never sees URL, so it can't apply its encoding strategy.
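As a minimal sketch of the point above: if the wrapper routes the value through a single-value container instead of calling encode(to:) on it directly, the encoder sees the concrete type and can apply its own handling. The name ContainerAnyEncodable is hypothetical, and the sketch assumes Swift 5.7 or later, where an `any Encodable` value can be passed to a generic `encode<T>(_:)`.

```swift
import Foundation

// Hypothetical variant of the thread's AnyEncodable wrapper.
struct ContainerAnyEncodable: Encodable {
    private let wrapped: any Encodable

    init(_ wrapped: any Encodable) {
        self.wrapped = wrapped
    }

    func encode(to encoder: any Encoder) throws {
        // Hand the whole value to the encoder through a container.
        // The generic encode<T>(_:) receives the concrete type (URL,
        // Decimal, ...), so the encoder can intercept it before
        // falling back to the type's own encode(to:).
        var container = encoder.singleValueContainer()
        try container.encode(wrapped)
    }
}

let sketchEncoder = JSONEncoder()
let wrappedURL = ContainerAnyEncodable(URL(string: "www.apple.com")!)
let wrappedDecimal = ContainerAnyEncodable(Decimal(string: "10.1")!)

let wrappedURLJSON = String(decoding: try! sketchEncoder.encode(wrappedURL), as: UTF8.self)
let wrappedDecimalJSON = String(decoding: try! sketchEncoder.encode(wrappedDecimal), as: UTF8.self)

print(wrappedURLJSON)
print(wrappedDecimalJSON)
```

With this shape the URL should come out as a plain JSON string rather than a {"relative": ...} object, and the Decimal as a number rather than its internal fields.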

There are a few prior threads on the forums here that go into more detail:

(among others)

Dmitriy_Ignatyev · November 29, 2023, 6:50pm

Thanks for the explanation and the useful links. While I understand these arguments and can accept that this is expected behavior (because it is known), it is still not obvious and feels like a broken abstraction.
When I use an existential, I expect every instance to behave like its underlying value.
When I abstract an instance of a cat that can "meow", I expect an instance of any Cat to "meow" as well. I don't expect a call to func doSomeSound() to return the internals of the cat instead of "meow".

If there is one implementation of the encode function, then according to the rules of method dispatch the same method should be called whether the value is an existential or not.
So, one question to clarify: how do Decimal internals like "{"exponent":-1,"length":1,"isCompact":true,"isNegative":false,"mantissa":[101,0,0,0,0,0,0,0]}" end up in the final JSON string? I mean, are there two encode() methods, or are some internal checks done...

itaiferber · November 29, 2023, 7:03pm

To be clear here, the behavior you're seeing here is not the difference between

struct S1: Encodable {
    let decimal = Decimal()
    // ...
}

struct S2: Encodable {
    let decimal: any Encodable = Decimal()
    // ...
}

but the difference between encoding decimal using a container vs. not:

func encode(to encoder: any Encoder) throws {
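    // `singleValueContainer()` does not throw, and the container must
    // be `var` because its `encode` methods are mutating.
    var container = encoder.singleValueContainer()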

    // Calls `encode<T>(_: T)`, and `encoder` is _aware_ it is
    // encoding a `T` — it can inspect `T` _before_ calling
    // `T.encode(to:)`.
    try container.encode(decimal)
}

func encode(to encoder: any Encoder) throws {
    // Control passes directly to `Decimal.encode(to:)`.
    // `Decimal` requests a keyed container and encodes
    // properties into it directly; `encoder` never sees 
    // `Decimal` at all.
    try decimal.encode(to: encoder)
}

The first two structs produce the same output; it's the implementation of encode(to:) that matters, because there's a significant semantic difference between them. The container method hands off the entire value to the Encoder for processing; the second never gives Encoder a chance to intercept.

tera · November 29, 2023, 7:28pm

Why do you need AnyEncodable?

This gives correct result with any Encodable:

let url = URL(string: "www.apple.com")!
let decimal = Decimal(string: "10.1")!
do {
    let urlAny: any Encodable = url
    let decimalAny: any Encodable = decimal
    let urlAnyData = try! JSONEncoder().encode(urlAny)
    let decimalAnyData = try! JSONEncoder().encode(decimalAny)
    print(String(data: urlAnyData, encoding: .utf8)!)
    print(String(data: decimalAnyData, encoding: .utf8)!)
}

This also works (if you make "AnyEncodable.encodable" non-private), but it shows that AnyEncodable is not needed:

do {
    let urlData = try! JSONEncoder().encode(AnyEncodable(url).encodable)
    let decimalData = try! JSONEncoder().encode(AnyEncodable(decimal).encodable)
    print(String(data: urlData, encoding: .utf8)!)
    print(String(data: decimalData, encoding: .utf8)!)
}

You may also consider this version:

do {
    let urlData = AnyEncodableHolder(url).data(encodedWith: JSONEncoder())
    let decimalData = AnyEncodableHolder(decimal).data(encodedWith: JSONEncoder())
    print(String(data: urlData, encoding: .utf8)!)
    print(String(data: decimalData, encoding: .utf8)!)
}

with:

struct AnyEncodableHolder {
    let encodable: any Encodable
    init(_ encodable: any Encodable) {
        self.encodable = encodable
    }
    // Note: `TopLevelEncoder` is declared in Combine, so this needs `import Combine`.
    func data<Encoder: TopLevelEncoder>(encodedWith encoder: Encoder) -> Data where Encoder.Output == Data {
        try! encoder.encode(encodable)
    }
}

Dmitriy_Ignatyev · November 29, 2023, 7:38pm

The input is a dictionary of type [String: any Encodable]. As it doesn't conform to Encodable, all values are first wrapped in AnyEncodable, giving [String: AnyEncodable].
The [String: AnyEncodable] dictionary is then encoded to JSON.

taylorswift · November 29, 2023, 7:46pm

may i suggest wrapping the dictionary instead of wrapping the existentials? i’ve found that dictionaries usually need some custom encoding logic anyway, to use consistent key order.

Dmitriy_Ignatyev · November 29, 2023, 8:15pm

I have no need for custom encoding logic, but I'm still interested in hearing advice on this topic and on how [String: any Encodable] can be wrapped in another way.
One thing I was thinking about is performance: mapping the dictionary values is a heavier operation than casting the whole dictionary to Encodable. But I didn't find a way to resolve the 'any Encodable doesn't conform to Encodable' error other than wrapping every value in an AnyEncodable struct.

taylorswift · November 29, 2023, 8:21pm

your problem (which i’m sure you’re already aware) is that Dictionary<String, any Encodable> does not conform to Encodable, because its conformance is conditional on Value:Encodable. for a variety of reasons, you can’t give Dictionary a second conditional conformance, so you need to wrap it in a Dictionoid like

@frozen public 
struct DictionaryOfAnyEncodable
{
    @usableFromInline internal 
    var base:[String: any Encodable]
}

then, you are free to conform DictionaryOfAnyEncodable to Encodable by implementing encode(to:) and dispatching through the existential values the way others in this thread have suggested.

there is no need to map the dictionary values. the wrapper struct is a transparent abstraction whose only purpose is to work around a limitation of the type system.
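One possible shape for that conformance, as a sketch only: the wrapper below is a simplified stand-in for the struct above (no @frozen or @usableFromInline), DynamicKey is a hypothetical coding-key type, and passing `any Encodable` to the container's generic encode(_:forKey:) assumes Swift 5.7 or later.

```swift
import Foundation

struct DictionaryOfAnyEncodable: Encodable {
    var base: [String: any Encodable]

    // Hypothetical coding key that accepts any string.
    private struct DynamicKey: CodingKey {
        let stringValue: String
        var intValue: Int? { nil }

        init(_ string: String) {
            self.stringValue = string
        }

        init?(stringValue: String) {
            self.stringValue = stringValue
        }

        init?(intValue: Int) {
            return nil
        }
    }

    func encode(to encoder: any Encoder) throws {
        var container = encoder.container(keyedBy: DynamicKey.self)
        for (key, value) in base {
            // Goes through the container, so the encoder sees the
            // concrete type of each value (URL, Decimal, ...).
            try container.encode(value, forKey: DynamicKey(key))
        }
    }
}

let payload = DictionaryOfAnyEncodable(base: [
    "url": URL(string: "www.apple.com")!,
    "price": Decimal(string: "10.1")!,
])

let payloadEncoder = JSONEncoder()
payloadEncoder.outputFormatting = [.sortedKeys] // deterministic key order
let payloadJSON = String(decoding: try! payloadEncoder.encode(payload), as: UTF8.self)
print(payloadJSON)
```

Because the values are encoded through the keyed container rather than by calling encode(to:) on each one, no per-value wrapper or mapValues pass is needed.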

Dmitriy_Ignatyev · November 29, 2023, 8:31pm

Thanks for the explanation, this is what I need.

itaiferber · November 29, 2023, 10:03pm

Just a general note on AnyEncodable, in case you didn't come across it in the linked threads: one of the main reasons we didn't offer AnyEncodable in the stdlib is that by design, it's not possible to offer an equivalent AnyDecodable, since type information does not live in the produced data.

It's easy to fall into the trap of being able to encode arbitrary data without realizing that you haven't left enough information in an archive to be able to correctly decode it back.

Do you have any decode requirements, or are you looking to exclusively encode data?

Dmitriy_Ignatyev · November 30, 2023, 8:50am

I have read the provided links and your other posts, but it wouldn't hurt to get another explanation.
The point about data loss is clear: if data (e.g. a Decimal) needs to be passed with full precision and decoded back without precision loss, then we can choose another encoding / decoding strategy.
In my task only encoding is needed, but I've additionally added decoding unit tests for future compatibility. Now everything works fine. Just to mention: I ran into several errors while running the unit tests, and one of them is described in this post. Thank you for the help and the detailed explanations.

itaiferber · November 30, 2023, 4:06pm

Yes, with AnyEncodable you need to be careful to ensure the right encoding/decoding strategies are used, but there's an even more fundamental constraint you need to watch out for.

In order to decode a value from data at runtime, information about what type of value to create needs to be present somewhere. For some serialization APIs, that information lives inside of the data itself; for others, that information lives externally (in code, in a schema, etc.). Storing type information in the data itself has the benefit of the consumer not needing to know the type in order to correctly read the data, but with the drawbacks that (1) this type might not be valid for the reader (e.g., it might not exist at runtime), and (2) that the type information can be messed with or corrupted (intentionally or unintentionally).

Codable, for security and interoperability with other consumers, leaves type information out of the encoded data — which means it needs to live somewhere; in this case, it's defined in code as the static type which you request to decode.

The benefit to this is that you can decode data that would be ambiguous otherwise; for example, the value 723052783.047189 in an archive could represent some Double value, but it can also be a Date encoded using its underlying floating-point representation. You can't work backwards from the value to figure out what the encoded type was, but if that type is in the code, this is trivial.

The drawback to this is that if the static type of the value isn't in the code, you can't decode the value at all. And this is the danger with AnyEncodable: by type-erasing the values, it's not always possible to work backwards to decode the data again: your [AnyEncodable] containing only Double values could look identical to an [AnyEncodable] containing only Date values encoded with the .deferredToDate encoding strategy. If you don't know what the types were at encode time, it's highly unlikely that you'll actually know them at decode time.
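A small sketch of that ambiguity, using plain arrays rather than [AnyEncodable] to keep it short. It relies on JSONEncoder's default date strategy being .deferredToDate; the variable names are illustrative.

```swift
import Foundation

let ambiguousValue = 723052783.047189

// One array of Double, one array of Date built from the same number.
let doublesData = try! JSONEncoder().encode([ambiguousValue])
let datesData = try! JSONEncoder().encode([Date(timeIntervalSinceReferenceDate: ambiguousValue)])

// With the default .deferredToDate strategy a Date is written as its
// time interval since the reference date, so the payloads coincide.
let payloadsMatch = (doublesData == datesData)
print(payloadsMatch)

// Either payload decodes as either type; only the static type that
// is requested in code decides what comes back.
let asDoubles = try! JSONDecoder().decode([Double].self, from: datesData)
let asDates = try! JSONDecoder().decode([Date].self, from: doublesData)
print(asDoubles.count, asDates.count)
```

Nothing in the encoded bytes distinguishes the two arrays, which is why the type has to come from the code that decodes them.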

If you know for certain that you'll never ever need to decode the values (e.g., you're writing something which is an export-only tool by definition), then this isn't something you need to think about. But requirements can change over time, and you may find yourself in a situation where you do actually need to be able to read the data back, and need to find another encoding scheme that allows you to do that.