Why does my JSON response fail to decode on iOS 15 but succeed on iOS 17?

I am seeing an issue where the exact same JSON payload is failing to decode on my iPhone SE (iOS 15.7), but succeeding on my iPhone 15 Pro (iOS 17.6.1).
I am using a default JSONDecoder with no extra configurations.

On both of my devices, I tried inspecting the payload by printing its contents using different encodings. They both produced the same results:
String(data: data, encoding: .utf8) returned nil.
String(data: data, encoding: .utf16) returned garbage data.
String(data: data, encoding: .ascii) returned readable ASCII.
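For reference, a minimal sketch of those checks (the bytes here are a made-up stand-in for the real payload, containing an invalid UTF-8 sequence):

```swift
import Foundation

// Stand-in payload: {"a":"O\x96"} — 0x96 is not valid UTF-8 on its own.
let data = Data([0x7B, 0x22, 0x61, 0x22, 0x3A, 0x22, 0x4F, 0x96, 0x22, 0x7D])

// Try the common encodings in turn; nil means the bytes are not
// decodable in that encoding.
for encoding in [String.Encoding.utf8, .utf16, .ascii] {
    if let s = String(data: data, encoding: encoding) {
        print("\(encoding): \(s)")
    } else {
        print("\(encoding): nil")
    }
}
```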

Upon inspecting the ASCII output, I noticed there were some weird characters in the payload; one of the values in the JSON is O–³À. I assume this is the cause of my problems.

I would like to know: why is this happening? Did something change in JSONDecoder since iOS 15?

1 Like

Yes, iOS 17 brought a newly rewritten decoder as part of the Foundation rewrite. It may handle things differently, but as long as the payload is UTF-8 it's valid JSON either way.

3 Likes

That reminds me of a question. I use Codable to persist data in my app. Does anyone know if it's guaranteed that the encoding format won't change (so that old and new releases don't have data-interoperability issues)? I haven't worked out an approach for handling it if it does.

(FWIW, the encoding I asked about is the serialization format of Codable, not the character-set encoding the OP asked about.)

JSON requires input data encoded in UTF-8, so I'd expect this to work at least. It's unlikely that character encoding handling would be affected by a JSON encoder/decoder change. Are you sure you used the right data?

1 Like

Oh, that was just me debugging and trying to inspect the payload to see what was wrong with it. UTF-8 typically works when I want to inspect a payload, but in this case it didn't, so I thought it might be relevant to the decoding issue.

I am also sure I used the correct data. I ended up extracting that data and performing some basic hardcoded tests, checking the string encoding in addition to the JSON decoding.
This exact same test failed at the JSON decoding step on iOS 15 but succeeded on iOS 17. The string encoding produced the same results on both devices.

1 Like

The current JSONDecoder implementation doesn't assume the encoding is UTF-8: swift-foundation/Sources/FoundationEssentials/JSON/JSONDecoder.swift at 2f4f5b80cfb4cd4540e1e75bc07dff4a54a5ea47 · swiftlang/swift-foundation · GitHub

The old one probably assumed UTF-8 and failed when the input wasn't.
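A quick way to see the difference is to feed the decoder UTF-16 data. This is only a sketch; what the decode call returns depends on the OS/Foundation version, which is the whole point:

```swift
import Foundation

struct Tiny: Decodable { let a: Int }

let json = "{\"a\":1}"
// Encoding as UTF-16 prepends a BOM, so the data starts with
// non-UTF-8 bytes.
let utf16Data = json.data(using: .utf16)!

// The rewritten decoder reportedly sniffs the encoding from the first
// bytes; the pre-iOS-17 one required UTF-8, so this call's outcome
// differs by OS version.
let result = try? JSONDecoder().decode(Tiny.self, from: utf16Data)
print(result as Any)
```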

3 Likes

I wonder how one specifies the encoding? I don’t think JSONDecoder.decode has an argument for it.

1 Like

JSON is defined as UTF-8, so no other encoding is valid. Seems like the new parser needs to do some validation.

2 Likes

Drive-by comment which is probably not relevant.

Whilst I wish this were true, I don't think it is.

RFC 4627 has this to say:

  1. Encoding

JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.

Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.

      00 00 00 xx  UTF-32BE
      00 xx 00 xx  UTF-16BE
      xx 00 00 00  UTF-32LE
      xx 00 xx 00  UTF-16LE
      xx xx xx xx  UTF-8

So UTF-8 is the default and a reasonable assumption, but not the only option.
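That "pattern of nulls" heuristic is simple enough to sketch in a few lines (the function name here is invented for illustration; it is not an API):

```swift
import Foundation

// RFC 4627's encoding detection: classify the first four octets by
// where the zero bytes fall. Fewer than four bytes: assume UTF-8.
func sniffEncoding(_ bytes: [UInt8]) -> String.Encoding {
    guard bytes.count >= 4 else { return .utf8 }
    switch (bytes[0] == 0, bytes[1] == 0, bytes[2] == 0, bytes[3] == 0) {
    case (true, true, true, false):  return .utf32BigEndian    // 00 00 00 xx
    case (true, false, true, false): return .utf16BigEndian    // 00 xx 00 xx
    case (false, true, true, true):  return .utf32LittleEndian // xx 00 00 00
    case (false, true, false, true): return .utf16LittleEndian // xx 00 xx 00
    default:                         return .utf8              // xx xx xx xx
    }
}
```

It relies on the RFC 4627 guarantee that a JSON text starts with two ASCII characters, which RFC 8259 no longer requires (a bare string like "é" is now a valid JSON text).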

3 Likes

That RFC is obsolete for JSON. The current one is RFC8259, which states:

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text. However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.

3 Likes

Oh nice one, TIL! Thanks!

3 Likes

It's not obsolete for the purpose of decoding JSON that was valid under the RFCs that were current when the library that encoded it was written.

I've also linked the exact part of the code that is doing the decoding.

3 Likes

Sure. I'm surprised the previous JSONSerialization-based version wasn't similarly flexible, as JSONSerialization is pretty old. Technically I'm not sure JSONDecoder is RFC 8259 compliant if it supports more encodings, but it probably doesn't matter as long as JSONEncoder always emits UTF-8.

1 Like

@KrimpusChrome I have an idea on how to get the minimal character(s) to reproduce the issue. It's a binary-search-like approach. The steps:

  1. Decode the data on iOS 17.6.1 to get the string
  2. Split the string in half and encode each of them on iOS 17.6.1
  3. Send the encoded data to iOS 15.7 and try to decode both of them
    a) If one succeeds and one fails, repeat the steps using the failing data
    b) If both succeed, the current data is the minimal data
    c) If both fail (is this possible?), select either one and repeat
  4. Now that you get the minimal data, you can find its string in step 2
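The steps above could be sketched like this, where failsToDecode is a placeholder for the round trip of encoding the string and trying to decode it on the iOS 15.7 device:

```swift
import Foundation

// Bisect a string down to a small substring that still triggers the
// failure. `failsToDecode` stands in for "send to the old device and
// attempt the decode there".
func minimalFailingSubstring(of s: String,
                             failsToDecode: (String) -> Bool) -> String {
    var current = s
    while current.count > 1 {
        let mid = current.index(current.startIndex, offsetBy: current.count / 2)
        let left = String(current[..<mid])
        let right = String(current[mid...])
        if failsToDecode(left) {
            current = left
        } else if failsToDecode(right) {
            current = right
        } else {
            break // neither half fails alone; current is minimal
        }
    }
    return current
}
```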

More experiments after you get the minimal string:

  • Encode the string on iOS 15.7 and compare the data with that on iOS 17.6.1
  • Encode it on Linux and compare the result

I suppose if you have a small string that reproduces the issue, you're more likely to get help on the Apple forums or here. Just my thoughts.

1 Like

Okay, so I feel a little silly here. I forgot to mention the critical point that I am not decoding every property in the JSON payload.

If I try to decode the problematic value, it fails on both iOS 15 and 17. If I ignore that value, it fails on iOS 15 and succeeds on iOS 17.

I have a minimal reproducible example below:

struct JSON1: Decodable {
    let device_type: String?
    let wifi_ssid: String?
}

struct JSON2: Decodable {
    let wifi_ssid: String?
}

func test() {
    // {"device_type":"O–³À","wifi_ssid":"test"}
    // PROBLEMATIC BYTES: 0x4F, 0x96, 0xB3, 0xC0
    let bytes: [UInt8] = [0x7B, 0x22, 0x64, 0x65, 0x76, 0x69, 0x63, 0x65, 0x5F, 0x74, 0x79, 0x70, 0x65, 0x22, 0x3A, 0x22, 0x4F, 0x96, 0xB3, 0xC0, 0x22, 0x2C, 0x22, 0x77, 0x69, 0x66, 0x69, 0x5F, 0x73, 0x73, 0x69, 0x64, 0x22, 0x3A, 0x22, 0x74, 0x65, 0x73, 0x74, 0x22, 0x7D]
    let data = Data(bytes)
    print(String(data: data, encoding: .utf8)) // Nil on all versions of iOS
    print(try? JSONDecoder().decode(JSON1.self, from: data)) // iOS 15.7: Fails to decode | iOS 17.6.1: Fails to decode
    print(try? JSONDecoder().decode(JSON2.self, from: data)) // iOS 15.7: Fails to decode | iOS 17.6.1: Succeeds in decoding
}

I suspect the data is invalid. That's why String(data: data, encoding: .utf8) fails on both phones. I saved the data to a file and ran the file command to show its encoding; it fails to detect one.

% file -bI test.dat
application/json; charset=unknown-8bit

% cat test.dat
{"device_type":"O???","wifi_ssid":"test"}

So, how do you generate the data?

I didn't expect JSONDecoder().decode(JSON2.self, from: data) could succeed. But now that I know it can, I believe it must be because JSONDecoder().decode() uses internal APIs rather than String(data:encoding:). (Note that the cat command above can show most of the file except the invalid characters.)

1 Like

If you look at the code linked above, it takes the byte array and checks the first 4 bytes to detect the encoding. If it is UTF-8, it just keeps using that byte array, and then it uses JSONScanner or JSON5Scanner to find the bytes of interest to decode. 0x96 after 0x4F is an invalid byte in UTF-8, so when the scanner tries to parse that into a String, it fails. Presumably the old way was to decode everything into an NSDictionary first, making the decoding fail if any part of the JSON was invalid, even the parts we don't care about.

Here is a table of what's valid in UTF-8: UTF-8 - Wikipedia
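You can confirm that exact failure in isolation with just the four problematic bytes:

```swift
import Foundation

// 0x4F is ASCII "O"; 0x96 is a continuation byte (10xxxxxx) with no
// lead byte before it, so the sequence is not valid UTF-8 (and 0xC0
// is never a valid UTF-8 byte at all).
let problematic: [UInt8] = [0x4F, 0x96, 0xB3, 0xC0]
let asUTF8 = String(bytes: problematic, encoding: .utf8)
print(asUTF8 as Any) // nil
```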

This is completely RFC compliant according to the RFC link above:

A JSON parser transforms a JSON text into another representation. A
JSON parser MUST accept all texts that conform to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions.

2 Likes

I suspect you are getting latin1 or windows1250 or similarly encoded JSON. You can decode it as that encoding and then get UTF-8 bytes from the result to feed to JSONDecoder. But optimally you would just tell the devs providing the API to get their shit together and send UTF-8 JSON.

1 Like