JSON decode of Unicode escape sequence: cannot handle more than 4 hex digits?

I want to use Unicode escape sequences for emoji characters in my JSON text, but it appears JSONDecoder cannot handle an escape sequence of more than 4 hex digits. All the examples I see online are limited to 4 hex digits or fewer, yet an emoji can have a scalar value of five hex digits.

Is JSONDecoder at fault, or does the JSON Unicode escape simply not allow more than 4 hex digits?

import Foundation

let anEmoji: Character = "\u{1F468}\u{1F3FB}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F468}\u{1F3FB}"
print(anEmoji)  // prints: 👨🏻‍❤️‍👨🏻


struct Emoji: Decodable {
    let character: String
}

let json = """
    { "character": "\\u1F468\\u1F3FB\\u200D\\u2764\\uFE0F\\u200D\\u1F468\\u1F3FB"}
    """

let decoder = JSONDecoder()
if let data = json.data(using: .utf8), let result = try? decoder.decode(Emoji.self, from: data) {
    print(result)   // expect: 👨🏻‍❤️‍👨🏻, actual: Emoji(character: "὆8ἿB‍❤️‍὆8ἿB")
} else {
    print("No dice!")
}

According to json.org at least, it has to be exactly 4 hex chars.

Some emoji have 5-hex-digit scalars, so a single JSON Unicode escape sequence cannot express those emoji :frowning:

Hm, looks like splitting the character into UTF-16 code units and escaping those works, but I'm not sure whether that works everywhere or only in Foundation:

import Foundation

let anEmoji: Character = "\u{1F468}\u{1F3FB}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F468}\u{1F3FB}"
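let str = "\"" + anEmoji.utf16.map { (unit: UInt16) -> String in
    let hex = String(unit, radix: 16)
    // JSON requires exactly 4 hex digits, so zero-pad short values such as e9.
    return "\\u" + String(repeating: "0", count: 4 - hex.count) + hex
}.joined() + "\""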

let decoder = JSONDecoder()
if let data = str.data(using: .utf8), let result = try? decoder.decode(String.self, from: data) {
    print(result)   // works
} else {
    print("No dice!")
}

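This is part of the JSON specification: string escape sequences always use UTF-16 code units, so a character outside the Basic Multilingual Plane is written as a surrogate pair of two \uXXXX escapes.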

Aha, so a JSON Unicode escape sequence is "\u" followed by exactly 4 zero-padded hex digits, one escape per UTF-16 code unit.
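To make that concrete, here is a minimal sketch of how one scalar above U+FFFF turns into two escapes. The helper name `jsonEscapes(for:)` is made up for illustration and is not part of Foundation; the JSON is wrapped in an array only so the decoder is handed a container rather than a bare string.

```swift
import Foundation

// Hypothetical helper: returns the JSON \uXXXX escapes for one Unicode scalar.
func jsonEscapes(for scalar: Unicode.Scalar) -> String {
    // The UTF-16 view has one code unit for BMP scalars and two
    // (a surrogate pair) for everything above U+FFFF.
    return String(scalar).utf16.map { (unit: UInt16) -> String in
        let hex = String(unit, radix: 16, uppercase: true)
        // Zero-pad so every escape has exactly 4 hex digits.
        return "\\u" + String(repeating: "0", count: 4 - hex.count) + hex
    }.joined()
}

let man: Unicode.Scalar = "\u{1F468}"
let escaped = jsonEscapes(for: man)          // \uD83D\uDC68
let jsonText = "[\"" + escaped + "\"]"
let decoded = try? JSONDecoder().decode([String].self, from: Data(jsonText.utf8))
print(escaped, decoded?.first ?? "decode failed")
```

U+1F468 minus 0x10000 is 0xF468; its top 10 bits give the high surrogate 0xD83D and its low 10 bits give the low surrogate 0xDC68.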

Can't you use this?

let json = """
    { "character": "\u{1F468}\u{1F3FB}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F468}\u{1F3FB}"}
"""

or even this:

let json = "{ \"character\": \"👨🏻‍❤️‍👨🏻\" }"

It’s not in Swift. The JSON text is saved to a text file, to be used anywhere as JSON.

The reason I want to use JSON Unicode escape sequences instead of actual emoji characters in the file is so the text is plain ASCII.

Plain ascii, I see. Then if there's other user content in that JSON, like "René François", or, perhaps "“quotes”", you'll have to convert those to ASCII accordingly: "Ren\u00e9 Fran\u00e7ois", "\u201Cquotes\u201D"
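If the whole file has to be ASCII, one way to get there is to escape every non-ASCII UTF-16 code unit in the finished JSON text. This is only a sketch: `asciiOnlyJSON` is a hypothetical helper, and it assumes the input is already valid JSON, where non-ASCII characters can only occur inside string literals.

```swift
// Hypothetical helper: rewrites every non-ASCII UTF-16 code unit as \uXXXX.
// Assumes `jsonText` is already valid JSON, so anything non-ASCII
// sits inside a string literal where an escape is allowed.
func asciiOnlyJSON(_ jsonText: String) -> String {
    var out = ""
    for unit in jsonText.utf16 {
        if unit < 0x80 {
            // ASCII passes through unchanged.
            out.append(Character(Unicode.Scalar(UInt8(unit))))
        } else {
            let hex = String(unit, radix: 16, uppercase: true)
            // Zero-pad to exactly 4 hex digits.
            out += "\\u" + String(repeating: "0", count: 4 - hex.count) + hex
        }
    }
    return out
}

// \u{E9} is "é"; \u{1F468}\u{1F3FB} is the man emoji with a skin-tone modifier.
let source = "{\"name\": \"Ren\u{E9}\", \"emoji\": \"\u{1F468}\u{1F3FB}\"}"
print(asciiOnlyJSON(source))
```

Working on UTF-16 code units rather than scalars means surrogate pairs fall out automatically, with no special case for emoji.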

It’s only emoji and the metadata of each emoji, parsed from the Unicode website.

e.g. "flag: Côte d’Ivoire" :slight_smile:

Neither Xcode nor Vim was able to display the JSON file with all those emoji characters all over. It’s totally grey. So I went with escape sequences.

A few characters outside ASCII may be OK? I just want to see what I generated while I’m working things out.

VSCode is fine.

If it works for you - sure, go for it.

BTW, the mentioned 👨🏻‍❤️‍👨🏻 is shown correctly in Xcode for me (although not in an old BBEdit version).

Xcode is fine displaying emoji or any Unicode characters, except that it’s not able to display one giant string of emoji JSON text. I think the very long line is the problem. Same with Vim.

I link my JSON file in Xcode, click on it, and it’s just a shade of grey. Then I have to use VSCode.

Strange. json files from here open fine for me in Xcode. Maybe your case is more complicated for Xcode to handle.

I looked at those JSON files; they are broken up line by line. Mine has no line breaks. It’s just one huge line.

I just pretty-printed it so it’s not one giant line.
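For anyone who wants to do the same from Swift, a rough sketch using Foundation's JSONSerialization follows. `prettyPrinted(_:)` is a hypothetical helper and the sample data is invented. Note that re-serializing may write the characters themselves rather than \u escapes, so an ASCII-only file would need escaping again afterwards.

```swift
import Foundation

// Hypothetical helper: re-serializes JSON text with line breaks and indentation.
func prettyPrinted(_ jsonText: String) throws -> String {
    let object = try JSONSerialization.jsonObject(with: Data(jsonText.utf8))
    let data = try JSONSerialization.data(withJSONObject: object,
                                          options: [.prettyPrinted, .sortedKeys])
    return String(decoding: data, as: UTF8.self)
}

// Invented sample: one long line, as in the generated file.
let oneLine = "[{\"character\":\"\\uD83D\\uDC68\",\"name\":\"man\"},{\"character\":\"\\u2764\",\"name\":\"heart\"}]"
if let pretty = try? prettyPrinted(oneLine) {
    print(pretty)
}
```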

I see what you are talking about; it looks like an Xcode bug. I removed the line breaks, and indeed I can see some symbols rendered incorrectly. Then I inserted a single line break before the sequence of incorrect symbols, and they started rendering correctly.

Perhaps you can split your giant line into fragments of some reasonable length.

Hey, surprise! iPad Playgrounds opens my JSON file just fine. I cannot edit the file, maybe because it’s a resource, but text selection and scrolling are smooth and fast.

Maybe it’s built with new infrastructure. I hope iPad playground will grow into the new Xcode with no old baggage.