Converting byte codes in strings

CTMacUser · July 28, 2024, 5:56am

I'm adapting some C code. The only failures come from converting string literals that are written directly as byte codes:

| Original C code | My Swift code |
|---------------|---------|
| "\xff\x00\x00\x01" | "\u{FF}\u{00}\u{00}\u{01}" |
| "\x01\x00\x00\xff" | "\u{01}\u{00}\u{00}\u{FF}" |
| "\xff\x00\x00\x02" | "\u{FF}\u{00}\u{00}\u{02}" |
| "\x02\x00\x00\xff" | "\u{02}\u{00}\u{00}\u{FF}" |
| "\xff\x00\x00\x03" | "\u{FF}\u{00}\u{00}\u{03}" |
| "\x03\x00\x00\xff" | "\u{03}\u{00}\u{00}\u{FF}" |
| "\xff\x00\x00\x04" | "\u{FF}\u{00}\u{00}\u{04}" |
| "\x04\x00\x00\xff" | "\u{04}\u{00}\u{00}\u{FF}" |
| "\x40\x51\x4e\x44" | "\u{40}\u{51}\u{4E}\u{44}" |
| "\x44\x4e\x51\x40" | "\u{44}\u{4E}\u{51}\u{40}" |
| "\x40\x51\x4e\x4a" | "\u{40}\u{51}\u{4E}\u{4A}" |
| "\x4a\x4e\x51\x40" | "\u{4A}\u{4E}\u{51}\u{40}" |
| "\x40\x51\x4e\x54" | "\u{40}\u{51}\u{4E}\u{54}" |
| "\x54\x4e\x51\x40" | "\u{54}\u{4E}\u{51}\u{40}" |
| "\x54\xc5" | "\u{54}\u{C5}" |
| "\xc5\x54" | "\u{C5}\u{54}" |
| "\x5a\xa9" | "\u{5A}\u{A9}" |
| "\xa9\x5a" | "\u{A9}\u{5A}" |
| "\x05\xf9\x9d\x03\x4c\x81" | "\u{05}\u{F9}\u{9D}\u{03}\u{4C}\u{81}" |
| "\xfe\xdc\xba\x98\x76\x54\x32\x10" | "\u{FE}\u{DC}\u{BA}\u{98}\u{76}\u{54}\u{32}\u{10}" |
| "\xef\xcd\xab\x89\x67\x45\x23\x01" | "\u{EF}\u{CD}\u{AB}\u{89}\u{67}\u{45}\u{23}\u{01}" |
| "\x01\x23\x45\x67\x89\xab\xcd\xef" | "\u{01}\u{23}\u{45}\u{67}\u{89}\u{AB}\u{CD}\u{EF}" |
| "\x10\x32\x54\x76\x98\xba\xdc\xfe" | "\u{10}\u{32}\u{54}\u{76}\u{98}\u{BA}\u{DC}\u{FE}" |

As I'm typing this, I'm wondering whether the problem is UTF-8 encoding. Byte values in the upper Latin-1 range (0x80 to 0xFF) are now used as parts of UTF-8 multi-byte sequences, so each of them expands from one byte to two, and the results won't match.
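The hypothesis can be checked directly: the `utf8` view of a Swift `String` shows the bytes the string actually encodes to. A minimal sketch using two literals from the table (the constant names are only for illustration):

```swift
// Each "\u{..}" escape is a Unicode scalar, not a raw byte.
let first = "\u{FF}\u{00}\u{00}\u{01}"
let ascii = "\u{40}\u{51}\u{4E}\u{44}"

// The UTF-8 encoding of each string, as an array of bytes.
let firstBytes = Array(first.utf8)
let asciiBytes = Array(ascii.utf8)

// Scalars below U+0080 stay one byte; U+00FF becomes the two bytes C3 BF.
print(firstBytes.map { String($0, radix: 16) }) // ["c3", "bf", "0", "0", "1"]
print(asciiBytes.count) // 4
```

So four scalars in the first literal turn into five bytes, while a literal made only of scalars below U+0080 keeps one byte per escape.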

Anyway, what's the right way to do this?

michelf · July 28, 2024, 12:58pm

"\xff" in C means one byte with the hexadecimal FF value. "\u{FF}" in Swift means one Unicode code point FF (the character "ÿ"), which when encoded as UTF-8 gives you two bytes: C3 BF.
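If the existing `\u{..}` literals should be kept, one possible workaround (assuming every scalar stays at or below U+00FF, as all the table entries do) is to read each Unicode scalar as a Latin-1 code point, since U+0000 through U+00FF map one-to-one onto byte values 0x00 through 0xFF. The helper `latin1Bytes` below is hypothetical, a sketch rather than a standard API:

```swift
/// Hypothetical helper: maps each Unicode scalar in `s` to one byte,
/// Latin-1 style. Returns nil if any scalar is above U+00FF, because
/// such a scalar has no single-byte equivalent.
func latin1Bytes(_ s: String) -> [UInt8]? {
    var result: [UInt8] = []
    result.reserveCapacity(s.unicodeScalars.count)
    for scalar in s.unicodeScalars {
        guard scalar.value <= 0xFF else { return nil }
        result.append(UInt8(scalar.value))
    }
    return result
}

let bytes = latin1Bytes("\u{FF}\u{00}\u{00}\u{01}")
print(bytes!) // [255, 0, 0, 1]
print(latin1Bytes("\u{20AC}") == nil) // true: "€" is outside Latin-1
```

This walks the `unicodeScalars` view instead of `utf8`, so each escape yields exactly one byte. It still stores the data as text, though, so the byte-array approach is usually the cleaner choice for data that isn't really text.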

String in Swift can only hold valid Unicode characters. If you have bytes that are not Unicode, you can use the array type [UInt8] and express it as [0xFF, 0x00, 0x00, 0x01, …]. Byte data like this is also often wrapped in the Data type from Foundation.
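Following that suggestion, each C literal translates one-to-one into a `[UInt8]` array literal, with every `\xNN` escape becoming the element `0xNN`. A sketch covering a few rows of the table (the constant names are made up for illustration; the values come from the table):

```swift
// Each C "\xNN" escape becomes one UInt8 element 0xNN.
let marker1: [UInt8] = [0xFF, 0x00, 0x00, 0x01]   // "\xff\x00\x00\x01"
let tag: [UInt8] = [0x40, 0x51, 0x4E, 0x44]       // "\x40\x51\x4e\x44"
let pair: [UInt8] = [0x54, 0xC5]                  // "\x54\xc5"
let descending: [UInt8] = [0xFE, 0xDC, 0xBA, 0x98, 0x76, 0x54, 0x32, 0x10]

// Unlike a String, the array holds exactly the bytes written.
print(marker1.count, descending.count) // 4 8

// Byte-wise comparison works directly, e.g. against received data.
let received: [UInt8] = [0x54, 0xC5]
print(received == pair) // true
```

With Foundation imported, `Data(marker1)` wraps the same bytes. Note that a C string literal also carries an implicit trailing `\0`; whether the port needs that extra byte depends on how the original code used the literal (for example `sizeof` versus an explicit length).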

CTMacUser · July 28, 2024, 7:17pm

I realized that an hour after posting. It’s weird how writing a problem down well enough for a post sometimes brings new potential solutions.

phoneyDev · July 28, 2024, 10:30pm

It even has a name