You've got two errors here: the first is using Int instead of UInt8, and the second is how you are passing the resulting array into CFStringCreateWithXXX.
C strings are zero-terminated (they end with a zero byte), so the byte array needs a trailing 0 if you want to switch from CFStringCreateWithBytes to CFStringCreateWithCString.
Once you have a correct UInt8 array, this will do for the above example:
print(charBytes)
let data = Data(charBytes)
precondition(data == Data([0xA2, 0xD0, 0])) // TODO: remove afterwards
let bytes = (data as NSData).bytes // TODO: remove NSData dependency
let string = CFStringCreateWithCString(nil, bytes, CFStringEncoding(CFStringEncodings.big5.rawValue))
As for the NSData conversion (as in my example): that's a quick and dirty way of getting the bytes out of Data. The modern way is to use withUnsafeBytes on Data.
Edit: I wouldn't recommend using an intermediate Array here; you can construct the Data directly without making an Array first.
import Foundation
let combinedCode = "A2D0" // Fullwidth Alphabet "B" in Big5.
let string = String(data: combinedCode.hexData!, encoding: .big5)
print(string)
extension String {
    var hexData: Data? {
        var firstDigit: UInt8?
        var data = Data()
        for char in self {
            guard let hex = char.hexDigitValue else { return nil } // not a hex string
            let digit = UInt8(hex)
            if let first = firstDigit {
                data.append(first * 0x10 + digit)
                firstDigit = nil
            } else {
                firstDigit = digit
            }
        }
        if firstDigit != nil { return nil } // odd hex string
        return data
    }
}
extension String.Encoding {
    // "let" rather than "var": the encoding is a constant and should not be mutable global state
    static let big5: String.Encoding = {
        let cfEncoding = CFStringEncodings.big5
        let nsEncoding = CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(cfEncoding.rawValue))
        let stringEncoding = String.Encoding(rawValue: nsEncoding)
        return stringEncoding
    }()
}
Edit: The name "hexData" is not great: it can easily be confused with "make a Data holding the hex representation of a given string" (e.g. going from "ABC" to the hex string 414243 stored in Data), while here you are doing the opposite. Data(hexString: "A2D0") looks better.
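For illustration, here is a minimal sketch of what such an initializer could look like. The name Data(hexString:) comes from the suggestion above; the body is just my assumption of a reasonable implementation that mirrors the hexData logic, not an existing Foundation API.

```swift
import Foundation

extension Data {
    /// Hypothetical initializer: parses pairs of hex digits ("A2D0") into bytes.
    /// Returns nil for non-hex characters or an odd number of digits.
    init?(hexString: String) {
        var bytes = [UInt8]()
        var pendingHigh: UInt8? = nil // high nibble waiting for its partner
        for char in hexString {
            guard let value = char.hexDigitValue else { return nil } // not a hex digit
            let nibble = UInt8(value)
            if let high = pendingHigh {
                bytes.append((high << 4) | nibble)
                pendingHigh = nil
            } else {
                pendingHigh = nibble
            }
        }
        guard pendingHigh == nil else { return nil } // odd number of digits
        self.init(bytes)
    }
}
```

With that in place the call site would read String(data: Data(hexString: "A2D0")!, encoding: .big5), assuming the big5 extension shown earlier.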
I also made mine more universal (i.e. handling any codepage supported by CoreFoundation):
public extension String {
    func parsedAsHexLiteral(encoding: CFStringEncodings? = nil) -> String? {
        guard count % 2 == 0 else { return nil }
        guard range(of: "^[a-fA-F0-9]+$", options: .regularExpression) != nil else { return nil }
        let encodingRaw: UInt32 = {
            if let encoding = encoding {
                return UInt32(encoding.rawValue)
            } else {
                return CFStringBuiltInEncodings.UTF8.rawValue
            }
        }()
        let digits: [Int] = compactMap(\.hexDigitValue)
        guard digits.count == count else { return nil } // every character must be a hex digit
        var charBytes = [UInt8]()
        // Combine the digits pairwise: high nibble first, then low nibble.
        // (Using "buffer == 0" as the "no pending digit" marker breaks on a leading 0, e.g. "0A".)
        for i in stride(from: 0, to: digits.count, by: 2) {
            charBytes.append(UInt8(digits[i] * 16 + digits[i + 1]))
        }
        charBytes.append(0) // CFStringCreateWithCString expects a zero-terminated C string
        let data = Data(charBytes)
        let dataBytes = data.withUnsafeBytes {
            [Int8](UnsafeBufferPointer(start: $0, count: data.count))
        }
        let string = CFStringCreateWithCString(nil, dataBytes, CFStringEncoding(encodingRaw))
        if let string = string {
            return string as String
        }
        return nil
    }
}
Update: I managed to remove the NSData dependency. However, it looks like my usage of withUnsafeBytes needs updating, and at the moment I can't figure out how to do it.
Good call. NSData's bytes is quite problematic in Swift:
The pointer returned by NSData's bytes is only valid until the NSData object is deallocated, and current Swift is quite liberal about when that can happen. For example, the object can be released right after the last use of the "data" variable ("data.bytes" in this example), which could immediately leave the bytes memory invalid or reused for something else, unless of course you keep a reference to the NSData object alive long enough:
let string = CFStringCreateWithCString(nil, data.bytes, CFStringEncoding(CFStringEncodings.big5.rawValue))
// use NSData object somehow to make sure it is valid till this point
// make sure this "usage" is not optimised away (release builds, etc)
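One way to express that "keep it alive until here" requirement is the standard library's withExtendedLifetime. The sketch below is only an illustration: the sample bytes are made up, and String(cString:) with UTF-8 stands in for the CFStringCreateWithCString call to keep the example small. The same wrapper would go around the CF call.

```swift
import Foundation

// Hypothetical sample: zero-terminated UTF-8 bytes for "Hi", wrapped in NSData.
let nsData = NSData(data: Data([0x48, 0x69, 0x00]))

// withExtendedLifetime guarantees nsData is not released before the closure
// returns, so the pointer obtained from nsData.bytes stays valid inside it.
let text: String = withExtendedLifetime(nsData) {
    String(cString: nsData.bytes.assumingMemoryBound(to: CChar.self))
}
print(text)
```

The pointer must not escape the closure; anything derived from it (here the String) is copied out before the lifetime guarantee ends.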
Or just switch to a safer API like the one below (or to the even safer String(data:encoding:), as suggested before).
Indeed, like so:
let cfString = data.withUnsafeBytes { p in
    CFStringCreateWithBytes(nil, p.baseAddress!, data.count, CFStringEncoding(encodingRaw), false)
}
(or a similar CFStringCreateWithCString usage, in which case the data has to be zero-terminated, as was discussed previously).
Where are you getting that hex string from? In other words, why do you use this form to begin with? It's quite unusual to store text in such a form. Is this due to ASCII compatibility?
Zonble: I know some traditional Bopomofo users who have been familiar with the Big5 code input method (內碼輸入法) since the DOS era. They use traditional Bopomofo to input Hanzi but Big5 codes for punctuation and symbols. Typing Big5 codes has become muscle memory for them. The feature is for such users.
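For completeness, a rough sketch of that zero-terminated variant. It assumes an Apple platform where the CFString functions are available through Foundation; the sample bytes are the Big5 code from earlier in the thread plus a terminator, and the guard replaces the force unwrap so that an empty Data returns nil instead of crashing.

```swift
import Foundation

// Sample input (assumption): Big5 bytes A2 D0 followed by a zero terminator.
let data = Data([0xA2, 0xD0, 0x00])
let encoding = CFStringEncoding(CFStringEncodings.big5.rawValue)

let cfString: CFString? = data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> CFString? in
    // An empty buffer has no base address, so bail out rather than force-unwrap.
    guard let base = raw.bindMemory(to: CChar.self).baseAddress else { return nil }
    // Reads up to the first zero byte, so the terminator must be inside the buffer.
    return CFStringCreateWithCString(nil, base, encoding)
}
let text: String? = cfString.map { $0 as String }
```

Compared with CFStringCreateWithBytes, this variant carries no explicit length, which is why the missing terminator is the thing to watch for.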