Can encoding String to Data with utf8 fail?

nick.keets · April 1, 2019, 11:43am

Can this string.data(using: .utf8)! ever fail? Is it safe to use it for any user input?

kirilltitov · April 1, 2019, 12:11pm

Theoretically, yes. In reality, I've never encountered it yet :)

However, if you just want the bytes of a string, you can always do this: let myBytes = [UInt8](myString.utf8) (and then Data(myBytes) if you really need it as Data).
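A minimal sketch of that suggestion, in case it helps to see it end to end. The sample string and the variable names are just placeholders for illustration; the expected bytes assume the string contains the precomposed "é" (U+00E9), which is why it is written as an escape here.

```swift
import Foundation

// Sample input; "\u{E9}" is the precomposed "é", which takes two bytes in UTF-8.
let myString = "h\u{E9}llo"

// The UTF8View is a non-optional collection of UInt8, so nothing here can return nil.
let myBytes = [UInt8](myString.utf8)

// Wrap the bytes in Data only if an API actually requires Data.
let myData = Data(myBytes)

print(myBytes)       // [104, 195, 169, 108, 108, 111]
print(myData.count)  // 6
```

Data(myString.utf8) should give the same result in one step, without the intermediate array.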

johannesweiss · April 1, 2019, 1:33pm

It can fail, for example with this broken NSString:

import Foundation

let x = "💇‍♀️" as NSString
// This creates a broken string: the NSRange uses NSString's UTF-16 offsets and cuts through the middle of a surrogate pair.
let bad = x.substring(with: NSRange(location: 0, length: 1)).data(using: .utf8)
print(bad.debugDescription) // will print 'nil'

If you want something that can't fail, use String.utf8, which yields a UTF8View that substitutes Unicode's replacement character for any ill-formed part of the string. If you want a Data containing the UTF-8 bytes of a String, you could use

import Foundation

let x = "💇‍♀️" as NSString
// This creates a broken string: the NSRange uses NSString's UTF-16 offsets and cuts through the middle of a surrogate pair.
let badString = x.substring(with: NSRange(location: 0, length: 1))
print(Data(badString.utf8)) // will print '3 bytes'

which can't fail. The 3 bytes you'll find in there are the UTF-8 encoding of that replacement character, U+FFFD (EF BF BD).
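To see where those 3 bytes come from, here is a small sketch that encodes the replacement character on its own. It does not depend on the broken NSString above, so it should behave the same on any platform; the variable names are only illustrative.

```swift
import Foundation

// U+FFFD REPLACEMENT CHARACTER, written as an escape so the source stays unambiguous.
let replacement = "\u{FFFD}"

// Its UTF-8 encoding is the three bytes EF BF BD.
let replacementBytes = [UInt8](replacement.utf8)
let hexBytes = replacementBytes.map { String($0, radix: 16, uppercase: true) }

print(hexBytes)                       // ["EF", "BF", "BD"]
print(Data(replacement.utf8).count)   // 3
```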

YOCKOW · October 22, 2020, 4:20am

(I know this topic is old, but I think it should be updated.)

As of Swift 5 (5.1), string.data(using: .utf8) is exactly equivalent to Data(string.utf8), since swift#24215, swift#24239, and swift-corelibs-foundation#2173 were merged.

Therefore, bad.debugDescription in @johannesweiss's first example above is no longer nil in recent Swift.
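A rough sketch for anyone who wants to check the equivalence themselves, or who would rather not deal with the optional at all. Note that utf8Data(from:) is a hypothetical helper name made up for this example, not a Foundation API, and the comparison below uses a well-formed string, for which both routes should agree regardless of Swift version.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): always returns Data, never nil.
func utf8Data(from string: String) -> Data {
    return Data(string.utf8)
}

// Well-formed sample: "caf", U+00E9 (2 bytes), a space, and U+1F600 (4 bytes).
let sample = "caf\u{E9} \u{1F600}"

let viaFoundation = sample.data(using: .utf8)  // Data? (optional)
let viaUTF8View = utf8Data(from: sample)       // Data  (non-optional)

print(viaFoundation == viaUTF8View)  // true
print(viaUTF8View.count)             // 10
```

Going through the UTF8View keeps the force unwrap out of calling code, which is the main practical reason to prefer it even now that the two produce the same bytes.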