Reading invalid strings from ByteBuffer

George · May 6, 2021, 6:22pm

I noticed that the ByteBuffer APIs for reading strings are not failable. This means that I can create an invalid string like so:

let invalidUTF8 = context.channel.allocator.buffer(bytes: [0xc3, 0x28])
let invalidString = String(buffer: invalidUTF8)

Creating a String directly from the array of bytes, by contrast, returns nil: String(bytes: [0xc3, 0x28], encoding: .utf8).

Am I correct in understanding that invalid strings created from byte buffers are unsafe, and that any untrusted input should use something like invalidUTF8.withUnsafeReadableBytes { String(bytes: $0, encoding: .utf8) }?

lukasa · May 6, 2021, 6:51pm

String(bytes:encoding:) is actually a Foundation method, derived from NSString and overlaid onto String. As NIO doesn't have access to Foundation, it doesn't use that method (though for performance reasons it wouldn't anyway).

Instead, NIO uses String(decoding:as:). That initializer is not failable: it replaces any invalid UTF-8 bytes with the Unicode replacement character. Using it on your bytes produces "\u{fffd}(".
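For illustration, here is a minimal sketch of that repairing behaviour using only the standard library and the two bytes from the original question. The variable names are made up for this example.

```swift
// The two bytes from the question: 0xC3 starts a two-byte sequence,
// but 0x28 ("(") is not a valid continuation byte.
let bytes: [UInt8] = [0xc3, 0x28]

// String(decoding:as:) never fails; it repairs invalid input instead.
let repaired = String(decoding: bytes, as: UTF8.self)

print(repaired == "\u{fffd}(")   // true
print(Array(repaired.utf8))      // [239, 191, 189, 40]
```

The resulting String is itself well-formed: the stray 0xC3 has been replaced by the three-byte encoding of U+FFFD, and the "(" is kept.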

If you explicitly want the behaviour of rejecting invalid UTF-8, then yes, you'd need a custom version. I recommend using readBytes instead of withUnsafeReadableBytes (no need to bring pointers into this), but you can use whatever you like.
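One possible shape for such a custom version, as a sketch using only the standard library: decode with String(decoding:as:) and reject the result if any byte had to be repaired. The helper name strictUTF8String is hypothetical and not a NIO API; with NIO, the input bytes could come from buffer.readBytes(length:).

```swift
/// Returns a String only if `bytes` is well-formed UTF-8, otherwise nil.
/// Hypothetical helper for illustration; not part of NIO.
func strictUTF8String(_ bytes: [UInt8]) -> String? {
    let decoded = String(decoding: bytes, as: UTF8.self)
    // Any repair swaps the offending bytes for U+FFFD (EF BF BD), so the
    // re-encoded UTF-8 matches the input only when nothing was repaired.
    return decoded.utf8.elementsEqual(bytes) ? decoded : nil
}

print(strictUTF8String([0xc3, 0x28]) == nil)   // true
print(strictUTF8String([0xc3, 0xa9]) == "é")   // true
```

This makes two passes over the data (decode, then compare), which keeps the sketch short; a single-pass validator is also possible if the extra pass matters.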

johannesweiss · May 7, 2021, 7:54am

What Cory says, with the addition that we have readString(length:encoding:) -> String?, available with import NIOFoundationCompat. That said, I wouldn't recommend going through Foundation's constructor, as (on Darwin) it'll give you a String bridged from an NSString, which will then be slower in Swift because NSString stores its bytes as UTF-16 and Swift stores its bytes as UTF-8.

You could in theory do:

buffer.readString(length: foo, encoding: .utf8).map { string in
    var string = string
    string.makeContiguousUTF8()
    return string
}

and then you'd pay the conversion overhead only once and not on every operation.