String(validatingUTF8:) doesn't


(Drew Crawford) #1

I have just now for the second time root-caused yet another "nasty nasty UB bug" from yet another developer who got cut on the very sharp edge of this API:

    var result = [40,50,60] as [Int8]
    return String(validatingUTF8: result)

This poorly-named String constructor does not take a Swift array of UTF8 bytes, it takes an UnsafePointer to a C string. When that C string is not null-terminated (as shown here), UB ensues.

I believe *at least* we need a sane name for this constructor like String(validatingUTF8CString:) that vaguely suggests what the programmer can do to avoid UB.

I further believe that this API is just plain bad, but swift-dev disagrees and so in the interests of doing something to stop the bleeding I propose we rename.

Drew


(Vladimir) #2

How at all Swift allows such an implicit conversion from one type to another, at the same time when we must explicitly convert let say Int8 to Int16 ???

I.e. this is not allowed

var i8 : Int8 = 10
var i16 : Int16 = i8

But [Int8] to UnsafePointer<CChar> - no problems.

It is very weird behavior in this case.

Anyone can explain why String(validatingUTF8:) should silently convert [Int8] to UnsafePointer<CChar> ??

···

On 21.04.2016 12:01, Drew Crawford via swift-evolution wrote:

I have just now for the second time root-caused yet another "nasty nasty UB
bug" from yet another developer who got cut on the very sharp edge of this API:

    var result = [40,50,60] as [Int8]
    return String(validatingUTF8: result)

This poorly-named String constructor does not take a Swift array of UTF8
bytes, it takes an UnsafePointer to a C string. When that C string is not
null-terminated (as shown here), UB ensues.

I believe **at least** we need a sane name for this constructor like
String(validatingUTF8CString:) that vaguely suggests what the programmer
can do to avoid UB.

I further believe that this API is just plain bad, but swift-dev disagrees
and so in the interests of doing /something/ to stop the bleeding I propose
we rename.

Drew

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Drew Crawford) #3

CChar is typedefed to Int8 I believe. So the CChar=>Int8 implicit conversion is not surprising.

The conversion from an [Int8] to an UnsafePointer<Int8> is a bit of a sharper edge, although there are cases where it makes sense (e.g. calling to C). Importing C APIs with some kind of decl to opt into array-pointer conversion may make more sense than what we do now.

But independently of the language-level design problem, this is a bad API. Its name does not describe what it does, It causes UB even though the word "unsafe" does not appear anywhere as is our convention, and quite frankly I don't understand why it is so important to have an API to work with CStrings when we could have an API to work with arrays in the standard library instead.

I thought there might be a performance reason, but having read the sourcecode it looks O(N), so I am at a loss as to what problem this solves that an array-based API does not, that justifies the sharp edge of memory-unsafety.

Drew

···

On Apr 21, 2016, at 5:23 AM, Vladimir.S <svabox@gmail.com> wrote:

How at all Swift allows such an implicit conversion from one type to another, at the same time when we must explicitly convert let say Int8 to Int16 ???

I.e. this is not allowed

var i8 : Int8 = 10
var i16 : Int16 = i8

But [Int8] to UnsafePointer<CChar> - no problems.

It is very weird behavior in this case.

Anyone can explain why String(validatingUTF8:) should silently convert [Int8] to UnsafePointer<CChar> ??

On 21.04.2016 12:01, Drew Crawford via swift-evolution wrote:

I have just now for the second time root-caused yet another "nasty nasty UB
bug" from yet another developer who got cut on the very sharp edge of this API:

   var result = [40,50,60] as [Int8]
   return String(validatingUTF8: result)

This poorly-named String constructor does not take a Swift array of UTF8
bytes, it takes an UnsafePointer to a C string. When that C string is not
null-terminated (as shown here), UB ensues.

I believe **at least** we need a sane name for this constructor like
String(validatingUTF8CString:) that vaguely suggests what the programmer
can do to avoid UB.

I further believe that this API is just plain bad, but swift-dev disagrees
and so in the interests of doing /something/ to stop the bleeding I propose
we rename.

Drew

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution