Invalid pointer to the result of `cString(using:)`

woolsweater · January 6, 2019, 1:55am

I recently fixed a bug in some code handing off a string to a C library function. The C function takes a plain char * (no const in either position): void use_str(char * s); so we need an UnsafeMutablePointer<CChar> on the Swift side. Our Swift code was basically this:

    import Foundation

    let s = "abcde"
    let sPtr = UnsafeMutablePointer(mutating: s.cString(using: .utf8))
    use_str(sPtr)

If you run this and inspect sPtr and sPtr.pointee before the use_str call, you can see that the pointer is bad: it does not point to the contents of the string.* Until Xcode 10/Swift 4.2, it had consistently pointed to zeroed-out memory, and the original implementer "worked around" what they thought was a bug in the C library. When we upgraded, the pointed-to values became garbage, and the code broke.

Storing the C string in a local variable first, var sChars = s.cString(using: .utf8)!, fixes the issue. (That done, we can apparently also skip the explicit call to UnsafeMutablePointer(mutating:) and just write use_str(&sChars).)
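A minimal, self-contained sketch of that fix, for reference. The real C library isn't available here, so use_str below is a hypothetical Swift stand-in with the same shape as the imported void use_str(char * s); it upper-cases the first byte in place so the write is observable.

```swift
import Foundation

// Hypothetical Swift stand-in for the C function `void use_str(char * s)`,
// which Swift imports as taking an UnsafeMutablePointer<CChar>.
// It upper-cases the first byte if it is an ASCII lowercase letter.
func use_str(_ s: UnsafeMutablePointer<CChar>) {
    if s[0] >= 97 && s[0] <= 122 {  // 'a'...'z'
        s[0] -= 32                  // to 'A'...'Z'
    }
}

let s = "abcde"

// Store the NUL-terminated [CChar] in a variable so it is still alive
// while use_str runs.
var sChars = s.cString(using: .utf8)!

// &sChars yields a mutable pointer that is valid for the duration of the call.
use_str(&sChars)

let result = sChars.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }
print(result)  // Abcde
```

The key point is that the array is a named variable, so its storage exists before, during, and after the call that receives the pointer.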

So it looks like the return value of cString(using:) is invalid immediately unless explicitly copied. Shouldn't it live as long as the String that produced it? If I understand correctly what's happening, this ObjC is equivalent to the original code:

    NSString * s = @"abcde";
    use_str([s cStringUsingEncoding:NSUTF8StringEncoding]);

which is perfectly valid, as far as I know (use_str may need to copy the bytes, of course, but that's a separate issue).

Alternatively, can/should there be a warning about UnsafeMutablePointer(mutating:) pointing directly to the result of a method call like this? Or perhaps our original Swift code is closer to this, which takes the address of a message-send expression and is illegal:

    void update_char(char * c);

    // ...

    update_char(&[s characterAtIndex:0]);

Please help me correct errors in my understanding; I'm trying to better grasp the situation/how Swift pointers operate. We should have caught this bug in our code, but I'm not sure why it occurred in the first place.

*In fact if you add another string/pointer pair, in Swift 4.2 the two pointers consistently hold the same address!

ASwiftUser · January 6, 2019, 2:22am

I am not sure whether string.cString(using: .utf8) returns a buffer such that ptr[string.utf8.count] == 0. Here is what I do instead:

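    // Copies `count` chars from `ptr` into a new heap buffer and NUL-terminates it.
    // The caller owns the returned buffer and must call deallocate() on it.
    func editPtr(ptr: UnsafePointer<CChar>, count: Int) -> UnsafeMutablePointer<CChar> {
      let r = UnsafeMutablePointer<CChar>.allocate(capacity: count + 1)
      r.initialize(from: ptr, count: count)  // copy the string bytes
      (r + count).initialize(to: 0)          // append the NUL terminator
      return r
    }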

This may be because they did not null-terminate the C string, though I don't know why they wouldn't.

woolsweater · January 6, 2019, 2:43am

It does. This is documented, and it wouldn't be a C string if it didn't. But nul-termination is not the issue; the issue is at the "other end".
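A quick sketch that checks the documented termination for the .utf8 case: the returned [CChar] holds the string's UTF-8 bytes followed by exactly one NUL, so indexing it at string.utf8.count lands on the terminator.

```swift
import Foundation

let string = "abcde"

// cString(using:) returns [CChar]?; UTF-8 can encode any String,
// so force-unwrapping is safe here.
let chars = string.cString(using: .utf8)!

// The array holds the UTF-8 bytes followed by one NUL terminator.
print(chars.count == string.utf8.count + 1)  // true
print(chars[string.utf8.count] == 0)         // true
```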

Martin · January 6, 2019, 3:16am

The Swift cString(using:) is implemented in NSStringAPI.swift; it returns an array, which is created in _persistCString(). So my guess would be that in

    let sPtr = UnsafeMutablePointer(mutating: s.cString(using: .utf8))

that array is valid only for the duration of the UnsafeMutablePointer(mutating:) call, and the pointer becomes invalid as soon as that call returns, similar to this example:

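    // Each temporary array is destroyed right after UnsafePointer(_:) returns,
    // so p1 and p2 dangle; reading pointee is undefined behavior.
    let p1 = UnsafePointer([1, 2, 3, 4].map { $0 + 1 })
    let p2 = UnsafePointer([5, 6, 7, 8].map { $0 + 1 })
    print(p1.pointee) // 24415
    print(p2.pointee) // 24415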
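For contrast, a minimal sketch of a well-defined version of that example: binding each array to a name and reading it through withUnsafeBufferPointer keeps the storage alive for as long as the pointer is in use. The pointer must not escape the closure.

```swift
// Bind the arrays to names so their storage outlives the pointer use.
let a1 = [1, 2, 3, 4].map { $0 + 1 }
let a2 = [5, 6, 7, 8].map { $0 + 1 }

// Each pointer is valid only inside its closure; only the value escapes.
let first1 = a1.withUnsafeBufferPointer { buf in buf.baseAddress!.pointee }
let first2 = a2.withUnsafeBufferPointer { buf in buf.baseAddress!.pointee }

print(first1)  // 2
print(first2)  // 6
```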

An alternative way to call the C function would be:

    s.withCString { sPtr in
        use_str(UnsafeMutablePointer(mutating: sPtr))
    }

Another alternative would be to define a wrapper in the bridging header file:

    static void use_str_const(const char *s) { use_str((char *)s); }

and then simply call use_str_const(s) in Swift.
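Both of these alternatives hand the C function a pointer to storage that Swift treats as read-only, so they are only safe as long as use_str never actually writes through its char *. If it might, one option is to copy the bytes into a mutable buffer first. A sketch, again with a hypothetical Swift stand-in for use_str: utf8CString is a standard-library String property that returns a NUL-terminated ContiguousArray<CChar>, so Foundation isn't needed.

```swift
// Hypothetical stand-in for `void use_str(char * s)` that writes into
// the buffer: it replaces the first byte with 'X' (ASCII 88).
func use_str(_ s: UnsafeMutablePointer<CChar>) {
    s[0] = 88
}

let s = "abcde"

// A mutable, NUL-terminated copy of the string's UTF-8 bytes.
var buffer = s.utf8CString

// The pointer is valid, and writable, for the duration of the closure.
buffer.withUnsafeMutableBufferPointer { buf in
    use_str(buf.baseAddress!)
}

let result = buffer.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }
print(result)  // Xbcde
```

The copy costs one allocation, but the C side then owns a genuinely writable buffer for the whole call.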

woolsweater · January 6, 2019, 3:50am

Ah, thank you for that pointer to the source! That seems to be the explanation.

Xcode's documentation (carried over from the ObjC version) says:

The returned C string is guaranteed to be valid only until either the receiver is freed, or until the current memory is emptied, whichever occurs first.

where I assume "the current memory" refers to the stack frame (?).

So this seems to me like a bug in either the doc or the implementation.

Nevin · January 6, 2019, 4:21am

I believe it does not. Certainly the resulting C string will terminate in a null byte, but the length of that C string, in general, will not equal the count of the original string.
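To illustrate the distinction (a sketch; the string is an arbitrary example): count counts Characters, whereas the C string holds UTF-8 bytes plus the terminator, so the two diverge as soon as the string contains a non-ASCII character. Indexing by utf8.count (rather than count) still lands on the terminator.

```swift
import Foundation

// "\u{E9}" is a precomposed "é", which takes two bytes in UTF-8.
let string = "h\u{E9}llo"
let chars = string.cString(using: .utf8)!

print(string.count)       // 5: Characters
print(string.utf8.count)  // 6: UTF-8 code units (bytes)
print(chars.count)        // 7: UTF-8 bytes plus the NUL terminator
```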

ASwiftUser · January 6, 2019, 5:02pm

I am confused. Can you explain?