[Pitch] Safe(r) string literal conversion to UnsafePointer<CChar>, UnsafeRawPointer

Hello, Swift Community!

Pitch

I'd like to pitch an improvement to the safety of conversions of string literals into UnsafePointer / UnsafeRawPointer types for interoperability with C APIs and C data types. Concretely, the proposal is to make UnsafePointer and UnsafeRawPointer conform to ExpressibleByStringLiteral, which can improve safety and also help with constant folding of string literals, for example when they are used to initialize global variables that have C types.

Motivation

See the following code sample:

// C
struct MyStruct1 {
const char *name;
};

// Swift
let x = MyStruct1(name: "abc") // <== (!) warning: cannot pass 'String' to parameter; argument 'name' must be a pointer that outlives the call to 'init(name:)'
print(String(cString: x.name))

The warning points out a serious problem: the string literal is converted to a temporary String object, which is then converted to a pointer via .utf8CString, and there's no guarantee that the final pointer points to the global constant string. In practice, under -Onone the code above does end up producing a dangling pointer. The same problem shows up with C APIs that stash constant pointers (where the C side expects them to point to global constants):
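Conceptually, the pointer that the implicit conversion hands to `init(name:)` is only valid for the duration of that call. A minimal sketch of the same lifetime rule in pure Swift, assuming a hypothetical stand-in `MyStruct1` struct in place of the imported C type, uses `String.withCString(_:)` and keeps every use of the pointer inside the closure:

```swift
// Hypothetical pure-Swift stand-in for the imported C struct `MyStruct1`.
struct MyStruct1 {
    var name: UnsafePointer<CChar>
}

// The pointer passed to the closure is only valid inside it, so the struct
// (and any read through `name`) must not escape the closure.
func nameRoundTrip(_ s: String) -> String {
    return s.withCString { ptr in
        let x = MyStruct1(name: ptr)
        return String(cString: x.name)  // copies the bytes into a new String
    }
}

print(nameRoundTrip("abc"))  // prints "abc"
```

Returning `x` itself from the closure would reintroduce exactly the dangling pointer described above.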

// C
#include <stdio.h>
const char *current_phase = 0;
void set_current_phase(const char *s) { current_phase = s; }
void print_current_phase(void) { printf("%s\n", current_phase); }

// Swift
set_current_phase("Downloading...") // NO WARNING (!)
print_current_phase()

In this case, the compiler produces no warning, because we implicitly assume (without any such guarantee) that the C API doesn't stash the pointer and that the pointer isn't used after the API returns. Under -Onone, the code does use a dangling pointer in practice (and doesn't print the correct string). Ironically, under -O both examples end up "working" because the optimizer "saves us" by removing the temporary String object.

While the general problem of safely converting a String into UnsafePointer is hard to solve (because of truly dynamically constructed Strings), the special case of passing a string literal directly to a C API or data structure could and should be handled correctly, intuitively and safely in Swift.

Proposal

The proposal is to:

  • add the ExpressibleByStringLiteral conformance (and all related conformances) to UnsafePointer and UnsafeRawPointer
  • have the implementation of this conformance directly convert the internal string literal pointer (the "start" Builtin.RawPointer in the _ExpressibleByBuiltinStringLiteral initializer) into UnsafePointer/UnsafeRawPointer
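User code cannot name `Builtin.RawPointer`, so the following is only an approximation of the mechanism above, not the pitched implementation. It assumes a hypothetical wrapper type `CStringLiteral` whose `StringLiteralType` is `StaticString`; for a multi-scalar literal, `StaticString.utf8Start` points at the literal's constant, NUL-terminated storage, so no temporary `String` or array is involved:

```swift
// Hypothetical wrapper approximating the pitched conformance. The pitch would
// put ExpressibleByStringLiteral on UnsafePointer itself and use the raw
// pointer from _ExpressibleByBuiltinStringLiteral directly.
struct CStringLiteral: ExpressibleByStringLiteral {
    let pointer: UnsafePointer<CChar>

    init(stringLiteral value: StaticString) {
        // A single-scalar literal such as "a" may arrive as a StaticString
        // holding a Unicode.Scalar instead of a pointer, so reject that case.
        precondition(value.hasPointerRepresentation,
                     "literal has no pointer representation")
        // Reinterpret the UInt8 code units as CChar without copying.
        pointer = UnsafeRawPointer(value.utf8Start)
            .assumingMemoryBound(to: CChar.self)
    }
}

// Hypothetical API whose parameter only accepts literals.
func describePhase(_ phase: CStringLiteral) -> String {
    return String(cString: phase.pointer)
}

print(describePhase("Downloading..."))  // prints "Downloading..."
```

Unlike a conformance on `UnsafePointer` itself, a wrapper does not change how existing calls such as `set_current_phase("Downloading...")` type-check; it only shows that a literal can reach a pointer without a temporary.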

This means that in both code samples above, the type checker will prefer the UnsafePointer/UnsafeRawPointer type, and because that code path performs no heap allocations or copies, it guarantees that the pointer passed to the C API or struct is the original constant string pointer.

Note that there will still be problematic cases (when not passing a literal directly) that this proposal doesn't affect:

var str = "string" // this is a String
let x = MyStruct1(name: str) // creates a dangling pointer, but produces a warning
set_current_phase(str) // creates a dangling pointer, doesn't produce a warning
set_current_phase("a" + "b") // creates a dangling pointer, doesn't produce a warning
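For these non-literal cases, one manual workaround (an assumption here, not part of the pitch) is to give a stashing C API an explicitly allocated copy whose lifetime the caller controls. A minimal sketch with a hypothetical helper `makeOwnedCString(_:)`:

```swift
// Hypothetical helper: copies `s` into a heap buffer as a NUL-terminated
// C string. The buffer stays valid until the caller deallocates it.
func makeOwnedCString(_ s: String) -> UnsafeMutablePointer<CChar> {
    let bytes = s.utf8CString  // ContiguousArray<CChar>, including the trailing NUL
    let buffer = UnsafeMutablePointer<CChar>.allocate(capacity: bytes.count)
    bytes.withUnsafeBufferPointer { src in
        // baseAddress is non-nil: the array always holds at least the NUL.
        buffer.initialize(from: src.baseAddress!, count: src.count)
    }
    return buffer
}

let str = "a" + "b"
let owned = makeOwnedCString(str)
// `owned` could now be passed to a stashing API such as set_current_phase.
print(String(cString: owned))  // prints "ab"
owned.deallocate()  // only once the C side no longer uses the pointer
```

The cost is an explicit ownership decision: the caller must know when the C side is done with the pointer, which is exactly the information the implicit conversion hides.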

Alternatives

(1) We could instead (or on top of the proposal above) make StaticString, which is already ExpressibleByStringLiteral, eligible for the string-to-pointer conversion that today only applies to String.

(2) We could instead (or on top of the proposal above) make even String-based string literals always convert to a global constant string pointer, possibly via mandatory optimizations or some other approach in the SIL pipeline.

Thoughts?


What will happen with interpolated string literals? Will these continue to silently produce dangling pointers?

set_current_phase("abc \(str)")

Sadly yes, the proposal is trying to be very limited in scope: it only addresses true literals passed directly to function calls, or assigned directly to an Unsafe(Raw)Pointer variable or member.

Making StaticString eligible for the conversion (alternative 1) would be more flexible, as StaticStrings can be passed around, stored in collections, etc. So it'd cover more valid cases where you can safely use [what are ultimately] string literals.

The complication is that StaticString doesn't necessarily contain a string - it can contain a single Unicode.Scalar instead. Whether it does or not is known at compile time, so in principle the compiler can produce an error if a scalar version of a StaticString is used in this way. I'm not sure how that'd actually be implemented, though.
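What alternative (1) could enable can be sketched by hand today with a hypothetical `cPointer(_:)` helper. It checks `hasPointerRepresentation` at run time, whereas the reply above suggests the compiler could reject the scalar case statically:

```swift
// Hypothetical helper: view a StaticString's storage as a C string pointer.
// Only valid when the StaticString was built from a multi-scalar literal.
func cPointer(_ s: StaticString) -> UnsafePointer<CChar> {
    precondition(s.hasPointerRepresentation,
                 "StaticString holds a single Unicode.Scalar, not a pointer")
    return UnsafeRawPointer(s.utf8Start).assumingMemoryBound(to: CChar.self)
}

// StaticStrings can be stored in collections and passed around freely;
// their pointers refer to constant storage rather than a temporary buffer.
let phases: [StaticString] = ["Downloading...", "Verifying...", "Installing..."]
let pointers = phases.map(cPointer)

for p in pointers {
    print(String(cString: p))
}
```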


UnsafePointer<CChar> wouldn't support char8_t, because C2x and C++20 use unsigned char as the underlying type.
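If a `char8_t`-based API surfaces in Swift as a pointer to `UInt8` (an assumption here: C23's `char8_t` is a typedef for `unsigned char`, but how a C++20 `char8_t` imports is not covered in this thread), `StaticString.utf8Start` already has the matching `UnsafePointer<UInt8>` type. A sketch with a hypothetical `utf8Length(_:)` standing in for such an API:

```swift
// Hypothetical stand-in for a C API taking `const char8_t *`,
// assumed here to be imported as UnsafePointer<UInt8>.
func utf8Length(_ p: UnsafePointer<UInt8>) -> Int {
    var n = 0
    while p[n] != 0 { n += 1 }  // count code units up to the NUL terminator
    return n
}

// "\u{E9}" is "é", which takes two UTF-8 code units.
let greeting: StaticString = "h\u{E9}llo"
print(utf8Length(greeting.utf8Start))  // prints 6
```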

This seems like a stopgap attempt to get better performance by constant-initializing strings. Strings aren’t the only thing we know needs constant initialization. There’s been discussion about defining true compile-time evaluation, and I feel like the better approach is to flesh that story out with constant-initialization semantics rather than add an ad-hoc solution for strings.


There's no performance motivation in this proposal -- the cases I'm describing are already getting optimized under -O where they don't end up doing any runtime operations on the constant strings.

Would you mind elaborating on how compile-time evaluation / constant initialization would actually solve the safety (and ergonomics) of passing string literals to C APIs and data structures? To me the two seem related, but orthogonal: if we want to maintain source compatibility, the string-to-pointer conversions cannot be changed to always require constant initialization.

Your second alternative might also improve the safety of imported C macros?

For example:

#define SMALL_STRING "0123456789"

is imported as:

public var SMALL_STRING: String { get }

AFAIK, SMALL_STRING has a constant pointer in C, but this isn't guaranteed in Swift.
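The difference can be sketched in pure Swift. The computed `String` property below mirrors the imported declaration, so a pointer into it is only usable inside a `withCString` call; the hypothetical `StaticString` constant is not what the importer produces, and only illustrates a value whose pointer refers to storage that lives for the whole program:

```swift
// Mirrors the imported declaration: each access yields a String value,
// and a pointer into it is only valid inside withCString.
var SMALL_STRING: String { return "0123456789" }

// Hypothetical alternative declaration backed by constant storage.
let SMALL_STRING_STATIC: StaticString = "0123456789"

let dynamicLength = SMALL_STRING.withCString { ptr -> Int in
    var n = 0
    while ptr[n] != 0 { n += 1 }  // walk to the NUL terminator
    return n
}

// The StaticString's pointer is part of the value, so every read yields
// the same address and it can be kept beyond any single call.
let firstRead = SMALL_STRING_STATIC.utf8Start
let secondRead = SMALL_STRING_STATIC.utf8Start
print(dynamicLength, firstRead == secondRead)  // prints "10 true"
```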


An implicit conversion from a built-in literal to String will create a _SmallString if possible:

https://github.com/apple/swift/blob/swift-5.8.1-RELEASE/stdlib/public/core/String.swift#L641-L658

An implicit conversion from String to _Pointer will create a temporary Array:

https://github.com/apple/swift/blob/swift-5.8.1-RELEASE/stdlib/public/core/Pointer.swift#L453-L461
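The original post describes this conversion as going through `.utf8CString`. That property's public result can be inspected directly; it is a freshly built, NUL-terminated `ContiguousArray<CChar>`, which is why any pointer into it is tied to a temporary:

```swift
let literal = "abc"
// utf8CString copies the UTF-8 code units into a new ContiguousArray<CChar>
// and appends a NUL terminator.
let temp: ContiguousArray<CChar> = literal.utf8CString
print(temp.count)      // prints 4: three code units plus the terminator
print(temp.last == 0)  // prints true
```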