[Pitch] Remove transparent bridging between Swift strings and char *

Hi All.

In the spirit of Chris’ focus on Swift 3 message…

I’ve been working on calling C code that takes “const char * const *” arguments, and it ain’t easy, but that can be left for a future proposal…

What does surprise me is that Swift String bridges directly into “char *” arguments in C as nul-terminated C strings, apparently preserving Unicode and all. I can find nothing on bridging to “char *” in “Using Swift with Cocoa and Objective-C”.

In the spirit of preventing you from hurting yourself, I think this functionality should be removed, forcing you to use cString(using:) first.

-Kenny

I think it is so useful for C interop that removing it completely
would not be feasible. One tweak that I think we should consider
making is removing this implicit conversion when calling Swift code,
and leaving it only for calling imported functions. The reasoning is
that Swift code should not be using UnsafePointer<UInt8> to pass
strings around.

We might need to leave an escape hatch (an underscored attribute) to
opt into this behavior for the overlays though.
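
For anyone following along, here is a minimal sketch of the case described above: a function written in Swift (not imported from C) that takes UnsafePointer<UInt8> and still accepts a String argument through the implicit conversion. The function name byteCount is made up for illustration and does not come from the thread.

```swift
// Hypothetical Swift-native function (illustration only, not from the thread).
// It walks a NUL-terminated byte buffer and counts the bytes before the terminator.
func byteCount(_ bytes: UnsafePointer<UInt8>) -> Int {
    var n = 0
    while bytes[n] != 0 {
        n += 1
    }
    return n
}

// The String is implicitly converted to a temporary NUL-terminated UTF-8 buffer.
// U+00E9 (e with acute accent) takes two bytes in UTF-8, so five characters become six bytes.
let count = byteCount("h\u{E9}llo")
print(count)
```

The pointer is only meant to be used for the duration of the call, which is part of why passing strings this way between Swift functions looks questionable.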

Dmitri

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/

What does surprise me is that Swift String bridges directly into “char *” arguments in C as nul-terminated C strings, apparently preserving Unicode and all. I can find nothing on bridging to “char *” in “Using Swift with Cocoa and Objective-C”.

* Using Swift with Cocoa and Objective-C > Interoperability > Interacting with C APIs > Pointers > Constant Pointers:

"When a function is declared as taking a UnsafePointer<Type> argument, it can accept [...] A String value, if Type is Int8 or UInt8. The string will automatically be converted to UTF8 in a buffer, and a pointer to that buffer is passed to the function."

In the spirit of preventing you from hurting yourself, I think this functionality should be removed, forcing you to use cString(using:) first.

Do you mean the encoding should always be given, instead of using UTF-8 by default? I think the no-argument -[NSString cString] method was deprecated for this reason?
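
As a rough illustration of the documented behaviour quoted above, the sketch below passes a Swift String straight to imported C functions. It assumes Foundation is available so that the C library functions strlen and strcmp are visible; the variable names are made up for the example.

```swift
import Foundation  // assumed here only to make the C library (strlen, strcmp) visible

// "café" written with an explicit scalar so the byte count is unambiguous.
let greeting = "caf\u{E9}"

// The String is converted to a temporary UTF-8 buffer and strlen sees its bytes:
// three ASCII bytes plus two bytes for U+00E9.
let length = strlen(greeting)

// Two String arguments can be bridged in the same call.
let same = strcmp(greeting, "caf\u{E9}") == 0

print(length, same)
```

No encoding is named anywhere in the call, which is the point under discussion: the conversion is always UTF-8.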

-- Ben

···

On 22 Jun 2016, at 17:37, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

I've actually enjoyed this hidden feature on several occasions. It nicely allows you to interact with C APIs such as:

system("rm -rf ~/*")

Could you please elaborate a bit on the "hurting yourself" part? Do you mean, e.g., C APIs computing the wrong strlen because a String can contain 0x0 characters?
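
To make the embedded-NUL concern concrete, here is a small sketch, assuming Foundation is imported so that strlen is visible. The string literal is invented for the example.

```swift
import Foundation  // assumed so that strlen is visible

// A Swift String may contain U+0000 in the middle.
let tricky = "abc\0def"

// Swift counts every UTF-8 code unit, including the embedded NUL.
let swiftBytes = tricky.utf8.count

// C stops at the first NUL byte, so it only sees "abc".
let cBytes = strlen(tricky)

print(swiftBytes, cBytes)
```

Any C API that relies on the terminator will silently see a shorter string than the Swift side does.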

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Thanks! Missed that.

I think it would be OK if the transparent bridging called cString(using:) itself, and the whole call failed if the conversion failed.

-Kenny

But the cString(using:) method is from Foundation, and using UTF-8 should always succeed in any case.

UTF-8 by default is probably correct for POSIX APIs, but maybe not for Windows APIs (if code pages are expected).
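
A short sketch of the explicit route, assuming Foundation's cString(using:) on String, which returns an optional array of CChar. The sample text is made up. As far as I know the underlying NSString method is documented to return nil when the string cannot be converted losslessly, which is what the ASCII line is meant to exercise.

```swift
import Foundation

// "naïve" written with an explicit scalar for the non-ASCII letter.
let text = "na\u{EF}ve"

// UTF-8 can represent any Swift String, so this conversion is expected to succeed.
let utf8Bytes = text.cString(using: .utf8)

// ASCII cannot represent U+00EF; the conversion is documented to fail in that case.
let asciiBytes = text.cString(using: .ascii)

// Round-trip the UTF-8 bytes back into a String to check nothing was lost.
if let utf8Bytes = utf8Bytes {
    print(String(cString: utf8Bytes) == text)
}
print(asciiBytes == nil)
```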

-- Ben

···

On 22 Jun 2016, at 20:02, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

I think it would be OK if the transparent bridging called cString(using:) itself, and the whole thing would fail if the conversion fails.

But the cString(using:) method is from Foundation, and using UTF-8 should always succeed in any case.

I suppose you’re right about that, but now it brings up more confusion in my mind about what’s in Foundation and what’s in the standard library. So would there be extensions on String in (Swift) Foundation that implement cString(using:)? What decides which functions are in the standard library and which are in Foundation?
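
On the standard library versus Foundation split, here is a sketch that uses no Foundation import at all. It assumes the standard library members withCString and utf8CString as they exist in current Swift releases; what was available at the time of this thread may have differed.

```swift
// No Foundation import: everything below comes from the standard library.
let word = "hi"

// utf8CString is a NUL-terminated ContiguousArray<CChar>, always UTF-8.
let bytes = word.utf8CString

// withCString lends a pointer that is valid only inside the closure.
let roundTrip = word.withCString { pointer in
    String(cString: pointer)
}

print(bytes.count, roundTrip)
```

Neither call takes an encoding parameter; choosing an encoding other than UTF-8 is what needs Foundation's cString(using:).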

UTF-8 by default is probably correct for POSIX APIs, but maybe not for Windows APIs (if code pages are expected).

I guess it all depends on what you think should be valid for C strings. Based on the documentation, I guess the core team thinks UTF-8 is valid.

-Kenny

···

On Jun 22, 2016, at 12:37 PM, Ben Rimmington <me@benrimmington.com> wrote:

As I understand it, UTF-8 for POSIX APIs is always (usually?) valid on macOS, valid on Linux _IF_ your LANG etc. are set to C or en_US.UTF-8, and of questionable validity on Windows. Is the stdlib possibly converting to the "system encoding" underneath and silently dropping characters it can’t translate when the encoding is not UTF-8? Whichever is actually happening, I definitely agree this needs to be documented.

I also somewhat agree that forcing an explicit call to a cString accessor is not entirely unreasonable, though it does present a danger of allowing such pointers to escape calls to C APIs; normally you would never see the raw value, and there’s no reason to think it remains valid outside the scope of the conversion. Interop with C should be easy, but it shouldn’t be so easy that you can ignore the way the APIs work: dealing in UnsafePointers means thinking hard about memory management, even more so in some ways when doing interop from Swift than in plain C.

That having been said, the encoding issue concerns me more than the rest; I have no strong opinion either way about changing how the automatic bridging happens. If the encoding semantics were documented (maybe even with examples showing how to do it manually if you need more control for special cases), I’d be satisfied with the status quo.
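
A rough sketch of what doing it manually could look like, assuming Foundation is imported for strlen, strdup and free. The helper names scopedLength and ownedCopy are hypothetical. The first keeps the pointer inside the scope of the conversion; the second copies the bytes when a C string has to outlive that scope.

```swift
import Foundation  // assumed so that strlen, strdup and free are visible

// Scoped use: the pointer is valid only inside the closure and must not escape it.
func scopedLength(_ s: String) -> Int {
    return s.withCString { pointer in
        strlen(pointer)
    }
}

// Owned copy: strdup allocates a separate buffer that the caller is responsible for freeing.
func ownedCopy(_ s: String) -> String? {
    guard let owned = strdup(s) else { return nil }
    defer { free(owned) }          // release the C allocation before returning
    return String(cString: owned)  // copies the bytes back into a Swift String
}

let n = scopedLength("Swift")
let copy = ownedCopy("Swift")
print(n, copy ?? "allocation failed")
```

Both forms are always UTF-8; anything else would need an explicit encoding step before the bytes are handed to C.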

-- Gwynne Raskind
