[Pitch] Restrict pointer conversion to C interoperability

I propose restricting pointer conversions in Swift to the C interop use case.

Constrain Pointer Conversion Feature to C-interoperability

  • Proposal: SE-NNNN Constrain Pointer Conversion Feature to C-interoperability
  • Author: Guillaume Lessard
  • Review Manager: TBD
  • Status: pending
  • Implementation: pending
  • Bugs: rdar://92429037
  • Previous Revision:

Introduction

Currently, a Swift function with a parameter type of UnsafePointer<T> can be called with an Array<T> or an inout T argument. These values are then viewed as an UnsafePointer<T> from within the function body. This has been the case since the beginning of Swift in order to facilitate interoperability with C and Objective-C.

We believe that this feature is too broad, since it works for any function parameter of type UnsafePointer<T>. By restricting this feature to the use cases of C, C++ and Objective-C interoperability, we would improve the safety of Swift code, removing many opportunities for inadvertent bounds overruns.

Motivation

Swift allows pointer conversions for any function parameter of type UnsafePointer<T>, including functions written purely in Swift. This leads to the surprising possibility that a called function can access out-of-bounds memory from a call site that hasn't obviously used an unsafe construct. A common case is a simple re-implementation of C's strlen():

func arrlen(_ p: UnsafePointer<CChar>) -> Int {
  var i = 0
  while p[i] != 0 { i += 1 }
  return i
}

var array: [CChar] = [1, 2, 3]
arrlen(array) // probably not 3

This code will read beyond array's heap storage if none of its elements equals 0.
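
For contrast, here is a minimal sketch of the same scan written against UnsafeBufferPointer, which carries its element count. The function name boundedLen and the variable names are hypothetical and are not part of the pitch; the point is only that the loop can stop at the end of the storage.

```swift
// Hypothetical bounds-aware counterpart to arrlen: the buffer pointer
// carries its count, so the scan cannot run past the array's storage.
func boundedLen(_ buffer: UnsafeBufferPointer<CChar>) -> Int {
  // firstIndex(where:) gives up at the end of the buffer when no zero is found.
  return buffer.firstIndex(where: { $0 == 0 }) ?? buffer.count
}

let unterminated: [CChar] = [1, 2, 3]
// The call site now spells out the unsafe access explicitly.
let unterminatedLength = unterminated.withUnsafeBufferPointer { boundedLen($0) }
print(unterminatedLength) // 3
```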

There have been well-documented program bugs and memory corruption issues due to unintended use of this feature. While the actual type of the parameter contains the word Unsafe, this is still unexpected: Unsafe in a type name means it has manual memory management, not unchecked memory access.

There has been an initial attempt to restrict pointer conversion with the @_nonEphemeral parameter attribute. The intent of the @_nonEphemeral attribute is to warn against inadvertent pointer escapes in Swift code. The attribute must be present for the warning to appear. Unfortunately, this doesn't do enough to prevent inadvertent out-of-bounds memory access.

Proposed solution

We should restrict the pointer conversion feature to only be available for arguments to C, C++ and Objective-C functions. This would prevent inadvertent conversions from arrays to single pointers on one hand, and it would completely prevent the kind of pointer escapes the @_nonEphemeral attribute is meant to warn against.

Pointer conversion should be usable transitively when Swift code implements a shim that calls out to C. We therefore should also add an attribute @forwardedToC (name to be discussed) that enables pointer conversion for Swift function parameters when it is present.

Detailed design

The newly restricted pointer conversion feature would work just as it already does when applied to an argument to a C function call:

  • Arguments of type [T] can be converted to UnsafePointer<T> or UnsafeRawPointer parameters.
  • Mutable arguments of type [T] or inout T can be converted to UnsafeMutablePointer<T> or UnsafeMutableRawPointer parameters. This requires using the & sigil.
  • Arguments of type String can be converted to parameters of types UnsafePointer<UInt8>, UnsafePointer<Int8> or UnsafeRawPointer.
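
The conversions listed above can be sketched with a few standard C library calls. This assumes that import Foundation makes memcmp, memset, modf and strlen visible, as it does on Apple platforms and Linux; the variable names are illustrative only.

```swift
import Foundation  // assumed to expose the C library functions used below

// [T] to UnsafeRawPointer: memcmp takes two `const void *` parameters.
let lhsBytes: [UInt8] = [1, 2, 3]
let rhsBytes: [UInt8] = [1, 2, 3]
let sameBytes = memcmp(lhsBytes, rhsBytes, lhsBytes.count) == 0

// Mutable [T] with `&` to UnsafeMutableRawPointer: memset takes `void *`.
var scratch = [UInt8](repeating: 0xFF, count: 4)
let scratchCount = scratch.count
_ = memset(&scratch, 0, scratchCount)

// inout T with `&` to UnsafeMutablePointer<T>: modf writes the integral
// part through a `double *`.
var wholePart = 0.0
let fractionalPart = modf(3.5, &wholePart)

// String to UnsafePointer<CChar>: strlen takes a `const char *`.
let helloLength = strlen("hello")
```

Every call here targets an imported C function, so under the pitch these call sites would keep compiling unchanged.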

In order to support shims written in Swift that forward their pointer parameters to an underlying C library, we will add a parameter attribute @forwardedToC (naming suggestions welcome). This attribute will be valid only for Unsafe[Mutable]Pointer<T> and Unsafe[Mutable]RawPointer parameters. A Swift method overriding an Objective-C method with an UnsafePointer<T> parameter must use the @forwardedToC attribute.
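
As a rough sketch of the kind of shim this paragraph has in mind: the function byteLength below is hypothetical, and because @forwardedToC is only proposed and does not exist in any shipping compiler, the attribute appears in a comment so that the code compiles today.

```swift
import Foundation  // assumed to expose C's strlen

// Hypothetical Swift shim that forwards its pointer parameter to C's strlen.
// Under the pitch, the parameter would presumably be spelled
//   @forwardedToC _ text: UnsafePointer<CChar>
// The attribute is not implemented anywhere, so it is shown only in this comment.
func byteLength(_ text: UnsafePointer<CChar>) -> Int {
  return strlen(text)
}

// This call compiles today through implicit String-to-pointer conversion.
// Under the pitch it would need the attribute on the parameter above.
let shimResult = byteLength("hello")
print(shimResult) // 5
```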

Aspirational feature: The @forwardedToC attribute will only be valid when a Swift function forwards the marked parameter to either (a) a C function, or (b) another Swift function that has a compatible parameter marked with @forwardedToC. Trying to use the attribute on an invalid parameter declaration will be a compiler error.

Variables of function type (e.g. var f: (UnsafePointer<Int>, Int) -> Int) do not distinguish whether they store a Swift function or a C function pointer. As such, function calls made through stored variables will not allow pointer conversion, and will require using the withUnsafePointer(to:) or withUnsafeBytes(of:) functions, their mutable counterparts or their Array equivalents.
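
A minimal sketch of what such a call site might look like, using a stored function value and explicit scoped access. The names sumValues and storedFunction are hypothetical; withUnsafeBufferPointer and withUnsafePointer(to:) are the existing standard library functions.

```swift
// A plain Swift function with a pointer parameter (hypothetical example).
func sumValues(_ pointer: UnsafePointer<Int>, _ count: Int) -> Int {
  var total = 0
  for i in 0..<count { total += pointer[i] }
  return total
}

// Once stored in a variable, the compiler cannot tell whether this is a
// Swift function or a C function pointer.
let storedFunction: (UnsafePointer<Int>, Int) -> Int = sumValues

let values = [1, 2, 3]
// Explicit scoped access replaces the implicit form storedFunction(values, values.count).
let viaArray = values.withUnsafeBufferPointer { buffer in
  // baseAddress is nil only for an empty buffer.
  buffer.baseAddress.map { storedFunction($0, buffer.count) } ?? 0
}

let single = 42
// A single value is passed through withUnsafePointer(to:).
let viaValue = withUnsafePointer(to: single) { storedFunction($0, 1) }
```

Both forms keep the element count next to the pointer at the call site, which the implicit conversion does not.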

Source compatibility

This is a source-breaking change and must occur in conjunction with a new language mode.

ABI stability

This proposal does not affect ABI.

Implications on adoption

This feature will automatically be turned on in a new language mode.

Alternatives considered

We could elect to leave the pointer conversion as is, and add overloads as a "picket fence" around problematic Swift functions. See Swift#42002 for an example of what this alternative looks like. In our opinion this is not a sustainable alternative in the long run.

We could allow pointer conversion for function calls made through variables ("function pointers" in C parlance). This would make the change smaller than proposed here, but it would be too easy to bypass the safety added by this proposal. We could also give the compiler the ability to distinguish Swift closures from C function pointers, but that would be a substantial source break, and generally prevent substituting Swift functions with C functions.

Acknowledgments

John McCall previously initiated a discussion where larger changes to pointer conversions were contemplated, in "Revisiting the pointer conversions". From another angle, pointer conversions were made more lenient in order to better match C semantics in SE-0324. By making implicit pointer conversions usable in a larger range of situations, SE-0324 has had the side effect of uncovering many unsafe misuses of pointer conversion.

Presumably this would also apply to C++ functions. Would the implementation of this proposal add the @forwardedToC attribute to the imported parameters in the imported function decl by default, or how would you implement it? If you add @forwardedToC to parameters automatically, this will help us simplify the implementation on the Clang importer side, as we have several places where we're cloning functions and their parameters to generate synthesized Swift functions, so if we don't need to make any adjustments to that code, that would be great.

Yes, I pictured that the Clang importer would add the attribute automatically. This should also apply to C++ functions that traffic in pointers.

I have mixed feelings about this, but in the (admittedly rare) case where the pointer parameter is in a base class or protocol, can it be inferred onto overrides / implementing methods? Normally I’m against inference at a distance like this, but I dislike the idea of breaking code by improving type information even more.

(I know there are already circumstances where this happens, but that doesn’t mean I want to expand them. Then again, maybe it’s not common enough to matter.)

The type mapping would be the same as currently, but would not occur as frequently, e.g. [T] can be used for UnsafePointer<T>. I don’t think this needs to change how a pointer to a class instance is interpreted in a C++ context.

Not a pointer to a class instance, a pointer in an overridable position:

@interface Base: NSObject
- (void)processValues:(const double *)values count:(size_t)count;
@end

class Sub: Base {
  override func processValues(_ values: UnsafePointer<Double>, count: Int) { … }
}
let sub = Sub()
let base: Base = sub
sub.processValues([1, 2, 3], count: 3) // error?
base.processValues([1, 2, 3], count: 3) // okay

The requested inference is roughly what we do for @objc, just at a parameter level.

The overridden Swift function signature would require the @forwardedToC attribute, in order to be able to match the imported Objective-C signature of the function it overrides.

As someone who created a whole DSL to avoid nested calls to withUnsafe*Pointer, I'm extremely against this proposal.

The existing syntax with the ampersand operator already forces the programmer to understand that they are passing an array as a pointer. With this proposal implemented, the only thing that changes is that API providers now have to explicitly allow passing arrays as pointers in their interfaces. In my opinion this only adds friction with no real benefit. There's a reason why the type is called "Unsafe".

Re: the example of arrlen, which supposedly reimplements strlen:
Passing a non-null-terminated string to strlen is a programmer error: it uses an unsafe interface while ignoring its specification. Unfortunately, a language cannot protect against programmer errors. Otherwise we would still be able to send messages to a nil object without any consequences, as Objective-C allows. Problems like this are solved at the library level. If a vendor provides an unsafe interface, they probably have a good reason to do so, and passing the responsibility to the consumer of that interface is not always a bad thing.

As I understand it, that’s the whole reason behind this proposal: you don’t currently need to use any special syntax to pass an array as a pointer - the compiler will simply translate your “safe” array into an unsafe pointer transparently, even for normal Swift functions.

Right, I see. I would probably be happier to address this implicit behavior instead of completely disabling the feature.

This does not disable the feature! It restricts it to C/C++ and Swift wrappers of C/C++ API, which are the entire reason pointer conversions exist in the first place. Relying on pointer conversion in pure Swift API is simply unsafe, and there is no way to fail safely (such as trapping), since bounds information is not preserved. We are also working on addressing memory sharing ergonomics in pure Swift, as laid out in this roadmap.

It's unsafe. It's needed for crucial performance reasons. For example, the only way to pass an array to a recursive function without copying the array struct itself on each call is to pass it as an unsafe pointer. When you have thousands of calls at a time, this is an extremely valuable tool to have. A real-life example is concurrent rendering of scene objects in a scene represented by a tree. Each call escapes via DispatchQueue.concurrentPerform, while being guaranteed thread-safe because each call modifies a predefined region of the memory in the given array. In this case, passing the array as an inout argument will result in an additional and unnecessary retain/release of the underlying _storage (the underlying array buffer), will it not?

I understand that Chris Lattner has literally zero influence on the language nowadays, but let's not forget that one of Swift's original purposes is to be a systems programming language:

  • In the systems programming context, it is important for Swift developers to have low-level opt-in access to something like the C or C++ memory consistency model. This is definitely interesting to push forward, but is orthogonal to this work.

I'd like to have this as an opt-in mechanism via a simple operator, not with another @-attribute on the other side of the API call.

We are also working on addressing memory sharing ergonomics in pure Swift, as laid out in this roadmap.

This looks great. Let's have buffer views before making this source-breaking change. Why is this proposal not a "future direction" of the buffer view proposal?

The performant access you want should be done through UnsafeBufferPointer, not UnsafePointer. Dropping bounds information isn't what unlocks performance in the application you're referring to.
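
As a hedged sketch of that suggestion: the recursive worker below (the hypothetical doubleAll) receives an UnsafeMutableBufferPointer over its own region. No array copy is made per call, and each region still knows its bounds. The concurrency from the scenario above is left out, and no performance measurement is claimed.

```swift
// Hypothetical recursive worker: each call gets a bounds-carrying view of
// its own region of the array's storage, with no array copy per call.
func doubleAll(_ region: UnsafeMutableBufferPointer<Int>) {
  if region.count <= 2 {
    // Base case: mutate the elements in place through the buffer pointer.
    for i in region.indices { region[i] *= 2 }
    return
  }
  let mid = region.count / 2
  // Rebasing a slice yields a new buffer pointer over the same memory.
  doubleAll(UnsafeMutableBufferPointer(rebasing: region[..<mid]))
  doubleAll(UnsafeMutableBufferPointer(rebasing: region[mid...]))
}

var samples = [1, 2, 3, 4, 5]
// The unsafe access is scoped and explicit at the outermost call only.
samples.withUnsafeMutableBufferPointer { doubleAll($0) }
print(samples) // [2, 4, 6, 8, 10]
```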

This change is predicated on a new language mode (probably Swift 6). This is the same time horizon as buffer views.

Happy to see this get narrowed down.

If you are writing Swift code which relies on this conversion, I would recommend making an explicit overload.

Before:

func doSomething(_ input: UnsafePointer<Int>, count: Int) { 
  // ...
}

doSomething([1, 2, 3], count: 3)

After:

func doSomething(_ input: UnsafePointer<Int>, count: Int) { 
  // ...
}

// >> You can technically drop the 'count' parameter now.

func doSomething(_ input: [Int]) {
  input.withUnsafeBufferPointer { doSomething($0.baseAddress!, count: $0.count) }
}

doSomething([1, 2, 3]) // ✅ works

// >> Or keep it, for source compatibility or prefix processing.

func doSomething(_ input: [Int], count: Int) {
  precondition(count <= input.count) // Couldn't check this before!
  input.withUnsafeBufferPointer { doSomething($0.baseAddress!, count: count) }
}

doSomething([1, 2, 3], count: 3) // ✅ works

// BONUS: The original function, with UnsafePointer parameter, 
//        can now be a private implementation detail.
//        It doesn't need to be exposed as public API any more!

Correct me if I'm wrong, but I don't believe there is anything that can be done now that would be impossible after this change. People may need to add some overloads to regain the lost ergonomics, but I think limiting this implicit conversion is a net win for the language.

That’s right.

I've updated the pitch (both the gist and the first post) to clarify the rules, improve some wordings, and add acknowledgements.

I'm already doing exactly that :) Exactly because I was expecting this kind of change in the language

This change is too big of a source break. inout T to UnsafeMutablePointer<T> conversions in pure Swift code are very useful in a wide variety of situations, including async function calls. They are only as "unsafe" as the programmer's inability to follow Swift's rules for memory and pointers.

If this change must pass, then the escape hatch attribute needs to be more general than @forwardedToC. Again, these conversions are very useful in pure Swift code.

+1, this covers a big sharp invisible edge in the language which affects not only end-developers but also library authors who have to internalize magic overload resolution rules.

I am curious what your thoughts are on the source migration story (realizing that's not strictly required for an SE proposal, IIRC). Some code will break where the solution is to add @forwardedToC, but other code will break because that code is actually broken and/or unintentionally using pointer conversion.

The standard library needs an async version of withUnsafeMutablePointer before this change is reasonable. It's trivial to create this yourself today via inout T to UnsafeMutablePointer<T> conversion, and lots of Swift projects do.