Pitch: Implicit Pointer Conversion for C Interoperability

Andrew_Trick · August 11, 2021, 8:09pm

Proposal: Latest draft
Authors: Andrew Trick, Pavel Yaskevich
Implementation: apple/swift#37956
Bugs: SR-10246

Introduction

C has special rules for pointer aliasing, for example allowing char * to alias other pointer types, and allowing pointers to signed and unsigned types to alias. The usability of some C APIs relies on the ability to easily cast pointers within the boundaries of those rules. Swift generally disallows typed pointer conversion. See SE-0107 UnsafeRawPointer API. Teaching the Swift compiler to allow pointer conversion within the rules of C when invoking functions imported from C headers will dramatically improve interoperability with no negative impact on type safety.

Motivation

Swift exposes untyped, contiguous byte sequences using UnsafeRawPointer. This completely bypasses thorny strict aliasing rules when encoding and decoding byte streams. However, Swift programmers often need to call into low-level C functions to help implement the encoding. Those C functions commonly expect a char * pointer rather than a void * pointer to the contiguous bytes. Swift does not allow raw pointers to be passed as typed pointers because it can easily introduce undefined behavior.

Calling a C function from Swift that takes a byte sequence as a typed pointer currently requires confusing, ugly, and likely incorrect workarounds. Swift programmers typically reach for UnsafeRawPointer's "memory binding" APIs: either bindMemory(to:capacity:) or assumingMemoryBound(to:). We regularly see reports from programmers who were blocked while attempting a seemingly trivial task and needed to reach out to Swift experts to understand how to call a simple C API.

Memory binding APIs were never intended for regular Swift programming. Any use of them outside of low-level libraries is a usability bug. Furthermore, observing how the memory binding APIs are commonly used to work around compiler errors reveals that they are often used incorrectly, and sometimes there is no correct alternative short of copying memory. Swift's model for typed memory was designed to be completely verifiable with a runtime sanitizer. When such a sanitizer is deployed, many of these workarounds will again raise an issue.
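To make the difference between the two memory binding APIs concrete, here is a minimal sketch. The function name sumOfBoundBytes is hypothetical and exists only for illustration; the calls themselves are the standard library's bindMemory(to:capacity:) and assumingMemoryBound(to:).

```swift
// Hypothetical helper, for illustration only.
func sumOfBoundBytes() -> Int {
    // Freshly allocated raw memory is not bound to any type yet.
    let raw = UnsafeMutableRawPointer.allocate(byteCount: 4, alignment: 1)
    defer { raw.deallocate() }

    // bindMemory(to:capacity:) changes the type the memory is bound to.
    let bytes = raw.bindMemory(to: UInt8.self, capacity: 4)
    bytes.initialize(repeating: 0xAB, count: 4)

    // assumingMemoryBound(to:) changes nothing; it only asserts, unchecked,
    // that the memory is already bound to UInt8. That holds here solely
    // because of the bindMemory call above.
    let sameBytes = raw.assumingMemoryBound(to: UInt8.self)
    return (0..<4).reduce(0) { $0 + Int(sameBytes[$1]) }
}
```

Both calls compile in any context, which is exactly the problem: neither can be verified locally, so picking one to silence a type error says nothing about whether the program is correct.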

Consider using Foundation's OutputStream.write API. The programmer's initial attempt will look like this:

func write(messageData: Data, output: OutputStream) -> Int {
  return messageData.withUnsafeBytes { rawBuffer in
    guard let rawPointer = rawBuffer.baseAddress else { return 0 }
    return output.write(rawPointer, maxLength: rawBuffer.count)
  }
}

The compiler issues an unhelpful error:

error: cannot convert value of type 'UnsafeRawPointer' to expected argument type 'UnsafePointer<UInt8>'

There's no way to make the diagnostic helpful because there's no way to make this conversion generally safe. A determined programmer will eventually figure out how to defeat the compiler's type check by arbitrarily picking either bindMemory or assumingMemoryBound, both of which require global understanding of how messageData's memory is used to be correct. Now the code may look like this, or worse:

func write(messageData: Data, output: OutputStream) -> Int {
  return messageData.withUnsafeBytes { rawBuffer in
    guard let rawPointer = rawBuffer.baseAddress else { return 0 }
    let bufferPointer = rawPointer.assumingMemoryBound(to: UInt8.self)
    return output.write(bufferPointer, maxLength: rawBuffer.count)
  }
}

This problem crops up regularly in compression and cryptographic APIs. You can see a couple of examples from CommonCrypto in the forums: "CryptoKit: SHA256 much much slower than CryptoSwift" and "withUnsafeBytes Data API confusion".

As a generalization of this problem, consider a toy example:

encrypt.h

#include <stddef.h>
 
struct DigestWrapper {
  unsigned char digest[20];
};
 
int computeDigest(unsigned char *output,
                  const unsigned char *input,
                  size_t length);

It should be possible to call computeDigest from Swift as follows:

encrypt.swift

func makeDigest(data: Data, wrapper: inout DigestWrapper) -> Int32 {
    data.withUnsafeBytes { inBytes in
        withUnsafeMutableBytes(of: &wrapper.digest) { outBytes in
            computeDigest(outBytes.baseAddress, inBytes.baseAddress,
                          inBytes.count)
        }
    }
}

Without implicit conversion we need to write something like this instead:

func makeDigest(data: Data, wrapper: inout DigestWrapper) -> Int32 {
    data.withUnsafeBytes { inBytes in
        withUnsafeMutableBytes(of: &wrapper.digest) { outBytes in
            let inPointer =
                inBytes.baseAddress?.assumingMemoryBound(to: UInt8.self)
            let outPointer =
                outBytes.baseAddress?.assumingMemoryBound(to: UInt8.self)
            return computeDigest(outPointer, inPointer, inBytes.count)
        }
    }
}
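For readers who want to run the workaround without a C header, here is a self-contained sketch. computeDigestStandIn and DigestWrapper4 are hypothetical stand-ins: the function is written in Swift with the signature the importer would give computeDigest, and the digest is shortened to 4 bytes with a toy XOR "hash". Because the stand-in is a Swift declaration, the proposed conversion would not apply to it even with this pitch.

```swift
// Hypothetical stand-in with the imported signature of
// `int computeDigest(unsigned char *, const unsigned char *, size_t)`.
func computeDigestStandIn(_ output: UnsafeMutablePointer<UInt8>?,
                          _ input: UnsafePointer<UInt8>?,
                          _ length: Int) -> Int32 {
    guard let output = output, let input = input else { return -1 }
    // Toy digest: XOR-fold the input into 4 output bytes.
    for i in 0..<4 { output[i] = 0 }
    for i in 0..<length { output[i % 4] ^= input[i] }
    return 0
}

// Hypothetical 4-byte version of DigestWrapper.
struct DigestWrapper4 {
    var digest: (UInt8, UInt8, UInt8, UInt8) = (0, 0, 0, 0)
}

func makeDigest(bytes: [UInt8], wrapper: inout DigestWrapper4) -> Int32 {
    bytes.withUnsafeBytes { inBytes in
        withUnsafeMutableBytes(of: &wrapper.digest) { outBytes in
            // Today's workaround: assert the bound type by hand.
            let inPointer =
                inBytes.baseAddress?.assumingMemoryBound(to: UInt8.self)
            let outPointer =
                outBytes.baseAddress?.assumingMemoryBound(to: UInt8.self)
            return computeDigestStandIn(outPointer, inPointer, inBytes.count)
        }
    }
}

var wrapper = DigestWrapper4()
let status = makeDigest(bytes: [1, 2, 3, 4, 5], wrapper: &wrapper)
```

In this sketch assumingMemoryBound happens to be correct, because both buffers really do hold UInt8 values. The compiler cannot check that, which is why the same pattern goes wrong so easily elsewhere.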

In some cases, a typed Swift pointer, rather than a raw pointer, must be converted to char *. It is always safe to construct a raw pointer from a typed pointer, so the same implicit conversion to C arguments should work for both UnsafePointer<T> and UnsafeRawPointer. A common use case involves a sequence of characters stored in a buffer of any type other than CChar that needs to be passed to a C helper that takes char *. The character data may reside in an imported tuple (of any element type) or in a Swift array of UInt8 serving as a byte buffer.

The implicit conversion issue isn't limited to char *. It also comes up when APIs expect signed/unsigned pointer conversion. This has been a problem in practice for Swift programmers calling the Mach kernel's task_info API. Wherever the C language's special aliasing rules apply, they should all apply consistently.
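As a sketch of the signed/unsigned case, the following uses a hypothetical Swift stand-in, sumUnsigned, for a C function that takes const uint32_t *. It shows the explicit step a caller holding Int32 data needs today, here the standard library's withMemoryRebound(to:capacity:) on a typed pointer.

```swift
// Hypothetical stand-in for a C API taking `const uint32_t *`.
func sumUnsigned(_ values: UnsafePointer<UInt32>, _ count: Int) -> UInt64 {
    var total: UInt64 = 0
    for i in 0..<count { total += UInt64(values[i]) }
    return total
}

let signedValues: [Int32] = [1, 2, -1]

let total = signedValues.withUnsafeBufferPointer { buffer -> UInt64 in
    guard let base = buffer.baseAddress else { return 0 }
    // Temporarily rebind Int32 storage to UInt32 for the duration of the call.
    return base.withMemoryRebound(to: UInt32.self, capacity: buffer.count) {
        sumUnsigned($0, buffer.count)
    }
}
// -1 is reinterpreted as 4294967295, so total is 1 + 2 + 4294967295.
```

With the pitched conversion, a call to a real C-imported function of this shape could pass the UnsafePointer<Int32> directly.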

The problematic cases that are documented in bug reports and forum posts are just a very small sampling of the issues that we've been made aware of both from direct communication with programmers and by searching Swift code bases for suspicious uses of "bind memory" APIs.

Proposed solution

For imported C functions, allow implicit pointer conversion between pointer types that are allowed to alias according to C language rules:

A raw or typed unsafe pointer, Unsafe[Mutable]RawPointer or Unsafe[Mutable]Pointer<T1>, will be convertible to a typed pointer, Unsafe[Mutable]Pointer<T2>, whenever T2 is [U]Int8. This allows conversion to any pointer type declared in C as [signed|unsigned] char *.

A typed unsafe pointer, Unsafe[Mutable]Pointer<T1>, will be convertible to Unsafe[Mutable]Pointer<T2> whenever T1 and T2 are integers that differ only in their signedness.

The conversions automatically apply to any function imported by the compiler frontend that handles the C family of languages. As a consequence, a Swift programmer's initial attempt to call a C, Objective-C, or C++ function will just work in most cases. See the above Motivation section for examples.

This solution does not affect type safety because the C compiler must already assume pointers of either type may alias.

Note that implicit conversion to a const pointer type was implemented when unsafe pointers were introduced. The new conversions extend the existing design. In fact, this extension was anticipated when raw pointers were introduced, but the implementation was deferred until developers had experience using raw pointers.

This solution does not cover C APIs that take function pointers. However, that case is much less common. For function pointer based APIs, it's more appropriate to provide a Swift shim around the C API to encapsulate both the workaround for converting the pointer type and the function pointer handling in general.

Detailed design

Implementation of this feature is based on the constraint restriction mechanism that is also used for other implicit conversions, such as pointer/optional conversions. It introduces a new PointerToCPointer restriction kind, which is applied only in argument positions when the call references a C/ObjC imported declaration, the argument is either Unsafe[Mutable]RawPointer or Unsafe[Mutable]Pointer<T>, and the parameter is a pointer type or an optional (however deep) type wrapping a pointer.

To support the new conversion in combination with optional types, e.g. UnsafeRawPointer -> UnsafePointer<UInt8>?, the new restriction is not recorded while there are other restrictions left to try (e.g. value-to-optional or optional-to-optional conversions). This ensures that optional promotion or unwrapping happens before the new implicit conversion is considered.

Note that only conversions between typed signed and unsigned integral pointers are commutative; conversions from raw pointers are more restrictive:

Actual Swift Argument	Parameter Imported from C	Is Commutative
`UnsafeRawPointer`	`UnsafePointer<[U]Int8>`	No
`UnsafeMutableRawPointer`	`Unsafe[Mutable]Pointer<[U]Int8>`	No
`UnsafePointer<T>`	`UnsafePointer<[U]Int8>`	No
`UnsafeMutablePointer<T>`	`Unsafe[Mutable]Pointer<[U]Int8>`	No
`UnsafePointer<Int8>`	`UnsafePointer<UInt8>`	Yes
`UnsafePointer<Int16>`	`UnsafePointer<UInt16>`	Yes
`UnsafePointer<Int32>`	`UnsafePointer<UInt32>`	Yes
`UnsafePointer<Int64>`	`UnsafePointer<UInt64>`	Yes
`UnsafeMutablePointer<Int8>`	`Unsafe[Mutable]Pointer<UInt8>`	Yes
`UnsafeMutablePointer<Int16>`	`Unsafe[Mutable]Pointer<UInt16>`	Yes
`UnsafeMutablePointer<Int32>`	`Unsafe[Mutable]Pointer<UInt32>`	Yes
`UnsafeMutablePointer<Int64>`	`Unsafe[Mutable]Pointer<UInt64>`	Yes
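The table can also be read as a small predicate. The sketch below is a toy model only, not the type checker's implementation: Pointee, PointerType, and allowsCPointerConversion are hypothetical names, and the model folds in the pre-existing mutable-to-immutable conversion so that the Unsafe[Mutable] rows come out right.

```swift
// Toy model of a pointer's pointee; not the compiler's representation.
enum Pointee: Equatable {
    case raw                                // Unsafe[Mutable]RawPointer
    case integer(bits: Int, signed: Bool)   // e.g. Int8, UInt32
    case other                              // any other T
}

struct PointerType: Equatable {
    var pointee: Pointee
    var isMutable: Bool
}

/// Returns true when, under this model, `argument` may be passed to a
/// C-imported parameter of type `parameter`.
func allowsCPointerConversion(from argument: PointerType,
                              to parameter: PointerType) -> Bool {
    // An immutable pointer never converts to a mutable one.
    if parameter.isMutable && !argument.isMutable { return false }
    // Identical pointees need no new conversion.
    if argument.pointee == parameter.pointee { return true }
    // The new rules only target parameters with an integer pointee.
    guard case let .integer(paramBits, paramSigned) = parameter.pointee else {
        return false
    }
    // Rule 1: any raw or typed pointer converts to a [U]Int8 pointer.
    if paramBits == 8 { return true }
    // Rule 2: same-width integers that differ only in signedness.
    if case let .integer(argBits, argSigned) = argument.pointee {
        return argBits == paramBits && argSigned != paramSigned
    }
    return false
}

let rawPointer = PointerType(pointee: .raw, isMutable: false)
let uint8Pointer = PointerType(pointee: .integer(bits: 8, signed: false),
                               isMutable: false)
let int32Pointer = PointerType(pointee: .integer(bits: 32, signed: true),
                               isMutable: false)
let uint32Pointer = PointerType(pointee: .integer(bits: 32, signed: false),
                                isMutable: false)
```

The "Is Commutative" column shows up as asymmetry in the predicate: raw to UInt8 is accepted while UInt8 to raw is not, whereas Int32 and UInt32 convert in both directions.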

Source compatibility

No effect.

In general, adding implicit conversions is not source compatible. But this proposal only adds implicit conversions for function argument types that would already cause an override conflict had they both been part of an overridden function declared in C. Since the new implicit conversions are only applied to functions imported from C, this change cannot introduce any new override conflicts.

Effect on ABI stability

Not applicable. Pointer conversion is entirely handled on the caller side.

Effect on API resilience

Not applicable. Pointer conversion is entirely handled on the caller side.

Alternatives considered

Use C shims to make C APIs more raw-pointer-friendly. In SwiftNIO, the pointer conversion problem was prevalent enough that it made sense to introduce replacement C APIs taking void * instead of char *. For example: swift-nio/CNIOHTTPParser.h at nio-1.14 · apple/swift-nio · GitHub. This is not an obvious workaround, and it is impractical for most developers to introduce shims in their project for C APIs.

Rely on C APIs to be replaced or wrapped with Swift shims. The rate at which programmers run into this interoperability problem is speeding up, not slowing down. Swift continues to be adopted in situations that require interoperability. There are a large number of bespoke C APIs that won't be replaced by Swift APIs in the foreseeable future. If the existing C API is wrapped with a Swift shim, then that only hides the incorrect memory binding workaround rather than fixing it.

Add more implicit conversions to Swift. This would introduce C's legacy pointer aliasing rules into the Swift language. Swift's model for typed pointer aliasing should remain simple and robust. Special case aliasing rules that happen to work for common cases are deeply misleading. They introduce complexity in the language definition, implementation, and tooling. These special cases are unnecessary and undesirable for well-designed Swift APIs. Implicit type punning introduces more opportunities for bugs. Special aliasing rules would also penalize performance of pure Swift code. Finally, this would not be a source-compatible change.

Introduce UnsafeRawPointer.withMemoryRebound(to:capacity:). This is a generally useful, although somewhat unsafe, API. We also plan to introduce this API, but it isn't a sufficient fix for C interoperability. It only provides yet another ugly and confusing workaround.
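To show what this alternative would look like at a call site, here is a sketch that assumes a toolchain where the raw-pointer variant of withMemoryRebound(to:capacity:) is available (it is not part of the standard library this pitch was written against). countZeroBytes is a hypothetical Swift stand-in for a C function taking const unsigned char *.

```swift
// Hypothetical stand-in for a C API taking `const unsigned char *`.
func countZeroBytes(_ bytes: UnsafePointer<UInt8>, _ count: Int) -> Int {
    var zeros = 0
    for i in 0..<count where bytes[i] == 0 { zeros += 1 }
    return zeros
}

let words: [UInt32] = [0x00FF00FF, 0]

let zeroCount = words.withUnsafeBytes { rawBuffer -> Int in
    guard let base = rawBuffer.baseAddress else { return 0 }
    // Assumes UnsafeRawPointer.withMemoryRebound(to:capacity:) exists:
    // the memory is viewed as UInt8 only for the duration of the closure.
    return base.withMemoryRebound(to: UInt8.self,
                                  capacity: rawBuffer.count) {
        countZeroBytes($0, rawBuffer.count)
    }
}
```

The call is scoped and easier to reason about than assumingMemoryBound(to:), but it is still an extra closure and an explicit rebinding for what a C programmer would consider an ordinary call.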

Acknowledgments

Thank you to all the patient Swift programmers who have struggled with C interoperability and shared their experience with the Swift team.

Thanks to @eskimo, @lukasa, @jrose, @karl, and @itaiferber for helping those programmers use unsafe pointers while waiting for the language and libraries to be improved.

beccadax · August 12, 2021, 4:02am

I’m slightly worried about legibility—will people understand that these conversions are only available on imported declarations? Perhaps ~~module~~ generated interfaces could include a @cPointer type attribute that can’t actually be written in user source code, but indicates there are extra semantics.

This otherwise sounds like a huge win.

Philippe_Hausler · August 12, 2021, 7:35pm

Will this pose potential ambiguity with inferred types of the "with" closures?

Overall this seems like a good step in the right direction for ergonomics for unsafe things (and that is VERY welcomed coming from someone who has had to wade through that...)

xedin · August 12, 2021, 8:12pm

I don't think it would be a problem, because most of the "with" APIs either have concretely typed parameters or are generic over the pointee type, and solutions with implicit conversions are always going to be ranked lower than the ones that have none.

xedin · August 13, 2021, 3:28am

macOS and Linux toolchains are now available for download if anybody wants to give it a try.

Chris_Lattner3 · August 13, 2021, 4:24am

I don't support a proposal like this. Continuing down the slippery slope of adding more hard coded implicit conversions to the language has a number of problems:

This continues the tack of singling out specific cases that occur in the standard library, instead of defining a unifying principle for what sorts of implicit conversions we want to support in the language. This continues to get us into trouble, e.g. most recently with the CGFloat proposal.
In the case of CGFloat, there was schedule pressure to make something quick and dirty happen for a specific release, but it was well understood that it wasn't a generalized solution that we could carry forward.
This ignores an overarching goal of the Swift language, which aims to have generic features in the language that can be deployed in libraries: allowing the library API authors to decide which policies make sense for their clients, instead of having the language impose policies itself. Hard coding a bunch of special cases in the compiler makes the language less consistent than having a unifying theory.
This continues the slippery slope we're on and doesn't appear to have an end. Imagine if instead of adding property wrappers to Swift, we added @AppStorage, @Binding @Environment, @State, @StateObject and all the other SwiftUI property wrappers directly to the language. Are we encouraging a world with dozens of implicit conversions hard coded in the type checker?
There are lots of middle level libraries that have exactly the same sorts of requirements as the standard library, and we can't reasonably hard code these into the compiler.
This proposal doesn't consider all the other subtype relationships in the standard library - why should we improve these C pointer conversions, but not others?
Finally, this proposal only affects interoperability with legacy C code. That code is destined to be ugly for a bunch of reasons, and specifically exists to bridge the gap between two worlds. Why would we bother sugaring this one tiny aspect of that glue logic?

-Chris

beccadax · August 13, 2021, 4:47am

This seems to me less like it’s adding special cases for standard library types and more like it’s adding special cases for declarations from ClangImporter. It’s to be expected that libraries can’t define similar conversions since libraries can’t define new (statically-enforced) foreign language importers. And in this particular proposal, the new rules won’t just benefit a narrow use case like CGFloat does—a ton of C, ObjC, and probably eventually C++ APIs will become much more ergonomic.

xedin · August 13, 2021, 5:28am

Just to add to what @beccadax is saying. I don’t see a reason why we can’t add conversions like this while we are working on the more generic solution, and re-implement them later in a more generic framework, as I mentioned in a tangential discussion about mutable to immutable conversion. Also there are multiple major implementation/design considerations (one of them being type-checker performance) with a generic solution, besides determining whether that would be beneficial to the language or would aid in the creation of dialects.

blangmuir · August 13, 2021, 6:54pm

I would go further and say we should be able to write it on Swift declarations as well. I think you ought to be able to wrap a C API without breaking source compatibility. I'm wondering if we should also allow

let ptr: @cPointer UnsafePointer<UInt8> = ...
call(ptr)

to factor an expression out of the call. I realize using this pointer from Swift itself may violate our own aliasing rules, but that's no different than if you abuse assumingMemoryBound(to:) with the current situation. On the other hand, maybe there is never a good reason to need to get the type conversion on the local value and only on the calls that use it.

Ben_Cohen · August 13, 2021, 7:20pm

As before, the standing guidance of the core team is that narrow custom conversions that bring benefit could be considered but that "generalizing support for implicit conversions would be harmful to the language".

My recommendation is that a general conversion mechanism be explored in separate threads* that explore the problem space and whether the core team's previous guidance should stand, rather than redirecting all pitches for custom conversions into being ones about solving the general problem. It is not fair on the proposal authors otherwise, who are looking for other kinds of feedback on their pitches that will be crowded out.

Following the link to the post didn't clarify for me how the CGFloat conversion "got us into trouble". Can you explain more directly what the trouble was and how it would be addressed by a general conversion capability instead?

Just to check – it's not clear to me whether you're talking about CGFloat in Swift 5.5 or Swift 1.0 here. I'm guessing 1.0? Because if it's about Swift 5.5, it's an inaccurate description, so it'd be good to be clear.

Like all slippery slope arguments, this is pretty weak. In fact it even includes the counter – we didn't bake these behaviors in because property wrappers was an achievable generalization that clearly improved the language. It's not an appropriate analogy to generalized implicit conversions, which are both unlikely to be achievable in the medium term in practice and quite possibly an inappropriate direction for the language regardless of implementation difficulties.

This would be true whether or not we were implementing this subtype relationship through a general mechanism or a specific compiler change. Each potential conversion would need to be proposed and considered separately, and the mechanism for how that subtyping is achieved is just an implementation detail. The only difference would be an implied "implicit conversions are generally OK, that's why we added the language feature". But doing that conversion would still have to be considered on a case by case basis.

Because good interoperability with C is one of Swift's strongest selling points, but remains far more painful than it needs to be, and the feeling of the authors is that this particular conversion would go a long way to improving that and make Swift more appealing to a wider set of use cases. I agree with them.

* ps-edit: I guess I should be aware of the irony of debating generalized conversions on this pitch thread while saying we shouldn't debate them on this thread... if this conversation goes in interesting directions wrt generalized conversions, maybe we can move that part of the conversation out to a separate one.

Karl · August 13, 2021, 8:24pm

I would add that it isn't just sugar to make things less ugly, or even just less "painful" or verbose; we have a situation today where people don't know what they should be doing, and existing APIs like assumingMemoryBound are so delightfully tempting, but easily lead people down dangerous paths where their code might be miscompiled.

We've tried for years to teach everybody the correct way, but it's still a pervasive problem. That seems to suggest that what we have is just too complex, and we need to do more on the language/compiler side of things to help ensure that everybody is writing correct code.

Also, "legacy C code" kind of undersells it a bit. C is the only language for which we have genuine, first-class interoperability on all platforms. Want to call a Rust library? A Java library? You'll be going through a C interface. Even the Python interop library is all built on Swift-C and C-Python interoperability. C is our primary gateway to the non-Swift world. It isn't just legacy.

Hacksaw · August 14, 2021, 5:30am

It seems like that could lead to the more generic solution being constrained by the needs of this forerunner, lest it be code breaking. That alone makes me want caution.

xedin · August 14, 2021, 5:35am

I think it’s only a question of semantics. If a generic solution can’t support such a straightforward conversion, then we should keep looking.

AlexSedykh · August 14, 2021, 1:09pm

I agree, and I can add that C is a unique language for the platform, because we have the Metal Shader Language, which is C itself. That part of the platform is not going to fade into history like Objective-C, so it is well worth making interchange between the two as easy as possible.

Andrew_Trick · August 15, 2021, 3:16am

The arguments for new implicit conversion features can stand on their own. This proposal does not introduce any new types into our current support for implicit conversions. It makes the existing unsafe pointer conversions aware of special aliasing rules in C based on information from the importer. The implementation is a straightforward extension of the current mechanism.

We're not piling on another feature. We're fixing an old feature. This "bug" has been a barrier to Swift adoption and frankly a source of animosity toward the language. Developers need a fix now. Holding that up for leverage on future proposals would be wrong. And, as @becca explained, even if we had a library feature, the rules being proposed here should not be part of that feature.

Because this proposal is explicitly about conversions that are safe according to C's strict aliasing rules. This solves a well-defined, common set of problems. Everything about this proposal, including the motivation, design and implementation, revolves around that. Whatever other conversion you may be thinking of would lead to a very different discussion.

@karl already explained this well, and it should be clear from the proposal, but I want to stress again that this is not about syntactic sugar. It opens a class of C APIs up for interoperability without breaking Swift's memory model.

Andrew_Trick · August 15, 2021, 3:31am

Yes, I think that's a good instinct. But in any case where these conversions kick in, C has in a sense declared the "wrong" argument type for Swift. We really want the shim to declare the right type. In fact, this is often what shims are for currently.

So, if you're writing a shim for foo(const char *), we really want you to decide whether this is a CString or a byte buffer and use the correct Swift pointer type: UnsafePointer<CChar>? or UnsafeRawPointer?.

With the proposed change, you can still write the same shims, but you no longer need to violate the memory model by calling assumingMemoryBound(to:). You can simply turn around and call the underlying C function.

The major benefit of assumingMemoryBound(to:) is that it marks places where users are likely violating the memory model. It's very helpful to make that assumption explicit both for human understanding and to build diagnostics. We actually don't want that to become implicit for Swift locals, which could be passed to other arbitrary Swift code.

[EDIT] @blangmuir I forgot to mention the most important point. When developers do wrap a C API by copy-pasting the imported Swift declaration, and that happens to break source compatibility, @xedin has made sure that they are presented with a diagnostic explaining that those conversions are only valid in C code. I think that's enough to prompt them to adjust the argument type.

Ben_Cohen · August 15, 2021, 3:41am

I don't see a reason why it would constrain the generic solution. Worst case is that these conversions would need to continue to be bespoke because they have some need that a general solution should not cover. We know that there are many examples already where this will be the case. For example, a general solution would never have been enough for CGFloat because it was a bidirectional conversion – something that (as SE-0307 pointed out) would not be appropriate for a more general solution.

johnno1962 · August 15, 2021, 7:58am

A few observations. Wouldn't it be better if such niche ad-hoc conversions were expressed in Swift and only available when you are working in a module that contains or imports them? For example, the conversions required for the call to computeDigest in the pitch could be expressed in these terms:

extension UnsafeMutablePointer where Pointee == UInt8 {
  init(_ implicit: UnsafeMutableRawPointer) {
    self.init(implicit.assumingMemoryBound(to: Pointee.self))
  }
}
extension UnsafePointer where Pointee == UInt8 {
  init(_ implicit: UnsafeRawPointer) {
    self.init(implicit.assumingMemoryBound(to: Pointee.self))
  }
}

which begs the question: why on earth should such a niche conversion be baked into the compiler if you care about type safety, all to avoid the moderate inconvenience of aliasing a pointer correctly?

The problem here arises because people are talking about a DAG implementation. Better to impose the C++ "one conversion" rule, which avoids this limitation and keeps the type checker out of trouble. There is already a working prototype, mentioned in the other thread, that caters for this case without issue.

Finally, the PR for this bespoke change already adds more lines of code than the other PR mentioned in the other thread, which is fairly well advanced towards a more universal solution. More code, slower type checker. To my mind, time would be better spent fleshing the latter out sooner rather than later.

scanon · August 15, 2021, 3:13pm

This is wildly simplistic. Simple code is sometimes fast, but fast code is often complex.

johnno1962 · August 15, 2021, 4:27pm

Ha, I wondered if I'd get pulled up on that. If you look at the actual PR, it definitely won't be faster, even though it won't be much slower. As you add more bespoke conversions, though, their slowing effect will be cumulative as you add code paths. With a general mechanism based on some sort of type lookup table, this overhead will be relatively constant no matter how many conversions you add.