C Interoperability: Import "struct Incomplete *" as Unsafe(Mutable)RawPointer rather than OpaquePointer

Hi all,

TL;DR

I propose to replace the OpaquePointer struct with a deprecated typealias for UnsafeRawPointer, and then to change the import of C pointers-to-incomplete-types to produce UnsafeMutableRawPointer or UnsafeRawPointer, depending on whether the pointee is const.

Introduction

The standard library currently has an odd pointer type, OpaquePointer. Better abstractions exist in the standard library for working with pointers to "raw" memory (Unsafe(Mutable)RawPointer), so OpaquePointer is mostly redundant and also fails to capture mutation (there is no "Mutable" variant).

I suspect that most people don't reach for OpaquePointer directly. The main source of OpaquePointer-based APIs is imported C APIs that use pointers to incomplete types. For example, given:

void foo(struct Incomplete *arg);
void bar(const struct Incomplete *arg);

we will import both parameter types to OpaquePointer, e.g.,

// Current
func foo(_ arg: OpaquePointer);
func bar(_ arg: OpaquePointer);

I propose to instead import using Unsafe(Mutable)RawPointer, e.g.,

// Proposed
func foo(_ arg: UnsafeMutableRawPointer);
func bar(_ arg: UnsafeRawPointer);

This matches how we import void * and const void *, and allows us to simplify the standard library by removing OpaquePointer.
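You can try the proposed mapping without a C header by writing the two imported signatures as ordinary Swift functions. In this sketch `foo` and `bar` are hypothetical stand-ins: they return the address they received, whereas the C functions return void. It shows that callers can pass typed pointers or `&value` directly, because Swift already converts these to raw pointers at call sites.

```swift
// Hypothetical stand-ins for the proposed imports of
//   void foo(struct Incomplete *arg);
//   void bar(const struct Incomplete *arg);
// The real C functions return void; these return the address they
// received so the example can check it.
func foo(_ arg: UnsafeMutableRawPointer) -> Int {
    return Int(bitPattern: arg)
}

func bar(_ arg: UnsafeRawPointer) -> Int {
    return Int(bitPattern: arg)
}

let typed = UnsafeMutablePointer<Int32>.allocate(capacity: 1)
typed.initialize(to: 42)

// The same implicit conversions that apply to `void *` / `const void *`
// parameters apply here.
let a = foo(typed)                 // UnsafeMutablePointer<T> -> UnsafeMutableRawPointer
let b = bar(typed)                 // UnsafeMutablePointer<T> -> UnsafeRawPointer
let c = bar(UnsafePointer(typed))  // UnsafePointer<T> -> UnsafeRawPointer

var local: Int32 = 0
let d = foo(&local)                // inout-to-pointer works for raw pointers too

typed.deinitialize(count: 1)
typed.deallocate()
```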

Source compatibility

When we remove the OpaquePointer struct and change the way pointers to incomplete types are imported, there are several ways in which we can break code. One obvious mitigation is to introduce a deprecated type alias for OpaquePointer:

@available(*, deprecated, renamed: "UnsafeRawPointer")
typealias OpaquePointer = UnsafeRawPointer

That will allow code that explicitly refers to OpaquePointer to keep compiling, now using UnsafeRawPointer, because the two APIs are mostly the same.
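To check that claim, the sketch below declares the shim (deprecated, as proposed) in a single file, where it shadows the standard library's OpaquePointer for that module only, and exercises two common spellings. It relies only on existing UnsafeRawPointer initializers; the byte values are arbitrary.

```swift
// The proposed shim. Inside this module it shadows Swift.OpaquePointer.
@available(*, deprecated, renamed: "UnsafeRawPointer")
typealias OpaquePointer = UnsafeRawPointer

let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: 4)
bytes.initialize(repeating: 0xAB, count: 4)

// Spellings that used to produce Swift.OpaquePointer still compile
// (with a deprecation warning) and now produce UnsafeRawPointer.
let fromTyped = OpaquePointer(bytes)
let fromBits = OpaquePointer(bitPattern: Int(bitPattern: bytes))

// Unlike Swift.OpaquePointer, the result can be read from directly.
let firstByte = fromTyped.load(as: UInt8.self)
let sameAddress = (fromBits == fromTyped)

bytes.deinitialize(count: 4)
bytes.deallocate()
```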

The main source of friction is that today one can construct an Unsafe(Mutable)RawPointer or an Unsafe(Mutable)Pointer<T> from an OpaquePointer (and vice versa) without regard to mutability. So code that previously got an OpaquePointer from a C API could construct an UnsafeMutableRawPointer directly from it, e.g.,

func f() {
  let p: OpaquePointer = someCAPI()
  g(UnsafeMutableRawPointer(p)) // currently well-formed, but would be ill-formed with this proposal
}

UnsafeMutableRawPointer does have an initializer that creates a mutable pointer from a non-mutable one, but it's called init(mutating:) rather than init(_:). We can provide some measure of source compatibility by adding (deprecated) initializers to UnsafeMutableRawPointer, e.g.,

extension UnsafeMutableRawPointer {
  // Keeps old `UnsafeMutableRawPointer(p)` spellings compiling, with a
  // warning that points at init(mutating:).
  @available(*, deprecated, renamed: "init(mutating:)")
  public init(_ from: UnsafeRawPointer) {
    self.init(mutating: from)
  }

  @available(*, deprecated, renamed: "init(mutating:)")
  public init?(_ from: UnsafeRawPointer?) {
    guard let from = from else { return nil }
    self.init(mutating: from)
  }
}

It's a little ugly, but it smooths over most of the source-compatibility concerns we've seen in practice. See the experimental implementation of this pitch for a little more commentary on the source-compatibility story; specifically, we might also need to add (deprecated) initializers to UnsafeMutablePointer.
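Here is a minimal sketch of the manual migration for the f() example, assuming the proposal is in place and the deprecated init(_:) shim is not. `someCAPI`, `g`, and the `handle` parameter are hypothetical stand-ins. The point is that the fix is a single explicit `mutating:` label.

```swift
// Hypothetical C API: returns a pointer to const incomplete data, which the
// proposal would import as UnsafeRawPointer. The handle parameter exists only
// so the example has real memory to point at.
func someCAPI(_ handle: UnsafeMutableRawPointer) -> UnsafeRawPointer {
    return UnsafeRawPointer(handle)
}

// Hypothetical Swift function that needs a mutable raw pointer.
func g(_ p: UnsafeMutableRawPointer) {
    p.storeBytes(of: 7, as: UInt64.self)
}

func f(_ handle: UnsafeMutableRawPointer) {
    let p = someCAPI(handle)
    // g(UnsafeMutableRawPointer(p))         // ill-formed without the deprecated shim
    g(UnsafeMutableRawPointer(mutating: p))  // explicit and well-formed
}

let handle = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 8)
handle.storeBytes(of: 0, as: UInt64.self)
f(handle)
let stored = handle.load(as: UInt64.self)
handle.deallocate()
```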

Alternatives Considered

There are two main alternatives:

  • Leave OpaquePointer alone: it's ugly and crufty, but it's not worth breaking any code over such a small amount of excess API.

  • Import incomplete types as distinct (but limited) types in the Swift type system. This, for example, would import the incomplete C struct "Incomplete" as a Swift struct, e.g.,

    // If we imported incomplete types...
    @_incomplete struct Incomplete { }
    func foo(_ arg: UnsafeMutablePointer<Incomplete>);
    func bar(_ arg: UnsafePointer<Incomplete>);
    

    This is arguably a better solution: it means that pointers to different incomplete C types (e.g., struct Incomplete * and struct OtherIncomplete *) get imported as distinct types in Swift, which implies better type safety. However, there are two downsides:

    1. We would need to invent a notion of incomplete types. So, while one could name Incomplete in a Swift program, you couldn't actually create a variable of type Incomplete, or instantiate a generic with Incomplete, or any of the other things we're accustomed to doing with Swift types. Aside from the nontrivial design and implementation effort, this introduces a significant complication into the mental model for Swift.
    2. Making imported C types more type-safe is likely to break significantly more existing source code.

    Given those two downsides, I feel that it's better to make a limited change here (eliminating OpaquePointer) and move on: we don't need to expend significant effort to improve C interoperability further, and it's certainly not worth breaking much code over.

So, what do we think? Is it worth trying to get rid of OpaquePointer?

Doug


I prefer the second alternative (with better type safety).

Can you import each incomplete type as an empty Swift enum?

// C
typedef struct Incomplete Incomplete;
void foo(const Incomplete *in);
void bar(Incomplete *in);
void baz(Incomplete **out);

// Swift
enum Incomplete {}
func foo(_ `in`: UnsafePointer<Incomplete>)
func bar(_ `in`: UnsafeMutablePointer<Incomplete>)
func baz(_ out: UnsafeMutablePointer<UnsafeMutablePointer<Incomplete>>)
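A minimal sketch of what that suggestion buys, assuming two incomplete C structs are imported as empty enums (hypothetical; the importer does not do this today). `Incomplete`, `OtherIncomplete`, and `foo` mirror the declarations above. The parameter is named `input` because `in` is a Swift keyword.

```swift
// Hypothetical Swift view of two distinct incomplete C structs.
enum Incomplete {}
enum OtherIncomplete {}

// Stand-in for `void foo(const Incomplete *in);`
func foo(_ input: UnsafePointer<Incomplete>) -> Bool {
    return true
}

// With today's import both pointer types collapse to OpaquePointer;
// as generic arguments they stay distinct types.
let distinct = ObjectIdentifier(UnsafePointer<Incomplete>.self)
    != ObjectIdentifier(UnsafePointer<OtherIncomplete>.self)

// Passing an UnsafePointer<OtherIncomplete> to foo(_:) is a compile-time
// error, which is the extra type safety being asked for:
// foo(otherPointer)
```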


When (if?) we add strong type-aliases, then each incomplete type could be a distinct typecopy of Unsafe(Mutable)RawPointer. We just have to decide what parts of the ...RawPointer interface get published in each new type.

I concur; I'd like to see it do this too. But I'm definitely conflicted. It depends on what we want to encourage with these imports. Importing as raw pointer types encourages reading and writing at byte offsets. Importing as a strong but unrepresentable type encourages passing the pointer around to other APIs that take the same incomplete pointers.


Empty Swift enums aren't restrictive enough, though. If I have an UnsafePointer<Incomplete>, where Incomplete is an empty Swift enum, I can't form a new Incomplete (there's no case to create it with), but I could access its pointee:

let value = p.pointee

Even that would need to be disallowed for imported incomplete types.
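A small pure-Swift check of this point, assuming `Incomplete` is imported as an empty enum (hypothetical). The type has no values and occupies no storage, yet nothing in the type system stops `pointee` from being read. The function below is only type-checked and never called, since there is no valid Incomplete value to load.

```swift
// Hypothetical empty-enum import of `struct Incomplete`.
enum Incomplete {}

// Uninhabited, so MemoryLayout reports a size of 0 bytes.
let incompleteSize = MemoryLayout<Incomplete>.size

// This type-checks, which is exactly the hole described above.
func peek(_ p: UnsafePointer<Incomplete>) -> Incomplete {
    return p.pointee
}
```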

Doug

Why not just typealias OpaquePointer to UnsafeMutableRawPointer unconditionally?

There can't be any existing [compile-error-free] code that mutates through an OpaquePointer, so it's not source breaking, right? The downside is that it weakens the type safety of potential future code that uses the deprecated OpaquePointer type.

Something like strong type aliases (à la Haskell's newtype) would allow us to have distinct type aliases of Unsafe(Mutable)RawPointer for each different imported type. This would be a better overall solution than my "option #2", because it means we don't have to invent something that would be weird in Swift (incomplete types). Rather, strong type aliases are a decent feature on their own that would then improve the situation here.

From a staging perspective, we could go with my proposed solution now and, at some future point when/if strong type aliases get added to the language, revisit the import of pointers-to-incomplete types to change it from Unsafe(Mutable)RawPointer to "a unique strong type alias of Unsafe(Mutable)RawPointer".

Doug


If you alias OpaquePointer to UnsafeMutableRawPointer, you hit the same issues as with aliasing UnsafeRawPointer, but in reverse: when you construct an OpaquePointer from an UnsafePointer, you're stuck adding the mutating: label. My choice of UnsafeRawPointer was pretty arbitrary, because you can't really do mutation meaningfully when the pointee type is incomplete.
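To illustrate the reverse friction, the sketch below declares the alternative alias in a single file (shadowing Swift.OpaquePointer there) and constructs it from both kinds of typed pointer. The names are illustrative. The problem shows up only when the source is a const UnsafePointer<T>.

```swift
// The alternative shim, shadowing the standard library's OpaquePointer
// within this module only.
typealias OpaquePointer = UnsafeMutableRawPointer

let value = UnsafeMutablePointer<Int>.allocate(capacity: 1)
value.initialize(to: 1)
let readOnly = UnsafePointer(value)

// From a mutable typed pointer, the old spelling still compiles.
let fromMutable = OpaquePointer(value)

// From a const typed pointer it does not: OpaquePointer(readOnly) is
// ill-formed, so the caller has to spell out `mutating:` instead.
let fromConst = OpaquePointer(mutating: UnsafeRawPointer(readOnly))

let sameAddress = (fromMutable == fromConst)

value.deinitialize(count: 1)
value.deallocate()
```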

Doug

If the empty enums conform to an empty protocol:

public protocol _OpaquePointee {}

Then a constrained extension could generate warnings and errors:

extension UnsafePointer where Pointee : _OpaquePointee {

  @available(*, deprecated, message: "Cannot access opaque pointee")
  public var pointee: Pointee {
    fatalError("Cannot access opaque pointee")
  }

  @available(*, deprecated, message: "Cannot access opaque pointee")
  public subscript(_: Int) -> Pointee {
    fatalError("Cannot access opaque pointee")
  }
}

I guess you could shadow all of the Unsafe(Mutable)Pointer API this way. It still feels very, very kludgy to me, because (for example) you could pass the UnsafePointer<Incomplete> to some arbitrary function that is generic over UnsafePointer<T>, and that function will likely blow up.

Doug


I see your point.

Arguably, OpaquePointer is currently broken. Even in C, you can cast away constness, but you can't just ignore it, as OpaquePointer effectively does.

So, +2 for your already-implemented approach (with the deprecated init(_:)'s): +1 for fixing OpaquePointer, and +1 for abolishing it!

Strong +1 for changes in this area. Strong -1 for importing every incomplete struct pointer as the same type. Most modern C APIs use pointers to incomplete structs, and when using them from Swift it would be a shame to be able to mix them up accidentally; that would be less type-safe than C, which is quite something.

The idea of making a newtype Incomplete = Unsafe(Mutable)RawPointer for any struct Incomplete * sounds like the best plan to me; I wish we had a (Haskell-style) newtype anyway.

CC @lukasa too.


Yup, I hit this before and ultimately didn't make much forward progress, but while we're here I'll bring up the concern I mentioned at the time.

Specifically, some C libraries go through ABI-breaking but API-preserving changes where they "opaquify" their structs: that is, they previously had structs that had public members, and they hide those away in a subsequent release. A particularly notable library that did this recently was OpenSSL, which did it for essentially all of their data types.

This is basically transparent from C code: you probably weren't dereferencing those pointers in your own code anyway, so it didn't matter that they hid the fields from you, and in the few cases where you were dereferencing the pointer you could shim in the new functions that did the same job.

In Swift, however, this is a source-breaking change, because the pointer type goes from UnsafePointer<T> to OpaquePointer. This is really tough to interoperate with.

In my ideal world we'd have something like OpaquePointer<T> using something like Haskell's newtype, but I'd also be happy enough with the use of the Unsafe[Mutable][Raw]Pointer types.

I agree that the best design involves importing as a newtype of Unsafe(Mutable)RawPointer. That'll certainly cause more source breakage than my proposal (now there are more distinct types running around), but in a sense it's "good" breakage because Swift is providing more type safety for these APIs.
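Swift has no newtype today, so the closest approximation of a "newtype of Unsafe(Mutable)RawPointer" is a single-field wrapper struct per incomplete C type. The sketch below uses hypothetical names (`IncompleteRef`, `OtherIncompleteRef`, `useIncomplete`) and is not how the importer works. It only illustrates the type-safety gain and the extra wrapping a client has to do.

```swift
// Hypothetical "newtype"-style wrappers, one per incomplete C struct.
struct IncompleteRef: Hashable {
    var rawValue: UnsafeMutableRawPointer
}

struct OtherIncompleteRef: Hashable {
    var rawValue: UnsafeMutableRawPointer
}

// Hypothetical imported API: only accepts pointers to `struct Incomplete`.
func useIncomplete(_ p: IncompleteRef) -> UnsafeMutableRawPointer {
    return p.rawValue
}

let raw = UnsafeMutableRawPointer.allocate(byteCount: 16, alignment: 8)
let ref = IncompleteRef(rawValue: raw)
let other = OtherIncompleteRef(rawValue: raw)

let matches = (useIncomplete(ref) == raw)
// useIncomplete(other)   // error: OtherIncompleteRef is not IncompleteRef
let wrappersDiffer = (ObjectIdentifier(type(of: ref)) != ObjectIdentifier(type(of: other)))

raw.deallocate()
```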

Doug

I don't actually think a newtype of Unsafe(Mutable)RawPointer is automatically the right way to go, the main reason being that one of the goals for handling incomplete C types is dealing with when they become complete. If I import a Swift library that has pointers-to-incomplete types in its public API, and I also import the C header that contains the definition of the formerly-incomplete type, the Swift API won't match up. That's fine! That's pretty much inevitable in Swift. But I care about how hard it is to make those two APIs match up.

  1. In Swift 4, the library API takes an OpaquePointer. A client has to use initializers to get in and out of OpaquePointer-land. If the value is on the stack, they need to use withUnsafe(Mutable)Pointer(to:) to even get a pointer to convert, although the compiler doesn't stop you today from just going directly to OpaquePointer with & even though that isn't correct.

  2. With Doug's original proposal, the library API takes an Unsafe(Mutable)RawPointer. We already have implicit conversions up to these pointer types from typed Unsafe(Mutable)Pointers (one of the few places in the language where we do so), so magically this will just work.

  3. With structs that wrap pointers, a client has to use initializers to get in and out of typed-pointer land, with the same problem with values on the stack. The compiler can't statically help you do this, but…at least the names line up, I guess? (UnsafePointer<Foo> vs. UnsafeIncompleteFooPointer or something. I'm not sure how "Mutable" fits in here either.)

  4. With fake empty structs (roughly Doug's @_incomplete), everything lines up and we get proper types (UnsafePointer<Foo> on both sides), but people can ask for p.pointee and maybe get a value out. I'm actually not so worried about this as long as we can distinguish "complete" and "incomplete" runtime metadata: if it's incomplete, you do a load of zero bytes and get nothing back. (Yes, a human can write that code, but why would they?) I don't think we need to expose the "incompleteness" at the language level; it'll just fall out that using such a type as a value is pointless.
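To make the difference between options (1) and (2) concrete for a value on the stack, here is a pure-Swift sketch. `Foo`, `takesOpaque`, and `takesRaw` are hypothetical stand-ins for a complete C type visible to the client and for the same library API imported two ways. The bodies read the value back only so the example can be checked.

```swift
// Stand-in for a complete C type `Foo` that the client can see.
struct Foo {
    var x: Int32
}

// Option (1): hypothetical library API taking OpaquePointer.
func takesOpaque(_ p: OpaquePointer) -> Int32 {
    return UnsafeMutablePointer<Foo>(p).pointee.x
}

// Option (2): the same API imported as UnsafeMutableRawPointer.
func takesRaw(_ p: UnsafeMutableRawPointer) -> Int32 {
    return p.load(as: Foo.self).x
}

var foo = Foo(x: 5)

// (1) A stack value needs withUnsafeMutablePointer(to:) plus an explicit
//     OpaquePointer(_:) conversion.
let viaOpaque = withUnsafeMutablePointer(to: &foo) { takesOpaque(OpaquePointer($0)) }

// (2) The implicit &-to-raw-pointer conversion just works.
let viaRaw = takesRaw(&foo)
```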

Note that the other direction is uninteresting to me. If a Swift library exposes a C type in its API and it could see the definition of the type at compilation time, I think it's okay to require (continue requiring) that the definition be visible to the client as well.

I haven't thought too much about the migration path for (4), which to me is clearly the "right" answer if we don't mind breaking code. If we can't do (4), though (probably due to the tricky runtime bits), I wouldn't want to do (3), which also breaks a lot of code in addition to being only a 60% answer. I'd rather admit that Swift can't handle C 100% and stick with (1) or (2)—and probably (2) in practice. (2), again, is Doug's original proposal. If we can get away with it.

The ideal thing to do here would be something like what we're planning for moveonly, where types are assumed to be "complete" unless you specifically opt in to supporting incomplete types. You could then make sure that pointee and friends are only available for pointers to complete types. That would be a lot of effort for not much benefit, though.

In practice, I don't think myOpaque.pointee is that big a problem, because what are you going to do with that value once it's dereferenced? It has no members. You could try assigning to pointee, I suppose; I believe that currently this would silently do nothing (uninhabited types are 0 bytes long), which could be confusing.

Could we add an "uninhabited" bit to the metadata for a type (if there isn't one already), have Unsafe[Mutable]RawPointer trap in appropriate places if that bit is set, and optimize those checks away when we're certain the type is inhabited (and, I suppose, in production builds)?

We could probably hack this up quickly to better understand the source-compatibility impact of this approach (your #4). It does address the concerns brought up by @lukasa about C libraries changing whether the type is complete being a big source-breaking change, it eliminates OpaquePointer, and gives us the stronger typing guarantees.

Metadata uniquing becomes really interesting if we have the notion of incomplete types; presumably, when you unique incomplete metadata, you have to go hunting to see if someone else already completed that metadata... and you might need to revise that decision if someone later dlopen's a shared library that now has complete type metadata. I... think... all of this is doable.

Doug


I’m definitely interested in helping test out any change in this area: I have some code lying around that works with OpenSSL and would happily validate the change.

I'd love to see OpaquePointer go away. It isn't useful for anything, and thus doesn't carry its weight. It is needless complexity.


There's actually an Opaque Pointers in Swift thread elsewhere in the forums, which I reopened the other day. This message is just to link the two, because the other one contains a real-world example of how painful it is to support both OpenSSL 1.0 and 1.1, where the main difference is that the structs have become opaque.
