Towards a safe and ergonomic language model for Swift and C++ interop

zoecarver · June 3, 2022, 5:09pm

When importing C++ codebases, I propose that the Swift compiler transform C++ APIs into specific, well-known, safe patterns that feel native in Swift. To guarantee that every imported C++ API follows one of these patterns, the Swift compiler cannot import all APIs and must use annotations, API notes, and flags to determine how best to map a given C++ API.

To accommodate this new direction, and these goals for the importer, how we develop interop must change. Rather than trying to import everything all at once, with a “best-effort” attitude, we should find specific mappings for each C++ API pattern. For implementation, this means focusing on one pattern at a time, and not importing APIs that don’t fit into any of the defined patterns. Initially, I propose that we start with the following patterns, but we can add more in the future.

(Each of the below patterns will need its own pitch and evolution post outlining the pattern in more detail.)

Trivial types: trivial types have no special members except for constructors. They are the types from C and Objective-C that can already be imported. Pointers are not trivial types. These types can be automatically imported. (Note: “trivial” types are also allowed to hold Objective-C classes as members.)

Iterators: iterators are unsafe in C++. You can easily create a dangling reference or iterate past the end of the range. Iterators will be imported through a safe, native Swift interface that checks bounds and manages ownership. For example, std::vector’s begin and end methods will not be directly callable in Swift; instead, they will be transformed into a native Swift iterator/collection pattern, so the vector may be used in for-loops, etc. Iterators can be automatically imported using begin and end methods, or manually imported using annotations.

Owned types: owned types manage their storage using copy constructors and destructors. If they allocate storage, they must destroy it when the destructor is invoked, and copy the storage when the copy constructor is invoked. Their special members must not have side effects and copies/destroys must balance out. Examples of owned types include std::string and std::vector. Owned types must be annotated as such in order to be imported.

Foreign reference types (FRT): FRTs are a way to express non-Swift, non-Objective-C class types. FRTs must be reference counted or marked as “immortal” to maintain safety. FRTs must have reference semantics and object identity. FRTs must always be trafficked with at least one layer of indirection in C++ (i.e., FRTs must not be returned by value). FRTs must be annotated as such in order to be imported.

Unsafe escape hatch: If developers need to use other APIs, we can consider an unsafe escape hatch where the Swift compiler will do a “best-effort” import of the unsafe API. This will need to be developed further, and will likely be accompanied by warnings or an “unsafe” prefix.

These four patterns will provide a set of ergonomic, safe APIs that can be used to test and adopt interop. With this strategy, the bounds of interop will be well defined and every importable API will have a clear mapping. This will improve stability and usability greatly. Additionally, in terms of development, these goals will allow us to have clear objectives to work toward.
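
To make the iterator pattern above more concrete, here is a minimal sketch in pure Swift of what mapping a begin/end pair onto a native Sequence could look like. FakeVector, element(at:), and sumOfElements are hypothetical names invented for this illustration; this models the idea only and is not the importer's actual output.

```swift
// Hypothetical stand-in for an imported C++ container. This is NOT what the
// importer generates; it only models the begin/end-to-Sequence mapping.
struct FakeVector {
    private let storage: [Int32]

    init(_ elements: [Int32]) { storage = elements }

    // Models the raw C++-style API: positions are bare offsets that the
    // caller is trusted to keep within [begin, end).
    func begin() -> Int { 0 }
    func end() -> Int { storage.count }
    func element(at position: Int) -> Int32 { storage[position] }
}

extension FakeVector: Sequence {
    struct Iterator: IteratorProtocol {
        // The iterator holds its own copy of the container, so the storage
        // it walks stays alive for as long as the iterator does.
        private let base: FakeVector
        private var position: Int

        init(_ base: FakeVector) {
            self.base = base
            self.position = base.begin()
        }

        // Returns nil at the end instead of stepping past it.
        mutating func next() -> Int32? {
            guard position < base.end() else { return nil }
            defer { position += 1 }
            return base.element(at: position)
        }
    }

    func makeIterator() -> Iterator { Iterator(self) }
}

// Usage: the container now works with for-in and the Sequence algorithms.
func sumOfElements(_ vector: FakeVector) -> Int32 {
    var total: Int32 = 0
    for value in vector { total += value }
    return total
}
```

The design choice worth noting is that the Swift-side iterator owns what it iterates over, which is what removes the dangling-reference hazard, at the cost of whatever retaining or copying the container implies.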

Torust · June 3, 2022, 10:54pm

This approach and its focus on annotations seems targeted at a particular type of C++ code-base, where the developer has control over the source of the C++ library and the ability/inclination to change it. I'm concerned that this places a large number of APIs/libraries under the "unsafe escape hatch" category.

The approach that Swift takes with importing C code is that everything is available (with few exceptions such as complex macros), and everything is as safe or unsafe, and as performant or non-performant, to use from Swift as it is from C; effectively, everything falls under the "unsafe escape hatch" option by default, but without warnings or unsafe prefixes. There are annotations that can be added to make the APIs more ergonomic to use from Swift (e.g. nullability or NS_OPTIONS), and equally Swift wrappers can be written to provide a memory-safe and more ergonomic experience to other Swift code. Crucially, though, if I want to use a C library in a few places without taking on the work of maintaining it or a wrapper around it, I can do so.

Obviously, C++ is a much more complex language than C to import, and not everything maps neatly. However, as the ownership manifesto gets filled out, more and more C++ code should become expressible in Swift, and so the "best-effort" import should become more capable.

In effect, I disagree with this proposal's priorities, even if the end-state might be quite similar to what's proposed. The most important aspects of C++ interop to me are that C++ code is no less usable and no less performant to use from Swift than it would be from C++, and if I want to make it more usable (more like native Swift code) then that's when I can reach for annotations and wrappers.

Of course, there are ways the importer can help make things more ergonomic by default (e.g. maybe importing std::unique_ptr and std::shared_ptr as property wrappers), and we should definitely do that where there's no cost to expressivity or performance. Doing things like making std::vector's begin and end methods not directly callable, however, is IMO counterproductive; I think developers will reach for the most ergonomic API without needing to make less safe APIs unavailable.

0x41c · June 4, 2022, 5:29am

I agree with you about the priorities of the importer. There would definitely need to be some transparency in the importation of C++ APIs. As mentioned, there would be an increase in what can be expressed from C++ to Swift.

However, if I recall correctly, the mappings between the std::vector begin and end methods from C++ to Swift should be completely transparent to the programmer when proxied for safety. In essence, the programmer should be able to interact with the methods as if they were safe Swift functions without the need to worry about bounds checking, etc.

With that in mind, what you said about it being counterproductive is also true in the sense that there would need to be a lot of work put into implementing those proxies. It might be something that will come up in the workgroup meeting, but there's surely some halfway point between being unsafe and safe in Swift where this lands. I mean, utilizing any other programming language in interop with Swift will be inherently more unsafe.

zoecarver · June 4, 2022, 4:59pm

Adding an API notes file to another library shouldn't be too hard. To mark a few key types as foreign reference types (for example) will only be a few lines. This "price" will be made up quickly in the time saved using the better API and not fixing bugs.

It seems like your main concern is the fact that imports would no longer always be automatic. Is that correct? Either way, I think that's a valid concern, and we can brainstorm some strategies to mitigate it. That being said, please note that some of these safe patterns are automatically imported (trivial types and iterators).

Doing things like making std::vector's begin and end methods not directly callable, however, is IMO counterproductive; I think developers will reach for the most ergonomic API without needing to make less safe APIs unavailable.

Here are three uses of std::vector's begin method. Without trying it out, can you tell me which of the following snippets introduce a bug?

var v = vector(1)
let start = v.begin()
doSomething(start)
fixLifetime(v)

let v = vector(1)
let start = v.begin()
doSomething(start)

func findStart(ofVector v: vector) -> Iter { v.begin() }

The point I'm making here is that C++ APIs are actually more dangerous to use from Swift than C APIs, specifically when it comes to memory management/lifetime. In C everything is manually managed. That has a certain degree of unsafety. In C++ objects can be automatically managed in a subtly different way than Swift. These subtle differences make it very easy to write bugs in what looks like idiomatic Swift code.

The importer can easily map these unsafe iterators into something safe and ergonomic. My question to you is why is this counterproductive? Are there specific use cases for C++ iterators that cannot be used in Swift via these patterns?

Torust · June 4, 2022, 9:21pm

Annotations being in an apinotes file rather than Clang annotations in the source definitely alleviates some of my concerns, although I still wouldn’t want it to be mandatory (so keeping things as automatic as possible by default, like you say).

The second and possibly the third depending on surrounding code, I’m assuming, since Swift lifetime rules (especially before lexical lifetimes) allow the lifetimes of variables to be ended early. There are ways to make that safe in Swift (e.g. treating a call to C++ as being a barrier with lexical lifetimes, which may or may not be worth doing). Another way of modelling it is that semantically, the call to begin should be treated as calling a read accessor on the vector (immutable borrow), and thus the result’s lifetime is tied to the vector – I’m not sure whether that’s intended to be supported with new ownership features, but annotating functions as returning a borrowed pointer would be useful and another way of handling this if it were.
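
As a point of comparison, and not something proposed in this thread, Swift's standard library already ties a pointer's validity to its owner in one place: Array.withUnsafeBufferPointer only guarantees the pointer inside the closure body. The sketch below uses that existing API; the helper name offsetOfFirst is hypothetical and exists only for illustration.

```swift
// Hypothetical helper: searches the array's contiguous storage through a raw
// buffer pointer, roughly what a begin()/end() pair would be used for in C++.
func offsetOfFirst(_ needle: Int32, in values: [Int32]) -> Int? {
    // The pointer is only valid inside this closure; the array is kept alive
    // and unmodified for the duration of the call.
    values.withUnsafeBufferPointer { buffer -> Int? in
        for offset in 0..<buffer.count where buffer[offset] == needle {
            return offset
        }
        // Only a plain offset (or nil) leaves the closure, never the pointer,
        // so nothing that refers to the storage can outlive it.
        return nil
    }
}
```

The closure scope plays the role of the borrow: the projection into the storage cannot be used once the access to the owner has ended, provided the pointer itself is not smuggled out.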

I’m not arguing that it’s counterproductive to import these in a safe way for Swift as a convenience; I think that’s definitely worthwhile. However, let’s say that for some reason you wanted to call a method like std::find from Swift; with this proposal, you no longer have access to begin() and end() so can’t do so (without data() which has the same problem). You can contrive other situations where you also need the iterators when calling C++ code; for example, pass begin() to an initial call to set up state then repeatedly pass in end() to a function that iterates on that state. If you’re hiding the tools to do so you’re making things unnecessarily difficult for the programmer in the name of helping them or protecting them from themselves, and that’s what I’m arguing is counterproductive.

zoecarver · June 5, 2022, 7:25pm

In all of the examples, start will be dangling. The third example doesn't work in C++ either. No matter how many ownership features we add, we can't get the last one to work: it's a matter of semantics.

I hope this shows just how dangerous these projections of owned storage are in Swift. I have seen people who work on the Swift compiler write these bugs a few times (myself included). I like to think the Swift compiler developers are some of the best in the world, so if they are making these mistakes, anyone can. Swift is seen as a safe language, so I really feel that this kind of implicitly unsafe behavior is unacceptable and will burn our users.

There are ways to make that safe in Swift (e.g. treating a call to C++ as being a barrier with lexical lifetimes, which may or may not be worth doing). Another way of modelling it is that semantically, the call to begin should be treated as calling a read accessor on the vector (immutable borrow), and thus the result’s lifetime is tied to the vector – I’m not sure whether that’s intended to be supported with new ownership features, but annotating functions as returning a borrowed pointer would be useful and another way of handling this if it were.

That's a good point, and something we considered. The problem is that the ownership features we need aren't quite implemented yet, so there's not a great way to model this today. In terms of lexical lifetimes, this starts to fall apart when working with generic code. And even if both were implemented/worked well, it would require quite a few annotations (in contrast to the proposed solution which imports iterators/sequences/collections automatically).

However, let’s say that for some reason you wanted to call a method like std::find from Swift; with this proposal, you no longer have access to begin() and end() so can’t do so

I think we could fairly easily bridge back to an unsafe C++ iterator when making calls to C++ functions (and we could even validate the iterator in Swift before passing it off). But I want to leave the details of this for the eventual in-depth proposal outlining how iterator patterns could be imported.

Torust · June 5, 2022, 10:28pm

I think that's a little disingenuous – for the third example, a wrapper function around begin() is no less safe than calling begin() directly, and "works" so long as the vector is kept alive/not mutated. (My Rust is very limited so I'm speaking out of my depth here, but I believe you could express 3. safely by indicating the returned value has the same lifetime as the passed-in value; that is, there are ownership features that could enable that pattern to be safe, although whether Swift would ever support them is a different matter). I admit to being at a loss why the first example is unsafe given the fixLifetime call, unless doSomething is somehow holding onto the passed-in iterator.

I think your comments are useful to illustrate the different perspectives here, though:

This helped clarify where I think the disconnect is. My expectation is that when I'm writing code that uses C++ APIs from Swift I should be thinking like I'm writing C++, similar to how when I'm writing code that uses C APIs from Swift I'm thinking like I'm writing C. That means eliminating the early-lifetime gotchas that Swift currently has with RAII types, but not many other changes.

On the other hand, your goal seems to be that you should be able to treat using C++ APIs from Swift like writing any other Swift code. To an extent, I think that's a worthy goal. I guess I'm just worried that that won't be achievable at scale (that as soon as you're doing something unusual you're back to writing C++ in Swift) and that it may introduce performance issues (for things like game engines sometimes you really do want -Ounchecked in release, and you avoid Swift reference counting wherever possible). How successful what you're trying to do will end up being will depend on how often users fall off the happy path and end up having to write unsafe code; I see that as being pretty often, but maybe (hopefully) I'm wrong.

As a side point, your examples so far seem fairly heavily focused on the C++ standard library. My expectation would be that for my use cases, the type of C++ code I'd predominantly be calling into (e.g. things like ImGui or PhysX, both of which I currently use C wrappers around) doesn't use a lot of standard library types or patterns. So long as those types of libraries are ergonomic to use (without warnings and unsafe prefixes everywhere) and don't have additional overhead compared to C++, I'll be happy.

Karl · June 5, 2022, 10:58pm

I agree with @Torust.

This comes up every time we want to interop with another language, and I think the idea that we've generally gone with is to not try to make a better X than X - don't make a better Obj-C than Obj-C, or a better Python than Python, or a better C++ than C++. Expose them as they are, and rely on developers to wrap them in things more appropriate for consumption by code which expects Swift-like semantics.

It's a difficult balance, because of course the goal is that libraries written in C++ are ultimately used like native Swift libraries. Still, it feels like perhaps a bit too much if, in order to import a C++ API, you must now deal with ARC and bounds-checking (as though this were a Swift API such as Array). That feels like something that should be done by higher-level wrappers.

zoecarver · June 6, 2022, 2:10am

This helped clarify where I think the disconnect is.

Yes, I think this is it. Thank you for calling this out :)

My expectation is that when I'm writing code that uses C++ APIs from Swift I should be thinking like I'm writing C++, similar to how when I'm writing code that uses C APIs from Swift I'm thinking like I'm writing C. That means eliminating the early-lifetime gotchas that Swift currently has with RAII types, but not many other changes.

Ignoring whether this is something that would be beneficial to do, I am not sure it's possible.

On the other hand, your goal seems to be that you should be able to treat using C++ APIs from Swift like writing any other Swift code. To an extent, I think that's a worthy goal. I guess I'm just worried that that won't be achievable at scale (that as soon as you're doing something unusual you're back to writing C++ in Swift) and that it may introduce performance issues (for things like game engines sometimes you really do want -Ounchecked in release, and you avoid Swift reference counting wherever possible). How successful what you're trying to do will end up being will depend on how often users fall off the happy path and end up having to write unsafe code; I see that as being pretty often, but maybe (hopefully) I'm wrong.

I think this is a good summation of our disagreement. I think we both understand the other's argument, and just disagree on whether this will be implementable, how often users will fall off the "happy path," and the balance of annotations vs automatic imports.

For the second point, I would be genuinely interested to see some pseudo-code/snippets of Swift where you want to use an example API that doesn't fit into one of these patterns. If nothing else, that will help us find what patterns to tackle next.

I think that's a little disingenuous – for the third example, a wrapper function around begin() is no less safe than calling begin() directly, and "works" so long as the vector is kept alive/not mutated.

That is not correct. I think this is highlighting my point very well, actually (especially considering that this is the only example we couldn't fix with all the ownership features and lifetime rules under the sun). The result of the call to findStart is always a dangling pointer. There is no valid way to use the result of findStart.

I don't think it's super relevant to the discussion, but I will explain why each of these produces a dangling reference, because it's kind of interesting if nothing else. In reverse order:

func findStart(ofVector v: vector) -> Iter { v.begin() }

As I mentioned before, this will also produce a dangling reference in C++. What's happening here is the vector is being copied from the caller into v. The copy allocates some new storage. At the end of the function, v is destroyed (and the storage is deallocated), but the iterator/pointer is still the same (referencing the deallocated storage) so the caller gets back a dangling reference even if the argument is still alive. This could be fixed with v: inout vector (as that would remove the copy).

let v = vector(1)
let start = v.begin()
doSomething(start)

This one is pretty straightforward: Swift has different lifetime rules than C++, so v might be destroyed before start's last use. This could be fixed if we somehow associated the two lifetimes. (Or maybe with lexical lifetimes?)

var v = vector(1)
let start = v.begin()
doSomething(start)
fixLifetime(v)

Finally, this one is the most complicated. Here the issue is that we are doing an "lvalue to rvalue" conversion or a "mutable to immutable" conversion. What this means is we need to convert the mutable var v into something immutable, which we do with a copy. You might be asking, "Why do we need a copy? It's never mutated?" I will explain the answer with the following example:

struct S {
  func someMethod(closure: () -> Void) {
    closure()
    // ERROR (if self were not copied): exclusivity violation
  }

  mutating func mutate() {}
}

var s = S()
s.someMethod { s.mutate() }

The closure captures a mutable self and we have access inside the method to the immutable self which means we have two references that alias: one mutable and one not. This violates exclusivity. To fix it, self is copied before each non-mutating method call (if self is mutable).
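
The copy described above can be observed from plain Swift. The following is a small sketch with hypothetical names (Counter, observe, demonstrateCopy) that records what a non-mutating method sees of self before and after a closure mutates the original variable.

```swift
struct Counter {
    var value = 0

    // Non-mutating: `self` here is the caller's value as it was when the
    // call started, not a live view of the caller's variable.
    func observe(around closure: () -> Void) -> (before: Int, after: Int) {
        let before = value
        closure()
        let after = value
        return (before, after)
    }

    mutating func increment() { value += 1 }
}

func demonstrateCopy() -> (before: Int, after: Int, stored: Int) {
    var counter = Counter()
    // The closure mutates `counter` while `observe` is still running.
    let seen = counter.observe { counter.increment() }
    return (seen.before, seen.after, counter.value)
}
```

Inside observe, value reads 0 both before and after the closure runs, while the caller's variable ends up at 1: the method was working on a copy of self. For a C++ vector, that same copy means an iterator obtained through a non-mutating call can point into the copy's storage rather than the original's.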

zoecarver · June 6, 2022, 2:18am

This comes up every time we want to interop with another language, and I think the idea that we've generally gone with is to not try to make a better X than X - don't make a better Obj-C than Obj-C, or a better Python than Python, or a better C++ than C++. Expose them as they are, and rely on developers to wrap them in things more appropriate for consumption by code which expects Swift-like semantics.

While I agree it's a good idea to reflect on previous initiatives, I don't think "because this is how we did it last time" is a good reason to implement C++ interop one way or another. (Also, please note that we were implementing interop this way until we realized it wasn't the most viable path forward.)

Torust · June 6, 2022, 2:22am

This is getting off the main topic, but is it a possibility to treat this as (pseudo-code):

std::iter<T> findStart(const std::vector<T>& vector)

rather than passing a value copy? That's what I assumed when I read the function in Swift; that passing a parameter in Swift is an immutable borrow by default, and there's only semantically a copy when the value is assigned (so e.g. if there were a let vector = vector within the findStart function). I now realise that's not the case, but would it be possible to make that the actual behaviour – in other words, make shared rather than owned (using the terms from the ownership manifesto) the default for C++ types? The downside is that might lead to inconsistent behaviour; with that said, I'm having a hard time thinking of examples where there's an observable difference between the two for Swift-native code.

For the exclusivity conflict, I'd argue it's also somewhere where the implicit copy is undesirable and so the user should be required to explicitly make a copy; i.e. it'd have to be this for the compiler not to error:

var s = S()
let sCopy = s
sCopy.someMethod { s.mutate() }

I realise that's a source break, but it's maybe worth considering for Swift 6; the current behaviour seems actively harmful.

Jumhyn · June 6, 2022, 3:05am

Of course, Objective-C did also get features which allowed folks to express more ‘Swifty’ idioms in Objective-C directly, such as nullability annotations and lightweight collection generics.

Karl · June 6, 2022, 12:11pm

Right - that's why I say there is a balance. I suppose the goal is that C++ libraries such as LLVM, WebKit, V8, etc are usable from Swift.

But we should also be careful about trying to do too much. Obviously we need to provide enough that imported interfaces can actually be used, and you make a good case that perhaps there is no reasonable alternative for iterators, but in general, this:

I'm not sure that "feel native in Swift" should be the goal. I'm not sure they'll ever be as good as even a basic wrapper, and I don't think that should bother us at all - I think wrappers are ultimately what we should be encouraging. The importer is the first step/lower-level building block which you use to build something that feels native, IMO.

I don't object to the importer providing a reasonable interface, of course :) but I don't think it should make too much effort to make the Swiftiest API it can, if that makes sense.

mboehme · June 8, 2022, 6:42am

This comment piqued my interest. IIUC, API Notes currently only work for C constructs (no support for C++ classes, namespaces, overloads etc.). Does this mean you're planning to extend API Notes to support C++?

plotfi · June 8, 2022, 6:44am

@mboehme I was thinking the same thing. Thanks for asking.

@zoecarver when you mention apinotes do you mean extending apinotes itself to express what libraries are safe for c++-interop consumption or do you mean something new that is similar to an apinote that is yet to be created?

zoecarver · June 8, 2022, 3:58pm

Yes, probably. We can sync up about this because I know you might be doing some similar work for lifetime annotations.

mboehme · June 9, 2022, 6:48am

Yes, that's why I'm interested.

Great to hear that you're thinking of working on this. And I would definitely be interested in syncing on this. I'll reach out via PM.

rwessel · February 14, 2023, 2:31pm

zoecarver:

func findStart(ofVector v: vector) -> Iter { v.begin() }

As I mentioned before, this will also produce a dangling reference in C++. What's happening here is the vector is being copied from the caller into v. The copy allocates some new storage. At the end of the function, v is destroyed (and the storage is deallocated), but the iterator/pointer is still the same (referencing the deallocated storage) so the caller gets back a dangling reference even if the argument is still alive. This could be fixed with v: inout vector (as that would remove the copy).

I'm new to this forum, so apologies in advance if I'm simply misunderstanding, but this thread – and this comment in particular – seems to raise a lot of important questions.

The question about the function findStart in Swift is unanswerable without further information. It depends on whether 'vector' is a value type or a reference type, i.e. you can only tell by looking up the type (or its documentation). The function will only make a copy if vector is a value type, but even then the answer is still ambiguous. A Swift Array is a pseudo value type: it presents itself as a value type but is actually reference-backed. A copy of the array is only made if and when the function attempts to mutate the array.

This raises an important issue for interoperability with C++. A C++ programmer would typically pass data of any notable size (e.g. larger than a pointer) as a reference, and (if the data is supposed to be immutable) as a const reference. C++ is often used where resources cannot be assumed to be (practically) limitless - management matters, i.e. don't make casual copies. If all C++ objects are presented to Swift as value types that can only be passed by copying, it makes a critical difference. Is that the current intention of this project?

C++ offers many choices for how parameters are passed, e.g. by reference or value and also (critically) whether an object passed by reference is mutable. Swift is very limited in this respect - a value parameter is immutable (unless specified as inout) and a reference is always mutable. There seems to be a fundamental problem here - if objects from C++ are always passed by value, it could result in a lot of redundant/expensive copying and many types cannot be supported, e.g. any use of unique_ptr. But conversely, passing by reference makes all data mutable, which opens another can of worms. Is there a plan for dealing with this dilemma? And coming back to the iterator problem, will we be able to support the concept of const_iterator vs (mutable) iterator?

It's not often a sound idea to pass around an iterator detached from the container (in C++). Iterators are used in very narrow (fleeting) contexts, usually within an algorithm with clear bounds (begin/end), and never passed around as if they are both autonomous and long-lived. At best, all you can do is reference the value it points to, i.e. you can't use it to iterate. It would be simpler/easier/better to dereference the iterator to pass the item value or reference to the function.

Should we be considering wrappers around C++ iterators (or other features) to make them 'safe' for uses that are not even sound practice in C++?
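
For readers less familiar with the copy-on-write behaviour of Swift's Array mentioned above, here is a minimal sketch that makes it visible by comparing storage addresses. The helper names storageAddress and demonstrateCopyOnWrite are hypothetical, and the buffer sharing it observes is an implementation detail of the standard library, so treat this as an illustration rather than a guarantee.

```swift
// Returns the address of the array's element storage as an integer,
// used for comparison only (the pointer itself never leaves the closure).
func storageAddress(of array: [Int]) -> Int {
    array.withUnsafeBufferPointer { Int(bitPattern: $0.baseAddress) }
}

func demonstrateCopyOnWrite() -> (sharedBeforeWrite: Bool,
                                  sharedAfterWrite: Bool,
                                  original: [Int],
                                  copy: [Int]) {
    let original = [1, 2, 3]
    var copy = original   // no element copy yet: both values use one buffer
    let sharedBeforeWrite = storageAddress(of: original) == storageAddress(of: copy)

    copy.append(4)        // the first mutation of shared storage triggers the copy
    let sharedAfterWrite = storageAddress(of: original) == storageAddress(of: copy)

    return (sharedBeforeWrite, sharedAfterWrite, original, copy)
}
```

The two arrays share storage until the append, after which they diverge and the original is left untouched. A C++ std::vector has no such deferral: its copy constructor copies the elements eagerly, which is why the by-value parameter in findStart matters so much more for an imported vector than for a Swift Array.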