Brainstorming how to import base classes for C++ interop

zoecarver · June 18, 2021, 4:03pm

I hoped that we could wait longer to think about this, but unfortunately, the time is now. It is hard to continue to make progress with C++ interop while we have such a glaring issue in front of us.

Essentially, the problem is that we currently don't import any members of base classes. This includes special members, meaning we get linker errors when trying to use a class with a base class that has special members (such as a custom copy constructor or destructor). You can see an example of this if you try to use std::vector in Swift code.

The best solution for this problem should be clear: import both the parent and child classes, then have the child inherit from the parent. Unfortunately, this is not so simple. C++ classes are always imported as Swift structs, which currently do not support inheritance from other Swift structs. Supporting that would be a big change for the language, one that we should arguably make but that, regardless, will take time to discuss and even more time to implement. We cannot have C++ interop be blocked on this, so we need to look for other solutions, at least for the time being.

Another solution would be to use protocols with default implementations to provide parent classes and their members. During IRGen, these protocols could be magically turned into concrete types, like structs. This would be a hack and would take some work, so I'm not too excited about it. But if there is interest, I can elaborate on my thinking a bit more.

Another solution is to flatten child classes into a single class. All the parent members would then be added to the child class, and it would appear as though there was no parent. I like this solution a lot. It seems like the easiest way to get something workable, and from a user's point of view, there wouldn't be much difference between this and "the best solution."

Here's an example of what importing a parent and child class would look like:

// C++
struct A {
	void test1();
};
struct B : A {
	void test2();
};

// Imported as
struct A {
	func test1()
}
struct B {
	func test1()
	func test2()
}

The biggest problem here is that the Swift compiler would view child and parent classes as completely different, unrelated types. This means it would not be easy to upcast or downcast instances of these classes. However, I think there may be a fairly simple solution to this as well. When importing these types, it would not be too hard to add some convenience methods to help with casting. For example:

// C++
struct A {
	void test1();
};
struct B : A {
	void test2();
};

// Imported as
struct A {
	func test1()
	func asB() -> B?
}
struct B {
	func test1()
	func test2()
	func asA() -> A // alternatively "asParent()"
}
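
For comparison, here is a minimal C++ sketch of the two casts that asA and asB would stand in for. It is only an illustration, not how the importer would implement them: the free functions asA and asB are hypothetical pointer-based analogs, and the virtual destructor on A is an assumption added here (the example above has none) because dynamic_cast requires a polymorphic type.

```cpp
// Assumption: A is given a virtual destructor so that dynamic_cast is legal.
struct A {
    virtual ~A() = default;
    int test1() { return 1; }
};
struct B : A {
    int test2() { return 2; }
};

// Upcast: always succeeds, so a non-optional result is enough.
A *asA(B *b) { return b; }

// Downcast: can fail, so the result is nullable (the analog of `B?`).
B *asB(A *a) { return dynamic_cast<B *>(a); }
```

The asymmetry in the signatures mirrors the Swift sketch: the upcast returns a plain A, while the downcast has to be able to report failure.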

If anyone has thoughts about these possible implementations or has an idea for a better implementation, I'd love to hear them.

zoecarver · June 18, 2021, 4:04pm

CC @compnerd, @plotfi, and @egor.zhdan.

Ceylo · June 18, 2021, 5:42pm

What is the reason for importing C++ struct/class as a Swift struct rather than a Swift class? It looks like this would allow inheritance without doing weird things.

I didn’t follow previous steps of C++ interop so I hope it doesn’t sound dumb.

plotfi · June 18, 2021, 7:20pm

@Ceylo I am not as much of an expert on this as Zoe and others who have been working on and with C++ interop for far longer, but my sense is that structs, being value types, map better to the C and C++ semantics of pass by value (where inouts or UnsafePointers can be used to model C++ pass by reference). Also, I think there is a good mapping between C++'s ctor, cctor, dtor, and move operations and Swift's Value Witness Tables, which are used for handling the passing, assigning, and returning of values. @zoecarver what do you think about this?

jonprescott · June 18, 2021, 7:38pm

To follow up: out of the box, C++ classes and C++ structs are essentially the same; the only difference is the default access control. So, setting inheritance aside, Swift structs are a better match for C++ classes. Swift classes are reference-counted data structures. Reference counting can be built in C++, but it is not inherent to the language: I can build a reference-counted class (or a reference-counted struct), but I have to build it by hand. On the other hand, both C++ classes and C++ structs support inheritance, polymorphism, etc., which is where they overlap with Swift classes.
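
A small sketch of that point, using made-up type names (PointS, PointC, Shape, Circle, and describe are all hypothetical): the two keywords differ only in their default access, and both support inheritance and virtual dispatch.

```cpp
#include <string>

// `struct`: members and base classes are public by default.
struct PointS {
    int x = 1;
};

// `class`: members and base classes are private by default.
class PointC {
    int x = 2;  // private unless stated otherwise
public:
    int getX() const { return x; }
};

// Inheritance and polymorphism work with either keyword.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "Shape"; }
};

class Circle : public Shape {  // `public` must be spelled out for a class
public:
    std::string name() const override { return "Circle"; }
};

// The call dispatches on the dynamic type of the argument.
std::string describe(const Shape &s) { return s.name(); }
```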

masters3d · June 18, 2021, 8:36pm

@Erik_Eckstein seems to be manually bridging C++ structs into Swift final classes in libswift. I wonder if this is something future C++ interop should do by default?

swift/libswift at main · apple/swift (github.com)

Libswift is enabled - Development / Compiler - Swift Forums

zoecarver · June 18, 2021, 10:36pm

I agree with Puyan. We currently do, should, and will almost certainly continue to map C++ classes and structs to Swift structs (except potentially in a few specific cases, such as types that are specifically marked as reference-counted).

I started writing out the reasons for this, but I think the C++ Interop manifesto does a much better job than whatever I would have come up with on the fly. Here's a link to the relevant section: https://github.com/apple/swift/blob/main/docs/CppInteroperabilityManifesto.md#structs-and-classes

Moreover, we shouldn't use reference types here just because Swift has an artificial limitation on value types, which map better. We should simply remove the limitation, or work around it for the time being. Even inheritance isn't worth sacrificing the value-type semantics that map so nicely with Swift structs.

With regard to libswift, I think there might be a bit of confusion. Libswift isn't importing the types you're talking about; those are part of a new API written in Swift. The Clang Importer is used for this API, but it is only providing the implementation details. Most, if not all of the API that is actually used (to create optimization passes, for example) is actually plain old Swift code. That Swift code just happens to be calling imported functions.

It looks like Erik chose to use final classes in certain places (in other places, structs are used, too). That's most likely because a reference type was best suited for that particular type/API interface. This is a decision that can be made when writing an API, but not when importing it. The existing C++ APIs, the ones we might import with C++ interop, are expecting to be used as value types (or maybe pointers/references to value types), but not as reference-counted types.

Hope this clears it up a bit.

fraserjane · June 19, 2021, 6:56am

I’ve contributed to the gir2swift project, which faces similar challenges in that it imports an external class hierarchy. Gir2swift handles this by using protocols with default implementations, which are flagged as inlinable. This works quite well for the consumer of the library, since you can declare a variable as a member of a base class and assign objects of a derived class. I think a similar approach would work well here.

George · June 19, 2021, 10:39pm

One issue with the asB/asA implementation is that as? won't work properly, meaning that if you have an Any or a SomeProtocol you could only cast it to the leaf class. I'm not sure this is a deal-breaker, but it deserves calling out (I'm currently dealing with this while binding a C++ project through C).

One thing I'm curious about, regarding struct inheritance being too big of a blocker: do you mean general struct inheritance (i.e. I could write struct Bar { }; struct Foo: Bar { }), or is limited inheritance (for instance, only for the C++ bridge) an endeavor of roughly equal complexity?

egor.zhdan · June 20, 2021, 3:23pm

Another problem with this approach is that Swift doesn't support multiple inheritance, so we won't always be able to map C++ inheritance to Swift inheritance even if/when Swift supports struct inheritance. Whatever solution we use now will probably have to stay at least for this particular case.

At this point we could stop calling them protocols, and introduce some kind of cxxStruct keyword (now that there is a similar precedent with the added actor keyword). This would probably be cleaner from users' point of view (no need to think about why some protocols can be instantiated without a concrete type) and perhaps more convenient for compiler/tooling development (would help to avoid lots of if(protocolDecl.notActuallyAProtocolButACxxStruct)).
In either case, it feels to me like too big a change to the language.

This solution avoids the multiple inheritance issue, and (probably) doesn't require any dramatic changes to anything other than the ClangImporter. I'm personally in favor of this solution.

However, @George is definitely right that this brings some noticeable inconvenience to casting.
Another aspect of this is that code written in extension CxxBase {...} won't be directly available for inheritors of CxxBase.

We'll probably also need some kind of mechanism for downcasting C++ objects, for example func cxxDowncast<Source, Destination> described in the C++ interop manifesto.

George · June 20, 2021, 7:59pm

One additional complication is projects like MLIR that build without run-time type information (-fno-rtti). In this mode, there would effectively be "un-castable" types, because you could have a C++-backed concrete type Foo for which let a: Any = …; let b = a as? Foo would be nonsensical. I'm not sure if it is possible, but a situation where some imported libraries have RTTI and others do not would complicate this further.
This situation makes me think of compile-time-only ("marker") protocols like Sendable, which have some similar restrictions (namely, you can't cast to them). So if we use compile-time-only structs to represent classes (and they are flattened), we can have an optional side-tree of protocols depending on whether or not we have RTTI. Of course C++ makes it even more complicated for us, since even without RTTI you can downcast to an unambiguous parent class.
All this is in support of @egor.zhdan's observation that we may need a new concept (cxxStruct) which is compile-time-only (like marker protocols). It won't need to support inheritance (because you can't cast to it), so just flattening types and providing asFoo() methods should be enough to make it usable. If we have RTTI, we can add a tree of protocols along the side which could be cast to and used to model multiple inheritance.
It may be a little awkward to have two concepts (the protocol and the cxxStruct), but down the line we may be able to add sugar along the lines of 299, or something like #cxxImplementation(of: Protocol).

compnerd · June 24, 2021, 4:15am

I don't see how this model works for interesting cases of C++, and I'm worried that this may lead us down a path where we cannot model some of the inheritance which may be important.

struct S { };
struct T: S { };
struct U: S { };
void f(S *s) {
  puts(typeid(*s).name());
}

The RTTI information is associated with the type, which is lost if you compress the type hierarchy. How would that work for the derived types in Swift?
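
One C++ detail matters here: typeid only reports the dynamic type when the static type is polymorphic. As written above, S has no virtual functions, so typeid(*s) would name S no matter what s points to. The sketch below contrasts the two cases. PS, PT, seesDerived, and seesDerivedPoly are hypothetical names used only for this illustration.

```cpp
#include <typeinfo>

// Non-polymorphic hierarchy, as in the snippet above.
struct S {};
struct T : S {};

// Polymorphic hierarchy: a single virtual function is enough.
struct PS {
    virtual ~PS() = default;
};
struct PT : PS {};

// S is not polymorphic, so typeid(*s) is resolved statically to S.
bool seesDerived(S *s) { return typeid(*s) == typeid(T); }

// PS is polymorphic, so typeid(*s) consults the object's dynamic type.
bool seesDerivedPoly(PS *s) { return typeid(*s) == typeid(PT); }
```

So the question of preserving type identity across a flattened hierarchy mainly concerns classes with at least one virtual function.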

struct S {
  virtual void f() = 0;
};
struct T: S {
  void f() override { puts("T::f"); }
};
struct U: S {
  void f() override { puts("U::f"); }
};
struct V: T, U { };
void f(V *v) {
  static_cast<T *>(v)->f();
  static_cast<U *>(v)->f();
}

f refers to an indeterminate value technically, and needs to be explicitly qualified to be identified. If you compress the type hierarchy, how does the reference get resolved?
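
To make the ambiguity concrete, here is a self-contained variant of the hierarchy above. It is a sketch: f returns a string instead of printing so the result can be checked, a virtual destructor is added to S, and callViaT and callViaU are hypothetical helper names. V contains two separate S subobjects, so an unqualified v->f() does not compile; the caller has to pick a path first.

```cpp
#include <string>

struct S {
    virtual ~S() = default;
    virtual std::string f() = 0;
};
struct T : S {
    std::string f() override { return "T::f"; }
};
struct U : S {
    std::string f() override { return "U::f"; }
};

// Non-virtual inheritance: V holds one S inside T and another inside U.
struct V : T, U {};

// Each cast selects one path through the hierarchy.
std::string callViaT(V *v) { return static_cast<T *>(v)->f(); }
std::string callViaU(V *v) { return static_cast<U *>(v)->f(); }

// `v->f()` is rejected by the compiler: member lookup finds f in both T and U.
```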

struct S {
  virtual void f() { puts("S::f"); }
};
struct T: S {
  void f() override { puts("T::f"); }
};
struct U: S {
  void f() override { puts("U::f"); }
};
struct V: T, U {
  using S::f;
};

How does V::f get resolved when things are flattened?

xwu · July 6, 2021, 6:01am

Given that Swift won’t support multiple inheritance even if structs eventually support subtyping relationships, as mentioned above, it seems pretty clear to me that C++ class hierarchies are best imported as Swift protocol hierarchies.

It’s been some time since I’ve thought about C++, but it seems possible for abstract classes to be bridged exclusively as protocols, final classes to be bridged exclusively as structs, and any other class Base that is or could be inherited from to be bridged as both protocol BaseProtocol and struct Base: BaseProtocol.

With respect to asA or asB methods, this strikes me as inconsistent with the direction of Swift, since the language abandoned such spellings as asInt or asString in favor of initializers.

But we don’t need to stick to initializers in this case; how Obj-C types are bridged offers a helpful precedent, I think: Types that conform to _ObjectiveCBridgeable (or whatever it’s been renamed to) get custom treatment with the as operator.

It would be of-a-kind and a legitimately scoped addition to have something similar for C++ bridged types so that users can use idiomatic Swift to upcast or downcast those bridged types—indeed, further consideration of how to generalize _ObjectiveCBridgeable to other languages was the reason that the Swift Evolution proposal about the protocol was deferred in SE-0058.

(The design, however, might be a little tricky as a corresponding _CxxBridgeable protocol itself can’t use an associated type for the C++ base class and may have to be just a marker protocol, since any type can conform to a protocol in only one way, and subclasses need to be convertible to any of their parent classes.)

If the suggestion above is adopted to import a C++ class Base as both a struct Base and a protocol BaseProtocol, then subclasses naturally get Derived() as BaseProtocol functionality without requiring any additional work in the compiler, of course. However, it may be clunky to import all APIs in terms of existential types, and some form of _CxxBridgeable would make possible the use of Derived() as Base.

beccadax · July 7, 2021, 2:43am

Would it be better to invert this approach?

I'm hardly an expert on the C++ ABI, but if I recall correctly, in the absence of virtual base classes, each base class is laid out in the derived class identically to a data member of the same type. (There might be an edge case around empty classes with no vtable, but this seems like something you could work around.) If so, then perhaps we should import base classes as stored properties of the derived class, and then mirror their members onto the derived class so they can be called as though they were inherited:
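
The layout claim can be probed from C++ itself. The sketch below uses hypothetical names (baseOffset, recover, and a C that derives from A and B) and assumes non-virtual inheritance. It measures where a base subobject sits inside the derived object and shows that static_cast undoes the same offset on the way back down.

```cpp
#include <cstddef>

struct A { int a = 1; };
struct B { int b = 2; };

// Non-virtual multiple inheritance: the A and B subobjects live inside C.
struct C : A, B { int c = 3; };

// Byte offset of the Base subobject within a Derived object.
template <typename Base, typename Derived>
std::ptrdiff_t baseOffset(Derived *d) {
    Base *base = static_cast<Base *>(d);  // the compiler applies the offset here
    return reinterpret_cast<char *>(base) - reinterpret_cast<char *>(d);
}

// Downcast: static_cast subtracts the same offset again.
// Only valid when `b` really points at the B subobject of a C.
C *recover(B *b) { return static_cast<C *>(b); }
```

The exact offsets are up to the ABI, so the sketch only relies on the two base subobjects being at different addresses and on the round trip returning the original pointer.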

// C++
struct A {
	void test1();
};
struct B : A {
	void test2();
};
A *useAndReturnItsPointer(A &a);

// Imported as
struct A {
	func test1()
}
struct B {
	// Stored properties and real members:
	var asA: A
	func test2()

	// Mirrored from asA:
	func test1()
}
func useAndReturnItsPointer(_: inout A) -> UnsafeMutablePointer<A>

// Ordinary use:
var b = B()
b.test2() 		// As normal
b.test1() 		// Call emitted as though you wrote `b.asA.test1()`

In this model, you would "upcast" by simply accessing asA, which seems convenient enough, and I think you could downcast with a helper function:

// Upcasting via property
var b = B()
useAndReturnItsPointer(&b.asA)

// Downcasting via helper function
withUnsafeCxxDowncast(of: useAndReturnItsPointer(&b.asA), via: \B.asA) { recoveredB in
	recoveredB.test2()
}

// Implementation in the standard library
@available(macOS 9999, *)
public func withUnsafeCxxDowncast<Base, Derived, Result>(of baseThis: UnsafeMutablePointer<Base>, via upcastProperty: WritableKeyPath<Derived, Base>, _ body: (inout Derived) throws -> Result) rethrows -> Result {
	// We'd have to update `MemoryLayout.offset(of:)` to support applying `this`
	// offsets from C++ vtables.
	let derivedThisRaw = UnsafeMutableRawPointer(baseThis) - MemoryLayout<Derived>.offset(of: upcastProperty)!

	// `assumingMemoryBound(to:)` is correct here because we are assuming
	// that `baseThis` is an instance of `Base` inside `Derived`, i.e., the surrounding
	// memory is already bound to `Derived`.
	return try body(&derivedThisRaw.assumingMemoryBound(to: Derived.self).pointee)
}

I think this could be made to work reasonably even with multiple inheritance, and even in fairly tricky cases:

// C++
struct GrandparentA { void gpA(); };
struct GrandparentB { void gpB(); };
struct GrandparentC { void gpC(); };

struct ParentA: public GrandparentA, public GrandparentB, public GrandparentC { void pA(); };
struct ParentB: public GrandparentB, public GrandparentC { void pB(); };

struct A: public ParentA, public ParentB, public GrandparentC { void a(); };

// Imported as:
struct GrandparentA {
	func gpA()
}
struct GrandparentB {
	func gpB()
}
struct GrandparentC {
	func gpC()
}

struct ParentA {
	// Stored properties and real members:
	var asGrandparentA: GrandparentA
	var asGrandparentB: GrandparentB
	var asGrandparentC: GrandparentC
	func pA()

	// Mirrored from asGrandparentA:
	func gpA() 

	// Mirrored from asGrandparentB:
	func gpB()

	// Mirrored from asGrandparentC:
	func gpC()
}

struct ParentB {
	// Stored properties and real members:
	var asGrandparentB: GrandparentB
	var asGrandparentC: GrandparentC
	func pB()

	// Mirrored from asGrandparentB:
	func gpB()

	// Mirrored from asGrandparentC:
	func gpC()
}

struct A {
	// Stored properties and real members:
	var asParentA: ParentA
	var asParentB: ParentB
	var asGrandparentC: GrandparentC
	func a()

	// Mirrored from asParentA:
	var asGrandparentA: GrandparentA
	func pA()
	func gpA() 

	// Mirrored from asParentB:
	func pB()

	// *Not* mirrored because both ParentA and ParentB inherit from GrandparentB:
	@available(*, unavailable, message: "use 'asParentA.asGrandparentB' or 'asParentB.asGrandparentB' instead")
	var asGrandparentB: GrandparentB
	@available(*, unavailable, message: "use 'asParentA.gpB()' or 'asParentB.gpB()' instead")
	func gpB()

	// Mirrored from asGrandparentC (even though it's also visible via
	// asParentA and asParentB, the more direct inheritance "wins"):
	func gpC()
}

However, virtual bases seem harder since an instance of e.g. ParentA embedded in A has a different size and layout from an instance of ParentA standing alone. Maybe a sufficiently clever copying operation could make it work, though—I don't know what you have planned for that.
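
A short sketch of why virtual bases break the "base as stored property" picture, using hypothetical names throughout. With virtual inheritance, both parents reach the same single Base subobject of the most-derived object, so a parent embedded in a child cannot simply carry its own copy of Base the way a standalone parent does.

```cpp
struct Base { int value = 0; };

// Virtual inheritance: Child contains exactly one Base.
struct ParentA : virtual Base {};
struct ParentB : virtual Base {};
struct Child : ParentA, ParentB {};

// Both paths lead to the same Base subobject.
bool sharesBase(Child *c) {
    Base *viaA = static_cast<ParentA *>(c);
    Base *viaB = static_cast<ParentB *>(c);
    return viaA == viaB;
}

// Non-virtual inheritance for contrast: NChild contains two Base subobjects.
struct NParentA : Base {};
struct NParentB : Base {};
struct NChild : NParentA, NParentB {};

bool sharesBaseNonVirtual(NChild *c) {
    Base *viaA = static_cast<NParentA *>(c);
    Base *viaB = static_cast<NParentB *>(c);
    return viaA == viaB;
}
```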

CTMacUser · July 7, 2021, 5:29am

There was a recent thread for a storage class for properties between type-level (i.e. static) and instance-level. I think it was called "shared," and it was copy-initialization that determined when a new instance shared with an existing one instead of using a brand-new sub-property. Maybe we can make C++ virtual sub-objects correspond to some sort of shared sub-property.

zoecarver · July 7, 2021, 4:33pm

I don't think we're actually losing any information here. Even if the type doesn't appear to have any parents, the compiler knows it does (if for no other reason than it can look through the associated Clang type). Keep in mind, during IRGen we're still going to use the Clang types, so I think the RTTI information should be preserved.

I'd say this is a little bit of an edge case. I'd be OK with having some logic along the lines of "if the member would be ambiguous, don't propagate it down to the child class." This forces the user to cast the type first, just like in C++.

zoecarver · July 7, 2021, 4:40pm

I'm not sure this is really "inverted." It seems like this is very similar to what I'm suggesting, just with slightly different implementation details. Rather than having a method for getting the parent class, you're suggesting that it would be a stored property. I think this is a great idea. It is not only a better, more Swifty design but also fixes our padding issue. When we ultimately IRGen this, it will go back to using Clang's representation of the type (and the proper functions), but that's an implementation detail, and as you said, it should line up with stored properties.

I also really like withUnsafeCxxDowncast. That seems like a clever way to implement downcasting.

zoecarver · July 7, 2021, 4:42pm

I agree. It would be great if we could use the as operator in the future for C++ types.