Calling C++ constructors -- looking for feedback on my approach

I have a tentative PR here that adds support for calling C++ constructors in ClangImporter:

I've marked the PR as "do not merge" as I'd like some feedback on my general approach first.

Specifically, because C++ constructors always return their result indirectly (i.e. they take a this parameter pointing to the object to be initialized, even if the class is "loadable" in the Swift sense), I'm mapping C++ constructors to Swift initializers that have their return type marked with the @out attribute.

This makes things pretty easy on the IRGen side because all of the infrastructure already knows that the result will be returned indirectly, and it's guaranteed that memory will be allocated for the result.

I'm a bit concerned though that I may be misusing the @out attribute. In Swift code, I think it's only used if the return value is an address-only type, whereas in the C++ interop I would be using it for all C++ constructor calls. I'm wondering if this could lead to missed optimization opportunities; more generally, I don't want to use @out for purposes it wasn't intended for.

An alternative approach would be to add checks in various places in IRGen to see if I'm calling a C++ constructor and force an indirect return in this case. I think this would be done everywhere we're checking FI.getReturnInfo().isIndirect() today.

I'd appreciate some feedback on whether I'm going about this the right way!

2 Likes

@Slava_Pestov I'm told by @gribozavr that you may have some good insights here.

@out can absolutely be used with non-address-only types, so no worries there. And it happens quite a lot when functions as “reabstracted” to be passed to generic code.

Constructor calls are complicated across the various C++ ABIs, and you may need to expose some APIs in Clang in order to handle them correctly.

Constructor calls are complicated across the various C++ ABIs, and you may need to expose some APIs in Clang in order to handle them correctly.

So you're saying I need to use something like clang::CodeGenTypes::arrangeCXXConstructorCall() instead of just naively passing the this pointer as the first argument (which may not be correct for all ABIs)? That makes sense.

Along the same lines, it looks wrong to me that SignatureExpansion::expandExternalSignatureTypes currently uses clang::CodeGen::arrangeFreeFunctionCall() to call C++ member functions. I think this should be using clang::CodeGenTypes::arrangeCXXMethodCall() to handle all ABIs correctly. Do you agree?

I'll explore this some more and work out how to handle this in IRGen. I'll loop back when I have something concrete to discuss.

I agree on both counts.

Getting back to this after working on something else.

I've looked at the details of what I would need to do, and I'd appreciate feedback on my plan.

I see two main issues to solve:

  • arrangeCXXConstructorCall() takes a CXXConstructorDecl, but SignatureExpansion::expandExternalSignatureTypes() doesn't have access to that.

    Indeed, as SignatureExpansion deals in function types, not function declarations, it shouldn't have access to the CXXConstructorDecl.

    Luckily, arrangeCXXConstructorCall() doesn't need very much from the CXXConstructorDecl. It really just wants the clang::FunctionProtoType (which I think we can get through SILFunctionType::getClangFunctionType()) and a clang::GlobalDecl, which it passes to CGCXXABI::HasReturn and CGCXXABI::hasMostDerivedReturn. Those two functions really just need the clang::Decl::Kind (which we could hardcode as Decl::Kind::CXXConstructor) and the clang::CXXCtorType (which I believe we could hardcode as Ctor_Complete). So if we change these functions in Clang, I think we should have everything we need to pass into arrangeCXXConstructorCall().

  • SignatureExpansion::expandExternalSignatureTypes() would need to know whether to call arrangeFreeFunctionCall(), arrangeCXXMethodCall() or arrangeCXXConstructorCall().

    There are two ways I can see to do this:

    • Try to infer from the SILFunctionType what we're calling. If the function takes a thin metatype and returns an object of the same type, it's probably a C++ constructor. If it takes a self parameter, it's probably a C++ method. Otherwise, it's a free function.

      This approach would not require any more far-reaching changes, but it arguably encodes too much knowledge about expected SIL signatures in expandExternalSignatureTypes().

    • Introduce two new SILFunctionTypeRepresentation cases: CXXMethod and CXXConstructor. This is cleaner, and the new cases are a nice parallel to the existing ObjCMethod case. However, there are a lot of places that switch over SILFunctionTypeRepresenation and would need to be extended, so I'd like to know whether people think this is the right approach before starting work on it.

I think breaking down the constructor type is probably the wrong way to go here — Clang ought to be able to change its decision about how it expands constructor signatures in the future, e.g. in response to ABI changes, new features, etc. Instead, I think we should figure out if there's a reasonable way to ensure that we can have a GlobalDecl available whenever we're lowering the type of a constructor. In principle that's not difficult:

  • We can only emit a reference to a constructor with function_ref.
  • We should be able to look through function_ref to find the CXXConstructorDecl. If we can't, that's a problem.
  • When we're emitting a direct call to a constructor, we can pretty easily propagate that decl to the call site.
  • If there are non-call uses of the function_ref, we can pretty easily generate a thunk.

Sorry, I got sidetracked into some other work, and I'm only now getting back to this.

I've explored how to get the CXXConstructorDecl plumbed through to the SignatureExpansion. I see two main options, for which I have draft patches based on my PR linked above; I'd like some feedback before proceeding further.

I'll also need to expose the arrangeCXXConstructorCall in Clang; a draft patch for that is here.

Option 1 (my preference): Plumb clang::FunctionDecl through IRGenModule::getSignature()

Draft patch here

(I think you probably had something like this in mind?)

Pros

  • Changes relatively little code.

Cons

  • Breaks the idea that a Signature can be computed solely from a CanSILFunctionType.

  • Some getSignature() callsites now pass a clang::FunctionDecl, some don't. We need to make sure we do this everywhere it's relevant. Could be brittle as future getSignature() callsites might forget to do this. We could reduce the risk by making the clangFunc parameter non-optional and requiring callers to pass an explicit nullptr if it's not relevant.

Option 2: Plumb CXXConstructorDecl through FunctionType and SILFunctionType

Draft patch here

Pros

  • Signature can still be computed solely from a CanSILFunctionType.

Cons

  • What's a decl doing inside a type? Ewwww.

  • Changes much more code.

  • How does this affect SIL serialization? (I'm very hazy on this topic.) Is it even possible to serialize/deserialize the CXXConstructorDecl?

I don't feel fully qualified to answer all your questions but couple of thoughts:

  1. I think we need to at least store the type of the C++ constructor in FunctionType/SILFunctionType, because that's what we are trying to do for ObjC. https://github.com/apple/swift/pull/29691. IMO, it would be strange if ObjC types were tracked but C++ types were not (e.g. if two different C++ types map to the same Swift type, we need to be careful about not carelessly mixing values of the two types). John has a WIP PR up for serialization. [WIP] Implement Swift serialization of Clang types by rjmccall · Pull Request #29560 · apple/swift · GitHub

    For example, we have an additional labelled argument (experimental, currently unused), for @convention(c), which we can spell out like @convention(c, cType: "int (*)(int)"), I'd imagine we'd have something similar for C++, say like @convention(cpp, cppType: "size_t (*)(std::vector const &)") (the name of the label is not super important, we can change that since it is experimental).

  2. I don't know if 1. entails needing to store the CXXConstructorDecl itself in the type. I would imagine that a (synthesized) Swift decl would store the C++ decl, and it's (synthesized) Swift type would store a corresponding C++ type. But I'm just spit-balling here, I don't know if that's feasible, or if, for some reason, we need to store the whole decl (although I agree it would be a bit ewww if we had to do that).

1 Like

That makes sense, and I explored trying to use this for the C++ constructor work, but unfortunately the type is not enough. arrangeCXXConstructorCall(), Clang's routine for codegenning a C++ constructor call, requires the CXXConstructorDecl, not just the type. (We've discussed this up-thread.)

Yes, we will definitely be careful about this.

That looks promising, but it just handles the type; if CXXConstructorDecl were included in SILFunctionType, it would have to be serialized too.

See above -- codegenning the C++ constructor call requires a CXXConstructorDecl. If we want the codegen to use only the SILFunctionType, this means that we would have to put the CXXConstructorDecl in there.

This doesn't seem very desirable, and so my preference is for the other option, namely passing the decl through genSignature().

1 Like

That makes sense, and I explored trying to use this for the C++ constructor work, but unfortunately the type is not enough. arrangeCXXConstructorCall() , Clang's routine for codegenning a C++ constructor call, requires the CXXConstructorDecl , not just the type. (We've discussed this up-thread.)

Gotcha'. I should've read the preceding discussion more carefully. I'll leave it up to John then to comment here. :slight_smile:

NP -- the context around https://github.com/apple/swift/pull/29691 is definitely useful. Wasn't aware that you were working on this, and this should come in useful in other contexts in C++ interop!

@John_McCall Looking for feedback on the options for plumbing CXXConstructorDecl through to SignatureExpansion (not sure if you'd seen).

If you agree, I'd like to go ahead with Option 1 described there.

It needs to be a clang::GlobalDecl, but yes, I think plumbing it down to SignatureExpansion is probably the right approach. That'll require us to either enforce that the function_ref isn't used indirectly in SIL or generate a thunk for it in IRGen.

Thanks, that makes sense.

Is this something that can happen for initializers?

It definitely seems like something I need to watch out for once I start doing method calls correctly with arrangeCXXMethodCall().

Anyhow, I'll proceed along these lines and will resubmit the PR for review once it's ready.

It might be something that can happen because of SIL transformations even if you can't convince SILGen to do it. There needs to be some rule that SIL can't do it if we intend to have that restriction.

1 Like

+1 to everything John said here. Just want to add that if we are going to restrict it, please make sure to add a verifier check.

I’ve reconsidered and have now taken an approach that doesn’t require me to plumb the GlobalDecl through to SignatureExpansion. This seemed problematic for reasons discussed before: It breaks the idea that a Signature can be computed solely from a CanSILFunctionType, and we’d need to make sure the constructor doesn’t get called indirectly. Besides, computing the correct LLVM signature isn’t the only problem we have to solve; we also have to make sure we pass the right implicit arguments when we actually call the constructor.

Instead, I now do this:

  • Give the constructor a SIL function type that make SignatureExpansion produce a best-effort guess at what the constructor signature should look like (namely, pass a this pointer as the first argument and return void).
  • If this assumption wasn’t correct, emit a thunk that adds any required implicit parameters and adjusts the return type if necessary. This thunk is alwaysinline, so there is no runtime overhead. There is some compile-time overhead from having LLVM inline the thunk; if this turns out to be excessive, I’ll make a followup change to emit the correct callsite without a thunk in the case of direct calls.

The PR (Add support for calling C++ constructors by martinboehme · Pull Request #30630 · apple/swift · GitHub) is still marked “draft”, but this is only because it requires a change to land in Clang that allows the implicit constructor args to be retrieved. I think this is the right approach and consider the PR to be final in terms of what I’d like to merge, and I’d appreciate a review at this point.

Alright, that seems workable, although you might consider having the constructor return the value as an @out result.

The assumption will be that references to constructors are always to the complete-object variant, which seems reasonable to me.

@John_McCall as an additional benefit, wouldn't returning the value @out allow for SILOptimizer RVO to compose with C++ RVO? Seems like it would.