I have a tentative PR here that adds support for calling C++ constructors in ClangImporter:
I've marked the PR as "do not merge" as I'd like some feedback on my general approach first.
Specifically, because C++ constructors always return their result indirectly (i.e. they take a this parameter pointing to the object to be initialized, even if the class is "loadable" in the Swift sense), I'm mapping C++ constructors to Swift initializers that have their return type marked with the @out attribute.
This makes things pretty easy on the IRGen side because all of the infrastructure already knows that the result will be returned indirectly, and it's guaranteed that memory will be allocated for the result.
I'm a bit concerned though that I may be misusing the @out attribute. In Swift code, I think it's only used if the return value is an address-only type, whereas in the C++ interop I would be using it for all C++ constructor calls. I'm wondering if this could lead to missed optimization opportunities; more generally, I don't want to use @out for purposes it wasn't intended for.
An alternative approach would be to add checks in various places in IRGen to see if I'm calling a C++ constructor and force an indirect return in this case. I think this would be done everywhere we're checking FI.getReturnInfo().isIndirect() today.
I'd appreciate some feedback on whether I'm going about this the right way!
@out can absolutely be used with non-address-only types, so no worries there. And it happens quite a lot when functions as âreabstractedâ to be passed to generic code.
Constructor calls are complicated across the various C++ ABIs, and you may need to expose some APIs in Clang in order to handle them correctly.
Constructor calls are complicated across the various C++ ABIs, and you may need to expose some APIs in Clang in order to handle them correctly.
So you're saying I need to use something like clang::CodeGenTypes::arrangeCXXConstructorCall() instead of just naively passing the this pointer as the first argument (which may not be correct for all ABIs)? That makes sense.
Along the same lines, it looks wrong to me that SignatureExpansion::expandExternalSignatureTypescurrently usesclang::CodeGen::arrangeFreeFunctionCall() to call C++ member functions. I think this should be using clang::CodeGenTypes::arrangeCXXMethodCall() to handle all ABIs correctly. Do you agree?
I'll explore this some more and work out how to handle this in IRGen. I'll loop back when I have something concrete to discuss.
Getting back to this after working on something else.
I've looked at the details of what I would need to do, and I'd appreciate feedback on my plan.
I see two main issues to solve:
arrangeCXXConstructorCall() takes a CXXConstructorDecl, but SignatureExpansion::expandExternalSignatureTypes() doesn't have access to that.
Indeed, as SignatureExpansion deals in function types, not function declarations, it shouldn't have access to the CXXConstructorDecl.
Luckily, arrangeCXXConstructorCall() doesn't need very much from the CXXConstructorDecl. It really just wants the clang::FunctionProtoType (which I think we can get through SILFunctionType::getClangFunctionType()) and a clang::GlobalDecl, which it passes to CGCXXABI::HasReturn and CGCXXABI::hasMostDerivedReturn. Those two functions really just need the clang::Decl::Kind (which we could hardcode as Decl::Kind::CXXConstructor) and the clang::CXXCtorType (which I believe we could hardcode as Ctor_Complete). So if we change these functions in Clang, I think we should have everything we need to pass into arrangeCXXConstructorCall().
SignatureExpansion::expandExternalSignatureTypes() would need to know whether to call arrangeFreeFunctionCall(), arrangeCXXMethodCall() or arrangeCXXConstructorCall().
There are two ways I can see to do this:
Try to infer from the SILFunctionType what we're calling. If the function takes a thin metatype and returns an object of the same type, it's probably a C++ constructor. If it takes a self parameter, it's probably a C++ method. Otherwise, it's a free function.
This approach would not require any more far-reaching changes, but it arguably encodes too much knowledge about expected SIL signatures in expandExternalSignatureTypes().
Introduce two new SILFunctionTypeRepresentation cases: CXXMethod and CXXConstructor. This is cleaner, and the new cases are a nice parallel to the existing ObjCMethod case. However, there are a lot of places that switch over SILFunctionTypeRepresenation and would need to be extended, so I'd like to know whether people think this is the right approach before starting work on it.
I think breaking down the constructor type is probably the wrong way to go here â Clang ought to be able to change its decision about how it expands constructor signatures in the future, e.g. in response to ABI changes, new features, etc. Instead, I think we should figure out if there's a reasonable way to ensure that we can have a GlobalDecl available whenever we're lowering the type of a constructor. In principle that's not difficult:
We can only emit a reference to a constructor with function_ref.
We should be able to look through function_ref to find the CXXConstructorDecl. If we can't, that's a problem.
When we're emitting a direct call to a constructor, we can pretty easily propagate that decl to the call site.
If there are non-call uses of the function_ref, we can pretty easily generate a thunk.
Sorry, I got sidetracked into some other work, and I'm only now getting back to this.
I've explored how to get the CXXConstructorDecl plumbed through to the SignatureExpansion. I see two main options, for which I have draft patches based on my PR linked above; I'd like some feedback before proceeding further.
I'll also need to expose the arrangeCXXConstructorCall in Clang; a draft patch for that is here.
Option 1 (my preference): Plumb clang::FunctionDecl through IRGenModule::getSignature()
(I think you probably had something like this in mind?)
Pros
Changes relatively little code.
Cons
Breaks the idea that a Signature can be computed solely from a CanSILFunctionType.
Some getSignature() callsites now pass a clang::FunctionDecl, some don't. We need to make sure we do this everywhere it's relevant. Could be brittle as future getSignature() callsites might forget to do this. We could reduce the risk by making the clangFunc parameter non-optional and requiring callers to pass an explicit nullptr if it's not relevant.
Option 2: Plumb CXXConstructorDecl through FunctionType and SILFunctionType
For example, we have an additional labelled argument (experimental, currently unused), for @convention(c), which we can spell out like @convention(c, cType: "int (*)(int)"), I'd imagine we'd have something similar for C++, say like @convention(cpp, cppType: "size_t (*)(std::vector const &)") (the name of the label is not super important, we can change that since it is experimental).
I don't know if 1. entails needing to store the CXXConstructorDecl itself in the type. I would imagine that a (synthesized) Swift decl would store the C++ decl, and it's (synthesized) Swift type would store a corresponding C++ type. But I'm just spit-balling here, I don't know if that's feasible, or if, for some reason, we need to store the whole decl (although I agree it would be a bit ewww if we had to do that).
That makes sense, and I explored trying to use this for the C++ constructor work, but unfortunately the type is not enough. arrangeCXXConstructorCall(), Clang's routine for codegenning a C++ constructor call, requires the CXXConstructorDecl, not just the type. (We've discussed this up-thread.)
Yes, we will definitely be careful about this.
That looks promising, but it just handles the type; if CXXConstructorDecl were included in SILFunctionType, it would have to be serialized too.
See above -- codegenning the C++ constructor call requires a CXXConstructorDecl. If we want the codegen to use only the SILFunctionType, this means that we would have to put the CXXConstructorDecl in there.
This doesn't seem very desirable, and so my preference is for the other option, namely passing the decl through genSignature().
That makes sense, and I explored trying to use this for the C++ constructor work, but unfortunately the type is not enough. arrangeCXXConstructorCall() , Clang's routine for codegenning a C++ constructor call, requires the CXXConstructorDecl , not just the type. (We've discussed this up-thread.)
Gotcha'. I should've read the preceding discussion more carefully. I'll leave it up to John then to comment here.
NP -- the context around https://github.com/apple/swift/pull/29691 is definitely useful. Wasn't aware that you were working on this, and this should come in useful in other contexts in C++ interop!
It needs to be a clang::GlobalDecl, but yes, I think plumbing it down to SignatureExpansion is probably the right approach. That'll require us to either enforce that the function_ref isn't used indirectly in SIL or generate a thunk for it in IRGen.
It might be something that can happen because of SIL transformations even if you can't convince SILGen to do it. There needs to be some rule that SIL can't do it if we intend to have that restriction.
Iâve reconsidered and have now taken an approach that doesnât require me to plumb the GlobalDecl through to SignatureExpansion. This seemed problematic for reasons discussed before: It breaks the idea that a Signature can be computed solely from a CanSILFunctionType, and weâd need to make sure the constructor doesnât get called indirectly. Besides, computing the correct LLVM signature isnât the only problem we have to solve; we also have to make sure we pass the right implicit arguments when we actually call the constructor.
Instead, I now do this:
Give the constructor a SIL function type that make SignatureExpansion produce a best-effort guess at what the constructor signature should look like (namely, pass a this pointer as the first argument and return void).
If this assumption wasnât correct, emit a thunk that adds any required implicit parameters and adjusts the return type if necessary. This thunk is alwaysinline, so there is no runtime overhead. There is some compile-time overhead from having LLVM inline the thunk; if this turns out to be excessive, Iâll make a followup change to emit the correct callsite without a thunk in the case of direct calls.