I'm working on the final bits of the implementation of SE-0451: Raw identifiers and I've hit a weird wall with the way imported C++ templates are currently handled.
My changes add logic to ASTPrinter and ASTMangler that says: if an Identifier contains any characters that aren't normal identifier characters, add backticks around it before printing or mangling it.
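To make that check concrete, here is a rough standalone sketch of the idea. It is not the actual patch: `needsBackticks` and `printName` are hypothetical names, and the check is ASCII-only, whereas real Swift identifiers also allow many Unicode characters and have to account for keywords.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true if `name` cannot be spelled as a plain identifier.
// Simplified to ASCII; ignores keywords and Unicode identifier characters.
bool needsBackticks(const std::string &name) {
  unsigned char first = static_cast<unsigned char>(name[0]);
  if (!(std::isalpha(first) || first == '_'))
    return true;
  for (unsigned char c : name)
    if (!(std::isalnum(c) || c == '_'))
      return true;
  return false;
}

// Hypothetical printer entry point: wraps the name in backticks when needed.
std::string printName(const std::string &name) {
  assert(!name.empty() && "sketch assumes a non-empty identifier");
  return needsBackticks(name) ? "`" + name + "`" : name;
}
```

With a rule like this, an ordinary name such as MyAlias is printed unchanged, while anything containing `<` or `>` picks up backticks, which is exactly where the imported specializations get caught.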
Unfortunately, this conflicts with the way C++ interop is currently implemented w.r.t. specialized templates. Specialized templates are imported by constructing an Identifier that contains the entire specialized signature, arguments and all. For example, one of the toolchain tests fails when printing the synthesized interface of a C++ module because I generate:
@available(*, unavailable, message: "Un-specialized class templates are not currently supported. Please use a specialization of this type.")
struct MyTemplatedStruct<T> {
}
struct `MyTemplatedStruct<CInt>` {
    init()
}
where previously, struct MyTemplatedStruct<CInt> {...} was expected. So from what I can tell, there is a synthesized struct named MyTemplatedStruct that is generic and then another one literally named MyTemplatedStruct<CInt>?
This output is just from a synthesized interface from SourceKit so I'm not sure if that change actually breaks usage of the type (I assume there's a special case on Swift-side lookup to resolve something that looks like X<Y> as literally X<Y> and not a Swift type specialization when the name matches a C++ symbol?). But it's still somewhat load-bearing behavior because (1) it affects code completion, which would also want to (incorrectly) insert the backticks, and (2) it also changes the mangling of those symbols because the backticks are now included in the mangling.
I'm a bit stuck trying to think of the right way to resolve this. By the time components like ASTPrinter and ASTMangler have an Identifier in their hands, they've gone through some recursion and the context of "this was a specialized C++ type" is lost somewhere higher in the call stack. One thing I could try is to bump up the alignment of Identifier and steal a low bit to store that information, but I'm not sure if that's the best approach. Is there anything else I should look at first?
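For anyone unfamiliar with the low-bit trick, this is roughly what I have in mind, as a standalone sketch rather than the real Identifier class. `TaggedName` and `kFlagMask` are made-up names, and the sketch assumes the interned string storage is at least 2-byte aligned so that bit 0 of the pointer is always free.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mask for the single stolen bit.
constexpr std::uintptr_t kFlagMask = 1;

// Hypothetical stand-in for an interned-name pointer that carries one flag
// in its lowest bit. Requires the pointee to be at least 2-byte aligned.
class TaggedName {
  std::uintptr_t bits = 0;

public:
  TaggedName(const char *interned, bool isCxxSpecialization) {
    auto raw = reinterpret_cast<std::uintptr_t>(interned);
    assert((raw & kFlagMask) == 0 && "storage must be at least 2-byte aligned");
    bits = raw | (isCxxSpecialization ? kFlagMask : std::uintptr_t(0));
  }

  // Masks the flag off again to recover the original pointer.
  const char *str() const {
    return reinterpret_cast<const char *>(bits & ~kFlagMask);
  }

  bool isCxxSpecialization() const { return (bits & kFlagMask) != 0; }
};
```

The appeal is that the flag travels with the name wherever it goes, so no call site has to thread extra context through. The cost is that every place that allocates or reads the underlying pointer has to respect the alignment and masking.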
How deep is that call stack? Would it be possible to pass down some flags at least temporarily to unblock this effort (instead of changing the alignment of Identifiers)?
I think users will never actually refer to these identifiers as X<Y>. We import template specializations through type aliases like:
using MyAlias = Templated<Int>;
So users need to refer to MyAlias in their code. ClangImporter will create a declaration for the specialization Templated<Int> and a type alias declaration for MyAlias. We import the specialized template as a distinct type, not as a generic. We do not expect users to refer to the specialization directly (without the type alias), so strictly speaking we could change the naming convention, but I am not sure whether we actually want to.
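For a concrete picture of the C++ side, a header along these lines is the kind of input being described. The names (`Templated`, `MyAlias`, `OtherAlias`) are made up for illustration, and this only shows the C++ view: each specialization is already a distinct concrete type there, which is what the importer mirrors.

```cpp
#include <type_traits>

// Hypothetical class template exposed to Swift through C++ interop.
template <typename T>
struct Templated {
  T value;
};

// Aliases that name specific specializations. On the Swift side these
// aliases are what users are expected to write.
using MyAlias = Templated<int>;
using OtherAlias = Templated<long>;

// Two specializations of the same template are unrelated types in C++.
static_assert(!std::is_same<MyAlias, OtherAlias>::value,
              "each specialization is its own type");
```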
Not very deep, but there are enough distinct calls that would need to be augmented that it gives me pause if I have to check whether something is backed by a clang::ClassTemplateSpecializationDecl at each call site. It's easy to miss something now, or in the future.
There are also places in ASTPrinter where it prints based on TypeReprs instead of the type-checked Types, but maybe that's not actually an issue for all of these synthesized declarations.
I'll see what it takes to make the existing tests pass.
Gotcha! I did some more testing and I have a better understanding of what's going on, combined with your explanation. The user can't actually spell the concrete type X<Y> as it's generated today, because it gets parsed as a specialization of the unavailable generic type, which forces them to use the typealiases.
So, this raises an interesting point worth considering: C++ interop could adopt raw identifiers fully here to let clients write the actual names of their specialized types instead of using typealiases, because if we kept the backticks in the names, they'd now be available to be written that way. For example, with declarations like these:
@available(*, unavailable) struct X<T> {}
struct `X<CInt>` {}
typealias IntWrapper = `X<CInt>`
let x = IntWrapper() // what they write today
let x = `X<CInt>`() // what they could write instead
That would be a really nice use of raw identifiers, IMO. Everything else would be the same—two different specializations would still be distinct types because their names would be distinct.
In the meantime, however, I'll continue looking at ways to preserve the existing behavior.
The main caveat here is that we are not sure if we want our users to be able to spell those identifiers as of today. We might end up importing some templates as generics in the future and doing that would be a breaking change if users can spell the name of the specialisations directly using raw identifiers.
To make sure I'm understanding this, are you saying that if you want to allow users to write X<CInt> as an actual generic type in the future, letting them write `X<CInt>` would prevent that? Since they're using typealiases already, could you just define new typealiases to match, like:
Sorry for the late reply. It looks like adopting this for interop would be a bit more work because the current module deserialization logic relies on users accessing template instantiations through the type aliases. Addressing this is not something that we plan to do in the near future. Did you manage to work around this, or are you still blocked?