Odd behavior involving generic Obj-C classes and type metadata accessors

One of our teams has encountered some odd behavior in Swift codegen for Objective-C generics. It intersects with our efforts to avoid the -ObjC flag so that the linker can better prune code that is known not to be referenced only dynamically. The issue seems to come down to whether accessing the metadata uses a direct reference to the _OBJC_CLASS_$_* symbol or goes through __swift_instantiateConcreteTypeFromMangledName. I wasn't sure whether this is intentional or a bug, so I figured I'd start here.

I've uploaded a small repro example on GitHub at allevato/objc-generics-swift-metadata-accessors-weirdness.

First, let's look at a native Swift class as the baseline:

let obj = NativeSwift<NSString>()

Looking at the IR, we see that it calls __swift_instantiateConcreteTypeFromMangledName with the mangled name of the type, and that name includes the bound generic parameters:

  %0 = call ptr @__swift_instantiateConcreteTypeFromMangledName(ptr @"$s4main11NativeSwiftCySo8NSStringCGMD") #10
  %1 = call swiftcc ptr @"$s4main11NativeSwiftCACyxGycfC"(ptr swiftself %0)

This works and doesn't pose any linker issues because there are references to other symbols from the class.

Next, the metadata for a non-generic Obj-C class:

let obj = NonGenericList()

Here, the compiler generates a direct reference to the _OBJC_CLASS_REF_$ symbol for the class, so the linker won't strip it.

  %0 = call swiftcc %swift.metadata_response @"$sSo14NonGenericListCMa"(i64 0) #11
  %1 = extractvalue %swift.metadata_response %0, 0
  %2 = call swiftcc ptr @"$sSo14NonGenericListCABycfC"(ptr swiftself %1)
  // ...

define linkonce_odr hidden swiftcc %swift.metadata_response @"$sSo14NonGenericListCMa"(i64 %0) #6 {
  // ...
  %3 = load ptr, ptr @"OBJC_CLASS_REF_$_NonGenericList", align 8
  %4 = call ptr @objc_opt_self(ptr %3) #5
  %5 = call ptr @swift_getObjCClassMetadata(ptr %4) #7

Where things get weird is when the Obj-C class is generic:

let obj = GenericList<NSString>()

  %0 = call ptr @__swift_instantiateConcreteTypeFromMangledName(ptr @"$sSo11GenericListCMD") #10
  %1 = call swiftcc %swift.metadata_response @"$sSo8NSStringCMa"(i64 0) #11
  %2 = extractvalue %swift.metadata_response %1, 0
  %3 = call swiftcc ptr @"$sSo11GenericListCAByxGycfC"(ptr %2, ptr swiftself %0)

This calls __swift_instantiateConcreteTypeFromMangledName, but since the mangled name of the Objective-C class doesn't include the bound generic parameters, it still has to load them separately. Is this actually saving anything compared to just putting a direct reference to the _OBJC_CLASS_REF_$ in the binary?

The real problem, though, is that the other methods invoked on the class boil down to objc_msgSend, so without that reference to the class symbol in the binary, the linker ends up stripping the whole class. If the initializer being called happens to be failable, it simply returns nil at runtime because the class isn't found.
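To make that failure mode concrete, here is a toy model in plain C. It is only a sketch: the table, the struct, and `lookup_class` are hypothetical stand-ins, not the Objective-C runtime's actual data structures or API. It shows why a name-based lookup degrades into a silent nil once the class has been stripped, where a direct symbol reference would have failed at link time or kept the class alive.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a class table: only classes that survived
 * linking appear here. Not the real Objective-C runtime layout. */
struct class_entry {
    const char *name;
    int instance_size; /* placeholder payload */
};

static const struct class_entry linked_classes[] = {
    { "NonGenericList", 16 },
    /* "GenericList" is deliberately absent, as if the linker stripped it. */
};

/* Name-based lookup: returns NULL when the class is not in the table.
 * Nothing at link time forces the named class to exist. */
static const struct class_entry *lookup_class(const char *name) {
    size_t count = sizeof linked_classes / sizeof linked_classes[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(linked_classes[i].name, name) == 0) {
            return &linked_classes[i];
        }
    }
    return NULL;
}
```

In this model, `lookup_class("NonGenericList")` finds its entry while `lookup_class("GenericList")` returns NULL with no diagnostic, which is the shape of the "failable initializer returns nil" symptom.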

Passing -Xfrontend -disable-concrete-type-metadata-mangled-name-accessors does make the generic case work, but that's not something I really want teams to reach for.

Should this be considered a bug? Could we consider emitting the direct reference in that last case? It seems like it would be harmless to do so but I could certainly be missing something.

@nate_chandler @Joe_Groff You've both done work in this space, do you have any insight?

When I originally did the work for mangled name accessors, we didn't have an encoding for direct references to Objective-C class references. We can't use the normal symbolic reference kinds since the static ObjC class object reference needs to be stored indirectly in the __objc_classrefs section to be fixed up by the ObjC runtime, and it also needs to be realized before it can be used as a class in-process. Looking up by name does both of these things. Looking at top of tree, it appears that we did recently introduce a symbolic reference kind for @protocol objects but not for classes. You could add one for classes too, but we would only be able to use it behind a deployment target gate.

A backward-compatible possibility might be to put a dummy reference to the symbols after the null terminator of the symbolic reference; this won't appear at runtime to be part of the string, but should keep the referenced symbol(s) alive through linking.
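As a rough illustration of that idea, here is a plain C sketch of a name followed by a trailing reference. Everything in it is hypothetical: `fake_class_object` stands in for a class symbol, the struct is not the compiler's real encoding, and the name buffer is padded rather than tightly packed. Whether a particular linker accepts this kind of reference in a particular section is a separate question that the sketch does not address.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the symbol we want the linker to keep. */
static const int fake_class_object = 0;

/* Hypothetical record: a NUL-terminated name followed by a pointer
 * slot that no string-reading code ever looks at. */
struct name_with_keepalive {
    char name[24];         /* mangled name plus NUL padding */
    const void *keepalive; /* reference that exists only for the linker */
};

static const struct name_with_keepalive record = {
    "So11GenericListC",
    &fake_class_object
};

/* What a reader treating the record as a C string sees: the scan
 * stops at the first NUL, before the trailing pointer. */
static size_t visible_length(void) {
    return strlen(record.name);
}
```

A reader that walks the string stops after 16 characters, so the trailing pointer is invisible to it, yet the object file still carries a reference from the record to `fake_class_object`.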

Although we do want "pure" Swift to be able to link correctly in the face of dead-stripping so that it eventually doesn't rely on -ObjC, I suspect that for Objective-C, you will have more problems than just this by not using -ObjC, because there are so many implicit dynamic dependencies. Categories, for instance, are likely to evaporate. @Mike_Ash might be able to speak to the feasibility of linking ObjC code without -ObjC.

Yeah, we're definitely aware of what can go wrong if Obj-C code in general isn't pulled into the linkage, so even though we've dropped -ObjC we still use -force_load to make sure most Obj-C code isn't skipped by the linker.

The specific cases that benefit from dropping -ObjC and omitting -force_load are our code-generation cases (like protocol buffers), where it's easy to end up with huge type graphs but only a portion of the types are actually used by client code. None of that code should be referenced only-dynamically, so we're fine with it being dropped if it's not detected as being used. Indeed, the code examples I gave above work fine if they're written in Objective-C because Clang emits a real class reference to the generic class; it's only Swift's use of the mangled name accessors that breaks us by not doing that.

Thanks for the pointer! That sounds like a promising approach. I'll try to poke at this in the near future and see if I can make any progress on it.

I've had an opportunity to try this and I think I'm most of the way there, but I'm getting some linker errors that I don't fully understand. When building an executable with this change, ld64 fails with errors like the following:

ld: illegal text-relocation in '_symbolic So7NSErrorC'+0xC (/private/var/folders/0v/1tmqzwl17lgcckn6pp05wftw006b5z/T/lit-tmp-y3nimkpp/enum-error-execute-3660a4.o) to 'anon-337'

In "classic" mode, the error provides a little more information:

ld: Absolute addressing not allowed in arm64 code but used in '_symbolic So7NSErrorC' referencing 'objc-class-ref'

But I'm not sure what it is about this specific IR that's different from all the others to cause this problem. The same thing happens if I use the address of the _OBJC_CLASS_$_... symbol instead of the _OBJC_CLASS_REF_$_.... Any ideas? My draft PR is here.
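For anyone following along, one possible reading of the "absolute addressing" wording is that the existing references in these strings are stored as offsets relative to their own location, while the new one is a full pointer. I haven't confirmed that this is the cause. The plain C sketch below only contrasts the two styles; the struct and helper names are hypothetical and not taken from the compiler or runtime.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record holding a self-relative reference: the field
 * stores the distance from its own address to the target instead of
 * the target's absolute address. */
struct relative_record {
    int32_t offset_to_target;
    int32_t target;
};

/* Compute the offset from the field to the target within one record. */
static void set_relative(struct relative_record *r) {
    r->offset_to_target = (int32_t)((const char *)&r->target -
                                    (const char *)&r->offset_to_target);
}

/* Resolve by adding the stored offset to the field's own address. */
static const int32_t *resolve_relative(const struct relative_record *r) {
    return (const int32_t *)((const char *)&r->offset_to_target +
                             r->offset_to_target);
}
```

Because the stored value is a distance rather than an address, a copy of the record placed somewhere else still resolves to its own target without any fix-up, which is what makes this style usable in read-only data. An absolute pointer would need to be rewritten at load time.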