I looked into ld logs to see why some symbols remain unstripped and discovered that nominal type descriptors and reflection metadata field descriptors are always marked 'no dead strip' (via @llvm.used). Can someone explain why, please?
class SomeClass {}
Compiled with -emit-ir -O -internalize-at-link:
@llvm.used = appending global [5 x ptr] [
ptr @"\01l_entry_point",
ptr @"nominal type descriptor runtime record for output.SomeClass",
ptr @"reflection metadata field descriptor output.SomeClass",
ptr @__swift_reflection_version,
ptr @_swift1_autolink_entries
], section "llvm.metadata"
I ask because nominal type descriptors contain a reference to the type's init, which may in turn refer to a bunch of other symbols.
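To check which symbols a given build pins this way, the entries of @llvm.used can be pulled out of the emitted IR. The following is a minimal sketch, not part of any toolchain: `list_llvm_used` is a hypothetical helper name, and it assumes the IR uses opaque pointers (`ptr @...`) as in the listing above. It accepts the array either on one line or spread across several.

```shell
# list_llvm_used (hypothetical helper): read LLVM IR on stdin and print
# each symbol listed in the @llvm.used array, one per line.
list_llvm_used() {
  awk '
    # Start collecting at the definition of @llvm.used.
    /^@llvm\.used = / { collecting = 1 }
    collecting {
      buf = buf " " $0
      # The array ends on the line carrying the metadata section.
      if ($0 ~ /section "llvm\.metadata"/) collecting = 0
    }
    END {
      # Each entry looks like: ptr @name  or  ptr @"quoted name"
      while (match(buf, /ptr @("[^"]*"|[^], ]+)/)) {
        print substr(buf, RSTART + 5, RLENGTH - 5)
        buf = substr(buf, RSTART + RLENGTH)
      }
    }
  '
}
```

Assuming the file is named output.swift, it could be used as `swiftc -emit-ir -O -Xfrontend -internalize-at-link output.swift | list_llvm_used`.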
i am speculating somewhat here, but after poring through some history, my best guess is that it was originally added to support the _typeByName() function in the stdlib, and associated swift-corelibs-foundation functionality to implement NSClassFromString (as outlined in this issue).
it appears reflection metadata can be either fully removed, or retained just for the debugger (via -disable-reflection-metadata and -reflection-metadata-for-debugger-only, respectively). there's also the -conditional-runtime-records frontend flag that might be of use for identifying more unused symbols (as outlined here). however, i've yet to find a means of removing nominal type descriptors even if the type itself is unused.
apparently, on Darwin, the combination of -conditional-runtime-records and -lto=llvm-thin[1] results in the nominal type descriptors being stripped if they are unreferenced. however, for unknown reasons, when testing on Linux (via compiler explorer), this does not appear to be the case.
having now looked into the implementation somewhat, it does make me wonder:
given that the motivating use case for _typeByName() is seemingly class-from-string support – could the implementation reasonably be changed to stop treating all type metadata uniformly for this purpose? i.e. don't emit the type metadata for enums, structs, etc since they aren't expected to support being looked up in this way?
the conditional runtime records approach seems like a more general solution for reducing unwanted metadata, and it avoids breaking clients depending on the current functionality (of which GitHub suggests there are a decent number).
Unfortunately, if you look up a generic class by name, you may end up having to resolve structs and enums that are in its generic parameters. There could probably be an opt-out or default-flipping feature here, but just "is it a class" isn't sufficient.
TC='com.apple.dt.toolchain.XcodeDefault'
xcrun --toolchain $TC \
swiftc -parse-as-library -emit-executable -target arm64-apple-macos14 -Osize \
-Xfrontend -conditional-runtime-records \
-Xfrontend -lto=llvm-thin \
-Xfrontend -internalize-at-link \
-Xfrontend -disable-reflection-metadata \
-o main main.swift -v
nm -m main | xcrun swift-demangle | grep 'SomeClass'
0000000100003e80 (__TEXT,__text) non-external (was a private external) main.SomeClass.__allocating_init(a: ()) -> main.SomeClass
0000000100003f78 (__TEXT,__constg_swiftt) non-external (was a private external) method descriptor for main.SomeClass.__allocating_init(a: ()) -> main.SomeClass
0000000100003e90 (__TEXT,__text) non-external (was a private external) main.SomeClass.init(a: ()) -> main.SomeClass
0000000100003eb4 (__TEXT,__text) non-external (was a private external) type metadata accessor for main.SomeClass
00000001000080b8 (__DATA,__data) non-external full type metadata for main.SomeClass
0000000100008090 (__DATA,__data) non-external (was a private external) metaclass for main.SomeClass
0000000100003f44 (__TEXT,__constg_swiftt) non-external (was a private external) nominal type descriptor for main.SomeClass
00000001000080d0 (__DATA,__data) non-external (was a private external) type metadata for main.SomeClass
0000000100003e98 (__TEXT,__text) non-external (was a private external) main.SomeClass.__deallocating_deinit
0000000100003e90 (__TEXT,__text) non-external (was a private external) main.SomeClass.deinit
0000000100008048 (__DATA,__objc_const) non-external __DATA_main.SomeClass
0000000100008000 (__DATA,__objc_const) non-external __METACLASS_DATA_main.SomeClass
The nominal type descriptor isn't marked as no dead strip in the object anymore. But there is l_$s4main9SomeClassCHn (nominal type descriptor runtime record for main.SomeClass prefixed with an l) that is marked, and it refers to the nominal type descriptor.
xcrun --toolchain $TC \
swiftc -parse-as-library -emit-object -target arm64-apple-macos14 -Osize \
-Xfrontend -conditional-runtime-records \
-Xfrontend -lto=llvm-thin \
-Xfrontend -internalize-at-link \
-Xfrontend -disable-reflection-metadata \
-o main.o main.swift -v
nm -m main.o | grep 'SomeClass'
...
0000000000000260 (__TEXT,__swift5_types) non-external [no dead strip] l_$s4main9SomeClassCHn
# And l_$s4main9SomeClassCHn refers to itself and _$s4main9SomeClassCMn
objdump -r --macho main.o
...
Relocation information (__TEXT,__swift5_types) 4 entries
address  pcrel length extern type    scattered symbolnum/value
00000000 False long   True   SUB     False     l_$s4main9SomeClassCHn
00000000 False long   True   UNSIGND False     _$s4main9SomeClassCMn
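To list every symbol in an object file that carries this attribute, the `nm -m` output can be filtered on the `[no dead strip]` marker. This is a rough sketch: `no_dead_strip_symbols` is a hypothetical helper name, and it assumes the symbol name is the last whitespace-separated field, as in the mangled output shown above.

```shell
# no_dead_strip_symbols (hypothetical helper): read `nm -m` output on stdin
# and print the names of symbols flagged "[no dead strip]".
no_dead_strip_symbols() {
  # Keep only flagged lines; the (mangled) symbol name is the last field.
  awk '/\[no dead strip\]/ { print $NF }'
}
```

For example, `nm -m main.o | no_dead_strip_symbols | xcrun swift-demangle`. The filter has to run before demangling, because demangled names contain spaces.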
i tested your example and changed -Xfrontend -lto=llvm-thin to just -lto=llvm-thin and then the symbols appear to be removed. not sure if passing directly to the frontend like that should produce a warning/error if it's unsupported, but also haven't looked very closely into how the options flow through the various internal bits.
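For reference, the adjusted invocation might look like the sketch below. The flags are copied from the earlier command with only the -lto option moved to the driver. The `build_main` wrapper and its `DRY_RUN` switch are hypothetical conveniences that print the command instead of running it.

```shell
# build_main (hypothetical wrapper): same build as before, but with
# -lto=llvm-thin passed to the driver rather than through -Xfrontend.
# Set DRY_RUN=1 to print the command instead of running it.
build_main() {
  set -- xcrun --toolchain "${TC:-com.apple.dt.toolchain.XcodeDefault}" \
    swiftc -parse-as-library -emit-executable -target arm64-apple-macos14 -Osize \
    -lto=llvm-thin \
    -Xfrontend -conditional-runtime-records \
    -Xfrontend -internalize-at-link \
    -Xfrontend -disable-reflection-metadata \
    -o main main.swift
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running `build_main` and then `nm -m main | xcrun swift-demangle | grep 'SomeClass'` repeats the earlier check against the new binary.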
FWIW, the -conditional-runtime-records and -internalize-at-link should be considered highly experimental, and I'm aware of even more situations where they will break things (we could even say miscompile).
If you want to eliminate metadata from binaries, I recommend Embedded Swift.
Hm, I don't see any open issues on GitHub for -internalize-at-link. If there are known cases where it doesn't work as expected it might be worth opening an issue for each of them? So the community could see them (and tackle some).
FWIW, we've been using -conditional-runtime-records and -internalize-at-link in production for all of our Swift code for a couple years now and have yet to run into any issues related to it.
I found this thread while trying to figure out why virtual methods in a class are being pulled, even if the class itself is not being included.
I have a large library that I am trying to put on a diet [1], and one of the major sources of code bloat is that Swift is including classes into my final executable that are not referenced, and these classes end up bringing all of their open/virtual methods, which in turn cause the bloat.
I have a sample test case that exhibits the problem [2].
When attempting to track down the source of these symbols, I found that “lnominal type descriptor runtime record” is being flagged as ‘dont-dead-strip’. I have a trivial example in [2] where I tried these assorted flags, with no luck.
The unreferenced class ends up being pulled in, as reported by the linker's why_live feature:
lnominal type descriptor runtime record for SimpleLibraryStrip.SUBCLASSABLE from SimpleLibrary.swift.o
dont-dead-strip
lnominal type descriptor runtime record for SimpleLibraryStrip.SUBCLASSABLE from SimpleLibrary.swift.o
lnominal type descriptor runtime record for SimpleLibraryStrip.SUBCLASSABLE from SimpleLibrary.swift.o
I am using -internalize-at-link and -conditional-runtime-records, and have tried both versions of LTO, llvm-full and llvm-thin.
hmm, this is intriguing. in Dima's original example, the flags mentioned in this thread did appear to successfully strip the unreferenced symbols, including the nominal type descriptors, but i guess maybe that only 'works' when everything is within the same target/module? in your reduced example (the 'SimpleLibraryStrip + ConsumerStrip' sample project), it seems like if you emit the LLVM IR for the 'SimpleLibraryStrip' target, then it contains the expected llvm.used.conditional metadata. however, when that target is emitted to an object file, the resulting binary marks the nominal type descriptors for all externally-visible symbols as 'no dead strip'.
i've tried all the experimental/hidden/unsupported flags i could think of/find so far and none seem to have an effect. i don't really know enough about the details to have a sense if there's just some missing piece here that prevents this from being possible at the moment. perhaps @kubamracek could speak to whether the 'conditional use' symbol metadata is something that is expected (or could be made) to live through emission to an object file so that it can be 'seen' again when linking a final binary.