Why does this logic exist in the mangling process?

I've created a tool that analyzes the LLVM IR generated by the Swift compiler to find unused declared dependencies on modules of my project. I noticed that a module named "MemberGetMember" was being mangled to "MemberGet", so I started debugging the mangling step of the Swift compiler and found out what was going on. That happens because of this logic in swiftDemangling/ManglingUtils.h:144

This piece of code searches for repeated words in the module name (in this case) and just omits them.
I'm just curious to know: Why does this logic exist?

It doesn't actually omit them at all. What's going on is documented in the mangling document:
https://github.com/apple/swift/blob/main/docs/ABI/Mangling.rst#identifiers

To give an example, consider the name you suggested earlier, MemberGetMember. Its mangling looks somewhat like this:

$s09MemberGetA0MXM (this specific mangling refers to a module descriptor)

The beginning of this mangling, 09MemberGetA0, encodes the identifier MemberGetMember by reusing the previously found word Member as a word substitution. The leading 0 tells the demangler that the following identifier contains word substitutions. The 9 tells it how many of the following characters are literal identifier text. Next comes that text, MemberGet, which is 9 characters. But where is the trailing Member?

Because this identifier started with a 0, the demangler goes through the 9 characters of MemberGet it just demangled and assigns word indices. In this case there are 2 words within the identifier: Member with idx 0, because it appears first, and Get with idx 1.

The A that follows tells us two things: 1. because it is uppercase, all of our word substitutions are done and we can move on to the other mangled nodes (a lowercase a would mean more word substitutions follow within the identifier), and 2. the substitution uses word idx 0. It's idx 0 because letters map to word indices in alphabetical order (a/A is 0, b/B is 1, and so on), which also means there are only 26 possible word substitutions, one per letter. So after seeing this A, the demangler knows to insert Member immediately after MemberGet, resulting in the identifier MemberGetMember. The final 0 lets the demangler know it's finished with this identifier.
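To make the steps above concrete, here is a minimal Python sketch of how an identifier with word substitutions could be decoded. It is not the compiler's actual code: the names find_words and decode_identifier are made up for illustration, the word-splitting rule is a simplified assumption (the real rules are in the mangling document), and it takes the mangled string without the $s prefix.

```python
MAX_WORDS = 26  # one slot per letter: a/A -> 0 ... z/Z -> 25


def find_words(text):
    """Split literal identifier text into words (simplified, assumed rule).

    A word starts at any character that is not a digit or '_'. It ends at the
    end of the text, at '_', or where an uppercase letter follows a
    non-uppercase character. Words shorter than 2 characters are ignored.
    """
    words, start = [], None
    for i in range(len(text) + 1):
        ch = text[i:i + 1]                      # "" once we are past the end
        prev = text[i - 1:i] if i > 0 else ""
        is_end = ch in ("", "_") or (ch.isupper() and not prev.isupper())
        if start is not None and is_end:
            if i - start >= 2:
                words.append(text[start:i])
            start = None
        if start is None and ch not in ("", "_") and not ch.isdigit():
            start = i
    return words


def decode_identifier(mangled, pos=0, words=None):
    """Decode one identifier at mangled[pos]; return (name, next_pos).

    `words` is the table of words seen so far (hypothetical parameter so that
    a caller could share it across several identifiers).
    """
    words = [] if words is None else words
    has_substs = mangled[pos:pos + 1] == "0"    # leading 0: substitutions ahead
    if has_substs:
        pos += 1
    name = ""
    while True:
        # Letters are word substitutions; an uppercase letter is the last one.
        while has_substs and mangled[pos:pos + 1].isalpha():
            c = mangled[pos]
            pos += 1
            if c.isupper():
                has_substs = False
            name += words[ord(c.lower()) - ord("a")]
        # A 0 here terminates an identifier that ends with a substitution.
        if mangled[pos:pos + 1] == "0":
            pos += 1
            break
        # Otherwise read a length followed by that many literal characters.
        digits_start = pos
        while mangled[pos:pos + 1].isdigit():
            pos += 1
        length = int(mangled[digits_start:pos])
        literal = mangled[pos:pos + length]
        pos += length
        name += literal
        # Words found in literal text become available for substitution.
        for word in find_words(literal):
            if len(words) < MAX_WORDS:
                words.append(word)
        if not has_substs:
            break
    return name, pos


print(find_words("MemberGet"))                    # ['Member', 'Get']
print(decode_identifier("09MemberGetA0MXM"))      # ('MemberGetMember', 13)
print(decode_identifier("15MemberGetMemberMXM"))  # ('MemberGetMember', 17)
```

Both mangled forms decode to the same identifier. The returned position points at the MXM that follows it.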

As to why the mangler does this: it's because it saves characters in a mangled name. In the example above, the mangled name would be longer without these word substitutions:

$s15MemberGetMemberMXM
vs.
$s09MemberGetA0MXM

This isn't a great example of the savings, since it only saves two characters, but once you start adding methods in nested types with many parameters, mangled names get pretty long. There are other really interesting substitution rules that the mangler and demangler use to save even more characters, which you can read about here:
https://github.com/apple/swift/blob/main/docs/ABI/Mangling.rst#substitutions


Thank you so much for your time to give me this very clear explanation! :smiley: