Hacking the name mangler

carlos42421 · January 23, 2019, 11:27pm

Hi. Bit of a request for signposts here in a new area. Can anyone point me to where the code for creating mangled function names lives?

It seems like it’s changed between Swift 3 and Swift 5? I’m pretty sure it used to output mangled function names like _TF3AVR ..., i.e. with a leading _ but otherwise mostly just letters and numbers.

Now, since Swift 5 (?), it seems to be creating more complex mangled names that start with a $ and so are enclosed in double quotes.

I’m using llc to lower this to assembly language and then assembling it with avr-gcc, but avr-gcc can’t handle these complex identifiers.

So I want to patch my build of Swift to make simpler identifiers without the leading $ etc.

Cheers for any help people can give with advice on where I can find the relevant code to hack.

Carl

Joe_Groff · January 24, 2019, 5:54pm

The mangling prefix is set by MANGLING_PREFIX. However, if llc can give you assembly language, is there a reason you can't also use LLVM to generate the machine code? It sounds like there may be a syntax incompatibility between llc's output and avr-gcc's assembler. $ ought to be a valid symbol name character.
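To make the prefix change concrete, here is a minimal Python sketch that guesses the mangling scheme from a symbol's prefix. The prefix table (`_T`, `_T0`, `$S`, `$s`) and the version labels are assumptions based on how Swift symbols have looked across releases, not something read from MANGLING_PREFIX in the compiler sources, and `mangling_family` is a hypothetical name. The sample symbols are truncated exactly as they are in this thread.

```python
# Hypothetical helper: guess which Swift mangling scheme a symbol uses.
# The prefixes and version labels are assumptions, not compiler output.
PREFIXES = [
    ("$s", "Swift 5 mangling"),
    ("$S", "Swift 4.2-era mangling"),
    ("_T0", "Swift 4.0-era mangling"),
    ("_T", "old mangling (Swift 3 and earlier)"),
]


def mangling_family(symbol):
    """Return a label for the mangling scheme, or None if unrecognised."""
    # Some object formats add one extra leading underscore to every
    # symbol, so also try the name with that underscore removed.
    candidates = [symbol]
    if symbol.startswith("_"):
        candidates.append(symbol[1:])
    for candidate in candidates:
        # Longer prefixes are listed first so "_T0" wins over "_T".
        for prefix, family in PREFIXES:
            if candidate.startswith(prefix):
                return family
    return None


print(mangling_family("_TF3AVR"))  # prefix of the old-style name above
print(mangling_family("$s3AVR"))   # prefix of the new-style name above
print(mangling_family("main"))     # not a Swift-mangled name
```

A check like this only looks at the prefix, so a C symbol that happens to start with `_T` would be misclassified.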

jrose · January 24, 2019, 6:07pm

We also use "$" in non-initial position sometimes, since we haven't been trying to avoid it, so changing the prefix might not be sufficient for your use case.
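Because `$` can appear anywhere in a symbol, one option that avoids patching the compiler is to rename symbols in the emitted assembly before handing it to an older assembler. The sketch below only illustrates the renaming step, under stated assumptions: `escape_symbol` and `unescape_symbol` are hypothetical helpers, the `_XX` hex escape is an invented scheme, and none of this is part of Swift, LLVM, or avr-gcc.

```python
import string

# Characters assumed to be safe for an older assembler (an assumption).
SAFE = set(string.ascii_letters + string.digits)


def escape_symbol(name):
    """Rewrite a symbol using only letters, digits and '_', reversibly."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        else:
            # '$', '_' and everything else become '_' plus two hex digits.
            out.append("_%02X" % byte)
    return "".join(out)


def unescape_symbol(escaped):
    """Invert escape_symbol."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "_":
            out.append(int(escaped[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(escaped[i]))
            i += 1
    return out.decode("utf-8")


print(escape_symbol("$s3AVR"))  # '$' is 0x24, so this prints _24s3AVR
```

Escaping `_` as well keeps the mapping reversible, at the cost of less readable names. Every reference to a symbol, in every object file being linked, would need the same renaming.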

carlos42421 · January 24, 2019, 6:32pm

Awesome. That will be good enough to start to move my experiment on. Thanks Joe. :)

This is unlikely to end up in production on my platform; I’m mainly using it for diagnosis. Until now, I was indeed using llc to write AVR object files. However, since Swift 5 I am getting obscure linker errors about overlapping sections when I go to link the ELF file. I think it’s probably a target-specific bug in the AVR object file writer, and I’m trying to help track it down for the AVR LLVM team. To do that, I tried outputting assembly files, building object files from them with avr-gcc, and then checking whether those link and what differs between the two sets of object files.

Technically, I am pretty sure that in modern gcc any identifier enclosed in double quotes is legitimate, so it should work with modern Swift name mangling.

But I think the version of avr-gcc I’m using, from CrossPack and built in 2013, is maybe too old, so it’s breaking. (I’m using it to help with compatibility for people on older Macs; my home-compiled versions were breaking on 2013 Mac Pros, but I’ll probably upgrade soon.)

The mangling format seems to have changed in Swift 4.1 or 4.2?

Also, and this isn’t a Swift issue but an LLVM one: the LLVM IR looks like call “$s3AVR...”, and llc seems to compile that to call ($s3AVR...), which I think is also breaking. Reading the gcc standards, even modern gcc might not like that, as I’m not sure that parentheses instead of quotes match the gcc identifier standards.

Is it a stated goal for LLVM to generally produce gcc-compatible assembly language? I’m not sure what the applicable standards are here. I’ll raise a bug with the LLVM folks if that’s breaking an agreed standard practice.

Cheers

Carl

carlos42421 · January 24, 2019, 6:34pm

I think, as discussed in my reply to Joe, a $ inside the identifier won’t matter. Cheers. :)

Joe_Groff · January 24, 2019, 6:35pm

I'm not sure what LLVM's official policy is, but on other platforms I know of it generally produces gcc-compatible assembly output. It might be reasonable to ask the AVR LLVM folks if the incompatibility is intentional.

carlos42421 · January 25, 2019, 8:18am

Ok. Makes sense.

General innocent question about name mangling:

Is there a reason why it’s ended up so obscure? Couldn’t the same sort of thing be achieved by something like

AVR_main__function

AVR_BoardState__type_metadata

...etc

That is, with a simple separator, spelled-out names instead of letter codes like TdfY, and some encoding scheme to handle special characters.

Not an exact scheme but you get the idea, creating human readable symbols at a glance. I sort of feel like it would make all sorts of tool use and interop less mystical?

I know Swift name mangling has been around since version 1, and I’m assuming someone has asked this question before and there’s a very good answer. Because if it can change in v4.1/v4.2 to make it look less human-readable, couldn’t it change in v5 to make it dramatically more human-readable? :)

Sorry if I’m asking annoying questions, I just thought it’s worth asking before ABI stability probably closes the door on this opportunity forever?

Carl

Torust · January 25, 2019, 8:59am

I'm far from an expert on this and will rely on others to correct me, but the general idea is that a mangled name should be the shortest possible unique identifier for each name (function, type, etc.). One reason for this is that all of the public mangled names need to exist within a binary; if you double the length of the mangling, the code size cost of all of the mangled names also doubles.

Runtime parsing also occurs (e.g. some things are instantiated based on mangled names), and having a longer mangled name means more time spent in the parser, whereas if the demangler sees, say, I and knows that stands for machine-width Int it can immediately shortcut parsing those two extra characters. (Note that I don't actually know what Swift's mangling for Int is; this is just an example).

beccadax · January 25, 2019, 10:52am

Mangling needs to produce a unique identifier for every possible (distinguishable, runtime-relevant) declaration you could write in a Swift program. That means not only the base name of a function, but also the argument labels, the parameter and return types, the generic constraints, the type it’s nested in, the kind of declaration, and a private identifier if it’s private or fileprivate. And all the types need to be qualified with a module name. And there are often several variants of the same entity, each needing a different name (if you’ve ever seen “reabstraction thunk” in a Swift backtrace, you’ve seen an example). And it can’t clash with any symbol generated by any other programming language, including pre-ABI-stable Swift. And there’s operators. And Swift identifiers can have Unicode characters in them, too.

Pack all of that into your mangling, and soon enough it starts to cause appreciable code size issues. There are Swift apps with hundreds of megabytes of code; making mangled names shorter has a noticeable impact on their code size. Abbreviating common type names starts to make sense. So do a lot of other name size optimizations.

And it’s actually turned out that we pack so much information into mangled names, we don’t need as much metadata as we used to. We can just parse the information we need back out of the mangled names.
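As a small illustration of parsing information back out of a mangled name: identifiers in Swift manglings are written as a decimal length followed by that many characters, which is what `3AVR` in the symbols quoted earlier in the thread appears to be (the module name `AVR`). The Python sketch below reads that simple form only. `read_identifier` is a hypothetical helper, the input `3AVR4main` is an invented fragment rather than a real symbol, and the real demangler also handles word substitutions and Punycode, which are rejected or ignored here.

```python
DIGITS = "0123456789"


def read_identifier(mangled, pos=0):
    """Read one <decimal length><characters> identifier starting at pos.

    Returns (identifier, next_pos). Simplified sketch: a leading 0 has a
    special meaning in the real mangling and is rejected here.
    """
    start = pos
    while pos < len(mangled) and mangled[pos] in DIGITS:
        pos += 1
    if pos == start:
        raise ValueError("expected a decimal length at offset %d" % start)
    if mangled[start] == "0":
        raise ValueError("length starting with 0 is not handled")
    length = int(mangled[start:pos])
    end = pos + length
    if end > len(mangled):
        raise ValueError("identifier runs past the end of the string")
    return mangled[pos:end], end


# Invented fragment: two identifiers back to back.
first, next_pos = read_identifier("3AVR4main")
second, _ = read_identifier("3AVR4main", next_pos)
print(first, second)  # prints: AVR main
```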

Rather than try to make the mangling human-readable, Swift tries to provide first-party tools to work with mangled names. That’s why “swift demangle” exists, and why the compiler’s demangling library is usable in other projects.

carlos42421 · January 25, 2019, 11:31am

Cool. Great answers. Thanks guys. I sort of assumed it must have been asked and answered comprehensively and smart people had considered all angles but I've learned over my decades of coding never to "assume" things and never stop asking obvious questions.

I suppose in a way, with name mangling in Swift, C++ and other languages, you're ultimately reflecting the disparity between a rich, high-level language like Swift and the fact that at the bottom level, linkers (arguably) haven't changed a huge amount since the 1970s. So you're trying to map this huge, rich language into a relatively short little string that ld can understand and put next to/link with code produced by much more primitive languages. Never a trivial problem!

Alejandro · January 25, 2019, 2:40pm

Si = Swift.Int
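`Si` is one of several two-character shorthands for common standard-library types. The Python lookup table below is a sketch: the entries are believed to match the Swift mangling documentation, the list is deliberately incomplete, and `expand_shorthand` is a hypothetical helper rather than anything in the Swift tooling.

```python
# A few standard-library shorthands (not the full list).
STANDARD_TYPES = {
    "Si": "Swift.Int",
    "Su": "Swift.UInt",
    "Sb": "Swift.Bool",
    "Sd": "Swift.Double",
    "Sf": "Swift.Float",
    "SS": "Swift.String",
    "Sa": "Swift.Array",
    "SD": "Swift.Dictionary",
    "Sq": "Swift.Optional",
}


def expand_shorthand(code):
    """Return the spelled-out type for a shorthand, or None if unknown."""
    return STANDARD_TYPES.get(code)


print(expand_shorthand("Si"))  # prints: Swift.Int
```

Each shorthand replaces a module-qualified type name with two characters, which is the kind of size saving described earlier in the thread.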