ELF metadata reflection

During the Windows test suite work, a fun little problem for ELFish targets was found. It seems that the in-memory reflection doesn't really work in practice (outside of the tests). The particular issue that I am currently thinking about involves locating the section that contains the relevant metadata. Section names are stored in the shstrtab section, with shdr->sh_name containing an offset into that section. This is problematic since shstrtab is normally not mapped into the VA of the image. This means that we cannot find a section by name (there is no real equivalent to getsectbyname from dyld).
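For readers less familiar with the layout, here is a minimal sketch of the name lookup being described: the section header's sh_name field is a byte offset into the bytes of shstrtab. The function name and the bounds checks are illustrative assumptions, not code from the Swift runtime, and the sketch presumes that the string table bytes are available at all, which is exactly what fails for a mapped image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Resolve a section name from the section header string table.
 * `shstrtab` points at the table's bytes, `sh_name` is the offset taken
 * from a section header. Returns NULL when the offset is out of range or
 * the string is not NUL-terminated inside the table. */
static const char *section_name(const char *shstrtab, size_t shstrtab_size,
                                uint32_t sh_name) {
  assert(shstrtab != NULL);
  if (sh_name >= shstrtab_size)
    return NULL;
  if (memchr(shstrtab + sh_name, '\0', shstrtab_size - sh_name) == NULL)
    return NULL;
  return shstrtab + sh_name;
}
```

Everything here depends on having the table in hand; without it, sh_name is just a number.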

My current thought is that we should use the section header's type field here, as ELF reserves 0x80000000-0xffffffff (SHT_LOUSER through SHT_HIUSER) as an application-defined range. Note that this range, like SHT_PROGBITS, is a value of shdr->sh_type rather than a bit in shdr->sh_flags: shdr->sh_type == SHT_PROGBITS indicates that the data is program data, while for a type in the user range shdr->sh_type - SHT_LOUSER gives us the application-specific value that we can use to enumerate the section types.
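A rough sketch of that classification, assuming the reserved range is applied to sh_type, which is where the ELF specification defines SHT_LOUSER and SHT_HIUSER. The constants are redefined locally with an EX_ prefix so the snippet does not depend on <elf.h>, and EX_SHT_SWIFT_METADATA is a made-up value purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EX_SHT_PROGBITS 1u
#define EX_SHT_LOUSER 0x80000000u /* start of the application-defined range */
/* SHT_HIUSER is 0xffffffff, the top of the 32-bit field, so a lower-bound
 * check is enough to recognise the range. */

/* Hypothetical type value for a section holding the metadata record. */
#define EX_SHT_SWIFT_METADATA (EX_SHT_LOUSER + 0x5357u)

/* True when the section type lies in the application-defined range. */
static int is_user_section_type(uint32_t sh_type) {
  return sh_type >= EX_SHT_LOUSER;
}

/* Application-specific value encoded in a user-range section type. */
static uint32_t user_section_value(uint32_t sh_type) {
  assert(is_user_section_type(sh_type));
  return sh_type - EX_SHT_LOUSER;
}
```

This still needs the section header table to be readable, so it only helps if those headers are reachable from the mapped image.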

Am I forgetting something about ELF loading/handling? Is this a reasonable approach? This should improve the reflection support on ELFish targets.

CC: @Joe_Groff @John_McCall @dcci @Slava_Pestov

In-process, I don't think we use the section names at all; we go off of the MetadataSections record that the static constructor registers, and prior to that refactoring, we had used symbol names. It seems to me like we should try to be consistent with the runtime's behavior; maybe we can use a known symbol name to point to the MetadataSections constant info.

Right, I remember doing the refactoring that you are referring to wrt registration of the sections with the runtime.

However, it is possible to reflect upon a remote process (RemoteMirror) where the image has already been loaded from disk (and consider the disk image to be removed, since unlike Windows, you can delete a file that is mapped on Unices). In such a case, you only have the in-memory image to read from. For ELF, shstrtab (section header strings, which contains the section names) and strtab (string table, which contains the symbol names) are then not mapped, so we cannot find a symbol manually (and this should work for static binary cases, so we cannot use dlsym either). So, we need to parse the ELF metadata which is guaranteed to be mapped.
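To make "the ELF metadata which is mapped" concrete, here is a hedged sketch that walks the program header table inside a local copy of an image's first bytes. It assumes a 64-bit image with the reader's endianness, and that the ELF header and program headers sit inside the first loadable segment, which is typical but not something this thread establishes. MiniEhdr and MiniPhdr are local stand-ins mirroring Elf64_Ehdr and Elf64_Phdr, and find_phdr is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of Elf64_Ehdr and Elf64_Phdr from <elf.h>. */
typedef struct {
  unsigned char e_ident[16];
  uint16_t e_type, e_machine;
  uint32_t e_version;
  uint64_t e_entry, e_phoff, e_shoff;
  uint32_t e_flags;
  uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} MiniEhdr;

typedef struct {
  uint32_t p_type, p_flags;
  uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
} MiniPhdr;

enum { MINI_PT_LOAD = 1, MINI_PT_NOTE = 4, MINI_PT_PHDR = 6 };

/* Copy the first program header of `type` out of `image` (bytes already
 * read from the target process). Returns 1 on success, 0 otherwise. */
static int find_phdr(const uint8_t *image, size_t size, uint32_t type,
                     MiniPhdr *out) {
  MiniEhdr eh;
  assert(image != NULL && out != NULL);
  if (size < sizeof eh)
    return 0;
  memcpy(&eh, image, sizeof eh);
  if (memcmp(eh.e_ident, "\177ELF", 4) != 0)
    return 0;
  if (eh.e_phentsize < sizeof(MiniPhdr) || eh.e_phoff > size)
    return 0;
  for (uint16_t i = 0; i < eh.e_phnum; ++i) {
    uint64_t off = eh.e_phoff + (uint64_t)i * eh.e_phentsize;
    if (off > size || size - off < sizeof(MiniPhdr))
      return 0; /* header would run past the bytes we have */
    memcpy(out, image + off, sizeof *out);
    if (out->p_type == type)
      return 1;
  }
  return 0;
}
```

Nothing in this walk touches shstrtab or strtab, which is the property we need for a remote, possibly deleted, image.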

Is there a more direct way to encode the offset of the metadata sections data structure so that it gets mapped and doesn't need symbol lookup? If there's a way remote mirrors could get the table addresses it needs with less work, and without looking up symbols that could get stripped or mangled by various things, that seems generally good for robustness.

@compnerd https://github.com/apple/swift/blob/master/stdlib/public/runtime/StaticBinaryELF.cpp has some ELF parsing; it was used for static executables on Linux to provide a dlsym()-type implementation. It needs to mmap the file, though, so it won't work if the file has been deleted.

Right, that is what I am trying to figure out. However, what I had not considered is that we really only need to access the metadata symbol itself. We could push that into a special section and mark that section with the SHT_LOUSER bit to indicate that it contains the metadata, since the section will be mapped but the name of the section is lost. The bit will identify the section, and because the content of the section is going to contain only a single instance of swift::MetadataSections, we know how to process the data. This does seem significantly better than what I had initially thought about. It should remain ABI compatible as well; the only consequence is that reflection will continue to be broken on older releases, which seems reasonable.

It is certainly possible to reconstruct the data if you have the file and can reparse it. The attempt here is to process just the content which is in memory.

There's no ABI to be compatible with on Linux yet. As a future proofing thing, though, MetadataSections should be easier to extend with new functionality if someone decides to support an ABI at some point.

Absolutely, that's why I added a version field as the first item in MetadataSections :slight_smile:
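As an illustration of why a leading version field helps, here is a small sketch of a reader that looks at the version before trusting the rest of the record. ExampleSections and its fields are hypothetical and do not reproduce the real swift::MetadataSections layout; the sketch also assumes the reader and the target share pointer width and endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  uintptr_t start, length;
} SectionRange;

/* Hypothetical record: the version always comes first, later versions
 * only append fields. */
typedef struct {
  uintptr_t version;
  SectionRange fieldmd; /* present since version 1 */
  SectionRange typeref; /* present since version 1 */
  SectionRange reflstr; /* appended in a hypothetical version 2 */
} ExampleSections;

/* Number of bytes a record of the given version occupies, 0 if unknown. */
static size_t example_sections_size(uintptr_t version) {
  switch (version) {
  case 1: return offsetof(ExampleSections, reflstr);
  case 2: return sizeof(ExampleSections);
  default: return 0;
  }
}

/* Decode a record from raw bytes; fields newer than the record's version
 * are left zeroed. Returns 1 on success, 0 on failure. */
static int read_example_sections(const uint8_t *bytes, size_t avail,
                                 ExampleSections *out) {
  uintptr_t version;
  size_t need;
  assert(bytes != NULL && out != NULL);
  if (avail < sizeof version)
    return 0;
  memcpy(&version, bytes, sizeof version);
  need = example_sections_size(version);
  if (need == 0 || avail < need)
    return 0;
  memset(out, 0, sizeof *out);
  memcpy(out, bytes, need);
  return 1;
}
```

An older reader rejects versions it does not know, and a newer reader can still consume older records.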

Hi! Recently I've been looking into this problem and I'd like to share some observations and get some extra feedback from you. Let me start with a super brief introduction: Swift has an API to extract reflection metadata from a given image (of an executable or dynamic library loaded into memory) and populate SwiftReflectionContext: swift_reflection_addImage(...). Internally this function calls addImage(RemoteAddress). The existing implementation of addImage for ELF appears to make some wrong assumptions about the image, e.g. that the section header table is available/loaded into memory, that the string table is available/loaded into memory, etc. The current code works for swift-reflection-dump-based tests (swift/test/Reflection/*) because there we "emulate" building an image of a given "test" executable (parse the object file, create a new buffer, place the reflection metadata sections, .rodata, the section header table and the string table at the correct addresses, etc.), but it doesn't work for real images.

Having a correct implementation of addImage(...) for ELF would enable us to make the swift/validation-test/Reflection/* tests work on Linux and, in particular, fix swift-reflection-test.c for ELF-based platforms (and considerably simplify it / make it work consistently for MachO and ELF). To achieve this we need a robust way to locate the reflection metadata sections inside a given image (image = an executable or library loaded into memory).

We can't rely on the section header table (not loaded into memory), on the symbol table (not loaded into memory, and the special symbols _start, _stop are hidden anyway), or on the dynamic symbol table (the special symbols _start, _stop are hidden and so not there, and generating new special symbols with globally unique names appears to be a complex task). Moreover, ELF segments can contain multiple sections, so the program header table (which describes the segments) doesn't contain enough information to locate a particular section (if I'm not mistaken).

To overcome these difficulties one can do the following: create a special allocatable note section (for example, ".note.swift5_reflection_metadata") that will contain a special marker (e.g. a versioned magic string) and a pointer to swift::MetadataSections (this is a struct which contains the addresses of the _start, _stop symbols for the reflection metadata sections). For every allocatable .note section the linker will create a PT_NOTE program header, so at runtime we can scan the list of all program headers of a given image, find all the "notes", then find .note.swift5_reflection_metadata, get a pointer to swift::MetadataSections and extract all the necessary information from there. The approach which relies on special .note sections appears to be already used in some other places, e.g. it's mentioned here: NetBSD Documentation: Vendor-specific ELF Note Elements.

Maybe I'm missing something; any suggestions / other approaches / feedback would be greatly appreciated! Many thanks, Alex
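The note scan proposed above could look roughly like the sketch below, which walks the entries of one PT_NOTE segment that has already been copied into a local buffer. The note name "Swift" and type 1 used in the comments are placeholders rather than an agreed format, NoteHeader mirrors the standard Elf64_Nhdr layout, and the sketch assumes the common 4-byte alignment of name and descriptor (some 64-bit notes use 8).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf32_Nhdr / Elf64_Nhdr. */
typedef struct {
  uint32_t n_namesz, n_descsz, n_type;
} NoteHeader;

static size_t align4(size_t n) { return (n + 3u) & ~(size_t)3u; }

/* Find the note with the given owner name and type inside `seg`.
 * Returns a pointer to its descriptor and stores the descriptor size,
 * or returns NULL when no such note exists or the data is truncated. */
static const uint8_t *find_note(const uint8_t *seg, size_t size,
                                const char *name, uint32_t type,
                                uint32_t *desc_size) {
  size_t name_len, off = 0;
  assert(seg != NULL && name != NULL && desc_size != NULL);
  name_len = strlen(name) + 1; /* n_namesz counts the trailing NUL */
  while (size - off >= sizeof(NoteHeader)) {
    NoteHeader nh;
    const uint8_t *note_name;
    memcpy(&nh, seg + off, sizeof nh);
    off += sizeof nh;
    if (nh.n_namesz > size - off || align4(nh.n_namesz) > size - off)
      return NULL;
    note_name = seg + off;
    off += align4(nh.n_namesz);
    if (nh.n_descsz > size - off || align4(nh.n_descsz) > size - off)
      return NULL;
    if (nh.n_type == type && nh.n_namesz == name_len &&
        memcmp(note_name, name, name_len) == 0) {
      *desc_size = nh.n_descsz;
      return seg + off; /* e.g. the address of MetadataSections */
    }
    off += align4(nh.n_descsz);
  }
  return NULL;
}
```

In a remote reader, the segment bytes would come from the PT_NOTE header's p_vaddr (plus the image's load bias) and p_memsz, and the descriptor would then hold the remote address of the MetadataSections instance.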

I would follow the pattern already used by swift_addNewDSOImage to register new images in the runtime. There's a global MetadataSections structure generated with pointers to all of the interesting sections. We could give that structure a predictable symbol name (if it doesn't already) to make it easier to find in-process.

Thanks for the reply. Maybe I'm missing something, but the problem is the following: in general, the reflection metadata sections can live inside a separate process (e.g. when we build RemoteMirror). What we can do is read chunks of the memory of that image and save them into local buffers, but we can't execute any remote code. Moreover, this method (addImage) takes the RemoteAddress of one particular image, so it doesn't have access to the runtime inside the remote process. One of my intentions here was to enable us to correctly process each individual image (implement addImage). To the best of my knowledge we can't have multiple strong symbols with the same name (so that the linker would keep them all), and the symbol table is not loaded into memory (thus, not available).

But yeah, you are right - MetadataSections already contains these pointers (though not all of them, but that's easy to fix), and the plan was to reuse it: .note.swift5_reflection_metadata would contain a pointer to that instance of MetadataSections. For ELF, I just don't see other ways to expose it because of the problems I described above.

cc: @John_McCall @dcci @compnerd