How are mangled names referenced in FieldTypeMetadataBuilder?

I'm trying to parse the type metadata for a struct, to extract the fields with the types. I'm aware of libraries that do this, but I want to understand it myself, for a combination of educational purposes and keeping dependencies of the thing I'm working on to a minimum. I'm working off of code in the compiler like this to understand the ABI, since the documentation is out of date and inaccurate.

My end goal is to get an array [(String, Any.Type)], where the string is the field name and the type is the type of the field.

Say that I have the following struct:

struct S {
    
    let subtitle: Subtitle // a struct declared in my executable
    let name: (Int, Int) 
    let title: String
    let bc2: Bool 
    let b3: BigTypeName // a struct declared in my executable

}

Note that the first and last types are my own, while the ones in the middle are built-in Swift types. The compiler output includes a bunch of references to the field types, which I can see by running nm on my binary:

0000000100003b44 s _symbolic _____ 12MetadataTest8SubtitleV
0000000100003b4a s _symbolic Si_Sit
0000000100003b52 s _symbolic SS
0000000100003b56 s _symbolic Sb
0000000100003b5a s _symbolic _____ 12MetadataTest11BigTypeNameV

So nm is able to understand what the mangled names of the two custom types are. I can see in the compiler source that a relative pointer (presumably direct) is stored for each field type.

When I run my code, the output is this:

[("\u{01}7\u{02}", "subtitle"), ("Si_Sit", "name"), ("SS", "title"), ("Sb", "bc2"), ("\u{01}5\u{02}", "b3")]

As you can see, my types are... something, while the Swift types are something that _typeByName() understands and can produce the appropriate Any.Type. I'm not sure if this is the name mangled version of those types, but it would make sense.

So, what is "\u{01}7\u{02}"? Is it an offset? Any suggestions would be incredibly helpful.

These are symbolic references: swift/docs/ABI/Mangling.rst at main · apple/swift · GitHub
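
To make that concrete, here is a minimal sketch of splitting a mangled name into plain text and symbolic references. It assumes the encoding described in Mangling.rst: a control byte in 0x01...0x17 followed by a 4-byte little-endian relative offset, or a control byte in 0x18...0x1F followed by a pointer-sized absolute address. The names MangledPiece and splitMangledName are made up for illustration. It also suggests why the output above looks truncated: if the name is read as a C string, the bytes 01 37 02 00 00 stop at the first zero byte of the offset, leaving "\u{01}7\u{02}".

```swift
// Hypothetical helper types for illustration; not part of any Swift API.
enum MangledPiece: Equatable {
    case text(String)                                      // ordinary mangled characters
    case relativeReference(kind: UInt8, offset: Int32)     // control byte 0x01...0x17
    case absoluteReference(kind: UInt8, address: UInt64)   // control byte 0x18...0x1F
}

/// Splits the raw bytes of a mangled name (without its trailing NUL)
/// into text runs and symbolic references.
func splitMangledName(_ bytes: [UInt8]) -> [MangledPiece] {
    var pieces: [MangledPiece] = []
    var text: [UInt8] = []
    var index = 0

    while index < bytes.count {
        let byte = bytes[index]
        index += 1

        let payloadSize: Int
        switch byte {
        case 0x01...0x17: payloadSize = 4                         // relative offset
        case 0x18...0x1F: payloadSize = MemoryLayout<UInt>.size   // absolute pointer
        default:
            text.append(byte)
            continue
        }
        // Stop if the payload is truncated.
        guard index + payloadSize <= bytes.count else { break }

        if !text.isEmpty {
            pieces.append(.text(String(decoding: text, as: UTF8.self)))
            text.removeAll()
        }
        // Assemble the little-endian payload.
        let raw = bytes[index..<(index + payloadSize)].reversed()
            .reduce(UInt64(0)) { ($0 << 8) | UInt64($1) }
        index += payloadSize

        if payloadSize == 4 {
            pieces.append(.relativeReference(kind: byte, offset: Int32(truncatingIfNeeded: raw)))
        } else {
            pieces.append(.absoluteReference(kind: byte, address: raw))
        }
    }
    if !text.isEmpty {
        pieces.append(.text(String(decoding: text, as: UTF8.self)))
    }
    return pieces
}
```

With this, the five bytes 01 37 02 00 00 decode to a relative reference of kind 1 with offset 0x237. As far as I understand it, the offset is measured from the address of the offset field itself, and kind 0x01 means it points directly at a context descriptor.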

Thanks, this was really helpful!

I did have one more question; I realized after trying to implement this that what's actually on the other end of the reference is the nominal type descriptor, not the mangled name, as the compiler source code had led me to believe. Just curious if this function is misnamed or confusingly named, or if I'm misunderstanding the intention behind it?

That's correct, it stores a relative offset to the type's nominal descriptor. The runtime can then create metadata from this structure, which is what things like Mirror do. IIRC there is no single canonical mangled name for a custom type in binaries, i.e. MyModule.MyType doesn't have a s8MyModule6MyTypeV mangled name in the binary for folks to go looking for, because the runtime generally doesn't need this and can construct metadata with offsets to descriptors, access functions, etc. This function in particular is getting the address of a global type ref string that refers to a metadata reference, which can include symbolic references.

Thanks, that makes sense. I just found the code that generates it a bit confusing, given that I'm reading it on GitHub with no IDE and then trying to correlate it with the binary.

If you have time, I did want to ask if it's actually possible to write a function that goes from the Any.Type instance to a list of the fields with their types as Any.Type.

My original plan was that if the mangled name was stored there, I could use _typeByName() to get the Any.Type instance. But it feels increasingly unrealistic to do this, based on the outputs I'm getting for some of the type names not being a full mangled name. I guess that kind of makes sense, given that the nominal type descriptors all seem to be part of the binary; this could never work for generic types.

So it feels like I have a few potential options at this point:

First is to go looking for other metadata in the original Any.Type pointer that's not related to the nominal type. But I'm not sure what's actually stored there until I track down the relevant source file that generates it.

The second option is to try out the other functions like _getTypeByMangledNameInEnvironment or swift_getTypeByMangledNameInContext and see if, with the right arguments, they can fill in the blanks in strings like Sayr with the generic type.

Any suggestion on the best way to proceed here?

The code behind Mirror or _forEachField might be of interest to you:

The Any.Type is a pointer to a type metadata record. If it's a struct, enum, or class, then that metadata record contains the generic arguments for a generic type, as well as a pointer to the nominal type descriptor, which in turn should be able to lead you to the field descriptor, which will give you the names and mangled type names of the fields. You'd want to call swift_getTypeByMangledNameInContext with the field's mangled type name, the nominal type descriptor as the context, and the generic arguments from the metadata record to turn those mangled names into type metadata pointers, which can be used as Any.Type values.
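
A rough sketch of those steps for structs only, under several assumptions: a 64-bit platform without pointer authentication (on arm64e the descriptor pointer is signed and would need stripping), reflection metadata not stripped from the binary, and the struct metadata, descriptor, and field descriptor layouts as I understand them from the runtime headers. The byte offsets are hard-coded from that understanding and are not a stable API. The helper names are made up; _getTypeByMangledNameInContext is the standard library's underscored wrapper around swift_getTypeByMangledNameInContext.

```swift
/// Follows a 32-bit relative pointer stored at `base + offset`; nil if it is zero.
func followRelative(_ base: UnsafeRawPointer, at offset: Int) -> UnsafeRawPointer? {
    let field = base + offset
    let delta = field.load(as: Int32.self)
    return delta == 0 ? nil : field + Int(delta)
}

/// Length of a mangled name in bytes, skipping over symbolic-reference
/// payloads, which may themselves contain zero bytes.
func symbolicNameLength(_ name: UnsafeRawPointer) -> Int {
    var i = 0
    while true {
        switch name.load(fromByteOffset: i, as: UInt8.self) {
        case 0: return i
        case 0x01...0x17: i += 1 + 4                        // relative reference
        case 0x18...0x1F: i += 1 + MemoryLayout<Int>.size   // absolute reference
        default: i += 1
        }
    }
}

/// Returns (field name, field type) pairs for a struct type, or nil on failure.
func structFields(of type: Any.Type) -> [(String, Any.Type)]? {
    let word = MemoryLayout<Int>.size
    let metadata = unsafeBitCast(type, to: UnsafeRawPointer.self)

    // Assumed layout: word 0 is the metadata kind (0x200 for structs),
    // word 1 is the nominal type descriptor, generic arguments follow.
    guard metadata.load(as: Int.self) == 0x200 else { return nil }
    let descriptor = metadata.load(fromByteOffset: word, as: UnsafeRawPointer.self)
    let genericArgs = metadata + 2 * word

    // Assumed layout: the descriptor holds a relative pointer to the
    // field descriptor at byte offset 16.
    guard let fieldDescriptor = followRelative(descriptor, at: 16) else { return nil }
    let recordSize = Int(fieldDescriptor.load(fromByteOffset: 10, as: UInt16.self))
    let count = Int(fieldDescriptor.load(fromByteOffset: 12, as: UInt32.self))

    var result: [(String, Any.Type)] = []
    for index in 0..<count {
        // Each record: flags, relative mangled type name, relative field name.
        let record = fieldDescriptor + 16 + index * recordSize
        guard let typeName = followRelative(record, at: 4),
              let fieldName = followRelative(record, at: 8),
              let fieldType = _getTypeByMangledNameInContext(
                  typeName.assumingMemoryBound(to: UInt8.self),
                  UInt(symbolicNameLength(typeName)),
                  genericContext: descriptor,
                  genericArguments: genericArgs)
        else { return nil }
        let name = String(cString: fieldName.assumingMemoryBound(to: CChar.self))
        result.append((name, fieldType))
    }
    return result
}
```

Calling structFields(of: S.self) on the struct from the question should then yield the five field names paired with their types. Enums and classes would need their own layouts (class metadata in particular places the descriptor and generic arguments elsewhere), so treat this as a starting point rather than a general solution.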

Hate to bring this up in an irrelevant thread, but @Alejandro could you please respond to us over in this post? (Someone else also @'d you)