Formula to extract mangled protocol name from mangled protocol extension member?

SymbolGraphGen does not tell us what protocol a protocol extension member belongs to unless the protocol itself is defined in the current module. (-emit-extension-block-symbols only affects symbols added via retroactive conformance.) this information is critical because SymbolGraphGen does not emit the correct generic signatures for "::SYNTHESIZED::" members, so they must be computed from the associated conformsTo relationship.

therefore we need to compute the mangled name of the protocol from a USR like:

s:18InternalExtensions8ProtocolPAAE13unconstrainedytvp
::SYNTHESIZED::
s:18InternalExtensions6StructV
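For anyone following along, a minimal sketch of the first step: separating the two halves of a synthesized USR. It assumes the shape shown above, where the USR of the extension member comes before the "::SYNTHESIZED::" marker and the USR of the conforming type comes after it. The function name is hypothetical and not part of any SymbolGraphGen API.

```python
SYNTHESIZED_MARKER = "::SYNTHESIZED::"

def split_synthesized_usr(usr):
    """Return (member_usr, conformer_usr).

    conformer_usr is None when the USR has no synthesized marker.
    """
    member, marker, conformer = usr.partition(SYNTHESIZED_MARKER)
    if not marker:
        return usr, None
    return member, conformer

usr = ("s:18InternalExtensions8ProtocolPAAE13unconstrainedytvp"
       "::SYNTHESIZED::"
       "s:18InternalExtensions6StructV")
member_usr, conformer_usr = split_synthesized_usr(usr)
```

Only the member half is relevant to recovering the protocol; the conformer half identifies the type that inherits the member.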

from manually inspecting symbolgraphs, i know that the output should be s:18InternalExtensions8ProtocolP, which suggests to me there should be some sort of "scan until the first capital P"-like algorithm for deducing this. but i also know that symbol mangling is hideously complex with many edge cases, and i am obviously not trying to reinvent the swift demangler.

really hoping some compiler devs will chime in here and point me in the right direction.

Prefixes are stable in Swift's symbol mangling, so if one mangling production really does start with the same components as another, you can definitely do a scan like that. The problem for you is that, first, that scan requires you to parse those initial components because of course the identifiers can include a capital P; and second, protocol extensions from a different module do not have that common-prefix property.
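To illustrate the first point, here is a rough sketch that compares a naive "first capital P" scan with one that parses the leading length-prefixed identifiers. It only handles the simplest shape seen in this thread (a module identifier, a protocol name identifier, then P) and gives up on everything else: word substitutions, known-type manglings such as Si, nested declarations, and extensions declared in a different module. Treating a count that starts with 0 as a substitution or Punycode identifier is an assumption here, and both function names are hypothetical.

```python
DIGITS = "0123456789"

def naive_protocol_prefix(usr):
    """Cut at the first capital P; wrong if an identifier contains one."""
    cut = usr.find("P")
    return usr[:cut + 1] if cut >= 0 else None

def parsed_protocol_prefix(usr):
    """Skip <count><characters> identifiers, then expect the P marker.

    Returns None for any shape this sketch does not understand.
    """
    if not usr.startswith("s:"):
        return None
    body = usr[2:]
    pos = 0
    identifiers = 0
    while pos < len(body) and body[pos] in DIGITS:
        if body[pos] == "0":
            # Assumed to signal a word substitution or Punycode identifier.
            return None
        end = pos
        while end < len(body) and body[end] in DIGITS:
            end += 1
        length = int(body[pos:end])
        if end + length > len(body):
            return None  # truncated identifier
        pos = end + length
        identifiers += 1
    # Expect exactly module + protocol name, followed by the P marker.
    if identifiers == 2 and body[pos:pos + 1] == "P":
        return "s:" + body[:pos + 1]
    return None

member = "s:18InternalExtensions8ProtocolPAAE13unconstrainedytvp"
print(naive_protocol_prefix(member))   # s:18InternalExtensions8P
print(parsed_protocol_prefix(member))  # s:18InternalExtensions8ProtocolP
```

The naive scan already fails on the example from this thread, because the identifier Protocol itself begins with a capital P. The parsed version returns None rather than guessing, which is about as much as can be done without a real demangler.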

I think it would be great to have a nice Swift library for processing demangle trees.

right, i remember trying to do the parsing manually a couple years back, but i could never get it to work reliably (because there are many USRs with "non-standard" manglings with word substitutions, stdlib types, etc.) so i gave up on it.

this is disappointing for documentation tooling, because right now there is no way to query the availability of members inherited through protocol conformances - they are always unconditionally available regardless of constraints on the original conformance, which is obviously wrong.

FWIW word substitutions are part of the standard mangling scheme, as are stdlib types (I assume you mean like Si for Swift.Int; sugared mangling is only in debug info iirc).

right, when i said "standard" i meant "uses the <character count> <characters> encoding scheme".

the S-prefixed "known type" manglings are especially problematic because there is no way to know the length of the mangled identifier (or whether it even refers to a protocol), since new ones can be added at any time.