My first question comes from a desire to ensure precise dependencies in our build system. Specifically, we're interested in precise dependencies between our own project's modules; SDK dependencies are less of a focus.
Swift allows some cases of referencing symbols from transitive dependencies. To identify these references, I am considering using undefined symbols from object files.
Is it a valid approach to identify gaps in dependencies by demangling an object file's undefined symbols and getting the module name from those symbols? For example, using swift::Demangling::Node::getModuleName().
Could getModuleName produce any false positives/negatives? In other words, are there any cases where, say, module A defines a symbol for which getModuleName returns some other module M?
One issue I've seen is that getModuleName returns the empty string for some symbols, even some that have a Module node. So as an alternative to using getModuleName, I was considering a breadth-first search of the demangled node tree, taking the first node with Kind == Module. Would this produce more accurate results, since it includes the symbols that getModuleName would otherwise exclude, or would it produce false positives?
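To make the breadth-first idea concrete, here is a minimal sketch in Python. The Node class is a toy stand-in for the demangler's node tree, not the real swift::Demangling API, and the tree shape below is an assumption made up for illustration. It also collects every Module node, to show that a single symbol can mention more than one module while the first-match search reports only one of them.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a demangled node (hypothetical, not the real API)."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def first_module_name(root):
    """Breadth-first search; return the text of the first Module node, or None."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.kind == "Module":
            return node.text
        queue.extend(node.children)
    return None


def all_module_names(root):
    """Collect the text of every Module node anywhere in the tree."""
    found = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.kind == "Module":
            found.add(node.text)
        queue.extend(node.children)
    return found


# Assumed shape for something like `ModuleA.foo(ModuleB.Bar)`.
tree = Node("Global", children=[
    Node("Function", children=[
        Node("Module", "ModuleA"),
        Node("Identifier", "foo"),
        Node("Type", children=[
            Node("Structure", children=[
                Node("Module", "ModuleB"),
                Node("Identifier", "Bar"),
            ]),
        ]),
    ]),
])

print(first_module_name(tree))
print(sorted(all_module_names(tree)))
```

Whether the first Module node found this way is always the defining module depends on how the real demangler orders children, which this toy does not model.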
The other question I have is minor/learning.
I wrote a tool to identify the "primary" kind of a symbol, for example Function, Variable, etc. I'm wondering whether the heuristic I used is valid. The tool does a breadth-first search of the node tree to find the first node that is either a non-context node (not isContext(node)) or a node that has multiple children (node->getNumChildren() > 1). In my manual use, this seems to produce decent results.
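A rough sketch of that heuristic, again in Python over a toy tree. Everything here is an assumption for illustration: the Node class, the CONTEXT_KINDS set standing in for isContext, and the choice to start the search at the root's children rather than at the root itself.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a demangled node (hypothetical)."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


# Hypothetical stand-in for isContext(); the real list of context kinds differs.
CONTEXT_KINDS = {"Module", "Class", "Structure", "Enum", "Protocol",
                 "Extension", "Function", "Variable"}


def primary_kind(root):
    """Return the kind of the first node (breadth-first) that is either a
    non-context node or has more than one child."""
    queue = deque(root.children)  # assumption: skip the root wrapper itself
    while queue:
        node = queue.popleft()
        if node.kind not in CONTEXT_KINDS or len(node.children) > 1:
            return node.kind
        queue.extend(node.children)
    return None


function_body = Node("Function", children=[
    Node("Module", "ModuleA"),
    Node("Identifier", "foo"),
    Node("Type"),
])
function_tree = Node("Global", children=[function_body])
static_tree = Node("Global", children=[Node("Static", children=[function_body])])

print(primary_kind(function_tree))
print(primary_kind(static_tree))
```

In this toy, a Static wrapper is reported as the kind because it is a non-context node, so the function underneath it is never reached.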
I wouldn't rely on mangled names to generate dependencies. On one hand, symbols might migrate between modules but need to keep their original module name for ABI reasons. On the other, the set of undefined symbols is itself not necessarily representative of the full set of module dependencies, since a module can be required for things not evident in the binary, such as inlinable functions and typealiases, or the binary dependency might arise from things that don't produce direct symbol references, such as dynamic protocol conformance lookups, mangled type names, and other runtime-dependent functionality. What are you trying to achieve by avoiding the formal module dependencies in the source code?
Functions and variables can have a number of different symbols related to them. Do you have a more specific definition of what you need the "primary kind" to represent?
My thinking was to augment our dependencies by identifying the handful of missing dependencies required for dynamic linking. This may not change your answer, but I wanted to clarify that it's not to generate all deps.
A bit more context might help. Our build has hundreds of modules (many very small), all linked statically. We'd like to also be able to link dynamically. But when linking modules as a dylib, some modules have linker errors for undefined symbols from transitive deps. When compiling, the build system (Bazel) makes all transitive dependencies available, but when linking we want only direct dependencies to be linked. If we had a smaller number of modules, it might be doable to link with all transitive dependencies, but with hundreds, we hit Mach-O size limits. Note that Bazel disables autolinking.
Am I correct in assuming this wouldn't happen in our project's modules unless we explicitly did this ourselves? Currently I'm not looking to determine system dependencies with demangling, only our project's deps.
I think this wouldn't affect us since the resulting bundle would contain all transitive deps.
When I referred to Function and Variable, I was referring to those specific Node::Kinds. Here's an example of the tool's output. Each symbol is followed by a Node::Kind name. The motivation for this tool was to better understand the dynamic linking issues described above: given all the linker errors, what is the set of kinds for those symbols?
That's strange, I would expect the opposite—dynamic libraries should already load their transitive dependencies via their own load commands without the executable also linking against those dependencies, whereas a static library is a dumb blob of .o files that you would have to manually provide the transitive dependencies for. Does the Swift driver's own -scan-dependencies functionality not work for you? If not, could we improve the driver to provide the dependencies you need? Trying to reconstruct dependencies post-hoc from build products seems like it'll only cause pain in the long run.
In most cases, looking at the sole child of the Global node seems like it should be sufficient to determine the symbol kind.
Yes, this is exactly the outcome we want. Let dyld do its thing. Bazel does this linking wrong, which we want to change. Bazel links against all transitive dylibs, which results in too many load commands and exceeds Mach-O limits.
This is the goal that got me here. But in order to link properly like this, each dylib link invocation has to include the dylibs it directly depends on. So the challenge is: how can we ensure the build graph contains the (minimal) deps that ld needs?
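For what it's worth, the gap check I have in mind is just a set difference per module. A minimal sketch in Python, where all module names are made up and the "referenced" sets are assumed to come from some earlier step that maps undefined symbols to module names. As noted above, that step can miss or misattribute modules, so this only covers what shows up as symbol references.

```python
def missing_direct_deps(declared, referenced):
    """For each module, list referenced modules that are not declared as
    direct dependencies (ignoring references to the module itself)."""
    missing = {}
    for module, refs in referenced.items():
        gap = refs - declared.get(module, set()) - {module}
        if gap:
            missing[module] = sorted(gap)
    return missing


# Hypothetical build graph: App declares only Feature, but its object files
# also reference symbols from Core, which it reaches transitively.
declared = {
    "App": {"Feature"},
    "Feature": {"Core"},
    "Core": set(),
}
referenced = {
    "App": {"App", "Feature", "Core"},
    "Feature": {"Core"},
    "Core": set(),
}

print(missing_direct_deps(declared, referenced))
```

Here only App has a gap (Core), which is the kind of edge that would need to be added to the build graph before linking App as a dylib against direct deps only.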
We haven't tried it. Is it in Xcode 11.6?
Thanks for the warning. Maybe we'd do it only as a short-term solution, until we can use -scan-dependencies…? Or maybe we'll think of something else.
The case where this wasn't sufficient was static symbols. In that case, I wanted to see Function or Variable, etc., not Static.
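Something like the following is what I mean, sketched in Python over a toy tree: take the sole child of the Global node, and if it is a Static wrapper, look one level further down. The Node class is a stand-in, and the assumption that Static wraps exactly one child is mine, not something confirmed in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a demangled node (hypothetical)."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def symbol_kind(global_node):
    """Kind of the sole child of Global, looking through a Static wrapper."""
    if len(global_node.children) != 1:
        return None  # not the simple single-child shape; give up
    child = global_node.children[0]
    # Assumption: a Static node wraps exactly one entity (Function, Variable, ...).
    if child.kind == "Static" and len(child.children) == 1:
        child = child.children[0]
    return child.kind


plain_variable = Node("Global", children=[Node("Variable")])
static_function = Node("Global", children=[
    Node("Static", children=[Node("Function")]),
])

print(symbol_kind(plain_variable))
print(symbol_kind(static_function))
```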