The core of the failure I found seems to be fairly deep in the DocumentationContext, where a bit of code is trying to pull a DocumentationNode out of a local cache, but the node in question isn't part of the local module, which I'm guessing is part of the reason there's nothing in the cache to retrieve.
At a high level, the overall flow of all this data follows the normal convert process flow, so it initiates from the convert action. In between that high level and the specific detail of the failure, I'm a smidge lost on expectations and even why a DocumentationNode would be requested when we're trying to convert symbols from a local module. I'm sure I'm missing something about the internals, so figured I'd start here and ask.
Following the backtrace, the crash is happening within the DocumentationWorkspace where it's registering a provider (DocumentationWorkspace.swift:123:27), all of which (at a glance, though I could be missing something) appears to be part of a normal initialization process.
The question immediately in my head is "What's different about an analysis initialization from an otherwise normal convert initialization?" This seems almost too early in the setup of the workspace to have an impact, but analysis reliably crashes where a normal convert doesn't, so there's got to be something: a missed assumption or similar.
It's somewhere under the flow of registering all the symbols in use (DocumentationContext.swift:1098:20), and in this case the crashing symbol comes from one of the library's dependencies.
Any chance I could get a little eyeball time from developers familiar with DocC, and a brief, high-level explanation of what's expected within this whole provider-registering-bundles process, and how it might end up attempting to pull back DocumentationNodes that are from dependent libraries?
It seems reasonable that this provider mechanism would load up all available symbol graphs, but I'm missing how we end up trying to pull DocumentationNodes from a cache where they won't exist. Or maybe they should exist in that cache, and I'm not understanding what I should expect to be there.
DocumentationNode is DocC's semantic model for any symbol, article, or tutorial it produces a documentation page for. Documentation nodes get created during DocC's bundle registration process, which loads symbols from symbol graph files and standalone Markdown article files. This is the "Analysis and Registration" section in the Compiler Pipeline document.
The model is also used to track symbols resolved out-of-process (via OutOfProcessReferenceResolver, which isn't something that SwiftPM integrates with at the moment). @ronnqvist wrote most of that infrastructure so would be the best person to provide more details about that if you're interested.
Based on the crash trace you provided, it looks like the crash is happening during the symbol registration phase. I'll do my best to explain what each of the methods in the call hierarchy does (from the top-down):
This method loads symbols from symbol graph files into DocC's documentation context. The documentation context is DocC's store for most of the data about symbols/articles/tutorials it tracks. It gets populated during bundle registration and contains rich information about each piece of documentation content (or "topic") and the relationships between these pieces of documentation via its Topic Graph. Something worth noting here as well is that each valid topic gets a unique identifier, which is what ultimately decides the URL of the generated documentation pages. This identifier is modeled using the ResolvedTopicReference struct.
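To make the relationship between the context, its topic graph, and unique references a bit more concrete, here's a minimal sketch. Everything in it (`ToyTopicReference`, `ToyNode`, `ToyContext`, the bundle ID, and the paths) is a hypothetical stand-in, not DocC's actual API; it only models the idea that each registered topic is keyed by a unique reference whose path determines the page URL, and that parent/child relationships are recorded alongside.

```swift
// Hypothetical, heavily simplified stand-ins; not DocC's real types.
struct ToyTopicReference: Hashable {
    let bundleID: String
    // The path is what ends up deciding the generated page's URL.
    let path: String
}

enum ToyTopicKind { case symbol, article, tutorial }

struct ToyNode {
    let reference: ToyTopicReference
    let kind: ToyTopicKind
    let title: String
}

final class ToyContext {
    // Every registered topic, keyed by its unique reference.
    private(set) var nodes: [ToyTopicReference: ToyNode] = [:]
    // A minimal "topic graph": parent reference -> child references.
    private(set) var children: [ToyTopicReference: [ToyTopicReference]] = [:]

    func register(_ node: ToyNode, under parent: ToyTopicReference? = nil) {
        nodes[node.reference] = node
        if let parent = parent {
            children[parent, default: []].append(node.reference)
        }
    }
}

// Usage: register a module page and one (hypothetical) symbol beneath it.
let context = ToyContext()
let moduleRef = ToyTopicReference(bundleID: "com.example.SwiftVizScale", path: "/documentation/SwiftVizScale")
let typeRef = ToyTopicReference(bundleID: "com.example.SwiftVizScale", path: "/documentation/SwiftVizScale/SomeType")
context.register(ToyNode(reference: moduleRef, kind: .symbol, title: "SwiftVizScale"))
context.register(ToyNode(reference: typeRef, kind: .symbol, title: "SomeType"), under: moduleRef)
```

The important property is that any later lookup by reference only succeeds if something registered a node under exactly that reference.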
This method creates API Collection pages for inherited symbols. Two things to define here: 1) an API Collection is a page whose purpose is to group related APIs. You can create an API Collection page in the same way you create an article, with the only difference being that you define a Topics section for API Collections, and 2) an inherited symbol is a symbol which is included in your module but isn't defined in your source code (called a "synthesized symbol" in Swift, for example the == implementation that Swift generates for you). To prevent your documentation pages from getting cluttered with a bunch of synthesized symbols, e.g., when you conform to SwiftUI.View and get a bunch of modifiers, DocC automatically creates an API Collection page that groups all the inherited symbols per protocol that's conformed to; that's what createInheritedSymbolsAPICollections does. For example: Documentation.
This method creates an API Collection documentation node and registers it in the documentation context. It's given the parent page where the API Collection should be curated and the children pages that it should curate.
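The grouping described in the last two paragraphs can be pictured with a small sketch. The types and `makeInheritedSymbolCollections(parent:inherited:)` are hypothetical, and the "<Protocol> Implementations" title format is illustrative; the point is just that inherited symbols are bucketed by the protocol they come from, and each bucket becomes one collection with a parent and a list of curated children.

```swift
// Hypothetical model of grouping inherited (synthesized) symbols; not DocC's code.
struct InheritedSymbol {
    let name: String
    let sourceProtocol: String
}

struct APICollection {
    let title: String
    let parent: String              // the page the collection is curated under
    let curatedChildren: [String]   // the inherited symbols it groups
}

func makeInheritedSymbolCollections(parent: String, inherited: [InheritedSymbol]) -> [APICollection] {
    // One bucket per protocol the parent type conforms to.
    let byProtocol = Dictionary(grouping: inherited, by: { $0.sourceProtocol })
    // Sort protocols and children so the output order is deterministic.
    return byProtocol.keys.sorted().map { proto in
        APICollection(
            title: "\(proto) Implementations",
            parent: parent,
            curatedChildren: byProtocol[proto]!.map(\.name).sorted()
        )
    }
}

let collections = makeInheritedSymbolCollections(
    parent: "MyType",
    inherited: [
        InheritedSymbol(name: "!=(_:_:)", sourceProtocol: "Equatable"),
        InheritedSymbol(name: "padding(_:)", sourceProtocol: "View"),
        InheritedSymbol(name: "opacity(_:)", sourceProtocol: "View"),
    ]
)
```

Each resulting collection would then be registered as its own node, which is where the parent's reference (and its languages) gets looked up.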
This method returns the programming languages a topic (identified by its topic reference) is associated with. When you build multi-language documentation, each language's compiler produces a set of symbol graphs, and DocC links the symbols together into a single DocumentationNode. For example, a Swift class marked @objc will get a single DocumentationNode with 2 source languages: Swift and Objective-C. One more thing to note is that at the symbol graph level, the URL paths of the same symbol can be different in each of its languages. For example, an API can have different names in Swift vs. Objective-C ("Sloth" vs. "SLOSloth"). When you're writing a link in Markdown content, you're allowed to use either of the syntaxes (``Sloth`` or ``SLOSloth``), which @ronnqvist implemented for Swift 5.7! DocC takes the Swift URL path as the canonical representation of the page though, and you can retrieve that path via DocumentationContext.canonicalReference(for:).
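Here's a minimal sketch of the multi-language idea: one node per symbol, several language-specific paths, and every path mapping back to a canonical Swift reference. `ToyLinkTable`, `PageRef`, and the "Zoo" paths are hypothetical; this isn't how DocC stores its link data, only an illustration of what `canonicalReference(for:)` is conceptually answering.

```swift
// Hypothetical model; not DocC's implementation.
enum Language: Hashable { case swift, objectiveC }

struct PageRef: Hashable { let path: String }

struct LanguageAwareNode {
    let canonical: PageRef          // the Swift path is treated as canonical
    let languages: Set<Language>
}

struct ToyLinkTable {
    // Every language-specific path maps to the node's canonical (Swift) reference.
    private var canonicalByPath: [PageRef: PageRef] = [:]
    private(set) var nodes: [PageRef: LanguageAwareNode] = [:]

    mutating func register(swiftPath: String, objectiveCPath: String? = nil) {
        let canonical = PageRef(path: swiftPath)
        var languages: Set<Language> = [.swift]
        canonicalByPath[canonical] = canonical
        if let objectiveCPath = objectiveCPath {
            languages.insert(.objectiveC)
            canonicalByPath[PageRef(path: objectiveCPath)] = canonical
        }
        // One node per symbol, no matter how many languages it appears in.
        nodes[canonical] = LanguageAwareNode(canonical: canonical, languages: languages)
    }

    func canonicalReference(for reference: PageRef) -> PageRef? {
        canonicalByPath[reference]
    }
}

var table = ToyLinkTable()
table.register(swiftPath: "/documentation/Zoo/Sloth", objectiveCPath: "/documentation/Zoo/SLOSloth")
let swiftRef = PageRef(path: "/documentation/Zoo/Sloth")
let objcRef = PageRef(path: "/documentation/Zoo/SLOSloth")
```

Looking up either path lands on the same node; in a Swift-only build the mapping should simply be the identity.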
sourceLanguages(for:) queries the available languages associated with a documentation node by calling DocumentationContext.entity(with:), which returns the documentation node associated with a reference, or throws an error if the node couldn't be found, which is the cause of the crash you're seeing. It seems like entity(with:) couldn't find a node associated with the reference, which shouldn't happen at this stage. Looking at DocumentationContext.entity(with:), it's possible that documentationCacheBasedLinkResolver.canonicalReference(for: reference) would return a reference that doesn't exist, which would be a bug. In fact, I don't believe that canonicalReference(for:) should ever return a reference that's different than the one you provide it in your case, since you're not building multi-language documentation at all.
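A small model of the failure mode described above may help when tracing it. All names here (`ToyResolverContext`, `canonicalOverrides`, the paths) are hypothetical, and it's an assumption that the thrown error becomes a crash because something along the way forces it (for example with `try!`); I haven't confirmed how DocC's call site handles it. The sketch shows that if the canonical-reference step returns a reference nobody registered, the entity lookup throws even though the original reference was valid.

```swift
// Hypothetical model of the lookup path; not DocC's code.
struct Reference: Hashable { let path: String }

enum LookupError: Error, Equatable { case notFound(Reference) }

struct Entity { let languages: Set<String> }

struct ToyResolverContext {
    var entities: [Reference: Entity] = [:]
    // Canonical-reference overrides; in a single-language build this should stay empty.
    var canonicalOverrides: [Reference: Reference] = [:]

    func canonicalReference(for reference: Reference) -> Reference {
        canonicalOverrides[reference] ?? reference
    }

    func entity(with reference: Reference) throws -> Entity {
        let resolved = canonicalReference(for: reference)
        guard let entity = entities[resolved] else {
            throw LookupError.notFound(resolved)
        }
        return entity
    }

    // Writing `try! entity(with:)` here would trap on a bad mapping; this sketch surfaces the error instead.
    func sourceLanguages(for reference: Reference) -> Result<Set<String>, LookupError> {
        do {
            let languages = try entity(with: reference).languages
            return .success(languages)
        } catch let error as LookupError {
            return .failure(error)
        } catch {
            fatalError("Unexpected error: \(error)")
        }
    }
}

var resolver = ToyResolverContext()
let page = Reference(path: "/documentation/SwiftVizScale/SomeType")
resolver.entities[page] = Entity(languages: ["swift"])
let healthy = resolver.sourceLanguages(for: page)

// Simulate the suspected bug: the canonical reference points at a node that was never registered.
resolver.canonicalOverrides[page] = Reference(path: "/documentation/SomeDependency/SomeType")
let broken = resolver.sourceLanguages(for: page)
```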
It's also worth noting that the build setup you have, where you're providing symbol graph files for multiple modules to DocC (which is SwiftViz's scenario, looking at the files generated in .build/symbol-graphs), isn't officially supported. However, assuming there's an isolated patch that resolves the situation you're running into, I'm happy to merge it. A path forward for multi-module documentation has been discussed here: Use cases for combined documentation of multiple targets in Swift-DocC.
This seems surprising to me, because as far as I remember, the analyze functionality isn't related to the stack trace we're looking at. Are you calling docc with the exact same arguments when you run without --analyze?
Thank you for the detailed walk-through! That'll help quite a bit - I'm planning on circling back to more debugging and tracing the flow after I finish another tidbit project on the side.
I think you were looking at the main branch on that repo, instead of the branch I cut where I originally found the error. It's still happening, but in the analyze_error branch I'm only generating documentation for a single module (SwiftVizScale), and my intent was for the analyze-and-report-status command to be effectively the same as the normal build/convert command. In practice, there are some differences. The build command started out identical, but got tweaked for generating static content for GitHub Pages, and I switched to the docc-plugin since it's far more convenient, so the two have diverged.
Interesting: I tried the older convert command and then added the analyze options directly to it, and the original docc convert setup is failing (with the DocC included in Xcode beta 4, as well as with a main branch build).
Failed for me once, but on further runs didn't fail. So now I'm wondering if there's some race condition in prepping and building that DocumentationContext process. The failure message when it died was remarkably similar to the analysis failure point (just had a better error message):
When I went back in to dig more @ronnqvist illuminated what was happening, and the issue has been updated accordingly.
I mistakenly thought that ALL the symbol graphs output from the compile command with --target were supposed to be used, but that's exactly what causes the issue. DocC uses them all together, so the failure I was seeing came from including symbol graphs other than those for the single, specific module when generating the analysis.
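For anyone hitting the same thing, here's a minimal sketch of narrowing a directory listing of symbol graphs down to one module before handing them to DocC. It assumes the usual file naming (`<Module>.symbols.json` for the module itself, plus `<Module>@<ExtendedModule>.symbols.json` for its extensions on other modules' types); `symbolGraphFiles(for:in:)` is a hypothetical helper, and every file name except SwiftVizScale's is made up.

```swift
// Hypothetical helper: keep only the symbol graph files that belong to `module`.
// Assumes "<Module>.symbols.json" for the module itself and
// "<Module>@<ExtendedModule>.symbols.json" for its extensions on other modules.
func symbolGraphFiles(for module: String, in fileNames: [String]) -> [String] {
    let suffix = ".symbols.json"
    return fileNames.filter { name in
        guard name.hasSuffix(suffix) else { return false }
        let base = String(name.dropLast(suffix.count))
        // Exact match, or an extension graph; "SwiftVizScaleExtras" must not match.
        return base == module || base.hasPrefix(module + "@")
    }
}

// Made-up listing of a symbol graph output directory containing several modules.
let allFiles = [
    "SwiftVizScale.symbols.json",
    "SwiftVizScale@Swift.symbols.json",
    "SwiftVizScaleExtras.symbols.json",
    "SomeDependency.symbols.json",
]
let selected = symbolGraphFiles(for: "SwiftVizScale", in: allFiles)
```

Only the selected files would then go into the directory DocC reads symbol graphs from, so dependencies' symbols never enter registration.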