[Pitch] Deprecating the `docc process-archive index` subcommand

Hi all,

I would like to propose deprecating the index (sub)command of the docc executable (invoked by calling docc process-archive index <path/to/SomeModule.doccarchive>) and plan for it to be removed after the 6.6 release (aligned with the Swift releases). This would mean that this subcommand would continue to exist in the upcoming 6.4 release, 6.5 release, and 6.6 release where it would print a warning to inform developers of its deprecation and upcoming removal. For the 6.7 release and onwards this command would no longer be callable.

I'm hoping, and sort of expecting, that most of you have never heard of or used this subcommand. But if you are using it, we want to hear from you to understand whether there are any use cases that aren't covered by passing the --emit-lmdb-index flag earlier, in the docc convert call.
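For reference, the two workflows can be compared side by side. This is a minimal dry-run sketch that only prints the commands instead of running them. The SomeModule.docc and SomeModule.doccarchive paths are placeholders, the --output-path option is an assumption that this pitch doesn't discuss, and any other options that a real invocation needs are omitted.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them, so it works
# without docc installed. CATALOG and ARCHIVE are hypothetical placeholder paths.
CATALOG="SomeModule.docc"
ARCHIVE="SomeModule.doccarchive"

# Deprecated two-step workflow: convert first, then add the LMDB index to the
# already built archive. (--output-path is assumed; it isn't part of the pitch.)
deprecated_workflow() {
  echo "docc convert $CATALOG --output-path $ARCHIVE"
  echo "docc process-archive index $ARCHIVE"
}

# Single-step workflow: create the LMDB index during the convert call.
recommended_workflow() {
  echo "docc convert $CATALOG --emit-lmdb-index --output-path $ARCHIVE"
}

deprecated_workflow
recommended_workflow
```

If a script or CI job contains something shaped like the first pair of commands, it is the kind of use that this deprecation would affect.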

A brief history recap

In the initial open source release of DocC, in 2021, the documentation web pages didn't have an on-page sidebar. At this time docc convert only output the JSON files with the data for each page. Optionally, one could pass the --index[1] flag to the convert command to configure DocC to output an LMDB database containing the information about the navigation hierarchy of the documentation pages. Applications, such as Xcode, could read this LMDB database to display their own navigational UI separate from the pages themselves. If one didn't pass the --index flag initially, they could add an LMDB database to an already built .doccarchive by passing that .doccarchive to the index subcommand. This was significantly slower than creating the LMDB database during the convert call, taking close to as long as the full convert call, because it needs to read and decode every single JSON file in the archive rather than process them as they're created (before they're encoded as JSON data).

Later, in 2022, as part of the effort to add an on-page sidebar to the documentation web pages, DocC started to output a JSON representation of this navigation information. Alongside that change we renamed the original --index flag to --emit-lmdb-index to distinguish the opt-in LMDB database representation of the navigation hierarchy from this new JSON representation for the on-page sidebar, which docc convert always creates.

The same year we released the Swift-DocC Plugin (swift package generate-documentation), which since its initial release has always passed the --index/--emit-lmdb-index flag by default. Xcode has also passed the --index/--emit-lmdb-index flag since the beginning, because it uses the LMDB database to display its navigation hierarchy.

For the four years since then, the situation has remained the same:

  • DocC always creates the JSON representation of the navigation hierarchy for the on-page sidebar.
  • One can opt in to also creating an LMDB database representation of the navigation hierarchy by passing the --emit-lmdb-index flag to docc convert.
  • Both Xcode and the DocC Plugin pass the --emit-lmdb-index flag.
  • One can add an LMDB database representation of the navigation hierarchy to an already built .doccarchive by passing that .doccarchive to the docc process-archive index subcommand. This has no impact on the JSON representation of the navigation hierarchy.

I'm proposing that we phase out and ultimately remove that last capability.

Impact and non-impacts on workflows

Removing this index subcommand impacts neither of the two primary workflows:

  • Anyone building documentation to host on the web is completely unaffected by these proposed changes because the on-page sidebar uses the JSON representation of the navigation hierarchy that docc convert always outputs.
  • Anyone building documentation using Xcode, xcodebuild, or swift package generate-documentation can continue to open and view that documentation in Xcode, with its navigation hierarchy, because all those tools pass the --emit-lmdb-index flag by default.

The one use case that I can think of that is impacted by these changes is if someone:

  • Builds documentation with the DocC Plugin (swift package generate-documentation) and explicitly passes --disable-indexing to opt out of creating the LMDB database representation of the navigation hierarchy.
  • Later realizes that they want to open that already-built .doccarchive in Xcode (or any other viewer that reads the LMDB database representation of the navigation hierarchy).

With these proposed changes, that someone won't be able to use docc process-archive index to augment the .doccarchive with an LMDB navigator. Instead, they'd have to rebuild the .doccarchive from the same inputs by running swift package generate-documentation again without opting out of creating the LMDB navigator.
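As a rough illustration of that workaround, a build script could drop the opt-out flag before rebuilding. The helper below is hypothetical (strip_indexing_opt_out is not part of docc or the plugin) and only prints the resulting command. It is a sketch that assumes none of the arguments contain spaces.

```shell
#!/bin/sh
# Hypothetical helper: prints the given command line without the flags that
# opt out of creating the LMDB index (--disable-indexing / --no-indexing).
# Sketch only: arguments containing spaces are not preserved.
strip_indexing_opt_out() {
  kept=""
  for arg in "$@"; do
    case "$arg" in
      --disable-indexing|--no-indexing) ;;   # drop the opt-out flags
      *) kept="${kept:+$kept }$arg" ;;       # keep everything else, space separated
    esac
  done
  printf '%s\n' "$kept"
}

strip_indexing_opt_out swift package generate-documentation --disable-indexing
# prints: swift package generate-documentation
```

Running the printed command rebuilds the archive with the plugin's default behavior, which includes the LMDB index.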

That use case is quite niche and has a viable workaround but it does still have some value. However, as you'll see below, the existence of the index subcommand comes with a nontrivial cost.

Reasons to remove the index subcommand

Because the index subcommand takes an already built .doccarchive as input, the code that creates the navigator hierarchy in DocC is based around "Render JSON" (the per-page JSON files that DocC creates for DocC Render to read) as its input. This means that in order to avoid duplicating all this code, the --emit-lmdb-index codepath also needs to use "Render JSON" as its inputs. This in turn means that:

  • The navigator hierarchy can only use information that's present in the per-page JSON files. This limitation has meant that some information isn't available to the navigator and other information needs to be reconstructed from strings that are formatted for on-page presentation.
  • The navigator hierarchy cannot be created until DocC has created a "Render JSON" version of each page. This limitation has been a source of bugs because the source-of-truth about the documentation hierarchy isn't available to query while the documentation is being processed.
  • Hypothetical future alternate output formats, like static HTML or ePUB, need to create a "Render JSON" representation of each page in order to construct the navigator hierarchy data structures, discard all the JSON, and then output a representation of that navigation hierarchy that's suitable for that output format. This is not only slow and wasteful but also a big source of complexity.

In the long term we want to change all these things:

  • We want to have a source-of-truth data model of the navigation hierarchy that can be queried throughout the documentation processing.
  • We want that source-of-truth data model to be able to reference any other transient in-memory information.
  • We want each output representation of the navigation hierarchy to be independent of each other.

Alternatives considered

Keep the index subcommand around

We can still do all those things over the coming releases and keep the index subcommand around but it would result in a situation where the way that the index subcommand creates the LMDB database representation of the navigation hierarchy has little to nothing in common with the --emit-lmdb-index codepath and quite likely has behavioral or data differences as well.

In this case, where the index subcommand is a lesser version of the --emit-lmdb-index flag that likely doesn't receive any further bug fixes, I feel that the usefulness of having such a subcommand is diminished to the point where it's not worth keeping. It's also not that different from calling the index command in an older version of docc, from before the command was removed.

Keep both code paths for creating the LMDB navigation the same as today

If we deem it important that the index subcommand continues to exist and that it continues to behave the same as the --emit-lmdb-index flag, we could continue to use that code for creating the LMDB database representation of the navigator hierarchy but use the new data model for all querying during documentation processing and for creating all the other representations of the navigator hierarchy (JSON, HTML, ePUB spine, etc.).

A major downside with this approach is that when there are two sources of truth, neither of them is a true source of truth. This could surface as bugs where there are differences between the data in the JSON representation and the LMDB representation of the same navigator hierarchy.

Remove the index subcommand even sooner

In the opposite direction, if it turns out that no one is using the index subcommand today, or that those who are don't have any issues with updating their workflows to instead pass the --emit-lmdb-index flag, we might be able to remove the index subcommand slightly sooner: after the 6.5 release (instead of after the 6.6 release as proposed above).


  1. This flag was later renamed to --emit-lmdb-index to distinguish it from the JSON index file (representing the same information) that DocC outputs by default nowadays.


+1 on the pitch from me, which is probably not surprising. I think this is a good path to evolve the tooling for the long term goals of DocC, making that "what're all the pages" content and info clearly visible and more flexible for a variety of output formats.


I opened a PR here that prints a deprecation warning message (below) when running the index subcommand.

The index command is deprecated and scheduled to be removed after the Swift 6.6 release; pass the --emit-lmdb-index flag to the convert command instead


The convert command always creates a JSON representation of the navigation hierarchy for the on-page sidebar.
If you need an LMDB database representation of the same navigation hierarchy, pass the --emit-lmdb-index flag to the convert command instead of running the index command on the output of the convert command.

If you're building documentation using the Swift-DocC Plugin (swift package generate-documentation) it passes the --emit-lmdb-index flag to Swift-DocC by default, and requires the --disable-indexing/--no-indexing flag to opt out of that behavior. If you need the LMDB database representation of the same navigation hierarchy in the documentation output, don't pass the --disable-indexing/--no-indexing flag to swift package generate-documentation.

The idea is to also cherry-pick that change into the 6.4 release so that the deprecation message reaches developers sooner. This gives any developer who uses this command but doesn't read the Swift Forums as much time as possible to reach out about their use case and/or to transition to passing the --emit-lmdb-index flag instead.
