Combined documentation of multiple targets

Hi. This is a continuation of this topic which focused on goals and use cases for combined documentation of multiple targets. In this new topic I'd like to focus on the technical aspects of how to build combined documentation for multiple targets and find a high level design with the right tradeoffs to support the goals and use cases from the previous thread.

An example from many combined use cases

The "sloth" development team is working on a Swift package that contains 4 libraries; "A", "B", "C", and "D" which depend on each other as follows:

 A
 ├─ B
 │  ╰─ C
 ╰─ D

The team also has two related Swift packages with one library each; "E" and "F".

 E
 F

The targets in these 3 Swift packages are all conceptually related as one project "A-F" so the team hosts the documentation for all 6 targets in one place. The combined documentation has an overview page that explains what the project is about, links to each target's documentation, and links to a handful of repositories from other teams that use this project.

The "capybara" team is working on an app and a framework. The framework depends on the "B" target from one of the "sloth" team's Swift packages but doesn't depend on the "A" or "D" targets. In the "capybara" team's project, the targets depend on each other as follows:

 App
 ╰─ Framework
    ╰─ ─ ─ ─ ─ ─ ─ ─ B (external dependency)
                     ╰─ C

Public types from "B" appear as arguments and return values in function declarations in the "capybara" team's framework and some types in the framework conform to public protocols from "B". When the team locally builds documentation it also builds documentation for "B" and "C". Following a link from the locally built framework documentation navigates to the locally built documentation for "B" and from there to "C".
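The set of targets that such a local documentation build covers can be sketched as a walk over the dependency graph. The snippet below is only an illustration in Python; the DEPENDENCIES table and the targets_documented_locally helper are hypothetical names that mirror the example above, not part of DocC or any build system.

```python
# Hypothetical model of the "capybara" team's dependency graph from the example.
DEPENDENCIES = {
    "App": ["Framework"],
    "Framework": ["B"],  # "B" is an external dependency from the "sloth" team
    "B": ["C"],
    "C": [],
}


def targets_documented_locally(root, dependencies):
    """Return the root target followed by everything it transitively depends on."""
    seen = []
    stack = [root]
    while stack:
        target = stack.pop()
        if target in seen:
            continue
        seen.append(target)
        stack.extend(dependencies.get(target, []))
    return seen


print(targets_documented_locally("App", DEPENDENCIES))
```

Targets like "A" and "D" never show up in the result because nothing in the "capybara" team's graph depends on them.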

The "capybara" team's company hosts documentation for shared code on an internal website and the team's Framework documentation is hosted there. The team doesn't re-host the "B" and "C" documentation. Instead, the links from the Framework to "B" are replaced with absolute links to where the "sloth" team hosts their documentation for the "B" target.

Other teams at the company also host documentation for their shared code on the internal website. The "antelope" team hosts documentation for a big framework with 3 example projects and a handful of snippets to help explain how to best use the different parts of the big framework's API.

Background

This section gives an overview of how Swift-DocC works, establishes some terminology, and describes how Swift-DocC fits into scripts and build workflows.

Viewed in isolation, Swift-DocC is a documentation compiler that takes symbol information, markup, and media as input and outputs a directory of documentation data that can be hosted to view the rendered documentation. We call the directory of input files a "documentation catalog" and the directory of output files a "documentation archive".

One way to pass symbol graph files to DocC is as files in the documentation catalog but, because of how DocC is integrated into scripts and build workflows, it's more common to pass a separate directory of "additional" symbol graph files and only use the documentation catalog for markup files and media.

Both the documentation catalog and the symbol graph files are optional inputs, but at least one of them is needed to build documentation. Regardless of how input is passed to DocC, one call to docc convert creates one documentation archive. A documentation archive may cover one or more modules (target documentation), tutorials, and/or manually designated "technology roots".
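As a rough illustration of these input rules, the Python sketch below assembles the argument list for a single docc convert call without running it. The convert_arguments helper and the file paths are hypothetical. I believe --additional-symbol-graph-dir and --output-path are the existing option names, but treat them as assumptions and check docc convert --help for your toolchain.

```python
def convert_arguments(output_path, catalog=None, symbol_graph_dir=None):
    """Build the argument list for one `docc convert` call (one archive out)."""
    if catalog is None and symbol_graph_dir is None:
        # Both inputs are optional, but at least one of them is required.
        raise ValueError("pass a documentation catalog, symbol graph files, or both")
    arguments = ["docc", "convert"]
    if catalog is not None:
        arguments.append(catalog)
    if symbol_graph_dir is not None:
        # Assumed option name for the separate "additional" symbol graph directory.
        arguments += ["--additional-symbol-graph-dir", symbol_graph_dir]
    arguments += ["--output-path", output_path]
    return arguments


print(convert_arguments("B.doccarchive", catalog="B.docc", symbol_graph_dir="symbol-graphs/B"))
```

Returning the argument list instead of launching the process keeps the sketch easy to inspect from a script before it is handed to something like subprocess.run.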

For the remainder of this post I'll be using "target" in place of "module" to unify the terminology used in build workflows and in DocC.

Some use cases pass more than one symbol graph file to DocC. The two main cases are:

  • To combine information for one target across multiple source languages.
  • To combine information for one target across multiple platforms.

It's possible, although not officially supported and with some issues, to pass symbol graph files for multiple targets to one docc convert call. In this case the output documentation archive will contain documentation for all targets.

DocC uses links to connect documentation pages. Links in "Topics sections" (level-2 headings titled “Topics”) are used to define the documentation hierarchy. We call this "curation". DocC also creates links based on some references found in symbol graph files, for example for types in declarations.

All inputs in a documentation build can link to each other. Links are resolved relative to the page where they're written but developers can write links with absolute paths to refer to pages from different top-level documentation hierarchies.

DocC can also link to "external documentation sources" but doesn't ship with any concrete implementations that interface with any specific sources. Content from external documentation sources doesn't become their own pages in the documentation archive.

DocC can optionally write an extra file into the documentation archive that lists each page, and on-page landmark, that can be linked to along with a summary of its content. We call this file the "linkable entities" file.

Most developers don't interact with DocC directly. Instead their documentation workflows are connected to their build workflows, leveraging systems like Swift Package Manager or Xcode to extract symbol information and pass the relevant inputs to DocC. Targets in these systems may have dependencies and can use public symbols from their dependencies. Circular dependencies are not allowed between these targets.

Proposed solution

Preface

Two different strategies—each with their own tradeoffs—could both be used as the foundation for building combined documentation for multiple targets:

  • Making a single call to DocC with one documentation catalog and the combined symbol graph files for multiple targets.
  • Making separate calls to DocC with separate documentation catalogs for each target and only that target’s symbol graph files and then combining the documentation archives into a combined archive.

It's my opinion that the design for this feature needs to deeply consider how it will work when DocC is integrated with a build workflow. When considering DocC in this context I feel that the single call strategy opens up too many possible problems to be worth the benefits that it provides to DocC in a direct-call context. More on this in the Alternatives considered section.

Because of this I would like to propose a solution that's based around separate calls to docc with separate documentation catalogs for each target.

All proposed names for actions, command line arguments, types in Swift-DocC, etc. are placeholder names.

High level design

To support linking between targets, I propose adding a new --dependency argument to the docc convert (and docc preview) actions. Each dependency argument would pass the file path of a dependency's already built documentation archive. DocC will treat dependencies' documentation as external sources of documentation, meaning that resolved content does not result in distinct pages in the documentation archive. DocC will use a new DocumentationArchiveResolver resolver (that conforms to the existing ExternalReferenceResolver and ExternalSymbolResolver protocols) to resolve both links written by developers and links from references in symbol graph files to content from dependencies' documentation.

When building multi-target documentation as part of a build workflow each target's documentation would depend on its dependencies' documentation. This means that each target can link to its direct dependencies but not their dependencies.
From the example at the top of this post, the “B” target can link to symbols in “C” but not symbols in “A”, “D”, “E”, or “F”.
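To make the direct-dependency rule concrete, here is a small Python sketch that derives the proposed --dependency arguments for a target. The flag is the placeholder name from this proposal and does not exist in DocC today; the DIRECT_DEPENDENCIES table, the archive paths, and the dependency_arguments helper are hypothetical.

```python
# Hypothetical direct-dependency table for the "sloth" team's package.
DIRECT_DEPENDENCIES = {"A": ["B", "D"], "B": ["C"], "C": [], "D": []}

# Hypothetical locations of each target's already built documentation archive.
ARCHIVES = {name: f"build/{name}.doccarchive" for name in DIRECT_DEPENDENCIES}


def dependency_arguments(target, archive_paths):
    """Return one placeholder `--dependency` argument per direct dependency."""
    arguments = []
    for dependency in DIRECT_DEPENDENCIES[target]:
        arguments += ["--dependency", archive_paths[dependency]]
    return arguments


print(dependency_arguments("B", ARCHIVES))
```

Only the archive for "C" is passed when building "B", which is why "B" can link to "C" but not to "A" or "D" in this sketch.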

Building documentation for “B” and “C” would result in two separate documentation archives where “B” has relative links to “C”. These local links would be valid as long as both documentation archives are “hosted”, for example by a local preview server or local renderer (like the documentation window in Xcode).

Developers can specify documentation dependencies in any order when calling DocC directly, for example in scripts.

To write links to other targets’ documentation, developers can use the existing doc: link syntax and use the other documentation archive's unique identifier as the "host" of the documentation URI.

doc://com.example/path/to/documentation/page#optional-heading
    ╰─────┬─────╯╰────────────┬────────────╯╰───────┬───────╯
      bundle ID     path in docs hierarchy    heading name
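The three parts in the diagram map onto the standard URL components, so a generic URL parser can pull them apart. The Python sketch below uses urllib.parse from the standard library; parse_documentation_uri is a hypothetical helper name and says nothing about how DocC itself parses these links.

```python
from urllib.parse import urlsplit


def parse_documentation_uri(uri):
    """Split a doc: URI into its bundle ID, documentation path, and heading."""
    parts = urlsplit(uri)
    if parts.scheme != "doc":
        raise ValueError(f"not a doc: link: {uri}")
    return {
        "bundle_id": parts.netloc,          # the "host" of the URI
        "path": parts.path,                 # path in the docs hierarchy
        "heading": parts.fragment or None,  # optional on-page heading
    }


print(parse_documentation_uri("doc://com.example/path/to/documentation/page#optional-heading"))
```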

As a more convenient way to link to other targets' documentation, I propose that the existing symbol link syntax be extended with a leading slash for links to symbols in other targets. A leading slash for links to symbols in the current target would be optional but supported. For example, to link to the "something" property of "SomeClass" in "SomeDependency" from the documentation for "MyTarget", I would write:

``/SomeDependency/SomeClass/something``

If "MyTarget" has a symbol also named "SomeDependency" I would link to it with:

``SomeDependency``
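The difference between the two forms can be sketched in a few lines of Python. This is a simplification: real symbol links are resolved relative to the page where they are written, while this sketch resolves links without a leading slash against the current target as a whole. The resolve_symbol_link helper is a hypothetical name.

```python
def resolve_symbol_link(link, current_target):
    """Split a symbol link into (target name, path components within that target)."""
    components = [part for part in link.split("/") if part]
    if not components:
        raise ValueError("empty symbol link")
    if link.startswith("/"):
        # Leading slash: the first path component names the target.
        return components[0], components[1:]
    # No leading slash: look the symbol up within the current target.
    return current_target, components


print(resolve_symbol_link("/SomeDependency/SomeClass/something", "MyTarget"))
print(resolve_symbol_link("SomeDependency", "MyTarget"))
```

The first call refers to the "SomeDependency" target; the second refers to the symbol named "SomeDependency" inside "MyTarget".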

In addition to "authored" links, automatic links from references in symbol graph files should resolve to dependencies' symbols. By registering a resolver conforming to ExternalSymbolResolver, DocC will let the resolver try to resolve all symbol references that aren't found in any local symbol graph file. For the resolver to have this data, the documentation archive would need to contain a list of all the symbol pages' unique identifiers. I propose that these identifiers be added to the “linkable entities” file (the file that lists each page, and on-page landmark, that can be linked to) and that this file will be created by default.
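A minimal Python sketch of that lookup order follows. It is not the Swift-DocC implementation: the entry keys ("symbolID", "path", "title"), the identifier values, and the ArchiveSymbolResolver and resolve_symbol_reference names are all assumptions made up for illustration.

```python
# Hypothetical entries from a dependency's "linkable entities" file.
# "symbolID" stands in for the proposed per-symbol unique identifier.
DEPENDENCY_ENTITIES = [
    {"symbolID": "example-id-SomeClass",
     "path": "/documentation/somedependency/someclass",
     "title": "SomeClass"},
    {"path": "/documentation/somedependency/getting-started",
     "title": "Getting Started"},  # an article has no symbol identifier
]


class ArchiveSymbolResolver:
    """Looks up symbol identifiers in one dependency's linkable entities."""

    def __init__(self, entities):
        self._by_symbol_id = {
            entity["symbolID"]: entity
            for entity in entities
            if "symbolID" in entity
        }

    def resolve(self, symbol_id):
        return self._by_symbol_id.get(symbol_id)


def resolve_symbol_reference(symbol_id, local_symbols, external_resolvers):
    """Prefer local symbols, then ask each external resolver in turn."""
    if symbol_id in local_symbols:
        return local_symbols[symbol_id]
    for resolver in external_resolvers:
        resolved = resolver.resolve(symbol_id)
        if resolved is not None:
            return resolved
    return None  # not found locally or in any dependency


resolver = ArchiveSymbolResolver(DEPENDENCY_ENTITIES)
print(resolve_symbol_reference("example-id-SomeClass", {}, [resolver]))
```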

To create a combined documentation archive for multiple targets I propose adding a new docc merge action. The merge action will take 3 types of inputs:

  • A list of documentation archives to be combined.
  • An optional documentation catalog of conceptual (non-symbol) content for the “landing page” of the combined documentation.
  • Optional configuration for how to handle links to documentation archives other than the input list.

The output of the merge action will be a new documentation archive with the data for the combined documentation. The combined archive will contain separate data hierarchies, navigation hierarchies, and possibly themes for each target.

If no documentation catalog is passed, DocC will generate top-level "/documentation" and "/tutorial" pages, as applicable, that will list all the targets and all the tutorials respectively. If a documentation catalog is passed, the developer is responsible for linking to each target and/or tutorial as they see fit. The merge action’s documentation catalog can link to documentation from all its input documentation archives (same as convert action dependencies).
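The default landing page behavior could look something like the Python sketch below. The archive dictionaries, their "targets" and "tutorials" keys, and the default_landing_pages helper are hypothetical and only model the rule described above, not the archive format.

```python
def default_landing_pages(archives, landing_catalog=None):
    """Return the generated landing pages as {page path: list of entries}."""
    if landing_catalog is not None:
        # With a catalog, the developer decides how targets and tutorials are linked.
        return {}
    targets = [name for archive in archives for name in archive.get("targets", [])]
    tutorials = [name for archive in archives for name in archive.get("tutorials", [])]
    pages = {}
    if targets:
        pages["/documentation"] = sorted(targets)
    if tutorials:
        pages["/tutorial"] = sorted(tutorials)
    return pages


archives = [
    {"targets": ["A"]},
    {"targets": ["E"], "tutorials": ["Getting started with E"]},
    {"targets": ["F"]},
]
print(default_landing_pages(archives))
```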

When merging documentation archives, cross-target links to documentation archives that are not passed as input will be transformed by DocC. Follow up proposals should cover both versioned documentation and how DocC can know where external documentation is hosted so that DocC can transform links to this documentation into fully resolved absolute links to the hosted documentation. Until then, DocC will default to transforming links into non-links (but keeping the resolved titles, text style, and abstracts as applicable). Developers who know where an external target (a target that’s not part of the combined documentation archive) hosts its documentation can override this behavior by providing a documentation base URL per-target.
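The three outcomes for a cross-target link (kept, rewritten to an absolute link, or turned into a non-link) can be sketched in Python as follows. The link dictionaries, the transform_link helper, and the example base URL are hypothetical.

```python
def transform_link(link, merged_targets, base_urls):
    """Decide what a cross-target link becomes in the combined archive."""
    if link["target"] in merged_targets:
        # The destination is part of the combined archive: keep the link.
        return {"kind": "link", "title": link["title"], "destination": link["path"]}
    base_url = base_urls.get(link["target"])
    if base_url is not None:
        # The developer told us where this external target is hosted.
        return {"kind": "link", "title": link["title"],
                "destination": base_url.rstrip("/") + link["path"]}
    # Default: keep the resolved title but drop the link.
    return {"kind": "text", "title": link["title"]}


link_to_b = {"target": "B", "title": "SomeType", "path": "/documentation/b/sometype"}
print(transform_link(link_to_b, {"Framework"}, {}))
print(transform_link(link_to_b, {"Framework"}, {"B": "https://docs.example.com/sloth/"}))
```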

Passing a combined documentation archive as input to docc merge will remove its landing pages and replace them with new landing pages based on that docc merge action's inputs.

Developers can mix build workflows—that create per-target documentation archives with links—and scripts or workflows that call DocC directly to create combined documentation archives. This can for example be used to combine documentation across multiple repositories or to combine documentation for targets that cannot be built in a single build workflow (for example due to differences in platform requirements).

For cases where all targets can be built together, the systems where DocC is integrated could let developers configure which targets to combine documentation for and run the docc merge action as part of the build workflow.

Future proposals should cover how to link to symbols in SDKs and other pre-built dependencies and how DocC can transform these links to fully resolved absolute links to where that documentation is hosted. It’s likely that this would be implemented as another external resolver type in DocC.

Applying this to the example

The “sloth” team can write links between their target dependencies (from “A” to “B” and “D” and from “B” to “C”). If “A”, “E”, and “F” cannot be built in one build workflow the team can still leverage build workflow integrations to build documentation for “A”, “E”, and “F” separately. This avoids needing to extract symbol information directly. Building documentation for “A” also builds documentation for “B”, “C”, and “D”.

The team can customize the "landing pages" that describe the “A-F” project and can host all of their documentation together.

The team can’t use documentation links to highlight other projects that use their "A-F" project. Instead they would use https links. It's likely not more than one link per project, either to its repository or to its top-level hosted documentation.

When the “capybara” team builds their app documentation they also get local documentation for their framework and for the “B” and “C” targets. Since the “B” and “C” targets have separate documentation catalogs and only link in build-dependency order, the “capybara” team’s local version of the “B” and “C” documentation appears the same as the "sloth" team's hosted documentation for these targets. No local documentation for “A”, “D”, “E”, or “F” is built.

When the “capybara” team hosts the framework's documentation they have the choice of transforming links to “B” into non-links or into absolute links to the “sloth” team's hosted documentation.

When the “antelope” team builds documentation for their “Big Framework” and their 3 example apps, each example app can link to individual public symbols from the “Big Framework” but not to other apps’ symbols. The “Big Framework” also can’t link to symbols in the example apps or use documentation links to reference the example apps. With the proposed solution the “Big Framework” could use https links to reference the top level page of each example app. Future proposals may cover new functionality specifically related to example apps that improves the ability for the framework to link to its examples.

Future directions

This proposal aims to design the core support for building and hosting multi-target documentation. Other proposals should build on this to add more features. A few important future directions mentioned in the proposed solution are:

  • Versioned documentation
  • Creating fully resolved absolute web links to other hosted documentation.
  • Linking to symbols in SDKs and other pre-built dependencies.

Alternatives considered

As mentioned above, an alternate solution could be based around a single docc convert call with one larger multi-target documentation catalog and symbol graphs for multiple targets.

Bidirectional links

One key benefit of passing symbol graphs for multiple targets together is that links between targets can be bidirectional.

However, supporting bidirectional cross-target links poses some new problems.

For example, suppose the “sloth” team had documented its “A-F” project so that the “B” target contained links to “A” and “F”. When the “capybara” team then used “B” as a dependency and built documentation for their app in a build workflow, “A” would not be built and “F” wouldn’t even be locally cloned. This would mean that the local version of the “B” documentation would contain broken links to “A” and “F”.

Further, if “A-F” is documented in one documentation catalog then conceptual content related to targets other than “B” and “C” would be locally built and links to the other targets from this content would also be broken.

I'm against the idea that documentation workflows should ever build code that's not part of the build workflow but even if that was an option, there are no guarantees that "A" can build in every situation where "B" can. For example, "A" may require a higher Swift version or SDK version than "B", or "A" may be a single platform target while "B" is a cross platform target and the current build workflow is not building for a platform that "A" supports.

Documentation with bidirectional cross-target links will encounter these issues as soon as (at least) one target can be imported individually. The way that developers would avoid this issue is to only write links in the target's dependency order.

Output format

Another benefit of a single docc call for multiple targets is that the output is one documentation archive without introducing a new "merge" action.

However, if the only way to link between targets is to pass them to the same docc call, then that means that linking to a target also embeds all that target's documentation into the documentation archive. This issue can be solved by also adding support for resolving external links to existing documentation archives (as described in the proposed solution).

Also, if the only way to combine documentation for multiple targets into a single documentation archive is to pass them to the same docc call, then that means that targets that are not part of the same build workflow need to abandon the integration of DocC in the build workflow and construct the docc convert call themselves. The developers could either redundantly build documentation in the build workflow to access the symbol graph files for each target that the build workflow created or they would need to construct the calls to generate the symbol graph files for each target themselves.

In the proposed solution, when a developer wants to combine documentation for targets that can’t be built in a single build workflow, they can still leverage the build workflows and only need to construct a docc merge call themselves.

These issues could be solved by also adding a new "merge" action (as described in the proposed solution) to make it easier for developers to utilize build workflows to generate documentation and combine documentation for targets that they can't build in a single build workflow.

Output format flexibility

Performing the merge action on the documentation archive—as opposed to the in-memory documentation model—adds the responsibility to alternative output formats to also support merging as it applies to that format. However, most of the steps that this merge action would perform are specific to the documentation archive format and it's not clear if other output formats would need this. I'm not aware of any alternative output format implementations in forks of Swift-DocC, so it's hard to determine the impact this would have on other formats.

I don't think the separation of the "convert" action's work and the "merge" action's work would prevent future output formats or future intermediate formats from being implemented.

hi David! this is a really tough project, but it’s also one of the most interesting things i’ve ever worked on and if DocC ends up going down this route it will evolve into something very new and different from what it is right now.

Biome uses leading slash to denote package namespace, eg:

``/swift-nio/NIOCore/ByteBuffer.readableBytes``

this syntax also provides a convenient place for a version tag:

``/swift-nio/2.40/NIOCore/ByteBuffer.readableBytes``

Biome has dependency-aware symbollinks so bare-module references don’t require prefixing, you can just refer to them like:

``NIOCore/ByteBuffer.readableBytes``

as you do in source.

at some point i want to see Biome be able to reference a branch name:

``/swift-nio/async/NIOCore/EventLoopFuture``

this will be hard to do if leading slash denotes module namespace, because unlike version numbers, branch names look like module names. so i urge you not to burn leading slash on module name.

Hi!

Very nice proposal, David! I definitely agree that making separate calls to docc convert should prove to be way more convenient for manual usage but especially when integrating these features into the SPM DocC plugin and IDEs. I also think that we should spend significant time on those to make usage of cross target documentation as easy and convenient as possible.

I'm not 100% sure if I understood your explanation of the merge command correctly. When passing only pre-built documentation archives to the command, those would all be hosted next to each other - independently of their dependency hierarchy - as direct children of documentation, correct?

However, what do you mean by this?

Would all pages from the pre-built documentation archives still be present at the same paths as compared to when there is no documentation catalog passed to the command? Or would the new archive only include those pages that are (transitively) linked to from the new documentation catalog? I guess the "target" corresponding to the new documentation catalog would also be hosted next to the targets from the pre-built archives, wouldn't it?

I also agree with the leading slash syntax as a shorthand for absolute links.

However, I think it would be beneficial to also allow (sufficiently qualified, i.e. starting with a target name) relative links to be used for referring to pages from other targets. This results in relative links behaving just like Swift code, where a local extension to external types can shadow the original member. This point already came up in my proposal-discussion on how to Document Extensions to External Types Using DocC:

Just for completeness: my PR from the extension topic already brings support for absolute symbol links in Swift-DocC: https://github.com/apple/swift-docc/pull/335.

I don't think linking to two different versions or even branches of the same target is a very common requirement, however, one could also solve this problem by embedding the version/branch name in the target name. This could be done in the configuration file proposed in the use-case thread like this:

    - name: Logging
      baseURL: "https://apple.github.io/swift-log/current/documentation/Logging"
    - name: Logging_1_3_0
      baseURL: "https://apple.github.io/swift-log/v1.3.0/documentation/Logging"
    - name: Logging_main
      baseURL: "https://apple.github.io/swift-log/main/documentation/Logging"

Solving this on a target level could make sense since we're already building documentation on a per-target (not per-package) basis.

I love the idea of the convenience, and I think for the most part this makes sense, but what happens when there's a conflict in naming? For example, some dummy such as myself using a too-common term for both a dependency module name and an internal struct? (I know that sounds almost fanciful, but I've done exactly that to myself while trying to name code structures, protocols, and modules).

If a symbol exists that could reference more than one location - how is that warned about, or should that be considered an error in this scenario?

I'd personally lean towards using the doc:// format for links as the more explicit form. I thought about suggesting a link symbol with a preceding slash might be treated differently (considering that a sort of canonical reference to the root structure of the set of modules), but that also seemed somewhat opaque and extremely easy to miss. The idea being to reference to the top-level of the dependency SomeDependency as ``/SomeDependency`` as in your earlier sample.

Looking more at how the whole thing gets processed:

Am I understanding correctly that the gist of this would be to have a cascade of builds to get to the combined-end-product? First build documentation archives for each individual module, and then those build products become the inputs for the docc merge processing command?

If that's the case, what happens with the cross-references that can't be resolved in the earlier steps? Or am I presuming additional steps and the merge command will also construct and use the ExternalSymbolResolver with the contents from each of the various dependency modules being combined together?

I guess I'm mostly trying to get a sense of what this command would look like if I have 5 or 10 dependencies that I want to bundle and render together with cross-links.

Does this remove the need for --emit-digest, or only for the merge use case? (I'm using it to extract a list of all the possible symbols within a module in order to see them "jumbled all together on the table" for when I go to organize/curate documentation). I do like this additional detail being included in the JSON.

I'd really hoped (and wanted) to be able to reference some of Apple's documentation - specifically Swift standard library, Foundation, Combine, and in some cases SwiftUI - to be able to reference the protocols, classes, or structs that are relevant to a library that extends any of those. If that needs to be an explicit HTML link, so be it - but it would be a far nicer experience (IMO) to be able to at least use a doc:// reference, and have some mechanism to let me know if that link was incorrect (didn't resolve) rather than post-processing through any rendered pages to verify that a link still resolves after it's rendered. That's overlapping a bit with your future directions, so I may just be wanting to jump to a flying-car style end-state here.

presumably, both colliding links would have to be written with the module name prefixed, just as you do in source.

they get resolved in the later steps. some kind of tiered link resolution architecture is inevitable once versioning enters the picture.

why do we even need the scheme to begin with? why can’t references to imported symbols "just work" like they do in source?

I love the convenience of symbols, and for the most part I'm sure they will just work, but they're all referenced from the context of a local page, and even adding a preceding / doesn't resolve between different canonical documentation references. The extended doc:// scheme does by including a hostname (or bundle ID in hostname form) in the middle, which is why I like it and will probably tend to use it when I can figure out (or have access to) additional resolver infrastructure to reference those external locations.

can you give an example of when leading slash would not be sufficient to disambiguate references?

Two modules, both named "Utility", from different authors.

this situation would not occur in project-local links, because two modules with the same name cannot be combined into the same product.

this situation could occur in project-external links, which is why Biome uses leading slash to denote package namespace, which would not collide.

it's possible down the road that we will have to deal with colliding package names, and both tools will have to introduce some concept of username and even hostname, as you described. but i do not think that is necessary right now.

if you are worried about colliding module names, this is a reason to support having leading slash denote package namespace instead of module namespace.

this does not scale once versioning is taken into account. modeling each VCS node as a separate target results in a staggering amount of data duplication.

i understand that this is not a concern for small, localized use cases that only involve a handful of packages and don't need versioning more granular than a few major releases. but it doesn't scale.

Could you give an example where this would create more data duplication than e.g. Package.resolved? After all, the configuration file only has to contain entries for versions that are actually used in the documentation catalog.

I furthermore expect this configuration file to not be actually typed out in full. I expect the SPM plugin or DocC itself to automatically infer the link for the version that is used in code if it can find its docs on a "well known" platform (e.g. GitHub pages or the Package Index). I think that this is essential for a good user experience.

the Package.resolved is not the problem, the problem is the symbolgraph models themselves.

the example that you gave is not representative of the use case i am thinking of, because you only have a handful of versions and branches, which would not strain the implementation:

  • Logging (latest)
  • Logging (1.3.0)
  • Logging (main/latest)

this can be modeled in the way you propose without difficulty. however a more-mature project might have many branches (generally, a master branch + 2 active “release” branches), and each branch will ideally have snapshots taken daily, possibly even hourly.

the amount of snapshots we are talking about (for a single module) is on the order of 10^2 to 10^4. we typically talk about the sizes of symbolgraphs in terms of tens of MBs, which is what i mean when i say “scaling”.
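as a back-of-the-envelope check, the python sketch below multiplies those snapshot counts by an assumed 10 MB per symbolgraph. the 10 MB figure and the assumption that snapshots share no data are mine, picked only to illustrate the order of magnitude.

```python
MEGABYTE = 10**6
GIGABYTE = 10**9

# assumption for illustration: every snapshot stores its own 10 MB symbolgraph
ASSUMED_BYTES_PER_SNAPSHOT = 10 * MEGABYTE


def total_storage_bytes(snapshot_count, bytes_per_snapshot=ASSUMED_BYTES_PER_SNAPSHOT):
    """total size if each snapshot is modeled as a separate target."""
    return snapshot_count * bytes_per_snapshot


for snapshot_count in (10**2, 10**4):
    gigabytes = total_storage_bytes(snapshot_count) / GIGABYTE
    print(f"{snapshot_count} snapshots: {gigabytes:g} GB")
```

under these assumptions a single module lands somewhere between roughly 1 GB and 100 GB.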

Thanks for all the replies. I'll try to answer by topic.

Prefixing link with a package name

Having worked a lot on the implementation of links in DocC I don't foresee any problems with adding additional path components before the module name, now or in the future.

That said, the only two use cases that I can think of right now for prepending the link with a package name are:

  • Disambiguation if two packages have a target with the same name.

  • Discoverability, for example via code completion, if the developer doesn't know the targets that they depend on but they do know the packages that they depend on.

Are there any use cases that I'm missing here?

Module name collisions in a build workflow would already be problematic when building the code. SwiftPM (in Swift 5.7+) enables developers to define new unique names for the conflicting modules using module aliases. I haven't looked into this in detail but if the module names that DocC sees are the alias names then the solutions used to solve module collisions in build workflows should also solve them for DocC.

It's also possible for direct calls to DocC to pass custom dependencies with colliding targets that would need to be disambiguated. DocC considers the module/target as a symbol so it could be disambiguated with a suffix to that path component. It's something that's worth looking into in more detail.

Linking to specific versions or branches

I think we may have different ideas of how to link to specific versions of documentation and possibly about the use cases for doing so. I'll explain my thinking below. Please add yours as well.

In my view, DocC always works with the "local" version of a target and its dependencies. This means that when a developer writes a link to a dependency's symbol they always write ``/TargetName/SymbolName`` (without any version information). In the locally built documentation the link points to the locally built documentation.

However, when hosting the target's documentation and transforming the link to the hosted documentation for the dependency, the link needs to point to the correct version of the hosted documentation. This means that the local dependency documentation needs some information about what version this is. The details of how it would know this should be discussed on its own, because it's a large and complex problem to solve and deserves a focused discussion. It may be possible to derive the name based on information about the resolved information or it may be information that the developer commits to a file somewhere in the documentation catalog (together with the base URL).

For DocC to resolve a link to a version of the documentation other than the local version it needs to have the data for that version. SwiftPM or IDEs wouldn't check out that version or branch so DocC wouldn't be able to build other versions of documentation from source. That said, the only information DocC really needs to resolve links is the "linkable entities" file, which is fairly small. If linking to specific versions or branches is something that we want to support in the future I could imagine possible designs involving prebuilt "linkable entities" files. I strongly feel that DocC shouldn't do any networking during a build, so the responsibility would mostly fall on IDEs or custom scripts to download the other version's information before the documentation build starts.

I don't have a fully formed picture of how I imagine linking to versioned documentation would work but I don't think anything in this design puts severe restrictions on how it could work. From a link resolution implementation perspective I feel like both ``/package-name/version/TargetName`` and ``/package-name/TargetName`` (for the local version) could coexist since the package-name would have few enough descendants that DocC could try them both as TargetNames and as known versions. I could also see ``/TargetName-version`` or ``/TargetName@version`` working to disambiguate between the different versions of a target's documentation but that syntax is for a separate later discussion.
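
As a rough sketch of how both link shapes could coexist, the second path component can be tried as a target name first and as a known version second. This is only an illustration under stated assumptions, not DocC's implementation: the `PackageLink` type, the `interpret` function, and the `knownVersions` and `knownTargets` inputs are all hypothetical.

```swift
/// A hypothetical, simplified result of interpreting a package-qualified link.
struct PackageLink: Equatable {
    var package: String
    var version: String?   // nil means "the local version"
    var target: String
}

/// Interprets "/package-name/version/TargetName" or "/package-name/TargetName".
/// `knownVersions` and `knownTargets` are assumed to be supplied by the caller,
/// for example from prebuilt "linkable entities" files.
func interpret(
    _ link: String,
    knownVersions: Set<String>,
    knownTargets: Set<String>
) -> PackageLink? {
    // Splitting omits empty components, so the leading slash is dropped.
    let components = link.split(separator: "/").map(String.init)
    guard components.count >= 2 else { return nil }
    let package = components[0]

    // Try the second component as a target name first (the local version).
    if knownTargets.contains(components[1]) {
        return PackageLink(package: package, version: nil, target: components[1])
    }
    // Otherwise try it as a known version followed by a target name.
    if components.count >= 3,
       knownVersions.contains(components[1]),
       knownTargets.contains(components[2]) {
        return PackageLink(package: package, version: components[1], target: components[2])
    }
    return nil
}
```

Because a package has few versions and few targets, both checks are cheap set lookups. A target whose name is also a valid version string would still be ambiguous in this sketch.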

More about merging documentation

Yes, that sounds correct. If we look at the "A-F project" example:

Unless the 3 packages ("A", "E", and "F") can be built in the same build workflow, there would be one documentation build for "A", one for "E", and one for "F". "E" and "F" both build one target without external links (since there are no dependencies) and result in one documentation archive each.

The documentation build for "A" would build "B" and "D" as dependencies and building "B" would build "C" as a dependency of it. In a build system this is typically planned out as various tasks that depend on each other to define their order. A bit overly simplified: the "build C" task doesn't depend on anything so the build starts there. Both "build B" and "build documentation for C" depend on "build C" (they actually depend on certain subtasks that produce certain files, e.g. the symbol graph files to build documentation, but that's too much detail). The task to "build documentation for B" depends on "build B" and on "build documentation for C" so that it can pass the latter's output as a dependency documentation archive to docc convert.
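
The task ordering described above can be sketched as a depth-first topological sort. This is a simplified illustration, not how SwiftPM or any particular build system plans its work; the task names are taken from the paragraph above and the `buildOrder` function is hypothetical.

```swift
/// Task names mapped to the tasks they depend on (simplified from the text above).
let dependencies: [String: [String]] = [
    "build C": [],
    "build B": ["build C"],
    "build documentation for C": ["build C"],
    "build documentation for B": ["build B", "build documentation for C"],
]

/// Returns the tasks in an order where every task comes after its dependencies.
/// This sketch assumes that there are no dependency cycles.
func buildOrder(for dependencies: [String: [String]]) -> [String] {
    var ordered: [String] = []
    var visited: Set<String> = []

    func visit(_ task: String) {
        // Skip tasks that have already been scheduled.
        guard visited.insert(task).inserted else { return }
        for dependency in dependencies[task, default: []] {
            visit(dependency)
        }
        ordered.append(task)
    }

    // Visit the keys in sorted order so that the result is deterministic.
    for task in dependencies.keys.sorted() {
        visit(task)
    }
    return ordered
}

let order = buildOrder(for: dependencies)
print(order.joined(separator: " -> "))
```

Any order that places "build C" first and "build documentation for B" after both of its dependencies would be equally valid; a real build system would also run independent tasks in parallel.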

Following the same pattern, the full build for "A" results in 4 documentation archives;

  • "A" which may link to "B" and "D"

  • "B" which may link to "C"

  • "D" which doesn't link to any dependency

  • "C" which doesn't link to any dependency

If at this point all 6 archives were passed to docc merge then all cross-target links would be preserved.

If, however, only "A" and "B" were passed to docc merge then the links from "B" to "C" and from "A" to "D" would need to be transformed.
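
The difference between these two merges can be sketched as a filter over the cross-target links: a link needs to be transformed when its source is part of the merge but its destination is not. The `links` data mirrors the "A" build above; the `linksToTransform` function is a hypothetical helper, not part of DocC.

```swift
/// Cross-target links between the archives from the "A" build, as (from, to) pairs.
let links: [(from: String, to: String)] = [
    ("A", "B"), ("A", "D"), ("B", "C"),
]

/// Returns the links whose destination isn't among the merged archives.
/// These would need to be transformed into links to hosted documentation.
func linksToTransform(merging merged: Set<String>) -> [(from: String, to: String)] {
    links.filter { merged.contains($0.from) && !merged.contains($0.to) }
}

// Merging everything leaves no links to transform.
print(linksToTransform(merging: ["A", "B", "C", "D", "E", "F"]).count)

// Merging only "A" and "B" leaves the links into "D" and "C" to transform.
for link in linksToTransform(merging: ["A", "B"]) {
    print("\(link.from) -> \(link.to)")
}
```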

As an example, say that instead of merging "A-F" all in one go, that team decided to merge "A-D" first and pass a documentation catalog for their landing page. This would result in a documentation archive with the landing page as top-level paths and each target's documentation as "/documentation/A/path", "/documentation/B/path", etc.

Next, if they passed this combined documentation archive and "E" and "F" to another docc merge call with another documentation catalog it would result in a new documentation archive with the new "A-F" landing page as top-level paths and all targets' documentation as "/documentation/A/path", "/documentation/B/path", ..., "/documentation/F/path".

I'm thinking that it's best to remove the old "A-D" landing pages, which is why I proposed that they be removed. If the landing pages are also combined then it's possible to have path collisions that could cause a page from the "A-D" catalog to link to a different page in the "A-F" catalog.

Possibilities for link conflicts

Module level collisions likely need their own solution separately. I'm hoping that SwiftPM module aliases would work for this (see above). It's also possible to fully spell out links, so unless the two documentation archives have the same identifier a developer could use <doc://com.something.first/Utility> to reference one of the two targets without collisions.

Links with the "doc" scheme work for both symbols and non-symbols, and will continue to do so, and their ability to include the documentation archive's identifier enables them to uniquely describe some links that would be ambiguous with the symbol link syntax. I hope that these cases would be rare but it is an option.

Collisions between symbols and targets shouldn't happen with this syntax. For example if "MyTarget" had a "Something" dependency and a "Something" symbol then ``/Something`` would refer to the dependency and ``Something`` would refer to the symbol.

When I said that a leading slash would be optional for the current target I meant that developers could write either of

MyTarget/Something
/MyTarget/Something

to refer to the "Something" symbol in the current target.

From a link resolution implementation standpoint the leading slash isn't necessary. The number of target dependencies is likely so small that checking if the first path component of a link matches one of them wouldn't be an issue.

The only real downside I see with allowing symbol links to other targets without a leading slash is if my target had a "Something" dependency that I linked to without a leading slash and later I defined a Something type in my target. Now the link that used to point to the dependency would point to the local symbol without warning.

Other than the disambiguation purpose, I think of the leading slash syntax as an indication of intent from the developer to their co-contributors who read the markup. If a symbol link doesn't start with a leading slash it's relative to something in the current target. If it starts with a leading slash the first path component is a target name (which may be the current target).
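
The intent signalled by the leading slash can be sketched as a small classification step that runs before any actual resolution. This is an illustration of the proposed syntax only, not DocC's resolver; the `LinkDestination` type and `classify` function are hypothetical.

```swift
/// What the first path component of a symbol link is understood to mean.
enum LinkDestination: Equatable {
    /// The link starts with a slash, so its first component names a target.
    case target(String, remainingPath: [String])
    /// The link has no leading slash, so it's relative to the current target.
    case relativeToCurrentTarget(path: [String])
}

func classify(
    symbolLink: String,
    currentTarget: String,
    dependencies: Set<String>
) -> LinkDestination? {
    // Splitting omits empty components, so the leading slash is dropped here.
    let path = symbolLink.split(separator: "/").map(String.init)
    guard let first = path.first else { return nil }

    if symbolLink.hasPrefix("/") {
        // Leading slash: the first path component must be a known target name
        // (which may be the current target).
        guard first == currentTarget || dependencies.contains(first) else { return nil }
        return .target(first, remainingPath: Array(path.dropFirst()))
    }
    // No leading slash: resolve relative to something in the current target.
    return .relativeToCurrentTarget(path: path)
}
```

With "MyTarget" as the current target and a "Something" dependency, ``/Something`` classifies as the dependency while ``Something`` stays local, so later adding a local Something type can't silently redirect the slash-prefixed link.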

Any need for the emit digest flag

I cut this portion of the original post because it was quite long already, but no, the --emit-digest flag would still be used to emit the files about diagnostics, assets, and search index information.

Linking to Swift, standard library, Foundation, Combine, etc.

These would be treated as external SDK dependencies. There are a fair number of technical details to discuss about how DocC would get this information but the idea is that the documentation link syntax should be the same for SDK dependencies as it is for documentation archive dependencies.


aha. Biome (since v0.2) does not really have a concept of a “local” target or a dependency root. Biome has a concept called culture which is relative, orthogonal to module namespace, and is defined for every module in its modulegraph.

culture determines the “rights” a particular symbol has within a particular context, how much the compiler “trusts” relationships emanating from that symbol, and how the compiler breaks ties when edge cases occur.

culture is basically a generalization of “locality”, and because it is relative, it means Biome doesn’t have to globally privilege any symbol or group of symbols.

for a more concrete example of this in action, if you go to the docs for swift-json, you’ll see that they connect to the docs for swift-grammar. however it would be incorrect to say that swiftinit hosts documentation “for swift-json that includes docs for swift-grammar”, just as it would be incorrect to say that swiftinit hosts documentation “for swift-grammar that includes docs for swift-json”. the same is true for swift-nio and the other packages on the site.

i considered these spellings but i rejected /TargetName-version because it does not generalize well to branch references (TargetName-master looks like a package name), and /TargetName@version because i did not want @ signs showing up in URLs.

versions are also logically a package-level concept, so it makes more sense for it to show up on the left-hand side of the module name.

based on reading How to properly reference a type and module with the same name? , i assumed ``Something`` would refer to the module, and ``Something/Something`` would refer to the symbol. if this is true, then ``/Something`` would not provide any additional expressive power.

i always found this strange, but i implemented the lookup algorithm in Biome this way for consistency with DocC.

But this has nothing to do with the linking syntax I proposed earlier, does it?

I get that a lot of snapshots lead to a lot of data, but only for the package that has all these snapshots taken (which cannot be avoided entirely and has nothing to do with the linking syntax). However, for targets linking to such a mature project with many snapshots, nothing really changes, because one version of the documentation archive will only ever reference one version (the version manifested in the Package.resolved/the tag from the Package.swift file) plus maybe a handful of other versions that the author wants to link to for some reason. Note that I expect all links to dependencies to actually link to the dependencies' original hosting location, not a local build of the dependencies' documentation archive.

Symbol link priorities and links to extensions

There are a number of subtleties to how symbol links work in DocC.

That's mostly true but the missing piece is that DocC resolves all references relative to the page where the link occurs. For example, in this hypothetical code:

// 1: the module is named "Something"
class TopLevelSymbol {
    class Something { // 2: the first nested symbol
        class NestedOne {
            class NestedTwo {
                class Something { // 3: the deeply nested symbol
                    ...
                }
            }
        }
    }
}
class DifferentSymbol {
    ...
}

The link ``Something`` from the documentation for NestedOne would resolve to the outer nested symbol (2) since that's the container scope but the same link from the NestedTwo documentation would resolve to the deeply nested symbol (3) since that's the descendant.

Similarly, the ``Something`` link from the documentation for TopLevelSymbol would resolve to its descendant symbol (2). Only from the documentation for DifferentSymbol would the ``Something`` link resolve to the module (1).

A leading slash would provide the ability to express an intentional reference to the module in cases where relative descendant or container symbols would otherwise be preferred.
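
The four outcomes described above are consistent with a simple model where a single-component link is looked up among the current page's descendants first, then matched against the page itself, and then the search moves out one container at a time. The sketch below implements only that simplified model; it is an assumption for illustration, not DocC's actual link resolver, and the `Node` type is hypothetical.

```swift
final class Node {
    let name: String
    let label: Int?   // the numbered comment in the example above, if any
    private(set) var children: [Node]
    private(set) weak var parent: Node?

    init(_ name: String, label: Int? = nil, children: [Node] = []) {
        self.name = name
        self.label = label
        self.children = children
        for child in children { child.parent = self }
    }

    /// Resolves a single-component link relative to this page.
    func resolve(_ linkName: String) -> Node? {
        var scope: Node? = self
        while let current = scope {
            // Prefer a direct descendant of the current scope...
            if let match = current.children.first(where: { $0.name == linkName }) {
                return match
            }
            // ...then the scope itself, before moving out to its container.
            if current.name == linkName { return current }
            scope = current.parent
        }
        return nil
    }
}

// The hierarchy from the hypothetical code above.
let deeplyNested = Node("Something", label: 3)
let nestedTwo = Node("NestedTwo", children: [deeplyNested])
let nestedOne = Node("NestedOne", children: [nestedTwo])
let firstNested = Node("Something", label: 2, children: [nestedOne])
let topLevel = Node("TopLevelSymbol", children: [firstNested])
let different = Node("DifferentSymbol")
let module = Node("Something", label: 1, children: [topLevel, different])

for page in [nestedOne, nestedTwo, topLevel, different] {
    let label = page.resolve("Something")?.label ?? 0
    print("From \(page.name): ``Something`` resolves to (\(label))")
}
```

In this model the module is only reachable by name from pages where no closer descendant or container has the same name, which is the gap that a leading slash would fill.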

If instead the hypothetical code had a top level "Something" symbol like this:

// 1: the module is named "Something"
class Something { // 2: now a top-level symbol
    class NestedOne {
        class NestedTwo {
            class Something { // 3: a deeply nested symbol
                ...
            }
        }
    }
}
class DifferentSymbol {
    ...
}

then a ``Something/Something`` link could be interpreted as "<module> / <top-level>", "<top-level> / <member-of-top-level>", or "<nested> / <member-of-nested>" in all of the possible scopes. Only the first interpretation would resolve successfully. This means that link resolver implementations need to either backtrack, branch, or look ahead depending on what their overall link resolution strategy is.

This also becomes relevant for links to extensions.

This is a very good point and I see that my phrasing was a bit ambiguous when I said (emphasis added)

I propose that the existing symbol link syntax be extended with a leading slash for links to symbols in other targets.

What I should have said is

[...] a leading slash for links to symbols in documentation archives passed as dependencies.

Since extensions are content for the extending module and are passed as the main input to DocC they would be resolved as local symbol links and wouldn't need a leading slash. It would still be optional and DocC would try to resolve the link locally before checking external resolvers.

If I understand this correctly it means that local content is prioritized over external content. If so, that's how I imagine it would work as well. Even with an optional leading slash, DocC would try to resolve links locally before resolving them as external references. Because of the way that links are always resolved relative to other pages I feel that it's still useful to prefix a local link with a slash to treat it as an "absolute"¹ symbol link instead of resolving it relative to the current page.

1. The term "absolute" symbol link can have many meanings. The resolved path to the symbol page in the documentation archive will have an added initial "documentation" path component. This isn't necessary to include when writing the link in content since symbol links can't resolve to tutorials, but it is supported to write a symbol link like ``/documentation/SomeTarget/Something``.

This would mean that if a local extension matched ``/OtherTarget/Something`` then DocC wouldn't attempt to resolve "Something" in the external "OtherTarget" documentation archive dependency.

This is what I propose that we change. I view it as a benefit to the developer or contributor of the documentation to be able to read or write a symbol link without a leading slash and know that it refers to some local symbol or some local extension.


that’s a fair point. it’s early enough that i may end up moving Biome to use double slash (``//swift-nio/niocore/eventlooppromise``) for package-absolute links.

how would this compose with the doc: plane?

``doc:niocore/eventlooppromise``
``doc:/niocore/eventlooppromise``
``doc://swift-nio/2.41.1/niocore/eventlooppromise``

DocC uses general URIs with the "doc" scheme for documentation so I imagine that this would follow the rules for URIs more broadly.

doc://something/path/to/some/page#anchor
╰┬╯ ╰────┬────╯╰───────┬────────╯╰──┬──╯
 │    host name       path       fragment
scheme  

It's possible to skip the host name and follow the scheme with the path, with or without a leading slash

doc:path/to/some/page
doc:/path/to/some/page

If links with a "doc" scheme were to make a distinction between a leading slash and not then it would only be applicable when the host (bundle identifier) is skipped. However, as far as I recall, general URIs don't make this distinction so I'm not sure if I think "doc" schemed links should be consistent with symbol links or with URIs in general.

It's also already possible to write an external "doc" schemed link by specifying the other archive's identifier as the host name so there's not the same need to add a new syntax for it

doc://other-target-identifier/path/to/some/page
    ╰───────────┬───────────╯

Note that URI host names are prefixed with two slashes and that the path starts with the first slash after the host name.
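
To see how a general-purpose URI parser splits these links, Foundation's `URLComponents` can be used as in the sketch below. This only shows how Foundation decomposes the strings; it makes no claim about how DocC itself parses "doc" links, and the `docLinkParts` helper is hypothetical.

```swift
import Foundation

/// The URI components that matter for a "doc" link, as parsed by Foundation.
func docLinkParts(_ link: String) -> (host: String?, path: String, fragment: String?)? {
    guard let components = URLComponents(string: link), components.scheme == "doc" else {
        return nil
    }
    return (components.host, components.path, components.fragment)
}

for link in [
    "doc://something/path/to/some/page#anchor",
    "doc:/path/to/some/page",
    "doc:path/to/some/page",
] {
    if let parts = docLinkParts(link) {
        print(link, "->", parts.host ?? "(no host)", parts.path, parts.fragment ?? "(no fragment)")
    }
}
```

The first link has a host name because of the two slashes after the scheme. The other two have none, and differ only in whether the path starts with a slash.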

Edit:

Good to know but unrelated to cross-target links: symbol links don't support schemes so ``doc:something`` would be considered a link with a single path component (doc:something). This would for example be useful when linking to an Objective-C method such as compare:options:range:.

Also, DocC uses the < > markdown syntax since it fills in the link text based on the title of the resolved page. It's also possible to write a "doc" scheme link using the []() syntax.


cc @Karl


URLs with no slashes after the scheme have opaque paths (WebURL.hasOpaquePath) - they don't technically have "path components", or any hierarchical structure whatsoever. It's just a scheme name, followed by an opaque string.

That also means you can't resolve most relative references against these URLs (e.g. "../foo") - the standard forbids it, and standards-conforming APIs such as JavaScript's URL class, rust-url, and WebURL will fail to resolve the reference. They used to be called cannot-be-a-base URLs. You're also not allowed to set most components on these URLs.

JavaScript:

var opqURL = new URL("doc:foo/bar/baz");

// Setting a component:

opqURL.pathname = "wont-work";
console.log(opqURL.href);
// ❌ "doc:foo/bar/baz"

// Resolving a relative reference:

console.log(new URL("../qux", opqURL));
// ❌ Uncaught TypeError: Failed to construct 'URL': Invalid URL

The processing and escaping rules for these URLs are entirely different from those with a leading slash. Path components are not compacted (because they don't have components), and even unescaped spaces are allowed! Yes, really!

javascript:alert("hello, world!");

So both conceptually and practically, there is a world of difference between these URLs:

doc:path/to/some/page
doc:/path/to/some/page

Personally, I would not recommend using both forms in the same scheme. It's likely to make processing awkward and confusing.
