Combined documentation of multiple targets

Hi!

Very nice proposal, David! I definitely agree that making separate calls to docc convert should prove to be far more convenient for manual usage, and especially when integrating these features into the SPM DocC plugin and IDEs. I also think that we should spend significant time on those integrations to make usage of cross-target documentation as easy and convenient as possible.

I'm not 100% sure if I understood your explanation of the merge command correctly. When passing only pre-built documentation archives to the command, those would all be hosted next to each other - independently of their dependency hierarchy - as direct children of documentation, correct?

However, what do you mean by this?

Would all pages from the pre-built documentation archives still be present at the same paths as when there is no documentation catalog passed to the command? Or would the new archive only include those pages that are (transitively) linked to from the new documentation catalog? I guess the "target" corresponding to the new documentation catalog would also be hosted next to the targets from the pre-built archives, wouldn't it?

I also agree with the leading slash syntax as a shorthand for absolute links.

However, I think it would be beneficial to also allow relative links (sufficiently qualified, i.e. starting with a target name) to be used for referring to pages from other targets. This results in relative links behaving just like Swift code, where a local extension to external types can shadow the original member. This point already came up in my proposal discussion on how to Document Extensions to External Types Using DocC:

Just for completeness: my PR from the extension topic already brings support for absolute symbol links in Swift-DocC: https://github.com/apple/swift-docc/pull/335.

I don't think linking to two different versions or even branches of the same target is a very common requirement. However, one could also solve this problem by embedding the version/branch name in the target name. This could be done in the configuration file proposed in the use-case thread like this:

    - name: Logging
      baseURL: "https://apple.github.io/swift-log/current/documentation/Logging"
    - name: Logging_1_3_0
      baseURL: "https://apple.github.io/swift-log/v1.3.0/documentation/Logging"
    - name: Logging_main
      baseURL: "https://apple.github.io/swift-log/main/documentation/Logging"

Solving this on a target level could make sense since we're already building documentation on a per-target (not per-package) basis.

I love the idea of the convenience, and I think for the most part this makes sense, but what happens when there's a conflict in naming? For example, some dummy such as myself using a too-common term for both a dependency module name and an internal struct? (I know that sounds almost fanciful, but I've done exactly that to myself while trying to name code structures, protocols, and modules).

If a symbol exists that could reference more than one location - how is that warned about, or should that be considered an error in this scenario?

I'd personally lean towards using the doc:// format for links as the more explicit form. I thought about suggesting that a symbol link with a preceding slash might be treated differently (considering that a sort of canonical reference to the root structure of the set of modules), but that also seemed somewhat opaque and extremely easy to miss. The idea would be to reference the top level of the dependency SomeDependency as ``/SomeDependency``, as in your earlier sample.

Looking more at how the whole thing gets processed:

Am I understanding correctly that the gist of this would be to have a cascade of builds to get to the combined-end-product? First build documentation archives for each individual module, and then those build products become the inputs for the docc merge processing command?

If that's the case, what happens with the cross-references that can't be resolved in the earlier steps? Or am I presuming additional steps and the merge command will also construct and use the ExternalSymbolResolver with the contents from each of the various dependency modules being combined together?

I guess I'm mostly trying to get a sense of what this command would look like if I have 5 or 10 dependencies that I want to bundle and render together with cross-links.

Does this remove the need for --emit-digest, or only for the merge use case? (I'm using it to extract a list of all the possible symbols within a module in order to see them "jumbled all together on the table" for when I go to organize/curate documentation). I do like this additional detail being included in the JSON.

I'd really hoped (and wanted) to be able to reference some of Apple's documentation - specifically Swift standard library, Foundation, Combine, and in some cases SwiftUI - to be able to reference the protocols, classes, or structs that are relevant to a library that extends upon any of those. If that needs to be an explicit HTML link, so be it - but it would be a far nicer experience (IMO) to be able to at least use a doc:// reference, and have some mechanism to let me know if that link was incorrect (didn't resolve) rather than post-processing through any rendered pages to verify that a link still resolves after it's rendered. That's overlapping a bit with your future directions, so I may just be wanting to jump to a flying-car style end-state here.

presumably, both colliding links would have to be written with the module name prefixed, just as you do in source.

they get resolved in the later steps. some kind of tiered link resolution architecture is inevitable once versioning enters the picture.

why do we even need the scheme to begin with? why can’t references to imported symbols "just work" like they do in source?

I love the convenience of symbols, and for the most part I'm sure they will just work, but they're all referenced from the context of a local page, and even adding a preceding / doesn't resolve between different canonical documentation references. The extended doc:// scheme does by including a hostname (or bundle ID in hostname form) in the middle, which is why I like it and will probably tend to use it when I can figure out (or have access to) additional resolver infrastructure to reference those external locations.

can you give an example of when leading slash would not be sufficient to disambiguate references?

Two modules, both named "Utility", from different authors.

this situation would not occur in project-local links, because two modules with the same name cannot be combined into the same product.

this situation could occur in project-external links, which is why Biome uses leading slash to denote package namespace, which would not collide.

it's possible down the road that we will have to deal with colliding package names, and both tools will have to introduce some concept of username and even hostname, as you described. but i do not think that is necessary right now.

if you are worried about colliding module names, this is a reason to support having leading slash denote package namespace instead of module namespace.

this does not scale once versioning is taken into account. modeling each VCS node as a separate target results in a staggering amount of data duplication.

i understand that this is not a concern for small, localized use cases that only involve a handful of packages and don't need versioning more granular than a few major releases. but it doesn't scale.

Could you give an example where this would create more data duplication than e.g. Package.resolved? After all, the configuration file only has to contain entries for versions that are actually used in the documentation catalog.

I furthermore expect this configuration file to not be actually typed out in full. I expect the SPM plugin or DocC itself to automatically infer the link for the version that is used in code if it can find its docs on a "well known" platform (e.g. GitHub pages or the Package Index). I think that this is essential for a good user experience.

the Package.resolved is not the problem, the problem is the symbolgraph models themselves.

the example that you gave is not representative of the use case i am thinking of, because you only have a handful of versions and branches, which would not strain the implementation:

  • Logging (latest)
  • Logging (1.3.0)
  • Logging (main/latest)

this can be modeled in the way you propose without difficulty. however a more-mature project might have many branches (generally, a master branch + 2 active “release” branches), and each branch will ideally have snapshots taken daily, possibly even hourly.

the number of snapshots we are talking about (for a single module) is on the order of 10^2 to 10^4. we typically talk about the sizes of symbolgraphs in terms of tens of MBs, which is what i mean when i say “scaling”.

Thanks for all the replies. I'll try and answer by topic

Prefixing link with a package name

Having worked a lot on the implementation of links in DocC I don't foresee any problems with adding additional path components before the module name, now or in the future.

That said, the only two use cases that I can think of right now for prepending the link with a package name are:

  • Disambiguation if two packages have a target with the same name.

  • Discoverability, for example via code completion, if the developer doesn't know the targets that they depend on but they do know the packages that they depend on.

Are there any use cases that I'm missing here?

Module name collisions in a build workflow would already be problematic when building the code. SwiftPM (in Swift 5.7+) enables developers to define new unique names for the conflicting modules using module aliases. I haven't looked into this in detail, but if the module names that DocC sees are the aliased names then the solution used to solve module collisions in build workflows should also solve them for DocC.

It's also possible for direct calls to DocC to pass custom dependencies with colliding targets that would need to be disambiguated. DocC considers the module/target as a symbol so it could be disambiguated with a suffix to that path component. It's something that's worth looking into in more detail.

Linking to specific versions or branches

I think we may have different ideas of how to link to specific versions of documentation and possibly about the use cases for doing so. I'll explain my thinking below. Please add yours as well.

In my view, DocC always works with the "local" version of a target and its dependencies. This means that when a developer writes a link to a dependency's symbol they always write ``/TargetName/SymbolName`` (without any version information). In the locally built documentation the link points to the locally built documentation.

However, when hosting the target's documentation and transforming the link to the hosted documentation for the dependency, the link needs to point to the correct version of the hosted documentation. This means that the local dependency documentation needs some information about what version this is. The details of how it would know this should be discussed on its own, because it's a large and complex problem to solve and deserves a focused discussion. It may be possible to derive the name based on information about the resolved information or it may be information that the developer commits to a file somewhere in the documentation catalog (together with the base URL).

For DocC to resolve a link to a version of the documentation other than the local version it needs to have the data for that version. SwiftPM or IDEs wouldn't check out that version or branch, so DocC wouldn't be able to build other versions of documentation from source. That said, the only information DocC really needs to resolve links is the "linkable entities" files, which are fairly small. If linking to specific versions or branches is something that we want to support in the future I could imagine possible designs involving prebuilt "linkable entities" files. I strongly feel that DocC shouldn't do any networking during a build, so the responsibility would mostly fall on IDEs or custom scripts to download the other version's information before the documentation build starts.

I don't have a fully formed picture of how I imagine linking to versioned documentation would work but I don't think anything in this design puts severe restrictions on how it could work. From a link resolution implementation perspective I feel like both ``/package-name/version/TargetName`` and ``/package-name/TargetName`` (for the local version) could coexist since the package-name would have few enough descendants that DocC could try them both as TargetNames and as known versions. I could also see ``/TargetName-version`` or ``/TargetName@version`` working to disambiguate between the different versions of a target's documentation but that syntax is for a separate later discussion.

More about merging documentation

Yes, that sounds correct. If we look at the "A-F project" example:

Unless the 3 packages ("A", "E", and "F") can be built in the same build workflow, there would be one documentation build for "A", one for "E", and one for "F". "E" and "F" both build one target without external links (since there are no dependencies) and result in one documentation archive each.

The documentation build for "A" would build "B" and "D" as dependencies and building "B" would build "C" as a dependency of it. In a build system this is typically planned out as various tasks that depend on each other to define their order. A bit overly simplified: the "build C" task doesn't depend on anything, so the build starts there. Both "build B" and "build documentation for C" depend on "build C" (they actually depend on certain subtasks that produce certain files, e.g. the symbol graph files to build documentation, but that's too much detail). The task to "build documentation for B" depends on "build B" and on "build documentation for C", whose output is passed as a dependency documentation archive to docc convert.

Following the same pattern, the full build for "A" results in 4 documentation archives:

  • "A" which may link to "B" and "D"

  • "B" which may link to "C"

  • "D" which doesn't link to any dependency

  • "C" which doesn't link to any dependency
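The task ordering described above can be sketched as a small topological sort. This is only an illustration under stated assumptions: the task names ("build C", "document B", and so on) are hypothetical labels for this example, not real SwiftPM or DocC task identifiers, and a real build system tracks much finer-grained subtasks and detects cycles, which this sketch does not.

```swift
// Hypothetical task graph for package "A": each task lists the tasks it depends on.
let dependencies: [String: [String]] = [
    "build C": [],
    "build D": [],
    "build B": ["build C"],
    "build A": ["build B", "build D"],
    "document C": ["build C"],
    "document D": ["build D"],
    "document B": ["build B", "document C"],
    "document A": ["build A", "document B", "document D"],
]

/// Returns the tasks in an order where every task comes after its dependencies.
/// Assumes the graph has no cycles.
func buildOrder(for graph: [String: [String]]) -> [String] {
    var order: [String] = []
    var visited: Set<String> = []

    func visit(_ task: String) {
        // Skip tasks that were already scheduled.
        guard visited.insert(task).inserted else { return }
        for dependency in graph[task, default: []] {
            visit(dependency)
        }
        // All dependencies are scheduled, so this task can run now.
        order.append(task)
    }

    // Sort the keys so that the result is deterministic.
    for task in graph.keys.sorted() {
        visit(task)
    }
    return order
}

let order = buildOrder(for: dependencies)
print(order.joined(separator: "\n"))
```

In this sketch "document B" can only be scheduled after both "build B" and "document C", which mirrors how the archive for "C" has to exist before it can be passed as a dependency when building documentation for "B".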

If at this point all 6 archives were passed to docc merge then all cross-target links would be preserved.

If, however, only "A" and "B" were passed to docc merge then the links from "B" to "C" and from "A" to "D" would need to be transformed.

As an example, say that instead of merging "A-F" all in one go, that team decided to merge "A-D" first and pass a documentation catalog for their landing page. This would result in a documentation archive with the landing page as top-level paths and each target's documentation as "/documentation/A/path", "/documentation/B/path", etc.

Next, if they passed this combined documentation archive and "E" and "F" to another docc merge call with another documentation catalog it would result in a new documentation archive with the new "A-F" landing page as top-level paths and all targets' documentation as "/documentation/A/path", "/documentation/B/path", ..., "/documentation/F/path".

I'm thinking that it's best to remove the old "A-D" landing pages, which is why I proposed that they be removed. If the landing pages are also combined then it's possible to have path collisions that could cause a page from the "A-D" catalog to link to a different page in the "A-F" catalog.

Possibilities for link conflicts

Module-level collisions likely need their own solution separately. I'm hoping that SwiftPM module aliases would work for this (see above). It's also possible to fully spell out links, so unless the two documentation archives have the same identifier a developer could use <doc://com.something.first/Utility> to reference one of the two targets without collisions.

Links with the "doc" scheme work for both symbols and non-symbols, and will continue to do so, and their ability to include the documentation archive's identifier enables them to uniquely describe some links that would be ambiguous with the symbol link syntax. I hope that these cases would be rare but it is an option.

Collisions between symbols and targets shouldn't happen with this syntax. For example, if "MyTarget" had a "Something" dependency and a "Something" symbol then ``/Something`` would refer to the dependency and ``Something`` would refer to the symbol.

When I said that a leading slash would be optional for the current target I meant that developers could write either of

MyTarget/Something
/MyTarget/Something

to refer to the "Something" symbol in the current target.

From a link resolution implementation standpoint the leading slash isn't necessary. The number of target dependencies is likely so small that checking if the first path component of a link matches one of them wouldn't be an issue.

The only real downside I see with allowing symbol links to other targets without a leading slash is if my target had a "Something" dependency that I linked to without a leading slash and later I defined a Something type in my target. Now the link that used to point to the dependency would point to the local symbol without warning.

Other than the disambiguation purpose, I think of the leading slash syntax as an indication of intent from the developer to their co-contributors who read the markup. If a symbol link doesn't start with a leading slash it's relative to something in the current target. If it starts with a leading slash the first path component is a target name (which may be the current target).
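To make the intent-signalling role of the leading slash concrete, here is a minimal sketch of how a link could be classified before any resolution happens. The `LinkTarget` type and the `classify(symbolLink:)` function are hypothetical names made up for this example; this is not how DocC's link resolver is implemented, only an illustration of the rule described above.

```swift
// Hypothetical classification of a symbol link, based only on the leading slash.
enum LinkTarget: Equatable {
    /// No leading slash: resolved relative to something in the current target.
    case relativeToCurrentPage(path: [String])
    /// Leading slash: the first path component is a target name
    /// (which may be the current target).
    case absolute(target: String, path: [String])
}

func classify(symbolLink: String) -> LinkTarget {
    // `split` drops empty components, so "/Something" becomes ["Something"].
    let components = symbolLink.split(separator: "/").map(String.init)
    if symbolLink.hasPrefix("/"), let target = components.first {
        return .absolute(target: target, path: Array(components.dropFirst()))
    }
    return .relativeToCurrentPage(path: components)
}

// "Something" stays relative, "/Something" names a target.
print(classify(symbolLink: "Something"))
print(classify(symbolLink: "/Something"))
print(classify(symbolLink: "/MyTarget/Something"))
```

Under this rule, later adding a local Something type cannot silently change what ``/Something`` points to, because the slash form never falls back to a relative lookup.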

Any need for the emit digest flag

I cut this portion of the original post because it was quite long already, but no, the --emit-digest flag would still be used to emit the files about diagnostics, assets, and search index information.

Linking to Swift, standard library, Foundation, Combine, etc.

These would be treated as external SDK dependencies. There are a fair number of technical details to discuss about how DocC would get this information but the idea is that the documentation link syntax should be the same for SDK dependencies as it is for documentation archive dependencies.

aha. Biome (since v0.2) does not really have a concept of a “local” target or a dependency root. Biome has a concept called culture which is relative, orthogonal to module namespace, and is defined for every module in its modulegraph.

culture determines the “rights” a particular symbol has within a particular context, how much the compiler “trusts” relationships emanating from that symbol, and how the compiler breaks ties when edge cases occur.

culture is basically a generalization of “locality”, and because it is relative, it means Biome doesn’t have to globally privilege any symbol or group of symbols.

for a more concrete example of this in action, if you go to the docs for swift-json, you’ll see that they connect to the docs for swift-grammar. however it would be incorrect to say that swiftinit hosts documentation “for swift-json that includes docs for swift-grammar”, just as it would be incorrect to say that swiftinit hosts documentation “for swift-grammar that includes docs for swift-json”. the same is true for swift-nio and the other packages on the site.

i considered these spellings but i rejected /TargetName-version because it does not generalize well to branch references (TargetName-master looks like a package name), and /TargetName@version because i did not want @ signs showing up in URLs.

versions are also logically a package-level concept, so it makes more sense for it to show up on the left-hand side of the module name.

based on reading How to properly reference a type and module with the same name?, i assumed ``Something`` would refer to the module, and ``Something/Something`` would refer to the symbol. if this is true, then ``/Something`` would not provide any additional expressive power.

i always found this strange, but i implemented the lookup algorithm in Biome this way for consistency with DocC.

But this has nothing to do with the linking syntax I proposed earlier, does it?

I get that a lot of snapshots lead to a lot of data, but only for the package that has all these snapshots taken (which cannot be avoided entirely and has nothing to do with the linking syntax). However, for targets linking to such a mature project with many snapshots, nothing really changes, because one version of the documentation archive will only ever reference one version (the version manifested in the Package.resolved/the tag from the Package.swift file) plus maybe a handful of other versions that the author wants to link to for some reason. Note that I expect all links to dependencies to actually link to the dependencies' original hosting location, not a local build of the dependencies' documentation archive.

Symbol link priorities and links to extensions

There are a number of subtleties to how symbol links work in DocC.

That's mostly true but the missing piece is that DocC resolves all references relative to the page where the link occurs. For example, in this hypothetical code:

// 1: the module is named "Something"
class TopLevelSymbol {
    class Something { // 2: the first nested symbol
        class NestedOne {
            class NestedTwo {
                class Something { // 3: the deeply nested symbol
                    // ...
                }
            }
        }
    }
}
class DifferentSymbol {
    // ...
}

The link ``Something`` from the documentation for NestedOne would resolve to the outer nested symbol (2) since that's the container scope but the same link from the NestedTwo documentation would resolve to the deeply nested symbol (3) since that's the descendant.

Similarly, the ``Something`` link from the documentation for TopLevelSymbol would resolve to its descendant symbol (2). Only from the documentation for DifferentSymbol would the ``Something`` link resolve to the module (1).

A leading slash would provide the ability to express an intentional reference to the module in cases where relative descendant or container symbols would otherwise be preferred.
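The relative lookup described above can be approximated with a small tree walk. This is a simplified sketch, assuming a rule of "check the current page's children, then the page itself, then repeat in the parent"; the `Node` class and `resolve(_:from:)` function are hypothetical and this is not DocC's actual link resolution algorithm, which handles multi-component paths, disambiguation, and more.

```swift
// Hypothetical symbol hierarchy node; `label` only identifies nodes in the output.
final class Node {
    let name: String
    let label: String
    weak var parent: Node?
    var children: [Node] = []

    init(_ name: String, label: String = "") {
        self.name = name
        self.label = label
    }

    @discardableResult
    func add(_ child: Node) -> Node {
        child.parent = self
        children.append(child)
        return child
    }
}

/// Resolves a single-component link relative to `page`:
/// descendants first, then the container itself, then outward scope by scope.
func resolve(_ name: String, from page: Node) -> Node? {
    var scope: Node? = page
    while let current = scope {
        if let child = current.children.first(where: { $0.name == name }) {
            return child
        }
        if current.name == name {
            return current
        }
        scope = current.parent
    }
    return nil
}

// The hierarchy from the hypothetical code above.
let module = Node("Something", label: "1: module")
let topLevel = module.add(Node("TopLevelSymbol"))
let outer = topLevel.add(Node("Something", label: "2: first nested symbol"))
let nestedOne = outer.add(Node("NestedOne"))
let nestedTwo = nestedOne.add(Node("NestedTwo"))
nestedTwo.add(Node("Something", label: "3: deeply nested symbol"))
let differentSymbol = module.add(Node("DifferentSymbol"))

for page in [nestedOne, nestedTwo, topLevel, differentSymbol] {
    let result = resolve("Something", from: page)?.label ?? "unresolved"
    print("from \(page.name): \(result)")
}
```

With this rule, only the lookup from DifferentSymbol walks far enough out to reach the module, which is why an explicit way to say "I mean the module" is useful from the other pages.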

If instead the hypothetical code had a top level "Something" symbol like this:

// 1: the module is named "Something"
class Something { // 2: now a top-level symbol
    class NestedOne {
        class NestedTwo {
            class Something { // 3: a deeply nested symbol
                // ...
            }
        }
    }
}
class DifferentSymbol {
    // ...
}

then a ``Something/Something`` link could either be interpreted as "<module> / <top-level>", "<top-level> / <member-of-top-level>", or "<nested> / <member-of-nested>" in all of the possible scopes. Only the first interpretation would resolve successfully. This means that link resolver implementations need to either backtrack, branch, or look ahead depending on what their overall link resolution strategy is.

This also becomes relevant for links to extensions.

This is a very good point and I see that my phrasing was a bit ambiguous when I said (emphasis added)

I propose that the existing symbol link syntax be extended with a leading slash for links to symbols in other targets.

What I should have said is

[...] a leading slash for links to symbols in documentation archives passed as dependencies.

Since extensions are content for the extending module and are passed as the main input to DocC, they would be resolved as local symbol links and wouldn't need a leading slash. It would still be optional and DocC would try to resolve the link locally before checking external resolvers.

If I understand this correctly it means that local content is prioritized over external content. If so, that's how I imagine it would work as well. Even with an optional leading slash, DocC would try to resolve links locally before resolving them as external references. Because of the way that links are always resolved relative to other pages I feel that it's still useful to prefix a local link with a slash to treat it as an "absolute"[1] symbol link instead of resolving it relative to the current page.

1. The term "absolute" symbol link can have many meanings. The resolved path to the symbol page in the documentation archive will have an added initial "documentation" path component. This isn't necessary to include when writing the link in content since symbol links can't resolve to tutorials, but it is supported to write a symbol link like ``/documentation/SomeTarget/Something``.

This would mean that if a local extension matched ``/OtherTarget/Something`` then DocC wouldn't attempt to resolve "Something" in the external "OtherTarget" documentation archive dependency.

This is what I propose that we change. I view it as a benefit to the developer or contributor of the documentation to be able to read or write a symbol link without a leading slash and know that it refers to some local symbol or some local extension.

that’s a fair point. it’s early enough that i may end up moving Biome to use double slash (``//swift-nio/niocore/eventlooppromise``) for package-absolute links.

how would this compose with the doc: plane?

``doc:niocore/eventlooppromise``
``doc:/niocore/eventlooppromise``
``doc://swift-nio/2.41.1/niocore/eventlooppromise``

DocC uses general URIs with the "doc" scheme for documentation so I imagine that this would follow the rules for URIs more broadly.

doc://something/path/to/some/page#anchor
╰┬╯ ╰────┬────╯╰───────┬────────╯╰──┬──╯
 │    host name       path       fragment
scheme  

It's possible to skip the host name and follow the scheme with the path, with or without a leading slash

doc:path/to/some/page
doc:/path/to/some/page

If links with a "doc" scheme were to make a distinction between a leading slash and no leading slash, it would only be applicable when the host (bundle identifier) is skipped. However, as far as I recall, general URIs don't make this distinction, so I'm not sure whether "doc" scheme links should be consistent with symbol links or with URIs in general.

It's also already possible to write an external "doc" scheme link by specifying the other archive's identifier as the host name, so there's not the same need to add a new syntax for it

doc://other-target-identifier/path/to/some/page
    ╰───────────┬───────────╯

Note that URI host names are prefixed with two slashes and that the path starts with the first slash after the host name.
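As a rough illustration of the URI breakdown above, the following sketch splits the example links into their components using Foundation's `URLComponents`. Using `URLComponents` here is an assumption made for the example and says nothing about how DocC parses links internally; other URL parsers may treat the scheme-only forms differently.

```swift
import Foundation

// Example links taken from the discussion above.
let links = [
    "doc://something/path/to/some/page#anchor",
    "doc:path/to/some/page",
    "doc:/path/to/some/page",
]

for link in links {
    guard let components = URLComponents(string: link) else {
        print("\(link): not parseable")
        continue
    }
    print(link)
    print("  scheme:   \(components.scheme ?? "(none)")")
    // The host is only present when the scheme is followed by two slashes.
    print("  host:     \(components.host ?? "(none)")")
    print("  path:     \(components.path)")
    print("  fragment: \(components.fragment ?? "(none)")")
}
```

For the first link this reports "something" as the host and "/path/to/some/page" as the path. The two host-less forms differ only in whether the reported path starts with a slash.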

Edit:

Good to know but unrelated to cross-target links: symbol links don't support schemes, so ``doc:something`` would be considered a link with a single path component (doc:something). This would for example be useful when linking to an Objective-C method such as compare:options:range:.

Also, DocC uses the < > markdown syntax since it fills in the link text based on the title of the resolved page. It's also possible to write a "doc" scheme link using the []() syntax.

cc @Karl

URLs with no slashes after the scheme have opaque paths (WebURL.hasOpaquePath) - they don't technically have "path components", or any hierarchical structure whatsoever. It's just a scheme name, followed by an opaque string.

That also means you can't resolve most relative references against these URLs (e.g. "../foo") - the standard forbids it, and standards-conforming APIs such as JavaScript's URL class, rust-url, and WebURL will fail to resolve the reference. They used to be called cannot-be-a-base URLs. You're also not allowed to set most components on these URLs.

JavaScript:

var opqURL = new URL("doc:foo/bar/baz");

// Setting a component:

opqURL.pathname = "wont-work";
console.log(opqURL.href);
// ❌ "doc:foo/bar/baz"

// Resolving a relative reference:

console.log(new URL("../qux", opqURL));
// ❌ Uncaught TypeError: Failed to construct 'URL': Invalid URL

The processing and escaping rules for these URLs are entirely different from those with a leading slash. Path components are not compacted (because they don't have components), and even unescaped spaces are allowed! Yes, really!

javascript:alert("hello, world!");

So both conceptually and practically, there is a world of difference between these URLs:

doc:path/to/some/page
doc:/path/to/some/page

Personally, I would not recommend using both forms in the same scheme. It's likely to make processing awkward and confusing.

what if we used the scheme to differentiate?

<doc:implicitlyNamespaced> ←→ ``implicitlyNamespaced``
<module:ModuleName/qualified> ←→ ``/ModuleName/qualified``

Taking a few steps back; the existing syntax for "doc" scheme links in DocC will work for referencing articles and tutorials in external documentation archives by specifying the other archive's identifier like <doc://other-target-identifier/path/to/some/page>. I feel that this is fine and that this initial proposal doesn't need to cover syntax changes for "doc" scheme links. We can always make improvements to the "doc" scheme syntax later when people have been using it to reference external content.

I do however feel that developers should be able to use symbol links to reference symbols in external documentation archives and that this should be part of the initial proposal, with a leading slash for external symbol links.
