Documentation Tooling Workgroup meeting - 15 December 2025
Attending
- Joe Heck
- Vera Mitchell
- Franklin Schrans
- David Ronnqvist
- Sven Schmidt
- Sofia Rodriguez Morales
- Dave Verwer
Topics
- (Joe) Follow up on the testing/eval plan for SEO results from the custom DocC image that enhances HTML content for crawlers. Relevant thread in Forums: Supplementing DocC's output with minimal content in the per-page HTML files - #3 by ronnqvist
- (Joe) Note Victor Puga's suggestion for providing explicit content at `/` in a DocC archive: Documentation homepage - #3 by victorpuga
Discussion
SEO Followup
Joe: We have an image set up from David's branch that can be invoked
- may be missing DOCC_HTML_DIR
- invoked indirectly
- also need to add in additional CLI option to invoke the feature
The question is really how to best test this?
Suggestion 1:
- Fork something that we want to test, I think we talked about something explicit there
Franklin: I think we identified OpenAPI generator runtime
The testing scenario would be to create two forks, publish both, and compare them against each other:
- create a fork without any changes
- create a fork that uses the new image
Sofia: Is there a dashboard to see what's going on with SEO values?
Joe: Not really. There's a Google Search dashboard, relevant per site, but that doesn't really give you detailed SEO comparisons. We'd have to compare more explicitly through search queries.
Joe: Compare with built content: run iterative searches through the content to see where the two sites rank relative to each other. I think some prior work looked at where specific content landed, and someone on the server team may have earlier data and would like to compare search results afterwards, so comparing over time rather than comparing two different site locations.
Joe: I'll coordinate testing and reporting for this, likely using HackMD as a collaborative whiteboard. If we see one site in the results but not the other, it's not clear to me how we'd tell whether both sites have been indexed. Bit of a jam in our theory.
Sven: I think you can ask Google if a URL has been indexed.
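A minimal sketch of how the fork comparison could be tabulated, assuming the search results are collected by hand (or by some other means) into ordered lists of URLs per query. The host names, queries, and result lists are hypothetical placeholders, and nothing here calls a search API.

```python
from urllib.parse import urlparse


def first_rank(results, host):
    """Return the 1-based position of the first result served from `host`, or None."""
    for position, url in enumerate(results, start=1):
        if urlparse(url).hostname == host:
            return position
    return None


def compare_forks(results_by_query, baseline_host, enhanced_host):
    """Map each query to a (baseline rank, enhanced rank) pair."""
    return {
        query: (first_rank(results, baseline_host), first_rank(results, enhanced_host))
        for query, results in results_by_query.items()
    }


# Hypothetical, hand-collected results for two queries.
sample = {
    "ByteBuffer lengthPrefix": [
        "https://enhanced.example.org/documentation/niocore/bytebuffer",
        "https://unrelated.example.com/blog",
        "https://baseline.example.org/documentation/niocore/bytebuffer",
    ],
    "EventLoopFuture reduce": [
        "https://unrelated.example.com/forum",
    ],
}
comparison = compare_forks(sample, "baseline.example.org", "enhanced.example.org")
```

A rank of `None` only says the site did not appear in the collected results; it does not distinguish "not indexed" from "indexed but ranked low", which is the gap Joe raised.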
Victor Puga's PR - content at / through DocC Render
- related PR: Added Documentation homepage by VictorPuga · Pull Request #982 · swiftlang/swift-docc-render · GitHub
Joe: Originally, I was unsure if this provided content at / or /documentation from a DocC archive. I asked in the thread, and per Victor's response it's supposed to provide some data at /
Group: This potentially results in duplicate-content pages, one at / and another at /documentation, in the case where you're assembling a combined DocC archive build.
David: DocC content can't generate at / - there's nowhere to drop in data that presents there, it's entirely controlled by the DocC render app.
David: will read through the Forums post and PR and ask some questions.
Sofia: if only one target, that would be a weird experience - maybe it should redirect?
Franklin: that would be good to discuss.
Sven: it currently shows an error if that's not there. If there were a proper homepage, we could avoid a whole separate lookup.
David: is it strange to suggest the renderer could handle that with redirects? If there is a single page, redirect to that; if there's an authored multi-target page, show that.
Franklin: any way to know what they're looking at?
Joe: haven't read the PR yet, but there is the index.json at the root of a DocC archive that could be read and iterated to find the relevant modules.
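A rough sketch of the idea David and Joe describe: read the archive's index, find the top-level modules, and redirect when there is exactly one. The schema used here (an `interfaceLanguages` object mapping a language to a list of nodes with `type`, `title`, and `path`) and the exact file location are assumptions, not something confirmed in the meeting; the function names are hypothetical.

```python
def top_level_modules(index, language="swift"):
    """List the top-level module nodes for one language (schema is an assumption)."""
    nodes = index.get("interfaceLanguages", {}).get(language, [])
    return [node for node in nodes if node.get("type") == "module" and "path" in node]


def root_action(index, language="swift"):
    """Decide what `/` should do: redirect for a single module, else list them."""
    modules = top_level_modules(index, language)
    if len(modules) == 1:
        return ("redirect", modules[0]["path"])
    return ("homepage", [module["path"] for module in modules])


# Hypothetical index contents, shaped like the assumed schema.
single = {"interfaceLanguages": {"swift": [
    {"type": "module", "title": "NIOCore", "path": "/documentation/niocore"},
]}}
combined = {"interfaceLanguages": {"swift": [
    {"type": "module", "title": "NIOCore", "path": "/documentation/niocore"},
    {"type": "module", "title": "NIOPosix", "path": "/documentation/nioposix"},
]}}
```

In practice the dictionary would come from `json.load` on the index file inside the archive. The same decision could equally live in the renderer, which is what David suggested.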
Sven: those pages could be a "no index" to avoid indexing them, correct?
Dave: Could put a meta tag in the page's head to indicate that it shouldn't be indexed.
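For reference, the usual way to express this is a robots meta tag, `<meta name="robots" content="noindex">`. The sketch below injects one into a page as a post-processing step; the helper name is hypothetical, and the regex-based approach is a simplification that assumes a conventional `<head>` element is present.

```python
import re

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def add_noindex(html):
    """Insert a robots noindex meta tag right after the opening <head> tag.

    Pages that already declare a robots meta tag are left untouched.
    """
    if re.search(r'<meta[^>]+name=["\']robots["\']', html, flags=re.IGNORECASE):
        return html
    # Match <head> or <head ...attributes>, but not <header>.
    return re.sub(
        r"<head(\s[^>]*)?>",
        lambda match: match.group(0) + NOINDEX_TAG,
        html,
        count=1,
        flags=re.IGNORECASE,
    )


page = "<html><head><title>Home</title></head><body><header>Docs</header></body></html>"
tagged = add_noindex(page)
```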
Static HTML...
David: I have an experiment that puts HTML in the pages inside of a noscript tag so that it doesn't interfere with the JS renderer. There's a live demo in the forum post.
- Example: DocumentationContext
If you disable javascript in your browser, you can refresh, click through, etc.
Can increment towards a fully static HTML experience with opengraph meta tags and other things.
Dave Verwer: Really good start. It exposes just the content and avoids the navigation by intent, if I understand?
David: Yes, that's correct. All of the JavaScript pieces are still on the page, so if JavaScript is enabled it'll trigger and run.
You can hit this with curl and get the underlying content. The links are page relative, shouldn't even need a web server for those.
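As an illustration of what a crawler or a curl user would see, here is a small sketch that pulls the text out of the `noscript` section of a page using Python's standard `html.parser`. The sample page is a made-up stand-in for the experiment's output, not a copy of it.

```python
from html.parser import HTMLParser


class NoScriptText(HTMLParser):
    """Collect text that appears inside <noscript> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a <noscript> element
        self.chunks = []  # non-empty text fragments found there

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def noscript_text(html):
    parser = NoScriptText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: static content in <noscript>, JS renderer alongside it.
page = """<html><head><title>DocumentationContext</title></head><body>
<noscript><h1>DocumentationContext</h1><p>A hypothetical summary.</p></noscript>
<div id="app"></div><script>console.log("renderer")</script></body></html>"""
text = noscript_text(page)
```

The same function could be fed the body of a page fetched with `urllib.request.urlopen`, which is roughly the curl case David mentions.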
Sven: If you were to think ahead - which part is the most tricky?
David: Probably /search - it needs to query a data file and do the quicknav mechanisms.
I've played around with using CSS to annotate Swift vs. ObjC symbols so that you can toggle which is visible through CSS. Anything that's the same in both languages doesn't use the CSS.
But right now, the page content is Swift-only, regardless of the language.
Dave Verwer: Which of the features that currently exist would require JavaScript to be enabled?
David: Navigator would require some JavaScript/CSS.
The site uses the same navigator for every page, so you don't want to duplicate it.
Tutorials require JS for their interactions.
Scroll location, displaying content at the right moment. There's also quiz/eval mechanisms that use the JS at the tail end of the tutorials.
Franklin: in VS Code, it supports a live preview. Does that use JS?
Joe: nope, I don't think so - that's all going through LSP and accessing DocC directly through that. The VS Code team has been super open to feedback though.
Franklin: So it doesn't keep track of your scroll position?
Joe: I don't think so.
SiteMaps
Franklin: Any follow up with SiteMaps?
David Ronnqvist put in some details in the Slack threads:
(copied out of Slack to keep from losing it)
The information in each entry is only the human-written text. For example, for the NIOLoopBound page the "rawIndexableTextContent" is only this text
"NIOLoopBound is an always-Sendable, value-typed container allowing you access to value if and only if you are accessing it on the right EventLoop. Overview NIOLoopBound is useful to transport a value of a non-Sendable type that needs to go from one place in your code to another where you (but not the compiler) know is on one and the same EventLoop. Usually this involves @Sendable closures. This type is safe because it verifies (using preconditionInEventLoop(file:line:)) that this is actually true. A NIOLoopBound can only be constructed, read from or written to when you are provably (through preconditionInEventLoop(file:line:)) on the EventLoop associated with the NIOLoopBound. Accessing or constructing it from any other place will crash your program with a precondition as it would be undefined behaviour to do so."
and a page like WriteBufferWaterMark has " " as its text (I think it's a bug that it's 1 space and not an empty string).
If you pass the --emit-digest flag DocC writes this indexing-records.json top-level in the .doccarchive output (next to the linkable-entities.json file)
Using a bit of jq you can extract only the length of the content for each reference and join them together into an output that can almost be treated as a CSV (it would only have the rows with values and not the leading column-names row).
For example, this jq-query with the NIOCore indexing records
jq '.[]
  | select(.location.type == "topLevelPage")
  | [
      (.rawIndexableTextContent | length),
      (.location.reference.url | sub("doc://[^/]*"; ""))
    ]
  | join(",")' \
  --raw-output \
  /path/to/NIOCore.doccarchive/indexing-records.json | sort -nr
produces output like:
12044,/documentation/Docs/loops-futures-concurrency
9615,/documentation/Docs/swift-concurrency
6487,/documentation/NIOCore/ByteToMessageDecoder
6106,/documentation/NIOCore/EventLoopFuture
4969,/documentation/Docs/ByteBuffer-lengthPrefix
4334,/documentation/NIOCore/ChannelPipeline
2999,/documentation/NIOCore/ByteBuffer
1872,/documentation/NIOCore/EventLoop/scheduleCallback(at:handler:)-2xm6l
1439,/documentation/NIOCore/SocketOptionProvider
1356,/documentation/NIOCore/ChannelOptions/Types/DatagramVectorReadMessageCountOption
1229,/documentation/NIOCore/EventLoopFuture/reduce(into:_:on:_:)
which shows that the EventLoops, EventLoopFutures, and Swift Concurrency article contains the most content in that .doccarchive
likewise, the ByteToMessageDecoder protocol is the symbol with the most documentation
with this information—that's already available in existing DocC versions (IIRC for several years)—it's possible to determine which pages in a given archive have human written documentation
That's likely enough to start an experiment that limits what documentation pages SPI passes to Google for search indexing
and if you find that there's additional data that would be useful to have but that DocC doesn't already output, then we can talk about in what file it would make sense to add that
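A possible starting point for that experiment, sketched in Python: the same selection as the jq query, followed by a filter that keeps only pages with some human-written text. The record fields come from the description above; the `min_length` threshold, the function names, and the sample records (including the `contained` entry) are hypothetical. Unlike the jq query, this version strips whitespace first, so a page whose text is a single space counts as empty.

```python
import re


def content_lengths(records):
    """Return (length, path) pairs for top-level pages, longest first."""
    rows = []
    for record in records:
        location = record.get("location", {})
        if location.get("type") != "topLevelPage":
            continue
        # Drop the doc://<bundle-id> prefix, as the jq sub() call does.
        path = re.sub(r"^doc://[^/]*", "", location["reference"]["url"])
        text = record.get("rawIndexableTextContent", "").strip()
        rows.append((len(text), path))
    return sorted(rows, reverse=True)


def pages_worth_indexing(records, min_length=1):
    """Paths whose human-written text is at least `min_length` characters."""
    return [path for length, path in content_lengths(records) if length >= min_length]


# Hypothetical records shaped like entries in indexing-records.json.
sample_records = [
    {"location": {"type": "topLevelPage",
                  "reference": {"url": "doc://NIOCore/documentation/NIOCore/NIOLoopBound"}},
     "rawIndexableTextContent": "NIOLoopBound is an always-Sendable container."},
    {"location": {"type": "topLevelPage",
                  "reference": {"url": "doc://NIOCore/documentation/NIOCore/WriteBufferWaterMark"}},
     "rawIndexableTextContent": " "},
    {"location": {"type": "contained",
                  "reference": {"url": "doc://NIOCore/documentation/NIOCore/NIOLoopBound"}},
     "rawIndexableTextContent": "Overview section text."},
]
indexable = pages_worth_indexing(sample_records)
```

With a real archive, `records` would come from `json.load` on the `indexing-records.json` file, and the resulting paths could feed whatever list SPI hands to Google.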
Action Items
- Skipping the meeting on the 29th of December. The next meeting is scheduled for Jan 12th.