This is a follow-up to our recent Documentation Workgroup discussion.
We host DocC documentation for more than 1,000 open source Swift packages on the Swift Package Index. By default, we build documentation with the latest version of Swift. When we updated to Swift 6.1, we encountered a subtle regression in documentation rendering: the sidebar content was incorrect.
We've also had issues in the past where sidebar content was missing or documentation generation failed and led to missing docs.
Of all these issues, the only one we can reliably detect is missing docs. To that end, we regularly check whether known documentation pages return 404s.
Obviously, this check cannot detect incorrect content on documentation pages.
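To make the limitation concrete, here is a minimal sketch of a status-code check of this kind. It is not our actual implementation; the function names and the injectable `fetch_status` parameter are assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is on the exception.
        return error.code


def find_missing_pages(
    urls: Iterable[str],
    fetch_status: Callable[[str], int] = http_status,
) -> List[str]:
    """Return the URLs that respond with 404 (hypothetical helper)."""
    return [url for url in urls if fetch_status(url) == 404]
```

A page that returns 200 but renders a broken sidebar passes this check untouched, because only the status code is inspected.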
This is a concern given how many pages we host, particularly because doc archives often consist of thousands of files, which we upload to and serve from S3. At 173GB of docs across 30M files, upload errors or missing files are almost guaranteed.
As you can imagine, problems like these are impossible for us to detect. Even if the doc archive app (i.e. the Vue.js hosting app) had some way of enumerating all archive pages, we wouldn't be able to crawl and verify them all.
However, without a verification method, we risk breaking documentation when we update Swift versions or when we make a mistake in our build system. We do have an integration test that ensures documentation generation produces output on all supported platforms, but it cannot detect issues like the one I described at the start.
A while ago, Chris, a Swift Mentorship Program mentee, prototyped a solution that loads the Vue.js app in order to run deeper tests than just HTTP status code inspection (https://github.com/msuzoagu/spiDocSmokeTest, finestructure / spi-doctest · GitLab). However, this is quite a slow process and would only allow spot checks.
This is probably still worth pursuing, but we hope there could be a more comprehensive solution.
We've been thinking about this for a while and believe two features built into the Vue.js application would allow for better testing:
- a health check endpoint that can be queried via HTTP which runs a "structural integrity" check
- a checksum mechanism covering the many files a doc archive contains, giving us a quick way to verify that an archive is intact in its entirety
A "structural integrity check" is obviously a fuzzy term, but what it should ensure is that the app is up and running correctly. It should catch issues like missing data. Sometimes you can "see" that a doc archive isn't behaving correctly via errors in the JS console, but these aren't surfaced in HTTP status codes. The health check should surface these same errors and allow them to be inspected via a simple HTTP call.
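As a rough illustration of how we might consume such a health check, here is a sketch of a client-side helper. No such endpoint exists today, so the JSON shape (a `status` string plus an `errors` list) and the function name are purely hypothetical.

```python
import json
from typing import List


def health_problems(report_json: str) -> List[str]:
    """Return the problems listed in a hypothetical health report.

    Assumed (not real) payload shape:
    {"status": "ok", "errors": ["..."]}
    """
    report = json.loads(report_json)
    problems = list(report.get("errors", []))
    status = report.get("status")
    if status != "ok" and not problems:
        # A non-ok status without details is still worth flagging.
        problems.append(f"status is {status!r} with no error details")
    return problems
```

The point is only that the errors currently visible in the JS console would become machine-readable, so a monitoring job could act on them without loading the app in a browser.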
The checksum mechanism would be a great help in verifying uploads. It's probably not something we'd use across all repositories (because it would mean re-downloading all files from S3), but it would be helpful for verifying that an individual archive is intact when we need to.
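One way such a mechanism could work is a per-file manifest plus a single digest over the whole manifest. The sketch below assumes SHA-256 and a manifest keyed by relative path; the function names and the manifest format are hypothetical, not anything DocC produces today.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(archive_root: Path) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its contents."""
    manifest = {}
    for path in sorted(archive_root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(archive_root).as_posix()
            manifest[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def archive_digest(manifest: Dict[str, str]) -> str:
    """Fold the manifest into one digest representing the entire archive."""
    hasher = hashlib.sha256()
    for relative in sorted(manifest):
        hasher.update(f"{relative}\0{manifest[relative]}\n".encode("utf-8"))
    return hasher.hexdigest()


def find_problems(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """List files that are missing or whose contents differ from the manifest."""
    problems = []
    for relative, digest in sorted(expected.items()):
        if relative not in actual:
            problems.append(f"missing: {relative}")
        elif actual[relative] != digest:
            problems.append(f"corrupt: {relative}")
    return problems
```

If the manifest were generated at build time and shipped inside the archive, comparing the single archive digest would give a quick yes/no answer, and the per-file comparison would only be needed to pinpoint what went wrong.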
What are people's thoughts around this? How are others testing their hosted documentation archives? Is there something else we could/should be doing to ensure documentation pages are displaying correctly?