[GSoC 2026] Interested in Documentation Coverage Project

Hello everyone,

My name is Shashank Yadav and I am a second-year Computer Science student at SRM Institute of Science and Technology.

I am interested in applying for Google Summer of Code 2026 with the Swift project. After reviewing the proposed ideas, the Documentation Coverage project particularly caught my attention because I am interested in developer tooling and documentation systems.

I have experience working with Swift and SwiftUI through personal projects, and I am currently exploring the Swift DocC repository to better understand how documentation coverage is implemented.

I would really appreciate guidance on the following:

• Where the current documentation coverage implementation exists in the Swift DocC codebase

• Any beginner-friendly issues or areas I could start exploring

• Any recommended resources to better understand the architecture of Swift DocC

I am looking forward to learning more about the project and contributing to the repository before submitting my GSoC proposal.

Thank you for your time and guidance.


The experimental prototype for documentation coverage was implemented long ago by creating a `CoverageDataEntry` value for each page. A few other places gather and format that data, but the coverage data entry type is a good place to start looking.

We had some trouble not too long ago with low quality contributions that didn't always compile and that were abandoned upon receiving feedback. Because of this I haven’t added any new good-first-issue issues in a while. However, I can find or create something for you in almost any area of the code. Is there anything in particular that interests you more?

DocC has a bit of conceptual documentation for contributors. The Swift-DocC Pipeline article gives a quick, high-level overview of the big steps inside `docc convert`. That, along with some of the other articles and API collections listed on the top-level page, is probably a good place to start.

However, none of that describes the broader build-orchestration picture (the tools that schedule tasks such as extracting symbol information and pass the relevant input files to `docc convert`). For learning about DocC in the context of documentation coverage you can probably ignore that piece for now; it's enough to know that some tool passes DocC symbol graph files that describe the API surface of the module being documented, and that this is the symbol data DocC works with.

You can also find some information about the experimental coverage feature in other threads on the Forum that have asked about it. For example these two threads:

Thank you for the helpful pointers and for sharing those forum threads.

I’ve started reading through them and exploring how the experimental documentation coverage feature currently works in Swift DocC. Based on your suggestion, I’ll begin by looking into the CoverageDataEntry implementation to better understand how coverage metrics are generated and formatted.

I also plan to experiment with the current coverage output to see how the data is structured and how it behaves in practice.

Thanks again for the guidance while I’m getting familiar with the DocC architecture!


Hi @ronnqvist,

Following up on my previous message, I spent some time experimenting with the experimental documentation coverage feature locally to better understand how it works in practice.

I built the `swift-docc` repository and ran the coverage feature using:

```
swift run docc convert TestDocs.docc --experimental-documentation-coverage
```

This produced the following coverage summary output in the terminal:

```
--- Experimental coverage output enabled. ---
        | Abstract | Curated | Code Listing
Types   | (0/0)    | (0/0)   | (0/0)
Members | (0/0)    | (0/0)   | (0/0)
Globals | (0/0)    | (0/0)   | (0/0)
```

It also generated a `documentation-coverage.json` file, which I inspected to understand how documentation pages are represented as `CoverageDataEntry` objects.

While exploring the codebase, I traced where these entries are created and noticed that they are generated in ConvertActionConverter when documentation nodes are processed and converted into CoverageDataEntry objects.

From what I’ve observed so far, the current coverage metrics mainly focus on whether a symbol has an abstract, whether it is curated, whether a code listing is present, and parameter/member documentation coverage depending on the symbol type.

While experimenting with the output, I started wondering whether additional metrics might also be useful for coverage analysis. For example, metrics related to example coverage, documentation length, or article/tutorial coverage might provide additional insight into documentation completeness.

I’m continuing to explore the pipeline to better understand how coverage data is extracted from render nodes and where additional information could potentially be incorporated.

Thanks again for the earlier pointers — they were very helpful while I was navigating the DocC architecture.

Yes, part of this project idea is to identify other metrics that would be useful to add (and even removing existing metrics if any of them aren't deemed useful).

The other part of this project idea is to redefine the format of the "documentation-coverage.json" file so that:

  • tools can easily consume the information and display it to developers
  • the format is extensible for future metrics; ideally, tools wouldn't need to write new code—or even update—to display a new metric, as long as that metric is of a kind (for example a fraction, an integer, or a percentage value) that the tool already supports displaying.

Thanks for the clarification earlier — it helped me better understand the goals of improving both the metrics and the coverage output format.

I looked again at the documentation-coverage.json file generated locally. Each entry currently contains metadata about the documentation page or symbol (such as title, referencePath, kind, and sourceLanguage) along with specific coverage-related fields like hasAbstract, isCurated, and hasCodeListing.
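For reference, the per-entry shape I observed could be modeled roughly like this. The field names mirror what I saw in my locally generated JSON, and the sample values are made up for illustration; the actual `CoverageDataEntry` type in Swift-DocC may differ:

```swift
import Foundation

// Rough sketch of the per-entry shape observed in the locally
// generated documentation-coverage.json. Field names mirror what I
// saw; the real CoverageDataEntry type in Swift-DocC may differ.
struct ObservedCoverageEntry: Codable {
    let title: String
    let referencePath: String
    let kind: String
    let sourceLanguage: String
    let hasAbstract: Bool
    let isCurated: Bool
    let hasCodeListing: Bool
}

// Hypothetical sample values, for illustration only.
let sample = Data("""
{
    "title": "TestDocs",
    "referencePath": "/documentation/testdocs",
    "kind": "module",
    "sourceLanguage": "swift",
    "hasAbstract": false,
    "isCurated": true,
    "hasCodeListing": false
}
""".utf8)

let entry = try! JSONDecoder().decode(ObservedCoverageEntry.self, from: sample)
print("\(entry.title): curated=\(entry.isCurated)")
```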

Since these metrics appear as fixed fields in the JSON, I started wondering whether adding new metrics in the future (for example, example coverage, documentation length, or other indicators) might require extending the format each time.

Based on your comment about making the format extensible, I was thinking that coverage information might instead be grouped under a more generic metrics structure. For example, each metric could describe its type (boolean, ratio, integer, etc.), which might allow tools to consume and display new metrics without requiring changes to the overall JSON structure.
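As a concrete sketch of that idea (all of the type and field names below are my own invention, not anything that exists in Swift-DocC today): each metric could declare its kind, so a tool that already knows how to render booleans, integers, and ratios could display any future metric of those kinds without code changes:

```swift
import Foundation

// Hypothetical extensible metric shape; none of these names exist
// in Swift-DocC. The enum case identifies the kind of value, so a
// tool only needs rendering code per kind, not per metric.
enum MetricValue: Codable {
    case boolean(Bool)
    case integer(Int)
    case ratio(covered: Int, total: Int)
}

struct Metric: Codable {
    let id: String          // stable identifier, e.g. "hasAbstract"
    let displayName: String // what a tool shows to the developer
    let value: MetricValue
}

struct PageCoverage: Codable {
    let referencePath: String
    let metrics: [Metric]
}

let page = PageCoverage(
    referencePath: "/documentation/testdocs",
    metrics: [
        Metric(id: "hasAbstract", displayName: "Abstract",
               value: .boolean(false)),
        Metric(id: "curatedMembers", displayName: "Curated members",
               value: .ratio(covered: 3, total: 5)),
    ]
)

// Round-trip through JSON to check that the shape is encodable.
let data = try! JSONEncoder().encode(page)
let decoded = try! JSONDecoder().decode(PageCoverage.self, from: data)
print(decoded.metrics.count)
```

(Swift synthesizes `Codable` for enums with associated values since Swift 5.5, which keeps a sketch like this short.)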

I’m continuing to explore how these values are currently generated from CoverageDataEntry and where a redesigned format like this might fit into the pipeline.

Does this direction align with what you had in mind for making the coverage output easier to extend?

Yes, that direction aligns with what I had in mind for making the coverage output extensible.


I'll also mention that it can be difficult to design a file format without anything that's consuming that format. This is why I suggested last year that it could be helpful to include some time in the proposal to prototype something small that displays the coverage information. That way you'll get some feeling for how the format is to work with and a way to verify that it isn't missing any information that tools need.

It could also be useful to prototype something (a website, app, or anything else that you're comfortable with) that displays some coverage data to build your own opinion on the current format—if it contains the information that you need, if it's easy to work with, etc.—and if there are any changes to the data that you'd like to propose to make it easier to present the information.

(That prototype isn't the end product of the project but it can be a relevant tool throughout your project to verify that the data is easy to work with and contains relevant information.)

I built a small CLI prototype that works directly with the generated documentation-coverage.json, to see how the current format behaves when consumed by a tool. The prototype reads the JSON, computes a simple coverage score per symbol, highlights missing documentation aspects, and attempts to surface additional insights.

Here is a sample output from the prototype when running it on a test catalog:

```
TestDocs
Coverage: 33%
Missing: Abstract, Code Listing

Additional Metrics:
- Documentation Length: Not available in JSON
- Example Count: Not available in JSON

Summary:
Total Symbols: 1
Well Documented: 0
Needs Improvement: 1

Overall Coverage: 33%
```

While implementing this, I noticed that the current format works well for existing checks but is harder to extend for richer insights. Even simple additions like documentation length or example presence aren’t directly supported.

This prototype helped highlight these limitations when consuming the data in practice. Next, I plan to explore how the data is generated and experiment with more extensible structures for additional metrics.
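For reference, the per-symbol score above is just the fraction of passing boolean checks (1 of 3, which rounds down to 33% with integer arithmetic); a minimal sketch of what my prototype computes:

```swift
// Sketch of the naive score my prototype computes: the fraction of
// passing boolean checks, using only the three fields currently in
// the JSON. Richer (non-boolean) metrics would need a different
// aggregation.
func coverageScore(hasAbstract: Bool, isCurated: Bool, hasCodeListing: Bool) -> Int {
    let checks = [hasAbstract, isCurated, hasCodeListing]
    let passing = checks.filter { $0 }.count
    return passing * 100 / checks.count
}

// One of three checks passing: 100 / 3 rounds down to 33.
print(coverageScore(hasAbstract: false, isCurated: true, hasCodeListing: false))
```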


For clarification: you don't need to do the prototype work before the proposal. If you've got enough information to write the proposal, then you can include time for prototyping in the proposal and do that work later, to check that the updated format is nice to work with and to make small refinements as necessary.