Diagnostic for undocumented public declarations

Thanks, this is exactly what I was getting at with my question above. The check would absolutely be based on the cascaded doc comments.

1 Like

I agree, and I think having some limited degree of control over "strictness" is key here. To avoid feature creep, I imagine most scenarios where a user wants doc comment validation are one of these:

  1. The documentation is not going to be published, so not every public decl needs a comment, but decls (of any visibility) that do should still be validated for correctness (i.e., parameters in the comment match the signature, you don't have a Returns section for a function that returns Void, etc.).
  2. The documentation is going to be published, so every public decl should have full documentation (either inherited or its own) and be validated for correctness.

Being able to choose between those two modes (with "off" being the third) would satisfy folks who have more permissive style guidelines for project- and team-internal code, such as "if the only thing you would say about a method named setFoo is 'sets the foo', then leave it out," whereas if you were publishing these docs for an external SDK, you probably would document that in the interest of completeness and having something to link to.

This feature would probably also need some special cases to exclude things like public declarations that have a name with a leading underscore, since that typically signifies that they're only public as an implementation detail for another module or for testing, and we probably don't want to require documentation for those even in a so-called strict mode.

4 Likes

Yes, I think that makes a lot of sense.

I just recently had to wrestle with this when emitting symbol graphs that had to handle nested cases, @_spi, and some other more rare cases, so I think there is a good foundation I can repurpose to prevent this kind of noise.

1 Like

Iā€˜m definitely in favour of a sourcekit-based solution. Iā€™ve only used it a little bit myself, but I think itā€™s supposed to be the library with ā€œIDE-levelā€ understanding of your code. It can do code completion and syntax colouring, and that requires a fairly complete idea of things. I really think we should be using it more - especially for command line tools! For too long have IDEs hogged the good stuff!

But it isnā€™t very easy to build a sourcekit tool that can scale to large projects. It would be really nice to have something with a simple API like swift-formatā€˜s custom linter rules, but with greater understanding of your code. So you can write your own complex rules in a host that handles all the tricky stuff about using sourcekit (like figuring out import paths and so on).

There might already be a tool like that. Iā€™m not extremely well versed on the various linters out there (there are a few, I think).

Hello, I'm really happy this concern is explicitly mentioned in this thread (as well as underscore-prefixed identifiers). I feel like we're in good hands :slight_smile:

Does anyone of us happen to know what algorithm is used by Xcode in order to feed this cascade and display the relevant documentation when one option-clicks an identifier in the editor? It is private to Xcode, or available in some external tool, like SourceKit-LSP?

Because... Isn't the goal of the diagnostic we talk about to make sure:

  1. Documentation is available in well-behaved IDEs
  2. Documentation is available for well-behaved generators (html, etc.)

Isn't it a primary goal, then, to see if both can feed from the same source of truth?

Whatever part of the project that @owenv refers to in the post below seems to have all of the prerequisite knowledge; shouldn't any missing docs or issues be diagnosed as part of that?

If we are going to put this functionality into the compiler, I would love to see both of these modes check code examples in doc comments to make sure they compile (or, even, verify that the output matches what is claimed).

Still not sure if all of this has to be done in the compiler though. Could it be part of SwiftPM? What do peer languages such as Rust do in terms of validating documentation completeness?

I think this would be a great addition and tie into the existing playground capabilities really well. At the same time, validation of code snippets seems like a topic of large enough depth that it deserves its own separate discussion from this one about validation of existence/structure of comments, since there would be a lot more details to work out:

  • How strict would it be: syntactic validation, semantic validation, or runtime validation? I can think of legitimate reasons for each.
  • There'd need to be a way to control whether validation was desired (since it may not be for all Swift code blocks), so maybe a new Markdown code fence language identifier would be needed.
  • Would each snippet be isolated, or could they build on each other by referring to symbols from earlier ones (like in this documentation), making it more like a "reverse playground"?

Please remember that not everyone uses SwiftPM as part of their build process; they may use Bazel, CMake, or some other build system, so saying it should be part of SwiftPM specifically would make it unusable to many folks. If the distinction being drawn here is whether it should be "part of the compiler" vs. "another tool that's also distributed as part of a standard Swift toolchain and is used in SwiftPM's build plan but can also be invoked in a standalone fashion", then we need to state that clearly to make sure we aren't locking out a class of users.

In this particular case, I think it's entirely appropriate for the validation to occur in the compiler for one main reason: the compiler already emits a .swiftdoc file containing the documentation for a module and that file is distributed with the module and used by SourceKit/language server tooling to surface that documentation to users. This is unlike many other languages, where a separate tool is used to perform an independent scan of source code to extract the comments in order to generate documentation output. The Swift compiler is already parsing the structured Markdown in the comments in order to write it out in that serialized format, so having it validate it there as well seems sensible and the most efficient choice. Requiring a user to run a separate tool after the fact would be far less so, both for the user and for the tools.

2 Likes

Yes, this is my biggest issue with the existing documentation generators. I think @mattt was looking at this as part of swift-doc but I don't know if he's been able to solve this issue either.

Going forward, I think the solution here is an official linter (whether as part of swift-format or a separate tool), as this doesn't really have anything to do with the code. However, if Swift were to add support for compiling example code in the inline documentation itself, perhaps some aspect of doc linting could appropriately live in the compiler. (As an aside, we've stopped putting full examples in most of Alamofire's documentation, both due to the complexity and to stop copy / paste coding, which was becoming a support burden.)

While I know it's not a common pattern in the compiler today, I think it would be really useful. I also love the idea of verifying any code samples within a comment, but I'd be quite happy with just a flag that "something is missing" well prior to that advanced step.

1 Like

Hi, I can answer this. This cascading logic is implemented in the compiler and it presents an API that just asks, "what is the doc comment for this declaration?". If there is no doc comment on the declaration itself, it looks to protocol requirements and declarations inherited from superclasses.

I completely agree that uses of symbol data should have a consistent source of truth without a bunch of implementations trying to figure this kind of thing out. In fact, this functionality should already be in use for Xcode's Quick Help or anything else that asks for a declaration's doc comments.

3 Likes

As long as it's easy to turn off. Ideally via per-file and/or per-module/target settings in Package.swift somehow and in an IDE (Xcode's "Target - Build Phases - Compile Sources - Compiler Flags" per-file setting area comes to mind).

There is a lot of legacy code (in our large projects and in third party packages) that won't have documentation comments for every single thing. Since we set treat Warnings as Errors, no way to turn this off will be a nightmare.

I can see a perspective that believes that some public fields in protocols really don't need documentation (they are obvious from their name), so this "feature" could well feel like an annoying micromanaging over-protective-nanny "feature" to people with that perspective.

Addendum: I do like good documentation though :), so an optional something like this that gently encourages adding it seems like a nice addition.

2 Likes

I'd love this!

I think this feature will be useful for certain users, for example, the Swift standard library itself, swift-nio, Foundation, Dispatch etc. These are libraries that are so widely used that spending a non-trivial amount of time to document every single API is clearly beneficial.

However, the documentation of this new flag should explain the costs of requiring documentation on every public API. Specifically:

  • The maintenance costs are very real. Updating comments takes time, especially for engineers who are often not technical writers. If the comment takes more time to write and keep up-to-date than it provides value to readers, then it is a net negative for the project.

  • Documentation can bitrot. Out-of-date, misleading documentation can be worse than none at all.

  • A tool that requires a documentation comment can push developers to write meaningless comments that on the surface level look OK, and make the API look "documented", when in fact it is not.

    When the developer does not know how to write a good comment, they have a choice of not writing a comment, or writing a meaningless comment. A warning that requires a comment therefore forces the developer to sometimes write a meaningless comment.

Some examples that I have seen in the production code:

/// The customer ID.
var customerId: CustomerId
/// An abstract base class for all concrete image uploader classes.
class ImageUploader { ... }
extension ImageOrientation {
  /// Converts an `ImageOrientationProperty` to `ImageOrientation`.
  init(_ imageOrientationProperty: ImageOrientationProperty)
/// Define a convenience alias for `Result<String, NetworkError>`.
typealias MyLibraryResult = Result<String, NetworkError>

My preference as a reader is that such comments were omitted entirely. I'm worried about that a warning might push developers to write more such comments. I don't think that a warning can make people write more helpful comments. When a developer knows how to write a helpful comment, they usually do it without a reminder.

TL;DR: Unless there is a technical writer on the project, I think it would be better off not mandating a comment on every public API.

Aside: I think requiring a comment on every public API is similar in spirit to requiring unit tests to provide 100% code coverage. It is great to have thorough unit tests, and indeed the code coverage metric correlates with the quality of tests. So people with good intentions start demanding 100% code coverage. Achieving 70-95% coverage is very doable, and such tests provide value to the project by catching bugs. Covering the remaining code in tests requires a disproportionate amount of effort and might not provide value: the tests written for the last 5% coverage might not prevent any bugs in future. 100% code coverage also provides a false sense of security (just like doc comments on every API create a false impression that the library is thoroughly documented). Of course there are project like software for aerospace applications where 100% code coverage is justified, but they are exceptions rather than norm. Similarly, for documentation comments, only if the library is used very widely it makes sense to spend the effort to write a comment on every API -- but such libraries are exceptions.

7 Likes

Thanks for writing this.

Some documentation generators use a marker such as /// :nodoc: in order to remove some methods from the generated documentation. One use case is to avoid methods inherited from frameworks to interfere with the intended documentation (for example, you don't want to document viewDidLoad, and you don't want it to appear in the documentation either).

Based on this example, we could imagine a marker such as /// :trivial: which would silence the diagnostic we are talking about (and tell documentation generators to output the declaration without any documentation).

2 Likes

That's a good idea, it might work for some projects that want to mostly-document their APIs, expanding the user base of the feature.

However, for some projects, even spending the time and readers' attention on these markers might be too much of a cost. For some projects the right tradeoff might be to document the central abstractions and leave the rest undocumented -- not because the remaining APIs are trivial, but because the cost of documenting is unlikely to be balanced by a future benefit.

I'm not opposed to the feature -- I'm only saying that we should present it to users together with the costs, and explain that it is not the right choice for everyone.

1 Like

I think this functionality should be part of whatever tool is generating documentation from the source code, which is exactly the right time and place to be warned about any documentation issues. Unless that documentation generator is going to be built into the compiler, I don't see why the diagnostics should be. And if it is built into the compiler these warnings should only be produced when in that documentation mode.

1 Like

Wouldnā€™t you just set the flag to just turn off the feature in these cases?

Thank you all for the suggestions and your concerns. I hope I don't come off as insufferably philosophical but please let me provide some larger context around this flag.

A warning does not force, especially considering the flag is a conscious choice by the developer. I find these kinds of statements to be entirely too pessimistic and fatal about docs. One could argue the same for code itself. I wouldn't suppose that all developers are excellent writers any more than I would suppose that all writers are also excellent software developers (if only!). For the lucky projects that have a dedicated, experienced technical writer, I am admitting a little bit that a writer may edit a documentation comment in order to improve the docs of the public interface, or work together with developers to come up with good public comments. In that case, having a coverage tool may be useful. For those that don't have that luxury, it can simply be a way to find something you missed rather than scanning through all of your files manually. For those that don't care or want to know, they can just not use the flag. Please don't take my suggestion as too finalā€“I am suggesting this as an option, not a restriction.

Oftentimes, on the whole, I feel a strong underlying sentiment: good documentation is hard. It sure is. Let me offer something more holistic: good software is hard. To me, software is not separate from but includes tests and documentation. Just as test and documentation coverage may be superficial, so too may an API be overly complex, inflexible, inscrutable, diffuse. It may be that the API's design itself makes it difficult to describe any one declaration in the first place. The performance may not be adequate. There my be edge cases which cause crashes or undesired behavior. Should we still continue to write code at all? (Sometimes, I wonder...)

I understand that because the compiler can't reconcile the semantics of, say, English, with that of the programming language, that things can get "out of sync". Some languages can't prove even basic things about code, so establishing quality is ultimately up to us. This is where my holistic point of view comes into play. To say that, because you can't do something well and consistently, you might not bother, I just can't agree. That kind of stance takes away opportunities, agency, and responsibility. In this case, it either demotes documentation or at the very least divorces it from the process of writing software. It's just not consistent with the value of practicing pretty much anything.

To give a better idea of how I feel about documentation, let's take a look at some of the great examples above where the ball is clearly in the developer's court.

Regarding the triviality of some comments, I'll just say this: even the most trivial comments may enhance search indexing, for example. Sure, there may be no side effects, there may be no preconditions in a getter/setter, but there are other things we can explain: Why is this property important? Does it map onto a concept whose jargon differs from the code? Is it typically written out more completely in prose? It's up to the developer to decide that.

I think this is a useful doc comment, even if it's only because Swift does not have formal abstract classes (although that's not the only reason).

I also think this is a useful doc comment. Not every initializer is about "converting" something.

I think we can all probably agree this one isn't great for public consumption. As an internal developer comment, it can probably be improved as well. If we can know that, then we have an opportunity to improve it. More, this is a rather short concrete type spelling. For type aliases that hide a gigantic generic signature, maybe the type alias isn't the right choice in the first place. Again, let's consider these things together, not separate.

Finally, I just want to reiterate that this is not meant to be the end-all of establishing documentation quality. I think we all understand that measuring that is subjective in nature and you can't consider it without also considering the code and, more importantly, the person who might be reading the documentation and the context(s). This is really just a tool among many, not meant to establish a primary unit of quality any more than any other software metric. It's just a quick check to see where the gaps are.

As for the practices and views about documentation and the quality bar, we've barely scratched the surface.

4 Likes

Warnings usually do cause people to take action. That's why we add them -- we want to nudge developers in a certain direction. And people like to keep their source code warning-free.

In projects where multiple people contribute, it can be turned on by one developer who is trying to improve code health, but does not understand all implications and costs.

Sorry, I think I didn't explain my point well. These comments come from real-world projects. This code passed code review. So at least two people looked at each example and said "yes, this is the best way we can word this comment".

I am not saying that these comments can't be improved -- they certainly can be, and you suggested some good options. What I'm saying is that an average developer, when nudged by a tool to write a comment, will likely to write one of these trivial ones, and it will likely pass code review.