Diagnostic for undocumented public declarations

bitjammer · September 3, 2020, 4:57pm

Hi, I can answer this. This cascading logic is implemented in the compiler and it presents an API that just asks, "what is the doc comment for this declaration?". If there is no doc comment on the declaration itself, it looks to protocol requirements and declarations inherited from superclasses.

I completely agree that uses of symbol data should have a consistent source of truth without a bunch of implementations trying to figure this kind of thing out. In fact, this functionality should already be in use for Xcode's Quick Help or anything else that asks for a declaration's doc comments.

haikuty · September 5, 2020, 7:50pm

As long as it's easy to turn off. Ideally via per-file and/or per-module/target settings in Package.swift somehow and in an IDE (Xcode's "Target - Build Phases - Compile Sources - Compiler Flags" per-file setting area comes to mind).

There is a lot of legacy code (in our large projects and in third party packages) that won't have documentation comments for every single thing. Since we set treat Warnings as Errors, no way to turn this off will be a nightmare.

I can see a perspective that believes that some public fields in protocols really don't need documentation (they are obvious from their name), so this "feature" could well feel like an annoying micromanaging over-protective-nanny "feature" to people with that perspective.

Addendum: I do like good documentation though :), so an optional something like this that gently encourages adding it seems like a nice addition.

Bas_Broek · September 10, 2020, 2:28pm

I'd love this!

gribozavr · September 11, 2020, 11:36am

I think this feature will be useful for certain users, for example, the Swift standard library itself, swift-nio, Foundation, Dispatch etc. These are libraries that are so widely used that spending a non-trivial amount of time to document every single API is clearly beneficial.

However, the documentation of this new flag should explain the costs of requiring documentation on every public API. Specifically:

The maintenance costs are very real. Updating comments takes time, especially for engineers who are often not technical writers. If the comment takes more time to write and keep up-to-date than it provides value to readers, then it is a net negative for the project.
Documentation can bitrot. Out-of-date, misleading documentation can be worse than none at all.
A tool that requires a documentation comment can push developers to write meaningless comments that on the surface level look OK, and make the API look "documented", when in fact it is not.

When the developer does not know how to write a good comment, they have a choice of not writing a comment, or writing a meaningless comment. A warning that requires a comment therefore forces the developer to sometimes write a meaningless comment.

Some examples that I have seen in the production code:

/// The customer ID.
var customerId: CustomerId

/// An abstract base class for all concrete image uploader classes.
class ImageUploader { ... }

extension ImageOrientation {
  /// Converts an `ImageOrientationProperty` to `ImageOrientation`.
  init(_ imageOrientationProperty: ImageOrientationProperty)

/// Define a convenience alias for `Result<String, NetworkError>`.
typealias MyLibraryResult = Result<String, NetworkError>

My preference as a reader is that such comments were omitted entirely. I'm worried about that a warning might push developers to write more such comments. I don't think that a warning can make people write more helpful comments. When a developer knows how to write a helpful comment, they usually do it without a reminder.

TL;DR: Unless there is a technical writer on the project, I think it would be better off not mandating a comment on every public API.

Aside: I think requiring a comment on every public API is similar in spirit to requiring unit tests to provide 100% code coverage. It is great to have thorough unit tests, and indeed the code coverage metric correlates with the quality of tests. So people with good intentions start demanding 100% code coverage. Achieving 70-95% coverage is very doable, and such tests provide value to the project by catching bugs. Covering the remaining code in tests requires a disproportionate amount of effort and might not provide value: the tests written for the last 5% coverage might not prevent any bugs in future. 100% code coverage also provides a false sense of security (just like doc comments on every API create a false impression that the library is thoroughly documented). Of course there are project like software for aerospace applications where 100% code coverage is justified, but they are exceptions rather than norm. Similarly, for documentation comments, only if the library is used very widely it makes sense to spend the effort to write a comment on every API -- but such libraries are exceptions.

gwendal.roue · September 11, 2020, 12:09pm

Thanks for writing this.

Some documentation generators use a marker such as /// :nodoc: in order to remove some methods from the generated documentation. One use case is to avoid methods inherited from frameworks to interfere with the intended documentation (for example, you don't want to document viewDidLoad, and you don't want it to appear in the documentation either).

Based on this example, we could imagine a marker such as /// :trivial: which would silence the diagnostic we are talking about (and tell documentation generators to output the declaration without any documentation).

gribozavr · September 11, 2020, 12:27pm

That's a good idea, it might work for some projects that want to mostly-document their APIs, expanding the user base of the feature.

However, for some projects, even spending the time and readers' attention on these markers might be too much of a cost. For some projects the right tradeoff might be to document the central abstractions and leave the rest undocumented -- not because the remaining APIs are trivial, but because the cost of documenting is unlikely to be balanced by a future benefit.

I'm not opposed to the feature -- I'm only saying that we should present it to users together with the costs, and explain that it is not the right choice for everyone.

jawbroken · September 11, 2020, 1:27pm

I think this functionality should be part of whatever tool is generating documentation from the source code, which is exactly the right time and place to be warned about any documentation issues. Unless that documentation generator is going to be built into the compiler, I don't see why the diagnostics should be. And if it is built into the compiler these warnings should only be produced when in that documentation mode.

toph42 · September 11, 2020, 4:31pm

Wouldn’t you just set the flag to just turn off the feature in these cases?

bitjammer · September 11, 2020, 10:41pm

Thank you all for the suggestions and your concerns. I hope I don't come off as insufferably philosophical but please let me provide some larger context around this flag.

A warning does not force, especially considering the flag is a conscious choice by the developer. I find these kinds of statements to be entirely too pessimistic and fatal about docs. One could argue the same for code itself. I wouldn't suppose that all developers are excellent writers any more than I would suppose that all writers are also excellent software developers (if only!). For the lucky projects that have a dedicated, experienced technical writer, I am admitting a little bit that a writer may edit a documentation comment in order to improve the docs of the public interface, or work together with developers to come up with good public comments. In that case, having a coverage tool may be useful. For those that don't have that luxury, it can simply be a way to find something you missed rather than scanning through all of your files manually. For those that don't care or want to know, they can just not use the flag. Please don't take my suggestion as too final–I am suggesting this as an option, not a restriction.

Oftentimes, on the whole, I feel a strong underlying sentiment: good documentation is hard. It sure is. Let me offer something more holistic: good software is hard. To me, software is not separate from but includes tests and documentation. Just as test and documentation coverage may be superficial, so too may an API be overly complex, inflexible, inscrutable, diffuse. It may be that the API's design itself makes it difficult to describe any one declaration in the first place. The performance may not be adequate. There my be edge cases which cause crashes or undesired behavior. Should we still continue to write code at all? (Sometimes, I wonder...)

I understand that because the compiler can't reconcile the semantics of, say, English, with that of the programming language, that things can get "out of sync". Some languages can't prove even basic things about code, so establishing quality is ultimately up to us. This is where my holistic point of view comes into play. To say that, because you can't do something well and consistently, you might not bother, I just can't agree. That kind of stance takes away opportunities, agency, and responsibility. In this case, it either demotes documentation or at the very least divorces it from the process of writing software. It's just not consistent with the value of practicing pretty much anything.

To give a better idea of how I feel about documentation, let's take a look at some of the great examples above where the ball is clearly in the developer's court.

Regarding the triviality of some comments, I'll just say this: even the most trivial comments may enhance search indexing, for example. Sure, there may be no side effects, there may be no preconditions in a getter/setter, but there are other things we can explain: Why is this property important? Does it map onto a concept whose jargon differs from the code? Is it typically written out more completely in prose? It's up to the developer to decide that.

I think this is a useful doc comment, even if it's only because Swift does not have formal abstract classes (although that's not the only reason).

I also think this is a useful doc comment. Not every initializer is about "converting" something.

I think we can all probably agree this one isn't great for public consumption. As an internal developer comment, it can probably be improved as well. If we can know that, then we have an opportunity to improve it. More, this is a rather short concrete type spelling. For type aliases that hide a gigantic generic signature, maybe the type alias isn't the right choice in the first place. Again, let's consider these things together, not separate.

Finally, I just want to reiterate that this is not meant to be the end-all of establishing documentation quality. I think we all understand that measuring that is subjective in nature and you can't consider it without also considering the code and, more importantly, the person who might be reading the documentation and the context(s). This is really just a tool among many, not meant to establish a primary unit of quality any more than any other software metric. It's just a quick check to see where the gaps are.

As for the practices and views about documentation and the quality bar, we've barely scratched the surface.

gribozavr · September 12, 2020, 2:24pm

Warnings usually do cause people to take action. That's why we add them -- we want to nudge developers in a certain direction. And people like to keep their source code warning-free.

In projects where multiple people contribute, it can be turned on by one developer who is trying to improve code health, but does not understand all implications and costs.

Sorry, I think I didn't explain my point well. These comments come from real-world projects. This code passed code review. So at least two people looked at each example and said "yes, this is the best way we can word this comment".

I am not saying that these comments can't be improved -- they certainly can be, and you suggested some good options. What I'm saying is that an average developer, when nudged by a tool to write a comment, will likely to write one of these trivial ones, and it will likely pass code review.