Diagnostic for undocumented public declarations

bitjammer · September 3, 2020, 1:41am

Hi all,

I'm working full-time on documentation features and I'd like to gauge your feedback on emitting a diagnostic when a public declaration does not have a documentation comment.

I don't see that we have anything like this yet. This is just one signal could be picked up during local or PR builds, or shown in an IDE. For example, this could be used in conjunction with a documentation coverage summary as well.

There are a two options that I am thinking of:

Opt in to warning diagnostics with an opt-in flag.
Add something like -warn-undocumented-public-api, default is "no".

Emit a warning diagnostic by default.
Silence the warning only by adding a documentation comment or turning off the check globally with -no-warn-undocumented-public-api. Of course, I'm biased toward this solution but I'd like to know how you all feel about it.

Thoughts?

suyashsrijan · September 3, 2020, 1:51am

I think this is a really good idea. Although, we don’t generally offer diagnostics that can be enabled or disabled via flags, but perhaps this can be an exception.

SDGGiesbrecht · September 3, 2020, 1:54am

This functionality currently lives in swift-format, where it can easily be turned on or off. It was originally on by default, but caused so much grief it was demoted to opt‐in.

bitjammer · September 3, 2020, 2:22am

Do you know why that was implemented in swift-format rather than the compiler itself?

harlanhaskins · September 3, 2020, 2:33am

Just because swift-format attempts to be a linter, and that's a pretty good lint step the compiler didn't do at the time...

allevato · September 3, 2020, 2:36am

That rule suffers from trying to be implemented in swift-format because there isn't enough context from just the syntax tree to determine whether a documentation comment is "necessary".

For example, if a public method overrides a superclass method or implements a protocol requirement and the original decl has a documentation comment, should it be required to have its own documentation comment as well? I think (hope) most people would say "no", unless the new decl has different enough behavior worth documenting. But swift-format can't determine what's being inherited from/conformed to from the syntax tree, so the rule was far noisier than we wanted it to be.

Doing something like this in a system that's type-aware (either in the compiler, or another SourceKit-using tool) seems like a requirement for it to be correct.

(All the code that attempts to deal with doc comments in swift-format needs a major overhaul, but I also wouldn't mind seeing it move into other tools in situations where it makes more sense.)

owenv · September 3, 2020, 2:46am

I think integrating a warning for this in the compiler is a good idea, despite the fact that Swift has traditionally avoided "linter-ish" warnings and the ability to disable them on a case-by-case basis.

I would like to suggest a third option: instead of adding an option to control only this one warning, it might be good to widen the scope a bit and have one flag, say -enable-documentation-linting, that for now only controls this warning, but could be extended to cover other docs-related diagnostics in the future. For example, if at some point we introduced a warning if some but not all parameters of a function were covered by a doc-comment, I think it would be nice to have it under the same flag. It also seems more in line with Swift's existing approach to diagnostics, which is intended to avoid unnecessary config options that might fragment the ecosystem (which is less of a concern if we're just talking about comment formatting, but still worth considering IMO).

SDGGiesbrecht · September 3, 2020, 3:01am

If we go the compiler flag route (which I’m not entirely convinced of), then I think SwiftPM and other tools ought to at least be smart about it, such as by applying it to products but not to internal targets. It would then be less of a language dialect, and more of a coding environment, akin to how main.swift has unique rules regarding top‐level code, or how playgrounds have extra mark‐up available.

bitjammer · September 3, 2020, 3:02am

Thanks, that's a great point. I've also got that in the back of my mind, checking that docs semantically agree with the declaration, as a part of measuring coverage. I'd be a little worried that the flag name might indicate that more is going on than perhaps should but I think it makes sense to have one "check my docs" flag.

bitjammer · September 3, 2020, 3:15am

I agree. I think there are contexts or build scenarios where one can assume that someone has already assessed the quality/presence of the API docs or perhaps that they can't really be "published" anyway. That said, I would like to at least provide the option for a project owner to check their own doc comments, even if it is only for themselves, or their teammates, and not necessarily meant for wide distribution.

bitjammer · September 3, 2020, 3:17am

Thanks, this is exactly what I was getting at with my question above. The check would absolutely be based on the cascaded doc comments.

allevato · September 3, 2020, 4:00am

I agree, and I think having some limited degree of control over "strictness" is key here. To avoid feature creep, I imagine most scenarios where a user wants doc comment validation are one of these:

The documentation is not going to be published, so not every public decl needs a comment, but decls (of any visibility) that do should still be validated for correctness (i.e., parameters in the comment match the signature, you don't have a Returns section for a function that returns Void, etc.).
The documentation is going to be published, so every public decl should have full documentation (either inherited or its own) and be validated for correctness.

Being able to choose between those two modes (with "off" being the third) would satisfy folks who have more permissive style guidelines for project- and team-internal code, such as "if the only thing you would say about a method named setFoo is 'sets the foo', then leave it out," whereas if you were publishing these docs for an external SDK, you probably would document that in the interest of completeness and having something to link to.

This feature would probably also need some special cases to exclude things like public declarations that have a name with a leading underscore, since that typically signifies that they're only public as an implementation detail for another module or for testing, and we probably don't want to require documentation for those even in a so-called strict mode.

bitjammer · September 3, 2020, 4:27am

Yes, I think that makes a lot of sense.

I just recently had to wrestle with this when emitting symbol graphs that had to handle nested cases, @_spi, and some other more rare cases, so I think there is a good foundation I can repurpose to prevent this kind of noise.

Karl · September 3, 2020, 5:10am

I‘m definitely in favour of a sourcekit-based solution. I’ve only used it a little bit myself, but I think it’s supposed to be the library with “IDE-level” understanding of your code. It can do code completion and syntax colouring, and that requires a fairly complete idea of things. I really think we should be using it more - especially for command line tools! For too long have IDEs hogged the good stuff!

But it isn’t very easy to build a sourcekit tool that can scale to large projects. It would be really nice to have something with a simple API like swift-format‘s custom linter rules, but with greater understanding of your code. So you can write your own complex rules in a host that handles all the tricky stuff about using sourcekit (like figuring out import paths and so on).

There might already be a tool like that. I’m not extremely well versed on the various linters out there (there are a few, I think).

gwendal.roue · September 3, 2020, 8:14am

Hello, I'm really happy this concern is explicitly mentioned in this thread (as well as underscore-prefixed identifiers). I feel like we're in good hands

Does anyone of us happen to know what algorithm is used by Xcode in order to feed this cascade and display the relevant documentation when one option-clicks an identifier in the editor? It is private to Xcode, or available in some external tool, like SourceKit-LSP?

Because... Isn't the goal of the diagnostic we talk about to make sure:

Documentation is available in well-behaved IDEs
Documentation is available for well-behaved generators (html, etc.)

Isn't it a primary goal, then, to see if both can feed from the same source of truth?

xwu · September 3, 2020, 1:32pm

Whatever part of the project that @owenv refers to in the post below seems to have all of the prerequisite knowledge; shouldn't any missing docs or issues be diagnosed as part of that?

xwu · September 3, 2020, 1:36pm

If we are going to put this functionality into the compiler, I would love to see both of these modes check code examples in doc comments to make sure they compile (or, even, verify that the output matches what is claimed).

Still not sure if all of this has to be done in the compiler though. Could it be part of SwiftPM? What do peer languages such as Rust do in terms of validating documentation completeness?

allevato · September 3, 2020, 3:42pm

I think this would be a great addition and tie into the existing playground capabilities really well. At the same time, validation of code snippets seems like a topic of large enough depth that it deserves its own separate discussion from this one about validation of existence/structure of comments, since there would be a lot more details to work out:

How strict would it be: syntactic validation, semantic validation, or runtime validation? I can think of legitimate reasons for each.
There'd need to be a way to control whether validation was desired (since it may not be for all Swift code blocks), so maybe a new Markdown code fence language identifier would be needed.
Would each snippet be isolated, or could they build on each other by referring to symbols from earlier ones (like in this documentation), making it more like a "reverse playground"?

Please remember that not everyone uses SwiftPM as part of their build process; they may use Bazel, CMake, or some other build system, so saying it should be part of SwiftPM specifically would make it unusable to many folks. If the distinction being drawn here is whether it should be "part of the compiler" vs. "another tool that's also distributed as part of a standard Swift toolchain and is used in SwiftPM's build plan but can also be invoked in a standalone fashion", then we need to state that clearly to make sure we aren't locking out a class of users.

In this particular case, I think it's entirely appropriate for the validation to occur in the compiler for one main reason: the compiler already emits a .swiftdoc file containing the documentation for a module and that file is distributed with the module and used by SourceKit/language server tooling to surface that documentation to users. This is unlike many other languages, where a separate tool is used to perform an independent scan of source code to extract the comments in order to generate documentation output. The Swift compiler is already parsing the structured Markdown in the comments in order to write it out in that serialized format, so having it validate it there as well seems sensible and the most efficient choice. Requiring a user to run a separate tool after the fact would be far less so, both for the user and for the tools.

Jon_Shier · September 3, 2020, 3:43pm

Yes, this is my biggest issue with the existing documentation generators. I think @mattt was looking at this as part of swift-doc but I don't know if he's been able to solve this issue either.

Going forward, I think the solution here is an official linter (whether as part of swift-format or a separate tool), as this doesn't really have anything to do with the code. However, if Swift were to add support for compiling example code in the inline documentation itself, perhaps some aspect of doc linting could appropriately live in the compiler. (As an aside, we've stopped putting full examples in most of Alamofire's documentation, both due to the complexity and to stop copy / paste coding, which was becoming a support burden.)

Joseph_Heck · September 3, 2020, 4:27pm

While I know it's not a common pattern in the compiler today, I think it would be really useful. I also love the idea of verifying any code samples within a comment, but I'd be quite happy with just a flag that "something is missing" well prior to that advanced step.