RFC: Add Trivia.docCommentValue property

Hi all,
I have a pull request up to introduce Trivia.docCommentValue. This property extracts the text content from documentation comments, removing comment markers (///, /**, */) and stripping common indentation. The general idea is that tools like documentation generators, linters, LSPs and IDE features often need to work with the actual content of doc comments rather than the raw trivia text.

PR: Add docCommentValue property to Trivia by PhantomInTheWire (swiftlang/swift-syntax, Pull Request #3230)

Here's the full interface of docCommentValue:

extension Trivia {
  /// The contents of the last doc comment piece with any comment markers removed and indentation whitespace stripped.
  public var docCommentValue: String? { get }
}

The implementation handles several edge cases that would be error-prone to implement manually:

  • Both /// line comments and /** */ block comments
  • Consistent handling of "/// " (with a trailing space) vs "///" (without one): if every line has the space, it is stripped along with the marker; otherwise only /// is stripped
  • Computing and removing common indentation in block comments
  • Handling content that starts on the same line as /** or ends on the same line as */ (spaces between markers and content are trimmed)
  • Identifying the correct doc comment section when multiple exist (separated by blank lines or regular comments)
  • Canonicalizing line endings to \n
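
To make two of the rules above concrete, here is a minimal standalone sketch in plain Swift. It is not the PR's implementation: the function names are hypothetical, the input is assumed to be already split into lines, and treating empty lines as compatible with the "all lines have the space" rule is an assumption of this sketch.

```swift
/// Hypothetical helper: true for horizontal whitespace (space or tab).
func isHorizontalSpace(_ c: Character) -> Bool { c == " " || c == "\t" }

/// Strips the `///` marker from each line of a doc line comment run.
/// One space after the marker is also removed, but only when every
/// non-empty line has it (assumption: empty lines do not break the rule).
func stripLineDocMarkers(_ lines: [String]) -> String {
  // Remove the marker itself; lines without it are passed through unchanged.
  let bodies = lines.map { line -> Substring in
    line.hasPrefix("///") ? line.dropFirst(3) : Substring(line)
  }
  let allHaveSpace = bodies.allSatisfy { $0.isEmpty || $0.hasPrefix(" ") }
  return bodies
    // dropFirst() on an empty line is a no-op, so empty lines stay empty.
    .map { allHaveSpace ? String($0.dropFirst()) : String($0) }
    .joined(separator: "\n")
}

/// Removes the indentation shared by all non-blank lines, as one might do
/// for the interior lines of a `/** */` block comment.
func stripCommonIndentation(_ lines: [String]) -> [String] {
  // Blank lines do not constrain the common indentation.
  let indents = lines
    .filter { !$0.allSatisfy(isHorizontalSpace) }
    .map { $0.prefix(while: isHorizontalSpace).count }
  let common = indents.min() ?? 0
  return lines.map { String($0.dropFirst(common)) }
}
```

With this sketch, stripLineDocMarkers(["/// A", "///B"]) keeps the space before A because the second line lacks one, while stripLineDocMarkers(["/// A", "/// B"]) removes it from both lines.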

I have two thoughts about this:

  1. Why is it restricted to doc comments? Why wouldn't it return the content of non-doc comments as well?
  2. Multiple comment blocks can be associated with a syntax element. So if we have
    /// comment 1
    
    /// comment 2
    let a = 1
    
    the current implementation would only return comment 2, right? I wonder whether returning [String] to accommodate multiple comment blocks, instead of String? for only the last one (or none), would be the better API. On the implementation side, it should be clear when a new block starts: after empty lines or when the comment type changes.

I implemented similar logic in swift-format a while back so that we could interpret doc comments, but I never had the cycles to move it into swift-syntax proper. I'm glad that you've taken this up!

Documentation comments have very specific rules around how their prefixes are stripped and how they are interpreted when there is or isn't intervening whitespace. I think it would be fine to introduce something like a commentText property for trivia but it would be a slightly different implementation than one for doc comments.

Since the compiler has its own interpretation of what a doc comment is, it is absolutely critical that this API returns exactly the same thing as what the compiler would scrape from the same trivia when it builds its data structures for doc generation, SourceKit quick help, etc. (Indeed, as more of the compiler is implemented in Swift, I would expect this new property to replace the C++ doc comment parsing logic in lib/Markup.)

In the case you cited above, the intervening blank line is significant and it means that there is only a single doc comment for the declaration: comment 2. There cannot be "two doc comments" for a declaration—in general, they are all juxtaposed and merged into a single block or there is intervening vertical whitespace, in which case the last one wins.
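
The "last one wins" rule can be sketched on plain lines of text. This is a simplified illustration, not the compiler's or the PR's logic: the function name is hypothetical, only /// line comments are modeled, and the input is assumed to end with the last comment line (a blank line between that comment and the declaration is not modeled here).

```swift
/// Returns the run of `///` lines attached to the declaration: the last
/// contiguous run with no blank line or other comment in between.
/// Assumption: `leadingLines` ends with the last comment line.
func attachedDocLines(in leadingLines: [String]) -> [String] {
  var run: [String] = []
  for raw in leadingLines {
    // Ignore indentation in front of the comment marker.
    let line = raw.drop(while: { $0 == " " || $0 == "\t" })
    if line.hasPrefix("///") {
      run.append(String(line))
    } else {
      // A blank line or any non-doc line detaches everything collected so far.
      run.removeAll()
    }
  }
  return run
}
```

For the example above, attachedDocLines(in: ["/// comment 1", "", "/// comment 2"]) yields only ["/// comment 2"], while two juxtaposed lines are returned together as one block.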

I agree with @allevato here.

I think what @SimplyDanny is pointing at is a slightly different API with a broader scope. Something along the lines of:

extension Trivia {
  /// The processed contents of all comment blocks in this trivia, including
  /// regular comments (`//`, `/* */`) and documentation comments (`///`, `/** */`).
  ///
  /// Each element in the array represents a contiguous section of comments.
  /// Sections are separated by:
  /// - One or more blank lines
  /// - Non-comment trivia (except horizontal whitespace)
  /// - A change in comment style (for example transitioning from `//` to `///`)
  ///
  /// For each block, comment markers are removed, common indentation is stripped,
  /// and line endings are canonicalized to `\n`.
  public var commentValues: [String] { get }
}

That feels like a reasonable complement, but it is intentionally a different abstraction than docCommentValue.
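
To illustrate how the section boundaries described in that doc comment might work, here is a rough sketch over plain lines. Everything in it is hypothetical (commentValues does not exist, and CommentStyle and commentSections are names made up for this example); block comments are left out for brevity, and marker stripping is not shown.

```swift
/// Hypothetical comment styles; block comments are omitted for brevity.
enum CommentStyle { case line, docLine }

/// Groups comment lines into contiguous sections. A new section starts
/// after a blank or non-comment line, or when the comment style changes.
func commentSections(in lines: [String]) -> [[String]] {
  var sections: [[String]] = []
  var current: [String] = []
  var currentStyle: CommentStyle? = nil

  // Closes the section being built, if any.
  func flush() {
    if !current.isEmpty { sections.append(current) }
    current = []
    currentStyle = nil
  }

  for raw in lines {
    let line = String(raw.drop(while: { $0 == " " || $0 == "\t" }))
    let style: CommentStyle?
    if line.hasPrefix("///") {
      style = .docLine
    } else if line.hasPrefix("//") {
      style = .line
    } else {
      style = nil
    }

    // Blank or non-comment line: end the current section.
    guard let lineStyle = style else { flush(); continue }
    // Style change (for example `//` to `///`): also end the current section.
    if lineStyle != currentStyle { flush() }
    currentStyle = lineStyle
    current.append(line)
  }
  flush()
  return sections
}
```

Under these assumptions, the input ["// a", "/// b", "/// c", "", "/// d"] splits into three sections: [["// a"], ["/// b", "/// c"], ["/// d"]].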

For more context on why this PR is limited to documentation comments rather than comments in general: this follows prior discussion on a stale PR for the same issue.

If this is something SwiftLint or similar tools actively need, I am happy to add it as a follow-up PR with a separate API and clearly differentiated semantics.

To elaborate a bit on the detached example, there's nothing particularly semantically interesting about /// comment 1 in this example:

/// comment 1

/// comment 2
let a = 1

since it's detached from the declaration. I can imagine someone writing a linter who wants to be able to diagnose something like "this doc comment is broken/isolated, did you mean it to be part of the following declaration?" But extracting its content (i.e., stripping the prefixes) doesn't actually help with that; it's already possible to detect by looking at the raw trivia and observing that there is a doc comment piece that is followed by more than one newline. So there's no real need for an API that goes beyond what your proposed docCommentValue already does, since it wouldn't be the right tool for that different job.
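
As a rough illustration of that raw-trivia check, the sketch below uses a small stand-in enum rather than the real swift-syntax types, so that it stays self-contained. Piece and detachedDocComments are hypothetical names, and the sketch assumes a blank line shows up as a single newlines piece with a count above one (it does not handle whitespace on the blank line itself).

```swift
/// Stand-in for a small subset of trivia pieces (hypothetical, for illustration).
enum Piece: Equatable {
  case docLineComment(String)
  case newlines(Int)
  case spaces(Int)
}

/// Returns the doc comment pieces that are immediately followed by more
/// than one newline, i.e. candidates for an "isolated doc comment" diagnostic.
func detachedDocComments(in pieces: [Piece]) -> [String] {
  var detached: [String] = []
  for (index, piece) in pieces.enumerated() {
    guard case .docLineComment(let text) = piece else { continue }
    let next = index + 1
    // A newline count above one means at least one blank line follows.
    if next < pieces.count, case .newlines(let count) = pieces[next], count > 1 {
      detached.append(text)
    }
  }
  return detached
}

// The example above, written as stand-in pieces.
let example: [Piece] = [
  .docLineComment("/// comment 1"),
  .newlines(2),
  .docLineComment("/// comment 2"),
  .newlines(1),
]
```

Here detachedDocComments(in: example) returns ["/// comment 1"], and no prefix stripping is needed to get there.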

Understood. I was more focused on the "text extraction" aspect which (in my interpretation) shouldn't care about what's semantically correct. This API has a second important property, though, and that is the "get me the relevant doc comment" in the first place.

The doc comment for the new API could highlight the second aspect: "The contents of the last doc comment piece ..." sounds like an arbitrary decision. However, "the last doc comment" is actually "the one and only doc comment" and that's the actual strength of this new property.

Swift Testing consumes code comments in a few places, but we don't try to strip the regalia. It would be nice if swift-syntax provided a way to do that, so we could clean up our captured strings. :+1:

This is great feedback, thanks @SimplyDanny. @PhantomInTheWire could you open a PR that clarifies this aspect of the doc comment?
