[Idea] [Pitch] Distinguishing code comments from text comments


(Doug McKenna) #1

All -

And now for something completely different …

I'd like to highlight a problem with nested block comment delimiters that third parties can solve locally, but which perhaps needs to be thought about globally at the language design level, including even among different languages that all use the C syntax for commenting.

Block comment delimiters /* and */ do not nest in C, C++, Objective-C, or in some other languages that use these same delimiters for single- or multi-line commenting. But Swift does support nesting (as does Rust, IIRC).

As near as I can tell, the /only/ reason to support nested block comments is to enable the Swift user (programmer) to comment out multiple lines of existing code without having to worry about block comment delimiters already present in those lines as text commentary about the code. Furthermore, because the Swift compiler ignores nearly everything between the outermost /* and */ of the nest, the commented code need not be syntactically correct. This differs from placing code inside a conditional #if 0 … #endif or equivalent, where (as I understand it) Swift compilation, unlike C's preprocessor, does or will still look at the enclosed code.

All this is fine with respect to the Swift compiler, which by design shouldn't care about any of the text inside the outermost /* and */ (other than keeping track of any delimiter nesting).
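A minimal sketch of that nesting bookkeeping, in Python. The function name find_block_comments and its nesting flag are hypothetical, and the sketch deliberately ignores string literals and // comments, so it illustrates only the depth counting, not a complete scanner:

```python
def find_block_comments(source: str, nesting: bool = True) -> list:
    """Return (start, end) index pairs of the outermost /* ... */ comments.

    `end` is exclusive. With nesting=True an inner /* raises the depth
    (Swift-style); with nesting=False the first */ closes the comment
    (C-style). Assumption: no string literals or // comments in `source`.
    """
    spans = []
    depth = 0
    start = -1
    i = 0
    while i < len(source) - 1:
        pair = source[i:i + 2]
        if pair == "/*" and (depth == 0 or nesting):
            if depth == 0:
                start = i          # remember where the outermost comment began
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:         # the outermost comment just closed
                spans.append((start, i))
        else:
            i += 1
    return spans


sample = "a /* x /* y */ z */ b"
print([sample[s:e] for s, e in find_block_comments(sample)])
# ['/* x /* y */ z */']
print([sample[s:e] for s, e in find_block_comments(sample, nesting=False)])
# ['/* x /* y */']
```

Under C-style rules the same input closes at the first */, which leaves " z */ b" outside the comment.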

The problem is that with respect to the rest of the outside world, it would be really great if the delimiters to comment out code lines were not the same as the delimiters to comment out arbitrary text representing hints about hidden state for a human to read in the future. These really are two different activities, with two different intentions and meanings, even though they look the same from a compiler's limited point of view.

Suppose an analysis program that takes source code as mere text input wants to analyze or otherwise react to special patterns in that input source file's comments without caring about what the actual source code does (or even whether it's syntactically correct, other than comment delimiter syntax). The analysis program just wants to robustly find the correct boundaries between code and comment based on where /* and */ occur (and also line-ending // comments, but those are easy). Furthermore, this program wants to do its analysis in one pass of the input file. For instance, Doxygen likely does such an analysis.

In a non-nesting situation in C source code, it might be reasonable for any such text pattern analysis tool to assume that the text following a /* is not going to be code. In C-based languages, after all, there's a preprocessor that gets used to excise multiple lines of code, using #if 0 … #endif. The preprocessor idiom is used because it nests and isn't affected by pre-existing block comments in the code being kept from the compiler's purview.

But in Swift's nesting /* … */ world, there is no robust way for a one-pass text analysis tool to determine whether a given initially encountered outermost /* is introducing commented code lines, or a section of human-readable text (or for that matter, ASCII art!) that would usually be associated with the innermost /* … */ in a nest.

This can be solved locally on a per-tool basis. Because the Swift compiler doesn't care what's inside the outermost /* … */ (other than counting nesting balance), the analysis tool might create its own private nested code comment syntax, asking the user to do something like

/*{
   excised code here /* commented text here */
}*/

to indicate to the analysis program that the block comment delimiters are really "ignore this code" delimiters rather than "ignore this comment text" delimiters.
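As a rough illustration of how a one-pass tool might act on such a marker, here is a sketch in Python. The name classify_comments and the labels "code" and "text" are hypothetical, only outermost comments are reported, and string literals and // comments are again ignored:

```python
def classify_comments(source: str) -> list:
    """Return (kind, text) for each outermost nested block comment.

    kind is "code" when the comment is wrapped as /*{ ... }*/ (the private
    convention sketched above), otherwise "text".
    Assumption: no string literals or // comments in `source`.
    """
    results = []
    depth = 0
    start = 0
    i = 0
    while i < len(source) - 1:
        pair = source[i:i + 2]
        if pair == "/*":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                text = source[start:i]
                # Both the opening and the closing marker must be present.
                is_code = text.startswith("/*{") and text.endswith("}*/")
                results.append(("code" if is_code else "text", text))
        else:
            i += 1
    return results


source = "/*{\n   let x = 1 /* old value */\n}*/\n/* plain note */\n"
print([kind for kind, _ in classify_comments(source)])
# ['code', 'text']
```

The decision is made only once the outermost comment closes, so the pass stays single, but the tool has to buffer the comment text until then.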

Indeed, in a tool I'm writing that analyzes the commenting patterns in a source file (and not just Swift source files), the above is my solution, which is additionally useful because it leverages the IDE's ability to find matching braces so as to select everything in between. It requires the interested, tool-using user to pre-distinguish in his or her source code the two types of commenting for the tool's benefit, but only if that programmer wants to use the tool. The Swift compiler doesn't care.

But it's not such a great situation when every analysis tool might implement its own private syntax to solve the same problem. No standardization means inevitable conflicts.

So it would be great if there were a more global, agreed-upon syntactic standard for distinguishing between commenting out code lines and commenting out arbitrary text explaining stuff to a human. Such a distinction would make it a lot easier and more robust for every non-compiler, pattern-matching, source code analysis tool to do interesting things (or, rather, to not do stupid things), without interfering with each other's needs and capabilities.

Surely there's an additional set of nesting delimiters to use, such as

/{
   excised code here /* commenting text here */
}/

or

/@
   excised code here /* commenting text here */
@/

or

/...
  excised code here /* commenting text here (no slashdot jokes, please) */
.../

that the source code editor would recognize as similar to braces so as to be able to highlight what's between. I'm not familiar enough with Swift syntax to know which characters after a slash, or other combinations, are unambiguously available. Discussion invited. The delimiter combo should be composed of ASCII (7-bit) Unicode code points.

Unlike so many other requests, this one seems like it would be pretty easy to implement, with interesting benefits. For instance, it would make for more accurate and useful syntax coloring algorithms in source code editors. The style for excised (uncompiled) code could be different from the style for text comments.

The same discussion might also apply to //-initiated comments, which extend to a line's end. If you want to comment out a single line of code, as opposed to introducing a compiler-ignored remainder-of-the-line text about code, a different delimiter would make life a whole lot easier for outside analysis tools and syntax colorers, should the user/programmer desire to use those tools.

All of this would be entirely optional to the Swift coder who doesn't care. The point is that if the Swift coder wants to use outside analysis or translation tools or better syntax coloring, it would be nice if these tools could all operate under the same set of syntactic rules about what's code and what's text in a comment. Such an addition to the language in Swift 3 would be influential among other language designers as well.

/; thoughts() // A penny for your /* thoughts? */

Doug McKenna
Mathemaesthetics, Inc.


(Dmitri Gribenko) #2

That is the root of the issue. You shouldn't be parsing source code
with ad-hoc tools. You need to use a real compiler. Just because it
happened to work with, say, C does not mean that you got it right. I
doubt that the program you described correctly handles, say,
line continuations in comments.
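
A small sketch of that pitfall, in Python, assuming C rules: a backslash immediately before a newline splices the two physical lines before comments are recognized, so a // comment ending in a backslash swallows the next line. Both function names are hypothetical, and the sketch ignores string literals that contain //:

```python
def line_comments_naive(source: str) -> list:
    """Treat every physical line independently (misses line splices)."""
    found = []
    for line in source.split("\n"):
        idx = line.find("//")
        if idx != -1:
            found.append(line[idx:])
    return found


def line_comments_spliced(source: str) -> list:
    """Splice backslash-newline pairs first, as C does, then scan lines."""
    spliced = source.replace("\\\n", "")
    return line_comments_naive(spliced)


# The first line's comment ends in a backslash, so in C the second
# physical line belongs to the comment too.
src = "int a = 1; // note \\\nint b = 2;\nint c = 3;\n"
naive = line_comments_naive(src)
spliced = line_comments_spliced(src)
```

Here the naive scan reports a one-line comment and treats `int b = 2;` as live code, while the spliced scan reports `// note int b = 2;` as a single comment.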

The bottom line is, you need to use a real compiler frontend to parse
source code.

Dmitri

On Fri, Feb 26, 2016 at 9:35 AM, Douglas McKenna via swift-evolution <swift-evolution@swift.org> wrote:

Suppose an analysis program that takes source code as mere text input

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/