[Pitch] Code block diffs in articles

omathews · January 9, 2023, 10:40pm


Acknowledgements

Thanks to @marcus_ortiz and @ethankusters for their support and feedback with drafts of this pitch.

Motivation

When writing instructional text, authors often need to show before/after states of small code blocks. This is useful even for purely additive modifications:

From

```swift
print("Hello")
```

To

```swift
print("Hello")
print("World")
```

But it’s especially important to illustrate changed or deleted lines of code:

From

```swift
print("Hello")
print("World")
```

To

```swift
print("Goodbye")
print("Cruel")
print("World")
```

In that case, the author has several unsavory options:

Provide a code block illustrating the final state of the code, along with a description of what the user needs to do to arrive there, e.g. “On line 1, replace "Hello" with "Goodbye", and before line 2, insert the new line print("Cruel").”
Provide two code blocks illustrating the state of code before and after the changes, as shown above.
Use standard syntax for illustrating diffs with “+” and “-” characters in the first column of added and deleted lines, respectively. This unifies the listing, but doesn’t provide any visual indicator aside from the leftmost character.

Note

DocC Tutorials already handle line-highlighted diffs of source code, but the capability doesn’t extend to articles. Even if an article’s primary purpose is not to provide a long-form step-by-step coding walkthrough, code diffs are extremely helpful. For example, the author could note important API differences or illustrate small refactoring tasks.

Proposal

Provide a code diff syntax for fenced code blocks. The syntax adds the word diff to the opening fence of a code block, which instructs DocC to parse the code for leading “+” and “-” characters and render those lines accordingly. A diff code fence can simply be ```diff, or be followed by a language specification, like this: ```diff swift.

Adding a language specification after diff makes it possible to keep syntax highlighting within added and deleted lines.
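A minimal Swift sketch of how a renderer might read the opening fence. In CommonMark terms, the text after the backticks is the code block’s info string. `FenceInfo` and `parseFenceInfo` are hypothetical names for illustration, not DocC API, and the sketch assumes “diff” only counts as a marker when it is the first word.

```swift
/// Hypothetical result of inspecting a code fence's info string.
struct FenceInfo: Equatable {
    var isDiff: Bool
    var language: String?
}

/// Parses an info string such as "diff", "diff swift", or "swift".
/// Assumption: "diff" marks a diff block only when it is the first word.
func parseFenceInfo(_ info: String) -> FenceInfo {
    // Split on spaces and tabs; repeated whitespace produces no empty words.
    let words = info.split(whereSeparator: { $0 == " " || $0 == "\t" }).map(String.init)
    guard let first = words.first else {
        // A bare fence: neither a diff nor a language.
        return FenceInfo(isDiff: false, language: nil)
    }
    if first == "diff" {
        // Whatever follows "diff" (if anything) selects syntax highlighting.
        return FenceInfo(isDiff: true, language: words.count > 1 ? words[1] : nil)
    }
    // Not a diff: the first word is the language, as it is today.
    return FenceInfo(isDiff: false, language: first)
}
```

With this split, an info string of `diff swift` yields a diff block highlighted as Swift, a bare `diff` yields a diff block with no language, and `swift` behaves exactly as an ordinary code block.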

Examples

This

```diff swift
 print("Hello")
+print("World")
```

Renders the second line as a code addition.

This

```diff swift
-print("Hello")
+print("Goodbye")
+print("Cruel")
 print("World")
```

Could be rendered in either of two ways:

As one deleted line and two added lines
As one unified, changed line, with both “Hello” and “Goodbye” present and formatted to indicate deletion and addition, and one added line

Format

We can follow the basic rules of the git diff format, in which the first character indicates no change (space), an addition (+), or a deletion (-).
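A minimal Swift sketch of that rule, assuming a renderer receives the raw contents of a diff block. `DiffLineKind`, `DiffLine`, and `parseDiffLines` are hypothetical names; the sketch also assumes that blank lines and lines without an indicator column are shown as unchanged.

```swift
/// Hypothetical classification of one line in a diff block.
enum DiffLineKind: Equatable {
    case context  // leading space: unchanged
    case added    // leading "+"
    case removed  // leading "-"
}

struct DiffLine: Equatable {
    var kind: DiffLineKind
    var text: String  // the code with the indicator column removed
}

/// Classifies each line by its first character, following unified diff conventions.
func parseDiffLines(_ source: String) -> [DiffLine] {
    var result: [DiffLine] = []
    for rawLine in source.split(separator: "\n", omittingEmptySubsequences: false) {
        let line = String(rawLine)
        guard let first = line.first else {
            // Assumption: a blank line is treated as unchanged context.
            result.append(DiffLine(kind: .context, text: ""))
            continue
        }
        let rest = String(line.dropFirst())
        switch first {
        case "+": result.append(DiffLine(kind: .added, text: rest))
        case "-": result.append(DiffLine(kind: .removed, text: rest))
        case " ": result.append(DiffLine(kind: .context, text: rest))
        default:
            // Assumption: a line with no indicator column is kept verbatim as context.
            result.append(DiffLine(kind: .context, text: line))
        }
    }
    return result
}
```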

There’s existing work in highlight.js that tackles this: it ships a diff language definition. Note that the single changed-line rendering described above, in which sufficiently similar lines are merged and diffed internally, would require significant work beyond the simpler line-by-line approach.
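To give a sense of what diffing a line internally involves, here is a deliberately naive Swift sketch that trims the longest common prefix and suffix of the old and new lines and treats the middle as the change. `InlineChange` and `inlineChange(from:to:)` are hypothetical. This handles only one contiguous edit per line; a real implementation would more likely run a token-level longest-common-subsequence diff.

```swift
/// Hypothetical result of comparing two similar lines.
struct InlineChange: Equatable {
    var prefix: String   // text shared at the start of both lines
    var removed: String  // text only in the old line
    var added: String    // text only in the new line
    var suffix: String   // text shared at the end of both lines
}

/// Naive intra-line diff: trims the longest common prefix and suffix and
/// reports whatever remains in the middle as removed/added text.
func inlineChange(from old: String, to new: String) -> InlineChange {
    let a = Array(old), b = Array(new)
    var prefixLength = 0
    while prefixLength < a.count && prefixLength < b.count && a[prefixLength] == b[prefixLength] {
        prefixLength += 1
    }
    var suffixLength = 0
    // Stop before the suffix would overlap the prefix in either line.
    while suffixLength < a.count - prefixLength && suffixLength < b.count - prefixLength
        && a[a.count - 1 - suffixLength] == b[b.count - 1 - suffixLength] {
        suffixLength += 1
    }
    return InlineChange(
        prefix: String(a[..<prefixLength]),
        removed: String(a[prefixLength..<(a.count - suffixLength)]),
        added: String(b[prefixLength..<(b.count - suffixLength)]),
        suffix: String(a[(a.count - suffixLength)...])
    )
}
```

Comparing print("Hello") with print("Goodbye") yields the shared prefix print(" and suffix "), with Hello removed and Goodbye added.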

Alternate approach

Instead of the git diff format, which arguably requires more work when pasting code from source files (every unchanged line needs a leading space) and when editing code blocks in Markdown source, we could use the following formatting rules:

Unchanged lines are rendered as is; no extra leading characters are required.
Any line beginning with one or more “+” characters, followed by one space or tab character, is considered an added line. The indicator characters and the separator are removed before rendering.
Any line beginning with one or more “-” characters, followed by one space or tab character, is considered a deleted line. The indicator characters and the separator are removed before rendering.

Within a code block, the number of leading indicator characters must be consistent. In most circumstances, a single character suffices; however, some languages, notably Objective-C, already use leading “-” and “+” characters (for instance, to declare instance and class methods). For such languages, multiple leading indicators are required to disambiguate. For example:

```diff objc
- (void)instanceMethod:(NSInteger)x;
-- - (void)aRemovedInstanceMethod:(NSInteger)x;
+ (void)classMethod:(NSString *)s;
++ + (void)anAddedClassMethod:(NSString *)s;
```

In the code above, lines 2 and 4 are parsed as removed and added, respectively.
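A Swift sketch of these alternate rules, assuming the block’s indicator count is already known (the pitch leaves open how DocC would determine it; `indicatorCount` is a hypothetical parameter) and that the single separator after the indicators is stripped along with them. A line matches only if it starts with exactly that many indicators followed by a space or tab, so with a count of 2 the Objective-C declarations beginning with a single “-” or “+” stay unchanged.

```swift
/// Hypothetical line kinds for the alternate (indicator-count) syntax.
enum MarkedLineKind: Equatable {
    case unchanged, added, removed
}

struct MarkedLine: Equatable {
    var kind: MarkedLineKind
    var text: String
}

/// Parses a block that uses exactly `indicatorCount` leading "+" or "-" characters,
/// followed by one space or tab, to mark added or removed lines.
func parseMarkedLines(_ source: String, indicatorCount: Int) -> [MarkedLine] {
    precondition(indicatorCount >= 1, "A block needs at least one indicator character.")
    let markers: [(String, MarkedLineKind)] = [("+", .added), ("-", .removed)]
    var result: [MarkedLine] = []
    for rawLine in source.split(separator: "\n", omittingEmptySubsequences: false) {
        let line = String(rawLine)
        var parsed = MarkedLine(kind: .unchanged, text: line)
        for (marker, kind) in markers {
            let indicator = String(repeating: marker, count: indicatorCount)
            // Requiring a separator right after the indicators means a longer run
            // of markers, or a lone "-" when the count is 2, doesn't match.
            if line.hasPrefix(indicator + " ") || line.hasPrefix(indicator + "\t") {
                // Drop the indicators and the single separator character.
                parsed = MarkedLine(kind: kind, text: String(line.dropFirst(indicatorCount + 1)))
            }
        }
        result.append(parsed)
    }
    return result
}
```

Running it over the Objective-C block above with a count of 2 marks the second line as removed and the fourth as added, leaving their text as “- (void)aRemovedInstanceMethod:(NSInteger)x;” and “+ (void)anAddedClassMethod:(NSString *)s;”.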

Unfortunately, this increases the complexity of the parsing algorithm. It also removes the ability to paste the output of git diff directly into code blocks.

jack · January 9, 2023, 11:00pm

Can you speak to why this doesn't follow the approach used by tutorials (using separate files that are diffed for you)? I think this approach makes sense for articles but the reasons should be documented for future decision-making.

omathews · January 9, 2023, 11:23pm

That's a good point. I actually think both approaches are useful for articles, and I have a forthcoming pitch that addresses the other one; otherwise I'd add a section with a rationale as you suggest.

ksluder · January 9, 2023, 11:30pm

git isn’t the only tool to produce or consume unified diffs. That only strengthens the argument for using unidiff format, IMO.

marcus_ortiz · July 7, 2023, 8:35pm

Sorry for the late reply, but the simpler solution mentioned here, using git-diff output in code listings as-is, should already be possible (just the simple rendering with green/red lines and emphasized headers).

The more advanced rendering, with syntax highlighting of code within diffs or using our special diff format for tutorial code, would require some coordination with the DocC compiler.

I did notice with a quick test that the renderer doesn't have the right CSS to map the green/red colors to the appropriate tokens though, so I can create a quick issue and address that.
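Rendering raw git diff output as-is mostly comes down to classifying each line. A minimal Swift sketch, with hypothetical token names that don’t correspond to any actual renderer CSS classes; the main subtlety is checking the “--- ” and “+++ ” file headers before the single-character “-” and “+” rules.

```swift
/// Hypothetical token kinds for styling raw `git diff` output.
enum GitDiffToken: Equatable {
    case header    // "diff --git …", "index …", "--- a/…", "+++ b/…"
    case hunk      // "@@ -1,2 +1,3 @@"
    case addition  // "+…"
    case deletion  // "-…"
    case context   // " …" and anything else
}

/// Classifies one line of raw `git diff` output.
func classifyGitDiffLine(_ line: String) -> GitDiffToken {
    // File headers are checked first; otherwise "--- a/main.swift" would
    // be treated as a deleted line and "+++ b/main.swift" as an added one.
    if line.hasPrefix("diff --git ") || line.hasPrefix("index ")
        || line.hasPrefix("--- ") || line.hasPrefix("+++ ") {
        return .header
    }
    if line.hasPrefix("@@") { return .hunk }
    if line.hasPrefix("+") { return .addition }
    if line.hasPrefix("-") { return .deletion }
    return .context
}
```

This line-by-line check can misfire: a deleted line whose content starts with “-- ” (a SQL comment, say) begins with “--- ” and would be styled as a header. A more careful parser would track hunk boundaries using the line counts in each “@@” header.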

ethankusters · July 7, 2023, 9:29pm

That sounds great to me. Adding support for the basic diff syntax highlighting and then (after seeing how that's used in practice) expanding if there's still a need seems like a great path forward. This will also create parity with GitHub's markdown code block rendering which is something a lot of DocC users expect.

@omathews what do you think of this as a first step?

omathews · July 7, 2023, 9:32pm

This looks great! We'll make heavy use of this in some upcoming content, so there'll be plenty of opportunity to test it out.