RFC: Additional Options for Code Blocks (highlight, strikeout, wrap, line numbers)

Following up on the recent RFC that introduced copy-to-clipboard on code blocks, I’ve been working on additional options that build on that work. These options would also be available behind the --enable-experimental-code-block feature flag and change how code blocks are presented:

  • highlight=[Int] — highlight one or more specific lines
  • strikeout=[Int] — strikethrough specific lines
  • wrap=Int — apply soft wrapping at a specified width
  • showLineNumbers — toggle line numbers on

The implementation of these additional options is available across two PRs, which work together:

To illustrate these options, here's a code block as it appears today, without any additional options:

```shell
# Enable custom routing.
RewriteEngine On

# Route documentation and tutorial pages.
RewriteRule ^(documentation|tutorials)\/.*$ MyNewPackage.doccarchive/index.html [L]

# Route files and data for the documentation archive.
#
# If the file path doesn't exist in the website's root ...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# ... route the request to that file path with the documentation archive.
RewriteRule .* MyNewPackage.doccarchive/$0 [L]
```

And here's the same example with the new options. It shows line numbers, wraps lines at a width of 60 characters, highlights lines 2, 5, 10, and 11, and adds a strikethrough to lines 9 and 10:

```shell, showLineNumbers, wrap=60, highlight=[2, 5, 10, 11], strikeout=[9, 10]
# Enable custom routing.
RewriteEngine On

# Route documentation and tutorial pages.
RewriteRule ^(documentation|tutorials)\/.*$ MyNewPackage.doccarchive/index.html [L]

# Route files and data for the documentation archive.
#
# If the file path doesn't exist in the website's root ...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# ... route the request to that file path with the documentation archive.
RewriteRule .* MyNewPackage.doccarchive/$0 [L]
```
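
To show what handling this fence line involves, here's a minimal Swift sketch that parses the info string after the opening ``` into options. It assumes that the first bare word is the language, that options are separated by commas, and that commas inside [...] belong to a list. CodeBlockOptions, splitOptions(_:), parseIntList(_:), and parseFenceLine(_:) are hypothetical names, not DocC's implementation.

```swift
import Foundation

// Hypothetical sketch: these names are illustrative, not DocC API.
struct CodeBlockOptions: Equatable {
    var language: String?
    var showLineNumbers = false
    var wrap: Int?
    var highlight: [Int] = []
    var strikeout: [Int] = []
}

/// Splits the info string on commas that aren't inside square brackets,
/// so `highlight=[2, 5]` stays together as one option.
func splitOptions(_ info: String) -> [String] {
    var parts: [String] = []
    var current = ""
    var depth = 0
    for character in info {
        if character == "[" { depth += 1 }
        if character == "]" { depth -= 1 }
        if character == "," && depth == 0 {
            parts.append(current)
            current = ""
        } else {
            current.append(character)
        }
    }
    parts.append(current)
    return parts
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
}

/// Turns a list value such as "[2, 5, 10]" into integers.
func parseIntList(_ value: String) -> [Int] {
    value.trimmingCharacters(in: CharacterSet(charactersIn: "[] "))
        .split(separator: ",")
        .compactMap { Int($0.trimmingCharacters(in: .whitespaces)) }
}

func parseFenceLine(_ info: String) -> CodeBlockOptions {
    var options = CodeBlockOptions()
    for (index, part) in splitOptions(info).enumerated() {
        let pieces = part.split(separator: "=", maxSplits: 1)
            .map { $0.trimmingCharacters(in: .whitespaces) }
        guard let key = pieces.first else { continue }
        let value = pieces.count > 1 ? pieces[1] : nil
        switch (key, value) {
        case ("showLineNumbers", nil): options.showLineNumbers = true
        case ("wrap", let v?): options.wrap = Int(v)
        case ("highlight", let v?): options.highlight = parseIntList(v)
        case ("strikeout", let v?): options.strikeout = parseIntList(v)
        case (_, nil) where index == 0: options.language = key
        default: break  // Unknown options are ignored in this sketch.
        }
    }
    return options
}

let options = parseFenceLine("shell, showLineNumbers, wrap=60, highlight=[2, 5, 10, 11], strikeout=[9, 10]")
print(options)
```

The bracket-depth tracking is the part that a plain split on commas would get wrong; it's also the part that would need to change if list items ever carried richer data.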

These additions are meant to give authors more control over how their code is presented. Each option is opt-in at the code block level, so you can apply them only where they make an example clearer without affecting the rest of your documentation.

Thanks again to everyone who’s been part of these conversations. We’ve gotten some seriously helpful feedback in these discussions and we’ve appreciated what you all have shared. I’m looking forward to the discussion on these new options.

Why does wrap take a fixed number of characters/columns only? Wouldn’t it be better to wrap at whatever width the current browser layout is?

The two are computed at vastly different times: the wrap width is chosen by the author when writing the documentation, while the browser layout is only known when a reader views the page. This option gives the author control over whether content wraps (or not) on narrower viewports.

If or when you make this purely dynamic and force wrapping based on the width of the viewport, the user experience unfortunately becomes poor quite quickly, with wider text boxes wrapping, sometimes illegibly. The goal was to provide a backstop for when an author forgets to, or can't easily, wrap content in a wide code block, rather than to guarantee that a narrow viewport (for example, an iPhone in portrait mode) never needs to scroll that code block horizontally.
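
Because a fixed width like wrap=60 depends only on the source text, it can be computed without knowing the viewport. Here's a minimal Swift sketch of one way to do it, assuming that character counts stand in for display columns and that a row should break after its last space when possible. softWrap(_:width:) is a hypothetical helper, not DocC or DocC-Render code.

```swift
// Hypothetical helper. Character counts approximate display columns, which
// is only approximate for wide or combining characters.
func softWrap(_ line: String, width: Int) -> [String] {
    precondition(width > 0, "wrap width must be positive")
    var rows: [String] = []
    var remaining = Substring(line)
    while remaining.count > width {
        let window = remaining.prefix(width)
        if let space = window.lastIndex(of: " "), space > window.startIndex {
            // Break after the last space that fits, keeping the space on this row.
            let breakIndex = remaining.index(after: space)
            rows.append(String(remaining[..<breakIndex]))
            remaining = remaining[breakIndex...]
        } else {
            // No usable space: hard-break at exactly `width` characters.
            rows.append(String(window))
            remaining = remaining.dropFirst(width)
        }
    }
    rows.append(String(remaining))
    return rows
}

let rule = #"RewriteRule ^(documentation|tutorials)\/.*$ MyNewPackage.doccarchive/index.html [L]"#
for row in softWrap(rule, width: 60) {
    print(row)
}
```

Since the rows are soft-wrapped, joining them reproduces the original line exactly, so copy-to-clipboard can still copy the unwrapped source.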

Any thoughts on how the highlight syntax could grow to support:

  • partially highlighted lines?
  • more styles than "highlight" and "strikethrough"?

Really looking forward to being able to highlight lines!

Have you thought about using something like <ol> or grid layout to implement the line numbers so that lines will wrap correctly around them?

I’m also interested in this. Particularly an “error” highlight would be super useful.

Yes, there are several places in the Swift Programming Language documentation that could benefit from a proper "error" highlight. For example, here are a few code blocks that illustrate errors using plain text comments in the code:

  • in the Opaque and Boxed Protocol Types documentation:

    func invalidFlip<T: Shape>(_ shape: T) -> some Shape {
        if shape is Square {
            return shape // Error: return types don't match
        }
        return FlippedShape(shape: shape) // Error: return types don't match
    }
    
  • in the Memory Safety documentation:

    var stepSize = 1
    
    func increment(_ number: inout Int) {
        number += stepSize
    }
    
    increment(&stepSize)
    // Error: conflicting accesses to stepSize
    
  • in the Generics documentation:

    func f<MyType>(x: inout MyType) {
        let x1 = x  // The value of x1 is a copy of x's value.
        let x2 = x  // The value of x2 is a copy of x's value.
    }
    
    func g<AnotherType: ~Copyable>(y: inout AnotherType) {
        let y1 = y  // The assignment consumes y's value.
        let y2 = y  // Error: Value consumed more than once.
    }
    

I thought about highlighting by token, which I think you could represent like this:

highlight=[1:3-6, 3:4, 5:3-7]

You could use that similar sort of syntax to support other styles. Are there other styles that you would like to see?
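
Under one reading of that syntax, each entry is line:start-end or line:position, and a bare line number keeps today's whole-line behavior. The post leaves open whether the numbers after the colon count tokens or characters, so this sketch treats them as opaque positions. HighlightSpan, parseSpan(_:), and parseHighlightList(_:) are hypothetical names.

```swift
import Foundation

// Hypothetical model of one entry: `1:3-6`, `3:4`, or a bare line such as `2`.
// Whether start/end count tokens or characters is left open, as in the post.
struct HighlightSpan: Equatable {
    var line: Int
    var start: Int?  // nil means the whole line
    var end: Int?
}

func parseSpan(_ text: String) -> HighlightSpan? {
    let parts = text.trimmingCharacters(in: .whitespaces)
        .split(separator: ":", maxSplits: 1, omittingEmptySubsequences: false)
    guard let first = parts.first, let line = Int(first) else { return nil }
    // A bare line number keeps today's whole-line behavior.
    guard parts.count == 2 else { return HighlightSpan(line: line, start: nil, end: nil) }
    let bounds = parts[1].split(separator: "-", maxSplits: 1, omittingEmptySubsequences: false)
    guard let start = Int(bounds[0]) else { return nil }
    // `3:4` is shorthand for a span that starts and ends at position 4.
    let endValue: Int? = bounds.count == 2 ? Int(bounds[1]) : start
    guard let end = endValue, end >= start else { return nil }
    return HighlightSpan(line: line, start: start, end: end)
}

func parseHighlightList(_ value: String) -> [HighlightSpan] {
    value.trimmingCharacters(in: CharacterSet(charactersIn: "[] "))
        .split(separator: ",")
        .compactMap { parseSpan(String($0)) }
}

let spans = parseHighlightList("[1:3-6, 3:4, 5:3-7]")
print(spans)
```

Keeping the bare-number form valid means existing highlight=[Int] lists would parse unchanged if the syntax grew this way.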

I’m more interested in exploring how the proposed syntax can grow to accommodate future directions, and ideally in identifying potential syntax issues before the syntax is settled, while it’s still easy to change.

For example, the “error” highlight is different from the other two in that it typically comes with some message to display inline.

I have a feeling that diagnostics (errors, warnings, notes, remarks) should have their own presentation. I would imagine you might want IDE-like overlays for those instead of line highlights!

I'm not concerned about the presentation. That's very easy to change after the fact.

My concern is whether the proposed syntax, with all customization on the opening "fence" line of the fenced code block, can grow to cover reasonable user-facing additions, or if it's quickly going to reach a dead end where we need to introduce a new syntax to replace it (or worse, a mix of syntaxes for different features in the same area).

It's quite hard and slow to change user-facing syntax after it's been formalized and I don't want us to commit to a syntax unless we're confident that it's going to scale.

This is basically the same argument that I made in this message

As we add more and more configuration to these code blocks I feel that we may reach a tipping point where the single "fence" line syntax is no longer viable.

Given that all these enhancements are covered by the same feature flag and share the same newly introduced syntax, I feel like the syntax should be considered as a whole, including the future directions that we want it to support.

Jesse and I chatted briefly about this earlier today, and I totally get (and agree with) the concern about how this impacts the data structure of RenderNode, with potentially breaking or incompatible changes over time.

Jesse is looking into falling back to a directive to provide the structure that encapsulates these variations and allows for additional or alternate growth down the road.

If I'm following correctly, the primary concern is that we don't have any way to handle breaking changes to the JSON data structures that DocC outputs and DocC-Render reads, so we need to be significantly more careful and thoughtful about how we add or adjust these structures down the road.

Does shifting to a directive and passing that down help resolve this at any level? Or could this be handled in the DocC code and JSON output by adding a dedicated configuration "struct" with baseline default values that are expected to always render properly, and that we could add to down the road? Changing a property name or removing one would still be a breaking change, so we'd need to be careful not to do that without a LOT of additional coordination.

Correct. We've never made a breaking change to the format of those JSON files so we've never defined a process for making breaking changes to the specification of those JSON files.

Other experimental features have needed to deal with this as well, designing extensibility into how the information is represented in the JSON format so that potential future additions can be accommodated. For example, the highlighting of changed declaration tokens is represented as a string enumeration (below) even though there's only one highlight style today and a boolean value would be sufficient.

```json
"highlight": {
    "type": "string",
    "enum": [
        "changed"
    ]
}
```

However, if we were to add another style of declaration token highlight, then the boolean encoding couldn't grow to support that but the string enumeration could.
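
A minimal Swift sketch of that trade-off, assuming a hypothetical DeclarationToken type that mirrors the schema fragment above (it isn't DocC's actual model): the string-backed enum decodes the existing "changed" value, and a second style would be one more case rather than a new property or a type change.

```swift
import Foundation

// Illustrative types that mirror the schema fragment above;
// they aren't DocC's actual declarations.
enum TokenHighlight: String, Codable {
    case changed
    // A later style could be added as another case (for example, a
    // hypothetical `case added`) without changing how "changed" is encoded.
}

struct DeclarationToken: Codable {
    var text: String
    var highlight: TokenHighlight?  // absent from the JSON when not highlighted
}

let json = #"[{"text": "func"}, {"text": "something", "highlight": "changed"}]"#
let tokens = try JSONDecoder().decode([DeclarationToken].self, from: Data(json.utf8))
for token in tokens {
    print(token.text, token.highlight?.rawValue ?? "none")
}
```

One caveat: a decoder built before a new case exists throws on JSON that uses it, so a renderer may still want a fallback for unknown values.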

That's a big reason why I'm pushing to explore additional features that we think we want code blocks to support. Identifying ideas that we're likely to want to add in the future tells us which portions of the JSON format need to be extensible and which don't.

The developer-facing syntax for specifying these code block attributes doesn't impact the JSON format either positively or negatively. Regardless of syntax, the most likely implementation is that the internal DocC code would prepare the parsed information for the JSON format.

However, it can sometimes be easier to use a discussion about developer-facing features and syntax to explore the scope, which in turn informs what information DocC needs to encode in the JSON format.

For now, I think we should continue implementing the current syntax where everything is confined on the "fence" line of the fenced code block but at the same time we should explore likely future directions and be mindful that we might reach a point where having all the configuration on a single line becomes an unfeasible burden and that we might need to overhaul the syntax before marking this feature as non-experimental.

On the syntax side, to introduce an idea that we haven't explored yet: today the diff syntax highlighting uses lines starting with + and - to indicate additions and removals. For example:

```diff
- func something() -> Int {
+ func something() -> String {
-     return 123_456
+     return "Hello, world"
}
```

We could draw inspiration from this syntax and let different highlight styles be defined by leading characters like +, -, >, etc. For example, instead of:

```shell, showLineNumbers, highlight=[2, 5, 10, 11], strikeout=[9, 10]
# Enable custom routing.
RewriteEngine On

# Route documentation and tutorial pages.
RewriteRule ^(documentation|tutorials)\/.*$ MyNewPackage.doccarchive/index.html [L]

# Route files and data for the documentation archive.
#
# If the file path doesn't exist in the website's root ...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# ... route the request to that file path with the documentation archive.
RewriteRule .* MyNewPackage.doccarchive/$0 [L]
```

we could precede the highlighted lines with > and the struck-out lines with - (assuming that strikethrough is similar to a suggested removal), possibly using a boolean configuration like highlighted on the "fence" start line to enable this style of parsing:

```shell, showLineNumbers, highlighted
  # Enable custom routing. 
> RewriteEngine On

  # Route documentation and tutorial pages.
> RewriteRule ^(documentation|tutorials)\/.*$ MyNewPackage.doccarchive/index.html [L]

  # Route files and data for the documentation archive.
  #
- # If the file path doesn't exist in the website's root ...
- RewriteCond %{REQUEST_FILENAME} !-f
> RewriteCond %{REQUEST_FILENAME} !-d

  # ... route the request to that file path with the documentation archive.
  RewriteRule .* MyNewPackage.doccarchive/$0 [L]
```

A downside of this syntax is that other tools are less likely to recognize it, but an upside is that a person who sees the raw syntax in another tool that doesn't support it can still guess the author's intention.
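
To make the parsing rule concrete, here's a minimal Swift sketch of the marker-prefix idea. It assumes every non-blank line starts with a one-character marker column (> highlighted, - struck out, a space for unstyled) followed by a space; MarkedCode and parseMarkedLines(_:) are hypothetical names. Note that under this rule a line can't be both highlighted and struck out, so line 10 from the options example (which appears in both highlight and strikeout) has to pick one.

```swift
// Hypothetical parser for the marker-prefix idea; the names are illustrative.
// Every non-blank line starts with a marker column followed by a space:
// ">" highlighted, "-" struck out, " " unstyled.
struct MarkedCode: Equatable {
    var code: [String] = []
    var highlight: [Int] = []
    var strikeout: [Int] = []
}

func parseMarkedLines(_ source: String) -> MarkedCode {
    var result = MarkedCode()
    // Keep empty lines so that line numbers stay aligned with the source.
    let lines = source.split(separator: "\n", omittingEmptySubsequences: false)
    for (offset, line) in lines.enumerated() {
        let lineNumber = offset + 1
        guard let marker = line.first else {
            result.code.append("")  // A blank line has no marker.
            continue
        }
        switch marker {
        case ">": result.highlight.append(lineNumber)
        case "-": result.strikeout.append(lineNumber)
        default: break  // A leading space means "no style".
        }
        // Drop the marker column and the space that follows it.
        result.code.append(String(line.dropFirst(2)))
    }
    return result
}

let source = """
  # Enable custom routing.
> RewriteEngine On

- RewriteCond %{REQUEST_FILENAME} !-f
> RewriteCond %{REQUEST_FILENAME} !-d
"""
let parsed = parseMarkedLines(source)
print(parsed.highlight, parsed.strikeout)
```

The marker column has to be present on every non-blank line; otherwise the first character of ordinary code would be mistaken for a marker and dropped.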


Keeping with the idea of making the raw markup moderately human-readable, I could hypothetically see that syntax growing to support partial-line highlights by adding a line below that consists of only whitespace and these highlight style characters (+, -, >, etc.). For example, if I wanted to emphasize that only the return type changed in

```diff
- func something() {
+ func something() -> Int {
```

I could hypothetically write it as something like:

```swift, highlighted
func something() -> Int {
                 ++++++
```

I could also imagine this hypothetical syntax growing to support other highlight styles and highlights with messages, by using ~ as the highlight indicator, the name of the style, a : separator, and the message. For example:

```swift, highlighted
func something() -> INT {
                    ~~~ error: Cannot find type 'INT' in scope
```

I'm not necessarily advocating for this syntax, just using it to illustrate that there are alternatives beyond single-line, comma-separated values and directive parameters that we haven't explored or considered yet.
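
As a sketch of how a tool might read those underline lines, here's a minimal Swift parser. It assumes an annotation line contains only leading spaces, a run of + or ~, and, for ~, a style: message suffix. UnderlineAnnotation, parseUnderline(_:), and the "added" style name used for + are all hypothetical.

```swift
import Foundation

// Hypothetical model of an underline annotation line. `columns` is a 0-based,
// half-open range into the code line directly above it.
struct UnderlineAnnotation: Equatable {
    var columns: Range<Int>
    var style: String
    var message: String?
}

func parseUnderline(_ line: String) -> UnderlineAnnotation? {
    let characters = Array(line)
    // The underline starts at the first non-space character.
    guard let start = characters.firstIndex(where: { $0 != " " }) else { return nil }
    let marker = characters[start]
    guard marker == "+" || marker == "~" else { return nil }  // an ordinary code line
    var end = start
    while end < characters.count && characters[end] == marker { end += 1 }
    let rest = String(characters[end...]).trimmingCharacters(in: .whitespaces)
    if marker == "+" {
        // "+" marks an addition; "added" is a made-up style name for this sketch.
        return rest.isEmpty ? UnderlineAnnotation(columns: start..<end, style: "added", message: nil) : nil
    }
    // "~" is followed by "style: message".
    guard let colon = rest.firstIndex(of: ":") else { return nil }
    let style = rest[..<colon].trimmingCharacters(in: .whitespaces)
    let message = rest[rest.index(after: colon)...].trimmingCharacters(in: .whitespaces)
    return UnderlineAnnotation(columns: start..<end, style: style, message: message.isEmpty ? nil : message)
}

let codeLine = "func something() -> INT {"
let annotationLine = String(repeating: " ", count: 20) + "~~~ error: Cannot find type 'INT' in scope"
if let annotation = parseUnderline(annotationLine) {
    // Map the underline's columns back onto the code line above it.
    print(String(Array(codeLine)[annotation.columns]), annotation.style, annotation.message ?? "")
}
```

The column range comes purely from where the underline sits, so it only lines up when the author views the markup in a monospaced font and the code is indented with spaces rather than tabs.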

I'll also say that I think it's worth talking about the syntax for these types of richer annotations because it can draw attention to some syntax limitations.

For example, let's use the invalidFlip code block with error messages (from the "Opaque and Boxed Protocol Types" documentation) to look at how various syntaxes could represent those error annotations:

```swift
func invalidFlip<T: Shape>(_ shape: T) -> some Shape {
    if shape is Square {
        return shape // Error: return types don't match
    }
    return FlippedShape(shape: shape) // Error: return types don't match
}
```

Here, each error annotation groups together 3 pieces of information:

  • a line and character range
  • a style ("error")
  • a message

The hypothetical syntax I used above groups this information like below:

```swift, showLineNumbers, highlighted
func invalidFlip<T: Shape>(_ shape: T) -> some Shape {
    if shape is Square {
        return shape 
               ~~~~~ error: Return types don't match
    }
    return FlippedShape(shape: shape)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~ error: Return types don't match
}
```

The other two syntaxes each have different challenges with grouping information like this and with including longer form text as annotation information.

Assuming that the "fence" line syntax uses a new list for each style, the first two pieces of information could be grouped together like error=[3:15-20, 5:12-38]. The error message could also be interspersed with the ranges, representing the 3 pieces of the first annotation like error=[3:15-20 "Return types don't match"]. That's not too bad on its own, but because all annotations have to fit on the same line, this can become cumbersome after only a couple of annotations (since potentially long messages are inlined):

```swift, showLineNumbers, error=[3:15-20 "Return types don't match", 5:12-38 "Return types don't match"]
func invalidFlip<T: Shape>(_ shape: T) -> some Shape {
    if shape is Square {
        return shape
    }
    return FlippedShape(shape: shape)
}
```
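
One concrete cost of inlining messages is that the list can no longer be split on every comma, because a message may contain one. Here's a minimal Swift sketch of parsing the error=[...] value, assuming each item is line:start-end followed by a double-quoted message. ErrorAnnotation, splitOutsideQuotes(_:), and parseErrorItem(_:) are hypothetical names, and escaped quotes inside messages aren't handled.

```swift
import Foundation

// Hypothetical sketch; ErrorAnnotation and these helpers are illustrative names.
struct ErrorAnnotation: Equatable {
    var line: Int
    var columns: ClosedRange<Int>
    var message: String
}

/// Splits on commas outside double quotes, because a message may contain a
/// comma. Escaped quotes inside a message aren't handled here.
func splitOutsideQuotes(_ text: String) -> [String] {
    var items: [String] = []
    var current = ""
    var insideQuotes = false
    for character in text {
        if character == "\"" { insideQuotes.toggle() }
        if character == "," && !insideQuotes {
            items.append(current)
            current = ""
        } else {
            current.append(character)
        }
    }
    items.append(current)
    return items
}

/// Parses one `<line>:<start>-<end> "<message>"` item.
func parseErrorItem(_ item: String) -> ErrorAnnotation? {
    guard let open = item.firstIndex(of: "\""),
          let close = item.lastIndex(of: "\""), open < close else { return nil }
    let location = item[..<open].trimmingCharacters(in: .whitespaces)
    let message = String(item[item.index(after: open)..<close])
    let parts = location.split(separator: ":")
    guard parts.count == 2, let line = Int(parts[0]) else { return nil }
    let bounds = parts[1].split(separator: "-")
    guard bounds.count == 2, let start = Int(bounds[0]), let end = Int(bounds[1]),
          start <= end else { return nil }
    return ErrorAnnotation(line: line, columns: start...end, message: message)
}

let value = #"[3:15-20 "Return types don't match", 5:12-38 "Return types don't match"]"#
let listBody = String(value.dropFirst().dropLast())  // strip the surrounding [ ]
let errors = splitOutsideQuotes(listBody).compactMap(parseErrorItem)
for annotation in errors {
    print(annotation.line, annotation.columns, annotation.message)
}
```

Supporting escaped quotes would add another layer of lexing on top of this, which is part of why long, free-form text sits awkwardly on a single fence line.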

Directive parameters have some of the same problems but have the benefit that each parameter can be specified on its own line. However, the parameters have to be primitives, meaning that more values may need to be represented as strings:

@CodeBlock(
  showLineNumbers: true,
  errorAnnotations: ["3:15-20 \"Return types don't match\"", "5:12-38 \"Return types don't match\""]
) {
  ```swift
  func invalidFlip<T: Shape>(_ shape: T) -> some Shape {
      if shape is Square {
          return shape
      }
      return FlippedShape(shape: shape)
  }
  ```
}

Alternatively, directives can use nested directives to provide more complicated configuration:

@CodeBlock(showLineNumbers: true) {
  @Highlight(line: 3, start: 15, end: 20, style: "error") {
    Return types don't match
  }

  @Highlight(line: 5, start: 12, end: 38, style: "error") {
    Return types don't match
  }

  ```swift
  func invalidFlip<T: Shape>(_ shape: T) -> some Shape {
      if shape is Square {
          return shape
      }
      return FlippedShape(shape: shape)
  }
  ```
}

This can be very flexible, but it can also be fairly verbose, and it significantly impacts other tools' ability to process and display the code block.

I recently updated how highlight and strikeout options are represented in the RenderNode JSON and wanted to share more about this change.

The author-facing syntax hasn’t changed. You’d still write something like this on the code block’s opening fence line:

highlight=[1, 4, 6], strikeout=[1, 4, 7]

Previously, these options were encoded as separate arrays of integers in the JSON. That model only supported highlighting or striking out entire lines, and it couldn't grow to accommodate partial lines, additional information, or new styles without introducing new options or breaking changes.

Now both are unified under a single LineAnnotation object. Each LineAnnotation has a style (“highlight” or “strikeout”) and a range, represented in Swift as a Range<Position>. A Position includes line: Int and character: Int? (the character index isn’t used yet, but it’s already part of the schema). In JSON, a range is expressed as a two-element array of Position values.

For example, the above syntax produces:

```json
"lineAnnotations": [
  { "style": "highlight", "range": [{ "line": 1 }, { "line": 1 }] },
  { "style": "highlight", "range": [{ "line": 4 }, { "line": 4 }] },
  { "style": "highlight", "range": [{ "line": 6 }, { "line": 6 }] },
  { "style": "strikeout", "range": [{ "line": 1 }, { "line": 1 }] },
  { "style": "strikeout", "range": [{ "line": 4 }, { "line": 4 }] },
  { "style": "strikeout", "range": [{ "line": 7 }, { "line": 7 }] }
]
```

This model allows much more flexibility. We can now represent partial lines, single lines, or arbitrary ranges of lines and characters. It also makes the JSON more extensible. New styles like “error” could be added alongside optional fields (such as error messages) without introducing breaking changes.
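
For illustration, here's a Swift sketch of a model that encodes to the JSON shown above, assuming the shapes described in this post: a style plus a Range<Position>, with an optional character index. These declarations are hypothetical reconstructions, not DocC's source; in particular, the ordering that Position needs in order to be a Range bound is a guess, with a missing character index sorting before any explicit one.

```swift
import Foundation

// Hypothetical reconstruction of the model described above; DocC's actual
// declarations may differ.
struct Position: Codable, Comparable {
    var line: Int
    var character: Int?  // not used yet; omitted from the JSON when nil

    // Range<Position> needs an ordering. This sketch orders by line, then by
    // character, with a missing character sorting before any explicit one.
    static func < (lhs: Position, rhs: Position) -> Bool {
        if lhs.line != rhs.line { return lhs.line < rhs.line }
        switch (lhs.character, rhs.character) {
        case let (left?, right?): return left < right
        case (nil, _?): return true
        default: return false
        }
    }
}

struct LineAnnotation: Codable, Equatable {
    enum Style: String, Codable { case highlight, strikeout }
    var style: Style
    var range: Range<Position>
}

/// Converts the fence-line lists into annotations, mirroring the example JSON
/// in which a whole line uses the same position for both ends of the range.
func lineAnnotations(highlight: [Int], strikeout: [Int]) -> [LineAnnotation] {
    func wholeLines(_ lines: [Int], _ style: LineAnnotation.Style) -> [LineAnnotation] {
        lines.map { line -> LineAnnotation in
            let position = Position(line: line, character: nil)
            return LineAnnotation(style: style, range: position..<position)
        }
    }
    return wholeLines(highlight, .highlight) + wholeLines(strikeout, .strikeout)
}

let annotations = lineAnnotations(highlight: [1, 4, 6], strikeout: [1, 4, 7])
let encoder = JSONEncoder()
encoder.outputFormatting = .sortedKeys
let data = try encoder.encode(["lineAnnotations": annotations])
print(String(decoding: data, as: UTF8.self))
```

Swift's Range is Codable when its bound is, and it encodes as a two-element array, which is what produces the [lower, upper] shape. Because the synthesized encoding skips nil optionals, an unused character index simply doesn't appear in the output.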

Thank you. Regardless of what user-facing syntax we end up using, that JSON format is going to be more robust and enable us to add additional code block annotations without breaking changes in the JSON format.
