Supporting row and column spanning in Swift-DocC Markdown tables

Hi all!

Victoria (@QuietMisdreavus) and I are excited to pitch an enhancement to Swift-DocC’s Markdown support. This will affect existing clients of Swift’s cmark-gfm fork who choose to opt into the enhancement – including Swift Markdown and Swift-DocC.

Introduction

Row and column spanning is a common tool documentation authors reach for when conveying complex information in a table but is currently unsupported in both CommonMark and GitHub Flavored Markdown. In keeping with the philosophy of Markdown, we’d like to enhance the Swift Project’s fork of GitHub Flavored Markdown to add support for this feature in a way that continues to be backwards compatible with standard Markdown parsers.

Proposed Solution

For row spanning specifically, there isn’t much prior art in commonly-used, shipping Markdown parsers for us to pull from. By researching existing discussions on the topic in the CommonMark forums, we’ve developed a solution that preserves Markdown’s philosophy of using punctuation characters to convey formatting in a way that is still readable in plain text and renders in a reasonable way with standard Markdown parsers. We propose adding the following syntax to Markdown tables: the “ditto mark”.

a pair of marks " used underneath a word to save space and show that the word is repeated where the marks are

Ditto Definition & Meaning | Britannica Dictionary

The ditto mark is generally used in handwriting as shorthand to indicate that a given line should be the same as the one above. It’s often used in math since it can more clearly convey what is actually changing between two lines of an equation.

In a Markdown table it would be used for this same purpose – to indicate that the above cell contains information that should be conveyed in the current one.

Here’s an example of a Markdown table that includes row spanning with the new ditto syntax:

## Examples

| Sloth name   | Sloth colors | Sloth powers |
| ------------ | ------------ | ------------ |
| Stormy       | Gray         | Wind         |
| "            | Light blue   | Ice          |
| "            | Purple       | Rain         |
| Lava         | Red          | Fire         |
| "            | Orange       | "            |
| "            | Black        | "            |
| Electric     | Yellow       | Lightning    |
| "            | White        | Wifi         |

The ditto mark would be escaped with \" to indicate that the quotation mark should be rendered literally.
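To make the intended semantics concrete, here is a minimal Python sketch of how a renderer might turn ditto cells into row spans. It is not the swift-cmark implementation. The function name `resolve_row_spans`, the `marker` parameter, and the decision to treat a marker with no cell above it as literal text are all assumptions made for illustration.

```python
def resolve_row_spans(rows, marker='"'):
    """Return a grid of (text, rowspan) tuples; None marks a cell merged into the one above.

    Assumption: a cell spans upward only when its entire trimmed content is the marker,
    so a quoted string such as "hi" is kept as ordinary text.
    """
    grid = []
    owners = {}  # column index -> row index of the cell that currently owns that column
    for r, row in enumerate(rows):
        out = []
        for c, raw in enumerate(row):
            text = raw.strip()
            if text == marker and c in owners:
                # Extend the owning cell's span instead of emitting a new cell.
                owner_text, span = grid[owners[c]][c]
                grid[owners[c]][c] = (owner_text, span + 1)
                out.append(None)
            else:
                # An escaped marker (for example \") is rendered as the literal character.
                if text == "\\" + marker:
                    text = marker
                owners[c] = r
                out.append((text, 1))
        grid.append(out)
    return grid


sloths = [
    ["Stormy", "Gray", "Wind"],
    ['"', "Light blue", "Ice"],
    ['"', "Purple", "Rain"],
    ["Lava", "Red", "Fire"],
    ['"', "Orange", '"'],
    ['"', "Black", '"'],
]
for row in resolve_row_spans(sloths):
    print(row)
```

In this sketch "Stormy" ends up with a row span of 3, and both "Lava" and "Fire" span 3 rows. Passing `marker="^"` would apply the same logic to a caret-based syntax.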

For column spanning, we think adopting an existing syntax from MultiMarkdown is the best path forward:

To indicate that a cell should span multiple columns, then simply add additional pipes (|) at the end of the cell, as shown in the example. If the cell in question is at the end of the row, then of course that means that pipes are not optional at the end of that row…. The number of pipes equals the number of columns the cell should span.

MultiMarkdown: Tables
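As a rough illustration of the rule quoted above, the following Python sketch counts the pipes that follow a cell to work out its column span. The helper name `parse_row` is hypothetical, and the sketch assumes a simplified row format: it ignores escaped pipes and pipes inside code spans, which a real parser would have to handle.

```python
def parse_row(line):
    """Split one table row into (text, colspan) tuples.

    Assumption: a zero-width segment between two pipes ("||") extends the previous
    cell, while a whitespace-only segment ("|   |") is an ordinary empty cell.
    """
    body = line.strip()
    # Drop the optional outer pipes so that only cell separators remain.
    if body.startswith("|"):
        body = body[1:]
    if body.endswith("|"):
        body = body[:-1]
    cells = []
    for part in body.split("|"):
        if part == "" and cells:
            # Each additional adjacent pipe widens the preceding cell by one column.
            text, span = cells[-1]
            cells[-1] = (text, span + 1)
        else:
            cells.append((part.strip(), 1))
    return cells


print(parse_row("| Sloth name   | Sloth colors               || Sloth powers               ||"))
```

Summing the spans of a row gives the number of columns it occupies, which can be compared against the delimiter row to check that the table is consistent.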

In our example table that would look like:

## Examples

| Sloth name   | Sloth colors               || Sloth powers               ||
| ------------ | ------------ | ------------ | ------------ | ------------ |
| Stormy       | Gray                       || Wind                       ||
| "            | Light blue                 || Ice                        ||
| "            | Purple                     || Rain                       ||
| Lava         | Red                        || Fire                       ||
| "            | Orange                     || "                          ||
| "            | Black                      || "                          ||
| Electric     | Yellow                     || Lightning                  ||
| "            | White                      || Wifi                       ||

Given that input, Swift-DocC would render the following table:

GitHub’s renderer, meanwhile, would display the following:

We think introducing the new ditto mark syntax and enhancing the existing pipe syntax strikes a good balance of maintaining compatibility with existing Markdown tooling while bringing new features to Swift-DocC and other clients of the Swift Project’s fork of cmark-gfm.

Victoria has put together an implementation of this pitch in the swift-cmark repository.

Alternatives Considered

Alternative One

The primary alternative we’re considering to the above proposal is using the caret (^) symbol in place of the ditto symbol ("):

| Sloth name   | Sloth colors | Sloth powers |
| ------------ | ------------ | ------------ |
| Stormy       | Gray         | Wind         |
| ^            | Light blue   | Ice          |
| ^            | Purple       | Rain         |
| Lava         | Red          | Fire         |
| ^            | Orange       | ^            |
| ^            | Black        | ^            |
| Electric     | Yellow       | Lightning    |
| ^            | White        | Wifi         |

This symbol has come up in several other proposals on this topic, so it has the potential to be the more readable and commonly understood symbol. I think the ditto symbol is a better choice since it is already intended for this purpose in handwriting, while the caret symbol is generally used to represent an arrow. Additionally, the caret is already used in the Swift community to represent a custom attribute in an attributed string, so overloading that punctuation further could lead to confusion.

We’re definitely interested in feedback on this point in particular.

Alternative Two

Instead of extending Markdown syntax further, we could create a new Swift-DocC @Table directive to support this kind of table layout. It’s likely we’ll still want something like this in the future since Markdown tables don’t support multi-line content. However, we think that row and column spanning is a common enough feature that it’s worth extending Markdown in a backwards compatible way to support it here without requiring folks to migrate to an entirely different syntax to achieve this behavior.

Alternative Three

We could also extend table syntax to support dividers between every row like so:

| Sloth name   | Sloth colors | Sloth powers |
| ------------ | ------------ | ------------ |
| Stormy       | Gray         | Wind         |
|              | ------------ | ------------ |
|              | Light blue   | Ice          |
|              | ------------ | ------------ |
|              | Purple       | Rain         |
| ------------ | ------------ | ------------ |
| Lava         | Red          | Fire         |
|              | ------------ |              |
|              | Orange       |              |
|              | ------------ |              |
|              | Black        |              |
| ------------ | ------------ | ------------ |
| Electric     | Yellow       | Lightning    |
|              | ------------ | ------------ |
|              | White        | Wifi         |

However, this format renders very poorly when processed with a standard Markdown parser. It would also require users to migrate to a much more verbose syntax when introducing spanning to a table.


Just so that the lede isn't buried too much: there's an implementation available on swift-cmark that adds this syntax to GitHub-Flavored Markdown-style tables. I implemented both caret- and ditto-based row span, though I set carets as the default since that's the broader consensus among Markdown implementations that use this syntax. Further work will need to be done before this is available in Swift-DocC, but the groundwork has already started.


This is neat, I like the simplicity. I'm not totally sold on the ditto character because you're not literally copying the text into the next row, but it's probably close enough and it looks nice visually.

In case it sparks some new ideas, these are the two (small) reasons I'm hesitating: First, in keeping with the existing table syntax, tokens should be visual rather than conceptual. Second, it's not uncommon to use a quoted string in a table. When using a markdown editor that syntax-highlights or live-renders, there'd be a moment when you type the first " where the row would disappear or the token would be highlighted incorrectly.

All that said, the reasons you list against using ^ are compelling and I don't really have a better idea. The best I could come up with is something like:

| Stormy | Gray       |
|:      :| Light blue |

But this might conceptually overload :'s use for cell alignment in dividers.


Are there plans to propose this for upstream GFM? It would be good to get some signal from those folks before we commit to this syntax forever.


I'd probably announce the implementation in the CommonMark Forum thread where table syntax is being discussed first. From what i can tell, upstream GFM doesn't take many outside contributions, so while i'm perfectly willing to post the patch there i'm not sure it would ultimately get accepted.

Of all the proposals or implementations of tables with row spans in them that i've seen, the vast majority go for the caret design. (I'm personally a fan of using a caret instead of a ditto mark, since it's not necessarily duplicating the contents, just referencing the cell above it, which the pointing effect of the caret implies. The one thing i would worry about is whether the caret is accessible on keyboard layouts other than US ANSI.) After that, the most common proposal is to add lines for each cell boundary, not just between columns, so that it is visually apparent when a cell is meant to span multiple rows or columns. I personally think Markdown tables are very tedious to author without tooling in the first place, and requiring row markers adds to the tedium.

The only other style of row span i've seen seriously offered is this extension to Python Markdown, which uses empty cells and flanking underscores to denote row span, with additional sigils to denote vertical text placement:

| Column 1                | Col 2 | Big row span   |
|:-----------------------:|-------| -------------- |
| r1_c1 spans two cols           || One large cell |
| r2_c1 spans two rows    | r2_c2 |                |
|_^                      _| r3_c2 |_              _|

I like the idea of the flanking underscores, but there's a shortcoming i edited out of this table: Cells with only underscores in them, even if they're not touching the column-marker pipes, will erroneously be considered a row-span marker unless you manually add a  (space character) entity to the cell, which will cause the plugin to see the space as part of the table cell and not try to parse it as a row-span marker. There's a way to work around this in swift-cmark/GFM, but the fact that it's an issue in the first place puts me off somewhat.

There's a recurring idea to use these colon markers as a per-cell text alignment indicator, so this would probably clash with that kind of proposal if it came up on our implementation.


Bumping this thread to note: Implementation PRs have landed in all the Swift-DocC repos' main branches:

The initial implementation opted to use the ^ marker for rowspan instead of the " marker from the proposal; the primary reason for this was to align with proposals in the CommonMark Forum discussion about table syntax. This is open to discussion; i left in the option (in swift-cmark) to use the "ditto-mark" syntax, so it would be a small change to start using it. I'd like to know what everyone thinks!


Amazing news! So excited to see this.


I'm not opposed to using ^ instead of " but I don't think we should make this decision based on these forum threads, since CommonMark hasn't had a formal discussion around this syntax or landed any implementation yet. There really isn't any existing precedent for us to follow here, so I think we need to argue for the better syntax based on its own merits.

I still generally stand by my argument from the original pitch:

But a couple of compelling counter arguments have come up:

And on Twitter as well:

https://twitter.com/DebugSteven/status/1554963135215284225?s=20&t=IT7rR_Voz48dQlOL0pQULQ

https://twitter.com/DebugSteven/status/1554963400341331969?s=20&t=IT7rR_Voz48dQlOL0pQULQ


To me the rationale of the ditto mark having existing precedent/familiarity in the academic world for a very similar use case still outweighs these arguments but I can definitely be convinced to the contrary. The fact that folks seem to gravitate towards ^ and intuit what it means when reading the table (even when unfamiliar with the syntax) is, to me, the strongest argument in its favor.

For those already familiar with the ditto mark from handwriting, I think the " symbol likely makes sense in this new context, but for those entirely unfamiliar with it I think the behavior would be surprising. So I'd be interested in folks weighing in on how much familiarity the " symbol brings with it.

My personal argument in favour of ^ is that using " might introduce complications with parsing things as strings, especially for external tooling, e.g., an IDE doing syntax highlighting on documentation comments.


CC: @jack who mentioned this as well. I'm not entirely convinced by the tooling argument; it seems like we should pick the better syntax from a UX/authoring perspective and make the tools work around that.

But even from the tooling angle, Markdown doesn't generally use " as a syntax element, while it does use ^ to represent an attributed string in general prose. The exception for " is for directive arguments, but that's a very specific location and not something tooling needs to generally highlight.


I feel the attributed string argument is more compelling. Let's go with "

In my experience, syntax highlighting that is focused on Markdown doesn't highlight quoted strings as something special - the only time i've seen that happen is in these forums or in places that try to guess a language or highlight common syntax structures in situations where a language isn't given. I don't think having quote-marks will necessarily hurt in that situation, so long as the highlighter in question can understand that Markdown is being written instead of some generic pseudo-language.

My major rationale is primarily a matter of personal preference - the quote mark means something in natural prose, but the caret does not, generally speaking. The caret looks more like syntax, in the way that table structures, heading markers, or list markers do, since otherwise its only use in prose writing is to fake an exponent or superscript. My one worry is how using more non-alphanumeric characters would affect users of non-QWERTY keyboards; i don't know how the caret is affected, but i know some characters like the backtick and tilde are much harder to access on some commonly-used keyboard layouts.
