Integrating cmark-gfm and swift-cmark

Hello! I’m here to talk about swift-cmark , and updating which version of CommonMark it uses.

Swift currently uses the swift-cmark library in SourceKit to parse doc comments and extract parts like the opening paragraph, any parameter or return-value documentation, etc. swift-cmark is currently based on the CommonMark cmark library. Swift-DocC also uses CommonMark to parse Markdown, but it uses the extended GitHub-Flavored Markdown version instead. This is hosted in the swift-cmark repo as well, but on the gfm branch instead of main .

Since Swift-DocC is built alongside the Swift compiler and included in toolchains, this means that there are two versions of swift-cmark being built for the compiler: One for swiftMarkup (used by SourceKit), and one for Swift-DocC.

While swift-cmark has some additions compared to commonmark/cmark , they are mostly limited to two key areas: Building as a Swift package, and building in the Swift compiler’s build environment. This means that any use of swift-cmark is more-or-less in line with using upstream commonmark/cmark . This is good news, as the C API for cmark-gfm is a strict superset of upstream cmark .

I would like to propose replacing the main branch of swift-cmark with the gfm branch, to unify the uses of this repo by various Swift projects. This will require modifying swiftMarkup and the Swift build script itself, to point to "cmark-gfm.h" instead of cmark.h . This will also require modifying swift-cmark/gfm itself, as some of the CMake updates on the main branch need to be transferred over.

I opened PRs on swift and swift-cmark to experiment with these changes:

I’m interested to see people’s thoughts. While this would simplify maintenance for Swift and Swift-DocC, i’m less certain about the broader community’s use of swift-cmark . Let me know if this change would affect your project!

2 Likes

There’s at least one problem that needs to be resolved on swift-cmark/gfm itself in addition to the CMake configuration. One of the additions to the gfm branch was some additional synchronization to make the conversion thread-safe. This synchronization uses pthreads, which introduces build issues on Linux and Windows. The Linux issue is straightforward enough to resolve (you need an explicit linker flag to link against pthreads), but solving this for Windows will be more complicated, since pthreads doesn’t exist on Windows (without using a port of the library, a la Cygwin). (cc @compnerd) This is an issue that will need to be investigated more as this integration work proceeds.

Hmm, as you noted use of pthreads is a serious concern. The pthreads semantics don't really work very well with Windows. The threading model needs to be fully abstracted to really work on Windows. What exactly does the library need in terms of threading support?

There are two main things that swift-cmark-gfm uses from pthreads:

  • Mutex locks (to guard global state like arena allocations)
  • Run-once functionality (to initialize the global plugins registry and to initialize pthreads mutexes before use)

The actual use of pthreads is all confined to one header, since i wrote it in a way that the locking behavior can be turned off as a build setting. The rest of the repo just uses the macros this header defines. If there are platform-exclusive APIs i can use in lieu of pthreads on Windows, it should be able to be slotted in with minimal disruption.

(Edit/follow-up: The functionality was written this way with the expectation that actual threads would be used by a consumer, since Swift-DocC processes documents in parallel. We just needed the synchronization so that cmark-gfm would be thread-safe.)

For mutex locks, are there requirements on the lock? Do they need to support recursive locking?

For non-recursive lock, I recommend that we use SRWLocks [2]. If we really must have recursive locking, then we don't have much of a choice and must fall back to CRITICAL_SECTION [3].

For run-once that is easy to support on Windows - InitOnceExecuteOnce allows us to do that; assuming that you need a static key, you can use the INIT_ONCE type and INIT_ONCE_STATIC_INIT initializer value. We can do that with dynamic keys as well if needed. [1]

[1] Using One-Time Initialization - Win32 apps | Microsoft Docs
[2] Slim Reader/Writer (SRW) Locks - Win32 apps | Microsoft Docs
[3] Using Critical Section Objects - Win32 apps | Microsoft Docs

1 Like

It looks like recursive locking isn't necessary, so SRWLocks should work just fine.

Thanks for the links! It looks like adding Windows support should be able to be dropped in without an issue.

1 Like

This is the case for the relationship between apple/swift-cmark:main and commonmark/cmark, but apple/swift-cmark:gfm also has functionality/behavior added that doesn't exist in upstream github/cmark-gfm—at least a new node type, among other additions.

Are there plans to upstream those additions to github/cmark-gfm to eliminate that drift so that apple/swift-cmark:gfm can consist of only additions needed for it to build via SPM and in the compiler environment?

The context of my request is that I would really like to have swift-format use swiftMarkup to parse and reflow doc comments, and I could take on that dependency externally, but the drift between apple/swift-cmark:gfm and github/cmark-gfm prevents me from building swiftMarkup within Google; our one-version rule for third-party dependencies requires me to use github/cmark-gfm since it's the canonical fork that we already use elsewhere.

Are the changes that were needed for DocC stable enough that upstreaming future changes directly to GFM is viable, instead of maintaining new functionality in the fork?

2 Likes

There are four primary changes that are in apple/swift-cmark:gfm that aren't in github/cmark-gfm:

  1. Support for building as a Swift package (including changing directory structure, hard-coding the config.h header, and adding a module map file)
  2. Bug fixes for source location information (posted upstream, but not reviewed)
  3. Refactoring of some global data into parser-local data, and synchronization around the remaining global data
  4. New syntax for "inline attributes", in support of Foundation's AttributedString functionality

As i mentioned, the second one has already been proposed upstream. The others are larger changes that don't feel in line with the upstream implementation. I would be willing to propose the thread-safety refactors in a broader sense, but it would likely require some broader discussion to be accepted.

There's another issue with proposing changes to github/cmark-gfm, though: There doesn't seem to be a place to discuss proposals specifically to GitHub-Flavored Markdown. The cmark-gfm issue tracker has several issues and pull requests which have not been responded to, and the repository itself is rarely updated, except by GitHub engineers to support the GitHub service itself. (By contrast, commonmark/cmark is regularly updated and has its own discussion forum for broader ideas.) For a specific example, my above-linked PR was opened in early 2021, and the only comment is by someone noting that a PR of their own (which is more thorough in its scope) likely solves the same issue.

To answer this directly: I personally believe that the bug fixes are worth upstreaming, and the thread-safety refactors are worth proposing for discussion. (Side note: the thread-safety refactoring is likely worth proposing directly to CommonMark itself, rather than just to cmark-gfm.) The other changes, for Swift package support and inline attributes, are relatively idiosyncratic and particular to our needs and would be a harder sell. I'd love to work with you to see what would be required to make apple/swift-cmark:gfm more viable for your purposes, if that's possible.

1 Like

Follow-up: I was able to post a PR to swift-cmark-gfm to add Windows support: [gfm] add thread-safety support for Windows by QuietMisdreavus · Pull Request #34 · apple/swift-cmark · GitHub

1 Like

Thanks for the context! I agree with you that it wouldn't be expected to have the Package.swift support to be upstreamed, and it's not a requirement for us anyway since we build everything with Bazel and mirror the package descriptions with our own build definitions.

I just made another attempt to build apple/swift-markdown against our github/cmark-gfm and it looks like the main thing that's present in Apple's fork that isn't in mainline GFM that causes the build to break is cmark_node_get_backtick_count. But patching that in is fairly straightforward, and along with the source location change you linked above, that got everything building and all the tests passing!

Having that resolved makes it a lot more realistic to start adopting swift-markdown for swift-format, even if I have to maintain a couple small patches on top of our own version of cmark-gfm while things are diverged a bit. :blush:

1 Like

That's excellent news! I'm really glad you were able to get it working. Definitely let us know if you run into any more issues using swift-markdown!

2 Likes