Unsafe characters in file names redux

@dhristov interesting, would you be able to point me to where such a change would need to be made? I suspect that this would be the renderer SPA. We could have this be a configuration option that we embed into the docc bundle to communicate to the SPA that the FS layout is v2 and that it should remap the request.

You could try the createDataPath function in utils/data.js. That is used to convert the current page route to a .json file path.

You could have a sanitiser in there, to ensure any characters in the URL that are invalid on the file system are encoded or replaced.

@marcus_ortiz brought up a the same issue in the Swift-DocC-Render PR. Marcus and I talked about it offline and agreed that it'd be better to discuss this in the Forum thread rather than the PR.

To summarize the issue and its high level impact: if the renderer expects the file names to have been transformed to avoid unsafe characters but the documentation was built before that change, then the renderer won't file the files at the transformed locations.

Renderer versions before this change support documentation built before this change and renderer versions after this change support documentation built after this change but no renderer version can support documentation built both before and after this change. This issue affects servers that hosts documentation for multiple projects, built over time. Additionally, if the projects are built with different toolchains it may not be feasible or even possible to rebuild the documentation for all projects with the same DocC version.

To make new versions of DocC Render support documentation build both before and after this change, it needs to be able to check if each project uses transformed file names or not.

Marcus and I talked about a few different alternatives, for example to indicate this as a feature flag in the theme-settings.json file (or some other file) or to modify the per-page index.html copies with an indication that that project uses transformed file paths but neither of those alternatives are a great solution to this problem.

We briefly talked about this topic in the Documentation Workgroup meeting today. That conversation didn't reach any specific conclusions other than that this might be a good topic to formally put on the agenda for an upcoming meeting (the next Documentation Workgroup meeting is August 14).

5 Likes

I wasn't able to attend the Documentation Workgroup meeting, but I discussed this briefly with others since then.

We came to the conclusion that the likely best way forward is adding a feature flag to the DocC compiler for users to opt-in to this new, more portable filename sanitization enhancement so that we can unblock building docs on Windows platforms without causing a backwards compatibility issue that would result if we were to make it the default behavior. Perhaps it would make sense to remove the flag and make it the default behavior in the next major version of DocC though.

For DocC-Render, this flag could be propagated into the features section of a theme-settings.json file so that it can know when to add the logic to retain the pretty URLs.

Let me know if I've misremembered or missed anything, thanks!

This is rather unfortunate. This means that this needs to be specified by everyone in order to build documentation with DocC on Windows. Perhaps we can make it the inverse, that is make it an option that one has to specify to opt-out of the new portable paths?

I agree that it is unfortunate, but I'm not sure if doing the inverse is a better alternative.

If we make it opt-out, we may disrupt workflows that rely on the older behavior and would require rebuilding/reconfiguring of existing projects to work with a newer renderer. Personally, it feels like too major/breaking of a change to make it the default, but maybe others feel differently.

From your discussion it sounds like the following compromise:

  • First make it an opt-in but with an announcement and maybe warnings when building (and not choosing the opt-in) that the default behavior will change;
  • then (in a later version) make it an opt-out, maybe with warnings when building (and not choosing the opt-out) that the default behavior has been changed;
  • later remove any warnings.

A little more context on this:

In the workgroup meeting we had talked about how the URL mapping solution would work in static hosting situations and took an action item to investigate that further. The team tried a handful of combinations of browsers and hosting environments and concluded that transforming the URLs in the renderer doesn't work for documentation in static hosting environments.

In order to support hosting documentation in static hosting environments we would need to update the URLs to match the file names.

Like I mentioned before (below), change the URLs would a breaking change that we'd want to offer a full release cycle for for developers and other tools to migrate to the new URL format.

Thank you for the context, David. It seems like this is the last remaining issue before we can bring DocC to Windows, so hopefully we can reach a consensus.

Can someone shed some light as to what backwards-compatibility concern we discussing here? Are we discussing changing:

a. The way documentation pages are represented on disk
b. The documentation URLs that users access in their browser

Or both?

If it's b, then there are UX concerns to discuss, including setting up redirects.

If it's only a, this seems like a tooling version concern, wherein if you build documentation using the DocC that writes files with the new encoding, you need to make sure your DocC render is new enough to support that. Is that correct?

Both. What we previously concluded was that static hosting environments—such as GitHub Pages—require that the URLs in the browser to match the file paths within the documentation archive on disk. As such, if we change the file names on disk we also need to change the URLs. Otherwise the links won't work in static hosting environments.

Redirects isn't something that DocC can configure for the web server. DocC can emit a list of new and old paths but we would be pushing the responsibility to each developer to set up the redirects on their web server, if they even have access to do so.

Given the backwards compatibility issues, I continue to agree with Marcus's assessment here:

with the possible caveat that I think it's too early to make an assessment today regarding the future default behavior of this flag.

The problem is that then you have a difference across Windows and everything else. Would it be acceptable to just have macOS have a different default from Linux and Windows? That is, the old path would be the default on macOS and the new names require opt-in and on Linux and Windows you get the new names and have to opt-out. Behavioural differences across platforms makes it difficult for adopting a tool as well as perpetuates this idea that not all the platforms are equal. This will also ensure that the behaviour is well exercised in new changes and does not just regress something on Windows accidentally.

2 Likes

DocC can emit symlinks, no? Or it can emit a page with <meta http-equiv="refresh" content="0;url=…">.

We wouldn't make any breaking changes without allowing for one full release to transition first. In other words; if we introduce a new flag now it would be opt-in for the 6.1 release and (possibly) opt-out for the 6.2 release.

I personally don't like the idea of intentionally regressing one platform to match the restrictions of another platform, even more so if it still doesn't provide the same behavior across all platforms.

Would it be acceptable to just have macOS have a different default from Linux and Windows?

I'm not sure I follow why we'd change the behaviour for Linux but not macOS. If we were to change behaviour, it seems like we'd want to change it across all existing platforms.

I do agree that having different default behaviours on Windows vs. macOS/Linux makes it more difficult to develop and to catch regressions on one of the platforms. Perhaps this is something we can mitigate via integration tests.

To an existing user of DocC, I can see how it would be unexpected (and potentially worse) to have doc URL paths to Swift methods no longer match the name of the API.

For example, replacing : characters in the existing URL path

…/NIOCore/EventLoopFuture/assertFailure(file:line:)

would make it less readable. I'm not sure what character apart from : would make it obvious that you're browsing documentation for the assertFailure(file:line:) method.

In my opinion, I think the following goals strike the right balance of usability and expansion of the DocC ecosystem:

  • Preserve the existing file URL syntax for documentation built and published on existing platforms. This is syntax that the Swift community is familiar with, and doesn't require setting up redirects for existing documentation.
  • Support "cross-compilation" of documentation via an opt-in flag, i.e., to build on Linux but deploy on Windows. This is important in scenarios where you want to build documentation on a specific platform but publish using another.
1 Like

what does it mean to “deploy” documentation in this context? as i understand it, almost all use cases involve previewing documentation on the same machine it was built on, with the exception being deploying pre-built docs to GitHub Pages, which is a “web” target and doesn’t have any special restrictions on : characters.

If you were to host the bundle yourself and not on GitHub, and were on a Windows host, the filename cannot contain :.

I too would prefer that.

Is there an integration test suite currently that could give an idea of what it can detect?

I don't believe we have something like that set up at the moment. It'd be nice to set up a test that runs docc preview and runs browser automation to verify that things are appearing on a page.

Unfortunately, that wouldn't be usable on Windows. The problem is that in order to support DocC on Windows, we need to excise the preview server as the dependency on NIO is problematic as NIO does not currently support Windows.

Gotcha. We could use a file-based HTTP server instead, e.g., python -m http.server, to get the website running locally on Windows.