Can DocC auto-generated anchors be used with non-ASCII strings?

Hi,

I use Japanese for headings such as "## テスト" in a Markdown file, and I'd like to link to such a heading with an anchor like "doc:SomeMarkdown:テスト". But it doesn't work.

If I change "テスト" to "test" ("doc:SomeMarkdown:test"), it works well.

Is there any way to use non-ascii strings for DocC anchors?

Thanks.

I guess this is because "テスト" is escaped here. (Sorry, I'm not sure.)

Or

This definitely looks like a bug to me. Swift-DocC and Swift-DocC-Render handle unicode headline links differently.

Swift-DocC escapes unicode fragments after URLComponents refuses to parse them. After skimming through RFC 3986, this looks correct to me.

Swift-DocC-Render, on the other hand, permits unescaped unicode in headline ids, which is permitted in HTML5. The section links Swift-DocC-Render produces when hovering over headlines also use unescaped unicode and those work (tested in Safari and Firefox). It seems like browsers are a bit more generous with allowed characters in page fragments than the URL specification URLComponents implements.

@ronnqvist, since you've done some work on ValidatedURL, what do you think?
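
For readers following along, here is a minimal sketch of what "escaping" the fragment means. It only uses Foundation's generic percent-encoding on the example heading from this thread; it is not taken from Swift-DocC's source, and the variable names are illustrative.

```swift
import Foundation

// The heading text from the thread, used here as the raw anchor.
let heading = "テスト"

// Percent-encode everything outside the URL-fragment character set.
// Non-ASCII characters are written as their UTF-8 bytes.
let encoded = heading.addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed) ?? heading

print(encoded)                                     // the escaped form used in a URL
print(encoded.removingPercentEncoding == heading)  // decoding restores the original text
```

An ASCII heading such as "test" passes through unchanged, which would be consistent with the ASCII link working while the Japanese one does not.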

There are definitely several parts of the stack that could affect this, as you've pointed out. I double-checked which heading anchor Swift-DocC/Swift-DocC-Render emit, and they do correctly emit "テスト" as the anchor ID, so the issue is likely not in swift-cmark or Swift-Markdown. That makes me think that the issue is in the link-anchor resolution, which is incorrectly escaping non-ASCII text or doing something else that makes it fail to match the correct heading anchor.

Hi,

I wasn't aware of ":" as a path and anchor separator. We may need to add support for that.

If you change the links to use "#" to separate the path and the anchor (for example "doc:SomeMarkdown#テスト"), then DocC should be able to resolve them (at least it does for me locally with that example heading).

Thank you for the replies.

As far as I can confirm, in the case of "テスト", the anchor is percent-encoded in the href and doesn't match the id of the h2 heading. Is this different from your result?

These are screenshots. They are from the "AccessControl.md" file. I checked this behavior in Safari and Chrome, using localhost (http://localhost:8000/documentation/...).

[Screenshot 2023-01-04 5.13.26]

[Screenshot 2023-01-04 5.12.53]

Because DocC failed to resolve the original example link, I didn't look at the resulting HTML until you pointed it out, but now I see the same result as you do:

  • a percent escaped href value for the anchor tag
  • a non-escaped id value for the heading tag

I don't know enough about the HTML to know what would be the right behavior in this situation. Perhaps @marcus_ortiz would be able to help with that.
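
As a rough illustration of the mismatch listed above, the sketch below compares an escaped href fragment with an unescaped heading id. The helper `fragmentMatches` is hypothetical and is not how Swift-DocC-Render or any browser actually resolves fragments; it only shows that the two strings differ literally but agree once both are percent-decoded.

```swift
import Foundation

/// Hypothetical helper: true when an href fragment and a heading id
/// are equal after percent-decoding both sides.
func fragmentMatches(_ hrefFragment: String, headingID: String) -> Bool {
    let decodedFragment = hrefFragment.removingPercentEncoding ?? hrefFragment
    let decodedID = headingID.removingPercentEncoding ?? headingID
    return decodedFragment == decodedID
}

let hrefFragment = "%E3%83%86%E3%82%B9%E3%83%88" // escaped, as in the anchor tag's href
let headingID = "テスト"                          // unescaped, as in the heading tag's id

print(hrefFragment == headingID)                            // literal comparison
print(fragmentMatches(hrefFragment, headingID: headingID))  // comparison after decoding
```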

As you've observed, I think the issue is a mismatch: the URL percent-encodes the fragment identifier, but the id attribute of the heading in the HTML is not encoded. Since the encoding is necessary for the URL representation, I believe we would need to apply the same encoding to the id attribute so that the two match, even though HTML doesn't technically require the attribute to be encoded.

Specifically, there are 2 places in the JSON that are relevant here:

  1. A url value in a references item, representing the link to the subsection
  2. An anchor value in the section data for the heading, representing the id attribute

To fix this, I believe we would want to update DocC to also do the URL encoding for the anchor values (2) in addition to the URLs (1).

Technically, we could do this in the renderer, but I think it makes sense for DocC to handle this since it is already generating this data.
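
A minimal sketch of that idea, assuming both values are derived from the heading title through one shared encoding step. The types `SubsectionReference` and `HeadingSection`, the functions `encodedFragment` and `makeRenderValues`, and the page path are all hypothetical stand-ins, not Swift-DocC's real types or API.

```swift
import Foundation

// Hypothetical stand-ins for the two JSON values described above.
struct SubsectionReference { let url: String }   // (1) the url value in a references item
struct HeadingSection { let anchor: String }     // (2) the anchor value for the heading

/// Hypothetical shared encoder, so both values are escaped the same way.
func encodedFragment(_ title: String) -> String {
    return title.addingPercentEncoding(withAllowedCharacters: .urlFragmentAllowed) ?? title
}

/// Builds both values from the same encoded fragment so they cannot drift apart.
func makeRenderValues(pagePath: String, headingTitle: String) -> (SubsectionReference, HeadingSection) {
    let fragment = encodedFragment(headingTitle)
    return (SubsectionReference(url: "\(pagePath)#\(fragment)"), HeadingSection(anchor: fragment))
}

// The page path is an assumed example, not a path from the thread.
let (reference, section) = makeRenderValues(pagePath: "/documentation/example/somemarkdown",
                                            headingTitle: "テスト")
print(reference.url.hasSuffix("#" + section.anchor)) // the link fragment and the anchor agree
```

Deriving both values from a single function is the design point here: whichever component ends up doing the encoding, the link and the id stay in sync.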

Thank you for the reply!

"I think it makes sense for DocC to handle this since it is already generating this data."

If I understand correctly, do we need to do something in urlReadableFragment?

That sounds reasonable to me, although someone with more familiarity with the Swift-DocC codebase might want to chime in on that.

Thank you!

What can I do next? Should I create an issue on GitHub?

Yeah, a GitHub issue for this problem would be much appreciated—thanks!

Thank you very much!

I created this.
