Extending Swift-DocC Render JSON to support multi-language symbols

Hi everyone,

I'd like to propose an addition to the render JSON schema to allow a render node to represent documentation data for symbols that are available in multiple languages, for example symbols that are available in Swift and Objective-C.

Motivation

The render JSON schema currently represents documentation content for an API in a single language. To support documenting symbols in frameworks that are available in multiple languages, such as Swift and Objective-C, the render JSON specification needs to evolve.

Proposed Solution

We make an additive change to render JSON to store documentation data for multiple programming language variants. The schema for single-language symbols does not change, which preserves backward compatibility with existing clients and renderers.

The addition is a new top-level variantOverrides value that stores variants of documentation data for the symbol. If an API is available in Swift and Objective-C, the variantOverrides value stores content overrides that the renderer should apply when it’s rendering the docs for the Objective-C version of the symbol. These overrides are specified using the JSON Patch format (RFC 6902), which allows clients to easily apply them to the render JSON.

Here’s an example:

{
  "variantOverrides": [
    {
      // Distinguishing characteristics for the rendering environment. For now,
      // just the programming language, but is kept flexible for future uses.
      "traits": [{ "interfaceLanguage": "occ" }],
      
      // Overrides to apply to the JSON, using the JSON Patch schema.
      "patch": [
        {
          "op": "replace",

          // Replacement of top-level object.
          "path": "/abstract",
          "value": [
            {
              "text": "This is the abstract for Objective-C",
              "type": "text"
            }
          ]
        },
        {
          "op": "replace",
          
          // Replacement of nested object.
          "path": "/metadata/title",
          "value": "Page title for Objective-C"
        }
      ]
    }
  ],
  "abstract": […same as before…],
  "primaryContentSections": [{…}, {…}],
  "variants": [
    {
        "paths": ["/documentation/mykit/myclass"],
        "traits": [{ "interfaceLanguage": "swift" }]
    },
    {
        "paths": ["/documentation/mykit/myclass"],
        "traits": [{ "interfaceLanguage": "occ" }]
    }
  ],
  …same properties as before…
}
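
As a rough illustration of how a client might consume this, here is a minimal Python sketch that applies the overrides for one language to a render node. It is not part of the proposal or of Swift-DocC: the function names (resolve_parent, apply_variant) and the Swift sample values are made up for this example, and only the "replace" operation from the listing above is handled. A real client would more likely use a complete JSON Patch library.

```python
import copy


def resolve_parent(document, pointer):
    """Follow a JSON Pointer (RFC 6901) and return (parent container, final key)."""
    # Per RFC 6901, "~1" decodes to "/" and "~0" decodes to "~", in that order.
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    parent = document
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    last = tokens[-1]
    return parent, (int(last) if isinstance(parent, list) else last)


def apply_variant(render_node, interface_language):
    """Return a copy of render_node with the overrides for one language applied."""
    result = copy.deepcopy(render_node)
    for variant in render_node.get("variantOverrides", []):
        if {"interfaceLanguage": interface_language} not in variant["traits"]:
            continue
        for operation in variant["patch"]:
            if operation["op"] != "replace":
                # Assumption: this sketch only supports the op used in the example.
                raise NotImplementedError(operation["op"])
            parent, key = resolve_parent(result, operation["path"])
            _ = parent[key]  # "replace" requires the target to exist; raises if not
            parent[key] = operation["value"]
    return result


# Sample data: the Swift values are invented; the overrides mirror the listing above.
render_node = {
    "abstract": [{"type": "text", "text": "This is the abstract for Swift"}],
    "metadata": {"title": "Page title for Swift"},
    "variantOverrides": [
        {
            "traits": [{"interfaceLanguage": "occ"}],
            "patch": [
                {
                    "op": "replace",
                    "path": "/abstract",
                    "value": [{"type": "text", "text": "This is the abstract for Objective-C"}],
                },
                {"op": "replace", "path": "/metadata/title", "value": "Page title for Objective-C"},
            ],
        }
    ],
}

objc = apply_variant(render_node, "occ")
print(objc["metadata"]["title"])         # Page title for Objective-C
print(render_node["metadata"]["title"])  # Page title for Swift (original is untouched)
```

Working on a deep copy keeps the primary (Swift) content available, so a renderer could switch languages without re-fetching the JSON.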

Alternatives Considered

One alternative is for Swift-DocC to produce a separate render JSON file for each language a symbol is available in. However, documentation metadata for multi-language symbols is generally the same across languages: for example, the kind of a symbol ('class', 'function', etc.) and its relationships to other symbols (e.g., member of a class) are common to every language the symbol is available in. By default, user-authored documentation content for a symbol is also the same across languages, since you only write documentation where the symbol is originally declared. Generating a render JSON file per variant would therefore store large amounts of duplicate data. In this proposal, we preferred to store only the differences between language variants and give clients an easy way (JSON Patch) to apply those differences for the language they're interested in.

Thank you for looking into this! I guess FB8094382's days are numbered now.

I support this approach; it produces minimal extra data for additional programming languages. A few questions:

  1. How does a parser of this JSON file know which language is the 'primary' language? There's a traits section in variantOverrides for the non-primary variants, but no such data for the primary variant. I suggest the addition of a top level primaryVariantTraits array that follows the same structure as variantOverrides/traits. That way it's easy to model and compare the values. Naming it traits might also be ok, but could get a bit confusing since the patch wouldn't overwrite it.
  2. I expect in the implementation this will already be the case, but can we add a guarantee in the spec that each variant patch is the minimal amount of data necessary to produce the variant using the primary JSON? This guarantee would make it simpler for clients that want to display both variants at the same time, e.g. in a declaration section or diff view.
  3. At the bottom of your example you include a variants dictionary:
    "variants": [
        {
            "paths": ["/documentation/mykit/myclass"],
            "traits": [{ "interfaceLanguage": "swift" }]
        },
        {
            "paths": ["/documentation/mykit/myclass"],
            "traits": [{ "interfaceLanguage": "occ" }]
        }
    ]
    
    This implies that it's possible for two languages to produce different page URLs, however that doesn't seem possible with this model because you're only producing one JSON file, unless we also produce many redirects. Personally I'd consider this a good thing because it results in authorable urls that work across languages, but others might disagree and it's an important downside to point out with this approach.

One other thing I'd like to see is consideration for how this feature will work for JSON consumers that haven't been taught about a particular language. Ideally a docc reader app could open any bundle without having to implement specific language support. For example, should the render JSON include a mapping from interface language ID (e.g. occ) to presentation language (e.g. Objective-C)? Should it just use presentation names everywhere instead of IDs? We should additionally audit the rest of the render json to ensure that there are no other unmapped identifiers without display names, e.g. for platform support.

Hi Jack, thanks for the questions!

  1. How does a parser of this JSON file know which language is the 'primary' language? There's a traits section in variantOverrides for the non-primary variants, but no such data for the primary variant. I suggest the addition of a top level primaryVariantTraits array that follows the same structure as variantOverrides/traits. That way it's easy to model and compare the values. Naming it traits might also be ok, but could get a bit confusing since the patch wouldn't overwrite it.

The .identifier.interfaceLanguage property indicates what language the primary documentation content applies to, and I propose it continues to do so with this proposal's addition. Swift-DocC will generate a patch that replaces that value with the identifier for Objective-C when applying Objective-C variant data. As we add more trait characteristics (i.e., more than interfaceLanguage), we'll definitely want to consider the approach you suggested; it sounds great to me.
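
To make that concrete, here is a small Python sketch of how a client could read the primary language and the other available languages from a render node under this design. The helper names and sample values are hypothetical, and the patch entry targeting /identifier/interfaceLanguage is my reading of the description above rather than confirmed Swift-DocC output.

```python
def primary_language(render_node):
    """The language that the un-patched content applies to."""
    return render_node["identifier"]["interfaceLanguage"]


def override_languages(render_node):
    """Languages for which the node carries variant overrides."""
    languages = []
    for variant in render_node.get("variantOverrides", []):
        for trait in variant.get("traits", []):
            if "interfaceLanguage" in trait:
                languages.append(trait["interfaceLanguage"])
    return languages


def patches_identifier(variant):
    """Check that a variant's patch rewrites the identifier language to its own trait."""
    wanted = [t["interfaceLanguage"] for t in variant["traits"] if "interfaceLanguage" in t]
    for operation in variant["patch"]:
        if operation["op"] == "replace" and operation["path"] == "/identifier/interfaceLanguage":
            return operation["value"] in wanted
    return False


# Hypothetical sample node; only the fields used by the helpers are included.
node = {
    "identifier": {"interfaceLanguage": "swift"},
    "variantOverrides": [
        {
            "traits": [{"interfaceLanguage": "occ"}],
            "patch": [
                {"op": "replace", "path": "/identifier/interfaceLanguage", "value": "occ"}
            ],
        }
    ],
}

print(primary_language(node))    # swift
print(override_languages(node))  # ['occ']
print(patches_identifier(node["variantOverrides"][0]))  # True
```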

  2. I expect in the implementation this will already be the case, but can we add a guarantee in the spec that each variant patch is the minimal amount of data necessary to produce the variant using the primary JSON? This guarantee would make it simpler for clients that want to display both variants at the same time, e.g. in a declaration section or diff view.

I'm not sure it would be possible to specify this from a JSON schema standpoint, but we should certainly document that the intent of this design is to reduce duplication and that compilers should aim to produce space-efficient diffs.

  3. At the bottom of your example you include a variants dictionary: (listing)
This implies that it's possible for two languages to produce different page URLs, however that doesn't seem possible with this model because you're only producing one JSON file, unless we also produce many redirects. Personally I'd consider this a good thing because it results in authorable urls that work across languages, but others might disagree and it's an important downside to point out with this approach.

Right, in this proposal, pages that are available in multiple programming languages will get the same URL path. It's up to the renderer to decide whether to distinguish the variants at the URL level, e.g., via a query parameter. That being said, the render JSON variants schema is left flexible to allow for variants to be stored at different URLs if this is something we will want to support in the future.
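
For illustration, a renderer that distinguishes variants with a query parameter might pick the language along these lines. This is a Python sketch only: the parameter name "language", the example.com URLs, and the helper name are assumptions, not something the proposal specifies.

```python
from urllib.parse import parse_qs, urlsplit


def language_for_url(url, render_node, parameter="language"):
    """Pick the interface language to render for a page URL.

    Falls back to the node's primary language when the parameter is
    missing or names a language the page is not available in.
    """
    primary = render_node["identifier"]["interfaceLanguage"]
    available = {primary}
    for variant in render_node.get("variants", []):
        for trait in variant.get("traits", []):
            if "interfaceLanguage" in trait:
                available.add(trait["interfaceLanguage"])
    requested = parse_qs(urlsplit(url).query).get(parameter, [primary])[0]
    return requested if requested in available else primary


# Sample node: the variants mirror the example in the proposal.
node = {
    "identifier": {"interfaceLanguage": "swift"},
    "variants": [
        {"paths": ["/documentation/mykit/myclass"], "traits": [{"interfaceLanguage": "swift"}]},
        {"paths": ["/documentation/mykit/myclass"], "traits": [{"interfaceLanguage": "occ"}]},
    ],
}

base = "https://example.com/documentation/mykit/myclass"
print(language_for_url(base + "?language=occ", node))     # occ
print(language_for_url(base, node))                       # swift
print(language_for_url(base + "?language=kotlin", node))  # swift
```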

One other thing I'd like to see is consideration for how this feature will work for JSON consumers that haven't been taught about a particular language. Ideally a docc reader app could open any bundle without having to implement specific language support. For example, should the render JSON include a mapping from interface language ID (e.g. occ) to presentation language (e.g. Objective-C)? Should it just use presentation names everywhere instead of IDs? We should additionally audit the rest of the render json to ensure that there are no other unmapped identifiers without display names, e.g. for platform support.

This is a great question. Currently, renderers are responsible for exposing the multi-language data to users in whatever way they'd like. However, I expect that displaying the presentation names of the languages ("Swift", "Objective-C") in a language switcher UI will be common, so Swift-DocC should vend these strings. It might make sense to emit these presentation strings in a separate file per documentation archive in order to reduce duplication across render JSON files.
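
As a sketch of what consuming such a file could look like, the Python snippet below reads a per-archive mapping from language ID to presentation name and falls back to the raw ID for languages it doesn't know. The file contents and the helper name are entirely hypothetical; no such file is defined by this proposal.

```python
import json

# Hypothetical contents of a per-archive presentation-names file.
LANGUAGE_NAMES_JSON = '{"swift": "Swift", "occ": "Objective-C"}'


def display_name(language_id, names):
    """Return the presentation name for a language ID, or the ID itself if unmapped."""
    return names.get(language_id, language_id)


names = json.loads(LANGUAGE_NAMES_JSON)
print(display_name("occ", names))   # Objective-C
print(display_name("data", names))  # data (no mapping, so the ID is shown as-is)
```

Falling back to the ID means a reader app can still show a language switcher for a bundle that uses a language it has never heard of.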

The .identifier.interfaceLanguage property indicates what language the primary documentation content applies to, and I propose it continues to do so with this proposal's addition ... As we add more trait characteristics (i.e., more than interfaceLanguage), we'll definitely want to consider the approach you suggested.

I could apply the same logic in reverse to argue against complicating variants with a traits structure until it's necessary though, no? If we think it's useful in one place, we should do it in both. Otherwise as soon as we add a new trait type, we'll break anyone who's adopted variants but is using identifier.interfaceLanguage to uniquely identify the variant.

we should certainly document that the intent of this design is to reduce duplication and that compilers should aim to produce space-efficient diffs.

Yeah, sorry, that's what I meant: documentation in the spec. (Warning: nit-picking ahead.) I think we should use language stronger than "aim", which isn't a strong enough guarantee to rely on. "Must" would be more appropriate if we want to make a reliable guarantee. I also think that, from the perspective of consuming the JSON, it's less about space efficiency and more about deduplication or content uniqueness. E.g., if I'm making an app that displays diffs, or simply a docs renderer that wants to display all variants at the same time without repeating anything, the guarantee (or lack thereof) is the difference between the app being able to use the patches directly and having to re-diff all of the JSON itself.
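
To show why the guarantee matters, here is a Python sketch of a diff view that reads the differences straight from the patch instead of re-diffing the JSON. It only gives a correct "what changed" list if every patch entry is a genuine difference, which is exactly the guarantee being asked for. The helper names and Swift sample values are made up, and only "replace" operations are considered.

```python
def lookup(document, pointer):
    """Read the value at a JSON Pointer (RFC 6901) path."""
    value = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


def variant_differences(render_node, interface_language):
    """List (path, primary value, variant value) for each replaced location."""
    rows = []
    for variant in render_node.get("variantOverrides", []):
        if {"interfaceLanguage": interface_language} not in variant["traits"]:
            continue
        for operation in variant["patch"]:
            if operation["op"] == "replace":
                path = operation["path"]
                rows.append((path, lookup(render_node, path), operation["value"]))
    return rows


# Sample node: Swift values are invented; overrides follow the proposal's example.
node = {
    "abstract": [{"type": "text", "text": "This is the abstract for Swift"}],
    "metadata": {"title": "Page title for Swift"},
    "variantOverrides": [
        {
            "traits": [{"interfaceLanguage": "occ"}],
            "patch": [
                {"op": "replace", "path": "/metadata/title", "value": "Page title for Objective-C"}
            ],
        }
    ],
}

for path, primary, variant in variant_differences(node, "occ"):
    print(path, "|", primary, "|", variant)
# /metadata/title | Page title for Swift | Page title for Objective-C
```

Without the guarantee, a patch could "replace" a value with an identical one, and the client would have to compare every row itself.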


I just noticed that variants/paths is plural. When are there multiple paths for a single variant? Redirects?

"variants": [
    {
        "paths": ["/documentation/mykit/myclass"]

I could apply the same logic in reverse to argue against complicating variants with a traits structure until it's necessary though, no? If we think it's useful in one place, we should do it in both. Otherwise as soon as we add a new trait type, we'll break anyone who's adopted variants but is using identifier.interfaceLanguage to uniquely identify the variant.

That's fair. I think we should go ahead with your proposal here for consistency; it seems like a good point in time to introduce it. I filed SR-15364 to track this.

Yeah, sorry, that's what I meant: documentation in the spec. (Warning: nit-picking ahead.) I think we should use language stronger than "aim", which isn't a strong enough guarantee to rely on. "Must" would be more appropriate if we want to make a reliable guarantee. I also think that, from the perspective of consuming the JSON, it's less about space efficiency and more about deduplication or content uniqueness. E.g., if I'm making an app that displays diffs, or simply a docs renderer that wants to display all variants at the same time without repeating anything, the guarantee (or lack thereof) is the difference between the app being able to use the patches directly and having to re-diff all of the JSON itself.

I'll add you to the PR for SR-15354 and we can iterate on the docs there.

I just noticed that variants/paths is plural. When are there multiple paths for a single variant? Redirects?

This is how the render JSON schema was originally defined, but yes, it adds flexibility for pages that could be accessible from different paths. DocC always emits a single path at the moment, though; the schema allows for potentially specifying redirects in the future.
