@_nodoc attribute for hiding symbols from the symbol graph

xwu · August 4, 2022, 8:40pm

The idea of a "documentation category" is nice; perhaps @_documentationCategory would clarify the intent here but even as-is the base attribute reads pretty well.

Where I have more trouble is that "ignore underscored" reads as exactly the opposite of its behavior: at face value, it's an attribute for documentation to ignore underscored APIs. Meanwhile, @_documentation(underscored) says at face value that the documentation for the relevant declaration is underscored, which is a synonym for emphasized or marked as important—how surprising it would be for a user that the documentation instead disappears.

To me, it seems it'd be more self-explanatory to write something like @_documentation(hidden) or @_documentation(visible).

QuietMisdreavus · August 4, 2022, 8:46pm

Perhaps i'm leaning too much on language-spec/compiler-internal jargon, but at least there it's established that a symbol being "underscored" means "the name starts with an underscore"; various tools use this to mean "the symbol should be treated as internal to the package", including the symbol graph and Swift-DocC. I chose these instead of my previously-suggested forceVisible and forceHidden since those are also misleading (using forceVisible on an internal symbol won't make it start appearing in public docs, and using forceHidden won't hide the symbol from all docs, just public ones).

xwu · August 4, 2022, 8:48pm

This jargon shouldn't leak out of compiler internals.

Right, I wouldn't say that it "forces" anything.

taylorswift · August 4, 2022, 10:45pm

i think this is over-complicating things. access control (public, internal, etc.), @availableness, and “underscoredness” should be hints to infer defaults, there shouldn’t be any precedence rules determining when a @documentation(visible) should or shouldn’t win if it contradicts the inferred visibility. the attribute should always win, since it’s the most explicit expression of developer intent.

QuietMisdreavus · August 4, 2022, 11:12pm

I would worry about giving people a footgun to show an API to consumers that they can't actually use - if @_documentation(alwaysVisible) were applied to an internal or @available(unavailable) symbol, then it's potentially confusing to someone who sees a type or function being mentioned that is actually functionally invisible to them. If it's being shown for contributors, then you should be building internal or private docs anyway. If it's there to act as a stub symbol that only exists to collect documentation, then it should really be an article.

But i'll bite. What if we have three special cases?

@_documentation(alwaysVisible) does what @taylorswift suggests - overrides any other heuristic and forces the symbol to be visible, regardless of availability, access control, naming convention, or anything else.
@_documentation(treatAsInternal) does what @_documentation(underscored) currently does - applies "underscored name" logic to the symbol, hiding it from public docs but revealing it in internal docs.
@_documentation(alwaysHidden) is the ultimate eraser - the symbol will never appear in documentation. (There will likely need to be some control here, potentially alongside whatever fix happens for swiftlang/swift issue #60163, "SymbolGraphGen: Provide support for including underscored or unavailable symbols", to create an "archive symbol graph" that includes even these symbols, but ordinary controls would leave it alone.)
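As a rough illustration of how these three cases could interact with the existing heuristics, here is a minimal sketch. Every type and function name in it (`DocumentationOverride`, `DocsBuild`, `Symbol`, `appears`) is hypothetical and exists only for this example; none of it is compiler or DocC API, and the real filtering may well be ordered differently.

```swift
// Hypothetical model of the three overrides; not compiler or DocC API.
enum DocumentationOverride {
    case alwaysVisible    // skips every other heuristic
    case treatAsInternal  // gets the same treatment as an underscored name
    case alwaysHidden     // never appears in documentation
}

enum DocsBuild { case publicDocs, internalDocs }

struct Symbol {
    var name: String
    var isPublic: Bool
    var isUnavailable: Bool = false
    var docOverride: DocumentationOverride? = nil
}

// Decides whether a symbol shows up in a given documentation build.
func appears(_ symbol: Symbol, in build: DocsBuild) -> Bool {
    // The two "total overrides" win before any other check runs.
    if symbol.docOverride == .alwaysVisible { return true }
    if symbol.docOverride == .alwaysHidden { return false }

    // Assumed default heuristics: unavailable symbols are dropped, and
    // non-public or underscored symbols only show up in internal docs.
    if symbol.isUnavailable { return false }
    let looksInternal = !symbol.isPublic
        || symbol.name.hasPrefix("_")
        || symbol.docOverride == .treatAsInternal
    return looksInternal ? (build == .internalDocs) : true
}
```

In this model an internal symbol marked `alwaysVisible` shows up in public docs even though consumers cannot call it, which is exactly the footgun described above.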

There would need to be some finesse added to make sure @_documentation(alwaysVisible) skipped all the filtering checks, but that's an implementation detail that can be worked out. (I'm just trying to avoid writing tedious tests.)

Having these "total overrides" makes the naming question easier, but adds the possibility of creating confusing situations. It would require more responsibility from package authors to create coherent documentation.

Alternately, something i thought of while writing the previous part up: If we keep the "underscore logic", the names could be something like shouldBeVisible and treatAsInternal, which provide more insight into what's happening than relying on an understanding of what happens when you prefix a symbol name with an underscore. The "should be" modifier provides a softer implication than "force" would, but might have the same kind of confusion. What do you think?

taylorswift · August 4, 2022, 11:28pm

i think the main goal here is the attribute should be clear on what it does, which is how we prevent it from becoming a footgun.

the envisioned use case of @documentation is to override the default inferred behavior, so @documentation(treatAsInternal) wouldn’t really help me understand what @documentation(treatAsInternal) does, since @documentation(_:) can also be used to make internal symbols visible.

as a package author, i care about whether a symbol is visible or not. so the compiler should just ask me yes or no “should this symbol be displayed or not”. this doesn’t have to be black and white; we could also have a “contributors only” mode, where we would now have 3 ‘flavors’ of visibility:

always visible (@documentation(visibility: all))
visible to contributors (@documentation(visibility: contributors))
never visible (@documentation(visibility: never))

i don’t actually think this is a radical departure from what you have proposed, i have just changed the namings slightly to avoid overloading on concepts like public, internal, “underscored”, etc., that we are simultaneously overriding.

QuietMisdreavus · August 5, 2022, 3:33pm

treatAsInternal is meant to lean on an understanding of access levels and how they affect documentation. An internal symbol is not usable from outside the package; therefore, @_documentation(treatAsInternal) would apply that logic to that symbol but only for documentation. @_documentation(...) in general messes with the visibility of symbols, so the juxtaposition of alwaysVisible and treatAsInternal shouldn't be that jarring since they're meant to be taken in the greater context of visibility and documentation in general.

I don't think relying on an understanding of access control keywords, at least the difference between public and internal, is that much of a stretch. The use of internal is meant to be a familiar metaphor for package and documentation authors. Your proposed names introduce a new concept of "audience", i.e. "who is the audience of this API", whereas we already have that mechanism in place for "public" versus "internal" docs, which again echo the access control keywords.

tgoyne · August 5, 2022, 6:19pm

My inclination would be to have the symbol graph always include all symbols, and add a DocC feature to set a minimum access level for which symbols would actually be included in the generated docs (defaulting to 'public'). Excluding or including symbols which normally would/wouldn't be included at a given visibility level would be done with something like @_documentationVisibility(public|internal|private). Non-underscored symbols would default to a documentation visibility equal to their normal access level, and underscored would default to being lowered to internal.

The normal usage of this would be to mark things you want excluded from the docs with @_documentationVisibility(internal). If you like using built API docs for the project you're actively working on, you'd build the docs with a minimum access level of internal or private instead, which would give you docs for underscored and internal things (assuming they have docs to generate).

If you're weird you could even mark internal things as @_documentationVisibility(public) to have them be included in the public docs even though they can't be called. I can't think of an actual use for this offhand, but I'm not confident there are none.
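The defaulting and filtering rules described in this post can be sketched as follows. This is only a model of the idea, assuming three levels and a single minimum-access-level setting; `AccessLevel`, `Declaration`, `documentationVisibility(of:)`, and `isDocumented` are made-up names for illustration, not part of the symbol graph tooling or DocC.

```swift
// Hypothetical model; the type and function names are illustrative only.
enum AccessLevel: Int, Comparable {
    case `private`, `internal`, `public`

    static func < (lhs: AccessLevel, rhs: AccessLevel) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

struct Declaration {
    var name: String
    var access: AccessLevel
    // Stands in for an explicit @_documentationVisibility(...) annotation.
    var explicitVisibility: AccessLevel? = nil
}

// Explicit annotation wins; otherwise underscored names are lowered to
// at most `internal`, and everything else keeps its access level.
func documentationVisibility(of decl: Declaration) -> AccessLevel {
    if let explicit = decl.explicitVisibility { return explicit }
    if decl.name.hasPrefix("_") { return min(decl.access, .internal) }
    return decl.access
}

// The docs build keeps any symbol at or above its minimum access level.
func isDocumented(_ decl: Declaration,
                  minimumAccessLevel: AccessLevel = .public) -> Bool {
    documentationVisibility(of: decl) >= minimumAccessLevel
}
```

With the default minimum of `public`, an underscored public declaration is excluded unless it carries an explicit `public` visibility; building with a minimum of `internal` brings it back.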

Ultimately though I'm not too concerned what this ends up looking like. Some way to exclude public symbols from the docs is a requirement for us to consider switching from jazzy to docc, but any flexibility past that is very much just nice-to-have.

QuietMisdreavus · August 5, 2022, 7:46pm

I've filed "add an option to emit every symbol into the symbol graph" (Issue #60416, apple/swift) to track emitting everything into a symbol graph. I think this feature can be tracked separately from the @_documentation attribute.

I like the idea of having something like @_documentation(visibility: public|internal|private) to override the visibility of a symbol. This uses the metaphor of access levels, as i mentioned above, to set which visibility level the symbol has in documentation.

That said, i still like the idea of adding arbitrary metadata, since that could be used for additional documentation features down the road. To that end, i'd still like to keep something like @_documentation(metadata: something) to add this information to a symbol for documentation purposes.

What do y'all think?

taylorswift · August 5, 2022, 8:13pm

this is a great idea, and something i’ve been advocating for for a while.

i don’t feel strongly about this, but from an educational perspective, it would feel weird to explain why the @documentation(visibility: public) is needed here:

// isn’t ``_ChannelInboundHandler`` already public?
@documentation(visibility: public)
public 
protocol _ChannelInboundHandler 
{
    ...
}

in the end though, i would just be grateful to have a way of overriding the default behavior at all.

benrimmington · August 5, 2022, 8:30pm

Should there be a general attribute that can selectively show/hide APIs in documentation, generated interfaces, code completions, and/or fixit notes?

e.g. ExpressibleBy…Literal initializers aren't intended to be called directly.
e.g. FixedWidthInteger has an undocumented init(_truncatingBits:) requirement, without a default implementation.

(There's an existing @_show_in_interface attribute, which only applies to protocol declarations.)

ronnqvist · August 5, 2022, 9:36pm

This looks great to me. I feel that having all 3 levels of configurability will give library authors a lot of flexibility to control which symbols appear in their documentation.

Both naming conventions (alwaysVisible|treatAsInternal|alwaysHidden and visibility: public|internal|private) accomplish the same things and I feel that they are both easy to grasp when reading them in code. I don't have a strong preference for either of these.

alwaysHidden could potentially be used to hide a symbol from symbol graph file with private symbols but I don't think that's an important use case.

I think that's a cool and useful idea for associating free form information with a symbol but the same functionality (passthrough information to DocC) could also be achieved with a directive in the symbol's documentation comment. For example, instead of:

@_documentation(metadata: something(arg1, arg2))
struct Something {...}

a developer could write:

/// @Something(arg1, arg2)
struct Something {...}

Two benefits of using a directive would be that it would work in documentation comments for other languages (without needing to add a @documentation attribute to their compilers / symbol graph extractors) and that the directive has support for arguments and multiline content. It does however assume that the documentation comment is parsed with a markdown parser that knows what a directive is (such as Swift-Markdown).

Could this be achieved by passing -symbol-graph-minimum-access-level private?

taylorswift:

i don’t feel strongly about this, but from an educational perspective, it would feel weird to explain why the @documentation(visibility: public) is needed here:
// isn’t ``_ChannelInboundHandler`` already public?
@documentation(visibility: public)
public 
protocol _ChannelInboundHandler 
{
    ...
}

I'm not too concerned about this. Most of the confusion could probably be covered with a general comment along the lines of:

// Underscored symbols are considered implementation details (internal) by convention.

Calling out this convention in a comment could lead to interesting follow-up questions about why this specific symbol is underscored but at the same time is not considered an implementation detail. On a case-by-case basis it may be useful to cover this by continuing the general comment with something like

// This [TYPE] is underscored but not an implementation detail because [REASONS].

xwu · August 5, 2022, 9:48pm

taylorswift:

i don’t feel strongly about this, but from an educational perspective, it would feel weird to explain why the @documentation(visibility: public) is needed here:
// isn’t ``_ChannelInboundHandler`` already public?
@documentation(visibility: public)
public 
protocol _ChannelInboundHandler 

How about...let's just make it not needed here.

With an explicit attribute to toggle visibility plus command line options for symbol graph generation plus filtering on the DocC side, there is no reason that a convention about underscored APIs should continue to be hardcoded as yet another standalone feature rather than a default that's overridable at the command line and/or filterable via DocC.

QuietMisdreavus · August 5, 2022, 11:22pm

Almost. Symbols marked with @available(*, unavailable) or @_spi are still filtered out in this mode.

This would be a much easier and more broadly applicable mechanism, though as you say, it restricts you to use Swift-Markdown to parse the text to get the information, making it more of a DocC-specific thing.

While the proposal to move filtering out of the symbol graph and into DocC is interesting, i believe that is a bigger feature than this thread started with. I think it's an interesting feature, but it should have its own thread instead of trying to use this targeted feature to piggyback it in.

taylorswift · August 6, 2022, 12:06am

i have started a continuing discussion about DocC filtering here

QuietMisdreavus · August 10, 2022, 7:10pm

I've updated the implementation to use the @_documentation(visibility: ...) and @_documentation(metadata: ...) forms i proposed up here.
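For readers skimming the thread, a small sketch of what these two forms might look like in source. It assumes a toolchain that already accepts the underscored attribute, and that `metadata:` takes a bare identifier as in the earlier example; `StorageBox` and `Something` are placeholder declarations invented for this illustration.

```swift
// Assumes a toolchain that accepts the underscored @_documentation attribute.

/// Public for technical reasons, but meant to be treated as internal in docs.
@_documentation(visibility: internal)
public struct StorageBox {
    public var value: Int
    public init(value: Int) { self.value = value }
}

/// Underscored by name, yet meant to appear in public docs.
@_documentation(visibility: public)
public protocol _ChannelInboundHandler {}

/// The metadata value here is an arbitrary placeholder identifier.
@_documentation(metadata: something)
public struct Something {
    public init() {}
}
```

The attribute only affects what ends up in documentation; the declarations behave exactly as they would without it.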

Karl · August 10, 2022, 7:57pm

This will be difficult for source packages to adopt. Conditional compilation of attributes has never shipped (it is in review right now), so using this in a package would require excluding developers who need to support any currently-released version of Swift, or the imminent 5.7. A comment-based approach would, however, be usable straight away by everybody.

Such a design would require that the markdown processor understand DocC directives, but that seems to be the direction anyway. It would be good to decide just how far we're willing to go beyond standard markdown, so we have a consistent answer about whether these kinds of designs (using directives in doc-comments) are acceptable.

In the future, attributes should be easier to adopt (by source packages as well). But is there an intent for this to go through swift-evolution eventually? The compiler is full of incredibly useful functionality which never goes through evolution and seems to permanently live with a leading underscore. We've thus far not deeply embedded DocC or any documentation engine into the language itself, and I worry that our hesitation on that is likely to remain, leading to yet another perpetual unstable-but-critical feature in Swift.

But if there is to be an attribute, let's not let it linger - let's get it ready for formal adoption as soon as possible. Let's not give more unstable features a chance to spread and become a de facto part of the language.
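To make the adoption problem concrete: without conditional compilation of attributes, a package that wants to stay buildable on older toolchains would have to duplicate the whole declaration behind a compiler version check. The sketch below assumes the attribute first becomes available in the 5.8 compiler (an assumption for this example, not something stated in the thread), and `legacyChecksum` is a made-up function.

```swift
// Shared body so the two declarations below cannot drift apart.
private func checksumImplementation(_ bytes: [UInt8]) -> UInt8 {
    return bytes.reduce(0, &+)  // wrapping sum of all bytes
}

// Assumption: the attribute is accepted starting with the Swift 5.8 compiler.
#if compiler(>=5.8)
/// Kept public for compatibility; hidden from public docs on newer toolchains.
@_documentation(visibility: internal)
public func legacyChecksum(_ bytes: [UInt8]) -> UInt8 {
    return checksumImplementation(bytes)
}
#else
/// Same declaration without the attribute, for older toolchains.
public func legacyChecksum(_ bytes: [UInt8]) -> UInt8 {
    return checksumImplementation(bytes)
}
#endif
```

This works because the inactive branch of a `#if compiler(...)` check is skipped rather than parsed, but repeating every annotated declaration does not scale, which is the cost being pointed out here.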

taylorswift · August 10, 2022, 8:13pm

a comment based approach would mean that tooling that operates on symbolgraphs would also need to parse and understand markdown. i can’t speak for the DocC stack, but this is a big change for Biome’s SymbolGraphs engine, which currently treats doccomment text as opaque string data.

another implication of this would be that symbolgraphs would need to embed parsed markdown trees (instead of raw markdown) since presumably we would not want to parse markdown twice, as this is already the most computationally intensive stage of documentation compilation, at least for Biome.

Biome is moving towards an AOT compilation model, and at some point, i do want to see pre-parsed markdown being stored in symbolgraph files instead of getting JITted on the server. but this would require a lot of changes to symbolgraphs in general that i don’t think we are currently prepared to undertake.

correct. my stance towards this is similar to my stance towards SE-0346 — it is not going to be practical to adopt this feature in libraries until we have some kind of lexically-scoped #if. of course, that should not delay pitching and reviewing this feature.

xwu · August 10, 2022, 9:53pm

Not necessarily: In the same way that the Swift tools version can be parsed out of a SwiftPM manifest without the need to parse and understand Swift syntax, the relevant visibility attributes could be restricted to appear on the topmost line of a doc comment, for example, with a restricted number of options such that no parsing or understanding of Markdown is required.
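A minimal sketch of what such a restricted form could look like to a tool that treats doc comments as opaque text. The `@Visibility(...)` spelling, the rule that it must be the entire first line, and the function name are all hypothetical choices made for this example; nothing like this exists in DocC today as far as this thread shows.

```swift
// Hypothetical convention: the first line of a doc comment may be exactly
// "@Visibility(public)", "@Visibility(internal)", or "@Visibility(private)".
// Anything else means "no override". No Markdown parsing is involved.
func visibilityOverride(inDocComment text: String) -> String? {
    // Only the topmost line is ever inspected.
    guard let firstLine = text
        .split(separator: "\n", omittingEmptySubsequences: false)
        .first
    else {
        return nil
    }

    let allowed = ["public", "internal", "private"]
    for level in allowed where String(firstLine) == "@Visibility(\(level))" {
        return level
    }
    return nil
}
```

Because the check is a handful of exact string comparisons, a symbol graph consumer could apply it without understanding the rest of the comment.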

taylorswift · August 10, 2022, 10:02pm

// swift-tools-version:x.x is not a good analogy because that directive has a simple grammar, that i can deduce just by looking at it:

tools-version ::= 
    ' ' * '//' ' ' * 'swift-tools-version'  ' ' * ':' ' ' * minor-version
minor-version ::= 
    unsigned-integer-literal '.' unsigned-integer-literal

it has no disjunctions, and the only conditionals involved are the whitespace/digit *’s.
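To show how little machinery that grammar needs, here is a hand-written recognizer for exactly the two rules quoted above. It is a sketch of the grammar as written in this post, not SwiftPM's actual manifest parsing; `parseToolsVersion` is a name made up for the example.

```swift
// Returns (major, minor) when `line` matches the grammar above, else nil.
func parseToolsVersion(_ line: String) -> (major: Int, minor: Int)? {
    var rest = Substring(line)

    // ' ' *
    func skipSpaces() {
        rest = rest.drop(while: { $0 == " " })
    }
    // Consumes a fixed terminal such as '//' or ':'.
    func consume(_ literal: String) -> Bool {
        guard rest.starts(with: literal) else { return false }
        rest = rest.dropFirst(literal.count)
        return true
    }
    // unsigned-integer-literal: one or more ASCII digits.
    func unsignedInteger() -> Int? {
        let digits = rest.prefix(while: { $0.isASCII && $0.isNumber })
        rest = rest.dropFirst(digits.count)
        return Int(digits)  // nil when there were no digits
    }

    skipSpaces()
    guard consume("//") else { return nil }
    skipSpaces()
    guard consume("swift-tools-version") else { return nil }
    skipSpaces()
    guard consume(":") else { return nil }
    skipSpaces()
    guard let major = unsignedInteger(),
          consume("."),
          let minor = unsignedInteger(),
          rest.isEmpty
    else { return nil }
    return (major, minor)
}
```

The whole thing is a straight-line sequence of terminals with no backtracking, which is the property a general block directive syntax would not share.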

block directive syntax is comparatively complex. we could impose restrictions to make it easier to parse, like mandating that it be written on one line, with canonical whitespace, but it would be weird to have two different sets of rules for what constitutes a valid block directive.