Approaches for multiple extension block doccomments

taylorswift · April 29, 2023, 12:24am

preface: i am posting this here and not in Using Swift because it’s not really about using swift, it’s about the design of future tooling.

for those who aren’t aware, @theMomax did a lot of great work over the summer to land extension block symbols in lib/SymbolGraphGen, and although you probably need to enable some feature flags to use it in DocC, we can now theoretically provide documentation attached to individual extension blocks, as opposed to the extended type itself.

/// This is an extension block comment!
extension Wardrobe
{
}
/// This is another extension block comment!
extension Wardrobe
{
}
/// This is an extension block comment with a constraint.
extension Wardrobe where Season:ColdSeason
{
}
/// This is another extension block comment, where `Season == Spring`.
extension Wardrobe<Spring>
{
}
/// This is another extension block comment, where `Season == Summer`.
extension Wardrobe<Summer>
{
}

this is really exciting, but it presents a new conundrum: what to do when there is more than one doccomment per extended type?

unlike external markdown files, which can use merge directives to specify collation, there is not really an obvious collation strategy for multiple extension block comments. for example, we can:

1. discard all doccomments, except for the longest one across all extension blocks that extend the same type, and then use filenames + source positions to break ties (this is the current behavior of DocC).

This is another extension block comment, where `Season == Spring`.

2. concatenate doccomments across all extension blocks that extend the same type, using filenames + source positions to define ordering.

This is an extension block comment!
This is another extension block comment!
This is an extension block comment with a constraint.
This is another extension block comment, where `Season == Spring`.
This is another extension block comment, where `Season == Summer`.

3. discard all doccomments, except for the longest one across all extension blocks that extend the same type with the same constraints, and then use filenames + source positions to break ties.

This is another extension block comment!

This is an extension block comment with a constraint.

This is another extension block comment, where `Season == Spring`.

This is another extension block comment, where `Season == Summer`.

4. concatenate doccomments across all extension blocks that extend the same type with the same constraints, using filenames + source positions to define ordering.

This is an extension block comment!
This is another extension block comment!

This is an extension block comment with a constraint.

This is another extension block comment, where `Season == Spring`.

This is another extension block comment, where `Season == Summer`.

5. something else.

i personally am leaning towards #3, because i think it’s valuable to provide documentation broken down by generic signature, and i’m not really a fan of tying documentation collation to source file names. but i am interested in hearing how others are documenting extensions, and if there are any approaches i may have missed.
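To make the trade-offs between these strategies concrete, here is a minimal Swift sketch of strategies 1 through 4 applied to the `Wardrobe` example. It assumes every block extends the same type, that each block's generic signature can be reduced to a plain string, and that file names and line numbers are available. `ExtensionBlock`, `longest`, `concatenated`, and `perSignature` are hypothetical names, not DocC or SymbolGraphGen APIs, and the file name `Wardrobe.swift` is made up.

```swift
// Hypothetical model of the collation strategies; not a DocC or SymbolGraphGen API.
// All blocks are assumed to extend the same type.
struct ExtensionBlock {
    var signature: String  // generic constraints as plain text; "" when unconstrained
    var comment: String
    var file: String
    var line: Int
}

/// The "filenames + source positions" ordering: file name first, then line.
func sourceOrder(_ a: ExtensionBlock, _ b: ExtensionBlock) -> Bool {
    (a.file, a.line) < (b.file, b.line)
}

/// Keeps only the longest comment. A block replaces the current winner only
/// when its comment is strictly longer, so ties go to the earlier block.
func longest(_ blocks: [ExtensionBlock]) -> String? {
    var best: ExtensionBlock?
    for block in blocks.sorted(by: sourceOrder) {
        if let current = best, block.comment.count <= current.comment.count { continue }
        best = block
    }
    return best?.comment
}

/// Concatenates every comment in source order.
func concatenated(_ blocks: [ExtensionBlock]) -> String? {
    blocks.isEmpty ? nil : blocks.sorted(by: sourceOrder).map { $0.comment }.joined(separator: "\n")
}

/// Applies a strategy to each group of blocks sharing a generic signature
/// (strategies 3 and 4). Applying it to all blocks at once gives 1 and 2.
func perSignature(_ blocks: [ExtensionBlock],
                  _ collate: ([ExtensionBlock]) -> String?) -> [String: String] {
    Dictionary(grouping: blocks, by: { $0.signature }).compactMapValues(collate)
}

let blocks = [
    ExtensionBlock(signature: "", comment: "This is an extension block comment!",
                   file: "Wardrobe.swift", line: 1),
    ExtensionBlock(signature: "", comment: "This is another extension block comment!",
                   file: "Wardrobe.swift", line: 5),
    ExtensionBlock(signature: "Season: ColdSeason",
                   comment: "This is an extension block comment with a constraint.",
                   file: "Wardrobe.swift", line: 9),
    ExtensionBlock(signature: "Season == Spring",
                   comment: "This is another extension block comment, where `Season == Spring`.",
                   file: "Wardrobe.swift", line: 13),
    ExtensionBlock(signature: "Season == Summer",
                   comment: "This is another extension block comment, where `Season == Summer`.",
                   file: "Wardrobe.swift", line: 17),
]

print(longest(blocks) ?? "")                    // strategy 1
print(perSignature(blocks, longest)[""] ?? "")  // strategy 3, unconstrained group
```

In this sketch, strategy 3 still falls back to file names and line numbers whenever two blocks with the same signature carry comments of equal length.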

ahti · April 29, 2023, 5:05am

Does either discarding or concatenating really make much sense? I admit I have never considered putting a doc comment on an extension, but if I did I would put information that applies to all declarations within, or that ties them together in some way.

Thus, that bit of documentation should be shown somewhere close to the docs for the members of the extension, not unlike it is in the source file. This then removes the need to concatenate or discard anything as each comment has its own place next to its own declarations.

xwu · April 29, 2023, 6:41pm

It would be great to get a sense of what content folks are putting into these extension doc comments. Are they usually a shorthand for information applicable to all the members defined in an extension, or are they usually commentary about the type itself (and if so, why)?

Common usages of such doc comments may also differ depending on whether the extension states additional constraints, protocol conformance(s), both, or neither. Convention isn't necessarily determinative of what the right approach here should be, but the solution shouldn't "run against the grain," as it were.

Without reference to any such empiric knowledge, I'd be concerned that arbitrarily picking the longest comment while dropping others could be even worse than dropping all the comments altogether. If a user bothered to separate members into two groupings even though they could be combined into one, and they bothered to provide different documentation for each grouping, then showing one comment as though it applied to all members may be actively misleading:

Imagine, for instance, two groups of collection APIs distinguished by performance guarantees. The doc comments may say: for the one extension, "The following APIs are optimized for finding elements from the beginning of the collection"; for the other extension, "The following APIs are most performant when used to find elements from the end of the collection." Showing the first comment for all of the APIs simply because the word "beginning" has more letters than the word "end" would be...not good.

In this example, concatenation would lead to merely confusing (internally contradictory) text rather than actively misleading documentation, but it isn't hard to imagine scenarios where concatenation of text that isn't obviously mutually exclusive leads to a resulting comment that says something plausible but wrong for all the APIs it's attached to.

Consider, for example, if I had a concrete type that appropriately conforms to protocol P because it fulfills all of its requirements (including its semantic requirements). However, a small handful of methods required by P are not recommended for use in the concrete context because there are more performant alternatives. I group these implementations in a discrete extension and document this fact with an extension doc comment; all other requirements are implemented in other extensions which don't have any doc comments of their own. In the source code, the meaning of my doc comment is crystal clear, but if either only the longest doc comment is displayed or all of them are concatenated together in the documentation, the result would incorrectly warn off users from using all methods that I declared in extensions.
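The scenario in the last paragraph can be written out as Swift source to show why the comment's scope is obvious in code but lost once it is detached from its block. `Lookup`, `SortedValues`, and every member name are hypothetical stand-ins for `P` and the concrete type. Under either the longest-comment or the concatenation strategy, the warning on the middle extension would also end up describing `isEmpty` and `binaryContains(_:)`.

```swift
// Hypothetical names throughout, invented to mirror the scenario above.
protocol Lookup {
    var isEmpty: Bool { get }
    func linearContains(_ value: Int) -> Bool
}

struct SortedValues: Lookup {
    let values: [Int]
    init(_ values: [Int]) { self.values = values.sorted() }
}

// A requirement implemented in an extension with no doc comment of its own.
extension SortedValues {
    var isEmpty: Bool { values.isEmpty }
}

/// The APIs in this extension exist to satisfy `Lookup`, but they are not
/// recommended on `SortedValues`: prefer ``binaryContains(_:)`` instead.
extension SortedValues {
    func linearContains(_ value: Int) -> Bool { values.contains(value) }
}

// The recommended alternative, again in an extension without a doc comment.
extension SortedValues {
    func binaryContains(_ value: Int) -> Bool {
        // Lower-bound binary search over the sorted storage.
        var low = 0
        var high = values.count
        while low < high {
            let mid = (low + high) / 2
            if values[mid] < value { low = mid + 1 } else { high = mid }
        }
        return low < values.count && values[low] == value
    }
}
```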

taylorswift · April 29, 2023, 8:00pm

thanks Xiaodi for the in depth review!

i agree that concatenation / dropping doccomments is not great, but i am struggling to think of workable alternatives to the existing DocC model. it would be great if we could do something like:

but this requires defining a canonical ordering for the extension blocks. for some symbolgraphs, sourcemap information is available, and for extension blocks in the same source file, we could order them by line position. but this ordering isn’t complete across extension blocks of the same signature distributed across multiple source files. it’s not clear to me how we could enable overriding the default ordering. and the sourcemap information itself is not always available at all.

defining a stable ordering for the extension blocks is important because we don’t want them to jump around the page every time the page is reloaded, or the docs are rebuilt.

we would also want the symbol categorization to be at least somewhat consistent with the one we use for internal nested declarations. pages for the module’s own types would be broken down by phylum (actor, class, struct, enum, protocol, etc.) but pages for external types would be broken down by extension block. this feels weird to me.

theMomax · April 29, 2023, 9:03pm

Thanks for bringing up this topic again @taylorswift. I definitely think that this area needs work, but I simply didn't have the time to come up with, research, or implement a more advanced solution, so I went with the simplest plausible solution that also provides a stable result. The current behavior is not ideal, and I think @xwu's comment shows its weaknesses very well.

I also want to emphasise that the changes I made only apply to extension blocks that extend a type from a different target, i.e. this discussion does not apply to regular extensions you make to local types.

That being said, I wanted to bring up another idea for how extension block comments could be handled:

Extension block comments, or even extension markdown files, could define a name in their @DocumentationExtension metadata directive, like this:

/// The following APIs are optimized for finding elements from the _beginning_ of the collection.
/// 
/// @Metadata {
///     @DocumentationExtension(name: CollectionExtensionsOptimizedForPrefixSearch)
/// }
public extension Collection { /* ... */ }

You could then use this name in an @Include directive somewhere else, e.g. in an extension markdown file with @DocumentationExtension(mergeBehavior: override), or just in the longest documentation comment on any of the relevant extension blocks:

/// This library adds advanced capabilities to Collection types.
/// 
/// ## Searching from the start
/// 
/// @Include(name: CollectionExtensionsOptimizedForPrefixSearch)
///
/// ## Searching from the end
/// 
/// @Include(name: CollectionExtensionsOptimizedForSuffixSearch)
public extension Collection { /* ... */ }

It's just a rough idea, but I still wanted to put it out there in case someone wants to pursue it further.
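A rough Swift sketch of how a tool might resolve the proposed directive: each whole-line `@Include(name: X)` is replaced by the fragment registered under `X`. The `name:` parameter and `@Include(name:)` are the proposal above, not existing DocC syntax, and `resolveIncludes(in:fragments:)` is an invented name. The sketch assumes the fragments were already collected from their `@DocumentationExtension(name:)` metadata with the `@Metadata` block stripped; the suffix-search text is borrowed from the earlier example in this thread.

```swift
// Hypothetical resolver for the proposed @Include(name:) directive.
func resolveIncludes(in comment: String, fragments: [String: String]) -> String {
    let prefix = "@Include(name: "
    let suffix = ")"
    return comment
        .split(separator: "\n", omittingEmptySubsequences: false)
        .map { line -> String in
            // Only whole-line directives are recognized in this sketch.
            guard line.hasPrefix(prefix), line.hasSuffix(suffix) else { return String(line) }
            let name = String(line.dropFirst(prefix.count).dropLast(suffix.count))
            // Unknown names are left in place so the broken reference stays visible.
            return fragments[name] ?? String(line)
        }
        .joined(separator: "\n")
}

let fragments = [
    "CollectionExtensionsOptimizedForPrefixSearch":
        "The following APIs are optimized for finding elements from the _beginning_ of the collection.",
    "CollectionExtensionsOptimizedForSuffixSearch":
        "The following APIs are most performant when used to find elements from the end of the collection.",
]

let overview = """
    This library adds advanced capabilities to Collection types.

    ## Searching from the start

    @Include(name: CollectionExtensionsOptimizedForPrefixSearch)

    ## Searching from the end

    @Include(name: CollectionExtensionsOptimizedForSuffixSearch)
    """

print(resolveIncludes(in: overview, fragments: fragments))
```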

xwu · April 29, 2023, 9:05pm

Is there a stable ordering for members (properties, methods, subscripts, etc.)? If so, then there are multiple plausible stable orderings for extensions—for instance, by first member in order of appearance in the stable ordering of members, or by order of appearance in the stable ordering of members of the first member declared in an extension.

taylorswift · April 29, 2023, 9:15pm

yes, members have identities, so we can sort them by path and then break ties by ABI name. extension blocks also have SymbolGraphGen-level “identities”, but these identities are arcane and don’t usually match intuition, so it’s not really something i’m keen on surfacing to the end user. (not that ABI names are intuitive either, but at least it is something that is “real”, that library maintainers at least make some effort to keep stable, and can be inspected with swift demangle.)
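One of the orderings suggested above (order each extension block by its earliest member, with members compared by path and then by ABI name) could be sketched like this. `Member`, `ExtensionBlock`, and the `abi-1` style strings are placeholders, not SymbolGraphGen types or real mangled names. Blocks with no members have no key under this scheme, so the sketch returns them separately instead of inventing an order for them.

```swift
// Placeholder types; not SymbolGraphGen's actual data model.
struct Member {
    var path: [String]   // e.g. ["Wardrobe", "subscript(cold:)"]
    var abiName: String  // mangled name; only consulted when paths are equal
}

struct ExtensionBlock {
    var label: String    // for display only; not a SymbolGraphGen identity
    var members: [Member]
}

/// Members compare by path first, then by ABI name.
func memberPrecedes(_ a: Member, _ b: Member) -> Bool {
    if a.path != b.path { return a.path.lexicographicallyPrecedes(b.path) }
    return a.abiName < b.abiName
}

/// Orders blocks by their earliest member; memberless blocks are set aside.
func orderedByFirstMember(_ blocks: [ExtensionBlock])
    -> (ordered: [ExtensionBlock], memberless: [ExtensionBlock]) {
    var keyed: [(key: Member, block: ExtensionBlock)] = []
    var memberless: [ExtensionBlock] = []
    for block in blocks {
        if let first = block.members.min(by: memberPrecedes) {
            keyed.append((key: first, block: block))
        } else {
            memberless.append(block)
        }
    }
    let ordered = keyed.sorted { memberPrecedes($0.key, $1.key) }.map { $0.block }
    return (ordered, memberless)
}

let blocks = [
    ExtensionBlock(label: "Wardrobe<Summer>",
                   members: [Member(path: ["Wardrobe", "subscript(summer:)"], abiName: "abi-3")]),
    ExtensionBlock(label: "empty", members: []),
    ExtensionBlock(label: "Wardrobe<Spring>",
                   members: [Member(path: ["Wardrobe", "subscript(spring:)"], abiName: "abi-2")]),
    ExtensionBlock(label: "Wardrobe where Season: ColdSeason",
                   members: [Member(path: ["Wardrobe", "subscript(cold:)"], abiName: "abi-1")]),
]
let result = orderedByFirstMember(blocks)
print(result.ordered.map { $0.label })
```

The ABI-name tie-break only matters when the same path is declared in several blocks, for example overloads of one subscript under different constraints.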

extension blocks can be completely empty and declare only retroactive conformances, moreover they can be completely empty and declare no conformances either, yet still carry a documentation comment. i imagine because DocC “arbitrarily” drops doccomments, this pattern may become quite common, and it could take time to migrate away from it.

/// This is a completely empty extension block, and i have placed the
/// doccomment here because DocC only supports one doccomment, and
/// this documentation isn’t coupled to any particular extension block.
extension Runway
{
}

taylorswift · April 29, 2023, 9:23pm

by the way, it might help to standardize on some terminology:

extension block: a single lexical block, delimited in source by { }.
extension (or extension block group): a group of extension blocks that share the same generic signature. this implies they all extend the same type as well.
extension API (or simply extensions): a group of extensions that extend the same type and appear in the same module.
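The three levels of this terminology map onto nested groupings. A minimal sketch, with hypothetical type names, a hypothetical module `Closet`, and generic signatures reduced to plain strings: an extension API is keyed by module and extended type, and inside it each extension collects the blocks that share a generic signature.

```swift
// Hypothetical model of the terminology; generic signatures are plain strings.

/// An extension block: one lexical `extension ... { }`.
struct ExtensionBlock {
    var module: String
    var extendedType: String
    var genericSignature: String  // "" when unconstrained
    var docComment: String?
}

/// Identifies an extension API: every extension of one type from one module.
struct ExtensionAPIKey: Hashable {
    var module: String
    var extendedType: String
}

/// Groups blocks into extension APIs, and within each API into extensions
/// (extension block groups) keyed by generic signature.
func groupExtensions(_ blocks: [ExtensionBlock]) -> [ExtensionAPIKey: [String: [ExtensionBlock]]] {
    Dictionary(grouping: blocks) { ExtensionAPIKey(module: $0.module, extendedType: $0.extendedType) }
        .mapValues { apiBlocks in Dictionary(grouping: apiBlocks, by: { $0.genericSignature }) }
}

let blocks = [
    ExtensionBlock(module: "Closet", extendedType: "Wardrobe", genericSignature: "",
                   docComment: "This is an extension block comment!"),
    ExtensionBlock(module: "Closet", extendedType: "Wardrobe", genericSignature: "",
                   docComment: "This is another extension block comment!"),
    ExtensionBlock(module: "Closet", extendedType: "Wardrobe",
                   genericSignature: "Season: ColdSeason", docComment: nil),
    ExtensionBlock(module: "Closet", extendedType: "Wardrobe",
                   genericSignature: "Season == Spring", docComment: nil),
]
let grouped = groupExtensions(blocks)
let api = grouped[ExtensionAPIKey(module: "Closet", extendedType: "Wardrobe")]
print(api?.count ?? 0)       // extensions in this extension API
print(api?[""]?.count ?? 0)  // blocks in the unconstrained extension
```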

xwu · April 29, 2023, 9:24pm

I think there is a critical distinction here between the two. It has been discussed in these forums at multiple points that a type extended from a different target is, although Swift does not allow one to name it, a distinct type. That is to say, module A's extended version of Swift.Int is distinct from module B's extended version of Swift.Int. Your work in surfacing documentation for such extensions is critically important, even if there are improvements that can be made down the line.

From a pragmatic standpoint, it also stands to reason that the author of A who has something to say about the type known as "A's extended version of Swift.Int" has nothing to attach that documentation to except an extension block. So the current implementation is defensible from that standpoint.

For types that are in the same target, by contrast, it has been said on these forums on multiple occasions that extensions do not have a distinct existence from the type that's extended. In every circumstance where the question in this thread here arises (i.e., where there are multiple extensions with the same generic constraints and conformances), it would have to be because extensions are deliberately used for code organization purposes. To me, then, it would be sensible if, just as we allow access modifiers (e.g., public) to be applied to extension blocks as a shorthand for applying them to all their members, doc comments that are found in these scenarios are treated as being applied to the members of the extension. On a pragmatic level, better unnecessary repetition than information loss.
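A minimal sketch of this suggestion, treating a block-level doc comment on a local type the way an access modifier on the block is treated: each member without its own comment inherits the block's comment. The types and member names are hypothetical.

```swift
// Hypothetical types; not DocC's data model.
struct Member {
    var name: String
    var docComment: String?
}

struct ExtensionBlock {
    var docComment: String?
    var members: [Member]
}

/// Copies the block's comment onto every member that lacks one.
func applyingBlockComment(_ block: ExtensionBlock) -> [Member] {
    block.members.map { original -> Member in
        var member = original
        // A member's own comment wins; the block comment only fills gaps.
        if member.docComment == nil {
            member.docComment = block.docComment
        }
        return member
    }
}

let block = ExtensionBlock(
    docComment: "Not recommended on this type; prefer `binaryContains(_:)`.",
    members: [
        Member(name: "linearContains(_:)", docComment: nil),
        Member(name: "linearFirstIndex(of:)", docComment: "Scans from the front."),
    ])
for member in applyingBlockComment(block) {
    print(member.name, "-", member.docComment ?? "(none)")
}
```

Letting a member's own comment take precedence, rather than concatenating the two, is a choice made for this sketch; the post does not specify it.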

taylorswift · April 29, 2023, 9:27pm

i don’t think this would be useful in practice: if i have written a passage and attached it to an extension block, i want it to appear in one canonical location. i do not want it parroted on every nested declaration.

xwu · April 29, 2023, 9:27pm

Quite common?! What documentation comment could one possibly wish to affix to an extension that declares no conformances and has no members? I would strongly push to make the opinionated decision that the design of this feature should absolutely, positively pay no mind to supporting such silliness.

An alternative position would be that an extension block that is empty and states no conformance clearly cannot be discussing anything about its non-existent members or conformances, so it can be safely concatenated into documentation of the parent type and in no particular order. Everything else can still have a canonical order determined by the order of its members.
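The alternative position can be sketched as a simple split: comments on blocks that have no members and declare no conformances go to the extended type's own documentation, and every other block keeps its comment. Names are hypothetical, members and conformances are reduced to strings, and the `Equatable` conformance and its comment in the usage are invented for illustration.

```swift
// Hypothetical types; members and conformances are reduced to names.
struct ExtensionBlock {
    var conformances: [String]
    var members: [String]
    var docComment: String?
}

/// Separates comments that can only describe the extended type itself
/// (empty blocks with no conformances) from blocks whose comments stay
/// with their own members or conformances.
func splitTypeLevelComments(_ blocks: [ExtensionBlock])
    -> (typeLevel: [String], blockLevel: [ExtensionBlock]) {
    var typeLevel: [String] = []
    var blockLevel: [ExtensionBlock] = []
    for block in blocks {
        if block.members.isEmpty && block.conformances.isEmpty {
            // Nothing here for the comment to describe except the type.
            if let comment = block.docComment { typeLevel.append(comment) }
        } else {
            blockLevel.append(block)
        }
    }
    return (typeLevel, blockLevel)
}

let blocks = [
    ExtensionBlock(conformances: [], members: [],
                   docComment: "This is a completely empty extension block."),
    ExtensionBlock(conformances: ["Equatable"], members: [],
                   docComment: "Runways compare by designator."),
    ExtensionBlock(conformances: [], members: ["designator"], docComment: nil),
]
let result = splitTypeLevelComments(blocks)
print(result.typeLevel)
```

The type-level comments come back in input order only because the loop iterates in order; the post explicitly allows any order for them.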

xwu · April 29, 2023, 9:28pm

I think we have very different conceptions, then, of what a doc comment on an extension block is "for," and I think that it would be good to get empirical evidence of how people are actually using it.

There is plenty of precedent, by the by, for "parroting" documentation in Swift (as I'm sure you're aware)—for example, this is what happens by default (at least when viewed through Xcode) when a protocol requirement is implemented without documentation of its own.

Now obviously, if there is one canonical place for documentation to appear which is universally appropriate and would never be overlooked, then that is clearly the superior option. However, if as you say there is no such solution, I would far prefer parroting everywhere in extension members than either of the choices presented above, for the reason that redundancy at least isn't misleading.

taylorswift · April 29, 2023, 9:36pm

DocC doesn’t support this:

/// This wardrobe only contains things that can be worn in
/// the winter or the fall.
extension Wardrobe where Season:ColdSeason
{
    subscript(cold index) -> Outfit<Season>
}
/// This wardrobe contains things that can be worn in spring.
extension Wardrobe<Spring>
{
    subscript(spring index) -> Outfit<Spring>
}
/// This wardrobe contains things that can be worn in summer.
extension Wardrobe<Summer>
{
    subscript(summer index) -> Outfit<Summer>
}

so we need to pick one extension block to host the combined documentation. but it’s not obvious which block should be the master block. in the absence of a clear winner, it would make the most sense to create another empty extension block for the purposes of hosting the documentation.

/// When `Season` is ``ColdSeason``, this wardrobe only
/// contains things that can be worn in the winter or the fall.
/// When `Season` is ``Spring``, this wardrobe contains ...
extension Wardrobe
{
}

extension Wardrobe where Season:ColdSeason
{
    subscript(cold index) -> Outfit<Season>
}
extension Wardrobe<Spring>
{
    subscript(spring index) -> Outfit<Spring>
}
extension Wardrobe<Summer>
{
    subscript(summer index) -> Outfit<Summer>
}

this is currently “seen” as a problem from the perspective of many folks who are handling documentation archives: duplicated content is bad from a storage, browsability, and SEO standpoint, and reducing duplication of documentation is an oft-requested improvement. SymbolGraphGen currently contains several flags to omit duplicated content, and in the future i hope it will gain more such functionality.

xwu · April 29, 2023, 9:42pm

But isn't this discussion precisely about how to support this?

Are you referring to the case where Wardrobe is an external type? (Surely, as otherwise this documentation would be affixed to the primary declaration.)

As I replied above to @theMomax, I do think this is a distinct scenario with totally unique significance as compared to other uses of extensions—and, moreover, by construction it doesn't implicate the question posed in your main post on this topic of multiple extensions all indistinguishable from each other on the basis of generic constraints or protocol conformances, since it's literally about creating a singular extension block for hosting documentation. This can (and I think should, for the reasons I gave above to @theMomax) have its own special treatment.

xwu · April 29, 2023, 9:43pm

Of course it's not optimal; I'm just arguing that duplicated correct documentation is far, far better than incorrectly deduplicated documentation. I don't think that should be controversial.

taylorswift · April 29, 2023, 9:54pm

refactoring documentation requires human effort; DocC and other documentation engines cannot simply flip a switch to enable block-level documentation without a backcompat story for workarounds like empty extension blocks that became common before the feature was introduced.

i think the binary “internal-vs-external” framing is too limiting here: there is also the in-between situation where a module provides extensions for other modules in the same package. the DocC frontend doesn’t support this, but other documentation frontends do, and it is fully supported (bugs notwithstanding) at the SymbolGraphGen level.

the vast majority of locations you can currently leave doccomments within the AST are invalid and will cause your documentation to be lost. for example, if you leave a doccomment inside a func body, it will be lost before it even comes out of SymbolGraphGen. if you leave doccomments on extensions (to external types), they will be preserved by SymbolGraphGen, but sometimes lost at the DocC level. this isn’t always “incorrect behavior”, it is just unsupported.

when adding patterns to the list of “supported doccomment locations”, it’s important to consider the impact it will have on data duplication.

xwu · April 29, 2023, 10:26pm

My point above was that the proposed behavior here would, in the scenarios I outline above, produce worse (actively misleading, thus incorrect) output than just dropping the comments on the floor, what you label as “just unsupported.”

To be clear, my feedback would be that it is better to duplicate unnecessarily than to have the comments remain “unsupported,” but better to leave them unsupported entirely than to deduplicate incorrectly and show them.

Not that I am arguing for unnecessary bloat, but prioritizing storage compactness (a factor you cite above) in a discussion about information presentation is an inversion of priorities, I would think, and we really shouldn’t need to be inventing ersatz compression heuristics.

xwu · April 29, 2023, 10:31pm

For clarity, are you saying that empty extension blocks are already common?

taylorswift · April 29, 2023, 10:35pm

the “proposed” behavior here is to preserve doccomments for each extension (multiple blocks, same generic signature), the current DocC behavior drops all (but one comment) on the floor. the “proposed” behavior would preserve more, but not all, of the possible doccomments.

maybe skim the transcript of “Swift Package Indexing | Transcript: 19: The SPI project is growing up, DocC uploading with AWS Lambda, and Are we server yet?”. it is a real problem without good solutions at the moment, and i have also personally spent a lot of effort trying to work around it.

i am saying i think it is likely to become common very soon, now that Enablement of DocC Extension Support as Default in Swift 5.9 has landed.

your argument is essentially that this launch is premature and the feature is not ready for release. while i was not present for the vote on monday, i think it’s fair to say that ship has already sailed.

xwu · April 29, 2023, 10:46pm

Let’s be more precise here: you’re referring to the behavior implemented for Swift 5.9 in respect of extensions of external types only, yes? I don’t have a problem with that behavior, for the reasons I discuss above.

I do not think either this behavior or what you call the “proposed” behavior should be extended to extensions of local types because, in the scenario where such extension blocks cannot be otherwise distinguished, the author has deliberately signaled an intention that the extension block has been used for grouping purposes, and applying either one or all of the documentation comments to all blocks would be potentially misleading.

My rationale for this opinion is totally inapplicable in the scenario you outline where a user is motivated to use a single, empty extension to document something about an external type, but then again so is the entire issue about what to do about multiple such documentation blocks—unless I’m missing some reason why the Swift 5.9 implementation would encourage authors to create multiple such empty extension blocks in the future?