Improving the presentation of overloaded symbols in Swift DocC

taylorswift · December 28, 2023, 12:14am

i’m not entirely sure what you mean by documentation syntax as that could refer to a number of things (@_documentation attribute, swift-flavored markdown comments, swift-flavored markdown directives, DocC aside blocks, Unidoc asides, Jazzy asides etc.) but i imagine you mean something resembling the existing @available(renamed:) attribute, such as

extension A
{
    /// Some shared documentation...
    @_documentation(overloads: "f(x:)", "f(x:y:)")
    func f(x:Int)

    func f(x:Int32)

    //  For some reason, i, the writer, do not want this 
    //  included in the overload group
    func f(x:Int64)

    func f(x:Int, y:Int = 0)
}

of course, @available(renamed:) cannot disambiguate overloads today, so we are back to folded FNV-1 hashes

    /// Some shared documentation...
    @_documentation(overloads: "f(x:) [XYZ123]", "f(x:y:)")
    func f(x:Int)

this does shift the “symbol hash problem” from the references to the referents, but i’m not sure if this would actually reduce the number of hashes users would have to write, since this would essentially be asking users to eagerly deduce and record the hashes even if nothing is referring to that symbol.

dabrahams · December 28, 2023, 8:51pm

I don't mean anything in particular and am not attached to the specifics, but the thing that floated through my mind was some way of embedding a human-readable pattern in the doc comment itself.

/// ...
///
/// - Applies to: methods matching /f(.*)/

There's no reason documentation shouldn't be just as readable in-situ as it is when extracted.

taylorswift · December 28, 2023, 9:04pm

i know this was just a sketch, but your regex doesn’t actually select the methods you probably intended for it to, as it would also match something like foo(x:y:). you would need to escape the parentheses like /f(\(.*\))?/. not a game breaker, but it’s not very readable in-situ, and could get annoying pretty quickly because you would always be escaping parentheses and sometimes you might forget to escape the parentheses and we would have to design warnings for that.

it also doesn’t solve the symbol hash aspect of the problem, because a regex can’t select between overloads that have the same full name.
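To make the escaping point concrete, here is a small sketch. It assumes Swift 5.7 or later for `Regex`, and it assumes the pattern is matched against the whole symbol name, which the sketch above does not specify. `matchesWholeName` is a hypothetical helper, not part of any documentation tool.

```swift
// Sketch only: does `pattern` match the entire symbol name?
// `matchesWholeName` is a hypothetical helper for illustration.
func matchesWholeName(_ pattern: String, _ name: String) -> Bool {
    guard let regex = try? Regex(pattern) else { return false }
    return name.wholeMatch(of: regex) != nil
}

let unescaped = #"f(.*)"#       // parentheses form a capture group
let escaped = #"f(\(.*\))?"#    // parentheses are literal, argument list optional

print(matchesWholeName(unescaped, "foo(x:y:)"))  // true: ".*" swallows "oo(x:y:)"
print(matchesWholeName(escaped, "foo(x:y:)"))    // false
print(matchesWholeName(escaped, "f(x:y:)"))      // true
```

Even with the escaped pattern, two overloads that share the full name `f(x:)` produce identical input strings, so no pattern over names can tell them apart.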

dabrahams · December 29, 2023, 7:57pm

I don't know what a symbol hash is, and I'm used to a regex engine (Emacs) where parentheses are not special unless escaped… and I'm not really trying to design the solution. My point was to put forward the basic requirements, as I see them, for a system that makes overloads usable.

And I agree, regexes in general are not all that readable. I can try to think of more readable DSLs for creating patterns, if you're interested. I'm not convinced it's important to select between symbols having the same full name, actually. If you don't want to group them, you can just fall back to documenting each one individually.

taylorswift · December 29, 2023, 8:36pm

a symbol hash is a bespoke variant of the FNV-1 hash applied to the mangled name of a declaration. you can learn more about them here:

https://www.swift.org/documentation/docc/linking-to-symbols-and-other-content#Navigate-to-a-Symbol
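For readers unfamiliar with FNV-1, a rough sketch of the general idea follows. This is the textbook 32-bit FNV-1 plus a simple XOR fold and base-36 rendering; the bit width, the fold, the encoding, and the placeholder input string are all assumptions for illustration, and the exact variant DocC uses is not spelled out in this thread.

```swift
// Textbook 32-bit FNV-1: multiply by the FNV prime, then XOR in each byte.
func fnv1(_ text: String) -> UInt32 {
    var hash: UInt32 = 0x811c9dc5        // FNV-1 32-bit offset basis
    for byte in text.utf8 {
        hash = hash &* 0x01000193        // FNV 32-bit prime, wrapping multiply
        hash ^= UInt32(byte)
    }
    return hash
}

// Assumption for illustration: XOR-fold the hash down to 24 bits and
// render it in base 36 to get a short suffix.
func foldedHashString(_ text: String) -> String {
    let hash = fnv1(text)
    let folded = (hash >> 24) ^ (hash & 0x00ff_ffff)
    return String(folded, radix: 36)
}

// "placeholder-mangled-name" stands in for a real mangled name.
print(foldedHashString("placeholder-mangled-name"))
```

Because the suffix is derived from the declaration's identity rather than its spelling, it changes whenever the signature changes, which is why stale hashes in links are opaque to a human reader.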

i’m not too thrilled about inventing a new regex-like DSL that we would have to teach, especially if it’s unlikely to become a dominant standard and will still have to coexist with the myriad of other documentation DSLs in the wild today.

taking a step back, i wonder if we are approaching the problem in a completely wrong way - to me at least, the strongest signal that overloads should be coalesced is if one overload has documentation and none of the other overloads has documentation. we should have documentation tooling that just does the Right Thing for whatever documentation the writer has provided.

here’s a sketch i have in mind right now:

1. the documentation compiler should run a pass over all the direct children of each type in a module, grouping them by full name (e.g. all overloads of hiddleswifts(x:y:)).
2. for each full name with more than one declaration, if exactly one declaration has markdown documentation, then that symbol becomes the primary declaration for that full name, and all the other (undocumented) overloads become redirects to the primary declaration.
3. the documentation compiler should then run a second pass over the same nodes, grouping them by base name (e.g. all instance methods with the base name hiddleswifts)
4. for each base name with more than one declaration, if exactly one of them has markdown documentation, make that one the primary declaration for the base name and turn the others into redirects.
5. tabular data (e.g. lists of default implementations, overrides, etc.) should never be coalesced; the web page for the primary declaration should still have links to view more detailed information about each overload. but codelinks in markdown documentation would never resolve to a symbol that has been merged into another symbol.
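The two grouping passes could look roughly like the following. This is a minimal sketch, not DocC's implementation: `Declaration`, its fields, and `coalesce` are hypothetical names, and the sketch only computes redirects (it does not model tabular data or codelink resolution).

```swift
// Hypothetical model of one direct child of a type.
struct Declaration {
    let id: String               // hypothetical unique identifier
    let baseName: String         // e.g. "hiddleswifts"
    let fullName: String         // e.g. "hiddleswifts(x:y:)"
    let hasDocumentation: Bool
}

/// Returns a map from each redirected declaration's id to its primary's id.
func coalesce(_ siblings: [Declaration]) -> [String: String] {
    var redirects: [String: String] = [:]

    func pass(key: (Declaration) -> String) {
        let groups = Dictionary(grouping: siblings, by: key).values
        for group in groups where group.count > 1 {
            // Only coalesce when exactly one member is documented.
            let documented = group.filter(\.hasDocumentation)
            guard documented.count == 1, let primary = documented.first else { continue }
            for other in group where other.id != primary.id {
                redirects[other.id] = primary.id
            }
        }
    }

    pass(key: \.fullName)   // first pass: group by full name
    pass(key: \.baseName)   // second pass: group by base name
    return redirects
}

let overloads = [
    Declaration(id: "x1", baseName: "f", fullName: "f(x:)", hasDocumentation: true),
    Declaration(id: "x2", baseName: "f", fullName: "f(x:)", hasDocumentation: false),
    Declaration(id: "y1", baseName: "f", fullName: "f(y:)", hasDocumentation: true),
    Declaration(id: "y2", baseName: "f", fullName: "f(y:)", hasDocumentation: false),
]
print(coalesce(overloads).count)  // prints 2: x2 -> x1 and y2 -> y1
```

With two documented declarations sharing the base name `f`, the second pass finds no unique primary and leaves the two full-name groups alone.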

rauhul · December 29, 2023, 8:39pm

Is there a reason this needs regexes or symbol hashes? An author could tag each method with a unique id and the doc compiler could use that to merge symbols:

@documentationGroup("foo")
func foo(_: Int) { ... }
@documentationGroup("foo")
func foo(_: String) { ... }
@documentationGroup("foo")
func _foo() { ... }

Aside: I personally find the docc hashes annoying. When symbols change, docs can be left with dangling opaque hashes and figuring out what symbol they previously referenced and should now reference is quite frustrating. I ran into this with swift-argument-parser which has a whole ton of overloads for @Argument, @Option and @Flag.

taylorswift · December 29, 2023, 8:46pm

i built a documentation compiler many years back (it was called Entrapta) that used what was essentially this idea. you can still find traces of the syntax today in the swift-png repository.

to put it bluntly, i hated it, because you - the documentation writer - basically had to come up with a parallel set of names for all the concepts in the code base and you had to maintain that naming scheme along with the actual naming scheme for the API itself. and when you were renaming the API, you had to think about how that would affect the names of the documentation “groups” and overhaul the groups to align with whatever new terminology you were adopting. in my opinion, documentation “groups” should absolutely not become first-class concepts.

michelf · December 29, 2023, 10:57pm

DDoc has a simple trick:

/// Documentation for c and d.
int c;
/// ditto
int d;

In D, a ditto doc comment tells the documentation parser to repeat documentation from the previous symbol (generally by grouping the symbols together). This only works when symbols are adjacent to each other, but this is often the case for overloads.

We could use this as a way to signal when overloads are to be documented together:

/// Test if `value` is greater or equal than zero.
/// - Parameter value: the value to test
/// - Returns: `true` if `value` is greater or equal than zero, `false` otherwise.
func isPositive(value: Int) -> Bool { value >= 0 }
/// ditto
func isPositive(value: Float) -> Bool { value >= 0 }
/// ditto
func isPositive(value: Double) -> Bool { value >= 0 }
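On the tooling side, resolving such markers could be as simple as the sketch below. It assumes the tool sees declarations in source order with their raw doc comment text; `SourceDeclaration` and `groupByDitto` are hypothetical names, and nothing here reflects how DocC or DDoc is actually implemented.

```swift
// Hypothetical input: declarations in source order.
struct SourceDeclaration {
    let name: String
    let docComment: String?   // nil when there is no doc comment at all
}

/// Walks declarations in source order. A comment that is exactly "ditto"
/// joins the group of the preceding declaration; anything else starts a
/// new group. A leading "ditto" with no predecessor is kept as ordinary text.
func groupByDitto(
    _ declarations: [SourceDeclaration]
) -> [(documentation: String?, members: [String])] {
    var groups: [(documentation: String?, members: [String])] = []
    for declaration in declarations {
        if declaration.docComment == "ditto", !groups.isEmpty {
            groups[groups.count - 1].members.append(declaration.name)
        } else {
            groups.append((documentation: declaration.docComment,
                           members: [declaration.name]))
        }
    }
    return groups
}

let groups = groupByDitto([
    SourceDeclaration(name: "isPositive(value: Int)", docComment: "Test if `value` is ..."),
    SourceDeclaration(name: "isPositive(value: Float)", docComment: "ditto"),
    SourceDeclaration(name: "isPositive(value: Double)", docComment: "ditto"),
    SourceDeclaration(name: "isNegative(value: Int)", docComment: nil),
])
print(groups.count)  // prints 2
```

A declaration with no comment at all ends up in its own group with `nil` documentation, so it stays distinguishable from one that was deliberately marked.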

taylorswift · December 30, 2023, 9:43pm

michelf:

/// Test if `value` is greater or equal than zero.
/// - Parameter value: the value to test
/// - Returns: `true` if `value` is greater or equal than zero, `false` otherwise.
func isPositive(value: Int) -> Bool { value >= 0 }
/// ditto
func isPositive(value: Float) -> Bool { value >= 0 }
/// ditto
func isPositive(value: Double) -> Bool { value >= 0 }

supposing we just had

/// Test if `value` is greater or equal than zero.
/// - Parameter value: the value to test
/// - Returns: `true` if `value` is greater or equal than zero, `false` otherwise.
func isPositive(value: Int) -> Bool { value >= 0 }

func isPositive(value: Float) -> Bool { value >= 0 }

func isPositive(value: Double) -> Bool { value >= 0 }

are there any situations in which we would not want the last two overloads to become aliases of the first one?

michelf · December 30, 2023, 11:27pm

In this case, lack of a doc comment could mean one of two things: either it hasn't been written yet or it is meant to be grouped. One of those needs human intervention (writing documentation). With ditto you can express grouping as an editorial intent and thus tell the two cases apart by automation.

But I guess the main advantage of ditto is it allows arbitrary groupings. You can group things together even if they aren't really overloads. You could for instance document operator + and += together as a group. Or maybe sort() and sorted(). No need for the name to be the same or for anyone to agree on a mechanical definition for an overload.

Note that in D the meaning of ditto is only that the documentation is repeated. It's up to each documentation generator to decide if and when they will present those repeated doc comments as a group or as separate pages.

taylorswift · December 30, 2023, 11:50pm

that might be relevant from a maintainer’s point of view, but for users reading the documentation, does it really matter?

when serving documentation to a reader, one has two choices:

1. show nothing at all.
2. direct them to a slightly different symbol with documentation that might also apply to the symbol the reader has queried.

i think #2 is preferable to #1, because if we just returned a blank page, the reader would probably just end up spending extra time searching for #2 manually.

sometimes the documentation for the symbol might be misleading when applied to the overloaded symbol. but that would happen anyway when the reader discovers the documented symbol. the difference is the reader would learn the wrong thing slowly by clicking through multiple pages instead of learning the wrong thing quickly by scanning a single page.

michelf · December 31, 2023, 12:46am

I was only suggesting ditto as an alternate way of creating groups, reacting to you saying you hated creating a separate set of names for grouping in Entrapta.

Then you asked why not just omit the doc comment altogether, and my answer is: so that you (as the maintainer) and automated tools (helping you) know you've reviewed it and that it was not just an omission.

The question about what to emit in documentation when there is no doc comment at all is a third question for which I don't really have an opinion. I suppose it should be up to the documentation generator to figure out something. Maybe it should be configurable. I mean, when there is no authoring intent, do your best but don't expect perfection. That's why there should be a warning, either in the generator's logs, on the output page, or both.

Note that nowhere in this am I talking about overloads. Overloads are just a heuristic you can use to have some likely-appropriate documentation when there is no doc comment telling the documentation generator what to do or how to group things.

taylorswift · December 31, 2023, 1:55am

okay, so it sounds like ditto as you described it serves at least two distinct purposes:

1. it’s useful as a linting marker for things like coverage metrics
2. it’s useful as a grouping hint for spatially contiguous declarations in a file

that seems motivating to me. my only concern is with the implementation of #2: today the swift compiler does not support emitting any information about the spatial layout of a project’s files.

dabrahams · January 9, 2024, 7:51pm

I think this is roughly what I was getting at when I wrote “I'm not convinced it's important to select between symbols having the same full name, actually. If you don't want to group them, you can just fall back to documenting each one individually.”

There might still be a role for an explicit annotation saying that you're documenting an overload set, but it also might not be needed. I guess it would be good to be able to warn people when they have both group documentation and one or more of the functions is individually documented. But you could also potentially do that with a none-one-all rule and no special annotation.

Your sketch sounds reasonable to me, though some parts are unclear. I'm not sure I understand the motivation for separate passes doing full name and then base name grouping, and I don't understand the implications of no. 5. in your list.

FWIW, "ditto" seems pretty limited, as sometimes overloads are most logically in separate files (and certainly it wouldn't work for operators, which are static methods).

taylorswift · January 9, 2024, 9:33pm

the motivation for doing two passes is so that you could still have overload groups that are limited to a full-name match instead of just a base name match. so if you had

f(x:)
f(x:)
f(x:)

f(y:)
f(y:)
f(y:)

you still have the option to create two groups instead of having to choose between documenting all six methods separately, or merging them all into one page.

the reason for the 5th bullet point is that in rare situations the individual declarations might come with very long lists of declaration-specific tabular data (“what overrides this”, “what default implementations does this have”, “what requirements does this satisfy” etc.) that is generated by the documentation compiler and i felt that injecting them into the written documentation would just make the pages too cluttered and hard to navigate.

lettieri · January 21, 2024, 1:47am

Thanks for all the recent feedback on this proposal! I’ll try to address some of that here.

First, I agree with @taylorswift that we should not treat a default implementation like an overload, even though it might appear that way just by looking at the symbol signatures. Much like a class and a structure cannot be overloads of one another, a protocol requirement and a default implementation cannot be overloads of one another. As such, we should maintain distinct pages for a protocol requirement and its default implementation.

Of course, a protocol requirement itself can have overloads, like this:

protocol MyProtocol {
    func doSomething(thing: String)
    func doSomething(thing: Int)
}

These should get the same merged-page behavior that other overloads get, as should any corresponding implementations. But the requirements and the implementations don’t mix.
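To make the distinction concrete, here is a variant of the protocol above with default implementations added. The `String` return values, the extension, and `Conformer` are additions for illustration only; the comments describe the page layout as I understand it from this post, not verified DocC output.

```swift
protocol MyProtocol {
    // Two requirements with the same name: overloads of one another,
    // so they would share one merged page.
    func doSomething(thing: String) -> String
    func doSomething(thing: Int) -> String
}

extension MyProtocol {
    // Two default implementations: overloads of one another as well,
    // merged with each other but kept separate from the requirements.
    func doSomething(thing: String) -> String { "default for String" }
    func doSomething(thing: Int) -> String { "default for Int" }
}

// Hypothetical conforming type that overrides only one overload.
struct Conformer: MyProtocol {
    func doSomething(thing: Int) -> String { "custom for Int" }
}

print(Conformer().doSomething(thing: "hello"))  // prints "default for String"
print(Conformer().doSomething(thing: 42))       // prints "custom for Int"
```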

Regarding generic constraints, DocC doesn’t currently distinguish whether these came from the enclosing extension or from the symbol declaration, and this proposal doesn’t aim to alter that. Under this proposal, DocC ignores all generics for the purpose of determining whether two symbols are overloads of one another.

Regarding the concern about a pathologically long overload list, it’s worth considering how to streamline the presentation of those in a future proposal. However, that appears to be a fairly rare situation in practice, and I don’t think a long list of that kind will be worse under this proposal than it is today, where you get a long list of symbols appearing instead a level up in the hierarchy. Under this proposal, you will at least have a way of condensing that list, as when you look at the default view of a merged page.

More generally, as mentioned in the original post, I definitely agree that there is room to expand on this proposal in the future — for example, by adding syntax to enable writers to define manual overload groups as a supplement to the purely data driven ones in this proposal. But I think what’s in the proposal now provides a lot of value, while laying the groundwork for future enhancements.