Improving the presentation of overloaded symbols in Swift DocC

lettieri · October 6, 2023, 11:24pm

Swift-DocC treats overloads — which I'm defining as symbols of a particular kind with identical base names and argument labels but different argument types and return types — as distinct symbols, each with its own documentation page. I propose adding an optional feature to DocC that, when enabled for a given build target, collects all the symbols that are overloads of each other onto a new kind of merged-symbol documentation page. The merged-symbol pages replace the individual pages that DocC would otherwise create for each overloaded symbol.

This change improves the navigation experience for readers by reducing repetitive information in both the sidebar and in topic groups of the rendered documentation. It reduces the number of results that turn up in web searches and sidebar filtering by folding overload pages into a single page. And for the same reason, it enables simpler linking to overloaded symbols, both for documentation authors and for others linking to your public content.

Motivation

Today, DocC automatically generates a unique page for every public symbol using the symbol’s full name as the last path component of the symbol’s URL. Every class, structure, and enumeration gets a page, as does every initializer, property, method, and so on within those entities. This ensures that documentation represents every part of the API without depending on a writer to manually identify each symbol. DocC links to all of these pages from the sidebar, as well as from topic groups that appear on other symbol or collection pages.

Separately, Swift supports overloads, which are symbols that have the same full name, differing only in generics or types. Typically, these kinds of symbols also have substantially similar behavior, performing a common operation on different kinds of inputs. For example, the following methods from SwiftUI are overloads because they have the same name, specifically searchable(text:placement:prompt:), and differ only in how they acquire the text for a prompt:

```
func searchable(
    text: Binding<String>,
    placement: SearchFieldPlacement = .automatic,
    prompt: Text? = nil
) -> some View

func searchable(
    text: Binding<String>,
    placement: SearchFieldPlacement = .automatic,
    prompt: LocalizedStringKey
) -> some View
```

Supporting Swift overloads in the face of a one-page-per-symbol strategy creates challenges for both readers and authors. For example:

The docs contain multiple pages about what is conceptually a single, configurable behavior. Being presented with many substantially identical pages (in the sidebar, in a topic group, or in search results) adds to the reader’s cognitive load and reduces discoverability. Which page should I look at? Do I need to visit all of them? How do they differ? Ideally, there would be a single place to read about a particular behavior.
Link text is hard to read. To enable readers to distinguish between overloaded methods that appear in a list of symbol links, like in the sidebar or a topic group, DocC includes type and generic information in the link text in those places. This creates long, noisy links that are hard for a reader to scan through. The length also frequently produces truncation in the sidebar, eliminating the advantage that including the extra information was meant to confer. Meanwhile, the type and generic information typically isn’t relevant to a reader at this point in their journey, when they are browsing to learn what kinds of things are broadly possible.
An overloaded symbol’s URL is obscure. Because the last path component of a symbol’s URL is the symbol’s name, DocC appends a unique string to the URL of an overload to distinguish it from other overloads with the same name. However, these strings are hard for humans to discover or interpret, and can change in surprising ways as the code evolves, making overload symbol URLs difficult to work with for both readers and authors. For example, you can directly type the URL of a symbol into your browser if you know its name, but that isn’t possible if the symbol is an overload.

Proposed solution

I propose combining the documentation for all the overloads of a particular symbol onto a single page. The combined page provides enough information for a reader to discover all the overloads that the page represents, as well as a means to focus on a particular overload. To avoid overwhelming the reader, the page updates its content to reflect the overload in focus, including automatically-generated content like availability markers, as well as writer-generated content like abstracts and discussions. In most cases, most of the information across overloads is the same, with the primary difference being the types that appear in the declaration. However, updating all the content per selected overload provides flexibility for when there are differences.

Under this proposal, DocC applies overload page unification automatically, without writer intervention. A symbol that DocC identifies as an overload gets collected onto a page with its related overloads. Automaticity ensures uniform treatment across an entire project, and enables improvements to the way content is presented and accessed.

Because there is one page per overload group, DocC can:

Present all the information about a group of overloaded symbols in one place. There are fewer pages a reader needs to visit to discover all the possible behaviors.
Simplify presentation of symbols at the point of curation. In links that appear in the sidebar and in topic groups, DocC can omit type and generic information, just like it does when rendering a symbol as a link inside a discussion. This makes it easier to scan through the list of available symbols.
Simplify symbol URLs. In the vast majority of cases, authors and readers won’t need to concern themselves with the unique strings that DocC uses to distinguish between the URLs of overloads.

Detailed design

Definitions

This proposal uses the following terms:

A symbol’s full name is the combination of its base name and, if applicable, argument labels (the keywords you use to refer to parameters at the call site) listed in parentheses, delineated by trailing colons. The full name omits types and generics, as well as parameter names (which you use to refer to parameters inside a method, if different from the argument label). For example, this is the full name of one of the initializers for the Option structure in ArgumentParser:
```
init(name:parsing:help:completion:)
```
In contrast, the initializer's type signature includes type and generic information:
```
init<T>(
    name: NameSpecification,
    parsing: SingleValueParsingStrategy,
    help: ArgumentHelp?,
    completion: CompletionKind?)
```
A full-name overload — which I refer to simply as an overload throughout this proposal — is an initializer, method, function, subscript, operator, or property that has the same full name as another of the same kind of entity in the same scope. A symbol can be an overload of any number of other symbols. Two symbols with the same full name but of different kinds or in different scopes are not considered overloads, and remain completely distinct symbols. Similarly, symbols with the same base name but different arguments are not considered overloads.
An overload group is a collection of symbols that are overloads of one another.
A disambiguating hash — sometimes just referred to as the hash — is the small handful of characters that DocC appends to the URL of an overload that differentiates its URL from all the others in the same overload group. DocC calculates the hash based on additional information about the symbol, including its type signature and full scope.
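As a minimal sketch of the full-name definition above, the following builds a full name from a base name and argument labels. The `Parameter` type and `fullName(baseName:parameters:)` function are hypothetical names made up for illustration; they aren't part of DocC or SymbolKit.

```swift
// Hypothetical model of a parameter, for illustration only.
struct Parameter {
    var label: String?   // argument label used at the call site; nil stands for "_"
    var name: String     // parameter name used inside the body (not part of the full name)
    var type: String     // spelled-out type (not part of the full name)
}

/// Builds a full name such as "init(name:parsing:help:completion:)" for a
/// callable symbol. Types, generics, and parameter names are ignored.
func fullName(baseName: String, parameters: [Parameter]) -> String {
    let labels = parameters.map { ($0.label ?? "_") + ":" }.joined()
    return "\(baseName)(\(labels))"
}

let optionInitParameters = [
    Parameter(label: "name", name: "name", type: "NameSpecification"),
    Parameter(label: "parsing", name: "parsing", type: "SingleValueParsingStrategy"),
    Parameter(label: "help", name: "help", type: "ArgumentHelp?"),
    Parameter(label: "completion", name: "completion", type: "CompletionKind?"),
]
let optionInitFullName = fullName(baseName: "init", parameters: optionInitParameters)
print(optionInitFullName)
```

Changing any `type` value in the example leaves the result unchanged, which is what makes two such symbols overloads of each other under this proposal.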

Importantly, the hash is distinct from the type string that DocC uses to distinguish symbols with the same full name, but of different kinds, which are not considered overloads for the purposes of this proposal. For example, the following two methods are not overloads:
```
public class Something {
    public func something() -> Int { 0 } // something()-method
    public static func something() -> Int { 0 } // something()-type.method
}
```
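To make the grouping rule concrete, here is a rough sketch that collects symbols into overload groups by scope, kind, and full name. The `Symbol` and `OverloadKey` types and the kind strings are assumptions for illustration, not DocC's internal model.

```swift
// Hypothetical, simplified symbol record.
struct Symbol {
    var scope: String          // containing type or module, e.g. "Something"
    var kind: String           // e.g. "method" or "type.method"
    var fullName: String       // e.g. "something()"
    var typeSignature: String  // e.g. "() -> Int"
}

// Two symbols are overloads only if all three of these match.
struct OverloadKey: Hashable {
    var scope: String
    var kind: String
    var fullName: String
}

func overloadGroups(in symbols: [Symbol]) -> [OverloadKey: [Symbol]] {
    let grouped = Dictionary(grouping: symbols) {
        OverloadKey(scope: $0.scope, kind: $0.kind, fullName: $0.fullName)
    }
    // A symbol with no siblings isn't an overload, so drop single-member groups.
    return grouped.filter { $0.value.count > 1 }
}

let symbols = [
    Symbol(scope: "Something", kind: "method", fullName: "something()", typeSignature: "() -> Int"),
    Symbol(scope: "Something", kind: "method", fullName: "something()", typeSignature: "() -> String"),
    Symbol(scope: "Something", kind: "type.method", fullName: "something()", typeSignature: "() -> Int"),
]
let groups = overloadGroups(in: symbols)
```

In this example the two instance methods form one overload group, while the static method stays a distinct symbol because its kind differs.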

Presentation

The merged page

All the symbols in an overload group appear on a single page that resembles the symbol pages in use today. Most of the information on the page — including eyebrow, title, abstract, availability markers, deprecation message, parameters, discussion, and return value — comes from the content associated with the overload that is currently in focus. Note that the eyebrow (which reflects the kind of symbol) and title (which is the full name) are the same for all overloads in the group. Otherwise, they wouldn’t be overloads.

The main difference is the declaration section, which now includes a collapsible list of declarations, with one declaration per overload. The page visually highlights the declaration of the overload that’s in focus and that corresponds to all the other content on the page. People browsing the page can use a new UI element below the declarations to alternately collapse and expand the declaration list. They can also click on any of the declarations when the list is expanded to bring a different overload into focus.

The sidebar

Today, the sidebar displays links to symbols using the type signature:

DocC uses this formatting for all symbols, whether part of an overload group or not, and whether an author has manually curated the symbol or DocC has automatically curated it. With the new feature enabled, DocC will instead display links to symbols using only their full name, and will display only one link per overload group:

As an exception, an author might choose to link directly to a specific overload in manual curation if the author judges that the symbols in the overload group are different enough to warrant individual listings. While that’s likely to be rare, DocC will in that case revert to displaying the type signature for each explicitly named overload.
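A small sketch of the sidebar rule described above, assuming a hypothetical `Overload` record with an `isExplicitlyCurated` flag. The signature strings are illustrative and don't reflect DocC's exact rendering. The proposal doesn't say what happens to the remaining overloads when only some are explicitly curated, so this sketch simply lists the explicit ones.

```swift
// Hypothetical record for one member of an overload group.
struct Overload {
    var fullName: String
    var typeSignature: String
    var isExplicitlyCurated: Bool   // true if an author linked to this exact overload
}

/// Returns the link titles the sidebar would show for one overload group.
func sidebarTitles(for group: [Overload]) -> [String] {
    let explicit = group.filter { $0.isExplicitlyCurated }
    guard explicit.isEmpty else {
        // Exception: explicitly curated overloads are listed by type signature.
        return explicit.map { $0.typeSignature }
    }
    // Default: a single link for the whole group, titled with the full name.
    if let first = group.first { return [first.fullName] }
    return []
}

let searchable = [
    Overload(fullName: "searchable(text:placement:prompt:)",
             typeSignature: "searchable(text: Binding<String>, placement: SearchFieldPlacement, prompt: Text?) -> some View",
             isExplicitlyCurated: false),
    Overload(fullName: "searchable(text:placement:prompt:)",
             typeSignature: "searchable(text: Binding<String>, placement: SearchFieldPlacement, prompt: LocalizedStringKey) -> some View",
             isExplicitlyCurated: false),
]
let defaultTitles = sidebarTitles(for: searchable)

// Simulate an author curating both overloads individually.
let curated = searchable.map { overload -> Overload in
    var copy = overload
    copy.isExplicitlyCurated = true
    return copy
}
let curatedTitles = sidebarTitles(for: curated)
```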

Topic groups

Today, DocC presents symbols in topic groups using three elements: the type signature as link text, an abstract, and optional generic information with zero or more statements like “Available when Element conforms to Equatable.”

After this change, these elements change as follows:

Link text - Like in the sidebar, topic group links will by default use full names, and will only use the type signature in rare cases when the author chooses to link to a specific overload from manual curation. In other words, the link text in the topic group will continue to match the link text in the sidebar.
Abstract - DocC continues to display abstracts, as before. However, each overload in a group might have its own abstract (along with other documentation content), so DocC must choose which of these to display in the topic group listing for a merged page. To resolve this, DocC will display the abstract of a deterministically selected overload — for example, sorting the overloads by type signature and then choosing the first of those with a non-empty abstract, if any.
Generics - Generic statements will be omitted. That information can be different per overload (indeed, it can be all that is different per overload), and in those cases there isn’t one statement that applies to all the symbols in the overload group. That information was and continues to be available on the symbol’s page, which is where a reader typically needs the information anyway.
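The abstract selection described above could look something like the following sketch: sort by type signature, then take the first non-empty abstract. The `Overload` type and the abstract text are placeholders for illustration, and plain string ordering is just one possible deterministic order.

```swift
// Hypothetical record for one member of an overload group.
struct Overload {
    var typeSignature: String
    var abstract: String?   // nil or empty when the writer provided no abstract
}

/// Picks the abstract shown for the whole group in a topic group listing.
func representativeAbstract(for group: [Overload]) -> String? {
    return group
        .sorted { $0.typeSignature < $1.typeSignature }  // deterministic order
        .compactMap { $0.abstract }                      // skip missing abstracts
        .first { !$0.isEmpty }                           // skip empty abstracts
}

let group = [
    Overload(typeSignature: "(Binding<String>, SearchFieldPlacement, Text?) -> some View",
             abstract: "Marks this view as searchable."),   // placeholder text
    Overload(typeSignature: "(Binding<String>, SearchFieldPlacement, LocalizedStringKey) -> some View",
             abstract: nil),
]
let chosenAbstract = representativeAbstract(for: group)
```

Here the LocalizedStringKey overload sorts first but has no abstract, so the sketch falls through to the Text? overload. The result doesn't depend on the order in which the symbols were discovered.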

Links in discussions

Today, DocC presents links to symbols that appear in discussions using the symbol’s full name.

This proposal doesn’t change the above presentation, but the behavior can vary slightly for overloaded symbols, depending on how the author constructs the link. Specifically, authors can choose a particular overload to bring into focus when someone follows the link. If authors don’t specify a particular overload, DocC chooses a default overload instead.

Authoring

Content

Writers don’t do anything special to author content for an overload. They provide the usual fields in documentation comments in source or in extension files as before, including abstract, discussion, and if applicable, parameters and return value. DocC decides where and how to display this information as described earlier in this proposal. In particular, DocC displays the content on the merged page that’s associated with the overload that’s currently in focus.

DocC chooses one abstract from all of the overload symbols to display in a topic group. As such, writers will typically use the same abstract for all of the overloads, and keep it generic enough to apply to all the overloads in the group. That way, it doesn’t matter which one DocC displays in a topic group.

Links

After the new feature is implemented, writers (and external linkers) can omit the disambiguating hash that overloads require today, and DocC still displays the merged page, but puts the first overload in focus. The “first overload” should be chosen deterministically.

Writers can optionally link to overload symbols using the disambiguating hash, as they did before. Thus, this proposal doesn’t invalidate any existing links that appear in current documentation, or links made by external parties to your public documentation. DocC follows the link by displaying the merged overload page and putting the referenced symbol into focus.
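As a sketch of the two link forms, the following resolves the last path component of a link against a merged page. `MergedPage`, `Resolution`, and `resolve(lastPathComponent:on:)` are hypothetical names, and the hash values and their order are only example data. Links with an unrecognized hash are left unresolved here, since this proposal doesn't define that case.

```swift
// Hypothetical model of a merged overload page.
struct MergedPage {
    var fullName: String
    var hashesInOrder: [String]   // deterministic order; the first entry is the default overload
}

enum Resolution: Equatable {
    case page(fullName: String, focusedHash: String)
    case unresolved
}

func resolve(lastPathComponent: String, on page: MergedPage) -> Resolution {
    // No hash: show the merged page with the default (first) overload in focus.
    if lastPathComponent == page.fullName, let first = page.hashesInOrder.first {
        return .page(fullName: page.fullName, focusedHash: first)
    }
    // Valid hash: show the merged page with that overload in focus.
    for hash in page.hashesInOrder where lastPathComponent == "\(page.fullName)-\(hash)" {
        return .page(fullName: page.fullName, focusedHash: hash)
    }
    return .unresolved
}

let page = MergedPage(fullName: "init(name:parsing:help:completion:)",
                      hashesInOrder: ["5k0ug", "7slrf"])   // example values and order
let withoutHash = resolve(lastPathComponent: "init(name:parsing:help:completion:)", on: page)
let withHash = resolve(lastPathComponent: "init(name:parsing:help:completion:)-7slrf", on: page)
```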

Future directions

Group by base name rather than by full name

This proposal considers symbols with the same full name to be overloads of each other. In the future, DocC could add an option to broaden the definition of overloads to include methods with the same base name and different arguments. That would reduce the total number of pages readers need to visit to learn about a particular kind of functionality, but at the expense of some discoverability in the sidebar and in topic groups, plus the need to accommodate more variation on the merged page.

Enable manual definition of overloads

This proposal specifically adds no new syntax for authoring docs, but in the future it might be useful to define some markup that enables authors to indicate a default overload or a sort order. DocC would then use that information to present overloads on the merged page, as well as to pick the abstract to display in the topic group to represent the entire group.

As a further step, DocC could enable authors to manually define overload groups, either in addition to or in place of the overload groups that DocC detects automatically. This would require new syntax to mark methods as overloads of each other, as well as a generalization of the merged page to allow for a wider variety of differences between the overloads.

jrose · October 7, 2023, 1:00am

I have complicated thoughts around how default arguments and method families with slightly different argument lists fit into this, but I'll save those for later. Right now I just want to suggest some tweaks to terminology:

What you call a "signature" the compiler calls a "full name". I'm not sure if that terminology has reached official documentation yet. "Type signature" is a not-uncommon phrase that means the opposite set of information from "full name".
What you call an "overload" is historically ambiguous; I've used the term "name-based overloading" when talking about multiple programming languages to describe a group of methods with the same base name and different argument labels (whether or not the types are the same). I'd personally suggest picking something more specific, like "same-full-name overload" or "type-only overload", to avoid confusion. (I generally push in casual discussion for people to not think of "same base name" as "overloading" but because of default arguments, partially-written code, and erroneous code it can show up that way in practice.)
You also say "two symbols with the same name but of different types" and I think "of different kinds" would probably help clarify your intent.

I realize this is likely to be internal terminology, but having the internal terminology align with both the compiler and the user-facing documentation would be better than not, yeah?

bjhomer · October 7, 2023, 1:34am

I love this direction so much. Currently, I frequently encounter “duplicate” overloads when browsing documentation (especially when using SwiftUI) and the presentation you’ve shown here makes my heart happy.

I didn’t see any specification of the order of the overloads listed when viewing all overloads of a given “full-name signature”. I would expect that whichever symbol sorts first in that list would also be the one that is used for the abstract in the topic groups.

It would be nice if the “simplest” overload were presented first, or at least the most approachable one. It might be possible to heuristically identify such overloads (e.g. prefer stdlib types like “String” over something like “LocalizedStringKey”, maybe?), or if not, maybe they could be sorted lexicographically on the tuple of the parameter type names? It would be nice to allow authors to designate one overload as the “primary” one for documentation purposes, though.

Overall, though, I’m thrilled about this idea.

ronnqvist · October 8, 2023, 2:04am

I would also prefer to use this terminology. Both DocC and the symbol graph format call searchable(text:placement:prompt:) the symbol's "name" and call the combination of argument types and return types (for example (Binding<String>, SearchFieldPlacement, Text?) -> some View) the symbol's "type signature".

ronnqvist · October 8, 2023, 3:06am

This looks really great! I'm very excited for this.

The solution and design look good to me. My only feedback is on some very specific details:

This makes sense to me but I would like to provide a more complicated example to point out a few nuances about links and possible collisions because this doesn't mean that authors can omit disambiguation completely.

Say that you have a symbol name with multiple collisions across more than one symbol kind. For example:

```
public class Something {
    public func something() -> Int { 0 }     // something()-41une
    public func something() -> String { "" } // something()-1vm00

    public static func something() -> Int { 0 }     // something()-4jscr
    public static func something() -> String { "" } // something()-7qh17
}
```

Given that symbols of different kinds aren't considered overloads and don't get put on the same overload page, the link something() without any disambiguation is still ambiguous because it could refer to either of:

something()-method the overload page with the two instance methods
something()-type.method the overload page with the two static methods

This is the same behavior that one would get today if there was only one instance function and one static function:

```
public class Something {
    public func something() -> Int { 0 }        // something()-method
    public static func something() -> Int { 0 } // something()-type.method
}
```

So when you say that "writers can omit the disambiguating hash" that is correct but unfortunately writers may still need to include symbol kind disambiguation.
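To illustrate the point, here's a rough sketch of matching a link against overload pages that share a full name but differ in symbol kind. The `Page` type and `candidates(for:among:)` function are made up for this example and aren't how DocC's link resolver is structured.

```swift
// Hypothetical overload page, identified by full name plus symbol-kind suffix.
struct Page {
    var fullName: String
    var kindSuffix: String   // e.g. "method" or "type.method"
}

/// Returns every page a link could refer to; more than one result means the link is ambiguous.
func candidates(for link: String, among pages: [Page]) -> [Page] {
    return pages.filter { page in
        link == page.fullName || link == "\(page.fullName)-\(page.kindSuffix)"
    }
}

let pages = [
    Page(fullName: "something()", kindSuffix: "method"),       // the two instance methods
    Page(fullName: "something()", kindSuffix: "type.method"),  // the two static methods
]
let bare = candidates(for: "something()", among: pages)
let disambiguated = candidates(for: "something()-type.method", among: pages)
```

The bare link matches both pages, so it would still need the kind suffix even though it no longer needs a hash.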

I would prefer to avoid this behavior. This sounds good in simple cases but there are real edge cases where this can result in some really surprising behaviors.

For example, say that my target has a Something class and a "Something-style.md" article. If I have a <doc:Something-style> link then "style" contains only characters that are allowed in disambiguation hashes and it's short enough that it could be an unrecognized disambiguation hash for the Something class. Treating it as an unrecognized hash and ignoring it would mean that <doc:Something-style> resolves to the Something class (possibly without even raising a warning?), but I personally would prefer to have a diagnostic that helps me correct the link so that it resolves to the article that I intended to link to.

This situation is very rare today because the disambiguation can't be too short or too long. However, depending on the definition of an "invalid disambiguation hash" this type of situation could happen more often.

lettieri · October 9, 2023, 7:17pm

Thanks for the feedback! I definitely want to be consistent with other places where we talk about these things. I have made an update that tries to address this throughout — for example, talking about the symbol's full name rather than its signature, and using type signature to refer to the fully expanded version of its name that includes types and generics.

I know that overload can mean different things to different people. I have continued to use the term throughout to avoid getting too verbose, but tried to be more explicit about what I mean here, both the first time I use it, and in the Definitions section, where I refer to it by the more explicit name that you suggest. Hopefully, it's a bit clearer now.

You also say "two symbols with the same name but of different types" and I think "of different kinds" would probably help clarify your intent.

Absolutely right.

lettieri · October 9, 2023, 7:23pm

Thanks for the feedback @ronnqvist!

I have updated the proposal to hopefully better clarify the distinction between disambiguation hashes added for full-name overloads, and the type strings needed to tell apart symbols with the same name but of different kinds in the same scope. You are absolutely correct that this proposal doesn't alter the behavior of the latter.

You're right that DocC shouldn't ignore bad links at compile time. I think what I was trying to get at here is that if there is a link in the wild (for example, coming from some third party site) that has an otherwise valid symbol link, but uses an unrecognized hash, possibly because the hash changed after the link was created, or because an overload that used to exist now doesn't, the docs should treat that as if it were a link without any hash and thus do their best to land on a useful page. But that might be out of scope for what I'm proposing here. That is, it isn't something DocC can do. So I have removed that sentence.

taylorswift · October 9, 2023, 8:12pm

do we want external links to silently change their target to a completely different symbol that happens to have the same full name? this seems like a kind of silent failure we would rather avoid.

lettieri · October 9, 2023, 9:22pm

That's a fair question that certainly deserves a wider discussion. I personally feel like there's value in serving something reasonable when someone requests a symbol page using a URL that's almost right. What constitutes reasonable and almost right are open questions, of course.

That aside, I'm going to sidestep that discussion for this proposal, and stick to proposing only what to do when a link to an overloaded symbol has:

A valid disambiguating hash — Go to the merged page with the specified overload in focus.
No hash — Go to the merged page with the default overload in focus.

gwendal.roue · October 10, 2023, 7:41pm

I don't know what this means (there are already hashes in DocC, but it looks like you talk about something entirely different which is not well defined at all)
I don't know how to produce one (which tool helps me produce the "valid hash" for the precise symbol I'm targeting?)
Please state clearly that the most basic of the internet rules (cool URIs don't change) is well understood, respected, and has consequences.

ronnqvist · October 10, 2023, 8:26pm

DocC can help produce these hashes for you. If you author a link that is ambiguous, the warning that DocC emits for that link will suggest the possible hashes that you could add and display which symbol each suggested hash corresponds to.

lettieri · October 10, 2023, 8:31pm

I'm referring to the hash that DocC appends to the end of the full name of an overloaded symbol when it constructs a link to that symbol. I talk about this in the Definitions section of the original post. For example, the Option structure in ArgumentParser defines two initializers with the full name init(name:parsing:help:completion:). To create distinct URLs for these two initializers, DocC adds hashes after their names in the URLs:

https://apple.github.io/swift-argument-parser/documentation/argumentparser/option/init(name:parsing:help:completion:)-7slrf
https://apple.github.io/swift-argument-parser/documentation/argumentparser/option/init(name:parsing:help:completion:)-5k0ug

The hashes here are 7slrf and 5k0ug, respectively.

One of the benefits of this proposal is to reduce the amount that authors or anyone constructing a link to your docs needs to interact with these hashes, or even be aware of them.

bbrk24 · October 10, 2023, 9:32pm

How do unavailable overloads play into this? @available(*, unavailable) methods don’t have documentation generated for themselves, but can affect links to available overloads in unexpected ways (example).

gwendal.roue · October 11, 2023, 7:38am

Oh, thanks @ronnqvist. So we're talking about the same hashes we deal with today

DocC can help produce these hashes for you. If you author a link that is ambiguous, the warning that DocC emits for that link will suggest the possible hashes that you could add and display which symbol each suggested hash corresponds to.

I never thought about expanding this warning! Indeed it lists all precise possibilities of "full names"! I no longer have to look for hashes in ~/Library/Developer/Xcode/DerivedData/.../MyModule.doccarchive/data/documentation :-)

gwendal.roue · October 11, 2023, 7:41am

Thank you @lettieri. For some obscure reason I thought that the proposal was aiming at grouping together more methods than the methods that are identical excluding the disambiguating hash. Sorry for my misguided comments.

Joseph_Heck · October 23, 2023, 4:03pm

Thanks @lettieri, I think this would be a notable improvement to overall documentation quality in the most general sense. The current, default configuration of DocC output for many of the projects I'm working on generates a lot of individual pages that have relatively minimal value in themselves. Collapsing those into a page would be great.

This proposal does exacerbate a minor issue that's been growing with recent hash updates and disambiguation practices - it's getting harder to come up with a discrete list of all the symbols that a Package provides. I use this when I hit a package at the start, or when I'm trying to broadly organize the content. This adds just a bit more that makes getting that list harder, since it's collapsing disambiguated symbols. (The days of just looking at the indexed symbols appear to have dropped away.)

I DO want to see this proposal fully implemented, regardless of that extra difficulty, as I think it'll significantly improve the signal-to-noise ratio of the content in individual pages.

taylorswift · December 19, 2023, 10:54pm

i’ve been investigating this sporadically in my free time and i’d like to solicit some more discussion about what kinds of declarations actually qualify as overloads for the purposes of doc page consolidation.

as i was prototyping an implementation of this in Unidoc, i uncovered some interesting edge cases that could complicate this idea, and i’m interested in what everyone’s thoughts are on how they should be handled.

default implementations

default implementations are always lexical “overloads” of the protocol requirement(s) they witness, but doc pages for protocol requirements are qualitatively very different from doc pages for non-requirements. for example, protocol requirements have Overrides Requirement In and Implemented By tables, but default implementations have Implements Requirement In tables, etc.

the requirements themselves can also be overloaded, which means the reader could potentially be presented with a page containing multiple protocol requirements and multiple default implementations and no intuitive way of understanding which implementations are associated with which requirements.

generic constraints

declarations with different extension constraints ought to be considered “overloads” of one another, but merging extension constraints on the basis of overloading seems problematic to me.

there is also the issue of synthesizing See Also topic groups; currently given some API like the following:

```
extension A<Int>
{
    func f(x:String)
    func g()
    func h()
}
extension A<Double>
{
    func f(x:String)
    func f(x:Substring)
    func g()
    func h()
}
```

Unidoc will automatically synthesize a See Also section for each symbol containing the other members in the extension it appears in.

```
# func f(x:String) where T == Int

## See Also

- func g()
- func h()

# func f(x:String) where T == Double

## See Also

- func f(x:Substring)
- func g()
- func h()

# func f(x:Substring) where T == Double

## See Also

- func f(x:String)
- func g()
- func h()
```

this is helpful because you always know that the API in the generated section is available wherever the main symbol is available.

however, if we were to combine the three overloads into a single page, the generated sections become a lot more confusing to the documentation reader.

as one option, we could forget about generics entirely and merge/deduplicate the related API:

```
# func f(x:String) where T == Int
# func f(x:String) where T == Double
# func f(x:Substring) where T == Double

## See Also

- func g()
- func g()
- func h()
- func h()
```

but imo, that would make the See Also less useful, because it is mixing together symbols from different generic contexts.

finally, this doesn’t really fit anywhere, but i think we also need to be mindful of pathological APIs where consolidating overloads might create extreme conditions. one example is SwiftSyntax, which has a lot of overload families with hundreds of members. for example, SyntaxVisitor has two groups of overloads with 250+ members each. i don’t know if concatenating 250 variants of this page into a single megapage you would have to scroll through is going to be an improvement over what we’re currently using.

dabrahams · December 21, 2023, 4:46am

This is a hugely important feature. The problem with overloads for users is that they have to read separate documentation for each one, and mentally try to factor out the commonality and distinguish the small differences, so that they can decide which of several functions to call. Anything you can do to mitigate that cost is fantastic.

Two things worry me, though:

How you've defined "overload" will keep you from unifying many of the most problematic cases, some of which have different numbers of arguments or variations in the argument labels. The most obvious examples are the moral equivalent of one function with some defaulted arguments (e.g. what you find in Cocoa because ObjC doesn't have defaulted arguments). But that is by no means the whole set.
The "without writer intervention" premise. If you can't write one piece of documentation that applies to the whole overload set, the consolidated page will be nearly as bad as having separate pages. The hundreds of implementations of == that satisfy Equatable shouldn't be separately listed or separately-documented. The conformance of each X to Equatable should be documented, and the semantics of == should be documented once. And if there is no unifying protocol for your set of overloads that differ only in their parameter and/or return types, assuming these functions have some commonality, they should still be documented once on the page where the overloads are listed (and if they don't they probably should have been named differently!)

taylorswift · December 21, 2023, 5:00am

assuming you would like all symbols sharing the same base name to be considered overloads of one another, how would you envision enabling the behavior? is this something that would be configured globally across a project?

dabrahams · December 27, 2023, 5:00pm

See my point #2. A good result demands writer intervention. You could just make it a part of the documentation syntax, so the writer has a way to tell you which declarations a doc applies to.