Hi all,
I'd like to pitch an enhancement to symbol links in Swift-DocC: a more human-readable alternative for link disambiguation, based on function signatures.
Introduction
To link to a symbol in DocC, you write the path to the symbol wrapped in a set of double backticks (``), for example ``SlothCreator/Sloth/eat(_:quantity:)``. If two symbols have the same name in the same scope, the link is ambiguous and needs to be disambiguated somehow.
Today, Swift-DocC supports two types of link disambiguation: symbol kind disambiguation (for example ``Color/red-property``) and symbol hash disambiguation (for example ``Sloth/update(_:)-4ko57``). See Formatting Your Documentation Content for more information about the current link disambiguation alternatives.
A link with a symbol kind disambiguation can be read and understood by a developer, but a link with a symbol hash disambiguation is incomprehensible to people. For example, it's not clear which of these two functions update(_:)-4ko57 refers to:
/// Updates the sloth's power.
///
/// - Parameter power: The sloth's new power.
mutating public func update(_ power: Power) {
    self.power = power
}

/// Updates the sloth's energy level.
///
/// - Parameter energyLevel: The sloth's new energy level.
mutating public func update(_ energyLevel: Int) {
    self.energyLevel = energyLevel
}
Symbol kind disambiguation works well when the ambiguous symbols are types, properties, or enum cases but doesn't help to distinguish between functions, operators, subscripts, or initializers where the symbol name is overloaded to accept different parameters or have different return values. Today, these need to use symbol hash disambiguation instead.
Symbol hash disambiguation works in IDE environments where the tooling can generate the hash, but it's very difficult to work with outside of IDE integrations, and the resulting link text, no matter how it was created, is incomprehensible to a person. This makes it hard, for example, to know what symbol a link refers to when reviewing a documentation pull request.
Proposed Solution
As a human readable and human writable alternative to symbol hash disambiguation, I propose that we add support for disambiguating symbol links with "some amount of function signature information". This information is already available in symbol graph files as the "functionSignature" mix-in.
The syntax I propose for this is based on how function signatures are written in Swift and many other languages: (ParameterTypes) -> ReturnTypes. Parameter types or return types that are the same across multiple symbols can be written as "_" to avoid needing to write the full function signature. If only the return type information or only the parameter type information is sufficient to disambiguate the link, the other part of the type signature may be omitted.
With this syntax, the two update(_:) functions above could be written as Sloth/update(_:)-(Power) and Sloth/update(_:)-(Int), indicating that they each take one parameter and that it is a Power value in the first link and an Int value in the second link. Like other link disambiguation, the symbol name and the disambiguation would be separated by a dash ("-").
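As a rough illustration, here is how the two proposed links might look when written in documentation comments. The Sloth and Power types below are minimal stand-ins I made up to keep the snippet self-contained, and the links use the proposed syntax, so they would only resolve with the experimental support enabled.

```swift
/// A stand-in for the example's `Power` type (hypothetical, only for this sketch).
public struct Power: Equatable {
    public var value: Int
    public init(value: Int) { self.value = value }
}

/// A stand-in for the example's `Sloth` type (hypothetical, only for this sketch).
public struct Sloth {
    public var power = Power(value: 0)
    public var energyLevel = 0
    public init() {}

    /// Updates the sloth's power.
    ///
    /// To update the energy level instead, use ``Sloth/update(_:)-(Int)``.
    public mutating func update(_ power: Power) {
        self.power = power
    }

    /// Updates the sloth's energy level.
    ///
    /// To update the power instead, use ``Sloth/update(_:)-(Power)``.
    public mutating func update(_ energyLevel: Int) {
        self.energyLevel = energyLevel
    }
}
```

Someone reviewing these comments can tell which overload each link points to without resolving a hash.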
Detailed Design
To get a better sense for this syntax, let's look at an example with more parameters and more return values. Consider this Swift function:
func something(one: Int, two: String, three: Bool) -> (Double, Int) { ... }
If there's another function with the same name but a single return value -> Double, then the two functions can be disambiguated with ->(_,_) and ->_ to indicate that the linked function returns either a two-element tuple or a single value. The developer could also fill in one or more of the types in the disambiguation, writing it like ->(Double,_), ->(_,Int), or ->(Double,Int) to be more explicit. Similarly, the second link could also be written like ->Double, ->(_), or ->(Double).
If instead the other function has the same return types but its second parameter is a Substring instead of a String, then those two functions can be disambiguated with -(_,String,_) and -(_,Substring,_) to indicate the type of the second parameter. Like with tuple return values, the developer could choose to fill in the two unspecified types if they prefer to be more explicit.
Parameters would always be written surrounded by parentheses, even if there's only one parameter. Parentheses around the return types are optional when there's only one return type.
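To make the "_" placeholder rules concrete, here is a small sketch of how a list of type names from a disambiguation could be checked against a function's parameter or return types. This is not how DocC implements it; matches(pattern:types:) is a hypothetical helper, and it assumes the disambiguation text has already been split into individual type names.

```swift
/// Hypothetical helper, not DocC API: returns whether the type names written in a
/// link disambiguation match a function's parameter types (or return types).
/// A "_" element matches any type at that position.
func matches(pattern: [String], types: [String]) -> Bool {
    // A different number of elements never matches; `(_,_)` only matches two types.
    guard pattern.count == types.count else { return false }
    return zip(pattern, types).allSatisfy { element, typeName in
        element == "_" || element == typeName
    }
}

// The parameter and return types of `something(one:two:three:)` from above.
let parameterTypes = ["Int", "String", "Bool"]
let returnTypes = ["Double", "Int"]

print(matches(pattern: ["_", "String", "_"], types: parameterTypes))    // true
print(matches(pattern: ["_", "Substring", "_"], types: parameterTypes)) // false
print(matches(pattern: ["Double", "_"], types: returnTypes))            // true
print(matches(pattern: ["_"], types: returnTypes))                      // false
```

The element count matters as much as the names, which is why ->(_,_) and ->_ are enough to tell a two-element tuple from a single return value.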
This new syntax would be supported in symbol links (and documentation links referring to symbols) but DocC wouldn't use type signature disambiguation in file names or web URLs.
In diagnostics for ambiguous links, DocC would suggest the minimal necessary disambiguation, preferring to disambiguate links in this order:
- kind
- return type(s)
- parameter types
- hash
Return types are suggested before parameter types because functions often return fewer values than they have parameters.
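The preference order above can be sketched roughly like this. It's a simplified model under my own assumptions, not the code from the PR: Overload is a hypothetical type, the hashes are made-up placeholders, and the abbreviation step only replaces types that every same-length overload agrees on with "_", which gives a short suggestion but not necessarily the shortest possible one.

```swift
/// Hypothetical model of one overloaded symbol; not a DocC type.
struct Overload {
    var kind: String
    var parameterTypes: [String]
    var returnTypes: [String]
    var hash: String
}

/// Replaces each type that all same-length lists agree on with "_".
func abbreviated(_ types: [String], comparedTo others: [[String]]) -> [String] {
    let sameLength = others.filter { $0.count == types.count }
    return types.enumerated().map { index, typeName in
        sameLength.allSatisfy { $0[index] == typeName } ? "_" : typeName
    }
}

/// Suggests a disambiguation for `target`, trying kind, then return types,
/// then parameter types, and falling back to the hash.
func suggestedDisambiguation(for target: Overload, among overloads: [Overload]) -> String {
    let others = overloads.filter { $0.hash != target.hash }

    // 1. Kind: usable if no other overload has the same kind.
    if others.allSatisfy({ $0.kind != target.kind }) {
        return "-" + target.kind
    }
    // 2. Return types: usable if no other overload returns the same types.
    if others.allSatisfy({ $0.returnTypes != target.returnTypes }) {
        let types = abbreviated(target.returnTypes, comparedTo: others.map { $0.returnTypes })
        // Parentheses are optional around a single return type.
        if types.count == 1 { return "->" + types[0] }
        return "->(" + types.joined(separator: ",") + ")"
    }
    // 3. Parameter types: always written in parentheses.
    if others.allSatisfy({ $0.parameterTypes != target.parameterTypes }) {
        let types = abbreviated(target.parameterTypes, comparedTo: others.map { $0.parameterTypes })
        return "-(" + types.joined(separator: ",") + ")"
    }
    // 4. Hash: the fallback when nothing else tells the overloads apart.
    return "-" + target.hash
}

// Two overloads that differ only in the type of their second parameter.
let withString = Overload(kind: "method", parameterTypes: ["Int", "String", "Bool"],
                          returnTypes: ["Double", "Int"], hash: "aaaaa") // made-up hash
let withSubstring = Overload(kind: "method", parameterTypes: ["Int", "Substring", "Bool"],
                             returnTypes: ["Double", "Int"], hash: "bbbbb") // made-up hash

print(suggestedDisambiguation(for: withString, among: [withString, withSubstring]))
// -(_,String,_)
```

A real implementation would also need to decide what to suggest when parameter types and return types are only unique in combination, which this sketch sends straight to the hash fallback.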
Specifying both parameter types and return types would be supported, but the parameter types must be specified before the return types. This restriction is to help with readability.
An early implementation of this experimental support can be found in this PR. The actual implementation—once merged—would exist behind a feature flag until we're confident in its quality and in the overall direction.
In that implementation, the link can use either kind and hash disambiguation or type signature disambiguation. This restriction is arbitrary and could be lifted, but it shouldn't be necessary for link disambiguation to mix the two. If a piece of information doesn't disambiguate the link, it can simply be omitted, leaving only the pieces that do disambiguate the link.
This isn't a replacement for symbol hash disambiguation. DocC will continue to support that as the fallback when this syntax can't disambiguate a link or when the developer prefers it for its exactness. The goal is to not need symbol hash disambiguation in many common cases, but it's not a goal for this syntax to be able to fully describe and disambiguate every function. There's a trade-off between ease of use in the common case and richness and wide support.
Open Questions:
What's the right trade-off between brief disambiguation and preciseness of type information?
The implementation so far doesn't have syntax for parameter types or return types that are arrays, optionals, dictionaries, or generics. For example, a Range<Int> parameter is currently spelled as Range.
These could be added over time as non-breaking future syntax enhancements to allow developers to use the function type disambiguation in more cases. We could also reverse this and start by always specifying the full generic type name and add syntax simplifications over time.
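To show what the simplified spelling costs, here is a tiny sketch of the effect of reducing a generic type to its wrapping type. simplifiedSpelling(of:) is a hypothetical helper of mine that only illustrates the outcome described above; it isn't taken from the implementation.

```swift
/// Hypothetical helper: drops generic arguments so only the wrapping type remains.
func simplifiedSpelling(of typeName: String) -> String {
    // Keep everything before the first "<", if there is one.
    guard let start = typeName.firstIndex(of: "<") else { return typeName }
    return String(typeName[..<start])
}

print(simplifiedSpelling(of: "Range<Int>"))    // Range
print(simplifiedSpelling(of: "Range<Double>")) // Range
print(simplifiedSpelling(of: "Int"))           // Int
```

Two overloads that differ only in their generic arguments end up with the same spelling, so with the simplified form they would still need another kind of disambiguation.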
This question also applies to other languages, for example C or Objective-C, regarding how pointer parameters are spelled. Would developers expect to refer to a string parameter or return type in Objective-C as NSString or NSString*?
How should closure types be spelled?
The implementation so far doesn't have syntax for parameters or return types that are closures. It's not clear to me whether a closure type disambiguation would look too complex to be readable and writable. For example, a hypothetical disambiguated link for reduce(_:_:) could be written as reduce(_:_:)-(_,(Result,Element)->Result). Would such a syntax be discoverable enough that a developer could write it without IDE support?
Alternatives Considered
We considered a few different syntax alternatives. The main alternative spelled the parameter types inline with the function name, for example Sloth/update(_:Power). However, even ignoring the difficulties of parsing this syntax, we found that it wasn't always easy to know if a word referred to the type of the previous parameter or the name of the next parameter. Adding the parameters inline could also lead to an inconsistent syntax for return values, which come last in a Swift declaration but first in a C or Objective-C declaration.
Instead of simplifying the spelling of generic parameter or return types to only the wrapping type (for example Range), we considered always requiring the full spelling (for example Range<Int>). This would allow the current implementation to work in more cases at the cost of brevity.
Another alternative syntax we considered uses both the argument label and parameter name, for example Sloth/update(_ power:). Both this syntax and the inline parameter type syntax raised questions about how the syntax would look if it wasn't applied consistently to all parameters and how it would apply to other languages. We considered how IDEs could improve the legibility of these syntax variations, but the goal of the syntax is to be readable without IDE features.
We also considered using a character other than "-" to separate the type signature disambiguation from the symbol name but didn't find another good separator character. Using : would read well when the full type signature is written but could look a bit odd when only the parameters or only the return value is specified.
Specifying only the minimal amount of return types or parameter types carries a risk that an unambiguous link becomes ambiguous in the future when another symbol is added. This can be avoided by always spelling out all the parameters or even the entire signature, but that makes the disambiguation much longer. Ultimately, I feel that a developer who wants a future-proof disambiguation can choose to use symbol hash disambiguation instead.
I'm looking forward to your feedback on the proposed syntax.
– David