Unified documentation links

taylorswift · October 29, 2023, 5:39am

i know the markdown symbol link format isn’t technically under the purview of evolution, but i figure it’s the best place for this since it is sort of proposing a standard.

Unified documentation links

author: @tayloraswift
status: unimplemented

This document describes the design of the unified codelink format.

Motivation

There are lots of different codelink formats currently in use across the Swift ecosystem, including the current Unidoc codelink format. Many of these formats are incompatible with each other.

Although the Unidoc format provides many features unavailable in DocC, developers are sometimes reluctant to use it because it is not compatible with DocC. Recent changes to the DocC codelink format have resolved some of the issues that originally motivated a different syntax, which makes it possible for us to design a unified codelink format that is more compatible with DocC.

Overview of existing formats

DocC symbol links

DocC uses the term symbol link to refer to what we call codelinks.

DocC symbol links are encapsulated with two backticks, and use the / character as the path separator. Here are some examples of DocC symbol links:

/// ``Int``
/// ``Array/count``
/// ``Unicode/Scalar``

DocC symbol links resemble URLs, and many aspects of their design were once motivated by the desire to make them usable as URLs. One consequence of this is that the original DocC symbol link format was case-insensitive with respect to disambiguation. (DocC symbol link casing must still match the original symbol casing.)

URL-symbol link equivalence was eventually abandoned, and the DocC symbol link format was declared by fiat to be fully case-sensitive around 2022. The DocC format is unversioned, so this ended up creating even more confusion about the behavior of DocC symbol links.

Codelink disambiguation

DocC symbol links can be disambiguated by phylum or by symbol hash. Valid DocC phyla are:

swift.associatedtype
swift.enum
swift.enum.case
swift.class
swift.func
swift.func.op
swift.var
swift.deinit
swift.init
swift.method
swift.property
swift.subscript
swift.macro
swift.protocol
swift.struct
swift.typealias
swift.type.method
swift.type.property
swift.type.subscript

Symbol hashes are 24-bit integers obtained by applying a bespoke variant of the FNV-1 hash function to the symbol’s ABI name. They are conventionally written in lowercase base-36 without leading zeros.
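As a rough illustration of the shape of such a hash, here is a minimal sketch that runs the standard 32-bit FNV-1 function, XOR-folds the result to 24 bits, and prints it in lowercase base-36. The folding scheme, the input encoding, and the helper names (`fnv1`, `fold24`, `base36`, `sketchHash`) are assumptions made for the example; DocC's variant is bespoke, so this sketch should not be expected to reproduce real DocC hashes.

```swift
/// Standard 32-bit FNV-1: multiply by the FNV prime, then XOR in the next byte.
func fnv1(_ bytes: [UInt8]) -> UInt32 {
    var hash: UInt32 = 2_166_136_261        // FNV-1 32-bit offset basis
    for byte in bytes {
        hash = hash &* 16_777_619           // FNV-1 32-bit prime, wrapping multiply
        hash ^= UInt32(byte)
    }
    return hash
}

/// XOR-folds a 32-bit value down to 24 bits (an assumed folding scheme).
func fold24(_ hash: UInt32) -> UInt32 {
    (hash >> 24) ^ (hash & 0x00FF_FFFF)
}

/// Lowercase base-36 without leading zeros.
func base36(_ value: UInt32) -> String {
    String(value, radix: 36)
}

/// Hypothetical end-to-end helper; `name` stands in for a symbol's ABI name.
func sketchHash(_ name: String) -> String {
    base36(fold24(fnv1(Array(name.utf8))))
}
```

Because the largest 24-bit value is `9zldr` in base-36, a hash written this way is never longer than five characters.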

It is theoretically possible for a hash to resemble a string such as func. Therefore, the swift. prefix is not redundant, but necessary to distinguish a hash from a phylum.

It is theoretically possible for a symbol link to use both disambiguation mechanisms, although this is almost never needed, because the hash disambiguator is almost always sufficient on its own.

DocC symbol links use the - character to denote the beginning of a disambiguation suffix. This was originally motivated by the desire to make symbol links usable as URLs, but makes the parsing of DocC symbol links challenging.

/// ``Sequence-swift.protocol``
/// ``Sequence/joined(separator:)-7w47r``
/// ``Sequence/joined(separator:)-swift.func-7w47r``

When encoded in URLs, this format composes badly with other URL features such as trailing slashes, and can create ambiguity of its own because - is also a valid Swift operator character.
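To make the parsing problem concrete, here is a minimal sketch that peels hyphen-prefixed suffixes off a single path component, working from the right. It follows the rules described above (a suffix starting with `swift.` is a phylum, anything else that looks like a short lowercase base-36 string is a hash) and is not DocC's actual parser. The names `splitLegacySuffixes` and `isHashLike` are hypothetical.

```swift
/// True if `text` looks like a lowercase base-36 hash of at most five digits.
func isHashLike(_ text: Substring) -> Bool {
    (1...5).contains(text.count)
        && text.allSatisfy { $0.isASCII && ($0.isNumber || $0.isLowercase) }
}

/// Hypothetical helper: strips trailing `-swift.kind` and `-hash` suffixes
/// from one path component, scanning from the right.
func splitLegacySuffixes(_ component: String) -> (name: String, phylum: String?, hash: String?) {
    var name = Substring(component)
    var phylum: String? = nil
    var hash: String? = nil

    while let dash = name.lastIndex(of: "-") {
        let suffix = name[name.index(after: dash)...]
        if phylum == nil, suffix.starts(with: "swift.") {
            phylum = String(suffix)
        } else if phylum == nil, hash == nil, isHashLike(suffix) {
            // A hash may only appear to the right of the phylum.
            hash = String(suffix)
        } else {
            // The hyphen belongs to the symbol name itself, e.g. an operator.
            break
        }
        name = name[..<dash]
    }
    return (String(name), phylum, hash)
}
```

The `else` branch is where the operator ambiguity shows up: the sketch can only guess that a hyphen is part of the name because what follows it does not look like a suffix.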

Codelink vectors

DocC symbol links have no support for link vectors. When successfully resolved, only the final component appears in the rendered documentation. Documentation writers often work around this limitation by concatenating multiple symbol links together, which is cumbersome and error-prone.

/// ``Unicode`` `.` ``Unicode/Scalar`` `.` ``Unicode/Scalar/value``

Codelink namespacing

DocC symbol link namespacing is explicit and denoted by a leading slash, which is followed by a module name.

/// ``/Swift/Int``

While this simplifies the symbol resolution algorithm, it can create fragility when developers move symbols between modules, or rename modules. It is also inconsistent with the implicit namespacing used in Swift source code.

Developers sometimes mistakenly qualify symbol links with the wrong module name, which is particularly common when modules re-export symbols from other modules, or when the module is a “hidden” module such as _Concurrency.

Unidoc codelinks

Because this document proposes a new Unidoc codelink format, we will refer to the existing Unidoc codelink format as the V3 format.

Like DocC codelinks, Unidoc V3 codelinks are encapsulated with two backticks, but they use the . character as the path separator. Here are some examples of Unidoc codelinks:

/// ``Int``
/// ``Array.count``
/// ``Unicode.Scalar``

Unidoc codelinks were designed without any consideration for URLs, and were intended to match the lexical format of Swift source identifiers as closely as possible.

Using . avoids some of the parsing ambiguities inherent in the / path separator. For example, the DocC symbol link ``UInt128//==(_:_:)`` might refer to the tree entity ["UInt128", "==(_:_:)"], or it might refer to the tree entity ["UInt128", "/==(_:_:)"].

Codelink disambiguation

Like DocC symbol links, Unidoc V3 codelinks can be disambiguated by phylum or by symbol hash.

Phylum disambiguation is denoted by prefixing the symbol name with swift keywords. Some examples include:

/// ``struct Int``
/// ``var Array.count``
/// ``static var Int.bitWidth``

This format is extremely handy when disambiguating enum cases, but it suffers from nasty edge cases when mixed with vector syntax. It also implies some strange syntax for subscripts, for example, class Foo.Bar.subscript(_:).

Unidoc V3 symbol hashes appear in brackets at the end of the codelink. For example:

/// ``Sequence.joined(separator:) [7W47R]``

The base-36 digits are case-insensitive.

Codelink vectors

All Unidoc V3 codelinks are vectors by default. Therefore, when you write ``Unicode.Scalar``, you actually get the vector ["Unicode", "Scalar"]. Since this is not always desirable, Unidoc V3 codelinks allow you to trim leading components from the vector by using the space character ( ) as a path separator.

/// ``Dictionary Keys.contains(_:)``

This causes problems when combined with phylum disambiguation, because the hidden component can resemble a keyword. For example, ``struct Int`` might refer to a struct named Int, or it might refer to a nested declaration named Int inside a type named struct.

To address this problem, Unidoc V3 codelinks use backticks to escape keywords. For example,

``struct Int``

refers to a struct named Int, while

`` `struct` Int``

refers to a declaration named Int inside a type named struct.

This syntax confuses markdown parsers, and is difficult to read. It also introduces some subtle inconsistencies with the way some contextual keywords such as actor behave in source code. The consequence of all this is a syntax that is deceptively intuitive, but is actually incredibly complex and hard to understand.

Codelink namespacing

Unidoc codelink namespacing is implicit, and behaves similarly to namespacing in Swift source code. For example, ``Int`` can refer to the Int type in the current module, while ``Swift.Int`` refers to the Int type in the Swift module. However, if there is no type named Int in the current module, then ``Int`` itself can be used to reference the standard library Int type.

Namespaces are module-level. Any module that is a dependency (direct or indirect) of the current module can contribute symbols to the current module’s namespace.

As in source code, there is no way to reference a module with the same name as a type in the current module.

Swiftinit URL format

The Swiftinit URL format is a DocC-like link format that uses the . character to discriminate the letter case of the final path component. This allows the format to be case-insensitive, while reducing the frequency of path collisions. Accordingly, Swiftinit can assign paths such as /dictionary/keys and /dictionary.keys to symbols that would otherwise require disambiguation.

Swiftinit hashes appear as query parameters; therefore the DocC symbol link ``Sequence/joined(separator:)-7w47r`` would be roughly equivalent to /sequence.joined(separator:)?hash=7W47R in the Swiftinit URL format. The Swiftinit URL format otherwise behaves similarly to the DocC symbol link format.

Proposed unified codelink format

The proposed unified codelink format is a superset of the DocC symbol link format. This allows developers to incrementally adopt advanced features while reducing the amount of reformatting needed to upgrade existing documentation.

Case sensitive

The unified codelink format is case-sensitive. This is consistent with the behavior of Swift identifiers, and helps us avoid the complexity of a case-insensitive grammar.

Path separator

The unified codelink format uses the . character to join vector links. The / character is also a valid path separator, but only components that appear after the last / will appear in the rendered documentation. This means that prior to the last /, the two path separators are interchangeable.

Codelink	Renders as
``Unicode.Scalar.value``	`Unicode.Scalar.value`
``Unicode/Scalar.value``	`Scalar.value`
``Unicode.Scalar/value``	`value`
``Unicode/Scalar/value``	`value`

Trailing slashes are forbidden.

Multiple consecutive path separators are forbidden. This prevents ambiguity with custom operators.

Codelink	Renders as	Parses as
``Real...(_:_:)``	`Real...(_:_:)`	`["Real", "..(_:_:)"]`
``Real/..(_:_:)``	`..(_:_:)`	`["Real", "..(_:_:)"]`
``Real....(_:_:)``	`Real....(_:_:)`	`["Real", "...(_:_:)"]`
``Real/...(_:_:)``	`...(_:_:)`	`["Real", "...(_:_:)"]`
``Real./(_:_:)``	`Real./(_:_:)`	`["Real", "/(_:_:)"]`
``Real//(_:_:)``	`/(_:_:)`	`["Real", "/(_:_:)"]`

If a path component begins with an operator character, all subsequent . characters are treated as part of the operator name.

Codelink	Renders as	Parses as
``Real../.(_:_:)``	`Real../.(_:_:)`	`["Real", "./.(_:_:)"]`
``Real/./.(_:_:)``	`./.(_:_:)`	`["Real", "./.(_:_:)"]`
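The separator rules in the tables above can be sketched as a small scanner. This is an illustrative reading of the draft, not the swift-unidoc implementation: the function name `parseCodelink` is hypothetical, leading-slash module links and bracketed disambiguators are not handled, and operator components are assumed to always carry a parenthesized argument list.

```swift
/// Hypothetical helper: splits a codelink into path components and computes
/// the text that would be rendered. Returns nil for links the sketch rejects.
func parseCodelink(_ link: String) -> (components: [String], rendered: String)? {
    let chars = Array(link)
    var components: [String] = []
    var firstVisible = 0    // index of the first component that is rendered
    var i = 0

    while i < chars.count {
        let start = i
        if chars[i].isLetter || chars[i].isNumber || chars[i] == "_" {
            // Identifier-like component: runs until the next path separator.
            while i < chars.count, chars[i] != ".", chars[i] != "/" { i += 1 }
        } else {
            // Operator component: every character up to and including the
            // closing parenthesis belongs to the name, even `.` and `/`.
            guard let close = chars[i...].firstIndex(of: ")") else { return nil }
            i = close + 1
        }
        components.append(String(chars[start ..< i]))

        guard i < chars.count else { break }
        guard chars[i] == "." || chars[i] == "/" else { return nil }
        // Only components after the last `/` are rendered.
        if chars[i] == "/" { firstVisible = components.count }
        i += 1
        // Trailing separators are forbidden in the draft as posted.
        if i == chars.count { return nil }
    }

    guard !components.isEmpty else { return nil }
    return (components, components[firstVisible...].joined(separator: "."))
}
```

For example, `parseCodelink("Real//(_:_:)")` yields the components `["Real", "/(_:_:)"]` and the rendered text `/(_:_:)`, matching the corresponding row of the table.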

Disambiguators

The unified codelink format supports both phylum and hash disambiguators. Both disambiguators appear in brackets at the end of the codelink. To prevent ambiguity, the hash disambiguator is always written in uppercase base-36.

Codelink	Disambiguator
``Fake [struct]``	`phylum = struct`
``Fake [STRUCT]``	`hash = STRUCT`

The space before the opening bracket is mandatory. Spaces can appear inside the brackets, if the corresponding swift keyphrase contains a space.

Codelink	Disambiguator
``Fake.max [class var]``	`phylum = class var`
``Fake.subscript [class subscript]``	`phylum = class subscript`
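A minimal sketch of how the bracketed suffix might be split off and classified, assuming only the rules stated so far: a mandatory space before the opening bracket, and uppercase base-36 for hashes. The type `Disambiguator` and the function `splitDisambiguator` are hypothetical names, and the sketch does not check that a phylum is one of the valid keyphrases.

```swift
enum Disambiguator: Equatable {
    case phylum(String)
    case hash(String)
}

/// Hypothetical helper: separates a trailing ` [...]` suffix from the path
/// and classifies it as a hash (uppercase base-36) or a phylum (anything else).
func splitDisambiguator(_ link: String) -> (path: String, filter: Disambiguator?) {
    guard link.last == "]",
          let open = link.lastIndex(of: "["),
          open > link.startIndex
    else { return (link, nil) }

    // The space before the opening bracket is mandatory.
    let space = link.index(before: open)
    guard link[space] == " " else { return (link, nil) }

    let inner = link[link.index(after: open) ..< link.index(before: link.endIndex)]
    guard !inner.isEmpty else { return (link, nil) }

    let path = String(link[..<space])
    let looksLikeHash = inner.allSatisfy { $0.isASCII && ($0.isNumber || $0.isUppercase) }
    let filter: Disambiguator = looksLikeHash ? .hash(String(inner)) : .phylum(String(inner))
    return (path, filter)
}
```

Since every phylum keyphrase contains lowercase letters and hashes never do, the case of the bracket contents is enough to tell the two apart.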

All keywords must be present; the [func] disambiguator always selects a global or instance function, and never a class or static function. Similarly, you cannot use [class] to select a class member of any phylum; it will only ever match a non-actor class type.

The [actor] disambiguator is the only disambiguator that can match an actor type.

There is no disambiguator for operators; operators always use one of the [func] or [static func] disambiguators.

Backticks are never used to escape keywords. Therefore, in rare cases the [subscript], [deinit], or [init] disambiguators may be needed despite the keyword also appearing in the symbol name.

Codelink	Disambiguator
``Fake.subscript [subscript]``	`phylum = subscript`
``Fake.subscript [case]``	`phylum = case`
``Fake.init [init]``	`phylum = init`
``Fake.init [case]``	`phylum = case`

To aid searchability, the [let] disambiguator is forbidden; all such properties use the [var] disambiguator instead.

Backwards compatibility

For backwards compatibility with DocC, the unified codelink format also supports hyphen-prefixed disambiguators.

The legacy disambiguators behave the same way they do in DocC. This means some legacy disambiguators express filters that do not exist among the modern disambiguators:

Legacy disambiguator	Modern equivalent
`-swift.associatedtype`	`[associatedtype]`
`-swift.enum`	`[enum]`
`-swift.enum.case`	`[case]`
`-swift.class`	no equivalent
`-swift.func`	no equivalent
`-swift.func.op`	no equivalent
`-swift.var`	no equivalent
`-swift.deinit`	`[deinit]`
`-swift.init`	`[init]`
`-swift.method`	no equivalent
`-swift.property`	no equivalent
`-swift.subscript`	`[subscript]`
`-swift.macro`	`[macro]`
`-swift.protocol`	`[protocol]`
`-swift.struct`	`[struct]`
`-swift.typealias`	`[typealias]`
`-swift.type.method`	no equivalent
`-swift.type.property`	no equivalent
`-swift.type.subscript`	no equivalent

Trailing parentheses

The unified codelink format always ignores empty trailing parentheses. This means it is possible to refer to a property named x with a codelink spelled ``x()``, even though it could never be called that way in source code.

Namespacing

The unified codelink format uses implicit namespacing, and behaves similarly to namespacing in Swift source code. However, unlike Swift source code, it also supports explicit namespacing using the / prefix.

A / character followed by a single identifier is treated as a module name, and resolves to module-level documentation, if any exists.

A unified codelink cannot start with multiple consecutive / characters. It is possible to force the appearance of the module name in a vector link by using the . separator.

Codelink	Renders as
``/Swift/Int``	`Int`
``/Swift.Int``	`Swift.Int`

filip-sakel · October 29, 2023, 3:24pm

What’s the rationale for this? It seems confusing to refer to a property as x().

Overall great proposal! I’m really excited to be able to disambiguate symbols more easily!

taylorswift · October 29, 2023, 9:45pm

there is no rationale for this, it is just a potentially-unexpected consequence of allowing empty trailing parentheses in general. the preferred spelling of such codelinks is still ``x``, and i expect most style guides would prohibit the ``x()`` spelling.

prohibiting empty trailing parentheses for properties at the parser level means that empty trailing parentheses would effectively create a new type of implicit disambiguation filter, which cannot be expressed the normal way using keywords, and also cannot be expressed using any of the legacy DocC disambiguators.

keep in mind that there are already ways to write codelinks that form expressions that couldn’t possibly exist in source code. for example, you can write ``Foo.subscript(_:)`` to reference a subscript, but you cannot actually call a subscript that way.

taylorswift · November 3, 2023, 3:10am

hi all, there is now an implementation of the proposed link format available in swift-unidoc 0.3.14. this release also includes new resources for how to preview multi-target documentation locally.

the proposal has also been updated with future directions.

Future directions

Codelinks to overload families

There is some interest in enabling the codelink format to refer to an entire overload family, rather than a single declaration. This would obviate the need to use hashes in many situations.

We could extend the proposed codelink format to support an explicit syntax for referencing an overload family as a whole. For example, we could use the * character within disambiguation brackets.

/// ``Sequence.joined(separator:) [*]``

ronnqvist · November 14, 2023, 3:04pm

I couldn't attend the Documentation Workgroup meeting where this was discussed but from what I've been told there was general agreement that these individual pieces would be better discussed as separate pitches. I agree with that assessment so I'll reply to each piece as it relates to the DocC link syntax today.

I'll add a description of the DocC link syntax at the end for additional context and to highlight some features that are relevant to these proposed link syntax changes.

Path separators

It feels like a nice refinement to parse consecutive path separators as a single separator and include the subsequent separators in the name of the following path component. This avoids the need to escape the path separator in the most common cases, for example when linking to Swift's division operator.

It's still possible that a separator character occurs in the middle of a function or operator name. For example, the symbol name for this custom operator would be +/-(_:_:) which DocC can't link to today and the proposed syntax doesn't solve. I consider this to be a bug that would best be solved by supporting escaped separators (for example using a backslash).

public extension Int {
    static func +/- (lhs: Int, rhs: Int) -> (added: Int, subtracted: Int) { ... }
}

Disambiguation

Adding support for disambiguation written as SymbolName [HASH] in addition to SymbolName-hash feels like a reasonable change that could make it easier for developers to utilize multiple documentation tools in parallel.

However, using Swift specific keywords instead of symbol kind identifiers defined in Symbol Kit may prove problematic for links in other languages.

It's not clear to me what's meant by "all keywords must be present". Is this referring to all keywords in the declaration of that symbol, or to the minimal set of Swift keywords necessary for that type of declaration? Needing to specify all keywords from the declaration of that symbol poses two problems:

These Objective-C method declarations contain no keywords that can be used to disambiguate the instance method from the class method:

@interface MyClass : NSObject
/// An instance method.
- (id)something;
/// A class method.
+ (id)something;
@end

This Swift declaration contains 5-7 keywords (depending on if you include "public" access modifier and the "throws" keyword that applies to the initializer's parameter) and the leading keywords can be ordered in many different ways:

public class Something {
    public required nonisolated convenience init(_ perform: () throws -> Void) async rethrows { ... }
}

On the other hand, mapping Symbol Kit symbol kind identifiers into other identifiers (for example, using "var" for both properties and instance variables) may have issues in languages other than Swift. This Objective-C class definition has both an instance property and an instance variable called "name". If both these have the same kind identifier they instead need to be disambiguated by their hashes.

@interface Person : NSObject {
    NSString *name; // a variable
}
@property(copy) NSString *name; // a property
@end

As DocC adds support for additional languages we may run into more of these problems, where a language has collisions that aren't covered by the reduced set of symbol kind identifiers.

Case sensitive

This has no impact on DocC.

Trailing parenthesis

I'm not sure what problem this is solving. From my perspective it just adds ambiguity to links that wouldn't otherwise be ambiguous. For example, an Objective-C class with a property and a method, or a Swift type with a property and a static function, both result in descendant path components named something and something():

@interface MyClass : NSObject
@property NSInteger something; // a property
- (void)something:(NSInteger)argument; // a method
@end

public struct MyStruct {
    public var something: Int
    public static func something() {}
}

Assuming the exact spelling is preferred over the extra trailing parenthesis, this could result in a situation where

The property is added first and some links to the property are written with trailing empty parenthesis. These links resolve without warning.
The method is added later. Now the links with trailing parenthesis that used to resolve to the property resolve to the method instead. There is no warning highlighting this issue to the developer.

This is different from how symbol overloads and other link collisions work, where adding a new symbol that results in a collision produces a warning requiring links to be updated to unambiguously refer to a specific symbol.

Namespacing

The main reason for requiring a leading slash to refer to a symbol in another module in DocC is about clarity while reading the raw link in source. For example, imagine that this namespacing was implicit and you encountered these two links in some documentation markup:

Swift/String
Swift/Sequence/partitioned(by:)

Reasoning about what symbols these two links refer to becomes harder than it may seem at first.

If the current module has one or more public^[1] extensions to Swift's String type then Swift/String refers to the local page listing the extensions that the current module adds to Swift's String type. It's also possible, but unlikely, that the current module has a local symbol named "Swift" with a nested symbol called "String". If the current module has both a local "Swift/String" symbol and extends Swift's String type this link would be ambiguous, requiring "-struct" or "-struct.extension" disambiguation to uniquely refer to either the local type or the extension page.

If the current module has a dependency on Swift Algorithms then the second link refers to the partitioned(by:) function added to Swift's Sequence protocol in a public extension in Swift Algorithms (assuming the current module doesn't have such an extension).

We like to avoid a situation where a link resolves to one symbol without warnings and then, after some project changes, refers to another symbol without warning.

DocC could support implicit namespacing and handle collisions with external symbols the same as collisions within the current module, requiring "-struct" or "-struct.extension" disambiguation. This would solve the issue of ambiguous links but wouldn't help the developer reason about what symbol a link refers to.

It may also require the developer to disambiguate links like String-struct or Collection-protocol if their project also extends those types, or links like MyClass-2b5dq if one of their project's dependencies also has a public class with the same name (since both are classes they need to be disambiguated by hashes).

I personally like that I can look at a link in DocC without a leading slash and know that it refers to some symbol in the current module.

Vector links

I haven't heard the term "vector link" before and couldn't find any DocC feature requests describing the type of issues that it aims to solve but after doing a bit of research on the issue I find this to be an interesting problem with a few possible alternatives to consider.

I searched GitHub for occurrences of concatenated symbol links and found 6 projects doing this: "Actomaton", "JivoSDK", "KeyboardKit", "MIDIKit", "react-native-custom-keyboard", and "SpotifyAPI".

Some of these projects use concatenated symbol links where each link adds another component to the previous link, for example:

``KeyboardContext``.``KeyboardContext/preview``
``Effect``.``Effect/init(id:sequence:)``
``SpotifyAPILogHandler``.``SpotifyAPILogHandler/bootstrap()``

Some instead concatenate distinct symbol links of properties where later links refer to a member of the previous property's type, for example:

``MIDIManager/endpoints``.``MIDIEndpoints/outputs``
``Jivo``.``Jivo/session``.``JVSessionController/shutDown()``

If the goal is to create a single link that displays multiple path components, developers can accomplish that today using []() syntax, for example [MyClass.myProperty](doc:MyClass/myProperty). This isn't great since parts of the link are repeated, but it is flexible in that the text can be freely customized, for example: [The myProperty property on MyClass](doc:MyClass/myProperty).

One alternative that could be interesting to discuss would be the ability to globally customize how much of a link should be displayed on the page and/or what the default should be. I could see the argument for displaying the containing type name for properties, methods, enum cases etc. unless the link is from the containing type's scope.

If we define a new syntax for customizing how a symbol link displays on the page I would like to see if there are syntax alternatives that can accommodate other types of link display configuration as well. I think there's a lot to explore here. For example, one feature that we want to support in DocC but don't know the right syntax for is "inactive links": resolving the link to get the correct symbol name and warning if the link doesn't resolve, but rendering the symbol name in "code voice" without making it a clickable link. This is different from directly putting the symbol in code voice because it allows the symbol to display its language-specific name when switching between the Swift and Objective-C versions of the page that contains the inactive link. I could also imagine cases when it'd be nice to link to a function and display its name but truncate its arguments.

Linking to overload groups

The Improving the presentation of overloaded symbols in Swift DocC proposal suggested using links with no disambiguation (or only symbol kind disambiguation where necessary) for overload pages. For example, this class would have two overload pages that can be linked to using ``something()-method`` and ``something()-type.method`` respectively.

public class Something {
    public func something() -> Int { 0 }
    public func something() -> String { "" }

    public static func something() -> Int { 0 }
    public static func something() -> String { "" }
}

I don't see a need to add additional syntax for links that are already unambiguous.

For some of these changes where there's more to discuss it may be easier to create new threads to continue talking about each piece separately instead of having multiple ongoing conversations in the same thread.

How links work in DocC

The DocC link syntax is documented both in the Link to Symbols and Other Content section of the DocC documentation about its documentation markup (aimed towards people using DocC) and in the Linking Between Documentation page of the SwiftDocC framework documentation (aimed towards SwiftDocC contributors) so I'll try to not repeat too much about what's already documented in those places.

DocC supports two types of documentation links:

Symbol links; a symbol path surrounded by two grave accents on each side: ``MyClass/myProperty``
General documentation links; markdown links with a "doc" scheme: <doc:MyArticle> or <doc:MyClass/myProperty>.

Symbol links can only link to symbols but general documentation links can link to all types of documentation content: symbols, articles, and tutorials. Both symbol links and general documentation links use a "path" in the documentation hierarchy, using forward slashes ("/") to separate each path component. Symbol links only consist of a "path" but general documentation links can also include a URI fragment to reference an on-page element (for example to reference tutorial sections or article headings) and a URI host which is almost never used.

doc://com.example/path/to/documentation/page#optional-fragment
      ╰────┬────╯╰────────────┬────────────╯ ╰───────┬───────╯
       bundle ID    path in docs hierarchy    on-page element
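The three labeled parts map directly onto ordinary URL components, which the following minimal sketch extracts with Foundation's `URLComponents`. It only illustrates the anatomy of the link; it is not how DocC resolves links, and the helper name `splitDocLink` is hypothetical.

```swift
import Foundation

/// Hypothetical helper: splits a `doc:` link into the three parts labeled in
/// the diagram. Returns nil if the string is not a `doc:` link.
func splitDocLink(_ link: String) -> (bundleID: String?, path: String, fragment: String?)? {
    guard let parts = URLComponents(string: link), parts.scheme == "doc" else {
        return nil
    }
    // host     -> bundle ID (almost never written in practice)
    // path     -> path in the documentation hierarchy
    // fragment -> on-page element, if any
    return (parts.host, parts.path, parts.fragment)
}

let full = splitDocLink("doc://com.example/path/to/documentation/page#optional-fragment")
```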

This means that there are a few different syntax alternatives for referencing a symbol in DocC:

``MyClass/myProperty``
<doc:MyClass/myProperty>
[](doc:MyClass/myProperty)
[Arbitrary text](doc:MyClass/myProperty)

Symbol links can't resolve on-page elements so if a developer writes ``MyClass/myProperty#Name-of-some-heading`` they'll get a warning with a fixit to use a <doc:> style link instead.

The link syntax in DocC isn't specific to Swift. Any language that can describe its symbols and their relationships in a symbol graph file can be used in DocC. In addition to the Swift compiler emitting symbol graph files, Clang is capable of emitting symbol graph files for C and Objective-C code.

DocC uses the symbol spelling for each language based on the data in the symbol graph files. In this example an Objective-C class links to its instance method using a symbol:

/// ``doSomethingWithFirst:second:``
@interface MyClass : NSObject
- (void)doSomethingWithFirst:(NSString *)first
                      second:(NSString *)second;
@end

Symbols that have representations in multiple languages can use either language's symbol spelling (although the spellings need to be consistent throughout the path). Regardless of which language's symbol spelling is used in the link, the rendered page will display the name of the symbol in the source language that the page is being displayed in. In this example a Swift class with custom Objective-C names links to its instance method using both the Swift spelling and the Objective-C spelling:

/// ``MyClass/doSomething(with:and:)``
/// ``TLAMyClass/doSomethingWithText:andNumber:``
@objc(TLAMyClass)
public class MyClass: NSObject {
    @objc(doSomethingWithText:andNumber:)
    public func doSomething(with text: String, and number: Int) -> Bool { ... }
}

If a link could ambiguously refer to more than one page, DocC needs additional disambiguation to make the link unique.
Disambiguation can be added at any path component that makes the link unique and is added to the end of that path component separated by a dash ("-"). Multiple disambiguation suffixes (and redundant disambiguation in general) are supported.

DocC currently supports two disambiguation kinds: "symbol kind" and "symbol hash" disambiguation. Two more disambiguation alternatives ("return type(s)" and "parameter types") have been pitched but are still being implemented.

If a symbol has a different type from the other symbols with the same symbol path, you can use that symbol’s type suffix to disambiguate the link and make the link refer to that symbol. Symbol kind disambiguation can include a source language identifier prefix but it's not needed.

/// ``red-property`` 
/// ``red-type.property``
public struct Color {
    public var red, green, blue: Double

    public static let red = Color(red: 1.0, green: 0.0, blue: 0.0)
}

If the colliding symbols are of the same kind the link needs to be disambiguated with a symbol hash instead. The symbol hash is a folded FNV-1 hash of the symbol's unique identifier, as already explained in the original post.

In the extremely unlikely case where a symbol needs to be disambiguated by its symbol hash and that symbol's hash disambiguation spells out one of the 4 or 5 letter symbol kind disambiguations, the parsing ambiguity can be resolved by disambiguating with both a symbol kind and symbol hash. For example, if someFunction() was overloaded and the hash for one of the overloads was "enum" that overload could unambiguously be referenced using ("-func" because the symbol is a function and "-enum" because it's that symbol's hash)

/// ``SomeClass/someFunction()-func-enum``

Sometimes it's preferable to disambiguate at an earlier path component to use a symbol kind disambiguation instead of a hash disambiguation. In this example both symbol links refer to the same symbol but only the first link can be understood by a human.

@protocol Something <NSObject>
- (void)something;
@end

@interface Something : NSObject<Something>
- (void)something;
@end

/// ``Something-class/something``
/// ``Something/something-4f2sm``
@interface SomeOtherSymbol : NSObject
@end

Links in DocC are made to resemble URLs for their familiarity to developers but a link in DocC is not the same as the web URL to that page. You can link to the less-than operator in Swift using /Swift/Comparable/<(_:_:) but the path of its web URL is /documentation/swift/comparable/_(_:_:)-9jp4d. Similarly, a symbol with representations in multiple languages can be linked to using either language's spelling (for example doSomething(with:and:) or doSomethingWithText:andNumber:) but the path of that page's web URL is the same (preferring the Swift spelling in the web URL) regardless of which language's spelling the developer used in the link.

[1] Depending on the access level of symbols included in the symbol graph files this could also include extensions of other access levels, mainly internal extensions.

taylorswift · November 15, 2023, 4:34am

hi David, thanks for taking the time to review the draft.

this is still unambiguous, because operator functions always have at least one argument. therefore, the grammar always expects a parenthesized argument list, and the parser can scan through all path separator characters until it encounters the parentheses.

note that there is already an implementation of the parser, which does not require any escape sequences to be added to the grammar.

i think that all keywords must be present was a poor wording, it should probably be replaced with: the phylum keyword (actor, struct, enum, func, var, etc.) must be present, and additionally any prefixed class or static modifier must be present.

here is an exhaustive list of valid patterns:

enum Filter:Substring
{
    case  actor             = "actor"
    case `associatedtype`   = "associatedtype"
    case `enum`             = "enum"
    case `case`             = "case"
    case `class`            = "class"
    case  class_func        = "class func"
    case  class_subscript   = "class subscript"
    case  class_var         = "class var"
    case `deinit`           = "deinit"
    case `func`             = "func"
    case `init`             = "init"
    case  macro             = "macro"
    case `protocol`         = "protocol"
    case  static_func       = "static func"
    case  static_subscript  = "static subscript"
    case  static_var        = "static var"
    case `struct`           = "struct"
    case `subscript`        = "subscript"
    case `typealias`        = "typealias"
    case `var`              = "var"
}

it’s important to remember that Symbol Kit kindnames aren’t going away, it is still going to be valid to use

``x-swift.func.op``

it’s just not going to be the preferred way to disambiguate swift symbols. there are a lot of things DocC can do that aren’t limited by the specification.

DocC could continue recognizing the Symbol Kit suffixes, and extend the basic suffix syntax to support other -x.y.z patterns.
DocC could extend the bracketed syntax to support something like [objc: property], or something different.

the new link syntax uses swift keyword patterns because those are likely to be more recognizable to new users of swift, and those who are not used to the Xcode ecosystem.

you are right, i think it is reasonable to make the trailing parentheses significant.

these are good points, and i think it might be a good idea to defer the link resolution stuff to a separate proposal. i’ll update the draft to exclude discussion of changes to the link resolution algorithm. however, since there seems to be agreement that leading slash syntax should be part of the grammar, let’s keep that in.

fyi, SwiftSyntax uses this pattern extensively.

this isn’t quite the same, because the entire link is still one scalar link that points to myProperty and happens to have MyClass.myProperty in its link text. probably what the author was going for was to have the MyClass component of the link point to the page for MyClass.

linking to overload groups isn’t actually part of the draft, it is just mentioned as a possible future direction. if it is not something we are interested in, we can simply remove it from the draft entirely.

taylorswift · May 16, 2024, 10:55pm

i rewrote this proposal as two proposals, added additional examples, including “live examples” within the proposals themselves, and also extracted the expositional material into a third document here:

the only thing that changed in the implementation itself is that trailing slashes are now allowed.

ronnqvist · May 21, 2024, 11:22am

Please post those two proposals here in the Swift forums so that there is a thread to talk about each proposal.