(SymbolGraphGen) can we make `isImplicitlyPrivate` checking configurable?

the SymbolGraphGen tool currently offers no way to enable cataloguing of unconditionally-unavailable symbols or underscored standard library symbols. (this check is done by `isImplicitlyPrivate` in SymbolGraph.cpp.)

one implication of this is that sometimes we get holes in the generated symbolgraphs where edges are referencing USRs that the symbolgraph generator omitted. pruning edges that have one endpoint referencing an omitted symbol isn’t a solution, because a lot of standard library types use underscored protocols to vend public (“synthesized”) APIs on public types.
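to make the "holes" concrete, here is a minimal sketch of how a consumer could detect them. it assumes the usual symbol graph JSON layout (a `symbols` array whose entries carry `identifier.precise`, and a `relationships` array with `source`, `target` and `kind`); the type names, the `danglingEdges` helper and the sample USRs are all made up for illustration. a real tool would also need to load the graphs of dependency modules first, since an edge can legitimately point into another module.

```swift
import Foundation

// Minimal subset of the symbol graph JSON layout: only the fields used below.
struct SymbolGraph: Decodable {
    struct Symbol: Decodable {
        struct Identifier: Decodable { let precise: String }
        let identifier: Identifier
    }
    struct Relationship: Decodable {
        let source: String
        let target: String
        let kind: String
    }
    let symbols: [Symbol]
    let relationships: [Relationship]
}

/// Returns every edge whose source or target USR is absent from `graph.symbols`.
func danglingEdges(in graph: SymbolGraph) -> [SymbolGraph.Relationship] {
    let known = Set(graph.symbols.map { $0.identifier.precise })
    return graph.relationships.filter {
        !known.contains($0.source) || !known.contains($0.target)
    }
}

// Hypothetical sample: a public type conforming to an underscored protocol
// that the generator omitted from `symbols`.
let sampleJSON = """
{
  "symbols": [
    { "identifier": { "precise": "s:PublicType" } }
  ],
  "relationships": [
    { "source": "s:PublicType", "target": "s:_HiddenProtocol", "kind": "conformsTo" }
  ]
}
"""
let sampleGraph = try! JSONDecoder().decode(SymbolGraph.self, from: Data(sampleJSON.utf8))
let holes = danglingEdges(in: sampleGraph)
```

dropping everything in `holes` is exactly the pruning that doesn't work here, because the conformance to the hidden protocol is what explains where the public synthesized members come from.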

from an educational standpoint, it’s also sometimes valuable to catalog symbols that are unconditionally unavailable everywhere, if only to inform API users that the symbol has moved.

could we get an option to opt-out of the isImplicitlyPrivate checking? flagging implicitly private symbols is often better done in a higher layer of tooling.
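as a rough sketch of what "a higher layer" could look like: if the generator emitted everything, a consumer could apply its own filter along these lines. this is only an approximation of the underscore convention, not a port of the compiler's `isImplicitlyPrivate` logic, and the function name and inputs are hypothetical.

```swift
/// Treats a symbol as implicitly private if any component of its path
/// starts with an underscore. Approximation only; the compiler's own
/// check considers more than the name.
func looksImplicitlyPrivate(pathComponents: [String]) -> Bool {
    pathComponents.contains { $0.hasPrefix("_") }
}

// Hypothetical inputs, shaped like the `pathComponents` of a symbol.
let hiddenMember = looksImplicitlyPrivate(pathComponents: ["Array", "_customIndex"])
let hiddenParent = looksImplicitlyPrivate(pathComponents: ["_Hidden", "visible"])
let publicMember = looksImplicitlyPrivate(pathComponents: ["Array", "count"])
```

with this in the consumer, the tool can keep hidden symbols in its data model and simply decline to render them.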

The intent for prefixing an API name with an underscore is usually to indicate that the API shouldn't be used by clients (some IDEs don't surface them as part of code completion for instance), so personally I don't think they're relevant by default in documentation. However, I'm all for extra customization to opt-out of this behavior for symbol graph files, including for using SGFs for non-documentation purposes. Would be super helpful to have a ticket for this to make sure we don't lose track of it! If you're interested in adding that support, @QuietMisdreavus and I would be more than happy to provide some pointers.

they are not relevant in documentation, but SymbolGraphGen isn’t a documentation generator, it’s a serializer whose outputs documentation generators use to generate documentation. the holes in the symbol graphs are a major problem for such tooling (including swift-biome) because they impede certain optimizations and break invariants that the tooling relies on in order to serve large amounts of documentation at scale. although the end consumer of documentation will never see the hidden APIs, they are important for the tooling’s internal data model.

for example, swift-biome currently de-duplicates synthesized members against their generic base, since not doing so leads to quadratic explosion of synthesized symbols when protocols contain many extension members (like Sequence) and also have many conformers (like Sequence). this is part of the reason why SwiftSyntax’s symbol graph is 2 orders of magnitude larger than a typical library. however, this optimization is not possible if swift-biome doesn’t know about the generic base in the first place, because it was part of an underscored API.
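to illustrate the arithmetic (this is a toy sketch, not swift-biome's actual implementation): with M extension members and N conformers, the graph carries M × N synthesized symbols, but only M distinct generic bases. the sketch assumes synthesized USRs have the form `<base USR>::SYNTHESIZED::<conformer USR>`; the USRs and the `genericBase` helper below are invented for the example.

```swift
import Foundation

// Assumed separator between the generic base and the conforming type.
let separator = "::SYNTHESIZED::"

/// Extracts the generic base USR from a synthesized USR, or nil if the
/// USR is not synthesized.
func genericBase(ofSynthesized usr: String) -> String? {
    guard let range = usr.range(of: separator) else { return nil }
    return String(usr[..<range.lowerBound])
}

// Hypothetical USRs: 3 extension members, 2 conformers.
let members = ["s:Sequence.map", "s:Sequence.filter", "s:Sequence.reduce"]
let conformers = ["s:Array", "s:Set"]

// What the generator emits: one synthesized symbol per (member, conformer) pair.
let synthesized = conformers.flatMap { conformer in
    members.map { "\($0)\(separator)\(conformer)" }
}

// What needs to be stored after de-duplication: one entry per generic base.
let uniqueBases = Set(synthesized.compactMap(genericBase(ofSynthesized:)))
```

here `synthesized` has 6 entries and `uniqueBases` has 3. the lookup from a synthesized symbol to its base only helps if the base itself appears in the graph, which is the part that breaks when the base is underscored.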

yeah, i can implement it in a PR, i was just asking here to see if this was a change that has a good chance of getting accepted.

i don’t know where the issue tracker for SymbolGraphGen lives. should i just file it on bugs.swift.org as a compiler issue?

A new flag or flags to opt out of this behaviour sounds reasonable to me. Maybe something along the lines of `-include-implicitly-private-symbols` and `-include-unconditionally-unavailable-symbols`?

And yes, bugs.swift.org is where we track symbol graph bugs.

FWIW I just stumbled upon this in TSPL:

Treat identifiers that begin with an underscore as internal, even if their declaration has the public access-level modifier. This convention lets framework authors mark part of an API that clients must not interact with or depend on, even though some limitation requires the declaration to be public. In addition, identifiers that begin with two underscores are reserved for the Swift compiler and standard library.

…which I think solidifies the argument for not including underscored symbols in SGFs by default when extracting SGFs with public visibility. Nonetheless, having this behavior configurable is still useful.
