I meant going in before the dump code in the compiler and working with the source code information in memory, from within the compiler, as opposed to getting the compiler to take that information and serialize it so that we can deserialize it in a custom program; or perhaps serializing it from the dump command in a custom, more intuitive format.
Everyone else is already being helpful but I'm amused enough by being able to answer "does your position here stand" with "well, yes, but I'm no longer standing at it because I left Apple" that I'll reply as well. My useful contribution is that it would be reasonable for the AST in the compiler to be the most reliable source of information about the source code, but in practice it is not that; it is the most reliable source of information the compiler needs from the source code. The implementation is used by both the compiler and SourceKit and so it's going to be fairly complete, but it might not be stored in one place, or only computed lazily, or reused between phases of the compiler with different semantics, or with important details "missing" because they would increase memory usage or slow down compilation if the user hasn't specifically asked for them, or even just not broken out because they're not important for what the compiler needs. They're not meant for consumption elsewhere.
(Years ago we got someone complaining that what we have is more of a concrete syntax tree than an "abstract" one, and, well, they're probably right. That's what happens when a fuzzy name sticks early on in development and an academic concept gets used as shorthand.)
I realize this doesn't help you and probably won't dissuade you from whatever you're trying to accomplish. I'll defer to everyone else for that.
For what it's worth, I still stand by that decision too. Consuming the AST Dump worked after I contributed several bug fixes to the Swift compiler, but even then it required a lot of effort to adjust to every new Swift version. It's been years since I've used any of it, but I'd probably go with SwiftSyntax for the main structure and SourceKit for the type information if I had to write it again.
I suppose that's what I do: consume an in-memory dump of the AST (from the compiler) and then pass the output into a tokenizer that is language-agnostic (I currently support ObjC and Swift but could add any number of languages, based on file extension). This tokenizer uses a generic structure and, while I don't care about most of the AST output (for now), I snatch up the source locations of the most important declarations, like classes, methods, functions, and enums (and then some). I track ancestry for the graphic layout of the classes and for quick perusal in my object browser editor.
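As a rough illustration of what such a language-agnostic record could look like, here is a minimal sketch. Every name in it (`SymbolKind`, `SymbolRecord`, `ancestry(of:in:)`) is hypothetical and is not taken from the app described above; it only assumes that each record keeps a kind, a source location, and an optional superclass name.

```swift
/// Hypothetical declaration kinds that a language-agnostic tokenizer might keep.
enum SymbolKind {
    case classDecl, method, function, enumDecl
}

/// Hypothetical record: only the name, kind, location, and parent are stored.
struct SymbolRecord {
    let name: String
    let kind: SymbolKind
    let file: String
    let line: Int
    let superclass: String?
}

/// Walks the `superclass` links upward and returns the chain of ancestors,
/// nearest first. A visited set guards against malformed (cyclic) input.
func ancestry(of name: String, in symbols: [String: SymbolRecord]) -> [String] {
    var chain: [String] = []
    var visited: Set<String> = [name]
    var current = symbols[name]?.superclass
    while let parent = current, visited.insert(parent).inserted {
        chain.append(parent)
        current = symbols[parent]?.superclass
    }
    return chain
}
```

An ancestor that has no record of its own (for example, a class from a framework) still appears in the chain; the walk simply stops there.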
Otherwise, having this app constantly scan the source on any user activity would kill a laptop battery in no time. :-)
That holds even though re-scanning a source file is super fast, even on an M1 MBA.
You inspired me to remove the broken behavior of `subst()` with `GenericFunctionType` and fix the few remaining callers that rely on it, which is something I've wanted to do ever since I discovered that code path exists: Clean up GenericFunctionType substitution by slavapestov · Pull Request #80301 · swiftlang/swift · GitHub.
This also revealed the crash location which I think you were talking about with parameter packs, where `mangleTypeAsUSR()` calls `getTypeForDWARFMangling()`, which had an unnecessary use of `subst()`. This was totally my fault for introducing it in the first place. What happened is that your AST dumper is passing a `GenericFunctionType` to `mangleTypeAsUSR()`, which is something that hadn't been done before. This exercised the broken code path. You might want to revisit your examples and see if they're fixed once the above PR lands.
I saw that you just merged it and tried the old example. It's working now, thanks! The interface type for `g` in the code below is being reported as `$syyqd__c3lib4PackVyxxQp_QPGcD` (that is, `(lib.Pack<Pack{repeat A}>) -> (A1) -> ()`), which looks right to me.
struct Pack<each T> {
  func f(_ t: repeat each T) {
    repeat g(each t)
  }
  func g<U>(_ t: U) {}
}
On the opaque type issue, I'm still seeing the conformance not being included when I pass the opaque archetype directly into `printTypeUSR` (I had to remove an overly aggressive assertion that bans all archetypes in that function), but I'll keep plugging away at it; I haven't had time to really sit down and debug it yet.
Where is this assertion? Types that contain opaque archetypes answer false to hasArchetype() which is the condition we usually check.
In USRGeneration.cpp, I trap on the linked `hasArchetype()` assertion when trying to generate the USR for a type corresponding to an `UnderlyingToOpaqueExpr` node on the line marked below:
protocol P {}
struct S: P {}
func baz<T>(_ x: T) {
  func foo() -> some P {
    return S() // <- here
  }
}
The type dumped right before the assertion:
(opaque_type address=0x130a0afc0 conforms_to="lib.(file).P@/Users/allevato/Scratch/jsonast/lib.swift:13:10" decl="lib.(file).baz(_:).foo()@/Users/allevato/Scratch/jsonast/lib.swift:17:8"
  (interface_type=generic_type_param_type depth=1 index=0 param_kind=type)
  (substitution_map generic_signature=<τ_0_0 where τ_0_0 : Copyable, τ_0_0 : Escapable>
    (substitution τ_0_0 ->
      (primary_archetype_type address=0x130a07580 conforms_to="Swift.(file).Copyable" conforms_to="Swift.(file).Escapable" name="T"
        (interface_type=generic_type_param_type depth=0 index=0 name="T" param_kind=type)))
    (conformance type="τ_0_0"
      (abstract_conformance protocol="Copyable"))
    (conformance type="τ_0_0"
      (abstract_conformance protocol="Escapable"))))
I guess this isn't directly because of the opaque type, but because the substitution map refers to the `T` from the parent context? EDIT: Yes, if I remove `baz<T>` and just leave `foo`, it doesn't assert. But I still need to figure out how to handle both cases.
Ah, this isn't about opaque archetypes at all. That type contains the primary archetype T, and you will get the same crash if you try to mangle any type that contains a primary archetype, like `Array<T>`. You can use TypeBase::mapTypeOutOfContext() to replace primary archetypes with type parameters before mangling.
Thanks, I can confirm that works (and is probably more sound than the ad hoc `transformRec` I'm currently doing to swap out archetypes with interface types).
But just to jump back to the original question about the opaque type, even now that I'm not converting anything needlessly to interface types, when I call `printTypeUSR` to mangle `some P`, that mangling doesn't appear to reference `P` anywhere. Should it? What I end up with is:
"$s3lib3fooQryFQOyQo_D" -> "<<opaque return type of lib.foo() -> some>>.0"
The unpaired `some` at the end is what bothers me. I can track down `P` from the decl that owns it because it has an `opaque_result_decl` that lists the conformances, but it would be nice if I could use the type mangling as the sole source of information here. But if my expectations aren't correct, then we can build something that works around it, like a separate table that maps opaque type manglings back to the corresponding decl.
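The workaround table mentioned above could be as simple as a dictionary keyed by the opaque type's mangling. This is only a sketch: `OpaqueDeclInfo` and `OpaqueTypeTable` are hypothetical names, not part of the AST dumper or the compiler, and the stored fields are just one plausible choice.

```swift
/// Hypothetical record describing the decl that owns an opaque result type.
struct OpaqueDeclInfo: Equatable {
    let owningDecl: String      // e.g. the demangled name of the function
    let conformances: [String]  // protocols listed by its opaque_result_decl
}

/// Hypothetical side table: opaque type mangling -> owning decl information.
struct OpaqueTypeTable {
    private var entries: [String: OpaqueDeclInfo] = [:]

    init() {}

    /// Records (or overwrites) the entry for one opaque type mangling.
    mutating func record(typeMangling: String, info: OpaqueDeclInfo) {
        entries[typeMangling] = info
    }

    /// Returns the owning decl information, or nil if the mangling is unknown.
    func lookup(typeMangling: String) -> OpaqueDeclInfo? {
        entries[typeMangling]
    }
}
```

An analyzer that is handed `$s3lib3fooQryFQOyQo_D` as the type of some expression could then look up the entry to recover that the opaque type belongs to `lib.foo()` and conforms to `P`.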
I think the "sole source of information" that you're after here is the generic signature of an opaque result declaration (`OpaqueTypeDecl::getOpaqueInterfaceGenericSignature()`).
If I write
func foo<T>(_: T) -> (some Sequence<T>, some Any) {...}
Then the function declaration foo() also has an associated opaque result declaration which it points to, and vice versa. You can visit it while dumping `foo()` itself. Alternatively, every `SourceFile` has a table that maps those mangled names to opaque result declarations; you can dump this table instead. (This is how we resolve `@_opaqueResultOf` in a module interface, for example.)
The generic signature of `foo()` itself is `<T>`, but the generic signature of its associated opaque result declaration adds two new generic parameters:
<T, R0, R1 where R0: Sequence, R0.Element == T>
The actual return type of `foo()` is a tuple type that contains two opaque archetypes. Both archetypes refer to the same opaque result declaration. The first archetype's type parameter is `R0`, and the second archetype's type parameter is `R1`. Note that `R1` is unconstrained in the generic signature.
Right, we dump that currently alongside the function that declares that opaque type:
{
  "_kind": "func_decl",
  "usr": "s:3lib3fooyQr_QR_txlF",
  ...
  "result": "$s3lib3fooyQr_QR_txlFQOyxQo__AaByQr_QR_txlFQOyxQo0_tD",
  "opaque_result_decl": {
    "_kind": "opaque_type",
    ...
    "declared_interface_type": "$s3lib3fooyQr_QR_txlFQOyxQo__AaByQr_QR_txlFQOyxQo0_tD",
    "generic_params": [
      "$sxD",
      "$sqd__D",
      "$sqd_0_D"
    ],
    "reqs": [
      {
        "_kind": "requirement",
        "first_type": "$sxD",
        "kind": "same_type",
        "second_type": "$s7ElementSTQyd__D"
      },
      {
        "_kind": "requirement",
        "first_type": "$sqd__D",
        "kind": "conforms_to",
        "second_type": "$sST_pD"
      },
      {
        "_kind": "requirement",
        "first_type": "$sqd_0_D",
        "kind": "conforms_to",
        "second_type": "$sypD"
      },
      {
        "_kind": "requirement",
        "first_type": "$sqd_0_D",
        "kind": "conforms_to",
        "second_type": "$sypD"
      }
    ]
  },
So I think I have everything I need, as long as I correctly map the depths/indices back to the right generic parameters. (The `conforms_to` requirements here have the issue of collapsing invertible protocols, which I still need to fix.)
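For anyone consuming this output, here is a minimal sketch of reading the `generic_params` and `reqs` of an `opaque_result_decl` and grouping the requirements by generic parameter. The JSON keys are the ones shown in the dump above; the Swift type and function names (`Requirement`, `OpaqueResultDecl`, `requirementsByParameter`) are hypothetical, and the sample is just the relevant fragment of that dump with the elided fields left out.

```swift
import Foundation

/// One entry of "reqs"; keys mirror the dump, the type name is hypothetical.
struct Requirement: Decodable {
    let firstType: String
    let kind: String
    let secondType: String

    enum CodingKeys: String, CodingKey {
        case firstType = "first_type"
        case kind
        case secondType = "second_type"
    }
}

/// The parts of "opaque_result_decl" used here; other keys are ignored.
struct OpaqueResultDecl: Decodable {
    let genericParams: [String]
    let reqs: [Requirement]

    enum CodingKeys: String, CodingKey {
        case genericParams = "generic_params"
        case reqs
    }
}

/// Groups requirements by the mangled type in `first_type`. Every generic
/// parameter gets an entry, even if no requirement mentions it.
func requirementsByParameter(_ decl: OpaqueResultDecl) -> [String: [Requirement]] {
    var table: [String: [Requirement]] = [:]
    for param in decl.genericParams {
        table[param] = []
    }
    for req in decl.reqs {
        table[req.firstType, default: []].append(req)
    }
    return table
}

/// Fragment copied from the dump above (elided fields omitted).
let sampleOpaqueResultDecl = """
{
  "_kind": "opaque_type",
  "generic_params": ["$sxD", "$sqd__D", "$sqd_0_D"],
  "reqs": [
    { "_kind": "requirement", "first_type": "$sxD", "kind": "same_type", "second_type": "$s7ElementSTQyd__D" },
    { "_kind": "requirement", "first_type": "$sqd__D", "kind": "conforms_to", "second_type": "$sST_pD" },
    { "_kind": "requirement", "first_type": "$sqd_0_D", "kind": "conforms_to", "second_type": "$sypD" },
    { "_kind": "requirement", "first_type": "$sqd_0_D", "kind": "conforms_to", "second_type": "$sypD" }
  ]
}
"""
```

Decoding with `JSONDecoder().decode(OpaqueResultDecl.self, from: Data(sampleOpaqueResultDecl.utf8))` and passing the result to `requirementsByParameter` gives one bucket per mangled generic parameter. Note that this only groups the explicit requirements; it does not derive anything that follows from them.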
The use case I'm thinking of is that we'll have logic in our analyzer that asks the question "what type is <arbitrary expr>?", to which the answer might be that opaque type mangling. So if I'm passing that around and then need to reason about the specific conformances later, I'll need to make sure I have a handle back to the original opaque type decl, since I can't extract it from the type itself. Thanks for the pointer to the mapping in the `SourceFile`. I didn't know about that, and it might be easier to work with than trying to remember arbitrarily nested decls.
There's a getRequirementsWithInverses() that undoes the transformation, but I don't think you want to do that. A dump should faithfully represent the actual state of affairs.
I'm wondering if it might be better for your use case to mangle opaque types as erased existentials, e.g. `some P` just becomes `any P`, etc.
The question "does a type parameter T conform to a protocol P in a generic signature G" is extremely difficult to answer for arbitrary T and G unless you're the compiler. Looking at the explicit requirements is insufficient when T is a dependent member type, because of protocol inheritance, same-type requirements, etc.
Specifically the case I'm wondering about is something like
func f<T>(…) -> some Sequence<T> {
  …
}
let s = f(…)
let iter = s.makeIterator()
Now the type of iter is an opaque archetype with the same generic signature as `s`, but whose type parameter is a dependent member type derived from the type of `s`. In fact `iter` conforms to IteratorProtocol, but this isn't explicitly stated in the signature of the opaque result declaration.
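To make that concrete, here is a small compilable variant of the snippet. The body of `f` and the helper `drain` are assumptions added purely for illustration (the original leaves them elided); `drain` only type-checks because the compiler can derive the `IteratorProtocol` conformance of the iterator type from the `Sequence` requirement, even though the signature only says `some Sequence<T>`. It assumes Swift 5.7 or later for `some Sequence<T>`.

```swift
/// Hypothetical body: the caller only ever sees `some Sequence<T>`.
func f<T>(_ items: [T]) -> some Sequence<T> {
    items
}

/// Accepts any iterator; calling it requires an IteratorProtocol conformance.
func drain<I: IteratorProtocol>(_ iterator: I) -> [I.Element] {
    var iterator = iterator
    var result: [I.Element] = []
    while let next = iterator.next() {
        result.append(next)
    }
    return result
}

func demo() -> [Int] {
    let s = f([1, 2, 3])
    // The type of `iter` is the dependent member type `Iterator` of the
    // opaque type of `s`. Its conformance is implied, not written in `f`.
    let iter = s.makeIterator()
    return drain(iter)
}
```

A tool that only looks at the explicit requirements of the opaque result declaration would see `Sequence` and `Element == T`, and would have to know about the requirements of `Sequence` itself to reach the same conclusion.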
An existential type loses some of the generality but it's easier to reason about.
Yeah, we're currently combining multiple sources of information like indexstore to help us navigate protocol inheritance hierarchies by computing the chains ourselves when we need to, but that can run into issues with, say, a conditional conformance via extension, where we would need to start evaluating `where` clauses to decide if we should walk a certain edge or not. I'm not too keen on implementing a parallel type checker here.
This sounds like an interesting idea; I'll see how it shakes out. Maybe for any opaque type, we can show both the opaque type mangling and the equivalent existential mangling in the AST dump, and then we can decide which one (or both) we want to use at a later point.
Inheritance among protocols is straightforward (nothing is conditional), but yeah, once you start looking at concrete types, you're back to needing a generics implementation if you find yourself needing to evaluate conditional requirements. (However, you could perhaps generate a table of all concrete types that appear in the output, with the protocols each one conforms to.)
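Since protocol inheritance is unconditional, walking it is a plain graph traversal. A minimal sketch, assuming the direct inheritance edges have already been extracted (from the dump, indexstore, or elsewhere) into a dictionary; `InheritanceGraph` and `allInheritedProtocols` are hypothetical names, and the sample only lists a few standard library protocols.

```swift
/// Maps each protocol name to the protocols it directly inherits from.
typealias InheritanceGraph = [String: [String]]

/// Returns every protocol reachable from `proto` through inheritance edges.
/// No `where` clauses are involved, so no requirement evaluation is needed.
func allInheritedProtocols(of proto: String, in graph: InheritanceGraph) -> Set<String> {
    var seen: Set<String> = []
    var worklist = graph[proto] ?? []
    while let next = worklist.popLast() {
        if seen.insert(next).inserted {
            worklist.append(contentsOf: graph[next] ?? [])
        }
    }
    return seen
}

/// A small slice of the standard library's collection protocol hierarchy.
let collectionProtocols: InheritanceGraph = [
    "RandomAccessCollection": ["BidirectionalCollection"],
    "BidirectionalCollection": ["Collection"],
    "Collection": ["Sequence"],
    "Sequence": [],
]
```

This covers only protocol-to-protocol edges. Conformances of concrete types, especially conditional ones, are exactly where this kind of traversal stops being enough.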
I hope that one day, we will have a nice reusable library to handle such questions.
You can generate a table whose keys are every opaque return type that actually occurs in the output, and whose values are the erased existential types of each one (there's an `ArchetypeType::getExistentialType()` method that does this).
I'm working on wrapping this up, and it looks like `mapTypeOutOfContext()` alone isn't sufficient to get a type that ASTMangler can handle. After calling it, in some cases I still have `ElementArchetypeType`s or `ExistentialArchetypeType`s (or something like a `FunctionType` that contains those).
The implementation of `mapTypeOutOfContext` looks like it only handles primary and pack archetypes, so that seems to match what I'm seeing; is that expected?
I also noticed this recent PR ([Mangler] Handle local archetypes in `getDeclTypeForMangling` by hamishknight · Pull Request #78855 · swiftlang/swift · GitHub), which touches on some of this, but I can't directly do something similar without updating ASTDumper to pass a `DeclContext` through to all the individual printers. I can make that change if it's necessary to get safe types for USR generation, though.
The simplest thing to do would be to skip expressions whose types contain local archetypes. If you just want the USR to be unique, you can walk the type to collect all generic environments of all local archetypes, and then use mapLocalArchetypesOutOfContext() to rewrite them into distinct type parameters. This loses information, but it's probably fine. Otherwise, you can imagine a more general encoding where we record the relevant information about each local generic environment in a side table.
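The "collect environments, then rewrite" idea can be sketched on a toy model. To be clear, this is not the compiler's representation and does not call any compiler API: `ToyType`, the integer environment IDs, and both functions are hypothetical, and giving every rewritten parameter index 0 is a simplification. It only shows the shape of the two passes.

```swift
/// Toy stand-in for a type. Nothing here mirrors the real AST classes.
indirect enum ToyType: Equatable {
    case nominal(String, [ToyType])
    case function(params: [ToyType], result: ToyType)
    case localArchetype(environment: Int, name: String)
    case typeParameter(depth: Int, index: Int)
}

/// Pass 1: collect the IDs of all local generic environments, in the order
/// they are first encountered.
func collectEnvironments(in type: ToyType, into seen: inout [Int]) {
    switch type {
    case .nominal(_, let args):
        for arg in args { collectEnvironments(in: arg, into: &seen) }
    case .function(let params, let result):
        for param in params { collectEnvironments(in: param, into: &seen) }
        collectEnvironments(in: result, into: &seen)
    case .localArchetype(let environment, _):
        if !seen.contains(environment) { seen.append(environment) }
    case .typeParameter:
        break
    }
}

/// Pass 2: replace each local archetype with a type parameter whose depth is
/// `baseDepth` plus the position of its environment, so that archetypes from
/// different environments stay distinct.
func rewriteLocalArchetypes(in type: ToyType, environments: [Int], baseDepth: Int) -> ToyType {
    switch type {
    case .nominal(let name, let args):
        return .nominal(name, args.map {
            rewriteLocalArchetypes(in: $0, environments: environments, baseDepth: baseDepth)
        })
    case .function(let params, let result):
        return .function(
            params: params.map {
                rewriteLocalArchetypes(in: $0, environments: environments, baseDepth: baseDepth)
            },
            result: rewriteLocalArchetypes(in: result, environments: environments, baseDepth: baseDepth))
    case .localArchetype(let environment, _):
        // Leave the archetype alone if its environment was never collected.
        guard let position = environments.firstIndex(of: environment) else { return type }
        return .typeParameter(depth: baseDepth + position, index: 0)
    case .typeParameter:
        return type
    }
}
```

As the post says, the rewritten type is unique but no longer carries the requirements of each local environment; a side table keyed by the environment's position would be one way to keep them.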
Another slightly hacky solution would be to replace local archetypes with their existential upper bounds before mangling. Again this loses information compared to encoding the generic signature of each local generic environment, but there isn't much you could do with a raw list of requirements anyway without reimplementing the requirement machine.