Improving the AST dump format

Hi all!

I would very much like to (attempt to) improve the way the Swift compiler prints its AST.

Currently, the -dump-ast option prints the AST in a format that is similar to S-expressions, which is supposed to be relatively easy for other programs to parse. However, I've been finding it hard to write a reliable parser for this AST format. It seems to me that the output was never meant to be parsed by other programs (it's meant for human eyes only), hence the difficulty.
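
To illustrate the gap: a strict S-expression is easy to parse, as in the minimal Python sketch below. The sample input is made up for illustration and is not real -dump-ast output, and the function names `tokenize` and `parse` are my own. As far as I can tell, the actual dump mixes in unquoted key=value attributes, source ranges, and type strings, which a reader this simple would not handle reliably.

```python
def tokenize(text):
    """Split a strict S-expression into parens, quoted strings and bare atoms."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            end = text.index('"', i + 1)  # no escape handling in this sketch
            tokens.append(text[i:end + 1])
            i = end + 1
        else:
            start = i
            while i < len(text) and not text[i].isspace() and text[i] not in '()"':
                i += 1
            tokens.append(text[start:i])
    return tokens


def parse(tokens):
    """Turn a token list into nested Python lists, one list per parenthesized node."""
    def read(pos):
        if tokens[pos] != "(":
            return tokens[pos], pos + 1
        node, pos = [], pos + 1
        while tokens[pos] != ")":
            child, pos = read(pos)
            node.append(child)
        return node, pos + 1

    tree, _ = read(0)
    return tree


# Made-up input in strict S-expression form (NOT actual -dump-ast output).
sample = '(source_file (func_decl "foo()" (brace_stmt)))'
tree = parse(tokenize(sample))
```

Anything that breaks the "atoms, strings, and balanced parentheses" assumption needs special-casing on top of this, which is where the reliability problems start.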

Would there be any interest (or opposition) from the community to implementing support for a different format such as YAML or JSON?

As I've mentioned before, I'm minorly against this, because I don't think parsing -dump-ast output is a good idea. We don't actually want to promise that this is a stable or sensible format, and we don't want people writing tools that depend on swift -frontend (which is itself an internal, unstable interface). But others may feel differently.

How about using SwiftSyntax for this?

Anything using SwiftSyntax already implicitly depends on swift -frontend -emit-syntax so it feels like that boat may have already started leaving the harbor (unless we say that SwiftSyntax is one such "blessed" interface through which the frontend can be used).

But to the larger point, I agree—rather than try to change -dump-ast at this point (its concise ANSI-colored form is very nice for debugging and I wouldn't want to exchange that for a much more verbose data format), it would be nice to have a new mode analogous to -emit-syntax that contains semantic information instead of just raw syntax.

The output of SwiftSyntax isn't substitutable for -dump-ast because it only has syntax information; so for example, you can get the identifier for a type usage, but you can't easily determine which module it was imported from, whether it's a typealias for something else, etc. Similarly, IIRC -dump-ast contains synthesized declarations, whereas SwiftSyntax would not have those since it only presents what's actually in the source file.

SwiftSyntax isn't stable yet either; when it is, it won't use -frontend. (*cough cough* @Xi_Ge, @akyrtzi)

The currently supported way to get semantic info from a source file is to use sourcekitd, via its 'index' or 'doc-info' request. Note though that those do not contain everything that an AST has, e.g. you don't get every expression and its type.

Yeah, that’s the thing for me. I need a lot of type data, and I can’t think of a way to get it without the AST dump :confused:
I looked at SourceKit a while ago, but I seem to remember it also didn’t have everything.

Alternatively, I wouldn’t mind having more information from sourcekit :thinking:

The general goal with sourcekitd requests is to provide a reasonably stable interface, and not be tied to internal details of the Swift AST that can change at any point.
If you want output that describes as much of the detail and structure of the AST as possible, and so essentially ties itself to internal details, then there's not much benefit to going with a sourcekitd request over adding a compiler option that provides that output.

Yeah, I've taken some time to look at what sourcekitd can do, and it seems to be missing a lot of information that's only available at the AST level.

I understand that the AST is meant for internal use (as it should be) and isn't stable, so it will be on me to deal with changes made in future versions of the language. I also understand not wanting to provide a standardized output for fear it may mislead programmers into thinking it's a stable API of some sort.

However, as there is substantial source code information that's only available in the AST, I feel it can be beneficial to provide this standardized output nonetheless (perhaps with a warning of some sort). This would allow programmers that are willing to do this extra work to access this information in practice.

I also agree with @allevato that -dump-ast is useful for its own reasons, and that it would be bad to replace it completely. Perhaps the -dump-ast code could be refactored to separate the output format logic, and then an alternative output format could be added as well. This process would likely fix a handful of inconsistencies I've identified in the current -dump-ast code, and might also make it easier to maintain.
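
As a rough sketch of what I mean by separating the output format logic (in Python for brevity; this is not how the compiler's dumper is actually structured, and `Node`, `to_sexpr`, and `to_json_obj` are hypothetical names), the tree data is built once and each format is just a function over it:

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an AST node: a kind, attributes, and children."""
    kind: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def to_sexpr(node, indent=0):
    """Human-oriented format, loosely in the spirit of the current dump."""
    attrs = "".join(f" {k}={v}" for k, v in node.attrs.items())
    kids = "".join("\n" + to_sexpr(c, indent + 2) for c in node.children)
    return " " * indent + f"({node.kind}{attrs}{kids})"


def to_json_obj(node):
    """Machine-oriented format built from the same Node data."""
    return {
        "kind": node.kind,
        "attrs": node.attrs,
        "children": [to_json_obj(c) for c in node.children],
    }


# Made-up tree; the node kinds and attribute names are illustrative only.
tree = Node("func_decl", {"name": "foo"}, [Node("brace_stmt")])
human = to_sexpr(tree)
machine = json.dumps(to_json_obj(tree))
```

With this shape, the concise human-readable output stays as it is, and a second format can be added without touching the code that decides what gets printed for each node.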

There's a fundamental limitation in "swift -frontend -emit-syntax": the symbols it describes do not carry the line/column or global source offset at which they are defined. Or perhaps I have missed an option that generates these.

At this point I have a Swift class capable of loading the entire JSON structure, plus surrounding code that outputs source-independent "tokens". These tokens are the same as those I generate from clang's "clang -cc1 -ast-dump" output.

I use these in a custom app that is a workspace/project source/class browser (remember ObjectMaster?) where I can browse and edit sources. While I can pull up ObjC source on a per-method basis, the same is not (yet) possible in Swift because -emit-syntax doesn't yield source location information.

I can still use the app to cross-reference all code usage to spot dead code (my current goal for this app), but it would be superb to be able to display functions as they are browsed, rather than resorting to the full source.

Hey @mouser, I'm not sure if this will help you but lately I've been using libSyntax to get the AST and SourceKit to get the type information (instead of the AST dump). It's a bit hard to match libSyntax's information with SourceKit's, but looking at your screenshot maybe libSyntax alone will solve your problem.

Here's my code that deals with libSyntax and SourceKit, I hope it helps: Gryphon/SwiftSyntaxDecoder.swift at release · vinivendra/Gryphon · GitHub
