Improving the AST dump format


(Vinicius Vendramini) #1

Hi all!

I would very much like to (attempt to) improve the way the Swift compiler prints its AST.

Currently, the -dump-ast option prints the AST in a format that is similar to S-expressions, which is supposed to be relatively easy for other programs to parse. However, I've been finding it hard to create a reliable parser for this AST format. It seems to me that the output was never meant to be parsed by other programs (instead, it's meant for human eyes only), hence the difficulty.

Would there be any interest (or opposition) from the community to implementing support for a different format such as YAML or JSON?
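
To make the parsing difficulty mentioned above concrete, here is a minimal sketch of a strict S-expression reader in Python. Both sample strings are hypothetical and simplified, not real compiler output, and the function names `tokenize` and `parse` are illustrative only. The sketch assumes that every value containing whitespace is quoted; the second sample shows how a single unquoted attribute breaks that assumption.

```python
def tokenize(text):
    """Split S-expression-like text into '(', ')', quoted strings, and bare atoms."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch in "\"'":
            # A quoted string runs to the next matching quote character.
            end = text.index(ch, i + 1)  # ValueError if the string is unterminated
            tokens.append(text[i:end + 1])
            i = end + 1
        else:
            # A bare atom runs until whitespace or a parenthesis.
            start = i
            while i < len(text) and not text[i].isspace() and text[i] not in "()":
                i += 1
            tokens.append(text[start:i])
    return tokens


def parse(tokens):
    """Build nested lists from the token stream; each '(' opens a new list."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


# Hypothetical, well-behaved input: every value with special characters is quoted.
clean = '(source_file "main.swift" (func_decl "foo()" (brace_stmt)))'
tree = parse(tokenize(clean))

# Hypothetical attribute whose value contains unquoted spaces: the strict
# tokenizer splits it into three unrelated atoms instead of one key/value pair.
messy_tokens = tokenize("(decl range=[a:1 - b:2])")
```

A well-formed S-expression needs only these few lines; the work in a real tool goes into the special cases, such as the unquoted attribute above, that a strict reader cannot represent.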


(Jordan Rose) #2

As I've mentioned before, I'm mildly against this, because I don't think parsing -dump-ast output is a good idea. We don't actually want to promise that this is a stable or sensible format, and we don't want people writing tools that depend on swift -frontend (which is itself an internal, unstable interface). But others may feel differently.


(Ankit Aggarwal) #3

How about using SwiftSyntax for this?


(Tony Allevato) #4

Anything using SwiftSyntax already implicitly depends on swift -frontend -emit-syntax, so it feels like that boat may have already started leaving the harbor (unless we say that SwiftSyntax is one such "blessed" interface through which the frontend can be used).

But on the larger point, I agree: rather than try to change -dump-ast at this point (its concise ANSI-colored form is very nice for debugging and I wouldn't want to exchange that for a much more verbose data format), it would be nice to have a new mode analogous to -emit-syntax that contains semantic information instead of just raw syntax.

The output of SwiftSyntax can't substitute for -dump-ast because it only has syntax information. For example, you can get the identifier for a type usage, but you can't easily determine which module it was imported from, whether it's a typealias for something else, etc. Similarly, IIRC -dump-ast contains synthesized declarations, whereas SwiftSyntax would not have those since it only presents what's actually in the source file.


(Jordan Rose) #5

SwiftSyntax isn't stable yet either; when it is, it won't use -frontend. (*cough cough* @Xi_Ge, @akyrtzi)


(Argyrios Kyrtzidis) #6

The currently supported way to get semantic info from a source file is to use sourcekitd, via its 'index' or 'doc-info' request. Note though that those do not contain everything that an AST has, e.g. you don't get every expression and its type.


(Vinicius Vendramini) #7

Yeah, that’s the thing for me. I need a lot of type data, and I can’t think of a way to get it without the AST dump :confused:
I looked at sourcekit a while ago, but as far as I remember it didn’t have everything either.


(Vinicius Vendramini) #8

Alternatively, I wouldn’t mind having more information from sourcekit :thinking:


(Argyrios Kyrtzidis) #9

The general goal with sourcekitd requests is to provide a reasonably stable interface, and not be tied to internal details of the Swift AST that can change at any point.
If you want output that describes as much of the detail and structure of the AST as possible, and so essentially ties itself to internal details, then there's not much benefit in going with a sourcekitd request versus adding a compiler option that provides that output.


(Vinicius Vendramini) #10

Yeah, I've taken some time to look at what sourcekitd can do, and it seems to lack a lot of the information that's only available at the AST level.

I understand that the AST is meant for internal use (as it should be) and isn't stable, so it will be on me to deal with changes made in future versions of the language. I also understand not wanting to provide a standardized output for fear it may mislead programmers into thinking it's a stable API of some sort.

However, as there is substantial source code information that's only available in the AST, I feel it can be beneficial to provide this standardized output nonetheless (perhaps with a warning of some sort). This would allow programmers who are willing to do this extra work to access this information in practice.

I also agree with @allevato that -dump-ast is useful for its own reasons, and that it would be bad to replace it completely. Perhaps the -dump-ast code could be refactored to separate the output format logic, and then an alternative output format could be added as well. This process would likely fix a handful of inconsistencies I've identified in the current -dump-ast code, and might also make it easier to maintain.
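
As a rough illustration of the refactoring idea above (one tree, several output formats), here is a minimal sketch in Python. It is not the compiler's code, which is written in C++; the `Node` type, its fields, and the `to_sexpr`/`to_json` functions are all hypothetical names chosen for this example, and the S-expression layout is only loosely modeled on the style of -dump-ast.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical format-neutral tree node: a kind, attributes, and children."""
    kind: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def to_sexpr(node, indent=0):
    """Render the tree in a compact, human-oriented S-expression-like form."""
    pad = "  " * indent
    attrs = "".join(f' {key}="{value}"' for key, value in node.attrs.items())
    lines = [f"{pad}({node.kind}{attrs}"]
    for child in node.children:
        lines.append(to_sexpr(child, indent + 1))
    return "\n".join(lines) + ")"


def to_plain(node):
    """Convert the tree to plain dicts and lists so any serializer can consume it."""
    return {
        "kind": node.kind,
        "attrs": node.attrs,
        "children": [to_plain(child) for child in node.children],
    }


def to_json(node):
    """Render the same tree as machine-oriented JSON."""
    return json.dumps(to_plain(node), sort_keys=True)


# Hypothetical tree, not real compiler output.
tree = Node("func_decl", {"name": "foo"}, [Node("brace_stmt")])
sexpr_text = to_sexpr(tree)
json_text = to_json(tree)
```

Because both printers read the same `Node` values, adding a third format (YAML, for instance) would mean adding one more function without touching the existing human-readable output.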