Swiftc equivalent for clang's -dump-tokens

Does the compiler offer anything to dump the output of the lex phase (equivalent to the -dump-tokens option from clang), e.g.,

% clang -fsyntax-only -Xclang -dump-tokens test.c
int 'int'	 [StartOfLine]	Loc=<test.c:1:1>
identifier 'main'	 [LeadingSpace]	Loc=<test.c:1:5>
l_paren '('		Loc=<test.c:1:9>
r_paren ')'		Loc=<test.c:1:10>
l_brace '{'	 [LeadingSpace]	Loc=<test.c:1:12>
r_brace '}'		Loc=<test.c:1:13>
eof ''		Loc=<test.c:1:14>

Would this be something worth adding?

It's not quite the same, but swift -frontend -emit-syntax dumps a JSON representation of the syntax tree, and you could recursively scrape similar information out of objects that have a tokenKind property. Doesn't look like it has the line/column numbers in there directly though.
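The recursive scrape could look roughly like the sketch below. It assumes each token object carries a tokenKind object with a kind string and an optional text string. The sample JSON is hand-written for illustration rather than real compiler output, and collectTokens is a hypothetical helper name.

```swift
import Foundation

// Walks any decoded JSON value and collects the objects that have a "tokenKind" key.
// Arrays are visited in order; other objects are visited in sorted-key order so the
// traversal is deterministic.
func collectTokens(_ value: Any, into tokens: inout [[String: Any]]) {
    if let array = value as? [Any] {
        for child in array { collectTokens(child, into: &tokens) }
    } else if let object = value as? [String: Any] {
        if object["tokenKind"] != nil {
            tokens.append(object)  // Treat a token as a leaf; don't descend further.
            return
        }
        for key in object.keys.sorted() {
            collectTokens(object[key]!, into: &tokens)
        }
    }
}

// Hand-written sample (an assumption about the general shape, not real output).
let sample = """
{"kind": "SourceFile", "layout": [
  {"tokenKind": {"kind": "identifier", "text": "print"}, "presence": "Present"},
  null,
  {"tokenKind": {"kind": "eof", "text": ""}, "presence": "Present"}
]}
"""

let root = try JSONSerialization.jsonObject(with: Data(sample.utf8))
var tokens: [[String: Any]] = []
collectTokens(root, into: &tokens)

for token in tokens {
    let kind = token["tokenKind"] as? [String: Any]
    let name = kind?["kind"] as? String ?? "unknown"
    let text = kind?["text"] as? String ?? ""
    print("\(name) '\(text)'")
}
```

Keying off the presence of tokenKind rather than a fixed schema keeps the walk usable even if the surrounding node layout differs from what is assumed here.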

What are you trying to do? Can it be done with SwiftSyntax instead?

Right now my only use case is debugging: I just want a concise listing of the tokens that the parser is actually seeing for a particular input. Not particularly compelling, but I thought it was interesting that clang has this as an option with no obvious equivalent for swiftc. Does SwiftSyntax/libSyntax offer something like that?

You can subclass SyntaxVisitor and only override visit(TokenSyntax), doing whatever you want to in there.

Here's a quick-and-dirty example that prints the tokens in a file using a format similar to clang's -dump-tokens option:
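As a rough sketch of that approach (not the gist itself), something like the following might work. It assumes a recent swift-syntax release (509 or later), where parsing goes through Parser.parse(source:) from the SwiftParser module rather than the SyntaxParser discussed in this thread. TokenDumper is a made-up name, and the [StartOfLine] flag is approximated by comparing line numbers between consecutive tokens.

```swift
import SwiftParser
import SwiftSyntax

// Hypothetical visitor: prints one line per token in a format loosely
// modeled on clang's -dump-tokens output.
final class TokenDumper: SyntaxVisitor {
    private let fileName: String
    private let converter: SourceLocationConverter
    private var previousLine = 0

    init(fileName: String, tree: SourceFileSyntax) {
        self.fileName = fileName
        self.converter = SourceLocationConverter(fileName: fileName, tree: tree)
        super.init(viewMode: .sourceAccurate)
    }

    override func visit(_ token: TokenSyntax) -> SyntaxVisitorContinueKind {
        // Location of the token text itself (after any leading trivia).
        let loc = token.startLocation(converter: converter)
        // Approximation: a token starts a line if it is on a different
        // line than the previously visited token.
        let flag = loc.line != previousLine ? "\t[StartOfLine]" : ""
        previousLine = loc.line
        print("\(token.tokenKind)\t'\(token.text)'\(flag)\tLoc=<\(fileName):\(loc.line):\(loc.column)>")
        return .visitChildren
    }
}

let source = "print(\"hello\")\n"
let tree = Parser.parse(source: source)
TokenDumper(fileName: "test.swift", tree: tree).walk(tree)
```

The exact spelling of each token kind in the output depends on the swift-syntax version, so treat the printed format as illustrative.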


Nice, thanks @allevato! Just curious—does this solution require the program to successfully parse in order to dump all the tokens, or will it make it through the whole file even if parsing bails out at some point?

As far as I know SyntaxParser.parse always returns a "valid" tree (i.e., doesn't throw an error) even for unparsable inputs; it only throws errors if the syntax parser is incompatible or the root node was an unexpected type. There are various nodes like UnknownStmtSyntax, UnknownDeclSyntax, etc., that will be used in the tree if something couldn't be parsed correctly, but the tokens will still be accessible underneath them.

With print)"nope"( as the input, it still prints the following:

identifier	'print'	[StartOfLine]	Loc=<test.swift:1:1>
rightParen	')'	Loc=<test.swift:1:6>
stringLiteral	'"nope"'	Loc=<test.swift:1:7>
leftParen	'('	Loc=<test.swift:1:13>
eof	''	[StartOfLine]	Loc=<test.swift:2:1>

(I updated the gist because I realized I wasn't handling [StartOfLine] correctly for the first token.)
