[GSoC] Integration of libSyntax with the rest of the compiler pipeline

Hello. I'm DexinLi from China. I'm a first-year master's student at the University of Science and Technology of China. I have worked with the Clang AST and am familiar with the compiler pipeline.
I'm interested in the project idea "Integration of libSyntax with the rest of the compiler pipeline".
I would appreciate it if anyone could help me get started on the GSoC project.

Hi Dexin,
I'm glad you are interested in this project!

Great!

I'm happy to help :) Do you have any specific questions?

Thank you, @rintaro.

In the idea's description:

  1. Having the parser generate only a libSyntax tree
  2. Derive the AST nodes from the libSyntax tree and have the AST nodes point to libSyntax nodes for source information
  3. It should be possible to provide a serialized libSyntax tree to the compiler and have typechecking and code-generation functionality without needing to parse code.

For 1 and 2: does the parser currently generate the AST? If so, I need to modify the parser to generate a libSyntax tree instead of an AST, and then derive the AST from that tree.
Like this:

Before: source code ---(by current parser)---> AST ---> IL

Now: source code ---(by modified parser)---> libSyntax Tree ---(to be implemented)---> AST ---> IL
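To make the proposed split concrete, here is a deliberately tiny model of the "Now" pipeline, assuming a toy grammar of integer additions such as `1 + 2`. The names `SyntaxToken`, `SyntaxTree`, `parseSyntax`, `AddExpr` and `deriveAST` are hypothetical and unrelated to the Swift compiler's real classes. The sketch only shows the intended division of labor: stage 1 keeps every character of the source (whitespace is stored as trivia), and stage 2 derives a semantic tree from it without looking at the source text again.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (hypothetical names) of a two-stage front end for the grammar
// expr -> integer ('+' integer)*. Not the Swift compiler's API.

// Stage 1 output: a token that keeps its leading whitespace as trivia.
struct SyntaxToken {
  std::string leadingTrivia;
  std::string text;
};

// Full-fidelity syntax tree (flat here, because the toy grammar is flat).
struct SyntaxTree {
  std::vector<SyntaxToken> tokens;
  std::string trailingTrivia;  // whitespace after the last token

  // Concatenating trivia and token text reproduces the source exactly.
  std::string toSource() const {
    std::string out;
    for (const SyntaxToken &tok : tokens)
      out += tok.leadingTrivia + tok.text;
    return out + trailingTrivia;
  }
};

// Stage 2 output: a semantic node with no trivia or punctuation.
struct AddExpr {
  std::vector<long> operands;
};

static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

// Stage 1: the parser produces only the syntax tree.
SyntaxTree parseSyntax(const std::string &source) {
  SyntaxTree tree;
  std::size_t i = 0;
  while (i < source.size()) {
    SyntaxToken tok;
    while (i < source.size() && isSpace(source[i]))
      tok.leadingTrivia += source[i++];
    if (i == source.size()) {
      tree.trailingTrivia = tok.leadingTrivia;
      break;
    }
    if (isDigit(source[i])) {
      while (i < source.size() && isDigit(source[i]))
        tok.text += source[i++];
    } else {
      tok.text = source[i++];  // single-character punctuation such as '+'
    }
    tree.tokens.push_back(tok);
  }
  return tree;
}

// Stage 2: derive the AST from the syntax tree alone.
AddExpr deriveAST(const SyntaxTree &tree) {
  AddExpr expr;
  for (const SyntaxToken &tok : tree.tokens) {
    if (tok.text == "+")
      continue;  // punctuation carries no semantic content here
    assert(isDigit(tok.text[0]) && "toy grammar: operands are integer literals");
    expr.operands.push_back(std::stol(tok.text));
  }
  return expr;
}
```

For example, `parseSyntax(" 1 +  22+3 ").toSource()` gives back the original string unchanged, while `deriveAST` on the same tree yields the operands 1, 22 and 3.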

For 3: if my understanding of libSyntax and the AST is right, type checking and code generation operate on AST nodes rather than on the libSyntax tree, so I'm a little confused about this step.

Your understanding is right.

To be precise, the current Parser generates both the AST and the libSyntax tree. But I suspect it's difficult to reuse the current libSyntax tree generation for this task, because it is essentially a by-product of AST parsing.

"the typechecking and code-generation functionality" == "the rest of the compiler pipeline" here.
For 3, it means:

Serialized libSyntax tree JSON ---(JSON deserializer)---> libSyntax Tree ---(to be implemented)---> AST ---> IL

Thank you! @rintaro

So for 1 and 2, I need to split the original parser in two. The first part will use SyntaxParsingContext to generate a libSyntax tree, and the second part will traverse the libSyntax tree to derive the AST. Am I right? (This would be a huge job.)

For 3, I need to write a serializer and a deserializer to convert between the libSyntax tree and JSON. Is that right?

Exactly.

I don't think you should use SyntaxParsingContext. Instead, the SyntaxBuilder classes would be the core of the job. For example, the current Parser::parseExprArray would become something like:

```cpp
// expr-array  -> '[' expr-array-element* ']'
SyntaxParserResult<ExprSyntax> SyntaxParser::parseExprArray() {
  ArrayExprSyntaxBuilder builder;

  // Parse '['.
  builder.useLeftSquare(consumeToken(tok::l_square));

  // Parse elements.
  SyntaxParserResult<ArrayExprElementSyntax> elementsResult = parseExprArrayElementList();
  if (elementsResult.isError()) {
    // Error handling (omitted in this sketch).
  }
  builder.useElements(elementsResult.get());

  // Parse ']'.
  if (Tok.isNot(tok::r_square)) {
    // Error handling (omitted in this sketch).
  }
  builder.useRightSquare(consumeToken(tok::r_square));

  // Return successful result.
  return makeSyntaxParserResult(builder.build());
}
```
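For anyone who wants to run the idea, here is a self-contained toy in the same builder-driven style. It borrows the names from the sketch above (`SyntaxParser`, `ArrayExprSyntaxBuilder`, `useLeftSquare`, `useRightSquare`, `consumeToken`), but every type is a simplified stand-in invented for this example: elements are integer literals only, the builder collects elements one at a time through a hypothetical `addElement`, and `std::optional` replaces `SyntaxParserResult` for error reporting.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for libSyntax types (C++17).
enum class Tok { l_square, r_square, comma, integer_literal, eof };

struct Token {
  Tok kind;
  std::string text;
};

struct ArrayElementSyntax {
  Token expression;                    // an integer literal in this toy
  std::optional<Token> trailingComma;  // kept so the source can be reproduced
};

struct ArrayExprSyntax {
  Token leftSquare;
  std::vector<ArrayElementSyntax> elements;
  Token rightSquare;
};

// Collects children one at a time, in the style of a generated builder.
class ArrayExprSyntaxBuilder {
  std::optional<Token> leftSquare, rightSquare;
  std::vector<ArrayElementSyntax> elements;

public:
  void useLeftSquare(Token t) { leftSquare = std::move(t); }
  void addElement(ArrayElementSyntax e) { elements.push_back(std::move(e)); }
  void useRightSquare(Token t) { rightSquare = std::move(t); }

  ArrayExprSyntax build() const {
    assert(leftSquare && rightSquare && "both brackets must be set");
    return {*leftSquare, elements, *rightSquare};
  }
};

class SyntaxParser {
  std::vector<Token> toks;
  std::size_t pos = 0;

  const Token &peek() const { return toks[pos]; }
  Token consumeToken(Tok expected) {
    assert(peek().kind == expected && "caller must check the token kind first");
    return toks[pos++];
  }

public:
  // An eof token is appended so peek() never reads past the end.
  explicit SyntaxParser(std::vector<Token> tokens) : toks(std::move(tokens)) {
    toks.push_back({Tok::eof, ""});
  }

  // expr-array -> '[' (integer (',' integer)* ','?)? ']'
  // Returns std::nullopt on a syntax error.
  std::optional<ArrayExprSyntax> parseExprArray() {
    ArrayExprSyntaxBuilder builder;
    if (peek().kind != Tok::l_square)
      return std::nullopt;
    builder.useLeftSquare(consumeToken(Tok::l_square));

    while (peek().kind == Tok::integer_literal) {
      ArrayElementSyntax element{consumeToken(Tok::integer_literal), std::nullopt};
      bool hasComma = peek().kind == Tok::comma;
      if (hasComma)
        element.trailingComma = consumeToken(Tok::comma);
      builder.addElement(std::move(element));
      if (!hasComma)
        break;
    }

    if (peek().kind != Tok::r_square)
      return std::nullopt;
    builder.useRightSquare(consumeToken(Tok::r_square));
    return builder.build();
  }
};
```

Keeping each comma as the element's trailing token, instead of discarding it, is what lets a syntax tree reproduce the source exactly; the AST derived later is free to ignore it.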

And yes, this is definitely a huge and challenging job. You also need to modify the AST so that it points to libSyntax nodes for source information (e.g. getStartLoc()/getEndLoc()). In your proposal, I expect you to lay out what you plan to finish within the time frame, and what you will not.
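One way to picture "AST nodes point to libSyntax nodes for source information" is an AST node that stores no locations of its own and computes getStartLoc()/getEndLoc() from the syntax node it was derived from. In this sketch, `SyntaxNode`, `IntegerLiteralExpr`, the byte-offset `SourceLoc`, and the one-past-the-end convention for the end location are all assumptions made for illustration; Swift's real SourceLoc and its location conventions are not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical location type: a byte offset into the source buffer.
struct SourceLoc {
  std::size_t offset;
};

// Hypothetical syntax node that knows its position in the source.
struct SyntaxNode {
  std::size_t startOffset;  // first byte of the node, excluding leading trivia
  std::size_t length;       // byte length of the node, excluding trivia
};

// An AST node that stores no locations itself; it asks its syntax node.
class IntegerLiteralExpr {
  std::shared_ptr<const SyntaxNode> syntax;  // keeps the syntax node alive
  long value;

public:
  IntegerLiteralExpr(std::shared_ptr<const SyntaxNode> syntax, long value)
      : syntax(std::move(syntax)), value(value) {
    assert(this->syntax && "every AST node needs a syntax node");
  }

  long getValue() const { return value; }
  SourceLoc getStartLoc() const { return {syntax->startOffset}; }
  // Toy convention: one past the node's last byte.
  SourceLoc getEndLoc() const { return {syntax->startOffset + syntax->length}; }
};
```

Shared ownership is just the simplest way to guarantee the syntax node outlives the AST node in a standalone example; a real implementation might manage lifetimes differently.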

We already have a serializer; try swift -frontend -emit-syntax source.swift with the latest master build. For the deserializer, we can use the LLVM YAML parser, since JSON is essentially a subset of YAML.

Thank you, @rintaro!
I think I've already figured out most of the job.
Does swift/SyntaxSerialization.h in apple/swift (main branch) contain the entire serializer? If so, I can take a stab at the deserializer this week as my first commit.

Great!

Yeah, SyntaxSerialization.h defines how the libSyntax tree is serialized. The actual JSON serializer is (currently) in swift/Basic/JSONSerialization.h.

We are looking forward to your PR!

Hi @rintaro
I'm working on the JSON deserializer. What should its output be: an RC<RawSyntax> built by RawSyntax::make, or an RC<Syntax> built by SyntaxFactory?

Hi @DexinLi
I think the JSON deserializer itself should output RC<RawSyntax>. However, as README.md notes, RawSyntax is an implementation detail, so the serialized-syntax reader should eventually wrap the RC<RawSyntax> in a SourceFileSyntax.
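A minimal sketch of that split, assuming nothing about the real classes beyond their roles: the deserializer hands back untyped, immutable, reference-counted raw nodes, and a typed wrapper provides the public API. `RawSyntax`, `makeToken`, `makeSourceFile`, `collectText` and `SourceFileSyntax` below are simplified stand-ins, not the libSyntax definitions, and `RC` here is just an alias for `std::shared_ptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the raw/typed split; not the real libSyntax classes.
enum class SyntaxKind { SourceFile, Token };

// Untyped, immutable node: the kind of thing a deserializer would produce.
struct RawSyntax {
  SyntaxKind kind;
  std::string tokenText;                                   // used by Token nodes
  std::vector<std::shared_ptr<const RawSyntax>> children;  // used by layout nodes
};

using RC = std::shared_ptr<const RawSyntax>;  // stands in for RC<RawSyntax>

RC makeToken(std::string text) {
  return std::make_shared<RawSyntax>(RawSyntax{SyntaxKind::Token, std::move(text), {}});
}

RC makeSourceFile(std::vector<RC> items) {
  return std::make_shared<RawSyntax>(RawSyntax{SyntaxKind::SourceFile, "", std::move(items)});
}

// Concatenates all token text under a raw node, depth first.
std::string collectText(const RawSyntax &node) {
  if (node.kind == SyntaxKind::Token)
    return node.tokenText;
  std::string out;
  for (const RC &child : node.children)
    out += collectText(*child);
  return out;
}

// Typed facade: the API clients use instead of touching RawSyntax directly.
class SourceFileSyntax {
  RC raw;

public:
  explicit SourceFileSyntax(RC root) : raw(std::move(root)) {
    assert(raw && raw->kind == SyntaxKind::SourceFile && "expected a SourceFile root");
  }
  std::size_t getNumItems() const { return raw->children.size(); }
  std::string getText() const { return collectText(*raw); }
};
```

The reader would then do something like `SourceFileSyntax file(root);` on the deserializer's output, so only the typed wrapper escapes to the rest of the compiler.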

Hi @rintaro
I'm working on the draft of my proposal. Should we migrate to the new parser step by step, or write the new one in parallel and replace the old one all at once?

Great! I'm looking forward to it!

Step by step would be best, but at this point I'm not sure it's even possible. Either way, we should add a compiler option to enable/disable the new parser. If you have a concrete plan for incremental migration, please propose it :+1:
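A sketch of how such an option could gate the two front ends, assuming both paths eventually yield the same AST so nothing downstream has to change. The flag names `-enable-syntax-parser` / `-disable-syntax-parser` and the types `FrontendOptions` and `ParserPath` are invented for this example; they are not existing swift-frontend options.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical frontend option. The flag names below are invented for this
// sketch and are not existing swift-frontend flags.
struct FrontendOptions {
  bool useSyntaxParser = false;  // off by default until the new parser is ready
};

// The last occurrence of either flag wins, a common convention for on/off pairs.
FrontendOptions parseArgs(const std::vector<std::string> &args) {
  FrontendOptions opts;
  for (const std::string &arg : args) {
    if (arg == "-enable-syntax-parser")
      opts.useSyntaxParser = true;
    else if (arg == "-disable-syntax-parser")
      opts.useSyntaxParser = false;
  }
  return opts;
}

enum class ParserPath {
  Legacy,         // current Parser: builds the AST directly
  SyntaxThenAST,  // new parser: builds a libSyntax tree, then derives the AST
};

// Both paths are expected to yield the same AST, so later stages are unchanged.
ParserPath selectParser(const FrontendOptions &opts) {
  return opts.useSyntaxParser ? ParserPath::SyntaxThenAST : ParserPath::Legacy;
}
```

Keeping the legacy path as the default lets the new parser be tested against the existing suite one construct at a time, which is one possible shape for the incremental plan asked about above.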

Cross reference: GSoC: Integration of libSyntax with the rest of the compiler pipeline

Hi @rintaro
This is the draft of my proposal, please have a look.
https://docs.google.com/document/d/1g0tiS-VtNt4_DZ-yQrFJDJgtwaUZe1i56GW7m9Nn7O4/edit?usp=sharing

I've updated my proposal.

Please make sure the proposal is actually submitted to the GSoC website for the Swift project. Otherwise it won’t be officially reviewed for consideration.

Could I submit my proposal now?