GSoC: Integration of libSyntax with the rest of the compiler pipeline

Vinicius_Vendramini · March 19, 2018, 10:54pm

Hi all!

I'm a masters student from the University of São Paulo, and I'm very interested in pursuing this project for GSoC :)

I've played around a lot with the parser in the past, trying to get it to dump the AST as YAML instead of the current s-expression-like representation. In the end I couldn't see this idea through (though I got really close), but I always wanted to come back and try something new.

I'm currently writing a transpiler from Swift to Kotlin for my masters project, so learning about libSyntax and helping its development in a meaningful way would be a big help!

Is there someone specific I should talk to about this? I'd really like to know what the first steps towards this project would be, so I can dip my toes in it and really get a sense of its scope. I know it's marked as hard, but that just makes it more fun!

Cheers!

harlanhaskins · March 19, 2018, 11:27pm

Hi @Vinicius_Vendramini!

I'm super glad that you're interested in this! I want to call in @rintaro and @Xi_Ge, as they're the current stewards of libSyntax and the parser.

My personal, super ambitious goal for libSyntax is to reformulate the compiler pipeline from

          -> AST (with syntactic bits) -> Typechecked AST -> SIL -> ... -> Binary
<source> |
          -> libSyntax AST

to

<source> -> libSyntax AST -> Semantic AST (with no syntactic bits, pointing back to libSyntax) -> Typechecked Semantic AST -> SIL -> ... -> Binary

That's my personal dream. @rintaro and @Xi_Ge can give you more specifics on what they're expecting for this project.

Best of luck with GSoC! I look forward to your proposal.

Vinicius_Vendramini · March 19, 2018, 11:30pm

Hi @harlanhaskins!

I remember watching your talk on libSyntax for Try! Swift NYC 2017. It was really enlightening.

I was actually a bit surprised to see this project on GSoC, since I was under the impression that libSyntax wasn't yet ready to fully replace the parser. Glad to see I was wrong!

rintaro · March 20, 2018, 9:07am

Hi Vinicius,

Thank you for being interested in this project!
Just FYI, there's another discussion about this project: [GSOC]Integration of libSyntax with the rest of the compiler pipeline

harlanhaskins:

My personal, super ambitious goal for libSyntax is to reformulate the compiler pipeline from

          -> AST (with syntactic bits) -> Typechecked AST -> SIL -> ... -> Binary
<source> |
          -> libSyntax AST

to

<source> -> libSyntax AST -> Semantic AST (with no syntactic bits, pointing back to libSyntax) -> Typechecked Semantic AST -> SIL -> ... -> Binary

That’s my personal dream.

That's my dream too.
In addition to this, we want to be able to feed a serialized libSyntax tree (JSON) directly into the compiler.

Since it's super ambitious, I don't think we can fully finish this work and merge it into the repo within the GSoC timeframe. But I believe we should eventually do this. In your proposal, I expect you to plan what you will finish in the timeframe, and what you will not.

Roughly, we should do:

Implement a parser that parses only into a libSyntax tree.
Modify AST nodes to hold libSyntax nodes for source information (getLoc() etc.)
Implement a libSyntax tree to AST translator. This should do:
- Everything the current parser does:
  - Diagnostics
  - Name binding
  - Resolve #if config
  - Code completion
- (Hopefully) Move some features from Sema:
  - Resolve import
  - Fold sequence expression (Resolve operator precedence)
  - Resolve type expression
Provide a way to input a serialized libSyntax tree to the compiler
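
To illustrate the "fold sequence expression" item above: a sequence expression is a flat list of operands and operators that later gets folded into a tree according to operator precedence. Below is a minimal, self-contained sketch of that idea using precedence climbing. It is a toy under stated assumptions, not the compiler's implementation: `Node`, `precedenceOf`, `fold`, and `foldAndRender` are hypothetical names, the precedence table is hard-coded (the real compiler consults precedence groups), and every operator is treated as left-associative.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a folded expression tree; not the compiler's Expr classes.
struct Node {
  std::string text;                // operand name or operator spelling
  std::unique_ptr<Node> lhs, rhs;  // both null for a leaf operand
};

// Hypothetical hard-coded precedence table (assumption for this sketch).
// Unknown operators get 0, which stops folding at that point.
inline int precedenceOf(const std::string &op) {
  static const std::map<std::string, int> table = {
      {"+", 2}, {"-", 2}, {"*", 3}, {"/", 3}};
  auto it = table.find(op);
  return it == table.end() ? 0 : it->second;
}

inline std::unique_ptr<Node> leaf(const std::string &text) {
  auto n = std::make_unique<Node>();
  n->text = text;
  return n;
}

// Folds seq[pos...] (operand, operator, operand, ...) by precedence climbing.
// Assumes seq is non-empty and alternates operands and operators.
inline std::unique_ptr<Node> fold(const std::vector<std::string> &seq,
                                  std::size_t &pos, int minPrec) {
  auto lhs = leaf(seq[pos++]);
  while (pos + 1 < seq.size() && precedenceOf(seq[pos]) >= minPrec) {
    std::string op = seq[pos++];
    // Requiring strictly higher precedence on the right makes operators of
    // equal precedence group to the left.
    auto rhs = fold(seq, pos, precedenceOf(op) + 1);
    auto parent = std::make_unique<Node>();
    parent->text = op;
    parent->lhs = std::move(lhs);
    parent->rhs = std::move(rhs);
    lhs = std::move(parent);
  }
  return lhs;
}

// Renders the tree fully parenthesized so the grouping is visible.
inline std::string render(const Node &n) {
  if (!n.lhs) return n.text;
  return "(" + render(*n.lhs) + " " + n.text + " " + render(*n.rhs) + ")";
}

inline std::string foldAndRender(const std::vector<std::string> &seq) {
  std::size_t pos = 0;
  return render(*fold(seq, pos, 1));
}
```

With this sketch, the flat sequence `a + b * c` folds so that `b * c` becomes the right-hand child of `+`, which is the grouping a reader would expect from the precedence table.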

Vinicius_Vendramini · March 20, 2018, 12:01pm

Hey @rintaro!

Ok then, let's see if I understand this correctly. The current parser builds an AST that has both syntactic and semantic information (and also the libSyntax AST, which is separate). So perhaps we could:

Change the parser so that the current AST gets its syntactic information from the libSyntax AST.
Move the code in the parser that generates the semantic information into a separate "AST Translator" that can be run later. I suspect this will take most of the time.
If there's time, allow the compiler to read a libSyntax JSON (now that it can convert the libSyntax into a full AST).

In the end, it might look something like this:

[source] --Parser--> [libSyntax AST (syntactic)] --AST-Translator--> [AST (semantic)] --Typechecker--> ...

Does that seem right to you?

Vinicius_Vendramini · March 20, 2018, 12:03pm

Also, since this is a difficult project that might not be ready in time, do you think it would make sense for me to contact the other student you mentioned (DexinLi) and try to work out a way to separate the project in two, so we can both work on it?

rintaro · March 20, 2018, 12:29pm

Sounds great, but I'm not 100% sure this strategy works.

AST nodes should refer to SyntaxData (wrapped in a concrete ***Syntax type) because calculating AbsolutePosition needs the complete SourceFileSyntax. That means we cannot construct any AST nodes until we finish parsing the libSyntax tree for the whole source file.
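
A rough way to see why the position depends on the whole tree: if nodes only know their own width, a node's offset has to be computed by walking up to the root and summing the widths of everything that precedes it. The sketch below is a toy model of that idea under my own assumptions, not libSyntax's actual data structures; `RawNode`, `addChild`, and `absoluteOffset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy node: tokens carry text, layout nodes carry children. A node knows its
// own width but never stores its offset within the file.
struct RawNode {
  std::string tokenText;                           // non-empty for tokens
  std::vector<std::unique_ptr<RawNode>> children;  // non-empty for layout nodes
  RawNode *parent = nullptr;

  std::size_t width() const {
    std::size_t total = tokenText.size();
    for (const auto &child : children) total += child->width();
    return total;
  }
};

// Appends a child (a token if tokenText is non-empty) and returns it.
inline RawNode *addChild(RawNode &parent, const std::string &tokenText = "") {
  auto child = std::make_unique<RawNode>();
  child->tokenText = tokenText;
  child->parent = &parent;
  parent.children.push_back(std::move(child));
  return parent.children.back().get();
}

// Offset from the start of the root: at every level on the way up, add the
// widths of the siblings that come before the current node. This only gives
// a file offset if the root really is the complete source file.
inline std::size_t absoluteOffset(const RawNode &node) {
  std::size_t offset = 0;
  for (const RawNode *cur = &node; cur->parent; cur = cur->parent) {
    for (const auto &sibling : cur->parent->children) {
      if (sibling.get() == cur) break;
      offset += sibling->width();
    }
  }
  return offset;
}
```

In this model, a node detached from its file root would report an offset relative to whatever partial tree it hangs from, which is the reason an AST node that needs a source location has to wait for the full tree.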

(Hmm, I currently have no idea what to do for incremental parsing @Xi_Ge @harlanhaskins, WDYT? )

It makes sense to me, but I have to confirm that with other people. I'll discuss it with them later.

DexinLi · March 20, 2018, 12:30pm

Hi @Vinicius_Vendramini
Actually, I'm thinking about how much time each part of this project would take and which part I would finish. So it's OK for me to split the project in two, if that is permitted.

Xi_Ge · March 20, 2018, 5:57pm

rintaro:

Vinicius_Vendramini:

Change the parser so that the current AST gets its syntactic information from the libSyntax AST.

Move the code in the parser that generates the semantic information into a separate “AST Translator” that can be run later. I suspect this will take most of the time.

If there’s time, allow the compiler to read a libSyntax JSON (now that it can convert the libSyntax into a full AST).

Sounds great, but I'm not 100% sure this strategy works.

AST nodes should refer to SyntaxData (wrapped in a concrete ***Syntax type) because calculating AbsolutePosition needs the complete SourceFileSyntax. That means we cannot construct any AST nodes until we finish parsing the libSyntax tree for the whole source file.

(Hmm, I currently have no idea what to do for incremental parsing @Xi_Ge @harlanhaskins, WDYT? )

I think incremental parsing is beyond the scope of a summer project. Actually, I think incremental parsing depends on adopting libSyntax in the compiler pipeline, for several reasons.

We don't want to design incrementality for both AST and Syntax nodes.
The AST is not designed for mutation, so it's naturally not suited to incremental updates, even if we have a workaround to make it so.
Most of our existing IDE features still use the AST, so making the syntax tree incremental alone won't provide much benefit.


Vinicius_Vendramini · March 20, 2018, 6:53pm

Great! Let's see if this works out then

Ok then. I'm going through the (long) process of cloning and building the compiler so that I can take a better look at the code. Once that's done I'll be back to brainstorm some more

Vinicius_Vendramini · March 20, 2018, 10:27pm

@rintaro @Xi_Ge Can you clear something up for me?

I remember taking a look at libSyntax a while ago and noticing that it wasn't ready for this kind of integration... I think it wasn't yet able to parse a few language constructs (e.g. it couldn't parse enums or something). So I guess my question is, does it already support the full language?

ksvsk · March 21, 2018, 1:01am

github.com

apple/swift/blob/main/lib/Syntax/Status.md

# libSyntax nodes status

## Expression

### Done:
  * NilLiteralExpr
  * IntegerLiteralExpr
  * FloatLiteralExpr
  * BooleanLiteralExpr
  * StringLiteralExpr
  * DiscardAssignmentExpr
  * DeclRefExpr
  * IfExpr
  * AssignExpr
  * TypeExpr
  * UnresolvedMemberExpr
  * SequenceExpr
  * TupleElementExpr
  * TupleExpr
  * ArrayExpr

(This file has been truncated.)

This might help.

Vinicius_Vendramini · March 21, 2018, 10:54am

Thanks @ksvsk, that's the file I was thinking about!

So, wouldn't we have to finish this list before libSyntax can really be integrated into the parser? I'm not sure how to do that though. Do we just have to find the places in the parser where it already generates the AST and make it also generate the libSyntax AST?

rintaro · March 21, 2018, 3:33pm

@Vinicius_Vendramini @DexinLi

Unfortunately, we decided not to select 2 students for this project. Mainly because of the lack of mentoring bandwidth. Sorry for that.

Also, we think this task is overwhelming for a GSoC project. It is super high volume, non-additive, technically hard, etc. Even if we manage to finish the implementation, it's quite possible that it would take a long time to review, merge, and migrate, or that it could not be merged at all.

We discussed this today, and came up with an idea which might be possible to implement in this GSoC time frame:

"Implement libSyntax tree to AST translator"

The tasks are:

Accept a libSyntax tree JSON as an input to the compiler
Deserialize the JSON into a libSyntax tree
Translate the libSyntax tree to an AST without modifying the current libAST.

 (serialized JSON) 
        ↓
  [deserializer] @DexinLi already implemented this. Thanks!
        ↓
 (libSyntax tree) 
        ↓
   [translator] to be implemented.
        ↓
      (AST)

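A snippet of the translator would look something like:

Expr *Translator::visit(IdentifierExprSyntax node) {
  // Intern the identifier text in the ASTContext.
  Identifier Name = ASTCtxt.getIdentifier(node.getIdentifierToken().getText());
  // Map the node's position in the syntax tree back to a SourceLoc.
  SourceLoc Loc = getSourceLoc(node.getAbsolutePosition());
  // Name lookup happens later, so produce an unresolved reference for now.
  Expr *E = new (ASTCtxt) UnresolvedDeclRefExpr(Name, DeclRefKind::Ordinary, DeclNameLoc(Loc));
  return E;
}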

This is a much narrower version of the original idea, but it's still very valuable because, using this, 1) we can implement and test the libSyntax parser independently, and 2) we can incrementally implement libSyntax-backed AST nodes.

More importantly, this is a purely additive feature. I think we can merge this much more easily than the original idea.
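
To make the shape of such a translator concrete outside the compiler, here is a toy, self-contained sketch. It assumes nothing about the real libSyntax or AST classes: `SyntaxKind`, `SyntaxNode`, `Expr`, `UnresolvedDeclRef`, `IntegerLiteral`, and `translate` are all hypothetical stand-ins. It also shows one way the "incremental" part could work, by returning null for syntax kinds that are not translated yet.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical syntax-side model: a kind tag, the source text, and an offset.
enum class SyntaxKind { IdentifierExpr, IntegerLiteralExpr, Unknown };

struct SyntaxNode {
  SyntaxKind kind;
  std::string text;
  std::size_t offset;  // stand-in for an absolute position in the file
};

// Hypothetical AST-side model: every expression records where it came from.
struct Expr {
  std::size_t loc;
  explicit Expr(std::size_t loc) : loc(loc) {}
  virtual ~Expr() = default;
};

struct UnresolvedDeclRef : Expr {
  std::string name;
  UnresolvedDeclRef(std::string name, std::size_t loc)
      : Expr(loc), name(std::move(name)) {}
};

struct IntegerLiteral : Expr {
  long value;
  IntegerLiteral(long value, std::size_t loc) : Expr(loc), value(value) {}
};

// Translates one syntax node into an AST node. Kinds without a translation
// yield nullptr, so support can be added one node kind at a time.
// Note: std::stol throws if the literal text is not a valid integer.
inline std::unique_ptr<Expr> translate(const SyntaxNode &node) {
  switch (node.kind) {
  case SyntaxKind::IdentifierExpr:
    return std::make_unique<UnresolvedDeclRef>(node.text, node.offset);
  case SyntaxKind::IntegerLiteralExpr:
    return std::make_unique<IntegerLiteral>(std::stol(node.text), node.offset);
  case SyntaxKind::Unknown:
    break;
  }
  return nullptr;
}
```

Because the translation only reads the syntax node and creates new AST objects, nothing on the AST side has to change, which mirrors the "purely additive" property described above.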

Vinicius_Vendramini · March 26, 2018, 11:06am

Hi guys! Sorry it took me so long to respond. I tracked down my supervising professor and we decided this was too out-of-scope for the project I’m doing. Thanks for the help though!

Best of luck to whoever gets to do this!