Changes to the SIL Parser

ParserSIL

Problem Statement

The SIL parser is one 7000 line file. It attempts to do three things: read SIL text, check and verify the input, and transform the input into SIL instructions/functions/etc. There are, at least, five overarching problems with the current implementation:

  1. The current implementation is old, complicated, and monolithic.
  2. The parser is a standalone library (for no good reason) which complicates the build system.
  3. It is hard to have “drop-in” tooling with this implementation.
  4. The current implementation has no interoperability with the serializer.
  5. The current implementation tries to do everything in one place: read, check and parse.

Solution

I’d like to propose a solution to incrementally fix all these issues. First, we’ll move the parser into lib/SIL/Parser unifying it with the SIL library.

Second I’ll update the parser to model a new flow of information. This model has three steps: read, check, parse and allows for drop-in tooling at any point. This will allow us to do large-scale transformations, pretty-printing, and utilize other tools for easier debugging and maintenance. Here’s a pseudo-code example of the proposed implementation:

Read text:

Input

sil @foo {
bb0():
    x = copy_value y
    store x to [init] z
}

Read

{ OperandIDs, Attrs } readCopyValue() {
    return { readOperand() };
}

type: copy_value
operands: { ValID, TypeID } 

{ OperandIDs, Attrs } readStore() {
    bool isInit = false;
    auto src = readOperand();
    expectAndConsume("to");
    consumeOptionalAttr("init", isInit);
    auto dest = readOperand();
    return { { src, dest }, isInit };
}

type: store
operands: { ValID, TypeID }, { ValID, TypeID }
attributes: "init"

Check:

void checkCopyValue(OperandIDs operands, Attrs attrs) {
    require(operands.size() == 0);
    // Other checks...
}

// Check store in the same way.

Parse:

SILInstruction *emitCopyValue(OperandIDs operands, Attrs attrs) {
    auto y = getLocalValue(operands.front.val, operands.front.type);
    return builder.createCopyValue(y);
}

Splitting the implementation up in this way gives us a few major wins. First and foremost, separation of concerns allows for all three steps to be as simple and straightforward as possible. Second, the verification step allows for much more sensical and intuitive sil parsing errors. Last, this approach models very closely after the serializer’s implementation and will allow us to have interoperability between the two in the future.

9 Likes

I am happy to have someone look at the SIL parser! It has needed love for a long time.

Some thoughts:

  1. I really like the split up of responsibilities! Can you describe the API in more detail. E.g. give functional names to things in Read. I want more explicit details.

  2. I wonder if you could refactor the SIL verifier to having a part that assumes well formed SIL and one that does not. Then we could reuse those checks and not have to maintain them in two places!

  3. I think that parse should be pretty easy to implement the way you are doing it. Make sure to use SILNodes.def to declare one of these for every instruction and provide an exhaustive switch when doing the dispatch. That way if someone doesn't provide a definition for a new instruction, they will get a linker error.

My thought is that the way to bring this up is as follows (tell me if you agree).

  1. We do the move, fixing formatting as we go.
  2. We extract out the checks from the visitor. I think it would be great if we didn't have two verifiers for common code.
  3. Once we have that, I think the order doesn't matter. The way the parser is written, before we parse instructions we always do the extraction of the operand, attr first. So I imagine the two parts could be refactored separately or in parallel since they have a well defined interface.

I mostly agree. I think the formatting should be updated as we pull things into the visitors but, I don't feel too strongly.

The instruction parsing is the main change here and I think it's important that I update it in an incremental way. I suggest that I add the visitors and a list of "supported" instruction (which will only exist during the "transition period" to the new parser). As I add visitors to handle more instructions I'll add them to the supported list so that they are visited. If they're not in the list it will default to the old implementation. Once all the instructions are supported, I'll remove the old implementation and list.

I'll flesh out and document the APIs here shortly.

I haven't completely made up my mind yet about the checking. My thinking was that the added checker would only do the bare minimum in terms of validating that there were the correct number of operands, results, etc. Not type/attribute checking that is already in the SIL verifier. We should probably run the verifier over the SIL after its been parsed but I think we already do that.

Here's a draft of what the API could look like. I'm sure that I'll need to tweak these and add to it along the way but, these are, at least, the fundamentals. Let me know what you think and feel free to suggest improvements.

struct SILParserOperand {
	ValueID value;
	TypeID type;
};

using SILParserOperands = SmallVectorImpl<SILParserOperand>;

SILParserAttributes is an unsigned integer where each bit position corresponds to a different attribute.

struct SILParserResult {
	SILParserOperands operands;
	SILParserAttributes attrs;
};

Note: both ValueID and TypeID are already defined in the compiler.

Here are the primary APIs for each stage:

Note: visitors will be called on each instruction in order before the next instruction is parsed. This means that the checker can lookup values/types that have already been parsed.

/// Gets a local value that corresponds to ValueID if it exists. Otherwise, return None.
Optional<SILValue> getLocalValue(ValueID);

/// Returns: *getLocalValue(ValueID);
SILValue takeLocalValue(ValueID);

/// Gets a local type that corresponds to TypeID if it exists. Otherwise, return None.
Optional<SILValue> getLocalType(TypeID);

/// Returns: * getLocalType(TypeID);
SILValue takeLocalType(TypeID);

// READ

/// A visitor modeled for each SILInstruction kind. For example, if we parse a "copy_value" we call the visitor
/// "readCopyValue." The read visitors are encouraged to return the information they have when and if they
/// run into an error. The checker will see that the information is incomplete or incorrect and diagnose the error.
SILParserResult read();

/// Reads a single operand from the source file. Does not parse the type. If there is no operand, returns None.
/// No error diagnostic is emitted.
Optional<ValueID> consumeOperand();

/// If the raw text is matched, consumes the matched text in the source file and returns true. Otherwise, returns false.
bool consumeRaw(StringRef raw);

/// Reads a single attribute from the source file. If there is no operand, returns None. No error diagnostic is emitted.
Optional<SILParserAttributes> consumeAttr();

// CHECK

/// A visitor modeled for each SILInstruction kind. For example, if we parse a "copy_value" we call the visitor
/// "checkCopyValue." Checks that the correct information is present in order to parse the instruction. If the
/// correct information is not present, emits a diagnostic and returns false. This visitor should not check types
/// or anything else beyond the syntax required to create the SILInstruction. This visitor is responsible, however,
/// for checking that all operands and types *exist*.
bool check(SILParserResult, unsigned resultCount);

// PARSE

/// A visitor modeled for each SILInstruction kind. For example, if we parse a "copy_value" we call the visitor
/// "emitCopyValue." Emits (returns) a SILInstruction built with the information from SILParserResult. The result
/// of this function will be used to populate "local values." I.e. the result ValueID(s) from the SILInstruction, if any.
/// This visitor can and should expect that the types and values its given exist.
bool emit(SILParserResult);

While we're at it, would it be reasonable at all to expose a C/Swift API for SIL Parser at some point, similarly to what we have for SwiftSyntax? I always wanted to have at least some support from developer tools when looking into SIL, at least something more than basic syntax highlighting. Is there anything that could prevent this in princible, e.g. SIL not being stable enough? I'm not saying it should be done as a part of the current SIL Parser refactor described above, just wondering if I ever wanted to build some dev tools for SIL based on its parser code, would it even make sense to propose such C/Swift API for it?

1 Like

Hi Zoe,

It would be great to really rethink this. Among other problems, the SIL parser inherits a monolithic design from Clang and other similar recursive descent parsers and took its own random walk over time.

It is focused on a different domain, but I'd recommend looking at the patterns used in the MLIR asm parser - not because of MLIR itself, but just because it is an example of a new design that is very methodical in its approach.

I'd like to call out one aspect in particular: The parser state (including the lexer etc) are split out into a class that is independent of the parser itself. This allows the parser to be split into multiple different classes, for example, the base parser class has the general utilities for parsing comma separated lists, tokens etc. Other more specific parsers can have state and methods local to their domain, e.g. the affine parser, the operation parser and the module parser.

Another important thing is the methodical use of ParseResult and by-reference output results (instead of a mishmash where parser routines return things in different ways). This allows sequences of parsing logic to be expressed in linear code sequence.

All this said, the parser would be way way way nicer when implemented in Swift with Swift's error handling, but alas...

-Chris

2 Likes
  1. Maybe? I think that is out of scope for this change.
  2. Really for what you want, one would need to have a way of talking about the syntax of SIL, not the semantics. There is an impedance mismatch there that one would expose. This is worked around in SwiftSyntax by not working on the actual AST itself but instead using a different type of parse tree specifically meant for manipulating the text of a program (for instance it stores relevant white space, provides manipulation tools/etc.). SIL is not setup to be used in that way.

Hi Chris,

Thank you for your message. After a very brief look over the MLIR parser, I agree, It's a great design.

Separating different aspects of the parser is one of the main goals of this change. Eventually, it would be ideal to separate both the state and the stages of the parser but, there is a lot that currently depends on the structure of SILParser and factoring out the state before other infrastructure is in place would be challenging at best. That being said, I'll make sure to think about making these changes in a way such that it is easy to move the state into its own class down the road.

The other aspects, splitting up parsers and having a linear code sequence I whole-heartedly agree with and intend to incarcerate into this change.

I've just finished the boilerplate for my proposed change (obviously subject to change), and I'll post about the API that I've designed later tonight. The design is such that there is (or will be eventually) little to no recursive parsing. And, (eventually, right now I'm only working on instruction parsing), we can have different parsers for top-level sil declarations, sil instructions, types, etc.

Anyway, thanks again for your thoughts and I'll look more carefully through the MLIR parser and see if I find anything else we can steal :)

-Zoe

Here's my design proposal, I've decided to limit this proposal to the SILInstruction parser. Assuming this goes well, I plan to extend this design to other parsers, such as the type parser, and top-level declaration parser, but, for now, I think it's best to limit myself to the SILInstruction parser.

Yesterday I posted a rough outline of the API that I want to implement but, I don't think I did a very good job conveying exactly how those functions that I defined work together. If you're like me in the sense that you can understand something best by looking through some code, you can take a look at this PR I made. Otherwise, keep reading :)

I like to think about what's happening as an assembly line. We start with the raw text, and we want the final result to be a polished SILInstruction that can be a node of information used in the compiler.

Like an assembly line, we need to do different operations to the node as it makes its way from the raw text form to the polished SILInstruction form. Also like an assembly line, these operations should happen synchronously—one after another. A linear code sequence, as Chris put it. I've broken these operations down into three steps: "read," "check," and "emit."

"Read" is the first operation. The "ReadSIL" class is a "Parser" (and is used as the parser during the transition period). The operation this step performs is text to information. We have to extract the information from the text we receive. This step doesn't care if the information makes sense or fits together; it just reads it and gives it to the next step.

Note: If the text is so malformed that the information cannot be read, this step simply exits early, allowing the next step to diagnose the issue. If the text is so malformed that the reader cannot even determine what SILInstruction is being parsed, then an error diagnostic is emitted.

"Check" is the next operation. The "CheckSIL" class is a basic checker whose only job is to make sure that the information we have makes sense. It ensures that the information we read, in the form of a SILParserResult , can be used to create a SILInstruction.

"Emit" is the last operation. The "EmitSIL" class does the final step. It turns the valid information we have into a SILInstruction. A SILInstruction must be able to be created given the information provided to this visitor.

As you can see, these steps all fit together sequentially and communicate by passing around information in the form of a SILParserResult . This allows for maximum separation of concerns both at an "operation" level and at an instruction level.

FYI. Lets let this percolate a bit for people to look at. Lets give it a week or two.