SILBuilder ergonomics for improved debug scope handling

From the review of: SILBuilder ergonomics for improved debug scope handling #16355
Author: @vedantk

I'm responding in the forum because I'd like to have Vedant and Adrian's comments recorded for posterity.

@Adrian_Prantl says:

I don't think that there is a case where an instruction's scope should ever be changed without also changing its location. For all practical purposes scope and location form an inseparable unit. (The one exception is inlining, because the scope happens to be the data structure where the inline history of an instruction is stored, but that's almost an implementation detail).

Good. That was my original understanding until I once asked about the purpose of SILBuilderWithScope and was told something along the lines that we can't break the debugger by jumping between scopes.

If instructions are simply supposed to propagate their location info, that's much more straightforward. Now I really don't understand the purpose of SILBuilderWithScope, If scope and location are tied together, why is one enforced during construction and another independently passed per instruction. Why don't we derive the scope from the debug location. Wouldn't that have avoided all our inconsistent debug info problems?

It would help to enumerate the use cases and provide a strawman API even if this PR doesn't implement everything. Caveat: this is off the top of my head, so I'm sure we'll see the reality is more complex once we look at replacing actual use cases.

Case #1: Polymorphism, and cases we haven't thought of

A noninstantiable base class can exist for polymorphic use cases and future extension:

class SILBuilderBase {
protected:
  SILBuilderBase();
public:
  virtual createSomeSILInstr(WithOperands);
}

Case #2: Replacement

I'd say the vastly predominate case is instruction replacement. That should be the simplest and strictest:

/// This corresponds to a single insert point the inherits scope+location from /// an existing instruction.
class SILBuilder final: public SILBuilderBase {
  // Insert instructions here and inherit debug location from this position.
  SILBuilder(SILInstruction *atPosition);
}

Case #3: Code Motion

A secondary, but important use case is code motion, where we separate debug info from insertion point. I see a couple options:

a. Provide another constructor:

  // Insert instructions at this position with the given debug location
  // *and scope*, which should almost always be derived from location.
  // Use this for code motion.
  SILBuilder(SILInstruction *atPosition, SILLocation withLoc);

What's wrong with keeping it simple?

b. Use a more specific class name to force people to think harder about it.

// Use this for code motion.
class SILBuilderWithLocation final: SILBuilderBase {
  // Insert instructions at this position with the given debug location
  // *and scope*, which should almost always be derived from location.
  SILBuilder(SILInstruction *atPosition, SILLocation withLoc);
};

Adding another class name doesn't seem very beneficial to me. Either way, I don't think it would be hard for developers to pick the right API for the use case as long as the rules are simple and the API is clearly commented. The underlying issue causing current confusion seems to be the separate handling of debug scope vs. location.

Now, with the simple APIs above, can we eliminate the location argument from all the createInstruction entry points? It really doesn't belong there.

Case #4: Inlining

It seems easy enough provide more flexible class where scope and location aren't tied together.

SILBuilderWithDebugScope final: SILBuilderBase {
  SILBuilder(SILInstruction *atPosition, SILLocation withLoc,
             SILDebugScope withScope);
}

As in Case #3, I would be fine with that being just another constructor on plain old SILBuilder.

Case #5: SILGen

SILGen already has its own builder. So why is this a case? Can't it simply construct a SILBuilderWithDebugScope whenever it creates an instruction?

Case #6: Code no-one wants to touch yet

I'd be fine with a DeprecatedSILBuilder initially while we cleanup passes.

I personally agree with your opinions on #2/#3, Andy. I don't have opinions about #1/#5/#6, but I think they're largely less important scenarios.

Also, I would like to point out that in order to make code motion hard to get wrong we should find a way to force people to create a new object when moving. In particular, I'm getting more and more convinced that setInsertPoint() shouldn't be a thing anymore.

The reason why I'm stressing this out is because we have been witnessing many bugs where we reuse a SILBuilder passing it by reference around, and therefore inheriting a wrong debug scope. Several examples are:

https://github.com/apple/swift/commit/4bfab82eda5c5a6359b8cbbc9720c4550daa073e
https://github.com/apple/swift/commit/3d9eb3d7e556b958dc06e8b17aabee8e1182e40f
https://github.com/apple/swift/commit/1a153d410ab67d93aa6824afd8b7b99799c1765d
https://github.com/apple/swift/commit/e8e90df9b4be31a52e2659b89a98095bef872684
https://github.com/apple/swift/commit/14d3779e2960bc028a1081946e5e639b300a860d

Here is my current thinking after talking to @Adrian_Prantl, @vedantk, and @dcci .

From the point of view of the compiler, there should be one concept of a debug location that includes both the debug scope and line info. I'll call that SILDebugLocation. (Adrian already added that type, we just need to convert all our APIs to start using it).

There are various inliner-like passes that need to modify the scope without changing the line info. There should be a utility for doing that independent from the SILBuilder. In general it's easy to build pass-specific builders that do extra book-keeping.

We're trying to solve two problems by changing SILBuilder API:

  1. When creating multiple instructions, we may forget to update the debug location.

  2. It's easy to set an entirely incorrect location or a location that is inconsistent with the scope.

Let's look at these problem in the context of a strawman proposal:

class SILBuilder {
protected:
  SILBasicBlock::iterator insertPosition;
  SILDebugLocation debugLocation;
public:
  // Basic constructor.
  SILBuilder(SILBasicBlock::iterator insertPosition, SILDebugLocation debugLocation);
 
  // Most common convenience constructor for inheriting SILDebugLocation.
  explicit SILBuilder(SILInstruction *insertBefore);
 
  LoadInst *createLoad(<load operands only>);
  StoreInst *createStore(<store operands only>);

protected:
  SILBuilder &operator =(const SILBuilder &);
};

This stripped down builder class makes the reuse problem (#1) easier to identify, and defines away problem #2 wherever it is used. Code that reuses a basic SILBuilder object is explicitly relinquishing control of debug info and insertion point--it is impossible to change either from this interface. It should only be reused in very limited situations in which all the instructions to be created are obviously part of the same location.

Protecting the assignment operator compensates for the fact that we don't declare it's methods const and we're not good about passing around const reference. The point is that the position/location is immutable (without downcasting).

Whenever the compiler expands, inserts, or clones instructions in the same scope, the common convenience constructor for inheriting SILDebugLocation is exactly what the code wants:

SILBuilder(SILInstruction *insertBefore);

The most common usage would be:

SILBuilder(oldInstruction).createLoad(address);

But it could also be used as an abstract interface:

// This utility generates a snippet of code for a single debug location.
// It is unable to change the insertion point or debug information.
void genNewInstruction(const SILBuilder &builder, SILValue address) {
  builder.createLoad(address);
}

// When the builder is constructed, we know we're trying to replace an instruction.
SILBuilder B(oldInstruction);
genNewInstruction(B);

In the much less common case of moving code by recreating it while preserving its old location, the basic constructor can be called directly:

SILBuilder(SILBasicBlock::iterator insertPosition, SILDebugLocation debugLocation);

The basic constructor is also useful for inserting implementation instructions that don't correspond to source expressions as "at line 0".

Now comes the question of how to handle existing code that reuses the same SILBuilder for multiple locations and updates the insertion point and debug location as it goes. That code falls into two categories:

  • Lowering SIL one scope at a time (e.g. SILGen)

  • Manually expanding or replacing multiple instructions.

For lowering, a simple wrapper around SILBuilder can provide convenience and safety:

class SILGenBuilder { 
  SILBasicBlock::iterator currentPosition;
  SILDebugScope *currentScope;
  SILLocation currentExprLocation;

  // Conceptual interface. In practice, these would be RAII wrappers
  // that maintain a stack.
  pushScope(scope);
  popScope();
  pushExpression(expr);
  popExpression();

  SILBuilder builder() {
    return SILBuilder(currentPosition, SILDebugLocation(currentScope, currentExprLocation));
  }

  LoadInst *createLoadCopy(<load operands only>) const {
    // Handle the copy...
    builder().createLoad(<...>);
  }
}

This solves another serious issue with the SILGenBuilder inheriting SILBuilder's API. We don't actually want code in SILGen accidentally calling the low-level builder, it's very important to know at the call site which API is being used. It's currently nearly impossible tell which API is being invoked at the call site making the SILGenBuilder implementation essentially unreadable.

The second category of code that arbitrarily reuses a SILBuilder to track the current insertion point and debug location is the most dangerous. If the reused SILBuilder is already part of another object's state, then there's actually no reason to reuse it.

For example, given code that currently looks like this:

void genNewInstruction(SILBuilder &builder, SILValue address) {
  builder.createLoad(address);
}
class CodeGenState {
  SILBuilder B;

  void genCode(SILValue address) {
    genNewInstruction(B, address)
    B.setInsertionPoint(somewherElse);
  }
};

The fact the the code is tracking and mutating the insertion point and debug location should be made explicit. That can be done with a convenient wrapper around the insertion point and debug location:

struct InsertionPoint {
  SILBasicBlock::iterator insertPosition;
  SILDebugLocation debugLocation;

 // For convenience.
 InsertionPoint(SILInstruction *inheritFrom);
};

void genNewInstruction(const SILBuilder &builder, SILValue address) {
  builder.createLoad(address);
}
class CodeGenState {
  InsertionPoint insertionPoint;

  void genCode(SILValue address) {
    genNewInstruction(SILBuilder(insertionPoint), address);
    insertionPoint = InsertionPoint(somewhereElse);
  }
};

If there is still enough code in the compiler that needs to manually adjust the insertion point and isn't already part of some other object's state, we could still have a mutable-insertion-point SILBuilder:

class SILBuilderPosition : SILBuilder {
public:
  // Raw insertion point and debug location.
  void setInsertionPoint(InsertionPoint &pt);

  // Inherited insertion point and debug location.
  void setInsertionPoint(SILInstruction *insertAfter);
};

Usage:

void genCode(SILBuilderPosition &buildAtPos, SILValue address) {
  buildAtPos.createLoad(address);
  buildAtPos.setInsertionPoint(somewhereElse);
}

There are two important points that the design should adhere to:

  • This "mutable position" SILBuilderPosition type cannot be the same type as the base SILBuilder because that would completely defeat the original motivation. We need to be able to isolate the code that is allowed to change the debug location and insertion point. i.e. If I'm editing a utility that that takes SILBuilder, I need to know whether I can change the location, or whether someone else might be doing it.

  • For sanity, the setInsertionPoint() semantics should be identical to just constructing a new SILBuilder with that position. Converting the code from this to a normal SILBuilder should be a purely mechanical (albeit time consuming) process.

Note that the exactly same thing could be also accomplished better by:

class SILBuilderPosition final : SILBuilder {
public:
  SILBuilderPosition &operator =(const SILBuilderPosition &);
}

void genCode(SILBuilderPosition &builAtPos, SILValue address) {
  buildAtPos.createLoad(address);
  buildAtPos = SILBuilderPosition(newInsertionPoint);
}

The first SILBuilderPosition hack might just allow slightly easier migration. However, since it's likely no one will want to rewrite the vast amount of compiler code using SILBuilder, we will need to keep around a LegacySILBuilder with the old interface anyway for a while.

Another thing to keep in mind when redesigning SILBuilder: we should remove all debug-only APIs from SIL types. i.e. SILFunction and SILInstruction should not have a getLocation() API, unless they really always have a location. The way to do that is to only access debug information through an explicit debug info API. Currently, calls to getLocation() are easily hidden behind other APIs like getDeclContext(). As a result, there are incorrect assumptions burried within the optimizer that some debug information exists. That debug information is not serialized, so this is an insidious problem that we need to define away.

1 Like

@Andrew_Trick +1!

@Andrew_Trick (belated) +1 from me as well