Improving the Debug Output of the Type Inference Algorithm - An Update

amritpan · September 26, 2022, 10:21pm

Hi everyone!

This summer I worked on improving the debug output of the type inference algorithm with Pavel Yaskevich (@xedin) as my mentor as part of GSoC '22. It has been a tremendous experience figuring out how various expressions are evaluated by the Swift type checker and understanding the debug output, something that seemed elusive until these last few weeks!

I also want to thank @xedin for being a highly attentive and supportive mentor! Between the daunting subject matter and figuring out the code, there were many occasions where I was completely out of my depth. But thanks to our weekly discussions and reviews, I was able to repeatedly explore concepts that seemed hazy or challenging until they became accessible in the context of this project!

The new output now prints additional information about the constraint solver steps, includes previously missing information, and has been lightly reformatted. Here are some of the notable changes that we made to the type inference algorithm debug output.

Type Variables

The initial type variable printing has been modified to add specificity and coherance. Previously, the initial type variables for an invalid expression such as 1 + “a” would print as:

$T0 [lvalue allowed] [noescape allowed] delayed bindings={} @ locator@0x1258ee200 [OverloadedDeclRef@/…]
$T1 [noescape allowed] delayed literal=3 bindings={} @ locator@0x1258f0ac0 [IntegerLiteral@/…]
$T2 [noescape allowed] delayed literal=3 bindings={} @ locator@0x1258f0b78 [StringLiteral@/…]
$T3 [noescape allowed] delayed bindings={} @ locator@0x1258f0c40 [Binary@/… -> function result]

In this output, it was not difficult to see that this simple expression had been broken up and assigned type variables in the following way: $T0 for the + operator, $T1 for 1, $T2 for “a”, and $T3 for the result type of this function. However, the binding options, attributes, and potential binding information that follows these type variables were difficult to follow, and in some cases, entirely missing.

With the new output, these type variables will print binding options and attributes as clearly labeled groups. Potential bindings will print both the direct bindings (e.g., subtype of Int) and default bindings (e.g., inferred from an ExpressibleBy protocol), as well as <empty> when bindings are unavailable. The above output now prints as the following improved version:

$T0 [allows bindings to: lvalue, noescape] [attributes: delayed] [with possible bindings: <empty>]) @ locator@0x13ca3e400 [OverloadedDeclRef@/…]
$T1 [allows bindings to: noescape] ($T1 [attributes: delayed, [literal: integer]] [with possible bindings: (default type of literal) Int]) @ locator@0x13ca3f398 [IntegerLiteral@/…]
$T2 [allows bindings to: noescape] [attributes: delayed, [literal: string]] [with possible bindings: (default type of literal) String]) @ locator@0x13ca42e68 [StringLiteral@/…]
$T3 [allows bindings to: noescape] [attributes: delayed] [with possible bindings: <empty>]) @ locator@0x13ca42f30 [Binary@/… -> function result]

Specifying exact literal conformances is useful when these type variables appear later in the output with updated potential bindings. For example, for the invalid expression, 1 + “a”, the solver will evaluate $T2 (“a”) as of type String, and then attempt to solve $T1 (1) as also of type String using the operator overload of (String, String) -> String) and produce this potential binding:

($T1 [attributes: [literal: integer]] [with possible bindings: (subtypes of) String, (default type of literal) Int])

This binding then fails the subsequent type variable binding attempt of $T1 := String because $T1 is an Int, has an ExpressibleByIntegerLiteral conformance constraint requirement, and String and Int do not inherit type information from each other. However, this potential binding shows a binding state at this point in the solver path. Here, $T1 has a possible binding of Int via literal inference and of a subtype of String via the operator overload. The failure following this potential (and conflicting) binding is easier to follow as a result.

Simplification Process

To make it even easier to follow the constraint solver’s inference process, the new output also prints out the constraint simplification process. Any new constraints added or removed during a constraint solver attempt are printed out sequentially followed by its simplification result and failed constraints, where applicable.

Previously, the output would print a solver step attempt, possibly some updated type variable bindings, and a found solution. The simplification process that occurred during this solver step was entirely missing. For example, when solver attempted to evaluate the + operator in 1 + “a” as an overload of (String, String) -> String, the previous output would show:

  (attempting disjunction choice $T0 bound to decl Swift.(file).String extension.+ : (String.Type) -> (String, String) -> String [[locator@0x13a201600 [OverloadedDeclRef...]]];
    ($T1 literal=3 bindings={(subtypes of) String})
    ($T2 literal=3 bindings={(subtypes of) String})
    Initial bindings: $T1 := String, $T1 := Int
    ...
    (attempting type variable $T2 := String
      (failed constraint $T5 literal conforms to ExpressibleByIntegerLiteral [[locator@0x13d206360 [Binary...]]];)
    )
    (attempting type variable $T1 := String
      (failed constraint $T1 literal conforms to ExpressibleByIntegerLiteral [[locator@0x13d2062c0 [Binary...]]];)
    )
    (attempting type variable $T1 := Int
      (failed constraint $T1 operator arg conv String [[locator@0x13d2062c0 [Binary...]]];)
    )
    (attempting type variable $T1 := Double
      (failed constraint $T1 operator arg conv String [[locator@0x13d2062c0 [Binary...]]];)
    )
  )

This output shows the failed constraints for each attempt but one of the failing constraints was added during simplification of the original disjunction choice, a process entirely hidden here.

In the new output, we can follow the simplification process to see how the initial constraints and new simpler constraints generated at each scope are added to conclude that while $T2 (“a”) can be a String type, $T1 (1) cannot. Let's take a look at these steps.

The relevant initial constraints:

$T1 literal conforms to ExpressibleByIntegerLiteral
$T2 literal conforms to ExpressibleByUnicodeScalarLiteral

This is followed by the disjunction choice of (String, String) -> String as the overload for +. In this simplification result, we see that the constraint (_const $T1, _const $T2) -> $T3 applicable fn $T0 is broken up into 2 simplified constraints ($T1/$T2 operator arg conv String) which are added to the constraint system. The original constraint is then removed and the outcome is marked simplified.

  (attempting disjunction choice $T0 bound to decl Swift.(file).String extension.+ : (String.Type) -> (String, String) -> String [[locator@0x13d201600 [OverloadedDeclRef...]]];
    ...
    (considering -> (_const $T1, _const $T2) -> $T3 applicable fn $T0 [[locator@0x13d87e9f0 [Binary@/... -> apply function]]];
      (simplification result:
        (added constraint: $T1 operator arg conv String [[locator@0x13d2062c0 [Binary...]]];)
        (added constraint: $T2 operator arg conv String [[locator@0x13d206360 [Binary...]]];)
        (removed constraint: (_const $T1, _const $T2) -> $T3 applicable fn $T0 [[locator@0x13d2061f0 [Binary...]]];)
      )
      (outcome: simplified)
    )
  )

Next, the solver attempts to evaluate $T2 (“a”) as a String type. In addition to the newly simplified constraint added above ($T2 operator arg conv String), the solver considers the inherited initial constraint of $T2 literal conforms to ExpressibleByUnicodeScalarLiteral. $T2 is then bound to a concrete String type and both of these constraints are removed from the constraint system.

(attempting type variable $T2 := String
  (considering -> $T2 literal conforms to ExpressibleByUnicodeScalarLiteral [[locator@0x137a20c68 [StringLiteral...]]];
    (simplification result:
      (removed constraint: $T2 literal conforms to ExpressibleByUnicodeScalarLiteral [[locator@0x137a20c68 [StringLiteral...]]];)
    )
    (outcome: simplified)
   )
   (considering -> $T2 operator arg conv String [[locator@0x137a20f60 [Binary...]]];
     (simplification result:
       (removed constraint: $T2 operator arg conv String [[locator@0x137a20f60 [Binary...-> apply argument -> comparing call argument #1 to parameter #1]]];)
     )
     (outcome: simplified)
     )
     ...
)

However, when the solver attempts to solve for a concrete type for $T1 (1) next, we get a different simplification result. First, $T1 has updated potential bindings, which now includes String via the disjunction choice of (String, String) -> String:

($T1 [attributes: [literal: integer]] [with possible bindings: (subtypes of) String, (default type of literal) Int])

Then the solver attempts to match $T1 as a String, Int, or Double type (because Double also conforms to the ExpressibleByIntegerLiteral protocol) with the constraint requirements of the inherited $T1 literal conforms to ExpressibleByIntegerLiteral and the simplified constraint of $T1 operator arg conv String.

  (attempting type variable $T1 := String
  ...
    (considering -> $T1 literal conforms to ExpressibleByIntegerLiteral [[locator@0x137a1d198 [IntegerLiteral...]]];
      (simplification result:
        (removed constraint: $T1 literal conforms to ExpressibleByIntegerLiteral [[locator@0x137a1d198 [IntegerLiteral...]]];)
        (failed constraint $T1 literal conforms to ExpressibleByIntegerLiteral [[locator@0x137a1d198 [IntegerLiteral...]]];)
      )
        (outcome: error)
      )
    )
  )
  (attempting type variable $T1 := Int
    (considering -> $T1 operator arg conv String [[locator@0x137a20ec0 [Binary...]]];
      (simplification result:
        (removed constraint: $T1 operator arg conv String [[locator@0x137a20ec0 [Binary...]]];)
        (failed constraint $T1 operator arg conv String [[locator@0x137a20ec0 [Binary...]]];)
      )
        (outcome: error)
      )
    )
  )
  (attempting type variable $T1 := Double
   ...
     (considering -> $T1 literal conforms to ExpressibleByIntegerLiteral [[locator@0x137a1d198 [IntegerLiteral...]]];
       (simplification result:
         (removed constraint: $T1 literal conforms to ExpressibleByIntegerLiteral [[locator@0x137a1d198 [IntegerLiteral...]]];)
       )
         (outcome: simplified)
       )
     )
     (considering -> $T1 operator arg conv String [[locator@0x137a20ec0 [Binary...]]];
        (simplification result:
          (removed constraint: $T1 operator arg conv String [[locator@0x137a20ec0 [Binary...]]];)
          (failed constraint $T1 operator arg conv String [[locator@0x137a20ec0 [Binary...]]];)
        )
          (outcome: error)
        )
      )
    )

Each of these ultimately fail to find a concrete type for $T1 that also matches the overload choice and the solver exits this scope to attempt the next disjunction choice that could be an overload for the + operator in 1 + “a”.

When I first started this project and looked at the outputs of various expressions, I found myself wondering why the solver takes particular steps. The output did not clearly show this and sometimes the next steps taken by the solver seemed arbitrary. Printing out the constraint simplification process and its result help demystify the deductive reasoning that drives the solver’s next steps.

Solver Scope Changes

In addition to simplification printing, the new output now prints all changes made during each solver scope. These changes include newly created and assigned type variables, type variables with new bindings, type variables that are extended an equivalence via an existing type variable, constraints that are added or removed, and new potential bindings that are a result of that solver scope.

Let's revisit our example expression of 1 + “a”. At one point, the solver attempts an + operator overload of the AdditiveArithmetic protocol ($T0:= <Self where Self : AdditiveArithmetic> (Self.Type) -> (Self, Self) -> Self) and then tries to evaluate 1 ($T1), “a” ($T2), and the result of this addition ($T3) as of type Self. Previously, the output would print out initial and upcoming bindings for a disjunction choice or a type variable binding attempt, like so:

(attempting disjunction choice $T0 bound to decl Swift.(file).AdditiveArithmetic.+ : <Self where Self : AdditiveArithmetic> (Self.Type) -> (Self, Self) -> Self [[locator@0x137a1c200 [OverloadedDeclRef...]]];
  (overload set choice binding $T0 := ($T4, $T4) -> $T4)
  ($T1 bindings={(subtypes of) Int})
  ($T2 literal=3 bindings={(subtypes of) Int})
  Initial bindings: $T1 := Int
  (attempting type variable $T1 := Int
  ...

Here, $T4 seems to inexplicably appear and it is not immediately obvious how $T4 relates to $T1 and $T2. The bindings for $T1 and $T2 print but it is not clear if these bindings were those added during this scope or if they are in addition to pre-existing bindings. So to explicitly show all of the changes that occur during this disjunction choice simplification, the new output prints them out, like so:

  (attempting disjunction choice $T0 bound to decl Swift.(file).AdditiveArithmetic.+ : <Self where Self : AdditiveArithmetic> (Self.Type) -> (Self, Self) -> Self [[locator@0x137a1c200 [OverloadedDeclRef...]]];
    ...
    (Changes:
      (Newly Bound: 
        > $T0 := ($T4, $T4) -> $T4
      )
      (New Type Variable: 
        > $T4
      )
      (New Equivalence: 
        > $T3
      )
      (Added Constraints: 
        > $T4 conforms to AdditiveArithmetic [[locator@0x137a21bc8 [OverloadedDeclRef... -> opened generic -> type parameter requirement #0 (conformance)]]];
        > $T1 transitive conformance to AdditiveArithmetic [[locator@0x137a21bc8 [OverloadedDeclRef... -> opened generic -> type parameter requirement #0 (conformance)]]];
        > $T1 operator arg conv $T4 [[locator@0x137a21bc8 [OverloadedDeclRef.. -> apply argument -> comparing call argument #0 to parameter #0]]];
        > $T2 transitive conformance to AdditiveArithmetic [[locator@0x137a21bc8 [OverloadedDeclRef... -> opened generic -> type parameter requirement #0 (conformance)]]];
        > $T2 operator arg conv $T4 [[locator@0x137a20f60 [Binary... -> apply argument -> comparing call argument #1 to parameter #1]]];
      )
      (Removed Constraint: 
        > (_const $T1, _const $T2) -> $T3 applicable fn $T0 [[locator@0x137a20df0 [Binary... -> apply function]]];
      )
    )
    (Potential Binding(s): 
      ($T1 [attributes: [literal: integer]] [involves_type_vars: $T3] [with possible bindings: (default type of literal) Int])
      ($T2 [attributes: [literal: string]] [involves_type_vars: $T3] [with possible bindings: (default type of literal) String])
      (attempting type variable $T1 := Int

Now the output shows that during this solver step, $T0 receives a binding of the disjunction choice and $T4 is a new type variable created in this scope and assigned to type Self. Then, new constraints are added to convert $T4 to $T1 and $T2 ($T1/$T2 operator arg conv $T4) and through these conversions, $T1 and $T2 gain transitive conformance to AdditiveArithmetic protocol. The function result type $T3 also gains an equivalence class to $T4 (Self). In Potential Bindings, we see updated bindings for $T1 and $T2 that resulted from these solver scope changes.

At this point, the solver will attempt to solve $T1, $T2, and $T3 as all Int types, all String types, or all Double types which will fail because $T1 and $T2 are not the same type. Upon this failure to find a suitable type for $T1, $T2, and $T3, the solver will recognize that 1 + “a” does not conform to the AdditiveArithmetic protocol, rewind all of its scope changes back to its initial disjunction attempt of $T0:= <Self where Self : AdditiveArithmetic> (Self.Type) -> (Self, Self) -> Self), and move onto the next disjunction attempt for $T0.

Printing these changes presents a snapshot of what is happening during each disjunction, conjunction, and type variable attempt instead of leaving this to your imagination or to be sorted out via other existing clues.

Other Changes

Scores and found solutions have also been updated to print both the reason for increasing a score and the local value by which the score increased:

(found solution: [component: applied fix(s), value: 2])

New type variables assigned to generic parameters in disjunction choices or as part of opened types are now printed along with their assigned generic parameter. For example, in the following disjunction choice where $T19 was assigned to Self, the overload set choice binding will state this new assignment with [Self := $T19]:

(attempting disjunction choice $T0 bound to decl Swift.(file).AdditiveArithmetic.+ : <Self where Self : AdditiveArithmetic> (Self.Type) -> (Self, Self) -> Self [[locator@0x137323200 [OverloadedDeclRef...]]];
  (overload set choice binding $T0 := ($T19, $T19) -> $T19) [Self := $T19])

This is also useful in Solutions with Opened Types, where previously these types would print only their DeBruin index notation (τ_i_i) and the generic parameter’s type variable. These will now print with both the generic parameter being referenced by the index and the concrete type found for the type variable, like so:

Opened types:
  locator@0x137323200 [OverloadedDeclRef@/…] opens 'Self' (τ_0_0) -> Int [from $T19]

And finally, a general reformatting of the output is currently underway that will replace the usage of () to group information with a more legible presentation that makes the solver scopes easier to follow. Below is a work-in-progress example:

  | Potential Binding(s): 
  `- $T1 [attributes: delayed, [literal: integer]] [with possible bindings: (default type of literal) Int]
  `- ...
  {
    !> Attempting disjunction choice $T0 bound to decl Swift.(file).String extension.+ : (String.Type) -> (String, String) -> String [[locator@0x13f859e00 [OverloadedDeclRef...]]];
    |  Overload set choice binding $T0 := (String, String) -> String
    |
    `-> Considering -> (_const $T1, _const $T2) -> $T3 applicable fn $T0 [[locator@0x13e81f5f0 [Binary...]]];
    |  `- Simplification result:
    |    `- Added constraint: $T1 operator arg conv String [[locator@0x13e81f6c0 [Binary... -> apply argument -> comparing call argument #0 to parameter #0]]];)
    |    `- ...
    |  `-- Outcome: simplified
    |
    |  * Changes:
    |  | Newly Bound: 
    |  |  > $T0 := (String, String) -> String
    |  |  > $T3 := String
    |  | ...
    |
  }

hborla · November 15, 2022, 2:44am

This is really awesome work! I can't tell you how many times I've had to step through the constraint system code or add extra logging just to see new constraints or type variables that are added during simplification. I rely a lot on -debug-constraints, so I'm excited to see improvements in this area!

One place where I find myself still needing extra information is during constraint generation. Currently, constraint simplification that happens during constraint generation is totally opaque in -debug-constraints, so I often find myself stepping through the code just to see what constraints are added and subsequently simplified all within CSGen. This is especially apparent now that more and more constraint generation is happening lazily, e.g. using the Conjunction infrastructure.

Have you thought about whether the initial constraint system state output is still useful, or whether it should always print added type variables and constraints as they happen during constraint generation? Perhaps the output could still print out the full constraint system state after constraint generation is done.