[RFC] Changing the codegen of _ = $VAR to allow parity between SIL/Sema Diagnostics

Michael_Gottesman · December 10, 2024, 9:18pm

Just wanted to get some thoughts/feedback if any one has any objections upon this. TLDR: I think we should at least codegen a begin_borrow [var_decl] for this case to provide a SILLocation/use for diagnostics to enable Sema/SIL diagnostics to have parity and enable SIL diagnostics around this form of use.

Today when we codegen the following:

class Klass {}
func test(_ x: Klass)  {
   _ = x
}

we do not get any output from SILGen:

// test(_:)
sil hidden [ossa] @$s5test24testyyAA5KlassCF : $@convention(thin) (@guaranteed Klass) -> () {
// %0 "x"                                         // user: %1
bb0(%0 : @guaranteed $Klass):
  debug_value %0 : $Klass, let, name "x", argno 1 // id: %1
  %2 = tuple ()                                   // user: %3
  return %2 : $()                                 // id: %3
} // end sil function '$s5test24testyyAA5KlassCF'

This means that one cannot emit diagnostics since we do not even have any instructions with a SILLocation for _ = x.

In the past when only working with SIL diagnostics (e.x.: when I was bringing up the noncopyable type checkers), this was not an issue since we took the position that _ = x was just a way to provide a use to quiet an unused variable diagnostic from Sema but it did not signify codegen.

That is IMO defensible for noncopyable types that are purely a SIL based diagnostic... but it clashes and creates special rules when dealing with concurrency diagnostics/semantics that are partially Sema based and partially SIL based. Despite SIL not being able to emit diagnostics in such a case, using Sema information we have in the past and we do today. As an example:

func testNonSendableCaptures(ns: NotSendable, a: isolated MyActor) {
  Task {
    _ = a
    _ = ns
  }

  Task { [a] in // expected-tns-warning {{passing closure as a 'sending' parameter risks causing data races between 'a'-isolated code and concurrent execution of the closure}}
    _ = a
    _ = ns // expected-tns-note {{closure captures 'a'-isolated 'ns'}}
  }
}

Aside: The above example was supported before RBI was implemented in Sema and in RBI we today use Sema information to emit this via a layering violation which I want to eliminate. I had to use that information due to the lack of codegen in this case.

in the above example by using Sema information, we can actually emit a diagnostic since ns is viewed as a true capture of the closure passed to the Task initializer. By moving away from using Sema information here and using a pure SIL soluation, we cannot handle this test case since there is literally nothing for SIL to process.

So we would have to introduce a special case that concurrency rules do not apply if one captures a value using _ = ns. This is an active contradiction from other concurrency behavior where (as an example), if we capture self using _ = self will result in a semantic change. As an example:

actor MyActor {
  func captureSelf() {
    // Disconnected, not isolated.
    let _ = { }
    // Isolated to self since we capture self... even though self is used as a
    // '_ = self'.
    let _ = { _ = self }
  }  
}

Notice how in the emitted SIL, despite the position that _ = self does not have real codegen effects, it does change codegen since our capture of self makes the second closure actor isolated to self:

// closure #1 in MyActor.captureSelf()                                                                                    
// Isolation: nonisolated                                                                                                 
sil private [ossa] @$s4test7MyActorC11captureSelfyyFyycfU_ : $@convention(thin) () -> () {
bb0:                                                         
  %0 = tuple ()                                   // user: %1                                                               return %0                                       // id: %1  
} // end sil function '$s4test7MyActorC11captureSelfyyFyycfU_'                                                            
                                                                                                                          
// closure #2 in MyActor.captureSelf()                                                                                    
// Isolation: actor_instance. name: 'self'                 
sil private [ossa] @$s4test7MyActorC11captureSelfyyFyycfU0_ : $@convention(thin) (@sil_isolated @guaranteed MyActor) -> ()
 {                                                                                                                        
// %0 "self"                                      // user: %1                                                             
bb0(%0 : @closureCapture @guaranteed $MyActor):                                                                          
  debug_value %0, let, name "self", argno 1       // id: %1                                                               
  %2 = tuple ()                                   // user: %3 
  return %2                                       // id: %3
} // end sil function '$s4test7MyActorC11captureSelfyyFyycfU0_'

This to me is a clear contradiction of the stated position that _ = x does not have a true semantic effect on the program and can be ignored.

With that in mind, I propose that we change the codegen here to make it so that we actually do codegen a variable here. It would just involve emitting a begin_borrow [var_decl] in such a case. The begin_borrow [lexical] would provide an instruction to place the self location upon. It would also preserve the current behavior that with noncopyable types _ = value is not a consuming operation. This would ensure that SIL level optimizations have something to work with. As an example the above closure would look as follows:

// closure #2 in MyActor.captureSelf()                                                                                    
// Isolation: actor_instance. name: 'self'                 
sil private [ossa] @$s4test7MyActorC11captureSelfyyFyycfU0_ : $@convention(thin) (@sil_isolated @guaranteed MyActor) -> ()
 {                                                                                                                        
// %0 "self"                                      // user: %1                                                             
bb0(%0 : @closureCapture @guaranteed $MyActor):                                                                          
  debug_value %0, let, name "self", argno 1       // id: %1       
  %1 = begin_borrow [var_decl] %0 : $MyActor
  end_borrow %1 : $MyActor
  %2 = tuple ()                                   // user: %3 
  return %2                                       // id: %3
} // end sil function '$s4test7MyActorC11captureSelfyyFyycfU0_'

NOTE: I am just using begin_borrow here b/c we already have a var_decl bit on it. One could imagine adding another instruction that just marked the value.

Andrew_Trick · December 11, 2024, 12:08am

So, basically you want different semantics at the diagnostic level for:

func test(_ x: Klass)  {
   _ = x
}

vs.

func test(_ x: Klass)  {}

I don't know if that makes sense from a language perspective. Let's see what others say.

As far as the SIL representation:

The function argument is the "var_decl". So it wouldn't make sense to introduce another var_decl. begin_borrow [var_decl] is for local variables that have been promoted to SSA (so don't have an alloc_box (var_decl).

I'm not sure why you can't simply diagnose the argument itself. But if the point is to distunguish the zero-use case from the one-or-more-use case, then you want some placeholder to indicate the use. It sounds like you want a placeholder that side-steps the intruction deleter during diagnostics, then gets deleted before canonical SIL (or even before mandatory inlining).

I recently used extend_lifetime for this purpose and abused the MoveOnlyWrappedTypeEliminator for "diagnostic lowering" of such things (we could convert that into a proper post-diagnostic lowering pass).

Michael_Gottesman · December 11, 2024, 12:22am

Andrew_Trick:

So, basically you want different semantics at the diagnostic level for:
func test(_ x: Klass)  {
   _ = x
}
vs.
func test(_ x: Klass)  {}
I don't know if that makes sense from a language perspective. Let's see what others say.

I think you may have misunderstood the larger point. The larger point is that today there isn't any SIL instruction associated with _ = x... so if one ever wants to emit a SIL diagnostic upon that specific line... there is no way to do that. If there weren't any interesting diagnostics that needed that... I think our current model would be fine. But as I showed above with the Task example there is an interesting example where we could emit a diagnostic in Sema but cannot emit one at the SIL level... which would be a regression if I were to rip out that code which I am currently trying to do so I can eliminate a layering violation in RBI's closure handling.

The new var_decl is for the '_', not the argument. The begin_borrow [var_decl] is not just about alloc_box [var_decl] that have been promoted to SSA. It also can be used directly by the frontend to signal a new SSA variable's lifetime has started. One can git grep around SILGen and see several instances where this occurs.

The problem with just using the argument is that has the wrong source location. Generally the SILFunctionArgument will have the VarDecl associated with the actual in source VarDecl. As an example in the Task example, the source location associated with the function parameter is actually going to be ns, not the actual use site of ns.

Michael_Gottesman · December 13, 2024, 1:03am

Just to follow up. I spoke with @John_McCall @Joe_Groff offline and they were ok with this approach.

Their main suggestion was to just add an ignored_use instruction rather than a begin_borrow [var_decl] which sounded great to me!

Andrew_Trick · December 13, 2024, 10:55pm

That was exactly my suggestion above, except that I've used extend_lifetime rather than adding a new instruction.