Ideas for debugging a bad memory access at runtime

I'm trying to get some guidance/ideas/suggestions on a bad runtime memory access that I have encountered.

I have a SIL function x, which returns a closure y, whose signature looks like this:

sil private @y : $@convention(thin) (Double, @owned @callee_guaranteed (Double) -> @out SomeStruct1, @owned @callee_guaranteed (Double) -> (Double, Double)) -> @out SomeStruct1

x partially applies the 2nd and 3rd arguments of y and does so using the
[callee_guaranteed] ownership attribute.

partial_apply [callee_guaranteed] %60(%55, %59) : $@convention(thin) (Double, @owned @callee_guaranteed (Double) -> @out SomeStruct1, @owned @callee_guaranteed (Double) -> (Double, Double)) -> @out SomeStruct1

At the LLVM IR level, the corresponding partial-apply forwarder for y copies the closed over args because y takes the 2nd and 3rd args as owned parameters. But while copying (or retaining) the 3rd arg in the forwarder, I get a bad memory access.

define internal swiftcc void @"$partial_apply_forwarder_y"(ptr noalias nocapture sret(%swift.opaque) %0, double %1, ptr nocapture readonly swiftself %2, ptr %3) #1 {
entry:
  %4 = getelementptr inbounds <{ %swift.refcounted, %swift.function, %swift.function }>, ptr %2, i64 0, i32 1
  %.fn.load = load ptr, ptr %4, align 8
  %.data = getelementptr inbounds <{ %swift.refcounted, %swift.function, %swift.function }>, ptr %2, i64 0, i32 1, i32 1
  %5 = load ptr, ptr %.data, align 8
  %6 = getelementptr inbounds <{ %swift.refcounted, %swift.function, %swift.function }>, ptr %2, i64 0, i32 2
  %.fn1.load = load ptr, ptr %6, align 8
  %.data2 = getelementptr inbounds <{ %swift.refcounted, %swift.function, %swift.function }>, ptr %2, i64 0, i32 2, i32 1
  %7 = load ptr, ptr %.data2, align 8
  
  ; COPY OF THE 2nd ARG
  %8 = tail call ptr @swift_retain(ptr returned %7) #45

  ; COPY OF THE 3rd ARG
  ; THIS IS WHERE I GET A BAD MEMORY ACCESS!
  %9 = tail call ptr @swift_retain(ptr returned %5) #45

  ; CALL Y HERE
  tail call swiftcc void %3(ptr noalias nocapture sret(%swift.opaque) %0, double %1, ptr %.fn.load, ptr %5, ptr %.fn1.load, ptr %7)
  ret void
}

The issue only happens in optimized builds and the forwarder is in reality exercised multiple times before the crash happens since it's a merged forwarder. The exact code-path where the crash happens is exercised multiple times as well (at least that's what I've deduced from the stack traces of the crash).

I am looking for some suggestions on how I can debug this issue. Since the issue only happens in optimized builds, I have a mild hunch that some optimization pass is clobbering one of the registers, which eventually leads to a bad memory access. So maybe I could try turning off, let's say, register allocation? Are there any other things I could try here?

If you haven't tried it yet, the first thing I would reach for is running your build with -sil-verify-all. That runs the full SIL verifier after every optimization pass, which is good at catching many classes of broken memory invariants. If that doesn't immediately cause the compiler to catch the issue, you can also use -Xllvm -sil-print-function=y to print the SIL of y after every pass and look over how each pass changes it, and -Xllvm -sil-opt-pass-count=N to stop the optimization pipeline after N passes. The latter is useful for bisecting the optimization pipeline: you can increase and decrease the number of passes until you find the exact pass number that introduces the crash.
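If bisecting by hand gets tedious, the search over N can be scripted. Here is a rough sketch in Python, not something from the compiler's own tooling. It assumes the failure is monotonic (once the offending pass has run, every larger N also crashes) and that the crash reproduces reliably on each run. The file names main.swift and repro, the -O flag, and the helper names are placeholders I made up for illustration.

```python
import subprocess
from typing import Callable


def first_bad_pass_count(is_bad: Callable[[int], bool], upper: int) -> int:
    """Return the smallest N in [1, upper] for which is_bad(N) is True.

    Assumes N = 0 (no passes run) is good, N = upper is bad, and that the
    result stays bad for every N past the first bad one.
    """
    lo, hi = 0, upper  # invariant: lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid
        else:
            lo = mid
    return hi


def crashes_with_pass_count(n: int) -> bool:
    """Hypothetical build-and-run step; adjust the command to your project."""
    build = [
        "swiftc", "-O",
        "-Xllvm", f"-sil-opt-pass-count={n}",
        "main.swift", "-o", "repro",  # placeholder file names
    ]
    subprocess.run(build, check=True)
    # Treat any non-zero exit status (including death by signal) as a crash.
    return subprocess.run(["./repro"]).returncode != 0
```

You would call first_bad_pass_count(crashes_with_pass_count, upper) with an upper bound at which the crash is known to reproduce. The returned N is the first pass count that shows the crash, so the pass numbered N is the one to inspect with -Xllvm -sil-print-function=y.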

There are a few other related command-line options for controlling the optimization pipeline described in the "Debugging the Compiler" guide: https://github.com/apple/swift/blob/main/docs/DebuggingTheCompiler.md#debugging-on-sil-level
