Optimization of local functions that capture local variables

I was working on a function and realized it would be more readable if I moved some repeated code into a local helper function. I expected the helper function to be inlined and optimized away, but testing showed that it made things 10% slower.

Here is a simplified example, where foo and bar do the same thing but bar uses a local helper function and foo does not:

func foo() -> [Int] {
  var x = Array(repeating: 0, count: 4)
  for _ in 0 ..< 100 {
    let i = Int.random(in: x.indices)
    x[i] += 1
  }
  return x
}
func bar() -> [Int] {
  var x = Array(repeating: 0, count: 4)
  func update(_ i: Int) { x[i] += 1 }
  for _ in 0 ..< 100 {
    let i = Int.random(in: x.indices)
    update(i)
  }
  return x
}

Looking at this on Godbolt (Compiler Explorer), compiling with -O produces 93 lines of assembly for foo and 230 lines for bar. That's more than double the assembly code for something I expected to be transparent to the compiler.

Is this a known issue, and is there any way to fix it?

Yes, don't capture the array inside the helper function. Pass it to the helper function as an inout parameter. This will make it equivalent to foo.
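
A minimal sketch of that suggestion, assuming the same setup as foo and bar above. The function name `baz` and the parameter name `counts` are hypothetical. The helper no longer captures `x`; the caller passes it explicitly with `&`. According to the replies in this thread, this form optimizes the same as foo.

```swift
// Hypothetical `baz`: same behavior as `bar`, but the local helper
// receives the array as an `inout` parameter instead of capturing it.
func baz() -> [Int] {
  var x = Array(repeating: 0, count: 4)
  // No captured variables: the helper only touches its own parameters.
  func update(_ counts: inout [Int], _ i: Int) { counts[i] += 1 }
  for _ in 0 ..< 100 {
    let i = Int.random(in: x.indices)
    update(&x, i)  // pass the array as a mutable borrow
  }
  return x
}
```

Because `update` has no captures, it is effectively a plain function that happens to be scoped inside `baz`, which leaves the compiler nothing special to handle compared with foo.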

Yeah, I was just about to post this. I agree with @Nevin, though: it makes sense that the compiler should be able to inline update() into bar so that the result is identical to the non-helper version (foo).

On another note: even though the inout version generates assembly basically identical to the non-inout version, the compiler doesn't merge the two implementations and reuse one of them.

Yeah, the compiler doesn't seem to inline non-escaping closures with captures very well.

Yeah, the compiled code is exactly the same, so it's really odd that it isn't deduplicated. I would have expected one of them to just jmp to the other if both are defined.

There might be some clues in here:

I don't see how that changes anything here. The version that doesn't optimize well doesn't pass the value at all. It captures it.

The version that does pass it (as a mutable borrow aka inout) optimizes exactly the same as if there were no helper function.