Understanding strange @inlinable and @inline(never) behavior

i’ve noticed some pretty strange performance behavior with the @inlinable and @inline(_:) attributes. namely, i have a function which accounts for a majority of the benchmark run time, with the following signature:

func unpack<Color>(as _:Color.Type) -> [Color] where Color:PNG.Color

this function is just a wrapper function that calls a non-generic member function required by the PNG.Color protocol.
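to make the shape of the problem concrete, here's a hedged sketch of the pattern — the names here are illustrative, not the actual swift-png API:

```swift
// hypothetical sketch of the wrapper-over-protocol-requirement pattern
// described above; PixelColor and Image stand in for PNG.Color and the
// real container type
protocol PixelColor
{
    // the non-generic member function required by the protocol
    static func unpack(_ data:[UInt8]) -> [Self]
}

struct Image
{
    var storage:[UInt8]

    // the generic wrapper: all it does is forward to the requirement
    func unpack<Color>(as _:Color.Type) -> [Color] where Color:PixelColor
    {
        Color.unpack(self.storage)
    }
}
```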

i have a copy of the benchmarking code inside the module (PNG) that this function lives in, and a copy in a separate target that calls this function from across the module boundary.

as expected, the benchmarks run much, much slower when they call this function from outside the PNG module:

from outside PNG module: 67.768 ms
from inside PNG module:   3.968 ms

so i tried to fix this by adding an @inlinable attribute:

@inlinable
func unpack<Color>(as _:Color.Type) -> [Color] where Color:PNG.Color

but that didn’t help at all:

from outside PNG module: 71.931 ms
from inside PNG module:   3.960 ms

so i tried to isolate the issue by adding an @inline(never) attribute on top of the @inlinable attribute:

@inlinable @inline(never)
func unpack<Color>(as _:Color.Type) -> [Color] where Color:PNG.Color

incredibly, this solved the issue:

from outside PNG module: 4.114 ms
from inside PNG module:  4.241 ms

now i’m wondering why @inline(never) even has such a drastic impact on performance. if the function body and specializations are already available to the compiler, shouldn’t the compiler be able to make the decision to inline (into the benchmarking code) on its own?

i suspect the compiler’s default behavior is to inline the body of the unpack(as:) function (which contains another generic function call) into the benchmarking code, which undoes the effect of the @inlinable attribute, since the call just turns back into another unspecialized generic call across the module boundary. if that’s the case, why would the compiler inline the body without replacing the inner generic call with a call to a specialized function?
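if that theory holds, the transformation would look roughly like this — a purely illustrative sketch, with a hypothetical RGBA color type standing in for a real PNG.Color conformance:

```swift
// before optimization: the benchmark calls the @inlinable generic wrapper
let pixels:[RGBA] = image.unpack(as: RGBA.self)

// hypothesized result after the optimizer inlines the wrapper body
// *without* specializing the call it contains: the benchmark now
// performs the inner call directly,
//
//     let pixels:[RGBA] = RGBA.unpack(image.storage)
//
// but that call is still routed through an unspecialized generic
// entry point across the module boundary, so inlining the wrapper
// bought nothing
```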


Your theory is certainly possible, but it’s hard to see why the compiler would specialize in one code path but not the other. This is probably best answered by looking at the generated binary to work out what the difference is.

It does seem like in the first case the compiler has elected not to specialize the call, for some reason, while in the second case it has.

These sorts of things always smell like phase-ordering issues to me: one optimization exposes an opportunity for another, but because they run in the opposite order, the second never takes effect.

That’s just a hunch, though; to truly confirm it you’d likely need to dump the SIL after each optimizer pass.

Just re-ran my benchmarks using the latest nightly toolchain, and it appears this problem has been fixed. thanks to whoever fixed it!
