A large part of libFunction uses checks like Foo.self == ....self (taking the type as an argument) rather than using Foo instances themselves. Maybe that makes it a better fit for @_specialize? Though you seem to imply that @inlinable subsumes @_specialize, so it is indeed quite baffling. libFunction looks something like this:
It's possible that the "inlined" function was only specialized on some of the generic parameters, but not the ones that you have explicit checks for.
I think you're doing the work of @_specialize manually. If you see a speedup with @_specialize on the types that you already check explicitly, then it's probably because @_specialize is a more efficient implementation of those checks. When the code is inlined and specialized on other parameters, you might lose that efficiency.
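To make the comparison concrete, here is a minimal sketch of the two styles. The functions `totalManual` and `total` are hypothetical stand-ins, not the actual libFunction, and the sketch assumes the manual checks look roughly like a metatype comparison followed by a cast.

```swift
// Hypothetical example: `totalManual` and `total` are illustrative names only.

// Manual approach: compare metatypes at run time and branch to a concrete path.
public func totalManual<T: AdditiveArithmetic>(_ values: [T]) -> T {
    if T.self == Int.self {
        // Concrete Int path, reached through a run-time type check and casts.
        let ints = values as! [Int]
        return ints.reduce(0, +) as! T
    }
    // Fallback: fully generic path.
    return values.reduce(.zero, +)
}

// Attribute approach: ask the compiler to emit pre-specialized copies
// for the listed types inside the defining module.
@_specialize(where T == Int)
@_specialize(where T == Double)
public func total<T: AdditiveArithmetic>(_ values: [T]) -> T {
    var sum = T.zero
    for value in values {
        sum += value
    }
    return sum
}
```

Both return the same results. Whether the second form is actually faster in a given build is exactly the thing to measure, not something this sketch shows.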
I'm just guessing though. In general, if @inlinable slows the code down, I think it's worth filing a bug.
You might be able to use lldb to look at the symbols that were emitted in the user binary to see what happened.
To be clear, @inlinable does give a speedup over the unannotated functions (~30x); it's just that @_specialize is even better (an extra 2-3x). If that's what you meant, I'll file a bug report some time later.
Yes, that's what I meant. I'm mainly curious to know if my hypothesis about the speedup is correct, but can't debug it now. It would also be good to have a strategy for giving you the benefits of using both annotations.
I suspect that means @_specialize, which is applied within the library module, has access to some internal information that @inlinable code cannot see, since the inlining happens outside the module, in the client.
When this happens to me, I start profiling and usually find an unspecialized method in the offending call stack that is being called by the method I’m worried about, but isn’t the method itself. At that point, adding @inlinable to that method to open it up for the client module to see usually solves the problem (drilling down as many layers as necessary). Unless I’m using library evolution mode and care about ABI stability, @inlinable has always been faster in the end once it’s done right.
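A minimal sketch of that drilling down, assuming a hypothetical library with a public `sum` function that calls into a helper type `Accumulator` (both names are made up for illustration):

```swift
// Hypothetical library code; `Accumulator` and `sum` are illustrative names.
public struct Accumulator<T: AdditiveArithmetic> {
    // Internal storage must be @usableFromInline to be referenced
    // from @inlinable bodies.
    @usableFromInline
    internal var storage: T

    @inlinable
    public init() {
        storage = .zero
    }

    // If this method were left unannotated, a client module could inline
    // `sum` below but would still have to call `add` through its
    // unspecialized generic entry point.
    @inlinable
    public mutating func add(_ value: T) {
        storage += value
    }

    @inlinable
    public var value: T {
        storage
    }
}

// The function the client actually calls.
@inlinable
public func sum<T: AdditiveArithmetic>(_ values: [T]) -> T {
    var accumulator = Accumulator<T>()
    for value in values {
        accumulator.add(value)
    }
    return accumulator.value
}
```

The point is that annotating only the outermost function is not enough; every generic layer on the hot path has to be visible to the client for it to specialize the whole chain.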
Cross‐module optimization will probably make the whole problem go away, as the implementation in master already essentially applies @inlinable automatically to all generic methods.
Turns out this is what happened. There are a few protocol conformances that I forgot to mark @inlinable. With each additional @inlinable, the perf gets better, until it reaches parity with in-module @_specialize. It makes sense, since those implementations aren't otherwise visible to the user module. So I won't be filing a bug.
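For anyone hitting the same thing, here is roughly what a missed conformance looks like. This is a hedged sketch with hypothetical names (`Scorable`, `Card`, `totalScore`), not the actual code from my library:

```swift
// Hypothetical library code; all names are illustrative.
public protocol Scorable {
    var score: Int { get }
}

public struct Card: Scorable {
    public var rank: Int

    @inlinable
    public init(rank: Int) {
        self.rank = rank
    }

    // The protocol witness needs its own @inlinable. Marking only
    // `totalScore` below would leave this body opaque to client modules.
    @inlinable
    public var score: Int {
        rank * 2
    }
}

@inlinable
public func totalScore<S: Sequence>(_ items: S) -> Int where S.Element: Scorable {
    var total = 0
    for item in items {
        total += item.score
    }
    return total
}
```

Each witness like `score` that lacks the annotation is one more call the client cannot see through.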
There's still one thing: when I annotate libFunction with both @inlinable and @_specialize, the compiler seems to optimise via @inlinable, even though the @_specialize is a full specialisation that exactly matches the generic signature. That is a separate issue, though. I'm not sure if I should file this one, or if it is a bug to begin with.
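The combination I mean looks like this, with a hypothetical function `largest` standing in for libFunction:

```swift
// Hypothetical example; `largest` is an illustrative stand-in.
// Both attributes on one declaration: the body is exposed to clients
// via @inlinable, and a specialized copy for Int is requested as well.
@inlinable
@_specialize(where T == Int)
public func largest<T: Comparable>(_ values: [T]) -> T? {
    guard var best = values.first else {
        return nil
    }
    for value in values.dropFirst() where value > best {
        best = value
    }
    return best
}
```

The sketch only shows the annotations. Which copy a client call with T == Int ends up using is the open question.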