@inlinable performance example

I'm trying to show the performance impact that @inlinable can have on some code, but in my small examples I can't get the performance difference I'm expecting to appear. What would be a good example showing the performance cost of not using @inlinable in a Swift package?

To illustrate the difference I made two Swift packages: one with all the code in a single file, and one with an executable target plus a second target that introduces a module boundary around some internal code. However, when profiling these scenarios in Instruments I end up seeing [inlined] in my stack trace for both versions! Any ideas would be welcome; I'm trying to get a better view of how the compiler optimises certain code.

The example I tried to use:
"library code":

public func doPublicThing(values: [Double]) -> [Double] {
    var finalValues: [Double] = []
    for value in values {
        let finalValue = doInternalThing(value: value)
        finalValues.append(finalValue)
    }
    return finalValues
}

internal func doInternalThing(value: Double) -> Double {
    var value = value
    for i in 0 ..< 1000 {
        value += Double(i * 2)
    }
    return value
}

Executable target code "main.swift":

let values: [Double] = (0 ..< 1000).map(Double.init)
var results: [Double] = []
for i in 0 ..< 1000 {
    let values = values.map { $0 + Double(i) }
    let newValues = doPublicThing(values: values)
    results.append(newValues.reduce(0, +))
}
print(results)
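To make the comparison measurable without relying on Instruments alone, it may help to time the call directly. A minimal sketch, assuming Swift 5.7+ for ContinuousClock; the measure helper is my own, not part of any API:

```swift
// Hypothetical helper: times a closure using the standard library's
// ContinuousClock and prints the elapsed duration.
func measure(label: String, _ body: () -> Void) {
    let elapsed = ContinuousClock().measure(body)
    print("\(label): \(elapsed)")
}

// Usage sketch against the example above:
// measure(label: "doPublicThing") {
//     _ = doPublicThing(values: (0 ..< 1000).map(Double.init))
// }
```

Run both package variants in release mode (`swift run -c release`); debug builds disable most of the optimizations being compared here.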

The original Swift Evolution proposal for the @inlinable attribute goes into more detail, but generally, you want to apply it when exposing generic code across module boundaries. I would modify your benchmark example as follows:

@inlinable
public func doPublicThing<T: FloatingPoint>(values: [T]) -> [T] {
    var finalValues: [T] = []
    for value in values {
        let finalValue = doInternalThing(value: value)
        finalValues.append(finalValue)
    }
    return finalValues
}

@usableFromInline
internal func doInternalThing<T: FloatingPoint>(value: T) -> T {
    var value = value
    for i in 0 ..< 1000 {
        value += T.init(i * 2)
    }
    return value
}

Ah, OK, so it seems inlining doesn't work the way I thought; does it only apply to generic code? That's very helpful, thanks! I'm a little surprised, however.

As per the original proposal:

Within the scope of a single module, the Swift compiler performs very aggressive optimization, including full and partial specialization of generic functions, inlining, and various forms of interprocedural analysis.

On the other hand, across module boundaries, runtime generics introduce unavoidable overhead, as reified type metadata must be passed between functions, and various indirect access patterns must be used to manipulate values of generic type. We believe that for most applications, this overhead is negligible compared to the actual work performed by the code itself.

However, for some advanced use cases, and in particular for the standard library, the overhead of runtime generics can dominate any useful work performed by the library. Examples include the various algorithms defined in protocol extensions of Sequence and Collection, for instance the map method of the Sequence protocol. Here the algorithm is very simple and spends most of its time manipulating generic values and calling a user-supplied closure; specialization and inlining can completely eliminate the overhead of the higher-order function call and generate code equivalent to a hand-written loop manipulating concrete types.

The library author can annotate such published APIs with the @inlinable attribute. This will make their bodies available to the optimizer when building client code in other modules that call those APIs. The optimizer may or may not make use of the function body; it might be inlined, specialized, or ignored, in which case the compiler will continue to reference the public entry point in the framework. If the framework were to change the definition of such a function, binaries built against the older version of the library might continue using the old, inlined definition, might use the new definition, or even a mix, depending on whether particular call sites inlined the function or not.


Out of curiosity, I performed a "find and delete" using this regex in Numberick:

@(inlinable|_transparent|usableFromInline|frozen)

My 208 micro-benchmarks went from completing in 1.4 seconds to 118.0 seconds.


So, I reran with some generic code as suggested by @Dan_Stenmark. Now there's a clear difference. Thanks!

The final bit I'd like clarity on, however: are there situations where using @inlinable matters that do not relate to generics? For generics, having access to the implementation at compile time clearly has big benefits. Are there non-generic examples with a similar performance impact? @inlinable doesn't seem to be written only to improve generic code (or is it?); in general, it's a way of exposing a function's implementation to the compiler across modules, with the goal of letting the compiler apply more optimizations (generic specialisation being a clear one in this case).

I haven’t tested, but perhaps something like this:

// Module A

@inlinable public func addOne(to x: Int) -> Int { return x + 1 }
@inlinable public func subtractOne(from x: Int) -> Int { return x - 1 }
// Module B

import A

func doSomething(times n: Int) -> Int {
  var x = 0
  for _ in 1...n {
    x = addOne(to: x)
    x = subtractOne(from: x)
  }
  return x
}

doSomething(times: 1_000_000_000)

With inlining, the compiler should see that the loop body amounts to a no-op and eliminate it. (Although overflow checking might make that more difficult. Perhaps try with &+ and &- instead.)
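A single-file sketch of that suggestion, using wrapping arithmetic to remove the overflow checks (in a real package the two functions would live in a separate library target and would need to be `@inlinable public` to be both callable from Module B and visible to its optimizer):

```swift
// "Module A" (inlined into one file here for illustration): with &+ / &-
// there is no overflow-trap branch, so after inlining the loop body reduces
// to x = (x + 1) - 1, which the optimizer can delete entirely.
@inlinable public func addOne(to x: Int) -> Int { return x &+ 1 }
@inlinable public func subtractOne(from x: Int) -> Int { return x &- 1 }

// "Module B":
func doSomething(times n: Int) -> Int {
    var x = 0
    for _ in 1...n {
        x = addOne(to: x)
        x = subtractOne(from: x)
    }
    return x
}
```

Whether the loop is actually eliminated is up to the optimizer; comparing release-mode timings with and without the attributes (or inspecting the generated SIL) is the way to check.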

This is a question that's better for the Core Team to answer, but in my personal experience, modern compilers and CPU branch predictors have rendered manual inline annotations largely superfluous, outside of very specific cases (e.g. Swift generics across module boundaries) that may vary on a per-language basis.

IMO, if you're seeking to optimize a piece of low-level code, you're better off looking into the ownership modifiers being introduced in the Xcode 15 (Swift 5.9) toolchain. Otherwise, focus on your algorithms and heap allocations.

@inlinable has nothing to do with procedural inlining, it has to do with module abstraction. it is the @inline(_:) attribute that controls procedural inlining. it's kind of unfortunate that such different tools have been given such similar names, but it is what it is…
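A small sketch of the distinction (the function names are made up; only the attribute placement matters):

```swift
// @inlinable: exposes the function body across the module boundary, so a
// client module's optimizer *may* inline or specialize it.
@inlinable
public func clamp01(_ x: Double) -> Double {
    return min(max(x, 0), 1)
}

// @inline(__always): a request to always perform procedural inlining wherever
// the body is already visible. It does not by itself export the body to other
// modules; cross-module it only takes effect if the body is visible anyway
// (e.g. via @inlinable).
@inline(__always)
func square(_ x: Double) -> Double {
    return x * x
}
```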

@taylorswift I keep forgetting this... Thanks for reminding me


Interesting idea! I've tried to measure both, but I don't see a no-op optimisation happening at the moment. (Or is it happening in all cases?)

I'm a standard library engineer rather than a compiler optimizer engineer, but I may be able to shed some light here.

Generally speaking, the cases where inlining helps are ones where the caller has static information about some unknown in the callee. Cross-module generics are the most frequent place that shows up in Swift, because most callers do in fact have static type info their generic callee lacks, but there are lots of other cases. A non-exhaustive selection of examples off the top of my head:

  • Lifetime information: after inlining, an unknown-lifetime heap allocation being returned to the caller may turn out to be non-escaping and get promoted to the stack or even removed entirely. The most common example of this is _read and _modify coroutines, which if not eliminated end up calling malloc for simple Array accesses.
  • Value information: nil-checks can be eliminated if the caller knows a value has already been checked, branches based on arguments can be eliminated when the argument is known, and so on. I believe this also includes things like "the value is known to be in this range" and even "the value is known to have this bit set", but don't quote me on those.
  • Return value use information: imagine calculating a gigantic struct, then only looking at one thing in it. Why bother calculating the rest or occupying registers with values you're not going to use? This is one thing that the "values in Swift don't have a fixed location in memory" non-guarantee buys the compiler; it's free to only pass around the values actually in use (I believe this is "scalar replacement of aggregates" in the literature).
  • Register use information: a common minor code size and performance issue is having to insert unnecessary register moves to get the arguments for a function call into the correct ABI-mandated argument passing registers (or worse, spilling them to the stack). After inlining, there's no need for that. Similarly, there's no need to worry about saving and restoring callee-saved registers.

As noted above, @inlinable doesn't directly control any of this. All it does is make the implementation of the function in question part of the interface to its module, rather than a hidden implementation detail. The optimizer is then free to inline it, not inline it, spawn type-specialized copies of it, or peek at it to get some of the above info without inlining it. The downside of this is that it becomes much trickier to safely change the implementation in future versions of the library due to it leaking details of how it works internally in the module interface. Handle with care!
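The "return value use" point might be sketched like this (Summary is my own hypothetical type, and whether the optimizer actually drops the unused work is not guaranteed):

```swift
public struct Summary {
    public var sum = 0.0
    public var min = Double.infinity
    public var max = -Double.infinity
    public init() {}
}

// Because the body is @inlinable, a client module that only reads `.sum` may,
// after inlining, have the min/max bookkeeping eliminated as dead code.
@inlinable
public func summarize(_ xs: [Double]) -> Summary {
    var s = Summary()
    for x in xs {
        s.sum += x
        s.min = Swift.min(s.min, x)
        s.max = Swift.max(s.max, x)
    }
    return s
}

// Client module: only `.sum` is used.
let total = summarize([1, 2, 3]).sum
```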


Thanks @David_Smith! Exactly the answer I was hoping for :). That makes a lot of sense to me. It just allows the compiler to make more optimisations; cross-module generic specialisation is simply one frequently applied optimisation with a clear, easily measurable impact.

I now wonder: especially with the new package access modifier, is there a way to mark a whole target @inlinable within a Swift package? That sounds useful (when you don't care about ABI stability), since it's so easy to forget to mark something @inlinable when basically everything is allowed to be inlinable.

The downside of this is that it becomes much trickier to safely change the implementation in future versions of the library due to it leaking details of how it works internally in the module interface. Handle with care!

This is only an issue if you care about ABI stability right? Or are there dangers lurking that I'm skipping over here?


That sounds right to me. If everything is being compiled together then you shouldn’t have any binaries hanging around containing the old implementation, so there shouldn’t be any issues.

If you're compiling the modules together then isn't the compiler free to make those optimizations anyway, without @inlinable being required?

I always expected that in the past but it's definitely not the behaviour I've been seeing when trying to confirm that.

no, for SPM projects, it cannot.

Yeah, to be unambiguous here the answer is no. Module boundaries are optimization boundaries in SwiftPM.

Would be nice if there was a flag to get around that!
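One sketch of what exists today, under the assumption that the experimental -cross-module-optimization compiler flag applies here; note that unsafeFlags makes a package ineligible for use as a dependency, so this is only for local experiments:

```swift
// swift-tools-version:5.9
// Package.swift sketch (hypothetical target layout).
import PackageDescription

let package = Package(
    name: "Example",
    targets: [
        .target(
            name: "Library",
            swiftSettings: [.unsafeFlags(["-cross-module-optimization"])]
        ),
        .executableTarget(
            name: "App",
            dependencies: ["Library"],
            swiftSettings: [.unsafeFlags(["-cross-module-optimization"])]
        ),
    ]
)
```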

Some relevant discussion from 2020 (unfortunately I don’t think there’s been an announcement-worthy milestone yet, though I haven’t tracked day-to-day progress)