Is it possible to prevent inlining and tail call optimization?

I'm trying to do some performance assessment with the Time Profiler in Instruments. It's pretty clear that some traces are being scrambled by inlining and/or tail call optimization, to the extent that it's hard to tell where time is really being consumed.

Is there any way to turn these optimizations off globally? I know this would have its own biasing effect on the execution profile, but it seems like those distortions would be easier to deal with than the ones introduced by inlining and tail calls. And of course, I want to keep all other optimizations.

This WWDC 2015 talk and this 2017 article by Jordan Morgan both suggest adding -fno-optimize-sibling-calls to CFLAGS (and by extension, I assume -fno-inline-functions for inlining), but this no longer seems to have any effect. These are Clang flags; was the Swift compiler previously based on Clang and is now separate?

Those are Clang driver flags; Swift never implemented them. You can suppress inlining of a function with @inline(never), or more forcibly all optimization of a function with @_optimize(none), but AFAIK the Swift driver does not have flags like those.
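
For illustration, here is a minimal sketch of the two attributes mentioned above. The function names are hypothetical, and @_optimize(none) is an underscored, unofficial attribute, so some compiler versions may reject it or behave differently:

```swift
// Hypothetical helpers, for illustration only.

// Ask the optimizer not to inline this function, so that it keeps
// its own frame in a Time Profiler trace.
@inline(never)
func sumOfSquares(upTo n: Int) -> Int {
    var total = 0
    for i in 0...n { total += i * i }
    return total
}

// Underscored attribute: request that this one function is not optimized at all.
@_optimize(none)
func sumOfCubes(upTo n: Int) -> Int {
    var total = 0
    for i in 0...n { total += i * i * i }
    return total
}

print(sumOfSquares(upTo: 10)) // prints 385
print(sumOfCubes(upTo: 10))   // prints 3025
```

Both attributes are applied per function; neither is a global switch, which matches the point that the Swift driver has no equivalent flag.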

On a similar note, is there a way to make the optimizer not cull a variable?

e.g. let _ = (0...1_000_000).map { $0 * $0 } would get optimized out because it has no side effects, and there's no usage of the result.

Is there a way to prevent this, without actually using the value in some expensive way, such as printing it?

You could write an @_optimize(none) function that takes the value as an argument but does nothing. I believe that will prevent it from getting optimized away.

Hmm, I was playing around in Godbolt and I couldn't make it behave much differently. The "expensive computation" was still eliminated.

@inline(never) is almost always enough to make this work, as you can see on Godbolt. So long as you pass the result of a dependent computation to an @inline(never) function, the compiler cannot throw that computation away.
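
A minimal sketch of that pattern, assuming a hypothetical sink function named blackHole (the name and generic signature are illustrative choices, not an official API):

```swift
// Hypothetical sink: it does nothing with its argument, but because it
// is never inlined, the caller must still produce the value to pass in.
@inline(never)
func blackHole<T>(_ value: T) {
    // Intentionally empty.
}

// The computation we want to keep alive while measuring it.
let squares = (0...1_000).map { $0 * $0 }

// Hand the result to the sink instead of printing or otherwise using it.
blackHole(squares)
```

This relies on observed optimizer behavior rather than a documented guarantee, so it is worth checking the generated code when the measurement matters.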

Possibly helpful:

That's not really guaranteed, though, since while @inline(never) suppresses inlining, it does not suppress other interprocedural optimization. LLVM in particular could notice that the callee does nothing and still remove the call even though it's not inlined. @_optimize(none) AIUI is supposed to correspond to the clang/llvm optnone attribute, which prevents any interprocedural optimization, so it seems like a bug if it does not. @Erik_Eckstein is that accurate?

You're the LLVM expert, so I defer to your knowledge. That said, I've never observed this happening.

The Swift benchmarks use @inline(never), though they have a mysterious comment that says "It's important that this function is in another module than the tests which are using it." Given that there is no cross-module optimisation I have no idea why that's necessary (or even why @inline(never) would be needed in that case), but nevertheless, that's been my reference point for this kind of thing.

How can I use this in my own tests? Trying to use it in Godbolt leads to "error: unknown attribute: _optimize".

For the sake of sharing experience, I recently used @inline(never) with success, in order to prevent release builds from consuming more memory than debug builds.

The problem was that a heavy static global was loaded in memory, in release builds only, even though this global is actually never used at all, defeating the expectation that Swift globals are lazily loaded.

We start from "actual pseudo code":

struct SomeHeavyType {
    static var `default` = { ... }()
}
some loop {
    use SomeHeavyType.default through some function
}

Then we apply inlining optimization:

some loop {
    use SomeHeavyType.default
}

And then (now this is a guess), the optimizer extracts the loop invariant into a temporary local before the loop starts:

let local = SomeHeavyType.default
some loop {
    use local
}

Doubts: I'm not sure that the optimization which pre-loads the global is really the extraction of a loop invariant, because I'm not sure the optimizer is allowed to assume the global stays constant while the loop runs, given its public setter (some other thread may mutate it concurrently, after all). But very likely I just don't fully understand Swift's exclusivity rules.

The last optimization, whatever it really is, has the global loaded even if the loop is empty.

In the end, this creates a measurable difference between the release and debug builds, when two conditions are met:

  • The global uses a lot of memory
  • The loop is always empty (because the user doesn't use a feature, for example).

The solution I chose was to make sure that no use of this heavy global is ever inlined, with @inline(never) (Don't load association inflections unless necessary by groue · Pull Request #757 · groue/GRDB.swift · GitHub).
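
As a rough sketch of that approach (not the actual GRDB code; every name below is hypothetical, and `static let` is used instead of `static var` purely to keep the example simple):

```swift
struct SomeHeavyType {
    let payload: [Int]

    // Static stored properties are initialized lazily, on first access.
    // Assumption: `let` here is a simplification of the `var` in the pseudocode.
    static let `default` = SomeHeavyType(payload: Array(repeating: 0, count: 1_000))
}

// All uses of the heavy global go through this non-inlined function, with the
// intent that the access is not merged into the caller and pulled out of its loop.
@inline(never)
func heavyPayloadCount() -> Int {
    return SomeHeavyType.default.payload.count
}

func process(_ items: [Int]) -> Int {
    var total = 0
    for item in items {
        // The global is only touched when the loop body actually runs.
        total += item + heavyPayloadCount()
    }
    return total
}

let emptyResult = process([])
let nonEmptyResult = process([1, 2])
```

Whether the global really stays unloaded for an empty loop depends on the compiler version and optimization level, so memory use in a release build is the thing to verify.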

Note that that particular global optimization should be less aggressive in the top-of-tree compiler.

Thanks Joe, that's exactly that! And TIL about "hoisting" :+1: