Optimization Controls and Optimization Hints


Static heuristics in compilers recognize patterns in program structure, such as loops, asserts, and code that throws an exception, to estimate how frequently sections of code execute. These execution frequency estimates are used to trade off code size against optimization aggressiveness.
To deal with cases where static compiler heuristics fall short, programming languages can provide optimization controls and/or hints that give code authors more control over when an optimization should happen, or that communicate properties useful to a compiler’s optimization heuristics (e.g. whether a region of code might be more or less frequently executed).

Furthermore, optimization controls guarantee that an optimization happens regardless of compiler heuristics that may change between compiler versions, making them an important tool for predictable code optimization.

I think it would be good to push on adding controls and hints as part of the language, and I would like the community’s input on what that should look like.
I would like to start by seeding some, in my opinion, high value annotations.

Swift provides some controls and hints today. Some of them are fully supported, for example the optimization control @inline(never). The dual @inline(__always) is underscored and therefore not considered to be “supported”. Some of them are only available in an underscored form, for example the optimization hint _onFastPath(), which indicates to the inlining heuristic that a path containing a call to this function is hot/frequently executed.

I would like to propose making some optimization controls and hints more widely available in a supported form for when a predictable optimization (“optimization controls”) or influence on static compiler heuristics (“optimization hints”) is desired.

I think it is useful to think about optimization annotations along several axes. One axis we already mentioned: optimization control versus optimization hint. A second axis is where the behavior/scope is declared: declaration site versus use site. A third useful axis is how far the annotation reaches: the immediately annotated object/function, or transitively reachable objects/functions.

Optimization control/Optimization hint

An optimization control is a directive to the compiler to perform an optimization.
@inline(always)/@inline(never) are examples of optimization controls: the inlining optimization is guaranteed to always/never happen. When an optimization control is applied, it should be a compiler error if it cannot be followed.

An optimization hint is an annotation that conveys information that influences compiler heuristics.
_onFastPath() is an optimization hint that tells compiler heuristics that this call is situated on a frequently executed path. Optimizations that use notions of how frequently a path executes can take this hint into consideration.

Declaration site versus use site annotation

An annotation can be a declaration site annotation such as @inline(never), applying to all instances of a declaration: we want the annotated function never to be inlined, regardless of use site properties. Or it can be a use site annotation such as the existing _onFastPath(), when we want the annotation to apply only to a specific context inside of a function.

func context() {
  if cond {
    _onFastPath()                        // hint: this branch is hot
    frequentlyExecuted()
  }
  notExpectedToBeFrequentlyExecuted()    // no hint: left to the default heuristics
}

In the context of thinking about inlining, we can form a matrix of possibly useful options.

                       Use site                        Declaration site
Optimization control   inline { call() }               @inline(always) func g() {}
Optimization hint      frequentlyExecuted { call() }   @inline(ifNotCold) func g() {}

The inline { } use site control specifies that all calls inside the braces are to be inlined; this request for inlining applies to only one level of referenced functions.
The dual @inline(always) declaration site optimization control specifies that the function g() should always be inlined.
The use site optimization hint frequentlyExecuted {} specifies that the code inside the braces is to be considered “hot” by compiler heuristics such as the inlining heuristic.
The declaration site hint @inline(ifNotCold) could be used by library authors to signal to the optimizer to inline the function only into regions not identified as cold according to its heuristics.
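
As a rough sketch, the four quadrants of this matrix could look as follows in source. None of these spellings exist today; inline {}, frequentlyExecuted {}, @inline(always), and @inline(ifNotCold) are the proposed forms from the table above, and g(), g2(), and call() are placeholder functions.

// Declaration site optimization control (proposed): g() must always be inlined.
@inline(always)
func g() { /* ... */ }

// Declaration site optimization hint (proposed): inline g2() unless the call site
// is judged "cold" by the compiler's heuristics.
@inline(ifNotCold)
func g2() { /* ... */ }

func call() { /* ... */ }

func caller() {
  // Use site optimization control (proposed): the immediate call inside the braces
  // must be inlined, otherwise the compiler emits an error.
  inline {
    call()
  }

  // Use site optimization hint (proposed): treat the enclosed code as "hot".
  frequentlyExecuted {
    call()
  }
}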

Optimization controls

There are several important classes of possible optimization controls: inlining, specialization, and constant propagation. I would like to focus on the first two; there is ongoing active work on constant propagation in the community: https://forums.swift.org/t/pitch-3-swift-compile-time-values.

Inlining controls

I think there are two important declaration site optimization controls and one use site control (a sketch of the intended behavior follows the list below):

  • Always inline this function into the caller @inline(always); I think this optimization control should apply at all optimization modes (optimized, debug).
    This control should either cause a compiler error if the annotated function is not also @inlinable or should imply @inlinable.

    • We could have a variant that restricts this to only optimized mode: @inline(release)
    • Or a variant that restricts the applicability only to the local module: @inline(onlyInModule)
  • Never inline this function (@inline(never)); this control already exists.

  • A use site inline { call() } annotation describing that the immediate call sites within the braces must be inlined. If the immediate call()s inside the braces cannot be inlined because their body is not available, this should be a hard compiler error.

    • It would make sense to add the dual neverInline { call() }
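
As a sketch of the intended error behavior across module boundaries (proposed spellings; the module names A and B and the function names are made up for the example):

// Module A
@inline(always) @inlinable
public func alwaysInlined() { /* ... */ }

public func opaque() { /* body not visible outside module A */ }

// Module B
import A

func caller() {
  alwaysInlined()     // guaranteed to be inlined, in both debug and optimized builds
  inline {
    alwaysInlined()   // OK: the body is available via @inlinable
    opaque()          // error (proposed): body not available, inline {} cannot be honored
  }
  neverInline {
    alwaysInlined()   // the call inside the braces must not be inlined
  }
}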

Specialization controls

I think the following specialization controls would be useful.

  • A use site control to specialize all transitive generic call sites reachable from the braces, specialize { callSite<Int, SomeType>() }, with a dependence on the types in the substitution maps of the immediate generic call sites func callSite<T, V>.

    This should force specialization in both debug and release modes and should error if specialization is not possible.

    Example

func a() {
  specialize {
    g<Int>()
  }
}

func g<T>() {
  t<T.Assoc>() // error if not specializable
  f() // does not descend into non-generic calls
  c<Int>() // not an error if not specialized because its substitution map does not derive from T
  
}

Optimization hints

Optimization hints influence compiler heuristics, both to make existing heuristics more predictable and to provide them with further information.

Execution frequency

Communicating execution frequency where existing compiler heuristics don’t find a pattern to cue off of is important.

frequentlyExecuted { callSite() }
This use site annotation would inform the compiler that the code within the braces is expected to execute many times. It would provide a general hint to various optimizations to strongly favor performance over code size for such sections. This might include more aggressive loop unrolling, vectorization, and inlining. A version of this currently exists as a call to the library function _onFastPath().
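
For example (a sketch using the proposed frequentlyExecuted {} spelling; today the closest equivalent is a call to _onFastPath() on the hot path, and the functions below are made up):

func countNonZero(_ bytes: [UInt8]) -> Int {
  var count = 0
  for byte in bytes {
    if byte != 0 {
      frequentlyExecuted {       // proposed: treat this branch as hot, favoring
        count &+= 1              // unrolling/vectorization/inlining over code size
      }
    } else {
      handleRareZeroByte()       // left to the compiler's default heuristics
    }
  }
  return count
}

func handleRareZeroByte() { /* ... */ }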

Inlining

An @inline(ifNotCold) declaration site hint, and maybe also a use site version inlineIfNotCold {}. The intention is to inline the annotated function/code if the compiler’s heuristics determine that the call is on a path that is not “cold”.
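
A short sketch of the declaration site form (proposed spelling):

// Proposed: a hint, not a guarantee. The optimizer would prefer to inline log(_:)
// at call sites it considers warm or hot, but keep the call out of line on paths
// its heuristics identify as cold (e.g. error-handling branches).
@inline(ifNotCold)
public func log(_ message: String) { /* ... */ }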

Having listed some optimization attributes I think would be useful to add, what are your thoughts and requirements?

13 Likes

Just reading quickly (may read it again later).

  1. If we are going to have frequentlyExecuted… we probably also want a cold version of it. I don’t know what you would call it; for argument’s sake I am going to call it onColdPath. One would make onColdPath imply no inlining, and as a nice benefit it would move the cold code out of line, which one often wants to do with runtime functions that have an inline hot path and an out-of-line cold path.
  2. Also, a version of must-tail would be useful. It is similar in the sense that tail calls are governed by compiler heuristics today, and a must-tail-like thing would force the compiler to emit a tail call regardless of heuristics, or error out (both ideas are sketched below).
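
A sketch of both ideas, with hypothetical onColdPath {} and mustTail spellings (neither exists today):

var cache: [Int: Int] = [:]

func lookup(_ key: Int) -> Int {
  if let cached = cache[key] {
    return cached                      // inline hot path
  }
  onColdPath {                         // hypothetical: mark as cold, keep out of line,
    populateCacheSlowPath(for: key)    // and imply "never inline" for this code
  }
  return cache[key]!
}

func populateCacheSlowPath(for key: Int) { cache[key] = key * key }

func drain(_ values: ArraySlice<Int>) {
  guard let first = values.first else { return }
  process(first)
  mustTail return drain(values.dropFirst())   // hypothetical: guarantee a tail call or error
}

func process(_ value: Int) { /* ... */ }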
2 Likes

If I understand correctly, you would like to have the already existing (unstable) hints stabilized and to make versions of them available at both the call site and the definition?

I’m concerned about making @inline(__always) available without underscores, as it completely disables heuristics instead of just giving the compiler hints. The fast path hint (didn’t know about this actually) should change syntax if it ever becomes official, but I guess it is ok (given I don’t know the exact details of this hint) as it only hints to the compiler and doesn’t override it.

I worry that any closure-based API would have ergonomic issues unless run-once closure types are also part of the proposal. For example:

let name: String
let age: Int
inline {
  // currently: error because name and age could already be initialized
  name = getUserName()
  age = getUserAge()
}

print(name, age) // currently: error because name and age are not necessarily initialized

(this is a toy example but I imagine issues like this would readily appear in real code)

4 Likes

This is a good point.

A scoped annotation like inline {} need not necessarily be implemented as a closure based API.

3 Likes

@inline(__always) comes very close to what I call an optimization control above. It is not merely an optimization hint; rather, in release mode, the Swift optimizer will always inline a function annotated with it, without consideration of other heuristics. There are some corner cases where it will not inline, and what would make the non-underscored version an optimization control is that it would not silently decide not to inline but would instead result in an error.

Obligatory “@inline(__always) does not and cannot guarantee inlining when the function is used as a value; however, this should not be considered a fatal flaw, as writing the function call inside a closure expression would have the same behavior; rather, it should merely be mentioned in any official documentation about it”.

3 Likes

I would have frequently benefited from this feature for code that is generic and that I know will not get specialized. AFAIK the compiler currently doesn’t inline any generic function (that isn’t marked @inline(__always)) without specialization. Without using inline {} it is hard to say for sure, but I have the feeling that inlining just one level will likely not be enough. An option to inline more aggressively would probably be necessary for unspecialized generics.

Also, AFAICT adding @inline(__always) to a function/method doesn’t seem to influence closure parameters reliably. It would be great if it were also allowed to force inlining of closure arguments, e.g. func withStyleAPI(_ body: @inline(__always) () -> Void).

Another interesting use case is FloatingPointRoundingRule and the functions that use it, e.g. FloatingPoint.round(_:). All methods that take a FloatingPointRoundingRule are marked @inline(__always), which is great if those functions are used directly with a constant. However, if further abstractions are added that also take a FloatingPointRoundingRule as a parameter, they also need to be marked @inline(__always) or they become a performance trap. It seems desirable to better describe this in the API. Maybe it would be possible to mark the type FloatingPointRoundingRule as @inline(always) so that all functions that use it are forced to be inlined, or at least get a higher inline priority.
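
To illustrate the trap with today’s attributes (the wrapper names below are made up, and whether the switch over the rule actually folds away depends on build settings):

// Performance trap: inlining of this wrapper is left to the optimizer's heuristics, so
// the switch over `rule` inside Double.rounded(_:) may survive even for a literal rule.
func roundedPlain(_ value: Double, _ rule: FloatingPointRoundingRule) -> Double {
  value.rounded(rule)
}

// Propagating the attribute lets a literal rule constant-fold through the wrapper.
@inline(__always)
func roundedInlined(_ value: Double, _ rule: FloatingPointRoundingRule) -> Double {
  value.rounded(rule)
}

let a = roundedPlain(2.5, .toNearestOrEven)
let b = roundedInlined(2.5, .toNearestOrEven)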

Would it also be possible to have a version that is added to the declaration and fails to compile if a use site can’t be specialized? I think this would need to be viral if any calling function is itself generic.

A useful variation of @specialized would be the ability to define two implementations: one that is used when specialization is guaranteed to happen, and one that is used for the unspecialized version. This would allow fine-tuning the implementation for both cases, e.g. adding certain checks, such as type casts, that are rather expensive if not specialized but essentially free or optimized away if specialized.
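
A sketch of what that might look like, using an entirely hypothetical @specialized(unspecializedFallback:) spelling:

// Hypothetical: the first body is used when the call is guaranteed to be specialized
// for a concrete T; the second body is used from the unspecialized generic entry point.
@specialized(unspecializedFallback: sumLean)
func sum<T: AdditiveArithmetic>(_ values: [T]) -> T {
  // This check is essentially free (often folded away) once T is a known concrete type...
  precondition(MemoryLayout<T>.size <= 64, "unexpectedly large element type")
  return values.reduce(.zero, +)
}

func sumLean<T: AdditiveArithmetic>(_ values: [T]) -> T {
  // ...but would cost a metadata query per call on the unspecialized path, so omit it here.
  values.reduce(.zero, +)
}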

Overall, all the proposed features sound very useful to me. Really looking forward to being able to use them!

1 Like

Interesting, and I agree with your conclusion.

I had not thought about the (potential) expectation when the function is used as a value. Thank you for mentioning it.

In the context of the @inline(always) attribute being used as an optimization control – that is, a guarantee that inlining will happen – I think we will want to specify that the guarantee only applies to cases where the function is used as a “static reference” in an application in the Swift source, not when it is used as a value that is called at a later point.

@inline(always)
func callee() {
  print("body of callee")
}

func someOtherCallee() {}

func caller() {
  callee() // guaranteed to inline
  var functionValue = callee
  if cond {
    functionValue = someOtherCallee
  }
  functionValue() // not guaranteed to inline
  
}

A function marked with @inline(always) can be used to initialize a function value (e.g. var functionValue = callee) that is later called. The optimizer might be able to forward the function value’s function reference and captured arguments from the initialization to the application (call) of the function value.
In such a situation, the @inline(always) attribute should conceptually be “demoted to”/“function as” an optimization hint, since we rely on other – not necessarily guaranteed to happen – optimizations to facilitate the eventual inlining.

3 Likes

Thank you for the input.

My intention for inline {} is for it to be an optimization control. I see optimization controls as guarantees that an optimization happens. That means it will be diagnosed if the optimization cannot happen. This diagnosis should happen regardless of compiler version or optimization mode.

If the attribute applies to transitive calls, what describes the set of transitive calls?

For the immediate case, if you have a call to callee() inside inline {}, it should be an error if the body is not available to the caller. That would be the case, for example, if the definition is in another module and not marked with @inlinable.

How could this look in the transitive case? One answer is that it would require all transitively referenced functions to also be @inlinable. I think that would be a rather onerous constraint as source code evolves over time.

One thing to note: once we have an inline {} annotation defined as I have described, that does not preclude compiler heuristics (for example, in optimized mode) from taking the annotation as a hint that transitive calls should probably also be inlined more aggressively, but that would not be part of the guarantee that the optimization control makes.
If you want that guarantee, you would also have to mark the transitive calls with inline {}.
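
For example (proposed inline {} spelling):

func caller() {
  inline {
    outer()   // guaranteed: this immediate call is inlined into caller()
  }
}

@inlinable
public func outer() {
  // Not covered by caller()'s inline {}. To also guarantee that this call is
  // inlined, it needs its own annotation:
  inline {
    inner()
  }
}

@inlinable
public func inner() { /* ... */ }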

1 Like

One thing that occurred to me while reading this discussion is that adding caller-side optimization control tools like inline { } to the language implies new source compatibility rules for library authors. The presence of @inlinable on a function may now be required by some dependent code in order to build without errors.

1 Like

This is a good point to bring up. Using an optimization control such as inline {} would imply a source break in the using library if the library it depends on removes @inlinable.

I think that is desirable from the performance guarantee perspective, but it certainly has its costs and can be a surprising consequence for the using library’s author.

1 Like

The way that I have thought about the latter case (and I think this would apply to the closure case too) in the past is in terms of a mix of “constant propagation” and “function specialization”.

I think the case you mention might be handled by a @constant_specialize_argument attribute that does the following: if such a marked argument of a callee is constant at a call site, it is propagated into the function body of a specialized version of the callee, and that specialized function is called at that call site instead.

@constant_specialize_argument(mode)
func someOtherFunction(..., mode: MaybeConstantMode) {
  switch mode {
    case .default:
       cheap()
    case .expensive:
       expensive()
  }
  ...
}

@constant_specialize_argument(mode)
func someFunction(..., mode: MaybeConstantMode) { 
  someOtherFunction(..., mode: mode)
}

func caller() {
   someFunction(…, mode: .default)
}

In this example, the compiler would generate specialized versions of someFunction() and someOtherFunction() where the mode argument is removed, with the call site’s mode: .default value constant-propagated into the specialized functions.

func someOtherFunction_mode_default(...) {
  cheap()
  ...
}

func someFunction_mode_default(...) { 
  someOtherFunction_mode_default(...)
}

func caller() {
   someFunction_mode_default(…)
}

This same attribute would facilitate propagating literal closure arguments into functions.
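
Sketching the closure case with the same hypothetical attribute:

@constant_specialize_argument(transform)
func mapValues(_ values: [Int], transform: (Int) -> Int) -> [Int] {
  values.map(transform)
}

func caller(_ values: [Int]) -> [Int] {
  // The literal closure is a constant at this call site, so the compiler could emit a
  // specialized mapValues with the closure body propagated in (and then inlined),
  // instead of calling through an opaque function value.
  mapValues(values) { $0 &* 2 }
}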

1 Like

How do you feel this compares with the conventional way to inform compiler heuristics, profile-guided optimization (PGO)?

Is the expectation that programmers do their own profiling (hopefully) and then annotate in each place where they think an optimization hint could help? Embedding hints manually into the source code seems to amplify the fundamental problem of that hint becoming outdated in the future, as that could regress performance even compared to no hint at all.

In PGO, the profile data is in essence a set of optimization hints, coming directly from the profiling tool. It’s typically kept in a separate file, so it’s easy to experiment with different profiles (or disable them entirely), to ensure the profile has not become outdated.

The downside of typical PGO data formats is that they’re not designed for humans to easily read or edit them, but I can imagine profile formats that are human-readable, and/or integrate with an IDE that, for example, overlays the information visually with the code.

2 Likes

I think we may need to investigate mitigations for this. Perhaps something like downgrading the errors to warnings when building in non-current language modes, so that people can address them while upgrading rather than when they install new tools?

Making things inlineable in the stdlib is already a tremendously precarious choice without having even more ways to go wrong in the future.

1 Like

At least in a source-only world like the package ecosystem, aggressive/full CMO ought to give users the ability to force all methods to be treated as @inlinable, which should give them a tool to unblock themselves.

I view hints as complementary/additional tool to fully automatic PGO.

I view “profile information/profile intuition” serialized back into source code as a valuable tool that evolves as part of editing the code (“is this annotation still a valid assumption?”).

I imagine workflows where profile information feeds into the decision to annotate source code. This could be the programmer’s intuition about hotness, informed by profiling their application, or a more integrated workflow where a profile feeds into tools that suggest source code annotations.

You can maybe see how this is somewhat similar to your last comment with the slight difference of where we locate the source of truth: externally to the source code or in the source code.


1 Like

I think we may need to investigate mitigations for this. Perhaps something like downgrading the errors to warnings when building in non-current language modes, so that people can address them while upgrading rather than when they install new tools?

Making things inlineable in the stdlib is already a tremendously precarious choice without having even more ways to go wrong in the future.

It is very possible that we would end up with some flag to enable/disable the performance error diagnostics behavior and that the right default is to not have them enabled.

If we want these performance annotations to be hard guarantees, another possibility could be to say that they can only be applied to calls that reference declarations from the same module and/or package. Modules like the standard library that are comfortable committing to the inlinability of parts of their API could opt in to allowing strong performance annotations on those APIs.

4 Likes

Interesting. A combination of what you suggest and a stronger opt-in mode may be an answer.