Can the compiler optimize into fma instructions?

Looking on swift.godbolt.org, I see that this function compiles into a single fma call:

func f(x: Float, y: Float, z: Float) -> Float {
    return z.addingProduct(x, y)
}

But this function remains a multiply followed sequentially by an add:

func f(x: Float, y: Float, z: Float) -> Float {
    return x * y + z
}

Is there any way for me as a programmer to tell the compiler, “Please optimize all expressions like x * y + z into fma instructions”?

I want to write nice readable code using standard arithmetic operators, and I want the compiler to make it as efficient as possible. I understand that the results may be different between the two implementations, and I prefer the fma (it should be both faster and more accurate, if I understand correctly).
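
To make the "results may be different" point concrete, here is a minimal sketch. The operand values are my own choice (not from the post), picked so that the exact product falls halfway between two representable Float values:

```swift
// Illustrative values only: the exact product x * y is 1 + 2^-11 + 2^-24,
// which is a tie between two adjacent Floats.
let x: Float = 0x1.001p0    // 1 + 2^-12
let y: Float = 0x1.001p0    // 1 + 2^-12
let z: Float = -0x1.002p0   // -(1 + 2^-11)

// Two roundings: the product is rounded to 1 + 2^-11, then the sum is taken.
let separate = x * y + z

// One rounding: the exact product is added to z, then rounded once.
let fused = z.addingProduct(x, y)

print(separate)  // 0.0
print(fused)     // nonzero: the 2^-24 that the separate rounding discarded
```

Neither answer is wrong by the language rules; they are simply different operations, which is what the rest of the thread turns on.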

I just don’t want to have to write z.addingProduct(x, y) everywhere. Can the compiler do that for me?

There is an llc flag, -fp-contract=fast, but I'm not sure how to pass it through swiftc. The most straightforward attempt

$ swiftc tmp.swift -emit-assembly -Xfrontend -Xllvm -Xfrontend -fp-contract=fast
swift (LLVM option parsing): Unknown command line argument '-fp-contract=fast'.  Try: 'swift (LLVM option parsing) --help'
swift (LLVM option parsing): Did you mean '--print-gc=fast'?

doesn't work.


z.addingProduct(x, y) and x * y + z are different operations, with different results. why would you want the compiler switching them around willy nilly? i do think though fma would benefit from a nicer shorthand, maybe with a different operator like +*

static func +* (lhs shift: Float, rhs factors: (Float, Float)) -> Float
let w:Float = z +* (x, y)
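
For anyone who wants to try the shorthand, a runnable sketch of that signature could look like the following. The operator name `+*` comes from the post above; the `AdditionPrecedence` choice and the sample values are assumptions of mine:

```swift
// Hypothetical operator: `z +* (x, y)` computes z + x * y with one rounding.
// The precedence group is an assumption; the post did not specify one.
infix operator +* : AdditionPrecedence

extension Float {
    static func +* (shift: Float, factors: (Float, Float)) -> Float {
        // addingProduct is the standard library's fused multiply-add.
        return shift.addingProduct(factors.0, factors.1)
    }
}

let x: Float = 2, y: Float = 3, z: Float = 1
let w: Float = z +* (x, y)
print(w)  // 7.0
```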

Not “willy-nilly”—at my explicit command.

And as stated in the original post, I want this because x * y + z is much nicer to read and write, while the fma instructions provide superior accuracy and performance.

if x * y + z maps to the fma operation, then how do you get the doubly-rounded behavior? this is important if, for example, you are trying to reproduce data emitted by a different application that does not use the fma operation...

what happens if you write:

let a:Float = x * y 
let b:Float = z + a

does this somehow convert into an fma as well? if it doesn’t, why should it produce different results from the structurally identical

let b:Float = z + x * y 

?

As stated in the original post, I am asking if there is a way to tell the compiler, “Always replace x * y + z with fma instructions”.

If there is such an option, and you don’t want to use it, then the way you don’t use it is by not using it.

There is no way to express this at present in Swift.

If we were to add it, we would want it to be scoped to language blocks; there’s not any precedent that I know of in the language for replacing operator definitions within a block like this, so it requires inventing some new patterns.

Scoping it only to functions might be easier to achieve (or at least require less discussion), but is somewhat limited in use (also you need to be careful about the semantics of inlining in a mode like this.)


I have not looked at how clang handles this yet, but presumably if we added an --fp-contract=fast flag to the frontend and passed it through, the optimization would be applied, right?

Technically we can add the option pretty easily. As a matter of policy, it’s not obvious that having a global option is a good idea.

The simplest solution (though not exactly what you want) might be to add a magic "fusable" multiplication operator that would expand to an llvm fmul node with the right flags to allow it to combine with a later add or sub to form an fma in LLVM. That's pretty much trivial to set up, if you're willing to use &* or similar for it.

There are also Incredible Operator Hacks that will "work" in your own code:

struct FusedMultiplyAdd<Value: FloatingPoint> {
  var first, second: Value
}
infix operator *+ : MultiplicationPrecedence
extension FloatingPoint {
  static func *+(left: Self, right: Self) -> FusedMultiplyAdd<Self> {
    return .init(first: left, second: right)
  }
  static func +(left: Self, right: FusedMultiplyAdd<Self>) -> Self {
    return left.addingProduct(right.first, right.second)
  }
  static func +(left: FusedMultiplyAdd<Self>, right: Self) -> Self {
    return right.addingProduct(left.first, left.second)
  }
}

let x = 2.0 *+ 2.0 + 1

Whether or not this is a good idea is up to you. It at least shouldn't run into type-checker concerns because FusedMultiplyAdd is a known struct type rather than a generic parameter and because the custom operator isn't overloaded.


Yeah; the slight downside to this approach is that you have to also define overloads for - and prefix - and += and -= and probably a few things that I've forgotten, but the upside is that it works in your code today with no stdlib or language changes.
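
As a rough idea of what those extra overloads might look like, here is a sketch that repeats the `FusedMultiplyAdd` wrapper from the earlier post so it compiles on its own, then adds subtraction and compound assignment. The set of overloads is illustrative, not exhaustive (prefix `-` on the wrapper is left out):

```swift
struct FusedMultiplyAdd<Value: FloatingPoint> {
  var first, second: Value
}
infix operator *+ : MultiplicationPrecedence
extension FloatingPoint {
  static func *+(left: Self, right: Self) -> FusedMultiplyAdd<Self> {
    return .init(first: left, second: right)
  }
  // z - x * y, fused as z + (-x) * y
  static func -(left: Self, right: FusedMultiplyAdd<Self>) -> Self {
    return left.addingProduct(-right.first, right.second)
  }
  // x * y - z, fused as (-z) + x * y
  static func -(left: FusedMultiplyAdd<Self>, right: Self) -> Self {
    return (-right).addingProduct(left.first, left.second)
  }
  // z += x * y, using the mutating form of addingProduct
  static func +=(left: inout Self, right: FusedMultiplyAdd<Self>) {
    left.addProduct(right.first, right.second)
  }
  // z -= x * y
  static func -=(left: inout Self, right: FusedMultiplyAdd<Self>) {
    left = left - right
  }
}

let a: Double = 2, b: Double = 3, c: Double = 10
let difference = c - a *+ b   // 10 - 2 * 3
let reversed = a *+ b - c     // 2 * 3 - 10
var acc: Double = 1
acc += a *+ b                 // 1 + 2 * 3
let afterAdd = acc
acc -= a *+ b                 // back to 1
print(difference, reversed, afterAdd, acc)
```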


Well, the other downside is that it requires me, the programmer, to actively identify and specify each and every place where an fma should occur, instead of writing simple code and having the compiler choose the most accurate and efficient instructions for it.

Picking a more efficient instruction is one thing; picking a more accurate instruction (i.e. one that produces a different value) is something else entirely. It's completely understandable why you want this, but it's a very old idea with a long history of causing problems in other languages because suddenly the semantics of code start varying by optimization settings and compiler version. If it's not the default behavior, it's a bit different, but then we need to think carefully about how to actually fit it into the language; we don't want to just add a global switch.


Adding to what John said, note that FMA is only "more accurate" when viewed in isolation; as part of a larger computation it can (and does) cause errors if not deployed with care. The standard example of this is complex multiplication:

result.real = a.real*b.real - a.imag*b.imag
result.imag = a.real*b.imag + a.imag*b.real

If evaluated without FMA, these expressions have the desirable property that z * conj(z) is pure real for any z (i.e. the imaginary part evaluates to exactly zero). However, if FMA is inserted by the compiler, the rounding symmetry that guarantees this property is lost.
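
A small sketch of that effect, using a hypothetical two-field `Complex` struct and values I picked for illustration. The fused version spells out by hand one way a compiler might contract the imaginary-part expression:

```swift
// Hypothetical minimal complex type, for illustration only.
struct Complex {
    var real: Double
    var imag: Double
}

// Imaginary part of a * b with both products rounded separately.
func imagUnfused(_ a: Complex, _ b: Complex) -> Double {
    return a.real * b.imag + a.imag * b.real
}

// The same expression with one product rounded and the other fused into
// the addition, written explicitly with addingProduct.
func imagFused(_ a: Complex, _ b: Complex) -> Double {
    return (a.imag * b.real).addingProduct(a.real, b.imag)
}

// Illustrative value: the product real * imag is not exactly representable.
let z = Complex(real: 1.0 / 3.0, imag: 3.0)
let conjZ = Complex(real: z.real, imag: -z.imag)

print(imagUnfused(z, conjZ))  // 0.0: the two rounded products cancel exactly
print(imagFused(z, conjZ))    // nonzero: the rounding error of one product
```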


This is related to the idea of fast math, as well as rounding mode support, neither of which Swift supports at the moment. I'd love to see better support for this, but I also don't like global flags.

The best approach (if we could make it work) would be a scoped behavior like the C99 pragmas, so you could mark a region of code. One of the open questions though is what the semantics are across function calls. The most logical behavior is something like a dynamic mode switch - but that is impractical to implement.

-Chris


A dynamic mode switch is undesirable, because you may have primitives that require precise rounding, and they should not inherit unsafe fast-math transforms from their caller.

E.g. complex multiplication might not want to allow FMA formation so that z * conj(z) is always pure real¹; if it chooses those semantics, the semantics of a caller should not override them, even when it is inlined.

¹ This is a somewhat contrived example, and unnecessary: complex multiplication behaves just fine under FMA formation, and the relative error bounds are preserved, even if the result can seem a little weird. But there are lots of very real examples where it does matter; they're just a little bit harder to explain, so this one is convenient.


Yes, good point. We need some model that allows function call abstractions though. Even in the most extreme example, all Swift operators are calls, not primitives like in C.


Right; I'm picturing something like what @Joe_Groff used to call "scoped imports," where if you supply a shibboleth, we import a different operator definition of * in that scope that permits (or requires) fusion. So we would replace the definition of * (and related operators), rather than propagate fast-math flags onto all called functions.
