API proposal: fast difference-of-products

Motivation
Expressions of the form a*b - c*d or a*b + c*d are extremely common and are notorious sources of cancellation error in floating-point arithmetic. Some well-known examples include 2x2 matrix determinants and the discriminant term in the quadratic formula. It's quite easy to construct ill-posed examples (for determinants, when the rows/columns are nearly collinear; for discriminants, when the roots of the quadratic are close together) for which essentially all accuracy is lost if a naive computation via two multiplies and an add, or a multiply and an FMA, is used.

Kahan's lecture note goes into these motivating examples in considerable detail, for anyone interested in a more complete treatment of the problem.

Proposed Solution
There is a simple algorithm from Kahan that eliminates the problem entirely with almost zero performance cost. Kahan was particularly interested in 2x2 determinants for the purposes of finding singular values, though he wrote his note in terms of quadratics, but the algorithm is more generally useful; Matt Pharr calls it "difference of products," which I find to be a pretty good name (even though it's also applicable to sums of products). In pseudocode, the algorithm is as follows:

// given a, b, c, d, compute ab - cd
let (head, tail) = Augmented.product(-c, d)
let result = head.addingProduct(a, b) + tail

Instead of two multiplies and an add, this requires two FMAs and a multiply, so the performance impact of adopting it is negligible on modern targets, and it completely solves the problem; instead of essentially arbitrary relative error, this achieves an error bound of 1.5 ulp for binary floating-point types.¹
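For concreteness, here is how the algorithm might be written out using only the standard library's fused `addingProduct(_:_:)` (an illustrative sketch; the function name is mine, not the proposed API):

```swift
/// ab - cd via Kahan's algorithm (illustrative sketch, not the proposed API).
func kahanDifferenceOfProducts(_ a: Double, _ b: Double,
                               _ c: Double, _ d: Double) -> Double {
    let head = -c * d                        // rounded value of -cd
    let tail = (-head).addingProduct(-c, d)  // exact rounding error: (-cd) - head
    return head.addingProduct(a, b) + tail   // fma(a, b, head) + tail
}

// A nearly-singular 2x2 determinant: the exact answer is 100, but the
// naive two-multiplies-and-a-subtract loses everything to cancellation.
let n = 1.0e16
let naive = n * n - (n + 10) * (n - 10)                       // 0.0
let kahan = kahanDifferenceOfProducts(n, n, n + 10, n - 10)   // 100.0
```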

I propose to make this algorithm available in the Augmented enum of RealModule; a rough sketch is up as Pull Request #329 on apple/swift-numerics. As a strawman API for development purposes, I have bound it as:

extension Augmented {
  /// ab - cd
  static func fastDifferenceOfProducts<T: FloatingPoint>(
    _ a: T, _ b: T, _ c: T, _ d: T
  ) -> T
}

Alternatives

  • Naming: I generally like Matt Pharr's name vs "determinant" or similar, because it suggests that it can be used for any difference of products, but I'm open to other suggestions. In particular, "difference of products" suggests the argument ordering (a, b, c, d) for (ab - cd), but a "determinant" name might suggest (ad - bc), and other names would suggest something else. This is primarily of use as a building block for higher-level numerical algorithms, so I do not expect "normal" users to be faced with it often, but it's still worth getting right.
  • Naming (pt 2): Why "fast difference of products" instead of just "difference of products"? I have in mind leaving space for a more expensive sub-ulp accurate (or correctly-rounded) implementation, or an implementation that produces both head and tail. But maybe those would best be placed under a different name, and this should have the shorter name.
  • This algorithm does not handle overflow/underflow. That's OK, neither does Augmented.product. But it's worth keeping in mind with regard to API design.
  • Should there also be an API for ab + cd (or should that be the API)? There isn't one for head-tail a-b; users are expected to write sum(a, -b), but maybe that should be revisited.
  • Maybe Augmented should be reserved for algorithms that have head-tail results or parameters, and algorithms like this that merely use augmented arithmetic belong elsewhere.
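On the ab + cd question: the same two-FMA structure handles a sum of products with only sign changes. A hypothetical sketch (the name is mine, not a proposed binding):

```swift
/// ab + cd with the same two-FMA structure (hypothetical sketch).
func kahanSumOfProducts(_ a: Double, _ b: Double,
                        _ c: Double, _ d: Double) -> Double {
    let head = c * d                        // rounded value of cd
    let tail = (-head).addingProduct(c, d)  // exact rounding error: cd - head
    return head.addingProduct(a, b) + tail  // fma(a, b, head) + tail
}

// Heavy cancellation between ab and cd: the exact answer is 100.
let n = 1.0e16
let s = kahanSumOfProducts(n, n, -(n + 10), n - 10)   // 100.0
```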

¹ this is not obvious, and for a decade or so after publication people only remarked that the error was surprisingly good most of the time. Claude-Pierre Jeannerod, Nicolas Louvet, and Jean-Michel Muller published the clearest proof that I'm aware of.

12 Likes
Bikeshed color

Augmented.differenceOfProducts((a, b), (c, d)). Exact same ABI (I think), follows the more mathy conventions rather than using labeled arguments (though you could consider labeling the second tuple minus), but much quicker to guess the meaning for someone who isn’t familiar with this operation than the fully flat placeholder form.

9 Likes

I do think this should have the “best” name and one that doesn’t mention how it’s implemented with augmented arithmetic: while the underlying algorithm is very interesting to me, I suspect most people will want to use this API for an ideal tradeoff of performance and accuracy regardless of how it’s implemented.

It seems to me that the spelling most consistent with stdlib precedent, and the one users will most likely seek out, is:

(a, b).subtractingProduct(c, d)

It is too bad that we can’t currently extend tuples, but something along those lines would be nice.

1 Like

For the sake of discussion… perhaps this optimization (er, correctness fix) could be made at the compiler level?

No, sometimes users really want ab - cd and the compiler has to respect that, or else engineering is impossible.

4 Likes

For comparison purposes, do we know how computationally expensive each of those would be?

• • •

My first impression is that the name you’ve chosen is good, because the bare Augmented.differenceOfProducts sounds like it should produce both the head and tail.

I do like Jordan’s idea of passing the parameters as pairs though.

• • •

(One could also imagine a fanciful world filled with custom types and operator overloads where Augmented(a*b - c*d) would produce the correct head-tail result, and Augmented.fast(a*b - c*d) would use the algorithm proposed here. But that would defy common expectations and thus not belong in the Numerics library.)

In the past, I implemented the Kahan algorithm in my own code to improve numerical accuracy. It would be great to have this available as a built-in feature.

1 Like

Interesting feature. Am I using it right though?

let n = 1.0e20
let a = n + 10
let b = n
let c = n
let d = n

let det1 = a * b - c * d
print("Normal: \(det1)") // Normal: 0.0
let det2 = fma(a, b, -c*d)
print("fma: \(det2)") // fma: -3.037860284270037e+23

The true result here is nonzero, so the "normal" calculation is wrong.
OTOH the true result should obviously be positive, so the FMA result (being a giant negative number) is "more wrong". What am I doing wrong?

Nope.

First, add a line print(a == n) and you’ll see it’s true.

Change it to print(n.nextUp - n) and you’ll see that the next representable value above n is n + 16384.

Now update your code to use a = n + 16384 (or alternatively, use n = 1.0e16 so that a = n + 10 is representable).
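Those two checks, as a throwaway snippet:

```swift
let n = 1.0e20
print(n + 10 == n)   // true: 10 is less than half an ulp at this magnitude
print(n.nextUp - n)  // 16384.0: the gap between adjacent Doubles near 1e20
```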

But more importantly, your code does not match the proposed semantics here. Instead you need this:

let head = -c * d
let tail = fma(-c, d, -head)
let det3 = fma(a, b, head) + tail
print("Kahan: \(det3)")

(Technically you could remove both minus signs on line 2, and change the plus to a minus on line 3, and it would work the same. But I wrote it this way to match the proposed implementation.)

And even still, the numbers you chose have lots of trailing zeros, so they don’t actually demonstrate the benefits of this approach. Instead, try these values:

n = 1.0e16
a = n
b = n
c = n + 10
d = n - 10

Since c*d is a difference of squares, the mathematically correct value for a*b-c*d is exactly 100. If you run the code you’ll see these results:

det1 = 0.0
det2 = -5366162204393572.0
det3 = 100.0
4 Likes

Thank you!

On something you wrote above

I am thinking of having something like:

var r: Real = a*b - c*d + ...
var d: Double = Double(r) // does the truncation once

that works with a wrapper Real type and produces a "correct" result (working with a higher intermediate precision and truncating the result only at the end).

If you were to do something like that (outside the standard / numeric library), what would you use as an underlying "currency type" for intermediate operations at higher precision, so that all bits of simple expressions are preserved (considering expressions like a*b, but not necessarily higher-bit-count expressions like a*b*c)? As I understand it, a Double significand is 53 bits (52 explicitly stored), so to preserve all bits of a product of two numbers we'd need 106 bits (plus extra bits for the exponent). Is the following currency type looking reasonable and appropriate for the task (albeit quite inefficient)?

struct Real {
    var mantissa: Int128
    var exponent: Int32
}

You should look at compensated summation, which is also by Kahan. The idea is to keep track of the missing “tail” and re-add it at each step.
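In code, classic Kahan compensated summation might look like this (my own sketch, not library code):

```swift
/// Kahan compensated summation: track the rounding error of each
/// addition in `c` and feed it back into the next term.
func kahanSum(_ xs: [Double]) -> Double {
    var sum = 0.0
    var c = 0.0              // running compensation (previously lost low bits)
    for x in xs {
        let y = x - c        // re-inject the part lost on the last step
        let t = sum + y
        c = (t - sum) - y    // what was rounded away in computing t
        sum = t
    }
    return sum
}

// Four 1.0s vanish in a naive left-to-right sum next to 1e16, but survive here:
let xs = [1.0e16, 1.0, 1.0, 1.0, 1.0, -1.0e16]
// xs.reduce(0, +) == 0.0, while kahanSum(xs) == 4.0
```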

There are other tricks that can help, like matching magnitudes at each step, and trying to cause cancellation as early as possible.

Another thing that can help reduce errors, is to pair up the sums like a tournament rather than zipping down the line into a single accumulator. (This can be understood as recursively summing the two halves of the list separately, then adding the results.)
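The tournament pairing can be sketched as a simple recursion:

```swift
/// Pairwise (tournament) summation: recursively sum the two halves and
/// combine; worst-case error grows like O(log n) rather than O(n).
func pairwiseSum(_ xs: ArraySlice<Double>) -> Double {
    if xs.count <= 2 { return xs.reduce(0, +) }
    let mid = xs.startIndex + xs.count / 2
    return pairwiseSum(xs[..<mid]) + pairwiseSum(xs[mid...])
}

let total = pairwiseSum([1.5, 2.25, 3.0, 4.125][...])   // 10.875 (exact)
```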

If you value accuracy over speed, you could even do something like sort the terms by magnitude (perhaps in a priority queue) and repeatedly combine the two smallest terms.

In your use case, that might look like precomputing the (head, tail) pair for each product, and then going through the list either in arbitrary order, or in ascending order, or with a priority queue, and doing something like compensated summation along the way.

On the other hand, if you still want speed, you could just adapt the compensated summation method to incorporate the missing part of the tail from the fma into that from the sum.
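Putting those pieces together, a compensated dot product in the spirit of Ogita, Rump, and Oishi's "Dot2" might look like this (a sketch under the usual no-overflow/underflow assumptions; names are mine):

```swift
/// Compensated dot product: recover each product's exact tail with a fused
/// multiply-add and each addition's exact error with TwoSum, accumulate the
/// corrections separately, and apply them once at the end.
func compensatedDot(_ a: [Double], _ b: [Double]) -> Double {
    var sum = 0.0
    var comp = 0.0                           // accumulated corrections
    for (x, y) in zip(a, b) {
        let p = x * y                        // rounded product (head)
        let pErr = (-p).addingProduct(x, y)  // exact product tail: x*y - p
        // TwoSum: add p to sum and recover the addition's exact error.
        let t = sum + p
        let z = t - sum
        let sErr = (sum - (t - z)) + (p - z)
        sum = t
        comp += pErr + sErr
    }
    return sum + comp
}

// The 2x2 determinant from earlier in the thread, phrased as a dot product:
let det = compensatedDot([1.0e16, 1.0e16 + 10], [1.0e16, -(1.0e16 - 10)])  // 100.0
```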

It would be nice if computer systems provided a very wide floating-point accumulator (e.g. 256 or 512 bits) but I’m not aware of any.

2 Likes

There have been a bunch of proposals to do something along these lines over the years, but basically no one has been willing to pay the performance cost it would entail in practice. Most algorithms are happy with a backwards-stable error bound, and to get reproducible results, defining an accumulation order suffices; you do not need correct rounding.

If you’re interested in the academic literature on the subject, “Kulisch accumulator” is a good search term to start with for hardware, and ReproBLAS is a pretty good representative of the state of the art for software.

5 Likes

Turns out to be quite elegant.

MVP
import Foundation // for fma

indirect enum Real: ExpressibleByFloatLiteral, ExpressibleByIntegerLiteral, CustomStringConvertible {
    case number(Double)
    case mul(Self, Self)
    case sub(Self, Self)
    
    var description: String {
        switch self {
            case .number(let value): "\(value)"
            case .mul(let lhs, let rhs): "\(lhs) * \(rhs)"
            case .sub(let lhs, let rhs): "(\(lhs) - \(rhs))"
        }
    }
    
    init(floatLiteral: Double) { self = .number(floatLiteral) }
    init(integerLiteral: Int) { self = .number(Double(integerLiteral)) }
    init(_ v: Double) { self = .number(v) }
    
    static func * (lhs: Self, rhs: Self) -> Self {
        mul(lhs, rhs)
    }
    static func - (lhs: Self, rhs: Self) -> Self {
        sub(lhs, rhs)
    }
    
    var normalValue: Double {
        switch self {
            case .number(let value): value
            case .mul(let lhs, let rhs): lhs.normalValue * rhs.normalValue
            case .sub(let lhs, let rhs): lhs.normalValue - rhs.normalValue
        }
    }
    
    var fmaValue: Double {
        switch self {
            case .number(let value):
                return value
            case .mul(let lhs, let rhs):
                return lhs.fmaValue * rhs.fmaValue
            case .sub(let lhs, let rhs):
                switch (lhs, rhs) {
                    case (.mul(let a, let b), .mul(let c, let d)):
                        // Kahan's algorithm: negCD is the rounded -c*d, and
                        // fma(c, d, negCD) recovers its rounding error exactly.
                        let negCD = -c.fmaValue * d.fmaValue
                        return fma(a.fmaValue, b.fmaValue, negCD) - fma(c.fmaValue, d.fmaValue, negCD)
                    default:
                        return lhs.fmaValue - rhs.fmaValue
                }
        }
    }
}

extension Double {
    init(_ v: Real) { self = v.fmaValue }
}

Looks and feels (almost) like working with ordinary float values:

let a: Real = 10000000000
let b: Real = 10000000000
let c: Real = 10000010000
let d: Real = 09999990000
let r = a*b - c*d
print(r)  // (10000000000.0 * 10000000000.0 - 10000010000.0 * 9999990000.0)
print(r.normalValue) // 100007936.0
print(r.fmaValue)    // 100000000.0
let x = Double(a*b) - Double(c*d)    // 100007936.0
let y = Double(a*b - c*d)            // 100000000.0