SIMD performance question

The definition of simd_fract(x) is precisely "x - floor(x), clamped to [0,1)". fract is defined this way because, without the clamp, x - floor(x) rounds to exactly 1.0 for tiny negative numbers, which breaks some algorithms. Why do you expect it to be faster with an additional clamp? (I'm genuinely curious, because it seems like a documentation bug if this isn't clear.)

As for the lack of difference between fast_recip and precise_recip: the argument to the function is a compile-time constant, so I expect the optimizer is able to propagate it through, meaning neither is actually evaluated at run time. I'm not set up to test this right now, but a quick glance at the disassembly would confirm it.
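If constant propagation is what is happening, one common way around it in a benchmark is to hide the input from the optimizer. A rough sketch in plain C, using ordinary division as a stand-in for the reciprocal functions (I am not calling the simd routines here, and the function names are hypothetical):

```c
#include <assert.h>

/* The argument is a compile-time constant, so an optimizing compiler
   is free to fold the whole expression and emit no division at all. */
float recip_constant_input(void) {
    const float x = 4.0f;
    return 1.0f / x;
}

/* Reading the value back through a volatile object hides it from the
   optimizer, so the division has to be performed at run time. */
float recip_runtime_input(float x) {
    volatile float opaque = x;
    return 1.0f / opaque;
}

/* Hypothetical self-check: both paths compute the same value; only
   the time at which the work happens can differ. */
void check_recip_examples(void) {
    assert(recip_constant_input() == 0.25f);
    assert(recip_runtime_input(4.0f) == 0.25f);
}
```

Timing something shaped like the second function, or loading the inputs from an array filled at run time, should give the two variants a chance to actually differ. The disassembly is still the definitive check.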
