The definition of simd_fract(x) is precisely "x - floor(x), clamped to [0,1)". fract is defined this way because without the clamp, x - floor(x) produces 1.0 for tiny negative numbers (floor(x) is -1.0, and x + 1.0 rounds to exactly 1.0), which breaks some algorithms. Why do you expect it to be faster with an additional clamp? (I'm genuinely curious, because if this isn't clear, it seems like a documentation bug.)
As for the lack of difference between fast_recip and precise_recip, the argument to the function is a compile-time constant; I expect that the optimizer is able to propagate it through, so that neither is actually evaluated at run-time. I'm not set up to test this right now, but a quick glance at the disassembly would confirm it.
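If constant propagation is the issue, one common way to keep the argument opaque to the optimizer is to read it through a volatile object. The sketch below assumes a hypothetical recip_under_test function standing in for simd_fast_recip or simd_precise_recip (which are only available from <simd/simd.h> on Apple platforms); bench_recip is likewise a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical stand-in for the function under test; on Apple platforms you
   would call simd_fast_recip or simd_precise_recip here instead. */
static float recip_under_test(float x) {
    return 1.0f / x;
}

/* Reading the input through a volatile object forces a real load on every
   iteration, so the compiler cannot treat the argument as a compile-time
   constant and fold the call away. */
float bench_recip(float input, size_t iterations) {
    volatile float opaque = input;
    float acc = 0.0f;
    for (size_t i = 0; i < iterations; ++i) {
        acc += recip_under_test(opaque);
    }
    /* Returning the accumulated result keeps the loop from being dead code. */
    return acc;
}
```

For example, bench_recip(4.0f, 8) returns 2.0f. Note that the running sum makes each iteration depend on the previous one, so a loop like this measures latency rather than throughput; checking the disassembly is still the quickest way to confirm that the division (or reciprocal estimate) survives.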