This is to continue a discussion I was having with @rxwei at NeurIPS. It matters for the similar work we are doing in JuliaLang, and it is generally interesting.
Consider that one has v, dv = valueWithDifferential(body, x), or valueWithPullback, or valueWithGradient, etc., where the general promise is that v = body(x).
As I recollect the discussion: it was @rxwei's position that v had to be exactly equal to body(x). It was my position that it had to be fairly close, but that some error due to floating-point math was permitted.
In particular, if I had to quantify it: for v_true being the actual result of body(x) according to the math, and v being the value according to v, dv = valueWithDifferential(body, x), I would argue that |v - v_true| <= |body(x) - v_true| for all x.
That is, if you have some operation like a normal sin from a good libm that is accurate to 1 ulp, then the value according to some operation that also calculates or tracks derivative information should also be accurate to 1 ulp, though its error could be in the opposite direction. If, however, it was something like SLEEF's fast implementation of sin, which is only accurate to 3 ulp, then it too only has to be accurate to 3 ulp; again, the error could be in the opposite direction.
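To make "accurate to n ulp, possibly in either direction" concrete, here is a small Python sketch of it as a checkable property. within_ulps is a hypothetical helper I'm writing for illustration, not from any library:

```python
import math

def within_ulps(a, b, n=1):
    """Hypothetical helper: True if floats a and b are at most n ulps apart."""
    x, y = min(a, b), max(a, b)
    for _ in range(n):
        if x == y:
            return True
        x = math.nextafter(x, y)  # step x one representable float toward y
    return x == y

# 0.6 and the next representable double are exactly 1 ulp apart:
print(within_ulps(0.6, 0.6000000000000001))  # True
```

Under this contract, a primal-with-derivative computation whose value lands within the same ulp bound as the plain primal would pass, even if it rounds to a different last bit.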
The reason for allowing this disagreement is that, to permit efficient computation of the derivatives, it is sometimes useful to change how one computes the value, so as to produce useful intermediary values that get reused when calculating the derivative information. And it is really fiddly to ensure the floating-point math remains the same, given that it is not even associative (i.e. a + (b + c) != (a + b) + c).
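A quick Python demonstration of that non-associativity, using the classic 0.1/0.2/0.3 values:

```python
import math

a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # evaluates to 0.6000000000000001
right = a + (b + c)  # evaluates to 0.6
print(left == right)                          # False
print(abs(left - right) <= math.ulp(right))   # True: they differ by one ulp
```

Both orderings are "correct" to within an ulp of the true sum; they just round differently along the way, which is exactly the kind of disagreement at issue here.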
An example of this is /; here is code from some of our AD tooling for Julia showing how that is useful. I would have no confidence that this gives identical answers to performing the operation directly. The math is correct, but floating point is not to be trusted.
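Since the Julia code itself isn't reproduced here, here is a hedged Python sketch of the kind of rule I mean; div_with_pullback and its structure are illustrative, not from any real AD package. The point is that computing 1/y once lets both the value and the pullback reuse it, but x * (1/y) is not guaranteed to round identically to x / y:

```python
import math

def div_with_pullback(x, y):
    """Illustrative reverse-mode rule for z = x / y (names are hypothetical)."""
    inv_y = 1.0 / y           # intermediary computed once...
    z = x * inv_y             # ...reused for the value (may differ from x/y in the last ulp)
    def pullback(dz):
        dx = dz * inv_y       # d(x/y)/dx = 1/y
        dy = -dz * z * inv_y  # d(x/y)/dy = -x/y^2 = -z/y
        return dx, dy
    return z, pullback

z, pb = div_with_pullback(3.0, 7.0)
dx, dy = pb(1.0)
```

Here the value z and both partials share inv_y, saving a division each; the price is that z is x * (1/y) rather than x / y, which is mathematically identical but can differ by an ulp.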
That final point is why I think this is OK: everyone doing this kind of thing should be aware that floating point is not to be trusted. TensorFlow, for example, documents that the exact floating-point values returned by functions are excluded from its SemVer promise. Anyway, this is an interesting conversation worth having.