SE-0259: Approximate Equality for Floating Point

Steve has shown the kind of harm this would create:

So we'd like zero not to be a special case. We want a continuous function.

But a function that returns a boolean cannot be continuous, and floating-point numbers are not continuous either!

Let's try harder, and consider two functions of real numbers: the bounds of the interval of numbers considered approximately equal to a given number:

For all (x, y), y.isAlmostEqual(to: x) iff y is inside [minBound(x), maxBound(x)].

To preserve the scale invariance of floating-point numbers, minBound and maxBound are linear... until we enter the zero region.

But when do they change behavior? The answer is in the proposal:

the common numerical analysis wisdom that if you don't know anything about a computation, you should assume that roughly half the bits may have been lost to rounding.

So let's have minBound and maxBound become non-linear in the neighborhood of a well-picked small number Z (one that sits, in bits, about halfway between 1 and zero).

We'd have minBound and maxBound behave like this:

  • minBound(x) and maxBound(x) are linear for all values far enough from zero (relative tolerance)
  • minBound(0) = -Z, maxBound(0) = Z (absolute tolerance).

A possible definition of maxBound: constant around zero, then linear:

|    x
|   x
|xxx
|
|
•-----

For minBound, it is less easy: it must be linear for nearly all values, yet end at -Z at 0. So let's build it this way (two linear regions):

|         xx
|       xx
|     xx
|    x
|   x
•--x--------
| x
|x
x
|

This looks odd, doesn't it? But it has some of the qualities we asked for :-)
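A minimal sketch of those two bounds in Swift, for x >= 0 as in the plots (the concrete choices of Z and the relative tolerance t are illustrative, not from the proposal):

// Illustrative parameters: t is a relative tolerance, z is the small
// number Z discussed above (roughly halfway to zero, in bits).
let t = Double.ulpOfOne.squareRoot()
let z = Double.leastNormalMagnitude.squareRoot()

// Constant around zero, then linear (the first plot).
func maxBound(_ x: Double) -> Double {
    max(x * (1 + t), z)
}

// Two linear regions (the second plot): the usual relative band far from
// zero, and a steeper line through (0, -z), so that minBound(0) == -z.
func minBound(_ x: Double) -> Double {
    min(x * (1 - t), x * (1 + t) - z)
}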


I bet I have just rediscovered the joint use of relative and absolute tolerances, broken it in a horrible way, and made a total mess of it. Maybe that's all we need, after all: a function that accepts two tolerances, and provides non-zero default values for both.
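A sketch of that shape (names and default values are illustrative guesses, not the proposal's API):

extension Double {
    // Combined comparison: the absolute tolerance takes over near zero,
    // the relative one everywhere else.
    func isAlmostEqual(to other: Double,
                       relativeTolerance: Double = Double.ulpOfOne.squareRoot(),
                       absoluteTolerance: Double = Double.leastNormalMagnitude.squareRoot()) -> Bool {
        let scale = max(abs(self), abs(other))
        return abs(self - other) <= max(relativeTolerance * scale, absoluteTolerance)
    }
}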


Even with added support for relative and absolute tolerances, I am not convinced that the proposed isAlmostEqual function is a good match for this use case:

func isAlmostEqual(to other: Color) -> Bool {
    return red.isAlmostEqual(to: other.red, absoluteTolerance: 0.5 / 255) &&
        green.isAlmostEqual(to: other.green, absoluteTolerance: 0.5 / 255) &&
        blue.isAlmostEqual(to: other.blue, absoluteTolerance: 0.5 / 255) &&
        alpha.isAlmostEqual(to: other.alpha, absoluteTolerance: 0.5 / 100)
}

This is a custom function with clear requirements, and you can just as easily use abs(red - other.red) < 0.5 / 255 to make your intent clear.

The default tolerance in the proposed function suggests that isAlmostEqual would often be used without having to think about the tolerance. But what is the use case for that?
When you really only want to account for rounding errors, then comparing only half the bits may not be enough. And when you want to check whether two numbers fall into the same equivalence class, then that equivalence class should be specified explicitly.

It depends on what you mean by "the problem". If you mean "the natural scale of my problem is roughly unity, so anything vaguely zeroish should be considered zero," then it's still a problem. If you mean "a pure relative tolerance breaks down at zero," then it goes away. That's a lot of the subtlety that I'm trying to convey.

What I find confusing is that it seems from the discussion that we can use x.isAlmostEqual(to: y) for (almost) any values of x and y so long as neither value is zero.

If that is the case, then I find it curious that isAlmostEqual would work with .ulpOfOne but not with zero, so (naively) I would think that the function could substitute .ulpOfOne for zero and this would be good enough for a function that is testing approximate equality.

But I might be misunderstanding you. Perhaps you are saying that we should use isAlmostZero when we know we want to compare with zero, but isAlmostEqual can also handle zero values (less well).

Jeremy

This is a very good example which has to be addressed by any isAlmostEqual proposal.


Zero has always been a special case in some situations, and unless all mathematicians in the last thousand years made a terrible oversight, it will stay that way forever - so that exception in a single method isn't that bad for me.

Still, the imho best way to resolve the issues this proposal aims to tackle will become possible when generics grow more powerful and allow us an easy way to define custom numeric types:

typealias DrivingDistance = CustomDouble<precision: 0.01> // one centimeter
let a = 4000 as DrivingDistance // four kilometers
let b = 4000 + 0.008 as DrivingDistance // eight millimeters more
a == b // true

But that might never happen, so adding some simple methods now (I'd still include helpers for absolute tolerance ;-)) will at least help those who are already aware of the problems with float equality.

What if we broaden this so that instead of looking for exactly zero, we check whether either argument would pass the isAlmostZero test? That gives us the right behavior in your a and a/2 test.

This approach would make "approximate equality" transitive around zero, which seems to be the big problem here.
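For concreteness, a sketch of that broadened rule; the two helpers are simplified stand-ins for the proposal's isAlmostZero and isAlmostEqual, using the scale formula quoted later in this thread, with illustrative tolerance defaults:

extension Double {
    // Simplified stand-in for the proposal's isAlmostZero.
    func almostZero(absoluteTolerance: Double = Double.ulpOfOne.squareRoot()) -> Bool {
        abs(self) < absoluteTolerance
    }
    // Simplified stand-in for the proposal's isAlmostEqual.
    func almostEqual(to other: Double, tolerance: Double = Double.ulpOfOne.squareRoot()) -> Bool {
        let scale = max(abs(self), abs(other), Double.leastNormalMagnitude)
        return abs(self - other) < scale * tolerance
    }
}

// If either value would pass the near-zero test, require both to pass it;
// otherwise fall back to the relative test. This makes a and a/2 compare
// equal when both are near zero.
func broadenedIsAlmostEqual(_ x: Double, _ y: Double) -> Bool {
    if x.almostZero() || y.almostZero() {
        return x.almostZero() && y.almostZero()
    }
    return x.almostEqual(to: y)
}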

As discussed, this costs you scale invariance. That may be a tradeoff we're willing to make, but it has very real downsides.

No sane approximate equality will ever be transitive--it would immediately follow that every value was "almost equal" to every other value; that's an anti-goal.


This approach would need both the absolute and relative tolerances as parameters, using the absolute tolerance near zero.

Did you mean "will ever be" here? This would be transitive only near zero, the same way isAlmostZero is already "transitive".

I think this should hold: if x.isAlmostZero() == true and y.isAlmostZero() == true, then x.isAlmostEqual(to: y) == true.

Yes, fixed.

The property you're discussing isn't quite transitivity, and the implementation you're suggesting is essentially my proposed alternative, except that it's not necessary to special case values "near zero"--that falls out automatically from having an absolute tolerance present.

Here’s a probably-really-bad idea:

Near zero, we want to use an absolute tolerance.

Near infinity, we want to use a relative tolerance.

There are N binades between 0 and ∞, where N depends on the specific type.

If we used a combined relative-and-absolute tolerance, we could make those tolerances be a function of the binade, and thus transition smoothly from purely-absolute when the exponent is at its minimum, to purely-relative when the exponent is at its maximum.
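A rough sketch of that blending (everything here is illustrative, including the linear interpolation over exponents):

extension Double {
    func isAlmostEqualBlended(to other: Double,
                              absoluteTolerance: Double = Double.ulpOfOne.squareRoot(),
                              relativeTolerance: Double = Double.ulpOfOne.squareRoot()) -> Bool {
        let scale = max(abs(self), abs(other), Double.leastNormalMagnitude)
        // Where does scale's binade sit between the minimum and maximum
        // normal exponents? 0 near zero, 1 near infinity.
        let minE = Double(Double.leastNormalMagnitude.exponent)
        let maxE = Double(Double.greatestFiniteMagnitude.exponent)
        let w = (Double(scale.exponent) - minE) / (maxE - minE)
        // Purely absolute at w == 0, purely relative at w == 1.
        let tolerance = (1 - w) * absoluteTolerance + w * relativeTolerance * scale
        return abs(self - other) <= tolerance
    }
}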

Here is another possible approach:

func isApproximately(x: Self, tolerance: Self = Self.ulpOfOne.squareRoot(), scale: Self = 1) -> Bool

Unlike Steve’s isAlmostEqual(to:), which derives the scale from the numbers being compared, this function uses a presumed scale of 1 and depends on the user to specify a different scale when appropriate.

The default scale works fine as long as you are dealing with numbers that are, say, within an order of magnitude of 1. When you are dealing with all very large numbers or all very small numbers, you must provide a different scale for the best results.

This uses absolute approximation (after applying the scale), and thus works fine when comparing numbers that could be equal to or close to zero.
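A minimal sketch of a body matching that description (the body is an assumption; only the signature appears above):

extension Double {
    // Absolute comparison after applying the caller-supplied scale.
    func isApproximately(x: Double,
                         tolerance: Double = Double.ulpOfOne.squareRoot(),
                         scale: Double = 1) -> Bool {
        abs(self - x) <= tolerance * scale
    }
}

(0.0).isApproximately(x: 1e-10)                      // true: zero is unproblematic
(4000.0).isApproximately(x: 4000.00001)              // false with the default scale
(4000.0).isApproximately(x: 4000.00001, scale: 4000) // true with an explicit scale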

The downside is: Will users remember to provide a different scale when it is appropriate?

I'm confused. This is exactly the proposed alternative in the post you're replying to, up to naming, except that you're providing a default value of the scale parameter (which, for the reasons I explain there, is problematic).

I didn't see this asked:
Is there a reason to exclude equality here:

let scale = max(abs(self), abs(other), .leastNormalMagnitude)
return abs(self - other) < scale*tolerance

Instead of

let scale = max(abs(self), abs(other), .leastNormalMagnitude)
return abs(self - other) <= scale*tolerance

They're equivalent formulations. You can transform one to the other by adjusting tolerance to compensate. It's mostly an arbitrary choice, though I think the range of sensible tolerances is a little bit nicer with <.

Ah, I didn't realize that's what you were proposing in your second proposed solution. I was confused by the label "absoluteTolerance". I think "scale" would be better.

(9.0).isAlmostEqual(to: 10.0, tolerance: 0.1)

In the engineering notion of tolerance, 10 ± 10% is the range [9, 11].

Similar implementations that I've seen (Python, Boost, NumPy) use <=.

So these are two arguments for including equality.
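A worked check of that example against the two formulations quoted above:

let x = 9.0, y = 10.0, tolerance = 0.1
let scale = max(abs(x), abs(y), Double.leastNormalMagnitude) // 10.0
abs(x - y) <  scale * tolerance // false: 1.0 < 1.0 (10 * 0.1 rounds to exactly 1.0)
abs(x - y) <= scale * tolerance // true:  1.0 <= 1.0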

The "near zero" part is what lets us provide a default value for the absolute tolerance — I worry that without that we're re-injecting part of the problem that I'm hoping this API can solve. If we use scale instead of absoluteTolerance as the other parameter, can we provide a default value?

Scale and absolute tolerance are two names for the same concept, and there is no getting around the fact that providing a default value for either is problematic: it necessarily causes some values that should not be considered equivalent in some use cases to be considered equivalent (if the default is non-zero), or vice versa (if it is zero).

To be blunt: if there were a universally-correct way of doing this, it would already be in use. You cannot clever your way out of it (even though many of the replies on this thread seem to be trying to do so), because the problem doesn't arise from API or representation limitations--it is fundamental to the notions of scale independence and numerical approximation. The task is to design an API that captures the complexity in a way that helps users reason about it, not a magic API that somehow just works for all cases.


Do you have any reservations about a throwing API that forces users to handle the zero case?