Comparable and FloatingPoint types

scanon · October 11, 2018, 3:13am

Background: Comparable and Equatable vs. IEEE 754

The Comparable and Equatable protocols want to imply a total order. This is mostly satisfied by FloatingPoint types, except for the behavior with NaNs. A total order requires that for any a and b, either a ≤ b or a ≥ b is true. But IEEE 754 requires that if either a or b is NaN, both of those comparisons are false.

Why Do We Care

Aside from mathematical purity, there's some highly undesirable behavior that results from the current behavior of API that depends on Comparable and Equatable with floating-point values. Some examples of misbehavior:

> let foo = [2.0, .nan, 1.0].sorted()
foo: [Double] = 3 values {
  [0] = 2
  [1] = NaN
  [2] = 1
}
> foo.contains(.nan)
$R0: Bool = false

Obviously, these results are quite counterintuitive and likely to cause bugs for unsuspecting users.

Proposed Remedy

Use the @_implements feature to make it so the comparison operators yield a total order on floating-point data when used in a generic context constrained only to Comparable or Equatable. When the comparison operators are used in a concrete context, or in a generic context constrained more tightly than Comparable (e.g. an algorithm generic over FloatingPoint), they should continue to have the current (IEEE 754) behavior.

Details

On FloatingPoint, BinaryFloatingPoint, and the concrete floating point types, the operators would continue to work as they currently do. When constrained only to Comparable or Equatable, we would use the following implementations instead:

@implements(Equatable,==(_:_:))
static func _totalEquals(lhs: Self, rhs: Self) -> Bool {
  // Normal equality, except all NaNs compare equal to each other.
  return lhs == rhs || lhs.isNaN && rhs.isNaN
}

@implements(Comparable,<(_:_:))
static func _totalLess(lhs: Self, rhs: Self) -> Bool {
  // Normal less than, except NaN ordered below everything else.
  return lhs < rhs || lhs.isNaN && !rhs.isNaN
}

@implements(Comparable,<=(_:_:))
static func _totalLessEqual(lhs: Self, rhs: Self) -> Bool {
  // Normal less-equals, except NaN ordered below everything else.
  return lhs <= rhs || lhs.isNaN
}

We would also need to modify hashing for FloatingPoint numbers to hash all NaNs to the same value; this is simple and involves only a well-predicted branch for most use.

Benefits

API on Comparable and Equatable is much better behaved; the defaults Just Work.
Floating point algorithms ported from other languages Just Works, because in such a context you still get the IEEE 754 comparisons.
No performance penalty for the most common operations (concrete comparisons), very slight performance penalty for the comparisons that we replace.

Drawbacks

It's surprising to have a basic operator behave differently in a concrete or generic context. I personally believe that it's OK here, because users don't write a lot of generic code against Comparable; rather they will the existing APIs that depend on it. So all that most users see is that those APIs now give more sensible answers. Users writing code against more specific protocols would see no change in behavior.
Very small performance penalty. I think that's ok in the name of getting a sensible answer.
It isn't quite possible to do this yet. An implementation is blocked by ([SR-8961] @_implements still not quite powerful enough · Issue #51466 · apple/swift · GitHub)

Alternatives Considered

OMG Floating-point is broken, make < and == do the right thing.
Source breaking, and also breaks everyone's expectations coming from almost any other language. A non-starter. Contrary to many people's belief, this would not be a violation of IEEE 754; the standard says "you have to have an operation with these semantics", but does not specify how it needs to be spelled in source code.
Use the IEEE 754 totalOrder predicate.
Unfortunately, this is really a totalOrder on canonical encodings, rather than numeric values. Doing this would lead to behavior just as confusing, in the presence of redundant encodings:

> [-0.0].contains(0)
$R0: Bool = false
> [20.0 as Decimal64].contains(2.0e1) // hypothetical IEEE decimal type
$R1: Bool = false

This predicate was well-intentioned by the 754 committee, but the semantics are totally wrong for our purposes.

Do nothing.
I guess. The world hasn't quite caught fire yet.

Additional Notes

We should simultaneously deprecate the legacy isEqual(to:), isLess(than:) and isLessThanOrEqualTo(:) methods on FloatingPoint. These were originally added because we didn't yet have static operators, and they're occasionally useful as hooks, but the implementation ofFloatingPoint comparisons should be made consistent with BinaryInteger where we can, and have the operators be the primary thing.

We should consider introducing an intermediate protocol that refines Numeric and is refined by both BinaryInteger and FloatingPoint, which would be Numeric + Comparable. As a straw man, I would call it OrderedNumeric. On it's own, this doesn't enable any new generic code to be written, but in the presence of @_implements it can. This would be the protocol that abstracts over floating-point and integer, but where floating-point comparisons still have IEEE 754 semantics. I know that Xiaodi has needed to be able to describe this at some point in the past.

scanon · October 11, 2018, 3:18am

There's been a lot of past discussion around this issue:

I'm probably missing a few more.

What's different now? Mostly, that we actually (almost) have the machinery in place to build a relatively simple solution. We didn't before.

This proposal is also significantly more narrowly targeted than previous attempts have been. It's not a big sweeping language change. If you squint, you can almost pretend that we're just fixing the behaviors of a few APIs where they currently have bogus behavior. Obviously, it's actually a bigger change than that, but from the perspective of most user's experience of it, that's about the scope of it.

As a personal note, I'll be on vacation for the next two weeks. Some other folks from the standard library team will be helping with this pitch, but if there are specific IEEE 754 questions that I need to respond to, the latency will be longer than usual. Nonetheless, I'd like to start discussion of this so that we can make the change for Swift 5 if we decide to pursue it.

xwu · October 11, 2018, 3:26am

If the overall scheme were to be adopted, such an addition might be useful but I rather think it would be reasonable for Numeric.== to have IEEE semantics.

scanon · October 11, 2018, 3:27am

Yes, I think that's fine. My concern was mainly for your use w.r.t. Comparable, since Numeric does not conform.

xwu · October 11, 2018, 3:31am

Right, there are certainly advantages to adding such a protocol but I think a side benefit of Numeric not conforming to Comparable (in addition to accommodating complex numbers) is that we can sidestep the issue here with respect to < and > rather cleanly (by which I mean that there's only Comparable.< and FloatingPoint.< with no question of what the semantics of the operator would be on an intermediate protocol).

graskind · October 11, 2018, 3:35am

I agree strongly that we need a solution, but I'm very reluctant to go with making operators behave differently in generic vs. concrete context. People will inevitably run into the difference, even if not on a day-to-day basis. There will be a lot of "if I do X my code works but not if I do Y" issues as developers who don't understand the fundamental design issues of floating-point in general assume the language is to blame - in other words, what we have now but with a few extra points of confusion.

scanon · October 11, 2018, 3:38am

I'm sympathetic to that line of argument (that's why I highlighted it), and I agree that it feels icky, but ... in this specific case, it's relatively hard to imagine a scenario in which it would actually come up. It will certainly be less common than [Float].contains(.nan), for example.

Paul_Cantrell · October 11, 2018, 3:42am

My gut reaction is that this is a showstopper. The bugs it would produce, though rare, would be insidious — and there would be no compiler help in flagging them.

xwu · October 11, 2018, 3:43am

Likewise, I was once quite opposed to this proposed design for the ickiness factor, but I've come around to believe that this is probably the only workable solution. One should also mention, perhaps, that it's not unprecedented; I believe Kotlin also adopts the same design.

It's no different than concrete methods shadowing protocol extension methods, which Swift freely allows. In that sense, though icky, one might say it's also Swifty.

scanon · October 11, 2018, 3:47am

There's also no compiler help flagging the bugs we have today, and they're much, much, easier to back into, because you end up there without writing any generic code of your own.

With the proposal, the way you would end up in the bad state would be to have code generic over Comparable or Equatable, and other code on concrete types or FloatingPoint, and somehow have a requirement that they produce the same result. That's a pretty weird situation though, because the algorithms that you write against Comparable are fundamentally different from the algorithms that you write against the FloatingPoint protocol or types, because they have a totally different API surface.

I would also point out that this subject has been actively discussed off-and-on for over three years by some very clever people, and so far no one has come up with a clearly better solution. That's not to say that it's impossible, just that it's not easy.

jawbroken · October 11, 2018, 3:50am

I agree. You are much more likely to encounter the bugs that this proposal is designed to address than the ones it will theoretically introduce. So it seems like an easy choice to take the insidious bugs that are less likely to occur than the insidious bugs that are more likely to occur, given the option.

graskind · October 11, 2018, 3:58am

Okay, I admit, that's a good point - consider me convinced that this is a valid solution

Paul_Cantrell · October 11, 2018, 4:21am

It’s not so hard to imagine a context where the developer assumes > is consistent with sorted():

// Iterating over all nums ≤ max in sorted order:
for n in nums.sorted() {  // Sorting using Comparable behavior, NaNs come first
    guard n < max else {  // Wrongly assuming > is consistent with sorted()...
        return            // ...we ignore all elements if nums contains a NaN
    }
    ...
}

Granted, this code is also broken under the current misbehavior of sorted()! But this proposed solution feels like a game of illogic whack-a-mole.

I’ll also grant there’s not an obvious better solution.

The strictly correct one would be not to let Float implement Comparable at all, and instead have a separate OrderedFloat wrapper. The attendant conversion nuisance and source impact is probably a deal-killer for that.

I’m also tempted to propose an IEEE-compliant &> for Float, in the spirit of &+ for Int. I can see that the existing source impact of that is probably not acceptable. It is worth considering, however, Swift’s precedent here: when translating code from other languages, one does have to consider whether to use + or &+ — and the operator that gives the more familiar behavior is the weird-looking one, not the default +.

xwu · October 11, 2018, 4:40am

Paul_Cantrell:

It’s not so hard to imagine a context where the developer assumes > is consistent with sorted() :
// Iterating over all nums ≤ max in sorted order:
for n in nums.sorted() {  // Sorting using Comparable behavior, NaNs come first
    guard n < max else {  // Wrongly assuming > is consistent with sorted()...
        return            // ...we ignore all elements if nums contains a NaN
    }
    ...
}
Granted, this code is also broken under the current misbehavior of sorted() ! But this proposed solution feels like a game of illogic whack-a-mole.

I’ll also grant there’s not an obvious better solution.

What's nice, at least, is that with the proposed change this is likely to be immediately obvious as either all elements are ignored or the code works correctly (depending on whether we decide NaN is greater than or less than everything), nothing in between. Today, there are no semantic guarantees as to what the result might be.

Joe_Groff · October 11, 2018, 2:07pm

Great to see movement on this again. This is a great way to pretty much completely solve the problem of floating point vs total comparison. I think you should make it clearer in your pitch, though, that the splitting point for the two interfaces sits above FloatingPoint and Integer protocols. (You do mention this, but only near the end, by the time some people's panic reflex may already have triggered.) That was a sticking point in previous iterations of this discussion, and I think we should make it clear you're addressing it.

I think the main benefit of this proposal isn't so much that it allows floating point to be weird, but that it allows Equatable and Comparable to be completely normal—anyone writing generic code at that level can assume the expected identities hold and don't have to worry about edge cases introduced by floating-point behavior.

beccadax · October 11, 2018, 3:02pm

The only suggestion I’d make is to provide a way to access these versions of the operators even in a concrete context, instead of using underscored names. Otherwise, I think I’m for it.

Joe_Groff · October 11, 2018, 3:15pm

Yeah, eventually I think we want to have explicit syntax for "namespacing" declarations and references, which would supersede the need for @_implements, e.g.:

struct MyFloat {
  static func Equatable.==(l: MyFloat, r: MyFloat) -> Bool { /* total equality */ }
  static func Numeric.==(l: MyFloat, r: MyFloat) -> Bool { /* IEEE equality */ }
}

let totallyEqual = Equatable.==(x, y)
let floatallyEqual = Numeric.==(x, y)

nnnnnnnn · October 11, 2018, 3:33pm

I think I'm in favor of this change, though it really could lead to some pretty confusing behavior. Am I right in this interpretation?

let numbers = [1.0, 2.0, 3.0, .nan]
let containsAnNaN = numbers.contains(.nan)
// containsAnNaN  = true

var findAnNaN = false
for x in numbers {
    if x == .nan {
        findAnNaN = true
    }
}
// findAnNaN == false

Is there any way we can make direct comparisons to .nan warn, like we do with as? casts that always succeed? That won't take care of all cases, but could help folks who expect floating-point values to have the same semantics as other Equatable types.

nnnnnnnn · October 11, 2018, 3:39pm

We do have the <= operation, as (Keanu Reeves voice) isTotallyOrdered(belowOrEqualTo:). It would be nice to have isTotallyOrdered(below:) and isTotallyOrdered(equalTo:) to match the more commonly required operations.

@scanon This brings to mind another question: What's the behavior of numbers.sorted(by: >)? Does the fact that I'm passing in the actual operator mean that I get the type-specific case instead of the Comparable conformance version? We don't have a way to say "sort in descending order", so there's an implicit promise that sorted(by: >) == sorted().reversed(). My hunch is that this change would break that, or more accurately, still fail to satisfy that expectation.

Nevin · October 11, 2018, 4:04pm

I think we should give more serious consideration to switching IEEE-754 semantics over to “&”-prefixed comparison operators. This would put them better in line with integers, where the wrapping arithmetic found in other languages requires an ampersand to access in Swift.

Note that with integers it only makes a difference which operator you choose when overflow occurs, and similarly with floating-point it only makes a difference when NaN is involved.

In fact, to put integers and floats on equal footing, it might even makes sense to have the standard comparison operators *trap* when they encounter NaN. If that is too inconvenient when sorting though, then the proposed “NaN is less than everything else” would work.

Yes, introducing the “&”-comparison operators will be source-breaking. However, the migration will be automatable by simply inserting an ampersand wherever the types are statically known to conform to FloatingPoint, so concrete code will perform exactly the same.

Generic code will change in exactly the same way as is being proposed in this thread, so when considering the two options there is no difference between them here. They are both equally source-breaking (or source-fixing as the case may be).

The major difference is consistency: IEEE-754 comparisons will always begin with an ampersand, and Comparable ones will always have their usual spelling.