Comparable and FloatingPoint types

:rocket: is outside the scope of this pitch, sorry (but this doesn’t prevent it, either).

Is this override only possible within a file context? It would be really nice to be able to switch the default within a function (e.g. when working with lots of CGFloat values), but I can't make that work.

IMO it ought to be possible to change it at function scope, but I believe that's a known bug that we don't look at local scopes for the default literal types.

Of the differences between this proposal's design and Kotlin's design:

I think I agree that .nan > .infinity is superior to .nan < -.infinity on the following basis:

  • Troubles arise when users expect their input to be more restricted than it actually is; of such assumptions, it is more likely that a user will expect to work with a set of numbers in the domain [0, inf)--i.e., only with positive numbers--than in the domain (-inf, -0].
  • An exceptional value less than zero (as NaN would be in this proposal's design) is more surprising than an exceptional value greater than infinity when the user makes that assumption.

I think that 0 == -0 is a superior choice for the reasons you outline, of which most important in my mind is that FloatingPoint.== should imply Equatable.==, even if we are severing the relation going the other way.

(Loose thought: now, if we don't adopt an amendment that Numeric.== should be the same as FloatingPoint.== for floating-point types, then I think we ought to give serious consideration to negative NaN sorting below negative infinity and positive NaN sorting above positive infinity (à la totalOrder) so as to preserve the relation that x == -x implies x == 0 among signed numeric types.)
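For reference, here is a small sketch of how the orderings discussed above look with today's standard library. It only uses the existing `isTotallyOrdered(belowOrEqualTo:)` method; the constant names are made up for illustration, and the negative NaN is built from an explicit bit pattern so its sign is unambiguous.

```swift
// A quiet NaN with the sign bit set, and the default (positive) NaN.
let negativeNaN = Double(bitPattern: 0xFFF8_0000_0000_0000)
let positiveNaN = Double.nan
let plusZero = 0.0
let minusZero = -0.0

// IEEE 754 comparison operators: the two zeros are equal, NaN is unordered.
let zerosEqual = (minusZero == plusZero)
let nanEqualsItself = (positiveNaN == positiveNaN)

// IEEE 754 totalOrder: -NaN < -inf < ... < -0 < +0 < ... < +inf < +NaN.
let negNaNBelowNegInf = negativeNaN.isTotallyOrdered(belowOrEqualTo: -Double.infinity)
let posInfBelowPosNaN = Double.infinity.isTotallyOrdered(belowOrEqualTo: positiveNaN)
let plusZeroBelowMinusZero = plusZero.isTotallyOrdered(belowOrEqualTo: minusZero)
```

Note that totalOrder distinguishes -0 from +0, which is one reason the thread treats "0 == -0" and "NaN placement" as separate design questions.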

Are there any existing proposals for this? I've been pondering this exact thing for a couple of days now, and would love to see it.

Personally, I think this is a huge showstopper for this approach. We've discussed similar things in the past and have never done them for this reason. The arguments against this are numerous: surprising behavior when a function is made generic (or non-generic), the fact that code behaves subtly differently in different contexts can bite you when copy/pasting (e.g. from stackoverflow), and the fact that this issue is as much of a potential bug for generic code as it is for non-generic code.

Furthermore, making this change would be a source compatibility change. While (in this instance) I can't imagine any problems that this would cause, we still have these standard guidelines for such changes:

Source-breaking changes in Swift 5 will have an even higher bar than in Swift 4, following these guidelines:

  • The current syntax/API must be shown to actively cause problems for users.
  • The new syntax/API must be clearly better and must not conflict with existing Swift syntax.
  • There must be a reasonably automated migration path for existing code.

To these points, while I agree that this behavior isn't optimal, I haven't seen evidence of this causing active problems for users - does anyone actually search a list for a NaN in a concrete context but not a generic context? Do you have any other evidence of active harm or widespread confusion that would be worth making a change for?

Keep in mind that the bar is high here - I personally wouldn't consider a few posts on Stack Overflow to be sufficient to be worth changing such a fundamental symmetry in the language (between generic/nongeneric code).

-Chris

Completely agree with this point.

FWIW, I consider #1 to be the default option that we should go with unless there is something clearly better. We have had this behavior for years now and (as you say) it is highly precedented in other languages. I agree it is suboptimal, but not enough to be worth heroic solutions given where we are.

I haven't carefully considered this, but what about a variant of this:

  • Do not change any existing behavior.
  • Introduce a new/additive <=> operator, returning less/greater/equal/unordered, which is (by default) implemented in terms of the < and == operators for a type. Floating point types would override this default implementation with the obvious impl.

This solution is nice because:

  1. It does not affect source or ABI compatibility.

  2. sort and specific other algorithms could (optionally) be changed to use this, incrementally fixing a bug for FP types and possibly making those algorithms more efficient. If done in time for Swift 5, this operator might even be added as a requirement for Comparable types.

  3. It is additive, so it can be done at any time.

  4. For users that hit surprising behavior in practice, e.g. your contains example, they can use this (along with a closure predicate) to get the behavior they want.
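To make the idea concrete, here is a rough sketch of what such an additive operator could look like. The `Ordering` enum and the `<=>` spelling are hypothetical and not part of the standard library, and this version is a free generic function rather than a Comparable requirement with a default implementation.

```swift
// Hypothetical result type for a four-way comparison.
enum Ordering {
    case less, equal, greater, unordered
}

// Hypothetical operator; ComparisonPrecedence is the existing group used by < and ==.
infix operator <=> : ComparisonPrecedence

// Default behavior expressed purely in terms of < and ==.
func <=> <T: Comparable>(lhs: T, rhs: T) -> Ordering {
    if lhs < rhs { return .less }
    if lhs == rhs { return .equal }
    if rhs < lhs { return .greater }
    // None of the three relations holds, e.g. when a NaN is involved.
    return .unordered
}

let intResult = (1 <=> 2)
let nanResult = (Double.nan <=> 1.0)
```

With IEEE semantics for < and ==, the fall-through case already reports NaN operands as unordered, so a floating-point override would mostly be an optimization in this sketch.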

Have you considered an approach like this?

-Chris

I think that it is a real problem, but it's easy for people to shrug their shoulders and say "well, floating-point is weird, I guess". Absence of evidence is not evidence of absence and all that. The behavior with contains and whatnot are inconveniences that users can work around, but as @Ben_Cohen mentions upthread, the non-totality of comparisons also has the potential to cause otherwise safe algorithms against Comparable to be unsafe.
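As a small illustration of how non-total comparison leaks into generic code, here is a sketch of a minimum-finding helper written only against Comparable. The function name `smallest` is made up for this example; it is not the standard library's `min()`.

```swift
// Hypothetical generic helper that only relies on < from Comparable.
func smallest<T: Comparable>(_ xs: [T]) -> T? {
    guard var best = xs.first else { return nil }
    for x in xs.dropFirst() where x < best {
        best = x
    }
    return best
}

// Same elements, different order.
let nanInMiddle = smallest([1.0, Double.nan, 0.0])
let nanFirst = smallest([Double.nan, 1.0, 0.0])
```

Because every comparison against NaN is false, the first call returns 0.0 while the second returns NaN: the answer depends on where the NaN happens to sit in the input.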

This would be the other reasonable fix, to my mind, but I don't think that I would want it to return unordered. I would want to require that such an operator implements a total order. @Gankro wrote a fairly detailed proposal of this at some point, IIRC. As long as we stay away from the horrifying thing that is the C++ proposal, I think we could make it work. It's a larger change, but one with less API surface and language churn, as you say.

I understand the value that being an additive change brings. It's easy to do, and it doesn't disturb the status quo. But I think disturbing the status quo is warranted. With the proposed <=> operator:

  • The "obvious" way of comparing via <, ==, etc. operators brings technically correct but surprising behavior
  • The more conventionally desirable behavior is hidden in a less discoverable place (<=>).

I think we should value typically desirable behavior as more important than behavior that's less commonly desired, but necessitated by a technical standard.

I would bet fair money that most people who want NaN propagation in their algorithms already know enough about the subject matter to look for these tools when they need them (e.g. some different operators). This is already precedented by the overflow-tolerant operators like &+. If people made the conscious choice to want overflow, then they probably know enough to discover these operators for their less-common use case.
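For anyone who hasn't used them, this is the existing precedent being referred to: the default `+` traps on overflow, and the less common behavior has its own spelling. The constant names are just for illustration.

```swift
// Opt-in wrapping arithmetic: wraps around instead of trapping.
let wrapped = Int8.max &+ 1

// Opt-in overflow reporting: returns the wrapped value plus a flag.
let (partialValue, didOverflow) = Int8.max.addingReportingOverflow(1)
```

Here `wrapped` and `partialValue` are both `Int8.min`, and `didOverflow` is true.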

On the other hand, I don't think that your average programmer who wants sensible default behavior (i.e. not expecting [5, 4, 3, .nan, 2, .nan, 1].sorted() to yield [3.0, 4.0, 5.0, nan, 2.0, nan, 1.0]) will be able to find <=>.
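For comparison, this is roughly what someone has to write today to get the ordering most people would expect. It is a sketch that builds a strict "less than" predicate from the existing `isTotallyOrdered(belowOrEqualTo:)`; the name `ordered` is arbitrary.

```swift
let values: [Double] = [5, 4, 3, .nan, 2, .nan, 1]

// Strict ordering derived from IEEE 754 totalOrder:
// a comes before b when a <= b holds but b <= a does not.
let ordered = values.sorted { a, b in
    a.isTotallyOrdered(belowOrEqualTo: b) && !b.isTotallyOrdered(belowOrEqualTo: a)
}
```

With this predicate the finite values come out ascending and the two (positive) NaNs end up at the end, but it is not something a newcomer is likely to discover on their own.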

My understanding is that Swift has that feature to allow concrete implementations of a function to more efficiently arrive at the same conclusion as the generic version, not to provide for different semantics in concrete vs generic contexts. Is that not the case?

This? http://open-std.org/JTC1/SC22/WG21/docs/papers/2017/p0515r0.pdf

Ok, I've been asked to expand my argument. This is all just MHO, not speaking for anyone else.

For me, the core argument of this proposal is:

  1. that people understand IEEE arithmetic, the semantics of Comparable, and when it is appropriate to use one or the other.
  2. that people think about code differently in generic and non-generic contexts.
  3. that people can/should be expected to know that FP comparisons have a weird behavioral difference (which, AFAIK is unprecedented anywhere else in Swift).

I agree that the points above will be undeniably true for some people, but MHO is that I don't think it is true for most people. The third bullet point is particularly concerning for me, because this imposes a conceptual tax on everything that uses floating point, along with ample footguns.

Furthermore, this just shifts the weird behavior around. I can see the Stack Overflow questions now, things like "why does this assertion fail in my code?"

  // This should work with a globally sane and consistent Comparable.
  var x: [Float] = getStuff()
  x.sort()
  assert(x[0] <= x[1])

I can imagine people getting confused why changing a manually written loop to contains (or back) suddenly behaves differently, or why these are suddenly different:

  if arr.contains(fpValue) {
  if arr.contains(where: { $0 == fpValue }) {
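For context, here is how the two spellings behave under today's semantics (not the pitched ones), as a minimal sketch. The array contents are made up; `arr` and `fpValue` mirror the names used above.

```swift
let arr: [Double] = [1.0, .nan, 3.0]
let fpValue = Double.nan

// Both spellings go through IEEE ==, so neither finds the NaN today.
let byElement = arr.contains(fpValue)
let byEquality = arr.contains(where: { $0 == fpValue })

// Finding a NaN currently requires asking for it explicitly.
let byIsNaN = arr.contains(where: { $0.isNaN })
```

Today the first two agree (both false) and only the `isNaN` predicate is true. The worry is that under the pitch the first spelling would start returning true while the second would keep returning false.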

I can imagine people being surprised that the behavior of min and max just changed. This semantic breakage should be specifically mentioned in the proposal because it could significantly impact existing numeric code.

I can also similarly imagine people copying generic floating point algorithms into their concrete code and having them behave differently, or have their code fail when converting from concrete to generic constraints.

The advantages of my source/ABI-compatible proposal sketch (introducing a new <=> operator) are:

  • There is no rush to do it. ABI stability branch date is not far off and I'm very concerned we don't have time to properly evaluate and explore this proposal.
  • Use of an explicit <=> operator makes it clear to maintainers of the code that something interesting is going on.
  • It is fully source and binary compatible.

I get that the current behavior is seriously suboptimal, but the time to fix it was years ago and we didn't (for various sad reasons). IMO, the cure in this proposal is worse than the disease.

-Chris

On the other hand, I think most people would (reasonably) expect <=> to behave consistently with == and <, <=, >, and >=.

Is there really no possibility of fixing this properly in the future?

I would support doing nothing right now if significant breaking changes of the kind required to solve this problem were on the table down the road. Adopting total ordering for the conventional comparison operators and the & prefix for floating point semantics looks like the best long-term solution to me.

I really hope the community is willing to consider these kinds of changes in "breaking releases" that happen infrequently. That would go a long way towards allowing the language and library to continue to evolve and be refined over time rather than accumulating cruft nobody likes but carried forward only because the breaking changes are too significant.

I second everything Matt is saying here.

Additionally, wasn’t it stated somewhere that the definition of “source compatibility” we’re using is essentially “It will be possible to compile existing programs by specifying which version of Swift the compiler should use”?

I seem to recall hearing that, but I don’t remember where. If so, that might make a significant and important one-time source-break more palatable, since existing programs can continue to be compiled in Swift-4 mode without any modification to their source code or program behavior.

While I agree with this, like Steve said, pretty much every other programming language has NaN == NaN returning false. The only exceptions are Awk, Groovy and OCaml (which has 2 equality checking operators - Nan = Nan does return false).

That said, I still think it's pretty surprising behaviour to most programmers, who in general are not floating-point experts but rather reach for Float/Double whenever they need a non-integer number.

If we were making the language again, I would probably recommend making Decimal a core type in the standard library and the default type for literals with decimal points, and relegating IEEE754 to something you need to spell explicitly. But that's totally out of the question now.

I couldn't agree more. The sooner this is fixed the better in MHO.

That's really well put. I don't think there's anyone using Swift because they like the source-compatibility. ObjC fills that role. People want a programming language that's a joy to use.

I agree there's a high bar for breaking changes, whose exact height should be proportional to the benefit of a proposed change, but I think we shouldn't rule things out entirely prima facie.

In this case, we're evaluating if it is a worthwhile goal to have basic math be intuitive and predictable to programmers of all backgrounds. I think so, yes. Just look at the 231K+ views, 3K+ upvotes, and 890 stars on the famous "Is floating point math broken?" Stack Overflow question. It's abundantly clear that people don't generally understand floating point math, and if it weren't for the precedent of IEEE 754 baked into most languages, they wouldn't need to.
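The canonical example from that question, written in Swift as a quick sketch (constant names are arbitrary):

```swift
let sum = 0.1 + 0.2

// Neither 0.1, 0.2 nor 0.3 is exactly representable in binary floating point,
// so the rounded sum does not land on the same Double as the literal 0.3.
let isExactlyEqual = (sum == 0.3)

// The usual workaround: compare against a tolerance instead.
let isCloseEnough = abs(sum - 0.3) < 1e-15
```

Here `isExactlyEqual` is false while `isCloseEnough` is true, which is precisely the kind of thing that surprises people who just wanted "a number with a decimal point".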

Think about it: how often do you face challenges with subnormal numbers, funky NaN operator behaviour, or imprecise rounding? Now, of those times, how many were actually for a use case that required their consideration?

I think Python makes the right choice by making arbitrary precision numbers be the default, while making IEEE 754 available on an opt-in basis when the programmer has found a motivating case. I would love to see Decimal be the default floating point type in Swift. In many ways, Swift already embraces the mindset that sometimes sacrificing performance for readability, maintainability and ergonomics is worthwhile... We make such a sacrifice anytime we use Int for a count, array index, or other place where its full 64 bit range and negative support isn't necessary. But it saves us from tons of senseless casting. We see just how annoying it would be otherwise, when we interact with C or ObjC APIs. How many times have you written something like this, prior to Swift 4.2?

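import Foundation  // for arc4random_uniform (Darwin platforms)

extension RandomAccessCollection {
    // Assumes the collection is non-empty.
    var randomElement: Element {
        let offset = Int(arc4random_uniform(UInt32(count)))
        return self[index(startIndex, offsetBy: offset)]
    }
}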

Though I think it's probably still a good idea to keep Int as the default for integers. Floating point numbers as array and loop indexes disturb me.

If there is any chance that Decimal can be the default floating point type in Swift, I'd happily trade all the source breakage in the world. Float and Double are almost never what you need - it's just what you have to deal with.

This really is a separate topic from the one discussed here. IEEE Decimal types also have the concept of NaN and the same issue here about floating-point equality would apply whatever the default floating-point type.

Agreed - moving from IEEE Floats to IEEE Decimals wouldn't solve the issue addressed by this pitch: IEEE754 just doesn't fit well into our protocols*.
But the issue could be fixed with a set of floating point types that adhere to the standard, and another set of types that violate IEEE754, but don't break the fundamental rules of Comparable and Equatable.

Imho it's better to have a small break with IEEE754 than keeping a contradiction in Swift itself - and we wouldn't even have to introduce new types to make a split, because we already have CFloat and CDouble...
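A rough sketch of what the "second set of types" could look like, assuming the semantics discussed earlier in the thread (all NaNs equal, 0 == -0, NaN above +infinity). `TotallyOrdered` is a hypothetical wrapper, not an existing type; since CFloat and CDouble are currently just type aliases for Float and Double, the sketch uses a separate struct instead.

```swift
// Hypothetical wrapper whose == and < follow the rules Comparable expects.
struct TotallyOrdered<Value: FloatingPoint>: Comparable {
    var value: Value
    init(_ value: Value) { self.value = value }

    // All NaNs compare equal to each other; otherwise defer to IEEE ==,
    // so 0 == -0 still holds.
    static func == (lhs: TotallyOrdered, rhs: TotallyOrdered) -> Bool {
        if lhs.value.isNaN || rhs.value.isNaN {
            return lhs.value.isNaN && rhs.value.isNaN
        }
        return lhs.value == rhs.value
    }

    // NaN sorts above every other value, including +infinity.
    static func < (lhs: TotallyOrdered, rhs: TotallyOrdered) -> Bool {
        if lhs.value.isNaN { return false }
        if rhs.value.isNaN { return true }
        return lhs.value < rhs.value
    }
}

let wrappedValues: [TotallyOrdered<Double>] = [
    TotallyOrdered(2.0), TotallyOrdered(Double.nan), TotallyOrdered(1.0)
]
let sortedValues = wrappedValues.sorted().map { $0.value }
let foundNaN = wrappedValues.contains(TotallyOrdered(Double.nan))
```

With this wrapper, `sorted()` yields 1.0, 2.0, NaN and `contains` finds the NaN, while plain Double keeps its IEEE 754 behavior.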

* I explicitly do not say IEEE itself needs "fixing" - from a mathematical perspective, it's actually rather stupid to force nan and infinity into a strict order, and nan != nan makes a lot more sense than nan == nan
