[pitch] Comparison Reform

xwu · April 14, 2017, 12:18am

Getting this sorted out is definitely a worthwhile effort. I do have
thoughts about this proposal:

I continue to have reservations about an identical spelling (e.g. `==`)
giving two different answers with the same values of the same type,
depending on the generic context. It is a very *clever* design, but it is
also a very *subtle* behavior that I can see leading to much confusion and
befuddlement for any user who is not well versed *both* in the intricacies
of IEEE floating point *and* in the intricacies of Swift.

Actually, the fact that this behavior cannot even be achieved without
currently non-existent compiler features means that it is not possible to
understand what's truly going on without reading *this document*, after
mastering *both* IEEE floating point *and* Swift
generics/protocols/extensions/static vs. dynamic dispatch. All to use `==`
correctly. Which is to say, most people will simply not even know if they
happen to be using the `==` they did not intend to use.

I think consideration should be given to a design that achieves a
user-facing but not onerous differentiation between level 1 and level 2
equality. However, the only one I can think of is essentially a different
shade of the `PartiallyComparable` alternative already outlined in the
document. Yet I cannot help but think that the rejected alternative may be
advantageous in one key aspect. `FloatingPoint` comparison is in a sense
"less refined" (not exactly precise language, I know) than the level 2
ordering proposed here, at least in the sense that the latter offers more
semantic guarantees about the relationships between comparison operators.
It's weird that the less refined `FloatingPoint` refines the more refined
`Comparable`, and I think the acrobatics with compiler support illustrate
how the design is actively working against Swift's overarching direction.

Anyway, so much for that about the design overall.

As for nitpicking details, I agree with others and think it's a poor
precedent to say that we're going to use an Obj-C name because it's not
clearly worse than the obvious Swift API guideline-compliant name. When
that's the case, it seems to me that the whole point of having Swift API
naming guidelines and making that huge migration from Swift 2 to 3 was so
that going forward we could name things consistently using one consensus
style. `compare(_:)` does not merit a term-of-art exception when the Swift
name is clearly `compared(to:)`.

···

On Thu, Apr 13, 2017 at 3:17 PM, Ben Cohen via swift-evolution < swift-evolution@swift.org> wrote:

Hi swift evolution,

Here’s another pitch, for The Propoosal Formerly Known As Spaceship.
Comparison Reform

   - Proposal: SE-NNNN
   - Authors: Robert Widmann <https://github.com/codafi>, Jaden Geller
   <https://github.com/jadengeller>, Harlan Haskins
   <https://github.com/harlanhaskins>, Alexis Beingessner
   <https://github.com/Gankro>, Ben Cohen
   <https://github.com/airspeedswift>
   - Status: *Awaiting review*
   - Review manager: TBD

Introduction

This proposal is for changes that we believe should be made to the
existing comparison system by:

   - Making FloatingPoint comparison context sensitive, so that its
   Comparable conformance provides a proper total ordering.
   - Introducing a new ternary-valued compare(_ other: Self) ->
   ComparisonResult method.
   - Removing unnecessary customization points from Comparable.

Motivation

The motivation comes from several independent points:

1: The standard comparison operators have an intuitive meaning to
programmers. Swift encourages encoding that in an implementation of
Comparable that respects the rules of a total order
<https://en.wikipedia.org/wiki/Total_order>\. The standard library takes
advantage of these rules to provide consistent implementations for sorting
and searching generic collections of Comparable types.

Not all types behave so well in this framework, unfortunately. There are
cases where the semantics of a total order cannot apply while still
maintaining the traditional definition of “comparison” for these types.
Take, for example, sorting an array of Floats. Today, Float’s instance of
Comparable follows IEEE-754 and returns false for all comparisons of NaN.
In order to sort this array, NaN s are considered outside the domain of <,
and the order of a sorted array containing them is unspecified. Similarly,
a Dictionary keyed off floats can leak entries and memory.

2: Generic algorithms in the Swift Standard Library that make use of the
current Comparable protocol may have to make extra comparisons to
determine the ordering of values when <, ==, and > should have different
behaviours. Having a central operation to return complete ordering
information should provide a speedup for these operations.

3: The existing comparison operators don’t “generalize” well. There’s no
clean way to add a third or fourth argument to < to ask for non-default
semantics. An example where this would be desirable would be specifying the
locale or case-sensitivity when comparing Strings.

4: Comparable is over-engineered in the customization points it provides:
to our knowledge, there’s no good reason to ever override >=, >, or <=.
Each customization point bloats vtables and mandates additional dynamic
dispatch.

5: When quickly writing a Comparable type, it is easier to implement a
single ternary statement than to separately implement == and <.
Proposed solutionComparisonResult

Foundation’s ComparisonResult type will be mapped into Swift as

@objc public enum ComparisonResult : Int {
  case orderedAscending = -1
  case orderedSame = 0
  case orderedDescending = 1
}

Comparable

Comparable will be changed to have a new ternary comparison method: compare(_
other: Self) -> ComparisonResult. x.compare(y) specifies where to place x
relative to y. So if it yields .orderedAscending, then x comes before y.
This will be considered the new “main” dispatch point of Comparable that
implementors should provide.

Most code will continue to use < or ==, as it will be optimal for their
purposes. However code that needs to make a three-way branch on comparison
can use the potentially more efficient compare. Note that compare is only
expected to be more efficient in this specific case. If a two-way branch is
all that’s being done, < will be more efficient in many cases (if only
because it’s easier for the optimizer).

For backwards compatibility reasons, compare will have a default
implementation defined in terms of <, but to enable only using compare, <
and == will also have default implementations in terms of compare.

The compiler will verify that either compare, or < and ==, are provided
by every type that claims to conform to Comparable. This will be done in
some unspecified way unavailable outside the standard library (it can be
made available to in the future, but that’s an unnecessary distraction for
this proposal).

Types that wish to provide comparison “variants” can do so naturally by
adding compare methods with additional arguments. e.g. String.compare(_
other: Self, in: Locale) -> ComparisonResult. These have no
language-level connection to Comparable, but are still syntactically
connected, implying the same total order semantics. This makes them easier
to discover, learn, and migrate to.

To reduce bloat, the operators <=, >=, and > will be removed from the set
of requirements that the Comparable protocol declares. These operators
will however continue to exist with the current default implementations.
FloatingPoint

No changes will be made to the FloatingPoint protocol itself. Instead,
new extensions will be added to it to change the behaviour of comparison.

The new behaviour centers around the fact that compare(_: Self) ->
ComparisonResult will provide a total ordering that’s consistent with
Level 2 in the IEEE 754 (2008) spec. This is mostly the same as the
standard (Level 1) IEEE ordering, except:

   - -0 < +0
   - NaN == NaN
   - NaN > +Inf (an arbitrary choice, NaN can be placed anywhere in the
   number line)

Level 2’s distinguishing of -0 and +0 is a bit strange, but is consistent
with Equatable’s Substitutability requirement. -0 and +0 have different
behaviours: 1/-0 = -Inf while 1/+0 = +Inf. The main problem this can lead
to is that a keyed collection may have two “0” entries. In practice this
probably won’t be a problem because it’s fairly difficult for the same
algorithm to produce both -0 and +0. Any algorithm that does is also
probably concerned with the fact that 1.0E-128 and 2.0E-128 are
considered distinct values.

Note: IEEE specifies several other potential total orderings: level 3,
level 4, and the totalOrder predicate. For our purposes, these orderings
are too aggressive in distinguishing values that are semantically
equivalent in Swift. For most cases, the relevant issue is that they
distinguish different encodings of NaN. For more exotic encodings that
don’t guarantee normalization, these predicates also consider 10.0e0 <
1.0e1 to be true. An example where this can occur is *IEEE-754 decimal
coded floating point*, which FloatingPoint is intended to support.

We will then make the comparison operators (<, <=, ==, !=, >=, >)
dispatch to one of compare(_:) or FloatingPoint’s IEEE comparison methods
(isLess, isEqual, isLessThanOrEqualTo) based on the context.

   - If the context knows the type is FloatingPoint, then level 1
   ordering will be used.
   - If the context only knows the type is Comparable or Equatable, then
   level 2 ordering will be used.

This results in code that is explicitly designed to work with
FloatingPoint types getting the expected IEEE behaviour, while code that is
only designed to work with Comparable types (e.g. sort and Dictionary)
gets more reasonable total ordering behaviour.

To clarify: Dictionary and sort won’t somehow detect that they’re being
used with FloatingPoint types and use level 1 comparisons. Instead they
will unconditional use level 2 behaviour. For example:

let nan = 0.0/0.0
func printEqual<T: Equatable>(_ x: T, _ y: T) {
  print(x == y)
}
func printEqualFloats<T: FloatingPoint>(_ x: T, _ y: T) {
  print(x == y)
}
print(nan == nan) // false, (concrete)
printEqual(nan, nan) // true, (generic Equatable but not FloatingPoint)
printEqualFloats(nan, nan) // false, (generic FloatingPoint)

If one wishes to have a method that works with all Equatable/Comparable
types, but uses level 1 semantics for FloatingPoint types, then they can
simply provide two identical implementations that differ only in the bounds:

let nan = 0.0/0.0
func printEqual<T: Equatable>(_ x: T, _ y: T) {
  print(x == y)
}
func printEqual<T: FloatingPoint>(_ x: T, _ y: T) {
  print(x == y)
}

printEqual(0, 0) // true (integers use `<T: Equatable>` overload)
printEqual(nan, nan) // false (floats use `<T: FloatingPoint>` overload)

As a result of this change, hashing of floats must be updated to make all
NaNs hash equally. -0 and +0 will also no longer be expected to hash
equally. (Although they might as an implementation detail – equal values
must hash the same, unequal values *may* hash the same.)
Misc Standard Library

Types that conform to Comparable should be audited for places where
implementing or using Comparable would be a win. This update can be done
incrementally, as the only potential impact should be performance. As an
example, a default implementation of compare(_:) for Array will likely be
suboptimal, performing two linear scans to determine the result in the
worst-case. (See the default implementation provided in the detailed
design.)

Some free functions will have <T: FloatingPoint> overloads to better
align with IEEE-754 semantics. This will be addressed in a follow-up
proposal. (example: min and max)
Detailed Design

The protocols will be changed as follows:

ComparisonResult, currently a type found in Foundation, will be sunk into
the Swift Standard Library:

@objc public enum ComparisonResult: Int, Equatable {
  case orderedAscending = -1
  case orderedSame = 0
  case orderedDescending = 1
}
public protocol Comparable: Equatable {
  func compare(_ other: Self) -> ComparisonResult

  static func < (lhs: Self, rhs: Self) -> Bool
}
extension Comparable {
  func compare(_ other: Self) -> ComparisonResult {
    if self == other {
      return .orderedSame
    } else if self < other {
      return .orderedAscending
    } else {
      return .orderedDescending
    }
  }
}
public func < <T: Comparable>(lhs: T, rhs: T) -> Bool {
  return lhs.compare(rhs) == .orderedAscending
}
// IEEE comparison operators (these implementations already exist in std)extension FloatingPoint {
  public static func == (lhs: T, rhs: T) -> Bool {
    return lhs.isEqual(to: rhs)
  }

  public static func < (lhs: T, rhs: T) -> Bool {
    return lhs.isLess(than: rhs)
  }

  public static func <= (lhs: T, rhs: T) -> Bool {
    return lhs.isLessThanOrEqualTo(rhs)
  }

  public static func > (lhs: T, rhs: T) -> Bool {
    return rhs.isLess(than: lhs)
  }

  public static func >= (lhs: T, rhs: T) -> Bool {
    return rhs.isLessThanOrEqualTo(lhs)
  }
}

// Comparable comparison operators (provides a total ordering)extension FloatingPoint {
  @_inline
  public func compare(_ other: Self) -> ComparisonResult {
    // Can potentially be implemented more efficiently -- this is just the clearest version
    if self.isLess(than: other) {
      return .orderedAscending
    } else if other.isLess(than: self) {
      return .orderedDescending
    } else {
      // Special cases

      // -0 < +0
      if self.isZero && other.isZero {
        // .plus == 0 and .minus == 1, so flip ordering to get - < +
        return (other.sign as Int).compare(self.sign as Int)
      }

      // NaN == NaN, NaN > +Inf
      if self.isNaN {
        if other.isNaN {
          return .orderedSame
        } else {
          return .orderedDescending
        }
      } else if other.isNaN {
        return .orderedAscending
      }

      // Otherwise equality agrees with normal IEEE
      return .orderedSame
    }
  }

  @_implements(Equatable.==)
  public static func _comparableEqual(lhs: Self, rhs: Self) -> Bool {
    lhs.compare(rhs) == .orderedSame
  }

  @_implements(Comparable.<)
  public static func _comparableLessThan(lhs: Self, rhs: Self) -> Bool {
    lhs.compare(rhs) == .orderedDescending
  }
}

Note that this design mandates changes to the compiler:

   - @_implements (or an equivalent mechanism) must be implemented to get
   the context-sensitive FloatingPoint behaviour.
   - The compiler must verify that either == and <, or compare(_:) is
   overridden by every type that conforms to Comparable.

Source compatibility

Users of ComparisonResult will be able to use it as normal once it
becomes a standard library type.

Existing implementors of Comparable will be unaffected, though they
should consider implementing the new compare method as the default
implementation may be suboptimal.

Consumers of Comparable will be unaffected, though they should consider
calling the compare method if it offers a performance advantage for their
particular algorithm.

Existing implementors of FloatingPoint should be unaffected – they will
automatically get the new behaviour as long as they aren’t manually
implementing the requirements of Equatable/Comparable.

Existing code that works with floats may break if it’s relying on some
code bounded on <T: Equatable/Comparable>providing IEEE semantics. For
most algorithms, NaNs would essentially lead to unspecified behaviour, so
the primary concern is whether -0.0 == +0.0 matters.
ABI stability

This must be implemented before ABI stability is declared.
Effect on API resilience

N/A
Alternatives ConsideredSpaceship

Early versions of this proposal aimed to instead provide a <=> operator
in place of compare. The only reason we moved away from this was that it
didn’t solve the problem that comparison didn’t generalize.

Spaceship as an operator has a two concrete benefits over compare today:

   - It can be passed to a higher-order function
   - Tuples can implement it

In our opinion, these aren’t serious problems, especially in the long term.

Passing <=> as a higher order function basically allows types that aren’t
Comparable, but do provide <=>, to be very ergonomically handled by
algorithms which take an optional ordering function. Types which provide
the comparable operators but don’t conform to Comparable are only pervasive
due to the absence of conditional conformance. We shouldn’t be designing
our APIs around the assumption that conditional conformance doesn’t exist.

When conditional conformance is implemented, the only
should-be-comparable-but-aren’t types that will remain are tuples, which
we should potentially have the compiler synthesize conformances for.

Similarly, it should one day be possible to extend tuples, although this
is a more “far future” matter. Until then, the (T, T) -> Bool predicate
will always also be available, and < can be used there with the only
downside being a potential performance hit.
Just Leave Floats Alone

The fact that sorting floats leads to a mess, and storing floats can lead
to memory leaks and data loss isn’t acceptable.
Just Make Floats Only Have A Total Order

This was deemed too surprising for anyone familiar with floats from any
other language. It would also probably break a lot more code than this
change will.
Just Make Floats Not Comparable

Although floats are more subtle than integers, having places where
integers work but floats don’t is a poor state of affairs. One should be
able to sort an array of floats and use floats as keys in data structures,
even if the latter is difficult to do correctly.
PartialComparable

PartialComparable would essentially just be Comparable without any stated
ordering requirements, that Comparable extends to provide ordering
requirements. This would be a protocol that standard IEEE comparison could
satisfy, but in the absence of total ordering requirements,
PartialComparable is effectively useless. Either everyone would consume
PartialComparable (to accept floats) or Comparable (to have reasonable
behaviour).

The Rust community adopted this strategy to little benefit. The Rust libs
team has frequently considered removing the distinction, but hasn’t because
doing it backwards compatibly would be complicated. Also because merging
the two would just lead to the problems Swift has today.
Different Names For compare and ComparisonResult

A few different variants for ComparisonResult and its variants were
considered:

   - Dropping the ordered part of ComparisonResult’s cases e.g. .ascending
   - Naming of ComparisonResult as SortOrder
   - enum Ordering { case less, equal, greater } (as used by Rust
   <https://doc.rust-lang.org/std/cmp/enum.Ordering.html>\)
   - Case values of inOrder, same, outOfOrder

The choice of case names is non-trivial because the enum shows up in
different contexts where different names makes more sense. Effectively, one
needs to keep in mind that the “default” sort order is ascending to map
between the concept of “before” and “less”.

The before/after naming to provide the most intuitive model for custom
sorts – referring to ascending or less is confusing when trying to
implement a descending ordering. Similarly the inOrder/outOfOrder naming
was too indirect – it’s more natural to just say where to put the element.
If the enum should focus on the sorting case, calling it SortOrder would
help to emphasize this fact.

This proposal elects to leave the existing Foundation name in-place. The
primary motivation for this is that use of the compare function will be
relatively rare. It is expected that in most cases users will continue to
make use of == or <, returning boolean values (the main exception to this
will be in use of the parameterized String comparisons). As such, the
source compatibility consequences of introducing naming changes to an
existing type seems of insufficient benefit.

The method compare(_:) does not fully comport with the API naming
guidelines. However, it is firmly established with current usage in
Objective-C APIs, will be fairly rarely seen/used (users will usually
prefer <, == etc), and alternatives considered, for example compared(to:),
were not a significant improvement.
Add Overloads for (T, T) -> ComparisonResult

It would be slightly more ergonomic to work with ComparisonResult if
existing methods that took an ordering predicate also had an overload for (T,
T) -> ComparisonResult. As it stands, a case-insensitive sort must be
written as follows:

myStrings.sort { $0.compare(_ other: $1, case: .insensitive) == .orderedAscending }

With the overload, one could write:

myStrings.sort { $0.compare($1, case: .insensitive) }

we decided against providing these overloads because:

   - The existing algorithms in the standard library can’t benefit from
   them (only binary comparisons).
   - They bloat up the standard library (and any library which intends to
   match our API guidelines).
   - They potentially introduce confusion over “which” comparison
   overload to use.

And because we can change our mind later without concern for source or ABI
stability, as these overloads would be additive.
Future Work

This section covers some topics which were briefly considered, but were
identified as reasonable and possible to defer to future releases.
Specifically they should be backwards compatible to introduce even after
ABI stability. Two paths that are worth exploring:
Ergonomic Generalized Comparison for Keyed Containers

Can we make it ergonomic to use an (arbitrary) alternative comparison
strategy for a Dictionary or a BinaryTree? Should they be type-level
Comparators, or should those types always store a (Key, Key) ->
ComparisonResult closure?

We can avoid answering this question because Dictionary is expected to
keep a relatively opaque (resilient) ABI for the foreseeable future, as
many interesting optimizations will change its internal layout. Although if
the answer is type-level, then Default Generic Parameters must be accepted
to proceed down this path.
ComparisonResult Conveniences

There are a few conveniences we could consider providing to make
ComparisonResult more ergonomic to manipulate. Such as:

// A way to combine orderingsfunc ComparisonResult.breakingTiesWith(_ order: () -> ComparisonResult) -> ComparisonResult

array.sort {
  $0.x.compare($0.y)
  .breakingTiesWith { $0.y.compare($1.y) }
  == .orderedAscending
}

and

var inverted: ComparisonResult
// A perhaps more "clear" way to express reversing order than `y.compared(to: x)`
x.compare(y).inverted

But these can all be added later once everyone has had a chance to use
them.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution