What is intended for using Heap with non-comparable values?

anon9791410 · December 14, 2022, 6:41am

With the other heap implementations I've used in Swift, sorting closures are employed. That comes with some problems, but it also allows for elements that are not Comparable. What should we do for such elements, with the Heap that's currently in Collections?

The following works, but using a different closure for every element is overkill.

public extension Heap {
  /// A  non-`Comparable` "`Value`" type, which provides a `Comparable` value for use with a `Heap`.
  struct ElementValuePair<Value> {
    public var value: Value
    private let makeElement: (Value) -> Element
  }
}

public extension Heap.ElementValuePair {
  init(
    _ value: Value,
    _ makeElement: @escaping (Value) -> Element
  ) {
    self.init(value: value, makeElement: makeElement)
  }

  var element: Element { makeElement(value) }
}

extension Heap.ElementValuePair: Comparable {
  public static func < (lhs: Self, rhs: Self) -> Bool {
    lhs.element < rhs.element
  }

  /// Only necessary because Comparable: Equatable. 😞
  public static func == (lhs: Self, rhs: Self) -> Bool {
    fatalError()
  }
}

lorentey · December 24, 2022, 12:19am

Swift is a strongly typed language.

We want to ensure that a function that takes a Heap<Foo> will always get elements out in a consistent order that is (for distinct items) entirely and unambiguously defined by the type Foo. This is a core design principle; it is not a negotiable feature. For container types of comparable values to be practical, it is crucial that the ordering relation for their values is defined by their static type.

That is to say, two Heap instances of the same Element type must be guaranteed to agree on the ordering they use. This is precisely the same principle that ensures that two Set values of the same Element type will apply the same definition of element equality.

The idiomatic way to set up a local conformance to Comparable is the same as it is for Equatable, Hashable, or any other standard protocol -- it is to wrap the type in a custom struct.

Here is a silly example with a wrapper type that flips the < relation:

struct Decreasing<Value: Comparable>: Comparable {
  var value: Value

  init(_ value: Value) { self.value = value }

  static func ==(left: Self, right: Self) -> Bool {
    left.value == right.value
  }

  static func <(left: Self, right: Self) -> Bool {
    left.value > right.value
  }
}

let heap: Heap<Decreasing<String>> = []

(This is a silly example, because Heap implements a min-max heap -- however, it might come handy for, say, the upcoming SortedSet type.)

Note: properly implementing == is not optional -- an implementation that simply calls fatalError violates Comparable's requirements, and it is almost guaranteed to cause headaches later. Do not do that.

anon9791410 · December 24, 2022, 4:11am

No, that doesn't answer my question at all. As I said, I'm talking about the cases where the elements are not Comparable*.

As an exercise, implement Dijkstra's algorithm in a matrix indexed by SIMD2<Int32>, using this Heap implementation in Collections. It can be done without a closure, but I say it can't be done with just this type. So I don't think it's very useful. If I'm wrong, I'd like to see it documented why.

A weightless implementation using "ElementValuePair"

/// A dictionary that can be walked backwards, to achieve to shortest path to `start`.
/// - Parameter connected: Whether two orthogonal neighbors are considered to be connected. Order matters.
func shortestPathSources(
  from start: Index,
  connected: (_ source: Element, _ destination: Element) -> Bool
) -> [Index: Index] {
  var shortestPathSources: [Index: Index] = [:]
  var stepCounts = [start: 0]
  func stepCount(index: Index) -> Int { stepCounts[index]! }
  var heap = Heap([Heap.ElementValuePair(start, stepCount)])

  while let source = heap.popMin()?.value {
    let newStepCount = stepCounts[source]! + 1

    for destination in orthogonalNeighbors(source)
    where
      connected(try! self[validating: source], destination.value)
      && stepCounts[destination.key].map({ newStepCount < $0 }) != false
    {
      stepCounts[destination.key] = newStepCount
      shortestPathSources[destination.key] = source
      heap.insert(.init(destination.key, stepCount))
    }
  }

  return shortestPathSources
}

* It doesn’t actually matter if the associated values are Comparable or not. It will just get you in the right headspace if you imagine that they might not be, because their < is unrelated to what’s being sorted via the Heap.

lorentey · December 24, 2022, 9:17pm

Thanks for telling more about what you're actually trying to do -- this is very helpful!

The code snippet you quoted is not self-contained, which makes it difficult to reason about it. However, the mutations of the stepCounts dictionary sometimes decrease the priority of heap members while they're contained in the heap. This is a crucial bug, because Heap is not aware of this happening. Your problem isn't really about the Comparable conformance; rather, it hangs entirely on this issue.

Decreasing the priority of a heap member sometimes requires the binary min-max heap to sink or raise the affected member along the (logical) path that includes it, to restore the min-max heap property. In order for this to happen, the heap needs to be explicitly notified of the priority change; i.e. you need a heap that provides operations to let you retrieve an existing item and update its priority. Heap doesn't implement such operations.

Trying to work around this by sneakily changing the priority of an unvisited node behind the back of the Heap is simply not going to work correctly, no matter what. Doing that is equivalent to mutating the hash value (a.k.a., the Equatable identity) of a Set member: it is a serious programming error.

Protocols like Equatable/Comparable/Hashable inherently require that the values returned by ==/</hashValue/etc. will be stable: if a and b are constant values, then the result of a == b or a < b must not ever change. (Otherwise e.g. Equatable.== will not implement a proper equivalence relation, rendering the conformance invalid.)

In the case of ElementValuePair, the result of < depends on the result of an arbitrary, user-specified closure, which makes it impossible to statically reason about the validity of its Comparable conformance. In the case of the closure used in shortestPathSources, < ends up consulting the stepCounts table, whose entries are mutated within the inner loop, sometimes resulting in heap items silently changing their minds about what their relative weights are. The ElementValuePair.< function is therefore not stable; that is to say, the ordering relation is not well-defined. This is going to lead to violations of core heap invariants, rendering the data structure useless. Garbage in, garbage out.

So the problem here isn't that it's difficult to spell the Comparable conformance: it never is, as long as you actually have a well-defined ordering relation. (If you don't have a well-defined ordering relation, then the concept of heaps does not apply -- the construct becomes meaningless.) In the case of Dijsktra's algorithm, you just need to associate your unvisited nodes with their shortest known distance, and the right way to do that is to simply wrap this pair of values in a straightforward custom struct:

struct Unvisited<Node>: Comparable {
  var node: Node
  var distance: Int

  static func ==(left: Self, right: Self) -> Bool { left.distance == right.distance }
  static func <(left: Self, right: Self) -> Bool { left.distance < right.distance }
}

This is one way to correctly model the minimum-queue of unvisited nodes in Dijkstra's algorithm. There is nothing weird or difficult about the Comparable conformance here -- implementing this type is a routine, trivial task.

Because this type is a struct, inserting one of its instances into Heap actually inserts a standalone copy of the value, whose priority value cannot possibly be changed without the cooperation of the container. This is a crucial feature, because it eliminates a whole class of errors that involve accidental uncoordinated mutations of such data -- such as the ones arising from the externally referenced priority values in ElementValuePair.<.

Unfortunately, Heap doesn't provide an operation to decrease the weight of its items after insertion, so it is simply not suitable for use in this version of Dijkstra's algorithm.

But that's not a problem! You just need to adapt the algorithm to remove the need to decrease priorities. One common way of doing that is to simply insert duplicates in these cases:

var shortestSources: [Node: Node] = [:]
var distances: [Node: Int] = [start: 0]
var unvisited: Heap<Unvisited<Foo>> = [Unvisited(node: start, distance: 0)]

while let source = unvisited.popMin() {
  guard source.distance == distances[source.node] else { 
    continue // ignore obsolete dupe
  }
  let d = source.distance + 1
  for destination in source.node.neighbors {
    if d < distances[destination] ?? Int.max {
      distances[destination] = d
      shortestSources[destination] = source.node
      unvisited.insert(Unvisited(node: destination, distance: d))
    }
  }
}

(Note: This code is just illustrating the idea; it probably isn't bug-free.)

In practice, I expect this modified algorithm to perform better than the variants that require a heap that supports decreasing the priority of a heap member.

Heap intentionally doesn't provide a way to look up arbitrary existing members in it; and without such a lookup function, there is also no way for it to allow updating their priorities. This is very much by design -- the machinery required to speed up such lookup operations (such as maintaining a lookup table) would significantly slow down the core heap operations (insert, popMin and popMax), while Heap really is designed to make those as fast as possible.

Heap is intentionally not the appropriate choice for use cases that absolutely do require looking up, updating, or removing arbitrary heap members. Luckily, Dijsktra's algorithm is not one of these use cases; but sometimes they do pop up. My plan is to cover them in a future Collections release, after 1.1. (For what it's worth, I think the name PriorityQueue would be right for this follow-up construct, in the footsteps of PR #51. Unfortunately that PR does not implement a lookup feature; it merely concentrates on implementing FIFO ordering for duplicate priorities instead, and I don't think that necessarily needs to be implemented as a first-class construct.)

anon9791410 · December 25, 2022, 7:40am

It is actually not. The pattern I experience is that usually, when people ask for a solution to a generic problem on programming forums, the answerers feel a need to focus on a use case, as if that's what's being asked, instead of attacking the generic problem. It's quite disrespectful to everyone's time—and it especially doesn't make sense to do that when the subject is a generic data structure.

I'm feeling very frustrated that you spent time outside of the solution on these previous posts. You have better things to do. I felt like most of your previous post was covered by

…but I should have elaborated. Thank you for doing that much better than I could have.

Moving on, it's crucially important to your understanding that you realize while you think that you've shown a different solution here…

lorentey:

the right way to do that is to simply wrap this pair of values in a straightforward custom struct:

struct Unvisited<Node>: Comparable {
  var node: Node
  var distance: Int

  static func ==(left: Self, right: Self) -> Bool { left.distance == right.distance }
  static func <(left: Self, right: Self) -> Bool { left.distance < right.distance }
}

…it's actually the same as ElementValuePair above, except:

It's not generic, and therefore not suitable for the Collections package.
(If this were actually "the right way", then Swift should not have generics. Instead, we should be manually copying, pasting, and making small naming changed to source code for every type.)
The == is a lie. Distance being equivalent certainly does not imply that two of these are equivalent.
There's not a closure.

You seem to be very hung up on #3, but as I said,

Your solution involving Unvisited reinforces to me that Heap will not be usable without the following, closure or no:

public extension Heap {
  /// A "`Value`" that uses an accompanying `Heap.Element` for sorting  via a `Heap`.
  /// - Note: If `Value` itself is `Comparable`, it can of course be inserted into a Heap directly.
  ///   This type is explicitly for cases where a different sorting rule is desired.
  struct ElementValuePair<Value> {
    public var element: Element
    public var value: Value
  }
}

// See above for identical `Comparable` conformance.

As far as I can tell, it's impossible to avoid this violation. Comparable inheriting from Equatable is generally useful, but in this case, it's a bug.

P.S. As much as I'd like to not be discussing this, considering this is all publicly posted…

…This is incorrect dogmatic thinking. It's certainly another contractual violation, but it represents a perfectly legitimate algorithmic variation, in the above, which I used to solve https://adventofcode.com/2022/day/12 as an excuse to try out Heap. I could have cached the step counts, like you did, using the classical implementation, but there's no need.

Despite the usability issue discussed here, Heap worked great anyway. Thanks!

lorentey · December 25, 2022, 6:24pm

Oh well; best of luck! ¯\_(ツ)_/¯

taylorswift · December 26, 2022, 9:22pm

to provide a little context, this is indeed an issue some longtime users of swift like myself recognize, and i think if we were to go back to 2015 and redesign Comparable, Equatable, and Hashable, it would probably not look exactly the same as it does now.

there is a thread with extensive discussion of this issue here.

in my view EHC’s biggest limitation is it doesn’t have the concept of “projected identity spaces”. to provide a concrete example, we cannot do things such as:

let separator:[OddNumber] = [3, 5, 9]
let numbers:[Int] = [3, 5, 9, 2, 0, 1]

numbers.starts(with: separator)

because we don’t have a way of expressing that an Int can be compared with an OddNumber by promoting the OddNumber value to “Int space”. swift also eschews the notion of a “reinterpret cast”. this is a language design principle, because “rebinding” [OddNumber] → [Int] is unsound under mutations. on the other hand that decision removes the typical escape hatch other languages use to work around this limitation.

i know this sounds like a toy example that is far removed from your Heap problem, but many pain points that come from EHC-based data structures like Heap aren’t actually problems with the data structure itself, at their core they are problems with identity spaces.

this got slightly better with the introduction of Identifiable a minor release or two back. but to my disappointment, we have not made much additional progress in this direction since then.

TLDR; programming in swift requires a very essentialist worldview, and things like “X is behaving as a Y in context Z” do not make sense to many of us, because the language does not just expect that X can “behave” like a Y, it expects that X must be a Y and be able to do Y things anywhere it travels.

C++ parameterizes this by always making you “say the Z”, which makes for namespaced::colon<angle::bracket, even_more::brackets<std::hell<everything::needs:to::be::said<out::loud, so::everyone::defaults<to::auto>>>>>. swift avoids this problem by requiring that every X should have exactly one canonical Z. some of us like this because you can reason about X without having to think about Zs.

as you get better at swift, you will learn how to factor away the Zs in your code, and you will just have to worry about Xs and Ys.

anon9791410 · March 4, 2023, 12:58am

I love how offensive this is given that I've been using it for precisely as long as is possible for anyone outside of Apple…

…but while your Swift code looks annoyingly iconoclastic, you know more things than pretty much everybody else, including the rest of us here. What would you do for == instead? I've been having no problems with the fatalError, but I'd rather do better if I can. I don't know if that's possible to glean, from what you said. My current belief is that there's no point in it—I don't think Heap can do anything useful with ==. So maybe a random Bool is the best way to express that, in case anyone ever accidentally makes a commit to Heap which relies on ==?

import HeapModule

public extension Heap {
  /// A "`Value`" that uses an accompanying `Heap.Element` for sorting  via a `Heap`.
  /// - Note: If `Value` itself is `Comparable`, it can of course be inserted into a Heap directly.
  ///   This type is explicitly for cases where a different sorting rule is desired.
  struct ElementValuePair<Value> {
    public var element: Element
    public var value: Value
  }
}

public extension Heap.ElementValuePair {
  init(_ element: Element, _ value: Value) {
    self.init(element: element, value: value)
  }
}

extension Heap.ElementValuePair: Comparable {
  public static func < (lhs: Self, rhs: Self) -> Bool {
    lhs.element < rhs.element
  }

  /// Only necessary because Comparable: Equatable. 😞
  public static func == (lhs: Self, rhs: Self) -> Bool {
    fatalError()
  }
}

tera · March 4, 2023, 4:54am

Aside the issue of whether it's right or wrong for Comparable to also require Equatable †... Would it be unreasonable if Comparable got this default implementation of Equatable?

extension Comparable {
    public static func == (lhs: Self, rhs: Self) -> Bool {
        !(lhs < rhs) && !(rhs < lhs)
    }
}

Note, this logic doesn't quite work in the exceptional case of Float's NaN, but Floats got Equatable of their own, so that shouldn't be a problem.

If it's unreasonable for some reason (compatibility, etc) – perhaps at least this is the one we could use in our types instead of the fatalError in the ==.

Personally I do not see a big problem with fatalError in the EQ implementation ("only fix it when it is proved to be broken"). It's just obvious that some people do worry about it (hence we have this discussion every now and then) that whoever is requiring Comparable (including the standard library or the compiler itself) could internally call EQ just to proof its point that it can:

extension Comparable {
    func isLessThan(_ rhs: Self) -> Bool {
        let less = self < rhs
        
        // 😈 Let's make sure whoever is using fatalError in their EQ
        // implementation is royally screwed. 😈

        let equal = self == rhs
        precondition(!less || !equal)
        
        return less
    }
}

(† - although I do feel that the question of whether Comparable should require Equatable is one of the greatest philosophical questions in Swift, even if bears no real implications for practice).

Slava_Pestov · March 4, 2023, 4:57am

So what we really need is for a way to explicitly name and refer to conformances, so if I have a struct Heap<T: Comparable>, then instead of writing Heap<Foo> and relying on global conformance lookup to find the Foo: Comparable conformance, I can say Heap<Foo, FooComparableA> or Heap<Foo, FooComparableB>. You could even imagine a dynamic witness table construction operation that takes a closure for each protocol requirement, which would solve the problem of storing a closure inside each instance of Foo, while still allowing fully general instances.

I'm joking -- or am I?

;-)

taylorswift · March 4, 2023, 5:18am

this would be enormously helpful in the SerDe domain, where you have a lot of similar looking protocols for various serialization formats (JSON, BSON, ION, protobuf, etc). this would go a long way in reducing code duplication.

anon9791410 · March 11, 2023, 6:51am

.

I realized that the concept of what's going on here can be generalized. While a heap shouldn't need to rely on Equatable, maybe it's fine if it does, anyway. The "nonconformer" is already being thrown out when it comes to < —why not extend that to ==?

The ElementValuePair is only useful as a unit within Heap; it will always be broken apart externally. It's not really semantically Comparable, even without checking for equality.

The language provides no mechanism to delegate all protocol conformance. Is there a general term for that kind of operation, and can it be done better than the following, which I think is terrible?

/// An instance that needs to conform to protocol, but doesn't,
/// which delegates to an instance of something that does conform.
/// The entire type is considered to conform, when the delegate does.
public protocol ConformanceDelegator {
  associatedtype Delegate
  var delegate: Delegate { get }
}

extension ConformanceDelegator where Delegate: Equatable {
  public static func == (lhs: Self, rhs: Self) -> Bool {
    forwardToDelegates(lhs, ==, rhs)
  }
}

extension ConformanceDelegator where Delegate: Comparable {
  public static func < (lhs: Self, rhs: Self) -> Bool {
    forwardToDelegates(lhs, <, rhs)
  }
}

private extension ConformanceDelegator {
  private static func forwardToDelegates<Result>(
    _ self0: Self,
    _ requirement: (Delegate, Delegate) -> Result,
    _ self1: Self
  ) -> Result {
    requirement(self0.delegate, self1.delegate)
  }
}

public extension Heap {
  struct ElementValuePair<Value> {
    public var delegate: Element
    public var value: Value
  }
}

extension Heap.ElementValuePair: ConformanceDelegator & Comparable { }