What is intended for using Heap with non-comparable values?

With the other heap implementations I've used in Swift, sorting closures are employed. That comes with some problems, but it also allows for elements that are not Comparable. What should we do for such elements, with the Heap that's currently in Collections?

The following works, but using a different closure for every element is overkill.

public extension Heap {
  /// A  non-`Comparable` "`Value`" type, which provides a `Comparable` value for use with a `Heap`.
  struct ElementValuePair<Value> {
    public var value: Value
    private let makeElement: (Value) -> Element
  }
}

public extension Heap.ElementValuePair {
  init(
    _ value: Value,
    _ makeElement: @escaping (Value) -> Element
  ) {
    self.init(value: value, makeElement: makeElement)
  }

  var element: Element { makeElement(value) }
}

extension Heap.ElementValuePair: Comparable {
  public static func < (lhs: Self, rhs: Self) -> Bool {
    lhs.element < rhs.element
  }

  /// Only necessary because Comparable: Equatable. 😞
  public static func == (lhs: Self, rhs: Self) -> Bool {
    fatalError()
  }
}

Swift is a strongly typed language.

We want to ensure that a function that takes a Heap<Foo> will always get elements out in a consistent order that is (for distinct items) entirely and unambiguously defined by the type Foo. This is a core design principle; it is not a negotiable feature. For container types of comparable values to be practical, it is crucial that the ordering relation for their values is defined by their static type.

That is to say, two Heap instances of the same Element type must be guaranteed to agree on the ordering they use. This is precisely the same principle that ensures that two Set values of the same Element type will apply the same definition of element equality.

The idiomatic way to set up a local conformance to Comparable is the same as it is for Equatable, Hashable, or any other standard protocol -- it is to wrap the type in a custom struct.

Here is a silly example with a wrapper type that flips the < relation:

struct Decreasing<Value: Comparable>: Comparable {
  var value: Value

  init(_ value: Value) { self.value = value }

  static func ==(left: Self, right: Self) -> Bool {
    left.value == right.value
  }

  static func <(left: Self, right: Self) -> Bool {
    left.value > right.value
  }
}

let heap: Heap<Decreasing<String>> = []

(This is a silly example, because Heap implements a min-max heap -- however, it might come handy for, say, the upcoming SortedSet type.)

Note: properly implementing == is not optional -- an implementation that simply calls fatalError violates Comparable's requirements, and it is almost guaranteed to cause headaches later. Do not do that.

5 Likes

No, that doesn't answer my question at all. As I said, I'm talking about the cases where the elements are not Comparable*.

As an exercise, implement Dijkstra's algorithm in a matrix indexed by SIMD2<Int32>, using this Heap implementation in Collections. It can be done without a closure, but I say it can't be done with just this type. So I don't think it's very useful. If I'm wrong, I'd like to see it documented why.

A weightless implementation using "ElementValuePair"
/// A dictionary that can be walked backwards, to achieve to shortest path to `start`.
/// - Parameter connected: Whether two orthogonal neighbors are considered to be connected. Order matters.
func shortestPathSources(
  from start: Index,
  connected: (_ source: Element, _ destination: Element) -> Bool
) -> [Index: Index] {
  var shortestPathSources: [Index: Index] = [:]
  var stepCounts = [start: 0]
  func stepCount(index: Index) -> Int { stepCounts[index]! }
  var heap = Heap([Heap.ElementValuePair(start, stepCount)])

  while let source = heap.popMin()?.value {
    let newStepCount = stepCounts[source]! + 1

    for destination in orthogonalNeighbors(source)
    where
      connected(try! self[validating: source], destination.value)
      && stepCounts[destination.key].map({ newStepCount < $0 }) != false
    {
      stepCounts[destination.key] = newStepCount
      shortestPathSources[destination.key] = source
      heap.insert(.init(destination.key, stepCount))
    }
  }

  return shortestPathSources
}

* It doesn’t actually matter if the associated values are Comparable or not. It will just get you in the right headspace if you imagine that they might not be, because their < is unrelated to what’s being sorted via the Heap.

Thanks for telling more about what you're actually trying to do -- this is very helpful!

The code snippet you quoted is not self-contained, which makes it difficult to reason about it. However, the mutations of the stepCounts dictionary sometimes decrease the priority of heap members while they're contained in the heap. This is a crucial bug, because Heap is not aware of this happening. Your problem isn't really about the Comparable conformance; rather, it hangs entirely on this issue.

Decreasing the priority of a heap member sometimes requires the binary min-max heap to sink or raise the affected member along the (logical) path that includes it, to restore the min-max heap property. In order for this to happen, the heap needs to be explicitly notified of the priority change; i.e. you need a heap that provides operations to let you retrieve an existing item and update its priority. Heap doesn't implement such operations.

Trying to work around this by sneakily changing the priority of an unvisited node behind the back of the Heap is simply not going to work correctly, no matter what. Doing that is equivalent to mutating the hash value (a.k.a., the Equatable identity) of a Set member: it is a serious programming error.

Protocols like Equatable/Comparable/Hashable inherently require that the values returned by ==/</hashValue/etc. will be stable: if a and b are constant values, then the result of a == b or a < b must not ever change. (Otherwise e.g. Equatable.== will not implement a proper equivalence relation, rendering the conformance invalid.)

In the case of ElementValuePair, the result of < depends on the result of an arbitrary, user-specified closure, which makes it impossible to statically reason about the validity of its Comparable conformance. In the case of the closure used in shortestPathSources, < ends up consulting the stepCounts table, whose entries are mutated within the inner loop, sometimes resulting in heap items silently changing their minds about what their relative weights are. The ElementValuePair.< function is therefore not stable; that is to say, the ordering relation is not well-defined. This is going to lead to violations of core heap invariants, rendering the data structure useless. Garbage in, garbage out.

So the problem here isn't that it's difficult to spell the Comparable conformance: it never is, as long as you actually have a well-defined ordering relation. (If you don't have a well-defined ordering relation, then the concept of heaps does not apply -- the construct becomes meaningless.) In the case of Dijsktra's algorithm, you just need to associate your unvisited nodes with their shortest known distance, and the right way to do that is to simply wrap this pair of values in a straightforward custom struct:

struct Unvisited<Node>: Comparable {
  var node: Node
  var distance: Int

  static func ==(left: Self, right: Self) -> Bool { left.distance == right.distance }
  static func <(left: Self, right: Self) -> Bool { left.distance < right.distance }
}

This is one way to correctly model the minimum-queue of unvisited nodes in Dijkstra's algorithm. There is nothing weird or difficult about the Comparable conformance here -- implementing this type is a routine, trivial task.

Because this type is a struct, inserting one of its instances into Heap actually inserts a standalone copy of the value, whose priority value cannot possibly be changed without the cooperation of the container. This is a crucial feature, because it eliminates a whole class of errors that involve accidental uncoordinated mutations of such data -- such as the ones arising from the externally referenced priority values in ElementValuePair.<.

Unfortunately, Heap doesn't provide an operation to decrease the weight of its items after insertion, so it is simply not suitable for use in this version of Dijkstra's algorithm.

But that's not a problem! You just need to adapt the algorithm to remove the need to decrease priorities. One common way of doing that is to simply insert duplicates in these cases:

var shortestSources: [Node: Node] = [:]
var distances: [Node: Int] = [start: 0]
var unvisited: Heap<Unvisited<Foo>> = [Unvisited(node: start, distance: 0)]

while let source = unvisited.popMin() {
  guard source.distance == distances[source.node] else { 
    continue // ignore obsolete dupe
  }
  let d = source.distance + 1
  for destination in source.node.neighbors {
    if d < distances[destination] ?? Int.max {
      distances[destination] = d
      shortestSources[destination] = source.node
      unvisited.insert(Unvisited(node: destination, distance: d))
    }
  }
}

(Note: This code is just illustrating the idea; it probably isn't bug-free.)

In practice, I expect this modified algorithm to perform better than the variants that require a heap that supports decreasing the priority of a heap member.

Heap intentionally doesn't provide a way to look up arbitrary existing members in it; and without such a lookup function, there is also no way for it to allow updating their priorities. This is very much by design -- the machinery required to speed up such lookup operations (such as maintaining a lookup table) would significantly slow down the core heap operations (insert, popMin and popMax), while Heap really is designed to make those as fast as possible.

Heap is intentionally not the appropriate choice for use cases that absolutely do require looking up, updating, or removing arbitrary heap members. Luckily, Dijsktra's algorithm is not one of these use cases; but sometimes they do pop up. My plan is to cover them in a future Collections release, after 1.1. (For what it's worth, I think the name PriorityQueue would be right for this follow-up construct, in the footsteps of PR #51. Unfortunately that PR does not implement a lookup feature; it merely concentrates on implementing FIFO ordering for duplicate priorities instead, and I don't think that necessarily needs to be implemented as a first-class construct.)

10 Likes

It is actually not. :frowning: The pattern I experience is that usually, when people ask for a solution to a generic problem on programming forums, the answerers feel a need to focus on a use case, as if that's what's being asked, instead of attacking the generic problem. It's quite disrespectful to everyone's time—and it especially doesn't make sense to do that when the subject is a generic data structure. :grinning:

I'm feeling very frustrated that you spent time outside of the solution on these previous posts. You have better things to do. I felt like most of your previous post was covered by

…but I should have elaborated. Thank you for doing that much better than I could have.


Moving on, it's crucially important to your understanding that you realize while you think that you've shown a different solution here…

…it's actually the same as ElementValuePair above, except:

  1. It's not generic, and therefore not suitable for the Collections package.
    (If this were actually "the right way", then Swift should not have generics. Instead, we should be manually copying, pasting, and making small naming changed to source code for every type.)
  2. The == is a lie. Distance being equivalent certainly does not imply that two of these are equivalent.
  3. There's not a closure.

You seem to be very hung up on #3, but as I said,

Your solution involving Unvisited reinforces to me that Heap will not be usable without the following, closure or no:

public extension Heap {
  /// A "`Value`" that uses an accompanying `Heap.Element` for sorting  via a `Heap`.
  /// - Note: If `Value` itself is `Comparable`, it can of course be inserted into a Heap directly.
  ///   This type is explicitly for cases where a different sorting rule is desired.
  struct ElementValuePair<Value> {
    public var element: Element
    public var value: Value
  }
}

// See above for identical `Comparable` conformance.

As far as I can tell, it's impossible to avoid this violation. Comparable inheriting from Equatable is generally useful, but in this case, it's a bug.


P.S. As much as I'd like to not be discussing this, considering this is all publicly posted…

…This is incorrect dogmatic thinking. It's certainly another contractual violation, but it represents a perfectly legitimate algorithmic variation, in the above, which I used to solve Day 12 - Advent of Code 2022 as an excuse to try out Heap. I could have cached the step counts, like you did, using the classical implementation, but there's no need.

Despite the usability issue discussed here, Heap worked great anyway. Thanks! :+1:

Oh well; best of luck! ¯\_(ツ)_/¯

4 Likes

to provide a little context, this is indeed an issue some longtime users of swift like myself recognize, and i think if we were to go back to 2015 and redesign Comparable, Equatable, and Hashable, it would probably not look exactly the same as it does now.

there is a thread with extensive discussion of this issue here.

in my view EHC’s biggest limitation is it doesn’t have the concept of “projected identity spaces”. to provide a concrete example, we cannot do things such as:

let separator:[OddNumber] = [3, 5, 9]
let numbers:[Int] = [3, 5, 9, 2, 0, 1]

numbers.starts(with: separator)

because we don’t have a way of expressing that an Int can be compared with an OddNumber by promoting the OddNumber value to “Int space”. swift also eschews the notion of a “reinterpret cast”. this is a language design principle, because “rebinding” [OddNumber][Int] is unsound under mutations. on the other hand that decision removes the typical escape hatch other languages use to work around this limitation.

i know this sounds like a toy example that is far removed from your Heap problem, but many pain points that come from EHC-based data structures like Heap aren’t actually problems with the data structure itself, at their core they are problems with identity spaces.

this got slightly better with the introduction of Identifiable a minor release or two back. but to my disappointment, we have not made much additional progress in this direction since then.


TLDR; programming in swift requires a very essentialist worldview, and things like “X is behaving as a Y in context Z” do not make sense to many of us, because the language does not just expect that X can “behave” like a Y, it expects that X must be a Y and be able to do Y things anywhere it travels.

C++ parameterizes this by always making you “say the Z”, which makes for namespaced::colon<angle::bracket, even_more::brackets<std::hell<everything::needs:to::be::said<out::loud, so::everyone::defaults<to::auto>>>>>. swift avoids this problem by requiring that every X should have exactly one canonical Z. some of us like this because you can reason about X without having to think about Zs.

as you get better at swift, you will learn how to factor away the Zs in your code, and you will just have to worry about Xs and Ys. :slight_smile: