[API Review] PriorityQueue APIs

It sounds like we agree then.

A real-world priority queue would not have an element type of Int.

The element type would have at least two fields, and only one of those fields would be used for comparison.

Performance in that situation is what I would find relevant.

• • •

(Or, I suppose, it could have one field which is a reference or index, and the priority would be derived by dereferencing and evaluating a property of the pointee.)

Yes, and I did try running such a benchmark too just to validate previously - it was slower than the Int, but not by much, as it was basically just two ints in that case... So I didn't package it up in a PR as the current Int test was representative of the performance if using the priority queue in that manner.

Given the right abstraction it seems like we should be able to maintain performance and API flexibility while not locking ourselves to either approach. We should be able to use raw Ints for priority for maximum performance while building higher levels APIs that take closures (or their KeyPath equivalent) for dynamic prioritization. A closure-based approach should be more amenable to optimization and users would be able to use KeyPaths if they wanted to pay the performance price. Once KeyPaths are optimized (at the very least, the KeyPath -> closure transform is essentially unoptimized last I checked) that performance can be recovered, allowing more users to move to the higher-level abstraction if they need.

I also wonder if a Prioritizable protocol could help here, as it would give a huge amount of flexibility. That way we could do something like SwiftUI does in ForEach where, if the elements aren't Identifiable you can provide a KeyPath to a Hashable property you want to use instead.

So I would encourage some exploration here to see which, if any, higher level abstraction can be optimized away before committing to a particular approach.

2 Likes

Instead of a shared type, maybe we could use "AscendingView" and "DescendingView" as the types. (As someone who tried this type last year, I knew releasing a sorted version was non-trivial. But you came up with a performant workaround.)

The view types should implement underestimatedCount. They should implement the hidden method needed to activate contains(_ element: Element) -> Bool, since the semi-sorting means searching should be slightly optimized.

The main type needs a contains method too.

If we're adding a variant that uses a key-path to a sub-object to determine priority, then that version should have a contains(somethingWithPriority priority: Aspect) -> Element? method.

A concern I have with adding a closure/KeyPath comparator is that PriorityQueue is not the only data structure that would benefit from this kind of feature. I think Set, OrderedSet, and (soon!) SortedSet would similarly benefit from being able to compare/hash on a projection of their element. It seems undesirable to have an alternate variant of all of these data structures.

2 Likes

A small thing: I’d love to have the comment actually include the title of the paper and not just say "based off this paper".

Links can go dead and I think it's nicer not to force people to click the link if all they want is read the title. I'm not necessarily looking for a full citation that fulfills scientific standards (though there's nothing wrong with that). In my opinion, the most important pieces are title, year, author(s) (in that order). Maybe something like this:

"The implementation is based off Atkinson et al., Min-Max Heaps and Generalized Priority Queues (1986)."

17 Likes

Making a comparison to OperationQueue (which is a kind-of priority queue of operations to be done) this type seems to to not preserve the order of insert of like prioritized items whereas OperationQueue does.


struct Prioritized<Content>: Comparable {
  var content: Content
  var priority: Int = 0
  
  static func == (_ lhs: Prioritized, _ rhs: Prioritized) -> Bool {
    return lhs.priority == rhs.priority
  }
  
  static func < (_ lhs: Prioritized, _ rhs: Prioritized) -> Bool {
    return lhs.priority < rhs.priority
  }
}

var queue = PriorityQueue<Prioritized<String>>()
queue.insert(contentsOf: [
  .init(content: "foo 0"),
  .init(content: "foo 1", priority: 1),
  .init(content: "bar 0"),
  .init(content: "bar 1", priority: 1),
  .init(content: "baz 0"),
  .init(content: "baz 1", priority: 1),
])

while let item = queue.popMax() {
  print(item.content)
}

I would expect to produce printing:

foo 1
bar 1
baz 1
foo 0
bar 0
baz 0

It does not seem to preserve that order of insertion.

Uh, I wrote out some code to add contains to PriorityQueue. But since you can't make a pull request to another pull request, I have no idea where to send it.

You should be able to open a PR against the AquaGeek:priority-queue branch, no?

1 Like

Correct; we don't currently make that guarantee for elements with equal priority.

Yeah, seconding what @Jumhyn said, you can open the pull against the priority-queue branch in my fork.

Alternatively, you could wait until the main chunk of work lands and open it as an improvement against the main repo. (I'm honestly not sure how @kylemacomber et al want to structure this. Should additions like this be part of the initial merge, or would you prefer that they happen as separate PRs?)

Thank you for calling this out, Ole! I've updated the citation:

/// A double-ended priority queue built on top of a [Min-Max Heap](https://en.wikipedia.org/wiki/Min-max_heap)
/// data structure.
///
/// In a min-max heap, each node at an even level in the tree is less than all
/// its descendants, while each node at an odd level in the tree is greater than
/// all of its descendants.
///
/// The implementation is based off [Atkinson et al. Min-Max Heaps and
/// Generalized Priority Queues (1986)](http://akira.ruc.dk/~keld/teaching/algoritmedesign_f03/Artikler/02/Atkinson86.pdf).
///
/// M.D. Atkinson, J.-R. Sack, N. Santoro, T. Strothotte. October 1986.
/// Min-Max Heaps and Generalized Priority Queues. Communications of the ACM.
/// 29(10):996-1000.
5 Likes

I did a quick rework to add keypath support, and my benchmarks of the (admittedly over simplistic) case of a single Int (with the keypath being \.self) resulted in ~10x slower performance.

I like @Nevin's suggestion of splitting the type in half (effectively undoing our merging of the types :rofl:), with MinMaxHeap<Element: Comparable> being the base implementation and PriorityQueue<Element, Priority: Comparable> being the more ergonomic wrapper. Yeah, we don't want duplicate implementations of all of the data structures in here, but to me the MinMaxHeap is an implementation detail of the PriorityQueue. If we were to move to a different implementation, I would think we'd want to keep the MinMaxHeap around.

For my use case (priority queue for pending network requests), I'm more than willing to take a 10x (and maybe even a 100x) performance hit, because there will only be a dozen or so requests in the queue at a time and that additional delay isn't material. For @hassila's case of pending trades, that performance hit is unacceptable, so he could stick to the MinMaxHeap.

Edit:
This separation might also make a good place to add the insertion order tracking that @Philippe_Hausler mentioned, so that elements in a PriorityQueue with the same priority are FIFO.

It's at Priority queue by CTMacUser · Pull Request #3 · AquaGeek/swift-collections · GitHub .

Great, thanks!

We've been discussing this a little offline, and I think I agree about splitting the types... but with a little twist :grin:.

As I mentioned my biggest concern is how adding a closure/ KeyPath comparator would scale to other types like Set, OrderedSet, and (soon!) SortedSet. It turns out each of these data structures already has a variant that can be used to conveniently compare/hash on a single property rather than the entire value—Dictionary, OrderedDictionary, and (soon!) SortedDictionary!

I think we should define PriorityQueue<Value, Priority: Comparable> and Heap<Element: Comparable> where PriorityQueue is to Heap what Dictionary is to Set.This would be even more convenient than the closure/KeyPath approach, because often you don't have a type handy that includes both the priority and element:

var jobs: PriorityQueue<() -> Void, Double> = []

jobs.insert(backgroundCleanup, priority: 0.1)
jobs.insert(submitPayment, priority: 1)
jobs.insert(generateThumbnails, priority: 0.5)

I don't think it's a big concern that PriorityQueue would be double storing the priority in the case where the value already includes the priority. That's the case with Dictionary and Set today. If you really need to micro optimize, you can use Heap/Set and create bespoke struct that defines a custom conformance to Comparable/Hashable.

I think PriorityQueue's API could take inspiration from Dictionary and can probably be implemented just as a trivial wrapper around Heap. Something like:

public struct PriorityQueue<Value, Priority: Comparable> {
  public typealias Element = (value: Value, priority: Priority)

  struct Pair: Comparable {
    var value: Value
    var priority: Priority

    static func ==(lhs: Self, rhs: Self) -> Bool {
        lhs.priority == rhs.priority
    }
    static func <(lhs: Self, rhs: Self) -> Bool {
        lhs.priority < rhs.priority
    }
  }

  var base: Heap<Pair>
}
9 Likes

I think that’s really nice and clean, will be interesting to see how performance compares - would expect it to be much closer now without keypaths… thanks.

1 Like

I like it!

I haven’t thought through all the implications yet, but there’s a chance the pair-based approach might be faster than the reference/index approach.

It also nicely sets the priority independent of the value, so that even if the value is a class instance whose properties change while in the queue, the priority stays fixed.

That greatly reduces the likelihood of the underlying heap entering a bad state where the invariants are broken.

(The same issue affects dictionaries and sets as well: if a class instance’s hash value changes while in a set or dictionary, bad things happen.)

1 Like

On a different tack, do we think there are use-cases for changing the priority of an item that’s already in the queue?

If we do go with @kylemacomber’s idea, perhaps it makes sense to use dictionary-literals for PriorityQueue?

var jobs: PriorityQueue = [
  backgroundCleanup : 0.1,
  submitPayment     : 1.0,
  generateThumbnails: 0.5
]
1 Like