[API Review] PriorityQueue APIs

tstromberg · June 25, 2021, 8:12pm

PriorityQueue API Review

Hey everyone! After several rounds of feedback, we have the PR for adding a priority queue to the repo in a pretty good spot.

@kylemacomber recommended we post an overview of the API here for review. Please take a look at the APIs presented below (as well as the associated code, if interested) and provide any feedback.

Do the type, function, and parameter names make sense?
Does the performance match your expectations?
Does the Sequence conformance (i.e. exposing two separate iterators) make sense?
Any other feedback?

Thanks!

Authors: Tyler Stromberg, Amanuel Ephrem, Joakim Hassila, and Tim Vermeulen
Implementation: PR #51

Introduction

Priority queues are useful data structures that can be leveraged across a variety of applications (sorting algorithms, graph algorithms, network clients, task managers, etc).

One common way to implement a priority queue is with an array-backed binary heap. This provides performant lookups (O(1)) of the lowest or highest priority element (depending on whether it's a min-heap or max-heap) as well as insertion and removal (O(log n)).

A variant on this, the min-max heap, allows for performant lookups and removal of both the lowest and highest priority elements by interleaving min and max levels in the backing array (see details below).

This proposal adds an implementation of a double-ended priority queue built on a min-max heap.

Design

/// A double-ended priority queue built on top of a [Min-Max Heap](https://en.wikipedia.org/wiki/Min-max_heap)
/// data structure.
///
/// In a min-max heap, each node at an even level in the tree is less than all
/// its descendants, while each node at an odd level in the tree is greater than
/// all of its descendants.
///
/// The implementation is based off [this paper](http://akira.ruc.dk/~keld/teaching/algoritmedesign_f03/Artikler/02/Atkinson86.pdf).
public struct PriorityQueue<Element: Comparable>

We originally started with the PriorityQueue type wrapping a MinMaxHeap. Ultimately, the former ended up being really thin and didn't seem to add much value, so the two layers were merged.

Min-max heap

"A min-max heap is a complete binary tree data structure." This means that all the levels in the tree are filled except the last and that they are filled from left to right. The tree can be stored in an array:

// Min-max heap:
               8
       71              41
   31      10      11      16
 46  51  31  21  13

// Array representation:
[8, 71, 41, 31, 10, 11, 16, 46, 51, 31, 21, 13]

Heap property: Each element in an even level in the tree is less than all its descendants; each element in an odd level in the tree is greater than all its descendants. Mutations to the heap must maintain this heap property.

The levels start at 0 (even), so the smallest element in the tree is at the root. In the example above, the root of the tree is 8, which is the smallest element. The next level in the tree (containing 71 and 41) is a max level, so those elements are greater than all their respective descendants. Because this is the first max level in the tree, the largest element in the tree is one of the two elements.

Note that the comparisons only take into account descendants — it is possible, for example, for elements at a lower level to be larger than elements in a different branch that are higher up the tree. In the example above, 41 isn't the second-largest element in the tree, but it is the largest element in the right branch of the root node.

Initialization

There are a couple of options for initializing a priority queue. The first is an empty queue:

/// Creates an empty queue.
public init()

You can also create a queue from an existing sequence in linear time:

/// Initializes a queue from a sequence.
///
/// Utilizes [Floyd's linear-time heap construction algorithm](https://en.wikipedia.org/wiki/Heapsort#Floyd's_heap_construction).
///
/// - Complexity: O(n), where `n` is the length of `elements`.
public init<S: Sequence>(_ elements: S) where S.Element == Element

Insertion

Of a single element

To insert an element into a queue, call insert(_:):

/// Inserts the given element into the queue.
///
/// - Complexity: O(log `count`) / 2
public mutating func insert(_ element: Element)

This works by adding the new element into the end of the backing array and then bubbling it up to where it belongs in the heap.

Of a sequence of elements

You can also insert a sequence of elements into the queue:

/// Inserts the elements in the given sequence into the priority queue.
///
/// - Parameter newElements: The new elements to insert into the queue.
///
/// - Complexity: O(n * log `count`), where `n` is the length of `newElements`.
public mutating func insert<S: Sequence>(contentsOf newElements: S) where S.Element == Element

This is currently a naïve implementation where we simply iterate over the elements and call insert(_:) with each one.

Future work: Figure out a more performant way of doing this.

Lookup

As mentioned earlier, the lowest and highest priority elements can be queried in constant time:

/// Returns the element with the lowest priority, if available.
///
/// - Complexity: O(1)
public func min() -> Element?

/// Returns the element with the highest priority, if available.
///
/// - Complexity: O(1)
public func max() -> Element?

In a min-max heap, the smallest element is stored at index 0; the largest element is stored at either index 1 or index 2, the first max level in the heap (so to look up the largest, we compare the two and return the larger one).

We also expose a read-only view into the array-backed min-max heap, should somebody need that.

/// A read-only view into the underlying heap.
///
/// In the current implementation, the elements aren't _arbitrarily_ ordered,
/// as a min-max heap is used for storage. However, no guarantees are given as
/// to the ordering of the elements.
///
/// - Complexity: O(1)
public var unordered: [Element]

Removal

Removal has logarithmic complexity, and removing both the lowest and highest priority elements is supported:

/// Removes and returns the element with the lowest priority, if available.
///
/// - Complexity: O(log `count`)
public mutating func popMin() -> Element?

/// Removes and returns the element with the highest priority, if available.
///
/// - Complexity: O(log `count`)
public mutating func popMax() -> Element? {

To remove the lowest priority element, we remove and return the element at index 0. The element at the end of the backing array is put in its place at index 0, and then we trickle it down to where it belongs in the heap. To remove the highest priority element, we do the same except the index is whatever the index of the highest priority element is (see above) instead of 0.

We also have non-optional flavors that assume the queue isn't empty:

/// Removes and returns the element with the lowest priority.
///
/// The queue *must not* be empty.
///
/// - Complexity: O(log `count`)
public mutating func removeMin() -> Element

/// Removes and returns the element with the highest priority.
///
/// The queue *must not* be empty.
///
/// - Complexity: O(log `count`)
public mutating func removeMax() -> Element

Iteration

PriorityQueue itself doesn't conform to Sequence, because of the potential confusion around which direction it should iterate (highest-to-lowest priority? lowest-to-highest?). Instead, we expose two iterators that conform to Sequence:

/// Returns an iterator that orders elements from lowest to highest priority
public var ascending: Iterator

/// Returns an iterator that orders elements from highest to lowest priority
public var descending: Iterator

Example usage

Let's pretend we have some tasks that we need to process and each has a priority. We want to process the highest-priority tasks first.

struct Task: Comparable {
  let description: String
  let priority: Float // Between 0.0 and 1.0
  let work: () -> Void

  static func < (lhs: Task, rhs: Task) -> Bool {
    lhs.priority < rhs.priority
  }
}

// Create our queue
var queue = PriorityQueue<Task>()

// Add some Tasks
let cleanupTask = Task(description: "Background cleanup", priority: 0.1) {
  // ...
}
queue.insert(cleanupTask)

let uploadTask = Task(description: "Submit payment", priority: 1) {
  // ...
}
queue.insert(uploadTask)

let generateThumbnails = Task(description: "Generate thumbnails", priority: 0.5) {
  // ...
}
queue.insert(generateThumbnails)

// Now let's perform all our work
for task in queue.descending {
  print("Starting '\(task.description)'")
  task.work()
}

// This will print out:
// Starting 'Submit payment'
// Starting 'Generate thumbnails'
// Starting 'Background cleanup'

// The iteration above can also be done via a while loop:
while let task = queue.popMax() {
  print("Starting '\(task.description)'")
  task.work()
}

Source impact

This is purely additive. No existing APIs were changed, deprecated, or removed.

Potential future directions

Add support for merging two PriorityQueues
Add replaceMin and replaceMax functions
Implement a more performant algorithm for insert(contentsOf:)

Split priority out of the `Element` type

Sometimes, it doesn't make sense for a type to conform to Comparable directly, but it may have a property that does — e.g. Task.priority. Making PriorityQueue generic around both the element to be stored and its priority allows more flexibility.

This could potentially be done via a KeyPath or by separating priority from the Element type altogether (see discussion in the linked issue).

tstromberg · June 25, 2021, 8:13pm

Here is a snapshot of the performance of PriorityQueue and CFBinaryHeap for comparison:

kylemacomber · June 25, 2021, 8:37pm

Really great work on this @tstromberg, @AmanuelEphrem, and @hassila!

One thing missing from the Initialization section is that you can create a priority queue using array literal syntax:

let queue: PriorityQueue = [
  Task(description: "Background cleanup", priority: 0.1),
  Task(description: "Submit payment", priority: 1),
  Task(description: "Generate thumbnails", priority: 0.5)
]

for task in queue.descending {
  print("Starting '\(task.description)'")
  task.work()
}

// This will print out:
// Starting 'Submit payment'
// Starting 'Generate thumbnails'
// Starting 'Background cleanup'

This is possibly counter-intuitive because the PriorityQueue will re-order the elements to be in priority order. I advocated for this addition because Set has this same property and already accepts array literal syntax, but I'm curious to hear any counter arguments.

This same question will come up again when we do SortedSet.

Jumhyn · June 25, 2021, 8:52pm

This is great! I have a couple questions about the noted time complexities:

Is there a particular reason that we've included the constant factor here?

tstromberg:

/// Inserts the elements in the given sequence into the priority queue.
///
/// - Parameter newElements: The new elements to insert into the queue.
///
/// - Complexity: O(n * log `count`), where `n` is the length of `newElements`.
public mutating func insert<S: Sequence>(contentsOf newElements: S) where S.Element == Element

This is currently a naïve implementation where we simply iterate over the elements and call insert(_:) with each one.

For the naïve implementation, wouldn't this be something like O((n * log count) + (n * log n)), since with each insertion we're increasing count?

Nevin · June 25, 2021, 9:08pm

I’m not sure we need both the optional-valued popMin and the non-optional removeMin.

Swift programmers are already accustomed to handling optionals, and the language provides strong affordances for doing so.

Notably, Collection.first is optional, and does not have a non-optional counterpart.

• • •

I think replaceMin could be valuable, as it avoids doing two sift operations.

• • •

Was consideration given to relaxing the Comparable requirement and storing a comparison closure in the PriorityQueue?

There are plenty of situations where I can imagine wanting a priority queue of non-comparable elements, or in a different order than < provides.

For example, one might wish to prioritize by \.timestamp or \.size or \.trustLevel or any number of other possible criteria.

AmanuelEphrem · June 25, 2021, 9:27pm

I like the idea of being able to create a priority queue using the array literal syntax. As you mentioned about Sets having the same property and accepting array literal syntax, I think that developers would be expecting the priority queue to behave in the same way.

Also the lookup, removal, and traversal functions/variables are all aptly named (specifying whether it is dealing with the min or max value or whether it is in ascending or descending order), so I doubt people will assume that the underlying storage of the PriorityQueue is untouched based off of the initialization.

tstromberg · June 25, 2021, 9:38pm

Fair. The counterargument to that is that Array has both func popLast() -> Element? and func removeLast() -> Element.

Yeah, this was discussed. (This was how the original implementation did the comparison, FWIW.) One of the problems with using a closure is that it would prevent us from safely merging two priority queues (e.g. if one queue's closure is looking at one property and the other is looking at a different one).

The "Split priority out of the Element type" section above talks about supporting the kind of flexibility I think you're looking for. The general consensus in the linked issue was to start with something simpler and go from there, which I think makes sense. One of the things mentioned in that issue is that using a KeyPath-based Comparable conformance could be a wrapper around this.

Nevin · June 26, 2021, 3:10am

I think keypaths are a great approach here.

I expect the vast majority of the time a full-fledged predicate function is not needed, and also the comparison from Comparable is not appropriate.

Storing a keypath would cover most use-cases while still allowing priority queues to be merged. This seems like it’s in the “just right” goldilocks zone.

• • •

I have some concern about introducing a priority queue without such a provision for users to select their desired prioritization key.

Importantly, the spelling PriorityQueue can only belong to one single type. If we give that name to a type which requires Comparable and does not allow customizing the priority key, then the more useful type that does allow it will end up with a less favorable name.

hassila · June 26, 2021, 4:41am

Though Keypaths are nice, it is important to validate performance - just want to point out e g Issues · apple/swift · GitHub - for us, ultimate performance for a priority queue is very high on the agenda (and we usually would just store a small custom struct which easily can have Comparable added). I don’t mind keypath support, but then as a wrapper on top unless it performs the same as the current comparable requirement (perhaps it was what was suggested, if so it’s fine :-) )

hassila · June 26, 2021, 8:44am

Just thought I'd share performance numbers for the three key operations alone so it's easier to see as only 1us scale is needed - done on iMac (Retina 5K, 27-inch, 2017), 4,2 GHz Quad-Core Intel Core i7. Major kudos to @tstromberg for driving this!

Nevin · June 26, 2021, 3:12pm

That’s a great point Joakim, performance is important.

In particular, it would be helpful to see realistic benchmarks for both the thin-comparable-wrapper approach and the keypath approach.

As in, given a non-trivial type with a priority property, as one might encounter in real-world applications, how does the performance compare when using a custom wrapper that implements Comparable by forwarding to wrapped.priority, versus storing the \.priority keypath in the priority queue and applying it for all comparisons.

hassila · June 26, 2021, 3:31pm

Btw, It was discussed back and forth in the case linked above, final comment on that topic was Priority queue · Issue #3 · apple/swift-collections · GitHub (which I would agree with).

Nevin · June 26, 2021, 3:42pm

…and I would disagree with that.

The keypath version makes for a substantially more ergonomic programming experience, since the “<” operator is almost never what one wants to prioritize with.

Unless there are major, measurable performance benefits to requiring the proliferation of extraneous wrapper types whose sole purpose is to introduce a Comparable conformance that one doesn’t actually want or need, I would prefer not to go that route at all.

If there are significant speedups from doing so, then I would still prefer to reserve the convenient spelling (“PriorityQueue”) for the convenient-to-use keypath version.

The version that requires boilerplate wrapper types is intrinsically inconvenient anyway, so giving it a slightly more cumbersome name won’t matter much.

• • •

For example, the Comparable version could be named MinMaxHeap, since it is based on the min and max from Comparable.

Then the keypath version could be PriorityQueue since it allows the user to specify a priority key.

hassila · June 26, 2021, 5:56pm

I don't know, at some point the performance difference is significant enough that it would be a disservice to proliferate the usage of a paradigm that simply doesn't perform. Being fast is one of the three tentpole features of the language after all. Nice and safe high-level abstractions that keep a great performance level is always welcome, but if standard library collections will be subpar compared to other languages (and I primarily see C/C++ as the reference here), it would not be good IMHO.

I disagree with that the "<" operator is almost never what one wants to prioritise with, having a different experience - but I can see the desire to have an higher-level abstraction on top in the future (as suggested in the comment I linked to).

I did a quick-and-dirty hack with key paths just to see where it ends up performance-wise, and here are the results from the insert for comparison.

I also did a quick couple of profiling runs - there were no extra allocations compared to the previous version, but this ended up when doing a sample profiling:

So it's basically 100x slower in this test at 1us instead of 10ns (perhaps I did something wrong, who knows...) - you are welcome to try too of course (GitHub - AquaGeek/swift-collections at priority-queue has the current implementation which is fast (not my hacked one)).

Nevin · June 26, 2021, 7:25pm

Well that is worrisome.

Like, I expected we’d be debating whether, say, a 25% slowdown was acceptable.

But 100x slower? That suggests there might be significant room for the compiler to better optimize keypaths.

• • •

Just to be clear, were you in fact comparing apples-to-apples performance, where you used a \.priority keypath for one test (stored in the PriorityQueue itself so there’s only one keypath instance), and a wrapper struct which implements < by comparing wrapped.priority for the other test, both with a nontrivial “realistic” base data type?

hassila · June 26, 2021, 7:36pm

Not exactly, the around 10ns is with just a pure Int using the comparable implementation (results above) - the slow implementation was basically just storing the keypath in the priority queue (so just a single instance, yes) and re-evaluating it for each access when comparing with an access pattern like: storage[rightChildIdx][keyPath: self.keyPath] instead of the previous versions storage[rightChildIdx]. I didn't try with a wrapper yet.

I just had a minimal struct with a single priority Int and saved the key path \PriorityQueueBenchmarkStruct.priority - I'm personally not that interested in putting larger data types into the priority queue as I've never encountered the need for using that and if we can't get even a minimalistic case to perform decently, it seemed like a waste of time.

Maybe it isn't the best way to do it, but then I haven't worked with keypaths at all previously, so maybe I am missing something.

hassila · June 26, 2021, 8:08pm

Also as a reference, this is the performance level of C++ for min/max heaps - NB those performance numbers don't include the min/max-heap creation time IIRC, so they are not exactly comparable, but they show that the current Swift PriorityQueue is fundamentally in the right magnitude of performance as it currently stands when comparing to other languages (when looking at say 1-2K items being processed in the test such that the setup time is amortised over a larger set).

Nevin · June 26, 2021, 8:49pm

I’m curious what use-cases you have for a priority queue of Int.

Every real-world use-case I can imagine involves having some data structure which needs to be prioritized based on some property.

hassila · June 26, 2021, 8:51pm

Usually, it would be a minimal struct with a priority (enum/int) and an external reference (e.g. key to a dictionary (with larger structs/actors), index to an array, reference to an object, ... all of them possibly large). I'd never put the actual large entity into the priority queue for performance reasons.

hassila · June 26, 2021, 8:53pm

As a concrete example, among other things I've used this to keep track of pending automatic trading strategies that should be run in priority order as cpu resources becomes available.