[swift-collections 1.1] Heap API RFC

This post is collecting feedback on the API of the Heap type, planned for inclusion in a future version of the Swift Collections package. This type implements the classic min-max heap data structure.

Back in 2021, @tstromberg has kindly contributed an implementation of Heap; it has since been slowly refined to a production ready state. We are way, way waaaaaaay overdue for shipping it in a tagged release; so let's take some steps toward doing that sometime in the foreseeable future!

[Important note: For better or worse, the swift-collections package is not under the Swift Evolution Process; it is not under the governance of any Swift working group; it has no release schedule. Accordingly, this post is not formatted as an API proposal, and this thread is not a formal API review -- it is a merely an informal request for comments on the current API surface, as a way to catch minor oversights before we actually ship them.]

Heap is a container of elements, implementing the classic min-max heap data structure, as introduced by Atkinson et al. 1986. They support the following operations:

  • Constructing a heap out of an arbitrary sequence (O(n) complexity, Floyd's algorithm).
  • Accessing their minimum or maximum item (O(1) complexity, direct access)
  • Removing their minimum or maximum item (O(log(n)) complexity)
  • Inserting an arbitrary new element (O(log(n)) complexity). Duplicate items are inserted as separate values.
  • Inserting an arbitrary sequence of new items (O(n + k) complexity, implemented either by a naive insert loop or Floyd's algorithm, depending on n and k).

The current API synopsis of Heap is as follows:

// Shipping in a new module called `HeapModule`.

struct Heap<Element: Comparable>
  : ExpressibleByArrayLiteral, CustomStringConvertible
{
  init()
  init(_ elements: some Sequence<Element>)

  var count: Int { get }
  var isEmpty: Bool { get }
  var unordered: [Element] { get }

  func max() -> Element?
  func min() -> Element?
  
  mutating func insert(_ element: Element)
  mutating func insert(contentsOf newElements: some Sequence<Element>)

  mutating func popMax() -> Element?
  mutating func popMin() -> Element?

  mutating func removeMax() -> Element
  mutating func removeMin() -> Element

  mutating func replaceMax(with replacement: Element) -> Element
  mutating func replaceMin(with replacement: Element) -> Element
} 

Trivial example for a Heap in action:

var queue: Heap = [4, 5, 2, 2, 1, 3, 0, 0, 0, 2, 6]
print(queue.min()) // 0
print(queue.max()) // 6
queue.insert(7)
print(queue.max()) // 7
while let next = queue.popMin() { print(next) }
// Prints: 0, 0, 0, 1, 2, 2, 2, 3, 4, 5, 6, 7

Unorganized notes, remarks:

  1. Heap implements a core low-level data structure, optimized for efficiency. We want Heap to be as useful as possible, but we do not want to over-generalize it to serve every possible use case: it needs to remain relatively straightforward to use, and any bells and whistles we add must not lead to reducing the efficiency of heap's core operations.

  2. Heap implements value semantics with the copy-on-write optimization. Mutations of uniquely held heaps are implemented in place; mutations of shared copies start by making a full copy of the container. This is similar to how other flat collections such as Array, Set, Deque, OrderedSet etc. work. (That is to say, Heap is not a persistent data structure -- such copies are all-or-nothing.)

  3. Heap needs an ordering relation to be defined over its items, so Element is required to conform to Comparable. We want each Heap type to encode a specific ordering over its instances, so that a function taking a Heap can dictate the order it wants to retrieve items in its type signature. (This is a direct analogue of Set requiring its elements to be Hashable.)

  4. As a rule, I expect most use cases will want to define dedicated wrapper types to serve as the Element in the Heap, and the wrapper type will implement whatever Comparable conformance makes most sense for the case at hand. Populating a Heap with instances of some existing type will be the exception, not the rule.

  5. Heap can contain duplicate items; each duplicate is stored separately. Elements that compare equal to each other are returned by popMin/popMax in some arbitrary, unspecified order -- Heap does not implement FIFO ordering.

    Use cases that require such an ordering will need to implement it on their own, by e.g. adding a serial number to each element. (See point (1) above -- implementing FIFO ordering would add overhead to the core operations. This is in addition to the notion of FIFO ordering being fundamentally incompatible with the min-max representation.)

  6. Heap stores its contents in an array instance under its full control, but it does not conform to Sequence (and consequently, neither to Collection). However, the storage Array is exposed through the read-only unordered view, allowing clients to iterate (or otherwise access) its full contents without having to reach them by repeated popMin/popMax calls.

    let queue: Heap = [6, 2, 3, 1, 0, 4, 5, 7]
    for i in queue { // error: For-in loop requires 'Heap<Int>' to conform to 'Sequence'
      print(i) 
    }
    for i in queue.unordered { // OK!
      print(i) // Prints: 0, 7, 5, 1, 6, 4, 3, 2
    }
    

    The order of items in the unordered view is dictated by the heap's invariants; the implementation details might change, so it's best to treat it as an unordered collection. Having to spell out unordered alerts readers that items will not be returned in any particular order -- specifically, they are not in any sorted order.

  7. Heap is not a generic container adaptor like std::priority_queue is in the C++ STL: you cannot substitute your own collection type for the heap's array. Swift does not support default values for generic type arguments, so the best way to emulate that would be to rely on generic typealiases:

    struct CustomHeap<Elements: RandomAccessCollection & RangeReplaceableCollection & MutableCollection> {
      typealias Element = Elements.Element
      ...
    }
    typealias Heap<Element> = CustomHeap<Array<Element>>
    
    var q1: CustomHeap<[Int]>  = [1, 2, 3, 4]
    var q2: CustomHeap<Deque<Int>> = [1, 2, 3, 4]
    

    This does not seem a particularly attractive direction, but if it turns out to be necessary, we could refactor the existing implementation to follow it without breaking source stability. The current Heap implementation is (heavily) relying on the storage type being contiguous; if we want to preserve that, the more general type could also be introduced as a completely separate add-on.

Questions to consider:

  • Do the new API additions follow Swift's naming guidelines?
  • Does it have any issues interacting with the APIs in the Swift Standard Library and the other packages in the Apple Swift Package Collection? (E.g., are things consistently named throughout these?)
  • Do any of the API names seem confusing to you? Would you need to look up the documentation to understand what they do?
15 Likes

As a reminder, the initial discussion thread for landing Heap can be found at the following link:

I feel like I'm missing something, regarding all this and particularly what leads to the concluding sentence. It seems to be saying that instead of e.g. putting Strings in a Heap, it's expected that most uses will have some custom wrapper around String instead?

This seems related to Swift's general choice to eschew collection-instance-specific customisation of equality / hashing / comparison. As opposed to e.g. C++ where STL (and Boost etc) collection types are typically parameterised (via the generics system) for these factors.

Even in Swift there are exceptions to this apparent rule, e.g. the standard sort et al methods take an optional comparison closure.

So hopefully this isn't too tangential or generic (vs Heap), but even knowing Swift's patterns it's not apparent to my why new collections should be implemented this way.

Plenty of Collection- and Sequence-conforming types have undefined orders of iteration, e.g. Dictionary ("The order of key-value pairs in a dictionary is stable between mutations but is otherwise unpredictable.").

It'd be one thing if unordered were an actual view that might have a different presentation to the base type, e.g. String's unicodeScalars, but it sounds like it's nothing more than the actual internal storage?

That seems less flexible for the future, since it's explicitly stating that unordered is an Array<Element> - and therefore making it a potentially expensive operation if the underlying storage is one day made something else - whereas if Heap were simply a Collection itself it could abstract away any connection to the internal storage.

2 Likes

Great questions!

No — if you want a heap of Strings, and you’re happy with String’s ordering, then Heap<String> is perfect. All I’m saying is that it’s no big deal if you want your heap to use a different order.

If you need a custom ordering relation, then you’ll simply need to wrap your strings in some custom type of your own. (This is not a design defect, but it’s very much by design. Defining new types is easy.)

Yes, this is directly following a clear, strong pattern established by the Swift Standard Library. Set is a collection of Hashable elements, not a collection of arbitrary elements with a separate type parameter that describes how to compare and hash them.

As I noted in point (7), Swift does not make it possible to supply default values for generic type arguments, so trying to do C++-style mix-in parameters would make the API unpalatable for the common case.

The crucial distinction is that sort et al. are generic functions, not generic types. A type whose values can get passed around in between modules of an app generally needs to guarantee a particular behavior for itself, so that code can usefully interact with it.

All code that encounters a Set<String> needs to agree on how == and hash(into:) get implemented.

All code that deals with SortedSet<String> values need to agree on how the ==/< relations work.

All code that receives a Heap<String> is guaranteed that the minimum and maximum items will be the ones that correspond to the < relation.

Swift is a strongly typed language. We like Swift's rich type system, and we want to make the best use of it.

This topic has been extensively covered in previous discussions, some of them directly involving Heap. (Beware, the linked discussion ended quite rudely.)

Set and Dictionary directly conforming to Sequence is an API design regret. If we were starting from scratch, I’d probably push for moving these conformances into a separate view, just like with Heap. (Suggesting such a move may have been possible in Swift 1 era. I don’t think it would be practical today.)

Like OrderedSet's elements view, Heap's unordered view simply directly exposes the underlying Array instance. This does not in any way lessen its view-ness -- it is still a bona fide, card carrying, actual view!

A view simply provides an alternate way of looking at the underlying representation, different from the default one.

The name unordered comes from OrderedSet's unordered view, where it is used to provide a SetAlgebra conformance. (As SetAlgebra's expectation for equality is incompatible with OrderedSet's order-sensitive equivalence relation.)

Heap's unordered view fulfills an analogue role for Sequence/Collection as OrderedSet's unordered view did for SetAlgebra.

(Note: TreeSet and TreeDictionary did not relegate their Sequence conformances to a separate view, as they are designed to be drop-in replacements of Set and Dictionary -- they needed to faithfully emulate their API surface, wrinkles and all.)

Min-max heaps are, by definition, represented by an implicit tree stored in an array. Having to expose this storage is not going to ever become a limitation.

(Note: We are not trying to pretend that it would be practical for Heap to switch to some radically different implementation, such as Fibonacci heaps. If we want to implement a different data structure, then we'd do it by defining a different type.)

If we ever want to generalize Heap to support storage types other than Array, then we will be able to do so by doing that in a separate type, as explained in point (7).

As has been very clearly noted, we sometimes want to form a heap in place using some pre-existing storage -- and this initial form of Heap intentionally does not allow that.

This is similar to how we decided that OrderedSet would hardwire Array as its underlying storage -- just like with Heap, it would sometimes be useful to be able to define, say, a uniquing Deque, and the current OrderedSet does not allow that.

(For what it's worth, in the three years OrderedSet has been in use, I have seen zero requests for generalizing it -- even though a Deque-based OrderedSet would allow allow efficient insertions/removals at its start, not just at its end.

That said, I think Heap is a bit more likely to receive such requests, so I expect we'll eventually end up doing this -- but probably not before we have non-copyable sequence/collection protocols. Most heap operations are implemented on the core Heap._UnsafeHandle type, which is an UnsafeBufferPointer-like construct, abstracting away the minutiae of Array access. One way to reformulate this in a safer way would be to use something like a non-copyable BufferView -- this would let us define a fixed-size, backing-store agnostic contiguous heap type as a public replacement of Heap._UnsafeHandle, and then use it in all sorts of generalizations of Heap.

I don't think any of these future directions need to hold up the release of Heap in the source-stable (but not ABI-stable!) swift-collections package.

4 Likes

Should we consider providing min and max elements as computed properties in the API since accessing them is a constant time complexity O(1) operation?

4 Likes

I think it would be nice, though perhaps difficult to implement without the ability to transfer ownership, if there were ascending and descending views, allowing one to write:

var myHeap = Heap(...)

for x in myHeap.descending {
  ...
}
2 Likes

Those existed earlier but were removed here:

  • edit fixed wrong link -
2 Likes

I think that's a good idea! These were originally made functions to avoid clashing with the standard min()/max() algorithms on Sequence, but as long as Heap isn't going to become one, the collision does not really apply.

There aren't any strong expectations on the complexity of computed properties. (For instance, hashValue's complexity is linear in the size of self, and Collection.last involves an index(before:) operation that has unspecified (although typically at most linear) complexity.)

Beyond the notational benefits, turning .min and .max into properties would also make it immediately possible for them to yield borrows of their value, rather than a copy of it -- which will become important when this type starts supporting non-copyable elements. (Eventually, functions will probably be able to yield a value, too, but it's unclear when that might become a thing.)

Edit: the returned type being Optional renders this last point largely moot, unless and until we get optional borrows. Non-copyable heaps will have to make do without accessing min/max until the dust settles. (They will still be able to provide popMin/popMax/removeMin/removeMax.)

1 Like

These seemed pretty neat!

Can someone please explain what the non-trivial processing was, that motivated their removal?

These two do very non-trivial processing,

Yes, but unfortunately they only seemed neat.

From my comment from Oct 21, 2021:

[...] the iterators of these sequences make a full copy of the underlying storage the first time next() is called, which is a performance gotcha that is (as far as I can remember) unique to these two. This gotcha means that these constructs don't work like views at all, and using them in production would likely be a bad idea.

If these concepts actually proved useful, we could add them at any point later as "views" that consume their heap. However, consuming doesn't quite work yet, and these views would be non-essential at best (perhaps even superfluous) -- so the right move right now is to omit them from the initial release.

4 Likes

Oof!

The only workaround I see is to back it with a TreeDictionary, which lets you make the new mutated copies more cheaply... but the overall cost of that is not worth it.