This post is collecting feedback on the API of the Heap
type, planned for inclusion in a future version of the Swift Collections package. This type implements the classic min-max heap data structure.
Back in 2021, @tstromberg has kindly contributed an implementation of Heap
; it has since been slowly refined to a production ready state. We are way, way waaaaaaay overdue for shipping it in a tagged release; so let's take some steps toward doing that sometime in the foreseeable future!
[Important note: For better or worse, the swift-collections package is not under the Swift Evolution Process; it is not under the governance of any Swift working group; it has no release schedule. Accordingly, this post is not formatted as an API proposal, and this thread is not a formal API review -- it is a merely an informal request for comments on the current API surface, as a way to catch minor oversights before we actually ship them.]
Heap
is a container of elements, implementing the classic min-max heap data structure, as introduced by Atkinson et al. 1986. They support the following operations:
- Constructing a heap out of an arbitrary sequence (O(n) complexity, Floyd's algorithm).
- Accessing their minimum or maximum item (O(1) complexity, direct access)
- Removing their minimum or maximum item (O(log(n)) complexity)
- Inserting an arbitrary new element (O(log(n)) complexity). Duplicate items are inserted as separate values.
- Inserting an arbitrary sequence of new items (O(n + k) complexity, implemented either by a naive insert loop or Floyd's algorithm, depending on n and k).
The current API synopsis of Heap
is as follows:
// Shipping in a new module called `HeapModule`.
struct Heap<Element: Comparable>
: ExpressibleByArrayLiteral, CustomStringConvertible
{
init()
init(_ elements: some Sequence<Element>)
var count: Int { get }
var isEmpty: Bool { get }
var unordered: [Element] { get }
func max() -> Element?
func min() -> Element?
mutating func insert(_ element: Element)
mutating func insert(contentsOf newElements: some Sequence<Element>)
mutating func popMax() -> Element?
mutating func popMin() -> Element?
mutating func removeMax() -> Element
mutating func removeMin() -> Element
mutating func replaceMax(with replacement: Element) -> Element
mutating func replaceMin(with replacement: Element) -> Element
}
Trivial example for a Heap
in action:
var queue: Heap = [4, 5, 2, 2, 1, 3, 0, 0, 0, 2, 6]
print(queue.min()) // 0
print(queue.max()) // 6
queue.insert(7)
print(queue.max()) // 7
while let next = queue.popMin() { print(next) }
// Prints: 0, 0, 0, 1, 2, 2, 2, 3, 4, 5, 6, 7
Unorganized notes, remarks:
-
Heap implements a core low-level data structure, optimized for efficiency. We want
Heap
to be as useful as possible, but we do not want to over-generalize it to serve every possible use case: it needs to remain relatively straightforward to use, and any bells and whistles we add must not lead to reducing the efficiency of heap's core operations. -
Heap
implements value semantics with the copy-on-write optimization. Mutations of uniquely held heaps are implemented in place; mutations of shared copies start by making a full copy of the container. This is similar to how other flat collections such asArray
,Set
,Deque
,OrderedSet
etc. work. (That is to say,Heap
is not a persistent data structure -- such copies are all-or-nothing.) -
Heap
needs an ordering relation to be defined over its items, soElement
is required to conform toComparable
. We want eachHeap
type to encode a specific ordering over its instances, so that a function taking aHeap
can dictate the order it wants to retrieve items in its type signature. (This is a direct analogue ofSet
requiring its elements to beHashable
.) -
As a rule, I expect most use cases will want to define dedicated wrapper types to serve as the
Element
in theHeap
, and the wrapper type will implement whateverComparable
conformance makes most sense for the case at hand. Populating aHeap
with instances of some existing type will be the exception, not the rule. -
Heap
can contain duplicate items; each duplicate is stored separately. Elements that compare equal to each other are returned bypopMin
/popMax
in some arbitrary, unspecified order --Heap
does not implement FIFO ordering.Use cases that require such an ordering will need to implement it on their own, by e.g. adding a serial number to each element. (See point (1) above -- implementing FIFO ordering would add overhead to the core operations. This is in addition to the notion of FIFO ordering being fundamentally incompatible with the min-max representation.)
-
Heap
stores its contents in an array instance under its full control, but it does not conform toSequence
(and consequently, neither toCollection
). However, the storageArray
is exposed through the read-onlyunordered
view, allowing clients to iterate (or otherwise access) its full contents without having to reach them by repeatedpopMin
/popMax
calls.let queue: Heap = [6, 2, 3, 1, 0, 4, 5, 7] for i in queue { // error: For-in loop requires 'Heap<Int>' to conform to 'Sequence' print(i) } for i in queue.unordered { // OK! print(i) // Prints: 0, 7, 5, 1, 6, 4, 3, 2 }
The order of items in the
unordered
view is dictated by the heap's invariants; the implementation details might change, so it's best to treat it as an unordered collection. Having to spell outunordered
alerts readers that items will not be returned in any particular order -- specifically, they are not in any sorted order. -
Heap
is not a generic container adaptor likestd::priority_queue
is in the C++ STL: you cannot substitute your own collection type for the heap's array. Swift does not support default values for generic type arguments, so the best way to emulate that would be to rely on generic typealiases:struct CustomHeap<Elements: RandomAccessCollection & RangeReplaceableCollection & MutableCollection> { typealias Element = Elements.Element ... } typealias Heap<Element> = CustomHeap<Array<Element>> var q1: CustomHeap<[Int]> = [1, 2, 3, 4] var q2: CustomHeap<Deque<Int>> = [1, 2, 3, 4]
This does not seem a particularly attractive direction, but if it turns out to be necessary, we could refactor the existing implementation to follow it without breaking source stability. The current
Heap
implementation is (heavily) relying on the storage type being contiguous; if we want to preserve that, the more general type could also be introduced as a completely separate add-on.
Questions to consider:
- Do the new API additions follow Swift's naming guidelines?
- Does it have any issues interacting with the APIs in the Swift Standard Library and the other packages in the Apple Swift Package Collection? (E.g., are things consistently named throughout these?)
- Do any of the API names seem confusing to you? Would you need to look up the documentation to understand what they do?