Heap Sequence conformance

Right now, the proposed Heap structure does not conform to Sequence, purely out of concern for ambiguity about iteration order. I feel this is a mistake.

I believe it would make more sense to choose an iteration order, make Heap use that, then provide a Reversed view that uses the other order. This would fit the semantics of existing data structures much better, and allow Heap to conform to other relevant protocols that inherit Sequence.

The ordered views it exposes do conform to Sequence (i.e. you can call heap.ascending to get a Sequence of the elements in ascending order). Does adding conformance to Sequence to Heap itself give us any benefits that using one of the ordered views does not?

2 Likes

It would fit the semantics of existing types like Array (which can also be iterated forward and backward), and allow Heap to conform to protocols like Collection to make its behavior more explicit.

Part of the challenge with Collection conformance is that heaps are only partially ordered — you can get the min and max elements in O(1) time, but other elements take O(log n). Types that conform to Collection are expected to have O(1) subscript access to elements. I'm not sure how we would support that.

Are there specific aspects of the Collection protocol that you would want Heap to support?

Edit: Heap also exposes read-only access to the underlying array via its unordered property.

1 Like

Ah, I didn’t catch that aspect of Collection’s requirements. That’s rather unfortunate, as it makes it difficult to communicate that access is non-destructive.

I was really just using Collection as an example, anyway: the important thing is conforming to Sequence and allowing the use of every generic algorithm that uses it. A protocol with associated type requirements isn’t going to know about an ordered view.

Collection requires O(1) access to elements by Index, not by Int (offset). Heap can have an opaque Index representing its position without it being an Int.

6 Likes

You mean the worst-case time complexity for getting the index at a certain offset in a Heap can be O(log n), while getting the element at that index can be O(1), allowing Heap to conform to Collection?

I knew something seemed off about Collection needing O(1) access to elements, since that’s what RandomAccessCollection describes. I couldn’t put my finger on the mistake, though.

Right. Consider Set, backed by a hash table: you can't ask for the Nth element (not that it makes sense to), since the table contains empty spots you have to skip over. But once you have a position in the table (Set.firstIndex(of:)), you can access that element directly.

4 Likes

Because an index is effectively a safe form of pointer.

That's one possible implementation, sure. (It might be the only reasonable one for a linked structure, including trees made up of classes.) In Set's case, it's probably an offset in the internal table structure.

1 Like

@tstromberg I think Heap should conform to BidirectionalCollection. The implementation of the ordered views tells me that you can iterate forwards and backwards from any point: the only difference between them is which method they call.

Given a known position, you can't (trivially?) determine the next (or previous) element non-destructively. This isn't compatible with Collection.

That isn’t a requirement of Collection.

My point is that, unlike a lot of common sequences, Heap iterates destructively and logically adjacent elements are scrambled in memory. Both of these mean Collection conformance can't be translated from the existing Sequence code.

You can’t use slicing to mimic a draining iterator?

@CTMacUser Unless iteration is literally mutating, and I don’t believe that’s the intent, you are making a copy of the original anyway. That copy could be represented as a specialized SubSequence, which could even be a Heap as well.

As for being scrambled in memory, that is only problematic if you want to conform to ContiguousBytes. I don’t see why we would.

At any rate, can we start by agreeing that Heap should have Sequence conformance and at least mimic the semantics of BidirectionalCollection?

Actually, I can rule that last part out right now. Collection requires O(1) deletion of the first element if Self is SubSequence, and min-max heaps can only do that in logarithmic time.

Looking at Collection’s requirements, I’d say the most difficult one is formIndex(_:offsetBy:). So let’s discuss that: could you find the, say, fifth-smallest element in O(4) time?

Heap already has Sequence conformance via two computed properties. If Collection conformance was ever added, it would also be through such a property.

Iteration through a Heap is destructive. When you use official iteration, like through a for-in loop, the IteratorProtocol object is a copy of the receiver, so the original isn't affected. But all Collection operations are non-destructive; given an element's index, you must be able to compute the next index without mutating the collection. That can't really be done with a heap.

(Yes, there are refinements of Collection for mutation, but neither are appropriate for Heap, because a Heap allows only narrow changes, but RangeReplaceableCollection and MutableCollection mandate arbitrary changes in their domains.)

I've been thinking out how it could be done. My thought so far was a linked list; the internal storage changes from Element to a (Element, previous: Int?, next: Int?) tuple, and you keep track of the maximum element's index. But the updates whenever you add (or remove?) an element end up being linear.

For scrambled in memory, I only meant for unscrambled elements be logically adjacent, not physically (i.e. ContiguousBytes).

Let’s step back from Collection and return to my original post: Heap should conform to Sequence, and the descending iterator should be exposed using similar semantics to BidirectionalCollection.

So, a BidirectionalSequence? We don’t have that. And what’s wrong with the two existing accessors?

Sequence has a quasi-hidden requirement that instances’ vended publication order has a semantic meaning. All the common non-required methods assume this. This is why said methods are broken when used with Set and Dictionary. Those types fundamentally can’t meet that requirement, but conform anyway. This is an (unfixable) bug that shouldn’t be repeated. That’s why Heap uses accessors.

(There should be a base protocol for Sequence that removes the linear semantic, but Set and Dictionary still can’t use it for ABI and source compatibility.)